A comprehensive implementation, analysis, and simulation of the PageRank algorithm across six computational methods, with an interactive browser-based visualisation engine.
Each requirement builds on the last — from implementing the raw algorithm to building a full search engine with real-time simulation.
A novel enhancement to power iteration that freezes low-rank pages after convergence, achieving 1.53× speedup with zero accuracy loss on 2,000-node graphs.
A BFS web scraper with 4 bug fixes, validated against a 791-node synthetic documentation graph. All 7 PageRank methods benchmarked with L1 error and timing.
Custom PageRank algorithm using manual power iteration and NetworkX. Implements the core mathematical formula with explicit dangling node handling, convergence tracking, and validation against exact matrix algebra solution.
A full Streamlit web app with live BFS crawling, Vis.js graph visualisation, heat-scale colour coding, TF-IDF search, and Google benchmark comparison.
Dynamic source-code browser. Code files are parsed and highlighted live directly from your workspace directory path context.
# Processing workspace logic variables... Please wait.
All results are measured on real graph executions, not estimated.
| Method | L1 Error | Time |
|---|---|---|
| 1 NetworkX | 0 (ref) | 6.5 ms |
| 2 Manual | 1.47e-4 | 79.2 ms |
| 3 Matrix | 1.47e-4 | 30.1 ms |
| 4 SciPy | 1.47e-4 | 0.6 ms |
| 5 Monte Carlo | 2.76e-2 | 645 ms |
| 6 HITS | 2.48e-1 | 214 ms |
| 7 Pruning | 1.57e-4 | 93.9 ms |
| N | Std Time | Prune | Speedup |
|---|---|---|---|
| 50 | 0.1 ms | 0.2 ms | 0.51× |
| 500 | 6.8 ms | 16.7 ms | 0.41× |
| 2,000 | 176 ms | 115 ms | 1.53× |
L1 Error = 0.00e+00 in all cases
| Surfers | L1 Error | Time |
|---|---|---|
| 100 | 0.0284 | 8 ms |
| 1,000 | 0.0092 | 76 ms |
| 10,000 | 0.0031 | 760 ms |
| 50,000 | 0.0018 | 3.8 s |
| Test | Status |
|---|---|
| Constructor & init | PASS |
| Return type & structure | PASS |
| Pure PageRank mode | PASS |
| Keyword relevance | PASS |
| Recency weight | PASS |
| Topic-sensitive PR | PASS |
| Combined weights | PASS |
| Edge cases | PASS |
A browser-based Streamlit application built in Week 6. Crawl any website, run all 5 methods, search with TF-IDF, and compare with Google.
Crawl any URL with depth, path prefix, keyword, and asset filters.
Nodes sized and coloured by PageRank on a blue→amber→red heat scale.
Combined score = 0.6×PR + 0.4×TF-IDF.
Multi-tier fuzzy URL matching. Reports Spearman ρ and Kendall τ.
Each recommendation is validated by modifying the graph and measuring exact PageRank changes using NetworkX.
One link from python_tutorial (PR=0.10) increased web_scraping's rank from 0.010 to 0.039.
Adding a db_hub page increased database_sql PageRank from 0.049 to 0.066.
5 inbound links raised PageRank from 0.010 to 0.086 — a 763% total increase.
Adding one link back to the homepage increased homepage PR from 0.190 to 0.236.
Hub-and-spoke equalises distribution — A improves +29.8%, E drops 65%.
All code requires Python 3.8+. Install dependencies once, then run any requirement independently.
pip install networkx numpy scipy matplotlib requests beautifulsoup4 streamlit plotly pandas # ============================================================ # experiments.py # ============================================================ # Run a specific experiment: python experiments.py --exp 1 # Convergence Speed python experiments.py --exp 2 # Scalability python experiments.py --exp 3 # Accuracy Comparison python experiments.py --exp 4 # Damping Factor Sensitivity python experiments.py --exp 5 # Monte Carlo Accuracy # Run all experiments: python experiments.py --exp all # ============================================================ # pr_simulator.py # ============================================================ streamlit run app.py # ============================================================ # progressive_pruning_pr.py # ============================================================ python pruning_test.py # ============================================================ # realistic_data.py # ============================================================ # Demo with synthetic graph (no internet required): python req2_realistic_data.py --demo # Scrape a real website (internet required): python req2_realistic_data.py --scrape --url https://docs.python.org/3/ --depth 2 # Demo on a previously saved edge list: python req2_realistic_data.py --demo --input edge_list.csv # Dependencies: pip install networkx numpy scipy matplotlib requests beautifulsoup4 # ============================================================ # search_engine.py # ============================================================ python search_engine_test.py # Dependencies: pip install networkx numpy # ============================================================ # metro_daily_with_recommendations.py # ============================================================ python metro_daily_with_recommendations.py