Milliseconds.dev Posted on May 31 • Originally published at milliseconds.dev NetworkX vs CSR + TensorPrimitives: PageRank on 28M Edges # csharp # dotnet # performance # benchmarks Overview PageRank is the canonical graph algorithm. NetworkX implements it in pure Python — its dict-of-dict adjacency representation means every power-iteration step dispatches millions of Python attribute lookups. When the graph has 1.8 million nodes and 28.5 million edges (Wikipedia category hyperlinks), those lookups dominate the runtime. The .NET replacement uses a CSR (Compressed Sparse Row) matrix — two flat int[] arrays for the graph structure — and TensorPrimitives for the SIMD-accelerated normalization step inside each iteration. Benchmark Setup Five SNAP datasets of increasing size: Dataset Nodes Edges wiki-Vote 7,115 103,689 soc-Epinions1 75,879 508,837 web-Stanford 281,903 2,312,497 web-Google 875,713 5,105,039 wiki-topcats 1,791,489 28,511,807 Algorithm: power-iteration PageRank, damping=0.85, tol=1e-6. Both implementations converge to identical top-10 node rankings. Results Dataset Python (NetworkX) .NET (CSR) Speedup wiki-Vote (103k edges) ~0.8 s ~100 ms ~8× soc-Epinions1 (508k edges) ~8 s ~600 ms ~13× web-Stanford (2.3M edges) ~120 s ~5 s ~24× web-Google (5.1M edges) ~5.5 min ~12 s ~28× wiki-topcats (28.5M edges) ~47 min ~60 s ~47× The speedup grows with graph size because NetworkX's Python dispatch cost scales with edge count, while the CSR inner loop is a tight JIT-compiled SIMD pass. Why CSR Beats NetworkX NetworkX represents each node's neighbors as a Python dict. Iterating the adjacency in one power-iteration step means: Calling G.neighbors(node) — a Python method call Iterating a dict — unboxing int keys, chasing heap pointers Accumulating a float into another dict value — another boxing step That happens for every edge, every iteration, roughly 50–80 times to convergence. CSR collapses the graph to two arrays: rowPtr[n+1] (where each node's neighbors start) and
LIVE
