GSoC 2025 - NetworkX(nx-parallel)

# nx-parallel: GSoC 2025 sync meetings

**Thursdays**:
- 06:30 to 07:30 pm IST
- 09:00 to 10:00 am New York Time
- 01:00 to 02:00 pm UTC

**Meeting link**: https://colgate.zoom.us/j/281534728

**Notes**: https://hackmd.io/tF0saJhyQ2i25e8tWiBrmw

## 2025-08-21

**Present:** Akshita, Dan

### Topics:

- [`is_reachable`](https://github.com/networkx/nx-parallel/pull/119)
    - mem-mapping vs pure Python?: https://github.com/networkx/nx-parallel/pull/119#issuecomment-3128470584
    - idea: leave as mem-mapping and add `should_run`
- [`harmonic_centrality`](https://github.com/networkx/nx-parallel/pull/124#issuecomment-3194283003)
    - adding `should_run_if_sparse` :thumbsup:
- [CI failure on Python 3.13 in Windows not finding _posixsubprocess](https://github.com/networkx/nx-parallel/issues/140)
    - Recent merges to CPython: https://github.com/python/cpython/commits/main/
    - Recent releases of joblib?

### WIP:

- re-open [PR #44](https://github.com/networkx/nx-parallel/pull/44) and add `should_run` to it
- [set `should_run=False` unless nodes is None](https://github.com/networkx/nx-parallel/issues/137)
- finishing touches after merging

## 2025-08-14

**Present:** Akshita, Dan, Aditi

### Topics:

- [`is_reachable`](https://github.com/networkx/nx-parallel/pull/119)
    - leave the numpy implementation as is?
- [`harmonic_centrality`](https://github.com/networkx/nx-parallel/pull/124)
    - why does `betweenness_centrality` show speedups while `harmonic_centrality` doesn't?
- Order of PR merging (with revert):
    - revert PR with test_get_chunks
    - should_run
    - add changes for test_get_chunks in the algorithms PRs
    - merge community detection as 1st algorithm
    - might have conflicts after 2nd algorithm in the 3rd algorithm
    - algorithm PRs that should have should_run
- Order of PR merging (without revert):
    - merge PRs with the algorithms
    - add should_run functions for algorithms to the should_run PR
    - merge the should_run PR
- Order 2 of PR merging (without revert):
    - merge the should_run PR
    - add should_run functions for algorithms to respective PRs
    - merge PRs with the algorithms

Final decision: **Order 2**

### PRs Ready for Review

#### Independent PRs

- [Refactor ASV benchmarks](https://github.com/networkx/nx-parallel/pull/126)
- [`triangles`](https://github.com/networkx/nx-parallel/pull/106)
- [link prediction algorithms](https://github.com/networkx/nx-parallel/pull/127)

#### Dependent PRs

1. [Make `n_jobs=-1` default where appropriate](https://github.com/networkx/nx-parallel/pull/122)
    - [Improve timing script](https://github.com/networkx/nx-parallel/pull/114) (it can use the default number of cores instead of setting joblib configs).
2. [`should_run` parameter](https://github.com/networkx/nx-parallel/pull/123)

    Add `should_run` to the following algorithms.
    - [`number_` algorithms](https://github.com/networkx/nx-parallel/pull/117)
    - [`clustering` and `average_clustering`](https://github.com/networkx/nx-parallel/pull/130)
    - [`average_neighbor_degree`](https://github.com/networkx/nx-parallel/pull/132)
    - [`v_structures` and `colliders`](https://github.com/networkx/nx-parallel/pull/134)
    - [`is_reachable`](https://github.com/networkx/nx-parallel/pull/119/)

### TO DO:

1. Add tests to [`should_run` parameter](https://github.com/networkx/nx-parallel/pull/123) and add `should_run` to more algorithms.
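
The `should_run` ideas in these notes (skip when `n_jobs` is `None`, `1` or `0`; skip unless `nodes is None`; run only if the graph is sparse) could be combined roughly as below. This is a minimal, self-contained sketch and not nx-parallel's actual API: the function name `should_run_sketch`, the `max_density` threshold and the reason strings are all hypothetical. It only assumes that a check returns `True` to run in parallel, or a string explaining why not.

```python
def density(num_nodes: int, num_edges: int) -> float:
    """Density of a simple undirected graph: 2m / (n * (n - 1))."""
    if num_nodes < 2:
        return 0.0
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


def should_run_sketch(num_nodes, num_edges, n_jobs, nodes=None, max_density=0.3):
    """Return True to run in parallel, else a string giving the reason not to.

    Hypothetical illustration: the threshold and messages are assumptions.
    """
    # Parallelism is pointless with no extra workers.
    if n_jobs in (None, 0, 1):
        return "Parallel backend needs more than one job"
    # Only the whole-graph case is assumed to be worth the overhead.
    if nodes is not None:
        return "Subset of nodes requested; not worth parallelising"
    # Sparse-only variant, in the spirit of `should_run_if_sparse`.
    if density(num_nodes, num_edges) > max_density:
        return "Graph is not sparse"
    return True
```

Keeping the checks in one plain function, instead of repeating them per algorithm, is one way to limit duplication; whether that fits the backend interface is left open here.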
-------------------------- No meeting on 2025-06-26 --------------------------

## 2025-07-31

**Present:** Akshita, Aditi

### Topics:

- asv `setup` function PR
    - [Aditi] getting error while running benchmarks: `Couldn't load asv.plugins._mamba_helpers because No module named 'libmambapy'`
    - [Aditi] `params = [backends, num_nodes, edge_prob]` or `params = [(backends), (num_nodes), (edge_prob)]` -- ideally the first one should be fine
    - [Aditi] graph conversion time is negligible, so excluding or including it doesn't make much difference -- but document in the README whatever we include or exclude in the benchmarks.
- Adding `average_clustering` algorithm
    - should we use NetworkX's `clustering` implementation and parallelise the sum, len functions in `average_clustering`?
    - should we simply use parallel `clustering` in `average_clustering`?
    - should we use parallel `clustering` and also parallelise the sum, len functions in `average_clustering`?
- [adding `should_run` parameter](https://github.com/networkx/nx-parallel/pull/123)
    - [Aditi] LGTM!
- Adding should_run to more algorithms
    - including should_run for `if n_jobs=None or 1 or 0` for all nx-parallel algorithms
- [parallel link prediction algorithms](https://github.com/networkx/nx-parallel/pull/127)
    - Refer PR - https://github.com/networkx/networkx/pull/8170
    - error msg:
      ```
      G = <networkx.classes.graph.Graph object at 0x7f60a49920d0>, u = 2
      community = 'community'

          def _community(G, u, community):
              """Get the community of the given node."""
              node_u = G.nodes[u]
              try:
                  return node_u[community]
              except KeyError as err:
      >           raise nx.NetworkXAlgorithmError(
                      f"No community information available for Node {u}"
                  ) from err
      E           networkx.exception.NetworkXAlgorithmError: No community information available for Node 2

      /opt/hostedtoolcache/Python/3.13.5/x64/lib/python3.13/site-packages/networkx/algorithms/link_prediction.py:685: NetworkXAlgorithmError
      ```
- [parallel `is_reachable()`](https://github.com/networkx/nx-parallel/pull/119)
- PRs ready for review:
    - [make `n_jobs=-1`](https://github.com/networkx/nx-parallel/pull/122)
    - [improve timing script](https://github.com/networkx/nx-parallel/pull/114)
    - Algorithms:
        - [parallel `number_` algorithms](https://github.com/networkx/nx-parallel/pull/117)
        - [parallel `triangles`](https://github.com/networkx/nx-parallel/pull/106)
    - [adding `should_run` parameter](https://github.com/networkx/nx-parallel/pull/123)
        - Merge [parallel `number_` algorithms](https://github.com/networkx/nx-parallel/pull/117) after adding `should_run` parameter.
    - [parallel `_apply_pred`](https://github.com/networkx/nx-parallel/pull/127)
        - This can be merged after both [PR#8170](https://github.com/networkx/networkx/pull/8170) and [PR#129](https://github.com/networkx/nx-parallel/pull/129) get merged
    - [parallel `clustering`](https://github.com/networkx/nx-parallel/pull/130)
- [TO DO]:
    - Re-open [PR#74](https://github.com/networkx/nx-parallel/pull/74) and add `should_run` to it once merged.
    - Add parallel implementation of `average_clustering` and `average_neighbor_degree`

## 2025-07-24

**Present:** Akshita, Dan, Aditi

### Topics:

- [using setup functions](https://github.com/networkx/nx-parallel/pull/126)
    - benchmarks on tournaments, `_apply_prediction`, `number_` require a setup function
    - it's fine to merge the setup function PRs, but wouldn't recommend giving more time to surface-level asv benchmarking.
- [adding `should_run` parameter](https://github.com/networkx/nx-parallel/pull/123)
    - combine repeated `should_run` checks into a single reusable function to avoid duplication?
    - Goal: minimize code redundancy; one way might be to have should_run as an independent function rather than as a function property.
- [adding link prediction algorithms](https://github.com/networkx/nx-parallel/pull/127)
    - figure out a compatible return type for `_apply_prediction`?
- [adding harmonic centrality](https://github.com/networkx/nx-parallel/pull/124)
    - address declining performance
    - keep an eye out for possible ways to apply logging?
- Handling parallel functions with inner calls to other parallel functions?
    - should we give control of the inner parallel function to the user (i.e. let them set its configs)?
    - should we choose networkx over nx-parallel, or the other way around? and what's the basis of that choice?
- [is_reachable()](https://github.com/networkx/nx-parallel/pull/119)
- PRs ready for review:
    - [refactor `test_get_chunks()`](https://github.com/networkx/nx-parallel/pull/128)
    - [make `n_jobs=-1`](https://github.com/networkx/nx-parallel/pull/122)
    - [parallel `number_` algorithms](https://github.com/networkx/nx-parallel/pull/117)
    - [improve timing script](https://github.com/networkx/nx-parallel/pull/114)
        - final heatmap look -- https://raw.githubusercontent.com/networkx/nx-parallel/f2d9970faef96bfccd491c2afda96a9f446617e2/timing/heatmap_betweenness_centrality_timing.png
    - [parallel `triangles`](https://github.com/networkx/nx-parallel/pull/106)

## 2025-07-17

**Present:** Akshita, Aditi, Dan

### Topics:

- PR review:
    - [GSoC related PRs in nx-parallel](https://github.com/networkx/nx-parallel/pulls/akshitasure12)
    - [NetworkX PR to optimize is_reachable](https://github.com/networkx/networkx/pull/8112)
- PRs to be opened: harmonic centrality, clustering and (`_apply_prediction` in link_prediction.py)
- PR: 106 (triangles) - ready
- TODO: breaking into 2 PRs:
    - 117 (should_run, number_ algorithms)
    - 122 (`n_jobs=-1`)
    - need 2 days for final touch-ups
- PR: 114 (timing script): after the `n_jobs=-1` PR gets merged
- PR: 119 (is_reachable) - hold on till `is_reachable()` gets merged in NetworkX
- Daily updates via gform (privately) or in the github blogs repo (publicly)? -- communicate by EoTomorrow. Including:
    - Date
    - hours spent
    - What did you learn/work on today?
    - should only take 10-15 mins at the EoD
    - informal language with spelling errors allowed :)
    - no general-sounding work items like "worked on PR123" or "read about xyz"; instead, briefly describe the changes pushed in PR123 or the conclusions after reading on xyz, respectively.
-------------------------- No meeting on 2025-06-26, 2025-07-03 or 2025-07-10 --------------------------

## 2025-06-19

**Present:** Aditi, Akshita, Dan

### Topics:

- Should_run:

```
NETWORKX_BACKEND_PRIORITY_ALGOS=parallel NETWORKX_FALLBACK_TO_NX=False python3 test_should_run.py
Call to 'number_of_isolates' has inputs from {'networkx'} backends, and will try to use backends in the following order: ['parallel', 'networkx']
Backend 'parallel' shouldn't run `number_of_isolates` with arguments: (G=<networkx.classes.graph.Graph object at 0x1055fb6e0>), because: Fast algorithm; not worth converting
Trying next backend: 'networkx'
Call to 'isolates' has inputs from {'networkx'} backends, and will try to use backends in the following order: ['parallel', 'networkx']
Backend 'parallel' does not implement 'isolates'
Trying next backend: 'networkx'
5
```

`test_should_run.py` contents:

```python
import networkx as nx
import logging

# Send NetworkX's dispatch debug messages to stderr
nxl = logging.getLogger("networkx")
nxl.addHandler(logging.StreamHandler())
nxl.setLevel(logging.DEBUG)

# Five isolated nodes, so the expected result is 5
G = nx.empty_graph(5)
print(nx.number_of_isolates(G))
```

## 2025-06-12

**Present:** Aditi, Akshita, Dan

### Topics:

- timing script PR almost ready to review:
    - changing the scale and color gradient
    - revert `tournament.py` changes
    - timeit vs perf_counter? --> timeit runs multiple times (better for getting more accurate results)
- numpy array approach for `is_reachable`? (7x speed-up with `n_jobs=1`)
    - `for`-loops better with numpy
    - `dict` --> hash table access
    - further discussions at community meet and a potential PR in networkx/networkx
- should_run

## 2025-06-05

**Present:** Aditi, Akshita, Dan

### Topics:

- How do we handle algorithms with a fast NetworkX implementation?
    - https://github.com/networkx/nx-parallel/issues/77
    - `should_run`
- Scope of algorithms which aren't embarrassingly parallel
- TODO: drop 3.11 (`itertools.batched` was added in 3.12)
    - Dan has some work done on his branch
    - Aditi will create a follow-up PR
    - The Python 3.14 release in October will allow us to remove support for Python 3.11 anyway. So this only moves up the removal by ~4 months.
- Memory could be the reason we get lower speedups with larger graphs. Pattern: speedup increases and then decreases.
    - can we avoid copying of variables (especially `G`) in joblib's parallel processes?
- Sometimes test dependencies get installed, sometimes they don't -- why?
    - https://github.com/networkx/nx-parallel/actions/runs/15452987789/job/43499295871
    - https://github.com/networkx/nx-parallel/actions/runs/15455906459/job/43508038492?pr=114
    - probably a bug on `pip`'s side --> "NO"
    - fix: move numpy back to the `test` dependencies
    - add optional dependencies for heatmaps
- `closeness_centrality` --> ask the original author if they want to continue; otherwise Akshita can continue their work.

## 2025-05-22

**Present:** Akshita, Aditi, Dan

### Topics:

- [TODO] Akshita will share the updated proposal soon.
- Changing `n_jobs=-1` and `active=True`
    - https://github.com/networkx/nx-parallel/issues/111
    - Dan and Aditi: upvote for changing the default
- https://github.com/networkx/nx-parallel/pull/112/files#diff-6c4558902ff3766f8b71b0bef5e764e5933ab96e80fab09c8ffaff9b6445f1eaR41 : `>=` instead of `==`

---

## 2025-05-15

**Present:** Akshita, Aditi

### Topics:

- introductions and setting expectations
- Expectations of 5 parties involved:
    - GSoC: https://google.github.io/gsocguides/
    - NumFOCUS: https://github.com/numfocus/gsoc/blob/master/CONTRIBUTING-students.md
    - Contributor (Akshita): I want to learn as much as possible during my time here, get technical and career guidance.
    - Mentors (Aditi, Dan): Learn as much as possible during these 3-4 months, and right now (during community bonding period)
        - interact with the community and get to know everyone
        - iterate on the working plan for the coding period -- with well-defined goals
        - more details on coordination, etc. communicated in the invitation email
    - NetworkX: usually open-source projects participate in programs like GSoC to get regular contributors/maintainers. After the internship is over, the contributor can decide if they want to stick around or not.
- [Dan via email] one place to post your blogs is on the https://blog.scientific-python.org blog page. To do that, create a PR using the instructions: https://blog.scientific-python.org/about/submit/
- [Dan via email] You can attend any and all NetworkX meetings. But you don't need to attend them. It's good to stay in touch with people, but we also know that sometimes it won't work. The important thing is to be making progress on your project. And learning about how it all fits together is important too. The meetings can help with that aspect.
- timing script not giving speedups for Akshita:
    - quick patch fix: Add the following lines:
      ```python
      import joblib
      joblib.parallel_config(n_jobs=-1)
      ```
    - try in a fresh development environment
    - try to observe the CPU usage in the Activity Monitor (terminate/pause any heavy processes running in the background)
    - In the long run: Fix and update the whole timing script to address all the issues brought up in [Issue#51](https://github.com/networkx/nx-parallel/issues/51); a better visualisation diagram (something better than heatmaps) for showcasing performance improvements would also be nice!
    - output (with speedups -- from Aditi's machine):
      ```
      200 0.7826707363128662 0.3163788318634033
      Finished <function is_reachable at 0x1040c4ae0>
      400 0.6777710914611816 1.5500338077545166
      Finished <function is_reachable at 0x1040c4ae0>
      800 11.148853063583374 5.316954851150513
      Finished <function is_reachable at 0x1040c4ae0>
      1600 137.0498948097229 21.71297287940979
      Finished <function is_reachable at 0x1040c4ae0>
      is_reachable
      ```
    - final outcome: something like this (https://github.com/python-graphblas/graphblas-algorithms) for benchmarking visualisation
- TODO[Aditi]: review [PR#112](https://github.com/networkx/nx-parallel/pull/112)
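
For the timing-script rework, these notes mention `timeit` because it repeats each measurement. Below is a minimal sketch of that idea using only the standard library. `best_time` and `speedup` are illustrative helpers and not part of the existing script, and the two lambdas are stand-in workloads (assumptions) in place of the NetworkX and nx-parallel calls.

```python
import timeit


def best_time(func, repeat=5, number=3):
    """Best per-call time in seconds over `repeat` runs of `number` calls each."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    # The minimum is the run least disturbed by other processes.
    return min(totals) / number


def speedup(baseline, candidate, repeat=5, number=3):
    """Ratio baseline_time / candidate_time; above 1 means `candidate` is faster."""
    return best_time(baseline, repeat, number) / best_time(candidate, repeat, number)


if __name__ == "__main__":
    # Stand-in workloads, not graph algorithms (assumption for illustration).
    slow = lambda: sum(i * i for i in range(200_000))
    fast = lambda: sum(i * i for i in range(20_000))
    print(f"speedup: {speedup(slow, fast):.1f}x")
```

Taking the minimum over repeats, instead of timing a single call, makes the ratio less sensitive to background load, which matches the advice above about watching CPU usage while benchmarking.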
