This repository mainly includes source code for benchmarking the performance of EasyGraph with hybrid programming and multiprocessing techniques.
Benchmarking code that compares the performance of EasyGraph with hybrid programming. Specifically, we compare EasyGraph (with C++ binding) with the igraph library.
For different types of networks, we compare a series of network analysis functions including:
Undirected Network | Directed Network |
---|---|
Network Loading | Network Loading |
the Shortest Path | the Shortest Path |
the K-core Centrality | the K-core Centrality |
the Closeness Centrality | the Closeness Centrality |
the Betweenness Centrality | the Betweenness Centrality |
/ | the PageRank Centrality |
The PageRank centrality is not suitable for undirected networks.
Benchmarking code that compares the performance of multiprocessing techniques used in EasyGraph.
For different types of networks, we compare a series of network analysis functions including:
Undirected Network | Directed Network |
---|---|
the Local Cluster Coefficient | the Local Cluster Coefficient |
the Hierarchy | the Hierarchy |
the Closeness Centrality | the Closeness Centrality |
the Betweenness Centrality | the Betweenness Centrality |
We compare EasyGraph (with C++ binding) with the igraph library.
The specific way the function is called is shown in the following file
'"import easygraph as eg"'
loading(undirected): "'eg.GraphC().add_edges_from_file(filename, weighted=False,is_transform=True)'"
loading(directed): "'eg.DiGraphC().add_edges_from_file(filename, weighted=False,is_transform=True)'"
pagerank: "'eg.pagerank(g,alpha=0.85)'"
shortest path: "'eg.multi_source_dijkstra(g, sources = eg_node_list)'"
connected_components(undirected): "'eg.connected_components(g)'"
connected_components(directed): "'eg.strongly_connected_components(g)'"
closeness: "'eg.closeness_centrality(g, sources = eg_node_list)'"
betweenness: "'eg.betweenness_centrality(g)'"
k-core: "'eg.k_core(g)'"```
'"import igraph as ig"'
igraph(version:0.10.4):
loading(undirected): "'ig.Graph.Read_Edgelist(filename, False)'"
loading(directed): '"ig.Graph.Read_Edgelist(filename, True)"'
pagerank: '"g.pagerank(damping=0.85)"'
shortest path: '"g.distances(source = ig_node_list,weights=[1]*len(g.es))"'
connected components: '"g.connected_components()"'
k-core: '"g.coreness()"'
closeness: '"g.closeness_centrality(weights=[1]*len(g.es), sources = ig_node_list)"'
betweenness(directed): '"g.betweenness(directed=True, weights=[1]*len(g.es))"'
betweenness(undirected): "'g.betweenness(directed=False, weights=[1]*len(g.es))'"
We compare EasyGraph with the different numbers of workers.
The specific way the function is called is shown in the following file
'"import easygraph as eg"'
loading(undirected): "'g = erdos_renyi_M(size, edge=m, directed=False)'"
loading(directed): "'g = erdos_renyi_M(size, edge=m, directed=True)'"
hierarchy: "'eg.hierarchy(g, n_workers=n_workers)'"
clustering: "'eg.clustering(g, n_workers=n_workers)'"
closeness: "'eg.closeness_centrality(g, sources = eg_node_list)'"
betweenness: "'eg.betweenness_centrality(g)'"
3.9 <= python <= 3.10
is required.
First, to run these scripts, you need to clone the repo.
To install EasyGraph
:
Installation with pip
pip install Python-EasyGraph
Install from scratch
You will need to build EasyGraph wheels from scratch if your platform does not support them. Please run the following code to install the module.
git clone https://github.com/easy-graph/Easy-Graph && cd Easy-Graph && git checkout pybind11
pip install pybind11
python3 setup.py build_ext
python3 setup.py install
To install the igraph library, please refer to https://python.igraph.org/en/stable/
For objective1:
Iteration: 3 times for directed network datasets, 5 times for undirected network datasets.
Node sample: 1,000 nodes are sampled from directed network datasets when testing the algorithm of identification of the shortest paths and the metric of closeness centrality.
Subgraph generation: the largest connected components/strongly connected components are calculated by NetworkX, code is presented in get_lcc_edgelist.py
For objective2:
Iteration: 3 times for directed network datasets, 5 times for undirected network datasets.
Node sample: 1,000 nodes are sampled from directed network datasets when testing the algorithm of the identification of the shortest paths and the metric of closeness centrality.
Subgraph generation: the largest connected components/strongly connected components are calculated by NetworkX, code is presented in get_lcc_edgelist.py
cd easygraph_benchmark_20230501
There are the following scripts in this directory:
benchmark.py // tool for calculating the time consumption of each function
get_lcc_edgelist.py // tool for getting the largest connected components of each dataset
profile_directed_hp_er.py // bench on directed random networks generated by Erdős-Renyi random network model with hybrid programming
profile_directed_hp_rw.py // bench on directed real-world networks generated by Erdős-Renyi random network model with hybrid programming
profile_directed_mp_rw.py // bench on directed real-world networks with multiprocessing techniques
profile_directed_mp_er.py // bench on directed random networks generated by Erdős-Renyi random network model with multiprocessing techniques
profile_undirected_hp_er.py // bench on undirected random networks generated by Erdős-Renyi random network model with hybrid programming
profile_undirected_hp_rw.py // bench on undirected real-world networks with hybrid programming
profile_undirected_mp_rw.py // bench on undirected real-world networks with multiprocessing techniques
- Open your terminal, and make sure you have entered into the right Python environment that is equipped with EasyGraph and the igraph library.
- Choose a script and run the script as follows:
python profile_directed_hp_rw.py
The er_*
Erdos-Renyi random graphs are generated with eg.erdos_renyi_M()
with artificially defined numbers of nodes and edges.
Dataset Name | nodes | edges | is_directed | average_degree | density |
---|---|---|---|---|---|
ER_10k_u | 10,000 | 20,000 | False | 4.0 | 0.0004 |
ER_50k_u | 50,000 | 100,000 | False | 4.0 | 8.0e-05 |
ER_100k_u | 100,000 | 200,000 | False | 4.0 | 4.0e-05 |
ER_200k_u | 200,000 | 400,000 | False | 4.0 | 2.0e-05 |
ER_10k_d | 10,000 | 20,000 | True | 4.0 | 0.0002 |
ER_50k_d | 50,000 | 100,000 | True | 4.0 | 4.0e-05 |
ER_100k_d | 100,000 | 200,000 | True | 4.0 | 2.0e-05 |
ER_200k_d | 200,000 | 400,000 | True | 4.0 | 1.0e-05 |
ca-HepTh | 9,877 | 25,998 | False | 5.26 | 0.0005 |
email-Enron | 36,692 | 183,831 | False | 10.02 | 0.0003 |
LastFM | 7,624 | 27,806 | False | 7.2943 | 0.0010 |
ca-HepPh | 12,008 | 118,521 | False | 19.74 | 0.0016 |
ca-CondMat | 23,133 | 93,497 | False | 8.08 | 0.0003 |
soc-Epinions1 | 75,879 | 508,837 | True | 13.41 | 8.8e−05 |
wikivote | 7,115 | 103,689 | True | 29.15 | 0.0020 |
pgp.edgelist | 39,796 | 301,498 | True | 15.15 | 0.0002 |
p2p-Gnutella04 | 10,876 | 39,994 | True | 7.35 | 0.0003 |
soc-Slashdot0811 | 77,360 | 905,468 | True | 23.41 | 0.0002 |
web-NotreDame | 325,729 | 1,497,134 | True | 9.19 | 1.5e−07 |
email-EuAll | 265,214 | 420,045 | True | 3.17 | 0.0003 |