Here is my code for generating edge list based on Price model algorithm. I think there have two points to reduce execution time:
- use proper data type
- use faster random sampling
Originally, I used list instead of NumPy array and then changed from extending array till c x n to setting initial fixed array (and change the initial values).
It made execution time faster but still way slower than same implementation written in Matlab.
import pickle from random import random, sample import numpy as np def gen_edge_list(a, c, n): """ Generate edge list based on 14.1.1 """ edge_list = np.zeros(c*n) p = float(c) / (float(c)+a) # % for preferential attachment edges edge_list[0] = 1 # edge from vertex 2 to 1 idx = 1 # point to the next index for t in range(3, n+1): if t <= c+1: """ If network size is smaller than c+1, connect among all vertices. """ edge_list[idx:idx+t-1] = [i for i in range(1, t)] idx = idx+t-1 else: """ decide preferential attachment or uniformly random attachment by given p """ n_pref = len([True for i in range(c) if random() < p]) edge_list[idx:idx+n_pref] = sample(edge_list[0:idx-1], n_pref) idx += n_pref edge_list[idx:idx+c-n_pref] = sample(range(1, t+1), c-n_pref) idx = idx + c - n_pref if __name__ == "__main__": a = [1.5] c = [3] n = 10**6 edge_lists = [] for i in range(len(a)): edge_lists.append(gen_edge_list(a[i], c[i], n)) output = open('edge_list.pkl', 'wb') pickle.dump(edge_lists, output) output.close()
My biggest concern is especially the following:
""" decide preferential attachment or uniformly random attachment by given p """ n_pref = len([True for i in range(c) if random() < p]) edge_list[idx:idx+n_pref] = sample(edge_list[0:idx-1], n_pref) idx += n_pref edge_list[idx:idx+c-n_pref] = sample(range(1, t+1), c-n_pref) idx = idx + c - n_pref
Here is my friends code written in Matlab:
a = [1.5]; c = [3]; n = 1^6; edgelist = cell(numel(c),1); for i = 1:numel(c) p = c(i)./(c(i)+a(i)); edgelist{i} = zeros(1, c(i).*n); edgelist{i}(1) = 1; idx = 2; for t = 3:n if t <= c(i)+1 edgelist{i}(idx:(idx+t-2)) = 1:(t-1); idx = idx+t-1; else pref_or_rand = rand(1,c(i)) < p; prefn = sum(pref_or_rand); edgelist{i}(idx:(idx+c(i)-1)) = [edgelist{i}(randi(idx-1,1,prefn)) randi(t,1,c(i)-prefn)]; idx = idx+c(i); end end end
I don't know what makes this huge difference on execution time between them. (40 sec in Matlab on mac book pro vs 40 min with Python code in recent i5 machine on Debian)
If you have any idea, please let me know.