Features and heterogeneities in growing network models

Many complex networks from the World-Wide-Web to biological networks are growing taking into account the heterogeneous features of the nodes. The feature of a node might be a discrete quantity such as a classification of a URL document as personal page, thematic website, news, blog, search engine, social network, ect. or the classification of a gene in a functional module. Moreover the feature of a node can be a continuous variable such as the position of a node in the embedding space. In order to account for these properties, in this paper we provide a generalization of growing network models with preferential attachment that includes the effect of heterogeneous features of the nodes. The main effect of heterogeneity is the emergence of an"effective fitness"for each class of nodes, determining the rate at which nodes acquire new links. The degree distribution exhibits a multiscaling behaviour analogous to the the fitness model. This property is robust with respect to variations in the model, as long as links are assigned through effective preferential attachment. Beyond the degree distribution, in this paper we give a full characterization of the other relevant properties of the model. We evaluate the clustering coefficient and show that it disappears for large network size, a property shared with the Barab\'asi-Albert model. Negative degree correlations are also present in the studied class of models, along with non-trivial mixing patterns among features. We therefore conclude that both small clustering coefficients and disassortative mixing are outcomes of the preferential attachment mechanism in general growing networks.


I. INTRODUCTION
In the last ten years statistical mechanics has made great advances [1][2][3][4][5][6] in the understanding of the dynamics and the characteristic structural properties of complex networks. These findings shed light on the universal organization principles beyond a large variety of biological, social, communication and technological systems. Recently, large attention as been addressed to spatial networks [7] in which the links are determined by the "similarity" or proximity of the nodes in the physical or hidden space in which the networks are embedded. Spatial networks [7] are found in communication [8], transportation [9,10] and even social networks [11]. The role of space in complex networks significantly affects the dynamical properties of the graphs changing their navigability properties [8,12], the critical behavior of the Ising model [13] or the epidemic spreading [7].
However, the similarity between two nodes might also correspond to a modular organization of the network [14]. The general networks models that consider a modular structure are block-models [15] and multifractal models [16] that are intrinsically static models of modular networks. Nevertheless several networks are simultaneously growing and developing a modular structure. For example, the World Wide Web contains pages falling into different classes with widely different features (personal pages, thematic websites, news, blogs, search engines, social networks. . . ) and links between any two pages are influenced by their qualities as well as by the specific classes to which they belong.
Moreover many molecular networks, as for example protein-protein interaction networks, transcription networks and coexpression networks are scale-free [17] and growing but have in addition a relevant modular structure. In coexpression networks there is a good correspondence between network modules and biological functions as characterized by Gene Ontology or pathway enrichment analyses [18]. Connectivity between and within modules depends on tissue and species [19] and should be taken into account in realistic models of coexpression networks.
In light of these results it is necessary to understand how similarity, spatial embedding, modular structure or other features might change the nowadays classic description of growing networks following preferential attachment [20]. In this mechanism, new nodes are added at constant rate and connected to existing nodes of the network. The probability Π(i) that a new node connects to a node i is proportional to the degree k i of the node i, i.e. Π(i) ∝ k i . Preferential attachment mechanism has been directly measured for many complex networks and remains a successful explanation for their scale-free degree distribution. Nevertheless, pure preferential attachment, as implemented in the Barabási-Albert (BA) model [20], has some drawbacks: for example, it generates a clustering coefficient that is small compared with the ones observed in real networks and follows a "firstmover-advantage" mechanisms to the extent that older nodes are systematically associated with larger degree. Several rules for network growth giving rise to an effective preferential attachment mechanism have been studied to overcome these limitations (see reviews in [1,21,22]).
Shortly after the seminal paper by Barabási and Albert [20], it was recognized that heterogeneity between nodes is an important ingredient for more realistic models, destroying the age-degree correlation present in the BA model. The first proposal in this direction was the addition of node quality or "fitness" to the BA model [23,24]. This model by Bianconi and Barabási paved the way for the study of a wider class of models with other node features such as position in space [25][26][27][28][29]. Some properties of preferential attachment networks on metric spaces, such as the degree and link length distributions, were derived analytically by two of the authors [30]. The result is that both fitness [23] and space [30] give rise to networks with multi-scaling in the degree distribution (i.e., a sum of power laws).
In this paper we provide a general analysis of growing networks with both preferential attachment and features. The preferential attachment probability from a new node to node i is proportional to the degree k i of the node multiplied by a generic positive function of the features h of the new node and h i of the node i: The features can have different interpretations: they can represent spatial embedding of the network, or discrete features of the nodes defining a modular structure of the network, or they can represent fitness describing the higher ability of some nodes to acquire new links. The space of feature and the connection function σ(h i , h) are fixed and do not coevolve with the network. We give a full account of the model determining the degree distribution, clustering and assortativity. We derive the general expression for the asymptotic degree distribution in the rate equation approach. The form of this distribution is a convolution of power-laws depending on an "effective fitness" of the node, which is determined by the similarity matrix through a self-consistent equation. We also discuss the conditions under which the rate equation approach breaks down and the fate of the network in these cases.
We derive the general expression for the asymptotic clustering coefficient, showing that clustering always decreases as an inverse power of the network size and therefore disappears in the thermodynamic limit. Assortativ-ity in node degree and features are also studied in these networks, both numerically and analytically. Node degree correlations are negative as in the BA model: disassortativity increases with the heterogeneity of the nodes and decreases slowly with the network size.
Finally we allow for several variations on the model, including addition and rewiring of links, and show that the results are robust with respect to these variations as long as the connections are assigned through preferential attachment.

II. DEGREE DISTRIBUTION OF GROWING NETWORKS WITH FEATURES
A. The model We present a general class of models for networks with preferential attachment and features. In these models, each node has a feature h randomly chosen from a set S with probability p(h). We call h i and k i the feature and degree of the ith node. At each time step, a node with m links is added to the network. These links are connected to existing nodes with probability where h corresponds to the feature of the new node and σ(h ′ , h) is a (positive) connection function from h to h ′ . Such a model is therefore completely defined by the functions p(h) and σ(h ′ , h).
In modular or community models, denoting by N c the number of communities, h is an integer number in 1 . . . N c while p(1) . . . p(N c ) denote the relative sizes of the different communities. In the fitness model, h is the quality of the nodes, that is, a real positive number. In spatial models, h is the spatial position of the node; for example, in models on a plane, h = (x, y) and p(x, y) is the node density.

B. Degree distribution
We derive the degree distribution through the rate equation approach introduced by Krapivsky, Redner and Leyvraz [22], modified as in [30]. The equation for N k (h), which is the average number of nodes with feature h and degree k, is where the δ k,m term in the left hand side accounts for the birth of new nodes. If the features are continuous, the number of nodes is substituted with the number density in the equation above, and the sums over the features with integrals. More generally, if S is not a finite set, p(h) denotes a probability measure over S and the sum h∈S p(h)f (h) should be read as S p(h)f (h), while N k (h, t) for fixed t is a finite measure over S. We assume a linear scaling with time for the quantity where we neglect finite-size corrections contained in the o(t) term, which depend on the initial nodes of the network [31]. Then we can define n k (h, t) = N k (h, t)/t and rewrite it as where q(h) plays the same role as the (average) fitness of the node [23,24] and is defined as where C(h) is determined by solving asymptotically the above equation (5), obtaining and substituting in the definition of C(h) to obtain We can join (6) and (8) in a single functional equation for q(h): From the point of view of the degree distribution, this class of models is equivalent to the fitness model of Bianconi and Barabási [23,24], but in this case the fitness distribution is determined by p(h) and σ(h, l) through equation (9). The resulting degree distribution is so the distribution is a sum of power laws similarly to the fitness model, and for a regular distribution of h and q(h) it typically reduces to a power law with logarithmic corrections [23].
Actually, the fitness model itself is a particular example of such a model with a feature h = η ∈ [0, 1] distributed as ρ(η) and a connection probability σ(h, l) = h = η. Then the equation (9) reduces to where C depends on the whole distribution of η and is determined by the usual consistency equation therefore the model reduces to the fitness model with q(η) = η/C. Also the spatial network models discussed in [25-28, 30, 32] are special cases of the above model, with the position playing the role of feature. Note that since the sum of all node degrees i k i (t) should be equal to twice the total number of links mt, and since its mean is given by However, this identity is not new but can be derived from (6) and (8) by substituting the definition of C(h) in the numerator of the identity 1 = h p(h)C(h)/C(h) and rearranging.
Interestingly, many simple models are based on "symmetric" features, that is, all the values of the features are equivalent. In other terms, there is a group of bijective transformations T α from S to itself such that the distribution and connection function are invariant (that is, p(T α (h)) = p(h) and σ(T α (h), T α (h ′ )) = σ(h, h ′ )) and moreover the action of the group is transitive (that is, for every pair h, h ′ there is a transformation T β mapping h in h ′ , T β (h) = h ′ ). We denote these models as homogeneous models. In this case, if all the sums in the above equations are convergent, the symmetry implies q(T (h)) = q(h) and transitivity implies that all q are the same, then from equation (13) we obtain immediately q = 1/2. This means that all homogeneous models have the same degree distribution n k ∼ k −3 of the Barabási-Albert model. We will see some examples in section II C. We can actually extend the argument to a slightly more general condition, following [33]: if the quantities l∈S p(l)σ(l, h) and l∈S p(l)σ(h, l) are equal and independent of h, then the degree distribution is the same of the BA model.
For non-homogeneous models, the equations (9) often need to be solved numerically. For discrete features, a solution can be obtained by root-finding methods. For continuous features, an effective way to solve this kind of equations was presented in [30].
C. Examples

Bipartite networks
A bipartite network is usually composed by two classes of nodes (h = 1, 2) with links connecting only nodes of different classes. To build a scale-free bipartite network, we choose σ (1,1) = σ (2,2) = 0. We assume that a node can belong to class 1 or 2 with probabilities p 1 , p 2 (with p 1 + p 2 = 1). Note that in this model the non-zero terms of the connection function can be redefined as σ (1,2) = σ (2,1) = 1, so the function is actually symmetric.
The qualities of the two classes from equation (9) are then q 1 = p 2 , q 2 = p 1 (14) and the degree distribution is given by that if p 1 = p 2 = 1/2, the model is actually homogeneous and n k ∼ k −3 as expected from our general arguments.
Similar models have been applied to the human sexual networks, which have a bipartite heterosexual component [34].

A network with asymmetric connection function
Another simple but interesting example can be obtained by assuming two kinds of nodes, "central" and "periferic" (with probabilities p C and p P = 1 − p C ) such that two periferic nodes are never connected (i.e. the connection function satisfies σ (P,P ) = 0). The other terms of the connection function can be always redefined as σ (C,C) = σ (C,P ) = 1 so its only parameter is σ ≡ σ (P,C) ∈ [0, +∞]. This parameter controls the asymmetry of the connection function.
The relevant solution of equation (9) with this connection function is In the limitσ → ∞, the model reduces to a bipartite network and the solution to q C = 1 − p C , q P = p C as expected.

Community structure
To model scale-free networks with community structure [35], a simple possibility is to label each community by a feature h c and choose a connection function with σ(h c , h c ) > σ(h c , h c ′ ). In the simplest model, all communities have the same size and connect randomly to the other communities, with some preference for selfconnections. (The corresponding connection function is In this case the model is actually homogeneous and has the same degree distribution as the BA model, independently on the number of communities and the value ofσ.

Hierarchical structure and navigable networks
Navigable networks are often based on a hierarchical structure [36], with a connection probability that depends on the distance on a tree representing the hierarchical levels and the nodes. Scale-free navigable networks can be easily built by choosing a symmetric connection function depending only on the distances on the tree. (Note that these models are actually spatial models, since a tree is an ultrametric space.) Even if there is a lot of interesting structure in these networks, the degree distribution follows the simple multi-scaling behaviour in equation (10). In particular, the simplest cases of binary or n-ary trees (or more general trees where the length and the number of branches splitting from a single branch depend only on the level), with nodes located at the top of the terminal branches, have the usual degree distribution n k ∼ k −3 , since these trees are homogeneous spaces.

Modular structure with fitness
As a final example, we discuss a model with both modular structure and fitness. This model can be considered a simplified model of the WWW. We assume that each page is assigned to some category n according to type, content and functionality. Each category could have a different relative size π n and a distribution of page fitness ρ n (η). Moreover, the relative importance of different categories for a page of category q is given by the weights w n,q , with q w n,q = 1. (For a page of category n, the weights w n,q affect its probability of being linked by pages from other categories.) The network evolves as follows: at each time, a new node is added to the network and assigned to the nth category with probability π n , then its fitness is randomly extracted from ρ n (η). The node is connected to the existing nodes according to a probability proportional to the fitness η, the weight w and the degree k of the nodes: so the model dynamics follows equation (2) with h = (n, η), σ(h i , h) = η i w ni,n and p(h) = π n ρ n (η). Similar models appear in a natural way in the study of many Continuous lines represent the theoretical predictions. The hierarchy is A<B<C and nodes in each module can connect only to nodes in the same or higher modules with equal probabilities. The fitness distributions are ρA(η) = U (1/2,1) (η), ρB(η) = U (0,1) (η) and ρC(η) = 2η. Simulated networks have size N = 10 7 and initial node degree m = 5. Qualities are measured asq = log ( k(t) / k(t/2) ) / log 2 as detailed in [30].
systems. In this model, beyond the fitness, the dynamics of a node is influenced by the relative size of its category π n as well as by the weights w n,q and the sizes of other categories π q . In the WWW example, there are millions of blogs but only a few search engines, and there is a good probability that a blog links a search page; this explains the different connectivity and degree distribution of these categories.
The node qualities for this model from equation (9) are equivalent to a modified fitness model q(η, n) = η/γ n , where the coefficients γ n solve the nonlinear equations Simulation results are in very good agreement with the numerical solution of these equations, as shown in Figure  1.

D. Breakdown of the rate equation approach
If the space S of features has a finite number of elements, the approach presented here gives rise to a finite number of nonlinear equations (6), (8) in the variables q(h), C(h). If these equations admit a single solution, as we expect, then the degree distribution follows equation (10).
If the number of features is infinite, equations (6),(8) could involve divergent sums (or integrals). In partic-ular, there are two situation where the rate equations break down: (i) the sums of the connection function are divergent, that is, the connection function is not measurable, or (ii) all sums converge but there is no solution to the selfconsistency equations. We discuss these two scenarios in the next sections.

Heterogeneity-driven attachment
If the sums of the connection function are divergent, links attach preferentially to some nodes not chosen on the basis of preferential attachment but belonging to the sets of features with divergent sums of the connection function. This can give rise to exponential tails or condensation or other behaviour, depending on form of the connection function. As an example, spatial models with a divergent connection function near d = 0 show a behaviour similar to nearest-neighbour attachment and consequently an exponential tail [25,30]. Another example is given by networks on a flat space of dimension D with uniform node density and a connection function σ(x, x ′ ) that is an inverse power law in the distance between the node and a pointx, for example In this case the divergence is localized aroundx, prompting condensation on the nodes closer tox since these nodes get most of the new connections.

Bose-Einstein condensation
Even if all the sums of the connection function are convergent, it is still possible to find cases where the equations for q(h) do not admit solutions. In fitness models, the lack of solution of the self-consistent equations is a signal of Bose-Einstein condensation of links [24] on the nodes of highest fitness.
In more general models, condensation occurs on nodes close to a feature h c determined as follows. Each h ∈ S corresponds to an element of the set M of measures . Conversely, given a Radon metric on M , each point in its closureM can be mapped to a feature belonging to an "extended" spaceS. The Radon metric onM induces a metric onS, which can therefore be thought of as the closure of S: the elements ofS are "limit points" of S and are the candidates for condensation. In particular, the location h c of the condensate can be found by considering the addition of a single node with variableh ∈S to the network and maximizing its asymptotic link share nh: As a simple example, in the fitness model with S = [0, η max ), the set M contains the measures µ η (η ′ ) = ηp(η ′ ) and its closure corresponds toS = [0, η max ], which is the closure of S. In this case nη is zero unlessη = η max , because the consistency equation with an additional node has no finite solution for other values ofη, therefore condensation occurs near η max as expected.

A. Clustering
We define the average clustering coefficient C clust of a network as C clust = 3 · n triangles /n triples . It is well known that the clustering coefficient of the Barabási-Albert model decreases as the network size t increases, converging to zero in the thermodynamic limit [37]. As we show in the next sections, this property is quite general, being shared by all heterogenous models under some conditions on the convergence of sums of σ(h, h ′ ). This mean that heterogeneity or features cannot account for non-vanishing clustering coefficients observed in most real networks.

Clustering in the Barabási-Albert and in homogeneous model
Both the average number of triangles and the average number of triples can be easily computed from the preferential attachment rule if we assume that node degrees follow the continuum equation for the mean degree [38]. This approach has been applied to the BA model in [39]. Here we generalize it to models with features. First, we summarize the computation for the BA case. The asymptotic number of triangles is given by Denoting by t A < t B < t C the birth times of a triplet of nodes, BA networks contain three kind of triples: Since older nodes have the highest degrees and are therefore the most attractive under preferential attachment, for large t almost all triples are of the last kind and their number is given by therefore obtaining the known result for the asymptotic clustering coefficient Details of the calculations can be found in appendix A. For homogeneous models (q(h) = 1/2) the same calculation is valid for the number of triples, while the number of triangles and therefore the clustering coefficient are multiplied by a factor dependent on the connection function: Features appear only in this factor. For example, bipartite networks have no triangles and the above factor is zero, giving C clust = 0. Note that the above factor could be divergent: in this case our approach breaks down and the asymptotic behaviour of C clust (t) could change. However, we are not aware of any model of this kind with non-zero clustering in the limit t → ∞ and a power-law tail in the degree distribution. For a symmetric model with community structure, the clustering coefficient is An additional remark applies to spatial networks with short-range interactions and networks with community structure. At short times, these models resemble geometric random graph models, which exhibit strong clustering [27]. However, when the average number of nodes in the interaction volume increases, the clustering coefficient decreases as a consequence of the preferential attachment dynamics. At longer times, clustering falls according to eqs. (22),(23) but remains higher for spatial networks than for the Barabási-Albert model, as it is shown in Figure 2.

Clustering in general models
For general models with some q(h) = 1/2, the calculation is similar to the homogeneous case. The dominant contribution comes from triples and triangles with a vertex of maximum fitness q M = q(h M ) > 1/2 with h M = arg max h∈S q(h). The leading contribution to the average number of triangles is while the number of triples is then the clustering coefficient is This expression is valid as long as the sum inside it is finite. So the asymptotic clustering is sligthly smaller than in the BA model. From this general result we can extract the clustering for the fitness model of Bianconi and Barabási [23] in the fit-get-rich phase: assuming η max = 1 and ρ(η max ) > 0.

B. Assortativity: features
Nodes with different h are not randomly connected: in-and out-going links connect preferentially nodes with some features. These preferences are embedded in the in-and out-distributions f IN (h i , h ′ ) and f OUT (h i , h ′ ), which are asymptotically defined by the equations where k IN i and k OUT i are the number of in-and outgoing links for the ith node (note that in these models are the numbers of in-and out-going links between the ith node and nodes with variable h ′ . From the definitions above, we have by plugging in equation (29) and comparing with the continuum equation for giving as a result while f OUT can be obtained from equation (2) by substituting k i with the mean of the total degree of nodes of feature h ′ : From these distributions it is easy to find the fraction of links between nodes with variables h ′ , h ′′ : that should be compared with the null value of the same quantity, obtained by a random re-arrangement of links that preserves degree (i.e., unassortative connections): As an example, consider the simplest model with community structure in section II C 3. Denote the number of communities by N c and the only parameter of the connection function byσ = σ(h, h ′ )/σ(h, h) for h ′ = h. The null distribution of links is given by ϕ 0 (h ′ , h ′ ) = 1/N 2 c , ϕ 0 (h ′ , h ′′ ) = 2/N 2 c for h ′′ = h ′ , while the actual assortativity depends on the parameterσ: so links are randomly distributed between features if σ = 1, while the mixing is assortative (that is, For the Bianconi-Barabási fitness model, the distributions are and therefore the system shows disassortative mixing, since the ratio ϕ(η, η ′ )/ϕ 0 (η, η ′ ) is higher between high and low fitness than between similar fitnesses.

C. Assortativity: degree correlations
Nontrivial connection functions σ(h, h ′ ) and node distributions p(h) do not only induce assortative mixing between nodes with different variables, but affect also the degree-degree correlations between neighbours, as shown in [40] for the BA model. The BA model is disassortative [41]; however, in models with features, the pattern of (disassortative) mixing between nodes with different degree is influenced by the hidden space and the connection function.

Analytical results: continuum approximation
A first understanding of degree correlations can be obtained by assuming deterministic evolution of node degree k = m(t/t 0 ) q . In this approximation, average nearest-neighbour degree is given by is the probability that if we choose a random neighbour of a random node with features h and degree k, the node chosen has degree k N N and features i. If we denote the birth times of these nodes by t 0 and t 0N N respectively, then if t 0N N < t 0 (i.e. (k N N /m) 1/q(i) > (k/m) 1/q(h) ) this probability can be obtained from the connection probability m σ(i,h)kNN (t0) mC(h)t0 multiplied by 1/k (which accounts for the random neighbour chosen), p(i) (the probability that a node of feature i is born at time t 0N N ) and dt 0N N /dk N N (the Jacobian of the mapping from k N N to t 0N N ). The final result is given by dt0NN . In the other case, if t 0N N > t 0 , the relevant connection probability is m σ(h,i)k(t0NN ) mC(i)t0NN . Substituting the present values of k and k N N , we obtain For the BA model, we have P (k N N |k) = mk −2 N N and the average nearest neighbour degree k N N (k) = m log(t)/2 is independent of k, so the model shows no assortativity at all in this approximation. On the other hand, models with features tend to be disassortative, i.e. k N N (k) decreases with k. However, this is not true for all classes of nodes: for example, low quality nodes tend to be assortative, i.e. they attach more often to low quality nodes with similar degree.
In figure 3 we compare the above results (41)  correctly the qualitative pattern of degree correlations in these models, even if it does not fully account for their disassortativity.
It is also possible to estimate the asymptotic behaviour of the assortativity coefficient in the same approximation. We denote the maximum quality by q M = max h q(h). As in the BA model, the coefficient C ass tends to zero in the thermodynamic limit t → ∞. However, its asymptotic behaviour is C ass ∼ t qM −1 , and since q M ≃ 1 in many models with features, its decrease is typically very slow compared to the BA model (C ass ∼ t −1/2 ).

Analytical results: rate equation approach
Assortativity in node degree can be easily estimated from the matrix n k,l , defined as the fraction of links in the network joining a node of degree k with a younger node of degree l. In models with features, we define n (h,i) k,l as the fraction of links joining a node of degree k and variable h with a younger node of degree l and variable i. The matrix n (h,i) k,l takes into account correlations between both degrees and features. Asymptotically, this matrix can be obtained as in [40], by solving the difference equation This equation implies that n (h,i) k,l ∝ F (q(h), q(i)) · σ(h, i)p(i)p(h)/C(i) where F (q(h), q(i)) is a function of the qualities, in agreement with the form (35) for the assortative mixing between features. The matrix n k,l can then be obtained as n k,l = h,i n (h,i) k,l . In principle, the equation (42) can be solved by generating function methods discussed in appendix B. The general solution for m = 1 is . However, extracting information from this solution is difficult.
The existence of degree correlations can be also shown simply by looking at the scaling of the quantity n (h,i) k,l for k ≫ l and k ≪ l. In the BA model with m = 1, the scaling is n k,l ∼ k −2 l −2 and kl −5 respectively [40], which is different from the naive k −2 l −2 expected in the absence of degree correlations.
The scaling k −2 l −2 for k ≫ l can be understood from the following simple arguments. The degree of young nodes is dominated by outgoing connections, which select random nodes with probability proportional to kn k ∼ k −2 . On the other way, the attractiveness of old nodes decays with time as t −1/2 , therefore the distribution of the linked nodes (assuming a deterministic evolution k = l(t) ∼ t −1/2 as a function of the birth time t) is dt(l)/ t(l) = dl/l 2 .
To obtain the scaling for n (h,i) k,l in these models with m = 1, we approximate the difference equation with the corresponding differential equation n which is quite different from the null scaling k −1/q(h) l −1/q(i) . This shows that degree correlations exist also in heterogeneous model. Note that the scaling for k ≫ l is consistent with the continuum approximation (41). The same scaling for k ≫ l and l fixed can also be found directly from the generating function (B3) using Tauberian theorems. For m > 1, the scaling for k ≫ l is the same as for m = 1, while the scaling for l ≫ k changes to n (h,i) k,l ∼ k m l −1−(1+(m+1)q(h))/q(i) .
The BA model is slightly disassortative in degree and the addition of features does not change this property. This is already apparent from the numerical results in the previous sections, but can be also understood from the above scaling properties. In fact, in the BA model the asymptotic ratio between n k,l and its null value scales between 1 for k ≫ l and (k/l) 3 ∼ 0 for k ≪ l, therefore decreasing while k and l get closer. In homogeneous models (q = 1/2), like the symmetric communities model, the scaling is the same as in the BA model. On the other side, in models with different qualities (like the Bianconi-Barabasi fitness model) the pattern of degree correlations is nontrivial, as already observed in the previous section: for example, old and well-connected nodes with features h are preferentially linked to younger nodes of features i and similar degree if qualities are low (q(i) + q(h) < 1), but they link instead to younger nodes of low degree if qualities are high (q(i) + q(h) > 1).

Numerical results: spatial networks
As an example, we consider spatial networks with preferential attachment [30]. In these networks S corresponds to a metric space and the connection function depends only on the distance σ(x, y) =σ(d(x, y)). We simulated network growth on disks in 2-dimensional spaces with constant curvature (sphere, flat and hyperbolic space) and calculated both the assortativity coefficient C ass [41] and the average nearest neighbour degree k N N (k) . We show the results in table I and figures 4, 5 and 6.
It is apparent that the short-range connections and the spatial structure make the network slightly more disassortative, because hubs tend to be sparse (close hubs compete between them and reduce their degree). Hyperbolic spaces at strong curvature show even higher disassortativity because of their almost star-like connectivity.
Also, the generally low values of the assortativity coefficients suggest that they decrease with time as it happens in the BA model.

IV. GENERALIZATIONS
The Barabási-Albert model is based purely on addition of nodes and preferential attachment. Realistic models can include many other ingredients: addition of extra links, rewiring and removal of links, directed links, variable initial degree, attachment functions that are only asymptotically linear in k, etc. Some generalizations of the BA model are reviewed in [1,21].
In this section we present several variations on the heterogeneous models with preferential attachment presented in section II A. For most of these generalizations, the degree distribution is a sum of power-laws, showing that scale-free or multi-scaling behaviour is a robust feature of preferential attachment models. The results are generally similar when different generalizations are combined together.
The consistency equations for q(h) in these models can be obtained through a rate equation approach or, more easily, by using the continuum approach, i.e. the deterministic evolution of node degree dk i /dt = q(h i )k i /t or equivalently k i (t) = k i (t 0 )(t/t 0 ) q(hi) [38]. The corresponding equations for q(h) are exact, as explained in appendix C.

A. Heterogeneity in initial degree
We consider a model of growing networks with the usual preferential attachment rule (2). However, new nodes have a initial degree m(h) that depends on their feature h. The continuum equation for the node degree is The degree distribution is given by equation (47) Interestingly, if σ(h, h ′ ) = 1, the model is equivalent to a variation on the BA model with a random initial degree for the new nodes. In practice, the feature is the initial degree m of each node. Assume that the average initial connectivitym is finite. Then if the distribution of m decays faster than m −3 , the degree distribution of this model for k ≫m is the same as the BA model. Instead, if the distribution of m decays as a power law with exponent −α greater than −3, the sum in equation (46) gives an additional factor k 3−α and therefore n k ∼ k −α . More generally, we can consider a general variation on the BA model with degree distribution p 0 (k) ∼ k −γ for fixed m, and modify this model to allow for a stochastic initial degree with distribution p(m) ∼ m −α . In this case the degree distribution from equation (46) is n k ∼ k − min(α,γ) , in agreement with the formal results in [42].

B. Heterogeneous links
In this model, the attractiveness of a node does not depend only on the number of links, but also on the types of nodes to which they are attached. For examples, in copying vertex models or walking models [43], new nodes could preferentially explore existing nodes with a specific feature (e.g. search engines in the WWW example), therefore the effective preferential attachment dynamics would give an higher weight to the links that connect to this feature.
The preferential attachment probability is a positive linear combination of k i(h) , which is the number of links between the ith node and nodes with variable h: The distribution follows equation (10) with quality q(h) defined by the set of consistency equations C. Shifted preferential attachment The preferential attachment rule is modified by the addition of a positive term a(h i , h) independent of the degree but dependent on the features. Similar models without features were proposed in [40,44].
The degree distribution for large k follows equation (10) with quality q(h) defined by (6) and C(h) defined by

D. Directed links
In this model the preferential attachment probability is proportional to the number of incoming links k IN : Note that the positive term a(h) is needed to specify the initial attachment probability because initially k IN = 0. The results are similar to the shifted preferential attachment case, with C(h) defined by

E. Addition of links
In this model, in addition to the usual growth rules, extra links are added at rate mr + and attached to nodes i, j according to the probability This model can be solved similarly to the usual one under some extra assumptions, like the scaling The degree distribution follows equation (10) with quality q(h) defined by

F. Preferential rewiring in directed networks
Modifications to the usual preferential attachment growth of the node degree include also possible losses of links, either because of removal or rewiring [45]. An high rate of link removal/rewiring could result in a degree distribution with an exponential tail instead of the usual power-law tail. However, if the removal/rewiring process is not too fast, the resulting distribution is typically a power-law with an exponent dependent on the rates of the different processes.
Several heterogeneous models with preferential rewiring and/or removal of links can be analyzed with the techniques of this paper. In these models the growth of node degrees can be characterized by an effective fitness q(h) and the stochastic noise due to link addition and removal does not change significantly the tail of the degree distribution, as explained in appendix C. Here present a simple example of such a model.
We consider a directed network growing under the same rules as the model (2) and with rewiring taking place at rate mr. In each rewiring process, a random link is selected with probability proportional to σ − (h in , h out ) where h in and h out are the variables of the attached nodes. The ingoing end of the link is then detached and reattached to another node according to the probability In this model, at least for large k, the degree distribution follows a multi-scaling behaviour similar to equation (10), given by with quality q(h) defined by the set of consistency equations In this model (and more generally in models including rewiring/removal of links) the quality q(h) can also be negative, thus requiring the factor θ(q(h)) in equation (63).

G. Fixing the connection probability between features
The last variation is a more radical departure from the heterogeneous models presented in this paper, since it is a modification of the attachment probability (2). In this model, a new node with feature h attaches to nodes with different variables h ′ according to a probability π(h ′ |h) independent of the degrees. Then, once a feature h ′ is chosen at random according to π(h ′ |h), a specific node i with h i = h ′ is chosen according to the usual preferential attachment rule. (If no such node exists, another variable h ′′ is chosen according to π(h ′′ |h) .) Asymptotically, the overall probability is then It is possible to define a quality q(h) also for these models.
Assuming the scaling we obtain C(h) = 1/(1 − q(h)) and the quality can be obtained explicitly: The degree distribution follows the usual equation (10). This model works if the feature space S is discrete and finite, but can be generalized to continuous spaces through discretization and equation (72) is valid with p(h) and π(h ′ |h) interpreted as probability densities.

V. CONCLUSIONS
The addition of node features to growing network models with preferential attachment is an important step towards realistic network modeling and results in a wide class of models, for which this paper provides several analytical results. In particular, this work shows that the power-law scaling of the degree distribution generated by preferential attachment is quite robust with respect to the heterogeneity between nodes. The main effect of heterogeneity is the emergence of an "effective fitness" q(h) for each class of nodes, therefore their degree distribution resembles the fitness model of Bianconi and Barabási [23,24].
Beyond the degree distribution, other network properties were studied. The clustering coefficient of these networks disappears for large network size, a property shared with the BA model. Negative degree correlations are also present in these models, along with non-trivial mixing patterns among features. Both small clustering coefficients and disassortative mixing are therefore outcomes of the preferential attachment mechanism in general growing networks.
The effect of the features h associated to each node has been presented as non-random, but the formalism applies to any kind of heterogeneity. In particular it is easy to include random variables or features with random effects as well, as long as their values do not change with time. In fact, any random effect can be parametrized by some extra random variables χ with a distribution p R (χ). Then it is possible to redefine a non-random variableh = (h, χ) with frequencyp(h) = p(h)p R (χ). The connection functionσ(h,h ′ ) now takes into account both random and non-random components. So the formalism captures stochastic as well as deterministic node features.
Moreover, the growth of many scale-free networks is based on some local dynamics such as the vertex copying/duplication rules for growth of molecular networks. However, from the point of view of the link distribution, the local dynamics often results in an effective preferential attachment mechanism. Our methods and results on degree distribution and assortativity apply to these models as well, if we take the connection probability (2) as an effective dynamics for the growth of the node connectivities. Therefore the main results of this paper, i.e. powerlaw multiscaling of the degree distribution and disassortative mixing in degree, are generally valid for models with effective preferential attachment. On the other way, the clustering coefficient depends on the specific model and not only on the effective form (2) for the attachment probability, therefore our proof that the clustering vanishes in the thermodynamic limit is valid only for models with pure preferential attachment.
In future works it would be very interesting to map the metric space implied by the class of growing network models discussed in this paper with the hidden metrics recently introduced to model complex networks in hyperbolic spaces [46]. Finally, an interesting extension of the model would be to include features that fluctuate in time, in order to determine how time-dependent heterogeneities affect the power-law behaviour of the degree distribution and the other properties discussed here, and features that coevolve with the network, for example spaces that expand with time while the node density remains constant.
In a general heterogenous model, we consider a triplet of nodes born at times t A < t B < t C with qualities q A , q B , q C . The average number of triangles can be found by integrating the probability that all three nodes are connected on the birth times t A , t B , t C : The average number of triples can be found in a similar way. These networks can contain three kind of triples: A ← B ← C, A ← C → B and B → A ← C. Since each new node has m outgoing links, it increases the number of triples A ← C → B by a factor m(m−1)/2 and the number of triples A ← B ← C by a factor m 2 , independently of the model, so their total number is m(3m− 1)t/2. The number of triples B → A ← C is given by the integral We give the full result for BA networks, since the other cases are straightforward (but cumbersome) generalizations. In the BA case we have q A = q B = q C = 1/2 and the clustering coefficient is including some finite corrections to the leading behaviour C(t) ∝ ln 2 t/t. noise into account, we promote the above equation to a Langevin equation obtained by adding a stochastic term + (q + + q − )k/t · η(t) with η(t) a white Gaussian noise with variance 1. Defining q = q + − q − and Q = q + + q − , we finally obtain the integrated Fokker-Planck equation ∂n(k, t) ∂t = − n(k, t) t − q t ∂(kn(k, t)) ∂k + Q 2t ∂ 2 (kn(k, t)) ∂k 2 (C2) for the average degree distribution n(k, t) = t 1 dt 0 P (k, t|t 0 )/t, where P (k, t|t 0 ) is the distribution for a node born at time t 0 . The rate equation approach leads to the same equation. The (normaliz-able) stationary solution of this equation can be derived as a power series: (1 + nq) Γ(n + q −1 ) 2 Γ(n + 1) − Q q n k −n (C3) This is an asymptotic series that can be resummed by Borel summation in the variable z = k −1 . The leading contribution to the Borel transform of the sum in (C3) behaves as B(z) ∼ Qz for z → 0, therefore for large k the Borel sum is Qk −1 and the resulting degree distribution is n k ∼ k −1−q −1 (1 + O(Qk −1 )), thereby confirming the validity of the simple continuum approach (without the diffusion term) for k ≫ Q.