Network meta‐analysis and random walks

Network meta‐analysis (NMA) is a central tool for evidence synthesis in clinical research. The results of an NMA depend critically on the quality of evidence being pooled. In assessing the validity of an NMA, it is therefore important to know the proportion contributions of each direct treatment comparison to each network treatment effect. The construction of proportion contributions is based on the observation that each row of the hat matrix represents a so‐called “evidence flow network” for each treatment comparison. However, the existing algorithm used to calculate these values is associated with ambiguity according to the selection of paths. In this article, we present a novel analogy between NMA and random walks. We use this analogy to derive closed‐form expressions for the proportion contributions. A random walk on a graph is a stochastic process that describes a succession of random “hops” between vertices which are connected by an edge. The weight of an edge relates to the probability that the walker moves along that edge. We use the graph representation of NMA to construct the transition matrix for a random walk on the network of evidence. We show that the net number of times a walker crosses each edge of the network is related to the evidence flow network. By then defining a random walk on the directed evidence flow network, we derive analytically the matrix of proportion contributions. The random‐walk approach has none of the associated ambiguity of the existing algorithm.


INTRODUCTION
Network meta-analysis (NMA) has been established as a central tool of evidence synthesis in clinical research 1,2,3 . Combining direct and indirect evidence from multiple randomised controlled trials, NMA makes it possible to compare interventions that have not been tested together in any trial 4,5,6 . The term 'network meta-analysis' derives from the fact that one can mathematically represent the collection of interventions and trials as a graph. A graph consists of a set of nodes and a set of edges connecting pairs of nodes. The nodes of an NMA graph represent the different treatment options, and edges are comparisons made between the treatments in the trials. In line with Rücker (2012) 7 we will refer to networks of treatment options and comparisons between treatments as 'meta-analytic graphs'.
An NMA combines data from multiple trials, each comparing different combinations of treatment options. The accuracy of the conclusions from an NMA depends on potential biases associated with individual trials, and on assumptions such as between-trial homogeneity and consistency between direct and indirect evidence. In this context it is useful to study the so-called 'flow of evidence' 8 in the network. This describes the influence different network components have on the estimates of treatment effects. For example, the comparison between two particular treatments may enter as indirect evidence into the estimate of the relative effect of two different nodes in the network. Understanding how exactly evidence flows in the graph then allows one to assess the impact of potential bias originating from different pieces of evidence in the network 8,9,10 .
Previous literature has, for example, looked at the relative influence of direct evidence compared to indirect evidence 11,12 . Other work has been concerned with measures of network geometry, capturing the frequency with which different comparisons are represented in the trials underpinning an NMA 13,14 . One then asks how the network structure affects NMA estimates of treatment effects, heterogeneity and rank metrics 13,14,15,16,17,18 . König et al (2013) 8 observed that in a two-step ('aggregate') NMA model 11 each row of the 'hat matrix' represents an evidence flow network for a particular treatment effect. König et al then visualised the evidence flow on weighted directed acyclic graphs in which nodes represent treatments, and edges indicate the direction and quantity of evidence flow through each direct comparison. Based on this observation, Papakonstantinou et al (2018) 9 introduced 'streams' of evidence and developed a numerical algorithm to calculate these streams. The streams of evidence are then used to derive the 'proportion contribution' of each direct comparison to each treatment effect in the graph. This allows one to quantify how limitations of individual studies impact on the estimates obtained from the network. Indeed, the algorithm in Papakonstantinou et al. is implemented in software such as CINeMA (confidence in network meta-analysis) 10 and ROB-MEN (risk-of-bias due to missing evidence in NMA) 19 , used in clinical practice for the evaluation of results from an NMA.
More widely, the study of networks plays a key role in a variety of disciplines including ecology, economics, electrical engineering and sociology 20,21,22 . Through the representation of treatment options and comparisons in trials as a graph, one can therefore take advantage of the extensive literature in network theory, and of ideas developed in the disciplines in which networks are studied. For example one of us 7 used the graph representation of NMA to make the connection between meta-analytic and electrical networks. This allows one to demonstrate that graph theoretical tools routinely applied to electrical networks are also of use in NMA. This approach has since led to advancements in NMA methodology such as frequentist ranking methods 23 and component NMA 24 . It is also the basis for the software package netmeta 25 .
In this paper we present a new analogy between random walks and NMA. A random walk on a graph is a stochastic process consisting of a succession of 'hops' between vertices connected by edges. Random walks are of interest for a wide range of applications, including statistical physics, biology, ecology, genetics, transport and economics (for a selection of references see 26,27,28,29,30 ). Random walks are also a popular tool to study the properties of networks themselves 31,32,33 .
It is well known that there is a connection between random walks and electrical networks 34,35,36,37 . In this context, edges of the electric network are conducting connections (wires). The correspondence between random walks and electrical networks can be established by asserting that the probability that a random walker currently at node $a$ moves to node $b$ in the next step is proportional to the conductance (inverse resistance) of the edge connecting $a$ and $b$. Quantities in the electrical network such as currents along edges or electric potentials at the nodes then have an interpretation in the random-walk picture. For further details we refer to 37 .
Motivated by the connection between electrical networks and NMA on the one hand, and that of electrical networks and random walks on the other, we construct a random walk on the meta-analytic network. We show that the random-walk picture we develop can be used to study the flow of evidence in the NMA network. In particular there is a random-walk interpretation of the elements in the hat matrix. Further, we construct a second random-walk model, this time on the evidence flow network. From this we derive an analytical expression for proportion contributions which overcomes the limitations of Papakonstantinou et al 9 .
In particular, the algorithm in Papakonstantinou et al selects only a subset of paths on the evidence flow network. This means that paths of evidence that potentially contribute risk of bias are missed. Furthermore, the paths identified by the algorithm are not unambiguous and instead depend on the order in which certain steps are carried out. In contrast, the random-walk approach identifies all possible paths of evidence. The method delivers an unambiguous analytical result for proportion contributions, and less computational effort is required than for the numerical algorithm. In addition, unlike the method in Papakonstantinou et al, the random-walk approach is able to handle networks with multi-arm trials.
The remainder of this paper is set out as follows: We present a motivating data set in Section 2. In Section 3 we provide the relevant background information. We describe an aggregate-level frequentist NMA model and show how the associated hat matrix can be interpreted as evidence flow. In Section 4 we introduce the analogies between NMA, electrical networks and random walks. Using the analogies to electrical networks in both the NMA and random-walk literature, we then express the flow of evidence in an NMA in terms of properties of random walks on the aggregate network. In Section 5 we introduce a second random-walk model, now on the directed evidence flow network. We use this to analytically derive the matrix of proportion contributions. In Section 6, we apply our method to the motivating data set and demonstrate that the random-walk approach overcomes the limitations of the numerical algorithm previously proposed by Papakonstantinou et al (2018) 9 . We summarise our results in Section 7 and discuss potential future impact of the analogy between NMA and random walks.

FIGURE 1 A network of psychological treatments for depression (original data from Linde et al (2013) 38 ; presented in Rücker and Schwarzer (2014) 39 ). We use numerical labels from 1 to 11; these are the same as in 39 . Two treatments are connected by an edge if a direct comparison of the two treatments was made in at least one trial; the edge width indicates the number of trials that make the comparison. The network contains one 4-arm trial (comparing treatments 1-6-7-9), eight 3-arm trials (3-5-9, 2-6-8, 1-6-11, 1-3-9, 2-6-11, 2-6-8, 3-6-9, and 3-4-9) and 17 2-arm trials. Multi-arm trials are not explicitly indicated in the network graph. The data, including the number of trials per comparison, is described in detail in Rücker and Schwarzer (2014) 39 .

MOTIVATING EXAMPLE
We use an NMA of psychological treatments for patients with depressive disorders 38 to motivate our work. The data is described in detail in Rücker and Schwarzer (2014) 39 . For convenience we will occasionally refer to this as the 'depression data set'. The NMA compares $N = 11$ treatments based on $M = 26$ randomised controlled trials. Of these one is a four-arm trial, eight are three-arm trials and 17 contain just two arms. In total, the trials provide $K = 20$ pairs of treatments which are directly compared in at least one trial. The primary outcome of the trials was a binary variable representing patient response after treatment completion. The odds ratio (OR) was used as the measure of relative treatment effect. The graph representing this set of treatments and trials is shown in Figure 1. Vertices in the graph are treatments, and edges represent comparisons between pairs of treatments (two vertices are connected if they have been directly compared in at least one trial). The graph therefore has $N = 11$ vertices and $K = 20$ edges. The thickness of the edges in the figure represents the number of trials making the different comparisons.
NMA aims at estimating treatment effects for all pairs of interventions within this network. One aim of our paper is to determine the contribution (as a proportion) of each direct comparison to these estimates.

Definitions and notation
Among the multiple equivalent frequentist formulations of NMA 6,11,13,39,40 we choose a so-called 'aggregate level' (or two-step) approach 11 to the graph theoretical model developed in Rücker (2012) 7 . Rücker's original (one-step) model is implemented in the R package netmeta 25 . In Appendix A.1 we outline how the aggregate-level graph theoretical approach relates to other frequentist NMA models.
We consider a network of $N$ treatments, denoted $a = 1, \dots, N$, and $M$ studies, $i = 1, \dots, M$. Throughout this article we will use the lower case letters $a$, $b$, $c$ and $d$ to refer to treatment nodes. Occasionally we also use $k$ and $l$ as dummy indices referring to nodes in sums or products. Study $i$ compares a subset of $n_i$ treatments (i.e., $n_i$ is the number of treatments in trial $i$). We use a random-effects model where we focus on relative, rather than absolute effects. To this end, we write $y_{i,ab}$ for the observed effect of treatment $a$ in trial $i$ relative to treatment $b$. We denote the variance associated with this observation by $\sigma^2_{i,ab}$. The heterogeneity, $\tau^2$, in the network can be estimated, for example, using the method-of-moments approach 41 . The estimated heterogeneity is added to the within-trial variance estimate from each study to make the total variance $\sigma^2_{i,ab} + \tau^2$. Trial $i$ has $n_i$ arms and contributes $q_i = n_i(n_i - 1)/2$ observed relative treatment effects and associated variances. For a trial with $n_i = 2$, comparing treatments $a$ and $b$, the weight assigned is given by the inverse variance, $w_{i,ab} = 1/(\sigma^2_{i,ab} + \tau^2)$. In order to account for correlations induced by multi-arm trials ($n_i \geq 3$), we use an adjustment method described in detail in References 7,39,42 . The method involves adjusting the variances associated with each pairwise comparison in a multi-arm trial. For multi-arm trial $i$ this results in $q_i \geq 3$ weights, $w_{i,ab}$, where $a$ and $b$ run through all treatments compared in that trial. This defines a complete sub-graph of two-arm trials which is equivalent to the multi-arm trial.

Aggregate-level description
The set of adjusted weights $\{w_{i,ab}\}$ for all trials $i = 1, \dots, M$ defines a network of $\sum_{i=1}^{M} q_i$ two-arm trials. This network is equivalent to the original network of (potentially multi-arm) trials in that the resulting relative treatment effect estimates from the network of two-arm trials described by $\{w_{i,ab}\}$ are the same as those obtained from the original network 39 .
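As a quick check of this count on the depression data set, assume that a trial with $n_i$ arms contributes $q_i = n_i(n_i - 1)/2$ pairwise comparisons, as described above. With one four-arm trial, eight three-arm trials and 17 two-arm trials, the equivalent network of two-arm trials would then contain the following number of comparisons. This figure is simple arithmetic from the trial counts quoted for the motivating example; it is not a number reported in the text.

```latex
\sum_{i=1}^{26} q_i
  = 1 \cdot \frac{4 \cdot 3}{2} + 8 \cdot \frac{3 \cdot 2}{2} + 17 \cdot \frac{2 \cdot 1}{2}
  = 6 + 24 + 17
  = 47
```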
We write $I_{ab}$ for the set of trials $i \in \{1, \dots, M\}$ comparing treatments $a$ and $b$. Using the weights $\{w_{i,ab}\}$, we perform a pairwise meta-analysis across each of the edges in the network. For the edge connecting nodes $a$ and $b$, the direct estimate is calculated as the weighted mean,

$$\hat{\theta}^{\mathrm{dir}}_{ab} = \frac{\sum_{i \in I_{ab}} w_{i,ab}\, y_{i,ab}}{\sum_{i \in I_{ab}} w_{i,ab}}. \tag{1}$$

This results in $K$ direct estimates of the relative treatment effects, $\hat{\theta}^{\mathrm{dir}}_{ab}$, which we collect in the vector $\hat{\boldsymbol{\theta}}^{\mathrm{dir}}$. The weight associated with the direct estimate $\hat{\theta}^{\mathrm{dir}}_{ab}$ (and to be used in the subsequent analysis) is given by

$$w_{ab} = \sum_{i \in I_{ab}} w_{i,ab}. \tag{2}$$

The direct estimates of the relative treatment effects have been termed 'aggregate' data 8,43 . Therefore, Equations (1) and (2) describe the observations and inverse-variance weights for an aggregate-level model. The aggregate model can be represented by an 'aggregate network' where $w_{ab}$ is the weight associated with the edge $ab$. We collect the aggregate edge weights in a $K \times K$ diagonal matrix, $\mathbf{W} = \mathrm{diag}(w_{ab})$. Figure 2 (a) shows a fictional example of an aggregate network with five treatments $a = 1, 2, 3, 4, 5$. The aggregate weight matrix for this example is $\mathbf{W} = \mathrm{diag}(1, 3, 4, 6, 5, 2, 7)$. We write $\mathbf{B}$ for the $K \times N$ edge-incidence matrix of the aggregate network. Each column of $\mathbf{B}$ corresponds to a treatment in the network and each row corresponds to an edge. To construct the matrix, one of the two treatments in each edge is designated as the 'baseline treatment' for this edge without loss of generality. Entries are +1 in the column corresponding to the 'baseline' treatment of the comparison represented by that row, and −1 in the column corresponding to the treatment compared to that baseline. All other entries are zero. For the example in Figure 2 (a) the edge incidence matrix can be chosen as

$$\mathbf{B} = \begin{pmatrix} 1 & -1 & 0 & 0 & 0 \\ 1 & 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 & -1 \end{pmatrix}, \tag{3}$$

where the columns represent treatments 1, 2, 3, 4, and 5, and the rows represent the edges (direct comparisons) 1-2, 1-3, 1-5, 2-3, 2-4, 3-5, and 4-5. In the following we will use this hyphenated notation when we refer to specific comparisons (e.g. 1-2 for the comparison between treatments 1 and 2). When we refer to a comparison between unspecified treatments $a$ and $b$, then we will use the notation $ab$, to avoid confusion with '$a$ minus $b$'.

FIGURE 2 (a) A fictional example of an aggregate meta-analytic network with edges weighted and labelled by their respective (inverse-variance) weights. (b) The resulting evidence flow network for the comparison 1-2 from the aggregate network in (a); the comparison 1-2 is indicated by depicting these nodes and their labels in blue. Edges are directed according to the sign of the corresponding element of the hat matrix, and are weighted by the absolute value of the hat matrix element. (c) The random walk on the aggregate network in (a) for a walker starting at node 1 and finishing at node 2; edges are labelled by the associated transition probabilities.
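As a minimal numerical sketch of the construction above, the following Python snippet builds the weight matrix and one possible edge-incidence matrix for the fictional network of Figure 2 (a), and forms the Laplacian. The variable names (`edges`, `weights`, `B`, `W`, `laplacian`) and the choice of the first-listed treatment as baseline are illustrative assumptions.

```python
import numpy as np

# Edges (direct comparisons) of the fictional network in Figure 2 (a),
# in the order 1-2, 1-3, 1-5, 2-3, 2-4, 3-5, 4-5, with their aggregate weights.
edges = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (3, 5), (4, 5)]
weights = np.array([1.0, 3.0, 4.0, 6.0, 5.0, 2.0, 7.0])
n_treatments = 5

# Edge-incidence matrix: one row per edge, one column per treatment.
# Assumption: the first treatment of each pair is taken as the baseline (+1).
B = np.zeros((len(edges), n_treatments))
for row, (baseline, other) in enumerate(edges):
    B[row, baseline - 1] = 1.0
    B[row, other - 1] = -1.0

W = np.diag(weights)        # diagonal matrix of aggregate weights
laplacian = B.T @ W @ B     # weighted graph Laplacian of the aggregate network

print(B)
print("rank of Laplacian:", np.linalg.matrix_rank(laplacian))
```

Swapping the baseline of an edge flips the sign of the corresponding row of `B` but leaves the Laplacian unchanged.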

Hat matrix and network estimates
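The network estimates of the relative treatment effects $\hat{\boldsymbol{\theta}}^{\mathrm{net}}$ are obtained via

$$\hat{\boldsymbol{\theta}}^{\mathrm{net}} = \mathbf{H}\, \hat{\boldsymbol{\theta}}^{\mathrm{dir}}, \tag{4}$$

where the hat matrix associated with the aggregate model is 39

$$\mathbf{H} = \mathbf{B} (\mathbf{B}^{\top} \mathbf{W} \mathbf{B})^{+} \mathbf{B}^{\top} \mathbf{W}. \tag{5}$$

The hat matrix has dimension $K \times K$ where each row and each column correspond to one edge. We denote the element in the row $ab$ and column $cd$ by $H^{(ab)}_{cd}$. The matrix $\mathbf{L} = \mathbf{B}^{\top} \mathbf{W} \mathbf{B}$, with dimensions $N \times N$ and rank $N - 1$, is the Laplacian of the aggregate network. The matrix $\mathbf{L}^{+} = (\mathbf{B}^{\top} \mathbf{W} \mathbf{B})^{+}$ is its pseudo-inverse 7,42 . The hat matrix describes how the direct evidence combines to give the network estimates. Each network estimate is a weighted linear combination of direct and indirect evidence. The coefficients of the estimates $\hat{\boldsymbol{\theta}}^{\mathrm{dir}}$ for each network treatment effect are found in the corresponding row of $\mathbf{H}$. The diagonal elements of $\mathbf{H}$ give the coefficients for the direct evidence while the off-diagonal elements indicate the contribution of indirect evidence. The larger the diagonal elements, the more weight is given to direct evidence 8 . König et al (2013) 8 noted that each row in the hat matrix can be interpreted as a flow network. Focusing on one row of the hat matrix, the magnitude of the flow of evidence between two nodes is given by the absolute value of the element in the corresponding column of $\mathbf{H}$. The direction is determined by the sign of the element of the hat matrix. For the $ab$-row of the hat matrix one defines evidence flows $f^{(ab)}_{cd}$ (from $c$ to $d$) and $f^{(ab)}_{dc}$ (from $d$ to $c$) as follows 8 :

$$f^{(ab)}_{cd} = \begin{cases} H^{(ab)}_{cd} & \text{if } H^{(ab)}_{cd} > 0, \\ 0 & \text{otherwise,} \end{cases} \qquad f^{(ab)}_{dc} = \begin{cases} |H^{(ab)}_{cd}| & \text{if } H^{(ab)}_{cd} < 0, \\ 0 & \text{otherwise.} \end{cases} \tag{6}$$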
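The hat matrix can be evaluated numerically with a pseudo-inverse. The sketch below does this for the fictional network of Figure 2 (a), assuming the edge order 1-2, 1-3, 1-5, 2-3, 2-4, 3-5, 4-5 and the first-listed treatment as baseline; the variable names are illustrative and the snippet is not taken from any NMA software package.

```python
import numpy as np

edges = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (3, 5), (4, 5)]
weights = np.array([1.0, 3.0, 4.0, 6.0, 5.0, 2.0, 7.0])
n_treatments = 5

# Edge-incidence matrix (+1 for the baseline, -1 for the other treatment).
B = np.zeros((len(edges), n_treatments))
for row, (baseline, other) in enumerate(edges):
    B[row, baseline - 1] = 1.0
    B[row, other - 1] = -1.0
W = np.diag(weights)

# Hat matrix of the aggregate model: H = B (B^T W B)^+ B^T W.
H = B @ np.linalg.pinv(B.T @ W @ B) @ B.T @ W

# Row 0 contains the coefficients of the direct estimates that make up
# the network estimate for the comparison 1-2.
for (c, d), coefficient in zip(edges, H[0]):
    print(f"coefficient of direct estimate {c}-{d}: {coefficient:+.4f}")

# Two sanity checks: H is a projection, and direct estimates that are
# already consistent (of the form B @ mu) are left unchanged.
print(np.allclose(H @ H, H), np.allclose(H @ B, B))
```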

Evidence flow
Flows are non-negative, and only one of $f^{(ab)}_{cd}$ and $f^{(ab)}_{dc}$ is non-zero.
It is important to note that each comparison $ab$ gives rise to a separate network of flows. We refer to these graphs as 'evidence flow networks'. Due to the properties of the hat matrix each of these evidence flow networks is directed and acyclic. Specifically, in the network corresponding to the comparison $ab$, node $a$ only has outgoing edges, and node $b$ only incoming edges. The flow network then has the following properties: 1. the total outflow from $a$ is equal to one, $\sum_{c} f^{(ab)}_{ac} = 1$; 2. the sum of inflows to node $b$ is also one, $\sum_{c} f^{(ab)}_{cb} = 1$; and 3. at every intermediate node, $c \neq a, b$, the sum of outflows equals the sum of inflows, $\sum_{d} f^{(ab)}_{cd} = \sum_{d} f^{(ab)}_{dc}$. These properties were stated in Reference 8 , and an algebraic proof for the first and the second property was given in Reference 9 . We provide a heuristic argument for all three properties in Appendix B. Figure 2 (b) shows the evidence flow network for the comparison 1-2 for the aggregate network in Figure 2 (a).
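The three flow properties can be checked numerically. The following sketch turns the hat-matrix row for the comparison 1-2 of the fictional network in Figure 2 (a) into a directed flow matrix and tests outflow, inflow and conservation. The helper name `evidence_flow` and the tolerance `tol` are assumptions introduced here for illustration.

```python
import numpy as np

edges = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (3, 5), (4, 5)]
weights = np.array([1.0, 3.0, 4.0, 6.0, 5.0, 2.0, 7.0])
n_treatments = 5

B = np.zeros((len(edges), n_treatments))
for row, (baseline, other) in enumerate(edges):
    B[row, baseline - 1] = 1.0
    B[row, other - 1] = -1.0
W = np.diag(weights)
H = B @ np.linalg.pinv(B.T @ W @ B) @ B.T @ W


def evidence_flow(hat_row, edges, n_nodes, tol=1e-12):
    """Directed flow matrix: flow[c, d] is the evidence flow from c to d."""
    flow = np.zeros((n_nodes, n_nodes))
    for (c, d), h in zip(edges, hat_row):
        if h > tol:          # positive element: flow runs from c to d
            flow[c - 1, d - 1] = h
        elif h < -tol:       # negative element: flow runs from d to c
            flow[d - 1, c - 1] = -h
    return flow


flow = evidence_flow(H[0], edges, n_treatments)  # comparison 1-2
outflow = flow.sum(axis=1)   # total flow leaving each node
inflow = flow.sum(axis=0)    # total flow entering each node

print("outflow from node 1:", outflow[0])
print("inflow to node 2:", inflow[1])
print("conservation at nodes 3, 4, 5:", np.allclose(outflow[2:], inflow[2:]))
```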

NMA, ELECTRICAL NETWORKS AND RANDOM WALKS
In this section we set up the analogies between NMA, electrical networks and random walks. A summary of these analogies can be found in Table 1.

NMA and electrical networks
The connection between meta-analytic and electrical networks was first introduced by one of us 7 . In the meta-analytic network, treatments are nodes connected by edges representing pairwise comparisons. On the other hand, edges in an electrical network represent resistors that connect at the nodes. In electric networks one then assigns an electric potential to each node, resulting in voltages (=differences in electric potential) across all edges. This in turn induces currents across the edges (current=voltage divided by resistance). Currents may also flow into/out of each node from/to the exterior to guarantee Kirchhoff's current law 44 at each node. These external currents occur, for example, when a voltage source (battery) is attached to a pair of nodes. The analogy between NMA and electric networks is based on the observation that resistances in parallel and sequential electrical circuits combine in the same way as variances of treatment effects in an NMA. Variance therefore corresponds to resistance. One can show that relative treatment effects are the analogue of voltages measured across edges, and weighted treatment effects the analogue of electrical current (see Rücker (2012) 7 for details). This allows one to use graph theoretical tools, routinely applied to electrical networks, to address questions in NMA.
In Rücker (2012) 7 , no voltages or external currents are applied directly to the electric circuit representing the NMA network. Instead, the starting point is given by (potentially) inconsistent measurements of treatment effects (voltages) across the edges of the network. It is then shown that finding the NMA estimates of treatment effects corresponds to finding the set of consistent voltages across all edges minimising the (Euclidean) distance to the inconsistent measurements.
Here, we extend this analogy and show that the elements of the hat matrix have an interpretation in the electric-circuit picture. More precisely, the elements of the row in the hat matrix corresponding to the comparison between treatments $a$ and $b$ can be obtained as follows: Connect a battery to nodes $a$ and $b$ in the electric circuit so that one unit of current flows from the exterior into node $a$, and out of the network (to the exterior) from node $b$. The external currents into/out of all other nodes are maintained at zero. This induces currents across the edges in the network. Our main result is then the following: The current along edge $cd$ is identical to the hat matrix element $H^{(ab)}_{cd}$. A detailed mathematical proof can be found in Appendix C.

TABLE 1 Summary of the analogies between NMA, electrical networks and random walks.
NMA | Electrical network | Random walk
Inverse-variance weight $w_{ab}$ associated with edge $ab$ on the aggregate network | $R_{ab}^{-1}$, the conductance (inverse resistance) across edge $ab$ | $T_{ab}$, the probability that a walker at node $a$ hops to node $b$ in the next step
The aggregate hat matrix element $H^{(ab)}_{cd}$ that defines the flow of evidence through the direct comparison $cd$ for the network treatment effect $ab$ | Flow of current through edge $cd$ when a battery is attached across nodes $a$ and $b$ such that a unit current flows into $a$ and out of $b$ | Expected net number of times a walker starting at $a$ and ending at $b$ crosses the edge from $c$ to $d$
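We illustrate this with a simple network of four nodes in Figure 3. Panel (a) shows a generic electrical circuit resulting from a meta-analytic graph with four treatment options and with direct comparisons between all pairs of treatments except treatments 1 and 4. We focus on the row in the hat matrix corresponding to the comparison between treatments 1 and 2. Using Equation (4) we have for this example

$$\hat{\theta}^{\mathrm{net}}_{1\text{-}2} = H^{(1\text{-}2)}_{1\text{-}2}\, \hat{\theta}^{\mathrm{dir}}_{1\text{-}2} + H^{(1\text{-}2)}_{1\text{-}3}\, \hat{\theta}^{\mathrm{dir}}_{1\text{-}3} + H^{(1\text{-}2)}_{2\text{-}3}\, \hat{\theta}^{\mathrm{dir}}_{2\text{-}3} + H^{(1\text{-}2)}_{2\text{-}4}\, \hat{\theta}^{\mathrm{dir}}_{2\text{-}4} + H^{(1\text{-}2)}_{3\text{-}4}\, \hat{\theta}^{\mathrm{dir}}_{3\text{-}4}. \tag{7}$$

Our result indicates that the coefficients $H^{(1\text{-}2)}_{cd}$ can be obtained from the setup shown in Figure 3. A battery is attached to nodes 1 and 2 and the voltage of the battery is chosen such that one unit of current flows into node 1 (from the battery) and out of node 2 (into the battery). This induces currents in the five edges (resistors) of the electric circuit. These currents are the hat matrix elements in Equation (7). Via Equation (6) these then determine the flow of evidence.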
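The battery construction can be reproduced numerically for a four-node circuit of this kind. In the sketch below the conductance values are hypothetical, chosen only for illustration, and the variable names are our own. The snippet solves Kirchhoff's current law for the node potentials, applies Ohm's law on each edge, and compares the resulting currents with the 1-2 row of the hat matrix.

```python
import numpy as np

# Four nodes, all pairs connected except 1-4 (five edges).
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
# Hypothetical conductances (inverse resistances); illustrative values only.
conductance = np.array([2.0, 1.0, 3.0, 1.0, 2.0])
n_nodes = 4

B = np.zeros((len(edges), n_nodes))
for row, (c, d) in enumerate(edges):
    B[row, c - 1] = 1.0
    B[row, d - 1] = -1.0
W = np.diag(conductance)
laplacian = B.T @ W @ B

# External currents: one unit into node 1, one unit out of node 2.
source = np.array([1.0, -1.0, 0.0, 0.0])

# Kirchhoff's current law: laplacian @ potential = source.
# The potential is only defined up to an additive constant; the
# pseudo-inverse picks one solution, and only differences matter below.
potential = np.linalg.pinv(laplacian) @ source

# Ohm's law on each edge: current from c to d = conductance * (V_c - V_d).
current = conductance * (B @ potential)

# Hat matrix of the corresponding aggregate model; row 0 is comparison 1-2.
H = B @ np.linalg.pinv(laplacian) @ B.T @ W
print(np.allclose(current, H[0]))
```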

Electrical networks and random walks

Definitions and notation
As illustrated in Figure 4, a random walk on a graph is a stochastic process consisting of a succession of 'hops' between neighbouring nodes (nodes connected by an edge). We use the word 'path' to describe the sequence of nodes visited by the walker, including repeat visits to individual nodes. We always assume that time is discrete. The walk is then a Markov process described by an $N \times N$ transition matrix, $\mathbf{T}$, where $N$ is the number of nodes in the network. The element $T_{ab}$ of this matrix is the probability that a walker, currently at node $a$, moves to node $b$ in the next time step. These probabilities only depend on the current position of the walker, and not on the path taken to reach that position. One has $\sum_{b} T_{ab} = 1$ for all $a$, i.e. $\mathbf{T}$ is a stochastic matrix. The connection between random walks and electrical networks has been recognised for some time 34,35,36 and is described extensively in Doyle and Snell (2000) 37 . Here we will only summarise the concepts and known results that are most relevant for our work.
FIGURE 3 (c) For this particular realisation of the random walk, the net number of times the walker crosses edges 1-3 and 3-2 is one, while all other edges are crossed net zero times. The expected net number of times the walker crosses an edge is given by the currents shown in (b) for that edge 37 . The focus on the comparison of nodes 1 and 2 in panels (b) and (c) is indicated by the blue colour of these nodes.

Starting from an electrical network with given resistances $R_{ab}$, a random walk process can be constructed by defining the transition probabilities ($a \neq b$)

$$T_{ab} = \frac{R_{ab}^{-1}}{\sum_{c \neq a} R_{ac}^{-1}}. \tag{8}$$
This definition indicates that transitions from one node to another occur in proportion to the inverse resistance of the direct connection between the two nodes (if there is no direct connection, then no hop can occur between the two nodes). We set $T_{aa} = 0$ for all $a$. The denominator in Equation (8) ensures normalisation ($\sum_{b} T_{ab} = 1$). We always assume the network does not divide into multiple disconnected components. As a result, the transition matrix defined in Equation (8) is such that a walker starting at any node $a$ will eventually reach any other node $b \neq a$ with finite probability.
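A minimal sketch of this construction is given below: each row of the conductance matrix is normalised so that hops occur in proportion to inverse resistance. The function name `transition_matrix` and the resistance values are hypothetical; the network is assumed to be connected so that no row sum is zero.

```python
import numpy as np


def transition_matrix(resistance, n_nodes):
    """Transition probabilities proportional to conductance (1 / resistance).

    `resistance` maps an edge (a, b), with 1-based node labels, to its
    resistance. Pairs of nodes without an edge get zero probability.
    """
    conductance = np.zeros((n_nodes, n_nodes))
    for (a, b), r in resistance.items():
        conductance[a - 1, b - 1] = 1.0 / r
        conductance[b - 1, a - 1] = 1.0 / r
    # Normalise each row; the diagonal stays zero, so the walker always moves.
    return conductance / conductance.sum(axis=1, keepdims=True)


# Hypothetical resistances for a four-node circuit without an edge 1-4.
resistance = {(1, 2): 2.0, (1, 3): 1.0, (2, 3): 4.0, (2, 4): 1.0, (3, 4): 2.0}
T = transition_matrix(resistance, n_nodes=4)
print(T)
```

From node 1 the walker in this example can only hop to node 2 or node 3, and the hop to node 3 is twice as likely because that edge has half the resistance.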

Interpretation of electrical current
Electrical current can be interpreted in the random-walk picture as follows 37 : When a voltage is applied between two nodes $a$ and $b$ such that the total current flowing into $a$ and out of $b$ from the exterior is 1, the current induced in each edge, $cd$, is equal to the expected net number of times a random walker, starting at $a$ and walking until it reaches $b$, moves along the edge from $c$ to $d$. The net number of times the walker moves from $c$ to $d$ is the number of crossings in the direction from $c$ to $d$ minus the number of crossings in the opposite direction.
To describe this mathematically we need to ensure that no more hops occur when the walker reaches the designated end point $b$. In other words, this node must become absorbing. This is achieved by setting the elements $T_{bc}$ to zero for all $c$. For later convenience we denote the resulting modified transition matrix by $\mathbf{T}^{(ab)}$, recognising that the modifications made only depend on the choice of $b$, and not on $a$. Mathematically, we have $T^{(ab)}_{bc} = 0$ for all $c \neq b$, and $T^{(ab)}_{cd} = T_{cd}$ for $c \neq b$ and all $d$. We set $T^{(ab)}_{bb}$ to unity. Now consider random walks starting at node $a$ and then following the process defined by the transition matrix $\mathbf{T}^{(ab)}$. All walks therefore end at node $b$. The probability that a walker takes a particular path $\pi$ connecting $a$ and $b$ can be written as

$$P^{(ab)}(\pi) = \prod_{cd \in \pi} T^{(ab)}_{cd}, \tag{9}$$

where the notation $\{cd \in \pi\}$ indicates the set of pairs of successive nodes in the path $\pi$. We note that $P^{(ab)}(\pi)$ is non-zero if and only if the path starts at $a$ and ends when $b$ is reached for the first time.
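The average number of net crossings from node $c$ to node $d$ along paths starting at $a$ and ending at $b$ can therefore be obtained as

$$\mathcal{N}^{(ab)}_{cd} = \sum_{\pi} P^{(ab)}(\pi)\, n_{cd}(\pi), \tag{10}$$

where $P^{(ab)}(\pi)$ is the probability of path $\pi$ and $n_{cd}(\pi)$ is the net number of crossings from $c$ to $d$ along path $\pi$. We note that this quantity can be negative; this occurs if the walker makes more transitions from $d$ to $c$ than from $c$ to $d$. The sum in Equation (10) extends over all paths $\pi$ connecting $a$ and $b$.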
To develop some intuition, consider again the electrical network in Figure 3 (a). Assume that we are interested in the scenario where the external current flows into node 1 and out of node 2, but not into or out of any of the other nodes. We then start the random-walk process at node 1, and use transition probabilities as defined in Equation (8) until the walker reaches node 2. In the first step, the walker either hops to node 2 (this occurs with probability $T_{1\text{-}2}$) or to node 3 (with probability $T_{1\text{-}3}$). If the walker hops to node 2, the walk stops and the path taken by the walker is 1 → 2. Otherwise, the walker is at node 3 and in the next step it can transition to 2, 4 or back to 1 with respective probabilities $T_{3\text{-}2}$, $T_{3\text{-}4}$ and $T_{3\text{-}1}$. This process continues until the walker eventually reaches node 2. The current through the edge $cd$ is then given by the expected net number of times such a walker crosses the edge from $c$ to $d$ before it arrives at node 2. A crossing in the direction from $d$ to $c$ contributes negatively to this value.
Since the random walker can move in both directions along the network edges, there are infinitely many paths the walker can take as it travels from node 1 to node 2 in this example. Figure 3 (c) illustrates one such path. The probability that the random walker takes this path is given by the product of the individual transition probabilities along the path. Although the probability $P^{(ab)}(\pi)$ of each path $\pi$ can be obtained relatively easily, carrying out the sum in Equation (10) by exhaustive enumeration of all relevant paths is not practicable. This is because there are generally infinitely many paths starting and ending at the designated nodes (due to the possibility of hopping back to nodes visited earlier). The analogy between electrical circuits and random walks 37 can, however, be used to calculate the expected number of net crossings through an edge analytically. This is detailed in Appendices D and E, see in particular Equation (E42).
The expected number of net crossings can also be obtained from simulations of the random-walk process. An ensemble of walkers is released at the starting point $a$. Each walker then independently hops from node to node on the network with transition probabilities as in Equation (8) until it hits the designated endpoint (node $b$). The process then stops. For each walker the net number of crossings from $c$ to $d$ can be recorded, and this is then averaged over the ensemble of walkers.
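The simulation described above can be sketched in a few lines of Python. As an assumption for illustration, the conductances are taken to be the edge weights of the fictional network in Figure 2 (a), the walker starts at node 1 and stops at node 2, and 10,000 walkers are used. The names `simulate` and `crossing`, the ensemble size and the random seed are our own choices. The averaged net crossings are compared with the edge currents obtained from Kirchhoff's and Ohm's laws.

```python
import numpy as np

edges = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (3, 5), (4, 5)]
conductance = np.array([1.0, 3.0, 4.0, 6.0, 5.0, 2.0, 7.0])
n_nodes = 5
start, end = 0, 1  # start at node 1, stop at node 2 (0-based indices)

C = np.zeros((n_nodes, n_nodes))       # symmetric conductance matrix
B = np.zeros((len(edges), n_nodes))    # edge-incidence matrix
crossing = {}                          # (from, to) -> (edge index, sign)
for k, (c, d) in enumerate(edges):
    C[c - 1, d - 1] = C[d - 1, c - 1] = conductance[k]
    B[k, c - 1], B[k, d - 1] = 1.0, -1.0
    crossing[(c - 1, d - 1)] = (k, +1)   # hop c -> d counts +1
    crossing[(d - 1, c - 1)] = (k, -1)   # hop d -> c counts -1
T = C / C.sum(axis=1, keepdims=True)     # transition probabilities


def simulate(n_walkers, rng):
    """Average net number of crossings of each edge over many walkers."""
    totals = np.zeros(len(edges))
    for _ in range(n_walkers):
        position = start
        while position != end:
            nxt = int(rng.choice(n_nodes, p=T[position]))
            k, sign = crossing[(position, nxt)]
            totals[k] += sign
            position = nxt
    return totals / n_walkers


rng = np.random.default_rng(seed=0)
simulated = simulate(10_000, rng)

# Reference values: currents for a unit current into node 1, out of node 2.
source = np.zeros(n_nodes)
source[start], source[end] = 1.0, -1.0
potential = np.linalg.pinv(B.T @ np.diag(conductance) @ B) @ source
current = conductance * (B @ potential)

for (c, d), sim, cur in zip(edges, simulated, current):
    print(f"edge {c}-{d}: simulated {sim:+.3f}, current {cur:+.3f}")
```

The agreement is only statistical; the deviation shrinks roughly with the inverse square root of the number of walkers.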

Random walk on a meta-analytic network
As described above, conductance (inverse resistance) in an electrical network has an analogue in terms of both NMA, and random walks. Exploiting these analogies, we now define a random-walk process on a meta-analytic network via the transition probabilities

$$T_{ab} = \frac{w_{ab}}{\sum_{c \neq a} w_{ac}}, \tag{12}$$

with weights $w_{ab}$ associated with the edges $ab$ as discussed in Section 3.2, see in particular Equation (2). In order to study walks starting at node $a$ and ending at $b$ we use the matrix $\mathbf{T}^{(ab)}$ as defined in Section 4.2.2. This enforces absorption of the walker at node $b$ when this node is reached. For the example aggregate network in Figure 2 (a), the transition matrix for a random walk starting at node 1 and ending at node 2 is

$$\mathbf{T}^{(1\text{-}2)} = \begin{pmatrix} 0 & 1/8 & 3/8 & 0 & 1/2 \\ 0 & 1 & 0 & 0 & 0 \\ 3/11 & 6/11 & 0 & 0 & 2/11 \\ 0 & 5/12 & 0 & 0 & 7/12 \\ 4/13 & 0 & 2/13 & 7/13 & 0 \end{pmatrix}. \tag{13}$$

Each row and column of $\mathbf{T}^{(1\text{-}2)}$ represents a treatment in the network, $a = 1, 2, 3, 4, 5$. Given that we focus on the comparison between treatments 1 and 2, node 1 is the start point of the walk, and node 2 is absorbing. Therefore, the row corresponding to treatment 2 contains only zeroes except for the diagonal element which is equal to one (when the walker reaches node 2 it stays there indefinitely). The entries in each row of the matrix in Equation (13) sum to one. The diagonal elements of $\mathbf{T}^{(1\text{-}2)}$ (except for the element relating to node 2) are zero. This indicates that, with the exception of the absorbing state, the random walker cannot stay at the same place at any step. Figure 2 (c) illustrates the dynamics of the random walk from node 1 to node 2 for this example.
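For the fictional network of Figure 2 (a), the absorbing transition matrix and the expected net crossings can also be computed exactly. The sketch below uses the standard fundamental-matrix result for absorbing Markov chains (expected visits to each transient node); this is a textbook route chosen here for illustration and is not necessarily the derivation used in the appendices. Variable names are our own.

```python
import numpy as np

edges = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (3, 5), (4, 5)]
weights = np.array([1.0, 3.0, 4.0, 6.0, 5.0, 2.0, 7.0])
n = 5
start, end = 0, 1  # walk from node 1 to node 2 (0-based indices)

w = np.zeros((n, n))
B = np.zeros((len(edges), n))
for k, (c, d) in enumerate(edges):
    w[c - 1, d - 1] = w[d - 1, c - 1] = weights[k]
    B[k, c - 1], B[k, d - 1] = 1.0, -1.0

# Transition probabilities proportional to the aggregate edge weights.
T = w / w.sum(axis=1, keepdims=True)

# Make the end node absorbing: once reached, the walker stays there.
T_abs = T.copy()
T_abs[end, :] = 0.0
T_abs[end, end] = 1.0

# Expected number of visits to each transient node, starting from `start`,
# from the fundamental matrix (I - Q)^(-1) of the absorbing chain.
transient = [node for node in range(n) if node != end]
Q = T_abs[np.ix_(transient, transient)]
fundamental = np.linalg.inv(np.eye(len(transient)) - Q)
visits = np.zeros(n)
visits[transient] = fundamental[transient.index(start)]

# Expected net crossings of edge c-d: hops c -> d minus hops d -> c.
net_crossings = np.array([
    visits[c - 1] * T_abs[c - 1, d - 1] - visits[d - 1] * T_abs[d - 1, c - 1]
    for c, d in edges
])

# Compare with the 1-2 row of the hat matrix H = B (B^T W B)^+ B^T W.
W = np.diag(weights)
H = B @ np.linalg.pinv(B.T @ W @ B) @ B.T @ W
print(T_abs)
print(np.allclose(net_crossings, H[0]))
```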
In Section 4.1 we made the connection between the flow of electric current and the flow of evidence in an NMA. Using the interpretation of current as a random walk we can now establish the following analogy: For the comparison of treatments $a$ and $b$, the hat matrix element $H^{(ab)}_{cd}$ that defines the flow of evidence through the direct comparison $cd$ is equal to the expected net number of times a random walker starting at node $a$ on the aggregate NMA network moves along the edge from $c$ to $d$ before it reaches node $b$. In other words, writing $\mathcal{N}^{(ab)}_{cd}$ for this expected net number of crossings (Equation (10)), we equate

$$H^{(ab)}_{cd} = \mathcal{N}^{(ab)}_{cd}, \tag{14}$$

and define the flow of evidence $f^{(ab)}_{cd}$ in terms of $H^{(ab)}_{cd}$ via Equation (6).
In summary, we have used existing analogies between electric circuits and random walks on the one hand, and network meta-analysis and electric circuits on the other to introduce an interpretation of the flow of evidence in network meta-analysis in terms of random walks. The analogies between all three areas are highlighted in Table 1.

PROPORTION CONTRIBUTION
In this section we present a random-walk interpretation and construction of the so-called 'proportion contribution matrix' 9 . While the general idea is similar to the random-walk approach to evidence flow in the previous section, it is important to note that the random walk now no longer takes place on the meta-analytic network. Instead, walkers move on the evidence flow network. As explained in more detail below, the entries of the proportion contribution matrix in NMA can then be obtained from this random walk.
We show that the random-walk approach overcomes the limitations of the algorithm proposed for the evaluation of proportion contributions in Papakonstantinou et al (2018) 9 . In particular, it provides an analytical expression for proportion contributions, removing ambiguity and reducing the computational effort required. Furthermore, unlike the numerical algorithm, the random-walk approach identifies all paths of evidence so that all potential sources of bias are taken into account. In Section 5.1 we introduce the concept of proportion contributions. In Section 5.2 we describe the algorithm in Papakonstantinou et al and its limitations. We then present and discuss the random-walk approach in Section 5.3.

FIGURE 5
Illustration of evidence flow, streams of evidence and proportion contributions for a network of topical antibiotics without steroids for chronically discharging ears presented in Macfadyen (2005) 46 . Node 1 is no treatment; 2 is quinolone antibiotic; 3 is antiseptic; and 4 is non-quinolone antibiotic.

Background and definition
In NMA it is important to assess the influence of individual study bias on the estimates obtained from the network. To this end, the CINeMA framework and software 10,45 provides a user-friendly system to assess confidence in the results from an NMA. One function of the software is to display the relative influence of evidence that comes from studies with high, moderate and low risk of bias on each network treatment effect. This assessment involves calculating the matrix of so-called 'proportion contributions' 9 . This matrix describes how much each direct treatment effect contributes to each network treatment effect as a relative proportion. The idea of the proportion contribution matrix is based on the hat matrix. The elements of the hat matrix are the coefficients of the linear relation between network estimates and direct estimates in the NMA as described in Equation (4). These coefficients can be positive or negative. The proportion contribution matrix uses the properties of the hat matrix and translates the elements of $H$ to positive proportion contributions, where the total contribution is normalised to one. We now explain this in more detail using the work of Papakonstantinou et al (2018) 9 .
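Consider the example network in Figure 5 (a). This relates to an NMA of the four treatments given in the figure caption for chronically discharging ears 46 . To keep the text concise we label the treatments 1, 2, 3 and 4. In accordance with Equation (4), the network estimate of comparison 1-2 is given by the linear equation (7), which we repeat here for clarity,

$$\hat{\theta}^{\text{net}}_{1\text{-}2} = \sum_{ij} H^{(1\text{-}2)}_{ij}\, \hat{\theta}^{\text{dir}}_{ij}, \tag{15}$$

where the sum runs over the direct comparisons $ij$ in the network. We can think of the expression on the right-hand side as a combination of different direct and indirect estimates of $\theta_{1\text{-}2}$. The direct estimate is simply $\hat{\theta}^{\text{dir}}_{1\text{-}2}$. We obtain one indirect estimate using node 3 and the consistency equation,

$$\hat{\theta}^{\text{ind}(1)}_{1\text{-}2} = \hat{\theta}^{\text{dir}}_{1\text{-}3} + \hat{\theta}^{\text{dir}}_{3\text{-}2}. \tag{16}$$

A second indirect estimate is found via nodes 3 and 4,

$$\hat{\theta}^{\text{ind}(2)}_{1\text{-}2} = \hat{\theta}^{\text{dir}}_{1\text{-}3} + \hat{\theta}^{\text{dir}}_{3\text{-}4} + \hat{\theta}^{\text{dir}}_{4\text{-}2}. \tag{17}$$

These three ways of estimating $\theta_{1\text{-}2}$ correspond to so-called 'paths of evidence' on the evidence flow network 9 . We label these paths $\pi_i$ ($i = 1, 2, 3$). As illustrated in Figure 5 (b), they are $\pi_1 = 1 \to 2$, $\pi_2 = 1 \to 3 \to 2$ and $\pi_3 = 1 \to 3 \to 4 \to 2$. We can now write the network estimate $\hat{\theta}^{\text{net}}_{1\text{-}2}$ as a linear combination of the estimates $\hat{\theta}^{\text{dir}}_{1\text{-}2}$, $\hat{\theta}^{\text{ind}(1)}_{1\text{-}2}$ and $\hat{\theta}^{\text{ind}(2)}_{1\text{-}2}$. That is,

$$\hat{\theta}^{\text{net}}_{1\text{-}2} = \phi_1\, \hat{\theta}^{\text{dir}}_{1\text{-}2} + \phi_2\, \hat{\theta}^{\text{ind}(1)}_{1\text{-}2} + \phi_3\, \hat{\theta}^{\text{ind}(2)}_{1\text{-}2}. \tag{18}$$

The coefficients, $\phi_i$, define the flow of evidence through each path $\pi_i$, see Papakonstantinou et al (2018) 9 . Figure 5 (b) shows how the flows in each edge, described by the hat matrix coefficients, are deconstructed into the flows through each path of evidence, described by the coefficients $\phi_i$. In this example, only the edge 1-3 is used for more than one path. When calculating the flow through each path, the flow in edge 1-3 is 'split' between the paths $\pi_2 = 1 \to 3 \to 2$ and $\pi_3 = 1 \to 3 \to 4 \to 2$ according to the flow in the subsequent edges along those two paths. A so-called 'stream' of evidence 9 is a pair consisting of a path and the flow associated with this path, $S_i = (\pi_i, \phi_i)$. The proportion contribution of each direct comparison $ij$ to the network estimate of each comparison $ab$ is then defined as 9

$$p^{(ab)}_{ij} = \sum_{k:\, ij \in \pi_k} \frac{\phi_k}{|\pi_k|}, \tag{19}$$

where $|\pi_k|$ is the number of edges that make up the path $\pi_k$. The sum extends over all paths in the evidence flow network for the comparison $ab$ that contain the edge $ij$. We note that all such paths start at $a$ and end at $b$, and, because the evidence flow network is acyclic, multiple visits to the same node do not occur.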
For simple examples, such as the one in Figure 5, one can obtain the path flows $\phi_i$ by directly comparing coefficients in Equations (15) and (18). Using the properties of the hat matrix in Section 3.4 one can then also see that $\phi_i \geq 0$ for all $i$, and that $\sum_i \phi_i = 1$. This means that the proportion contributions in Equation (19) are also non-negative, and sum to one. Figure 5(c) shows the proportion contributions, expressed as percentages, for the example in Figure 5 (a).
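As a rough illustration of Equation (19), the following Python sketch shares the flow of each path equally among its edges and sums the shares per edge. The stream values are the rounded edge flows quoted for the Figure 5 example later in this section; the direct flow 0.635 is not quoted in the text and is inferred here as 1 − 0.365. The function name `proportion_contributions` is hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Streams for comparison 1-2 in the Figure 5 example: (path as node sequence, flow).
# 0.251 and 0.114 are quoted in the text; 0.635 is inferred as 1 - 0.365 (assumption).
streams = [
    ((1, 2), 0.635),
    ((1, 3, 2), 0.251),
    ((1, 3, 4, 2), 0.114),
]

def proportion_contributions(streams):
    """Split each path's flow equally among its edges and sum per edge (Equation (19))."""
    contrib = defaultdict(float)
    for path, flow in streams:
        edges = list(zip(path, path[1:]))  # consecutive node pairs along the path
        for edge in edges:
            contrib[edge] += flow / len(edges)
    return dict(contrib)

p = proportion_contributions(streams)
for edge, value in sorted(p.items()):
    print(edge, round(value, 4))
```

Because every path distributes all of its flow over its own edges, the contributions sum to the total flow, which is one.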
For larger, more connected networks it is not immediately clear how to obtain the $\phi_i$. In particular, when there are more paths than edges, expressing the $\phi_i$ in terms of the coefficients of the hat matrix is non-trivial. Papakonstantinou et al 9 present an iterative algorithm to identify streams for a general evidence flow network. We will now briefly describe this.

Existing iterative numerical algorithm to determine streams of evidence
Broadly speaking, each iteration of the algorithm consists of the following steps: (i) A path in the evidence flow network is selected. (ii) The minimum flow through the edges making up the path is identified. This is assigned as the flow associated with the path. (iii) The flow of the path is subtracted from the values of flow in the edges that make up that path. This means that the edge corresponding to the minimum flow in that path is removed from the graph. (iv) A new path is then selected from the remaining graph. The process repeats until all the evidence flow in the edges has been assigned to a path.
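Steps (i) to (iv) can be sketched in Python as follows. This is a minimal illustration and not the published implementation: the path in step (i) is simply the first one found by a depth-first search, which is an arbitrary stand-in for the selection rules discussed next, and the function names and the tolerance `TOL` are hypothetical. The edge flows are those of the Figure 5 example, with the direct flow 0.635 inferred as 1 − 0.365.

```python
TOL = 1e-12  # flows below this are treated as exhausted (assumption for rounding errors)

def find_path(flows, node, end, path=None):
    """Depth-first search for any node->end path along edges with remaining flow."""
    path = path or [node]
    if node == end:
        return path
    for (u, v), f in flows.items():
        if u == node and f > TOL:
            found = find_path(flows, v, end, path + [v])
            if found:
                return found
    return None

def iterative_streams(edge_flows, start, end):
    flows = dict(edge_flows)  # working copy; the input is left untouched
    streams = []
    while True:
        path = find_path(flows, start, end)     # step (i): select a path
        if path is None:                        # no flow left to assign
            break
        edges = list(zip(path, path[1:]))
        phi = min(flows[e] for e in edges)      # step (ii): minimum flow on the path
        for e in edges:
            flows[e] -= phi                     # step (iii): subtract it along the path
        streams.append((tuple(path), phi))      # step (iv): repeat on what remains
    return streams

edge_flows = {(1, 2): 0.635, (1, 3): 0.365, (3, 2): 0.251, (3, 4): 0.114, (4, 2): 0.114}
streams = iterative_streams(edge_flows, 1, 2)
```

Changing the order in which `find_path` explores edges changes the order of selection, which is the source of the ambiguity in more connected networks.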
Different methods for selecting the paths in step (i) give rise to multiple variants of the algorithm. For example, paths may be selected at random or in order from shortest to longest. We refer to these approaches as 'Random' and 'Shortest' respectively. To deal with multiple paths of the same length, the Shortest algorithm assigns a cost to each path based on the evidence flow in each edge along the path 9 . The algorithm then selects paths in order from smallest to largest cost. Each time the Random algorithm is run it selects the paths in a different order and, potentially, gives a different outcome. For simple networks such as the example in Figure 5, the order of selection does not affect the outcome. However, for more complicated networks this is not the case. In some graphs, the flow of evidence is fully assigned to streams before every possible path has been selected. The remaining paths can then not be associated with any flow. Critically, this approach means that many paths of evidence are not identified and their contribution (along with any potential bias) is not accounted for. The set of paths that are missed in this way can depend on the order in which paths are selected by the algorithm. Examples of this behaviour are presented in Supplementary File 3 in Papakonstantinou et al (2018) 9 and in Section 6.2 of this paper.
One potential remedy consists of averaging results from the Random algorithm by Papakonstantinou et al over a large number of realisations. We call this method 'Average'. Provided enough realisations are generated, the Average algorithm will eventually identify every evidence path. However, because of the nature of the algorithm, the number of times a particular path is sampled by this method can depend on features of the network not directly related to the path. In step (iii) of the algorithm the edge associated with the smallest flow in a particular path is removed from the network. This means that any other path containing this edge can no longer be selected. As a result, paths that do not share edges with any other paths will be selected in every run of the algorithm, whereas paths which do share edges with other paths will be sampled less often. It is therefore not clear how to interpret average proportion contributions determined in this way. Furthermore, this approach is computationally intensive as it relies on repeating the (already iterative) algorithm many times. For this reason, this version of the algorithm is not implemented in current software.

TABLE 2
Summary of the two random-walk approaches to NMA. In one approach ('aggregate') the walker moves on the undirected aggregate network. In the second ('evidence flow'), the walker moves on the directed acyclic evidence flow network for a particular comparison of treatments. The transition matrices are denoted by $T^{(ab)}$ and $U^{(ab)}$ respectively. Except for the imposition of a suitable absorbing state (see text) the transition probabilities on the aggregate network do not depend on the particular comparison that is studied. In contrast, there are separate evidence flow networks (and hence random-walk models) for each comparison $ab$, hence the superscript in $U^{(ab)}$.
Evidence flow approach: the proportion of walkers taking a particular path while travelling from $a$ to $b$ corresponds to the evidence streams for the comparison between $a$ and $b$.
To overcome these limitations, we develop a random-walk approach for deriving the streams of evidence. We will now describe this.

Random walk on the evidence flow network
To obtain the evidence streams we define a random walk on the evidence flow network for comparison $ab$. We denote the transition matrix for this model by $U^{(ab)}$ to distinguish it from the random walk on the aggregate NMA network defined in Section 4.3. We note that there is a different evidence flow network for each treatment comparison $ab$. We indicate this by the superscript $(ab)$. Since the evidence flow network has directed edges the walker can only move in one direction along each edge (in the direction of evidence flow). Node $a$ in the evidence flow network for comparison $ab$ has only outgoing edges, and node $b$ only incoming edges. We also note that the evidence flow network is acyclic 8 . This means that a walker can never visit any node more than once.
It is important to distinguish carefully between the random-walk model on the aggregate network and that on the evidence flow network. In Section 4.3 we defined a transition matrix for a random walker moving from node $a$ to node $b$ on the aggregate meta-analytic network. The walker was allowed to move in both directions along the edges of the network. We labelled this transition matrix $T^{(ab)}$ where the superscript indicates the start and end nodes of the walk, i.e., the treatment comparison we are interested in. By analysing the average movement of the walker, we obtained the evidence flow. In this section we focus instead on a random walk on the evidence flow network, and our aim is to construct streams of evidence. The two approaches are summarised and contrasted in Table 2.
To illustrate this, we consider the evidence flow network for comparison 1-2 in Figure 5 (a). We now construct a transition matrix for a random walk on this directed acyclic graph assuming that the walker starts at node 1. In contrast to random walks on the undirected meta-analytic graphs in Section 4.3, the walker can only move in one direction across each edge as indicated by the direction of evidence flow. If the flow $f^{(ab)}_{ij} = 0$ (because the associated hat matrix element $H^{(ab)}_{ij} \leq 0$), then no hop from $i$ to $j$ can occur. Each possible transition occurs with probability proportional to the evidence flows indicated in Figure 5 (a).
More generally, for the evidence flow network of comparison $ab$, the elements of the transition matrix $U^{(ab)}$ are given by

$$U^{(ab)}_{ij} = \frac{f^{(ab)}_{ij}}{\sum_{k} f^{(ab)}_{ik}}, \qquad i \neq b. \tag{20}$$

For the comparison $ab$, the walker remains at $b$ indefinitely once it gets there, i.e., we have $U^{(ab)}_{bb} = 1$, and the probability of transitioning from $b$ to any other node $j \neq b$ is $U^{(ab)}_{bj} = 0$. All other elements of the matrix $U^{(ab)}$ are given by Equation (20). For the example in Figure 5 (a), the transition matrix for a random walk on this graph, with rows and columns ordered by treatments 1 to 4, is

$$U^{(1\text{-}2)} = \begin{pmatrix} 0 & 0.635 & 0.365 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0.688 & 0 & 0.312 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$

The third row of $U^{(1\text{-}2)}$ corresponds to transitions from node 3. From Equation (20) and the edge flows shown in Figure 5 (a), we find that if the walker is at node 3, then it moves to either node 2 or node 4 with probabilities 0.251∕(0.251 + 0.114) and 0.114∕(0.251 + 0.114) respectively. Similar calculations are done to find the elements in the other rows. Once arrived at 2 the walker remains there indefinitely. This behaviour is described by the second row of $U^{(1\text{-}2)}$.

The walker can take one of three paths from 1 to 2: $\pi_1 = 1 \to 2$, $\pi_2 = 1 \to 3 \to 2$ and $\pi_3 = 1 \to 3 \to 4 \to 2$. These are the same as the paths of evidence defined in Section 5.1 and are illustrated in Figure 5 (b). The probability of a walker taking a certain path is given by the product of the individual transition probabilities associated with each edge along that path (Equation (9)). For example, the probability that a random walker takes the path $\pi_2 = 1 \to 3 \to 2$ is $U^{(1\text{-}2)}_{1\text{-}3}\, U^{(1\text{-}2)}_{3\text{-}2} = 0.365 \times 0.688 \approx 0.251$.

The probability that a walker takes a given path can also be measured from simulations of the random-walk process on the evidence flow network. To do this one simulates a large ensemble of independent walkers, and measures the proportions of walkers taking each path. We can think of this as flows of walkers through the different paths. We use this interpretation to provide a general analytical definition of the flow of evidence through a particular path: for the evidence flow network for comparison $ab$, we define the flow $\phi_i$ through path $\pi_i$ as the probability that a walker governed by $U^{(ab)}$ takes that path,

$$\phi_i = \prod_{jk \in \pi_i} U^{(ab)}_{jk}. \tag{22}$$

With this definition we can construct decompositions such as the one in Equation (18) for all networks. From the $\phi_i$ the proportion contributions can then be calculated via Equation (19). For the example in Figure 5 (a), Equation (22) leads to the streams,

$$S_1 = (1 \to 2,\ 0.635), \qquad S_2 = (1 \to 3 \to 2,\ 0.251), \qquad S_3 = (1 \to 3 \to 4 \to 2,\ 0.114).$$

For this simple example the random-walk approach results in the same evidence streams (and therefore proportion contributions) as the algorithm by Papakonstantinou et al, see Figure 5 (b). The random-walk approach provides an analytical construction of the proportion contributions. The outcome is unambiguous and the method is computationally more efficient than the iterative numerical algorithm. In the following section we demonstrate how the random-walk approach can be used for the more intricate network from Section 2.
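The construction above can be checked numerically with a short Python sketch. It normalises the outgoing edge flows of each node to obtain transition probabilities and multiplies them along each path. The dictionary layout and function names are hypothetical, and the direct flow 0.635 is inferred as 1 − 0.365 rather than quoted in the text.

```python
# Edge flows of the evidence flow network for comparison 1-2 (Figure 5 example).
# 0.365, 0.251 and 0.114 are quoted in the text; 0.635 is inferred (assumption).
edge_flows = {(1, 2): 0.635, (1, 3): 0.365, (3, 2): 0.251, (3, 4): 0.114, (4, 2): 0.114}

def transition_probabilities(edge_flows):
    """Divide each edge flow by the total flow leaving its start node."""
    out_total = {}
    for (i, _), f in edge_flows.items():
        out_total[i] = out_total.get(i, 0.0) + f
    return {(i, j): f / out_total[i] for (i, j), f in edge_flows.items() if f > 0}

def path_probability(U, path):
    """Product of the transition probabilities along consecutive edges of a path."""
    prob = 1.0
    for edge in zip(path, path[1:]):
        prob *= U[edge]
    return prob

U = transition_probabilities(edge_flows)
paths = [(1, 2), (1, 3, 2), (1, 3, 4, 2)]
phi = {path: path_probability(U, path) for path in paths}
for path, value in phi.items():
    print(path, round(value, 3))
```

The absorbing end node needs no entry in `U` here because no path continues beyond it.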

FIGURE 6
The aggregate network for the depression data set in Section 2. Treatments 1 to 11 are defined in Figure 1. Here the thickness of each edge represents the associated weight. The aggregate weights, as presented in the box, were calculated using the methods described in Section 3. The values are quoted to 3 decimal places.

APPLICATION TO REAL DATA SET
We now apply the random-walk approach to the data set described in Section 2. Following Rücker and Schwarzer (2014) 39 , we choose a fixed-effect model ($\tau^2 = 0$). The edge weights in the aggregate network were obtained using the methods described in Section 3 and are shown in Figure 6.

Evidence flows
First, we use the random-walk approach described in Section 4.3 to obtain the evidence flows for a certain comparison. We focus on the comparison of treatments 1 (tricyclic or tetracyclic antidepressants) and 3 (psychotherapy + usual care). To this end, we define the transition matrix for a random walker on the aggregate network (Figure 6) starting at node 1 and ending at node 3. Using Equation (12), we obtain the matrix $T^{(1\text{-}3)}$.

FIGURE 7
The evidence flow network for the comparison of treatments 1 and 3 in the depression data set in Section 2. The thickness of each edge corresponds to the expected net number of times a random walker crosses each edge of the aggregate network in Figure 6 as it travels from node 1 to node 3. The direction of flow is indicated by the arrow. These values are summarised in the box and quoted to 3 decimal places.
We have labelled the rows and columns of $T^{(1\text{-}3)}$ according to the treatments they represent and we quote the values of the entries in the matrix to 3 decimal places. The third row of $T^{(1\text{-}3)}$ is constructed such that once the walker reaches node 3 (the end node) it remains there indefinitely. As described in Section 4.3, the evidence flow through each direct comparison for the network comparison 1-3 is obtained from the expected net number of times a walker crosses each edge as it travels from node 1 to node 3 on the aggregate network (Figure 6). The expected net number of times a walker crosses each edge can be estimated by simulating a large ensemble of random walkers, each moving independently as described by the transition matrix $T^{(1\text{-}3)}$. For each walker we count the net number of times it crosses the designated edge, and we then average over all walkers. The more walkers we simulate, the more accurate our estimate.
Alternatively, we can use the analogy to electrical networks described in Section 4.2, to obtain an analytical result for this value in terms of electric current. These methods are described in more detail in Appendix E. We choose the analytical approach which results in the evidence flow network shown in Figure 7. We find that for the comparison of treatments 1 and 3, most of the evidence flows directly from 1 to 3 or indirectly via treatment 9. Comparing Figures 6 and 7 we observe that the pairwise comparison of treatments 7 and 10 is the only piece of direct evidence that has no influence on the network comparison 1-3.
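Both routes can be compared on a toy example. The Python sketch below uses a hypothetical three-node aggregate network (a triangle with made-up weights, not the depression data) and estimates the expected net number of crossings of edge 1-2 by simulation. It then compares the estimate with the current through that edge when a unit current flows from node 1 to node 2, using the elementary rule for conductances in series and in parallel. It is not the procedure of Appendix E, only an illustration of the analogy.

```python
import random

# Hypothetical aggregate network: undirected triangle, weights act as conductances.
edge_weights = {(1, 2): 2.0, (1, 3): 1.0, (2, 3): 1.0}

def neighbours(node):
    """Return (neighbour, weight) pairs of a node in the undirected graph."""
    out = []
    for (u, v), w in edge_weights.items():
        if u == node:
            out.append((v, w))
        elif v == node:
            out.append((u, w))
    return out

def net_crossings(start, end, rng):
    """Walk from start until absorbed at end; return net crossings per directed edge."""
    net = {}
    node = start
    while node != end:
        nbrs = neighbours(node)
        # Hop to a neighbour with probability proportional to the edge weight.
        nxt = rng.choices([n for n, _ in nbrs], weights=[w for _, w in nbrs])[0]
        net[(node, nxt)] = net.get((node, nxt), 0) + 1
        net[(nxt, node)] = net.get((nxt, node), 0) - 1
        node = nxt
    return net

rng = random.Random(0)
n_walkers = 20000
total = sum(net_crossings(1, 2, rng).get((1, 2), 0) for _ in range(n_walkers))
simulated = total / n_walkers

# Electrical analogue: the direct edge is in parallel with edges 1-3 and 3-2 in series.
w_direct = edge_weights[(1, 2)]
w_indirect = 1.0 / (1.0 / edge_weights[(1, 3)] + 1.0 / edge_weights[(2, 3)])
analytic = w_direct / (w_direct + w_indirect)
print(simulated, analytic)
```

For larger networks the series and parallel shortcut no longer applies and a linear system has to be solved instead, which this sketch does not attempt.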
The hat matrix of the aggregate model for this data is given in Appendix F.1. The flow network obtained from the row of the hat matrix corresponding to the comparison of treatments 1 and 3 is identical to the network in Figure 7.

Proportion contributions
Next, we calculate the proportion contributions for the network comparison 1-3. To do this we first define the transition matrix for a random walker moving from node 1 to node 3 on the evidence flow network (Figure 7). From Equation (20) we obtain the matrix $U^{(1\text{-}3)}$, where we have again labelled the rows and columns. Matrix entries are quoted to 3 decimal places. The third row indicates that once a walker reaches node 3 it remains there indefinitely. Since treatment 10 is disconnected from all other nodes in the evidence flow network (Figure 7), the probability of transitioning to this node from any other is zero. Similarly, if the walker starts at node 10, it remains there forever ($U^{(1\text{-}3)}_{10\text{-}10} = 1$). The set of all possible paths that a random walker can take on the evidence flow network can be found using a recursive algorithm 47 . The probability with which the walker takes a particular path is calculated from Equation (22). This is the flow of evidence through that path. For the comparison of treatments 1 and 3 in the depression data set, we find 27 distinct paths. These paths and their associated flow make up the evidence streams presented in Table 3. We find $\phi_i \geq 0$ and $\sum_i \phi_i = 1$. Using these values we can construct the network estimate $\hat{\theta}^{\text{net}}_{1\text{-}3}$ as a linear combination of direct and indirect estimates following each evidence path listed in Table 3. This leads to the same odds ratios as those quoted in Rücker and Schwarzer (2014) 39 up to the precision provided. Table 3 also contains the streams identified by the algorithm in Papakonstantinou et al 9 (see Section 5.2). We present the results for three versions of the algorithm, Shortest, Random and Average. The results for the Random algorithm are obtained from one single run. Each result in the column labelled 'Average' is an average over $10^8$ runs of the Random algorithm. From Table 3, it is clear that the streams identified by the iterative algorithm depend on the order in which paths are selected. 
For this example, fewer than half of the possible paths are identified by the Shortest and Random algorithms (paths not detected are indicated by the symbol '-'). Therefore, these versions of the algorithm fail to take into account multiple evidence paths that contribute to the NMA (and potentially have a high risk of bias).
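The recursive enumeration of paths mentioned above can be sketched in a few lines of Python. The example uses a small hypothetical directed acyclic graph with made-up transition probabilities, not the depression network, and the generator `all_paths` is an illustrative name rather than the routine of Reference 47. Each path is returned together with the product of the transition probabilities along it, in the spirit of Equation (22).

```python
# Hypothetical transition probabilities on a directed acyclic graph (start 1, end 2).
# U[i][j] is the probability of hopping from node i to node j.
U = {
    1: {2: 0.5, 3: 0.3, 4: 0.2},
    3: {2: 0.6, 4: 0.4},
    4: {2: 1.0},
    2: {},  # end node: no onward hops are followed
}

def all_paths(U, node, end, prefix=(), prob=1.0):
    """Yield (path, probability) for every route from node to end in a DAG."""
    prefix = prefix + (node,)
    if node == end:
        yield prefix, prob
        return
    for nxt, p in U.get(node, {}).items():
        # Extend the current path by one hop and accumulate its probability.
        yield from all_paths(U, nxt, end, prefix, prob * p)

streams = list(all_paths(U, 1, 2))
for path, flow in streams:
    print(path, round(flow, 3))
```

Because the graph has no cycles the recursion always terminates, and every path receives a flow, so none is left without a value.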
Compared to the Shortest and Random versions of the algorithm, the Average algorithm produces results which are more similar to flows obtained from the random-walk approach. However, as described in Section 5.2, the frequency with which a path is selected across different runs depends on whether it shares edges with other paths in the network. Therefore, the results of the Average algorithm do not necessarily converge to the results from the random-walk approach even as the number of iterations becomes large.
Using Table 3 and Equation (19), we calculate the proportion contribution of each direct estimate to the network comparison of treatments 1 and 3 from the random-walk approach. These contributions are presented as percentages in the second column of Table 4. The direct evidence from trials comparing treatments 1 and 3 has the largest contribution followed by indirect evidence from trials comparing 3 and 9, and 1 and 9. Table 4 also contains the proportion contributions obtained from the three versions of the algorithm (Shortest, Random and Average). As before, these results depend on the order in which paths are selected.

The analogy between random walks and evidence flow, and the role of the graph theoretical model
In this paper, we have presented a novel analogy between NMA and random walks. Edge weights from the aggregate graph theoretical NMA model define a transition matrix for a random walk on the network of evidence. The walker moves around on the aggregate network along edges corresponding to direct evidence. The movement of the random walker contains information about the propagation of evidence through the network. In particular, we have shown that the expected net number of times a walker crosses an edge can be interpreted as the evidence flow through the direct comparison represented by that edge. Therefore, we can obtain the elements of the hat matrix of the aggregate model from the random-walk process on the aggregate network. The flow of evidence defined by König et al (2013) 8 is based on a two-step version of the standard frequentist NMA model (see Appendix A.1). In the first step, the direct estimates are obtained by pooling evidence from trials making the same comparisons. For two-arm trials, a pairwise meta-analysis is performed. For multi-arm trials that compare a particular subset of treatments, an NMA is performed on the sub-graph described by the multi-arm trial design. The direct estimates are therefore separated into evidence that comes from two-arm trials and evidence from multi-arm trials. This is reflected in the hat matrix of this model. Consequently, in König et al's evidence flow networks, the flow through multi-arm trials is displayed separately. This is an interesting feature but, as the authors note, it is only feasible for simple networks 8 .
In our definition of evidence flow, we have instead used a two-step version of the so-called graph theoretical model 7 . We make use of the fact that the adjusted weights describe a network of two-arm trials which is equivalent to the network of multi-arm trials. The direct estimates are then obtained from pairwise meta-analyses using the adjusted edge weights. The elements in the row of the hat matrix for a particular comparison then assign a single value of flow to each direct treatment comparison in the network. The flow through an edge therefore represents the combined contribution from all studies, two-arm and multi-arm, that make that comparison. While this means that the specific contribution of multi-arm studies is not displayed, our approach makes it easier to display evidence flow networks for graphs with a large number of nodes, edges and multi-arm trials of varying designs. In addition, it is this property of the aggregate level graph theoretical approach that means we are able to make the analogy to random walks in the general case (i.e., networks including multi-arm trials). As explained in Appendix A.1, the standard NMA model, the graph theoretical model and the aggregate level versions of both these models all yield the same network treatment effect estimates 8,39 . For networks containing exclusively two-arm trials, the hat matrices of the two aggregate level models are the same. Therefore, for these networks, the evidence flow networks we define are the same as those in König et al.
The graph theoretical approach provides a straightforward visualisation of the flow of evidence for each treatment comparison. Random effects models and networks with multi-arm trials can be accounted for with no extra complications. For networks with both of these characteristics, heterogeneity needs to be combined with the original observed variances (i.e., one needs to use the observed variance plus the heterogeneity variance, $\sigma^2 + \tau^2$, instead of $\sigma^2$ alone) before adjusting the weights to deal with multi-arm trials 39,42 .

The random walk derivation of evidence streams overcomes the limitations of previous algorithms
We have shown that the random-walk analogy for NMA leads to an analytical derivation of evidence streams. In doing so, we defined a second transition matrix, this time for a random walker moving on the evidence flow network. For each comparison of treatments $ab$ there is one separate evidence flow network. The network is directed and it has no cycles. Walkers can only move in one direction along each edge, according to the direction of flow. All paths on this graph start at $a$ and end at $b$. As the walker travels from $a$ to $b$ it moves along paths of direct and indirect evidence. Imagining a large number of independent random walkers undergoing this process, we interpret the proportion of walkers flowing through a particular path as the flow of evidence through that path, i.e., the flow of evidence through a path is the probability of a walker taking that path. This can be expressed analytically as the product of the transition probabilities along the edges that make up the path. The analytical definition of evidence streams leads directly to an analytical derivation of the so-called proportion contributions defined in Papakonstantinou et al (2018) 9 . The result is unambiguous in contrast with previously proposed algorithms whose output depends on the order in which paths are selected. Furthermore, individual runs of the algorithm in Papakonstantinou et al can fail to identify all paths of evidence on the evidence flow network. This means that in the calculation of proportion contributions, multiple paths of evidence and their potential bias are not taken into account. Running the algorithm many times and subsequently performing an average, we are eventually able to identify every path of evidence. However, the frequency with which a given path is selected depends strongly on the number of other paths with which it shares edges. As a result, the average flow obtained in this way does not accurately reflect the contribution of each path. 
The random-walk approach overcomes these limitations. All possible paths of evidence are identified and they are each assigned a value of flow that reflects the properties of the hat matrix. Therefore, all possible sources of bias are taken into account in the calculation of the proportion contributions. Since the result is purely analytical, the random-walk approach also offers superior computational efficiency.
For multi-arm trials, the method presented in Papakonstantinou et al (2018) naïvely treats each pairwise comparison in a multi-arm trial as an independent two-arm study 9 . This does not account for correlations due to multi-arm trials. By instead using the adjusted weights from the graph theoretical model, we are able to define a network of two-arm trials that is equivalent to the original network of multi-arm trials. Therefore, an additional advantage of the methods presented in this paper is that networks with multi-arm trials are handled more appropriately.
The CINeMA software currently relies on the algorithm in Papakonstantinou et al to calculate the relative contribution of studies with high, moderate and low risk of bias to each network treatment effect. Similarly, ROB-MEN (risk of bias due to missing evidence in network meta-analysis 19 ) also uses the contribution matrix. Given the advantages of the random-walk approach in deriving evidence streams, we expect that applications such as these would benefit significantly in terms of accuracy and speed from the implementation of the method described in this paper. The recently updated PRISMA guidelines 48 require systematic reviewers to assess their body of evidence for risk of bias. The results of our paper mean that existing software tools to help researchers make this assessment can now be made more reliable and efficient. We also plan to implement the aggregate hat matrix in netmeta, along with the random-walk approaches to evidence flow and evidence streams.

Potential future impact
We believe that the analogy between NMA and random walks is interesting and that it provides new insight into NMA methodology. In our work we have explored the applications of only a small subset of the random-walk literature; there is, therefore, scope for the impact of this analogy to be investigated further. We hope that by presenting this analogy, more ideas will be shared between the two disciplines and additional practical applications of the random-walk approach will be developed in the future.
For example, we have looked at the interpretation of the number of times a walker crosses each edge in the network. However, there is potentially also interest in investigating the number of times the walker visits each node. The random walk transition probabilities are proportional to the respective edge weights. Therefore, a walker is more likely to travel across an edge corresponding to a more precise treatment effect estimate. The expected number of times a walker visits a certain node will depend on how many connections the node has, and the weight (i.e., the inverse variance) associated with each of these connections. A node corresponding to a treatment that is involved in many direct comparisons will be visited more often than a node corresponding to a treatment with comparatively few connections. Furthermore, the larger the weight associated with the edges connected to a certain node, the more often the random walker will visit that node. Potentially, this value provides a measure of vertex centrality that accounts for both connectivity and the precision of treatment effect estimates. There may also be interest in measuring random walk variation. The variability in the paths traversed by a walker moving on the evidence flow network may indicate inconsistency between paths of indirect evidence.
In summary, by using the analogy to electrical networks as an intermediate step, we have made a novel connection between NMA and random walks. The interdisciplinary analogy provides new insight into NMA methodology. In particular, the analogy leads to an analytical derivation of the proportion contribution matrix without the ambiguity of existing numerical algorithms. Our approach can therefore be used to reliably quantify the contribution of individual study limitations to the resulting network treatment effects. We hope that this paper will provide a starting point for future developments of NMA methodology that can benefit from ideas in the random-walk literature.

Data Availability Statement
The data, results and associated codes used in this work can be found in the GitHub repository here 49 . For further details please contact the corresponding author.

A.1.1 Standard frequentist NMA
The standard frequentist approach to NMA is a regression analysis 6,13,40 . The method relies on a design matrix $X$ which is constructed to have full rank. Each $m_i$-arm trial $i$ contributes $m_i - 1$ independent observations, from which we aim to estimate $N - 1$ independent network treatment effects, where $N$ is the number of treatments. Therefore, the matrix $X$ has dimensions $\sum_i (m_i - 1) \times (N - 1)$. The 'global baseline' treatment is chosen as treatment 1. Each column of $X$ then refers to a treatment $t \in \{2, \dots, N\}$. The rows represent the comparisons to the trial-specific baseline in each study. For a given row, the entry in the column corresponding to the treatment that is compared with the trial-specific baseline treatment is +1. If the trial-specific baseline treatment is not the global baseline treatment, there is a −1 in the column corresponding to the trial-specific baseline. All other elements in the row are zero.
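The construction rules above can be illustrated with a short Python sketch. The trial list is hypothetical, treatments are labelled 1 to N with treatment 1 as the global baseline, and the first treatment listed in each trial is taken as its trial-specific baseline. The function name `design_matrix` is chosen for illustration.

```python
def design_matrix(trials, n_treatments):
    """Build the design matrix row by row; columns refer to treatments 2..N."""
    rows = []
    for arms in trials:
        baseline, *others = arms  # first arm is the trial-specific baseline (assumption)
        for treatment in others:
            row = [0] * (n_treatments - 1)
            row[treatment - 2] = 1          # +1 for the treatment being compared
            if baseline != 1:
                row[baseline - 2] = -1      # -1 if the baseline is not treatment 1
            rows.append(row)
    return rows

# Hypothetical data: two two-arm trials and one three-arm trial on N = 3 treatments.
trials = [(1, 2), (2, 3), (1, 2, 3)]
X = design_matrix(trials, n_treatments=3)
for row in X:
    print(row)
```

Each trial with m arms adds m − 1 rows, so this example gives 1 + 1 + 2 = 4 rows and N − 1 = 2 columns.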
The so-called 'information matrix' is defined as ⊤ −1 where is the block-diagonal variance-covariance matrix. Each trial contributes an ( − 1) × ( − 1) block to with observed variances on the diagonal and covariances (due to multi-arm trials) off the diagonal.
The hat matrix of the standard model is 8,39
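The construction of the design matrix and the hat matrix described above can be illustrated with a minimal sketch. The network, the variances and the helper name `design_matrix` are hypothetical and chosen only for illustration; the first label of each trial is assumed to be its trial-specific baseline.

```python
import numpy as np

def design_matrix(trials, n_treatments):
    """Build the design matrix for trials given as tuples of treatment labels.

    The first label in each tuple is taken as the trial-specific baseline;
    treatment 1 is the global baseline and therefore has no column.
    """
    rows = []
    for arms in trials:
        baseline = arms[0]
        for treatment in arms[1:]:
            row = np.zeros(n_treatments - 1)
            row[treatment - 2] = 1.0        # treatment compared with the trial baseline
            if baseline != 1:
                row[baseline - 2] = -1.0    # trial baseline is not the global baseline
            rows.append(row)
    return np.array(rows)

# Hypothetical network: three two-arm trials and one three-arm trial, N = 4.
trials = [(1, 2), (1, 3), (2, 3), (2, 3, 4)]
X = design_matrix(trials, n_treatments=4)

# Hypothetical block-diagonal variance-covariance matrix: one 1x1 block per
# two-arm trial and one 2x2 block (with a covariance) for the three-arm trial.
V = np.zeros((5, 5))
V[0, 0], V[1, 1], V[2, 2] = 0.20, 0.30, 0.25
V[3:, 3:] = [[0.50, 0.20], [0.20, 0.60]]

V_inv = np.linalg.inv(V)
information = X.T @ V_inv @ X                        # information matrix
H = X @ np.linalg.solve(information, X.T @ V_inv)    # hat matrix of the standard model
```

In this sketch the three-arm trial contributes two rows, so $\mathbf{X}$ has five rows and three columns, and the resulting hat matrix is a projection (it satisfies $\mathbf{H}\mathbf{H} = \mathbf{H}$).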

A.1.2 Graph theoretical approach
Rücker introduced an alternative graph theoretical approach to NMA based on electrical network theory 7 . This model is formulated around an edge-vertex incidence matrix $\mathbf{B}_0$ with dimensions $K_0 \times N$, where $K_0$ is the total number of pairwise comparisons in the network. We write $\mathbf{B}_0$ for this matrix to distinguish it from the (similar) matrix $\mathbf{B}$ in the aggregate model described in Section 3.2 of the main paper. Each $n_m$-arm study contributes $n_m(n_m - 1)/2$ rows to $\mathbf{B}_0$. Each column represents a treatment $a \in \{1, \dots, N\}$. Unlike the design matrix $\mathbf{X}$, $\mathbf{B}_0$ does not have full rank. Indeed, the elements in each row of $\mathbf{B}_0$ sum to zero 39,7 . Entries of $\mathbf{B}_0$ are $+1$ in the column corresponding to the 'baseline' treatment of the comparison represented by that row, and $-1$ in the column corresponding to the treatment compared to that baseline.
We write $\mathbf{W}_0$ for the weight matrix of this model. Again, this is distinct from the matrix $\mathbf{W}$ in the main paper. $\mathbf{W}_0$ is a $K_0 \times K_0$ matrix and contains on its diagonal the adjusted weights defined in the main paper. We obtain the adjusted weights from a method described in References 7,39,42 which accounts for the correlations introduced by multi-arm trials. An important result of this method is that the adjusted weights describe the weights associated with a network of two-arm trials that is equivalent to the original network of multi-arm trials, in the sense that the resulting relative treatment effect estimates from the network of two-arm trials are the same as those from the original network. By using these weights, we can therefore apply any NMA methodology that is only valid for networks of two-arm trials. The hat matrix of this model is
$$\mathbf{H}_0 = \mathbf{B}_0(\mathbf{B}_0^\top \mathbf{W}_0 \mathbf{B}_0)^{+}\mathbf{B}_0^\top \mathbf{W}_0,$$
where the superscript $+$ denotes the Moore-Penrose pseudo-inverse.

A.1.3 'Reduce dimension' vs 'reduce weights'
The design matrix $\mathbf{X}$ contains the same information about the structure of the network as $\mathbf{B}_0$ but has lower dimensions and full rank. For this reason Rücker and Schwarzer (2014) 39 termed the standard model the 'reduce dimension' approach. The alternative (graph theoretical) method relies on reducing the weights associated with observations from multi-arm trials. Therefore, this was termed the 'reduce weights' approach 39 . Rücker and Schwarzer (2014) proved that, although their respective hat matrices are different, the two approaches give rise to the same network treatment effect estimates and are, therefore, equivalent.
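The equivalence of the two approaches can be checked numerically in the simplest setting. The sketch below assumes a hypothetical network of two-arm trials only, so that no weight adjustment is needed and the weights are simply inverse variances; it therefore does not exercise the multi-arm adjustment of References 7,39,42. All data values are invented for illustration.

```python
import numpy as np

# Hypothetical two-arm trials: (baseline, treatment, observed effect, variance).
trials = [(1, 2, 0.50, 0.20), (1, 3, 0.90, 0.30), (2, 3, 0.30, 0.25),
          (2, 4, 0.70, 0.40), (3, 4, 0.20, 0.35)]
N = 4
y = np.array([t[2] for t in trials])
variances = np.array([t[3] for t in trials])

# Edge-vertex incidence matrix: +1 for the baseline, -1 for the compared treatment.
B0 = np.zeros((len(trials), N))
for row, (baseline, treatment, _, _) in enumerate(trials):
    B0[row, baseline - 1] = 1.0
    B0[row, treatment - 1] = -1.0

# Design matrix of the standard model: drop the column of the global baseline
# (treatment 1) and flip the sign (+1 for the compared treatment).
X = -B0[:, 1:]

W0 = np.diag(1.0 / variances)   # two-arm trials only: weights are inverse variances

# 'Reduce dimension' (standard regression) fit.
theta = np.linalg.solve(X.T @ W0 @ X, X.T @ W0 @ y)
fitted_standard = X @ theta

# 'Reduce weights' (graph theoretical) fit via the pseudo-inverse of the Laplacian.
H0 = B0 @ np.linalg.pinv(B0.T @ W0 @ B0) @ B0.T @ W0
fitted_graph = H0 @ y
```

Both fits return the same network estimate for every observed comparison, even though $\mathbf{X}$ has full rank and $\mathbf{B}_0$ does not.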

A.2 Two-step models and evidence flow
The concept of evidence flow was introduced by König et al (2013) 8 . Their approach was based on a two-step, or 'aggregate', version of the reduce dimension (standard) model 11,43 :
Step 1. In the first step, evidence from all trials making the same comparisons is pooled. For two-arm trials, a pairwise meta-analysis is performed. For multi-arm trials with a particular design, an NMA is performed on the sub-graph described by the multi-arm design. The results from this first step define the direct evidence.
Step 2. In step two, the direct estimates are used as observations in a linear regression model. The hat matrix associated with this model defines the evidence flow.
Since the direct evidence is separated into evidence from two-arm trials and evidence from multi-arm trials, König et al display the flow through multi-arm trials separately on the evidence flow networks. The authors note that, with this approach, there is no unique way to represent evidence flow through multi-arm trials. Furthermore, explicitly showing multi-arm trials on evidence flow networks becomes increasingly difficult for large, highly connected networks.
In the main paper, we instead describe a two-step (aggregate) version of the reduce weights (graph theoretical) approach. The fact that the reduce weights model defines a network of two-arm trials that is equivalent to the network of multi-arm trials makes the two-step approach simpler. In the first step, we perform a pairwise meta-analysis across each edge using the adjusted weights. In the second step, we combine these aggregate (direct) data in a network meta-analysis. This approach yields exactly the same relative treatment effect estimates as the one-step reduce weights approach and, consequently, the reduce dimension approach. This equivalence also holds true for random effects models. One then needs to account for heterogeneity, i.e., the between-study variance $\tau^2$ is added to each within-study variance before using the adjustment method 7,39,42 to obtain the adjusted weights.
For networks containing exclusively two-arm trials, the hat matrices from the two aggregate models are exactly equal. Therefore, in this scenario, our evidence flow networks are the same as those defined by König et al 8 . The differences arise in the presence of multi-arm trials. Our approach does not explicitly show the flow through multi-arm trials. Instead, the flow through each edge represents the pooled contribution from all studies that make that comparison. This is only made possible by using the reduce weights method to define a network of two-arm trials. Since each edge is associated with only one value of evidence flow, our approach makes it easier to construct evidence flow networks for complicated networks, i.e., those with many nodes, many connections, and many different multi-arm trials. This also makes it possible to calculate the proportion contribution matrix for networks of multi-arm trials. With the evidence flow networks defined by König et al, this was not possible as the presence of multi-arm trials meant there were multiple values of flow associated with each edge.
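The two-step procedure can be sketched as follows for a hypothetical network of two-arm trials, in which case the adjusted weights reduce to inverse variances. The data and the helper names `incidence` and `hat_matrix` are assumptions made for illustration. The sketch pools the trials on each edge, fits the aggregate model, and compares the result with a one-step fit to the individual trials.

```python
import numpy as np

# Hypothetical two-arm trials: (i, j, observed effect of j versus i, variance).
trials = [(1, 2, 0.40, 0.20), (1, 2, 0.60, 0.30), (1, 3, 0.90, 0.25),
          (2, 3, 0.30, 0.40), (2, 3, 0.50, 0.50), (2, 4, 0.70, 0.35),
          (3, 4, 0.20, 0.30)]
N = 4

def incidence(pairs, n):
    """Edge-vertex incidence matrix: +1 at node i, -1 at node j for each pair."""
    B = np.zeros((len(pairs), n))
    for row, (i, j) in enumerate(pairs):
        B[row, i - 1], B[row, j - 1] = 1.0, -1.0
    return B

def hat_matrix(B, weights):
    """Hat matrix B (B^T W B)^+ B^T W with W = diag(weights)."""
    W = np.diag(weights)
    return B @ np.linalg.pinv(B.T @ W @ B) @ B.T @ W

pairs = [(i, j) for i, j, _, _ in trials]
y = np.array([t[2] for t in trials])
w_trial = 1.0 / np.array([t[3] for t in trials])

# Step 1: pairwise (inverse-variance) meta-analysis across each edge.
edges = sorted(set(pairs))
edge_of_trial = np.array([edges.index(p) for p in pairs])
w_edge = np.bincount(edge_of_trial, weights=w_trial)
theta_dir = np.bincount(edge_of_trial, weights=w_trial * y) / w_edge

# Step 2: network meta-analysis of the aggregate (direct) estimates.
H = hat_matrix(incidence(edges, N), w_edge)
theta_net = H @ theta_dir

# One-step fit to the individual trials, for comparison.
fitted_one_step = hat_matrix(incidence(pairs, N), w_trial) @ y
```

In this example the one-step fitted value of each trial coincides with the two-step network estimate for the edge that the trial belongs to.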

B HEURISTIC ARGUMENT FOR PROPERTIES OF THE HAT MATRIX AND EVIDENCE FLOW
In this section we give a brief heuristic argument for the properties of the hat matrix in Section 3.4. These properties were stated in Reference 8 , and an algebraic proof for some of them was given in Reference 9 . We present our argument using the example in Figure 5, but this can be generalised to more complex networks.

FIGURE B1
Meta-analytic graph of the example in Figure 5 (a). We focus on the comparison between treatments 1 and 2, as indicated by the blue colour of the nodes representing the treatments. Arrows show the sign conventions for the direction of evidence flow. The direct evidence for the relative treatment effects from the trial data is also indicated next to each comparison.
The network in Figure 5 (a) is the evidence flow network for the comparison between treatments 1 and 2. It contains four nodes. For illustration and to fix sign conventions for the flow of evidence, the network is shown again in Figure B1. Without loss of generality we assume that the direction of each edge is chosen such that the hat matrix elements for the comparison of treatments 1 and 2 are positive for all edges shown in Figure B1. This means that, for this comparison, the evidence flow through each edge is equal to the corresponding hat matrix element.
The three properties in Section 3.4 translate into We address these one-by-one. To do this we use Equation (7), (1)(2) = (1)(2) , and the above sign convention to note that Imagine we have one set of direct estimates,̂ resulting in a network estimatê net 1-2 via Equation (B3). Imagine now a different set of direct estimateŝ such that̂ We writê ′net 1-2 for the network estimate from the dataset̂ ′dir . Using the sign convention in whicĥ dir denotes the effect of treatment minus that of , Equation (B6) indicates that the direct effect of treatment 2 compared to treatment 1 in the dataset̂ ′dir is Δ units greater than in dataset̂ dir . Similarly, the relative effect of treatment 3 relative to treatment 1 is Δ units higher. Given that treatments 2 and 3 are the only ones treatment 1 is compared to directly in this network (see Figure B1) we would then expect Using Equation (B3) and its analogue for the dashed treatment effects, we find and we therefore conclude We again imagine a second set of data, now witĥ This means that treatment 2 is now consistently doing better by Δ units in relation to all treatments it is compared to directly in the network. The overall effect of this must be that̂ i.e., the effect of treatment 2 relative to that of treatment 1 is now Δ units greater. Using again Equation (B3) for the data setŝ dir and̂ ′dir respectively, we now havê from Equation (B10). Therefore The first of these identities can be shown by looking at̂ and by realising that this means that treatment 3 now performs Δ units worse compared to all treatments it is directly compared to. This cannot affect the network estimate treatment effect of 2 compared to 1, i.e., we expect̂ ′net 1-2 =̂ net 1-2 . This leads to

C ELECTRIC CURRENT AND EVIDENCE FLOW
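In this section we demonstrate the relationship between electrical current and evidence flow. Consider an electrical network with $N$ nodes and $K$ edges. We define the vector of nodal or 'external' currents as $\mathbf{J} = (J_1, J_2, \dots, J_N)^\top$. These represent currents flowing between a node of the network and an external sink or source. Our sign convention is such that a positive entry $J_i > 0$ indicates that a current goes into node $i$, whereas if $J_i < 0$, a current goes out of node $i$. We write $\mathbf{I} = (I_1, I_2, \dots, I_K)^\top$ for the currents in the edges $e = ij$, $e = 1, 2, \dots, K$. A positive value of $I_{ij}$ indicates a flow of current from $i$ to $j$, and we set $I_{ji} = -I_{ij}$. We define $\boldsymbol{\mathcal{V}} = \{\mathcal{V}_{ij}\}$ as the vector of voltages (potential differences) across the edges. That is, $\mathcal{V}_{ij} = v_i - v_j$, where $v_i$ and $v_j$ are the potentials at nodes $i$ and $j$ respectively. Ohm's law 50 can then be written as
$$\mathbf{I} = \mathbf{C}\boldsymbol{\mathcal{V}}, \tag{C15}$$
where $\mathbf{C}$ is the $K \times K$ diagonal matrix of conductances (inverse resistances, $C_{ij} = (R_{ij})^{-1}$). Using this and Kirchhoff's laws, Rücker 7 demonstrated that $\boldsymbol{\mathcal{V}}$ can be written as
$$\boldsymbol{\mathcal{V}} = \mathbf{B}(\mathbf{B}^\top \mathbf{C}\mathbf{B})^{+}\mathbf{J}, \tag{C16}$$
where $\mathbf{B}$ is the edge-incidence matrix of the network defined in Section 3. Substituting this into Ohm's law (Equation (C15)) yields the edge currents,
$$\mathbf{I} = \mathbf{C}\mathbf{B}(\mathbf{B}^\top \mathbf{C}\mathbf{B})^{+}\mathbf{J}. \tag{C17}$$
To make the analogy to evidence flow, we consider an electrical network with a battery attached across the nodes corresponding to the treatment comparison we are interested in. For comparison $ab$ the external current at node $a$ is $J_a = +1$, and at $b$ we have $J_b = -1$. The current at every other node $i \notin \{a, b\}$ is zero. We can do this in turn for each of the edges in the network. For convenience we label these $e = 1, \dots, K$. We write $\mathbf{J}^{(e)}$ for the vector of nodal currents resulting in a situation where the battery is connected to the start and end points of edge $e$.
We then have relations of the form in Equation (C17),
$$\mathbf{I}^{(e)} = \mathbf{C}\mathbf{B}(\mathbf{B}^\top \mathbf{C}\mathbf{B})^{+}\mathbf{J}^{(e)}. \tag{C18}$$
We collect the internal currents $\mathbf{I}^{(e)}$ in a $K \times K$ matrix $\tilde{\mathbf{I}} = (\mathbf{I}^{(1)}, \mathbf{I}^{(2)}, \dots, \mathbf{I}^{(K)})$. Similarly, we define the $N \times K$ matrix $\tilde{\mathbf{J}} = (\mathbf{J}^{(1)}, \mathbf{J}^{(2)}, \dots, \mathbf{J}^{(K)})$. We then have
$$\tilde{\mathbf{I}} = \mathbf{C}\mathbf{B}(\mathbf{B}^\top \mathbf{C}\mathbf{B})^{+}\tilde{\mathbf{J}}. \tag{C19}$$
Each row of $\tilde{\mathbf{J}}$ represents a node, and each column represents a different placement of the battery. For example, in a network with three nodes, the column corresponding to a battery attached across edge 1-2 has a $+1$ in the row corresponding to node 1, a $-1$ in the row corresponding to node 2 and a $0$ for node 3. Similar reasoning is used to construct the other columns. From this construction, it is clear that the matrix of nodal currents for this setup is equal to the transpose of the edge-incidence matrix,
$$\tilde{\mathbf{J}} = \mathbf{B}^\top. \tag{C21}$$
We can write the resulting matrix of edge currents in terms of its composite elements,
$$\tilde{\mathbf{I}} = \begin{pmatrix} I_1^{(1)} & I_1^{(2)} & \cdots & I_1^{(K)} \\ I_2^{(1)} & I_2^{(2)} & \cdots & I_2^{(K)} \\ \vdots & \vdots & & \vdots \\ I_K^{(1)} & I_K^{(2)} & \cdots & I_K^{(K)} \end{pmatrix}, \tag{C22}$$
where $I_e^{(e')}$ is the current through edge $e$ when a battery is attached across edge $e'$. In the evidence flow analogy, we interpret the flow of current $I_e^{(e')}$ as the flow of evidence through edge $e$ for the network comparison $e'$. If the analogy holds (a proof follows below), we can write the elements of the hat matrix in terms of the edge currents,
$$\mathbf{H} = \begin{pmatrix} I_1^{(1)} & I_2^{(1)} & \cdots & I_K^{(1)} \\ I_1^{(2)} & I_2^{(2)} & \cdots & I_K^{(2)} \\ \vdots & \vdots & & \vdots \\ I_1^{(K)} & I_2^{(K)} & \cdots & I_K^{(K)} \end{pmatrix}. \tag{C23}$$
From Equations (C22) and (C23), it is clear that we need to prove that $\tilde{\mathbf{I}}^\top = \mathbf{H}$. We now do this for a general setup. Taking the transpose of Equation (C19), we find
$$\tilde{\mathbf{I}}^\top = \tilde{\mathbf{J}}^\top \left[(\mathbf{B}^\top \mathbf{C}\mathbf{B})^{+}\right]^\top \mathbf{B}^\top \mathbf{C}^\top. \tag{C24}$$
From the definition of the pseudo-inverse it is possible to show that $(\mathbf{A}^{+})^\top = (\mathbf{A}^\top)^{+}$ for a general matrix $\mathbf{A}$ (see Reference 51 ). Using $\tilde{\mathbf{J}} = \mathbf{B}^\top$ and the fact that the matrices $\mathbf{C}$ and $\mathbf{L} = \mathbf{B}^\top \mathbf{C}\mathbf{B}$ are symmetric ($\mathbf{C}^\top = \mathbf{C}$ and $\mathbf{L}^\top = \mathbf{L}$) we find
$$\tilde{\mathbf{I}}^\top = \mathbf{B}(\mathbf{B}^\top \mathbf{C}\mathbf{B})^{+}\mathbf{B}^\top \mathbf{C}. \tag{C25}$$
We now recall that the hat matrix of the aggregate model is (see Equation (5) in the main paper)
$$\mathbf{H} = \mathbf{B}(\mathbf{B}^\top \mathbf{W}\mathbf{B})^{+}\mathbf{B}^\top \mathbf{W}. \tag{C26}$$
The weight associated with each edge in the aggregate network is given by the conductance (inverse resistance) of that edge, $w_{ij} = R_{ij}^{-1}$, see Section 4.1 in the main paper. The matrices $\mathbf{W}$ and $\mathbf{C}$ contain these weights and conductances on their respective diagonals ($\mathbf{W} = \mathrm{diag}(w_{ij})$ and $\mathbf{C} = \mathrm{diag}(C_{ij})$), and we therefore have
$$\mathbf{W} = \mathbf{C}. \tag{C27}$$
Substituting this into Equation (C25), we find
$$\tilde{\mathbf{I}}^\top = \mathbf{B}(\mathbf{B}^\top \mathbf{W}\mathbf{B})^{+}\mathbf{B}^\top \mathbf{W} = \mathbf{H}, \tag{C28}$$
which is what we wanted to prove.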
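The identity between the transposed matrix of edge currents and the aggregate hat matrix can be checked numerically. The following is a minimal sketch on a hypothetical four-node network; the edge list, the conductances and the variable names are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical network: oriented edges (i, j) with conductances equal to the weights.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
conductance = np.array([2.0, 1.0, 3.0, 1.0, 2.0])
n_nodes = 4

# Edge-incidence matrix: +1 at the start node and -1 at the end node of each edge.
B = np.zeros((len(edges), n_nodes))
for e, (i, j) in enumerate(edges):
    B[e, i], B[e, j] = 1.0, -1.0

C = np.diag(conductance)
L_pinv = np.linalg.pinv(B.T @ C @ B)   # pseudo-inverse of the Laplacian

# Column e of J_tilde: unit current into the start node and out of the end node of edge e.
J_tilde = B.T
# Column e of I_tilde: currents through all edges when the battery is across edge e.
I_tilde = C @ B @ L_pinv @ J_tilde

# Hat matrix of the aggregate model with weights equal to the conductances.
W = C
H = B @ np.linalg.pinv(B.T @ W @ B) @ B.T @ W
```

Here `I_tilde.T` agrees with `H` to numerical precision, and `B.T @ I_tilde` reproduces `J_tilde`, which is Kirchhoff's current law for each placement of the battery.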

D RANDOM WALKS AND ELECTRIC NETWORKS
In this section, we demonstrate the relationship between electric current and random walks. This relationship is well known 37 , and we include it here for completeness.

D.1 Dirichlet problem for electric circuits
We start from Ohm's law. Rather than using matrix notation as in Appendix C, we formulate Ohm's law for the current in the edge $ij$,
$$I_{ij} = \frac{v_i - v_j}{R_{ij}} = C_{ij}(v_i - v_j), \tag{D29}$$
where $v_i$ and $v_j$ are the potentials at nodes $i$ and $j$ respectively. We have used the sign conventions of Reference 37 to define the direction of current. As mentioned above we have $I_{ji} = -I_{ij}$. In this section we focus on the scenario where a unit current flows into node $a$ (from the exterior) and out of node $b$ (to the exterior). No flows between the network and the exterior are possible at any other nodes. To create such a situation we imagine a battery connected to nodes $a$ and $b$. The potential at $b$ is set to zero, and that at $a$ is $v_a = v_a^*$, with $v_a^*$ such that the external current into $a$ is equal to unity (the external current out of $b$ is then also equal to unity). The asterisk indicates the choice of $v_a$ resulting in a unit current into $a$. An illustration of this setup is shown in Figure 3 (b) in the main paper.
We use the superscript $(ab)$ to indicate a battery attached across $ab$ as described above, that is, we use $I_{ij}^{(ab)}$. Kirchhoff's law states that the total current at any node $i \neq a, b$ is zero,
$$\sum_j I_{ij}^{(ab)} = 0. \tag{D30}$$
Substituting Equation (D29) into Equation (D30) and rearranging yields
$$v_i = \sum_j T_{ij} v_j \tag{D31}$$
for $i \neq a, b$, where we have used the definition of transition probabilities in Equation (8) in the main paper, $T_{ij} = C_{ij}/\sum_k C_{ik}$ (with $C_{ij} = w_{ij}$). One can define a Laplacian matrix for this setup, $\mathbf{L}^{(ab)} = \mathbf{1} - \mathbf{T}^{(ab)}$, where $\mathbf{1}$ is the identity matrix 52 . A twice continuously differentiable function $f$ is then called harmonic if it satisfies the Laplace equation 53 , $\mathbf{L}^{(ab)} f = 0$.
Equation (D31) indicates that the function $i \mapsto v_i$ is harmonic at all points $i \neq a, b$. It also has boundary values at $a$ and $b$: $v_a = v_a^*$ is chosen such that the current going into node $a$ from the exterior is one, and we have $v_b = 0$. This constitutes a Dirichlet problem 34 . The uniqueness principle for Dirichlet problems then implies that $v_i$ is uniquely determined for all $i$, given the boundary conditions at $a$ and $b$. For further details see Reference 37 .

D.2 Dirichlet problem for random walks
We will now show that a quantity related to the expected net number of times a random walker visits a particular node while travelling from $a$ to $b$ fulfils the same Laplace equation, and shares the same boundary conditions, as the electric potentials in Section D.1. The uniqueness of the solution of the Dirichlet problem then allows one to establish the analogy between electric networks and random walks. We now describe this in more detail.
We consider a walker starting at node $a$ and reaching absorption when it arrives at node $b$. We write $N_i$ for the expected number of times the walker visits node $i$ before reaching $b$ (with the convention that the final arrival at $b$ does not constitute a visit to $b$, i.e., we have $N_b = 0$). The following relation then holds for $i \neq a, b$,
$$N_i = \sum_j N_j T_{ji}. \tag{D32}$$
This equation can be understood as follows: In order to arrive at node $i$ the walker must previously visit a neighbouring node $j$. The quantity $N_j$ is the expected number of times this occurs. From such a node the walker must then transition to $i$ to contribute to $N_i$. This occurs with probability $T_{ji}$. Summing over all $j$ results in Equation (D32). Equation (D32) is of a similar form to Equation (D31) in the electrical network. However, $T_{ji}$ appears on the right-hand side of Equation (D32), whereas one has $T_{ij}$ in Equation (D31). We therefore write $T_{ji}$ in terms of $T_{ij}$. Using Equation (8), the definition $C_{ij} = R_{ij}^{-1}$, and the fact that $C_{ij} = C_{ji}$, we find
$$T_{ji} = \frac{C_{ji}}{\sum_k C_{jk}} = T_{ij}\,\frac{\sum_k C_{ik}}{\sum_k C_{jk}}. \tag{D33}$$
Substituting this into Equation (D32) and re-arranging gives
$$\frac{N_i}{\sum_k C_{ik}} = \sum_j T_{ij}\,\frac{N_j}{\sum_k C_{jk}}. \tag{D34}$$
Therefore, the object $i \mapsto N_i/(\sum_k C_{ik})$ is harmonic at all points $i \neq a, b$. Given that $N_b = 0$, we have the boundary condition $N_b/(\sum_k C_{bk}) = 0$. We note that Equation (D34) and the boundary condition $N_b = 0$ can be derived for any quantity that is proportional to the number of visits at the different nodes. The Laplace equation and the boundary condition therefore only fix $N_i$ up to a factor. The uniqueness theorem for the Dirichlet problem also confirms that $N_i/(\sum_k C_{ik})$ is proportional to $v_i$ from Section D.1 for all $i$. The constant of proportionality is fixed by the boundary condition for $N_a$.
We now show that the choice $N_a = (\sum_k C_{ak})\, v_a^*$ (with $v_a^*$ as in Section D.1) is required if we want $N_i$ to be the expected number of times a walker starting at $a$ visits node $i$ before it reaches $b$. This choice implies
$$v_i = \frac{N_i}{\sum_k C_{ik}} \tag{D35}$$
for all $i \neq a, b$ by virtue of the uniqueness theorem, and using Equations (D31) and (D34). In other words, $N_i/(\sum_k C_{ik})$ is then not only proportional to $v_i$, but identical to $v_i$ for all $i$.
We now prove that this is the appropriate choice. All we need to check is that the normalisation of the $N_i$ is consistent with the interpretation of $N_i$ as the number of times the walker visits node $i$. To do this we keep in mind that the walker starts at $a$ and finishes at $b$. Over the course of the walk, returns to node $a$ are possible. The net number of times the walker leaves node $a$, however, must be one, given that it starts at $a$ and ends at $b$ (this is the number of times the walker leaves $a$ minus the number of times it arrives at $a$, not counting the initial placement of the walker at $a$). If $N_i$ is the number of times a walker visits $i$ during the walk, then the expected net number of departures from node $a$ is given by $\sum_j (N_a T_{aj} - N_j T_{ja})$. Therefore we must have $\sum_j (N_a T_{aj} - N_j T_{ja}) = 1$. This condition is necessary for the correct normalisation of the $N_i$, and it is also sufficient. It therefore remains to verify that the boundary condition $N_a = (\sum_k C_{ak})\, v_a^*$ delivers this. This is what we will do next.
The boundary condition $N_a = (\sum_k C_{ak})\, v_a^*$ leads to Equation (D35) as explained above. Substituting Equation (D35) into Ohm's law (Equation (D29)), we find
$$I_{ij}^{(ab)} = C_{ij}\left(\frac{N_i}{\sum_k C_{ik}} - \frac{N_j}{\sum_k C_{jk}}\right) = N_i\,\frac{C_{ij}}{\sum_k C_{ik}} - N_j\,\frac{C_{ji}}{\sum_k C_{jk}}, \tag{D36}$$
where, in the second step, we have used $C_{ij} = C_{ji}$. Finally, using Equation (8), we find
$$I_{ij}^{(ab)} = N_i T_{ij} - N_j T_{ji}. \tag{D37}$$
The setup in Section D.1 is such that the current into node $a$ (from the exterior) is equal to one. This means that the total current from node $a$ to all its neighbours in the network is also one, $\sum_j I_{aj}^{(ab)} = 1$. We conclude that $\sum_j (N_a T_{aj} - N_j T_{ja}) = 1$, confirming the correct normalisation of the $N_i$.
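The relationship derived above can be checked numerically. The sketch below uses a hypothetical weighted network (all numbers invented) and computes the expected numbers of visits from the standard fundamental-matrix formula for absorbing Markov chains, which is an assumption of this illustration rather than a step taken in the text. It then compares the scaled visits and the net edge crossings with the potentials and currents for a unit external current.

```python
import numpy as np

# Hypothetical aggregate network: symmetric weights w_ij (0 means no edge).
w = np.array([[0.0, 2.0, 1.0, 0.0],
              [2.0, 0.0, 3.0, 1.0],
              [1.0, 3.0, 0.0, 2.0],
              [0.0, 1.0, 2.0, 0.0]])
n = len(w)
a, b = 0, 1                      # walker starts at a and is absorbed at b

strength = w.sum(axis=1)         # sum_k w_ik for each node i
T = w / strength[:, None]        # transition probabilities T_ij

# Expected number of visits to each transient node, counting the start at a:
# N^T = e_a^T (1 - Q)^{-1}, where Q is T restricted to the nodes other than b.
transient = [i for i in range(n) if i != b]
Q = T[np.ix_(transient, transient)]
start = np.zeros(len(transient))
start[transient.index(a)] = 1.0
visits = np.zeros(n)             # N_b = 0 by convention
visits[transient] = np.linalg.solve((np.eye(len(transient)) - Q).T, start)

# Electrical side: unit current into a and out of b, potential at b set to zero.
laplacian = np.diag(strength) - w
external = np.zeros(n)
external[a], external[b] = 1.0, -1.0
v = np.linalg.pinv(laplacian) @ external
v = v - v[b]

net_crossings = visits[:, None] * T - (visits[:, None] * T).T   # N_i T_ij - N_j T_ji
currents = w * (v[:, None] - v[None, :])                        # w_ij (v_i - v_j)
```

In this example `visits / strength` coincides with the potentials `v`, the net crossings coincide with the currents, and the net number of departures from node `a` sums to one.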
In Appendix E we show how to obtain these edge currents analytically.

E.1 Details of the calculation
The interpretation of the flow of evidence as a random walk can be stated as follows: For the network comparison of treatments $a$ and $b$, the hat matrix element $H_{ij}^{(ab)}$ that defines the flow of evidence through the direct comparison $ij$ (via Equation (6) in the main paper) is equal to the expected net number of times a random walker, starting at $a$ on the aggregate NMA network and walking until it reaches $b$, moves along the edge from $i$ to $j$.
In Section 4.3 of the main paper we demonstrated how to construct a transition matrix for a random walker on the aggregate network. For a particular comparison $ab$, we can use the transition matrix $\mathbf{T}^{(ab)}$ to simulate a large ensemble of independent random walkers on the aggregate network starting their journey at $a$ and stopping once they reach $b$. For each walker we count the number of times it moves across the different network edges in each direction. From this, we find the net number of times the walker moves along a particular edge. By averaging these values over all of the simulated random walkers, we obtain an estimate of the evidence flow network for this comparison. The more walkers we simulate, the better our estimate of the evidence flow.
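The simulation described above can be sketched as follows. The network, the weights, the number of walkers and the function name `simulate_flow` are all hypothetical choices made for illustration; the exact hat matrix row is computed alongside so that the Monte Carlo estimate can be compared with it.

```python
import numpy as np

# Hypothetical aggregate network: oriented edges (i, j) and aggregate weights.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
weights = np.array([2.0, 1.0, 3.0, 1.0, 2.0])
n = 4

w = np.zeros((n, n))
for (i, j), weight in zip(edges, weights):
    w[i, j] = w[j, i] = weight
T = w / w.sum(axis=1, keepdims=True)    # transition probabilities

def simulate_flow(a, b, n_walkers, rng):
    """Average net number of crossings of each oriented edge for walks from a to b."""
    index = {edge: k for k, edge in enumerate(edges)}
    net = np.zeros(len(edges))
    for _ in range(n_walkers):
        position = a
        while position != b:
            nxt = int(rng.choice(n, p=T[position]))
            if (position, nxt) in index:
                net[index[(position, nxt)]] += 1.0   # move along the edge orientation
            else:
                net[index[(nxt, position)]] -= 1.0   # move against the orientation
            position = nxt
    return net / n_walkers

rng = np.random.default_rng(0)
estimate = simulate_flow(0, 1, n_walkers=20000, rng=rng)

# Exact reference: row of the aggregate hat matrix for the comparison 0-1.
B = np.zeros((len(edges), n))
for e, (i, j) in enumerate(edges):
    B[e, i], B[e, j] = 1.0, -1.0
W = np.diag(weights)
H = B @ np.linalg.pinv(B.T @ W @ B) @ B.T @ W
exact = H[0]
```

With this many walkers the estimate typically agrees with the exact row to about two decimal places; the Monte Carlo error decreases with the square root of the number of walkers.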
By using the analogy between random walks and electrical networks, we can also obtain an analytical result for the evidence flow. To do so we make use of the equations in Appendix D. First, we apply a 1 volt battery between nodes $a$ and $b$ so that the voltage at $a$ is $v_a = 1$ and at $b$ is $v_b = 0$. With these boundary conditions we then solve the simultaneous equations described by Equation (D31),
$$v_i = \sum_j T_{ij}^{(ab)} v_j, \tag{E39}$$
to obtain the nodal voltages, $v_i$, for all nodes $i \neq a, b$. Using Ohm's law, we find the edge currents for the case of a 1 volt battery,
$$I_{ij}'^{(ab)} = w_{ij}(v_i - v_j). \tag{E40}$$
These are indicated by $I_{ij}'^{(ab)}$ to distinguish them from the normalised currents $I_{ij}^{(ab)}$ in Section D. In Equation (E40) we have used the fact that the conductance of edge $ij$ is equal to the aggregate weight associated with that edge, $C_{ij} = w_{ij}$.
To make the analogy to evidence flow we require that the total external current flowing into node $a$ is 1. Therefore, to obtain the required currents we must normalise the currents $I_{ij}'^{(ab)}$ by dividing through by the total current flowing into $a$ when $v_a = 1$, that is
$$I_{ij}^{(ab)} = \frac{I_{ij}'^{(ab)}}{\sum_k I_{ak}'^{(ab)}} = \frac{w_{ij}(v_i - v_j)}{\sum_k w_{ak}(1 - v_k)}. \tag{E41}$$
As shown in Appendix D, these currents are equal to the expected net number of times a random walker crosses each edge $ij$. Therefore, from Equation (E41) we obtain an analytical expression for the evidence flow network in terms of random walkers as follows:
$$H_{ij}^{(ab)} = \frac{w_{ij}(v_i - v_j)}{\sum_k w_{ak}(1 - v_k)}, \tag{E42}$$
and $f_{ij}^{(ab)}$ is obtained from $H_{ij}^{(ab)}$ via Equation (6). The potentials $v_i$ are obtained from Equation (E39).
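The analytical calculation can be sketched in a few lines of linear algebra. The example below assumes a hypothetical network and weights, fixes the potentials at the two boundary nodes, solves the harmonic equations for the remaining nodes, and normalises the resulting currents. The comparison with the row of the aggregate hat matrix is included as a consistency check.

```python
import numpy as np

# Hypothetical aggregate network: oriented edges (i, j) and aggregate weights.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
weights = np.array([2.0, 1.0, 3.0, 1.0, 2.0])
n = 4
a, b = 1, 3                      # comparison of interest; 1 volt at a, 0 volts at b

w = np.zeros((n, n))
B = np.zeros((len(edges), n))
for e, ((i, j), weight) in enumerate(zip(edges, weights)):
    w[i, j] = w[j, i] = weight
    B[e, i], B[e, j] = 1.0, -1.0
T = w / w.sum(axis=1, keepdims=True)

# Solve v_i - sum_j T_ij v_j = T_ia for the nodes other than a and b.
inner = [i for i in range(n) if i not in (a, b)]
T_reduced = T[np.ix_(inner, inner)]
v = np.zeros(n)
v[a] = 1.0                       # boundary condition at a (v[b] stays 0)
v[inner] = np.linalg.solve(np.eye(len(inner)) - T_reduced, T[inner, a])

# Edge currents for the 1 volt battery, then normalise by the total current out of a.
currents_1v = weights * (B @ v)
total_current = w[a] @ (v[a] - v)
flow = currents_1v / total_current

# Consistency check against the aggregate hat matrix row for the comparison a-b.
W = np.diag(weights)
H = B @ np.linalg.pinv(B.T @ W @ B) @ B.T @ W
hat_row = H[edges.index((a, b))]
```

A negative entry of `flow` indicates that evidence flows against the orientation chosen for that edge in `edges`.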

E.2 Implementing the calculation
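The above calculation can be written as a linear equation in matrix form. We provide this notation as it is useful for implementation. As above, we focus on the comparison $ab$ in a network of $N$ nodes such that our initial boundary conditions are $v_a = 1$ and $v_b = 0$. From Equation (E39) we write, for $i \neq a, b$,
$$v_i = \sum_{j \neq a, b} T_{ij}^{(ab)} v_j + T_{ia}^{(ab)}, \tag{E43}$$
where we have inserted the known potentials, $v_a = 1$ and $v_b = 0$. Using the fact that $T_{ii}^{(ab)} = 0$ (for $i \neq b$) we eliminate the term $j = i$ on the right-hand side, and obtain
$$v_i - \sum_{j \neq a, b, i} T_{ij}^{(ab)} v_j = T_{ia}^{(ab)} \tag{E44}$$
for $i \neq a, b$. We collect the potentials $v_i$, $i \neq a, b$, in a vector $\mathbf{v}$ of length $N - 2$. This is the vector of unknown potentials we wish to calculate. Similarly, we write the transition probabilities $T_{ia}^{(ab)}$, $i \neq a, b$, as an $(N - 2)$-vector, $\mathbf{T}_{\cdot a}^{(ab)}$. Therefore, we re-write Equation (E44) in matrix form as
$$\left(\mathbf{1} - \mathbf{T}_{\mathrm{red}}^{(ab)}\right)\mathbf{v} = \mathbf{T}_{\cdot a}^{(ab)}, \tag{E45}$$
where $\mathbf{1}$ is the $(N - 2) \times (N - 2)$ identity matrix, and $\mathbf{T}_{\mathrm{red}}^{(ab)}$ is a reduced version of $\mathbf{T}^{(ab)}$ obtained from the full $N \times N$ transition matrix by removing the rows and columns corresponding to nodes $a$ and $b$. We then solve this equation for the vector of unknown potentials,
$$\mathbf{v} = \left(\mathbf{1} - \mathbf{T}_{\mathrm{red}}^{(ab)}\right)^{-1}\mathbf{T}_{\cdot a}^{(ab)}. \tag{E46}$$
To obtain the full vector of the potentials at all nodes, $\mathbf{v}_{\mathrm{full}}$, we use the fact that $v_a = 1$ and $v_b = 0$ and the entries of $\mathbf{v}$. The set of potential differences $v_i - v_j$ in Equation (E40) is then obtained by applying the edge-vertex incidence matrix to the vector of potentials, $\mathbf{B}\mathbf{v}_{\mathrm{full}}$. Finally, multiplying by the weight matrix, $\mathbf{W}$, we obtain the vector of non-normalised edge currents (Equation (E40) in matrix notation),
$$\mathbf{I}'^{(ab)} = \mathbf{W}\mathbf{B}\mathbf{v}_{\mathrm{full}}. \tag{E47}$$
The normalised currents are then found by dividing through by the total current flowing from node $a$ into the network,
$$\mathbf{I}^{(ab)} = \frac{\mathbf{I}'^{(ab)}}{\sum_j I_{aj}'^{(ab)}}. \tag{E48}$$

F APPLICATION TO REAL DATA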
The hat matrix of the aggregate model is calculated using
$$\mathbf{H} = \mathbf{B}(\mathbf{B}^\top \mathbf{W}\mathbf{B})^{+}\mathbf{B}^\top \mathbf{W}.$$
Therefore, for the depression data set we find
[Hat matrix $\mathbf{H}$ for the depression data set; rows and columns are labelled 1-3, 1-6, 1-7, 1-9, 1-11, 2-6, 2-8, 2-11, 3-4, 3-5, 3-6, 3-9, 4-9, 5-9, 6-7, 6-8, 6-9, 6-11, 7-9, 7-10.]
The numerical values for the matrix entries are shown to 3 decimal places. The rows and columns are labelled by the treatment comparison they represent. The first row of the hat matrix refers to the network comparison of treatments 1 and 3. By comparing this row to Figure 7 in the main text, it is clear that the evidence flow network defined by the hat matrix is equivalent to the evidence flow network obtained from the random-walk approach.