Abstract
In this article, production process databases originating from environmental sciences, more specifically from life cycle inventory (LCI), are considered as bipartite directed random networks. To model the observed directed hierarchical connection patterns, we turn to recent development concerning trophic coherence. Extending the scope to include bipartite networks, we compare several LCI networks to networks from other fields, and show empirically that they have high coherence and belong to the loopless regime, or close to its boundary.

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
Production processes are central to organized societies and have thus been extensively studied in complex systems and complex networks literature from various viewpoints: interfirm supply-chain [1–3], trade [4–6], material flow analysis [7]. Recent works use block models to reconstruct an interfirm network [8–11] from partial knowledge, or model manufacturing process below the factory level [12].
While they give deep insight into the structure of production processes, those works leave aside the technical details of production: inputs and outputs are usually aggregated, and detailed production processes are not considered. Furthermore, they neglect the ordering intrinsic to production processes and tend to focus on sectoral layers or communities [13].
In the field of life cycle inventory (LCI), at the crossroads of engineering, environment sciences and economy, large and detailed databases of production processes have been built up for decades, and offer a more precise description. As explained in section 2.2, two types of nodes are considered (processes and flows) and are typically used to perform Life Cycle Analysis (LCA), that is computing the emissions to the environment associated with the production of a unit of a specific product. LCI shares common methods with input-output analysis [14], which uses monetary units and operates at the sector level.
In the LCI literature, a few articles use complex network tools to analyze production: in [15], the two-mode structure is projected onto a directed monopartite network. A short mean path length and a power-law degree distribution are observed, which the authors explain by the existence of hubs such as utility sectors. The network is asymmetric (see footnote 1) and displays very low clustering and reciprocity. Furthermore, a comparison with networks built from input-output (IO) data shows very different characteristics.
Similarly, several authors have analyzed IO datasets with network tools [16] or with some interest in topology [17], but those nets are highly aggregated, which leads to dense weighted networks where each sector seems connected to every other.
Conversely, little attention was paid to LCI databases in the complex networks community. While complex networks methods are not likely to improve performance of classical problems in LCI represented algebraically (see section 2.2), they can help tackle problems more naturally expressed in a network way, and that so far remain unaddressed, for example: identifying central processes, probing the robustness to attacks of a given supply chain to avoid cascading failures, or examining dynamical phenomena such as propagation on the production network.
In the present article, we thus look at LCI databases from a network science standpoint and describe the properties of production networks at the process level, to propose new explanations for unexplained observations.
Firstly, the observed networks are directed: each process takes inputs and generates outputs. Then, in LCI databases, process nodes are connected to product (or ‘flow’) nodes, but are not directly connected, making the network bipartite or ‘two-mode’. (As explained in section 2.2 some LCI databases are built in a way such that they can be projected to monopartite networks, preserving some empirical properties).
As directed networks, production networks share properties with other directed networks. It has already been remarked in the literature [18, 19] that real directed networks are ‘underlooped’, that is, they have fewer directed cycles than randomized versions of the same network. Examples include a power grid, food webs, a metabolic network, the worldwide web, a neural network and a genetic transcription network.
In [20] the notion of inherent directionality is put forward, relying on a measure of hierarchical order inspired by trophic level [21]. A model is proposed and compared to empirical networks, and confirms the underlooped nature of empirical networks. As will be shown in section 3, this property is also present in production processes depicted by LCI databases, which has not been reported so far, to the best of our knowledge.
The notion of hierarchical order was used in [2], and also in the literature on random directed acyclic graphs (DAGs), which deals with random directed graphs without cycles. As explained in [22]: ‘it is the ordering of the vertices and not their acyclic structure that is the definitive property of the network. The acyclic structure is merely a corollary of the ordering’. Such ordering is domain-dependent and is often related to time ordering, as in citation networks. Production processes can be expected to display significant order since processes are partially ordered: some steps of the production process can be reversed, while others cannot. For example in food production, plants first grow, then are processed, packed, shipped and sold in retail stores. Finding an underlying order in a DAG requires solving the topological ordering problem, used in scheduling optimization. Strategies to randomize empirical DAGs while preserving ordering were discussed for example in [23, 24] and rely on the topological ordering. As remarked in [24], the latter can be non-unique, which may bias the generated ensemble.
Furthermore, DAG random models do not allow the presence of a small number of cycles. Recent works explain low abundance of loops in directed networks by the notion of trophic coherence, which quantifies to what extent a network can be ordered into discrete levels [25–28]. In these works, random graph ensembles are defined not from block-like assumptions but supposing that trophic coherence is constant, set to the empirically observed value. Some important properties of those random ensembles can be computed approximately and compared to empirical networks, for example the largest eigenvalue or the branching factor. Comparisons are made with life sciences (ecological food webs, genetic and metabolic), economic (international trade, input-output, supply-chains) [29, 30], linguistic and technical networks.
Nevertheless, trophic coherence theory concerns general unipartite directed networks, which disregards the particularities of bipartite networks encountered in LCI datasets. Conversely economic complexity measures such as country fitness and product complexity in [5] are adapted to bipartite undirected networks. We show below that some of the results of the trophic coherence theory can be extended to handle the bipartite case, providing the necessary correction to the bias induced by bipartivity. Then we analyze LCI datasets considered as networks, with a trophic coherence point of view and compare them to networks studied in other domains.
Section 2 introduces some elements of the theory as well as the datasets. Section 3 discusses our results and applications, section 4 concludes.
2. Methods and data
2.1. Trophic level and trophic coherence theory
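In this section, we give a few definitions selected from the theory developed by Johnson and colleagues in [25, 27]. Let $A$ be the adjacency matrix of a graph $G$, with the convention $a_{ij} = 1$ if there is an edge from node $j$ to node $i$ (see footnote 2), and $k_i^{\mathrm{in}} = \sum_j a_{ij}$ and $k_i^{\mathrm{out}} = \sum_j a_{ji}$ the in- and out-degrees of node $i$.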
A walk is a ‘sequence of nodes such that every consecutive pair of nodes in the sequence is connected by an edge’ [31]. A directed walk respects the direction specified by the directed network. A path is a self-avoiding walk, that does not intersect itself.
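The branching factor, whose role was evidenced in [18], is:
$$\alpha = \frac{\langle k^{\mathrm{in}} k^{\mathrm{out}} \rangle}{\langle k \rangle} \tag{1}$$
where $\langle \cdot \rangle$ denotes an average over nodes and $\langle k \rangle$ is the mean in-degree (equal to the mean out-degree).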
In ecology [21], the notion of trophic level $s_i$ of a node $i$ was introduced to reveal the hierarchical ordering of species in food webs, as illustrated in figure 1:
$$s_i = 1 + \frac{1}{k_i^{\mathrm{in}}} \sum_j a_{ij} s_j \tag{2}$$
Figure 1. Trophic level $s_i$ of node $i$ and trophic difference $x_{ij} = s_i - s_j$.
$s_i$ is similar to the PageRank measure [32] defined for monopartite directed networks, with a normalization of the sum term by $k_i^{\mathrm{in}}$ instead of $k_j^{\mathrm{out}}$. Thus $s_i$ is also akin to PageRank-related measures such as countries’ fitness and products’ complexity, as discussed in appendix E.
Equation (2) defines a set of linear equations that has a solution when each node belongs to a walk starting at a ‘basal’ node, i.e. a node such that $k_i^{\mathrm{in}} = 0$. Differences between trophic levels $x_{ij} = s_i - s_j$ and their standard deviation $q$, named ‘trophic coherence’, were introduced in [25]:
$$q = \sqrt{\frac{1}{L} \sum_{ij} a_{ij} x_{ij}^2 - 1} \tag{3}$$
where $L$ is the number of edges.
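As an illustration of equations (2) and (3), the following minimal Python sketch computes trophic levels by fixed-point iteration and then the coherence $q$ on a hypothetical four-node network. It assumes the usual forms $s_i = 1 + (1/k_i^{\mathrm{in}}) \sum_j a_{ij} s_j$ (with basal nodes set to 1) and $q = \sqrt{\langle x^2 \rangle - 1}$; the function names, the edge-list format and the toy network are illustrative choices, not the code used for this article.

```python
import math

def trophic_levels(n_nodes, edges, tol=1e-12, max_iter=10_000):
    """Trophic levels from an edge list of (j, i) pairs, meaning j -> i.

    Basal nodes (no in-edges) are set to 1; other levels are obtained by
    iterating s_i = 1 + mean of the levels of the in-neighbours of i.
    """
    in_neighbors = [[] for _ in range(n_nodes)]
    for j, i in edges:
        in_neighbors[i].append(j)
    s = [1.0] * n_nodes
    for _ in range(max_iter):
        new_s = []
        for preds in in_neighbors:
            if preds:
                new_s.append(1.0 + sum(s[j] for j in preds) / len(preds))
            else:
                new_s.append(1.0)
        if max(abs(a - b) for a, b in zip(new_s, s)) < tol:
            return new_s
        s = new_s
    raise RuntimeError("iteration did not converge (is every node reachable from a basal node?)")

def trophic_coherence(edges, s):
    """Standard deviation q of the trophic differences x = s_i - s_j over edges."""
    diffs = [s[i] - s[j] for j, i in edges]
    mean_sq = sum(x * x for x in diffs) / len(diffs)
    return math.sqrt(max(mean_sq - 1.0, 0.0))

# Hypothetical toy network: node 0 is basal.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
levels = trophic_levels(4, toy_edges)
q = trophic_coherence(toy_edges, levels)
```

The iteration is exact after a few sweeps on an acyclic network; for larger networks a sparse linear solver would be the more usual choice.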
In [27] the relationship between $q$ and the propensity for a network to include loops was examined. The authors introduce the ‘coherence ensemble’, which is the set of random graphs with a specified number of nodes, degree sequence, and coherence $q$. They compute $n_\nu$, the total number of walks of length $\nu$ in $G$, $m_\nu$, the total number of cycles of length $\nu$, and $c_\nu$, the expected proportion of walks of length $\nu$ that are cycles, and their average values in the ‘coherence ensemble’. The authors show that the network ensemble belongs either to a ‘loopful’ (resp. ‘loopless’) regime depending on a parameter $\tau$ being positive (resp. negative), with the following definition:
$$\tau = \ln \alpha + \frac{1}{2\tilde{q}^2} - \frac{1}{2q^2} \tag{4}$$
where $\alpha$ is the branching factor defined above, and $\tilde{q}$ is the average value of coherence in the ‘basal ensemble’ associated to $G$. This ‘basal ensemble’ is a restriction of the directed configuration ensemble with the additional constraint that the proportion of in-neighbors connected to non-basal nodes is kept fixed, with a value set by $L$ and $L_B$, where $L$ is the number of edges and $L_B$ is the number of edges connected to basal nodes.
Notably, in the large $\nu$ limit, the eigenvalue with leading real part, noted $\lambda_1$, can be related to $\tau$:
$$\bar{\lambda}_1 = e^{\tau} \tag{5}$$
with $\bar{x}$ the average value of $x$ in the ‘coherence ensemble’.
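The quantities entering the loop exponent can be sketched in a few lines of Python. The snippet below assumes the monopartite forms from [27], namely $\alpha = \langle k^{\mathrm{in}} k^{\mathrm{out}} \rangle / \langle k \rangle$, $\tilde{q} = \sqrt{L/L_B - 1}$ and $\tau = \ln \alpha + 1/(2\tilde{q}^2) - 1/(2q^2)$; the function names and the toy edge list are hypothetical, and the value of $q$ is supplied by hand for that toy network.

```python
import math

def branching_factor(n_nodes, edges):
    """alpha = <k_in * k_out> / <k> for an edge list of (source, target) pairs."""
    k_in = [0] * n_nodes
    k_out = [0] * n_nodes
    for j, i in edges:
        k_out[j] += 1
        k_in[i] += 1
    mean_product = sum(a * b for a, b in zip(k_in, k_out)) / n_nodes
    mean_degree = len(edges) / n_nodes
    return mean_product / mean_degree

def basal_coherence(edges):
    """q_tilde = sqrt(L / L_B - 1), with L_B the number of edges leaving basal nodes."""
    has_in_edge = {i for _, i in edges}
    n_basal_edges = sum(1 for j, _ in edges if j not in has_in_edge)
    return math.sqrt(len(edges) / n_basal_edges - 1.0)

def loop_exponent(alpha, q, q_basal):
    """tau; undefined when q or q_basal is zero (e.g. a strictly two-layer network)."""
    return math.log(alpha) + 1.0 / (2.0 * q_basal ** 2) - 1.0 / (2.0 * q ** 2)

# Hypothetical toy network (node 0 is basal); its coherence is q = sqrt(0.125).
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
alpha = branching_factor(4, toy_edges)
q_basal = basal_coherence(toy_edges)
tau = loop_exponent(alpha, q=math.sqrt(0.125), q_basal=q_basal)
regime = "loopful" if tau > 0 else "loopless"
```

On this acyclic toy network $\tau$ is negative, which places it in the loopless regime, as expected.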
2.2. LCI, datasets and methods
In this section we discuss basic definitions and problems in the field of LCI. Performing the LCI of a unit process consists in ‘the compilation and quantification of (its) inputs and outputs’ [33]. Inputs and outputs are usually called flows and each unit process is associated to a reference flow, i.e. its main output. A product system is a collection of such processes and flows.
For example, production of electricity can be modeled as a unit process: in a simplified way, to produce 10 kWh of electricity it uses 2 litres of fuel, and outputs 1 kg of carbon dioxide CO2 and 0.1 kg of sulphur dioxide SO2 [34].
There are several ways to represent unit processes and product systems [15]. Firstly, a unit process can be represented as a process vector, for example $p_1 = (-2, 10, 1, 0.1)^T$ in the case of electricity production. The first dimension is associated to litres of fuel, the second one to kWh of electricity, etc. Negative values stand for inputs, and positive ones for outputs. A product system will then correspond to a set of $m$-dimensional vectors, with $m$ the number of flows, assembled in a rectangular $m \times n$ matrix $P$, with $n$ the number of processes. Secondly, an equivalent graph representation of a unit process is given in figure 2. The graph is directed and bipartite, since processes are only connected to flows, and conversely. It is weighted by the amounts of inputs and outputs necessary to produce a given quantity of reference output, but each weight has a specific unit (for example litres, kg, kWh, etc.).
Figure 2. LCI example. Simplified bipartite graph representation of electricity production, from [34]. The reference flow is in bold. Circles (resp. squares) represent flows (resp. processes). The flows are litres of fuel, kWh of electricity, kg of CO2 and kg of SO2.
A few more definitions are needed: elementary flows go from a process to the environment or the reverse, whereas intermediate flows (like products and wastes) are generated by processes. Processes can take both elementary and intermediate flows as input and output.
With the matrix representation, scaling up all inputs and outputs of process 1 by factor $s_1$, of process 2 by factor $s_2$, etc. amounts to multiplying $P$ by a scale vector $s$. This representation is relevant to address the inventory problem, i.e. finding the scale vector $s$ such that a demand flow $f$ is met. For example, if 100 kWh of electricity are necessary, then $p_1$ should be scaled by $s_1 = 100/10 = 10$. Then this unit process will output 10 kg of CO2 and 1 kg of SO2 to the environment. Similarly, if a product system $P$ is scaled by vector $s$ for some reason (for example meeting the demand in electricity, heat, and iron) then the total amount of environmental input and output flows can be readily computed. This constitutes LCA, that is the assessment of the environmental impacts of a product system meeting a certain demand.
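The scaling arithmetic of the electricity example can be written out as a short Python sketch. The numbers are those given in the text (2 litres of fuel in, 10 kWh, 1 kg of CO2 and 0.1 kg of SO2 out); the helper name `scale_to_demand` and the flow labels are hypothetical.

```python
# Process vector for electricity production; negative entries are inputs,
# positive entries are outputs.
flows = ["fuel (l)", "electricity (kWh)", "CO2 (kg)", "SO2 (kg)"]
p1 = [-2.0, 10.0, 1.0, 0.1]

def scale_to_demand(process, reference_index, demand):
    """Return the scale factor and the scaled process vector meeting the demand."""
    factor = demand / process[reference_index]
    return factor, [factor * amount for amount in process]

# Demand of 100 kWh of electricity (the reference flow, at index 1).
s1, scaled = scale_to_demand(p1, reference_index=1, demand=100.0)
inventory = dict(zip(flows, scaled))
```

For a full product system the same idea becomes a linear system in the scale vector, which is the algebraic formulation referred to in the text.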
Performing LCI and LCA requires adequate datasets, prepared by experts and made available to process engineers. To put it simply, such datasets are large $P$ matrices, with additional domain-specific metadata.
Then, we present briefly the common characteristics of LCI datasets and preprocessing steps required to build bipartite and monopartite graphs from them. Databases are selected because they are free to download, and have the following description taken from nexus.openlca.org:
- Agribalyse: ‘the French LCI database for the agriculture and food sector (...) comprises LCIs for 2500 agricultural and food products produced and/or consumed in France’.
- ELCD: ‘(European reference Life Cycle Database) comprises LCI data from EU-level business associations and other sources for key materials, energy carriers, transport, and waste management’.
- Worldsteel: ‘This study contains global and regional LCI data for 16 steel products, from hot rolled coil to plate, rebar, sections, and coated steels’.
- Bioenergiedat: ‘Processes for bioenergy supply chains, with German background’.
- Ozlci: ‘The database inventory groups cover 958 (Australasian regional) supply chains’. It includes building products, chemicals, electric products, fabrics, farm and forest products, metal, minerals, as well as ‘utilities comprising use of freight, fuel, water and power by energy source and state grid’.
Some databases have a wide span, for example Agribalyse, which starts from the farm and goes to the distribution. Others have a narrower scope, for example Worldsteel.
In several databases, there is a convention that processes have a single product output, plus elementary flows. As in [15], we filter elementary flows since they do not connect vertices, which can result in processes having out-degree equal to one.
We do not perform monopartite projection because it loses information and can modify dramatically the networks’ properties [35]. In the case where process out-degree is equal to one, some properties (e.g. in/out degree) are conserved, but this is not the general case for other properties.
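For readers unfamiliar with the projection being discussed, the sketch below shows one common way of projecting a directed process/flow network onto its flow layer: flow f is linked to flow g whenever f is an input and g an output of the same process. This is an illustrative construction with hypothetical process and flow names, not necessarily the projection used in [15].

```python
def project_on_flows(edges, processes):
    """Project a directed bipartite edge list onto the flow layer.

    edges: iterable of (source, target) pairs; processes: set of process nodes.
    Returns the set of (f, g) pairs such that f -> process -> g.
    """
    inputs, outputs = {}, {}
    for source, target in edges:
        if target in processes:      # flow -> process
            inputs.setdefault(target, set()).add(source)
        elif source in processes:    # process -> flow
            outputs.setdefault(source, set()).add(target)
    projected = set()
    for process, in_flows in inputs.items():
        for g in outputs.get(process, ()):
            projected.update((f, g) for f in in_flows)
    return projected

# Hypothetical two-process system, each process having a single output.
processes = {"power plant", "smelter"}
bipartite_edges = [
    ("fuel", "power plant"),
    ("power plant", "electricity"),
    ("electricity", "smelter"),
    ("bauxite", "smelter"),
    ("smelter", "aluminium"),
]
flow_edges = project_on_flows(bipartite_edges, processes)
```

The process nodes disappear in the result, which is one concrete sense in which the projection loses information.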
Additional preprocessing steps may be required, for example when modeling conventions result in spurious loops. This is documented for the ELCD database in the case of transport (see footnote 3), and is dealt with below by selective node removal.
2.3. Other datasets
Because of bipartivity, we cannot directly compare our results to those in [27]. For comparison purposes, a list of repositories of empirical networks was explored (see detailed list in appendix D). These are metabolic “perpetual motion machines” and do not occur in biological reality’.
Lastly, we mention the topic of finite-size random graph samplers, which usually serve as a point of comparison in network science. Two of them are of interest in our case. Firstly, sampling from the basal ensemble in section 2.1 can be realized approximately using an off-the-shelf edge-rewiring algorithm, using a specific network $g_0$ as a seed (for example an empirically observed network, or a random graph with a degree sequence sampled from a specific distribution, such as a power law). Indeed, ensemble equivalence was exemplified in the monopartite case by [27, SI appendix §1.3] with finite-size Erdős–Rényi and scale-free random graphs. When network size increases, this leads to $\langle q \rangle \simeq \tilde{q}$, where $\langle \cdot \rangle$ is the average under random sampling. Thus $\tau \simeq \ln \alpha$, from equation (4). Hence $\tau$ in the initial graph $g_0$ is not preserved in general by such a random rewiring algorithm. The same can be observed in the bipartite case, with a rewiring algorithm that preserves both direction and bipartiteness, such as the one in graph-tool [39]. Secondly, a sampler was proposed in [26] that generates a random graph with a fixed value of $q$. But this sampler takes neither a seed graph nor a degree sequence as input. Thus it cannot generate a random graph with specified $q$ and degree sequence, which would allow control of $\alpha$ and $\tau$. Further, it has not been extended to the bipartite case. To conclude, random graph generators with a controllable degree sequence and trophic parameters at the same time do not exist in the literature so far, to our knowledge, and this constitutes an interesting research direction.
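The kind of rewiring mentioned above can be illustrated with a generic edge-swap sketch in plain Python. It is not the graph-tool implementation: it simply swaps the targets of two edges with the same orientation, which keeps every in- and out-degree, the edge directions and the bipartite structure. The function name, the node labels and the seed graph are hypothetical.

```python
import random
from collections import Counter

def rewire_bipartite(edges, left, n_swaps, seed=0):
    """Degree-preserving random rewiring of a directed bipartite edge list.

    edges: list of (source, target) pairs; left: set of nodes on the left side.
    Two edges a->b and c->d with the same orientation become a->d and c->b,
    unless this would create a duplicate edge.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        e1, e2 = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[e1], edges[e2]
        if (a in left) != (c in left):
            continue  # different orientations: swapping would break bipartiteness
        new1, new2 = (a, d), (c, b)
        if new1 in present or new2 in present:
            continue  # would create a multi-edge (also covers a == c or b == d)
        present -= {(a, b), (c, d)}
        present |= {new1, new2}
        edges[e1], edges[e2] = new1, new2
    return edges

# Hypothetical seed graph g0 with left nodes 0-2 and right nodes 3-5.
left_nodes = {0, 1, 2}
seed_edges = [(0, 3), (1, 4), (2, 5), (3, 1), (4, 2), (0, 4), (5, 0)]
rewired = rewire_bipartite(seed_edges, left_nodes, n_swaps=200, seed=1)
```

Because only the degree sequence and the layer structure are constrained, nothing in such a procedure keeps $q$ or $\tau$ at their seed values, which is the point made in the paragraph above.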
3. Results
In this section, we adapt to the bipartite case the results of [27], partly summed-up in section 2.1, more particularly equations (4) and (5).
3.1. Bipartite directed configuration model
In this section, we express $m_\nu$ and $n_\nu$ in the bipartite case, more specifically their average values $\bar{m}_\nu$ and $\bar{n}_\nu$ in the directed configuration model. Following the convention in [27], $m_\nu$ does not concern simple cycles but includes multiple counts with different starting points in loops. Only the results are shown, and the full derivation is in appendix A.
Using the properties of adjacency matrices of bipartite networks, and the bipartite version of a directed configuration model, with an expected number of directed edges from node j to node i, we put A in the form:

and derive the average number of walks of length 2ν:

It can be compared to the expression of the monopartite directed configuration model in [27]:
$$\bar{n}_\nu = L\,\alpha^{\nu - 1} \tag{8}$$
Then the average number of cycles of length 2ν is:

which is the counterpart of the following expression in the monopartite case:
$$\bar{m}_\nu = \alpha^{\nu} \tag{10}$$
Interestingly, $\bar{m}_{2\nu}$ is even: this is because it does not count the number of simple cycles but includes the same loops at different starting points, and each loop contains an even number of vertices.
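The parity remark can be checked numerically on a small example. The sketch below assumes, as in appendix A, that the expected adjacency matrix has two off-diagonal rank-1 blocks; the vector names `u`, `v`, `w`, `z` and the integer entries are hypothetical (integers are used only to keep the arithmetic exact). Under that assumption the number of closed walks of length $2\nu$ is $2\,[(v \cdot w)(z \cdot u)]^{\nu}$, where the factor 2 comes from the two diagonal blocks of the even powers, i.e. from closed walks starting on either side.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def trace_of_power(A, power):
    """Trace of A**power, i.e. the number of closed walks of that length."""
    result = A
    for _ in range(power - 1):
        result = matmul(result, A)
    return sum(result[i][i] for i in range(len(A)))

def bipartite_rank_one(u, v, w, z):
    """Block matrix [[0, u v^T], [w z^T, 0]] with len(u) left and len(v) right nodes."""
    n_left, n_right = len(u), len(v)
    top = [[0] * n_left + [ui * vj for vj in v] for ui in u]
    bottom = [[wi * zj for zj in z] + [0] * n_right for wi in w]
    return top + bottom

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

u, z = [1, 2], [1, 3]          # left-side vectors (hypothetical values)
v, w = [1, 1, 2], [2, 1, 1]    # right-side vectors (hypothetical values)
A = bipartite_rank_one(u, v, w, z)

nu = 3
closed_walks = trace_of_power(A, 2 * nu)
predicted = 2 * (dot(v, w) * dot(z, u)) ** nu
odd_closed_walks = trace_of_power(A, 3)
```

Odd powers of such a block matrix have an empty diagonal, so `odd_closed_walks` vanishes, in line with the absence of odd cycles in bipartite networks.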
Following [27] we finally define:

3.2. Basal ensemble
Adapting equation (4) in the bipartite case requires an expression for $\tilde{q}$, the coherence in the basal ensemble. In short, we show that the expression found in [27], $\tilde{q} = \sqrt{L/L_B - 1}$, is only slightly modified, and the leading term is still $\sqrt{L/L_B}$ in the common situation in which $L$ is much larger than $L_B$.
To see that, we keep the setting with $N$ nodes, $L$ edges and $L_B$ basal edges, and add bipartiteness. Left nodes include basal and non-basal nodes. Right nodes are exclusively non-basal and can connect only to left nodes, either basal or non-basal, as represented in figure 3. Basal edges go only from left basal nodes to right non-basal nodes. Non-basal edges are established between left non-basal and right nodes, in both directions. For consistency, left non-basal nodes must have in-degree greater than 1, which reads:

Figure 3. Bipartite basal ensemble network.
In the basal ensemble, the proportion of in-neighbors connected to right nodes is kept fixed. Right nodes receive edges from both basal and non-basal left nodes. Considering the average trophic level of right nodes, and the average trophic levels of left basal and left non-basal nodes, we get from the definition in equation (2):

Similarly, non-basal left nodes receive edges from right nodes only and we get:

This yields:


We remark that one of the resulting quantities has the same value as its counterpart in the monopartite case [27], which explains why $\tilde{q}$ is to leading order close to $\sqrt{L/L_B}$ when $L$ is much larger than $L_B$. This is treated in detail in appendix B.
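For reference, the monopartite value can be recovered by a short mean-field argument. The sketch below is a summary in our own notation, not the derivation of appendix B: it assumes that every in-neighbour of a non-basal node is basal with probability $L_B/L$, that all non-basal nodes share the same mean trophic level $\tilde{s}$ (a symbol introduced here), and that edges between non-basal nodes therefore carry a zero trophic difference on average.

```latex
\begin{align*}
\tilde{s} &= 1 + \frac{L_B}{L}\cdot 1 + \Bigl(1 - \frac{L_B}{L}\Bigr)\tilde{s}
  \;\Longrightarrow\; \tilde{s} = 1 + \frac{L}{L_B},\\
\langle x \rangle &= \frac{L_B}{L}\,(\tilde{s} - 1) + \Bigl(1 - \frac{L_B}{L}\Bigr)\cdot 0 = 1,\\
\langle x^2 \rangle &= \frac{L_B}{L}\,(\tilde{s} - 1)^2 = \frac{L}{L_B},\\
\tilde{q} &= \sqrt{\langle x^2 \rangle - 1} = \sqrt{\frac{L}{L_B} - 1}
  \;\approx\; \sqrt{\frac{L}{L_B}} \qquad (L \gg L_B).
\end{align*}
```

The bipartite case adds a second layer to this bookkeeping, but the same leading term survives when $L \gg L_B$.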
Lastly, we explored the possibility that the numbers of edges leaving non-basal left layer and leaving the right layer are unbalanced, to reflect what is observed empirically with the LCI dataset. We report that indeed this modulates measured values for q. However, we were not able to find a simple yet accurate enough model for the observed behavior. Therefore in first approximation, we keep the balanced model below.
3.3. Leading eigenvalue
From equation (9) we have an expression for the average number of cycles of length $2\nu$ in the bipartite directed configuration model. In section 3.2 we noticed that $\tilde{q}$, the coherence in the bipartite basal ensemble, had approximately the same expression as in the monopartite case.
Following [27], and taking only loops with even length into account, we derive an expression similar to that of the monopartite case, equations (4) and (5):


The full derivation can be found in appendix C. The bipartite counterpart of the branching factor has values that can be directly related to the monopartite branching factor $\alpha$ in certain particular cases, but not in general. As remarked in section 3.2, the bipartite definition of $\tilde{q}$ can be well approximated by the usual monopartite value, in the particular case of balanced left and right layers.
3.4. Application to empirical graphs
In this section, the datasets presented in sections 2.2 and 2.3 are analyzed using the tools depicted in sections 3.1–3.3, adapted to the bipartite setting.
Table 1 shows that LCI networks are coherent, with a median $q$ much lower than for the other datasets considered, and are more likely to be in the loopless regime than other datasets. This leaves room for fluctuation inside the LCI category. For example, as shown in table 2, the Ozlci dataset has a trivial structure, since it occupies only two layers, and hence $q = 0$. Figure 4 presents two LCI networks embedded in 2d space (see footnote 4), with differing behaviors: the ELCD network is very coherent ($q = 0.21$), and more coherent than the corresponding average randomized networks ($q/\tilde{q} = 0.03$). The Bioenergiedat network is less coherent with value $q = 0.99$, and a $q/\tilde{q}$ ratio just above 1.
Figure 4. LCI networks embedded in 2d space. y-axis represents trophic level with inverted axis, low $s_i$ at the top. x-axis. (left) ELCD; (right) Bioenergiedat.
Table 1. Bipartite network median characteristics, by dataset category.
Category | q | τ |
---|---|---|
LCI | 0.37 | −1.5 |
Metabolic | 10 | 1.8 |
Table 2. Bipartite network trophic characteristics, LCI datasets. ‘Cycle’ is the number of unique elementary directed cycles, computed with graph-tool [39].
Dataset | N | NB | | q | $\tilde{q}$ | α | $q/\tilde{q}$ | τ | cycle | λ1 |
---|---|---|---|---|---|---|---|---|---|---|
Agribalyse | 31 698 | 1630 | 6.9 | 1.28 | 8.15 | 2.46 | 0.16 | 0.60 | 117 | 1.52 |
ELCD | 894 | 104 | 11.5 | 0.21 | 6.72 | 3.42 | 0.03 | −9.91 | 62 | 3.00 |
ELCD filtered | 883 | 99 | 11.4 | 0.14 | 6.80 | 3.29 | 0.02 | −23.46 | 0 | 0.03 |
Worldsteel | 63 | 5 | 12.7 | 0.53 | 1.32 | 0.99 | 0.40 | −1.51 | 0 | |
Bioenergiedat | 457 | 112 | 5.1 | 0.99 | 0.92 | 1.42 | 1.08 | 0.43 | 0 | 0.22 |
Ozlci | 1914 | 957 | 1.0 | 0.00 | | 0 | | | | |
From the LCI literature, we expect a large number of feedback loops in the examined databases. Citing [34]: ‘Feedback loops occur frequently in industrial systems. For instance, mining of coal needs electricity, while production of electricity needs coal’. However, the majority of datasets are acyclic. Only Agribalyse and ELCD contain a small number of cycles compared to the edge number. In Agribalyse their occurrence is mostly related to seeds, which are both an input and output in plant growing. In ELCD, loops are associated with a few nodes, and filtering them is enough to remove all cycles, as explained in section 2.2.
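Whether a dataset is acyclic, and which nodes are involved in loops, can be checked with a simple pruning procedure. The sketch below repeatedly discards nodes that have no in-edge or no out-edge; whatever remains contains every directed cycle. It is a generic illustration with a hypothetical transport example inspired by footnote 3, not the graph-tool cycle enumeration used for table 2.

```python
def cyclic_core(edges):
    """Nodes left after repeatedly pruning nodes without in-edges or out-edges.

    The result is empty exactly when the directed network is acyclic;
    otherwise it contains all nodes lying on directed cycles (and possibly
    nodes on paths between cycles).
    """
    edges = set(edges)
    while True:
        sources = {j for j, _ in edges}
        targets = {i for _, i in edges}
        keep = sources & targets
        pruned = {(j, i) for j, i in edges if j in keep and i in keep}
        if pruned == edges:
            return keep
        edges = pruned

# Hypothetical example: 'cargo' is both an input and an output of 'transport'.
transport_edges = [
    ("fuel", "transport"),
    ("cargo", "transport"),
    ("transport", "cargo"),
    ("transport", "delivered goods"),
]
core = cyclic_core(transport_edges)

# Selective node removal: dropping 'cargo' leaves an acyclic network.
filtered_edges = [e for e in transport_edges if "cargo" not in e]
core_after_filtering = cyclic_core(filtered_edges)
```

Counting elementary cycles exactly is far more expensive than this test, which is consistent with the remark below that cycle counts are not reported for the non-LCI networks.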
This low number of loops has not been reported so far, to our knowledge. It may arise from modeling bias, restricted scope, or from dataset selection bias. Also, recurring loops challenge numerical solvers in LCA not relying on matrix inversion. In the complex network literature, arguments of improved dynamic stability [18, 19] and transport [18] have been put forward to explain the lack of loops. In [20] the authors hypothesize that ‘instead the absence of feedback loops is a byproduct of a more inherent feature of networks: the existence of a preferred directionality’. In comparison, the non-LCI networks introduced in section 2.3 contain a large number of cycles, not reported in table 3 because of prohibitive computational cost.
Table 3. Bipartite network trophic characteristics, other datasets. The number of cycles is not reported for this dataset because it is very high, and too time-consuming to compute exhaustively. Asterisks identify CycleFreeFlux preprocessing as explained in section 2.3.
Dataset | N | NB | | q | $\tilde{q}$ | α | $q/\tilde{q}$ | τ | λ1 |
---|---|---|---|---|---|---|---|---|---|
iJR904 | 657 | 13 | 4.72 | 11.65 | 10.87 | 5.97 | 1.07 | 1.79 | 5.87 |
iJR904* | 733 | 15 | 4.63 | 9.06 | 10.59 | 6.43 | 0.86 | 1.86 | 6.49 |
iSB619 | 623 | 12 | 4.70 | 11.90 | 11.00 | 5.92 | 1.08 | 1.78 | 6.04 |
iSB619* | 658 | 17 | 4.70 | 8.46 | 9.48 | 6.27 | 0.89 | 1.83 | 6.54 |
iAF692 | 744 | 10 | 4.54 | 16.11 | 12.96 | 6.18 | 1.24 | 1.82 | 5.56 |
iAF692* | 781 | 16 | 4.56 | 7.77 | 10.51 | 6.46 | 0.74 | 1.86 | 6.03 |
iND750 | 718 | 8 | 4.45 | 8.16 | 13.28 | 5.23 | 0.61 | 1.65 | 5.95 |
iND750* | 824 | 20 | 4.42 | 10.70 | 9.49 | 5.53 | 1.13 | 1.71 | 6.47 |
iYO844 | 700 | 12 | 4.78 | 11.82 | 11.77 | 6.30 | 1.00 | 1.84 | 6.30 |
iYO844* | 741 | 17 | 4.77 | 9.02 | 10.15 | 6.60 | 0.89 | 1.89 | 6.56 |
iAB_RBC_283 | 168 | 12 | 2.95 | 3.75 | 4.43 | 2.32 | 0.85 | 0.83 | 3.10 |
iAB_RBC_283* | 285 | 21 | 3.39 | 3.79 | 4.69 | 3.68 | 0.81 | 1.29 | 4.42 |
iIT341 | 637 | 14 | 4.57 | 14.79 | 10.15 | 5.87 | 1.46 | 1.77 | 5.52 |
iIT341* | 643 | 15 | 4.61 | 14.19 | 9.89 | 5.98 | 1.43 | 1.79 | 5.66 |
iNJ661 | 845 | 7 | 5.09 | 12.97 | 17.49 | 7.47 | 0.74 | 2.01 | 7.27 |
iNJ661* | 925 | 19 | 5.04 | 10.07 | 11.03 | 7.99 | 0.91 | 2.08 | 7.89 |
The consistency of the formulas in section 3.3, adapted from [27], that relate $\tau$ and $\lambda_1$ can also be discussed. In figure 5, the leading eigenvalue $\lambda_1$ is plotted as a function of $\tau$, for both datasets (LCI and non-LCI). Circles representing non-LCI data are well fitted by the dashed curve. Crosses representing LCI datasets are close to the exponential, except for the outlier value at $\tau = -9.91$, $\lambda_1 = 3.00$, which represents the ELCD dataset, and seems inconsistent with the curve. After filtering as explained in section 2.2 it is mapped to $\tau = -23.46$, $\lambda_1 = 0.03$, which gives a satisfactory fit. This case is reminiscent of a remark in [20]: ‘typically loops are not independent as they can share some nodes. In particular, hubs are statistically more likely than other nodes to take part in loops’. Indeed, removing just a few nodes dramatically changed the behavior from a coherence point of view. However, such cycle removal is of course not relevant for acyclic datasets, while in Agribalyse cycles are scattered across the dataset rather than concentrated. This raises the question of the robustness of coherence measures and will be discussed in section 4.
Figure 5. Leading eigenvalue $\lambda_1$ as a function of $\tau$, in the bipartite case. The outlier value at $\tau = -9.91$, $\lambda_1 = 3.00$ represents the ELCD dataset before filtering. After filtering it is mapped to $\tau = -23.46$, $\lambda_1 = 0.03$.
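The comparison shown in figure 5 can be reproduced approximately from the tabulated values. The sketch below assumes that the dashed curve is $\lambda_1 = e^{\tau}$, i.e. the same form as equation (5); the exact bipartite expression may differ. The $(\tau, \lambda_1)$ pairs are copied from tables 2 and 3 for a subset of datasets.

```python
import math

# (dataset, tau, lambda_1) taken from tables 2 and 3.
rows = [
    ("Agribalyse", 0.60, 1.52),
    ("ELCD", -9.91, 3.00),
    ("ELCD filtered", -23.46, 0.03),
    ("Bioenergiedat", 0.43, 0.22),
    ("iJR904", 1.79, 5.87),
    ("iAB_RBC_283", 0.83, 3.10),
    ("iNJ661", 2.01, 7.27),
]

# Deviation of the measured eigenvalue from the assumed curve exp(tau).
residuals = {name: lam - math.exp(tau) for name, tau, lam in rows}

# Dataset lying furthest from the curve, in absolute terms.
worst = max(residuals, key=lambda name: abs(residuals[name]))
```

With these values the unfiltered ELCD dataset has the largest absolute deviation, while its filtered version sits almost on the curve.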
In section 2.2 the topic of monopartite projection was raised, as it is easier to use off-the-shelf tools than to extend a theory to the bipartite case. This is the approach taken for example in [15], which may be justified by the particular nature of LCI databases, as noted in section 2.2. Several numerical experiments were run to try to find patterns in the effect of monopartite projection. First, projection imposes the constraint that basal nodes must all belong to the same layer, more specifically to the layer chosen for projection. This problem may be mitigated using the new definitions of coherence in [29]. In the easier case where $k^{\mathrm{out}} = 1$ for processes and projection is done on the flow layer, we were not able to evidence a predictable behavior for coherence quantities. Sometimes the projected network has only two trophic values, which results in $q = 0$, although $q$ was nonzero in the bipartite network. The case where $k^{\mathrm{out}}$ can take any value for processes is even harder to deal with.
4. Conclusion
Starting from the observation that little work had been devoted to production processes at the fine-grain level depicted by LCI in the complex networks community, we proposed a first original contribution in that direction. First, unlike earlier works, we proposed (i) to keep the bipartite structure to avoid loss of information, (ii) to look for a random model, (iii) able to reproduce hierarchical features of the datasets.
This was done by building on existing theory by Johnson and colleagues, upon extending some of their tools to the bipartite case. We report that:
- the studied empirical networks built from LCI databases have high coherence compared to other existing datasets and low loop number.
- the random ‘coherence ensemble’ satisfactorily reproduces an important property of the empirical datasets (the largest real part of the set of eigenvalues), which is closely related to other important aspects such as the behavior of dynamical systems defined on networks, as shown by several contributions in the literature.
In further work we plan to extend the number of studied LCI databases, and to consider how these observations can extend to other classical properties (such as clustering, diameter, ...). The unbalanced bipartite basal model will be explored, as well as potential useful applications for LCI that can be derived from those findings. Furthermore, the new definitions of trophic coherence in [29] will be tested in the particular case of LCI networks.
Data availability statement
The data that support the findings of this study are openly available.
Appendix A: Loop count in the bipartite directed configuration model
Two properties of adjacency matrices of bipartite networks are used below. Let $A$ be the adjacency matrix of a directed bipartite network $G$, and $B_l$ and $B_r$ the associated biadjacency matrices:

Following [41], we notice that even and odd powers of A have different expressions. Since odd cycles are absent in a bipartite network, we focus on even powers of A:

Further, in the monopartite directed configuration model, the expected number of directed edges from node j to node i has the following form:

Being rank 1 matrices, the biadjacency matrices $B_l$ and $B_r$ can each be written as an outer product of two vectors.
A and its even powers can thus be written:


Summing all terms in equation (A.4) we get the total number of walks of length $2\nu$:

Summing all diagonal terms in equation (A.4) we get the total number of cycles of length 2ν:

Appendix B: Basal ensemble
In this section, we compute $\tilde{q}$ in the bipartite case. From section 3.2 we have:

Three types of edges will be observed in the bipartite basal ensemble:
- from basal left nodes to right nodes (the $L_B$ basal edges);
- from non-basal left nodes to right nodes;
- from right nodes to non-basal left nodes.
From the number of edges of each type and the corresponding trophic differences, the variance is:

which yields:

Appendix C: Leading eigenvalue
We replicate the steps in [27] to get an expression for $\tau$ in the bipartite case. In the coherence ensemble with coherence $q$, the sum $S = \sum_k x_k$ of trophic differences along a cycle is equal to zero. Modeling the $x_k$ as random variables, the authors notice that $S$ has approximately a Gaussian distribution, and that $c_\nu$ is proportional to the probability that $S = 0$. In the bipartite case we consider closed walks with length $2\nu$, since odd cycles are not allowed. The random variable $S$ along such a walk has mean $2\nu$ and variance $2\nu q^2$. It follows that:

where is unknown. Taking the particular case of the basal ensemble, and supposing as in [27] that
we get:

Since and supposing that
we have:

with τ defined as in equation (19).
In parallel, the trace of the $2\nu$-th power of the adjacency matrix can still be expressed:

Taking the expectation in the coherence ensemble:

Taking to the power $1/(2\nu)$, then the large $\nu$ limit as in [27]:

And the same form as in the monopartite case is recovered.
Appendix D: Data and code availability
The repositories evoked in section 2.3 are the Stanford Large Network Dataset Collection [42], ICON [43], and Netzschleuder [44]. They were selected because of their coverage and ability to filter by directedness and bipartiteness. The BiGG database [37] was also used for metabolic networks.
Several software packages were used in this work: Python, graph-tool [39], Igraph [45], Scikit-network [46], NetworkX [47], Cobrapy [48].
LCI datasets are freely available from nexus.openlca.org.
Code will be made available on gitlab.
Appendix E: Comparison between trophic level and economic complexity measure
In this section we propose a quick comparison between trophic level in equation (2) and economic complexity measures introduced for example in [5].
Suppose the starting point is a country/product network, bipartite and undirected. Then economic complexity measures such as country fitness or product complexity [5] can be computed. The trophic levels, however, are not defined because there are no basal nodes in the corresponding network (see section 2.1; note also that another definition of trophic levels that does not require basal nodes was proposed in [29]). To sum up, in that case, $s_i$ and measures of economic complexity cannot be compared.
Conversely, if the starting point is a bipartite directed network with basal nodes, then the trophic levels are defined and can be computed. Further, upon transforming this network into an undirected one (which discards important information), economic complexity measures can also be computed.
For example, we focus on the simple directed bipartite 3-motifs in figure E1. First we compute the trophic levels $s_i$ using equation (2) and write them next to each node, with basal node trophic levels set to 1.
Figure E1. Example of directed bipartite 3-motifs; the trophic level $s_i$ is given next to each node.
Then, following [5], we compute $F_c$, the fitness of a country $c$, and $Q_p$, the complexity of a product $p$, as indicated by the following equations:

with the total number of countries and
the total number of products. Moreover:

with initial values .
We notice that upon transforming the directed bipartite motifs into undirected networks, motifs A and C are mapped to the same motif. Then the recursions for motifs A and B lead to a fixed point that can be easily found by hand: .
To summarize, in this simple case it appears that from the point of view of the economic complexity measures, the three motifs look the same, whereas the trophic levels preserve hierarchicalness.
A broader range of behaviors is expected if sa and sb are distinct. Also, a similar study could be conducted with 4-motifs, but this is left for further works.
Footnotes
- 1
‘the structure of the network is highly asymmetric. There are processes (transportation, electricity) that deliver to many other processes but that do not require inputs from these other processes. Conversely, there are processes (such as infrastructure-related processes) that have few customers, but that have many suppliers’ [15].
- 2
the opposite convention also exists in the literature.
- 3
e.g. ‘cargo’ appears as both an input and an output to transport processes [36].
- 4
this is done using ForceAtlas2 [40].