“Graphlets”的意思、由来-开放百科全书

Graphlets were first introduced by Nataša Pržulj, when they were used as a basis for designing two new highly sensitive measures of network local structural similarities: the relative graphlet frequency distance (RGF-distance)^[1] and the graphlet degree distribution agreement (GDD-agreement).^[2] Additionally, Pržulj group developed a novel measure of network topological similarity that generalizes the degree of a node in the network to its graphlet degree vector (GDV) or graphlet degree signature.^[4]

Graphlet-based network properties

Relative graphlet frequency distance

RGF-distance compares the frequencies of the appearance of all 3-5-node graphlets in two networks.^[1] Let N_i(G) be the number of graphlets of type

(

) in network G, and let

be the total number of graphlets of G. The "similarity" between two graphs should be independent of the total number of nodes or edges, and should depend only upon the differences between relative frequencies of graphlets. Thus, relative graphlet frequency distance D(G,H) between two graphs G and H is defined as:

where

. The logarithm of the graphlet frequency is used because frequencies of different graphlets can differ by several orders of magnitude and the distance measure should not be entirely dominated by the most frequent graphlets.

Graphlet degree distribution agreement

GDD-agreement generalizes the notion of the degree distribution to the spectrum of graphlet degree distributions (GDDs) in the following way.^[2] The degree distribution measures the number of nodes of degree k in graph G, i.e., the number of nodes "touching" k edges, for each value of k. Note that an edge is the only graphlet with two nodes. GDDs generalize the degree distribution to other graphlets: they measure for each 2-5-node graphlet G_i,

, such as a triangle or a square, the number of nodes "touching" k graphlets G_i at a particular node. A node at which a graphlet is "touched" is topologically relevant, since it allows us to distinguish between nodes "touching", for example, a three node path at an end node or at the middle node. This is summarized by automorphism orbits (or just orbits, for brevity): by taking into account the "symmetries" between nodes of a graphlet, there are 73 different orbits across all 2-5-node graphlets (see [Pržulj, 2007]^[2] for details).

For each orbit j, one needs to measure the j^th GDD, d_G^j(k), i.e., the distribution of the number of nodes in G "touching" the corresponding graphlet at orbit j k times. Clearly, the degree distribution is the 0th GDD. d_G^j(k) is scaled as

to decrease the contribution of larger degrees in a GDD and then normalized with respect to its total area

For two networks G and H and a particular orbit j, the "distance" D^j(G,H) between their normalized j^th GDDs is:

The distance is between 0 and 1, where 0 means that G and H have identical j^th GDDs, and 1 means that their j^th GDDs

are far away. Next, D^j(G,H) is reversed to obtain the j^th GDD-agreement:

The total GDD-agreement between two networks G and H is the arithmetic or the geometric average of the j^th GDD-agreements over all j, i.e.,

respectively. GDD-agreement is scaled to always be between 0 and 1, where 1 means that two networks are identical with respect to this property. (See [Pržulj, 2007]^[2] for details.)

Graphlet degree vectors (signatures) and signature similarities

This method generalizes the degree of a node, which counts the number of edges that the node touches, into the vector of graphlet degrees, or graphlet degree signature, counting the number of graphlets that the node touches at a particular orbit, for all graphlets on 2 to 5 nodes.^[4] The resulting vector of 73 coordinates is the signature of a node that describes the topology of node's neighborhood and captures its interconnectivities out to a distance of 4 (see [Milenković and Pržulj, 2008]^[4] for details). The graphlet degree signature of a node provides a highly constraining measure of local topology in its vicinity and comparing the signatures of two nodes provides a highly constraining measure of local topological similarity between them.

The signature similarity^[4] is computed as follows. For a node u in graph G, u_i denotes the i^th coordinate of its signature vector, i.e., u_i is the number of times node u is touched by an orbit i in G. The distance D_i(u,v) between the i^th orbits of nodes u and v is defined as:

where w_i is the weight of orbit i that accounts for dependencies between orbits (see [Milenković and Pržulj, 2008]^[4] for details). The total distance D(u,v) between nodes u and v is defined as:

The distance D(u,v) is in [0, 1), where distance 0 means that signatures of nodes u and v are identical. Finally, the signature similarity, S(u,v), between nodes u and v is:

Clearly, a higher signature similarity between two nodes corresponds to a higher topological similarity between their extended neighborhoods (out to distance 4).

Application of graphlet-based network properties

RGF-distance and GDD-agreement were used to evaluate the fit of various network models to real-world networks and to discover a new, well-fitting, geometric random graph model for protein-protein interaction networks,^[1]^[2] as well as other types of biological networks, such as protein structure networks, also called residue interaction graphs.^[5] These graphlet-based network properties are implemented in GraphCrunch, a software tool for large network analyses and modeling.^[6] Alternatively, a parallel implementation is provided in PGD, a software library for computing graphlet-based network properties in large and massive networks.^[7]^[8]

Graphlet degree vectors (signatures) and signature similarities were applied to biological networks to identify groups (or clusters) of topologically similar nodes in a network and predict biological properties of yet uncharacterized nodes based on known biological properties of characterized nodes. Specifically, they were applied to protein function prediction,^[4] cancer gene identification,^[9] and discovery of pathways underlying certain biological processes, such as melanogenesis^[9] or protein degradation.^[10]^[11] Additionally, GRAph ALigner (GRAAL),^[12] a global network alignment method, used graphlet degree vectors and signature similarities to produce topological alignments of biological networks, without using any information external to network topology.

References

1. ^¹²³Pržulj N, Corneil DG, Jurisica I: Modeling Interactome, Scale-Free or Geometric?, Bioinformatics 2004, 20(18):3508-3515.
2. ^¹²³⁴⁵Pržulj N, Biological Network Comparison Using Graphlet Degree Distribution, Bioinformatics 2007, 23:e177-e183.
3. ^R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, Network motifs, simple building blocks of complex networks, Science 2002, 298(5594): p. 824-7.
4. ^¹²³⁴⁵Tijana Milenković and Nataša Pržulj, Uncovering Biological Network Function via Graphlet Degree Signatures, Cancer Informatics 2008, 6:257–273.
5. ^Tijana Milenković, Ioannis Filippis, Michael Lappe, and Nataša Pržulj, Optimized Null Model for Protein Structure Networks, 2009, PLoS ONE 4(6): e5967.
6. ^Tijana Milenković, Jason Lai, and Nataša Pržulj, GraphCrunch: a tool for large network analyses, BMC Bioinformatics 2008, 9:70. Highly accessed.
7. ^{{Cite book|last=Ahmed|first=N. K.|last2=Neville|first2=J.|last3=Rossi|first3=R. A.|last4=Duffield|first4=N.|date=2015-11-01|title=Efficient Graphlet Counting for Large Networks|url=http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=7373304|journal=2015 IEEE International Conference on Data Mining (ICDM)|pages=1–10|doi=10.1109/ICDM.2015.141|isbn=978-1-4673-9504-5}}
8. ^{{Cite journal|last=Ahmed|first=Nesreen K.|last2=Neville|first2=Jennifer|last3=Rossi|first3=Ryan A.|last4=Duffield|first4=Nick G.|last5=Willke|first5=Theodore L.|date=2016-06-27|title=Graphlet decomposition: framework, algorithms, and applications|journal=Knowledge and Information Systems|volume=50|issue=3|language=en|pages=689–722|doi=10.1007/s10115-016-0965-5|issn=0219-1377|arxiv=1506.04322}}
9. ^¹Tijana Milenković, Vesna Memisević, Anand K. Ganesan, and Nataša Pržulj, Systems-level Cancer Gene Identification from Protein Interaction Network Topology Applied to Melanogenesis-related Interaction Networks, Journal of the Royal Society Interface 2009, {{doi|10.1098/rsif.2009.0192}}.
10. ^Cortnie Guerrero, Tijana Milenković, Nataša Pržulj, Peter Kaiser, Lan Huang, Characterization of the Yeast Proteasome Interaction Network by QTAX-Based Tag-Team Mass Spectrometry and Protein Interaction Network Analysis, PNAS 2008, 105(36): 13333–13338.
11. ^Robyn Kaake, Tijana Milenković, Nataša Pržulj, Peter Kaiser, and Lan Huang, Quantifying Cell Cycle Dependent Changes in Protein Interacting Network of the Yeast 26S Proteasome, Journal of Proteome Research 2010, to appear.
12. ^Oleksii Kuchaiev, Tijana Milenković, Vesna Memisević, Wayne Hayes, and Nataša Pržulj, Topological network alignment uncovers biological function and phylogeny, Journal of the Royal Society Interface 2010, to appear.

Graphlet-based network properties

Relative graphlet frequency distance

Graphlet degree distribution agreement

Graphlet degree vectors (signatures) and signature similarities

Application of graphlet-based network properties

References

External links