Complex network analysis is an important field in the life sciences. Protein-protein interaction networks in particular are a key resource for current data-driven biology and bioinformatics. Proteins seldom perform their function in isolation; protein-protein interactions are key determinants of cellular protein complexes, pathways, and other cellular processes. Mining protein interaction networks, which often contain more than 15,000 nodes, requires computational methods, many of which were developed in the context of complex network analysis.
Determining the similarity between two proteins based on their relative location in the protein interaction network is often hampered by the discrete nature of networks. In the simplest setting, the shortest-path distance between two proteins can be one, two, three, four, and so on, but is seldom more than five, so many protein pairs end up with exactly the same distance. We therefore employed a random-walk based method to calculate the similarity between proteins.
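The text does not name the specific random-walk algorithm. One common choice in this setting is random walk with restart (RWR); assuming that formulation, the walker's visiting probabilities are iterated as shown below, where the restart probability $r$, the column-normalized adjacency matrix $W$, and the seed set $S$ are notation introduced here for illustration:

$$
\mathbf{p}^{(t+1)} = (1-r)\,W\,\mathbf{p}^{(t)} + r\,\mathbf{p}^{(0)},
\qquad
W_{vu} = \frac{A_{vu}}{\deg(u)},
\qquad
p^{(0)}_v = \begin{cases} 1/|S| & v \in S \\ 0 & \text{otherwise} \end{cases}
$$

Iterating until $\mathbf{p}^{(t)}$ converges gives a steady-state vector whose entry $p_v$ is a continuous similarity score between node $v$ and the seed set $S$, in contrast to the small set of integer values a shortest-path distance can take.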
The figure shows how a random-walk based method assigns an increased similarity score when two nodes are connected by an increasing number of paths in the network, especially in situations where the shortest-path distance stays constant (here, two). A particularly useful property of the method we employed is that it calculates the similarity between multiple start nodes and all other nodes in the network at once. We applied this method to the problem of disease gene identification with remarkable success: the publication has been cited more than 500 times according to Google Scholar.
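The figure's scenario can be reproduced on a toy graph. This is a minimal sketch, assuming the random-walk method is random walk with restart (RWR), which the text does not state; the restart probability of 0.7, the graph, and the function names `shortest_path_lengths` and `random_walk_with_restart` are all hypothetical. Node S is the seed; A is reached from S through one intermediate node, B through three. Both lie at shortest-path distance two, yet RWR gives B the higher score. Passing several nodes as `seeds` spreads the restart probability uniformly over them, which corresponds to the multiple start nodes mentioned above.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that jumps back to the seeds.

    Assumption: RWR with a uniform restart vector over `seeds`. Every node must
    have at least one neighbour, otherwise the division by len(adj[u]) fails.
    """
    nodes = list(adj)
    # Restart distribution p0: uniform over the seed nodes, zero elsewhere.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term r * p0 ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... plus (1 - r) * W p, where each node splits its mass evenly among neighbours.
        for u in nodes:
            share = (1 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Toy graph: S-x1-A (one path of length two) and S-y{1,2,3}-B (three such paths).
edges = [("S", "x1"), ("x1", "A"),
         ("S", "y1"), ("y1", "B"),
         ("S", "y2"), ("y2", "B"),
         ("S", "y3"), ("y3", "B")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

dist = shortest_path_lengths(adj, "S")
scores = random_walk_with_restart(adj, {"S"})
print(dist["A"], dist["B"])        # 2 2
print(scores["B"] > scores["A"])   # True
```

Shortest-path distance cannot separate A from B, while the RWR score of B ends up three times that of A in this graph, because three times as much probability mass flows from S toward B. For networks with tens of thousands of proteins, a sparse-matrix implementation of the same iteration would be the more practical design.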