next up previous contents
Next: Study of the neighborhood Up: Network Analysis Tools (NeAT) Previous: Comparisons between networks   Contents

Subsections

Node degree statistics

Introduction

In a graph, the degree $k$ of a node is the number of edges connected to this node. If the graph is directed, we can make a distinction between the in-degree (the number input arcs) and the out-degree (number of output arcs). In this case, the degree of the node consists in the sum of the in-degree and of the out-degree of this node.

Different nodes having different degrees, this variability is characterized by the degree distribution function $P(k)$, which gives the probability that a node has exactly $k$ edges, or, in other words gives the observed frequency of a node of degree $k$.

Scale-free graphs were first described by Barabasi based on the study of the web connectivity, followed by several different biological networks [14].

A graph is scale-free if the distribution of the vertex degree ($k$) follows a power-law distribution of the form $P(k)  k^{-\gamma}$.

The main property of such graphs is that it should have on one hand some highly connected nodes, called hubs, which are central to the network topology, and keep the network together and on the other hand a lot of poorly connected nodes linked to the hubs.

In the following, we will check if this scale free property also applies to the two-hybrid network described by Uetz et al [32] by computing the degree of each node and plotting the node degree distribution of the graph.

Analysis of the node degree distribution of a biological network

Study case

In this demonstration, we will analyze the node degree distribution of the first published yeast protein interaction network. This network is the first attemp to study the yeast interactome using the two-hybrid method and contains 865 interactions between 926 proteins [32].

Protocol for the web server

  1. In the NeATmenu, select the command node topology statistics.

    In the right panel, you should now see a form entitled ``graph-topology''.

  2. Click on the button DEMO.

    The form is now filled with a graph in the tab-delimited format, and the parameters have been set up to their appropriate value for the demonstration, i.e., the degree of all nodes will be computed. At the top of the form, you can read some information about the goal of the demo, and the source of the data.

    As this is a protein - protein interaction graph, we can consider that an interaction between a protein A with a protein B corresponds to an interaction between protein B and protein A. The graph is thus not directed.

    You can uncheck the compution of the closeness and betweenness as these statistics will not be discussed in this section and as this process will increase the computation time.

  3. Click on the button GO.

    The computation should take less than one minute. On one hand, the result page displays a link to the result file and on the other hand the graphics and raw data of the node degree distribution are also available. These will be discussed in the Interpretation of the results section.

Protocol for the command-line tools

If you have installed a stand-alone version of the NeAT distribution, you can use the program graph-topology on the command-line. This requires to be familiar with the Unix shell interface. If you don't have the stand-alone tools, you can skip this section and read the next section (Interpretation of the results).

We will now describe the use of graph-topology as a command line tool. The two two-hybrid dataset described in the previous section may be downloaded at the following address http://rsat.scmbb.ulb.ac.be/rsat/data/neat_tuto_data/. This is the file uetz_2001.tab.

  1. The first step consist in applying graph-topology on the two-hybrid dataset. To this, go into the directory where you downloaded the file uetz_2001.tab and use this command.

    	graph-topology -v 1 -i uetz_2001.tab -return degree -all -o uetz_2001_degrees.tab
    
    The file uetz_2001_degrees.tab is created and contains the degree of each node of the Uetz et al data set.

  2. In the second step, we will study the degree distribution of the nodes. To this, we use the program classfreq from the RSAT suite that compute the distribution of a set of number. As the graph we are working with is undirected, we will only compute this degree distribution for the global degree of the nodes which is the second column of the file uetz_2001_degrees.tab obtained in the previous step.

    	classfreq -i uetz_2001_degrees.tab -v 1 -col 2 -ci 1 -o uetz_2001_degrees_freq.tab
    

  3. Finally, we will display the distribution graph in the PNG format in order to visualize the degree distribution and determine if it has a scale free behaviour. The program XYgraph from RSA-tools will be used for this purpose. Note that we could use other tools like Microsoft Excel or R. The results will be stored in the file uetz_2001_degrees_freq.png that you can open with any visualization tool.

    		XYgraph -i uetz_2001_degrees_freq.tab \
    		-title 'Global node degree distribution for Uetz et al (2001) interaction graph' \
    		-xcol 2 -ycol 4,6 -xleg1 Degree -lines \
    		-yleg1 'Number of nodes' -legend -header -format png \
    		-o uetz_2001_degrees_freq.png
    

Interpretation of the results

graph-topology result file

Open the resulting file produced by graph-topology. According to the requested level of verbosity (-v # option), the file begins with some lines starting with the '#' or ';' symbols that contains some information about the graph and the description of the columns.

The results consists in a two columns data set.

  1. Node name
  2. Global degree

Note that if you used the '-directed' option, the resulting file contains 3 more columns specifying the in-degree, the out-degree and whether the node is only a source node or a target node.

Node degree distribution

Let us first have a look at the node degree distribution data file produced by the classfreq program (raw data). This file is a tab-delimited file containing 9 columns. Each line consists in a value interval. In our case, the value is the degree of the nodes.

  1. Minimal value of the interval
  2. Maximal value of the interval
  3. Central value of the interval
  4. Frequency : Number of elements in this class interval (number of nodes having a degree comprised betwee the minimal and the maximal values.
  5. Cumulative frequency.
  6. Inverse cumulative frequency
  7. Relative frequency : number of elements in this class over the total number of elements
  8. Relative cumulative frequency
  9. Inverse relative cumulative frequency

The first result line contains the distribution results for the nodes having only one neighbour (i.e. degree comprised between 1 and 2), from it we can see that 577 over 926, i.e., 62% of the nodes have a degree of one. Moreover, about 90% of the nodes have a degree lower than 4. This is indicative of the scale-free nature of the interaction network.

The figure best illustrates the scale-freeness of the graph. When looking at the graphical representation of this distribution, we can see two curves. The blue curve represents the absolute frequency and the green curve the inverse cumulative frequency. The exponential decrease of both curves shows that there are a lot more nodes poorly connected than highly connected (hubs). The Uetz graph thus presents a scale free behaviour.


next up previous contents
Next: Study of the neighborhood Up: Network Analysis Tools (NeAT) Previous: Comparisons between networks   Contents
RSAT 2009-09-04