Pathway inference

Introduction

The idea of pathway inference is to connect a given set of seed nodes in a network and thereby extract a sub-network that is optimal according to certain criteria (e.g. minimal weight or maximal relevance).
In the context of biological networks, the goal is to obtain a valid pathway for a set of biological entities of interest, e.g. genes from microarray data or compounds from metabolomic data. For instance, genes whose products participate in the same metabolic pathway are often co-expressed or grouped together in operons or regulons. We may try to reconstruct this metabolic pathway by associating the gene products with relevant reactions and connecting these reactions in a metabolic network. The resulting sub-network may be a known metabolic pathway or an unknown pathway composed of known pathways or of known reactions and compounds. In the context of microarray data, pathway inference from a set of co-expressed genes may predict which pathways are up- or down-regulated.

Inferring a pathway for a set of co-expressed genes

As an example, we take the case study discussed in [34]. In this case study, a pathway is assembled from genes in the cell-cycle regulated MET cluster [30]. Results described in this tutorial have been obtained with KEGG RPAIR version 49.0.

Protocol for the web server

  1. In the NeAT menu, select the entry Pathwayinference.

  2. Copy-paste the gene names below in the seed nodes text field:
    Met3
    Met14
    Met16
    Met5
    Met10
    Met17
    Met6

  3. Select "Genes/Enzymes" as identifier type.

  4. In the text field "Genes are from organism" type sce, the KEGG abbreviation for Saccharomyces cerevisiae.

  5. Push the GO button.

The result of mapping the given genes to KEGG RPAIRs (reactant pairs, [18]) is displayed. Since more than one reactant pair is associated with each gene, we end up with a set of reactant pair groups, one group per gene. Note that each gene (except for Met5) is associated with one or more EC numbers, each of which has been mapped to its corresponding reactions in KEGG, which have in turn been mapped to their corresponding reactant pairs.
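The chain of mappings described above (gene, then EC number, then reaction, then reactant pair) can be sketched in Python as follows. This is only an illustration: all identifiers and lookup tables are made-up placeholders rather than KEGG data, and the function name map_genes_to_rpair_groups is hypothetical, not part of NeAT.

```python
# Toy lookup tables. Every identifier below is a placeholder, not KEGG data.
GENE_TO_EC = {
    "geneA": ["EC_1"],
    "geneB": ["EC_2", "EC_3"],
    "geneC": [],  # no EC annotation: skipped in this sketch
}
EC_TO_REACTIONS = {
    "EC_1": ["R_1"],
    "EC_2": ["R_2", "R_3"],
    "EC_3": ["R_3"],
}
REACTION_TO_RPAIRS = {
    "R_1": ["RP_a", "RP_b"],
    "R_2": ["RP_c"],
    "R_3": ["RP_c", "RP_d"],
}


def map_genes_to_rpair_groups(genes):
    """Return {gene: sorted reactant pairs}, one group per mapped gene."""
    groups = {}
    for gene in genes:
        rpairs = set()
        for ec in GENE_TO_EC.get(gene, []):
            for reaction in EC_TO_REACTIONS.get(ec, []):
                rpairs.update(REACTION_TO_RPAIRS.get(reaction, []))
        if rpairs:
            groups[gene] = sorted(rpairs)
    return groups


groups = map_genes_to_rpair_groups(["geneA", "geneB", "geneC"])
for gene, rpairs in groups.items():
    print(gene, rpairs)
```

Each gene ends up with its own group of reactant pairs, which is why the mapping step yields a set of groups rather than a flat list of seeds.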

You can now select how to deal with the groups. This is a sensitive choice that strongly affects the inferred pathway and depends on your data. In general, if you keep the original groups, you implicitly assume that only a subset of the reactions associated with a given gene will be active in the pathway. If you think that all reactions associated with a gene might be active, choose "Treat each group member as a separate group" (the default treatment).

For this case study, we recommend keeping the default.
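The two group treatments can be illustrated with a minimal Python sketch. The function names keep_groups and split_groups and the reactant pair identifiers are hypothetical; the sketch only shows how the seed groups change shape, not how the web application implements the option.

```python
def keep_groups(groups):
    """Keep one seed group per gene (only some members need to be used)."""
    return [list(members) for members in groups]


def split_groups(groups):
    """Treat each group member as a separate group of size one."""
    return [[member] for members in groups for member in members]


# Placeholder reactant pair identifiers, for illustration only.
mapped = [["RP_a", "RP_b"], ["RP_c"]]
kept = keep_groups(mapped)
split = split_groups(mapped)
print(len(kept), kept)
print(len(split), split)
```

With the original groups, the inference is free to pick a subset of each gene's reactant pairs; after splitting, every reactant pair becomes a seed group in its own right.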

Push GO. In a few minutes, the result page will be displayed.

Protocol for the command-line tools

This section assumes that you have installed the RSAT/NeAT command line tools.

Pathwayinference is a web application that calls the pathwayinference web service. To reproduce on the command line the results obtained with the web application, you can run the Pathwayinference command-line tool on the networks provided in the network repository (see the Pathwayinference Manual for details). Note that the mapping of genes to reactions and the group treatment can only be done via the web application.

Type the following command in one line:

java -Xmx800m graphtools.algorithms.Pathwayinference -g RPAIRGraph_allRPAIRs_undirected.txt
	     -s 'RP00016#RP00182/RP00647/RP00561/RP00143#RP00960#RP04049/RP00096#RP00168#
	     RP04532/RP00003/RP00446/RP00946#RP00857/RP04474/RP00050#RP04533'
	     -f flat -b -y con -P -u -x 0.05
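To see how the seed argument is organised, the string passed to -s can be taken apart in Python. This sketch assumes that '#' separates seed groups and '/' separates the members of a group; that reading is inferred from the shape of the string and should be checked against the Pathwayinference manual. The function name parse_seed_string is hypothetical.

```python
# Seed string from the command above, joined into a single line.
SEEDS = (
    "RP00016#RP00182/RP00647/RP00561/RP00143#RP00960#RP04049/RP00096#RP00168#"
    "RP04532/RP00003/RP00446/RP00946#RP00857/RP04474/RP00050#RP04533"
)


def parse_seed_string(seed_string):
    """Split on '#' into groups and on '/' into members (assumed format)."""
    return [group.split("/") for group in seed_string.split("#") if group]


seed_groups = parse_seed_string(SEEDS)
for number, members in enumerate(seed_groups, start=1):
    print(number, members)
```

Building the string in a script and passing it as one argument also avoids accidental line breaks inside the quoted seed list when the command is typed by hand.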

Interpretation of the results

The resulting sub-network contains a large part of the pathway given in [34]. Note that the chosen algorithm (kWalks in combination with Takahashi & Matsuyama) may return one of several equivalent solutions, so your solution may deviate from the one described here. Despite this disadvantage, Takahashi & Matsuyama in combination with kWalks is the default algorithm, because it performed best in our evaluation. If your result deviates from the one described below, repeat the inference with the algorithm "repetitive REA".
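For readers unfamiliar with this family of algorithms, the following Python sketch shows a shortest-path heuristic in the style of Takahashi & Matsuyama: starting from one seed, it repeatedly attaches the closest remaining seed to the growing tree. It is a simplified illustration under several assumptions (edge weights instead of node weights, single seeds instead of seed groups, no kWalks relevance step) and is not the NeAT implementation. The graph and all names are hypothetical.

```python
import heapq


def undirected(edges):
    """Build a symmetric adjacency dict {node: {neighbor: weight}}."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def nearest_seed_path(graph, tree_nodes, targets):
    """Multi-source Dijkstra from the current tree to the closest target.

    Returns the path as a list of nodes (tree node first), or None.
    """
    dist = {node: 0.0 for node in tree_nodes}
    prev = {}
    heap = [(0.0, node) for node in sorted(tree_nodes)]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        if node in targets:
            path = [node]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1]
        for neighbor, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    return None


def connect_seeds(graph, seeds):
    """Grow a tree from the first seed by attaching the closest seed each round."""
    tree_nodes = {seeds[0]}
    tree_edges = set()
    remaining = set(seeds[1:]) - tree_nodes
    while remaining:
        path = nearest_seed_path(graph, tree_nodes, remaining)
        if path is None:
            break  # the remaining seeds are unreachable from the tree
        tree_edges.update(frozenset(pair) for pair in zip(path, path[1:]))
        tree_nodes.update(path)
        remaining -= tree_nodes
    return tree_nodes, tree_edges


toy = undirected([
    ("s1", "x", 1), ("x", "s2", 1), ("x", "y", 1),
    ("y", "s3", 1), ("s1", "s3", 5),
])
nodes, edges = connect_seeds(toy, ["s1", "s2", "s3"])
print(sorted(nodes))
print(sorted(sorted(edge) for edge in edges))
```

When several candidate paths have the same weight, which one is chosen depends on tie-breaking. That is one way in which a heuristic of this kind can return different, equally light solutions.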

The pathway described in the case study unites the sulfur assimilation and methionine biosynthesis pathways. It consists of the following steps (compounds connected by the EC numbers of the catalyzing enzymes):
Sulfate --(2.7.7.4)--> Adenylyl sulfate --(2.7.1.25)--> 3'-Phosphoadenylyl sulfate --(1.8.99.4)--> Sulfite --(1.8.1.2)--> Sulfide (alias hydrogen sulfide) --(4.2.99.10)--> Homocysteine --(2.1.1.14)--> L-Methionine

The matching parts of the inferred pathway are:

RP00016 3'-Phosphoadenylyl sulfate RP00446 Adenylyl sulfate RP00960
and
RP00960 Sulfite RP00168 Hydrogen sulfide RP01406 L-Homocysteine RP00096
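Of the reactant pairs shown, RP00016, RP00446, RP00960, RP00168 and RP00096 are seeds; RP01406 is not a seed and was added by the inference.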

In addition, the inferred pathway contains a branch that leads from 3'-Phosphoadenylylselenate to Adenylylselenate. This branch mirrors sulfur incorporation, but instead of sulfur, selenium is incorporated.

The presence of both the selenium and sulfur incorporation pathways in the inferred sub-network reflects the well-known fact that selenium can replace sulfur in metabolism.

This example demonstrates that, given a set of differentially expressed genes from microarray data and a metabolic network, it is possible to infer a metabolic pathway that might be affected by altered expression of the query genes.

Summary

Pathwayinference allows extraction of sub-networks from larger networks given a set of seed nodes. The web application is tailored to metabolic networks, but non-metabolic networks can be processed as well.

Strengths and Weaknesses of the approach

Strengths

  1. Sub-network extraction can be applied to any biological network.

  2. It can discover unknown pathways consisting of known components.

  3. It can be fine-tuned to favor certain nodes. For instance, in a global metabolic network, reactions/compounds known to occur in certain species might receive a weight much lower than other nodes, to favor extraction of species-specific sub-networks.

  4. Groups of seed nodes can be specified to reflect AND/OR relationships between seeds.

  5. The web application allows the inference of metabolic pathways in metabolic networks extracted from the two major metabolic databases KEGG [15] and MetaCyc [3].

  6. For metabolic networks from MetaCyc or KEGG, the web application supports compounds, reactions, reactant pairs, EC numbers or gene identifiers as seed nodes and handles the required mapping of these seeds to reactions, reactant pairs and compounds.

  7. For metabolic networks from MetaCyc or KEGG, the web application maps the inferred sub-network to known pathways stored in MetaCyc or KEGG, respectively.

  8. Metabolic sub-network extraction has been validated on 71 metabolic pathways extracted from MetaCyc.

Weaknesses

  1. In general, the accuracy of pathway inference depends on the quality of the given network and the number of seeds available.

  2. Spiral-shaped metabolic pathways such as fatty acid biosynthesis can only be partly inferred.

  3. In the densely connected regions of metabolic networks, metabolic pathway inference cannot reliably distinguish between alternative pathways unless a large number of seed nodes is provided.

  4. The algorithms are too time-consuming to estimate p-values on the fly by computing a score distribution (where the score would be the sub-network weight) for randomly chosen seed nodes. We plan to pre-compute these distributions for the pre-loaded networks.

  5. Only one sub-network is suggested. We plan to compute a list of sub-networks ranked by their weight.
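The p-value estimation mentioned in weakness 4 can be sketched as follows. The sketch assumes that a lower sub-network weight is better and that a scoring function is available. Here a trivial stand-in replaces the actual inference: on a path graph with unit edge weights, the lightest sub-network connecting a seed set has weight max(seeds) - min(seeds). The function names and the toy network are hypothetical.

```python
import random


def empirical_p_value(observed, score, nodes, n_seeds, n_samples=1000, rng=None):
    """Estimate how often random seed sets score at least as well as observed.

    Lower scores are assumed to be better (lighter sub-networks).
    """
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(nodes, n_seeds)
        if score(sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_samples + 1)


def path_graph_score(seeds):
    """Weight of the lightest connecting sub-network on a unit-weight path graph."""
    return max(seeds) - min(seeds)


nodes = list(range(50))  # toy path graph 0-1-...-49
observed = path_graph_score([10, 11, 12])
p = empirical_p_value(observed, path_graph_score, nodes, n_seeds=3, n_samples=2000)
print(p)
```

In a real setting, each call to the scoring function would be a full pathway inference. This is why the score distributions would have to be pre-computed rather than sampled on the fly.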

Troubleshooting

  1. Pathwayinference parameter error.

    You provided insufficient or invalid parameters. Please check the pathwayinference manual page.

  2. You did not specify enough valid seed node groups! Pathwayinference needs at least two valid seed node groups.

    For the pre-loaded metabolic networks from KEGG and MetaCyc, each seed is mapped to data (e.g. compound/reaction identifiers, EC numbers) from these two databases. Seeds that do not map to anything are considered invalid. At least two valid seed groups are needed to infer a network.

  3. The node with identifier ID is not part of the input graph.

    Make sure that your input network contains the node with the given identifier.

  4. Pathwayinference failed to extract a subgraph.

    None of the seed node groups could be connected to any other seed node group. Each group might belong to a separate component of the input network, or mutual exclusion (in RPAIR networks) might prevent the connection of the seed groups.
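Several of the errors above can be anticipated with a simple check on a custom network before submitting it. The following Python sketch is a hypothetical helper, not part of NeAT. It assumes an undirected graph stored as a symmetric adjacency dictionary, and it reports seeds missing from the graph, fewer than two valid seed groups, and groups that share no connected component with any other group. It does not model mutual exclusion in RPAIR networks.

```python
def connected_components(graph):
    """Map each node to a component number using iterative depth-first search."""
    component = {}
    label = 0
    for start in graph:
        if start in component:
            continue
        component[start] = label
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in graph[node]:
                if neighbor not in component:
                    component[neighbor] = label
                    stack.append(neighbor)
        label += 1
    return component


def check_seed_groups(graph, seed_groups):
    """Return a list of detected problems (empty if none was found)."""
    problems = []
    valid = []
    for group in seed_groups:
        for node in group:
            if node not in graph:
                problems.append(f"node {node} is not part of the input graph")
        present = [node for node in group if node in graph]
        if present:
            valid.append(present)
    if len(valid) < 2:
        problems.append("fewer than two valid seed node groups")
        return problems
    component = connected_components(graph)
    for index, group in enumerate(valid):
        own = {component[node] for node in group}
        others = {
            component[node]
            for other_index, other in enumerate(valid)
            if other_index != index
            for node in other
        }
        if not own & others:
            problems.append(f"group {group} shares no component with any other group")
    return problems


# Toy network with two components: a-b-c and d-e.
toy = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"e"}, "e": {"d"},
}
problems = check_seed_groups(toy, [["a"], ["c", "zz"], ["d"]])
for problem in problems:
    print(problem)
```

An empty result does not guarantee that the inference will succeed; it only rules out the causes covered by the check.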


RSAT 2009-09-04