

Metabolic path finding

Introduction

The metabolic pathfinder enumerates metabolic pathways between a set of start nodes and a set of end nodes, where start and end nodes may be compounds, reactions or enzymes (which are mapped to the reactions they catalyze). With suitable parameters (the default values are chosen for this purpose), the pathways found are biochemically relevant with high probability.

The accuracy of path finding in metabolic networks (as in other biological networks) is diminished by the presence of hub nodes (highly connected compounds such as ATP, NADPH or CO2) in the network. Path finding algorithms will traverse the network preferentially via the hub nodes, thereby inferring biochemically irrelevant pathways. Different strategies have been devised to overcome this problem. Arita introduced the mapping and tracing of atoms from substrates to products [1]. This strategy is also applied in the Pathway Hunter Tool available at http://pht.tu-bs.de/PHT/. Other tools rely on rules to avoid hub nodes, e.g. the pathway prediction system at UMBBD (http://umbbd.msi.umn.edu/predict/). Didier Croes et al. used weighted graphs to avoid highly connected nodes [5],[6]. The functionality of Didier Croes' tool is covered by the metabolic pathfinder (with the weighted reaction network).
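
The effect of weighting can be illustrated with a minimal sketch. It is not the implementation used by the metabolic pathfinder: the toy network, the node names and the choice of node degree as the cost of entering a node are all assumptions made for illustration. Without weights, the lightest path from A to D takes the shortcut through the hub; with degree-based weights, the longer route through B and C becomes lighter.

```python
import heapq

def lightest_path(graph, source, target, weighted=True):
    """Dijkstra search in which entering a node costs its degree (or 1 if unweighted)."""
    def cost(node):
        return len(graph[node]) if weighted else 1

    queue = [(0, source, [source])]
    settled = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour in sorted(graph[node]):
            if neighbour not in settled:
                heapq.heappush(queue, (dist + cost(neighbour), neighbour, path + [neighbour]))
    return None

# Hypothetical undirected toy network: HUB offers a two-step shortcut from A to D.
edges = [("A", "HUB"), ("HUB", "D"), ("A", "B"), ("B", "C"), ("C", "D"),
         ("HUB", "X1"), ("HUB", "X2"), ("HUB", "X3"), ("HUB", "X4")]
toy = {}
for u, v in edges:
    toy.setdefault(u, set()).add(v)
    toy.setdefault(v, set()).add(u)

unweighted_result = lightest_path(toy, "A", "D", weighted=False)  # goes through HUB
weighted_result = lightest_path(toy, "A", "D", weighted=True)     # goes through B and C
```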

Metabolic pathfinder relies on a mixed strategy. On the one hand, it uses weighted graphs to avoid irrelevant hub nodes. On the other hand, it integrates KEGG RPAIR annotation [18] so that, for each traversed reaction, main compounds are favored over side compounds. KEGG RPAIR is a database that divides reactions into reactant pairs (substrate-product pairs) and classifies the reactant pairs according to their role in the reaction. For instance, the cofac reactant pair A00001 couples NADP+ with NADPH. Main reactant pairs connect main compounds and should be traversed preferentially by path finding algorithms.

The KEGG RPAIR annotation is integrated by constructing the undirected RPAIR network, which consists of 7,058 reactant pairs, 4,297 compounds and 14,116 edges for KEGG version 41.0. Alternatively, two other networks are available: the directed reaction network evaluated in [6] and an undirected reaction-specific RPAIR network, in which each reaction is divided into its reactant pairs.
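
The edge count quoted above is exactly twice the number of reactant pairs (2 × 7,058 = 14,116), which is what one would expect if each reactant pair is a node joined to the two compounds it couples. The following sketch builds such a network under that assumption. The function name and the data layout are hypothetical; the two reactant pairs and their compounds are taken from the path listings later in this chapter.

```python
def build_rpair_network(reactant_pairs):
    """Return an undirected adjacency map with one node per reactant pair and per compound."""
    adjacency = {}
    for pair_id, compounds in reactant_pairs.items():
        for compound in compounds:
            # Each reactant pair contributes one edge to each of its two compounds.
            adjacency.setdefault(pair_id, set()).add(compound)
            adjacency.setdefault(compound, set()).add(pair_id)
    return adjacency

pairs = {
    "A00386": ("Pregnenolone", "Progesterone"),
    "A02045": ("Progesterone", "11-Deoxycorticosterone"),
}
network = build_rpair_network(pairs)

# Every undirected edge is stored twice in the adjacency map, hence the division by 2.
edge_count = sum(len(neighbours) for neighbours in network.values()) // 2
print(edge_count == 2 * len(pairs))
```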

Note that in more recent KEGG versions, identifiers of reactant pairs start with RP instead of A.

In this chapter, we will recover the aldosterone pathway, first using the RPAIR network and then using the reaction network. Note that the study case was carried out with data from KEGG LIGAND version 41.0. Results might differ for more recent KEGG versions.

Enumerating metabolic pathways between compounds, reactions or enzymes

Study case

Aldosterone is a human steroid hormone involved in the regulation of ion uptake in the kidney and of blood pressure. It is synthesized from progesterone. We aim to recover the aldosterone biosynthesis pathway by providing its start and end reaction.

Protocol for the web server

  1. In the NeAT menu, select the entry Metabolic path finding.

    In the right panel, you should now see a form entitled "Metabolic pathfinder".

  2. Click on the button DEMO2 located at the bottom of the form.

    The metabolic pathfinder form is now filled with the start and end reaction of the aldosterone biosynthesis pathway. In addition, information on this pathway is displayed.

  3. Click on the button GO.

  4. The seed node selection table appears.

    This table lists, for each reaction, the reactant pair identifier(s) associated with it. Note that reaction R02724 is associated with two reactant pairs.

    If you provided a partial compound name, the seed node selection form allows you to select the correct compound among all compounds matching your query string. If you give KEGG compound identifiers, it displays the name of each compound. For EC numbers, it lists associated reactions or reactant pairs. The seed node selection form also warns you if you provide problematic identifiers.

  5. Click on the button GO.

    The computation should take no more than one minute.

    Then, a table is displayed, which lists the found paths in the order of their weight. The table may be sorted according to other criteria by clicking the respective column header. Each path node is linked to its corresponding KEGG entry for easy inspection of results.

    If you set Output format in the metabolic pathfinder form to "Graph", you obtain an image of the inferred pathway generated by the program dot of the graphviz tool suite and a link to the pathway in the selected graph format.

To see how results change with the choice of the graph, you can repeat steps 1 and 2. In the metabolic path finding form, select Reaction graph instead of RPAIR graph (which is selected by default) and follow steps 3 to 5. You will notice in the seed node selection form that the reaction identifiers are no longer mapped to reactant pairs.

Protocol for the command-line tools

This section assumes that you have installed the RSAT/NeAT command line tools.

The metabolic pathfinder is a web application built on top of Pathfinder. You may run metabolic path finding on the command line by launching the Pathfinder command-line tool on the RPAIR and reaction networks, which are provided in the KEGG graph repository reachable from the metabolic pathfinder manual page.

Type the following command in one line to find paths in the RPAIR network:

	java -Xmx800m graphtools.algorithms.Pathfinder -g RPAIRGraph_allRPAIRs_undirected.txt -f flat
	     -s 'A02437' -t 'A02894' -b -y rpairs

To repeat path finding in the reaction network, type in one line:

	java -Xmx800m graphtools.algorithms.Pathfinder -g ReactionGraph_directed.txt -d -f flat
	     -s 'R02724>/R02724<' -t 'R03263>/R03263<' -b -y con
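
Because both commands must be entered on a single line, it can be convenient to assemble them programmatically. The sketch below only builds and prints the command; it does not run Java. The helper name and its parameter names are hypothetical, and the flags and values are copied unchanged from the two commands above without interpreting them.

```python
import shlex

def pathfinder_command(graph_file, source, target, y_value, directed=False, memory="800m"):
    """Assemble the Pathfinder call as an argument list (flags copied from the commands above)."""
    args = ["java", f"-Xmx{memory}", "graphtools.algorithms.Pathfinder", "-g", graph_file]
    if directed:
        args.append("-d")  # present only in the reaction network command
    args += ["-f", "flat", "-s", source, "-t", target, "-b", "-y", y_value]
    return args

rpair_cmd = pathfinder_command("RPAIRGraph_allRPAIRs_undirected.txt",
                               "A02437", "A02894", "rpairs")
reaction_cmd = pathfinder_command("ReactionGraph_directed.txt",
                                  "R02724>/R02724<", "R03263>/R03263<", "con",
                                  directed=True)

# shlex.join quotes arguments containing shell metacharacters such as > and <.
print(shlex.join(rpair_cmd))
print(shlex.join(reaction_cmd))
```

An argument list of this kind can be passed to subprocess.run without going through a shell, in which case no quoting is needed.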

Interpretation of the results

Metabolic path finding in the RPAIR network

The path of first rank does not reproduce the annotated pathway exactly. Instead, it suggests a deviation via 21-hydroxypregnenolone, bypassing progesterone. This path might be a valid alternative, as it appears on the KEGG map for C21-Steroid hormone metabolism in human. One of the two second-ranked paths corresponds to the annotated pathway.

First ranked path:
A02437 (1.14.15.6) Pregnenolone A03407 (1.14.99.10) 21-Hydroxypregnenolone A00731 (1.1.1.145, 5.3.3.1) 11-Deoxycorticosterone A03469 (1.14.15.4) Corticosterone A02893 (1.14.15.5) 18-Hydroxycorticosterone A02894

Second ranked paths:
A02437 (1.14.15.6) Pregnenolone A00386 (1.1.1.145, 5.3.3.1) Progesterone A02045 (1.14.99.10) 11-Deoxycorticosterone A03469 (1.14.15.4) Corticosterone A02893 (1.14.15.5) 18-Hydroxycorticosterone A02894

A02437 (1.14.15.6) Pregnenolone A00386 (1.1.1.145, 5.3.3.1) Progesterone A02047 (1.14.15.4) 11beta-Hydroxyprogesterone A03467 (1.14.99.10) Corticosterone A02893 (1.14.15.5) 18-Hydroxycorticosterone A02894
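
The listings alternate reactant pair identifiers, EC numbers in parentheses and compound names. To see where two paths deviate, the compound names can be extracted and compared. The following is a minimal sketch that assumes this flat text layout, compound names without spaces and reactant pair identifiers of the form A followed by five digits; the function name is hypothetical.

```python
import re

def compounds_on_path(path_line):
    """Return the compound names of a path listing, in order of traversal."""
    without_ec = re.sub(r"\([^)]*\)", " ", path_line)  # drop EC number groups
    return [token for token in without_ec.split()
            if not re.fullmatch(r"A\d{5}", token)]     # drop reactant pair identifiers

first = compounds_on_path(
    "A02437 (1.14.15.6) Pregnenolone A03407 (1.14.99.10) 21-Hydroxypregnenolone "
    "A00731 (1.1.1.145, 5.3.3.1) 11-Deoxycorticosterone A03469 (1.14.15.4) Corticosterone "
    "A02893 (1.14.15.5) 18-Hydroxycorticosterone A02894")
second = compounds_on_path(
    "A02437 (1.14.15.6) Pregnenolone A00386 (1.1.1.145, 5.3.3.1) Progesterone "
    "A02045 (1.14.99.10) 11-Deoxycorticosterone A03469 (1.14.15.4) Corticosterone "
    "A02893 (1.14.15.5) 18-Hydroxycorticosterone A02894")

print(set(first) - set(second))  # compounds only on the first-ranked path
print(set(second) - set(first))  # compounds only on this second-ranked path
```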

Metabolic path finding in the reaction network

The paths of first and second rank traverse a side compound, namely adrenal ferredoxin. None of these paths is therefore biochemically valid. In the weighted reaction graph all highly connected side compounds such as ATP and water are avoided. However, adrenal ferredoxin is a rare side compound, thus weighting is not sufficient to bypass it.

First ranked path:
R02724< Reduced adrenal ferredoxin R03262> 18-Hydroxycorticosterone R03263>

Second ranked paths:

R02724> Oxidized adrenal ferredoxin R02726< Reduced adrenal ferredoxin R03262> 18-Hydroxycorticosterone R03263>

R02724> Oxidized adrenal ferredoxin R02725< Reduced adrenal ferredoxin R03262> 18-Hydroxycorticosterone R03263>

Summary

The metabolic path finder finds the k shortest paths in metabolic networks constructed from KEGG LIGAND and KEGG RPAIR. It is coupled with a mirror of the KEGG database to allow quick identification of partial compound names and to annotate results.
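
As a reminder of what k shortest path finding means, the sketch below enumerates the k lightest simple paths of a small weighted graph by best-first expansion of partial paths. This naive enumeration is chosen for brevity and is not claimed to be the algorithm implemented in Pathfinder; the function name and the toy graph are assumptions. Paths of equal weight share a rank, as in the two second-ranked paths of the study case.

```python
import heapq

def k_lightest_paths(graph, source, target, k):
    """Return up to k simple paths as (weight, path), in order of non-decreasing weight."""
    found = []
    queue = [(0, [source])]
    while queue and len(found) < k:
        weight, path = heapq.heappop(queue)
        node = path[-1]
        if node == target:
            # Weights are non-negative, so no path still in the queue can be lighter.
            found.append((weight, path))
            continue
        for neighbour, edge_weight in graph.get(node, {}).items():
            if neighbour not in path:  # keep paths simple (no repeated nodes)
                heapq.heappush(queue, (weight + edge_weight, path + [neighbour]))
    return found

# Hypothetical directed toy graph: node -> {neighbour: edge weight}
toy_graph = {"s": {"a": 1, "b": 2}, "a": {"t": 1, "b": 1}, "b": {"t": 1}, "t": {}}
ranked = k_lightest_paths(toy_graph, "s", "t", 3)
for weight, path in ranked:
    print(weight, " ".join(path))
```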

Strengths and Weaknesses of the approach

Strengths

The metabolic path finder has the following benefits compared to other metabolic path finding tools:
  1. It has been extensively evaluated on 55 reference pathways from three organisms.

  2. It supports compounds, reactions, reactant pairs and EC numbers as seed nodes.

  3. It can handle sets of start and end nodes.

Weaknesses

The metabolic path finding tool has the following weaknesses:

  1. RPAIR does not cover all compounds in KEGG. Thus, the RPAIR network is less comprehensive than the reaction network.

  2. By default, the metabolic path finder cannot infer directions of reactions in pathways because of the way the networks were constructed (being undirected or treating all reactions as reversible). However, custom metabolic networks may contain irreversible reactions and it is therefore possible to infer directed pathways from custom networks.

  3. The metabolic path finder can only partly infer cyclic pathways or pathways in which the same enzymes act repeatedly on a growing chain.

Troubleshooting

  1. A Parameter error occurred.

    By default, the optimal parameter values are set. However, if you set your own values, they might not be in the supported value range. Check the Metabolic path finder manual.

  2. The seed node selection form displays the message: "You provided invalid identifier(s)!"

    This occurs when you provide identifiers that do not match any KEGG identifier, EC number or KEGG compound name. Check your identifiers or, if you provided a compound name, check whether the compound is present in KEGG.

  3. The seed node selection form displays the message: "The given compound is not part of the sub-reaction graph."

    As stated in the Weaknesses section, the RPAIR network does not contain all KEGG compounds due to incomplete coverage of the RPAIR database. Try to search paths for this compound in the reaction network.

  4. No path could be found.

    This may happen in the RPAIR network because in this network reactant pairs belonging to the same reaction exclude each other. Try the reaction-specific RPAIR network or the reaction network instead.

  5. An out of memory error occurred.

    This may occur when requesting a large number of paths with the reactant subreaction and compound weighting schemes set to unweighted. In general, when setting the weighting schemes to unweighted, biochemically irrelevant paths will be returned. Use another weighting scheme or reduce the number of requested paths to avoid this error.


RSAT 2009-09-04