Content
Given a number of source and a number of target nodes, enumerate the k lightest paths connecting any of the
source nodes with any of the target nodes in the order of their weight.
Paths are simple, that is no node can occur more than once in the path.
Credits
Pathfinder is a Java wrapper around REA written by Jimenez and Marzal in C [1]. It has been
inspired by a Python wrapper written by Pierre Schaus and Jean-Noël Monette at UCL (INGI).
See the Cytoscape plugins by Pierre Schaus and Jean-Noël Monette for their k shortest paths tool.
Note that Pierre Schaus and Jean-Noël Monette also modified the REA code.
The multiple-to-multiple end path finding relies on a graph transformation suggested
by Olivier Hubaut (former aMAZE team member) and also described in [2]. Pathfinder makes use of a graph library
developed by the former aMAZE team.
Input graph
graph
The input graph can be either in tab-delimited or gml format.
graph file
A file containing a graph in either tab-delimited or gml format.
graph id
The identifier of a graph that has been submitted and stored on
the server. This allows to avoid re-submitting large graphs.
The identifier is assigned by Pathfinder. For storage of graphs
on the server side, see below.
Warning: The weight of already submitted graphs cannot be changed!
Directed
Check the "directed" option to signal that the input graph is directed.
This option is important for tab-delimited format, because this format,
in contrast to gml, does not specifiy whether a graph is directed or undirected.
Results may greatly vary between path finding done on the same input graph treated as
undirected or as directed!
Storage of input graphs on server
To submit large input graphs can take time. In addition, Pathfinder generates
temp files each time it is called. Re-using a graph and its associated
temp files saves time. This is why input graphs can be stored on the server.
Select this option and save the graph identifier that Pathfinder returns.
The next time you want to work on the same graph, enter its
identifier in the graph id field.
Although the input graphs themselves are stored an infinite time when this option has been set,
result files generated from them are only available during three days.
Input nodes
Sources and targets
If several source and/or target node identifiers are given, they are separated by '/'.
Example:
Source nodes: a/b/c
Target nodes: d
Batchfile
A batchfile allows to do several path finding tasks in a row. It is a collection
of sources and targets, which are specified in a tab-delimited file. Source nodes
are assigned to a group containing START in its name and target nodes are assigned to a group
containing END. The START and END groups are assigned to experiment groups, whose names can be chosen
freely. Thus, each experiment is defined by a start group and an end group, which in turn consist of
start nodes and end nodes.
Comments in the batchfile are preceded by #.
# example for batchfile describing two path finding experiments
# the first experiment has source R04511 and target C00191
# the second experiment has source R04198 and targets C00080 and C00681
R04511 START1
C00191 END1
R04198 START2
C00080 END2
C00681 END2
START1 EXP1
END1 EXP1
START2 EXP2
END2 EXP2
Graph formats
Preface
The tab-delimited format is simpler and therefore faster to transfer than
the gml format. Warning: For tab-delimited format, the user needs
to specify whether or not the graph is directed.
See the option described above to do this.
Warning: The tab-delimited format with a node part is restricted to Pathfinder
and cannot be read in by other NeAT tools.
Tab-delimited
- The tab-delimited format in its most simple form is a list of arcs.
Optionally, weights can be set as third column. Example:
; example tab-delimited graph as arc list with weights on arcs
a b 1.1
b c 2.4
c a 3
- If one wants to give nodes (i.e. to include orphan nodes or to set node attributes),
the node identifiers can be set as one column, along with their weights. Arcs
are separated from nodes by a line starting with ;ARCS. Example:
; example tab-delimited graph with nodes
a
b
c
d
;ARCS
a b 1.1
b c 2.4
c a 3
- Additional node or arc attribute values along with the attribute name can be set
by setting tab-delimited node or arc attribute headers. Example:
; example tab-delimited graph with nodes having values for a color attribute
;NODES color
a yellow
b blue
c blue
d yellow
;ARCS
a b 1.1
b c 2.4
c a 3
Attributes can also be set on the arcs. Example:
; example tab-delimited graph with nodes having values for a color attribute
; and arcs having values for a probability attribute
;NODES color
a yellow
b blue
c blue
d yellow
;ARCS probability
a b 0.9 1.1
b c 0.5 2.4
c a 0.1 3
# is an alternative comment symbol. If not specified otherwise, the last node or
arc column is always treated as weight column.
Warning: The symbol -- is NOT treated as comment symbol!
GML
The gml format is a generic format allowing to store data as a tree of objects
consisting of attribute and value pairs. It is widely used for describing graphs.
It allows to set attributes on nodes, arcs and the graph itself. See here
for more information on this format.
Options
Rank
This option specifies how many paths should be found in terms of weight levels. If for example rank is set to three,
Pathfinder attempts to find three increasing weight levels, where each weight level may contain several paths.
Weight
Three options are available.
- unit: This option sets a weight of one on each node.
- degree: This option sets a weight equal to its degree on each node.
- as given in input graph: This option should be set to use weights given in the graph.
Warning: Weights should be set on the arcs/edges as values of the attribute "Weight".
Example for edge in tab-delimited format:
1 2 3.22
or
;ARCS Weight
1 2 3.22
Example for edge in gml format:
edge[ source 1 target 2 Weight 3.22]
Node weights are transformed into edge/arc weights by taking the mean of
the head and tail node weights. The k shortest paths algorithm (see Credits)
expects arc weights as input.
Identifiers of nodes to exclude
This option allows to post-filter paths in order to keep only those that do not contain the given
node identifiers.
Identifiers of nodes to include
This option allows to post-filter paths in oder to keep only those that contain the given
node identifiers.
Maximal Weight
This option sets the maximal weight a path can have. The weight of a path is the sum of its edges.
Maximal Length
This option sets the maximal length a path can have. The length of a path is defined as its number of nodes.
Minimal Length
This option sets the minimal length a path should have.
Exclusion attribute
This option allows to specify the name of the exclusion attribute.
The exclusion attribute is set if certain nodes should not appear together in one path.
Nodes sharing the same value for the exclusion attribute are treated as mutually exclusive.
Metabolic
A metabolic graph is defined as follows: It has values on its nodes for two attributes, namely "ObjectType"
and the given exclusion attribute. The "ObjectType" attribute has two values: one for compound nodes ("Compound") and one for reaction nodes ("Reaction").
In addition, the graph is directed.
This definition follows from our experience that metabolic graphs should be represented as directed, bipartite graphs, where
forward and reverse direction of a reaction are mutually exclusive (see publications).
Algorithm
You can choose between REA (the default, see [1]) or backtracking
developed by Fabian Couche and Didier Croes (see publications).
You can use backtracking only on metabolic graphs and only with a pre-defined weighting scheme.
Backtracking treats the input graph as directed and sets weights as follows: compound nodes receive their degree
as weight and reaction nodes receive a weight of one. Only one start and one end node can be given.
Output
Preface
In general, result files are stored no longer than three days on the server,
so make sure you download them on time, if you need them.
Pathfinder unifies all paths of equal weight into one pathway. Thus, a pathway
does not need to be linear.
Because Pathfinder treats paths of equal weight as one pathway, the option "rank"
is interpreted as the requested maximal weight level. Pathfinder attempts to find as many
weight levels (with any number of paths belonging to a given weight level)
as ranks have been specified.
Of course, if less than the requested number of paths
is present in the graph, Pathfinder cannot return these paths.
If a graph is requested as Output type, the nodes have values for the
"Path_Rank" attribute, to distinguish different paths in the graph.
Nodes and edges/arcs belonging to a path are colored in gold (that is they
have a value "gold" for attribute color and value "#FFD700" for attribute rgb_color).
The following output types are available:
-
table of paths
Pathfinder returns the requested number of lightest paths ranked according to their weight in a table.
The table can be converted into a graph.
-
input graph with paths highlighted
Pathfinder returns the input graph with paths highlighted in another color.
The modified input graph can be input of further analysis steps. Its format can be
specified by setting output format either to "tab-delimited" or to "gml".
-
each path as a separated component of a single graph
Pathfinder returns the requested number of weight levels (see rank) as separated
components of one graph in the desired output format (gml or tab-delimited). The graph can be
input of further analsysis steps. Each component merges all paths of equal weight.
-
paths unified into one graph
Pathfinder returns the paths unified into one graph, which is returned in the desired
output format ("gml" or "tab-delimited"). The graph can be input of further analsysis steps.
Email
You can optionally specify an email address, so Pathfinder results can be sent to you by email.
This is recommended for very large graphs or batch files, which might require a computation
time above the server timeout.
If you leave the email field empty, results will be displayed in the browser.
The next steps panel allows you to input the result of Pathfinder into one of the listed tools without copy-pasting
the result into that tool. The next steps panel will only appear if you request a graph output.
Format limitations:
- Pathfinder cannot deal with multi-edges (several edges between two nodes) or hyper-edges (edges between more than two nodes).
- Pathfinder does not process correctly mixed graphs (mixtures of undirected and directed graphs).
Algorithmic limitations:
- REA does not allow a graph with negative-length cycles reachable from the start nodes. Therefore, negative weights on arcs should be used carefully, if at all.
Runtime:
- Runtime depends strongly on the size of the graph and the number of paths asked. A timeout of 10 minutes has been set.
We tested REA on metabolic graphs up to 44,000 arcs. With degree weighting scheme and paths requested up to second rank, the algorithm usually completed within one minute.
Be aware that for unit weighting scheme, runtime can be much longer, because many more paths of equal weight may exist.
In this case, the runtime may exceed the server timeout.
Whenever a timeout occurs, you might try to repeat the search with your email address set. This will circumvent the server timeout.
Pathfinder does not use the usual RSAT Web Service address!
Check the NeAT web services page.
Literature
- Jimenez, V.M., and Marzal, A. (1999). "Computing the K Shortest Paths: a New Algorithm and an Experimental Comparison", Proc. 3rd Int. Worksh. Algorithm Engineering (WAE 1999)
- Duin, C.W., Volgenant, A., and Voß, S. (2004). "Solving group Steiner problems as Steiner problems." European Journal of Operational Research 154, 323-329.