Path and Traversal

Path and Traversal

LightGraphs.jl provides several traversal and shortest-path algorithms, along with various utility functions. Where appropriate, edge distances may be passed in as a matrix of real number values.

Edge distances for most traversals may be passed in as a sparse or dense matrix of values, indexed by [src,dst] vertices. That is, distmx[2,4] = 2.5 assigns the distance 2.5 to the (directed) edge connecting vertex 2 and vertex 4. Note that also for undirected graphs distmx[4,2] has to be set.

Default edge distances may be passed in via the

DefaultDistance

An array-like structure that provides distance values of 1 for any src, dst combination.

source

structure.

Any graph traversal will traverse an edge only if it is present in the graph. When a distance matrix is passed in,

  1. distance values for undefined edges will be ignored, and
  2. any unassigned values (in sparse distance matrices), for edges that are present in the graph, will be assumed to take the default value of 1.0.
  3. any zero values (in sparse/dense distance matrices), for edges that are present in the graph, will instead have an implicit edge cost of 1.0.

Graph Traversal

Graph traversal refers to a process that traverses vertices of a graph following certain order (starting from user-input sources). This package implements three traversal schemes:

LightGraphs.bfs_treeFunction.
bfs_tree(g, s[; dir=:out])

Provide a breadth-first traversal of the graph g starting with source vertex s, and return a directed acyclic graph of vertices in the order they were discovered. If dir is specified, use the corresponding edge direction (:in and :out are acceptable values).

source
bfs_tree(LevelSynchronousBFS(), g, s, nv)

Provide a parallel breadth-first traversal of the graph g starting with source vertex s, and return a directed acyclic graph of vertices in the order they were discovered using a frontier based parallel approach.

Implementation Notes

This function uses @threads for parallelism which depends on the JULIA_NUM_THREADS environment variable to decide the number of threads to use. Refer @threads documentation for more details. This function is a high level wrapper around LightGraphs.bfs_tree!; use that function for more performance.

source
LightGraphs.bfs_tree!Function.
bfs_tree!(LevelSynchronousBFS(), g, src, parents)

Provide a parallel breadth-first traversal of the graph g starting with source vertex s, and return a parents array. The returned array is an Array of Atomic integers.

Implementation Notes

This function uses @threads for parallelism which depends on the JULIA_NUM_THREADS environment variable to decide the number of threads to use. Refer @threads documentation for more details.

source
LightGraphs.dfs_treeFunction.
dfs_tree(g, s)

Return an ordered vector of vertices representing a directed acylic graph based on depth-first traversal of the graph g starting with source vertex s.

source
maximum_adjacency_visit(g[, distmx][, log][, io])

Return the vertices in g traversed by maximum adjacency search. An optional distmx matrix may be specified; if omitted, edge distances are assumed to be 1. If log (default false) is true, visitor events will be printed to io, which defaults to STDOUT; otherwise, no event information will be displayed.

source
bfs_parents(g, s[; dir=:out])

Perform a breadth-first search of graph g starting from vertex s. Return a vector of parent vertices indexed by vertex. If dir is specified, use the corresponding edge direction (:in and :out are acceptable values).

Performance

This implementation is designed to perform well on large graphs. There are implementations which are marginally faster in practice for smaller graphs, but the performance improvements using this implementation on large graphs can be significant.

source
LightGraphs.has_pathFunction.
has_path(g::AbstractGraph, u, v; exclude_vertices=Vector())

Return true if there is a path from u to v in g (while avoiding vertices in exclude_vertices) or u == v. Return false if there is no such path or if u or v is in excluded_vertices.

source
LightGraphs.diffusionFunction.
diffusion(g, p, n)

Run diffusion simulation on g for n steps with spread probabilities based on p. Return a vector with the set of new vertices reached at each step of the simulation.

Optional Arguments

  • initial_infections=sample(vertices(g), 1): A list of vertices that

are infected at the start of the simulation.

  • watch=Vector(): While simulation is always run on the full graph,

specifying watch limits reporting to a specific set of vertices reached during the simulation. If left empty, all vertices will be watched.

  • normalize=false: if false, set the probability of spread from a vertex $i$ to

each of the outneighbors of $i$ to $p$. If true, set the probability of spread from a vertex $i$ to each of the outneighbors of $i$ to $\frac{p}{outdegreee(g, i)}$.

source
diffusion_rate(results)
diffusion_rate(g, p, n; ...)

Given the results of a diffusion output or the parameters to the diffusion simulation itself, (run and) return the rate of diffusion as a vector representing the cumulative number of vertices infected at each simulation step, restricted to vertices included in watch, if specified.

source
LightGraphs.mincutFunction.
mincut(g, distmx=weights(g))

Return a tuple (parity, bestcut), where parity is a vector of integer values that determines the partition in g (1 or 2) and bestcut is the weight of the cut that makes this partition. An optional distmx matrix may be specified; if omitted, edge distances are assumed to be 1.

source

Random walks

LightGraphs includes uniform random walks and self avoiding walks:

randomwalk(g, s, niter)

Perform a random walk on graph g starting at vertex s and continuing for a maximum of niter steps. Return a vector of vertices visited in order.

source
non_backtracking_randomwalk(g, s, niter)

Perform a non-backtracking random walk on directed graph g starting at vertex s and continuing for a maximum of niter steps. Return a vector of vertices visited in order.

source
LightGraphs.sawFunction.
saw(g, s, niter)

Perform a self-avoiding walk on graph g starting at vertex s and continuing for a maximum of niter steps. Return a vector of vertices visited in order.

source

Connectivity / Bipartiteness

Graph connectivity functions are defined on both undirected and directed graphs:

is_connected(g)

Return true if graph g is connected. For directed graphs, return true if graph g is weakly connected.

source
is_strongly_connected(g)

Return true if directed graph g is strongly connected.

source
is_weakly_connected(g)

Return true if the graph g is weakly connected. If g is undirected, this function is equivalent to is_connected(g).

source
connected_components(g)

Return the connected components of an undirected graph g as a vector of components, with each element a vector of vertices belonging to the component.

For directed graphs, see strongly_connected_components and weakly_connected_components.

source
strongly_connected_components(g)

Compute the strongly connected components of a directed graph g.

Return an array of arrays, each of which is the entire connected component.

Implementation Notes

The order of the components is not part of the API contract.

source
weakly_connected_components(g)

Return the weakly connected components of the graph g. This is equivalent to the connected components of the undirected equivalent of g. For undirected graphs this is equivalent to the connected components of g.

source
has_self_loops(g)

Return true if g has any self loops.

source
attracting_components(g)

Return a vector of vectors of integers representing lists of attracting components in the directed graph g.

The attracting components are a subset of the strongly connected components in which the components do not have any leaving edges.

source
is_bipartite(g)

Return true if graph g is bipartite.

source
bipartite_map(g)

For a bipartite graph g, return a vector c of size $|V|$ containing the assignment of each vertex to one of the two sets ($c_i == 1$ or c_i == 2`). Ifg` is not bipartite, return an empty vector.

Implementation Notes

Note that an empty vector does not necessarily indicate non-bipartiteness. An empty graph will return an empty vector but is bipartite.

source
biconnected_components(g)

Compute the biconnected components of an undirected graph gand return a vector of vectors containing each biconnected component.

Performance: Time complexity is $\mathcal{O}(|V|)$.

source
condensation(g[, scc])

Return the condensation graph of the strongly connected components scc in the directed graph g. If scc is missing, generate the strongly connected components first.

source
neighborhood(g, v, d, distmx=weights(g))

Return a vector of each vertex in g at a geodesic distance less than or equal to d, where distances may be specified by distmx.

Optional Arguments

  • dir=:out: If g is directed, this argument specifies the edge direction

with respect to v of the edges to be considered. Possible values: :in or :out.

source
neighborhood_dists(g, v, d, distmx=weights(g))

Return a tuple of each vertex at a geodesic distance less than or equal to d, where distances may be specified by distmx, along with its distance from v.

Optional Arguments

  • dir=:out: If g is directed, this argument specifies the edge direction

with respect to v of the edges to be considered. Possible values: :in or :out.

source
articulation(g)

Compute the articulation points of a connected graph g and return an array containing all cut vertices.

source
LightGraphs.periodFunction.
period(g)

Return the (common) period for all vertices in a strongly connected directed graph. Will throw an error if the graph is not strongly connected.

source
isgraphical(degs)

Return true if the degree sequence degs is graphical, according to Erdös-Gallai condition.

Performance

Time complexity: ``\mathcal{O}(|degs|^2)``
source

Cycle Detection

In graph theory, a cycle is defined to be a path that starts from some vertex v and ends up at v.

LightGraphs.is_cyclicFunction.
is_cyclic(g)

Return true if graph g contains a cycle.

Implementation Notes

Uses DFS.

source
maxsimplecycles(dg::::IsDirected, byscc::Bool = true)

Compute the theoretical maximum number of cycles in the directed graph dg.

The computation can be performed assuming the graph is complete or taking into account the decomposition in strongly connected components (byscc parameter).

Performance

A more efficient version is possible.

References

source
simplecycles(dg::::IsDirected)

Compute and return all cycles of the given directed graph using Johnson's algorithm.

Performance

The number of cycles grows more than exponentially with the number of vertices, you might want to use the algorithm with a ceiling – getcycles – on large directed graphs (slightly slower). If you want to have an idea of the possible number of cycles, look at function maxsimplecycles(dg::DiGraph, byscc::Bool = true).

References

source
simplecycles_iter(dg::DiGraph, ceiling = 10^6)

Search all cycles of the given directed graph, using Johnson's algorithm, up to the ceiling (to avoid memory overload).

Implementation Notes

If the graph is small, the ceiling will not be reached and simplecycles(dg::DiGraph) is more efficient. It avoids the overhead of the counting and testing if the ceiling is reached. It returns all the cycles of the directed graph if the ceiling is not reached, a subset of them otherwise.

To get an idea of the possible number of cycles, use function `maxsimplecycles(dg::DiGraph, byscc::Bool = true) on the directed graph.

References

source
simplecycles_hadwick_james(g)

Find circuits (including self-loops) in g using the algorithm of Hadwick & James.

References

  • Hadwick & James, "Enumerating Circuits and Loops in Graphs with Self-Arcs and Multiple-Arcs", 2008
source
simplecyclescount(dg::DiGraph, ceiling = 10^6)

Count the number of cycles in a directed graph, using Johnson's algorithm. Return the minimum of the ceiling and the number of cycles.

Implementation Notes

The ceiling is here to avoid memory overload if there are a lot of cycles in the graph. Default value is 10^6, but it can be higher or lower. You can use the function maxsimplecycles(dg::DiGraph, byscc::Bool = true) to get an idea of the theoretical maximum number or cycles.

References

source
simplecycleslength(dg::DiGraph, ceiling = 10^6)

Search all cycles of the given directed graph, using Johnson's algorithm, and return a tuple representing the cycle length and the number of cycles.

Implementation Notes

To get an idea of the possible number of cycles, using function maxsimplecycles(dg::DiGraph, byscc::Bool = true) on the directed graph.

If the ceiling is reached (ncycles = ceiling), the output is only a subset of the cycles lengths.

References

source
karp_minimum_cycle_mean(g[, distmx])

Return minimum cycle mean of the directed graph g with optional edge weights contained in distmx.

References

source

Minimum Spanning Trees (MST) Algorithms

A Minimum Spanning Tree (MST) is a subset of the edges of a connected, edge-weighted (un)directed graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight.

kruskal_mst(g, distmx=weights(g))

Return a vector of edges representing the minimum spanning tree of a connected, undirected graph g with optional distance matrix distmx using Kruskal's algorithm.

source
LightGraphs.prim_mstFunction.
prim_mst(g, distmx=weights(g))

Return a vector of edges representing the minimum spanning tree of a connected, undirected graph g with optional distance matrix distmx using Prim's algorithm. Return a vector of edges.

source

Shortest-Path Algorithms

General properties of shortest path algorithms

LightGraphs.a_starFunction.
a_star(g, s, t[, distmx][, heuristic])

Return a vector of edges comprising the shortest path between vertices s and t using the A* search algorithm. An optional heuristic function and edge distance matrix may be supplied. If missing, the distance matrix is set to LightGraphs.DefaultDistance and the heuristic is set to n -> 0.

source
dijkstra_shortest_paths(g, srcs, distmx=weights(g));

Perform Dijkstra's algorithm on a graph, computing shortest distances between srcs and all other vertices. Return a LightGraphs.DijkstraState that contains various traversal information.

Optional Arguments

predecessors of a given vertex.

Performance

Use a matrix type for distmx that is implemented in row-major matrix format for better run-time. Eg. Set the type of distmx to Transpose{Int64, SparseMatrixCSC{Int64,Int64}} instead of SparseMatrixCSC{Int64,Int64}.

source
bellman_ford_shortest_paths(g, s, distmx=weights(g))
bellman_ford_shortest_paths(g, ss, distmx=weights(g))

Compute shortest paths between a source s (or list of sources ss) and all other nodes in graph g using the Bellman-Ford algorithm. Return a LightGraphs.BellmanFordState with relevant traversal information.

source
floyd_warshall_shortest_paths(g, distmx=weights(g); parallel=false)

Use the Floyd-Warshall algorithm to compute the shortest paths between all pairs of vertices in graph g using an optional distance matrix distmx. Return a LightGraphs.FloydWarshallState with relevant traversal information.

Performance

Space complexity is on the order of $\mathcal{O}(|V|^2)$.

Optional Arguments

  • parallel=false: If true, the algorithm runs in parallel.
source
yen_k_shortest_paths(g, source, target, distmx=weights(g), K=1; maxdist=Inf);

Perform Yen's algorithm on a graph, computing k-shortest distances between source and target other vertices. Return a YenState that contains distances and paths.

source
parallel_multisource_dijkstra_shortest_paths(g, sources=vertices(g), distmx=weights(g))

Compute the shortest paths between all pairs of vertices in graph g by running [dijkstra_shortest_paths] for every vertex and using an optional list of source vertex sources and an optional distance matrix distmx. Return a MultipleDijkstraState with relevant traversal information.

source

Path discovery / enumeration

gdistances(g, source; sort_alg=QuickSort)

Return a vector filled with the geodesic distances of vertices in g from source. If source is a collection of vertices each element should be unique. For vertices in disconnected components the default distance is typemax(T).

An optional sorting algorithm may be specified (see Performance section).

Performance

gdistances uses QuickSort internally for its default sorting algorithm, since it performs the best of the algorithms built into Julia Base. However, passing a RadixSort (available via SortingAlgorithms.jl) will provide significant performance improvements on larger graphs.

source
gdistances!(g, source, dists; sort_alg=QuickSort)

Fill dists with the geodesic distances of vertices in g from source vertex (or collection of vertices) source. dists should be a vector of length nv(g) filled with typemax(T). Return dists.

For vertices in disconnected components the default distance is typemax(T).

An optional sorting algorithm may be specified (see Performance section).

Performance

gdistances uses QuickSort internally for its default sorting algorithm, since it performs the best of the algorithms built into Julia Base. However, passing a RadixSort (available via SortingAlgorithms.jl) will provide significant performance improvements on larger graphs.

source
enumerate_paths(state[, vs])

Given a path state state of type AbstractPathState, return a vector (indexed by vertex) of the paths between the source vertex used to compute the path state and a single destination vertex, a list of destination vertices, or the entire graph. For multiple destination vertices, each path is represented by a vector of vertices on the path between the source and the destination. Nonexistent paths will be indicated by an empty vector. For single destinations, the path is represented by a single vector of vertices, and will be length 0 if the path does not exist.

Implementation Notes

For Floyd-Warshall path states, please note that the output is a bit different, since this algorithm calculates all shortest paths for all pairs of vertices: enumerate_paths(state) will return a vector (indexed by source vertex) of vectors (indexed by destination vertex) of paths. enumerate_paths(state, v) will return a vector (indexed by destination vertex) of paths from source v to all other vertices. In addition, enumerate_paths(state, v, d) will return a vector representing the path from vertex v to vertex d.

source

Path States

All path states derive from

AbstractPathState

An abstract type that provides information from shortest paths calculations.

source

The dijkstra_shortest_paths, floyd_warshall_shortest_paths, bellman_ford_shortest_paths, and yen_shortest_paths functions return states that contain various information about the graph learned during traversal.

struct DijkstraState{T, U}

An AbstractPathState designed for Dijkstra shortest-paths calculations.

source
BellmanFordState{T, U}

An AbstractPathState designed for Bellman-Ford shortest-paths calculations.

source
struct FloydWarshallState{T, U}

An AbstractPathState designed for Floyd-Warshall shortest-paths calculations.

source
struct YenState{T, U}

Designed for yen k-shortest-paths calculations.

source

The above state types (with the exception of YenState) have the following common information, accessible via the type:

.dists Holds a vector of distances computed, indexed by source vertex.

.parents Holds a vector of parents of each source vertex. The parent of a source vertex is always 0.

(YenState substitutes .paths for .parents.)

In addition, the following information may be populated with the appropriate arguments to dijkstra_shortest_paths:

.predecessors Holds a vector, indexed by vertex, of all the predecessors discovered during shortest-path calculations. This keeps track of all parents when there are multiple shortest paths available from the source.

.pathcounts Holds a vector, indexed by vertex, of the path counts discovered during traversal. This equals the length of each subvector in the .predecessors output above.