Introduction

In graph theory, particularly in the context of directed graphs, identifying groups of nodes that form closed loops is a foundational task. These groups, known as strongly connected components (SCCs), are essential in areas such as compiler optimization, social network analysis, and distributed systems.

A strongly connected component is a maximal subgraph in which every pair of nodes is reachable from one another. Detecting SCCs efficiently becomes crucial when analyzing large-scale graphs.

Tarjan’s Algorithm, developed by Robert Tarjan in 1972, is a classic and highly efficient method for finding SCCs using a single pass of depth-first search (DFS). It is both time- and space-efficient and remains widely used in modern graph processing systems.

What is Tarjan’s Algorithm?

Tarjan's Algorithm is a DFS-based algorithm that computes all the strongly connected components of a directed graph in linear time. The key idea lies in tracking the discovery time and the low-link value of each node.

  • The discovery time (or index) is the order in which nodes are visited in the DFS traversal.

  • The low-link value of a node is the smallest discovery time reachable from that node (including itself and its descendants).

When the discovery time and the low-link value of a node are equal, the node is identified as the root of an SCC.

The algorithm uses a stack to keep track of the current path in the DFS and efficiently extracts SCCs by popping nodes when an SCC root is found.



How It Works

fig.tarjan's algorithm
(source-Google Images)

  1. Each unvisited node is explored via DFS.

  2. During traversal:

    • Assign a discovery time and a low-link value.

    • Push the node onto the stack.

    • For each adjacent node, recursively apply DFS (if unvisited) and update the low-link value.

    • If a back edge is found to a node on the stack, the low-link is updated accordingly.

  3. If the node's discovery time equals its low-link value, it is the root of an SCC. Pop all nodes from the stack up to and including this root to form the SCC.

Algorithm

  1. Initialize:

    • index counter.

    • Arrays for idslow-link, and a stack.

    • A boolean array to check if a node is in the current stack.

  2. For each node:

    • If it is unvisited, start a DFS traversal.

    • Update low-link values during DFS.

    • On identifying an SCC root, collect all nodes up to that point.

This approach ensures that each edge and vertex is processed only once, resulting in linear time complexity.

Implementation

public class TarjanSCC {
    private List<List<Integer>> graph;
    private int[] ids, low;
    private boolean[] onStack;
    private Deque<Integer> stack;
    private int id;
    private List<List<Integer>> sccs;

    public TarjanSCC(List<List<Integer>> graph) {
        this.graph = graph;
        int n = graph.size();
        ids = new int[n];
        low = new int[n];
        onStack = new boolean[n];
        stack = new ArrayDeque<>();
        sccs = new ArrayList<>();
        Arrays.fill(ids, -1);

        for (int i = 0; i < n; i++) {
            if (ids[i] == -1) {
                dfs(i);
            }
        }
    }

    private void dfs(int at) {
        stack.push(at);
        onStack[at] = true;
        ids[at] = low[at] = id++;

        for (int to : graph.get(at)) {
            if (ids[to] == -1) {
                dfs(to);
                low[at] = Math.min(low[at], low[to]);
            } else if (onStack[to]) {
                low[at] = Math.min(low[at], ids[to]);
            }
        }

        if (ids[at] == low[at]) {
            List<Integer> scc = new ArrayList<>();
            while (true) {
                int node = stack.pop();
                onStack[node] = false;
                scc.add(node);
                if (node == at) break;
            }
            sccs.add(scc);
        }
    }

    public List<List<Integer>> getSCCs() {
        return sccs;
    }

Recurrence Relation

The recurrence relation for the DFS traversal in Tarjan’s Algorithm is:

T(n)=T(n1)+O(1)

This arises because:

  • Each vertex is visited exactly once.

  • Each edge is considered only once during DFS.

This leads to an overall time complexity of:

T(V,E)=O(V+E)

Time and Space Complexity

 
Measure        Complexity
  Time:O(V + E)
Space:O(V)
  • V is the number of vertices.

  • E is the number of edges.

  • Space usage is linear due to arrays and the recursion stack.

Advantages Over Other Algorithms

Tarjan's Algorithm is often compared with Kosaraju’s Algorithm, another well-known method for finding SCCs.

CriterionTarjan's Algorithm   Kosaraju’s Algorithm
DFS Passes Required                1                     2
Stack Usage     Yes (internal DFS)     Explicit stack for post-order
Efficiency in Practice             High            Moderate
Better for  Online/streaming graphs     Static graph processing

In practical applications, Tarjan’s algorithm tends to be more efficient, especially in environments where space and performance are critical.

Conclusion

Tarjan's Algorithm is a foundational technique for detecting strongly connected components in directed graphs. Its linear complexity, elegance, and applicability make it a preferred method in theoretical and applied computer science. Whether it is in analyzing control flow in compilers, understanding communities in social networks, or simplifying complex dependency graphs, Tarjan’s Algorithm provides a reliable and efficient solution.

References

  1. Tarjan, R. E. (1972). Depth-first search and linear graph algorithms. SIAM Journal on Computing.

  2. Pearce, D. J., & Kelly, P. H. (2006). A dynamic algorithm for topologically sorting directed acyclic graphs. ACM Journal of Experimental Algorithmics.

Comments