Topological Sort
Definition
Suppose we have a collection of different tasks or activities, some of which must happen before another. How do we find the way to sort the tasks such that for each task $v$, the tasks that happen before
$v$ come earlier in our sort?
In the graph, each node represents a task. An edge $v\rightarrow w$ indicates that $v$ must happen before $w$. Now our original problem is reduced
to finding a topological sort
.
Topological Sort: An ordering of a graph’s vertices such that for every directed edge $u \rightarrow v$, $u$ comes before $v$ in the ordering.
Valid orderings including:
D, B, A, E, C ,F
E, D, C, B, A, F
Note: If I purely run DFS, the output is D, B, A, F, E, C, E, F
. It is not correct.
Certain Types of Graphs (DAG):
Also, it only makes sense to topological sort some certain types of graphs
. Here is an invalid example:
Since we have a cycle, topological sort is not defined. Also, it includes undirected graphs (each edge corresponds to a cycle):
All in all, topological sorts only apply to directed, acyclic graphs, or DAGs
.
DAG (/dag/) is a very important type of graphs.
Topological Sort: An ordering of DAG
‘s vertices such that for every directed edge $u \rightarrow v$, $u$ comes before $v$ in the ordering. Each DAG
corresponds to a Topological Sort
.
For any topological ordering, you can redraw the graph so that the vertices are all in one line. Thus, topological sort is sometimes called a linearization
of the graph.
In the above DAG, $D$ or $E$ must be the source
, $F$ or $C$ must be the end
, also called sink
.
TS Algorithm
Topological Sort Algorithm:
- Perform a
DFS
traversal from every vertex (any order) in the graph,not
clearing markings between traversals (means not traversing marked vertices). - Record DFS
post-order
along the way (add to list when backtracking). - Topological ordering is the
reverse
of the post-order.
If we use DFS only, the ordering is wrong. For example:
D, C, B, A, F, E
(C comes before E)
Why it works:
Each vertex $v$ gets added to the end of the post-order list only after considering all descendants
of $v$, which means all of them are already on the list. For example, if I want to add vertex $D$, it happens only after I’ve added $C$ and other descendants ($B$, $A$, $F$) to the list. It is guaranteed in the process.
Runtime: $O(V + E)$
Pseudocode
1 | topological(DAG): |
Using BFS:
Reference: Using BFS for topological sort
How to calculate the 0-in-degree vertices from all vertices?
- Adj Matrix: Foreach columns and store 0-in-degree vertices.
- Adj List & Edge List: Foreach all vertices, count, and store 0-in-degree vertices.
1 | topological(DAG): |
Or, we can modify it a little bit (change the output ordering of zero in-degree vertices):
1 | Init zeroInDegreeList // LinkedList is great |
Alternative (From GitBook):
DAG-LPT
Consider the longest paths problem on DAGs:
- From a new copy of the graph $G’$ with signs of all edge weights flipped.
- Run
DAG-SPT
on $G’$ yielding result $X$. - Flip signs of all values in $X$.distTo.
- $X$.edgeTo is already correct.
There is no real need to prove anything or show demos.
- We know DAG-SPT works on graphs with negative edge weights to get the shortest paths.
- Assuming weights are $w_1, w_2, \ldots, w_E$,
we know that $-(-a + -b + -c + -d) = a + b + c + d$.
DAG-SPT
In a DAG, we can use Dijkstra’s algorithm to find the SPT. It’s simpler: DAG-SPT Algorithm
.
Walkthrough Demo: link
One Simple Idea: Visit vertices in topological order.
- When we visit a vertex: Relax all of its going edges.
- Each vertex is visited only when all possible info about it has been used.
Note: DAG-SPT can solve SPT in DAGs with negative weights.
Dijkstra’s can’t handle the negative weight and the result is incorrect.
Walkthrough Demo: link
Runtime: $\Theta(E + V)$
- Step 1: Have to do a topological sort, which is $\Theta(E + V)$ time.
- Step 2: Initialize some arrays of size $V$. In total, takes $\Theta(V)$ time.
- Step 3: Each edge gets relaxed exactly once, so $\Theta(E)$.
DP In DAG:
The DAG-SPT algorithm can be thought of as solving increasingly large subproblems:
- Distance from source to source is very easy, and is just zero.
- We then tackle distances to vertices that are a bit farther to the right.
- We repeat this until we get all the way to the end of the graph.
Dynamic Programming
Dynamic Programming is a terrible name
for a simple and powerful idea for solving “big problems”.
Identify
a collection of subproblems.Solve subproblems
one by one, working from smallest to largest.- Use the answers to the
smaller problems
to help solve thelarger ones
.
Why is the name so bad? It was by design! (link)
by design = as a result of a plan; intentionally
LIS / LLIS
Problem: Given a sequence of numbers, find the longest increasing subsequence (LIS).
Example:
- Sequence:
6, 2, 8, 4, 5, 7
- The LIS:
2, 4, 5, 7
(LLIS = 4)
Related Problem: Find the length of the longest increasing subsequence (LLIS
).
LLIS & DAG
Conversion (DAG)
Goal: Describe the LLIS
problem in terms of a DAG problem. What property of the DAG are we looking for?
Property: The longest path from any vertex to any vertex in the entire DAG (actually plus one).
So our goal is boiled down to design an algorithm for finding the length of the longest path in a DAG.
Clever Idea (-1)
Clever Idea: Using a Negative Weight
Graph to Find Longest paths.
- Create copy with edge weights set to $-1$. ==> $\Theta(E+V)$ (applied to adjacency list)
- For each vertex $v$: ==> $V$ times
- Run
DAG-SPT
Algorithm from $v$ to the end. (Recall this is a DP algorithm) ==> $\Theta(E+V)$- Stops when a vertex is not
reachable
from $v$.
- Stops when a vertex is not
- Let
LPLS[v] = abs(min(distTo))
, i.e. length of the longest path starting at $v$. - Each time you need to reset
distTo
andedgeTo
.
- Run
- Return
max(LPLS)
. ==> $\Theta(V)$
In the worst case, total runtime is $\Theta(EV + V^2)$.
Note: In the worst case:
$\Theta(N) = \Theta(V)$
$\Theta(E) = \Theta(V^2) = \Theta(N^2)$. (Each vertex connects other vertices)
Demo Example
Walkthrough Demo: link (Page 23)
Initialized all edges with $-1$:
DAG-SPT with s = 8
:
Note: Stops at vertex 9, since vertex 2 is not reachable from vertex 8.
DAG-SPT with s = 2
:
Note: Go through all vertices since vertex 9, 4, 5, 7, 3 are all reachable.
The final state of the algorithm:
Reduction
Given a sequence of integers, we transformed our problem into a problem on DAGs. This process is called reduction
. We “reduced” LLIS to $N$ solutions of the longest paths problem on DAGs.
Other informal examples:
- Reduce
illuminating a room
toflipping a light switch
. - Reduce
getting to work
toriding BART
. - Reduce
8puzzle
toA*
.
Using DP
Redundancy
In effect, we are solving a bunch of subproblems in the wrong order
.
The process we described was redundant
. Example: DAG-SPT for s = 4
is a subproblem
of DAG-SPT for s = 2
, but we ran both in their entirety.
New Subproblem
Usually, finding a “right” subproblem of DP is very hard.
Rather than choosing where it starts, we can choose the following subproblem for LLIS:
- Length of the Longest Subsequence
Ending At
(LLSEA)
Examples
Let’s see some examples.
$Q(2) = \text{max}(Q(0) + 1, Q(1) + 1) = 2$
$Q(3) = Q(1) + 1 = 2$
Recall that $Q(K)$ is the solution to LLSEA(K)
(they are equivalent).
$Q(5) = \text{max}{ Q(1) + 1, Q(3) + 1, Q(4) + 1 } = Q(5) + 1 = 4$
Fact #1: Memo
Fact #1: We can use results for small $Q$ to compute results for larger $Q$.
The answer is $62$. Because of all the increasing subsequences that have an edge pointing to vertex 104
, the longest one was of length $61$ (ending at vertex 100
).
The $Q$ values are memorized answers to smaller subproblems.
- The LLIS ending at
vertex 417
is the max of $1 + Q(100)$.
Fact #2: Solvable
Fact #2: We can use our $Q$ values to solve LLIS of the entire sequence.
$Q(K)$ is the LLSEA(K)
. Thus, LLIS is just the maximum of all $Q$ values.
DP Solution
Note: The slide’s solution has some bugs. $Q$ should be initialized with $1$, otherwise $Q[K] = Q[L] + 1$ would make $Q$ values $-\infty + 1$.
Setting them as $1$ is sensible since each number itself is already a 1-length increasing subsequence.
- Create $Q$, an array of length $N$. Set $Q[0] = 1$, and $Q[K] = -\infty$ for rest of the vertices. (Check the Note above. It should be $1$)
- Create a DAG with $N$ vertices. Draw an edge between vertices if left vertex is less than right vertex. (Optional)
For each $K = 1,\ \ldots, N$:
- Then for each $L = 0,\ \ldots, K - 1$:
- If there exists an edge from $L$ to $K$: (or $L < K$)
- If $Q[L] + 1 > Q[K]$ (update when necessary)
- Set $Q[K] = Q[L] + 1$
- If $Q[L] + 1 > Q[K]$ (update when necessary)
- If there exists an edge from $L$ to $K$: (or $L < K$)
Note: The DAG is not necessary. We can remove it and change the following:
- If there
exists an edge
from $L$ to $K$
to:
- If item $L$ is
less than
item $K$
1 | public static void main(String[] args) { |
Runtime:
Note: The runtimes are worst case.
- Initialization: Creating $Q$ array and DAG is $\Theta(N)$ and $\Theta(N^2)$ respectively.
- Execution: Nested for loops with constant time
if
statement take $\Theta(N^2)$.
Summary
Independent Set Problem
Reference: link (Page 50)
Idea: 3SAT Problem
can reduce to Independent Set Problem
.