Reference: EPI 5.1
Description:
The quicksort algorithm would have a bad performance when the array contains many duplicates because the subarrays may differ greatly in size.
One solution is reordering or shuffling the array before sorting. Another solution is known as Dutch national flag partitioning since the Dutch national flag consists of three horizontal bands (red, white, blue).
Let say an example [0, 1, 2, 0, 2, 1, 1]
, and the pivot index is 3 (A[3] = 0
), so the correct partitioning is [0, 0, 1, 2, 2, 1, 1]
. The elements larger than 0 should be after 0, but the ordering does not matter.
If the pivot index is 1 (A[1] = 1
), so the correct partitioning is [0, 0, 1, 1, 1, 2, 2]
.
We have three groups: smaller
, equal
, larger
. To simply the problem we have a enum class as follows:
1 | public enum Color { RED, WHITE, BLUE } |
We use RED
to denote elements in the smaller
group. They could be duplicates or different elements, but that is okay as long as they are less than the WHITE
elements in the equal
group.
Problem
Write a problem that takes an array
A
and an indexi
intoA
, and rearranges the elements such that all elements less thanA[i]
(the “pivot”) appear first, followed by elements equal to the pivot, followed by elements greater than the pivot.
Note: The problem is trivial to solve with O(N) extra space. We can put elements into three arrays and then merge them.
Analysis
Brute-Force
We can avoid the extra space. In the first pass, we iterate through the array and seek an element smaller than the pivot. As soon as we find it, we move it to the subarray of smaller elements via swapping. In the second pass, we do the same thing but moving the larger element to the subarray of larger elements.
Note: In the second pass, we can stop when hitting an element smaller than the pivot, because smaller elements should all lie on the left part of the array.
1 | public static void dutchFlagPartition(int pivotIndex, List<Color> A) { |
Time: O(N2)
Space: O(1)
The worst case occurs when the pivot is in the middle and all elements on its left are larger and the elements on its right are smaller, for example [5, 6, 9, 7, <5>, 3, 2, 4, 1]
. As for the first element (i = 0
), j
has to go through half of the array, which takes O(N/2) time, to find element 3
.
Two Passes
We can improve the time complexity by making a single pass to move smaller elements to the beginning and by making another pass starting from the end to move larger elements to the end.
Alternatively, in the second pass, we can instead move elements which are equal to the pivot, so we don’t need an extra variable to store the current index.
Note: The element in the original pivotIndex
may change when the pivot element is swapped. It does not affect the result since we have backed up the pivot at the beginning of the program, so the value of pivot won’t change.
1 | // Example: 2 6 3 7 <5> 3 2 7 1 |
Time: O(N)
Space: O(1)
One Pass
Similar to the two-pass approach, the difference is that it performs classification into elements less than
, equal to
, and greater than
the pivot in a single pass. We can do this by maintaining four subarrays: smaller
, equal
, unclassified
, larger
. In the code, we use three pointers smaller
, equal
, larger
to denote the elements in the four subarrays.
The explanation is in the comments. Notice the equal
always points to the incoming unclassified element. And the elements before smaller
and the elements after larger
are finalized.
Example:
1 | 2 6 3 7 <5> 3 2 7 1 |
Note: This is my solution based on larger = A.size() - 1
and while (equal <= larger)
. The solution in the book is based on larger = A.size()
and while (equal < larger)
. They are equivalent.
1 | public static void dutchFlagPartition(int pivotIndex, List<Color> A) { |
Time: O(N) since each time at least one of the pointers is moved and each subarray is either expended by one or shrunk by one.
Space: O(1)
Variant 1
Bring duplicate elements all together, but the ordering is not required.
1 | public static void variant1(List<Color> A) { |
Time: O(N)
Space: O(1)