Order Statistics | Selection Algorithm

“Order Statistics” is a concept discussed in the book “Introduction to Algorithms” by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. It pertains to finding the i^th order statistics of a set of n elements. i^th order statistics is the i^th smallest element in a set of n elements.

Selection Problem

You are given a set of n distinct numbers and a number k such that 1 <= k <= n. Find the element x such that x is the k^th smallest number or x is greater than exactly k-1 elements of the set.
The selection problem can be solved in O(nlogn) time, by sorting the numbers using Merge Sort or Heap Sort and then returning the i^th indexed element.

Selection in approximate Linear Time O(n)

The Selection in Expected Linear Time (Randomized-Select) algorithm, also known as Quickselect, is an efficient approach for solving the Selection Problem. It works similar to the Quick Sort algorithm by partitioning the array around a pivot element.

Also read: How to find k nearest neighbours of a given point from a set of points in a plane?

Randomized-Select Algorithm

The idea is to partition the input array around a pivot recursively. But unlike Quick Sort, which recursively processes both sides of the partition, Randomized-Select only works on one side of the partition. As a result of the randomly selecting the pivot and selectively partitioning the array, the expected time complexity of Randomized-Select Algorithm is O(n).

Worst Case Time Complexity

The worst-case time complexity of the Order Statistics algorithm can degrade to O(n²). This occurs when the selected pivot leads to unbalanced partitioning, particularly if the pivot is consistently selected poorly. For instance, if the selected pivot is always the smallest or largest element, and the input array is already sorted or nearly sorted; then the partitioning process would not significantly reduce the size of the subproblems. And this would result in recursive calls with subarrays of sizes closer to n (size of the original array), leading to O(n²) time complexity. However, in most scenarios this doesn’t occur much frequently.

Also read:

Comparing Order Statistic Algorithm and Min Heap for Finding the Kth Smallest Element

Explanation

Choose a pivot element from the array. This can be done randomly or by using a specific strategy (e.g., choosing the median of medians as the pivot).
Partition the array around the pivot, so that all elements smaller than the pivot come before it, and all elements greater than the pivot come after it. This step is similar to the partitioning step in Quick Sort.
Check the position of the pivot after partitioning. If the pivot is at index k, then it is the k^th order statistic and we can return it.
If the pivot is at an index greater than k, recursively apply the selection algorithm to the left subarray (elements smaller than the pivot) to find the k^th order statistic.
If the pivot is at an index smaller than k, recursively apply the selection algorithm to the right subarray (elements greater than the pivot) to find the k^th order statistic.

By recursively partitioning the array and narrowing down the search range based on the pivot position, the selection algorithm can efficiently find the k^th order statistic in linear time complexity.
It’s worth noting that there are variations of the selection algorithm, such as the randomized selection algorithm and the median of medians algorithm, which provide better worst-case time complexity guarantees. These variations handle cases where the pivot choice may lead to poor partitioning and unbalanced subarrays.

Pseudocode

function select(arr, left, right, k):
    if left == right:
        return arr[left]
    
    pivotIndex = choosePivot(arr, left, right)
    pivotIndex = partition(arr, left, right, pivotIndex)
    
    if k == pivotIndex:
        return arr[k]
    else if k < pivotIndex:
        return select(arr, left, pivotIndex - 1, k)
    else:
        return select(arr, pivotIndex + 1, right, k)

function partition(arr, left, right, pivotIndex):
    pivotValue = arr[pivotIndex]
    swap(arr, pivotIndex, right)
    storeIndex = left
    
    for i = left to right - 1:
        if arr[i] < pivotValue:
            swap(arr, i, storeIndex)
            storeIndex++
    
    swap(arr, storeIndex, right)
    return storeIndex

function choosePivot(arr, left, right):
    return random value between left and right

Java Code Implementation

import java.util.Arrays;
public class KthOrderStatistics {

    public static int findKthSmallest (int[] array, int k) {

        if (array == null || array.length == 0 || k <= 0 || k > array.length) {
            throw new IllegalArgumentException("Invalid input");
        }

        return select(array, 0, array.length - 1, k - 1);
    }

    private static int select (int[] array, int start, int end, int k) {

        if (start == end) {
            return array[start];
        }

        int pivotIndex = partition(array, start, end);

        if (k == pivotIndex) {
            return array[k];
        } else if (k < pivotIndex) {
            return select(array, start, pivotIndex - 1, k);
        } else {
            return select(array, pivotIndex + 1, end, k);
        }

    }

    private static int partition (int[] array, int start, int end) {
        //Selecting end index as the pivotIndex, 
        //we could also select a random index between [start, end]  
        int pivot = array[end];
        int i = start - 1;

        for (int j = start; j < end; j++) {
            if (array[j] <= pivot) {
                i++;
                swap(array, i, j);
            }
        }

        swap(array, i + 1, end);
        return i + 1;
    }

    private static void swap (int[] array, int i, int j) {
        int temp = array[i];
        array[i] = array[j];
        array[j] = temp;
    }

    public static void main (String[] args) {
        int[] array = { 7, 10, 4, 3, 20, 15 };
        int k = 3;
        int kthSmallest = findKthSmallest(array, k);
        System.out.println("The " + k + "th smallest element is: " + kthSmallest);
    }

}

Output

The 3th smallest element is: 7

Here’s a working example: Kth Smallest

Selection Problem

Selection in approximate Linear Time O(n)

Randomized-Select Algorithm

Worst Case Time Complexity

Share this:

Leave a Reply Cancel reply

Leave a Reply