Comparing Order Statistics Algorithm and Min Heap for Finding the Kth Smallest Element

When dealing with large datasets or lists of numbers, determining the kth smallest element efficiently is a common problem. In this article, we’ll explore and compare two popular approaches: the Order Statistics algorithm and the Min Heap data structure, implemented in C++, to find the kth smallest element.

Order Statistics Algorithm

The Order Statistics algorithm, based on the concept of selection, involves finding the kth smallest element in an unsorted array. It works by partitioning the array around a chosen pivot element and recursively narrowing down the search space. It is explained in detail here – Order Statistics | Selection Algorithm.

Min Heap Approach

Another approach to find the kth smallest element involves using a Min Heap data structure. A Min Heap is a binary tree where the parent node is smaller than its children. By building a Min Heap from the array, we can extract the smallest element (k – 1) times to find the kth smallest element. This approach works by using a priority queue (Min Heap) to efficiently find the kth smallest element by popping the top k – 1 elements and returning the then top element. It is explained in detail here –

You may also find this problem helpful:

How to find k nearest neighbours of a given point from a set of points in a plane?

Comparing the Approaches

Both the Order Statistics algorithm and the Min Heap approach can find the kth smallest element efficiently, but they differ in their implementation and performance characteristics.

Order Statistics Algorithm: Ideal for scenarios with an unsorted array. Provides a deterministic approach and has an average time complexity of O(n), but it can degrade to O(n²) in the worst-case scenario. Worst-case occurs when pivot selection consistently leads to unbalancing partitioning ie selecting minimum or maximum element as the pivot.
Min Heap Approach: Suitable for cases where you can afford additional memory space for a Min Heap. Offers an average time complexity of O(n + klogn) for building the heap and extracting the kth element.

Choosing between these approaches depends on factors such as the nature of your dataset, memory constraints, and performance requirements. However in most cases, Order Statistics Algorithm will work more efficiently because of its better average time complexity and random nature of Selection procedure.

Code Implementation

//
//  main.cpp
//  Order Statistics & Min Heap Comparison
//
//  Created by Himanshu on 21/11/23.
//
#include <iostream>
#include <algorithm>
#include <ctime>
#include <random>
#include <vector>
#include <chrono>
#include <queue>
#include <unordered_set>
using namespace std;

random_device rd;  // Obtain a random seed from the hardware
mt19937 gen(rd()); // Standard mersenne_twister_engine seeded with rd()

// Random number generator
int choosePivot(int low, int high) {
    uniform_int_distribution<int> distrib(low, high); // Define the range
    return distrib(gen); // Generate a random number within the range
}

// Order Statistics Algorithm
int partition(int arr[], int low, int high) {
    int pivotIndex = choosePivot(low, high);
    int pivot = arr[pivotIndex];
    
    swap(arr[pivotIndex], arr[high]);
    
    int i = low - 1;
    for (int j = low; j < high; j++) {
        if (arr[j] <= pivot) {
            i++;
            swap(arr[i], arr[j]);
        }
    }
    
    swap(arr[i + 1], arr[high]);

    return i + 1;
}

int kthSmallest(int arr[], int low, int high, int k) {
    
    if (k > 0 && k <= high - low + 1) {
        int pivot = partition(arr, low, high);

        if (pivot - low == k - 1) {
            return arr[pivot];
        } else if (pivot - low > k - 1) {
            return kthSmallest(arr, low, pivot - 1, k);
        } else {
            // Adjusted value of 'k' for new value of low (ie. pivot + 1)
            return kthSmallest(arr, pivot + 1, high, k  + low - (pivot + 1));
        }

    }
    
    return -1; // Return -1 for invalid input
}

// Min Heap Approach
int kthSmallestUsingHeap(int arr[], int n, int k) {

    priority_queue<int, vector<int>, greater<int>> minHeap(arr, arr + n);

    for (int i = 1; i < k; i++) {
        minHeap.pop();
    }

    return minHeap.top();
}
    
int main() {
    const int listSize = 100000; // Adjust the list size as needed
    int arr[listSize];
    
    unordered_set<int> uniqueNumbers;

    // Generate a random list of sequential numbers
    srand((unsigned int) time(NULL));

    for (int i = 0; i < listSize; i++) {
        int randomNumber;

        do {
            randomNumber = rand() % 100000; // Generate a random number between 0 and 99999
        } while (!uniqueNumbers.insert(randomNumber).second); // Insert and check uniqueness

        arr[i] = randomNumber;
    }

    int kValues[] = {100, 500, 1000, 5000, 10000, 50000}; // Different values of k to test
    
    for (int k : kValues) {
        
        // Measure execution time for Order Statistics algorithm
        auto start = chrono::high_resolution_clock::now();
        int result1 = kthSmallest(arr, 0, listSize - 1, k);
        auto end = chrono::high_resolution_clock::now();
        chrono::duration<double> duration1 = end - start;

        // Measure execution time for Min Heap approach
        start = chrono::high_resolution_clock::now();
        int result2 = kthSmallestUsingHeap(arr, listSize, k);
        end = chrono::high_resolution_clock::now();
        chrono::duration<double> duration2 = end - start;
        
        // Output result
        cout<< k <<"th smallest " << "using Order Statistic algorithm: "<<result1 << endl;
        cout<< k <<"th smallest " << "using Min Heap approach: "<<result2 << endl;
        // Output execution times
        cout << "Time taken for k = " << k << " using Order Statistic algorithm: " << duration1.count() << " seconds\n";
        cout << "Time taken for k = " << k << " using Min Heap approach: " << duration2.count() << " seconds\n\n";
    }

    return 0;
}

Output

100th smallest using Order Statistic algorithm: 99
100th smallest using Min Heap approach: 99
Time taken for k = 100 using Order Statistic algorithm: 0.000244519 seconds
Time taken for k = 100 using Min Heap approach: 0.0010945 seconds

500th smallest using Order Statistic algorithm: 499
500th smallest using Min Heap approach: 499
Time taken for k = 500 using Order Statistic algorithm: 9.0636e-05 seconds
Time taken for k = 500 using Min Heap approach: 0.00109331 seconds

1000th smallest using Order Statistic algorithm: 999
1000th smallest using Min Heap approach: 999
Time taken for k = 1000 using Order Statistic algorithm: 0.000120848 seconds
Time taken for k = 1000 using Min Heap approach: 0.00110461 seconds

5000th smallest using Order Statistic algorithm: 4999
5000th smallest using Min Heap approach: 4999
Time taken for k = 5000 using Order Statistic algorithm: 0.000496643 seconds
Time taken for k = 5000 using Min Heap approach: 0.00143438 seconds

10000th smallest using Order Statistic algorithm: 9999
10000th smallest using Min Heap approach: 9999
Time taken for k = 10000 using Order Statistic algorithm: 0.000290027 seconds
Time taken for k = 10000 using Min Heap approach: 0.00191804 seconds

50000th smallest using Order Statistic algorithm: 49999
50000th smallest using Min Heap approach: 49999
Time taken for k = 50000 using Order Statistic algorithm: 0.000480221 seconds
Time taken for k = 50000 using Min Heap approach: 0.00532426 seconds

Here’s a working example: Ideone

In conclusion, both methods provide effective solutions for finding the kth smallest element in a list of numbers. Understanding their strengths and trade-offs will help you select the most suitable approach for your specific use case.

Order Statistics Algorithm

Min Heap Approach

Comparing the Approaches

Share this:

Leave a Reply Cancel reply

Leave a Reply