In this post, we will compare 4 efficient sorting algorithms namely, Heapsort, Merge Sort, Quick Sort, and Timsort, each having an average time complexity of O(n logn). We will discuss their time and space complexity, analyze their performance using C++, and examine their use cases.
Efficient Sorting Algorithms
Efficient sorting algorithms are designed to perform well on large datasets by minimizing the number of comparisons and data movements. Unlike simple algorithms like Bubble Sort and Insertion Sort, which have time complexity of O(n2) in the worst case, efficient sorting algorithms typically have a time complexity of O(n logn). These algorithms are particularly useful for applications where performance and memory usage are critical factors.
Heapsort, Merge Sort, Quick Sort, and Timsort have already been discussed in detail in these posts:
- Heap Data structure | Heap Sort in C++
- Why is Quicksort better than Merge Sort?
- Quicksort worst-case and averages-case analysis
- Timsort: The Hybrid Sorting Algorithm
Understanding Heapsort, Merge Sort, Quick Sort, and Timsort
Heapsort
Heapsort builds a binary heap from the input data and repeatedly extracts the largest element (or smallest, depending on the heap type) to form the sorted array.
Key Features:
- It is an in-place algorithm, meaning it does not require additional space for sorting, except for the space required to build the heap.
- Its time complexity is always O(n logn).
- Unlike Quick Sort, it guarantees O(n logn) in both the worst and best cases.
Merge Sort
Merge Sort is a divide-and-conquer algorithm that recursively splits the array into halves, sorts each half, and then merges the sorted halves back together. The process continues until the array is fully sorted.
Key Features:
- It always divides the array in half, resulting in a time complexity of O(n logn).
- It is stable, meaning it preserves the order of equal elements.
- Merge Sort requires additional space for the temporary arrays used in the merging process.
Quick Sort
Quick Sort is another divide-and-conquer algorithm that selects a pivot element and partitions the array around the pivot, ensuring that all elements less than the pivot come before it and all greater elements come after. It recursively applies this process to the subarrays.
Key Features:
- Quick Sort has an average time complexity of O(n logn) but may degrade to O(n2) in the worst case (though this can be mitigated using random pivots).
- It is an in-place algorithm, making it memory efficient.
- Quick Sort is widely regarded as the fastest sorting algorithm in practice for most datasets.
Timsort
Timsort is a hybrid sorting algorithm that combines the features of Insertion Sort and Merge Sort. It is optimized for real-world datasets that often have partially sorted segments.
Key Features:
- Timsort works by dividing the array into runs (small sorted segments) and then merging these runs together using Merge Sort.
- It is stable and highly efficient for datasets with many already-sorted elements.
- Timsort has a guaranteed time complexity of O(n logn).
Time and Space Complexity
Let’s analyze the time and space complexity of the four sorting algorithms discussed above.
Understanding Time Complexity
- Best Case: The number of comparisons made when the input is already sorted.
- Worst Case: The number of comparisons made when the input is in the worst possible order.
- Average Case: The number of comparisons for a typical (random) input.
Time Complexity
- Heapsort has O(n logn) time complexity in the worst case, but it is not stable.
- Merge Sort has consistent O(n logn) performance but requires O(n) additional memory for merging.
- Quick Sort generally has O(n logn) time complexity but may degrade to O(n2) in the worst case (when pivot is consistently selected as the last or the first element).
- Timsort is optimized for real-world datasets and can achieve O(n) performance in best-case scenarios with partially sorted data.
Space Complexity
- Heapsort is in-place and has O(1) space complexity. It sorts the array in place using the array as the heap.
- Merge Sort has O(n) space complexity because it requires additional arrays to store sorted parts for merging.
- Quick Sort is memory efficient with only O(logn) space complexity, required due to recursion (recursive stack usage).
- Timsort is stable, making it suitable for real-world data. But like Merge Sort, has O(n) space complexity for merging runs and temporary arrays.
Comparing Execution Time
To understand how these sorting algorithms perform, let’s implement them in C++ and compare their execution time for various types of input arrays: sorted in ascending order, reverse sorted, and randomly shuffled arrays.
Note: The results obtained may not accurately reflect exact execution times due to factors such as platform differences, caching mechanisms, and system optimizations.
Code Implementation
#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
#include <random>
#include <iomanip> // for std::fixed and std::setprecision
using namespace std;
using namespace chrono;
// Merge Sort
void merge(vector<int>& arr, int left, int mid, int right) {
int n1 = mid - left + 1;
int n2 = right - mid;
vector<int> leftArr(n1), rightArr(n2);
for (int i = 0; i < n1; i++) leftArr[i] = arr[left + i];
for (int i = 0; i < n2; i++) rightArr[i] = arr[mid + 1 + i];
int i = 0, j = 0, k = left;
while (i < n1 && j < n2) arr[k++] = (leftArr[i] <= rightArr[j]) ? leftArr[i++] : rightArr[j++];
while (i < n1) arr[k++] = leftArr[i++];
while (j < n2) arr[k++] = rightArr[j++];
}
void mergeSort(vector<int>& arr, int left, int right) {
if (left < right) {
int mid = left + (right - left) / 2;
mergeSort(arr, left, mid);
mergeSort(arr, mid + 1, right);
merge(arr, left, mid, right);
}
}
// Heapsort
void heapify(vector<int>& arr, int n, int i) {
int largest = i;
int left = 2 * i + 1;
int right = 2 * i + 2;
if (left < n && arr[left] > arr[largest]) largest = left;
if (right < n && arr[right] > arr[largest]) largest = right;
if (largest != i) {
swap(arr[i], arr[largest]);
heapify(arr, n, largest);
}
}
void heapSort(vector<int>& arr, int n) {
for (int i = (n / 2) - 1; i >= 0; i--) heapify(arr, n, i);
for (int i = n - 1; i > 0; i--) {
swap(arr[0], arr[i]);
heapify(arr, i, 0);
}
}
// Quick Sort
int partition(vector<int>& arr, int low, int high) {
int pivotIndex = low + (rand() % (high - low + 1)); // Generate a random index between low and high (inclusive)
swap(arr[pivotIndex], arr[high]); // Move the pivot element to the end
int pivotValue = arr[high]; // Store the pivot value
int i = low - 1;
for (int j = low; j < high; j++) {
if (arr[j] < pivotValue) swap(arr[++i], arr[j]);
}
swap(arr[++i], arr[high]); // Place pivot in its correct position
return i;
}
void quickSort(vector<int>& arr, int low, int high) {
if (low < high) {
int pivot = partition(arr, low, high);
quickSort(arr, low, pivot - 1);
quickSort(arr, pivot + 1, high);
}
}
// Timsort
// This function sorts array using insertion sort for Timsort
void insertionSort(vector<int>& arr, int left, int right) {
for (int i = left + 1; i <= right; i++) {
int temp = arr[i];
int j = i - 1;
while (j >= left && arr[j] > temp) {
arr[j + 1] = arr[j];
j--;
}
arr[j + 1] = temp;
}
}
// This function merges two sorted parts of the array for Timsort
void timMerge(vector<int>& arr, int left, int mid, int right) {
int len1 = mid - left + 1, len2 = right - mid;
vector<int> leftArr(len1), rightArr(len2);
for (int i = 0; i < len1; i++)
leftArr[i] = arr[left + i];
for (int i = 0; i < len2; i++)
rightArr[i] = arr[mid + 1 + i];
int i = 0, j = 0, k = left;
while (i < len1 && j < len2) {
if (leftArr[i] <= rightArr[j]) {
arr[k++] = leftArr[i++];
} else {
arr[k++] = rightArr[j++];
}
}
while (i < len1)
arr[k++] = leftArr[i++];
while (j < len2)
arr[k++] = rightArr[j++];
}
// Timsort implementation
void timSort(vector<int>& arr, int n) {
int minRun = 64;
// Sort individual subarrays of size minRun using insertion sort
for (int i = 0; i < n; i += minRun) {
insertionSort(arr, i, min((i + minRun - 1), (n - 1)));
}
// Start merging from size minRun
for (int size = minRun; size < n; size = 2 * size) {
for (int left = 0; left < n; left += 2 * size) {
int mid = left + size - 1;
int right = min((left + 2 * size - 1), (n - 1));
if (mid < right)
timMerge(arr, left, mid, right);
}
}
}
// Version for sorting algorithms that require low and high
void measureSortingTime(void (*sortFunc)(vector<int>&, int, int), const vector<int>& arr, int low, int high, const string& sortName) {
vector<int> tempArr = arr; // Create a copy of the input vector
auto start = high_resolution_clock::now();
sortFunc(tempArr, low, high); // tempArr is passed as reference to each sort function
auto stop = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(stop - start);
cout << fixed << setprecision(2); // Set fixed format and precision to 2 decimal places
cout << sortName << " took " << (duration.count()/1000.0) << " ms\n";
}
// Overloaded version for sorting algorithms that only need the array size
void measureSortingTime(void (*sortFunc)(vector<int>&, int), const vector<int>& arr, const string& sortName) {
vector<int> tempArr = arr; // Create a copy of the input vector
auto start = high_resolution_clock::now();
sortFunc(tempArr, (int) tempArr.size()); // tempArr is passed as reference to each sort function
auto stop = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(stop - start);
cout << fixed << setprecision(2); // Set fixed format and precision to 2 decimal places
cout << sortName << " took " << (duration.count()/1000.0) << " ms\n";
}
void generateTestArrays(vector<int>& ascending, vector<int>& descending, vector<int>& randomArr, int n) {
ascending.clear();
descending.clear();
randomArr.clear();
for (int i = 1; i <= n; i++) {
ascending.push_back(i);
descending.push_back(n - i + 1);
randomArr.push_back(i);
}
default_random_engine rng(42); // 42 as the seed for reproducibility
shuffle(randomArr.begin(), randomArr.end(), rng);
}
int main() {
int n = 10000; // Larger size array for testing
vector<int> ascending, descending, randomArr;
generateTestArrays(ascending, descending, randomArr, n);
cout << "Running tests on array size: " << n << endl;
cout << "\nHeap Sort on different input types:" << endl;
measureSortingTime(heapSort, ascending, "Heap Sort (Ascending)");
measureSortingTime(heapSort, descending, "Heap Sort (Descending)");
measureSortingTime(heapSort, randomArr, "Heap Sort (Random)");
cout << "\nMerge Sort on different input types:" << endl;
measureSortingTime(mergeSort, ascending, 0, n - 1, "Merge Sort (Ascending)");
measureSortingTime(mergeSort, descending, 0, n - 1, "Merge Sort (Descending)");
measureSortingTime(mergeSort, randomArr, 0, n - 1, "Merge Sort (Random)");
cout << "\nQuick Sort on different input types:" << endl;
measureSortingTime(quickSort, ascending, 0, n - 1, "Quick Sort (Ascending)");
measureSortingTime(quickSort, descending, 0, n - 1, "Quick Sort (Descending)");
measureSortingTime(quickSort, randomArr, 0, n - 1, "Quick Sort (Random)");
cout << "\nTim Sort on different input types:" << endl;
measureSortingTime(timSort, ascending, "Timsort (Ascending)");
measureSortingTime(timSort, descending, "Timsort (Descending)");
measureSortingTime(timSort, randomArr, "Timsort (Random)");
return 0;
}
Output
Running tests on array size: 10000
Heap Sort on different input types:
Heap Sort (Ascending) took 2.82 ms
Heap Sort (Descending) took 2.71 ms
Heap Sort (Random) took 3.33 ms
Merge Sort on different input types:
Merge Sort (Ascending) took 10.44 ms
Merge Sort (Descending) took 10.49 ms
Merge Sort (Random) took 11.09 ms
Quick Sort on different input types:
Quick Sort (Ascending) took 1.54 ms
Quick Sort (Descending) took 1.43 ms
Quick Sort (Random) took 1.92 ms
Tim Sort on different input types:
Timsort (Ascending) took 2.96 ms
Timsort (Descending) took 5.40 ms
Timsort (Random) took 4.80 ms
Comparing Memory Usage
Let’s now analyze the memory usage of these sorting algorithms using C++ and compare them for different types of input arrays as discussed above.
However, note that the memory usage values are approximate and serve as rough estimates. The results may not reflect exact memory usage due to platform differences. For precise and detailed memory profiling, it’s recommended to use specialized tools like Valgrind or platform-specific profilers.
Code Implementation
#include <iostream>
#include <vector>
#include <algorithm>
#include <sys/resource.h>
#include <random>
using namespace std;
// Function to get memory usage
size_t getMemoryUsage() {
struct rusage usage;
getrusage(RUSAGE_SELF, &usage);
return usage.ru_maxrss; // Memory usage in kilobytes (KB)
}
// Merge Sort
void merge(vector<int>& arr, int left, int mid, int right) {
int n1 = mid - left + 1;
int n2 = right - mid;
vector<int> leftArr(n1), rightArr(n2);
for (int i = 0; i < n1; i++) leftArr[i] = arr[left + i];
for (int i = 0; i < n2; i++) rightArr[i] = arr[mid + 1 + i];
int i = 0, j = 0, k = left;
while (i < n1 && j < n2) arr[k++] = (leftArr[i] <= rightArr[j]) ? leftArr[i++] : rightArr[j++];
while (i < n1) arr[k++] = leftArr[i++];
while (j < n2) arr[k++] = rightArr[j++];
}
void mergeSort(vector<int>& arr, int left, int right) {
if (left < right) {
int mid = left + (right - left) / 2;
mergeSort(arr, left, mid);
mergeSort(arr, mid + 1, right);
merge(arr, left, mid, right);
}
}
// Heapsort
void heapify(vector<int>& arr, int n, int i) {
int largest = i;
int left = 2 * i + 1;
int right = 2 * i + 2;
if (left < n && arr[left] > arr[largest]) largest = left;
if (right < n && arr[right] > arr[largest]) largest = right;
if (largest != i) {
swap(arr[i], arr[largest]);
heapify(arr, n, largest);
}
}
void heapSort(vector<int>& arr, int n) {
for (int i = (n / 2) - 1; i >= 0; i--) heapify(arr, n, i);
for (int i = n - 1; i >= 0; i--) {
swap(arr[0], arr[i]);
heapify(arr, i, 0);
}
}
// Quick Sort
int partition(vector<int>& arr, int low, int high) {
int pivotIndex = low + (rand() % (high - low + 1)); // Generate a random index between low and high (inclusive)
swap(arr[pivotIndex], arr[high]); // Move the pivot element to the end
int pivotValue = arr[high]; // Store the pivot value
int i = low - 1;
for (int j = low; j < high; j++) {
if (arr[j] < pivotValue) swap(arr[++i], arr[j]);
}
swap(arr[++i], arr[high]); // Place pivot in its correct position
return i;
}
void quickSort(vector<int>& arr, int low, int high) {
if (low < high) {
int pi = partition(arr, low, high);
quickSort(arr, low, pi - 1);
quickSort(arr, pi + 1, high);
}
}
// Timsort
// This function sorts array using insertion sort for Timsort
void insertionSort(vector<int>& arr, int left, int right) {
for (int i = left + 1; i <= right; i++) {
int temp = arr[i];
int j = i - 1;
while (j >= left && arr[j] > temp) {
arr[j + 1] = arr[j];
j--;
}
arr[j + 1] = temp;
}
}
// This function merges two sorted parts of the array for Timsort
void timMerge(vector<int>& arr, int left, int mid, int right) {
int len1 = mid - left + 1, len2 = right - mid;
vector<int> leftArr(len1), rightArr(len2);
for (int i = 0; i < len1; i++)
leftArr[i] = arr[left + i];
for (int i = 0; i < len2; i++)
rightArr[i] = arr[mid + 1 + i];
int i = 0, j = 0, k = left;
while (i < len1 && j < len2) {
if (leftArr[i] <= rightArr[j]) {
arr[k++] = leftArr[i++];
} else {
arr[k++] = rightArr[j++];
}
}
while (i < len1)
arr[k++] = leftArr[i++];
while (j < len2)
arr[k++] = rightArr[j++];
}
// Timsort implementation
void timSort(vector<int>& arr, int n) {
int minRun = 64;
// Sort individual subarrays of size minRun using insertion sort
for (int i = 0; i < n; i += minRun) {
insertionSort(arr, i, min((i + minRun - 1), (n - 1)));
}
// Start merging from size minRun
for (int size = minRun; size < n; size = 2 * size) {
for (int left = 0; left < n; left += 2 * size) {
int mid = left + size - 1;
int right = min((left + 2 * size - 1), (n - 1));
if (mid < right)
timMerge(arr, left, mid, right);
}
}
}
// Version to measure memory usage of the sorting algorithms that require low and high
void measureMemoryUsage(void (*sortFunc)(vector<int>&, int, int), const vector<int>& arr, int low, int high, const string& sortName) {
vector<int> tempArr = arr; // Create a copy of the input vector
size_t initialMemory = getMemoryUsage();
sortFunc(tempArr, low, high);
size_t finalMemory = getMemoryUsage();
cout << sortName << " memory usage: " << (finalMemory - initialMemory) << " KB\n";
}
// Overloaded version for sorting algorithms that only need the array size
void measureMemoryUsage(void (*sortFunc)(vector<int>&, int), const vector<int>& arr, const string& sortName) {
vector<int> tempArr = arr; // Create a copy of the input vector
size_t initialMemory = getMemoryUsage();
sortFunc(tempArr, (int) tempArr.size());
size_t finalMemory = getMemoryUsage();
cout << sortName << " memory usage: " << (finalMemory - initialMemory) << " KB\n";
}
void generateTestArrays(vector<int>& ascending, vector<int>& descending, vector<int>& randomArr, int n) {
ascending.clear();
descending.clear();
randomArr.clear();
for (int i = 1; i <= n; i++) {
ascending.push_back(i);
descending.push_back(n - i + 1);
randomArr.push_back(i);
}
default_random_engine rng(42); // 42 as the seed for reproducibility
shuffle(randomArr.begin(), randomArr.end(), rng);
}
int main() {
int n = 100000; // Larger size array for testing
vector<int> ascending, descending, randomArr;
generateTestArrays(ascending, descending, randomArr, n);
cout << "Running tests on array size: " << n << endl;
cout << "\nHeap Sort on different input types:" << endl;
measureMemoryUsage(heapSort, ascending, "Heap Sort (Ascending)");
measureMemoryUsage(heapSort, descending, "Heap Sort (Descending)");
measureMemoryUsage(heapSort, randomArr, "Heap Sort (Random)");
cout << "\nMerge Sort on different input types:" << endl;
measureMemoryUsage(mergeSort, ascending, 0, n - 1, "Merge Sort (Ascending)");
measureMemoryUsage(mergeSort, descending, 0, n - 1, "Merge Sort (Descending)");
measureMemoryUsage(mergeSort, randomArr, 0, n - 1, "Merge Sort (Random)");
cout << "\nQuick Sort on different input types:" << endl;
measureMemoryUsage(quickSort, ascending, 0, n - 1, "Quick Sort (Ascending)");
measureMemoryUsage(quickSort, descending, 0, n - 1, "Quick Sort (Descending)");
measureMemoryUsage(quickSort, randomArr, 0, n - 1, "Quick Sort (Random)");
cout << "\nTim Sort on different input types:" << endl;
measureMemoryUsage(timSort, ascending, "Timsort (Ascending)");
measureMemoryUsage(timSort, descending, "Timsort (Descending)");
measureMemoryUsage(timSort, randomArr, "Timsort (Random)");
return 0;
}
Output
Running tests on array size: 100000
Heap Sort on different input types:
Heap Sort (Ascending) memory usage: 0 KB
Heap Sort (Descending) memory usage: 0 KB
Heap Sort (Random) memory usage: 0 KB
Merge Sort on different input types:
Merge Sort (Ascending) memory usage: 520192 KB
Merge Sort (Descending) memory usage: 81920 KB
Merge Sort (Random) memory usage: 65536 KB
Quick Sort on different input types:
Quick Sort (Ascending) memory usage: 0 KB
Quick Sort (Descending) memory usage: 0 KB
Quick Sort (Random) memory usage: 0 KB
Tim Sort on different input types:
Timsort (Ascending) memory usage: 106496 KB
Timsort (Descending) memory usage: 49152 KB
Timsort (Random) memory usage: 36864 KB
Performance Comparison and Use Cases
Heapsort
- Performance: Guarantees O(n logn) time complexity but may not be as fast as Quick Sort in practice.
- Use Case: Suitable for memory-limited environments due to its in-place sorting and O(1) space complexity.
Merge Sort
- Performance: Excellent for large datasets due to its consistent O(n logn) time complexity.
- Use Case: Preferred for sorting linked lists or when stability is important (e.g., sorting databases).
Quick Sort
- Performance: Best for general-purpose sorting and typically the fastest in practice.
- Use Case: Ideal for large, randomly ordered datasets, but care must be taken with worst-case inputs (handled by randomized pivots).
Timsort
- Performance: Excellent for real-world data with many pre-sorted segments, achieving O(n) in best cases.
- Use Case: Suitable for datasets with partially ordered elements, such as spreadsheets or log files.
So, that’s about performance overview of efficient sorting algorithms having an O(n logn) average time complexity. Each algorithm has its own strengths and use cases depending on the specific requirements of the task, including memory constraints, stability, and data size. In practice, Quick Sort is often the fastest, while Timsort excels with real-world data.