Python Search & Sort Algorithms for AI/ML

Master Python searching and sorting algorithms, from linear and binary search to Bubble, Merge, Quick, Heap, and Tim Sort, with theory and practical examples for efficient data handling in AI and Machine Learning.

This document provides a comprehensive overview of common searching and sorting algorithms in Python, with explanations, theoretical underpinnings, and illustrative examples.

9.1 Searching Algorithms

Searching algorithms are fundamental to computer science, enabling us to find specific elements within a data structure.

9.2 Linear Search

Linear search is the simplest searching algorithm. It sequentially checks each element of the list until a match is found or the entire list has been traversed.

How it works:

  1. Start from the first element of the list.
  2. Compare the current element with the target value.
  3. If they match, return the index of the current element.
  4. If they don't match, move to the next element.
  5. If the end of the list is reached without finding the target, return an indicator that the element is not present (e.g., -1).

Time Complexity:

  • Best Case: O(1) (element is the first one)
  • Average Case: O(n)
  • Worst Case: O(n) (element is the last one or not present)

Space Complexity: O(1)

Example:

def linear_search(arr, target):
    for i in range(len(arr)):
        if arr[i] == target:
            return i
    return -1 # Element not found

my_list = [5, 2, 8, 12, 3, 6]
target_element = 8
index = linear_search(my_list, target_element)

if index != -1:
    print(f"Element {target_element} found at index {index}")
else:
    print(f"Element {target_element} not found in the list")
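In everyday Python, the same linear scan is what the built-in `in` operator and `list.index` perform, so for one-off lookups you rarely need to write the loop yourself. The `safe_index` helper below is a hypothetical wrapper, shown only to mirror linear_search's -1 convention:

```python
my_list = [5, 2, 8, 12, 3, 6]

# Membership test: 'in' performs a left-to-right linear scan, O(n)
print(8 in my_list)  # True

# list.index raises ValueError when the element is absent,
# so guard it the same way linear_search guards with -1
def safe_index(arr, target):
    try:
        return arr.index(target)  # linear scan, returns first match
    except ValueError:
        return -1

print(safe_index(my_list, 8))   # 2
print(safe_index(my_list, 99))  # -1
```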

9.3 Binary Search

Binary search is a much more efficient searching algorithm that works on sorted lists. It repeatedly divides the search interval in half.

How it works:

  1. Start with an interval covering the whole list.
  2. Find the middle element of the interval.
  3. If the middle element is the target, return its index.
  4. If the target is less than the middle element, search in the left half.
  5. If the target is greater than the middle element, search in the right half.
  6. Repeat steps 2-5 until the target is found or the interval is empty.

Prerequisite: The list must be sorted.

Time Complexity:

  • Best Case: O(1) (element is the middle one)
  • Average Case: O(log n)
  • Worst Case: O(log n)

Space Complexity: O(1) (iterative), O(log n) (recursive due to call stack)

Example:

def binary_search(arr, target):
    low = 0
    high = len(arr) - 1

    while low <= high:
        mid = (low + high) // 2
        mid_val = arr[mid]

        if mid_val == target:
            return mid
        elif target < mid_val:
            high = mid - 1
        else: # target > mid_val
            low = mid + 1
    return -1 # Element not found

sorted_list = [2, 3, 5, 6, 8, 12]
target_element = 6
index = binary_search(sorted_list, target_element)

if index != -1:
    print(f"Element {target_element} found at index {index}")
else:
    print(f"Element {target_element} not found in the list")
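Python's standard library ships this halving logic in the `bisect` module. A common idiom is to locate the leftmost insertion point with `bisect_left` and then check whether the element is actually at that position:

```python
import bisect

def binary_search_bisect(arr, target):
    # bisect_left returns the leftmost position where target could be
    # inserted while keeping arr sorted; if target is present, that
    # position holds its first occurrence
    pos = bisect.bisect_left(arr, target)
    if pos < len(arr) and arr[pos] == target:
        return pos
    return -1  # Element not found

sorted_list = [2, 3, 5, 6, 8, 12]
print(binary_search_bisect(sorted_list, 6))  # 3
print(binary_search_bisect(sorted_list, 7))  # -1
```

Unlike the hand-written version, `bisect_left` always returns the first of several equal elements, which matters when the list contains duplicates.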

9.4 Sorting Algorithms

Sorting algorithms are used to arrange elements of a list in a specific order (e.g., ascending or descending).

9.5 Bubble Sort

Bubble Sort is a simple sorting algorithm that repeatedly steps through the list, compares adjacent elements and swaps them if they are in the wrong order. The pass through the list is repeated until the list is sorted.

How it works:

  1. Iterate through the list from the beginning to the end.
  2. Compare each pair of adjacent elements.
  3. If the first element is greater than the second element, swap them.
  4. Repeat this process for the entire list.
  5. After each pass, the largest unsorted element "bubbles up" to its correct position at the end of the unsorted portion.
  6. The passes continue until no swaps are needed in a pass, indicating the list is sorted.

Time Complexity:

  • Best Case: O(n) (if the list is already sorted and an optimization is used to detect no swaps)
  • Average Case: O(n^2)
  • Worst Case: O(n^2)

Space Complexity: O(1)

Example:

def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        # Flag to optimize: if no two elements were swapped by inner loop, then break
        swapped = False
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:
            break # If no swaps occurred, the array is sorted

my_list = [64, 34, 25, 12, 22, 11, 90]
bubble_sort(my_list)
print("Sorted array is:", my_list)

9.6 Insertion Sort

Insertion Sort builds the final sorted array one item at a time. It is much less efficient on large lists than more advanced algorithms like quicksort, heapsort, or merge sort. However, it has some advantages:

  • Simple implementation.
  • Efficient for small data sets.
  • Adaptive: If the input array is already partially sorted, it is much faster.
  • Stable: Does not change the relative order of elements with equal values.

How it works:

  1. Assume the first element is sorted.
  2. Iterate from the second element to the end of the list.
  3. For each element, consider it the "key."
  4. Compare the key with the elements before it in the sorted portion.
  5. Shift elements in the sorted portion that are greater than the key one position to the right to make space for the key.
  6. Insert the key into its correct position.

Time Complexity:

  • Best Case: O(n) (if the list is already sorted)
  • Average Case: O(n^2)
  • Worst Case: O(n^2)

Space Complexity: O(1)

Example:

def insertion_sort(arr):
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Move elements of arr[0..i-1], that are greater than key,
        # to one position ahead of their current position
        while j >= 0 and key < arr[j]:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key

my_list = [12, 11, 13, 5, 6]
insertion_sort(my_list)
print("Sorted array is:", my_list)
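To make the stability property visible, here is a small illustrative variant that orders hypothetical (score, name) records by score alone; because the shift condition uses a strict comparison, records with equal scores never move past each other:

```python
def insertion_sort_by_score(records):
    # Same shifting logic as insertion_sort, but compares only the
    # score field; strict '<' means equal scores never swap, which
    # is exactly what makes the sort stable
    for i in range(1, len(records)):
        key = records[i]
        j = i - 1
        while j >= 0 and key[0] < records[j][0]:
            records[j + 1] = records[j]
            j -= 1
        records[j + 1] = key

people = [(90, "Ana"), (85, "Bob"), (90, "Cem"), (85, "Dia")]
insertion_sort_by_score(people)
print(people)
# [(85, 'Bob'), (85, 'Dia'), (90, 'Ana'), (90, 'Cem')]
# Bob still precedes Dia, and Ana still precedes Cem
```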

9.7 Selection Sort

Selection Sort is an in-place comparison sorting algorithm. It divides the input list into two parts: a sorted sublist, built up from left to right at the front of the list, and a sublist of the remaining unsorted items occupying the rest of the list.

How it works:

  1. Find the minimum element in the unsorted portion of the list.
  2. Swap it with the first element of the unsorted portion.
  3. Move the boundary of the sorted portion one element to the right.
  4. Repeat steps 1-3 until the entire list is sorted.

Time Complexity:

  • Best Case: O(n^2)
  • Average Case: O(n^2)
  • Worst Case: O(n^2)

Space Complexity: O(1)

Example:

def selection_sort(arr):
    n = len(arr)
    for i in range(n):
        # Find the minimum element in the remaining unsorted array
        min_idx = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j

        # Swap the found minimum element with the first element
        arr[i], arr[min_idx] = arr[min_idx], arr[i]

my_list = [64, 25, 12, 22, 11]
selection_sort(my_list)
print("Sorted array is:", my_list)

9.8 Merge Sort

Merge Sort is a divide-and-conquer sorting algorithm. It divides the unsorted list into n sublists, each containing one element (a list of one element is considered sorted). Then, it repeatedly merges sublists to produce new sorted sublists until there is only one sublist remaining, which is the sorted list.

How it works:

  1. Divide: If the list has more than one element, split it into two halves.
  2. Conquer: Recursively sort the two sublists.
  3. Combine (Merge): Merge the two sorted sublists to produce a single sorted list. This merge step involves comparing elements from both sublists and placing them in order into a new temporary list.

Time Complexity:

  • Best Case: O(n log n)
  • Average Case: O(n log n)
  • Worst Case: O(n log n)

Space Complexity: O(n) (due to the temporary array used for merging)

Example:

def merge_sort(arr):
    if len(arr) <= 1:
        return arr

    mid = len(arr) // 2
    left_half = arr[:mid]
    right_half = arr[mid:]

    # Recursive calls to sort both halves
    left_half = merge_sort(left_half)
    right_half = merge_sort(right_half)

    # Merge the sorted halves
    return merge(left_half, right_half)

def merge(left, right):
    merged_arr = []
    i = j = 0

    # Compare elements from left and right halves and append the smaller one
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged_arr.append(left[i])
            i += 1
        else:
            merged_arr.append(right[j])
            j += 1

    # Append any remaining elements from left or right halves
    merged_arr.extend(left[i:])
    merged_arr.extend(right[j:])

    return merged_arr

my_list = [38, 27, 43, 3, 9, 82, 10]
sorted_list = merge_sort(my_list)
print("Sorted array is:", sorted_list)

9.9 Quick Sort

Quick Sort is an efficient, comparison-based, divide-and-conquer sorting algorithm. It picks an element as a "pivot" and partitions the given list around the picked pivot. The pivot is placed at its correct sorted position.

How it works:

  1. Choose a Pivot: Select an element from the list as the pivot. Common strategies include picking the first, last, middle, or a random element.
  2. Partition: Rearrange the list so that all elements less than the pivot come before it, and all elements greater than the pivot come after it. Elements equal to the pivot can go on either side. After partitioning, the pivot is in its final sorted position.
  3. Recursively Sort: Recursively apply Quick Sort to the sublist of elements with smaller values and the sublist of elements with greater values.

Time Complexity:

  • Best Case: O(n log n)
  • Average Case: O(n log n)
  • Worst Case: O(n^2) (occurs when the pivot selection consistently leads to highly unbalanced partitions, e.g., on an already sorted or reverse-sorted list if the pivot is always the first or last element)

Space Complexity: O(log n) on average (due to recursion stack), O(n) in the worst case.

Example (using Lomuto partition scheme):

def partition(arr, low, high):
    pivot = arr[high] # Choose the last element as pivot
    i = low - 1       # Index of smaller element

    for j in range(low, high):
        # If current element is smaller than or equal to pivot
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i] # Swap

    arr[i + 1], arr[high] = arr[high], arr[i + 1] # Swap pivot to its correct place
    return i + 1

def quick_sort(arr, low, high):
    if low < high:
        # pi is partitioning index, arr[pi] is now at right place
        pi = partition(arr, low, high)

        # Separately sort elements before partition and after partition
        quick_sort(arr, low, pi - 1)
        quick_sort(arr, pi + 1, high)

my_list = [10, 7, 8, 9, 1, 5]
n = len(my_list)
quick_sort(my_list, 0, n - 1)
print("Sorted array is:", my_list)
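One common way to dodge the O(n^2) worst case described above is to choose the pivot at random. The sketch below folds the same Lomuto partition into a self-contained randomized variant:

```python
import random

def randomized_quick_sort(arr, low, high):
    if low < high:
        # Swap a randomly chosen element into the pivot slot so that
        # already-sorted or reverse-sorted input no longer forces
        # consistently unbalanced partitions
        rand = random.randint(low, high)
        arr[rand], arr[high] = arr[high], arr[rand]

        # Lomuto partition around arr[high], as in partition() above
        pivot = arr[high]
        i = low - 1
        for j in range(low, high):
            if arr[j] <= pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[high] = arr[high], arr[i + 1]

        # Pivot now sits at index i + 1; recurse on the two sides
        randomized_quick_sort(arr, low, i)
        randomized_quick_sort(arr, i + 2, high)

my_list = [10, 7, 8, 9, 1, 5]
randomized_quick_sort(my_list, 0, len(my_list) - 1)
print("Sorted array is:", my_list)
```

With a random pivot the expected running time is O(n log n) for every input; the O(n^2) case becomes vanishingly unlikely rather than triggerable by a sorted list.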

9.10 Heap Sort

Heap Sort is an efficient comparison-based sorting algorithm that uses a binary heap data structure. For an ascending sort, it works by first converting the list into a max-heap.

How it works:

  1. Heapify: Build a max-heap from the input list. In a max-heap, the parent node is always greater than its child nodes. The largest element is at the root.
  2. Sort:
    • Swap the root element (the largest) with the last element of the heap.
    • Reduce the size of the heap by one.
    • Heapify the root element to maintain the max-heap property.
    • Repeat this process until the heap size is 1.

Time Complexity:

  • Best Case: O(n log n)
  • Average Case: O(n log n)
  • Worst Case: O(n log n)

Space Complexity: O(1) auxiliary (in-place sort); the recursive heapify shown below additionally uses O(log n) call-stack space.

Example:

def heapify(arr, n, i):
    largest = i  # Initialize largest as root
    l = 2 * i + 1     # left child index
    r = 2 * i + 2     # right child index

    # See if left child of root exists and is greater than root
    if l < n and arr[i] < arr[l]:
        largest = l

    # See if right child of root exists and is greater than root
    if r < n and arr[largest] < arr[r]:
        largest = r

    # Change root, if needed
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]  # swap

        # Heapify the root.
        heapify(arr, n, largest)

def heap_sort(arr):
    n = len(arr)

    # Build a maxheap.
    # Since last parent will be at (n//2) - 1, we can start at that node.
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)

    # One by one extract elements
    for i in range(n - 1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]  # swap
        heapify(arr, i, 0) # heapify root element

my_list = [12, 11, 13, 5, 6, 7]
heap_sort(my_list)
print("Sorted array is:", my_list)
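Python's standard library exposes the heap machinery directly in the `heapq` module. `heapq` implements a min-heap, so popping repeatedly yields elements in ascending order; note that this sketch trades the O(1) space of in-place heap sort for a copy of the input:

```python
import heapq

def heapq_sort(arr):
    # heapify builds a min-heap in O(n); each heappop is O(log n),
    # so the whole sort is O(n log n), at the cost of O(n) extra space
    heap = list(arr)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapq_sort([12, 11, 13, 5, 6, 7]))  # [5, 6, 7, 11, 12, 13]
```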

9.11 Tim Sort

Tim Sort is a hybrid, stable sorting algorithm derived from merge sort and insertion sort. It is designed to perform well on many kinds of real-world data. It was developed by Tim Peters for use in Python.

Key Characteristics:

  • Hybrid: Combines the strengths of Merge Sort (efficient for large datasets, O(n log n)) and Insertion Sort (efficient for small or nearly sorted datasets, O(n^2) but fast in practice for small n).
  • Adaptive: Exploits existing runs (sequences of already sorted elements) in the data.
  • Stable: Preserves the relative order of equal elements.
  • O(n log n) Complexity: Guarantees O(n log n) time in the worst case.

How it works (Simplified Overview):

  1. Identify Runs: The algorithm scans the input array and identifies "runs" – existing sorted sequences (either ascending or descending). Descending runs are reversed to make them ascending.
  2. Minrun: A minimum run length (minrun) is calculated based on the size of the array. Shorter runs are extended to this minimum length using insertion sort.
  3. Merge Runs: The algorithm then merges these runs together in a sophisticated way. It maintains a stack of runs and merges adjacent runs in an order that aims to balance the sizes of the runs being merged and maintain the O(n log n) complexity. Merge operations use a temporary buffer.
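The minrun calculation in step 2 can be sketched as follows. This mirrors the rule described in CPython's sorting notes (take the six most significant bits of n, adding one if any lower bit is set, so the result lands in [32, 64] and n/minrun is at or just below a power of two); treat it as an illustration of the idea rather than the exact production code:

```python
def compute_minrun(n):
    # Shift n right until it drops below 64, remembering in r whether
    # any shifted-out bit was set; for n >= 64 the result is in [32, 64]
    r = 0
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r

print(compute_minrun(63))    # 63: small lists are sorted by insertion sort alone
print(compute_minrun(2048))  # 32: 2048 / 32 = 64, an exact power of two
print(compute_minrun(2049))  # 33: rounding up avoids a tiny leftover run
```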

Time Complexity:

  • Best Case: O(n) (for already sorted data with long runs)
  • Average Case: O(n log n)
  • Worst Case: O(n log n)

Space Complexity: O(n) in the worst case; in practice the merge buffer only needs to hold the smaller of the two runs being merged, so the auxiliary space used is often much smaller than n.

Python's sorted() and list.sort():

In Python, the built-in sorted() function and list.sort() method use Tim Sort. This means you typically don't need to implement these algorithms yourself unless you are studying them or have very specific low-level requirements.

Example (using Python's built-in):

my_list = [38, 27, 43, 3, 9, 82, 10]

# Using list.sort() (sorts in-place)
my_list.sort()
print("Sorted array (list.sort()):", my_list)

# Using sorted() (returns a new sorted list)
another_list = [5, 1, 4, 2, 8]
sorted_another_list = sorted(another_list)
print("Sorted array (sorted()):", sorted_another_list)
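Because Tim Sort is stable, the `key` and `reverse` parameters of `sorted()` compose predictably, which is handy for record-ordering tasks such as ranking features by importance in an ML pipeline. The feature names and scores below are made up for illustration:

```python
# Sort (feature_name, importance) pairs by importance, descending;
# the tie between 'age' and 'zip' keeps its original order because
# Tim Sort is stable
features = [("age", 0.30), ("income", 0.45), ("zip", 0.30), ("height", 0.10)]
ranked = sorted(features, key=lambda f: f[1], reverse=True)
print(ranked)
# [('income', 0.45), ('age', 0.3), ('zip', 0.3), ('height', 0.1)]
```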