Python Collections Module: Data Structures for AI/ML

Master Python's built-in collections and the collections module for efficient data handling in AI & Machine Learning. Explore lists, tuples, sets, dictionaries, namedtuple, deque, defaultdict & Counter for robust data science.

5.2 Python Collections Module

Python provides a rich set of built-in data structures, known as collections, for storing, managing, and manipulating groups of related data efficiently. These collections are fundamental tools for a wide range of programming tasks, from simple data handling to complex algorithm implementation.

Overview of Python Collection Types

Python offers four primary built-in collection types, each with distinct characteristics:

Collection Type | Description | Mutable | Ordered | Allows Duplicates
List | Ordered, changeable sequence; allows duplicate elements. | Yes | Yes | Yes
Tuple | Ordered, immutable sequence; allows duplicate elements. | No | Yes | Yes
Set | Unordered collection of unique elements; mutable. | Yes | No | No
Dictionary | Collection of key-value pairs with unique keys; preserves insertion order (Python 3.7+). | Yes | Yes (insertion order) | No (keys); values may repeat

1. Python Lists

A list is a versatile and dynamic array that can store elements of various data types. Lists preserve the order of their elements, can contain duplicate values, and are mutable, meaning their contents can be changed after creation.

Creating a List

fruits = ["apple", "banana", "cherry"]
print(fruits)
# Output: ['apple', 'banana', 'cherry']

Accessing List Elements

Elements in a list are accessed using their index, starting from 0. Negative indexing can be used to access elements from the end of the list.

fruits = ["apple", "banana", "cherry"]
print(fruits[1])   # Accessing the element at index 1
# Output: banana

print(fruits[-1])  # Accessing the last element
# Output: cherry

Common List Methods

Method | Description
append(item) | Adds an item to the end of the list.
insert(i, item) | Inserts an item at index i.
remove(item) | Removes the first occurrence of item. Raises a ValueError if the item is not present.
pop([index]) | Removes and returns the item at index (or the last item if index is omitted).
sort() | Sorts the list in ascending order in place.
reverse() | Reverses the order of elements in place.
len(list) | Built-in function (not a method) that returns the number of items in the list.

fruits = ["apple", "banana", "cherry"]
fruits.append("orange")  # Add "orange" to the end
fruits.sort()           # Sort the list alphabetically
print(fruits)
# Output: ['apple', 'banana', 'cherry', 'orange']
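The remaining entries in the table can be exercised the same way. This is a minimal sketch; the list contents are illustrative.

```python
fruits = ["apple", "banana", "cherry"]

fruits.insert(1, "kiwi")    # Insert "kiwi" at index 1
print(fruits)
# Output: ['apple', 'kiwi', 'banana', 'cherry']

fruits.remove("banana")     # Remove the first occurrence of "banana"
last = fruits.pop()         # Remove and return the last item
print(last, fruits)
# Output: cherry ['apple', 'kiwi']

fruits.reverse()            # Reverse the order in place
print(fruits, len(fruits))  # len() is a built-in function, not a list method
# Output: ['kiwi', 'apple'] 2
```

Note that sort() and reverse() modify the list in place and return None; the built-in sorted(fruits) returns a new sorted list and leaves the original unchanged.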

2. Python Tuples

A tuple is similar to a list in that it is an ordered sequence of elements. However, tuples are immutable, meaning their contents cannot be modified after creation. This immutability makes them ideal for storing fixed collections of data, such as coordinates or records.

Creating a Tuple

coordinates = (10, 20)
print(coordinates)
# Output: (10, 20)

Accessing Tuple Elements

Tuple elements are accessed using indexing, similar to lists.

coordinates = (10, 20)
print(coordinates[0])  # Accessing the element at index 0
# Output: 10

Tuple Unpacking

Tuple unpacking allows you to assign elements of a tuple to individual variables.

coordinates = (10, 20)
x, y = coordinates
print(x, y)
# Output: 10 20
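The immutability described above can be seen directly: assigning to a tuple element raises an error. Because a tuple of immutable items is hashable, it can also serve as a dictionary key. A short sketch; the grid dictionary is an illustrative example.

```python
coordinates = (10, 20)

# Tuples are immutable: item assignment raises TypeError
try:
    coordinates[0] = 99
except TypeError as err:
    print("TypeError:", err)
# Output: TypeError: 'tuple' object does not support item assignment

# A tuple of immutable items is hashable, so it can be used as a dictionary key
grid = {(0, 0): "origin", (10, 20): "target"}
print(grid[coordinates])
# Output: target
```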

3. Python Sets

A set is an unordered collection of unique items. Because sets are hash-based, membership testing (checking whether an item exists in the set) takes constant time on average, regardless of the set's size. Sets are also well suited to removing duplicate elements and performing mathematical set operations like union, intersection, and difference.

Creating a Set

When creating a set with duplicate elements, only unique elements are retained.

unique_numbers = {1, 2, 3, 3, 2}
print(unique_numbers)
# Output: {1, 2, 3} (order may vary)
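A common use is deduplicating data and then testing membership. This sketch uses an illustrative list of class labels; it also shows that empty braces create a dictionary, not a set.

```python
# Illustrative list of class labels with repeats
labels = ["cat", "dog", "cat", "bird", "dog"]

unique_labels = set(labels)     # duplicates are dropped
print(len(unique_labels))
# Output: 3

# Membership testing uses hashing rather than scanning every element
print("dog" in unique_labels)
# Output: True
print("fish" in unique_labels)
# Output: False

# {} creates an empty dict; use set() for an empty set
empty = set()
print(type({}).__name__, type(empty).__name__)
# Output: dict set
```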

Common Set Methods

Method | Description
add(item) | Adds a single element to the set.
update(iterable) | Adds multiple elements from an iterable to the set.
remove(item) | Removes a specified item. Raises a KeyError if the item is not found.
discard(item) | Removes a specified item if it is present. Does nothing if the item is not found.
union(other_set) | Returns a new set containing all elements from both sets.
intersection(other_set) | Returns a new set containing common elements between the sets.
difference(other_set) | Returns a new set with elements in the first set but not in the other.

a = {1, 2, 3}
b = {3, 4, 5}

print(a.union(b))        # Union of sets a and b
# Output: {1, 2, 3, 4, 5}

print(a.intersection(b)) # Intersection of sets a and b
# Output: {3}

print(a.difference(b))   # Elements in a but not in b
# Output: {1, 2}
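The mutating methods from the table, and the operator forms of the set operations, can be sketched as follows. The values are illustrative.

```python
s = {1, 2, 3}

s.add(4)              # {1, 2, 3, 4}
s.update([5, 6])      # {1, 2, 3, 4, 5, 6}
s.discard(10)         # 10 is absent: no error
s.remove(6)           # {1, 2, 3, 4, 5}

try:
    s.remove(10)      # 10 is absent: raises KeyError
except KeyError:
    print("10 not found")
# Output: 10 not found

# Operator equivalents of union, intersection, and difference
a = {1, 2, 3}
b = {3, 4, 5}
print(a | b == a.union(b), a & b == a.intersection(b), a - b == a.difference(b))
# Output: True True True
```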

4. Python Dictionaries

A dictionary (often abbreviated as dict) stores data in key-value pairs. Each key in a dictionary must be unique and hashable (in practice, usually an immutable type such as a string, a number, or a tuple of immutable items), while values can be of any data type and can be duplicated. Dictionaries are essential for creating mappings and associating data.

Creating a Dictionary

student = {"name": "Alex", "age": 21, "grade": "A"}
print(student)
# Output: {'name': 'Alex', 'age': 21, 'grade': 'A'}

Accessing Dictionary Elements

Elements are accessed using their corresponding keys.

student = {"name": "Alex", "age": 21, "grade": "A"}
print(student["name"])  # Accessing the value associated with the key "name"
# Output: Alex

Modifying Dictionary Data

You can add new key-value pairs or update existing ones.

student = {"name": "Alex", "age": 21, "grade": "A"}
student["age"] = 22  # Update the value for the key "age"
student["email"] = "alex@example.com"  # Add a new key-value pair
print(student)
# Output: {'name': 'Alex', 'age': 22, 'grade': 'A', 'email': 'alex@example.com'}

Common Dictionary Methods

Method | Description
keys() | Returns a view object of the dictionary's keys. The view reflects later changes to the dictionary.
values() | Returns a view object of the dictionary's values.
items() | Returns a view object of the dictionary's key-value pairs as tuples.
get(key[, default]) | Returns the value for key if it exists; otherwise returns default (None if no default is given).
update(other_dict) | Updates the dictionary with key-value pairs from other_dict, overwriting existing keys.
pop(key[, default]) | Removes key and returns its value. If the key is missing, returns default if one is given; otherwise raises a KeyError.

student = {"name": "Alex", "age": 22, "grade": "A"}
print(student.get("name"))     # Get value for "name"
# Output: Alex

print(student.items())         # Get all key-value pairs
# Output: dict_items([('name', 'Alex'), ('age', 22), ('grade', 'A')])
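The other methods in the table can be combined in the same way. This is a minimal sketch; the "email" and "major" keys are illustrative.

```python
student = {"name": "Alex", "age": 22, "grade": "A"}

print(student.get("email"))                 # missing key: returns None
# Output: None
print(student.get("email", "not set"))      # missing key with a default
# Output: not set

student.update({"grade": "A+", "major": "CS"})  # overwrite "grade", add "major"
age = student.pop("age")                    # remove "age" and return its value
print(age, list(student.keys()))
# Output: 22 ['name', 'grade', 'major']

# items() is convenient for iterating over key-value pairs
for key, value in student.items():
    print(f"{key}: {value}")
# Output:
# name: Alex
# grade: A+
# major: CS
```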

Choosing the Right Collection

Selecting the appropriate collection type is crucial for efficient and readable code. Consider these common use cases:

  • Need an ordered, changeable sequence: Use a List.
  • Require a fixed and immutable group of items: Use a Tuple.
  • Need to ensure uniqueness of elements and perform set operations: Use a Set.
  • Need to store and retrieve data using key-value pairs with fast lookups: Use a Dictionary.

Advanced Collections from the collections Module

Python's standard-library collections module provides specialized container datatypes that offer extended functionality beyond the standard collections.

1. namedtuple

namedtuple allows you to create tuple subclasses with named fields. This makes accessing tuple elements more readable by using names instead of indices.

from collections import namedtuple

# Create a Point namedtuple with fields 'x' and 'y'
Point = namedtuple("Point", "x y")

# Create an instance of Point
p = Point(10, 20)

# Access elements by name
print(p.x, p.y)
# Output: 10 20
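Because a namedtuple is still a tuple, indexing and unpacking keep working, and a few helper methods are provided. A short sketch; the _asdict() output shown assumes Python 3.8 or later, where it returns a plain dict.

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")
p = Point(10, 20)

print(p[0], p.x)           # still a tuple: index and name both work
# Output: 10 10

x, y = p                   # unpacking works like a regular tuple
print(Point._fields)
# Output: ('x', 'y')

p2 = p._replace(x=15)      # fields are immutable; _replace returns a new instance
print(p, p2)
# Output: Point(x=10, y=20) Point(x=15, y=20)

print(p._asdict())
# Output: {'x': 10, 'y': 20}
```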

2. deque (Double-Ended Queue)

deque is a list-like container optimized for fast appends and pops from both ends: these operations run in O(1) time, whereas list.insert(0, item) and list.pop(0) take O(n) time. This makes deque a good choice for implementing queues and stacks.

from collections import deque

# Create a deque
dq = deque([1, 2, 3])

# Append to the left (beginning)
dq.appendleft(0)
# deque([0, 1, 2, 3])

# Append to the right (end)
dq.append(4)
# deque([0, 1, 2, 3, 4])

print(dq)
# Output: deque([0, 1, 2, 3, 4])
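Removing items works from either end as well, and the optional maxlen argument turns a deque into a fixed-size buffer. A minimal sketch; the reward values in the sliding window are illustrative.

```python
from collections import deque

dq = deque([0, 1, 2, 3, 4])

left = dq.popleft()    # remove from the left end
right = dq.pop()       # remove from the right end
print(left, right, dq)
# Output: 0 4 deque([1, 2, 3])

# A bounded deque discards items from the opposite end when full,
# which is handy for a sliding window over recent values (e.g., rewards)
window = deque(maxlen=3)
for reward in [1, 2, 3, 4, 5]:
    window.append(reward)
print(window)
# Output: deque([3, 4, 5], maxlen=3)
print(sum(window) / len(window))
# Output: 4.0
```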

3. defaultdict

defaultdict is a subclass of dict that calls a factory function to supply default values for missing keys. This avoids KeyError exceptions when trying to access a non-existent key and simplifies the initialization of collections within dictionaries.

from collections import defaultdict

# Create a defaultdict where missing keys default to integers (0)
dd = defaultdict(int)

# Accessing a missing key automatically creates it with the default value
dd["a"] += 1
dd["b"] += 5

print(dd)
# Output: defaultdict(<class 'int'>, {'a': 1, 'b': 5})
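Using list as the factory is a common way to group items without initializing each key first. This sketch groups illustrative (sample_id, label) pairs by label; note that only [] access creates missing keys.

```python
from collections import defaultdict

# Illustrative (sample_id, label) pairs
samples = [("img1", "cat"), ("img2", "dog"), ("img3", "cat"), ("img4", "bird")]

by_label = defaultdict(list)      # missing keys start as an empty list
for sample_id, label in samples:
    by_label[label].append(sample_id)

print(dict(by_label))
# Output: {'cat': ['img1', 'img3'], 'dog': ['img2'], 'bird': ['img4']}

# Lookups with "in" or .get() do not create keys; only [] access does
print("fish" in by_label, by_label.get("fish"))
# Output: False None
```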

4. Counter

Counter is a dictionary subclass for counting hashable objects. It's useful for quickly tallying the occurrences of items in an iterable.

from collections import Counter

# Create a Counter from a string
cnt = Counter("success")

print(cnt)
# Output: Counter({'s': 3, 'c': 2, 'u': 1, 'e': 1})

# Accessing counts
print(cnt['s'])
# Output: 3
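In a machine learning context, Counter is a quick way to inspect the class distribution of a dataset. A minimal sketch, assuming an illustrative list of labels:

```python
from collections import Counter

# Illustrative label column from a classification dataset
labels = ["cat", "dog", "cat", "bird", "cat", "dog"]
label_counts = Counter(labels)

print(label_counts.most_common(2))
# Output: [('cat', 3), ('dog', 2)]

print(label_counts["fish"])             # missing items count as 0, no KeyError
# Output: 0

label_counts.update(["bird", "bird"])   # add counts from another iterable
print(label_counts["bird"])
# Output: 3

# Relative frequency of each label
total = sum(label_counts.values())
print({label: count / total for label, count in label_counts.items()})
# Output: {'cat': 0.375, 'dog': 0.25, 'bird': 0.375}
```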