Python Collections Module: Data Structures for AI/ML
Master Python's collections module for efficient data handling in AI and machine learning. Explore namedtuple, deque, defaultdict, Counter and more for robust data science.
5.2 Python Collections Module
Python provides a rich set of built-in data structures, known as collections, for storing, managing, and manipulating groups of related data efficiently. These collections are fundamental tools for a wide range of programming tasks, from simple data handling to complex algorithm implementation.
Overview of Python Collection Types
Python offers four primary built-in collection types, each with distinct characteristics:
Collection Type | Description | Mutable | Ordered | Allows Duplicates |
---|---|---|---|---|
List | Ordered, changeable sequence; allows duplicate elements. | Yes | Yes | Yes |
Tuple | Ordered, immutable sequence; allows duplicate elements. | No | Yes | Yes |
Set | Unordered collection of unique elements; mutable. | Yes | No | No |
Dictionary | Collection of key-value pairs with unique keys; preserves insertion order (Python 3.7+). | Yes | Yes (insertion order) | No (keys) |
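As a quick illustration of the table, the following sketch (with made-up sample values) shows how each type treats order, mutability, and duplicates. Building the dict from a list of pairs with a repeated key shows that a later value overwrites an earlier one while the key keeps its original position.

```python
# Illustrative sample values for each built-in collection type.
numbers_list = [3, 1, 3]       # list: ordered, mutable, keeps duplicates
numbers_tuple = (3, 1, 3)      # tuple: ordered, immutable, keeps duplicates
numbers_set = {3, 1, 3}        # set: no guaranteed order, drops duplicates
# dict: a repeated key keeps its first position but takes the last value
scores = dict([("b", 2), ("a", 1), ("b", 5)])

numbers_list.append(3)         # lists can grow in place
print(numbers_list)            # [3, 1, 3, 3]
print(numbers_tuple.count(3))  # 2
print(len(numbers_set))        # 2
print(scores)                  # {'b': 5, 'a': 1}
```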
1. Python Lists
A list is a versatile and dynamic array that can store elements of various data types. Lists preserve the order of their elements, can contain duplicate values, and are mutable, meaning their contents can be changed after creation.
Creating a List
fruits = ["apple", "banana", "cherry"]
print(fruits)
# Output: ['apple', 'banana', 'cherry']
Accessing List Elements
Elements in a list are accessed using their index, starting from 0. Negative indexing can be used to access elements from the end of the list.
fruits = ["apple", "banana", "cherry"]
print(fruits[1]) # Accessing the element at index 1
# Output: banana
print(fruits[-1]) # Accessing the last element
# Output: cherry
Common List Methods
Method | Description |
---|---|
append(item) | Adds an item to the end of the list. |
insert(i, item) | Inserts an item at index i. |
remove(item) | Removes the first occurrence of the item. |
pop([index]) | Removes and returns the item at index (or the last item if index is omitted). |
sort() | Sorts the list in ascending order in-place. |
reverse() | Reverses the order of elements in-place. |
len(list) | Built-in function (not a list method) that returns the number of items in the list. |
fruits = ["apple", "banana", "cherry"]
fruits.append("orange") # Add "orange" to the end
fruits.sort() # Sort the list alphabetically
print(fruits)
# Output: ['apple', 'banana', 'cherry', 'orange']
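The example above uses only append() and sort(). A short sketch of the remaining methods from the table, run on the same starting list, might look like this; note that sort() and reverse() change the list in place and return None, while the built-in sorted() returns a new list.

```python
fruits = ["apple", "banana", "cherry"]

fruits.insert(1, "kiwi")      # insert "kiwi" at index 1
print(fruits)                 # ['apple', 'kiwi', 'banana', 'cherry']

fruits.remove("banana")       # remove the first occurrence of "banana"
last = fruits.pop()           # remove and return the last item
print(last, fruits)           # cherry ['apple', 'kiwi']

fruits.reverse()              # reverse in place
print(fruits, len(fruits))    # ['kiwi', 'apple'] 2

# sorted() builds a new sorted list and leaves the original untouched.
ordered = sorted(fruits)
print(ordered, fruits)        # ['apple', 'kiwi'] ['kiwi', 'apple']
```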
2. Python Tuples
A tuple is similar to a list in that it is an ordered sequence of elements. However, tuples are immutable, meaning their contents cannot be modified after creation. This immutability makes them ideal for storing fixed collections of data, such as coordinates or records.
Creating a Tuple
coordinates = (10, 20)
print(coordinates)
# Output: (10, 20)
Accessing Tuple Elements
Tuple elements are accessed using indexing, similar to lists.
coordinates = (10, 20)
print(coordinates[0]) # Accessing the element at index 0
# Output: 10
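Because tuples are immutable, trying to change an element raises a TypeError. The sketch below demonstrates this, along with two related points: a tuple of hashable values can be used as a dictionary key, and a one-element tuple needs a trailing comma. The `labels` dictionary and its value are hypothetical.

```python
coordinates = (10, 20)

# Item assignment is not allowed on a tuple.
error_message = ""
try:
    coordinates[0] = 99
except TypeError as exc:
    error_message = str(exc)
print(error_message)   # 'tuple' object does not support item assignment

# A tuple of hashable values is itself hashable, so it can be a dict key.
labels = {(10, 20): "checkpoint A"}   # hypothetical lookup table
print(labels[coordinates])            # checkpoint A

# A one-element tuple needs a trailing comma; (5) is just the int 5.
single = (5,)
not_a_tuple = (5)
print(type(single).__name__, type(not_a_tuple).__name__)  # tuple int
```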
Tuple Unpacking
Tuple unpacking allows you to assign elements of a tuple to individual variables.
coordinates = (10, 20)
x, y = coordinates
print(x, y)
# Output: 10 20
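Unpacking is also what makes returning several values from a function convenient, since `return a, b` builds a tuple. The sketch below uses a hypothetical helper, `min_max`, and also shows swapping two variables and extended unpacking with a starred name, which collects the leftover items into a list.

```python
def min_max(values):
    """Hypothetical helper: return the smallest and largest value as a tuple."""
    return min(values), max(values)   # the comma builds a tuple

low, high = min_max([4, 9, 1, 7])
print(low, high)            # 1 9

# Swap two variables with tuple packing and unpacking.
low, high = high, low
print(low, high)            # 9 1

# Extended unpacking: *rest collects the remaining items into a list.
first, *rest = (10, 20, 30, 40)
print(first, rest)          # 10 [20, 30, 40]
```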
3. Python Sets
A set is an unordered collection of unique items. Sets are highly efficient for membership testing (checking if an item exists in the set), removing duplicate elements, and performing mathematical set operations like union, intersection, and difference.
Creating a Set
When creating a set with duplicate elements, only unique elements are retained.
unique_numbers = {1, 2, 3, 3, 2}
print(unique_numbers)
# Output: {1, 2, 3} (order may vary)
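A common use of sets is removing duplicates from existing data and then testing membership. The sketch below assumes a hypothetical list of class labels. Because a set does not keep insertion order, `dict.fromkeys` is shown as a standard idiom for deduplicating while preserving first-seen order. Note also that `{}` creates an empty dictionary, so an empty set must be written as `set()`.

```python
labels = ["cat", "dog", "cat", "bird", "dog"]   # hypothetical labels

unique_labels = set(labels)          # duplicates are removed
print(len(unique_labels))            # 3
print("dog" in unique_labels)        # True
print("fish" in unique_labels)       # False

# Deduplicate while keeping first-seen order (dicts preserve insertion order).
ordered_unique = list(dict.fromkeys(labels))
print(ordered_unique)                # ['cat', 'dog', 'bird']

empty = set()                        # {} would create an empty dict instead
print(type(empty).__name__, type({}).__name__)  # set dict
```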
Common Set Methods
Method | Description |
---|---|
add(item) | Adds a single element to the set. |
update(iterable) | Adds multiple elements from an iterable to the set. |
remove(item) | Removes a specified item. Raises a KeyError if the item is not found. |
discard(item) | Removes a specified item if it is present. Does nothing if the item is not found. |
union(other_set) | Returns a new set containing all elements from both sets. |
intersection(other_set) | Returns a new set containing common elements between the sets. |
difference(other_set) | Returns a new set with elements in the first set but not in the other. |
a = {1, 2, 3}
b = {3, 4, 5}
print(a.union(b)) # Union of sets a and b
# Output: {1, 2, 3, 4, 5}
print(a.intersection(b)) # Intersection of sets a and b
# Output: {3}
print(a.difference(b)) # Elements in a but not in b
# Output: {1, 2}
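The example above covers only the operations that return new sets. The sketch below exercises the in-place methods from the table, including the difference between remove() and discard() for a missing item, and shows the operator forms `|`, `&` and `-`, which give the same results as union(), intersection() and difference(). The tag values are made up.

```python
tags = {"python", "ml"}

tags.add("ai")                    # add one element
tags.update(["data", "python"])   # add several; "python" is already present
tags.discard("java")              # absent, but discard() raises no error
tags.remove("ml")                 # present, so it is removed

try:
    tags.remove("java")           # absent: remove() raises KeyError
except KeyError:
    print("'java' not found")

print(sorted(tags))               # ['ai', 'data', 'python']

# Operator equivalents of union, intersection and difference.
a = {1, 2, 3}
b = {3, 4, 5}
print((a | b) == a.union(b))          # True
print((a & b) == a.intersection(b))   # True
print((a - b) == a.difference(b))     # True
```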
4. Python Dictionaries
A dictionary (often abbreviated as dict) stores data in key-value pairs. Each key must be unique and hashable (for example, strings, numbers, or tuples whose elements are themselves hashable), while values can be of any data type and can repeat. Dictionaries are essential for building mappings between related pieces of data.
Creating a Dictionary
student = {"name": "Alex", "age": 21, "grade": "A"}
print(student)
# Output: {'name': 'Alex', 'age': 21, 'grade': 'A'}
Accessing Dictionary Elements
Elements are accessed using their corresponding keys.
student = {"name": "Alex", "age": 21, "grade": "A"}
print(student["name"]) # Accessing the value associated with the key "name"
# Output: Alex
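Square-bracket access assumes the key exists. The sketch below shows that looking up a missing key raises a KeyError, and that the `in` operator checks keys (not values) so it can be used to test first.

```python
student = {"name": "Alex", "age": 21, "grade": "A"}

# Square-bracket access raises KeyError when the key is missing.
missing_key = None
try:
    student["email"]
except KeyError as exc:
    missing_key = exc.args[0]
    print("missing key:", exc)      # missing key: 'email'

# `in` tests keys, not values.
if "email" not in student:
    print("no email on record")
print("Alex" in student)            # False: "Alex" is a value, not a key
```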
Modifying Dictionary Data
You can add new key-value pairs or update existing ones.
student = {"name": "Alex", "age": 21, "grade": "A"}
student["age"] = 22 # Update the value for the key "age"
student["email"] = "alex@example.com" # Add a new key-value pair
print(student)
# Output: {'name': 'Alex', 'age': 22, 'grade': 'A', 'email': 'alex@example.com'}
Common Dictionary Methods
Method | Description |
---|---|
keys() | Returns a view object of the dictionary's keys; the view reflects later changes to the dictionary. |
values() | Returns a view object of the dictionary's values. |
items() | Returns a view object of the dictionary's (key, value) tuple pairs. |
get(key[, default]) | Returns the value for key if it exists; otherwise returns default (None if no default is given). |
update(other) | Updates the dictionary with key-value pairs from other, overwriting existing keys. |
pop(key[, default]) | Removes key and returns its value. If the key is missing, returns default if given; otherwise raises a KeyError. |
student = {"name": "Alex", "age": 22, "grade": "A"}
print(student.get("name")) # Get value for "name"
# Output: Alex
print(student.items()) # Get all key-value pairs
# Output: dict_items([('name', 'Alex'), ('age', 22), ('grade', 'A')])
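The following sketch exercises the optional defaults of get() and pop(), the overwrite behaviour of update(), and the usual pattern for iterating over items(). The "major" key and its value are made up for illustration.

```python
student = {"name": "Alex", "age": 22, "grade": "A"}

print(student.get("email"))               # None
print(student.get("email", "not set"))    # not set

# Overwrite "grade" and add a new "major" key.
student.update({"grade": "A+", "major": "CS"})

removed = student.pop("age")              # returns 22
fallback = student.pop("age", None)       # key is gone, so the default is returned

for key, value in student.items():        # iterate over key-value pairs
    print(f"{key}: {value}")
# name: Alex
# grade: A+
# major: CS
```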
Choosing the Right Collection
Selecting the appropriate collection type is crucial for efficient and readable code. Consider these common use cases:
- Need an ordered, changeable sequence: Use a List.
- Require a fixed and immutable group of items: Use a Tuple.
- Need to ensure uniqueness of elements and perform set operations: Use a Set.
- Need to store and retrieve data using key-value pairs with fast lookups: Use a Dictionary.
Advanced Collections from the collections Module
Python's standard-library collections module provides specialized container datatypes that extend the functionality of the built-in collection types.
1. namedtuple
namedtuple creates tuple subclasses with named fields, so elements can be accessed by name instead of by index, which makes code more readable.
from collections import namedtuple
# Create a Point namedtuple with fields 'x' and 'y'
Point = namedtuple("Point", "x y")
# Create an instance of Point
p = Point(10, 20)
# Access elements by name
print(p.x, p.y)
# Output: 10 20
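A namedtuple is still a tuple, so indexing and unpacking keep working, and its fields cannot be reassigned. The sketch below shows this together with the helper members `_replace()`, `_asdict()` and `_fields`; the underscore prefix only avoids clashes with field names, these are documented public API. On Python versions before 3.8, `_asdict()` returns an OrderedDict rather than a plain dict.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(10, 20)

print(p[0], p.y)          # 10 20: index access still works
x, y = p                  # unpacks like a plain tuple
print(p)                  # Point(x=10, y=20)

# Fields are read-only.
try:
    p.x = 5
except AttributeError:
    print("namedtuple fields are read-only")

p2 = p._replace(x=99)     # returns a new instance; p is unchanged
print(p2)                 # Point(x=99, y=20)
print(p._asdict())        # {'x': 10, 'y': 20} on Python 3.8+
print(Point._fields)      # ('x', 'y')
```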
2. deque (Double-Ended Queue)
deque is a list-like container optimized for fast, O(1) appends and pops at both ends, which makes it an efficient choice for implementing queues and stacks.
from collections import deque
# Create a deque
dq = deque([1, 2, 3])
# Append to the left (beginning)
dq.appendleft(0)
# deque([0, 1, 2, 3])
# Append to the right (end)
dq.append(4)
# deque([0, 1, 2, 3, 4])
print(dq)
# Output: deque([0, 1, 2, 3, 4])
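The sketch below shows deque used as a FIFO queue and as a LIFO stack, plus two other standard features: `maxlen`, which turns a deque into a bounded buffer that discards items from the opposite end, and `rotate()`. A list could serve as a queue too, but `list.pop(0)` has to shift every remaining element, whereas `deque.popleft()` does not. The task names and readings are made up.

```python
from collections import deque

# FIFO queue: append on the right, pop from the left.
queue = deque(["task1", "task2"])
queue.append("task3")
print(queue.popleft())        # task1

# LIFO stack: append and pop on the same (right) end.
stack = deque()
stack.append(1)
stack.append(2)
print(stack.pop())            # 2

# With maxlen, appending to a full deque discards items from the left.
recent = deque(maxlen=3)
for reading in [1, 2, 3, 4, 5]:
    recent.append(reading)
print(recent)                 # deque([3, 4, 5], maxlen=3)

# rotate(n) shifts items n steps to the right (negative n rotates left).
dq = deque([0, 1, 2, 3, 4])
dq.rotate(1)
print(dq)                     # deque([4, 0, 1, 2, 3])
```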
3. defaultdict
defaultdict is a subclass of dict that calls a factory function to supply default values for missing keys. This avoids KeyError exceptions when accessing a non-existent key and simplifies initializing collections inside dictionaries.
from collections import defaultdict
# Create a defaultdict where missing keys default to integers (0)
dd = defaultdict(int)
# Accessing a missing key automatically creates it with the default value
dd["a"] += 1
dd["b"] += 5
print(dd)
# Output: defaultdict(<class 'int'>, {'a': 1, 'b': 5})
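Passing `list` as the factory is a common way to group items by key without first checking whether the key exists. The sketch below groups hypothetical (label, filename) pairs. It also highlights a caveat: merely reading a missing key with square brackets inserts it, whereas `get()` does not call the factory.

```python
from collections import defaultdict

# Hypothetical (label, filename) pairs, e.g. from an image dataset.
samples = [("cat", "img1.jpg"), ("dog", "img2.jpg"), ("cat", "img3.jpg")]

by_label = defaultdict(list)      # missing keys start as an empty list
for label, filename in samples:
    by_label[label].append(filename)

print(dict(by_label))  # {'cat': ['img1.jpg', 'img3.jpg'], 'dog': ['img2.jpg']}

# Reading a missing key with [] inserts it with the default value.
print(by_label["bird"])           # []
print("bird" in by_label)         # True

# get() does not trigger the factory, so nothing is inserted.
print(by_label.get("fish"))       # None
print("fish" in by_label)         # False
```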
4. Counter
Counter is a dictionary subclass for counting hashable objects. It is useful for quickly tallying how often each item occurs in an iterable.
from collections import Counter
# Create a Counter from a string
cnt = Counter("success")
print(cnt)
# Output: Counter({'s': 3, 'c': 2, 'u': 1, 'e': 1})
# Accessing counts
print(cnt['s'])
# Output: 3
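Beyond simple tallies, Counter offers `most_common()`, additive `update()`, and arithmetic between counters. The sketch below applies these to a hypothetical list of classifier predictions. Missing items report a count of 0 instead of raising a KeyError, and subtraction keeps only positive counts.

```python
from collections import Counter

# Hypothetical predicted labels from a classifier.
predictions = ["cat", "dog", "cat", "bird", "cat", "dog"]
counts = Counter(predictions)

print(counts.most_common(2))     # [('cat', 3), ('dog', 2)]
print(counts["fish"])            # 0: missing items count as zero

counts.update(["bird", "bird"])  # adds to existing counts instead of replacing
print(counts["bird"])            # 3

# Subtraction drops items whose count falls to zero or below.
other = Counter({"cat": 1, "dog": 5})
print(counts - other)            # Counter({'bird': 3, 'cat': 2})
```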