
File Handling in Python

This documentation provides a comprehensive overview of file handling operations in Python, covering reading from and writing to various file formats, as well as understanding context managers.

8.1 Python Files I/O

Python provides built-in functions for interacting with files, allowing you to read data from files and write data to them. The primary function used is open(), which returns a file object.

The open() function typically takes two arguments:

filename: The path to the file.
mode: A string specifying how the file will be used. Common modes include:
- 'r': Read (default).
- 'w': Write (truncates the file if it exists, creates it if it doesn't).
- 'a': Append (adds data to the end of the file, creates it if it doesn't).
- 'x': Exclusive creation (creates a new file, raises FileExistsError if the file already exists).
- 'b': Binary mode (for non-text files); combine it with another mode, e.g. 'rb' or 'wb'.
- 't': Text mode (default).
- '+': Open for updating (reading and writing), e.g. 'r+' or 'w+'.
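A short sketch can make the mode characters above concrete. It works in a throwaway temporary directory, and the file name modes_demo.txt is purely hypothetical. It shows 'x' refusing to overwrite, 'a' appending, 'r+' reading and writing without truncation, and 'rb' returning bytes.

```python
import os
import tempfile

# Work in a throwaway directory; "modes_demo.txt" is a hypothetical name
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'x': exclusive creation succeeds because the file does not exist yet
with open(path, "x") as f:
    f.write("created with 'x'\n")

# A second 'x' open fails instead of silently overwriting the file
refused = False
try:
    with open(path, "x"):
        pass
except FileExistsError:
    refused = True
print("second 'x' open refused:", refused)

# 'a': writes go to the end of the file; nothing is truncated
with open(path, "a") as f:
    f.write("appended with 'a'\n")

# 'r+': read and write the same file without truncating it first
with open(path, "r+") as f:
    content = f.read()
    f.seek(0, os.SEEK_END)  # make sure the write lands at the end
    f.write("added with 'r+'\n")

# 'rb': binary mode returns bytes rather than str
with open(path, "rb") as f:
    raw = f.read()

print(repr(content))  # "created with 'x'\nappended with 'a'\n"
print(type(raw))      # <class 'bytes'>
```

Note that 'w+' would also allow reading and writing, but unlike 'r+' it truncates the file on open.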

It's crucial to close files after you're finished with them using the close() method to free up system resources and ensure data is flushed to the file.

# Example of opening and closing a file
file = open("my_file.txt", "w")
# Perform operations on the file
file.close()

Using a with statement (context manager) is the recommended approach as it automatically handles closing the file, even if errors occur.

with open("my_file.txt", "w") as file:
    pass  # Perform operations on the file here

# The file is automatically closed once the with block exits
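To see what "automatically handles closing the file, even if errors occur" means, here is a rough manual equivalent using try/finally. This is a sketch of the guarantee, not the exact code the interpreter runs; the temporary path is an assumption so the example does not touch your working directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "my_file.txt")

file = open(path, "w")
try:
    file.write("hello\n")  # operations that might raise go here
finally:
    file.close()           # runs on success and on error alike

print(file.closed)  # True
```

The with statement packages this try/finally pattern so it cannot be forgotten.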

8.2 Reading from a File

Python offers several methods for reading data from files:

read(): Reads the entire content of the file as a single string.
readline(): Reads a single line from the file, including the newline character.
readlines(): Reads all lines from the file and returns them as a list of strings.

Example: Reading the Entire File

try:
    with open("example.txt", "r") as file:
        content = file.read()
        print(content)
except FileNotFoundError:
    print("Error: The file 'example.txt' was not found.")

Example: Reading Line by Line

try:
    with open("example.txt", "r") as file:
        for line in file:
            print(line.strip()) # .strip() removes leading/trailing whitespace including newline
except FileNotFoundError:
    print("Error: The file 'example.txt' was not found.")

Example: Reading All Lines into a List

try:
    with open("example.txt", "r") as file:
        lines = file.readlines()
        print(lines)
except FileNotFoundError:
    print("Error: The file 'example.txt' was not found.")
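The examples above use read(), iteration, and readlines(), but not readline(). A minimal sketch, assuming a small sample file created on the fly in a temporary directory: readline() keeps the trailing newline and returns an empty string only at end of file (a blank line in the file comes back as "\n", not "").

```python
import os
import tempfile

# Create a small sample file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\nthird line\n")

lines_read = []
with open(path, "r") as file:
    first = file.readline()  # the newline character is kept
    while True:
        line = file.readline()
        if line == "":       # '' means end of file
            break
        lines_read.append(line.rstrip("\n"))

print(repr(first))  # 'first line\n'
print(lines_read)   # ['second line', 'third line']
```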

8.3 Writing to a File

Python allows you to write data to files using the 'w' (write) or 'a' (append) modes.

write(string): Writes a string to the file.
writelines(iterable_of_strings): Writes each string from an iterable (such as a list) to the file. It does not add newline characters automatically, so you need to include them in your strings.

Example: Writing to a File (Overwriting)

with open("output.txt", "w") as file:
    file.write("This is the first line.\n")
    file.write("This is the second line.\n")

Example: Appending to a File

with open("output.txt", "a") as file:
    file.write("This line is appended.\n")

Example: Writing a List of Strings

lines_to_write = ["Line 1\n", "Line 2\n", "Line 3\n"]
with open("list_output.txt", "w") as file:
    file.writelines(lines_to_write)
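Because writelines() adds no separators, forgetting the "\n" silently merges everything into one line. A small sketch of the pitfall and one fix; the labels list and file name are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "writelines_demo.txt")
labels = ["cat", "dog", "bird"]  # hypothetical data

# writelines() does not insert separators: this produces one run-together line
with open(path, "w") as file:
    file.writelines(labels)
with open(path) as file:
    joined = file.read()
print(repr(joined))  # 'catdogbird'

# Add the newline yourself, here with a generator expression
with open(path, "w") as file:
    file.writelines(f"{label}\n" for label in labels)
with open(path) as file:
    fixed = file.read().splitlines()
print(fixed)  # ['cat', 'dog', 'bird']
```

Since writelines() accepts any iterable of strings, the generator avoids building a second list in memory.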

8.4 Reading CSV Files

CSV (Comma-Separated Values) files are common for storing tabular data. Python's csv module provides convenient ways to read and write CSV files.

To read a CSV file, you typically use csv.reader.

Example: Reading a CSV File

Let's assume you have a file named data.csv with the following content:

Name,Age,City
Alice,30,New York
Bob,25,Los Angeles
Charlie,35,Chicago

import csv

try:
    with open("data.csv", "r", newline='') as csvfile:
        csv_reader = csv.reader(csvfile)
        header = next(csv_reader) # Read the header row
        print(f"Header: {header}")
        for row in csv_reader:
            print(row)
except FileNotFoundError:
    print("Error: The file 'data.csv' was not found.")

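The newline='' argument is recommended whenever you open a file for the csv module. When reading, it lets the module correctly handle newlines embedded inside quoted fields; when writing, it prevents the extra blank rows that can otherwise appear on Windows, where text mode would turn the module's \r\n row endings into \r\r\n.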

Reading CSV as Dictionaries

The csv.DictReader allows you to read rows as dictionaries, where keys are taken from the header row.

import csv

try:
    with open("data.csv", "r", newline='') as csvfile:
        dict_reader = csv.DictReader(csvfile)
        for row in dict_reader:
            print(row)
            # Access data by column name:
            # print(f"Name: {row['Name']}, Age: {row['Age']}, City: {row['City']}")
except FileNotFoundError:
    print("Error: The file 'data.csv' was not found.")
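One detail worth knowing before feeding CSV data into a model: the csv module never converts types, so every value DictReader returns is a string. A minimal sketch, using io.StringIO as an in-memory stand-in for the data.csv sample above (on Python 3.8+ each row is a plain dict):

```python
import csv
import io

# io.StringIO stands in for an open file with the data.csv sample content
sample = io.StringIO("Name,Age,City\nAlice,30,New York\nBob,25,Los Angeles\n")

people = []
for row in csv.DictReader(sample):
    # Every value arrives as a str, so convert numeric columns explicitly
    row["Age"] = int(row["Age"])
    people.append(row)

print(people[0])  # {'Name': 'Alice', 'Age': 30, 'City': 'New York'}
average_age = sum(p["Age"] for p in people) / len(people)
print(average_age)  # 27.5
```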

8.5 Writing CSV Files

The csv module also facilitates writing to CSV files. You can use csv.writer or csv.DictWriter.

Example: Writing to a CSV File

import csv

data_to_write = [
    ['Name', 'Age', 'City'],
    ['Alice', 30, 'New York'],
    ['Bob', 25, 'Los Angeles'],
    ['Charlie', 35, 'Chicago']
]

with open("output_data.csv", "w", newline='') as csvfile:
    csv_writer = csv.writer(csvfile)
    csv_writer.writerows(data_to_write) # Writes all rows at once
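csv.writer also takes care of quoting, which is where the newline='' advice from the reading section pays off. The sketch below (file name and rows are hypothetical) writes fields containing a comma, quote characters, and an embedded newline, then reads them back unchanged.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "tricky.csv")
rows = [
    ["Name", "Note"],
    ["Alice", "likes commas, apparently"],  # field contains the delimiter
    ["Bob", 'said "hi"'],                   # field contains quote characters
    ["Charlie", "line one\nline two"],      # field contains a newline
]

# With the default QUOTE_MINIMAL, only fields that need it are quoted
with open(path, "w", newline="") as csvfile:
    csv.writer(csvfile).writerows(rows)

with open(path, newline="") as csvfile:
    raw_text = csvfile.read()
print(raw_text)

# Reading back with newline='' recovers exactly the rows that were written
with open(path, newline="") as csvfile:
    round_trip = list(csv.reader(csvfile))
print(round_trip == rows)  # True
```

Note that embedded quotes are escaped by doubling them ("said ""hi"""), which is the standard CSV convention.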

Example: Writing to a CSV File using DictWriter

import csv

data_for_dict_write = [
    {'Name': 'Alice', 'Age': 30, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 25, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'}
]

fieldnames = ['Name', 'Age', 'City'] # Column order; every key in the dictionaries must be listed here

with open("output_dict_data.csv", "w", newline='') as csvfile:
    dict_writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    dict_writer.writeheader() # Writes the header row
    dict_writer.writerows(data_for_dict_write) # Writes the data rows
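What happens when a dictionary does not match fieldnames is controlled by DictWriter's restval and extrasaction parameters. A minimal sketch with hypothetical rows, written to an in-memory buffer instead of a file: a missing key is filled with restval (an empty string by default), and an unexpected key raises ValueError unless extrasaction="ignore" is set.

```python
import csv
import io

fieldnames = ["Name", "Age", "City"]
buffer = io.StringIO()  # in-memory stand-in for a file opened with newline=''
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()

# Missing "City" key: filled with restval (empty string by default)
writer.writerow({"Name": "Dana", "Age": 28})

# Extra "Email" key: the default extrasaction='raise' rejects the row
try:
    writer.writerow({"Name": "Eve", "Age": 22, "City": "Boston", "Email": "eve@example.com"})
    extra_key_rejected = False
except ValueError:
    extra_key_rejected = True

# extrasaction='ignore' silently drops keys that are not in fieldnames
lenient = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
lenient.writerow({"Name": "Eve", "Age": 22, "City": "Boston", "Email": "eve@example.com"})

print(buffer.getvalue())
```

The rejected row writes nothing, so the buffer ends up with the header plus the Dana and Eve rows.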

8.6 Reading and Writing Excel Files

Python's standard library cannot read or write Excel (.xls or .xlsx) files, so this is handled by third-party libraries. pandas is a very popular and powerful library for data manipulation, including Excel file handling.

First, install pandas together with the Excel engines it relies on:

pip install pandas openpyxl xlrd

(pandas uses openpyxl to read and write .xlsx files and xlrd to read legacy .xls files; recent pandas versions can no longer write .xls files.)

Example: Reading an Excel File

Let's assume you have an Excel file named data.xlsx with a sheet containing data.

import pandas as pd

try:
    # Read the first sheet by default
    df = pd.read_excel("data.xlsx")
    print(df)

    # Read a specific sheet
    # df_sheet2 = pd.read_excel("data.xlsx", sheet_name="Sheet2")
    # print(df_sheet2)
except FileNotFoundError:
    print("Error: The file 'data.xlsx' was not found.")
except Exception as e:
    print(f"An error occurred: {e}")

pandas reads Excel data into a DataFrame, which is a two-dimensional labeled data structure with columns of potentially different types.

Example: Writing to an Excel File

import pandas as pd

# Sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [30, 25, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Write to an Excel file
try:
    df.to_excel("output_excel.xlsx", index=False) # index=False prevents writing the DataFrame index as a column
    print("Successfully wrote to output_excel.xlsx")
except Exception as e:
    print(f"An error occurred: {e}")

# Write several DataFrames to separate named sheets in one workbook
# with pd.ExcelWriter("output_excel_multi.xlsx") as writer:
#     df.to_excel(writer, sheet_name="Sheet1", index=False)
#     df2 = pd.DataFrame({'ColA': [1, 2], 'ColB': [3, 4]})
#     df2.to_excel(writer, sheet_name="Sheet2", index=False)

8.7 JSON Handling

JSON (JavaScript Object Notation) is a lightweight data-interchange format. Python's built-in json module provides functions to encode and decode JSON data.

json.dumps(obj): Serializes a Python object (obj) into a JSON formatted string.
json.dump(obj, fp): Serializes a Python object (obj) and writes it to a file-like object (fp).
json.loads(s): Deserializes a JSON formatted string (s) into a Python object.
json.load(fp): Deserializes JSON data from a file-like object (fp) into a Python object.
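The string-based pair json.dumps()/json.loads() is handy for checking how Python types map to JSON. A small sketch with a hypothetical record: True becomes true, None becomes null, and a tuple is written as a JSON array, so it comes back as a list.

```python
import json

record = {"model": "baseline", "layers": (64, 32), "trained": True, "notes": None}

text = json.dumps(record)
print(text)
# {"model": "baseline", "layers": [64, 32], "trained": true, "notes": null}

restored = json.loads(text)
print(restored)
# {'model': 'baseline', 'layers': [64, 32], 'trained': True, 'notes': None}

print(restored == record)  # False: the tuple came back as a list
```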

Example: Reading JSON from a File

Let's assume you have a file named data.json with the following content:

{
  "name": "Alice",
  "age": 30,
  "city": "New York",
  "isStudent": false,
  "courses": ["Math", "Science"]
}

import json

try:
    with open("data.json", "r") as json_file:
        data = json.load(json_file)
        print(data)
        print(f"Name: {data['name']}, Age: {data['age']}")
except FileNotFoundError:
    print("Error: The file 'data.json' was not found.")
except json.JSONDecodeError:
    print("Error: Could not decode JSON from the file.")

Example: Writing JSON to a File

import json

python_data = {
    "name": "Bob",
    "age": 25,
    "city": "Los Angeles",
    "isStudent": True,
    "grades": {"Math": 90, "Science": 85}
}

# Write to a file
with open("output.json", "w") as json_file:
    json.dump(python_data, json_file, indent=4) # indent=4 for pretty printing

print("Successfully wrote to output.json")

# Get JSON string
json_string = json.dumps(python_data, indent=4)
# print("\nJSON String:")
# print(json_string)

The indent parameter in json.dump and json.dumps is used for pretty-printing, making the JSON output more readable by adding whitespace.
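A quick comparison shows that indent changes only the layout, not the data. The metrics dictionary is hypothetical; note that indent also puts each list element on its own line.

```python
import json

metrics = {"epoch": 3, "loss": [0.9, 0.5]}

compact = json.dumps(metrics)
pretty = json.dumps(metrics, indent=4)

print(compact)
# {"epoch": 3, "loss": [0.9, 0.5]}
print(pretty)
# {
#     "epoch": 3,
#     "loss": [
#         0.9,
#         0.5
#     ]
# }

# Both strings decode to the same Python object
print(json.loads(compact) == json.loads(pretty))  # True
```

The compact form is smaller on disk; the indented form is easier to read and diff.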

8.8 Context Managers in Python

Context managers in Python are used to manage resources, ensuring that setup and teardown operations are performed correctly. They are commonly used with the with statement.

The with statement simplifies cleanup: it guarantees that the context manager's teardown code runs when the block exits, whether the block finishes normally or because an exception was raised.

How Context Managers Work

A context manager is an object that defines two special methods:

__enter__(self): This method is called when the with statement is entered. It can optionally return an object that will be bound to the variable specified in the as clause of the with statement.
__exit__(self, exc_type, exc_val, exc_tb): This method is called when the with block is exited. It is responsible for cleaning up resources.
- exc_type, exc_val, exc_tb: These arguments receive information about any exception that occurred within the with block. If no exception occurred, they are all None.
- If __exit__ returns a true value, the exception is suppressed; otherwise, it propagates once __exit__ completes.
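The return value of __exit__ is easy to overlook, so here is a minimal sketch. IgnoreZeroDivision is a hypothetical class that suppresses only ZeroDivisionError and lets everything else propagate.

```python
class IgnoreZeroDivision:
    """Hypothetical context manager that suppresses only ZeroDivisionError."""

    def __enter__(self):
        self.caught = None
        return self  # bound to the 'as' target

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.caught = exc_val  # None when the block finished normally
        # A true return value suppresses the exception; False lets it propagate
        return exc_type is not None and issubclass(exc_type, ZeroDivisionError)


with IgnoreZeroDivision() as guard:
    ratio = 1 / 0  # raises, but __exit__ suppresses it
print("still running, caught:", repr(guard.caught))

propagated = False
try:
    with IgnoreZeroDivision():
        int("not a number")  # ValueError is not suppressed
except ValueError:
    propagated = True
print("ValueError propagated:", propagated)
```

For real code, the standard library's contextlib.suppress provides this behavior ready-made.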

Using `with` for File Handling (Recap)

As seen throughout this documentation, file objects in Python are context managers. This is why with open(...) as file: is the preferred way to handle files:

with open("my_file.txt", "r") as f:
    # File is guaranteed to be closed upon exiting this block
    content = f.read()
    print(content)

# The file 'my_file.txt' is automatically closed here.

The __enter__ method of a file object returns the file object itself, which is then assigned to f. The __exit__ method ensures that f.close() is called.
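Both claims can be checked directly. The sketch below calls __enter__ and __exit__ by hand, then shows that a file is still closed (and its buffered data flushed) when an exception escapes the with block. The file name is hypothetical and lives in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "closing_demo.txt")

# What 'with' does on entry and on a normal exit, done by hand
f = open(path, "w")
same_object = f.__enter__() is f  # __enter__ returns the file itself
f.__exit__(None, None, None)      # __exit__ closes the file
print(same_object, f.closed)      # True True

# On an exception, __exit__ still runs before the exception propagates
try:
    with open(path, "w") as g:
        g.write("partial data\n")
        raise RuntimeError("simulated failure mid-write")
except RuntimeError:
    pass
print(g.closed)  # True

with open(path) as h:
    saved = h.read()
print(repr(saved))  # 'partial data\n'
```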

Creating Custom Context Managers

You can create your own context managers using classes or by using the contextlib module.

1. Using a Class

class MyContextManager:
    def __init__(self, name):
        self.name = name
        print(f"Initializing context manager for {self.name}")

    def __enter__(self):
        print(f"Entering context for {self.name}")
        # You can return a value to be used in the 'as' clause
        return f"Resource {self.name}"

    def __exit__(self, exc_type, exc_val, exc_tb):
        print(f"Exiting context for {self.name}")
        if exc_type:
            print(f"An exception occurred: {exc_type}, {exc_val}")
        # Return False to propagate exceptions, True to suppress them
        return False

# Using the custom context manager
with MyContextManager("Database Connection") as resource:
    print(f"Inside the context: {resource}")
    # Simulate an error
    # raise ValueError("Something went wrong!")

print("Outside the context.")

2. Using `contextlib.contextmanager` Decorator

This is a more concise way to create context managers. You write a generator function and decorate it with @contextmanager. The code before the yield statement acts as __enter__, and the code after the yield acts as __exit__.

from contextlib import contextmanager

@contextmanager
def managed_resource(name):
    # Setup: runs when the with statement is entered (like __enter__)
    print(f"Setting up resource: {name}")
    try:
        yield f"Managed {name}"  # The yielded value is bound to the 'as' target
    except Exception as e:
        # Runs on exit only if the with block raised an exception
        print(f"Exception handled: {e}")
        raise  # Re-raise so the caller still sees the exception
    finally:
        # Teardown: always runs on exit (like __exit__)
        print(f"Tearing down resource: {name}")

# Using the context manager from contextlib
with managed_resource("File Operation") as res:
    print(f"Using: {res}")
    # Simulate an error
    # raise TypeError("Incorrect type used")

print("Operation finished.")

Context managers are fundamental for robust resource management in Python, ensuring that resources are properly acquired and released, preventing leaks and maintaining application stability.
