Module 5: Model Packaging & Deployment

This module covers the essential steps and techniques for packaging and deploying machine learning models, enabling them to be used in production environments.

5.1 Model Serialization

Before deployment, models need to be saved in a format that can be loaded and used efficiently. This process is called serialization.

Common Serialization Formats:

  • Pickle:

    • Python's standard library module for serializing and deserializing Python object structures.
    • Pros: Simple to use, supports most Python objects.
    • Cons: Python-specific, potential security risks when loading untrusted data, version compatibility issues between Python versions.
    • Use Cases: Quick saving and loading of models within Python environments (a joblib-based variant, common for scikit-learn models, is sketched after this list).
    import pickle
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import load_iris
    
    # Train a simple model
    X, y = load_iris(return_X_y=True)
    model = LogisticRegression()
    model.fit(X, y)
    
    # Save the model
    with open('logistic_regression_model.pkl', 'wb') as f:
        pickle.dump(model, f)
    
    # Load the model
    with open('logistic_regression_model.pkl', 'rb') as f:
        loaded_model = pickle.load(f)
    
    # Make predictions with the loaded model
    predictions = loaded_model.predict(X[:5])
    print(predictions)
  • ONNX (Open Neural Network Exchange):

    • An open format built to represent machine learning models, enabling interoperability between different frameworks.
    • Pros: Framework agnostic (PyTorch, TensorFlow, scikit-learn, etc.), hardware acceleration potential, good for cross-platform deployment.
    • Cons: Not all model components or operations are supported by every exporter and runtime.
    • Use Cases: Deploying models across diverse platforms and frameworks, optimizing for inference speed.
    import torch
    import torch.nn as nn
    import onnxruntime as ort
    
    # Define a simple PyTorch model
    class SimpleModel(nn.Module):
        def __init__(self):
            super(SimpleModel, self).__init__()
            self.linear = nn.Linear(10, 2)
    
        def forward(self, x):
            return self.linear(x)
    
    model = SimpleModel()
    
    # Export the model to ONNX
    dummy_input = torch.randn(1, 10)
    torch.onnx.export(model, dummy_input, "simple_model.onnx", verbose=True)
    
    # Load and run the ONNX model using ONNX Runtime
    ort_session = ort.InferenceSession("simple_model.onnx")
    input_name = ort_session.get_inputs()[0].name
    output_name = ort_session.get_outputs()[0].name
    
    # Prepare input data (matching the dummy input shape and type)
    input_data = torch.randn(1, 10).numpy().astype("float32")
    ort_outputs = ort_session.run([output_name], {input_name: input_data})
    
    print(ort_outputs)
  • TorchScript:

    • A way to serialize PyTorch models so they can be loaded in C++ or used in environments where Python is not available.
    • Pros: Enables deployment without a Python interpreter, performance optimizations.
    • Cons: Primarily for PyTorch models, requires tracing or scripting the model.
    • Use Cases: Deploying PyTorch models in C++ environments, mobile applications.
    import torch
    import torch.nn as nn
    
    # Define a simple PyTorch model
    class SimpleModel(nn.Module):
        def __init__(self):
            super(SimpleModel, self).__init__()
            self.linear = nn.Linear(10, 2)
    
        def forward(self, x):
            return self.linear(x)
    
    model = SimpleModel()
    
    # Trace the model (a scripting variant is sketched after this list)
    dummy_input = torch.randn(1, 10)
    traced_model = torch.jit.trace(model, dummy_input)
    
    # Save the TorchScript model
    traced_model.save("simple_model.pt")
    
    # Load the TorchScript model
    loaded_traced_model = torch.jit.load("simple_model.pt")
    
    # Make predictions
    predictions = loaded_traced_model(dummy_input)
    print(predictions)
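
TorchScript's other path is scripting. Instead of recording the operations run for one example input, torch.jit.script compiles the model from its Python source, which preserves data-dependent control flow that tracing would miss. A minimal, self-contained sketch (the model mirrors SimpleModel from the example above):

    import torch
    import torch.nn as nn

    class SimpleModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = nn.Linear(10, 2)

        def forward(self, x):
            return self.linear(x)

    # Script the model instead of tracing it
    scripted_model = torch.jit.script(SimpleModel())
    scripted_model.save("simple_model_scripted.pt")

    # Load and use it exactly like the traced version
    loaded_scripted_model = torch.jit.load("simple_model_scripted.pt")
    print(loaded_scripted_model(torch.randn(1, 10)))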

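Closely related to pickle is joblib, which the API examples in the next section use to load models: it relies on the same pickle mechanism but handles objects containing large NumPy arrays more efficiently, so it is commonly preferred for scikit-learn models. A minimal sketch (the filename is illustrative):

    import joblib
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    # Train and save a model with joblib
    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    joblib.dump(model, "logistic_regression_model.joblib")

    # Load it back and predict
    loaded_model = joblib.load("logistic_regression_model.joblib")
    print(loaded_model.predict(X[:5]))
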
5.2 REST API Development

Exposing your trained model as a RESTful API allows other applications to request predictions easily.

Frameworks for API Development:

  • FastAPI:

    • A modern, fast (high-performance) web framework for building APIs with Python 3.7+, based on standard Python type hints.
    • Pros: Very fast, automatic data validation, interactive API documentation (Swagger UI, ReDoc), easy to learn.
    • Use Cases: Building high-performance ML inference APIs.
    from fastapi import FastAPI
    from pydantic import BaseModel
    import joblib # Or your preferred model loading library
    import numpy as np
    
    # Load your trained model (replace with your actual model path and loading logic)
    try:
        model = joblib.load("your_trained_model.pkl")
    except FileNotFoundError:
        # Create a dummy model for demonstration if the file doesn't exist
        from sklearn.linear_model import LogisticRegression
        from sklearn.datasets import load_iris
        X, y = load_iris(return_X_y=True)
        model = LogisticRegression()
        model.fit(X, y)
        joblib.dump(model, "your_trained_model.pkl") # Save it for future runs
        print("Dummy model created and saved as 'your_trained_model.pkl'")
    
    
    # Define the request body schema using Pydantic
    class PredictionRequest(BaseModel):
        features: list[float] # Assuming your model expects a list of floats
    
    # Initialize FastAPI app
    app = FastAPI()
    
    @app.post("/predict/")
    async def predict_item(request: PredictionRequest):
        """
        Accepts a list of features and returns a prediction from the loaded model.
        """
        try:
            # Convert input features to a numpy array
            input_data = np.array(request.features).reshape(1, -1) # Reshape for single prediction
            prediction = model.predict(input_data)
            return {"prediction": prediction.tolist()} # Return prediction as a list
        except Exception as e:
            return {"error": str(e)}
    
    # To run this:
    # 1. Save the code as main.py
    # 2. Install uvicorn: pip install uvicorn fastapi joblib scikit-learn numpy
    # 3. Run: uvicorn main:app --reload
    # Then access http://127.0.0.1:8000/docs for interactive API documentation.
  • Flask:

    • A lightweight WSGI web application framework in Python.
    • Pros: Simple to start with, flexible, mature ecosystem.
    • Cons: Less performant out-of-the-box compared to FastAPI, requires extensions for features like automatic docs.
    • Use Cases: Simple ML APIs, prototyping.
    from flask import Flask, request, jsonify
    import joblib # Or your preferred model loading library
    import numpy as np
    
    # Load your trained model (replace with your actual model path and loading logic)
    try:
        model = joblib.load("your_trained_model.pkl")
    except FileNotFoundError:
        # Create a dummy model for demonstration if the file doesn't exist
        from sklearn.linear_model import LogisticRegression
        from sklearn.datasets import load_iris
        X, y = load_iris(return_X_y=True)
        model = LogisticRegression()
        model.fit(X, y)
        joblib.dump(model, "your_trained_model.pkl") # Save it for future runs
        print("Dummy model created and saved as 'your_trained_model.pkl'")
    
    
    app = Flask(__name__)
    
    @app.route('/predict', methods=['POST'])
    def predict():
        """
        Accepts JSON input with 'features' and returns a prediction.
        """
        data = request.get_json()
        if not data or 'features' not in data:
            return jsonify({"error": "Invalid input. 'features' field is required."}), 400
    
        try:
            # Convert input features to a numpy array
            input_data = np.array(data['features']).reshape(1, -1) # Reshape for single prediction
            prediction = model.predict(input_data)
            return jsonify({"prediction": prediction.tolist()}) # Return prediction as a list
        except Exception as e:
            return jsonify({"error": str(e)}), 500
    
    if __name__ == '__main__':
        # Start the development server so that `python app.py` works as described below
        app.run(host="0.0.0.0", port=5000)
    
    # To run this:
    # 1. Save the code as app.py
    # 2. Install Flask: pip install Flask joblib scikit-learn numpy
    # 3. Run: python app.py
    # Then send POST requests to http://127.0.0.1:5000/predict with JSON body like:
    # {"features": [5.1, 3.5, 1.4, 0.2]}
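
Either API can then be called from any HTTP client. As an illustration, a request to the Flask endpoint above could be sent from Python using the requests library (assuming the server is running locally on port 5000; for the FastAPI example the URL would be http://127.0.0.1:8000/predict/):

    import requests

    # Send one iris-style sample to the prediction endpoint
    response = requests.post(
        "http://127.0.0.1:5000/predict",
        json={"features": [5.1, 3.5, 1.4, 0.2]},
    )
    print(response.status_code)
    print(response.json())  # e.g. {"prediction": [0]}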

5.3 Deployment Strategies

Once your model is serialized and accessible via an API (or directly), you need to choose how to deploy it.

5.3.1 Local Deployment

  • Description: Running your model directly on a developer's machine or a dedicated server.
  • Pros: Simple setup, good for testing and development, full control.
  • Cons: Limited scalability, maintenance burden, not suitable for high traffic.
  • Use Cases: Local development environments, internal tools.

5.3.2 Cloud Deployment

  • Description: Deploying your model on cloud platforms such as AWS, Google Cloud, or Azure.
  • Pros: Scalability, managed services, high availability, pay-as-you-go cost model.
  • Cons: Can be complex to manage, potential vendor lock-in, cost monitoring is crucial.
  • Use Cases: Most production applications requiring scalability and reliability.
    • Managed ML Platforms: Services like AWS SageMaker, Google Vertex AI (formerly AI Platform), and Azure Machine Learning offer end-to-end solutions for deploying models.
    • Container Orchestration: Docker and Kubernetes are crucial here (see 5.4).

5.3.3 Serverless Deployment

  • Description: Deploying your model as functions that are triggered by events (e.g., HTTP requests) and automatically scale based on demand. Examples include AWS Lambda, Google Cloud Functions, Azure Functions.
  • Pros: Automatic scaling, pay-per-execution, no server management.
  • Cons: Cold start issues (initial latency), execution time limits, limited memory/storage, not ideal for very large models or complex dependencies.
  • Use Cases: Infrequent or event-driven predictions, microservices.
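
To make this concrete, serverless predictions are usually implemented as a small handler function. The sketch below shows roughly what an AWS Lambda-style handler behind an API Gateway proxy integration might look like; it is an illustration only, and assumes the pickled iris model from 5.1 (and its scikit-learn dependency) is bundled with the function code:

    import json
    import pickle

    import numpy as np

    # Load the model once, outside the handler, so warm invocations reuse it;
    # this load is the main contributor to cold-start latency.
    with open("logistic_regression_model.pkl", "rb") as f:
        model = pickle.load(f)

    def lambda_handler(event, context):
        """Expects an API Gateway proxy event whose body is JSON like
        {"features": [5.1, 3.5, 1.4, 0.2]}."""
        body = json.loads(event.get("body") or "{}")
        features = body.get("features")
        if features is None:
            return {"statusCode": 400,
                    "body": json.dumps({"error": "'features' field is required."})}

        input_data = np.array(features, dtype=float).reshape(1, -1)
        prediction = model.predict(input_data)
        return {"statusCode": 200,
                "body": json.dumps({"prediction": prediction.tolist()})}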

5.3.4 Edge Deployment

  • Description: Deploying models directly onto end-user devices (e.g., mobile phones, IoT devices).
  • Pros: Low latency, offline capabilities, privacy benefits, reduced bandwidth usage.
  • Cons: Limited computational resources on edge devices, model size constraints, complex deployment and updates.
  • Use Cases: Real-time applications on mobile, autonomous systems, smart devices.
    • Frameworks like TensorFlow Lite and PyTorch Mobile are commonly used.
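
As a rough illustration of the edge workflow, the sketch below converts a small Keras model to the TensorFlow Lite format; the model here is only a placeholder for a trained network, and on-device inference would use the TensorFlow Lite interpreter for the target platform:

    import tensorflow as tf

    # Placeholder Keras model standing in for a trained network
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(10,)),
        tf.keras.layers.Dense(2),
    ])

    # Convert to the TensorFlow Lite flatbuffer format
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()

    # Save the converted model for bundling with a mobile or IoT app
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)

    # The same file can be loaded with the TensorFlow Lite interpreter (on-device or locally for testing)
    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()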

5.4 Deploying with Docker/Kubernetes

Containerization with Docker and orchestration with Kubernetes are industry-standard for robust and scalable deployments.

5.4.1 Dockerizing ML Models

Docker allows you to package your model, its dependencies, and your API code into a portable container.

  • Dockerfile Example:

    # Use an official Python runtime as a parent image
    FROM python:3.9-slim
    
    # Set the working directory in the container
    WORKDIR /app
    
    # Copy the requirements file into the container
    COPY requirements.txt .
    
    # Install any needed packages specified in requirements.txt
    RUN pip install --no-cache-dir -r requirements.txt
    
    # Copy the rest of the application code into the container
    COPY . .
    
    # Make port 80 available to the world outside this container
    EXPOSE 80
    
    # Define an environment variable with the model location (a sketch of reading it in the app follows this list)
    ENV MODEL_PATH=/app/your_trained_model.pkl
    
    # Run your API application when the container launches
    # Replace 'main:app' with your FastAPI/Flask app entry point
    CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]
    # Or for Flask: CMD ["python", "app.py"] (make sure app.run() binds to 0.0.0.0 and the exposed port)
  • requirements.txt Example:

    fastapi
    uvicorn
    python-multipart
    joblib
    scikit-learn
    numpy
  • Building and Running a Docker Image:

    # Build the Docker image (from the directory containing Dockerfile and your code)
    docker build -t my-ml-api .
    
    # Run the Docker container
    docker run -p 8000:80 my-ml-api

    This will start your API, typically accessible at http://localhost:8000.
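
The Dockerfile above also sets a MODEL_PATH environment variable. One way to use it is to have the API code read the model location from the environment instead of hard-coding the filename; a minimal sketch of how the loading line in main.py or app.py could be adapted:

    import os

    import joblib

    # Fall back to the local filename when the variable is not set (e.g. when running outside Docker)
    MODEL_PATH = os.environ.get("MODEL_PATH", "your_trained_model.pkl")
    model = joblib.load(MODEL_PATH)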

5.4.2 Kubernetes Deployment

Kubernetes is used to automate the deployment, scaling, and management of containerized applications.

  • Kubernetes Deployment Manifest (example deployment.yaml):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ml-api-deployment
      labels:
        app: ml-api
    spec:
      replicas: 3 # Number of desired pods
      selector:
        matchLabels:
          app: ml-api
      template:
        metadata:
          labels:
            app: ml-api
        spec:
          containers:
          - name: ml-api-container
            image: your-dockerhub-username/my-ml-api:latest # Replace with your image
            ports:
            - containerPort: 80 # Port your app listens on inside the container
            resources: # Optional: define resource requests and limits
              requests:
                memory: "256Mi"
                cpu: "500m"
              limits:
                memory: "512Mi"
                cpu: "1000m"
            # You can also mount models as volumes here if not baked into the image
  • Kubernetes Service Manifest (example service.yaml):

    apiVersion: v1
    kind: Service
    metadata:
      name: ml-api-service
    spec:
      selector:
        app: ml-api # Matches the labels in your Deployment template
      ports:
      - protocol: TCP
        port: 80 # The port the service will be accessible on
        targetPort: 80 # The port your container listens on
      type: LoadBalancer # Or ClusterIP, NodePort depending on your needs
  • Applying Manifests:

    # Apply the deployment
    kubectl apply -f deployment.yaml
    
    # Apply the service
    kubectl apply -f service.yaml

This setup allows Kubernetes to manage multiple instances of your containerized API, automatically restart failed containers, and scale your application up or down based on traffic.