Module 5: Model Packaging & Deployment
This module covers the essential steps and techniques for packaging and deploying machine learning models, enabling them to be used in production environments.
5.1 Model Serialization
Before deployment, models need to be saved in a format that can be loaded and used efficiently. This process is called serialization.
Common Serialization Formats:
- Pickle:
- A standard Python library for serializing and de-serializing Python object structures.
- Pros: Simple to use, supports most Python objects.
- Cons: Python-specific, potential security risks when loading untrusted data, version compatibility issues between Python versions.
- Use Cases: Quick saving and loading of models within Python environments.
```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a simple model
X, y = load_iris(return_X_y=True)
model = LogisticRegression()
model.fit(X, y)

# Save the model
with open('logistic_regression_model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Load the model
with open('logistic_regression_model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

# Make predictions with the loaded model
predictions = loaded_model.predict(X[:5])
print(predictions)
```
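For scikit-learn models that contain large NumPy arrays, joblib is a common alternative with the same dump/load interface, and it is what the API examples later in this module use to load models. A minimal sketch, mirroring the pickle example above:

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train the same simple model as above
X, y = load_iris(return_X_y=True)
model = LogisticRegression().fit(X, y)

# joblib mirrors pickle's dump/load interface and is typically
# more efficient for objects that hold large NumPy arrays
joblib.dump(model, "logistic_regression_model.joblib")
loaded_model = joblib.load("logistic_regression_model.joblib")
print(loaded_model.predict(X[:5]))
```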
- ONNX (Open Neural Network Exchange):
- An open format built to represent machine learning models, enabling interoperability between different frameworks.
- Pros: Framework agnostic (PyTorch, TensorFlow, scikit-learn, etc.), hardware acceleration potential, good for cross-platform deployment.
- Cons: Not all model components or operations might be supported by all runtimes.
- Use Cases: Deploying models across diverse platforms and frameworks, optimizing for inference speed.
```python
import torch
import torch.nn as nn
import onnxruntime as ort

# Define a simple PyTorch model
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.linear = nn.Linear(10, 2)

    def forward(self, x):
        return self.linear(x)

model = SimpleModel()

# Export the model to ONNX
dummy_input = torch.randn(1, 10)
torch.onnx.export(model, dummy_input, "simple_model.onnx", verbose=True)

# Load and run the ONNX model using ONNX Runtime
ort_session = ort.InferenceSession("simple_model.onnx")
input_name = ort_session.get_inputs()[0].name
output_name = ort_session.get_outputs()[0].name

# Prepare input data (matching the dummy input shape and type)
input_data = torch.randn(1, 10).numpy().astype("float32")
ort_outputs = ort_session.run([output_name], {input_name: input_data})
print(ort_outputs)
```
- TorchScript:
- A way to serialize PyTorch models so they can be loaded in C++ or used in environments where Python is not available.
- Pros: Enables deployment without a Python interpreter, performance optimizations.
- Cons: Primarily for PyTorch models, requires tracing or scripting the model.
- Use Cases: Deploying PyTorch models in C++ environments, mobile applications.
```python
import torch
import torch.nn as nn

# Define a simple PyTorch model
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.linear = nn.Linear(10, 2)

    def forward(self, x):
        return self.linear(x)

model = SimpleModel()

# Trace the model (scripting is another option)
dummy_input = torch.randn(1, 10)
traced_model = torch.jit.trace(model, dummy_input)

# Save the TorchScript model
traced_model.save("simple_model.pt")

# Load the TorchScript model
loaded_traced_model = torch.jit.load("simple_model.pt")

# Make predictions
predictions = loaded_traced_model(dummy_input)
print(predictions)
```
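Tracing records only the operations executed for the example input, so models with data-dependent control flow are usually scripted instead. A minimal sketch of the scripting path (the model and its branch logic are illustrative, not part of the original example):

```python
import torch
import torch.nn as nn

class GatedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)

    def forward(self, x):
        # Data-dependent branch: tracing would capture only one path,
        # while scripting preserves the if/else in the serialized graph
        if x.sum() > 0:
            return self.linear(x)
        return self.linear(-x)

# Compile the model with TorchScript's scripting frontend and save it
scripted_model = torch.jit.script(GatedModel())
scripted_model.save("gated_model.pt")

# Load and run it exactly like a traced model
loaded = torch.jit.load("gated_model.pt")
print(loaded(torch.randn(1, 10)))
```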
5.2 REST API Development
Exposing your trained model as a RESTful API allows other applications to request predictions easily.
Frameworks for API Development:
- FastAPI:
- A modern, fast (high-performance) web framework for building APIs with Python 3.7+, based on standard Python type hints.
- Pros: Very fast, automatic data validation, interactive API documentation (Swagger UI, ReDoc), easy to learn.
- Use Cases: Building high-performance ML inference APIs.
```python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib  # Or your preferred model loading library
import numpy as np

# Load your trained model (replace with your actual model path and loading logic)
try:
    model = joblib.load("your_trained_model.pkl")
except FileNotFoundError:
    # Create a dummy model for demonstration if the file doesn't exist
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression()
    model.fit(X, y)
    joblib.dump(model, "your_trained_model.pkl")  # Save it for future runs
    print("Dummy model created and saved as 'your_trained_model.pkl'")

# Define the request body schema using Pydantic
class PredictionRequest(BaseModel):
    features: list[float]  # Assuming your model expects a list of floats

# Initialize FastAPI app
app = FastAPI()

@app.post("/predict/")
async def predict_item(request: PredictionRequest):
    """
    Accepts a list of features and returns a prediction from the loaded model.
    """
    try:
        # Convert input features to a numpy array
        input_data = np.array(request.features).reshape(1, -1)  # Reshape for single prediction
        prediction = model.predict(input_data)
        return {"prediction": prediction.tolist()}  # Return prediction as a list
    except Exception as e:
        return {"error": str(e)}

# To run this:
# 1. Save the code as main.py
# 2. Install dependencies: pip install uvicorn fastapi joblib scikit-learn numpy
# 3. Run: uvicorn main:app --reload
# Then access http://127.0.0.1:8000/docs for interactive API documentation.
```
- Flask:
- A lightweight WSGI web application framework in Python.
- Pros: Simple to start with, flexible, mature ecosystem.
- Cons: Less performant out-of-the-box compared to FastAPI, requires extensions for features like automatic docs.
- Use Cases: Simple ML APIs, prototyping.
```python
from flask import Flask, request, jsonify
import joblib  # Or your preferred model loading library
import numpy as np

# Load your trained model (replace with your actual model path and loading logic)
try:
    model = joblib.load("your_trained_model.pkl")
except FileNotFoundError:
    # Create a dummy model for demonstration if the file doesn't exist
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression()
    model.fit(X, y)
    joblib.dump(model, "your_trained_model.pkl")  # Save it for future runs
    print("Dummy model created and saved as 'your_trained_model.pkl'")

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    """
    Accepts JSON input with 'features' and returns a prediction.
    """
    data = request.get_json()
    if not data or 'features' not in data:
        return jsonify({"error": "Invalid input. 'features' field is required."}), 400
    try:
        # Convert input features to a numpy array
        input_data = np.array(data['features']).reshape(1, -1)  # Reshape for single prediction
        prediction = model.predict(input_data)
        return jsonify({"prediction": prediction.tolist()})  # Return prediction as a list
    except Exception as e:
        return jsonify({"error": str(e)}), 500

# Start the development server when run directly
if __name__ == '__main__':
    app.run(debug=True)

# To run this:
# 1. Save the code as app.py
# 2. Install dependencies: pip install Flask joblib scikit-learn numpy
# 3. Run: python app.py
# Then send POST requests to http://127.0.0.1:5000/predict with a JSON body like:
# {"features": [5.1, 3.5, 1.4, 0.2]}
```
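Either API can be exercised from any HTTP client. A minimal sketch using the `requests` library, assuming the Flask app above is running locally on port 5000 (for the FastAPI version, point it at http://127.0.0.1:8000/predict/ instead):

```python
import requests

# Send one iris-like feature vector to the running prediction endpoint
response = requests.post(
    "http://127.0.0.1:5000/predict",
    json={"features": [5.1, 3.5, 1.4, 0.2]},
)
response.raise_for_status()
print(response.json())  # e.g. {"prediction": [0]}
```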
5.3 Deployment Strategies
Once your model is serialized and accessible via an API (or directly), you need to choose how to deploy it.
5.3.1 Local Deployment
- Description: Running your model directly on a developer's machine or a dedicated server.
- Pros: Simple setup, good for testing and development, full control.
- Cons: Limited scalability, maintenance burden, not suitable for high traffic.
- Use Cases: Local development environments, internal tools.
5.3.2 Cloud Deployment
- Description: Deploying your model on cloud platforms like AWS, Google Cloud, Azure, etc.
- Pros: Scalability, managed services, high availability, cost-effectiveness (pay-as-you-go).
- Cons: Can be complex to manage, vendor lock-in potential, cost monitoring is crucial.
- Use Cases: Most production applications requiring scalability and reliability.
- Managed ML Platforms: Services like AWS SageMaker, Google AI Platform, Azure Machine Learning offer end-to-end solutions for deploying models.
- Container Orchestration: Docker and Kubernetes are crucial here.
5.3.3 Serverless Deployment
- Description: Deploying your model as functions that are triggered by events (e.g., HTTP requests) and automatically scale based on demand. Examples include AWS Lambda, Google Cloud Functions, Azure Functions.
- Pros: Automatic scaling, pay-per-execution, no server management.
- Cons: Cold start issues (initial latency), execution time limits, limited memory/storage, not ideal for very large models or complex dependencies.
- Use Cases: Infrequent or event-driven predictions, microservices.
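To make the pattern concrete, here is a minimal sketch of a serverless handler, assuming AWS Lambda behind an HTTP trigger; the model path and event shape are assumptions for illustration, not part of this module's code:

```python
import json

import joblib

# Load the model at import time so warm invocations reuse it
# (the path is hypothetical; on Lambda it might live in a layer or be downloaded to /tmp)
model = joblib.load("your_trained_model.pkl")

def lambda_handler(event, context):
    """Return a prediction for a JSON body like {"features": [5.1, 3.5, 1.4, 0.2]}."""
    body = json.loads(event.get("body") or "{}")
    features = body.get("features")
    if not features:
        return {"statusCode": 400, "body": json.dumps({"error": "'features' is required"})}
    prediction = model.predict([features])
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction.tolist()})}
```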
5.3.4 Edge Deployment
- Description: Deploying models directly onto end-user devices (e.g., mobile phones, IoT devices).
- Pros: Low latency, offline capabilities, privacy benefits, reduced bandwidth usage.
- Cons: Limited computational resources on edge devices, model size constraints, complex deployment and updates.
- Use Cases: Real-time applications on mobile, autonomous systems, smart devices.
- Frameworks like TensorFlow Lite and PyTorch Mobile are commonly used.
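As a concrete example of the size-constrained workflow, here is a minimal TensorFlow Lite conversion sketch (the tiny Keras model is illustrative; PyTorch Mobile follows a similar export flow for traced models):

```python
import tensorflow as tf

# A tiny stand-in model; in practice you convert your trained network
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(2),
])

# Convert to TensorFlow Lite, enabling default optimizations (e.g. quantization)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The resulting flatbuffer is what ships to the mobile or IoT device
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```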
5.4 Deploying with Docker/Kubernetes
Containerization with Docker and orchestration with Kubernetes are industry-standard for robust and scalable deployments.
5.4.1 Dockerizing ML Models
Docker allows you to package your model, its dependencies, and your API code into a portable container.
- Dockerfile Example:

```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /app

# Copy the requirements file into the container
COPY requirements.txt .

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code into the container
COPY . .

# Make port 80 available to the world outside this container
EXPOSE 80

# Define environment variable
ENV MODEL_PATH=/app/your_trained_model.pkl

# Run your API application when the container launches
# Replace 'main:app' with your FastAPI/Flask app entry point
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]
# Or for Flask: CMD ["python", "app.py"]
```
- requirements.txt Example:

```text
fastapi
uvicorn
python-multipart
joblib
scikit-learn
numpy
```
- Building and Running a Docker Image:

```bash
# Build the Docker image (from the directory containing the Dockerfile and your code)
docker build -t my-ml-api .

# Run the Docker container
docker run -p 8000:80 my-ml-api
```
This will start your API, typically accessible at http://localhost:8000.
5.4.2 Kubernetes Deployment
Kubernetes is used to automate the deployment, scaling, and management of containerized applications.
- Kubernetes Deployment Manifest (example deployment.yaml):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-api-deployment
  labels:
    app: ml-api
spec:
  replicas: 3  # Number of desired pods
  selector:
    matchLabels:
      app: ml-api
  template:
    metadata:
      labels:
        app: ml-api
    spec:
      containers:
      - name: ml-api-container
        image: your-dockerhub-username/my-ml-api:latest  # Replace with your image
        ports:
        - containerPort: 80  # Port your app listens on inside the container
        resources:  # Optional: define resource requests and limits
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "1000m"
      # You can also mount models as volumes here if not baked into the image
```
- Kubernetes Service Manifest (example service.yaml):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ml-api-service
spec:
  selector:
    app: ml-api  # Matches the labels in your Deployment template
  ports:
  - protocol: TCP
    port: 80        # The port the service will be accessible on
    targetPort: 80  # The port your container listens on
  type: LoadBalancer  # Or ClusterIP, NodePort depending on your needs
```
- Applying Manifests:

```bash
# Apply the deployment
kubectl apply -f deployment.yaml

# Apply the service
kubectl apply -f service.yaml
```
This setup allows Kubernetes to manage multiple instances of your containerized API, automatically restart failed containers, and scale your application up or down based on traffic.
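Scaling on traffic is typically handled by a HorizontalPodAutoscaler rather than a fixed replica count; a minimal sketch targeting the Deployment above (the thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-api-deployment   # Matches the Deployment defined earlier
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # Add pods when average CPU exceeds 70%
```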