From Notebooks to Nodes: Scaling Machine Learning on K8s#

Machine learning (ML) often starts small: perhaps a single Jupyter notebook on a local machine. In the early stages, this might be enough to train small models or experiment with data. As your workload grows, however, so do the demands for computational power, distributed training, version control, reproducibility, and the ability to iterate and deploy models quickly. This is where Kubernetes (K8s) steps in, providing a powerful platform to manage that growth.

In this blog post, we will explore how to move from the simplicity of local notebooks to production-grade deployments at scale on Kubernetes. Whether you’re just starting out in your ML journey or looking to leverage the advanced features of K8s for production workloads, this comprehensive guide will help you navigate the key concepts, tools, and best practices.

Table of Contents#

  1. Introduction to Containers and Kubernetes
  2. Why Run Machine Learning on Kubernetes?
  3. Fundamental Kubernetes Concepts for ML
  4. From Notebook to Docker Container
  5. Deploying Your ML Service on Kubernetes
  6. Managing Data and Storage
  7. Scaling Strategies
  8. Basic Example: Deploying a Simple Model on K8s
  9. Intermediate Concepts: CI/CD and Reproducibility
  10. Advanced Concepts: GPUs, Hyperparameter Tuning, and Auto-Scaling
  11. Orchestration Frameworks: Kubeflow and Beyond
  12. Best Practices and Future Trends
  13. Conclusion

Introduction to Containers and Kubernetes#

Before diving into the mechanics of running machine learning workloads on Kubernetes, let’s do a brief recap of the foundational technologies.

Containers: The Building Blocks#

A container is a packaged unit of software that bundles together code, libraries, environments, and all dependencies into a single artifact. Containers ensure that your application runs the same way in every environment.

Key benefits for ML:

  • Consistency: Your model runs reliably across different development, testing, and production environments.
  • Portability: Containers can run on any machine that has a container runtime (such as Docker).
  • Isolation: Each container is isolated from the rest of the system, making dependency management easier.

Kubernetes: The Container Orchestrator#

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. Once you have your machine learning model packaged in a container, Kubernetes can help you:

  • Distribute replicas of your container across multiple nodes.
  • Handle failover and high availability.
  • Automatically scale based on resource usage.
  • Provide rolling updates to avoid downtime.

Kubernetes is especially important for machine learning workflows that require distributed training or large-scale inference.

Why Run Machine Learning on Kubernetes?#

Data scientists and ML engineers often use frameworks like scikit-learn, TensorFlow, or PyTorch on a single machine or a specialized cluster. While that works for development and experimentation, scaling up poses significant hurdles:

  1. Resource Allocation: Different models require different resources. K8s can dynamically allocate resources (CPU, GPU, memory) to containers.
  2. Rapid Experimentation: Launching and tearing down environments is simpler when using container deployments, enabling faster iterations.
  3. Autoscaling: You can automatically scale your application—or model inference service—up or down based on metrics.
  4. Standardization: With K8s, your architecture becomes more standardized, reducing complexities in multi-tenant or multi-team scenarios.

In short, Kubernetes provides the orchestration needed to handle both small and large-scale requirements for modern machine learning.

Fundamental Kubernetes Concepts for ML#

Before getting started, let’s explore the elementary concepts in Kubernetes that matter most to machine learning workloads.

| Kubernetes Object | Description | Typical Use in ML |
| --- | --- | --- |
| Pod | The smallest deployable unit in K8s. A Pod can contain one or more containers. | Hosting a training job, inference microservice, or data preprocessing step. |
| Deployment | Manages stateless Pods; can scale and roll out updates. | Hosting model services (inference APIs) in a resilient, autoscaling way. |
| Service | Exposes Pods to the network or to other Pods inside the cluster. | Providing stable endpoints for inference requests. |
| StatefulSet | Manages stateful Pods, maintaining a sticky identity for each Pod. | Running distributed frameworks like Spark, or advanced ML pipelines needing state. |
| Job/CronJob | Creates Pods that run a batch job to completion (once or on a schedule). | Model training or data processing tasks that must run to completion. |
| Volume (PVC) | Provides persistent storage for Pods through Persistent Volume Claims (PVCs). | Storing training data, model weights, or logs that must persist. |

Understanding these concepts helps prepare you for more sophisticated deployments.

From Notebook to Docker Container#

Step 1: Conceptual Shift#

Data scientists often develop initial prototypes in Jupyter notebooks. While notebooks are great for exploration, they’re not ideal for production. Instead, production-ready code is usually packaged as a Python script or an installable module. This code then gets containerized.

Step 2: Converting Notebook to a Script#

Let’s imagine you have a simple notebook, train_model.ipynb, that trains a linear regression model on some dataset. To convert it to a script:

train_model.py
import pandas as pd
from sklearn.linear_model import LinearRegression
import joblib
# Example data
data = pd.read_csv("data.csv")
X = data[["feature1", "feature2"]]
y = data["target"]
model = LinearRegression()
model.fit(X, y)
joblib.dump(model, "linear_regression_model.pkl")
print("Model trained and saved!")

Your code no longer relies on Jupyter’s interactive environment. Instead, it runs independently as a Python script.
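
The Dockerfile in the next step installs dependencies from a requirements.txt. A minimal sketch might look like the following; the pinned versions are illustrative, so match them to whatever your notebook environment actually uses:

requirements.txt
pandas==2.0.3
scikit-learn==1.3.0
joblib==1.3.2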

Step 3: Creating a Dockerfile#

A Dockerfile describes the environment needed to run your code. For instance:

# Start from a lightweight Python image
FROM python:3.9-slim
# Set a working directory
WORKDIR /app
# Copy your requirements file first for caching benefits
COPY requirements.txt /app/
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of your application
COPY . /app/
# Run the training script by default
CMD ["python", "train_model.py"]

Step 4: Building and Testing the Image#

Build and tag the image:

docker build -t my-ml-app:latest .

Run a container locally to test:

docker run --rm my-ml-app:latest

Once you’ve confirmed everything works, you’re prepared to push this image to a container registry (like Docker Hub or a private registry) for deployment to Kubernetes.

Deploying Your ML Service on Kubernetes#

Container Registry#

To deploy to Kubernetes, your image must be accessible from the cluster, which means pushing it to a container registry:

docker tag my-ml-app:latest myregistry.com/my-ml-app:latest
docker push myregistry.com/my-ml-app:latest

Creating Kubernetes Manifests#

Once your image is in a registry, you can define a Kubernetes Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
        - name: ml-container
          image: myregistry.com/my-ml-app:latest
          ports:
            - containerPort: 80

You can expose your Deployment as a Service:

apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  selector:
    app: ml-model
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP

Apply these configurations to your cluster:

kubectl apply -f ml-model-deployment.yaml
kubectl apply -f ml-model-service.yaml

Confirming the Deployment#

Use kubectl get pods and kubectl get services to verify that your Pods are running and your Service is exposed internally (or externally, if configured).
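
For example, a few standard kubectl commands (the label selector and names match the manifests above):

kubectl get pods -l app=ml-model
kubectl get service ml-model-service
kubectl logs deployment/ml-model-deployment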

Managing Data and Storage#

Machine learning workloads often require large datasets or need to produce outputs (models, logs, metrics) that must persist beyond container lifespans. Kubernetes abstracts storage using Volumes and Persistent Volume Claims (PVCs).

Using Persistent Volume (PV) and Persistent Volume Claim (PVC)#

  1. Define a Persistent Volume: An interface to the actual physical storage (local or network-attached).
  2. Create a PVC: A request for storage that Kubernetes binds to a matching PV.
  3. Attach PVC to a Pod: The Pod then sees the persistent disk as part of its filesystem.
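
For step 1, note that on most managed clusters a StorageClass provisions PVs dynamically, so you rarely write them by hand. For a bare-metal or local setup, a hand-written PV might look like this sketch (the hostPath and capacity are illustrative):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: training-data-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data/training # illustrative local path on the node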

Here’s an example PersistentVolumeClaim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

You can mount this PVC in the Pod:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-training-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ml-training
  template:
    metadata:
      labels:
        app: ml-training
    spec:
      containers:
        - name: ml-training-container
          image: myregistry.com/ml-training-image:latest
          volumeMounts:
            - name: training-data-volume
              mountPath: /app/data
      volumes:
        - name: training-data-volume
          persistentVolumeClaim:
            claimName: training-data-pvc

Mounting a PVC for training data ensures that large datasets are locally available for ML tasks and remain persistent even if the container restarts.

Scaling Strategies#

Kubernetes offers autoscaling capabilities to match your resource usage. This can be particularly helpful for inference workloads that see spikes in demand.

Horizontal Pod Autoscaler (HPA)#

The Horizontal Pod Autoscaler uses metrics such as CPU, memory, or custom metrics (e.g., requests per second) to adjust the number of running Pods.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-model-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-model-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

When average CPU utilization exceeds 70%, Kubernetes automatically adds Pods to ml-model-deployment; when utilization drops again, it scales back down.

Distributed Training#

Distributed training usually requires more advanced frameworks like Horovod or PyTorch’s Distributed Data Parallel. Kubernetes can schedule multiple containers across different nodes and allow them to communicate via a Service. This is more advanced and typically involves specialized custom resource definitions (CRDs) offered by tools like Kubeflow.
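
As a taste of what those CRDs look like, here is a minimal sketch of a PyTorchJob, assuming the Kubeflow Training Operator is installed in the cluster (the image and worker count are illustrative):

apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: distributed-train
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: myregistry.com/ddp-training:latest # illustrative image
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: myregistry.com/ddp-training:latest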

Basic Example: Deploying a Simple Model on K8s#

Let’s walk through a more concrete but simplified example of deploying a pre-trained model (e.g., a scikit-learn regression model) behind a REST API in a Flask app.

1. Flask Prediction App#

app.py
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Load the model at startup
model = joblib.load("linear_regression_model.pkl")

@app.route("/predict", methods=["POST"])
def predict():
    data = request.json
    features = data.get("features", [])
    if len(features) == 2:  # expecting 2 features
        prediction = model.predict([features])
        return jsonify({"prediction": prediction.tolist()})
    else:
        return jsonify({"error": "Invalid input"}), 400

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)

2. Dockerfile#

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app/
CMD ["python", "app.py"]

3. Kubernetes Manifests#

deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-ml-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: flask-ml
  template:
    metadata:
      labels:
        app: flask-ml
    spec:
      containers:
        - name: flask-ml-container
          image: myregistry.com/flask-ml-image:latest
          ports:
            - containerPort: 80

service.yaml
apiVersion: v1
kind: Service
metadata:
  name: flask-ml-service
spec:
  selector:
    app: flask-ml
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: ClusterIP

After applying these YAML files, your model is accessible within the cluster on the flask-ml-service:80 endpoint. For external traffic, you’d configure an Ingress or use a LoadBalancer-type Service.
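
For reference, a minimal Ingress for this Service might look like the following sketch, assuming an ingress controller (such as NGINX) is already installed and that ml.example.com is a placeholder host:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flask-ml-ingress
spec:
  rules:
    - host: ml.example.com # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: flask-ml-service
                port:
                  number: 80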

Testing with cURL or Python#

Within the cluster or via port-forwarding, you can test your endpoint:

curl -X POST -H "Content-Type: application/json" \
-d '{"features": [3.2, 1.5]}' \
http://flask-ml-service/predict
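
From your local machine, you can also port-forward the Service and hit it on localhost (the local port 8080 is arbitrary):

kubectl port-forward service/flask-ml-service 8080:80
curl -X POST -H "Content-Type: application/json" \
  -d '{"features": [3.2, 1.5]}' \
  http://localhost:8080/predict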

Intermediate Concepts: CI/CD and Reproducibility#

Continuous Integration and Continuous Deployment#

Automating the build and deployment process is essential for agility. A typical flow might be:

  1. Commit to Git triggers a CI pipeline.
  2. Tests are run automatically.
  3. Docker image is built and pushed to the registry.
  4. Kubernetes manifests are updated or a Helm chart is deployed to your K8s cluster.

Tools such as Jenkins, GitLab CI, or GitHub Actions can facilitate these steps.
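
As one illustration, a trimmed-down GitHub Actions workflow might look like this sketch; the registry, image name, and secret names are assumptions you would adapt to your setup:

name: build-and-push
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Log in to the registry
        run: echo "${{ secrets.REGISTRY_PASSWORD }}" | docker login myregistry.com -u "${{ secrets.REGISTRY_USER }}" --password-stdin
      - name: Build and push the image
        run: |
          docker build -t myregistry.com/my-ml-app:${{ github.sha }} .
          docker push myregistry.com/my-ml-app:${{ github.sha }}
      # A deployment step could follow here, e.g. kubectl or Helm against your cluster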

Reproducibility and Version Control#

In machine learning:

  • Data versioning matters as much as code versioning. Tools like DVC (Data Version Control) or MLflow can help track changes.
  • Model versioning is critical for proper A/B testing and rollbacks. Store your models in a central registry with distinct version tags.

Using GitOps strategies (e.g., Argo CD or Flux) ensures your cluster’s state is always derived from source control.

Advanced Concepts: GPUs, Hyperparameter Tuning, and Auto-Scaling#

Using GPUs in Kubernetes#

Training deep learning models often requires GPU acceleration. When you have GPU-enabled nodes, Kubernetes can schedule containers that request GPU resources.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-training-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-training
  template:
    metadata:
      labels:
        app: gpu-training
    spec:
      containers:
        - name: gpu-trainer
          image: nvidia/cuda:11.0-base
          resources:
            limits:
              nvidia.com/gpu: 1

This example requests a single GPU. With the NVIDIA device plugin installed, Kubernetes ensures that Pods requesting GPUs are scheduled onto GPU-enabled nodes.
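
If your cluster mixes GPU and CPU nodes, you may also need to steer these Pods explicitly. A hedged fragment to add under the Pod template spec above (the node label is an assumption; the taint key follows the common NVIDIA convention):

nodeSelector:
  accelerator: nvidia-gpu # assumes GPU nodes carry this label
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule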

Hyperparameter Tuning with Kubernetes Jobs#

Hyperparameter tuning can be done by spawning multiple Jobs, each testing a different hyperparameter set:

apiVersion: batch/v1
kind: Job
metadata:
  name: hyperparam-job
spec:
  template:
    spec:
      containers:
        - name: trainer
          image: myregistry.com/training-image:latest
          command: ["python", "train.py", "--lr=0.01", "--batch-size=64"]
      restartPolicy: Never
  backoffLimit: 0

By programmatically generating multiple Job manifests, you can run them in parallel to sweep over a parameter search space.
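
A small helper script can stamp out one Job manifest per hyperparameter combination. Here is a sketch that writes YAML files you would then kubectl apply; the image name and parameter grid are illustrative:

import itertools

# Illustrative parameter grid
learning_rates = [0.001, 0.01, 0.1]
batch_sizes = [32, 64]

JOB_TEMPLATE = """apiVersion: batch/v1
kind: Job
metadata:
  name: hyperparam-job-{idx}
spec:
  template:
    spec:
      containers:
        - name: trainer
          image: myregistry.com/training-image:latest
          command: ["python", "train.py", "--lr={lr}", "--batch-size={bs}"]
      restartPolicy: Never
  backoffLimit: 0
"""

# Write one manifest per combination; apply them all with: kubectl apply -f .
for idx, (lr, bs) in enumerate(itertools.product(learning_rates, batch_sizes)):
    with open(f"job-{idx}.yaml", "w") as f:
        f.write(JOB_TEMPLATE.format(idx=idx, lr=lr, bs=bs))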

Autoscaling with Custom Metrics#

Kubernetes also allows for scaling based on custom metrics, such as the rate of inference requests or queue depth. With the help of metrics adapters and a system like Prometheus, you can define custom rules.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-deployment
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: inference_requests_per_second
        target:
          type: AverageValue
          averageValue: "50"

When the average across Pods exceeds 50 inference requests per second, Kubernetes spins up more Pods automatically.

Orchestration Frameworks: Kubeflow and Beyond#

Kubernetes is a strong foundation, but ML workflows often require even more specialized tooling. That’s where frameworks like Kubeflow, Argo, or MLflow on Kubernetes come in.

Kubeflow#

Kubeflow aims to make running machine learning workflows on Kubernetes straightforward. It includes:

  • Kubeflow Pipelines: A platform for building and deploying scalable ML workflows (a short SDK sketch follows this list).
  • TFJob, PyTorchJob, MXNetJob: Custom resources for distributed training.
  • Central Dashboard: To monitor pipelines, notebooks, and experiments.
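
To give a feel for Pipelines, here is a minimal sketch using the kfp v2 SDK; it assumes kfp is installed, and the component body is a stand-in rather than a real training step:

from kfp import dsl, compiler

@dsl.component(base_image="python:3.9")
def train(lr: float) -> str:
    # Illustrative stand-in for a real training step
    return f"trained with lr={lr}"

@dsl.pipeline(name="demo-training-pipeline")
def pipeline(lr: float = 0.01):
    train(lr=lr)

# Compile to a spec that Kubeflow Pipelines can run
compiler.Compiler().compile(pipeline, "pipeline.yaml")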

MLflow on K8s#

MLflow is another popular tool for tracking experiments, storing models, and managing the entire model lifecycle. It can be installed on Kubernetes, allowing you to store training artifacts and serve models in a standardized manner.

Argo Workflows#

Argo Workflows is a workflow engine for Kubernetes that lets you define multi-step data and ML pipelines. Similar to Kubeflow Pipelines, you define each stage as a set of containers that run in sequence or in parallel.

| Orchestration Tool | Strengths | Typical Use Cases |
| --- | --- | --- |
| Kubeflow | End-to-end ML platform on K8s; focuses on distributed training and pipelines. | Large-scale pipelines, end-to-end ML lifecycle. |
| Argo Workflows | General-purpose workflow engine. | Complex data processing or multi-step workflows. |
| MLflow | Experiment tracking, model registry, and serving. | All stages of ML development and deployment with tracking. |
Best Practices and Future Trends#

  1. Minimize Image Size: Large Docker images slow down deployment and scaling. Use slim base images and ensure your Dockerfile installs only the dependencies you actually need.
  2. Enable Logging and Monitoring: Integrate your ML workloads with logging (e.g., EFK stack) and monitoring (Prometheus + Grafana). This gives visibility into performance and resource usage.
  3. Secure Your Pipeline: Implement security best practices, including scanning container images for vulnerabilities, using read-only file systems, and restricting permissions with Pod Security Policies or Pod Security Admission.
  4. Use Helm or Kustomize: For more advanced or repeated deployments, Helm or Kustomize templates reduce YAML duplication and confusion (a small Kustomize sketch follows this list).
  5. Look Ahead to Serverless ML: Serverless offerings like Knative or vendor-specific solutions (e.g., AWS Fargate) could simplify the operational overhead. In some cases, you simply need to run your code without managing the underlying cluster.
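
For item 4 above, a minimal Kustomize layout is often enough. The kustomization.yaml below reuses the earlier manifests and retags the image; the tag is illustrative:

kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
images:
  - name: myregistry.com/flask-ml-image
    newTag: v1.2.0 # illustrative release tag

Apply it with kubectl apply -k . from the directory containing these files.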

Conclusion#

Transitioning your machine learning workflows from local notebooks to production-ready pipelines on Kubernetes is a massive step up in maturity. With containers providing consistency and Kubernetes offering robust orchestration, you can:

  • Easily scale training and inference workloads.
  • Manage multi-tenant environments for different data scientists and teams.
  • Integrate logs, monitoring, and advanced workflows like hyperparameter tuning or distributed training.
  • Lay the groundwork for future trends like serverless ML.

While the initial ramp-up in complexity might feel steep, the benefits in scalability, reproducibility, and efficiency are worth it. Whether you choose to simply orchestrate your Dockerized Python scripts or leverage full-fledged platforms like Kubeflow, the Kubernetes ecosystem is ready to handle your machine learning ambitions at scale.

By consistently refining your processes—adopting CI/CD, versioning data, integrating advanced autoscaling metrics, and using specialized frameworks—you’ll move beyond the days of single-machine notebooks. Your ML journey can then fully leverage the power and reliability of Kubernetes, turning ideas into reliable, scalable production services. Happy scaling!
