Published on July 17, 2025

Serving Predictions: Deploying a Machine Learning Model on AWS EC2

Building a machine learning model takes time and effort, but a model isn’t very useful until others can interact with it. Hosting your model as an accessible service allows applications, users, or systems to make use of its predictions in real time. Amazon EC2 is a common choice for deploying models because it gives you full control over the environment and lets you scale up or down as needed. Although setting it up involves several steps, it’s straightforward when broken down clearly. This guide explains how to deploy a machine learning model on AWS EC2 step by step.

Preparing Your Model and Environment

Before starting your server, make sure your machine learning model is saved in a portable format that’s easy to load. Common formats include .pkl for Scikit-learn, .pt for PyTorch, or .h5 for TensorFlow. You also need a serving script that loads the model and accepts requests, often using frameworks like Flask or FastAPI to provide an HTTP interface. This script defines how data is received, passed to the model, and sent back as a response. Test your script locally and confirm it works as expected before moving it to the cloud.
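
As a concrete illustration, here is a minimal sketch of such a script using Flask and a pickled Scikit-learn model. The file name model.pkl, the /predict route, and the JSON payload shape are assumptions for the example, so adapt them to your own model.

# app.py -- a minimal Flask serving script (illustrative sketch)
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the model once at startup rather than on every request
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[1.0, 2.0, 3.0]]}
    payload = request.get_json(force=True)
    prediction = model.predict(payload["features"])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    # 0.0.0.0 makes the server reachable from outside the machine
    app.run(host="0.0.0.0", port=5000)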

List all dependencies your script requires in a requirements.txt file. Include exact versions to avoid unexpected compatibility issues. Once your files are ready, sign in to your AWS account and ensure you have a key pair for SSH access. If not, create one through the AWS console and download the .pem file. Choose an AWS region close to where most requests will come from, which can help reduce latency. Decide on an appropriate instance type; lightweight CPU-bound models often work fine on smaller instances like t3.medium, while deep learning models that rely on GPU acceleration need something like g4dn.xlarge.
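
For example, a requirements.txt for the Flask sketch above might contain the following; the pinned versions are illustrative, so use whatever versions you tested with locally.

flask==3.0.3
scikit-learn==1.5.2
gunicorn==23.0.0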

Launching and Configuring the EC2 Instance

From the AWS console, go to the EC2 dashboard and select “Launch Instance.” Choose an Amazon Machine Image (AMI), such as Ubuntu LTS, and name your instance so it’s easy to identify later. Next, select an instance type that matches your workload. If unsure, start with a modest size — AWS allows you to change instance types later without starting over.

Attach your key pair to enable secure SSH access, and create a security group that opens only the necessary ports. At a minimum, allow port 22 for SSH and whichever port your application will listen on (such as 80 for HTTP or 5000 for development). Launch the instance and wait a few minutes for it to become ready.

You can now connect to the server using SSH. In your terminal, run a command similar to:

ssh -i /path/to/key.pem ubuntu@your-ec2-public-ip

Once logged in, update the server’s package lists and install Python, pip, and virtual environment tools. Create and activate a virtual environment to isolate your project’s dependencies from the system. Upload your model file, serving script, and requirements.txt to the server using scp or another file transfer method. Install the dependencies into the virtual environment so your script has everything it needs to run.
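
On an Ubuntu instance, that sequence might look roughly like this; the file names come from the earlier example, so adjust them to your project.

# On the server: install Python tooling and create an isolated environment
sudo apt update && sudo apt install -y python3-pip python3-venv
python3 -m venv ~/venv
source ~/venv/bin/activate

# From your local machine: copy the project files to the server
scp -i /path/to/key.pem app.py model.pkl requirements.txt ubuntu@your-ec2-public-ip:~/

# Back on the server, inside the virtual environment: install dependencies
pip install -r requirements.txt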

Serving Your Model as a Web Application

After setting up the environment and dependencies, run your serving script to confirm it starts correctly. If you're using Flask or FastAPI, have the app listen on 0.0.0.0 rather than localhost so it can accept external requests, as in the example script shown earlier. Running the script should start a web server listening on the designated port.

To keep your application running even after you disconnect, you can use tmux or screen to leave the process active in the background. For a more reliable solution, run the app under a production WSGI server such as gunicorn for Flask (or an ASGI server such as uvicorn for FastAPI) and configure a systemd service so it starts automatically on boot and restarts if it crashes, as sketched below.
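
An illustrative systemd unit for the Flask example might look like the one below; the paths and the app:app module reference assume the layout from the earlier steps. Binding gunicorn to 127.0.0.1 keeps the app reachable only through the reverse proxy described next.

# /etc/systemd/system/model-api.service -- illustrative unit file
[Unit]
Description=ML model serving app
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu
ExecStart=/home/ubuntu/venv/bin/gunicorn --bind 127.0.0.1:5000 app:app
Restart=always

[Install]
WantedBy=multi-user.target

Enable it with sudo systemctl enable --now model-api so the app starts on boot and restarts after failures.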

For a more polished setup, install and configure Nginx to act as a reverse proxy. Nginx can listen on port 80 and forward incoming requests to your Python application running on an internal port, while handling connection management more efficiently. Install Nginx on the server, set up a simple configuration file for your app, and reload the service. Check that your instance’s security group has port 80 open so users can access your application.
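
A minimal server block for this setup could look like the following, assuming the app listens on port 5000 as in the unit file above. Save it under /etc/nginx/sites-available/, link it into sites-enabled, and reload Nginx.

# /etc/nginx/sites-available/model-api -- illustrative reverse proxy config
server {
    listen 80;
    server_name your-domain-or-public-ip;

    location / {
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Run sudo nginx -t to validate the configuration, then sudo systemctl reload nginx to apply it without dropping connections.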

Test your endpoint by sending a sample request and verifying that your model responds correctly. This is a good time to check response times, ensure data is returned as expected, and handle any unexpected input gracefully.
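
With the /predict route from the earlier sketch, a test request might look like this; the payload shape is whatever your serving script expects.

curl -X POST http://your-ec2-public-ip/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [[1.0, 2.0, 3.0]]}'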

Securing and Maintaining Your Deployment

Once the model is live, take a few more steps to protect and maintain your deployment. Disable direct root logins over SSH and use strong key-based authentication with non-default usernames. Regularly update the operating system and Python packages to reduce potential vulnerabilities. Limit open ports to only what’s needed for your application to run.
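
On Ubuntu, these SSH settings live in /etc/ssh/sshd_config; the lines below show the relevant directives for disabling root logins and password authentication.

# /etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no

Apply the change with sudo systemctl restart ssh, and keep an existing session open while you verify that key-based login still works, in case a mistake locks you out.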

To protect user data and prevent eavesdropping, serve your app over HTTPS. You can obtain a free SSL/TLS certificate through Let’s Encrypt and configure it with Nginx. This ensures all communication between clients and your server is encrypted.
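
With Nginx already in place, Certbot can obtain and install the certificate in a couple of commands. This assumes you have a domain name pointing at the instance, since Let's Encrypt does not issue certificates for bare IP addresses; your-domain.example is a placeholder.

sudo apt install -y certbot python3-certbot-nginx
sudo certbot --nginx -d your-domain.example

Remember to open port 443 in the instance's security group. Certbot also installs a scheduled task that renews the certificate automatically before it expires.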

Set up basic monitoring to keep an eye on CPU, memory, and disk usage. AWS CloudWatch offers convenient dashboards, or you can use a lightweight tool like htop along with your application logs to stay informed. Testing your endpoint periodically ensures it continues to function as intended. If demand increases, you can create an Amazon Machine Image (AMI) from your configured instance, launch more servers from that image, and place them behind a load balancer. This distributes traffic and improves reliability without much additional setup.

Conclusion

Deploying a machine learning model on AWS EC2 makes your work accessible to others while giving you full control over the environment. Preparing your files and dependencies, setting up and configuring the server, serving the model through a web application, and securing the deployment are all manageable steps when approached methodically. AWS EC2 also lets you adjust resources over time to fit your needs and handle changes in demand. With a well-tested script and sensible practices, you can run a model that serves predictions reliably and remains easy to maintain.
