Building a machine learning model takes time and effort, but a model isn’t very useful until others can interact with it. Hosting your model as an accessible service allows applications, users, or systems to make use of its predictions in real time. Amazon EC2 is a common choice for deploying models because it gives you full control over the environment and lets you scale up or down as needed. Although setting it up involves several steps, it’s straightforward when broken down clearly. This guide explains how to deploy a machine learning model on AWS EC2 step by step.
Before starting your server, make sure your machine learning model is saved in a portable format that's easy to load. Common formats include .pkl for Scikit-learn, .pt for PyTorch, or .h5 for TensorFlow. You also need a serving script that loads the model and accepts requests, often using frameworks like Flask or FastAPI to provide an HTTP interface. This script defines how data is received, passed to the model, and sent back as a response. Test your script locally and confirm it works as expected before moving it to the cloud.
List all dependencies your script requires in a requirements.txt file. Include exact versions to avoid unexpected compatibility issues. Once your files are ready, sign in to your AWS account and ensure you have a key pair for SSH access. If not, create one through the AWS console and download the .pem file. Choose an AWS region close to where most requests will come from, which can help reduce latency. Decide on an appropriate instance type; lightweight CPU-bound models often work fine on smaller instances like t3.medium, while deep learning models that rely on GPU acceleration need something like g4dn.xlarge.
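For the Flask sketch above, a pinned requirements.txt might look like this; the version numbers are illustrative, and you should pin whatever versions you tested with locally:

flask==3.0.3
gunicorn==22.0.0
scikit-learn==1.5.1
numpy==2.0.1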
From the AWS console, go to the EC2 dashboard and select “Launch Instance.” Choose an Amazon Machine Image (AMI), such as Ubuntu LTS, and name your instance so it’s easy to identify later. Next, select an instance type that matches your workload. If unsure, start with a modest size — AWS allows you to change instance types later without starting over.
Attach your key pair to enable secure SSH access, and create a security group that opens only the necessary ports. At a minimum, allow port 22 for SSH and whichever port your application will listen on (such as 80 for HTTP or 5000 for development). Launch the instance and wait a few minutes for it to become ready.
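If you prefer the command line, the same launch can be scripted with the AWS CLI; the AMI ID, key pair name, and security group ID below are placeholders to replace with your own values:

aws ec2 run-instances \
    --image-id ami-0abcdef1234567890 \
    --instance-type t3.medium \
    --key-name my-key-pair \
    --security-group-ids sg-0123456789abcdef0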
You can now connect to the server using SSH. In your terminal, run a command similar to:
ssh -i /path/to/key.pem ubuntu@your-ec2-public-ip
Once logged in, update the server's package lists and install Python, pip, and virtual environment tools. Create and activate a virtual environment to isolate your project's dependencies from the system. Upload your model file, serving script, and requirements.txt to the server using scp or another file transfer method. Install the dependencies into the virtual environment so your script has everything it needs to run.
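On an Ubuntu instance, that sequence looks roughly like the following; the key path, host, and the ~/app directory are placeholders:

# From your local machine: create the app directory, then copy the files up
ssh -i /path/to/key.pem ubuntu@your-ec2-public-ip "mkdir -p ~/app"
scp -i /path/to/key.pem app.py model.pkl requirements.txt ubuntu@your-ec2-public-ip:~/app/

# On the server: install Python tooling and create an isolated environment
sudo apt update && sudo apt install -y python3-pip python3-venv
cd ~/app
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt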
After setting up the environment and dependencies, test your serving script to make sure it starts correctly. If you're using Flask or FastAPI, make sure it listens on 0.0.0.0 instead of localhost so it can accept external requests. Running the script now should start a web server that listens on the designated port.
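With the Flask sketch from earlier, that first test run is a single command; the uvicorn line is the equivalent if you chose FastAPI instead:

python app.py
# FastAPI equivalent, assuming an app object defined in app.py:
# uvicorn app:app --host 0.0.0.0 --port 5000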
To keep your application running even after you disconnect, you can use tmux or screen to leave the process active in the background. For a more reliable setup, run the app under gunicorn, a production WSGI server for Flask apps, and configure a systemd service to start it automatically on boot.
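As a sketch, a systemd unit for the Flask app above might look like this; the paths, user, worker count, and service name model-api are assumptions to adapt to your setup:

[Unit]
Description=ML model serving app
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/app
# Consider binding to 127.0.0.1 once Nginx (next step) fronts the app
ExecStart=/home/ubuntu/app/venv/bin/gunicorn --workers 2 --bind 0.0.0.0:5000 app:app
Restart=always

[Install]
WantedBy=multi-user.target

Save it as /etc/systemd/system/model-api.service, then run sudo systemctl daemon-reload followed by sudo systemctl enable --now model-api to start it and register it for boot.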
For a more polished setup, install and configure Nginx to act as a reverse proxy. Nginx can listen on port 80 and forward incoming requests to your Python application running on an internal port, while handling connection management more efficiently. Install Nginx on the server, set up a simple configuration file for your app, and reload the service. Check that your instance’s security group has port 80 open so users can access your application.
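A minimal server block for this setup might look like the following, assuming the app listens locally on port 5000; the server_name value is a placeholder:

server {
    listen 80;
    server_name your-domain-or-ip;

    location / {
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Save it under /etc/nginx/sites-available/, symlink it into /etc/nginx/sites-enabled/, then validate and reload with sudo nginx -t && sudo systemctl reload nginx.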
Test your endpoint by sending a sample request and verifying that your model responds correctly. This is a good time to check response times, ensure data is returned as expected, and handle any unexpected input gracefully.
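For the JSON interface sketched earlier, a quick check with curl might look like this; the /predict route and the feature values are the illustrative ones from the example script:

curl -X POST http://your-ec2-public-ip/predict \
    -H "Content-Type: application/json" \
    -d '{"features": [5.1, 3.5, 1.4, 0.2]}'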
Once the model is live, take a few more steps to protect and maintain your deployment. Disable direct root logins over SSH and use strong key-based authentication with non-default usernames. Regularly update the operating system and Python packages to reduce potential vulnerabilities. Limit open ports to only what’s needed for your application to run.
To protect user data and prevent eavesdropping, serve your app over HTTPS. You can obtain a free SSL/TLS certificate through Let’s Encrypt and configure it with Nginx. This ensures all communication between clients and your server is encrypted.
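With a domain name pointed at your instance and port 443 open in the security group, certbot can obtain the certificate and update the Nginx configuration in one step; the domain below is a placeholder:

sudo apt install -y certbot python3-certbot-nginx
sudo certbot --nginx -d your-domain.example.com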
Set up basic monitoring to keep an eye on CPU, memory, and disk usage. AWS CloudWatch offers convenient dashboards, or you can use lightweight tools like htop together with your application logs to stay informed. Testing your endpoint periodically ensures it continues to function as intended. If demand increases, you can take a snapshot of your configured instance, launch more servers from that snapshot, and place them behind a load balancer. This distributes traffic and improves reliability without much additional setup.
Deploying a machine learning model on AWS EC2 makes your work accessible to others while giving you full control over the environment. Preparing your files and dependencies, setting up and configuring the server, serving the model through a web application, and securing the deployment are all manageable steps when approached methodically. AWS EC2 allows you to adjust resources over time to fit your needs and handle changes in demand. With a well-tested script and sensible practices, you can run a model that serves predictions reliably and remains easy to maintain. This setup keeps your model useful and ready for real-world use.