Announcing NVIDIA GPU support for Bottlerocket on Amazon ECS
Last year, we announced the general availability of the Amazon Elastic Container Service (Amazon ECS)-optimized Bottlerocket AMI. Bottlerocket is an open source project that focuses on security and maintainability, providing a reliable and consistent Linux distribution for hosting container-based workloads. Now, we are happy to announce that you can run NVIDIA GPU-accelerated workloads on Amazon ECS using Bottlerocket.
In this post, we will walk through how to create an Amazon ECS task to run an NVIDIA GPU workload on Bottlerocket.
Why Bottlerocket?
Customers have continued to adopt containers to run their workloads, and AWS saw a need for a Linux distribution designed and optimized to run these containerized applications. Bottlerocket OS was built to provide a secure foundation for hosts running containers and to minimize the operational overhead of managing them at scale. Bottlerocket is designed for reliable updates that can be applied through automation.
You can learn more about getting started with Bottlerocket and Amazon ECS in the Getting started with Bottlerocket and Amazon ECS blog post.
Setting up an ECS cluster with Bottlerocket and NVIDIA GPUs
Let’s have a look at how this is done in practice. We will be working in the us-west-2 (Oregon) Region.
Prerequisites
- The AWS CLI with appropriate credentials
- A default VPC in a region of your choice (you can also use an existing VPC in your account)
First, let’s create the ECS cluster named ecs-bottlerocket.
aws ecs --region us-west-2 create-cluster --cluster-name ecs-bottlerocket
The instance we’re launching will need an AWS Identity and Access Management (IAM) role to communicate with both the ECS APIs and the Systems Manager Session Manager APIs. I have created an IAM role named ecsInstanceRole that has both the AmazonSSMManagedInstanceCore and the AmazonEC2ContainerServiceforEC2Role managed policies attached.
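If you do not have such a role yet, the following is a minimal sketch of one way to create it with the AWS CLI. The function name create_ecs_instance_role is hypothetical, and reusing the role name for the instance profile is an assumption made for convenience, not something this post requires.

```shell
# Sketch: create an EC2 instance role and a matching instance profile.
# The function name and the shared role/profile name are assumptions.
create_ecs_instance_role() {
  role_name="${1:-ecsInstanceRole}"

  # Trust policy that lets EC2 instances assume the role.
  trust_policy='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

  aws iam create-role \
    --role-name "$role_name" \
    --assume-role-policy-document "$trust_policy"

  # Attach the two managed policies named in the text.
  aws iam attach-role-policy \
    --role-name "$role_name" \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  aws iam attach-role-policy \
    --role-name "$role_name" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role

  # EC2 uses roles through an instance profile.
  aws iam create-instance-profile --instance-profile-name "$role_name"
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$role_name" \
    --role-name "$role_name"
}
```

Calling create_ecs_instance_role with no arguments uses the name ecsInstanceRole; skip this step if the role already exists in your account.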
The list of Bottlerocket Amazon Machine Images (AMIs) supported for use with NVIDIA GPUs is publicly available from AWS Systems Manager Parameter Store, so let’s get the AMI ID for the latest Bottlerocket release. (AMIs are available for both x86_64 and aarch64 architectures.) In this blog post we are going to be using the x86_64 AMI.
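A lookup along these lines should return the AMI ID. This is a sketch: the function name is hypothetical, and the parameter path (the aws-ecs-1-nvidia variant) is an assumption based on the naming scheme of Bottlerocket's public parameters, so verify it against the Bottlerocket documentation before relying on it.

```shell
# Sketch: fetch the latest Bottlerocket ECS NVIDIA AMI ID for x86_64.
# The parameter path below is an assumption; confirm the variant name
# in the Bottlerocket documentation.
get_bottlerocket_nvidia_ami() {
  region="${1:-us-west-2}"
  aws ssm get-parameter \
    --region "$region" \
    --name "/aws/service/bottlerocket/aws-ecs-1-nvidia/x86_64/latest/image_id" \
    --query Parameter.Value \
    --output text
}
```

For example, IMAGE_ID=$(get_bottlerocket_nvidia_ami) stores the result for later steps.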
Next, we get the list of subnets that are configured to allocate a public IP address.
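One way to do this, sketched here with a hypothetical function name, is to filter describe-subnets on the VPC ID and on the map-public-ip-on-launch attribute. The VPC ID is passed in as an argument because it is specific to your account.

```shell
# Sketch: list subnets in a VPC that assign public IPv4 addresses on launch.
# Usage: list_public_subnets <vpc-id> [region]
list_public_subnets() {
  vpc_id="$1"
  region="${2:-us-west-2}"
  aws ec2 describe-subnets \
    --region "$region" \
    --filters "Name=vpc-id,Values=${vpc_id}" "Name=map-public-ip-on-launch,Values=true" \
    --query 'Subnets[].SubnetId' \
    --output text
}
```

Pick one of the returned subnet IDs for the instance launch later in this post.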
To associate our EC2 instance to the ECS cluster, we need to provide some information to the instance when we create it: a small config file (userdata.toml) that has the details of the ECS cluster, saved in a file in the current directory.
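As a minimal sketch, the file only needs the cluster name under the settings.ecs table. The heredoc below writes it to userdata.toml in the current directory; adjust the cluster name if you chose a different one.

```shell
# Write a minimal Bottlerocket user data file that points the
# instance at the ecs-bottlerocket cluster.
cat > userdata.toml <<'EOF'
[settings.ecs]
cluster = "ecs-bottlerocket"
EOF
```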
A full set of supported settings is here.
Let’s deploy one Bottlerocket instance in one of the subnets above. We are choosing a public subnet for this blog post. It will be easier to debug and connect to the instances if needed. You can choose private or public subnets based on your use case.
We are using the p3.2xlarge instance type, which has one NVIDIA Tesla V100 Tensor Core GPU.
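The launch could look like the sketch below. The function name and the Name tag value are hypothetical; it assumes userdata.toml is in the current directory and that an instance profile named ecsInstanceRole exists.

```shell
# Sketch: launch one Bottlerocket GPU instance and print its instance ID.
# Usage: launch_bottlerocket_gpu_instance <subnet-id> <image-id>
# Assumes ./userdata.toml exists and an instance profile named
# ecsInstanceRole is available; the tag value is illustrative only.
launch_bottlerocket_gpu_instance() {
  subnet_id="$1"
  image_id="$2"
  aws ec2 run-instances \
    --region us-west-2 \
    --subnet-id "$subnet_id" \
    --image-id "$image_id" \
    --instance-type p3.2xlarge \
    --user-data file://userdata.toml \
    --iam-instance-profile Name=ecsInstanceRole \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=ecs-bottlerocket-gpu}]' \
    --query 'Instances[0].InstanceId' \
    --output text
}
```

Keep the returned instance ID; you will need it to terminate the instance during cleanup.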
Next, let’s create the task definition for the sample application.
In the task definition, assign one NVIDIA GPU to our task through the resourceRequirements parameter. We are also defining the awslogs-group configuration for our task to send the log output from our container into Amazon CloudWatch.
The log group configuration is as follows:
- region: us-west-2
- log group name: /ecs/bottlerocket
- log stream prefix: demo-gpu
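A task definition with these properties might look like the following sketch. The family name, container name, image tag, command, and CPU and memory values are all assumptions chosen for illustration; substitute a CUDA-enabled image you trust. Only the GPU requirement and the log settings come from the description above.

```shell
# Sketch: write a task definition that requests one GPU and logs to CloudWatch.
# Family, container name, image, command, cpu, and memory are placeholders.
cat > task-definition.json <<'EOF'
{
  "family": "bottlerocket-gpu-demo",
  "requiresCompatibilities": ["EC2"],
  "containerDefinitions": [
    {
      "name": "gpu-demo",
      "image": "nvidia/cuda:12.2.0-base-ubuntu22.04",
      "command": ["nvidia-smi"],
      "cpu": 256,
      "memory": 512,
      "essential": true,
      "resourceRequirements": [
        { "type": "GPU", "value": "1" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/bottlerocket",
          "awslogs-region": "us-west-2",
          "awslogs-stream-prefix": "demo-gpu"
        }
      }
    }
  ]
}
EOF
```

Note that the GPU count in resourceRequirements is a string, not a number.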
Create the CloudWatch log group specified above in the task definition.
aws logs create-log-group --log-group-name '/ecs/bottlerocket' --region us-west-2
Register the task in ECS.
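Assuming the task definition has been saved as a JSON file, registration can be sketched as follows. The function name and the default file name task-definition.json are hypothetical.

```shell
# Sketch: register a task definition from a local JSON file.
# Usage: register_gpu_task [path-to-json]
# The default file name is an assumption.
register_gpu_task() {
  task_def_file="${1:-task-definition.json}"
  aws ecs register-task-definition \
    --region us-west-2 \
    --cli-input-json "file://${task_def_file}"
}
```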
Run the task.
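A sketch of the run step, with a hypothetical function name, takes the task definition family (optionally with a revision, such as family:1) as its argument and prints the ARN of the started task.

```shell
# Sketch: start one task on the ecs-bottlerocket cluster and print its ARN.
# Usage: run_gpu_task <task-definition-family[:revision]>
run_gpu_task() {
  task_definition="$1"
  aws ecs run-task \
    --region us-west-2 \
    --cluster ecs-bottlerocket \
    --launch-type EC2 \
    --task-definition "$task_definition" \
    --query 'tasks[0].taskArn' \
    --output text
}
```

The last path segment of the returned ARN is the task ID, which also appears in the CloudWatch log stream name.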
The task will run, execute a command inside the container that reports the available GPU configuration, and then exit.
When you go into the ECS console in your account, you will see a stopped task. Select Clusters on the left menu, select the ecs-bottlerocket cluster, and then select the Tasks tab.
Click on the task ID and then select the Logs tab, which will show you the log output from the task that just ran.
You can also view the container's log output from the command line by passing in the log group name, the log stream name, and a time frame.
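As a sketch, the awslogs driver names streams prefix/container-name/task-id, so the stream name can be built from the demo-gpu prefix plus your own container name and task ID. The function name and the one-hour window are assumptions.

```shell
# Sketch: print log messages from the last hour for one task's container.
# Usage: show_gpu_task_logs <container-name> <task-id>
show_gpu_task_logs() {
  container_name="$1"
  task_id="$2"
  # Start of the time frame: one hour ago, in epoch milliseconds.
  start_ms=$(( ($(date +%s) - 3600) * 1000 ))
  aws logs get-log-events \
    --region us-west-2 \
    --log-group-name /ecs/bottlerocket \
    --log-stream-name "demo-gpu/${container_name}/${task_id}" \
    --start-time "$start_ms" \
    --query 'events[].message' \
    --output text
}
```

Widen the window by changing the 3600-second offset if the task ran earlier.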
Cleanup
To remove the resources that you created during this post, run the following commands.
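The cleanup can be sketched as one function that takes the instance ID and the task definition (as family:revision) from the earlier steps. The function name is hypothetical, and delete-cluster may fail until ECS has deregistered the terminated instance, in which case retry it after a short wait.

```shell
# Sketch: remove the resources created in this post.
# Usage: cleanup_gpu_demo <instance-id> <task-definition-family:revision>
cleanup_gpu_demo() {
  instance_id="$1"
  task_definition="$2"

  # Terminate the GPU instance and wait until it is gone.
  aws ec2 terminate-instances --region us-west-2 --instance-ids "$instance_id"
  aws ec2 wait instance-terminated --region us-west-2 --instance-ids "$instance_id"

  # Remove the task definition revision, the cluster, and the log group.
  aws ecs deregister-task-definition --region us-west-2 --task-definition "$task_definition"
  aws ecs delete-cluster --region us-west-2 --cluster ecs-bottlerocket
  aws logs delete-log-group --region us-west-2 --log-group-name /ecs/bottlerocket
}
```

If you created the IAM role and instance profile only for this walkthrough, remove them separately.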
Conclusion
In this post, we walked through how to create an ECS task definition with the appropriate configuration that will let you run a GPU-enabled workload inside a container on Bottlerocket, quickly and securely. We also saw how the container logs are available in CloudWatch and how to access them from the command line. If you are looking for additional examples of GPU-accelerated workloads to run with Bottlerocket on ECS, you can check out the NVIDIA GPU-optimized containers from the NVIDIA NGC catalog on AWS Marketplace.
Bottlerocket is open source (MIT or Apache 2.0 licensed), meaning you have a number of well-documented freedoms to use, modify, and extend it. Bottlerocket is also developed in the open on GitHub (https://github.com/bottlerocket-os/) and welcomes contributions, issues, and feedback on our discussion forum (https://github.com/bottlerocket-os/bottlerocket/discussions).