AWS Neuron

SDK to optimize AI and deep learning on AWS Trainium and AWS Inferentia

What is AWS Neuron?

AWS Neuron is the software development kit (SDK) used to run deep learning and generative AI workloads on AWS Inferentia- and AWS Trainium-powered Amazon Elastic Compute Cloud (Amazon EC2) instances. It includes a compiler, runtime, training and inference libraries, and developer tools for monitoring, profiling, and debugging. Neuron supports your end-to-end machine learning (ML) development lifecycle, including building and deploying deep learning and AI models, optimizing them for the highest performance and lowest cost, and gaining deeper insights into model behavior.
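Getting the SDK onto a Trainium- or Inferentia-powered instance typically starts with the Neuron pip repository. The fragment below is a setup sketch, not a definitive install guide: package names follow the Neuron pip index, but exact versions and supported framework combinations should be confirmed against the current Neuron documentation.

```shell
# Setup sketch for a Trainium/Inferentia EC2 instance (assumes the
# Neuron driver is already installed, e.g. via a Neuron DLAMI).
python -m venv neuron-env && source neuron-env/bin/activate
python -m pip install --upgrade pip

# Install the PyTorch Neuron package and the Neuron compiler from the
# Neuron pip repository; pin versions per the official release notes.
python -m pip install torch-neuronx neuronx-cc \
    --extra-index-url=https://pip.repos.neuron.amazonaws.com
```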


Native integration with popular ML frameworks and libraries

Neuron integrates natively with PyTorch and JAX, and with essential ML libraries such as Hugging Face Optimum Neuron, PyTorch Lightning, and AXLearn. Neuron also supports OpenXLA, including StableHLO and GSPMD, enabling PyTorch/XLA and JAX developers to use Neuron's compiler optimizations for Inferentia and Trainium. Neuron enables you to use Trainium- and Inferentia-based instances with services such as Amazon SageMaker, Amazon EKS, Amazon ECS, AWS ParallelCluster, and AWS Batch, as well as third-party services like Ray (Anyscale), Domino Data Lab, Datadog, and Weights & Biases.


Distributed training and inference libraries

Neuron includes out-of-the-box optimizations for distributed training and inference with the open source PyTorch libraries NxD Training and NxD Inference. NxD Training simplifies and optimizes large-scale distributed training and supports various model architectures, parallelism strategies, and training workflows. NxD Inference provides a comprehensive solution for optimized model inference with key features such as on-device sampling, QKV weight fusion, continuous batching, speculative decoding, dynamic bucketing, and distributed inference. NxD Inference also integrates with serving solutions like vLLM and Hugging Face TGI. Both libraries include a model hub covering a range of popular model architectures.
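Continuous batching, one of the NxD Inference features listed above, can be illustrated with a framework-agnostic sketch. The code below is plain Python with hypothetical names, not the NxD Inference API: it shows the scheduling idea of admitting a waiting request the moment a batch slot frees up, rather than waiting for the entire batch to finish as in static batching.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int        # request id
    remaining: int  # decode steps left until this sequence finishes

def continuous_batching(requests, batch_size):
    """Run decode steps over a batch, refilling free slots from the
    waiting queue after every step (continuous batching). A static
    scheduler would instead wait for the whole batch to drain."""
    waiting = deque(requests)
    active, steps, completed = [], 0, []
    while waiting or active:
        # Fill any free slots before the next fused decode step.
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        steps += 1  # one decode step over the current batch
        for r in active:
            r.remaining -= 1
        completed.extend(r.rid for r in active if r.remaining == 0)
        active = [r for r in active if r.remaining > 0]
    return steps, completed

# Four requests of uneven lengths, two batch slots: the short request
# (id 1) finishes early and its slot is immediately reused by id 2.
reqs = [Request(0, 4), Request(1, 1), Request(2, 3), Request(3, 2)]
steps, order = continuous_batching(reqs, batch_size=2)
print(steps, order)  # → 6 [1, 0, 2, 3]
```

With static batching the same workload would take 7 steps (max(4, 1) + max(3, 2)); continuous batching finishes in 6 because slots never sit idle.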


Advanced applied science capabilities

Neuron has several applied science capabilities to empower scientists and researchers to push the boundaries of open source AI research and innovation on Trainium and Inferentia. Neuron Kernel Interface (NKI) provides direct access to hardware primitives and instructions available on Trainium and Inferentia, enabling researchers to build and tune compute kernels for optimal performance. It is a Python-based programming environment that adopts commonly used Triton-like syntax and tile-level semantics. Researchers can use NKI to enhance deep learning models with new functionalities, optimizations, and science innovations. Neuron’s custom C++ operators enable developers to extend the SDK's functionality by creating their own operators optimized for Inferentia and Trainium.
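The tile-level semantics mentioned above can be sketched in plain Python. This is a conceptual illustration, not NKI syntax: it expresses a matrix multiply over fixed-size tiles, mirroring how a kernel would load a tile into on-chip memory, compute on it, and store the result back.

```python
def tiled_matmul(a, b, tile=2):
    """Tile-level matrix multiply sketch: loops are organized around
    fixed-size tiles rather than individual elements. Plain Python
    for illustration only; real NKI kernels use the NKI language APIs."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):              # iterate over output tiles
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):      # accumulate per contraction tile
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for kk in range(k0, min(k0 + tile, k)):
                            c[i][j] += a[i][kk] * b[kk][j]
    return c

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(tiled_matmul(A, B))  # → [[19, 22], [43, 50]], same as untiled matmul
```

The tiled loop nest produces exactly the same result as an element-wise matmul; the point of the tiling is to match the granularity at which the hardware moves data and issues compute instructions.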


Powerful developer tools

The AWS Neuron SDK offers a comprehensive toolset for monitoring, managing, and optimizing deep learning models on AWS Inferentia- and Trainium-powered EC2 instances. It provides utilities like neuron-top, neuron-monitor, and Neuron Sysfs to monitor hardware resources, model execution, and system details. For containerized applications on Amazon EKS and other Kubernetes environments, Neuron simplifies monitoring through Amazon CloudWatch integration and other popular observability tools like Datadog and Weights & Biases. Additionally, the neuron-profile tool helps identify and address performance bottlenecks in both single-node and distributed applications, and provides native profiling capabilities for popular ML frameworks.
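As a quick orientation, the command sketch below shows how these utilities are typically invoked on a Neuron-powered instance. The tool names come from the Neuron SDK; the one-line descriptions are summaries, so check each tool's `--help` output and the Neuron documentation for exact options.

```shell
# Illustrative command reference; only meaningful on an instance
# with Neuron devices and the Neuron tools installed.
neuron-ls        # list the Neuron devices visible to the instance
neuron-top       # live, top-like view of NeuronCore and memory utilization
neuron-monitor   # emit periodic metrics as JSON on stdout, suitable for
                 # forwarding to CloudWatch or other observability backends
```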


Getting started

Neuron Deep Learning Amazon Machine Images (Neuron DLAMIs) come pre-configured with the Neuron SDK, popular frameworks, and helpful libraries, allowing you to quickly begin training and running inference on AWS Trainium and AWS Inferentia. Neuron DLAMIs streamline your workflow and optimize performance, eliminating setup complexities so you can focus on building and deploying AI models. Get started with Neuron DLAMIs.

Quickly deploy models using pre-configured AWS Neuron Deep Learning Containers (Neuron DLCs) with optimized frameworks for Trainium and Inferentia. For custom solutions, build your own containers and leverage Kubernetes features like the Neuron Device Plugin, Neuron Scheduler Extension, and Helm Charts. Seamlessly integrate with AWS services like Amazon EKS, AWS Batch, and Amazon ECS for scalable deployments. Get started with Neuron DLCs.
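For Kubernetes deployments, the Neuron Device Plugin exposes Neuron devices as a schedulable resource. The pod spec fragment below is a sketch: the `aws.amazon.com/neuron` resource name is the one the plugin registers, while the pod and image names are placeholders for your own application.

```yaml
# Pod spec fragment requesting one Neuron device via the Neuron Device Plugin.
apiVersion: v1
kind: Pod
metadata:
  name: neuron-inference          # placeholder pod name
spec:
  containers:
    - name: app
      image: my-registry/my-neuron-app:latest   # placeholder image
      resources:
        limits:
          aws.amazon.com/neuron: 1              # request one Neuron device
```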

Optimum Neuron bridges Hugging Face Transformers and the AWS Neuron SDK, providing standard Hugging Face APIs for Trainium and Inferentia. It offers solutions for both training and inference, including support for large-scale model training and deployment for AI workflows. With support for Amazon SageMaker and pre-built Deep Learning Containers, Optimum Neuron simplifies the use of Trainium and Inferentia for ML. This integration allows developers to work with familiar Hugging Face interfaces while leveraging Trainium and Inferentia for their transformer-based projects. Get started with Hugging Face Optimum Neuron.

You can use Amazon SageMaker JumpStart to train and deploy models using Neuron. JumpStart provides support for fine-tuning and deploying popular models such as Meta’s Llama family of models. Get started with SageMaker JumpStart.