What is AWS Neuron?
AWS Neuron is the software development kit (SDK) used to run deep learning and generative AI workloads on AWS Inferentia- and AWS Trainium-powered Amazon Elastic Compute Cloud (Amazon EC2) instances. It includes a compiler, a runtime, training and inference libraries, and developer tools for monitoring, profiling, and debugging. Neuron supports the end-to-end machine learning (ML) development lifecycle: building and deploying deep learning and AI models, optimizing them for the highest performance and lowest cost, and gaining deeper insight into model behavior.
Native integration with popular ML frameworks and libraries
Neuron integrates natively with PyTorch and JAX, and with essential ML libraries such as Hugging Face Optimum Neuron, PyTorch Lightning, and AXLearn. Neuron also supports OpenXLA, including StableHLO and GSPMD, enabling PyTorch/XLA and JAX developers to use Neuron's compiler optimizations for Inferentia and Trainium. You can use Trainium- and Inferentia-based instances with services such as Amazon SageMaker, Amazon EKS, Amazon ECS, AWS ParallelCluster, and AWS Batch, as well as with third-party services like Ray (Anyscale), Domino Data Lab, Datadog, and Weights & Biases.
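As a minimal sketch of what this PyTorch integration looks like in practice, the example below compiles a small model ahead of time with torch_neuronx.trace, the documented entry point for tracing PyTorch models for NeuronCores. The model definition and input shape are placeholders, and exact API behavior may vary across Neuron releases.

```python
import torch
import torch_neuronx  # Neuron's PyTorch integration, installed with the Neuron SDK

# A small placeholder model; any traceable torch.nn.Module works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

example = torch.rand(1, 128)

# Ahead-of-time compile the model for NeuronCores. The result is a regular
# TorchScript module whose forward pass executes on the accelerator.
neuron_model = torch_neuronx.trace(model, example)

# Save the compiled artifact and run inference as usual.
torch.jit.save(neuron_model, "model_neuron.pt")
output = neuron_model(example)
```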
Distributed training and inference libraries
Neuron includes out-of-the-box optimizations for distributed training and inference through the open source PyTorch libraries NxD Training and NxD Inference. NxD Training simplifies and optimizes large-scale distributed training and supports a range of model architectures, parallelism strategies, and training workflows. NxD Inference provides a comprehensive solution for optimized model inference, with key features such as on-device sampling, QKV weight fusion, continuous batching, speculative decoding, dynamic bucketing, and distributed inference. NxD Inference also integrates with serving solutions such as vLLM and Hugging Face TGI. Both libraries include a model hub with ready-to-use implementations of popular model architectures.
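As an illustrative sketch of the vLLM integration, the snippet below runs offline inference through vLLM's Neuron backend in the style of vLLM's published Neuron examples. The model name and parallelism settings are placeholders, and vLLM's flags and defaults change between releases, so treat this as an outline rather than a definitive recipe.

```python
from vllm import LLM, SamplingParams

# Sketch of offline inference on a Neuron-enabled vLLM build (assumes an
# Inf2/Trn1 instance; model and sizes are placeholders).
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    device="neuron",          # route execution to NeuronCores
    tensor_parallel_size=8,   # shard the model across NeuronCores
    max_num_seqs=4,           # continuous-batching capacity
)

params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["What is AWS Neuron?"], params)
for out in outputs:
    print(out.outputs[0].text)
```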
Advanced applied science capabilities
Neuron offers several applied science capabilities that empower scientists and researchers to push the boundaries of open source AI research and innovation on Trainium and Inferentia. The Neuron Kernel Interface (NKI) provides direct access to the hardware primitives and instructions available on Trainium and Inferentia, enabling researchers to build and tune compute kernels for optimal performance. NKI is a Python-based programming environment that adopts the commonly used Triton-like syntax and tile-level semantics. Researchers can use NKI to enhance deep learning models with new functionalities, optimizations, and science innovations. Neuron's custom C++ operators enable developers to extend the SDK's functionality by creating their own operators optimized for Inferentia and Trainium.
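To make the NKI programming model concrete, here is a minimal element-wise addition kernel in the style of NKI's getting-started examples. The decorator and buffer names follow current NKI documentation, but the API surface may evolve between SDK versions.

```python
from neuronxcc import nki
import neuronxcc.nki.language as nl

@nki.jit
def tensor_add_kernel(a_input, b_input):
    # Allocate the kernel output in device HBM.
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)

    # Load the operands from HBM into on-chip memory as tiles.
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)

    # Tile-level compute runs on the NeuronCore engines.
    c_tile = a_tile + b_tile

    # Store the result back to HBM and return it to the caller.
    nl.store(c_output, value=c_tile)
    return c_output
```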
Powerful developer tools
The AWS Neuron SDK offers a comprehensive toolset for monitoring, managing, and optimizing deep learning models on AWS Inferentia- and Trainium-powered EC2 instances. It provides utilities such as neuron-top, neuron-monitor, and Neuron Sysfs to monitor hardware resources, model execution, and system details. For containerized applications on Kubernetes and Amazon EKS, Neuron simplifies monitoring through Amazon CloudWatch integration and other popular observability tools such as Datadog and Weights & Biases. Additionally, the neuron-profile tool helps identify and address performance bottlenecks in both single-node and distributed applications, and provides native profiling capabilities for popular ML frameworks.
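As a hedged sketch of how these tools compose with an observability pipeline, the snippet below shells out to neuron-monitor, which streams periodic JSON reports to stdout. The assumption that each report arrives as one JSON document per line, and the report schema itself, are version-dependent, so the code only inspects top-level sections.

```python
import json
import subprocess

# Sketch: consume neuron-monitor's periodic JSON reports. Assumes
# neuron-monitor is installed (part of the Neuron tools package) and
# emits one JSON document per line; the schema varies by SDK version.
proc = subprocess.Popen(
    ["neuron-monitor"],
    stdout=subprocess.PIPE,
    text=True,
)

for line in proc.stdout:
    report = json.loads(line)
    # Forward the report to your observability pipeline (CloudWatch,
    # Datadog, etc.); here we just show which sections are present.
    print(sorted(report.keys()))
```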