Boost inference performance for Mixtral and Llama 2 models with new Amazon SageMaker containers
AWS Machine Learning Blog
APRIL 8, 2024
This release of the containers adds support for new models, including Mixture of Experts (MoE) models, brings performance and usability improvements across inference backends, and returns new generation details, such as the reason generation finished and token-level log probabilities, for greater control over and explainability of predictions.
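To illustrate how token-level log probabilities and a finish reason might be used to inspect a prediction, here is a minimal sketch. It assumes a hypothetical `TokenDetail` record holding each token's text and natural-log probability; the actual response schema returned by the SageMaker containers may differ, and the `"length"` finish reason in the usage example is likewise only an assumed value.

```python
import math
from dataclasses import dataclass


# Hypothetical per-token record; the real container response schema may differ.
@dataclass
class TokenDetail:
    text: str
    log_prob: float  # natural-log probability the model assigned to this token


def summarize_generation(tokens, finish_reason):
    """Summarize token-level log probabilities for one completion."""
    if not tokens:
        raise ValueError("at least one generated token is required")
    total = sum(t.log_prob for t in tokens)  # log P(sequence) = sum of token log-probs
    avg = total / len(tokens)
    return {
        "sequence_log_prob": total,
        "perplexity": math.exp(-avg),  # lower means the model was more confident
        "least_likely_token": min(tokens, key=lambda t: t.log_prob).text,
        "finish_reason": finish_reason,  # e.g. why generation stopped
    }


# Usage with made-up values ("length" is an assumed finish reason).
tokens = [TokenDetail("Paris", math.log(0.9)), TokenDetail(".", math.log(0.5))]
summary = summarize_generation(tokens, "length")
print(summary)
```

Summing log probabilities rather than multiplying raw probabilities avoids numerical underflow on long completions, and the least likely token is a quick pointer to where the model was least certain.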