
Host ML models on Amazon SageMaker using Triton: TensorRT models

AWS Machine Learning Blog

TensorRT is an SDK developed by NVIDIA that provides a high-performance deep learning inference library. It's optimized for NVIDIA GPUs and accelerates deep learning inference in production environments. Triton Inference Server supports TensorRT as a model format.
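As a minimal sketch of what hosting a TensorRT plan with Triton on SageMaker can look like: the model repository (a `model.plan` plus `config.pbtxt`) is packaged as `model.tar.gz` in S3 and pointed at the SageMaker Triton container. The image URI, bucket, role ARN, and model name below are placeholders (assumptions), not values from the article.

```python
# Sketch: deploy a Triton model repository containing a TensorRT plan to SageMaker.
import boto3

sm = boto3.client("sagemaker")

# Placeholder image URI and model artifact location (assumptions).
triton_image = "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>"
model_data = "s3://<your-bucket>/triton-models/model.tar.gz"  # model.plan + config.pbtxt

sm.create_model(
    ModelName="triton-tensorrt-demo",
    ExecutionRoleArn="<sagemaker-execution-role-arn>",
    PrimaryContainer={
        "Image": triton_image,
        "ModelDataUrl": model_data,
        # Tell Triton which model in the repository to serve by default;
        # "resnet" is a hypothetical model name.
        "Environment": {"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "resnet"},
    },
)

sm.create_endpoint_config(
    EndpointConfigName="triton-tensorrt-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "triton-tensorrt-demo",
        "InstanceType": "ml.g4dn.xlarge",  # GPU instance; TensorRT targets NVIDIA GPUs
        "InitialInstanceCount": 1,
    }],
)

sm.create_endpoint(
    EndpointName="triton-tensorrt-demo",
    EndpointConfigName="triton-tensorrt-config",
)
```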


Build a personalized avatar with generative AI using Amazon SageMaker

AWS Machine Learning Blog

SageMaker asynchronous inference also provides a built-in queue for incoming requests and a task-completion notification mechanism via Amazon SNS, in addition to other native features of SageMaker hosting such as auto scaling. To host the asynchronous endpoint, we must complete several steps, starting from the DJL DeepSpeed inference container image (djl-inference:0.21.0-deepspeed0.8.3-cu117).
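A minimal sketch of those pieces with boto3: the endpoint config's `AsyncInferenceConfig` wires up the SNS success/error topics and caps per-instance concurrency (extra requests wait in SageMaker's queue), and invocations reference an S3 input and return immediately. Endpoint names, ARNs, and S3 paths are placeholders (assumptions).

```python
# Sketch: create an asynchronous SageMaker endpoint with SNS notifications.
import boto3

sm = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")

sm.create_endpoint_config(
    EndpointConfigName="avatar-async-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "avatar-model",  # assumed to be created earlier from the DJL image
        "InstanceType": "ml.g5.2xlarge",
        "InitialInstanceCount": 1,
    }],
    AsyncInferenceConfig={
        "OutputConfig": {
            "S3OutputPath": "s3://<your-bucket>/async-outputs/",
            "NotificationConfig": {
                "SuccessTopic": "arn:aws:sns:<region>:<account>:avatar-success",
                "ErrorTopic": "arn:aws:sns:<region>:<account>:avatar-errors",
            },
        },
        # Limit concurrent invocations per instance; further requests queue up.
        "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 4},
    },
)

sm.create_endpoint(
    EndpointName="avatar-async",
    EndpointConfigName="avatar-async-config",
)

# The call returns immediately; the result lands in S3 and SNS is notified.
resp = runtime.invoke_endpoint_async(
    EndpointName="avatar-async",
    InputLocation="s3://<your-bucket>/async-inputs/request.json",
)
print(resp["OutputLocation"])
```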