Auto-complete, BERT and Inference Engine - Artificial Intelligence Zone

Host ML models on Amazon SageMaker using Triton: TensorRT models

AWS Machine Learning Blog

MAY 8, 2023

With kernel auto-tuning, the engine selects the best algorithm for the target GPU, maximizing hardware utilization. Overall, TensorRT’s combination of techniques yields faster inference and lower latency than other inference engines. These functions are used during the inference step.
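The general idea behind kernel auto-tuning can be illustrated with a minimal sketch: time several interchangeable implementations of the same operation on the target hardware and keep the fastest. This is not TensorRT's API or its actual selection logic. The names `autotune`, `dot_loop`, and `dot_builtin` are hypothetical, and the example runs plain Python on the CPU rather than GPU kernels.

```python
import time
from typing import Callable, Dict, Sequence


def dot_loop(a: Sequence[float], b: Sequence[float]) -> float:
    """Candidate 1 (hypothetical): explicit accumulation loop."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_builtin(a: Sequence[float], b: Sequence[float]) -> float:
    """Candidate 2 (hypothetical): generator expression fed to sum()."""
    return sum(x * y for x, y in zip(a, b))


def autotune(
    candidates: Dict[str, Callable],
    args: tuple,
    repeats: int = 5,
    timer: Callable[[], float] = time.perf_counter,
) -> str:
    """Return the name of the candidate with the lowest best-of-N run time."""
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        elapsed = float("inf")
        for _ in range(repeats):
            start = timer()
            fn(*args)
            # Keep the fastest of the repeated runs to reduce timing noise.
            elapsed = min(elapsed, timer() - start)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


CANDIDATES = {"loop": dot_loop, "builtin": dot_builtin}
a = [float(i) for i in range(1000)]
b = [2.0] * 1000

# Which candidate wins depends on the machine, so no winner is assumed here.
chosen = autotune(CANDIDATES, (a, b))
result = CANDIDATES[chosen](a, b)
print(chosen, result)
```

The selection is made once, ahead of time, and the chosen implementation is then reused for every later call. That mirrors the intent of tuning for a specific target device, although a real engine would weigh far more than a single wall-clock measurement.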

