Reduce inference time for BERT models using neural architecture search and SageMaker Automated Model Tuning
AWS Machine Learning Blog
JANUARY 19, 2024
In this post, we demonstrate how to use neural architecture search (NAS)-based structural pruning to compress a fine-tuned BERT model, improving model performance and reducing inference time. First, we use an Amazon SageMaker Studio notebook to fine-tune a pre-trained BERT model on a target task using a domain-specific dataset.
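As a quick illustration of that first fine-tuning step, the sketch below shows how a pre-trained BERT model can be fine-tuned for a text classification task with the Hugging Face Transformers library inside a SageMaker Studio notebook. The dataset (GLUE SST-2 is used here as a stand-in for a domain-specific dataset) and the hyperparameters are placeholder assumptions, not the exact values used in this post.

```python
# Minimal fine-tuning sketch (assumptions: Hugging Face Transformers,
# a binary text-classification task, and GLUE SST-2 as a placeholder
# for the domain-specific dataset).
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Load and tokenize the dataset.
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(
        batch["sentence"], truncation=True, padding="max_length", max_length=128
    )

tokenized = dataset.map(tokenize, batched=True)

# Illustrative hyperparameters; tune these for your own dataset.
training_args = TrainingArguments(
    output_dir="bert-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=32,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
)

trainer.train()
```

The fine-tuned checkpoint produced here is the starting point for the NAS-based structural pruning described in the rest of the post.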