With the setup complete, you can now deploy the Meta Llama 3.1-8B model using a Kubernetes deployment. Check the deployment status with kubectl get deployments, which shows the desired, current, and up-to-date number of replicas.
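As a rough sketch of the same status check done programmatically, the official Kubernetes Python client can read the replica counts that kubectl get deployments prints; the deployment name llama-8b and the namespace below are hypothetical placeholders:

```python
# Minimal sketch: check deployment status with the official Kubernetes
# Python client (pip install kubernetes). The deployment name "llama-8b"
# is a hypothetical placeholder.
from kubernetes import client, config

config.load_kube_config()  # reads ~/.kube/config, like kubectl does
apps = client.AppsV1Api()

dep = apps.read_namespaced_deployment(name="llama-8b", namespace="default")
print(f"desired:    {dep.spec.replicas}")
print(f"current:    {dep.status.replicas}")
print(f"up-to-date: {dep.status.updated_replicas}")
print(f"available:  {dep.status.available_replicas}")
```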
We recently announced the general availability of cross-account sharing of Amazon SageMaker Model Registry using AWS Resource Access Manager (AWS RAM), making it easier to securely share and discover machine learning (ML) models across your AWS accounts.
As organizations increasingly deploy foundation models (FMs) and other machine learning (ML) models to production, they face challenges related to resource utilization, cost-efficiency, and maintaining high availability during updates. During a rolling update, for example, retiring old model copies frees capacity, so another two GPU slots become available for new ones.
You can now retrain machine learning (ML) models and automate batch prediction workflows with updated datasets in Amazon SageMaker Canvas, making it easier to continuously improve model performance and drive efficiency. An ML model’s effectiveness depends on the quality and relevance of the data it’s trained on.
Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning (ML) models at scale. For more information, refer to Package and deploy classical ML and LLMs easily with Amazon SageMaker, part 1: PySDK Improvements.
Software developers, however, are more interested in creating libraries that can be used to solve whole problem domains than they are in finishing the current work at hand. Figure 1: The LILO learning loop overview. Using a dual-system search methodology, LILO creates programs from task descriptions written in plain language.
Rather than using probabilistic approaches such as traditional machine learning (ML), Automated Reasoning tools rely on mathematical logic to definitively verify compliance with policies and provide certainty (under given assumptions) about what a system will or won't do. However, it's important to understand its limitations.
Amazon Personalize accelerates your digital transformation with machine learning (ML), making it effortless to integrate personalized recommendations into existing websites, applications, email marketing systems, and more. A solution version refers to a trained ML model. All your data is encrypted to keep it private and secure.
GitHub Copilot GitHub Copilot is an AI-powered code completion tool that analyzes contextual code and delivers real-time feedback and recommendations by suggesting relevant code snippets. Tabnine Tabnine is an AI-based code completion tool that offers an alternative to GitHub Copilot.
An added benefit of asynchronous inference is the cost savings from auto scaling the instance count to zero when there are no requests to process. Hugging Face is a popular open source hub for machine learning (ML) models. Prerequisites Complete the following prerequisites: Create a SageMaker domain.
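As a minimal sketch of that pattern, the SageMaker Python SDK exposes an AsyncInferenceConfig that can be passed at deployment time; the bucket, role ARN, and framework versions below are illustrative placeholders:

```python
# Minimal sketch: deploy a Hugging Face model behind a SageMaker
# asynchronous inference endpoint. Bucket, role, and versions are
# hypothetical placeholders.
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.async_inference import AsyncInferenceConfig

model = HuggingFaceModel(
    model_data="s3://my-bucket/model.tar.gz",              # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    async_inference_config=AsyncInferenceConfig(
        output_path="s3://my-bucket/async-output/",        # placeholder
    ),
)
```

Scaling to zero additionally requires an Application Auto Scaling policy with a minimum capacity of 0 registered on the endpoint variant.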
Create a knowledge base: To create a new knowledge base in Amazon Bedrock, complete the following steps. For Data source name, Amazon Bedrock prepopulates the auto-generated data source name; however, you can change it to your requirements. You should see a Successfully built message when the build is complete. Choose Next.
By accelerating the speed of issue detection and remediation, it increases the reliability of your ML training and reduces the wasted time and cost due to hardware failure. This solution is applicable if you’re using managed nodes or self-managed node groups (which use Amazon EC2 Auto Scaling groups) on Amazon EKS.
Purina used artificial intelligence (AI) and machine learning (ML) to automate animal breed detection at scale. The solution focuses on the fundamental principles of developing an AI/ML application workflow of data preparation, model training, model evaluation, and model monitoring. DynamoDB is used to store the pet attributes.
Visit octus.com to learn how we deliver rigorously verified intelligence at speed and create a complete picture for professionals across the entire credit lifecycle. The Q&A handler, running on AWS Fargate, orchestrates the complete query response cycle by coordinating between services and processing responses through the LLM pipeline.
With the Custom Moderation feature, you can tailor the Rekognition pre-trained moderation model for improved performance on your specific moderation use case, without any machine learning (ML) expertise. To create your project, complete the following steps: On the Amazon Rekognition console, choose Custom Moderation in the navigation pane.
Amazon CodeWhisperer is a generative AI coding companion that speeds up software development by making suggestions based on the existing code and natural language comments, reducing the overall development effort and freeing up time for brainstorming, solving complex problems, and authoring differentiated code.
The decode phase includes the following: Completion – After the prefill phase, you have partially generated text that may be incomplete or cut off at some point. The decode phase is responsible for completing the text to make it coherent and grammatically correct. The default is 32.
We use the AWS Neuron software development kit (SDK) to access the AWS Inferentia2 device and benefit from its high performance. To deploy models on Inf2, we need the AWS Neuron SDK as the software layer running on top of the Inf2 hardware. The complete code samples with instructions can be found in this GitHub repository.
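A minimal sketch of that software layer in use, assuming a toy PyTorch model (the real samples live in the repository mentioned above):

```python
# Minimal sketch: compile a PyTorch model for Inferentia2 with the AWS
# Neuron SDK. The model and input shape are hypothetical toy examples.
import torch
import torch_neuronx

model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU())
model.eval()
example = torch.rand(1, 128)

# torch_neuronx.trace compiles the model ahead of time for the
# NeuronCore; the returned module runs on the Inf2 device.
neuron_model = torch_neuronx.trace(model, example)
torch.jit.save(neuron_model, "model_neuron.pt")
print(neuron_model(example).shape)
```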
We capitalized on the powerful tools provided by AWS to tackle this challenge and effectively navigate the complex field of machine learning (ML) and predictive analytics. SageMaker is a fully managed ML service. The project was completed in a month and deployed to production after a week of testing.
In addition, you can now use Application Auto Scaling with provisioned concurrency to address inference traffic dynamically based on target metrics or a schedule. In this post, we discuss what provisioned concurrency and Application Auto Scaling are, how to use them, and some best practices and guidance for your inference workloads.
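A minimal sketch of driving provisioned concurrency through the Application Auto Scaling API, both with capacity limits and on a schedule; the endpoint name, variant, capacities, and cron expression are hypothetical:

```python
# Minimal sketch: manage provisioned concurrency on a serverless
# endpoint variant with Application Auto Scaling. Names and limits are
# hypothetical placeholders.
import boto3

aas = boto3.client("application-autoscaling")
resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # placeholder

aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredProvisionedConcurrency",
    MinCapacity=1,
    MaxCapacity=10,
)

# Scale on a schedule, e.g. raise concurrency ahead of peak hours.
aas.put_scheduled_action(
    ServiceNamespace="sagemaker",
    ScheduledActionName="scale-up-mornings",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredProvisionedConcurrency",
    Schedule="cron(0 8 * * ? *)",
    ScalableTargetAction={"MinCapacity": 5, "MaxCapacity": 10},
)
```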
This version offers support for new models (including Mixture of Experts), performance and usability improvements across inference backends, as well as new generation details for increased control and prediction explainability (such as the reason for generation completion and token-level log probabilities).
Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes, with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code.
Thus, customers often experiment with larger and newer models to build ML-based products that bring value. Next, we perform auto-regressive token generation, where the output tokens are generated sequentially. This means the process is repeated more times to complete the response, resulting in slower overall processing.
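A minimal sketch of that sequential loop using Hugging Face Transformers with greedy decoding (gpt2 is used purely as a small illustrative model):

```python
# Minimal sketch of auto-regressive token generation: each new token
# requires a full forward pass conditioned on everything generated so
# far, which is why longer outputs take proportionally longer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # small illustrative model
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("Machine learning is", return_tensors="pt").input_ids
for _ in range(32):  # one forward pass per generated token
    logits = model(ids).logits
    next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
    ids = torch.cat([ids, next_id], dim=-1)

print(tok.decode(ids[0]))
```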
Running machine learning (ML) workloads with containers is becoming a common practice. What you get is an ML development environment that is consistent and portable. In this post, we show you how to run your ML training jobs in a container using Amazon ECS to deploy, manage, and scale your ML workload.
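A minimal sketch of kicking off such a containerized training job with boto3; the cluster, task definition, and container names are hypothetical placeholders:

```python
# Minimal sketch: launch a containerized training job on Amazon ECS.
# Cluster, task definition, and container names are placeholders.
import boto3

ecs = boto3.client("ecs")

response = ecs.run_task(
    cluster="ml-training-cluster",      # placeholder
    taskDefinition="ml-train-task:1",   # placeholder
    launchType="EC2",                   # GPU instances live in the cluster
    count=1,
    overrides={
        "containerOverrides": [
            {
                "name": "trainer",  # container name in the task definition
                "command": ["python", "train.py", "--epochs", "10"],
            }
        ]
    },
)
print(response["tasks"][0]["taskArn"])
```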
This allows machine learning (ML) practitioners to rapidly launch an Amazon Elastic Compute Cloud (Amazon EC2) instance with a ready-to-use deep learning environment, without having to spend time manually installing and configuring the required packages. You also need the ML job scripts ready with a command to invoke them.
With SageMaker HyperPod, machine learning (ML) practitioners can train FMs for weeks and months without disruption, and without having to deal with hardware failure issues. Auto-resume and healing capabilities: One of the new features of SageMaker HyperPod is the ability to have your jobs auto-resume.
LMI DLCs are a complete end-to-end solution for hosting LLMs like Falcon-40B. Quotas for SageMaker machine learning (ML) instances can vary between accounts. You can monitor the status of the endpoint by calling DescribeEndpoint, which will tell you when everything is complete.
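A minimal sketch of that status check with boto3, using the built-in waiter that polls DescribeEndpoint; the endpoint name is a hypothetical placeholder:

```python
# Minimal sketch: poll endpoint creation status with boto3. The
# endpoint name is a hypothetical placeholder.
import boto3

sm = boto3.client("sagemaker")

# The built-in waiter calls DescribeEndpoint until status is InService.
waiter = sm.get_waiter("endpoint_in_service")
waiter.wait(EndpointName="falcon-40b-endpoint")

status = sm.describe_endpoint(EndpointName="falcon-40b-endpoint")["EndpointStatus"]
print(status)  # "InService" once deployment is complete
```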
Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and image data for machine learning (ML) from weeks to minutes. You can build an ML model with SageMaker Autopilot representing all your data using the manifest file and use that for your ML inference and production deployment.
The world of artificial intelligence (AI) and machine learning (ML) has been witnessing a paradigm shift with the rise of generative AI models that can create human-like text, images, code, and audio. Compared to classical ML models, generative AI models are significantly bigger and more complex.
Generative AI, AI, and machine learning (ML) are playing a vital role for capital markets firms to speed up revenue generation, deliver new products, mitigate risk, and innovate on behalf of their customers. About SageMaker JumpStart: Amazon SageMaker JumpStart is an ML hub that can help you accelerate your ML journey.
When the job is complete, you can deploy to an endpoint and test the model predictions. After the optimization job is complete, you can deploy the model or run further evaluation jobs on the optimized model. If the config name is lmi-optimized, the configuration is pre-optimized by SageMaker.
Although machine learning (ML) can provide valuable insights, ML experts were needed to build customer churn prediction models until the introduction of Amazon SageMaker Canvas. Additional key topics: Advanced metrics are not the only important tools available to you for evaluating and improving ML model performance.
In this blog post, we explore a comprehensive approach to time series forecasting using the Amazon SageMaker AutoMLV2 Software Development Kit (SDK). In the training phase, CSV data is uploaded to Amazon S3, followed by the creation of an AutoML job, model creation, and checking for job completion.
Compilation or integration to an optimized runtime: ML compilers, such as Amazon SageMaker Neo, apply techniques such as operator fusion, memory planning, graph optimizations, and automatic integration to optimized inference libraries. A complete example that illustrates the no-code option can be found in the following notebook.
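As a rough sketch of that code path (not the notebook itself), a framework model can be compiled with Neo through the SageMaker Python SDK; the S3 paths, role ARN, entry point, and input shape below are hypothetical:

```python
# Minimal sketch: compile a trained PyTorch model with SageMaker Neo.
# S3 paths, role, entry point, and input shape are placeholders.
import sagemaker
from sagemaker.pytorch import PyTorchModel

role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",  # placeholder
    role=role,
    entry_point="inference.py",                # placeholder script
    framework_version="1.13",
    py_version="py39",
    sagemaker_session=sagemaker.Session(),
)

# Neo applies operator fusion, memory planning, and graph optimizations
# for the chosen target instance family.
compiled = model.compile(
    target_instance_family="ml_c5",
    input_shape={"data": [1, 3, 224, 224]},    # placeholder shape
    output_path="s3://my-bucket/compiled/",    # placeholder
    role=role,
    framework="pytorch",
    framework_version="1.13",
    job_name="neo-compile-demo",
)
```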
Just so you know where I am coming from: I have a heavy software development background (15+ years in software). Came to ML from software. Founded two successful software services companies. Founded neptune.ai, a modular MLOps component for ML metadata store, aka “experiment tracker + model registry”.
This article was originally an episode of the MLOps Live , an interactive Q&A session where ML practitioners answer questions from other ML practitioners. Every episode is focused on one specific ML topic, and during this one, we talked to Michal Tadeusiak about managing computer vision projects. Then we are there to help.
Llama 2 stands at the forefront of AI innovation, embodying an advanced auto-regressive language model developed on a sophisticated transformer foundation. The complete example is shown in the accompanying notebook. He holds a master’s degree in Computer Science & Software Engineering from the University of Syracuse.
This article was originally an episode of the MLOps Live , an interactive Q&A session where ML practitioners answer questions from other ML practitioners. Every episode is focused on one specific ML topic, and during this one, we talked to Jason Falks about deploying conversational AI products to production.
This long-awaited capability is a game changer for our customers using the power of AI and machine learning (ML) inference in the cloud. This enhancement builds upon the existing auto scaling capabilities in SageMaker, offering more granular control over resource allocation.
As a result, an initial invocation to a model might see higher inference latency than the subsequent inferences, which are completed with low latency. To take advantage of automated model scaling in SageMaker, make sure you have instance auto scaling set up to provision additional instance capacity.
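A minimal sketch of setting that up with a target-tracking policy on the predefined InvocationsPerInstance metric; resource names, capacities, and the target value are hypothetical:

```python
# Minimal sketch: target-tracking auto scaling on an endpoint variant.
# Names, capacity limits, and the target value are placeholders.
import boto3

aas = boto3.client("application-autoscaling")
resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # placeholder

aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

aas.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,  # invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```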
Getting started with SageMaker JumpStart: SageMaker JumpStart is a machine learning (ML) hub that can help accelerate your ML journey. To deploy Llama 3.3 70B using the SageMaker JumpStart UI, complete the following steps: In SageMaker Unified Studio, on the Build menu, choose JumpStart models.
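Alternatively, as a minimal sketch, the same deployment can be done with the SageMaker Python SDK; the exact Llama 3.3 70B model ID below is an assumption and should be confirmed against the JumpStart catalog:

```python
# Minimal sketch: deploy a JumpStart model programmatically. The
# model_id shown is an assumed placeholder -- verify it in the catalog.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-3-3-70b-instruct")
predictor = model.deploy(accept_eula=True)  # Llama models require EULA acceptance

response = predictor.predict({"inputs": "What is machine learning?"})
print(response)
```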
Because FM outputs could range from a single sentence to multiple paragraphs, the time it takes to complete the inference request varies significantly, leading to unpredictable spikes in latency if the requests are routed randomly between instances. You can scale down to zero copies of a model to free up resources for other models.
Llama 2 is an auto-regressive generative text language model that uses an optimized transformer architecture. Trainium and AWS Inferentia, enabled by the AWS Neuron software development kit (SDK), offer a high-performance and cost-effective option for training and inference of Llama 2 models.