Efficiently managing and coordinating AI inference requests across a fleet of GPUs is critical to ensuring that AI factories operate cost-effectively and maximise token revenue. Dynamo orchestrates and accelerates inference communication across potentially thousands of GPUs.
Imagine this: you have built an AI app with an incredible idea, but it struggles to deliver because running large language models (LLMs) feels like trying to host a concert with a cassette player. The potential is there, but the performance? This is where inference APIs for open LLMs come in.
This is not the sound of an AI boom going bust, but there has been growing unease around how much money is being spent on enabling AI applications. One takeaway is that DeepSeek did not invent a new way to work with AI. After AI models have been trained, things change.
Due to their exceptional content creation capabilities, generative large language models are now at the forefront of the AI revolution, with ongoing efforts to enhance their generative abilities. However, despite rapid advancements, these models require substantial computational power and resources.
Modern AI models excel in text generation, image understanding, and even creating visual content, but speech, the primary medium of human communication, presents unique hurdles. Zhipu AI recently released GLM-4-Voice, an open-source end-to-end speech large language model designed to address these limitations.
Recent advancements in large language models (LLMs) have reshaped the artificial intelligence (AI) landscape, paving the way for the creation of multimodal large language models (MLLMs).
Crafting clean, efficient, and maintainable code is critical for AI engineers, especially when building complex systems. For AI and large language model (LLM) engineers, design patterns help build robust, scalable, and maintainable systems that handle complex workflows efficiently.
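As a hedged illustration of one such pattern, here is a minimal sketch of a strategy interface that lets calling code swap LLM backends without modification; all class and method names are hypothetical, not taken from the article:

from abc import ABC, abstractmethod

class LLMBackend(ABC):
    # Common interface every backend must implement.
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class EchoBackend(LLMBackend):
    # Trivial stand-in backend, useful for local testing.
    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

class App:
    # Calling code depends only on the interface, not a vendor SDK.
    def __init__(self, backend: LLMBackend):
        self.backend = backend

    def answer(self, question: str) -> str:
        return self.backend.generate(question)

print(App(EchoBackend()).answer("What is a design pattern?"))

Swapping in a production backend then means implementing one class, leaving the rest of the system untouched.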
Organizations require models that are adaptable, secure, and capable of understanding domain-specific contexts while maintaining compliance and privacy standards. Traditional AI models often struggle to deliver such tailored performance, forcing businesses to trade off customization against general applicability.
However, recent advancements in generative AI have opened up new possibilities for creating an infinite game experience. The proposed system maintains character and environment consistency through DreamBooth and a novel regional IP-Adapter that separates character and environment conditioning.
Run:ai recently announced an open-source solution to tackle this very problem: the Run:ai Model Streamer. This tool aims to drastically cut the time it takes to load models for inference, helping the AI community overcome one of its most notorious technical hurdles.
However, scaling AI across an organization takes work. It involves complex tasks like integrating AI models into existing systems, ensuring scalability and performance, preserving data security and privacy, and managing the entire lifecycle of AI models.
NVIDIA AI Foundry is a service that enables enterprises to use data, accelerated computing and software tools to create and deploy custom models that can supercharge their generative AI initiatives. The key difference from a chip foundry such as TSMC is the product: TSMC produces physical semiconductor chips, while NVIDIA AI Foundry helps create custom models.
Imagine working with an AI model that runs smoothly on one processor but struggles on another because of differences between hardware platforms. For developers and researchers, this means navigating complex problems to ensure their AI solutions are efficient and scalable on all types of hardware.
Teams from the companies worked closely together to accelerate the performance of Gemma (built from the same research and technology used to create Google DeepMind's most capable model yet, Gemini) with NVIDIA TensorRT-LLM, an open-source library for optimizing large language model inference, when running on NVIDIA GPUs.
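As a rough sketch only, assuming TensorRT-LLM's high-level Python LLM API and a Hugging Face Gemma checkpoint (both assumptions, not details from the excerpt), usage could look like:

# Assumes TensorRT-LLM's high-level LLM API; the model id is illustrative.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="google/gemma-2b")  # builds an optimized engine under the hood
for out in llm.generate(["Why optimize inference?"], SamplingParams(max_tokens=64)):
    print(out.outputs[0].text)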
Deploying these powerful models into production environments is neither easy nor time-efficient. As we approach 2025, companies have to move beyond calling APIs over pretrained large language models and think seriously about deploying full-scale models into production environments.
Traditionally, speech pipelines built around large language models (LLMs) convert speech to text using automatic speech recognition (ASR), process the text with an LLM, and then convert the output back to speech via text-to-speech (TTS).
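A minimal sketch of that traditional cascade, with the three stages as hypothetical placeholder functions rather than any specific library's API:

def asr(audio: bytes) -> str:
    # Speech-to-text step (e.g., a Whisper-style recognizer).
    raise NotImplementedError

def llm_respond(text: str) -> str:
    # Text reasoning step handled by the LLM.
    raise NotImplementedError

def tts(text: str) -> bytes:
    # Text-to-speech synthesis step.
    raise NotImplementedError

def voice_turn(audio_in: bytes) -> bytes:
    # Each hop adds latency and drops prosody, the limitation that
    # end-to-end speech LLMs aim to remove.
    return tts(llm_respond(asr(audio_in)))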
This is the kind of horsepower needed to handle AI-assisted digital content creation, AI super resolution in PC gaming, generating images from text or video, querying local large language models (LLMs) and more. LLM performance is commonly measured in tokens generated per second. Source: Jan.ai
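A minimal sketch of that throughput metric, with illustrative numbers that are not benchmark results:

def tokens_per_second(n_tokens: int, seconds: float) -> float:
    return n_tokens / seconds

# e.g., 512 tokens generated in 4.0 s -> 128.0 tok/s
print(tokens_per_second(512, 4.0))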
Code generation AI models (Code GenAI) are becoming pivotal in automated software development, demonstrating capabilities in writing, debugging, and reasoning about code. However, these models may inadvertently introduce insecure code that could be exploited in cyberattacks.
Large language models (LLMs) sometimes learn things we don't want them to know. It is important to find ways to remove or adjust this knowledge to keep AI accurate, precise, and under control. The approach discussed here aims to make such edits more robust and reduce unintended side effects.
As large language models (LLMs) grow more capable by the day, their safety has become a critical research topic. To create a safe model, model providers usually pre-define a policy or a set of rules, but in many cases a standard one-size-fits-all safe model is too restrictive to be helpful.
Conversational AI for Indian Railway customers: Bengaluru-based startup CoRover.ai already has over a billion users of its LLM-based conversational AI platform, which includes text-, audio- and video-based agents. NVIDIA AI technology enables us to deliver enterprise-grade virtual assistants that support 1.3
Jina AI announced the release of its latest product, g.jina.ai, designed to tackle the growing problem of misinformation and hallucination in generative AI models. This innovative tool is part of a larger suite of applications aimed at improving factual accuracy and grounding in AI-generated and human-written content.
Gemma is a family of lightweight, state-of-the-art open models built from the research and technology used to create Google Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants.
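A minimal sketch of loading an instruction-tuned variant with Hugging Face Transformers; the model id and generation settings are illustrative assumptions, not details from the excerpt:

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2b-it")        # assumed model id
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
inputs = tok("Explain decoder-only models in one line.", return_tensors="pt")
print(tok.decode(model.generate(**inputs, max_new_tokens=40)[0]))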
Large Action Models (LAMs) are AI systems designed to take action hierarchically, breaking tasks down into smaller subtasks. Unlike large language models, a Large Action Model combines language understanding with logic and reasoning to execute various tasks.
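One way to picture that hierarchy is a simple task tree; this is an illustrative sketch, not a standard LAM data structure:

from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    subtasks: list["Task"] = field(default_factory=list)

    def execute(self) -> None:
        if not self.subtasks:        # leaf task: perform the action itself
            print(f"executing: {self.name}")
        for sub in self.subtasks:    # composite task: recurse into children
            sub.execute()

trip = Task("book trip", [Task("find flights"), Task("reserve hotel")])
trip.execute()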
Today at AWS re:Invent 2024, we are excited to announce the new Container Caching capability in Amazon SageMaker, which significantly reduces the time required to scale generative AI models for inference. A 70B model showed significant and consistent improvements in end-to-end (E2E) scaling times.
The use of large language models (LLMs) and generative AI has exploded over the last year. With the release of powerful publicly available foundation models, tools for training, fine-tuning, and hosting your own LLM have also become democratized. The excerpt's code fragment, completed under the assumption that it uses vLLM's LLM class:

# Assumption: the truncated snippet is vLLM's high-level LLM class.
from vllm import LLM
llm = LLM(model="meta-llama/Llama-3.2-1B")
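Continuing the vLLM assumption above, a hedged usage sketch:

from vllm import SamplingParams

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)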
Current generative AI models face challenges related to robustness, accuracy, efficiency, cost, and handling nuanced human-like responses. There is a need for more scalable and efficient solutions that can deliver precise outputs while being practical for diverse AI applications. Developed as part of the Llama 3.1
The rapid growth of large language models (LLMs) has brought significant advancements across various sectors, but it has also presented considerable challenges. With this work, Meta AI aims to enhance the performance of large models while reducing the computational resources needed for deployment.
In this post, we demonstrate how you can use generative AI models like Stable Diffusion to build a personalized avatar solution on Amazon SageMaker while saving inference cost with multi-model endpoints (MMEs). The walkthrough uses a DJL inference container image (amazonaws.com/djl-inference:0.21.0-deepspeed0.8.3-cu117).
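A minimal sketch of the cost-saving mechanism, invoking one of many models hosted behind a single MME with boto3; the endpoint and artifact names are hypothetical placeholders:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="avatar-mme-endpoint",        # hypothetical endpoint name
    TargetModel="user-123-dreambooth.tar.gz",  # per-user artifact loaded on demand
    ContentType="application/json",
    Body=json.dumps({"prompt": "portrait avatar, studio lighting"}),
)
print(response["Body"].read())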
NVIDIA NIM, part of the NVIDIA AI Enterprise software platform available in the AWS Marketplace, provides developers with a set of easy-to-use microservices designed for secure, reliable deployment of high-performance, enterprise-grade AI model inference across clouds, data centers and workstations.
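NIM microservices generally expose an OpenAI-compatible HTTP API, so a deployed endpoint can be called with the standard OpenAI client; the URL and model id below are illustrative assumptions:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
resp = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # example NIM model id (assumption)
    messages=[{"role": "user", "content": "Summarize NIM in one sentence."}],
)
print(resp.choices[0].message.content)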