One of these strategies is using Amazon Simple Storage Service (Amazon S3) folder structures and Amazon Bedrock Knowledge Bases metadata filtering to enable efficient data segmentation within a single knowledge base. The S3 bucket, containing customer data and metadata, is configured as a knowledge base data source.
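As a rough illustration (not taken from the original post), metadata filtering against a Bedrock knowledge base can be expressed at retrieval time with boto3. In this sketch the knowledge base ID and the customer_id metadata key are placeholders standing in for whatever attributes accompany each S3 object:

```python
import boto3

# Minimal sketch: query a Bedrock knowledge base while filtering results by a
# metadata attribute (a hypothetical "customer_id" written alongside each S3
# object). IDs and key names are illustrative placeholders.
bedrock_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_runtime.retrieve(
    knowledgeBaseId="KB_ID_PLACEHOLDER",
    retrievalQuery={"text": "What is the renewal date for this contract?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            # Only return chunks whose metadata matches the requesting customer,
            # giving per-customer segmentation within a single knowledge base.
            "filter": {"equals": {"key": "customer_id", "value": "customer-123"}},
        }
    },
)

for result in response["retrievalResults"]:
    print(result["content"]["text"][:200])
```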
Companies still often accept the risk of using internal data when exploring large language models (LLMs) because this contextual data is what enables LLMs to change from general-purpose to domain-specific knowledge. In the generative AI or traditional AI development cycle, data ingestion serves as the entry point.
Mid-market Account Manager: Amazon Q, Amazon Bedrock, and other AWS services underpin this experience, enabling us to use large language models (LLMs) and knowledge bases (KBs) to generate relevant, data-driven content for APs. It's a game-changer for serving my full portfolio of accounts.
In this post, we show you an example of a generative AI assistant application and demonstrate how to assess its security posture using the OWASP Top 10 for Large Language Model Applications, as well as how to apply mitigations for common threats. Alternatively, you can choose to use a customer managed key.
Large language models (LLMs) like OpenAI's GPT series have been trained on a diverse range of publicly accessible data, demonstrating remarkable capabilities in text generation, summarization, question answering, and planning. Among the indexes, VectorStoreIndex is often the go-to choice.
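For context, a minimal LlamaIndex sketch of building and querying a VectorStoreIndex might look like the following; the "data/" directory is a placeholder, and default embedding/LLM settings (for example, OpenAI credentials) are assumed to be configured:

```python
# Minimal sketch using LlamaIndex's VectorStoreIndex (llama-index >= 0.10 layout).
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # load local files
index = VectorStoreIndex.from_documents(documents)     # embed and index them

query_engine = index.as_query_engine()
print(query_engine.query("Summarize the key points of these documents."))
```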
To start simply, you could think of LLMOps (Large Language Model Operations) as a way to make machine learning work better in the real world over a long period of time. As previously mentioned: model training is only part of what machine learning teams deal with. What is LLMOps? Why are these elements so important?
Retrieval Augmented Generation (RAG) has emerged as a leading method for using the power of large language models (LLMs) to interact with documents in natural language. The first step is data ingestion, as shown in the following diagram. This structure can be used to optimize data ingestion.
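To make the ingestion step concrete, here is a self-contained sketch of chunking and embedding a document before it is written to a vector store. The embed function is a stand-in; a real pipeline would call an embedding model (for example, Amazon Titan Embeddings or SentenceTransformers) instead:

```python
import hashlib
from typing import List

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split a document into overlapping character chunks for ingestion."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

def embed(chunk: str) -> List[float]:
    """Placeholder embedding so the sketch runs; replace with a real model call."""
    digest = hashlib.sha256(chunk.encode()).digest()
    return [b / 255 for b in digest[:8]]

# Ingest one document: chunk it, embed each chunk, and collect the records a
# vector store would persist alongside source metadata.
document = "Retrieval Augmented Generation grounds LLM answers in your documents. " * 20
records = [
    {"id": i, "text": chunk, "vector": embed(chunk), "source": "doc-001"}
    for i, chunk in enumerate(chunk_text(document))
]
print(f"Prepared {len(records)} chunks for indexing")
```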
You follow the same process of data ingestion, training, and creating a batch inference job as in the previous use case. Announcing LangChain integration to seamlessly integrate Amazon Personalize with the LangChain framework: LangChain is a powerful open-source framework that allows for integration with large language models (LLMs).
This post highlights how Twilio enabled natural language-driven data exploration of business intelligence (BI) data with RAG and Amazon Bedrock. Twilio’s use case: Twilio wanted to provide an AI assistant to help their data analysts find data in their data lake.
Large language models (LLMs) are revolutionizing fields like search engines, natural language processing (NLP), healthcare, robotics, and code generation. A feature store maintains user profile data. A media metadata store keeps the promotion movie list up to date.
Core features of end-to-end MLOps platforms: End-to-end MLOps platforms combine a wide range of essential capabilities and tools, which should include: Data management and preprocessing: Provide capabilities for data ingestion, storage, and preprocessing, allowing you to efficiently manage and prepare data for training and evaluation.
The teams built a new data ingestion mechanism, allowing the CTR files to be jointly delivered with the audio file to an S3 bucket. In the future, Principal plans to continue expanding postprocessing capabilities with additional data aggregation, analytics, and natural language generation (NLG) models for text summarization.
OpenSearch Dashboards provides powerful search and analytical capabilities, allowing teams to dive deeper into generative AI model behavior, user interactions, and system-wide metrics. This workflow includes the following steps: Data can be securely transferred to AWS using either custom or existing tools or the AWS Transfer Family.
As one of the most rapidly developing fields in AI, the capabilities for and applications of Large Language Models (LLMs) are changing and growing continuously. It can be hard to keep on top of all the advancements. At ODSC West this October 29th-31st, you’ll find a wide range of workshops, tutorials, and talks on LLMs and RAG.
This talk will explore a new capability that transforms diverse clinical data (EHR, FHIR, notes, and PDFs) into a unified patient timeline, enabling natural language question answering.
Topics Include: Agentic AI Design Patterns, LLMs & RAG for Agents, Agent Architectures & Chaining, Evaluating AI Agent Performance, Building with LangChain and LlamaIndex, and Real-World Applications of Autonomous Agents. Who Should Attend: Data Scientists, Developers, AI Architects, and ML Engineers seeking to build cutting-edge autonomous systems.
Amazon SageMaker Canvas is a no-code machine learning (ML) service that empowers business analysts and domain experts to build, train, and deploy ML models without writing a single line of code. As the data scientist, complete the following steps: In the Environments section of the Banking-Consumer-ML project, choose SageMaker Studio.
TL;DR LLMOps involves managing the entire lifecycle of Large Language Models (LLMs), including data and prompt management, model fine-tuning and evaluation, pipeline orchestration, and LLM deployment. What is Large Language Model Operations (LLMOps)? What the future of LLMOps looks like.
During my talk at NeurIPS, I broke down five key lessons learned from teams facing large-scale model training and monitoring. Real-time monitoring prevents costly failures. Imagine this: you’re training a large language model on thousands of GPUs at a cost of hundreds of thousands of dollars per day.
In order to train transformer models on internet-scale data, huge quantities of PBAs were needed. In November 2022, ChatGPT was released, a large language model (LLM) that used the transformer architecture, and is widely credited with starting the current generative AI boom.
One of the most common applications of generative artificial intelligence (AI) and large language models (LLMs) in an enterprise environment is answering questions based on the enterprise’s knowledge corpus. In response, Amazon Q Business provides an appropriate Athena query to run.
To make that possible, your data scientists would need to store enough details about the environment the model was created in and the related metadata so that the model could be recreated with the same or similar outcomes. Your ML platform must have built-in versioning because code and data mostly make up the ML system.
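As a small, illustrative example of the kind of run metadata worth versioning alongside a model (file paths and fields here are hypothetical, not prescribed by the original post):

```python
import hashlib
import json
import platform
import subprocess
import sys

def file_sha256(path: str) -> str:
    """Fingerprint a training data file so the exact inputs can be recovered later."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Capture code version, environment, data fingerprint, and hyperparameters for a run.
run_record = {
    "git_commit": subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip(),
    "python_version": sys.version.split()[0],
    "platform": platform.platform(),
    "training_data_sha256": file_sha256("data/train.csv"),  # placeholder path
    "hyperparameters": {"learning_rate": 1e-3, "epochs": 10},
}

with open("run_metadata.json", "w") as f:
    json.dump(run_record, f, indent=2)
```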
Although these models can provide precise estimates based on historical data, they can’t be generalized to provide a quick range of estimates, accommodate changes to the damage dataset (which includes updated makes and models), or handle varying repair estimates based on parts, labor, and facility. This technique is called semantic search.
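A toy illustration of semantic search: rank candidate repair descriptions by cosine similarity to a query in embedding space. The vectors below are made up for the example; a real system would obtain them from an embedding model:

```python
import numpy as np

# Hypothetical embeddings of repair descriptions (made-up 3-dimensional vectors).
corpus = {
    "front bumper replacement, 2019 sedan": np.array([0.9, 0.1, 0.3]),
    "windshield crack repair": np.array([0.1, 0.8, 0.2]),
    "rear door dent, panel and paint": np.array([0.7, 0.2, 0.6]),
}
query_vec = np.array([0.85, 0.15, 0.35])  # embedding of "bumper damage estimate"

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank corpus entries by similarity to the query.
for text, vec in sorted(corpus.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True):
    print(f"{cosine(query_vec, vec):.3f}  {text}")
```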
AWS customers use Amazon Kendra with large language models (LLMs) to quickly create secure, generative AI-powered conversational experiences on top of their enterprise content. This requires implementing information extraction models, optimizing text processing, and balancing sparse and dense retrieval methods.
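For reference, a minimal sketch of pulling passages from Amazon Kendra with its Retrieve API, which can then be passed to an LLM as context; the index ID and query are placeholders:

```python
import boto3

# Retrieve relevant passages from an Amazon Kendra index for use as LLM context.
kendra = boto3.client("kendra")

response = kendra.retrieve(
    IndexId="KENDRA_INDEX_ID_PLACEHOLDER",
    QueryText="What is our parental leave policy?",
    PageSize=5,
)

for item in response["ResultItems"]:
    print(item["DocumentTitle"], "-", item["Content"][:150])
```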
This post dives deep into Amazon Bedrock Knowledge Bases, which helps with the storage and retrieval of data in vector databases for RAG-based workflows, with the objective to improve large language model (LLM) responses for inference involving an organization’s datasets.
Next, you need to index this data to make it available for a Retrieval Augmented Generation (RAG) approach, where relevant passages are delivered with high accuracy to a large language model (LLM). When the data source state is Active, choose Sync now.
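The same "Sync now" step can also be triggered programmatically. A brief sketch using the Bedrock Agent API, with placeholder knowledge base and data source IDs:

```python
import boto3

# Start an ingestion (sync) job for a Bedrock knowledge base data source.
bedrock_agent = boto3.client("bedrock-agent")

job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KB_ID_PLACEHOLDER",
    dataSourceId="DS_ID_PLACEHOLDER",
)
print("Ingestion job status:", job["ingestionJob"]["status"])
```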