Amazon Q Business, a new generative AI-powered assistant, can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in an enterprise's systems. Large-scale data ingestion is crucial for applications such as document analysis, summarization, research, and knowledge management.
Deltek is continuously working on enhancing this solution to better align it with their specific requirements, such as supporting file formats beyond PDF and implementing more cost-effective approaches for their data ingestion pipeline. The first step is data ingestion, as shown in the following diagram. What is RAG?
By default, Amazon Bedrock encrypts all knowledge base-related data using an AWS managed key. When setting up a data ingestion job for your knowledge base, you can instead encrypt the job using a customer managed AWS Key Management Service (AWS KMS) key.
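A minimal sketch of that setup with boto3, assuming a standard S3 data source; the knowledge base ID, bucket, and ARNs below are placeholders, not values from the post:

```python
import boto3

# Configure a Bedrock knowledge base data source so transient ingestion
# data is encrypted with a customer managed KMS key, then start a job.
# All IDs and ARNs are placeholders.
bedrock_agent = boto3.client("bedrock-agent")

data_source = bedrock_agent.create_data_source(
    knowledgeBaseId="MY_KB_ID",
    name="s3-docs",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::my-doc-bucket"},
    },
    serverSideEncryptionConfiguration={
        "kmsKeyArn": "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE"
    },
)

job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="MY_KB_ID",
    dataSourceId=data_source["dataSource"]["dataSourceId"],
)
print(job["ingestionJob"]["status"])
```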
Data preparation isn’t just a part of the ML engineering process — it’s the heart of it. To set the stage, let’s examine the nuances between research-phase data and production-phase data. Data is a key differentiator in ML projects (more on this in my blog post below).
In BI systems, data warehousing first converts disparate raw data into clean, organized, and integrated data, which is then used to extract actionable insights to facilitate analysis, reporting, and data-informed decision-making. The pipeline ensures correct, complete, and consistent data.
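As an illustration of that clean-and-integrate step (the source files and column names here are hypothetical, not from the post):

```python
import pandas as pd

# Two disparate raw sources feeding a warehouse pipeline.
crm = pd.read_csv("crm_customers.csv")
billing = pd.read_csv("billing_customers.csv")

# Standardize the join key and deduplicate each source.
for df in (crm, billing):
    df["customer_id"] = df["customer_id"].astype(str).str.strip()

# Integrate the cleaned sources into one consistent table.
clean = (
    crm.drop_duplicates("customer_id")
       .merge(billing.drop_duplicates("customer_id"),
              on="customer_id", how="inner")
)

# Basic completeness check before loading downstream.
assert clean["customer_id"].notna().all()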
In this post, we discuss how the IEO developed UNDP’s artificial intelligence and machine learning (ML) platform—named Artificial Intelligence for Development Analytics (AIDA)— in collaboration with AWS, UNDP’s Information and Technology Management Team (UNDP ITM), and the United Nations International Computing Centre (UNICC).
Amazon Personalize is a fully managed machine learning (ML) service that makes it easy for developers to deliver personalized experiences to their users. You follow the same process of data ingestion, training, and creating a batch inference job as in the previous use case. You can also use this for sequential chains.
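A hedged sketch of the batch inference step with boto3; the solution version ARN, IAM role, and S3 paths are placeholders:

```python
import boto3

personalize = boto3.client("personalize")

# Create a batch inference job that reads users from S3 and writes
# recommendations back to S3. All ARNs and paths are placeholders.
personalize.create_batch_inference_job(
    jobName="user-recs-batch",
    solutionVersionArn="arn:aws:personalize:us-east-1:123456789012:solution/demo/abc123",
    roleArn="arn:aws:iam::123456789012:role/PersonalizeS3Role",
    jobInput={"s3DataSource": {"path": "s3://my-bucket/batch/input/users.json"}},
    jobOutput={"s3DataDestination": {"path": "s3://my-bucket/batch/output/"}},
)
```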
Amazon Kendra also supports the use of metadata for each source file, which enables both UIs to provide a link to its sources, whether it is the Spack documentation website or a CloudFront link. Furthermore, Amazon Kendra supports relevance tuning, enabling you to boost certain data sources.
As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications. For information about model pricing, refer to Amazon Bedrock pricing.
With Knowledge Bases for Amazon Bedrock, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for fully managed Retrieval Augmented Generation (RAG). You can now interact with your documents in real time without prior data ingestion or database configuration.
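A sketch of that ingestion-free flow, assuming the external-sources mode of the RetrieveAndGenerate API; the model ARN and S3 URI are placeholders:

```python
import boto3

runtime = boto3.client("bedrock-agent-runtime")

# Point the FM directly at a document in S3; no knowledge base
# ingestion or vector database setup is involved.
response = runtime.retrieve_and_generate(
    input={"text": "Summarize the key obligations in this contract."},
    retrieveAndGenerateConfiguration={
        "type": "EXTERNAL_SOURCES",
        "externalSourcesConfiguration": {
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
            "sources": [
                {"sourceType": "S3",
                 "s3Location": {"uri": "s3://my-doc-bucket/contract.pdf"}}
            ],
        },
    },
)
print(response["output"]["text"])
```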
Alignment to other tools in the organization’s tech stack: consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, and data structures such as Pandas or Apache Spark DataFrames.
One such component is a feature store, a tool that stores, shares, and manages features for machine learning (ML) models. Features are the inputs used during training and inference of ML models. Amazon SageMaker Feature Store is a fully managed repository designed specifically for storing, sharing, and managing ML model features.
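A hedged sketch of the store in action with the SageMaker Python SDK; the feature group name, schema, and S3/IAM settings are illustrative, not from the original post:

```python
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

# Features shared between training and inference; Feature Store
# requires an event-time feature alongside the record identifier.
df = pd.DataFrame({
    "customer_id": pd.Series(["c1", "c2"], dtype="string"),
    "lifetime_value": [1250.0, 87.5],
    "event_time": [1700000000.0, 1700000000.0],
})

fg = FeatureGroup(name="customers-fg", sagemaker_session=sagemaker.Session())
fg.load_feature_definitions(data_frame=df)  # infer feature types from the frame

# Provisioning and ingestion (needs real AWS resources, shown for shape only):
# fg.create(s3_uri="s3://my-bucket/offline-store",
#           record_identifier_name="customer_id",
#           event_time_feature_name="event_time",
#           role_arn="arn:aws:iam::123456789012:role/SageMakerRole")
# fg.ingest(data_frame=df, max_workers=2, wait=True)
```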
In the ever-evolving landscape of machine learning, feature management has emerged as a key pain point for ML Engineers at Airbnb. Airbnb recognized the need for a solution that could streamline feature data management, provide real-time updates, and ensure consistency between training and production environments.
Amazon SageMaker Serverless Inference is a purpose-built inference service that makes it easy to deploy and scale machine learning (ML) models. The dataset is a collection of 147,702 product listings with multilingual metadata and 398,212 unique catalogue images. We use the first metadata file in this demo.
We downloaded the data from AWS Data Exchange and processed it in AWS Glue to generate KG files. In Part 2, we demonstrated how to use Amazon Neptune ML (in Amazon SageMaker) to train the KG and create KG embeddings. The KG files were stored in Amazon Simple Storage Service (Amazon S3) and then loaded in Amazon Neptune.
In this post, we discuss a machine learning (ML) solution for complex image searches using Amazon Kendra and Amazon Rekognition. Amazon Kendra is an intelligent search service powered by ML, and Amazon Rekognition is an ML service that can identify objects, people, text, scenes, and activities from images or videos.
This approach, when applied to generative AI solutions, means that a specific AI or machine learning (ML) platform configuration can be used to holistically address the operational excellence challenges across the enterprise, allowing the developers of the generative AI solution to focus on business value.
Combining accurate transcripts with Genesys CTR files, Principal could properly identify the speakers, categorize the calls into groups, analyze agent performance, identify upsell opportunities, and conduct additional machine learning (ML)-powered analytics.
Simple methods for time series forecasting use historical values of the same variable whose future values need to be predicted, whereas more complex, machine learning (ML)-based methods use additional information, such as the time series data of related variables. A dataset must conform to the schema defined within Forecast.
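A minimal sketch of defining such a schema when creating a Forecast target time series dataset with boto3; the dataset name, domain, and frequency are placeholders:

```python
import boto3

forecast = boto3.client("forecast")

# The dataset's schema must list each column and its type in order.
forecast.create_dataset(
    DatasetName="demand_ts",
    Domain="CUSTOM",
    DatasetType="TARGET_TIME_SERIES",
    DataFrequency="D",  # daily observations
    Schema={
        "Attributes": [
            {"AttributeName": "timestamp", "AttributeType": "timestamp"},
            {"AttributeName": "target_value", "AttributeType": "float"},
            {"AttributeName": "item_id", "AttributeType": "string"},
        ]
    },
)
```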
Learn about the flow, difficulties, and tools for performing ML clustering at scale. Ori Nakar | Principal Engineer, Threat Research | Imperva. Given that there are billions of daily botnet attacks from millions of different IPs, the most difficult challenge of botnet detection is choosing the most relevant data.
Amazon DataZone makes it straightforward for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization so they can discover, use, and collaborate to derive data-driven insights.
When machine learning (ML) models are deployed into production and employed to drive business decisions, the challenge often lies in the operation and management of multiple models. That is where Provectus , an AWS Premier Consulting Partner with competencies in Machine Learning, Data & Analytics, and DevOps, stepped in.
AWS provides various services catered to time series data that are low code/no code, which both machine learning (ML) and non-ML practitioners can use for building ML solutions. We recommend running this notebook on Amazon SageMaker Studio, a web-based, integrated development environment (IDE) for ML.
Luckily, we have tried and trusted tools and architectural patterns that provide a blueprint for reliable ML systems. In this article, I’ll introduce you to a unified architecture for ML systems built around the idea of FTI pipelines and a feature store as the central component. But what is an ML pipeline?
It provides the ability to extract structured data, metadata, and other information from documents ingested from SharePoint to provide relevant search results based on the user query. For more information, see Encryption of transient data storage during data ingestion. Choose Next. Abhi Patlolla is a Sr.
In this post, we discuss an architecture to query structured data using Amazon Q Business, and build out an application to query cost and usage data in Amazon Athena with Amazon Q Business. You can extend this architecture to use additional data sources, query validation, and prompting techniques to cover a wider range of use cases.
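As a sketch of the Athena half of that architecture (the database, table, and results bucket below are placeholders, not the post's actual resources):

```python
import boto3

athena = boto3.client("athena")

# Run a cost-and-usage aggregation; results land in the given S3 bucket.
qid = athena.start_query_execution(
    QueryString="SELECT service, SUM(unblended_cost) AS total "
                "FROM cur GROUP BY service",
    QueryExecutionContext={"Database": "cost_usage_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)["QueryExecutionId"]

# Poll for completion before fetching results.
state = athena.get_query_execution(
    QueryExecutionId=qid
)["QueryExecution"]["Status"]["State"]
```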
Topics Include: Agentic AI Design Patterns, LLMs & RAG for Agents, Agent Architectures & Chaining, Evaluating AI Agent Performance, Building with LangChain and LlamaIndex, Real-World Applications of Autonomous Agents. Who Should Attend: Data Scientists, Developers, AI Architects, and ML Engineers seeking to build cutting-edge autonomous systems.
From gathering and processing data to building models through experiments, deploying the best ones, and managing them at scale for continuous value in production—it’s a lot. As the number of ML-powered apps and services grows, it gets overwhelming for data scientists and ML engineers to build and deploy models at scale.
These models offer tremendous potential but also bring a unique set of challenges when it comes to building large-scale ML projects. Naturally, training a machine learning model (regardless of the problem being solved or the particular model architecture that was chosen) is a key part of every ML project. But what happens next?
Large Model Quality and Evaluation. Anoop Sinha | Research Director, AI & Future Technologies | Google. Large model development faces many challenges when it comes to ML quality and evaluation, including coverage, scale, and the wide range of use cases LLMs serve. Check out a few of them below.
Arranging Efficient Data Streams: Modern companies typically receive data from multiple sources, so quick data ingestion for instant use can be challenging. Machine Learning Integration Opportunities: Organizations harness machine learning (ML) algorithms to make forecasts on the data.
These activities cover disparate fields such as basic data processing, analytics, and machine learning (ML). ML is often associated with PBAs, so we start this post with an illustrative figure. The ML paradigm is learning followed by inference. The union of advances in hardware and ML has led us to the current day.
The solution lies in systems that can handle high-throughput data ingestion while providing accurate, real-time insights. One such solution is adopting a single source of truth for all experiment metadata, encompassing everything from input data and training metrics to checkpoints and outputs. Tools like neptune.ai
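A hedged sketch of logging all experiment metadata to one tracking store with the neptune client; the project name, fields, and file paths are examples, not values from the post:

```python
import neptune  # assumes NEPTUNE_API_TOKEN is set in the environment

run = neptune.init_run(project="my-team/experiments")  # placeholder project

run["parameters"] = {"lr": 3e-4, "batch_size": 128}            # hyperparameters
run["data/train_version"] = "s3://my-bucket/train-v3.parquet"  # input data pointer
for loss in [0.91, 0.62, 0.44]:
    run["train/loss"].append(loss)                             # training metrics
run["checkpoints/best"].upload("model.pt")                     # output artifact
run.stop()
```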
Ensure that everyone handling data understands its importance and the role it plays in maintaining data quality. Data Documentation Comprehensive data documentation is essential. Create data dictionaries and metadata repositories to help users understand the data’s structure and context.
You might need to extract the weather data and the metadata about the location, after which you will combine both for transformation. In the image, you can see that extracting the weather data and extracting the location metadata need to run in parallel. This type of execution is shown below.
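One way (among many) to express that parallel-extract, then-combine pattern; the two fetch functions are stand-ins for the real extract logic:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21.0}               # placeholder fetch

def extract_location_metadata(city: str) -> dict:
    return {"city": city, "lat": 52.52, "lon": 13.40}   # placeholder fetch

# Run both extract steps concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    weather_future = pool.submit(extract_weather, "Berlin")
    metadata_future = pool.submit(extract_location_metadata, "Berlin")

# The transformation combines both results once both extracts finish.
combined = {**weather_future.result(), **metadata_future.result()}
print(combined)
```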
One of the most prevalent complaints we hear from ML engineers in the community is how costly and error-prone it is to manually go through the ML workflow of building and deploying models. Building end-to-end machine learning pipelines lets ML engineers build once, rerun, and reuse many times. If all goes well, of course.
The traditional way to solve these problems is to use computer vision machine learning (ML) models to classify the damage and its severity and complement with regression models that predict numerical outcomes based on input features like the make and model of the car, damage severity, damaged part, and more.
Role of metadata while indexing data in vector databases Metadata plays a crucial role when loading documents into a vector data store in Amazon Bedrock. These identifiers can be used to uniquely reference and retrieve specific documents from the vector data store.
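A sketch of the sidecar-file convention Bedrock knowledge bases use for S3 data sources: for a document `report.pdf`, a file named `report.pdf.metadata.json` carries its attributes. The attribute names here are examples only:

```python
import json

# Metadata indexed alongside the document in the vector store; doc_id
# serves as the unique identifier for referencing and retrieval.
metadata = {
    "metadataAttributes": {
        "doc_id": "report-2024-001",
        "department": "finance",
        "year": 2024,
    }
}

with open("report.pdf.metadata.json", "w") as f:
    json.dump(metadata, f)
```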
A document is a collection of information that consists of a title, the content (or the body), metadata (data about the document), and access control list (ACL) information to make sure answers are provided from documents that the user has access to. When the data source state is Active, choose Sync now.