Amazon Q Business, a new generative AI-powered assistant, can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in an enterprise's systems. Large-scale data ingestion is crucial for applications such as document analysis, summarization, research, and knowledge management.
This comprehensive security setup addresses LLM10:2025 Unbounded Consumption and LLM02:2025 Sensitive Information Disclosure, making sure that applications remain both resilient and secure. In the physical architecture diagram, the application controller is the LLM orchestrator AWS Lambda function.
“If you think about building a data pipeline, whether you’re doing a simple BI project or a complex AI or machine learning project, you’ve got data ingestion, data storage and processing, and data insight – and underneath all of those four stages, there’s a variety of different technologies being used,” explains Faruqui.
Amazon SageMaker Feature Store provides an end-to-end solution to automate feature engineering for machine learning (ML). For many ML use cases, raw data like log files, sensor readings, or transaction records needs to be transformed into meaningful features that are optimized for model training. SageMaker Studio setup.
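To make the feature-engineering step concrete, here is a minimal sketch of turning raw transaction records into per-customer aggregate features. It is purely illustrative plain Python, not the SageMaker Feature Store API; the record shape and feature names are assumptions for the example.

```python
# Illustrative sketch (not the SageMaker Feature Store API): aggregate raw
# transaction records into per-customer features suitable for model training.
from collections import defaultdict
from statistics import mean

def build_customer_features(transactions):
    """Group raw transaction records by customer and compute aggregates."""
    by_customer = defaultdict(list)
    for tx in transactions:
        by_customer[tx["customer_id"]].append(tx["amount"])
    return {
        cid: {
            "txn_count": len(amounts),
            "avg_amount": round(mean(amounts), 2),
            "max_amount": max(amounts),
        }
        for cid, amounts in by_customer.items()
    }

raw = [
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c1", "amount": 30.0},
    {"customer_id": "c2", "amount": 5.0},
]
features = build_customer_features(raw)
print(features["c1"])  # {'txn_count': 2, 'avg_amount': 20.0, 'max_amount': 30.0}
```

In a managed feature store, the resulting feature records would then be written to a feature group so they can be shared across training and inference.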
Amazon Bedrock Knowledge Bases gives foundation models (FMs) and agents contextual information from your company’s private data sources for Retrieval Augmented Generation (RAG) to deliver more relevant, accurate, and customized responses. The outbound message handler retrieves the relevant chat contact information from Amazon DynamoDB.
That's why we use advanced technology and data analytics to streamline every step of the homeownership experience, from application to closing. Data exploration and model development were conducted using well-known machine learning (ML) tools such as Jupyter or Apache Zeppelin notebooks.
Quantum provides end-to-end data solutions that help organizations manage, enrich, and protect unstructured data, such as video and audio files, at scale. Their technology focuses on transforming data into valuable insights, enabling businesses to extract value and make informed decisions.
This multi-interface, RAG-powered approach not only strives to meet the flexibility demands of modern users, but also fosters a more informed and engaged user base, ultimately maximizing the assistant's effectiveness and reach. Its versatility extends beyond team messaging to serve as an effective interface for assistants.
Deltek serves over 30,000 clients with industry-specific software and information solutions. Deltek is continuously working on enhancing this solution to better align it with their specific requirements, such as supporting file formats beyond PDF and implementing more cost-effective approaches for their data ingestion pipeline.
Through evaluations of sensors and informed decision-making support, Afri-SET empowers governments and civil society for effective air quality management. This manual synchronization process, hindered by disparate data formats, is resource-intensive, limiting the potential for widespread data orchestration.
FM-powered artificial intelligence (AI) assistants have limitations, such as providing outdated information or struggling with context outside their training data. It provides this context to the FM, which uses it to generate a more informed and precise response. Businesses incur charges for data storage and management.
In this post, we discuss how the IEO developed UNDP’s artificial intelligence and machine learning (ML) platform—named Artificial Intelligence for Development Analytics (AIDA)— in collaboration with AWS, UNDP’s Information and Technology Management Team (UNDP ITM), and the United Nations International Computing Centre (UNICC).
Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. It empowers employees to be more creative, data-driven, efficient, prepared, and productive.
In BI systems, data warehousing first converts disparate raw data into clean, organized, and integrated data, which is then used to extract actionable insights to facilitate analysis, reporting, and data-informed decision-making. The following elements serve as a backbone for a functional data warehouse.
However, even in a decentralized model, LOBs must often align with central governance controls and obtain approvals from the CCoE team for production deployment, adhering to global enterprise standards for areas such as access policies, model risk management, data privacy, and compliance posture, which can introduce governance complexities.
They also plan on incorporating offline LLMs as they can process sensitive or confidential information without the need to transmit data over the internet. This will reduce the risk of data breaches and unauthorized access.
One way to enable more contextual conversations is by linking the chatbot to internal knowledge bases and information systems. Integrating proprietary enterprise data from internal knowledge bases enables chatbots to contextualize their responses to each user’s individual needs and interests.
This approach, when applied to generative AI solutions, means that a specific AI or machine learning (ML) platform configuration can be used to holistically address the operational excellence challenges across the enterprise, allowing the developers of the generative AI solution to focus on business value.
With such high-value data, much of which holds highly sensitive financial and personal information, the mainframe is a potential target for cyber criminals. Many consider a NoSQL database essential for high data ingestion rates. trillion instructions per day.
RAG models retrieve relevant information from a large corpus of text and then use a generative language model to synthesize an answer based on the retrieved information. Choose Sync to initiate the data ingestion job. After the data ingestion job is complete, choose the desired FM to use for retrieval and generation.
RAG models first retrieve relevant information from a large corpus of text and then use an FM to synthesize an answer based on the retrieved information. Choose Sync to initiate the data ingestion job. On the Amazon Bedrock console, navigate to the created knowledge base.
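The retrieve-then-generate pattern described above can be sketched in a few lines. This is a toy illustration, not the Amazon Bedrock Knowledge Bases API: retrieval here is naive term overlap, and `generate` is a stand-in for the foundation model call.

```python
# Minimal retrieve-then-generate (RAG) sketch with hypothetical helpers.
def retrieve(query, corpus, k=2):
    """Rank documents by naive term overlap with the query; take the top k."""
    terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(query, context_docs):
    """Stand-in for the FM call: fold retrieved context into a prompt."""
    context = " ".join(context_docs)
    return f"Answer to '{query}' based on: {context}"

corpus = [
    "the sync job ingests documents into the knowledge base",
    "retrieval augmented generation grounds answers in private data",
    "unrelated note about lunch menus",
]
docs = retrieve("how does retrieval augmented generation work", corpus)
print(generate("how does retrieval augmented generation work", docs))
```

A production system would replace the term-overlap ranking with vector similarity over embeddings and the `generate` stub with an actual model invocation.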
The service allows for simple audio data ingestion, easy-to-read transcript creation, and accuracy improvement through custom vocabularies. Hugging Face is an open-source machine learning (ML) platform that provides tools and resources for the development of AI projects.
This post, part of the Governing the ML lifecycle at scale series (Part 1, Part 2, Part 3), explains how to set up and govern a multi-account ML platform that addresses these challenges. An enterprise might have the following roles involved in the ML lifecycles. This ML platform provides several key benefits.
Content redaction: Each customer audio interaction is recorded as a stereo WAV file, but could potentially include sensitive information such as HIPAA-protected and personally identifiable information (PII). Scalability: This architecture needed to immediately scale to thousands of calls per day and millions of calls per year.
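The transcript-level side of such redaction can be illustrated with a simple pattern-based pass. This is a hedged sketch using regular expressions, not the managed PII redaction that a transcription service provides; the patterns shown cover only two example formats.

```python
# Illustrative regex-based PII redaction sketch (not a managed redaction
# feature): mask strings shaped like SSNs and phone numbers before the
# transcript text is stored downstream.
import re

PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "[PHONE]"),
]

def redact(text):
    """Replace each matched PII pattern with its placeholder label."""
    for pattern, label in PII_PATTERNS:
        text = pattern.sub(label, text)
    return text

print(redact("Call me at 555-867-5309, SSN 123-45-6789."))
# Call me at [PHONE], SSN [SSN].
```

Real deployments rely on trained entity recognizers rather than regexes, since formats like names and addresses cannot be captured by fixed patterns.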
As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications.
Amazon DataZone makes it straightforward for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization so they can discover, use, and collaborate to derive data-driven insights.
Many existing LLMs require specific formats and well-structured data to function effectively. Parsing and transforming different types of documents—ranging from PDFs to Word files—for machine learning tasks can be tedious, often leading to information loss or requiring extensive manual intervention.
The personalization of LLM applications can be achieved by incorporating up-to-date user information, which typically involves integrating several components. One such component is a feature store, a tool that stores, shares, and manages features for machine learning (ML) models.
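A minimal in-memory version of such a feature store can clarify the role it plays. This is an illustrative sketch only, not the SageMaker Feature Store API; the entity and feature names are invented for the example.

```python
# Toy in-memory feature store sketch: store and fetch per-user features
# that an LLM application could inject at prompt time.
import time

class FeatureStore:
    def __init__(self):
        # (entity_id, feature_name) -> (value, write timestamp)
        self._records = {}

    def put(self, entity_id, feature_name, value):
        self._records[(entity_id, feature_name)] = (value, time.time())

    def get(self, entity_id, feature_name, default=None):
        record = self._records.get((entity_id, feature_name))
        return record[0] if record else default

store = FeatureStore()
store.put("user-42", "preferred_language", "de")
print(store.get("user-42", "preferred_language"))  # de
```

A real feature store adds what this sketch omits: sharing across teams, point-in-time retrieval for training, and low-latency online lookups for inference.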
By exploring these challenges, organizations can recognize the importance of real-time forecasting and explore innovative solutions to overcome these hurdles, enabling them to stay competitive, make informed decisions, and thrive in today’s fast-paced business environment. For more information, refer to the following resources.
AWS provides various services catered to time series data that are low code/no code, which both machine learning (ML) and non-ML practitioners can use for building ML solutions. We recommend running this notebook on Amazon SageMaker Studio , a web-based, integrated development environment (IDE) for ML.
You can use machine learning (ML) to generate these insights and build predictive models. Educators can also use ML to identify challenges in learning outcomes, increase success and retention among students, and broaden the reach and impact of online learning content. Download the following student dataset to your local computer.
I highly recommend anyone coming from a Machine Learning or Deep Learning modeling background who wants to learn about deploying models (MLOps) on a cloud platform to take this exam or an equivalent; the exam also includes topics on SQL data ingestion with Azure and Databricks, which is also a very important skill to have in Data Science.
MPII is using a machine learning (ML) bid optimization engine to inform upstream decision-making processes in power asset management and trading. This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. Data comes from disparate sources in a number of formats.
At the higher levels of automation (Level 2 and above), the AD system performs multiple functions: Data collection – The AV system gathers information about the vehicle’s surroundings in real time with centimeter accuracy. AV systems fuse data from the integrated devices to build a comprehensive perception.
The IDP Well-Architected Custom Lens follows the AWS Well-Architected Framework, reviewing the solution across six pillars at the granularity of a specific AI or machine learning (ML) use case, and providing guidance to tackle common challenges. Additionally, the solution must handle high data volumes with low latency and high throughput.
This combination of great models and continuous adaptation is what will lead to a successful machine learning (ML) strategy. It helps you extract information by recognizing sentiments, key phrases, entities, and much more, allowing you to take advantage of state-of-the-art models and adapt them for your specific use case.
When machine learning (ML) models are deployed into production and employed to drive business decisions, the challenge often lies in the operation and management of multiple models. That is where Provectus , an AWS Premier Consulting Partner with competencies in Machine Learning, Data & Analytics, and DevOps, stepped in.
Machine learning (ML) is becoming increasingly complex as customers try to solve more and more challenging problems. This complexity often leads to the need for distributed ML, where multiple machines are used to train a single model. SageMaker is a fully managed service for building, training, and deploying ML models.
Effectively manage your data and its lifecycle Data plays a key role throughout your IDP solution. Starting with the initial data ingestion, data is pushed through various stages of processing, and finally returned as output to end users.
We downloaded the data from AWS Data Exchange and processed it in AWS Glue to generate KG files. In Part 2 , we demonstrated how to use Amazon Neptune ML (in Amazon SageMaker ) to train the KG and create KG embeddings. The KG files were stored in Amazon Simple Storage Service (Amazon S3) and then loaded in Amazon Neptune.
In this post, we discuss a machine learning (ML) solution for complex image searches using Amazon Kendra and Amazon Rekognition. Amazon Kendra is an intelligent search service powered by ML, and Amazon Rekognition is an ML service that can identify objects, people, text, scenes, and activities from images or videos.
Alignment to other tools in the organization’s tech stack Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, and Pandas or Apache Spark DataFrames.
Simple methods for time series forecasting use historical values of the same variable whose future values need to be predicted, whereas more complex, machine learning (ML)-based methods use additional information, such as the time series data of related variables. For more information, refer to Training Predictors.
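The contrast between the two families of methods can be shown with a toy example. This is an illustrative sketch, not the Amazon Forecast predictor API: the naive method repeats the last observed value, while the second (with an assumed blending weight) also uses the trend of a related series.

```python
# Sketch contrasting a simple forecast with one that uses a related variable.
def naive_forecast(history, horizon):
    """Simple method: project the last observed value forward unchanged."""
    return [history[-1]] * horizon

def covariate_forecast(history, related, horizon, weight=0.5):
    """Extend the last value using the latest trend of a related series."""
    trend = related[-1] - related[-2]
    return [history[-1] + weight * trend * (i + 1) for i in range(horizon)]

sales = [100, 104, 108]
promo_spend = [10, 12, 16]          # hypothetical related variable
print(naive_forecast(sales, 2))                    # [108, 108]
print(covariate_forecast(sales, promo_spend, 2))   # [110.0, 112.0]
```

ML-based forecasters generalize the second idea, learning how many related variables jointly influence the target instead of using a fixed weight.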