Rocket's legacy data science environment challenges: Rocket's previous data science solution was built around Apache Spark and combined a legacy version of the Hadoop environment with vendor-provided Data Science Experience development tools.
Do you need help moving your organization's machine learning (ML) journey from pilot to production? Most executives think ML can apply to any business decision, but on average only about half of ML projects make it to production. Challenges: Customers may face several challenges when implementing ML solutions.
Foundation models (FMs) mark the beginning of a new era in machine learning (ML) and artificial intelligence (AI), leading to faster development of AI that can be adapted to a wide range of downstream tasks and fine-tuned for an array of applications. Large language models (LLMs) have taken the field of AI by storm.
In this post, we share how Axfood, a large Swedish food retailer, improved the operations and scalability of its existing artificial intelligence (AI) and machine learning (ML) workloads by prototyping in close collaboration with AWS experts and using Amazon SageMaker.
The architecture's strengths lie in its consistency across environments, automatic data ingestion processes, and comprehensive monitoring capabilities. He solves complex organizational and technical challenges using data science and engineering. In addition, he builds and deploys AI/ML models on the AWS Cloud.
Data preparation isn't just a part of the ML engineering process: it's the heart of it. To set the stage, let's examine the nuances between research-phase data and production-phase data. Data is a key differentiator in ML projects (more on this in my blog post below).
Introduction to Data Engineering: Data engineering involves obtaining, organizing, understanding, extracting, and formatting data for analysis, a tedious and time-consuming task. Data scientists often spend up to 80% of their time on data engineering in data science projects.
Data science teams often face challenges when transitioning models from the development environment to production. This post, part of the Governing the ML lifecycle at scale series (Part 1, Part 2, Part 3), explains how to set up and govern a multi-account ML platform that addresses these challenges.
Learn about the flow, difficulties, and tools for performing ML clustering at scale Ori Nakar | Principal Engineer, Threat Research | Imperva Given that there are billions of daily botnet attacks from millions of different IPs, the most difficult challenge of botnet detection is choosing the most relevant data.
With the IoT, tracking website clicks, capturing call data records for a mobile network carrier, and tracking events generated by "smart meters" and embedded devices can all generate huge volumes of transactions. Many consider a NoSQL database essential for high data ingestion rates.
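To make the ingestion-rate point concrete, here is a minimal sketch of batched writes to a NoSQL store, using MongoDB via pymongo purely as an illustration; the connection URI, database, collection, and document shape are all hypothetical.

```python
# Illustrative sketch: batched writes to MongoDB via pymongo.
# The URI, database, and collection names are hypothetical.
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client["telemetry"]["meter_events"]

# Batching inserts (and disabling ordered execution) lets the server
# process documents in parallel, which is what sustains high ingestion rates.
batch = [
    {"meter_id": i, "reading_kwh": 0.42, "ts": datetime.now(timezone.utc)}
    for i in range(1000)
]
result = events.insert_many(batch, ordered=False)
print(f"Inserted {len(result.inserted_ids)} documents")
```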
Be sure to check out his talk, "Apache Kafka for Real-Time Machine Learning Without a Data Lake," there! The combination of data streaming and machine learning (ML) enables you to build one scalable, reliable, and simple infrastructure for all machine learning tasks using the Apache Kafka ecosystem.
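As a rough illustration of that idea, here is a minimal consume-score-produce loop using the kafka-python client; the broker address, topic names, and predict() function are assumptions, not anything from the talk.

```python
# Illustrative sketch: scoring a stream of events with a model, using kafka-python.
# Broker address, topic names, and the predict() function are hypothetical.
import json

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "features",  # hypothetical input topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def predict(features: dict) -> float:
    # Stand-in for a real model; replace with your inference call.
    return sum(features.get("values", []))

# Each consumed record is scored and the prediction published downstream,
# so the same Kafka cluster carries both raw features and model output.
for message in consumer:
    score = predict(message.value)
    producer.send("predictions", value={"id": message.value.get("id"), "score": score})
```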
Alignment to other tools in the organization's tech stack: Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, and data structures such as pandas or Apache Spark DataFrames.
I recently took the Azure Data Scientist Associate certification exam (DP-100); thankfully, I passed after about 3–4 months of studying the Microsoft Data Science Learning Path and the Coursera Microsoft Azure Data Scientist Associate Specialization.
The IDP Well-Architected Custom Lens follows the AWS Well-Architected Framework, reviewing the solution across six pillars at the granularity of a specific AI or machine learning (ML) use case and providing guidance to tackle common challenges. Additionally, the solution must handle high data volumes with low latency and high throughput.
Choose Sync to initiate the data ingestion job. After the data ingestion job is complete, choose the desired FM to use for retrieval and generation. He has successfully delivered state-of-the-art AI/ML-powered solutions to solve complex business problems for diverse industries, optimizing efficiency and scalability.
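For readers scripting this instead of using the console, the following is a hedged boto3 sketch: the console's Sync button corresponds to the StartIngestionJob API, and retrieval plus generation is exposed through RetrieveAndGenerate. The knowledge base ID, data source ID, and model ARN are hypothetical placeholders.

```python
# Illustrative sketch with boto3. All IDs and the model ARN are placeholders.
import boto3

agent = boto3.client("bedrock-agent")
runtime = boto3.client("bedrock-agent-runtime")

# Kick off ingestion for a knowledge base data source (the console's "Sync").
job = agent.start_ingestion_job(
    knowledgeBaseId="KB1234567890",
    dataSourceId="DS1234567890",
)
print(job["ingestionJob"]["status"])

# After ingestion completes, query the knowledge base with the chosen FM.
response = runtime.retrieve_and_generate(
    input={"text": "What is our returns policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234567890",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
        },
    },
)
print(response["output"]["text"])
```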
In BI systems, data warehousing first converts disparate raw data into clean, organized, and integrated data, which is then used to extract actionable insights to facilitate analysis, reporting, and data-informed decision-making. Some key differences are discussed below.
Manager, Data Science at Marubeni Power International. MPII is using a machine learning (ML) bid optimization engine to inform upstream decision-making processes in power asset management and trading. This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability.
This approach, when applied to generative AI solutions, means that a specific AI or machine learning (ML) platform configuration can be used to holistically address the operational excellence challenges across the enterprise, allowing the developers of the generative AI solution to focus on business value.
This combination of great models and continuous adaptation is what will lead to a successful machine learning (ML) strategy. MLOps focuses on the intersection of data science and data engineering in combination with existing DevOps practices to streamline model delivery across the ML development lifecycle.
Andre Franca | CTO | connectedFlow Join this session to demystify the world of Causal AI, with a focus on understanding cause-and-effect relationships within data to drive optimal decisions. By the end of this session, you’ll have a practical blueprint to efficiently harness feature stores within ML workflows.
Top Data Analytics Skills and Platforms for 2023, PyTorch 2.0 Released, and 5 Huge Data Science Career Mistakes. Top Data Analytics Skills and Platforms for 2023: We looked at over 25,000 job descriptions, and these are the data analytics platforms, tools, and skills that employers are looking for in 2023.
Andre Franca | CTO | connectedFlow. Explore the world of Causal AI for data science practitioners, with a focus on understanding cause-and-effect relationships within data to drive optimal decisions. No-Code and Low-Code AI: A Practical Project-Driven Approach to ML | Gwendolyn D.
As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications.
Machine Learning Operations (MLOps) can significantly accelerate how data scientists and ML engineers meet organizational needs. A well-implemented MLOps process not only expedites the transition from testing to production but also offers ownership, lineage, and historical data about ML artifacts used within the team.
Unfortunately, accessing data across various locations and file types, and then operationalizing that data for AI usage, has traditionally been a painfully manual, time-consuming, and costly process. Ahmad Khan, Head of AI/ML Strategy at Snowflake, discusses the challenges of operationalizing ML in a recent talk.
We downloaded the data from AWS Data Exchange and processed it in AWS Glue to generate KG files. In Part 2 , we demonstrated how to use Amazon Neptune ML (in Amazon SageMaker ) to train the KG and create KG embeddings. The KG files were stored in Amazon Simple Storage Service (Amazon S3) and then loaded in Amazon Neptune.
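As a rough sketch of the load step, Neptune exposes a bulk loader endpoint on the cluster that ingests files directly from Amazon S3; the endpoint, bucket, IAM role, and file format below are hypothetical and would need to match how the Glue job wrote the KG files (the call must also run from inside the cluster's VPC).

```python
# Illustrative sketch: kicking off a Neptune bulk load of KG files from S3.
# Endpoint, bucket, and IAM role are hypothetical placeholders.
import requests

NEPTUNE_ENDPOINT = "https://my-neptune-cluster.cluster-abc.us-east-1.neptune.amazonaws.com:8182"

payload = {
    "source": "s3://my-kg-bucket/kg-files/",
    "format": "csv",  # Gremlin CSV; use "nquads" or "turtle" for RDF data
    "iamRoleArn": "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
    "region": "us-east-1",
    "failOnError": "FALSE",
}
resp = requests.post(f"{NEPTUNE_ENDPOINT}/loader", json=payload, timeout=30)
load_id = resp.json()["payload"]["loadId"]

# Poll the loader for the status of this load job.
status = requests.get(f"{NEPTUNE_ENDPOINT}/loader/{load_id}", timeout=30).json()
print(status["payload"]["overallStatus"]["status"])
```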
Introduction: In the rapidly evolving landscape of machine learning, Google Cloud's Vertex AI stands out as a unified platform designed to streamline the entire machine learning (ML) workflow. This unified approach enables seamless collaboration among data scientists, data engineers, and ML engineers.
Understanding the MLOps Lifecycle: The MLOps lifecycle consists of several critical stages, each with its unique challenges. Data Ingestion: Collecting data from various sources and ensuring it's available for analysis. Data Preparation: Cleaning and transforming raw data to make it usable for machine learning.
Simple methods for time series forecasting use historical values of the same variable whose future values need to be predicted, whereas more complex, machine learning (ML)-based methods use additional information, such as the time series data of related variables. Raju Patil is a Data Scientist in AWS Professional Services.
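A toy example on synthetic data illustrates the distinction between the two families of methods; the series, the lag-1 feature, and the related temperature variable are all invented for illustration.

```python
# Illustrative contrast on synthetic data: a naive forecast uses only the
# target's own history, while an ML-based model can also use a related series.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
temperature = 20 + 5 * np.sin(np.arange(n) / 10) + rng.normal(0, 1, n)
demand = 100 + 3 * temperature + rng.normal(0, 2, n)  # target depends on temperature
df = pd.DataFrame({"demand": demand, "temperature": temperature})

train, test = df.iloc[:-20], df.iloc[-20:]

# Simple method: tomorrow looks like today (last observed value).
naive_forecast = np.full(len(test), train["demand"].iloc[-1])

# ML method: regress demand on lagged demand plus the related variable.
X_train = np.column_stack([train["demand"].shift(1).iloc[1:], train["temperature"].iloc[1:]])
y_train = train["demand"].iloc[1:]
model = LinearRegression().fit(X_train, y_train)

X_test = np.column_stack([df["demand"].shift(1).iloc[-20:], test["temperature"]])
ml_forecast = model.predict(X_test)

print("naive MAE:", np.mean(np.abs(test["demand"].values - naive_forecast)))
print("ML MAE:   ", np.mean(np.abs(test["demand"].values - ml_forecast)))
```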
Data scientists drive business outcomes. Many implement machine learning and artificial intelligence to tackle challenges in the age of Big Data. They develop and continuously optimize AI/ML models, collaborating with stakeholders across the enterprise to inform decisions that drive strategic business value.
Large Model Quality and Evaluation. Anoop Sinha | Research Director, AI & Future Technologies | Google. Large model development faces many challenges when it comes to ML quality and evaluation, including coverage, scale, and the wide range of use cases LLMs serve. Check out a few of them below.
Harness a flywheel approach, wherein continuous data feedback is used to routinely orchestrate and evaluate enhancements to your models and processes. Enhance IDP with Amazon Comprehend Flywheel and Amazon Textract Custom Queries: Use the Amazon Comprehend flywheel for a streamlined ML process, from data ingestion to deployment.
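For orientation, a minimal sketch of driving a flywheel from boto3 might look like the following; the flywheel name, ARNs, and data lake URI are hypothetical placeholders, and CreateFlywheel/StartFlywheelIteration are the Comprehend API operations as I understand them.

```python
# Illustrative sketch with boto3: creating a Comprehend flywheel around an
# existing custom classifier and starting an iteration to retrain/evaluate.
# All names, ARNs, and the S3 data lake URI are hypothetical placeholders.
import boto3

comprehend = boto3.client("comprehend")

flywheel = comprehend.create_flywheel(
    FlywheelName="docs-classifier-flywheel",
    ActiveModelArn="arn:aws:comprehend:us-east-1:123456789012:document-classifier/my-classifier/version/v1",
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendFlywheelRole",
    DataLakeS3Uri="s3://my-flywheel-bucket/data-lake/",
)

# Each iteration trains and evaluates a new candidate model on the accumulated
# data; you can then promote it if it improves on the active model.
comprehend.start_flywheel_iteration(FlywheelArn=flywheel["FlywheelArn"])
```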
The PdMS includes AWS services to securely manage the lifecycle of edge compute devices and BHS assets, cloud data ingestion, storage, machine learning (ML) inference models, and business logic to power proactive equipment maintenance in the cloud. It's an easy way to run analytics on IoT data to gain accurate insights.
Introduction: If you are working or have worked on any data science task, then you have definitely used pandas. pandas is a library that helps with data ingestion and transformations.
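A minimal example of those two jobs, ingestion and transformation, might look like this; the CSV file and column names are hypothetical.

```python
# Minimal example of the two jobs the excerpt mentions: ingestion and
# transformation. The CSV path and column names are hypothetical.
import pandas as pd

# Ingestion: read raw data from disk (pandas also reads JSON, Parquet, SQL, ...).
orders = pd.read_csv("orders.csv", parse_dates=["order_date"])

# Transformation: clean, derive, and aggregate.
orders = orders.dropna(subset=["customer_id"])
orders["revenue"] = orders["quantity"] * orders["unit_price"]
orders["month"] = orders["order_date"].dt.to_period("M")
monthly = orders.groupby(["customer_id", "month"])["revenue"].sum()
print(monthly.head())
```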
For a deeper dive into end-to-end solutions that cover data ingestion, classification, extraction, enrichment, verification and validation, and consumption, refer to Intelligent document processing with AWS AI services: Part 1 and Part 2. About the Authors: Suyin Wang is an AI/ML Specialist Solutions Architect at AWS.
Amazon SageMaker Processing jobs for large-scale data ingestion into OpenSearch. This notebook will ingest the SageMaker docs into an OpenSearch Service index called llm_apps_workshop_embeddings. It will download the dataset locally into the notebook and then ingest it into the OpenSearch Service index.
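The ingestion step itself, stripped of the SageMaker Processing scaffolding that parallelizes it, might look roughly like this with opensearch-py's bulk helper; the domain host, credentials, documents, and embed() function are assumptions, while the index name comes from the excerpt above.

```python
# Illustrative sketch: bulk-indexing documents with embeddings into OpenSearch.
# Host, credentials, documents, and the embedding function are hypothetical.
from opensearchpy import OpenSearch, helpers

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("user", "password"),
    use_ssl=True,
)

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model call.
    return [0.0] * 768

docs = [{"text": "SageMaker hosts ML models...", "source": "sagemaker-docs"}]
actions = (
    {
        "_index": "llm_apps_workshop_embeddings",
        "_source": {**doc, "embedding": embed(doc["text"])},
    }
    for doc in docs
)
success, errors = helpers.bulk(client, actions)
print(f"indexed {success} documents")
```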
These models offer tremendous potential but also bring a unique set of challenges when it comes to building large-scale ML projects. Naturally, training a machine learning model (regardless of the problem being solved or the particular model architecture that was chosen) is a key part of every ML project. But what happens next?
Amazon SageMaker Serverless Inference is a purpose-built inference service that makes it easy to deploy and scale machine learning (ML) models. This includes preparing data, creating a SageMaker model, and performing batch transform using the model. The data ingestion for this practice should finish within 60 seconds.
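As a hedged sketch of the deployment step with the SageMaker Python SDK: ServerlessInferenceConfig controls memory and concurrency, while the image, model artifact, and role below are hypothetical placeholders.

```python
# Illustrative sketch with the SageMaker Python SDK: deploying a Model object
# to a serverless endpoint. Image URI, model data, and role are hypothetical.
from sagemaker.model import Model
from sagemaker.serverless import ServerlessInferenceConfig

model = Model(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest",
    model_data="s3://my-bucket/model.tar.gz",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
)

# Serverless endpoints scale to zero; you pay per request rather than per hour.
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,
    max_concurrency=5,
)
predictor = model.deploy(serverless_inference_config=serverless_config)
print(predictor.endpoint_name)
```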
Topics Include: Agentic AI Design Patterns, LLMs & RAG for Agents, Agent Architectures & Chaining, Evaluating AI Agent Performance, Building with LangChain and LlamaIndex, Real-World Applications of Autonomous Agents. Who Should Attend: Data Scientists, Developers, AI Architects, and ML Engineers seeking to build cutting-edge autonomous systems.
Airflow is a perfect tool to orchestrate stages of the DataRobot machine learning (ML) pipeline, because it provides an easy but powerful solution for integrating DataRobot capabilities into bigger pipelines, combining them with other services, and cleaning your data and storing or publishing the results, as sketched below. DataRobot Provider Modules.
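A minimal sketch of that orchestration idea follows: an Airflow DAG whose tasks call DataRobot through its Python client. The endpoint, token, and task bodies are placeholders; in practice the airflow-provider-datarobot package supplies ready-made operators for these steps.

```python
# Minimal sketch: an Airflow DAG whose tasks call DataRobot via its Python
# client. Endpoint, token, and the task bodies are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def score_data(**context):
    import datarobot as dr

    dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_TOKEN")
    # Stand-in for a real scoring step, e.g. a batch prediction job.
    print("run DataRobot scoring here")

def publish_results(**context):
    # Stand-in: write predictions to a warehouse, S3, a dashboard, etc.
    print("publish results here")

with DAG(
    dag_id="datarobot_scoring_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    score = PythonOperator(task_id="score_data", python_callable=score_data)
    publish = PythonOperator(task_id="publish_results", python_callable=publish_results)
    score >> publish
```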
Arranging Efficient Data Streams: Modern companies typically receive data from multiple sources. Therefore, quick data ingestion for instant use can be challenging. Machine Learning Integration Opportunities: Organizations harness machine learning (ML) algorithms to make forecasts on the data.
They can efficiently aggregate and process data over defined periods, making them ideal for identifying trends, anomalies, and correlations within the data. High-Volume Data Ingestion: TSDBs are built to handle large volumes of data coming in at high velocities. What are the Benefits of Using a Time Series Database?
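To make that concrete, here is an illustrative write path for one popular TSDB, InfluxDB, via its official Python client; the URL, token, org, and bucket are hypothetical placeholders.

```python
# Illustrative sketch: writing timestamped points to InfluxDB via its official
# client. URL, token, org, and bucket are hypothetical placeholders.
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# Each point is a timestamped measurement with tags (indexed) and fields (values).
point = (
    Point("sensor_reading")
    .tag("device_id", "meter-42")
    .field("kwh", 0.57)
)
write_api.write(bucket="telemetry", record=point)

# Aggregating over defined periods is a first-class query operation (Flux):
# from(bucket:"telemetry") |> range(start:-1h) |> aggregateWindow(every: 5m, fn: mean)
```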
Robust validation and monitoring frameworks enhance pipeline reliability and trustworthiness, safeguarding against data-driven decision-making risks. Must-Read Blogs: Elevate Your Data Quality: Unleashing the Power of AI and ML for Scaling Operations; The Difference Between Data Observability and Data Quality.