Data exploration and model development were conducted using well-known machine learning (ML) tools such as Jupyter or Apache Zeppelin notebooks. Apache Hive was used to provide a tabular interface to data stored in HDFS, and to integrate with Apache Spark SQL. This also led to a backlog of data that needed to be ingested.
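A minimal PySpark sketch of that tabular access pattern, assuming a configured Hive metastore; the events table and its columns are hypothetical.

```python
from pyspark.sql import SparkSession

# Hive support lets Spark SQL query tables registered in the Hive metastore.
spark = (
    SparkSession.builder
    .appName("hive-tabular-access")
    .enableHiveSupport()
    .getOrCreate()
)

# 'events' is a hypothetical Hive table backed by files in HDFS.
df = spark.sql(
    "SELECT user_id, event_type, COUNT(*) AS n "
    "FROM events GROUP BY user_id, event_type"
)
df.show(10)
```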
Amazon SageMaker Feature Store provides an end-to-end solution to automate feature engineering for machine learning (ML). For many ML use cases, raw data like log files, sensor readings, or transaction records needs to be transformed into meaningful features that are optimized for model training.
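A minimal sketch with the SageMaker Python SDK of how such engineered features might be registered and ingested; the feature group name, bucket, role ARN, and column names are assumptions for illustration.

```python
import pandas as pd
from sagemaker import Session
from sagemaker.feature_store.feature_group import FeatureGroup

session = Session()

# Hypothetical engineered features; 'customer_id' identifies each record
# and 'event_time' is required by Feature Store for record versioning.
df = pd.DataFrame({
    "customer_id": ["c1", "c2"],
    "avg_order_value": [42.0, 17.5],
    "event_time": [1700000000.0, 1700000000.0],
})

fg = FeatureGroup(name="customer-features", sagemaker_session=session)
fg.load_feature_definitions(data_frame=df)  # infer the schema from the DataFrame
fg.create(
    s3_uri="s3://my-bucket/feature-store",              # placeholder offline store
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn="arn:aws:iam::123456789012:role/MyRole",   # placeholder role
    enable_online_store=True,
)
fg.ingest(data_frame=df, max_workers=2, wait=True)
```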
Do you need help moving your organization's Machine Learning (ML) journey from pilot to production? Most executives think ML can apply to any business decision, but on average only half of ML projects make it to production. Challenges: Customers may face several challenges when implementing machine learning (ML) solutions.
Amazon DataZone makes it straightforward for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization so they can discover, use, and collaborate to derive data-driven insights.
In this post, we share how Axfood, a large Swedish food retailer, improved operations and scalability of their existing artificial intelligence (AI) and machine learning (ML) operations by prototyping in close collaboration with AWS experts and using Amazon SageMaker. This is a guest post written by Axfood AB.
According to IDC, 83% of CEOs want their organizations to be more data-driven. Data scientists could be your key to unlocking the potential of the Information Revolution, but what do data scientists do? What Do Data Scientists Do? Data scientists drive business outcomes.
This post, part of the Governing the ML lifecycle at scale series (Part 1, Part 2, Part 3), explains how to set up and govern a multi-account ML platform that addresses these challenges. An enterprise might have the following roles involved in the ML lifecycle. This ML platform provides several key benefits.
Introduction to Data Engineering. Data Engineering Challenges: Data engineering involves obtaining, organizing, understanding, extracting, and formatting data for analysis, a tedious and time-consuming task. Data scientists often spend up to 80% of their time on data engineering in data science projects.
You can use machine learning (ML) to generate these insights and build predictive models. Educators can also use ML to identify challenges in learning outcomes, increase success and retention among students, and broaden the reach and impact of online learning content. Note that major data transformation is out of scope for this post.
With the IoT, tracking website clicks, capturing call data records for a mobile network carrier, and tracking events generated by “smart meters” and embedded devices can all generate huge volumes of transactions. Many consider a NoSQL database essential for high data ingestion rates.
Choose Sync to initiate the data ingestion job. After data synchronization is complete, select the desired FM to use for retrieval and generation (model access must be granted to this FM in Amazon Bedrock before use). On the Amazon Bedrock console, navigate to the created knowledge base.
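The same sync-and-query flow can also be scripted with boto3; the knowledge base ID, data source ID, and model ARN below are placeholders.

```python
import boto3

agent = boto3.client("bedrock-agent")
runtime = boto3.client("bedrock-agent-runtime")

KB_ID = "KB123EXAMPLE"   # placeholder knowledge base ID
DS_ID = "DS123EXAMPLE"   # placeholder data source ID
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"  # placeholder

# 'Sync' in the console corresponds to starting an ingestion job.
job = agent.start_ingestion_job(knowledgeBaseId=KB_ID, dataSourceId=DS_ID)
print(job["ingestionJob"]["status"])

# Once ingestion completes, query the knowledge base with the chosen FM.
resp = runtime.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": KB_ID,
            "modelArn": MODEL_ARN,
        },
    },
)
print(resp["output"]["text"])
```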
In this post, we discuss how the IEO developed UNDP’s artificial intelligence and machine learning (ML) platform—named Artificial Intelligence for Development Analytics (AIDA)— in collaboration with AWS, UNDP’s Information and Technology Management Team (UNDP ITM), and the United Nations International Computing Centre (UNICC).
Foundation models (FMs) are marking the beginning of a new era in machine learning (ML) and artificial intelligence (AI), which is leading to faster development of AI that can be adapted to a wide range of downstream tasks and fine-tuned for an array of applications. Large language models (LLMs) have taken the field of AI by storm.
Alignment to other tools in the organization’s tech stack: Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, and data structures like Pandas or Apache Spark DataFrames.
However, a more holistic organizational approach is crucial because generative AI practitioners, data scientists, or developers can potentially use a wide range of technologies, models, and datasets to circumvent the established controls.
Combining accurate transcripts with Genesys CTR files, Principal could properly identify the speakers, categorize the calls into groups, analyze agent performance, identify upsell opportunities, and conduct additional machine learning (ML)-powered analytics.
I recently took the Azure Data Scientist Associate certification exam (DP-100), and thankfully I passed after about 3–4 months of studying the Microsoft Data Science Learning Path and the Coursera Microsoft Azure Data Scientist Associate Specialization.
Our expert speakers will cover a wide range of topics, tools, and techniques that data scientists of all levels can apply in their work. Check a few of them out below. “Scaling AI/ML Workloads with Ray”: Kai Fricke | Senior Software Engineer | Anyscale Inc.
Choose Sync to initiate the data ingestion job. After the data ingestion job is complete, choose the desired FM to use for retrieval and generation. About the Authors: Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping businesses innovate with generative AI.
We downloaded the data from AWS Data Exchange and processed it in AWS Glue to generate KG files. In Part 2 , we demonstrated how to use Amazon Neptune ML (in Amazon SageMaker ) to train the KG and create KG embeddings. The KG files were stored in Amazon Simple Storage Service (Amazon S3) and then loaded in Amazon Neptune.
Be sure to check out his talk, “Apache Kafka for Real-Time Machine Learning Without a Data Lake,” there! The combination of data streaming and machine learning (ML) enables you to build a single scalable, reliable, and simple infrastructure for all machine learning tasks using the Apache Kafka ecosystem.
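As a rough sketch of that idea, the loop below consumes events from a Kafka topic and scores each one as it arrives; the topic name, broker address, and predict function are assumptions.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Hypothetical pretrained scoring function; swap in any loaded model.
def predict(features: dict) -> float:
    return sum(features.values())  # stand-in for model.predict

consumer = KafkaConsumer(
    "sensor-events",                        # placeholder topic
    bootstrap_servers="localhost:9092",     # placeholder broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="latest",
)

# Score each event as it arrives: no intermediate data lake required.
for message in consumer:
    score = predict(message.value)
    print(f"offset={message.offset} score={score}")
```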
MongoDB Atlas offers automatic sharding, horizontal scalability, and flexible indexing for high-volume data ingestion. Among these, the native time series capabilities are a standout feature, making it ideal for managing high volumes of time-series data, such as business-critical application data, telemetry, server logs, and more.
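A minimal PyMongo sketch of creating and writing to a time series collection (MongoDB 5.0+); the connection string, collection, and field names are placeholders.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # or an Atlas URI
db = client["metrics"]

# Time series collections optimize storage and queries
# for timestamped measurements.
db.create_collection(
    "server_logs",
    timeseries={
        "timeField": "ts",       # required timestamp field
        "metaField": "host",     # optional per-series metadata
        "granularity": "seconds",
    },
)

db.server_logs.insert_one(
    {"ts": datetime.now(timezone.utc), "host": "web-01", "cpu": 0.73}
)
```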
Identification of relevant representation data from a huge volume of data – This is essential to reduce biases in the datasets so that common scenarios (driving at normal speed with obstruction) don’t create class imbalance. To yield better accuracy, DNNs require large volumes of diverse, good-quality data.
Topics Include: Agentic AI Design Patterns, LLMs & RAG for Agents, Agent Architectures & Chaining, Evaluating AI Agent Performance, Building with LangChain and LlamaIndex, Real-World Applications of Autonomous Agents. Who Should Attend: Data Scientists, Developers, AI Architects, and ML Engineers seeking to build cutting-edge autonomous systems.
Machine learning (ML) is becoming increasingly complex as customers try to solve more and more challenging problems. This complexity often leads to the need for distributed ML, where multiple machines are used to train a single model. SageMaker is a fully managed service for building, training, and deploying ML models.
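One common way to spread training across multiple machines is sketched below with the SageMaker Python SDK; the training script, role ARN, and instance choices are assumptions, not a definitive setup.

```python
from sagemaker.pytorch import PyTorch

# Hedged sketch: 'train.py' and the role ARN are placeholders.
estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::123456789012:role/MyRole",
    framework_version="2.1",
    py_version="py310",
    instance_count=2,                  # two machines -> distributed training
    instance_type="ml.g5.12xlarge",
    distribution={"torch_distributed": {"enabled": True}},  # launch via torchrun
)
estimator.fit({"training": "s3://my-bucket/train-data"})  # placeholder channel
```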
MPII is using a machine learning (ML) bid optimization engine to inform upstream decision-making processes in power asset management and trading. This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. Data comes from disparate sources in a number of formats.
Understanding the MLOps Lifecycle: The MLOps lifecycle consists of several critical stages, each with its unique challenges. Data Ingestion: Collecting data from various sources and ensuring it’s available for analysis. Data Preparation: Cleaning and transforming raw data to make it usable for machine learning.
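A minimal, framework-agnostic sketch of those first two stages; the file path and column names are hypothetical.

```python
import numpy as np
import pandas as pd

def ingest(source: str) -> pd.DataFrame:
    """Data ingestion: collect raw records from a source system."""
    return pd.read_csv(source)

def prepare(raw: pd.DataFrame) -> pd.DataFrame:
    """Data preparation: clean and transform raw data for modeling."""
    clean = raw.dropna(subset=["amount"]).copy()
    clean["amount_log"] = np.log1p(clean["amount"].clip(lower=0))
    return clean

# features = prepare(ingest("transactions.csv"))  # hypothetical file
```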
Simple methods for time series forecasting use historical values of the same variable whose future values need to be predicted, whereas more complex, machine learning (ML)-based methods use additional information, such as the time series data of related variables. However, the notebook-based method is difficult to operationalize.
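A toy illustration of that contrast: a naive forecast that repeats the last observed value versus a regression that also uses a related variable. The numbers are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy history of the target variable plus one related variable.
y = np.array([10.0, 12.0, 13.0, 15.0, 16.0, 18.0])
related = np.array([1.0, 1.2, 1.3, 1.5, 1.6, 1.8])

# Simple method: the naive forecast repeats the last observed value.
naive_forecast = y[-1]

# ML-based method: regress y[t] on y[t-1] and the related series.
X = np.column_stack([y[:-1], related[:-1]])
model = LinearRegression().fit(X, y[1:])
ml_forecast = model.predict([[y[-1], related[-1]]])[0]

print(naive_forecast, ml_forecast)
```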
There is enormous potential to use machine learning (ML) for quality prediction. ML-based predictive quality at HAYAT HOLDING: HAYAT is the world’s fourth-largest branded baby diaper manufacturer and the largest paper tissue manufacturer in EMEA. Two types of data sources exist for this use case.
Introduction: In the rapidly evolving landscape of Machine Learning, Google Cloud’s Vertex AI stands out as a unified platform designed to streamline the entire Machine Learning (ML) workflow. This unified approach enables seamless collaboration among data scientists, data engineers, and ML engineers.
Unfortunately accessing data across various locations and file types and then operationalizing that data for AI usage has traditionally been a painfully manual, time-consuming, and costly process. Ahmad Khan, Head of AI/ML Strategy at Snowflake, discusses the challenges of operationalizing ML in a recent talk.
The PdMS includes AWS services to securely manage the lifecycle of edge compute devices and BHS assets, cloud data ingestion, storage, machine learning (ML) inference models, and business logic to power proactive equipment maintenance in the cloud. It’s an easy way to run analytics on IoT data to gain accurate insights.
Machine Learning Operations (MLOps) can significantly accelerate how data scientists and ML engineers meet organizational needs. A well-implemented MLOps process not only expedites the transition from testing to production but also offers ownership, lineage, and historical data about ML artifacts used within the team.
These models offer tremendous potential but also bring a unique set of challenges when it comes to building large-scale ML projects. Naturally, training a machine learning model (regardless of the problem being solved or the particular model architecture that was chosen) is a key part of every ML project. But what happens next?
Amazon SageMaker Serverless Inference is a purpose-built inference service that makes it easy to deploy and scale machine learning (ML) models. PyTorch is an open-source ML framework that accelerates the path from research prototyping to production deployment. The data ingestion for this practice should finish within 60 seconds.
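A hedged sketch of deploying a PyTorch model to a serverless endpoint with the SageMaker Python SDK; the model artifact, role ARN, and inference script are placeholders.

```python
from sagemaker.pytorch import PyTorchModel
from sagemaker.serverless import ServerlessInferenceConfig

# Placeholders: the artifact path, role ARN, and 'inference.py'
# are assumptions for illustration.
model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",
    role="arn:aws:iam::123456789012:role/MyRole",
    framework_version="2.1",
    py_version="py310",
    entry_point="inference.py",
)

# Serverless endpoints scale to zero and bill per invocation.
predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=2048,   # memory allocated per invocation
        max_concurrency=5,        # concurrent invocations before throttling
    )
)
```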
They can efficiently aggregate and process data over defined periods, making them ideal for identifying trends, anomalies, and correlations within the data. High-Volume Data Ingestion: TSDBs are built to handle large volumes of data coming in at high velocities. What are the Benefits of Using a Time Series Database?
Arranging Efficient Data Streams: Modern companies typically receive data from multiple sources. Therefore, quick data ingestion for instant use can be challenging. Machine Learning Integration Opportunities: Organizations harness machine learning (ML) algorithms to make forecasts on the data.
Data Engineering is one of the most productive job roles today because it combines both the skills required for software engineering and programming and the advanced analytics needed by Data Scientists. How to Become an Azure Data Engineer? Answer: PolyBase helps optimize data ingestion into PDW and supports T-SQL.
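A rough sketch of what PolyBase-based ingestion can look like, driven from Python via pyodbc; the connection string, external data source, file format, and table names are assumptions, and the external data source and file format must already exist.

```python
import pyodbc

# Placeholder connection string for a dedicated SQL pool / PDW appliance.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=mydw.example.com;"
    "DATABASE=SalesDW;UID=loader;PWD=secret"
)
cur = conn.cursor()

# PolyBase exposes files in external storage as a queryable T-SQL table...
cur.execute("""
CREATE EXTERNAL TABLE ext_sales (
    sale_id INT,
    amount  DECIMAL(10, 2)
)
WITH (LOCATION = '/sales/', DATA_SOURCE = HadoopDS, FILE_FORMAT = ParquetFF);
""")

# ...so ingestion into the warehouse becomes a parallel CTAS statement.
cur.execute("""
CREATE TABLE dbo.sales
WITH (DISTRIBUTION = HASH(sale_id))
AS SELECT * FROM ext_sales;
""")
conn.commit()
```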
The solution lies in systems that can handle high-throughput data ingestion while providing accurate, real-time insights. Waabi’s teams, running large-scale ML experiments, needed a way to organize and share their experiment data efficiently. Tools like neptune.ai address this need. Consider the challenges faced by Waabi and ReSpo.Vision.
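A minimal sketch of organizing experiment data with neptune.ai’s Python client; the project name, token, and logged values are placeholders.

```python
import neptune  # pip install neptune

# Placeholder project and token; real runs log to a shared workspace.
run = neptune.init_run(
    project="my-workspace/my-project",
    api_token="PLACEHOLDER_TOKEN",
)

run["parameters"] = {"lr": 1e-3, "batch_size": 64}

for epoch in range(3):
    # Log one metric value per step; teams can browse and compare runs.
    run["train/loss"].append(1.0 / (epoch + 1))

run.stop()
```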
In the upcoming part of this series, we will delve into advanced concepts of Airflow, including backfilling techniques and building an ETL pipeline in Airflow for data ingestion into Postgres and Google Cloud BigQuery. You have learned how to trigger a DAG in Airflow, create a DAG from scratch, and initiate its execution.
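As a preview, a minimal Airflow DAG with two Python tasks; the DAG ID, schedule, and task bodies are assumptions.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling rows from the source system")   # placeholder step

def load():
    print("writing rows to Postgres / BigQuery")   # placeholder step

with DAG(
    dag_id="etl_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,   # set True to backfill runs since start_date
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task   # extract runs before load
```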
One of the most prevalent complaints we hear from ML engineers in the community is how costly and error-prone it is to manually go through the ML workflow of building and deploying models. Building end-to-end machine learning pipelines lets ML engineers build once, rerun, and reuse many times. If all goes well, of course.
Complete ML model training pipeline workflow | Source. But before we delve into the step-by-step model training pipeline, it’s essential to understand the basics: the architecture, motivations, and challenges associated with ML pipelines, and a few tools that you will need to work with.
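A tiny, self-contained example of the build-once, rerun-many-times idea using a scikit-learn Pipeline; the dataset and steps are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Each stage (preprocessing, training) is a named, reusable step.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=200)),
])

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipeline.fit(X_train, y_train)           # build once...
print(pipeline.score(X_test, y_test))    # ...rerun and reuse many times
```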
There comes a time when every ML practitioner realizes that training a model in Jupyter Notebook is just one small part of the entire project. Getting a workflow ready which takes your data from its raw form to predictions while maintaining responsiveness and flexibility is the real deal.