Our pipeline belongs to the general ETL (extract, transform, and load) process family, which combines data from multiple sources into a large, central repository. The solution does not require porting the feature extraction code to PySpark, as would be required when using AWS Glue as the ETL solution.
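As a minimal, hypothetical sketch of that idea, feature extraction kept in plain pandas can run unchanged inside or outside a Spark-based ETL service; the log schema and feature names below are illustrative assumptions, not details from the original solution.

```python
import pandas as pd

def extract_features(logs: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw event logs into per-user features.

    Assumes (hypothetically) columns: user_id, event_type, timestamp.
    """
    logs = logs.copy()
    logs["timestamp"] = pd.to_datetime(logs["timestamp"])
    features = logs.groupby("user_id").agg(
        n_events=("event_type", "size"),
        n_event_types=("event_type", "nunique"),
        first_seen=("timestamp", "min"),
        last_seen=("timestamp", "max"),
    )
    # Derived feature: span of activity in days, inclusive of both endpoints.
    features["active_days"] = (
        (features["last_seen"] - features["first_seen"]).dt.days + 1
    )
    return features.drop(columns=["first_seen", "last_seen"])
```

Because the function depends only on pandas, the same code can back a batch ETL job and an online inference path without a PySpark port.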
Data refinement: Raw data is refined into consumable layers (raw, processed, conformed, and analytical) using a combination of AWS Glue extract, transform, and load (ETL) jobs and EMR jobs. Deployment times stretched for months and required a team of three system engineers and four ML engineers to keep everything running smoothly.
Often the Data Team, comprising Data and ML Engineers, needs to build this infrastructure, and this experience can be painful. However, efficient use of ETL pipelines in ML can help make their life much easier. What is an ETL data pipeline in ML? Let’s look at the importance of ETL pipelines in detail.
In addition to the challenge of defining the features for the ML model, it’s critical to automate the feature generation process so that we can get ML features from the raw data for ML inference and model retraining. Because most of the games share similar log types, they want to reuse this ML solution for other games.
To obtain such insights, the incoming raw data goes through an extract, transform, and load (ETL) process to identify activities or engagements from the continuous stream of device location pings. We can analyze activities by identifying stops made by the user or mobile device, clustering pings with ML models in Amazon SageMaker.
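As a hedged illustration of the ping-clustering idea (not the specific SageMaker model from the post), a density-based method such as DBSCAN can group nearby pings into candidate stops; the radius, ping threshold, and column layout below are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def label_stops(lat_deg: np.ndarray, lon_deg: np.ndarray,
                radius_m: float = 100.0, min_pings: int = 5) -> np.ndarray:
    """Cluster location pings into stops.

    Returns one cluster label per ping; -1 marks noise, i.e. pings that
    fall in no dense cluster and likely correspond to movement.
    """
    # Haversine distances in scikit-learn expect (lat, lon) in radians.
    coords_rad = np.radians(np.column_stack([lat_deg, lon_deg]))
    db = DBSCAN(
        eps=radius_m / EARTH_RADIUS_M,  # angular radius in radians
        min_samples=min_pings,
        metric="haversine",
    )
    return db.fit_predict(coords_rad)
```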
This situation is no different in the ML world. Data Scientists and ML Engineers typically write lots and lots of code. Building a mental model for ETL components: Learn the art of constructing a mental representation of the components within an ETL process.
This post is co-written with Jayadeep Pabbisetty, Sr. Specialist Data Engineering at Merck, and Prabakaran Mathaiyan, Sr. ML Engineer at Tiger Analytics. The large machine learning (ML) model development lifecycle requires a scalable model release process similar to that of software development.
It eliminates tedious, costly, and error-prone ETL (extract, transform, and load) jobs. SageMaker Projects provides a straightforward way to set up and standardize the development environment for data scientists and ML engineers to build and deploy ML models on SageMaker.
📢 Event: apply(risk), the ML Engineering Community Conference for Building Risk & Fraud Detection Systems. Want to connect with the ML engineering community and learn best practices from ML practitioners at Affirm, Remitly, Block, Tide, and more, on how to build risk and fraud detection systems?
Hence the very first thing to do is to make sure that the data being used is of high quality and that any errors or anomalies are detected and corrected before proceeding with ETL and data sourcing. If you aren’t aware already, let’s introduce the concept of ETL. We primarily used ETL services offered by AWS.
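For readers meeting ETL for the first time, a minimal, hypothetical job in Python might look like the sketch below; the source CSV, quality rules, and target table are illustrative assumptions, not the AWS services mentioned above.

```python
import sqlite3
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    # Extract: read raw records from a source system (here, a CSV file).
    return pd.read_csv(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Transform: basic data-quality checks before anything downstream runs.
    df = df.drop_duplicates()
    df = df.dropna(subset=["id"])              # reject rows missing a key
    df["amount"] = df["amount"].clip(lower=0)  # correct impossible values
    return df

def load(df: pd.DataFrame, db_path: str) -> None:
    # Load: write the refined records to the target store.
    with sqlite3.connect(db_path) as conn:
        df.to_sql("transactions", conn, if_exists="replace", index=False)

if __name__ == "__main__":
    load(transform(extract("raw_transactions.csv")), "warehouse.db")
```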
With experience leading AWS AI/ML solutions across industries, Bhajandeep has enabled clients to maximize the value of AWS AI/ML services through his expertise and leadership. Ajay Vishwakarma is an ML engineer for the AWS wing of Wipro’s AI solution practice.
We’ll see how this architecture applies to different classes of ML systems, discuss MLOps and testing aspects, and look at some example implementations. Understanding machine learning pipelines: Machine learning (ML) pipelines are a key component of ML systems. But what is an ML pipeline?
You have to make sure that your ETLs are locked down. And usually what ends up happening is that some poor data scientist or ML engineer has to manually troubleshoot this in a Jupyter Notebook. So this path on the right side of the production icon is what we’re calling ML observability.
While dealing with larger quantities of data, you will likely be working with Data Engineers to create ETL (extract, transform, load) pipelines to get data from new sources. Data Science is an umbrella role with common roles such as Data Analytics, research, ML model building, MLOps, and ML Engineering underneath.
And that’s what we’re going to focus on in this article, which is the second in my series on Software Patterns for Data Science & ML Engineering. While often ignored by data scientists, I believe mastering ETL is core and critical to guaranteeing the success of any machine learning project.
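As one possible sketch of such a pattern (my hedged illustration, not necessarily the design the series presents), ETL steps can be modeled as small composable objects behind a shared interface:

```python
from abc import ABC, abstractmethod
import pandas as pd

class ETLStep(ABC):
    """Common interface every transform step implements."""
    @abstractmethod
    def run(self, df: pd.DataFrame) -> pd.DataFrame:
        ...

class DropDuplicates(ETLStep):
    def run(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.drop_duplicates()

class FillMissing(ETLStep):
    def __init__(self, column: str, value):
        self.column, self.value = column, value

    def run(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.fillna({self.column: self.value})

class Pipeline:
    """Chains steps so each one consumes the previous step's output."""
    def __init__(self, steps: list[ETLStep]):
        self.steps = steps

    def run(self, df: pd.DataFrame) -> pd.DataFrame:
        for step in self.steps:
            df = step.run(df)
        return df

# Usage: Pipeline([DropDuplicates(), FillMissing("price", 0.0)]).run(df)
```

Keeping each step tiny and testable is the point of the pattern: steps can be unit-tested in isolation and recombined per project.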
Other teams may approach testing ML models differently, especially in tabular ML use cases, by testing on sub-populations of the data. It’s a healthy situation when data scientists and ML engineers, in particular, are responsible for delivering tests for the functionalities of their projects.
This is Piotr Niedźwiedź and Aurimas Griciūnas from neptune.ai, and you’re listening to ML Platform Podcast. Stefan is a software engineer, data scientist, and has been doing work as an ML engineer. Jeff Magnusson has a pretty famous post about how engineers shouldn’t write ETL. Stefan: Yeah.
These teams are as follows: Advanced analytics team (data lake and data mesh) – Data engineers are responsible for preparing and ingesting data from multiple sources, building ETL (extract, transform, and load) pipelines to curate and catalog the data, and preparing the necessary historical data for the ML use cases.
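A hedged sketch of what one such AWS Glue ETL job could look like follows; the database, table, field, and S3 path are placeholder assumptions, while the imports and calls follow the standard Glue PySpark job boilerplate.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a cataloged source table (hypothetical database/table names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_zone", table_name="events"
)

# Curate: drop malformed records before they reach the processed layer.
curated = source.filter(lambda rec: rec["event_id"] is not None)

# Write the curated data back to S3 as Parquet (hypothetical path).
glue_context.write_dynamic_frame.from_options(
    frame=curated,
    connection_type="s3",
    connection_options={"path": "s3://my-datalake/processed/events/"},
    format="parquet",
)

job.commit()
```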
He helps architect solutions across AI/ML applications, enterprise data platforms, data governance, and unified search in enterprises. Gi Kim is a Data & ML Engineer with the AWS Professional Services team, helping customers build data analytics solutions and AI/ML applications.