Data Scientist, ETL and ML Engineer - Artificial Intelligence Zone

Streamlining ETL data processing at Talent.com with Amazon SageMaker

AWS Machine Learning Blog

DECEMBER 14, 2023

Our pipeline belongs to the general ETL (extract, transform, and load) process family that combines data from multiple sources into a large, central repository. This post shows how we used SageMaker to build a large-scale data processing pipeline for preparing features for the job recommendation engine at Talent.com.

ETL

ETL Data Scientist Machine Learning Deep Learning

How Rocket Companies modernized their data science solution on AWS

AWS Machine Learning Blog

FEBRUARY 21, 2025

This also led to a backlog of data that needed to be ingested. Steep learning curve for data scientists: Many of Rockets data scientists did not have experience with Spark, which had a more nuanced programming model compared to other popular ML solutions like scikit-learn.

Data Science

Data Science Data Scientist Data Ingestion DevOps

How to Build ETL Data Pipeline in ML

The MLOps Blog

MAY 17, 2023

Often the Data Team, comprising Data and ML Engineers , needs to build this infrastructure, and this experience can be painful. However, efficient use of ETL pipelines in ML can help make their life much easier. What is an ETL data pipeline in ML?

ETL

ETL ML Machine Learning Data Scientist

Webinars

The Intersection of AI and Sales: Personalization Without Compromise

How to Achieve High-Accuracy Results When Using LLMs

Relevance, Reach, Revenue: How to Turn Marketing Trends From Hype to High-Impact

MORE WEBINARS

Working as a Data Scientist?—?expectation versus reality!

Mlearning.ai

FEBRUARY 9, 2023

Working as a Data Scientist — Expectation versus Reality! 11 key differences in 2023 Photo by Jan Tinneberg on Unsplash Working in Data Science and Machine Learning (ML) professions can be a lot different from the expectation of it. You could be working entirely on data analytics under a Data Scientist job title.

Data Scientist

Data Scientist Data Science ML Machine Learning

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

AWS Machine Learning Blog

MARCH 1, 2023

In addition to the challenge of defining the features for the ML model, it’s critical to automate the feature generation process so that we can get ML features from the raw data for ML inference and model retraining. Because most of the games share similar log types, they want to reuse this ML solution to other games.

Automation

Automation ETL Data Drift ML

Software Engineering Patterns for Machine Learning

The MLOps Blog

SEPTEMBER 7, 2023

This situation is not different in the ML world. Data Scientists and ML Engineers typically write lots and lots of code. Building a mental model for ETL components Learn the art of constructing a mental representation of the components within an ETL process.

Software Engineer

Software Engineer Machine Learning ETL Data Scientist

Build an Amazon SageMaker Model Registry approval and promotion workflow with human intervention

AWS Machine Learning Blog

JANUARY 10, 2024

Specialist Data Engineering at Merck, and Prabakaran Mathaiyan, Sr. ML Engineer at Tiger Analytics. The large machine learning (ML) model development lifecycle requires a scalable model release process similar to that of software development. This post is co-written with Jayadeep Pabbisetty, Sr.

ML

ML Machine Learning Data Scientist ETL

Use mobility data to derive insights using Amazon SageMaker geospatial capabilities

AWS Machine Learning Blog

JANUARY 17, 2024

To obtain such insights, the incoming raw data goes through an extract, transform, and load (ETL) process to identify activities or engagements from the continuous stream of device location pings. We can analyze activities by identifying stops made by the user or mobile device by clustering pings using ML models in Amazon SageMaker.

ETL

ETL ML Machine Learning Data Scientist

Bring your own AI using Amazon SageMaker with Salesforce Data Cloud

AWS Machine Learning Blog

AUGUST 4, 2023

Introducing Einstein Studio on Data Cloud Data Cloud is a data platform that provides businesses with real-time updates of their customer data from any touch point. With Einstein Studio, a gateway to AI tools on the data platform, admins and data scientists can effortlessly create models with a few clicks or using code.

Data Scientist

Data Scientist ML ETL Data Platform

Modernizing data science lifecycle management with AWS and Wipro

AWS Machine Learning Blog

JANUARY 5, 2024

We can then examine how the integrated SageMaker AI/ML offerings helped solve those challenges. Collaboration – Data scientists each worked on their own local Jupyter notebooks to create and train ML models. They lacked an effective method for sharing and collaborating with other data scientists.

Data Science

Data Science Data Drift DevOps Auto-complete

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

These teams are as follows: Advanced analytics team (data lake and data mesh) – Data engineers are responsible for preparing and ingesting data from multiple sources, building ETL (extract, transform, and load) pipelines to curate and catalog the data, and prepare the necessary historical data for the ML use cases.

Generative AI

Generative AI Prompt Engineering Prompt Engineer ML

How to Build a CI/CD MLOps Pipeline [Case Study]

The MLOps Blog

MARCH 15, 2023

Collaboration : Ensuring that all teams involved in the project, including data scientists, engineers, and operations teams, are working together effectively. Costs: Oftentimes, cost is the most important aspect of any ML model deployment. One Data Engineer: Cloud database integration with our cloud expert.

ETL

ETL Data Drift Machine Learning ML

Arize AI on How to apply and use machine learning observability

Snorkel AI

JUNE 30, 2023

You have to make sure that your ETLs are locked down. And usually what ends up happening is that some poor data scientist or ML engineer has to manually troubleshoot this in a Jupyter Notebook. So this path on the right side of the production icon is what we’re calling ML observability.

Machine Learning

Machine Learning ML Data Drift Data Quality

Arize AI on How to apply and use machine learning observability

Snorkel AI

JUNE 30, 2023

You have to make sure that your ETLs are locked down. And usually what ends up happening is that some poor data scientist or ML engineer has to manually troubleshoot this in a Jupyter Notebook. So this path on the right side of the production icon is what we’re calling ML observability.

Machine Learning

Machine Learning ML Data Drift Data Quality

How to Use Exploratory Notebooks [Best Practices]

The MLOps Blog

OCTOBER 20, 2023

Nevertheless, many data scientists will agree that they can be really valuable – if used well. And that’s what we’re going to focus on in this article, which is the second in my series on Software Patterns for Data Science & ML Engineering. Data on its own is not sufficient for a cohesive story.

Data Scientist

Data Scientist Python Explainability ETL

Arize AI on How to apply and use machine learning observability

Snorkel AI

JUNE 30, 2023

You have to make sure that your ETLs are locked down. And usually what ends up happening is that some poor data scientist or ML engineer has to manually troubleshoot this in a Jupyter Notebook. So this path on the right side of the production icon is what we’re calling ML observability.

Machine Learning

Machine Learning ML Data Drift Data Quality

Learnings From Building the ML Platform at Stitch Fix

The MLOps Blog

AUGUST 3, 2023

This is Piotr Niedźwiedź and Aurimas Griciūnas from neptune.ai , and you’re listening to ML Platform Podcast. Stefan is a software engineer, data scientist, and has been doing work as an ML engineer. To a junior data scientist, it doesn’t matter if you’re using Airflow, Prefect , Dexter.

ML

ML Data Scientist Software Engineer Machine Learning

Real-World MLOps Examples: End-To-End MLOps Pipeline for Visual Search at Brainly

The MLOps Blog

MARCH 28, 2023

This brings interpersonal challenges, and the AI/ML teams are encouraged to build good relationships with clients to help support the models by telling people how to use the solution instead of just exposing the endpoint without documentation or telling them how. For them, it works surprisingly well, considering how low the complexity is. “We

Machine Learning

Machine Learning Data Scientist Automation ML

Artificial Intelligence Zone

Streamlining ETL data processing at Talent.com with Amazon SageMaker

How Rocket Companies modernized their data science solution on AWS

Webinars

Trending Sources

How to Build ETL Data Pipeline in ML

Webinars

Working as a Data Scientist?—?expectation versus reality!

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

Software Engineering Patterns for Machine Learning

Build an Amazon SageMaker Model Registry approval and promotion workflow with human intervention

Use mobility data to derive insights using Amazon SageMaker geospatial capabilities

Bring your own AI using Amazon SageMaker with Salesforce Data Cloud

Modernizing data science lifecycle management with AWS and Wipro

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

How to Build a CI/CD MLOps Pipeline [Case Study]

Arize AI on How to apply and use machine learning observability

Arize AI on How to apply and use machine learning observability

How to Use Exploratory Notebooks [Best Practices]

Arize AI on How to apply and use machine learning observability

Learnings From Building the ML Platform at Stitch Fix

Real-World MLOps Examples: End-To-End MLOps Pipeline for Visual Search at Brainly

Stay Connected