Streamlining ETL data processing at Talent.com with Amazon SageMaker

AWS Machine Learning Blog

Our pipeline belongs to the general ETL (extract, transform, and load) process family, which combines data from multiple sources into a large, central repository. The system includes feature engineering, deep learning model architecture design, hyperparameter optimization, and model evaluation, and all modules run in Python.
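The extract, transform, and load stages described here can be sketched in a few lines of Python. This is a minimal illustration, not Talent.com's pipeline: the two sources (a CSV of job postings and a JSON list of click events), their field names, and the SQLite table `job_features` are all hypothetical stand-ins, with an in-memory SQLite database playing the role of the central repository.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw inputs standing in for two separate data sources.
JOBS_CSV = """job_id,title,city
1,Data Engineer,Montreal
2,ML Engineer,Toronto
"""
CLICKS_JSON = '[{"job_id": 1, "clicks": 10}, {"job_id": 2, "clicks": 4}, {"job_id": 1, "clicks": 5}]'


def extract() -> tuple[list[dict], list[dict]]:
    """Read both sources into plain Python records."""
    jobs = list(csv.DictReader(io.StringIO(JOBS_CSV)))
    clicks = json.loads(CLICKS_JSON)
    return jobs, clicks


def transform(jobs: list[dict], clicks: list[dict]) -> list[tuple[int, str, str, int]]:
    """Join the sources on job_id and sum clicks per job (a simple engineered feature)."""
    totals: dict[int, int] = {}
    for row in clicks:
        totals[row["job_id"]] = totals.get(row["job_id"], 0) + row["clicks"]
    return [
        (int(j["job_id"]), j["title"].strip(), j["city"].strip(), totals.get(int(j["job_id"]), 0))
        for j in jobs
    ]


def load(rows: list[tuple[int, str, str, int]], conn: sqlite3.Connection) -> None:
    """Write the combined rows into a single central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_features "
        "(job_id INTEGER PRIMARY KEY, title TEXT, city TEXT, total_clicks INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO job_features VALUES (?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(*extract()), conn)
    print(conn.execute("SELECT * FROM job_features ORDER BY job_id").fetchall())
```

Keeping each stage as a separate function means a source can be swapped, or the load target changed to a real warehouse, without touching the other two stages.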

How to Build a CI/CD MLOps Pipeline [Case Study]

The MLOps Blog

… Hence, the very first step is to make sure that the data being used is of high quality and that any errors or anomalies are detected and corrected before proceeding with ETL and data sourcing. If you aren’t already familiar with it, let’s introduce the concept of ETL. We primarily used ETL services offered by AWS.
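A data-quality gate that runs before ETL can be as simple as a set of per-field rules. The sketch below is an assumption-laden illustration, not the case study's actual checks: the `SCHEMA` fields (`user_id`, `age`, `salary`), their allowed types, and their ranges are hypothetical. It splits incoming rows into clean records and rejected ones, recording the reason for each rejection.

```python
from dataclasses import dataclass, field

# Hypothetical schema: field -> (accepted types, minimum or None, maximum or None)
SCHEMA = {
    "user_id": ((int,), 1, None),
    "age": ((int,), 0, 120),
    "salary": ((int, float), 0, None),
}


@dataclass
class ValidationReport:
    valid: list = field(default_factory=list)   # rows that passed every check
    errors: list = field(default_factory=list)  # (row index, reason) for rejected rows


def check_row(row: dict):
    """Return a reason string for the first failed check, or None if the row is clean."""
    for name, (types, lo, hi) in SCHEMA.items():
        value = row.get(name)
        if value is None:
            return f"missing {name}"
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            return f"{name} has unexpected type {type(value).__name__}"
        if (lo is not None and value < lo) or (hi is not None and value > hi):
            return f"{name}={value} out of range"
    return None


def validate(rows: list) -> ValidationReport:
    """Split rows into clean records and rejected ones before any ETL step runs."""
    report = ValidationReport()
    for i, row in enumerate(rows):
        reason = check_row(row)
        if reason is None:
            report.valid.append(row)
        else:
            report.errors.append((i, reason))
    return report
```

For example, validating `[{"user_id": 1, "age": 34, "salary": 72000.0}, {"user_id": 2, "age": 240, "salary": 50000}, {"user_id": 3, "salary": 1.0}]` keeps only the first row and reports `(1, "age=240 out of range")` and `(2, "missing age")`, so bad records can be corrected or quarantined instead of silently flowing into the pipeline.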

How to Build ETL Data Pipeline in ML

The MLOps Blog

Often the data team, comprising data and ML engineers, needs to build this infrastructure, and the experience can be painful. However, using ETL pipelines efficiently in ML can make their lives much easier. What is an ETL data pipeline in ML? Let’s look at the importance of ETL pipelines in detail.

Software Engineering Patterns for Machine Learning

The MLOps Blog

This situation is no different in the ML world. Data scientists and ML engineers typically write a great deal of code. These combinations of Python and SQL code play a crucial role, but keeping them robust over their entire lifetime can be challenging. Explore how these principles can elevate the quality of your ETL work.
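One widely used pattern for keeping mixed Python and SQL code robust is to isolate each query behind a small, typed function that takes its values as query parameters, so it can be unit-tested against a throwaway database. The sketch below assumes a hypothetical `events(user_id, event_date)` table and uses Python's built-in `sqlite3` module; it illustrates the general idea and is not taken from the article.

```python
import sqlite3
from typing import Iterable

# Keeping the SQL in one named constant makes it easy to review and reuse.
ACTIVE_USERS_SQL = """
SELECT user_id, COUNT(*) AS n_events
FROM events
WHERE event_date >= ?
GROUP BY user_id
HAVING COUNT(*) >= ?
ORDER BY user_id
"""


def active_users(conn: sqlite3.Connection, since: str, min_events: int) -> list[tuple[int, int]]:
    """Return (user_id, n_events) for users with at least min_events since an ISO date.

    Values are passed as query parameters instead of being formatted into the SQL
    string, which avoids injection bugs and quoting errors.
    """
    if min_events < 1:
        raise ValueError("min_events must be >= 1")
    return conn.execute(ACTIVE_USERS_SQL, (since, min_events)).fetchall()


def make_test_db(rows: Iterable[tuple[int, str]]) -> sqlite3.Connection:
    """Build a throwaway in-memory database so the query can be unit-tested."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, event_date TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    db = make_test_db([(1, "2024-01-02"), (1, "2024-01-05"), (2, "2024-01-03"), (3, "2023-12-30")])
    print(active_users(db, since="2024-01-01", min_events=2))  # [(1, 2)]
```

Because the date column holds ISO-formatted strings, plain string comparison in `WHERE event_date >= ?` orders them correctly; a real warehouse would more likely use a native date type.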

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

AWS Machine Learning Blog

In addition to the challenge of defining the features for the ML model, it’s critical to automate the feature generation process so that we can get ML features from the raw data for ML inference and model retraining. Because most of the games share similar log types, they want to reuse this ML solution for other games.
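Automated, reusable feature generation can be approached by driving one feature function from a per-game configuration. The sketch below is hypothetical throughout: the log fields (`player_id`, `event`, `amount`), the event names, and `GAME_CONFIG` are illustrative assumptions, and it runs in plain Python rather than on Amazon SageMaker or AWS Glue.

```python
from collections import defaultdict

# Hypothetical raw log records; the field names are assumptions for illustration.
LOGS = [
    {"player_id": "p1", "event": "login", "amount": 0.0},
    {"player_id": "p1", "event": "purchase", "amount": 4.99},
    {"player_id": "p2", "event": "login", "amount": 0.0},
    {"player_id": "p1", "event": "purchase", "amount": 9.99},
    {"player_id": "p2", "event": "level_up", "amount": 0.0},
]

# Hypothetical per-game config: which event types to count and which one carries revenue.
# Another game with similar log types could reuse the same code with a different config.
GAME_CONFIG = {"count_events": ["login", "purchase", "level_up"], "revenue_event": "purchase"}


def build_features(logs: list[dict], config: dict) -> dict[str, dict[str, float]]:
    """Turn raw log rows into one feature dict per player."""
    # Every player starts with zero counts for each configured event plus zero revenue.
    features: dict[str, dict[str, float]] = defaultdict(
        lambda: {**{f"n_{e}": 0.0 for e in config["count_events"]}, "revenue": 0.0}
    )
    for row in logs:
        feats = features[row["player_id"]]
        key = f"n_{row['event']}"
        if key in feats:  # ignore event types the config does not ask for
            feats[key] += 1
        if row["event"] == config["revenue_event"]:
            feats["revenue"] += row["amount"]
    return dict(features)
```

Calling the same `build_features` both when assembling a retraining dataset and when scoring new players keeps the two paths consistent, which is the point of automating the feature step.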

Use mobility data to derive insights using Amazon SageMaker geospatial capabilities

AWS Machine Learning Blog

The GPU-powered interactive visualizer and Python notebooks provide a seamless way to explore millions of data points in a single window and share insights and results. We can analyze activities by identifying the stops a user or mobile device makes, clustering pings with ML models in Amazon SageMaker.
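To make the idea of turning pings into stops concrete, the sketch below swaps in a simple, commonly used time-and-distance heuristic instead of the ML clustering models mentioned above: consecutive pings that stay within `radius_m` of the group's first ping for at least `min_duration_s` count as one stop. The `Ping` type and both thresholds are hypothetical choices for illustration, not SageMaker geospatial APIs.

```python
import math
from dataclasses import dataclass


@dataclass
class Ping:
    t: float    # timestamp in seconds
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def haversine_m(a: Ping, b: Ping) -> float:
    """Great-circle distance between two pings in meters."""
    r = 6_371_000.0  # mean Earth radius in meters
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def detect_stops(pings: list[Ping], radius_m: float = 100.0, min_duration_s: float = 300.0):
    """Group consecutive pings that stay within radius_m of the group's first ping.

    A group lasting at least min_duration_s is reported as a stop:
    (start_time, end_time, mean_lat, mean_lon). Pings must be sorted by time.
    """
    stops = []
    i = 0
    while i < len(pings):
        j = i + 1
        # Extend the group while pings stay close to its anchor (first) ping.
        while j < len(pings) and haversine_m(pings[i], pings[j]) <= radius_m:
            j += 1
        group = pings[i:j]
        if group[-1].t - group[0].t >= min_duration_s:
            stops.append((
                group[0].t,
                group[-1].t,
                sum(p.lat for p in group) / len(group),
                sum(p.lon for p in group) / len(group),
            ))
        i = j  # j > i, so the loop always advances
    return stops
```

A rule like this is easy to explain and tune, while a clustering model can cope better with noisy GPS traces and irregular ping rates; the output shape (start, end, centroid) is the same either way.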

How to Build Machine Learning Systems With a Feature Store

The MLOps Blog

We’ll see how this architecture applies to different classes of ML systems, discuss MLOps and testing aspects, and look at some example implementations.
Understanding machine learning pipelines
Machine learning (ML) pipelines are a key component of ML systems. But what is an ML pipeline? All of them are written in Python.
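One way to picture ML pipelines built around a feature store is as three separate programs that communicate only through the store: a feature pipeline that writes features, a training pipeline that reads them to fit a model, and an inference pipeline that reads the same features at prediction time. The sketch below assumes that split; `InMemoryFeatureStore` is a hypothetical toy, not the API of any feature store product, and the "model" is a deliberately trivial threshold.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryFeatureStore:
    """Toy stand-in for a feature store: entity_id -> latest feature dict."""
    rows: dict[str, dict[str, float]] = field(default_factory=dict)

    def write(self, entity_id: str, features: dict[str, float]) -> None:
        self.rows[entity_id] = features

    def read(self, entity_ids: list[str]) -> list[dict[str, float]]:
        return [self.rows[e] for e in entity_ids]


def feature_pipeline(raw: dict[str, list[float]], store: InMemoryFeatureStore) -> None:
    """Compute features from raw data and write them to the store."""
    for entity_id, values in raw.items():
        store.write(entity_id, {"mean": sum(values) / len(values), "count": float(len(values))})


def training_pipeline(store: InMemoryFeatureStore, entity_ids: list[str], labels: list[int]) -> float:
    """Read features from the store and fit a trivial model.

    The "model" is a threshold on the mean feature: the midpoint between class means.
    """
    feats = store.read(entity_ids)
    pos = [f["mean"] for f, y in zip(feats, labels) if y == 1]
    neg = [f["mean"] for f, y in zip(feats, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def inference_pipeline(store: InMemoryFeatureStore, entity_ids: list[str], threshold: float) -> list[int]:
    """Read the same features at prediction time and apply the trained model."""
    return [int(f["mean"] >= threshold) for f in store.read(entity_ids)]
```

For example, with `raw = {"a": [1, 2, 3], "b": [10, 12], "c": [2, 2], "d": [11, 9]}` and labels `[0, 1, 0, 1]`, training yields a threshold of `6.25`, and inference on `["a", "b"]` returns `[0, 1]`. Because training and inference read identical feature definitions from the store, the two paths cannot drift apart.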