
Build an image search engine with Amazon Kendra and Amazon Rekognition

AWS Machine Learning Blog

The following figure shows an example diagram that illustrates an orchestrated extract, transform, and load (ETL) architecture solution. For example, searching for the terms “How to orchestrate ETL pipeline” returns results of architecture diagrams built with AWS Glue and AWS Step Functions.

Metadata 101

How to Build a CI/CD MLOps Pipeline [Case Study]

The MLOps Blog

In the case of our CI/CD-MLOps system, we stored the model versions and metadata in the data storage services offered by AWS, i.e. … Hence, the very first thing to do is to make sure that the data being used is of high quality and that any errors or anomalies are detected and corrected before proceeding with ETL and data sourcing.
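The data-quality gate described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the function name `validate_records`, the field names, and the rule that a missing or `None` value counts as an anomaly are all assumptions made for the example.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def validate_records(
    records: List[Record], required: List[str]
) -> Tuple[List[Record], List[Tuple[Record, str]]]:
    """Split records into valid ones and rejected ones with a reason.

    Assumption for this sketch: a record is anomalous when any required
    field is absent or None. Real pipelines usually add range, type, and
    duplicate checks on top of this.
    """
    valid: List[Record] = []
    rejected: List[Tuple[Record, str]] = []
    for record in records:
        missing = [name for name in required if record.get(name) is None]
        if missing:
            rejected.append((record, "missing fields: " + ", ".join(missing)))
        else:
            valid.append(record)
    return valid, rejected


# Hypothetical input rows; only the first one passes the check.
rows = [
    {"id": 1, "label": 0},
    {"id": 2, "label": None},
    {"id": 3},
]
clean_rows, bad_rows = validate_records(rows, required=["id", "label"])
```

Only `clean_rows` would then be handed to the ETL step, while `bad_rows` keeps the reason for each rejection so the errors can be corrected at the source.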

ETL 52


Integrate SaaS platforms with Amazon SageMaker to enable ML-powered applications

AWS Machine Learning Blog

The open-source Custom Connector SDK enables the development of a private, shared, or public connector using Python or Java. SaaS platform SDK – If the SaaS platform has an SDK (Software Development Kit), such as a Python SDK, this can be used to access data directly from a SageMaker notebook.

ML 75

How to Version Control Data in ML for Various Data Sources

The MLOps Blog

More useful resources about DVC: Versioning data and models; Data version control with Python and DVC; DVCorg YouTube; DVC data version control cheatsheet. At this point, one question arises: why use DVC instead of Git? It provides options for tracking, organizing, and storing metadata from machine learning experiments. neptune.ai

ML 52

The Full Stack Data Scientist Part 6: Automation with Airflow

Applied Data Science

To keep myself sane, I use Airflow to automate tasks with simple, reusable pieces of code for frequently repeated elements of projects, for example: web scraping, ETL, database management, feature building and data validation, and much more! Note that we can use the core Python package datetime to help us define our DAGs.
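As a rough illustration of how datetime feeds into a DAG definition, the sketch below builds the scheduling values with the standard library only. It does not import Airflow: the dictionary keys mirror commonly used Airflow default arguments (owner, start_date, retries, retry_delay), while the owner name, the dates, and the helper `logical_dates` are assumptions made for this example.

```python
from datetime import datetime, timedelta
from typing import List

# Values of this shape are typically passed to an Airflow DAG as
# default arguments; the concrete values here are hypothetical.
default_args = {
    "owner": "data-team",                 # hypothetical owner name
    "start_date": datetime(2023, 1, 1),   # first logical date of the DAG
    "retries": 1,                         # retry a failed task once
    "retry_delay": timedelta(minutes=5),  # wait before retrying
}

# A daily schedule expressed as a timedelta.
schedule_interval = timedelta(days=1)


def logical_dates(start: datetime, interval: timedelta, count: int) -> List[datetime]:
    """Return the first `count` logical dates implied by a fixed interval.

    Helper written for this sketch only; it is not part of Airflow.
    """
    return [start + i * interval for i in range(count)]


first_three = logical_dates(default_args["start_date"], schedule_interval, 3)
```

Keeping the start date and interval as datetime and timedelta objects, rather than strings, lets the same values be reused for arithmetic such as backfill windows.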


How to Build Machine Learning Systems With a Feature Store

The MLOps Blog

A feature store typically comprises a feature repository, a feature serving layer, and a metadata store. The metadata store manages the metadata associated with each feature, such as its origin and transformations. All of them are written in Python. It can also transform incoming data on the fly.
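The three components named in this excerpt can be sketched as a toy in-memory class. This is an illustration under stated assumptions, not the design of any particular feature store product: `MiniFeatureStore`, `FeatureMetadata`, the feature `spend_usd`, and the origin `orders_table` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FeatureMetadata:
    """Metadata store entry: where a feature comes from and how it is derived."""
    origin: str
    transformation: str


class MiniFeatureStore:
    """Toy feature store with a repository, a serving method, and metadata."""

    def __init__(self) -> None:
        # Feature repository: (entity id, feature name) -> stored value.
        self._repository: Dict[Tuple[str, str], Any] = {}
        # Metadata store: feature name -> origin and transformation description.
        self._metadata: Dict[str, FeatureMetadata] = {}
        # Transform functions applied to raw values at ingestion time.
        self._transforms: Dict[str, Callable[[Any], Any]] = {}

    def register(self, name: str, origin: str, transformation: str,
                 transform: Callable[[Any], Any]) -> None:
        self._metadata[name] = FeatureMetadata(origin, transformation)
        self._transforms[name] = transform

    def ingest(self, entity_id: str, name: str, raw_value: Any) -> None:
        # Transform incoming data on the fly before storing it.
        self._repository[(entity_id, name)] = self._transforms[name](raw_value)

    def serve(self, entity_id: str, names: List[str]) -> Dict[str, Any]:
        # Serving layer: return the requested features for one entity.
        return {name: self._repository[(entity_id, name)] for name in names}

    def describe(self, name: str) -> FeatureMetadata:
        return self._metadata[name]


store = MiniFeatureStore()
store.register(
    "spend_usd",
    origin="orders_table",
    transformation="cents / 100",
    transform=lambda cents: cents / 100,
)
store.ingest("user-1", "spend_usd", 1250)
features = store.serve("user-1", ["spend_usd"])
```

A production system would back the repository with offline and online storage and version the metadata; the sketch only shows how the three roles relate.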


How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

AWS Machine Learning Blog

To solve this problem, we build an extract, transform, and load (ETL) pipeline that can be run automatically and repeatedly for training and inference dataset creation. The ETL pipeline, MLOps pipeline, and ML inference should be rebuilt in a different AWS account. But there is still an engineering challenge.