Create SageMaker Pipelines for training, consuming and monitoring your batch use cases

AWS Machine Learning Blog

If the model performs acceptably according to the evaluation criteria, the pipeline continues with a step to baseline the data using a built-in SageMaker Pipelines step. For the data drift Model Monitor type, the baselining step uses a SageMaker managed container image to generate statistics and constraints based on your training data.
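
A minimal sketch of what this baselining step can look like with the SageMaker Python SDK's QualityCheckStep; the role ARN, bucket, and dataset paths below are placeholders rather than values from the article:

```python
from sagemaker.model_monitor.dataset_format import DatasetFormat
from sagemaker.workflow.check_job_config import CheckJobConfig
from sagemaker.workflow.quality_check_step import (
    DataQualityCheckConfig,
    QualityCheckStep,
)

# Processing-job settings for the SageMaker-managed Model Monitor container
check_job_config = CheckJobConfig(
    role="<execution-role-arn>",  # placeholder
    instance_count=1,
    instance_type="ml.c5.xlarge",
)

# Point the baselining job at the training data produced earlier in the pipeline
data_quality_check_config = DataQualityCheckConfig(
    baseline_dataset="s3://<bucket>/train/train.csv",  # placeholder
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://<bucket>/monitoring/data-quality-baseline",  # placeholder
)

# skip_check=True with register_new_baseline=True registers fresh statistics and
# constraints instead of evaluating the data against a previous baseline
baseline_step = QualityCheckStep(
    name="DataQualityBaseline",
    skip_check=True,
    register_new_baseline=True,
    quality_check_config=data_quality_check_config,
    check_job_config=check_job_config,
)
```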

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

Databricks is a cloud-native platform for big data processing, machine learning, and analytics built using the Data Lakehouse architecture. Delta Lake is an open-source storage layer that provides reliability, ACID transactions, and data versioning for big data processing frameworks such as Apache Spark.
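
As a quick illustration of the ACID writes and data versioning described above, here is a small PySpark sketch; the table path and sample rows are made up, and it assumes the delta-spark package is installed:

```python
from pyspark.sql import SparkSession

# Spark session configured for Delta Lake
spark = (
    SparkSession.builder.appName("delta-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

# Write a DataFrame as a Delta table; the transaction log provides ACID guarantees
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.format("delta").mode("overwrite").save("/tmp/events_delta")  # placeholder path

# Data versioning ("time travel"): read the table as of an earlier version
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/events_delta")
v0.show()
```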

Schedule Amazon SageMaker notebook jobs and manage multi-step notebook workflows using APIs

AWS Machine Learning Blog

For instance, a notebook that monitors for model data drift should have a pre-step that performs extract, transform, and load (ETL) processing of new data, and a post-step that refreshes and retrains the model if significant drift is detected. Run the notebooks: the sample code for this solution is available on GitHub.
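
A rough sketch of how that pre-step/post-step chain could be expressed with the SageMaker Python SDK's NotebookJobStep; the notebook filenames, image URI, and role are placeholders, and the actual sample in the GitHub repository may be structured differently:

```python
from sagemaker.workflow.notebook_job_step import NotebookJobStep
from sagemaker.workflow.pipeline import Pipeline

# Shared settings for all notebook jobs (all values are placeholders)
nb_job_params = dict(
    role="<execution-role-arn>",
    image_uri="<sagemaker-distribution-image-uri>",
    kernel_name="python3",
    instance_type="ml.m5.xlarge",
)

etl_step = NotebookJobStep(name="ETL", input_notebook="etl.ipynb", **nb_job_params)
drift_step = NotebookJobStep(name="MonitorDrift", input_notebook="monitor_drift.ipynb", **nb_job_params)
retrain_step = NotebookJobStep(name="RefreshModel", input_notebook="retrain.ipynb", **nb_job_params)

# Chain the notebooks: ETL runs first, then drift monitoring, then model refresh
drift_step.add_depends_on([etl_step])
retrain_step.add_depends_on([drift_step])

pipeline = Pipeline(name="notebook-drift-workflow", steps=[etl_step, drift_step, retrain_step])
pipeline.upsert(role_arn="<execution-role-arn>")
pipeline.start()
```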

Capital One’s data-centric solutions to banking business challenges

Snorkel AI

Kishore will then take a closer look at some of the opportunities we find here at Capital One, and Bayan will close with one of our open-source solutions, which is an important contribution to the data-centric AI community. Compute, big data, large commoditized models: all important stages.

MLOps for batch inference with model monitoring and retraining using Amazon SageMaker, HashiCorp Terraform, and GitLab CI/CD

AWS Machine Learning Blog

The batch inference pipeline includes steps for checking data quality against a baseline created by the training pipeline, as well as model quality (model performance) if ground truth labels are available. If the batch inference pipeline discovers data quality issues, it will notify the responsible data scientist via Amazon SNS.
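
A hedged sketch of what that notification might look like with boto3; the topic ARN, pipeline name, and report location are placeholders, not the article's actual resources:

```python
import boto3

sns = boto3.client("sns")

def notify_data_quality_issue(topic_arn: str, pipeline_name: str, violations_s3_uri: str) -> None:
    """Alert the responsible data scientist when the batch inference pipeline
    detects data quality violations against the training baseline."""
    sns.publish(
        TopicArn=topic_arn,
        Subject=f"Data quality violations in {pipeline_name}",
        Message=(
            f"The batch inference pipeline '{pipeline_name}' found data quality "
            f"violations against the baseline from the training pipeline. "
            f"Constraint violations report: {violations_s3_uri}"
        ),
    )

# Example usage with placeholder values
notify_data_quality_issue(
    topic_arn="arn:aws:sns:us-east-1:111122223333:ml-alerts",
    pipeline_name="batch-inference-pipeline",
    violations_s3_uri="s3://<bucket>/monitoring/constraint_violations.json",
)
```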

Google experts on practical paths to data-centricity in applied AI

Snorkel AI

We thought we’d structure this more as a conversation where we walk you through some of our thinking around the most common themes in data-centricity in applied AI. Is more data always better? One challenge is that it is really hard to maintain high data quality with rigorous validation.