Data Quality, Data Scientist and ML Engineer - Artificial Intelligence Zone

David Driggers, CTO of Cirrascale – Interview Series

Unite.AI

JANUARY 27, 2025

Enterprise-wide AI adoption faces barriers like data quality, infrastructure constraints, and high costs. While Cirrascale does not offer data quality services, we do partner with companies that can assist with data issues. How does Cirrascale address these challenges for businesses scaling AI initiatives?

Deep Learning

Deep Learning Data Quality ML Engineer Data Scientist

The Weather Company enhances MLOps with Amazon SageMaker, AWS CloudFormation, and Amazon CloudWatch

AWS Machine Learning Blog

JULY 8, 2024

TWCo data scientists and ML engineers took advantage of automation, detailed experiment tracking, integrated training, and deployment pipelines to help scale MLOps effectively. The Data Quality Check part of the pipeline creates baseline statistics for the monitoring task in the inference pipeline.
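The idea of a data quality check that records baseline statistics for a later monitoring task can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not TWCo's pipeline or the SageMaker API: the function names `build_baseline` and `check_batch`, the choice of statistics, and the `z_threshold` default are all hypothetical.

```python
from statistics import mean, stdev


def build_baseline(values):
    """Summarize a training-time feature column as baseline statistics."""
    return {
        "mean": mean(values),
        "std": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def check_batch(baseline, batch, z_threshold=3.0):
    """Compare an inference batch against the baseline and list violations."""
    violations = []
    batch_mean = mean(batch)
    # Flag a shift in the batch mean, measured in baseline standard deviations.
    if baseline["std"] > 0:
        z_score = abs(batch_mean - baseline["mean"]) / baseline["std"]
        if z_score > z_threshold:
            violations.append("mean_shift")
    # Flag values that fall outside the range seen at training time.
    if min(batch) < baseline["min"] or max(batch) > baseline["max"]:
        violations.append("out_of_range")
    return violations


# Hypothetical feature values, for illustration only.
baseline = build_baseline([10.0, 12.0, 11.0, 13.0, 9.0])
print(check_batch(baseline, [10.5, 11.5, 12.0]))  # batch close to the baseline
print(check_batch(baseline, [40.0, 42.0, 41.0]))  # batch that has drifted
```

Computing the baseline once, at training time, keeps the inference-side check cheap: it only needs the stored summary, not the training data.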

Data Scientist

Data Scientist ML Engineer Data Science Machine Learning

Customized model monitoring for near real-time batch inference with Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 28, 2024

Early and proactive detection of deviations in model quality enables you to take corrective actions, such as retraining models, auditing upstream systems, or fixing quality issues without having to monitor models manually or build additional tooling. Ajay Raghunathan is a Machine Learning Engineer at AWS. Raju Patil is a Sr.

ML

ML Metadata Data Scientist DevOps

How Axfood enables accelerated machine learning throughout the organization using Amazon SageMaker

AWS Machine Learning Blog

FEBRUARY 27, 2024

However, there are many clear benefits of modernizing our ML platform and moving to Amazon SageMaker Studio and Amazon SageMaker Pipelines. Each product translates into an AWS CloudFormation template, which is deployed when a data scientist creates a new SageMaker project with our MLOps blueprint as the foundation.

Machine Learning

Machine Learning DevOps Data Scientist Data Quality

Track LLM model evaluation using Amazon SageMaker managed MLflow and FMEval

AWS Machine Learning Blog

JANUARY 28, 2025

Furthermore, evaluation processes are important not only for LLMs, but are becoming essential for assessing prompt template quality, input data quality, and ultimately, the entire application stack. In this post, we show how to use FMEval and Amazon SageMaker to programmatically evaluate LLMs.

LLM

LLM Large Language Models ML Algorithm

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

Some popular end-to-end MLOps platforms in 2023: Amazon SageMaker. Amazon SageMaker provides a unified interface for data preprocessing, model training, and experimentation, allowing data scientists to collaborate and share code easily. Check out the Kubeflow documentation.

Machine Learning

Machine Learning Metadata Data Scientist Data Quality

Use a data-centric approach to minimize the amount of data required to train Amazon SageMaker models

AWS Machine Learning Blog

MARCH 9, 2023

As machine learning (ML) models have improved, data scientists, ML engineers and researchers have shifted more of their attention to defining and bettering data quality. Applying these techniques allows ML practitioners to reduce the amount of data required to train an ML model.

ML Engineer

ML Engineer Data Scientist Convolutional Neural Networks ML

Philips accelerates development of AI-enabled healthcare solutions with an MLOps platform built on Amazon SageMaker

AWS Machine Learning Blog

NOVEMBER 16, 2023

Amazon SageMaker provides purpose-built tools for machine learning operations (MLOps) to help automate and standardize processes across the ML lifecycle. In this post, we describe how Philips partnered with AWS to develop AI ToolSuite—a scalable, secure, and compliant ML platform on SageMaker.

Data Scientist

Data Scientist ML Data Science Automation

11 Open Source Data Exploration Tools You Need to Know in 2023

ODSC - Open Data Science

FEBRUARY 24, 2023

Its goal is to help with a quick analysis of target characteristics, training vs testing data, and other such data characterization tasks. Apache Superset is a must-try project for any ML engineer, data scientist, or data analyst.

Data Analysis

Data Analysis Data Science Business Intelligence Python

How to Visualize Deep Learning Models

The MLOps Blog

NOVEMBER 14, 2023

Visualizing deep learning models can help us with several different objectives: Interpretability and explainability: The performance of deep learning models is, at times, staggering, even for seasoned data scientists and ML engineers. Which one is right for you depends on your goal.

Deep Learning

Deep Learning Neural Network Convolutional Neural Networks Data Scientist

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing

AWS Machine Learning Blog

NOVEMBER 14, 2024

Model governance involves overseeing the development, deployment, and maintenance of ML models to help ensure that they meet business objectives and are accurate, fair, and compliant with regulations.

ML

ML Machine Learning Auto-complete Auto-classification

The Age of Health Informatics: Part 1

Heartbeat

OCTOBER 23, 2023

Revolutionizing Healthcare through Data Science and Machine Learning. In the digital transformation era, healthcare is experiencing a paradigm shift driven by integrating data science, machine learning, and information technology.

Data Scientist

Data Scientist Machine Learning Big Data Algorithm

Enterprise LLM Summit highlights the importance of data development

Snorkel AI

OCTOBER 27, 2023

Instead of relying exclusively on a single data development technique, leverage a variety of techniques such as prompting, RAG, and fine-tuning for the best outcome. Focus on improving data quality and transforming manual data development processes into programmatic operations to scale fine-tuning.

LLM

LLM Data Scientist Machine Learning Large Language Models

Deliver your first ML use case in 8–12 weeks

AWS Machine Learning Blog

APRIL 26, 2023

You may have gaps in skills and technologies, including operationalizing ML solutions, implementing ML services, and managing ML projects for rapid iterations. Ensuring data quality, governance, and security may slow down or stall ML projects. We recognize that customers have different starting points.

ML

ML Machine Learning Data Science Data Drift

Use mobility data to derive insights using Amazon SageMaker geospatial capabilities

AWS Machine Learning Blog

JANUARY 17, 2024

Solution overview: As mentioned earlier, the AWS services that you can use for analysis of mobility data are Amazon S3, Amazon Macie, AWS Glue, S3 Object Lambda, Amazon Comprehend, and Amazon SageMaker geospatial capabilities. Data scientists can accomplish this process by connecting through Amazon SageMaker notebooks.

ETL

ETL ML Machine Learning Data Scientist

How Vodafone Uses TensorFlow Data Validation in their Data Contracts to Elevate Data Governance at Scale

TensorFlow

MARCH 10, 2023

It can also include constraints on the data, such as minimum and maximum values for numerical columns and allowed values for categorical columns. Before a model is productionized, the Contract is agreed upon by the stakeholders working on the pipeline, such as the ML Engineers, Data Scientists and Data Owners.
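The two kinds of constraint named here, numeric ranges and allowed categorical values, can be illustrated with a small plain-Python check. This is a hedged sketch only: it does not use TensorFlow Data Validation and is not Vodafone's contract format. The `CONTRACT` layout, the column names, and the `validate_row` helper are hypothetical.

```python
def validate_row(row, contract):
    """Return a list of contract violations for a single record."""
    errors = []
    for column, rule in contract.items():
        value = row.get(column)
        if value is None:
            errors.append(f"{column}: missing")
            continue
        # Range constraints apply to numerical columns.
        if "min" in rule and value < rule["min"]:
            errors.append(f"{column}: below minimum")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{column}: above maximum")
        # Allowed-value constraints apply to categorical columns.
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{column}: value not allowed")
    return errors


# Hypothetical contract, for illustration only.
CONTRACT = {
    "age": {"min": 0, "max": 120},
    "plan": {"allowed": {"prepaid", "postpaid"}},
}

print(validate_row({"age": 34, "plan": "prepaid"}, CONTRACT))
print(validate_row({"age": 150, "plan": "trial"}, CONTRACT))
```

Keeping the contract as data rather than code means stakeholders can review and agree on it without reading the validation logic.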

Data Drift

Data Drift Data Scientist ML Engineer Machine Learning

Must-Have Skills for a Machine Learning Engineer

Pickl AI

NOVEMBER 28, 2024

Fundamental Programming Skills: Strong programming skills are essential for success in ML. This section will highlight the critical programming languages and concepts ML engineers should master, including Python, R, and C++, and an understanding of data structures and algorithms.

Machine Learning

Machine Learning Neural Network ML Engineer Algorithm

Arize AI on How to apply and use machine learning observability

Snorkel AI

JUNE 30, 2023

And usually what ends up happening is that some poor data scientist or ML engineer has to manually troubleshoot this in a Jupyter Notebook. So this path on the right side of the production icon is what we’re calling ML observability. We have four pillars that we use when thinking about ML observability.

Machine Learning

Machine Learning ML Data Drift Data Quality

MLOps: What is a Product First vs. Model First Mindset?

Mlearning.ai

MAY 23, 2023

It’s critical for beginners to learn this, since it affects everything: workflows, data quality requirements, etc. A model mindset prioritizes the ML model that you are building, while a product mindset focuses on the end data product: the minimum viable product. There are two approaches we see in MLOps. What is the Difference?

Machine Learning

Machine Learning Data Scientist ML Data Science

The Future of Data-Centric AI Day 2: Snorkel Flow and Beyond

Snorkel AI

JUNE 9, 2023

Transforming the Customer Experience with AI: Wayfair’s Data-Centric Way. Wayfair’s Archana Sapkota (ML Manager) and Vinny DeGenova (Associate Director of Machine Learning) shared insights on transforming the customer experience with AI, emphasizing the use of ML in understanding customers and catalog products.

Large Language Models

Large Language Models Data Scientist Machine Learning Computer Vision

Watch all Future of Data-Centric AI 2023 videos now!

Snorkel AI

OCTOBER 12, 2023

Data-Driven Government: A Fireside Chat with the Former U.S. Chief Data Scientist. In this fireside chat, Snorkel AI CEO and co-founder Alex Ratner and DJ Patil, the former U.S. Chief Data Scientist, dive into data science’s history, impact, and challenges in the United States government.

Data Scientist

Data Scientist ML Computer Vision AI

Importance of Machine Learning Model Retraining in Production

Heartbeat

OCTOBER 30, 2023

Ensuring Long-Term Performance and Adaptability of Deployed Models. When working on any machine learning problem, data scientists and machine learning engineers usually spend a lot of time on data gathering, efficient data preprocessing, and modeling to build the best model for the use case.

Machine Learning

Machine Learning Data Drift ML Data Scientist

How to Build ETL Data Pipeline in ML

The MLOps Blog

MAY 17, 2023

From data processing to quick insights, robust pipelines are a must for any ML system. Often the Data Team, comprising Data and ML Engineers, needs to build this infrastructure, and this experience can be painful. However, efficient use of ETL pipelines in ML can help make their lives much easier.

ETL

ETL ML Machine Learning Data Scientist

Learnings From Building the ML Platform at Stitch Fix

The MLOps Blog

AUGUST 3, 2023

This is Piotr Niedźwiedź and Aurimas Griciūnas from neptune.ai, and you’re listening to ML Platform Podcast. Stefan is a software engineer, data scientist, and has been doing work as an ML engineer. To a junior data scientist, it doesn’t matter if you’re using Airflow, Prefect, Dexter.

ML

ML Data Scientist Software Engineer Machine Learning

What is Data Scrubbing? Unfolding the Details

Pickl AI

JUNE 6, 2024

Data scrubbing is often used interchangeably with data cleaning, but there’s a subtle difference. Cleaning is broader, improving data quality. Scrubbing is a more intensive technique within data cleaning, focusing on identifying and correcting errors. Data scrubbing is a powerful tool within this cleaning service.

Machine Learning

Machine Learning Algorithm Business Intelligence Data Quality

7 Critical Model Training Errors: What They Mean & How to Fix Them

Viso.ai

JANUARY 30, 2024

During machine learning model training, there are seven common errors that engineers and data scientists typically run into. This is a bigger deal with raw or unstructured data that engineers and developers might be using to feed the machine learning program. 6: Data Drift What is Data Drift?

Data Drift

Data Drift Machine Learning Computer Vision Algorithm

How to Build a CI/CD MLOps Pipeline [Case Study]

The MLOps Blog

MARCH 15, 2023

Collaboration: Ensuring that all teams involved in the project, including data scientists, engineers, and operations teams, are working together effectively. Costs: Oftentimes, cost is the most important aspect of any ML model deployment. This includes data quality, privacy, and compliance.

ETL

ETL Data Drift Machine Learning ML

Google experts on practical paths to data-centricity in applied AI

Snorkel AI

JULY 5, 2023

Organizations struggle in multiple aspects, especially in modern-day data engineering practices and getting ready for successful AI outcomes. One of them is that it is really hard to maintain high data quality with rigorous validation. More features mean more data consumed upstream. Catch the sessions you missed!

Large Language Models

Large Language Models Metadata Machine Learning AI

Architect defense-in-depth security for generative AI applications using the OWASP Top 10 for LLMs

AWS Machine Learning Blog

JANUARY 26, 2024

The goal of this post is to empower AI and machine learning (ML) engineers, data scientists, solutions architects, security teams, and other stakeholders to have a common mental model and framework to apply security best practices, allowing AI/ML teams to move fast without trading off security for speed.

Generative AI

Generative AI ML LLM AI

How to Build an End-To-End ML Pipeline

The MLOps Blog

MAY 9, 2023

One of the most prevalent complaints we hear from ML engineers in the community is how costly and error-prone it is to manually go through the ML workflow of building and deploying models. Building end-to-end machine learning pipelines lets ML engineers build once, rerun, and reuse many times. Data preprocessing.

ML

ML Machine Learning Metadata Data Science

ML Pipeline Architecture Design Patterns (With 10 Real-World Examples)

The MLOps Blog

AUGUST 11, 2023

Getting a workflow ready which takes your data from its raw form to predictions while maintaining responsiveness and flexibility is the real deal. At that point, the Data Scientists or ML Engineers become curious and start looking for such implementations.

ML

ML Machine Learning Data Ingestion Deep Learning

Definite Guide to Building a Machine Learning Platform

The MLOps Blog

MARCH 21, 2023

From gathering and processing data to building models through experiments, deploying the best ones, and managing them at scale for continuous value in production—it’s a lot. As the number of ML-powered apps and services grows, it gets overwhelming for data scientists and ML engineers to build and deploy models at scale.

Machine Learning

Machine Learning Data Scientist ML Metadata

LLM Agents Underscore One Truth: Data Is The Real Differentiator.

Towards AI

NOVEMBER 8, 2024

Peter Norvig, The Unreasonable Effectiveness of Data. In ML engineering, data quality isn’t just critical, it’s foundational. Since 2011, Peter Norvig’s words have underscored the power of a data-centric approach in machine learning. Using biased or low-quality data?

LLM

LLM ML Engineer Data Quality Data Scientist

Mikiko Bazeley: What I Learned Building the ML Platform at Mailchimp

The MLOps Blog

JANUARY 26, 2024

I started my ML journey as an analyst back in 2016. Since then, I’ve worked as a data scientist for a multinational company and an MLOps engineer for an early-stage startup before moving to Mailchimp in May 2021. Technical projects must be aligned with business objectives. This was my team.)

ML

ML Data Scientist Machine Learning ML Engineer

Improve governance of models with Amazon SageMaker unified Model Cards and Model Registry

AWS Machine Learning Blog

NOVEMBER 13, 2024

With the unification of SageMaker Model Cards and SageMaker Model Registry, architects, data scientists, ML engineers, or platform engineers (depending on the organization’s hierarchy) can now seamlessly register ML model versions early in the development lifecycle, including essential business details and technical metadata.

Metadata

Metadata ML Software Engineer Machine Learning
