Data Quality, Data Scientist and ML - Artificial Intelligence Zone

Governing the ML lifecycle at scale, Part 3: Setting up data governance at scale

Flipboard

NOVEMBER 22, 2024

This post is part of an ongoing series about governing the machine learning (ML) lifecycle at scale. This post dives deep into how to set up data governance at scale using Amazon DataZone for the data mesh. The data mesh is a modern approach to data management that decentralizes data ownership and treats data as a product.

ML

ML Data Science Metadata DevOps

David Driggers, CTO of Cirrascale – Interview Series

Unite.AI

JANUARY 27, 2025

Enterprise-wide AI adoption faces barriers like data quality, infrastructure constraints, and high costs. While Cirrascale does not offer Data Quality type services, we do partner with companies that can assist with Data issues. How does Cirrascale address these challenges for businesses scaling AI initiatives?

Deep Learning

Deep Learning Data Quality ML Engineer Data Scientist

AI in DevOps: Streamlining Software Deployment and Operations

Unite.AI

OCTOBER 30, 2023

When unstructured data surfaces during AI development, the DevOps process plays a crucial role in data cleansing, ultimately enhancing the overall model quality. Improving AI quality: AI system effectiveness hinges on data quality. Poor data can distort AI responses.

DevOps

DevOps Software Development Automation Artificial Intelligence

Step-by-step guide: Generative AI for your business

IBM Journey to AI blog

JULY 30, 2024

Data Scientists and AI experts: Historically we have seen Data Scientists build and choose traditional ML models for their use cases. Data Scientists will typically help with training, validating, and maintaining foundation models that are optimized for data tasks. IBM watsonx.ai

Generative AI

Generative AI Data Scientist Software Engineer AI

Customized model monitoring for near real-time batch inference with Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 28, 2024

Real-world applications vary in inference requirements for their artificial intelligence and machine learning (AI/ML) solutions to optimize performance and reduce costs. SageMaker Model Monitor monitors the quality of SageMaker ML models in production.

ML

ML Metadata Data Scientist Machine Learning

Real value, real time: Production AI with Amazon SageMaker and Tecton

AWS Machine Learning Blog

DECEMBER 4, 2024

Businesses are under pressure to show return on investment (ROI) from AI use cases, whether predictive machine learning (ML) or generative AI. Only 54% of ML prototypes make it to production, and only 5% of generative AI use cases make it to production. Using SageMaker, you can build, train and deploy ML models.

ML

ML Machine Learning Generative AI AI

Data-centric ML benchmarking: Announcing DataPerf’s 2023 challenges

Google Research AI blog

MARCH 30, 2023

Posted by Peter Mattson, Senior Staff Engineer, ML Performance, and Praveen Paritosh, Senior Research Scientist, Google Research, Brain Team. Machine learning (ML) offers tremendous potential, from diagnosing cancer to engineering safe self-driving cars to amplifying human productivity. Each step can introduce issues and biases.

ML

ML Algorithm NLP Neural Network

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

Alignment to other tools in the organization’s tech stack: Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, etc. and Pandas or Apache Spark DataFrames.

Machine Learning

Machine Learning Metadata Data Scientist Data Quality

How Axfood enables accelerated machine learning throughout the organization using Amazon SageMaker

AWS Machine Learning Blog

FEBRUARY 27, 2024

In this post, we share how Axfood, a large Swedish food retailer, improved operations and scalability of their existing artificial intelligence (AI) and machine learning (ML) operations by prototyping in close collaboration with AWS experts and using Amazon SageMaker. This is a guest post written by Axfood AB.

Machine Learning

Machine Learning DevOps Data Scientist Data Quality

Deliver your first ML use case in 8–12 weeks

AWS Machine Learning Blog

APRIL 26, 2023

Do you need help to move your organization’s Machine Learning (ML) journey from pilot to production? Most executives think ML can apply to any business decision, but on average only half of the ML projects make it to production. Challenges: Customers may face several challenges when implementing machine learning (ML) solutions.

ML

ML Machine Learning Data Science Data Drift

How to build a successful AI strategy

IBM Journey to AI blog

DECEMBER 20, 2023

This calls for the organization to also make important decisions regarding data, talent and technology: A well-crafted strategy will provide a clear plan for managing, analyzing and leveraging data for AI initiatives. Identify potential partners and vendors: Find companies in the AI and ML space that have worked within your industry.

AI Strategy

AI Strategy Artificial Intelligence Artificial Intelligence Machine Learning

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

AWS Machine Learning Blog

AUGUST 21, 2024

Amazon DataZone makes it straightforward for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization so they can discover, use, and collaborate to derive data-driven insights.

Machine Learning

Machine Learning Data Scientist ML Data Quality

Track LLM model evaluation using Amazon SageMaker managed MLflow and FMEval

AWS Machine Learning Blog

JANUARY 28, 2025

Furthermore, evaluation processes are important not only for LLMs, but are becoming essential for assessing prompt template quality, input data quality, and ultimately, the entire application stack. This allows you to keep track of your ML experiments. We specifically focus on SageMaker with MLflow.

LLM

LLM Large Language Models ML Algorithm

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing

AWS Machine Learning Blog

NOVEMBER 14, 2024

We recently announced the general availability of cross-account sharing of Amazon SageMaker Model Registry using AWS Resource Access Manager (AWS RAM), making it easier to securely share and discover machine learning (ML) models across your AWS accounts.

ML

ML Machine Learning Auto-complete Auto-classification

Five benefits of a data catalog

IBM Journey to AI blog

DECEMBER 16, 2022

An enterprise data catalog does all that a library inventory system does – namely streamlining data discovery and access across data sources – and a lot more. For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance.

Metadata

Metadata Data Quality Data Discovery Data Scientist

The Weather Company enhances MLOps with Amazon SageMaker, AWS CloudFormation, and Amazon CloudWatch

AWS Machine Learning Blog

JULY 8, 2024

As industries begin adopting processes dependent on machine learning (ML) technologies, it is critical to establish machine learning operations (MLOps) that scale to support growth and utilization of this technology. There were noticeable challenges when running ML workflows in the cloud.

Data Scientist

Data Scientist ML Engineer Data Science Machine Learning

McKinsey QuantumBlack on automating data quality remediation with AI

Snorkel AI

JUNE 22, 2023

Jacomo Corbo is a Partner and Chief Scientist, and Bryan Richardson is an Associate Partner and Senior Data Scientist, for QuantumBlack AI by McKinsey. They presented “Automating Data Quality Remediation With AI” at Snorkel AI’s The Future of Data-Centric AI Summit in 2022.

Data Quality

Data Quality Automation Data Scientist ML

9 data governance strategies that will unlock the potential of your business data

IBM Journey to AI blog

SEPTEMBER 5, 2024

Access to high-quality data can help organizations start successful products, defend against digital attacks, understand failures and pivot toward success. Emerging technologies and trends, such as machine learning (ML), artificial intelligence (AI), automation and generative AI (gen AI), all rely on good data quality.

Metadata

Metadata Data Quality Auto-classification DevOps

Beyond the Human Eye: Enhancing Nondestructive Testing with AI Insights

Aiiot Talk

APRIL 30, 2024

Traditionally, NDT relied heavily on manual inspection techniques and human expertise, but the process has undergone a transformative evolution with the advent of AI and machine learning (ML). AI and ML are augmenting human capabilities and advanced data analysis, paving the way for safer and more reliable NDT processes in the following ways.

Automation

Automation Algorithm Machine Learning ML

Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler

Flipboard

MARCH 22, 2023

Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and data and analytics. We also detail the steps that data scientists can take to configure the data flow, analyze the data quality, and add data transformations.

IDP

IDP Data Scientist Categorization Data Quality

Steven Hillion, SVP of Data and AI at Astronomer – Interview Series

Unite.AI

JUNE 24, 2024

Steven Hillion is the Senior Vice President of Data and AI at Astronomer, where he leverages his extensive academic background in research mathematics and over 15 years of experience in Silicon Valley's machine learning platform development. Can you elaborate on the use of synthetic data to fine-tune smaller models for accuracy?

Data Scientist

Data Scientist Large Language Models Machine Learning Software Engineer

Jay Mishra, COO of Astera Software – Interview Series

Unite.AI

SEPTEMBER 22, 2023

Jay Mishra is the Chief Operating Officer (COO) at Astera Software, a rapidly-growing provider of enterprise-ready data solutions. Speed Varying data formats Data publishing What are some ways that Astera has integrated AI into customer workflow? What initially attracted you to computer science?

Large Language Models

Large Language Models Automation Artificial Intelligence

Tackling AI’s data challenges with IBM databases on AWS

IBM Journey to AI blog

MARCH 14, 2024

Businesses face significant hurdles when preparing data for artificial intelligence (AI) applications. The existence of data silos and duplication, alongside apprehensions regarding data quality, presents a multifaceted environment for organizations to manage.

ETL

ETL Metadata AI

How to Build ETL Data Pipeline in ML

The MLOps Blog

MAY 17, 2023

From data processing to quick insights, robust pipelines are a must for any ML system. Often the Data Team, comprising Data and ML Engineers, needs to build this infrastructure, and this experience can be painful. However, efficient use of ETL pipelines in ML can help make their life much easier.

ETL

ETL ML Machine Learning Data Scientist

Data Acquisition & Exploration: Exploring 5 Key MLOps Questions using AWS SageMaker

Towards AI

JUNE 24, 2023

The ‘31 Questions that Shape Fortune 500 ML Strategy’ highlighted key questions to assess the maturity of an ML system. A robust ML platform offers managed solutions to easily address these aspects. [Collaboration] How can multiple data scientists collaborate in real-time on the same dataset?

Data Scientist

Data Scientist ML Automation Data Quality

MLOps for batch inference with model monitoring and retraining using Amazon SageMaker, HashiCorp Terraform, and GitLab CI/CD

AWS Machine Learning Blog

AUGUST 29, 2023

The presented MLOps workflow provides a reusable template for managing the ML lifecycle through automation, monitoring, auditability, and scalability, thereby reducing the complexities and costs of maintaining batch inference workloads in production. SageMaker Pipelines serves as the orchestrator for ML model training and inference workflows.

Data Scientist

Data Scientist Data Quality Python ML

ML | Data Preprocessing in Python

Pickl AI

DECEMBER 3, 2024

It involves steps like handling missing values, normalizing data, and managing categorical features, ultimately enhancing model performance and ensuring data quality. Introduction: Data preprocessing is a critical step in the Machine Learning pipeline, transforming raw data into a clean and usable format.

Python

Python ML Categorization Machine Learning
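
The three preprocessing steps named in the teaser above (handling missing values, normalizing data, and managing categorical features) can be sketched in plain Python. This is only an illustration under stated assumptions, not the linked article's code: the helper names impute_mean, min_max_scale, and one_hot, and the sample data, are hypothetical, and mean imputation, min-max scaling, and one-hot encoding are just one common choice for each step.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Encode each label as a 0/1 vector over the sorted set of categories."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Hypothetical sample data: one numeric column with a gap, one categorical column.
ages = [20, None, 40]
colors = ["red", "blue", "red"]

ages_filled = impute_mean(ages)           # gap filled with the mean of 20 and 40
ages_scaled = min_max_scale(ages_filled)  # values mapped onto [0, 1]
colors_encoded = one_hot(colors)          # columns ordered as ["blue", "red"]
```

In practice a library such as pandas or scikit-learn would usually handle these steps; the standard-library version is used here only to keep the sketch self-contained.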

Smart Retail: Harnessing Machine Learning for Retail Demand Forecasting Excellence

Pickl AI

OCTOBER 9, 2023

This case study sheds light on how a leading global retailer, XYZ Retail, harnessed the power of Machine Learning (ML) to revolutionize its demand forecasting process. Real-time Updates: ML models continuously updated forecasts, allowing XYZ Retail to make agile decisions regarding inventory management, promotions, and supply chain logistics.

Machine Learning

Machine Learning Algorithm ML Data Quality

How To Improve AI Model Robustness in the Last Mile

ODSC - Open Data Science

APRIL 20, 2023

Artificial intelligence (AI) and machine learning (ML) have rapidly become key drivers of business transformation. However, the latest innovations in ML and AI have made one thing clear: AI does not work in isolation. By doing so, companies can maximize the potential of AI and ML to drive success in their organizations.

AI Modeling

AI Modeling Machine Learning Large Language Models Categorization

Philips accelerates development of AI-enabled healthcare solutions with an MLOps platform built on Amazon SageMaker

AWS Machine Learning Blog

NOVEMBER 16, 2023

Amazon SageMaker provides purpose-built tools for machine learning operations (MLOps) to help automate and standardize processes across the ML lifecycle. In this post, we describe how Philips partnered with AWS to develop AI ToolSuite—a scalable, secure, and compliant ML platform on SageMaker.

Data Scientist

Data Scientist ML Data Science Machine Learning

Best practices and lessons for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock

AWS Machine Learning Blog

NOVEMBER 1, 2024

Our experiments demonstrate that careful attention to data quality, hyperparameter optimization, and best practices in the fine-tuning process can yield substantial gains over base models. For our evaluation, we used the F1 score, which is an evaluation metric to assess the performance of LLMs and traditional ML models.

LLM

LLM Prompt Engineer Prompt Engineering Generative AI
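
For readers unfamiliar with the F1 score mentioned in the teaser above, here is a minimal sketch of the standard binary definition: the harmonic mean of precision and recall. The teaser does not say which variant the post used (binary, macro, token-level, and so on), so the function below and its sample labels are assumptions for illustration only.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
```

With precision and recall both equal to 2/3 in this sample, the score is also 2/3.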

11 Open Source Data Exploration Tools You Need to Know in 2023

ODSC - Open Data Science

FEBRUARY 24, 2023

Its goal is to help with a quick analysis of target characteristics, training vs testing data, and other such data characterization tasks. Apache Superset: Apache Superset is a must-try project for any ML engineer, data scientist, or data analyst.

Data Analysis

Data Analysis Data Science Business Intelligence Python

Schedule Amazon SageMaker notebook jobs and manage multi-step notebook workflows using APIs

AWS Machine Learning Blog

NOVEMBER 29, 2023

Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. Amazon SageMaker notebook jobs allow data scientists to run their notebooks on demand or on a schedule with a few clicks in SageMaker Studio.

Data Drift

Data Drift BERT Data Scientist Python

Monitoring Machine Learning Models in Production

Heartbeat

JUNE 12, 2023

The primary goal of model monitoring is to ensure that the model remains effective and reliable in making predictions or decisions, even as the data or environment in which it operates evolves. Learn more about building effective ML teams with our free ebook. Both can lead to a degradation in model performance and accuracy.

Machine Learning

Machine Learning Data Drift Explainability Data Quality

Accelerate time to business insights with the Amazon SageMaker Data Wrangler direct connection to Snowflake

AWS Machine Learning Blog

JUNE 23, 2023

Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code.

Auto-complete

Auto-complete Auto-classification ML Data Quality

Learnings From Building the ML Platform at Stitch Fix

The MLOps Blog

AUGUST 3, 2023

This article was originally an episode of the ML Platform Podcast, a show where Piotr Niedźwiedź and Aurimas Griciūnas, together with ML platform professionals, discuss design choices, best practices, example tool stacks, and real-world learnings from some of the best ML platform professionals. Stefan: Yeah.

ML

ML Data Scientist Software Engineer Machine Learning

Use a data-centric approach to minimize the amount of data required to train Amazon SageMaker models

AWS Machine Learning Blog

MARCH 9, 2023

As machine learning (ML) models have improved, data scientists, ML engineers and researchers have shifted more of their attention to defining and bettering data quality. Applying these techniques allows ML practitioners to reduce the amount of data required to train an ML model.

ML Engineer

ML Engineer Data Scientist Convolutional Neural Networks ML

Boosting developer productivity: How Deloitte uses Amazon SageMaker Canvas for no-code/low-code machine learning

AWS Machine Learning Blog

DECEMBER 1, 2023

The ability to quickly build and deploy machine learning (ML) models is becoming increasingly important in today’s data-driven world. However, building ML models requires significant time, effort, and specialized expertise. And experienced data scientists can be hard to come by.

Machine Learning

Machine Learning ML Categorization Data Quality

Use mobility data to derive insights using Amazon SageMaker geospatial capabilities

AWS Machine Learning Blog

JANUARY 17, 2024

In this first post, we introduce mobility data, its sources, and a typical schema of this data. We then discuss the various use cases and explore how you can use AWS services to clean the data, how machine learning (ML) can aid in this effort, and how you can make ethical use of the data in generating visuals and insights.

ETL

ETL ML Machine Learning Data Scientist

Enable data sharing through federated learning: A policy approach for chief digital officers

AWS Machine Learning Blog

MARCH 15, 2024

This is a guest blog post written by Nitin Kumar, a Lead Data Scientist at T and T Consulting Services, Inc. Medical data restrictions: You can use machine learning (ML) to assist doctors and researchers in diagnosis tasks, thereby speeding up the process. Stroke victims can lose around 1.9 So why hasn’t it been used yet?

ML

ML Data Scientist Machine Learning Natural Language Processing

How to Visualize Deep Learning Models

The MLOps Blog

NOVEMBER 14, 2023

This is where visualizations in ML come in. Graphical representations of structures and data flow within a deep learning model make its complexity easier to comprehend and enable insight into its decision-making process. Data scientists and ML engineers: Creating and training deep learning models is no easy feat.

Deep Learning

Deep Learning Neural Network Convolutional Neural Networks Data Scientist

Principal Financial Group uses AWS Post Call Analytics solution to extract omnichannel customer insights

AWS Machine Learning Blog

NOVEMBER 15, 2023

Therefore, when the Principal team started tackling this project, they knew that ensuring the highest standard of data security such as regulatory compliance, data privacy, and data quality would be a non-negotiable, key requirement.

Data Ingestion

Data Ingestion Metadata NLP Data Scientist
