Blog, Metadata and ML Engineer - Artificial Intelligence Zone

Data4ML Preparation Guidelines (Beyond The Basics)

Towards AI

NOVEMBER 8, 2024

Data preparation isn’t just a part of the ML engineering process — it’s the heart of it. Data is a key differentiator in ML projects (more on this in my blog post below). This step, often done with data engineers, ensures a reproducible data snapshot from sources like production databases or APIs.

Data Ingestion

Data Ingestion Metadata ML Engineer ML

Build a robust text-to-SQL solution generating complex queries, self-correcting, and querying diverse data sources

AWS Machine Learning Blog

FEBRUARY 28, 2024

Structured Query Language (SQL) is a complex language that requires an understanding of databases and metadata. Third, despite the larger adoption of centralized analytics solutions like data lakes and warehouses, complexity rises with different table names and other metadata that is required to create the SQL for the desired sources.

Metadata

Metadata Generative AI LLM NLP

Deploy Amazon SageMaker pipelines using AWS Controllers for Kubernetes

AWS Machine Learning Blog

SEPTEMBER 4, 2024

In this post, we introduce an example to help DevOps engineers manage the entire ML lifecycle, including training and inference, using the same toolkit. Solution overview: We consider a use case in which an ML engineer configures a SageMaker model building pipeline using a Jupyter notebook.

DevOps

DevOps ML Engineer ML Metadata

Improve governance of models with Amazon SageMaker unified Model Cards and Model Registry

AWS Machine Learning Blog

NOVEMBER 13, 2024

You can now register machine learning (ML) models in Amazon SageMaker Model Registry with Amazon SageMaker Model Cards, making it straightforward to manage governance information for specific model versions directly in SageMaker Model Registry in just a few clicks.

Metadata

Metadata ML Software Engineer Machine Learning

Fine tune a generative AI application for Amazon Bedrock using Amazon SageMaker Pipeline decorators

AWS Machine Learning Blog

AUGUST 22, 2024

It automatically keeps track of model artifacts, hyperparameters, and metadata, helping you to reproduce and audit model versions. The SageMaker Pipelines decorator feature helps convert local ML code written as a Python program into one or more pipeline steps. SageMaker Pipelines can handle model versioning and lineage tracking.

Generative AI

Generative AI Metadata Python ML

Driving advanced analytics outcomes at scale using Amazon SageMaker powered PwC’s Machine Learning Ops Accelerator

AWS Machine Learning Blog

DECEMBER 19, 2023

Artificial intelligence (AI) and machine learning (ML) are becoming an integral part of systems and processes, enabling decisions in real time, thereby driving top and bottom-line improvements across organizations. However, putting an ML model into production at scale is challenging and requires a set of best practices.

Machine Learning

Machine Learning ML Engineer DevOps ML

Learn how Amazon Ads created a generative AI-powered image generation capability using Amazon SageMaker

AWS Machine Learning Blog

MAY 15, 2024

This blog post shares more about how generative AI solutions from Amazon Ads help brands create more visually rich consumer experiences. In this blog post, we describe the architectural and operational details of how Amazon Ads implemented its generative AI-powered image creation solution on AWS.

Generative AI

Generative AI AI AI Machine Learning

Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 20, 2023

An ML engineer deploys the model pipeline into the ML team test environment using a shared services CI/CD process. After stakeholder validation, the ML model is deployed to the team’s production environment. ML operations: This module helps LOBs and ML engineers work on their dev instances of the model deployment template.

ML

ML Data Scientist ML Engineer Data Science

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

When thinking about a tool for metadata storage and management, you should consider: General business-related items: Pricing model, security, and support. Can you compare images?

Machine Learning

Machine Learning Metadata Data Scientist Data Quality

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

AWS Machine Learning Blog

FEBRUARY 13, 2024

Let’s demystify this using the following personas and a real-world analogy. Data and ML engineers (owners and producers): they lay the groundwork by feeding data into the feature store. Data scientists (consumers): they extract and utilize this data to craft their models. Data engineers serve as architects sketching the initial blueprint.

ML

ML Machine Learning ML Engineer Data Scientist

From concept to reality: Navigating the Journey of RAG from proof of concept to production

AWS Machine Learning Blog

FEBRUARY 12, 2025

Machine learning (ML) engineers must make trade-offs and prioritize the most important factors for their specific use case and business requirements. You can use metadata filtering to narrow down search results by specifying inclusion and exclusion criteria.

Auto-classification

Auto-classification Metadata Generative AI Machine Learning
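
The metadata filtering mentioned in the RAG excerpt above (narrowing results with inclusion and exclusion criteria) can be illustrated with a minimal, library-free sketch. This is not the Amazon Bedrock API; the function name `filter_by_metadata`, the `include`/`exclude` parameters, and the document shape are all assumptions made for illustration.

```python
def filter_by_metadata(docs, include=None, exclude=None):
    """Keep documents whose metadata matches every `include` pair
    and none of the `exclude` pairs.

    `docs` is assumed to be a list of dicts with a "metadata" dict;
    this shape is hypothetical and not taken from the original post.
    """
    include = include or {}
    exclude = exclude or {}
    kept = []
    for doc in docs:
        meta = doc.get("metadata", {})
        # Inclusion: every required key must be present with the required value.
        if any(meta.get(key) != value for key, value in include.items()):
            continue
        # Exclusion: drop the document if any excluded pair matches.
        if any(meta.get(key) == value for key, value in exclude.items()):
            continue
        kept.append(doc)
    return kept
```

In a real retrieval system the same criteria would typically be pushed down to the vector store so that filtering happens before similarity ranking, rather than in application code as shown here.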

A guide to Amazon Bedrock Model Distillation (preview)

AWS Machine Learning Blog

DECEMBER 4, 2024

You can optionally add request metadata to these inference requests to filter your invocation logs for specific use cases. These logs capture the prompts, responses, and metadata from your previous model interactions, making them a valuable source of data. If you haven’t done this yet, see the prerequisites section for instructions.

Metadata

Metadata Generative AI Categorization Data Scientist

Set up Amazon SageMaker Studio with Jupyter Lab 3 using the AWS CDK

AWS Machine Learning Blog

JANUARY 17, 2023

This post guides you through the steps to get started with setting up and deploying Studio to standardize ML model development and collaboration with fellow ML engineers and ML scientists. cdk.json: Contains metadata and feature flags. Marcelo Aberle is an ML Engineer in the AWS AI organization.

Software Engineer

Software Engineer ML Engineer ML Machine Learning

How Earth.com and Provectus implemented their MLOps Infrastructure with Amazon SageMaker

AWS Machine Learning Blog

JUNE 27, 2023

This blog post is co-written with Marat Adayev and Dmitrii Evstiukhin from Provectus. When machine learning (ML) models are deployed into production and employed to drive business decisions, the challenge often lies in the operation and management of multiple models. Endpoints had to be deployed manually as well.

DevOps

DevOps ML Machine Learning ML Engineer

MLOps Is an Extension of DevOps. Not a Fork — My Thoughts on THE MLOPS Paper as an MLOps Startup CEO

The MLOps Blog

JANUARY 23, 2023

Came to ML from software. Founded neptune.ai, a modular MLOps component for ML metadata store, aka “experiment tracker + model registry”. Most of our customers are doing ML/MLOps at a reasonable scale, NOT at the hyperscale of big-tech FAANG companies. How about the ML engineer? Let me explain.

DevOps

DevOps Metadata Software Engineer Data Scientist

Create high-quality datasets with Amazon SageMaker Ground Truth and FiftyOne

AWS Machine Learning Blog

MAY 5, 2023

Solution overview: Ground Truth is a fully self-served and managed data labeling service that empowers data scientists, machine learning (ML) engineers, and researchers to build high-quality datasets. For our example use case, we work with the Fashion200K dataset, released at ICCV 2017.

Metadata

Metadata Computer Vision Machine Learning Data Scientist

Use IP-restricted presigned URLs to enhance security in Amazon SageMaker Ground Truth

AWS Machine Learning Blog

AUGUST 20, 2024

In this blog post, we show you how to enable this feature, allowing you to enhance your data security as needed, and outline the success criteria for this feature, including the scenarios where it will be most beneficial. Follow the guidance provided in this blog to configure these security measures effectively.

Software Engineer

Software Engineer ML Machine Learning Metadata

How to Save Trained Model in Python

The MLOps Blog

MAY 10, 2023

Finally, you can store the model and other metadata information using the INSERT INTO command. Storing ML models in model registry: A model registry is a central repository that can store, version, and manage machine learning models. It typically includes features like model versioning, metadata control, comparing model runs, etc.

Python

Python Metadata ML Machine Learning
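
The excerpt above mentions storing a model and its metadata with an INSERT INTO command. A minimal sketch of that idea follows, assuming SQLite as the database and `pickle` for serialization. The table name `models`, its columns, and the helper names `save_model` and `load_model` are hypothetical and not taken from the original post.

```python
import json
import pickle
import sqlite3


def save_model(conn, name, model, metadata):
    """Serialize `model` and store it with JSON-encoded metadata."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS models ("
        "name TEXT PRIMARY KEY, payload BLOB, metadata TEXT)"
    )
    # Parameterized INSERT INTO keeps the binary payload and metadata safe.
    conn.execute(
        "INSERT INTO models (name, payload, metadata) VALUES (?, ?, ?)",
        (name, pickle.dumps(model), json.dumps(metadata)),
    )
    conn.commit()


def load_model(conn, name):
    """Return (model, metadata) for `name`, or raise KeyError if absent."""
    row = conn.execute(
        "SELECT payload, metadata FROM models WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    # Only unpickle payloads from a trusted source.
    return pickle.loads(row[0]), json.loads(row[1])
```

Any picklable object can stand in for the model here. A model registry, as described in the excerpt, adds versioning and run comparison on top of this kind of plain storage.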

Build a crop segmentation machine learning model with Planet data and Amazon SageMaker geospatial capabilities

AWS Machine Learning Blog

SEPTEMBER 29, 2023

Planet and AWS’s partnership on geospatial ML: SageMaker geospatial capabilities empower data scientists and ML engineers to build, train, and deploy models using geospatial data. It also contains each scene’s metadata, its image ID, and a preview image reference.

Machine Learning

Machine Learning Data Scientist ML Python

Customized model monitoring for near real-time batch inference with Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 28, 2024

You can use this framework as a starting point to monitor your custom metrics or handle other unique requirements for model quality monitoring in your AI/ML applications. Data Scientist at AWS, bringing a breadth of data science, ML engineering, MLOps, and AI/ML architecting to help businesses create scalable solutions on AWS.

ML

ML Metadata Data Scientist DevOps

Track LLM model evaluation using Amazon SageMaker managed MLflow and FMEval

AWS Machine Learning Blog

JANUARY 28, 2025

FMEval is an open source LLM evaluation library, designed to provide data scientists and machine learning (ML) engineers with a code-first experience to evaluate LLMs for various aspects, including accuracy, toxicity, fairness, robustness, and efficiency. This allows you to keep track of your ML experiments.

LLM

LLM Large Language Models ML Algorithm

Build an Amazon SageMaker Model Registry approval and promotion workflow with human intervention

AWS Machine Learning Blog

JANUARY 10, 2024

This post is co-written with Jayadeep Pabbisetty, Sr. Specialist Data Engineering at Merck, and Prabakaran Mathaiyan, Sr. ML Engineer at Tiger Analytics. The large machine learning (ML) model development lifecycle requires a scalable model release process similar to that of software development.

ML

ML Machine Learning Data Scientist ETL

Vitech uses Amazon Bedrock to revolutionize information access with AI-powered chatbot

AWS Machine Learning Blog

MAY 30, 2024

In this blog, we walk through the architectural components, evaluation criteria for the components selected by Vitech, and the process flow of user interaction within VitechIQ. Prompt engineering: Prompt engineering is crucial for the knowledge retrieval system.

Chatbots

Chatbots Prompt Engineering Prompt Engineer Large Language Models

Philips accelerates development of AI-enabled healthcare solutions with an MLOps platform built on Amazon SageMaker

AWS Machine Learning Blog

NOVEMBER 16, 2023

This is a joint blog with AWS and Philips. Amazon SageMaker provides purpose-built tools for machine learning operations (MLOps) to help automate and standardize processes across the ML lifecycle. In this post, we describe how Philips partnered with AWS to develop AI ToolSuite—a scalable, secure, and compliant ML platform on SageMaker.

Data Scientist

Data Scientist ML Data Science Automation

How to Build Machine Learning Systems With a Feature Store

The MLOps Blog

JANUARY 26, 2024

We’ll see how this architecture applies to different classes of ML systems, discuss MLOps and testing aspects, and look at some example implementations. Understanding machine learning pipelines: Machine learning (ML) pipelines are a key component of ML systems. But what is an ML pipeline?

Machine Learning

Machine Learning Metadata ML Python

LLM experimentation at scale using Amazon SageMaker Pipelines and MLflow

AWS Machine Learning Blog

JULY 24, 2024

Fine-tuning an LLM can be a complex workflow for data scientists and machine learning (ML) engineers to operationalize. By logging your datasets with MLflow, you can store metadata, such as dataset descriptions, version numbers, and data statistics, alongside your MLflow runs.

LLM

LLM ML Generative AI Machine Learning

Achieve operational excellence with well-architected generative AI solutions using Amazon Bedrock

AWS Machine Learning Blog

OCTOBER 2, 2024

Additionally, you can enable model invocation logging to collect invocation logs, full request response data, and metadata for all Amazon Bedrock model API invocations in your AWS account. Leveraging her expertise in Computer Vision and Deep Learning, she empowers customers to harness the power of ML in the AWS Cloud efficiently.

Generative AI

Generative AI Data Ingestion AI AI

Scale and simplify ML workload monitoring on Amazon EKS with AWS Neuron Monitor container

AWS Machine Learning Blog

JUNE 25, 2024

He has experience with multiple programming languages, AWS cloud services, AI/ML technologies, product and operations management, pre and early seed start-up ventures, and corporate finance. Geeta Gharpure is a senior software developer on the Annapurna ML engineering team. os operator: In values: - linux - key: node.kubernetes.io/instance-type

ML

ML Metadata Software Development Generative AI

Build an end-to-end MLOps pipeline using Amazon SageMaker Pipelines, GitHub, and GitHub Actions

AWS Machine Learning Blog

DECEMBER 13, 2023

ML operations, known as MLOps, focus on streamlining, automating, and monitoring ML models throughout their lifecycle. Data scientists, ML engineers, IT staff, and DevOps teams must work together to operationalize models from research to deployment and maintenance.

ML

ML Automation Metadata Software Development

CMU Researchers Introduce Zeno: A Framework for Behavioral Evaluation of Machine Learning (ML) Models

Marktechpost

JULY 19, 2023

Stakeholders such as ML engineers, designers, and domain experts must work together to identify a model’s expected and potential faults. Instead, ML engineers collaborate with domain experts and designers to describe a model’s expected capabilities before it is iterated and deployed.

Machine Learning

Machine Learning ML Python Metadata

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

AWS Machine Learning Blog

MARCH 1, 2023

Solution overview: The ML solution for LTV forecasting is composed of four components: the training dataset ETL pipeline, MLOps pipeline, inference dataset ETL pipeline, and ML batch inference. ML engineers no longer need to manage this training metadata separately.

Automation

Automation ETL Data Drift ML

How Forethought saves over 66% in costs for generative AI models using Amazon SageMaker

AWS Machine Learning Blog

JUNE 13, 2023

This post is co-written with Jad Chamoun, Director of Engineering at Forethought Technologies, Inc. and Salina Wu, Senior ML Engineer at Forethought Technologies, Inc. We defined logic that would take in model metadata, format the endpoint deterministically based on the metadata, and check whether the endpoint existed.

Generative AI

Generative AI Auto-complete AI Modeling Machine Learning

Use Amazon SageMaker Model Card sharing to improve model governance

AWS Machine Learning Blog

AUGUST 31, 2023

Model cards are intended to be a single source of truth for business and technical metadata about the model that can reliably be used for auditing and documentation purposes. They provide a fact sheet of the model that is important for model governance.

ML

ML Data Scientist Machine Learning Data Science

Bundesliga Match Fact Ball Recovery Time: Quantifying teams’ success in pressing opponents on AWS

AWS Machine Learning Blog

MARCH 30, 2023

Metadata of the match is processed within the AWS Lambda function MetaDataIngestion, while positional data is ingested using the AWS Fargate container called MatchLink. Fotinos Kyriakides is an ML Engineer with AWS Professional Services. The following diagram illustrates the end-to-end workflow for Ball Recovery Time.

Machine Learning

Machine Learning Data Scientist Data Science Metadata

Logging PyMC and Arviz Artifacts on Neptune

The MLOps Blog

JANUARY 24, 2024

Experiment trackers like neptune.ai help data scientists systematically record, catalog, and analyze modeling artifacts and experiment metadata. neptune.ai is an experiment tracker for ML teams that struggle with debugging and reproducing experiments, sharing results, and messy model handover.

Metadata

Metadata Python Data Scientist ML

Exploring Generative AI in conversational experiences: An Introduction with Amazon Lex, Langchain, and SageMaker Jumpstart

AWS Machine Learning Blog

JUNE 8, 2023

A session stores metadata and application-specific data known as session attributes. Ryan Gomes is a Data & ML Engineer with the AWS Professional Services Intelligence Practice. A session persists over time unless manually stopped or timed out. He leads the NYC machine learning and AI meetup.

Generative AI

Generative AI LLM Machine Learning Large Language Models

Accelerate pre-training of Mistral’s Mathstral model with highly resilient clusters on Amazon SageMaker HyperPod

AWS Machine Learning Blog

SEPTEMBER 18, 2024

By directly integrating with Amazon Managed Service for Prometheus and Amazon Managed Grafana and abstracting the management of hardware failures and job resumption, SageMaker HyperPod allows data scientists and ML engineers to focus on model development rather than infrastructure management.

Auto-complete

Auto-complete ML Generative AI Deep Learning

How to Build a CI/CD MLOps Pipeline [Case Study]

The MLOps Blog

MARCH 15, 2023

Cost and resource requirements: There are several cost-related constraints we had to consider when we ventured into the ML model deployment journey. Data storage costs: Storing the data used to train and test the model, as well as any new data used for prediction, can add to the cost of deployment. S3 buckets. Redshift, S3, and so on.

ETL

ETL Data Drift Machine Learning ML

How Did We Get to ML Model Reproducibility

The MLOps Blog

MARCH 14, 2023

Machine learning reproducibility checklist: solutions we adapted. As ML engineers, we make sure that every problem should have one or multiple possible solutions, as is the case for ML reproducibility challenges. seemed like the right solution.

ML

ML Machine Learning Metadata ML Engineer

Learnings From Building the ML Platform at Stitch Fix

The MLOps Blog

AUGUST 3, 2023

This is Piotr Niedźwiedź and Aurimas Griciūnas from neptune.ai, and you’re listening to ML Platform Podcast. Stefan is a software engineer, data scientist, and has been doing work as an ML engineer. Stefan: Thankfully, Stitch Fix had a blog that had a reasonable amount of readership. Stefan: Yeah.

ML

ML Data Scientist Software Engineer Machine Learning

Real-World MLOps Examples: End-To-End MLOps Pipeline for Visual Search at Brainly

The MLOps Blog

MARCH 28, 2023

quality attributes) and metadata enrichment (e.g., Regarding other teams, they may approach testing ML models differently, especially in tabular ML use cases, by testing on sub-populations of the data. If you want to learn more about Brainly’s technology ecosystem, check out their technology blog.

Machine Learning

Machine Learning Data Scientist Automation ML

Learnings From Building the ML Platform at Mailchimp

The MLOps Blog

OCTOBER 3, 2023

I see so many of these job seekers, especially on the MLOps side or the ML engineer side. There’s no component that stores metadata about this feature store? Mikiko Bazeley: In the case of the literal feature store, all it does is store features and metadata. For example, Shopify published a blog post about Merlin.

ML

ML Data Scientist Machine Learning Data Science

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

After the completion of the research phase, the data scientists need to collaborate with ML engineers to create automations for building (ML pipelines) and deploying models into production using CI/CD pipelines. Security SMEs review the architecture based on business security policies and needs.

Generative AI

Generative AI Prompt Engineering Prompt Engineer ML
