The application needs to search through the catalog and show the metadata information related to all of the data assets that are relevant to the search context. The following diagram illustrates the end-to-end architecture, consisting of the metadata API layer, ingestion pipeline, embedding generation workflow, and frontend UI.
Structured Query Language (SQL) is a complex language that requires an understanding of databases and metadata. Third, despite the broader adoption of centralized analytics solutions like data lakes and warehouses, complexity rises with the different table names and other metadata required to create SQL for the desired sources.
From Solo Notebooks to Collaborative Powerhouse: VS Code Extensions for Data Science and ML Teams In this article, we will explore the essential VS Code extensions that enhance productivity and collaboration for data scientists and machine learning (ML) engineers.
Data preparation isn’t just a part of the ML engineering process — it’s the heart of it. This step, often done with data engineers, ensures a reproducible data snapshot from sources like production databases or APIs.
In this post, we introduce an example to help DevOps engineers manage the entire ML lifecycle—including training and inference—using the same toolkit. Solution overview We consider a use case in which an ML engineer configures a SageMaker model building pipeline using a Jupyter notebook.
Introduction to AI and Machine Learning on Google Cloud This course introduces Google Cloud’s AI and ML offerings for predictive and generative projects, covering technologies, products, and tools across the data-to-AI lifecycle.
You can now register machine learning (ML) models in Amazon SageMaker Model Registry with Amazon SageMaker Model Cards, making it straightforward to manage governance information for specific model versions directly in SageMaker Model Registry in just a few clicks.
It automatically keeps track of model artifacts, hyperparameters, and metadata, helping you to reproduce and audit model versions. The SageMaker Pipelines decorator feature helps convert local ML code written as a Python program into one or more pipeline steps. SageMaker Pipelines can handle model versioning and lineage tracking.
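The decorator idea described above can be illustrated in plain Python. This is a toy sketch, not the SageMaker SDK: all names here (`step`, `PIPELINE_STEPS`) are hypothetical, and the real feature builds a DAG of remote jobs rather than a local list.

```python
# Toy illustration of converting plain Python functions into named
# pipeline steps via a decorator. Hypothetical names throughout.
from functools import wraps

PIPELINE_STEPS = []

def step(name):
    """Register a function as a pipeline step (illustrative sketch)."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        PIPELINE_STEPS.append((name, wrapper))
        return wrapper
    return decorator

@step("preprocess")
def preprocess(data):
    return [x * 2 for x in data]

@step("train")
def train(data):
    return {"model": "mean", "value": sum(data) / len(data)}

# Run the registered steps in order, feeding each output forward:
result = [1, 2, 3]
for name, fn in PIPELINE_STEPS:
    result = fn(result)
```

The registration order doubles as the execution order here; the real service instead infers dependencies between steps.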
Artificial intelligence (AI) and machine learning (ML) are becoming an integral part of systems and processes, enabling decisions in real time, thereby driving top and bottom-line improvements across organizations. However, putting an ML model into production at scale is challenging and requires a set of best practices.
In the ever-evolving landscape of machine learning, feature management has emerged as a key pain point for ML engineers at Airbnb. A Seamless Integration for Airbnb’s ML Practitioners Chronon has proven to be a game-changer for Airbnb’s ML practitioners.
An ML engineer deploys the model pipeline into the ML team test environment using a shared services CI/CD process. After stakeholder validation, the ML model is deployed to the team’s production environment. ML operations This module helps LOBs and ML engineers work on their dev instances of the model deployment template.
When thinking about a tool for metadata storage and management, you should consider: General business-related items: Pricing model, security, and support.
It is ideal for ML engineers, data scientists, and technical leaders, providing real-world training for production-ready generative AI using Amazon Bedrock and cloud-native services.
Let’s demystify this using the following personas and a real-world analogy:
- Data and ML engineers (owners and producers) – They lay the groundwork by feeding data into the feature store
- Data scientists (consumers) – They extract and utilize this data to craft their models
Data engineers serve as architects sketching the initial blueprint.
You can optionally add request metadata to these inference requests to filter your invocation logs for specific use cases. These logs capture the prompts, responses, and metadata from your previous model interactions, making them a valuable source of data. If you haven’t done this yet, see the prerequisites section for instructions.
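The kind of filtering that request metadata enables can be sketched over JSON-lines log records. The record shape below (prompt/response/metadata keys, a `use_case` tag) is an illustrative assumption, not the exact Bedrock invocation-log schema.

```python
# Hypothetical sketch: filter invocation-log records by a metadata tag.
import json

log_lines = [
    json.dumps({"prompt": "Summarize this ticket", "response": "...",
                "metadata": {"use_case": "support"}}),
    json.dumps({"prompt": "Draft a tagline", "response": "...",
                "metadata": {"use_case": "marketing"}}),
]

def filter_by_use_case(lines, use_case):
    """Keep only invocations tagged with the given use case."""
    records = (json.loads(line) for line in lines)
    return [r for r in records if r["metadata"].get("use_case") == use_case]

support_calls = filter_by_use_case(log_lines, "support")
```

In practice you would read the lines from the S3 or CloudWatch Logs destination configured for invocation logging.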
Secondly, to be a successful ML engineer in the real world, you cannot just understand the technology; you must understand the business. We should start by considering the broad elements that should constitute any ML solution, as indicated in the following diagram: Figure 1.2:
This post guides you through the steps to get started with setting up and deploying Studio to standardize ML model development and collaboration with fellow ML engineers and ML scientists. cdk.json – Contains metadata and feature flags. Marcelo Aberle is an ML Engineer in the AWS AI organization.
Came to ML from software. Founded neptune.ai, a modular MLOps component for an ML metadata store, aka “experiment tracker + model registry”. Most of our customers are doing ML/MLOps at a reasonable scale, NOT at the hyperscale of big-tech FAANG companies. – How about the ML engineer? Let me explain.
Finally, you can store the model and other metadata information using the INSERT INTO command. Storing ML models in model registry A model registry is a central repository that can store, version, and manage machine learning models. It typically includes features like model versioning , metadata control, comparing model runs, etc.
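The INSERT INTO pattern above can be sketched with a SQLite-backed registry table; the schema and column names here are illustrative assumptions, not a specific product’s layout.

```python
# Minimal sketch of a model registry table: each row stores a model
# name, version, and a JSON metadata blob, written with INSERT INTO.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE model_registry (
        name     TEXT,
        version  INTEGER,
        metadata TEXT,          -- JSON blob: metrics, hyperparameters, etc.
        PRIMARY KEY (name, version)
    )
""")

def register_model(name, version, metadata):
    """Insert one model version with its metadata."""
    conn.execute(
        "INSERT INTO model_registry (name, version, metadata) VALUES (?, ?, ?)",
        (name, version, json.dumps(metadata)),
    )

register_model("churn-model", 1, {"auc": 0.87, "lr": 0.01})
register_model("churn-model", 2, {"auc": 0.91, "lr": 0.005})

# Fetch the latest version, e.g. to compare model runs:
row = conn.execute(
    "SELECT version, metadata FROM model_registry "
    "WHERE name = ? ORDER BY version DESC LIMIT 1",
    ("churn-model",),
).fetchone()
latest_version, latest_metadata = row[0], json.loads(row[1])
```

A real registry would also store the model artifact location (e.g. an S3 URI) rather than the model itself in the row.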
Here, Amazon SageMaker Ground Truth allowed ML engineers to easily build the human-in-the-loop workflow (step v). The image is then uploaded into an Amazon Simple Storage Service (Amazon S3) bucket for images and the metadata about the image is stored in an Amazon DynamoDB table (step 6).
You can use this framework as a starting point to monitor your custom metrics or handle other unique requirements for model quality monitoring in your AI/ML applications. Data Scientist at AWS, bringing a breadth of data science, ML engineering, MLOps, and AI/ML architecting to help businesses create scalable solutions on AWS.
This post is co-written with Jayadeep Pabbisetty, Sr. Specialist Data Engineering at Merck, and Prabakaran Mathaiyan, Sr. ML Engineer at Tiger Analytics. The large machine learning (ML) model development lifecycle requires a scalable model release process similar to that of software development.
Solution overview Ground Truth is a fully self-served and managed data labeling service that empowers data scientists, machine learning (ML) engineers, and researchers to build high-quality datasets. For our example use case, we work with the Fashion200K dataset, released at ICCV 2017.
ML Governance: A Lean Approach Ryan Dawson | Principal Data Engineer | Thoughtworks Meissane Chami | Senior ML Engineer | Thoughtworks During this session, you’ll discuss the day-to-day realities of ML Governance. Some of the questions you’ll explore include: How much documentation is appropriate?
FMEval is an open source LLM evaluation library, designed to provide data scientists and machine learning (ML) engineers with a code-first experience to evaluate LLMs for various aspects, including accuracy, toxicity, fairness, robustness, and efficiency. This allows you to keep track of your ML experiments.
Machine learning (ML) engineers must make trade-offs and prioritize the most important factors for their specific use case and business requirements. You can use metadata filtering to narrow down search results by specifying inclusion and exclusion criteria.
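Metadata filtering with inclusion and exclusion criteria might look like the following pure-Python sketch. A real vector store pushes these filters into the query itself; the `metadata_filter` helper and the field names are assumptions for illustration.

```python
# Sketch of narrowing search results with inclusion/exclusion
# criteria applied to each result's metadata dict.
def metadata_filter(results, include=None, exclude=None):
    """Keep results whose metadata matches `include` and avoids `exclude`."""
    include = include or {}
    exclude = exclude or {}
    kept = []
    for r in results:
        md = r["metadata"]
        matches = all(md.get(k) == v for k, v in include.items())
        excluded = any(md.get(k) == v for k, v in exclude.items())
        if matches and not excluded:
            kept.append(r)
    return kept

results = [
    {"doc": "q3-report", "metadata": {"year": 2024, "dept": "finance"}},
    {"doc": "hr-policy", "metadata": {"year": 2024, "dept": "hr"}},
    {"doc": "q1-report", "metadata": {"year": 2023, "dept": "finance"}},
]
hits = metadata_filter(results, include={"year": 2024}, exclude={"dept": "hr"})
```

Filtering before (rather than after) the similarity search is usually preferable, since post-filtering can starve the result set.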
We’ll see how this architecture applies to different classes of ML systems, discuss MLOps and testing aspects, and look at some example implementations. Understanding machine learning pipelines Machine learning (ML) pipelines are a key component of ML systems. But what is an ML pipeline?
Earth.com didn’t have an in-house ML engineering team, which made it hard to add new datasets featuring new species, release and improve new models, and scale their disjointed ML system. We initiated a series of enhancements to deliver a managed MLOps platform and augment ML engineering.
Planet and AWS’s partnership on geospatial ML SageMaker geospatial capabilities empower data scientists and ML engineers to build, train, and deploy models using geospatial data. It also contains each scene’s metadata, its image ID, and a preview image reference.
Fine-tuning an LLM can be a complex workflow for data scientists and machine learning (ML) engineers to operationalize. By logging your datasets with MLflow, you can store metadata, such as dataset descriptions, version numbers, and data statistics, alongside your MLflow runs.
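Capturing dataset metadata to record alongside a run can be sketched with the standard library alone. The `dataset_metadata` helper below is hypothetical; with MLflow you would attach the resulting fields to the run itself.

```python
# Sketch: compute dataset metadata (description, version, row count,
# content fingerprint) worth storing next to a training run.
import hashlib
import json

def dataset_metadata(rows, description, version):
    """Summarize a dataset as a small, loggable metadata dict."""
    blob = json.dumps(rows, sort_keys=True).encode()
    return {
        "description": description,
        "version": version,
        "num_rows": len(rows),
        "sha256": hashlib.sha256(blob).hexdigest(),  # content fingerprint
    }

rows = [{"x": 1, "y": 0}, {"x": 2, "y": 1}]
meta = dataset_metadata(rows, "toy binary-classification set", "v1")
```

The content hash is the useful part for audits: two runs logging the same hash provably trained on identical data.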
In this example, the ML engineering team is borrowing 5 GPUs for their training task. With SageMaker HyperPod, you can additionally set up observability tools of your choice.
metadata:
  name: job-name
  namespace: hyperpod-ns-researchers
  labels:
    kueue.x-k8s.io/queue-name: hyperpod-ns-researchers-localqueue
    kueue.x-k8s.io/priority-class:
Amazon SageMaker provides purpose-built tools for machine learning operations (MLOps) to help automate and standardize processes across the ML lifecycle. In this post, we describe how Philips partnered with AWS to develop AI ToolSuite—a scalable, secure, and compliant ML platform on SageMaker.
Additionally, you can enable model invocation logging to collect invocation logs, full request response data, and metadata for all Amazon Bedrock model API invocations in your AWS account. Leveraging her expertise in computer vision and deep learning, she empowers customers to harness the power of ML in the AWS cloud efficiently.
He has experience with multiple programming languages, AWS cloud services, AI/ML technologies, product and operations management, pre- and early-seed start-up ventures, and corporate finance. Geeta Gharpure is a senior software developer on the Annapurna ML engineering team.
- key: kubernetes.io/os
  operator: In
  values:
    - linux
- key: node.kubernetes.io/instance-type
Additionally, VitechIQ includes metadata from the vector database (for example, document URLs) in the model’s output, providing users with source attribution and enhancing trust in the generated answers. Prompt engineering is crucial for the knowledge retrieval system.
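Attaching source attribution from chunk metadata might be sketched as follows; the chunk shape and the `answer_with_sources` helper are illustrative assumptions, not VitechIQ’s implementation.

```python
# Sketch: append de-duplicated document URLs carried in retrieved
# chunks' metadata to the generated answer.
def answer_with_sources(answer, retrieved_chunks):
    """Return the answer followed by a de-duplicated source list."""
    urls = []
    for chunk in retrieved_chunks:
        url = chunk["metadata"].get("url")
        if url and url not in urls:   # keep first-seen order
            urls.append(url)
    if not urls:
        return answer
    sources = "\n".join(f"- {u}" for u in urls)
    return f"{answer}\n\nSources:\n{sources}"

chunks = [
    {"text": "...", "metadata": {"url": "https://docs.example.com/a"}},
    {"text": "...", "metadata": {"url": "https://docs.example.com/a"}},
]
out = answer_with_sources("Coverage begins after 30 days.", chunks)
```

Carrying the URL in each chunk’s metadata at ingestion time is what makes this possible at answer time.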
You can call the SageMaker ListWorkteams or DescribeWorkteam APIs to view workteams’ metadata, including the WorkerAccessConfiguration. Abhinay Sandeboina is an Engineering Manager at AWS Human In The Loop (HIL). He has been at AWS for over 2 years and his teams are responsible for managing ML platform services.
ML operations, known as MLOps, focus on streamlining, automating, and monitoring ML models throughout their lifecycle. Data scientists, ML engineers, IT staff, and DevOps teams must work together to operationalize models from research to deployment and maintenance.
Solution overview The ML solution for LTV forecasting is composed of four components: the training dataset ETL pipeline, MLOps pipeline, inference dataset ETL pipeline, and ML batch inference. ML engineers no longer need to manage this training metadata separately.
This post is co-written with Jad Chamoun, Director of Engineering at Forethought Technologies, Inc. and Salina Wu, Senior ML Engineer at Forethought Technologies, Inc. We defined logic that would take in model metadata, format the endpoint deterministically based on the metadata, and check whether the endpoint existed.
Stakeholders such as ML engineers, designers, and domain experts must work together to identify a model’s expected and potential faults. Instead, ML engineers collaborate with domain experts and designers to describe a model’s expected capabilities before it is iterated and deployed.
Metadata of the match is processed within the AWS Lambda function MetaDataIngestion, while positional data is ingested using the AWS Fargate container called MatchLink. Fotinos Kyriakides is an ML Engineer with AWS Professional Services. The following diagram illustrates the end-to-end workflow for Ball Recovery Time.
Model cards are intended to be a single source of truth for business and technical metadata about the model that can reliably be used for auditing and documentation purposes. They provide a fact sheet of the model that is important for model governance.
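A model card can be sketched as a small data structure whose fields cover both business and technical metadata; all field names below are illustrative assumptions rather than any product’s schema.

```python
# Sketch of a model card as a single source of truth for governance
# and audit metadata about one model version.
from dataclasses import dataclass, asdict, field

@dataclass
class ModelCard:
    model_name: str
    version: str
    intended_use: str
    owner: str
    metrics: dict = field(default_factory=dict)
    risk_rating: str = "unrated"

    def fact_sheet(self):
        """Flatten the card into an auditable dict."""
        return asdict(self)

card = ModelCard(
    model_name="churn-model",
    version="2",
    intended_use="Rank accounts by churn risk for retention outreach.",
    owner="growth-ml-team",
    metrics={"auc": 0.91},
    risk_rating="low",
)
sheet = card.fact_sheet()
```

Keeping the card next to the registered model version, rather than in a separate wiki, is what makes it reliable for audits.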
At Cruise, we noticed a wide gap between the complexity of cloud infrastructure and the needs of the ML workforce. ML engineers want to focus on writing Python logic and visualizing the impact of their changes quickly. Could you please tell us about the vision and inspiration behind this project?
Experiment trackers like neptune.ai help data scientists systematically record, catalog, and analyze modeling artifacts and experiment metadata. neptune.ai is an experiment tracker for ML teams that struggle with debugging and reproducing experiments, sharing results, and messy model handover.