This post is part of an ongoing series about governing the machine learning (ML) lifecycle at scale. This post dives deep into how to set up data governance at scale using Amazon DataZone for the data mesh. The data mesh is a modern approach to data management that decentralizes data ownership and treats data as a product.
With a growing library of long-form video content, DPG Media recognizes the importance of efficiently managing and enhancing video metadata such as actor information, genre, summary of episodes, the mood of the video, and more. AI-based video data analysis was needed to generate detailed, accurate, and high-quality metadata at scale.
This article was published as a part of the Data Science Blogathon. A metadata store gives research and production teams a centralized location to govern models and experiments by storing metadata throughout the ML model lifecycle. A Metadata Store for MLOps appeared first on Analytics Vidhya. Keeping track of […].
From Solo Notebooks to Collaborative Powerhouse: VS Code Extensions for Data Science and ML Teams. In this article, we explore the essential VS Code extensions that enhance productivity and collaboration for data scientists and machine learning (ML) engineers.
After decades of digitizing everything in your enterprise, you may have an enormous amount of data whose value lies dormant. However, with the help of AI and machine learning (ML), new software tools are now available to unearth the value of unstructured data. The solution integrates data in three tiers.
A data lakehouse architecture combines the performance of data warehouses with the flexibility of data lakes to address the challenges of today’s complex data landscape, surface new insights and relationships, and scale AI.
Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. Recent developments in generative AI models have further accelerated the need for ML adoption across industries.
It drives an AI governance solution without the excessive costs of switching from your current data science platform. The resulting automation drives scalability and accountability by capturing model development time and metadata, offering post-deployment model monitoring, and allowing for customized workflows.
Purpose of Using DevSecOps in Traditional and ML Applications: DevSecOps practices differ between traditional and ML applications, as each comes with different challenges. The characteristics we saw for DevSecOps in traditional applications also apply to ML-based applications.
Real-world applications have varying inference requirements for their artificial intelligence and machine learning (AI/ML) solutions, which must be met while optimizing performance and reducing costs. SageMaker Model Monitor monitors the quality of SageMaker ML models in production. Your client applications invoke this endpoint to get inferences from the model.
A/B testing and experimentation: Data science teams can systematically evaluate different model-tool combinations, measure performance metrics, and analyze response patterns in controlled environments. The role information is also used to configure metadata filtering in the knowledge bases to generate relevant responses.
Traditionally, developing appropriate data science code and interpreting the results to solve a use case is done manually by data scientists. The integration allows you to generate intelligent data science code that reflects your use case. Data scientists still need to review and evaluate these results.
It includes processes that trace and document the origin of data, models, and associated metadata, as well as pipelines for audits. An AI governance framework ensures the ethical, responsible, and transparent use of AI and machine learning (ML). Capture and document model metadata for report generation.
Users without data science or analytics experience can generate rigorous, data-backed predictions to answer big questions like time-to-fill for important positions, or resignation risk for crucial employees. The data science team couldn’t roll out changes independently to production.
After some impressive advances over the past decade, largely thanks to the techniques of Machine Learning (ML) and Deep Learning, the technology seems to have taken a sudden leap forward. Through workload optimization, an organization can reduce data warehouse costs by up to 50 percent by augmenting with this solution. [1] But why now?
We recently announced the general availability of cross-account sharing of Amazon SageMaker Model Registry using AWS Resource Access Manager (AWS RAM), making it easier to securely share and discover machine learning (ML) models across your AWS accounts.
Alignment to other tools in the organization’s tech stack: Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, and Pandas or Apache Spark DataFrames.
IBM watsonx.ai is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. Later this year, it will leverage watsonx.ai foundation models to help users discover, augment, and enrich data with natural language.
SageMaker JumpStart is a machine learning (ML) hub that provides a wide range of publicly available and proprietary FMs from providers such as AI21 Labs, Cohere, Hugging Face, Meta, and Stability AI, which you can deploy to SageMaker endpoints in your own AWS account. It’s serverless so you don’t have to manage the infrastructure.
Structured Query Language (SQL) is a complex language that requires an understanding of databases and metadata. The solution in this post aims to bring enterprise analytics operations to the next level by shortening the path to your data using natural language. Today, generative AI can enable people without SQL knowledge to query their data.
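A minimal sketch of this text-to-SQL idea (the schema and prompt template below are hypothetical, not from the article): supply the table metadata alongside the natural-language question so a generative model can produce the query.

```python
# Hypothetical text-to-SQL prompt construction. The schema, field names, and
# template are illustrative; the article's actual prompts are not shown here.
SCHEMA = "orders(order_id INT, customer TEXT, total NUMERIC, placed_at DATE)"

def build_prompt(question: str) -> str:
    # Table metadata is the key context that lets a model without database
    # access write valid SQL for a user without SQL knowledge.
    return (
        "Given the table metadata:\n"
        f"{SCHEMA}\n"
        f"Write one SQL query answering: {question}"
    )

prompt = build_prompt("What was total revenue last month?")
```

The resulting prompt would then be sent to a generative model; the model's SQL output should still be validated before execution.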
You can use Amazon SageMaker Model Building Pipelines to collaborate between multiple AI/ML teams. SageMaker Pipelines: You can use SageMaker Pipelines to define and orchestrate the various steps involved in the ML lifecycle, such as data preprocessing, model training, evaluation, and deployment.
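The lifecycle orchestration described above can be sketched in plain Python. This is an illustration of ordered pipeline steps passing artifacts forward, not the SageMaker SDK; the step functions are stand-ins.

```python
# Toy pipeline: each named step consumes the previous step's output artifact,
# mirroring the preprocess -> train -> evaluate ordering the excerpt lists.
def preprocess(data):
    return [x / max(data) for x in data]      # scale features to [0, 1]

def train(features):
    return sum(features) / len(features)      # stand-in for a trained model

def evaluate(model):
    return {"score": round(model, 2)}         # stand-in evaluation report

steps = [("preprocess", preprocess), ("train", train), ("evaluate", evaluate)]

artifact = [2, 4, 8]
for name, step in steps:
    artifact = step(artifact)
```

A real pipeline definition would additionally capture step dependencies, compute configuration, and artifact storage, which is what a managed orchestrator provides.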
SQL is one of the key languages widely used across businesses, and it requires an understanding of databases and table metadata. Embedding is usually performed by a machine learning (ML) model. Streamlit: This open source Python library makes it straightforward to create and share beautiful, custom web apps for ML and data science.
In this post, we provide best practices to maximize the value of SageMaker HyperPod task governance and make the administration and data science experiences seamless. Access control: When working with SageMaker HyperPod task governance, data scientists will assume their specific role.
Each frame will be analyzed using Amazon Rekognition and Amazon Bedrock for metadata extraction. Policy evaluation: Using the metadata extracted from the video, the system conducts an LLM-based evaluation. An Amazon OpenSearch Service cluster stores the extracted video metadata and facilitates users’ search and discovery needs.
It stores models, organizes model versions, captures essential metadata and artifacts such as container images, and governs the approval status of each model. He solves complex organizational and technical challenges using data science and engineering. In addition, he builds and deploys AI/ML models on the AWS Cloud.
IDC [2] predicts that by 2024, 60% of enterprises will have operationalized their ML workflows using MLOps. The same is true for your ML workflows: you need the ability to navigate change and make strong business decisions. These and many other questions are now at the top of the agenda of every data science team.
Secondly, to be a successful ML engineer in the real world, you cannot just understand the technology; you must understand the business. Some typical examples are given in the following table, along with some discussion as to whether or not ML would be an appropriate tool for solving the problem (Figure 1.1).
Amazon Kendra also supports the use of metadata for each source file, which enables both UIs to provide a link to its sources, whether it is the Spack documentation website or a CloudFront link. Furthermore, Amazon Kendra supports relevance tuning, enabling you to boost certain data sources.
We discuss the solution components to build a multimodal knowledge base, drive an agentic workflow, use metadata to address hallucinations, and share the lessons learned through the solution development using multiple large language models (LLMs) and Amazon Bedrock Knowledge Bases. Yunfei Bai is a Principal Solutions Architect at AWS.
In BI systems, data warehousing first converts disparate raw data into clean, organized, and integrated data, which is then used to extract actionable insights to facilitate analysis, reporting, and data-informed decision-making. The pipeline ensures correct, complete, and consistent data.
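As a toy illustration of that warehousing step (the field names and records below are invented, not from the article), raw records from disparate sources can be cleaned, normalized, and de-duplicated before analysis:

```python
# Raw records with inconsistent casing, whitespace, and a duplicate row --
# the kind of disparate input a warehousing pipeline must reconcile.
raw = [
    {"Customer": " Alice ", "revenue": "1200"},
    {"customer": "BOB", "revenue": "950"},
    {"customer": "BOB", "revenue": "950"},   # duplicate to be dropped
]

def clean(records):
    seen, out = set(), []
    for r in records:
        # Normalize the key and value so rows become comparable.
        name = (r.get("customer") or r.get("Customer", "")).strip().title()
        row = (name, int(r["revenue"]))
        if row not in seen:                  # enforce consistency: no duplicates
            seen.add(row)
            out.append({"customer": name, "revenue": row[1]})
    return out

cleaned = clean(raw)
```

Real pipelines layer schema validation, type coercion, and integration across many sources on top of this basic clean-and-deduplicate pattern.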
With “Science of Gaming” as their core philosophy, they have enabled a vision of end-to-end informatics around game dynamics, game platforms, and players by consolidating orthogonal research directions of game AI, game data science, and game user research. The existing Step Functions-based solution had limitations.
Data preparation isn’t just a part of the ML engineering process — it’s the heart of it. Photo by Myriam Jessier on Unsplash To set the stage, let’s examine the nuances between research-phase data and production-phase data. Data is a key differentiator in ML projects (more on this in my blog post below).
Learn about the flow, difficulties, and tools for performing ML clustering at scale. Ori Nakar | Principal Engineer, Threat Research | Imperva. Given that there are billions of daily botnet attacks from millions of different IPs, the most difficult challenge of botnet detection is choosing the most relevant data.
We add additional metadata fields to perform rich search queries using OpenSearch’s powerful search capabilities. The OSI pipeline ingests the data as documents into an OpenSearch Serverless index. The sample deck has 31 slides, so we generate 31 sets of vector embeddings, each with 1536 dimensions.
As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications. For information about model pricing, refer to Amazon Bedrock pricing.
TL;DR Using CI/CD workflows to run ML experiments ensures their reproducibility, as all the required information has to be contained under version control. The compute resources offered by GitHub Actions directly are not suitable for larger-scale ML workloads. ML experiments are, by nature, full of uncertainty and surprises.
If you are a returning user to SageMaker Studio, upgrade to the latest Jupyter and SageMaker Data Wrangler kernels to ensure Salesforce Data Cloud is enabled. This completes the setup to enable data access from Salesforce Data Cloud to SageMaker Studio to build AI and machine learning (ML) models.
Utilizing data streamed through LnW Connect, L&W aims to create a better gaming experience for their end users as well as bring more value to their casino customers. Predictive maintenance is a common ML use case for businesses with physical equipment or machinery assets.
However, managing machine learning projects can be challenging, especially as the size and complexity of the data and models increase. Without proper tracking, optimization, and collaboration tools, ML practitioners can quickly become overwhelmed and lose track of their progress. This is where Comet comes in.
We add additional metadata fields to these generated vector embeddings and create a JSON file. These additional metadata fields can be used to perform rich search queries using OpenSearch’s powerful search capabilities. Each JSON file carries a "metadata" object with fields such as "slide_filename": "mypowerpoint1.pptx", and the accompanying code begins by creating a session: session = boto3.Session()
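A hedged sketch of such a document (any field names beyond slide_filename are assumptions, and the actual ingestion into OpenSearch via boto3/OSI is omitted here):

```python
import json

# Hypothetical document shape: attach metadata fields to a generated vector
# embedding and serialize the result as JSON, ready for an ingestion pipeline
# to index. The embedding values and slide_number field are illustrative.
def build_document(embedding, slide_filename, slide_number):
    return {
        "vector_embedding": embedding,        # e.g. a 1536-dimensional vector
        "metadata": {
            "slide_filename": slide_filename, # lets queries filter by source deck
            "slide_number": slide_number,     # enables richer search queries
        },
    }

doc = build_document([0.12, -0.08, 0.33], "mypowerpoint1.pptx", 1)
payload = json.dumps(doc)
```

Keeping the metadata in a dedicated object alongside the vector is what makes filtered vector search possible once the documents are indexed.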
Experiment tracking in machine learning is the practice of preserving all pertinent data for each experiment you conduct. Experiment tracking is implemented by ML teams in a variety of ways, including using spreadsheets, GitHub, or in-house platforms. Major ML and DL libraries like TensorFlow, Keras, or Scikit-learn are also supported.
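A bare-bones version of that practice (not any particular tool's API; names below are illustrative) just preserves each run's parameters and metrics in a serializable record:

```python
import json
from dataclasses import dataclass, field, asdict

# Minimal experiment-tracking record: everything needed to compare runs later.
@dataclass
class ExperimentRun:
    name: str
    params: dict
    metrics: dict = field(default_factory=dict)

runs = []
run = ExperimentRun(name="baseline", params={"lr": 0.01, "epochs": 10})
run.metrics["accuracy"] = 0.91        # recorded after training completes
runs.append(run)

# Persisting runs as JSON lines is the tool-free equivalent of the
# spreadsheet and in-house approaches the excerpt mentions.
log = "\n".join(json.dumps(asdict(r)) for r in runs)
```

Dedicated trackers add automatic capture, dashboards, and artifact storage on top of exactly this kind of per-run record.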
These days enterprises are sitting on a pool of data and increasingly employing machine learning and deep learning algorithms to forecast sales, predict customer churn, detect fraud, and more. Data science practitioners experiment with algorithms, data, and hyperparameters to develop a model that generates business insights.
Amazon SageMaker Serverless Inference is a purpose-built inference service that makes it easy to deploy and scale machine learning (ML) models. Data overview and preparation: You can use a SageMaker Studio notebook with a Python 3 (Data Science) kernel to run the sample code. We use the first metadata file in this demo.
Using machine learning (ML) and natural language processing (NLP) to automate product description generation has the potential to save manual effort and transform the way ecommerce platforms operate. medium instance and the Data Science 3.0 jpg and the complete metadata from styles/38642.json.