This post, part of an ongoing series on governing the machine learning (ML) lifecycle at scale, dives deep into setting up data governance at scale using Amazon DataZone for the data mesh. The data mesh is a modern approach to data management that decentralizes data ownership and treats data as a product.
Machine learning (ML) has become a critical component of many organizations’ digital transformation strategies. From predicting customer behavior to optimizing business processes, ML algorithms are increasingly being used to make decisions that impact business outcomes.
This enables the efficient processing of content, including scientific formulas and data visualizations, and the population of Amazon Bedrock Knowledge Bases with appropriate metadata. The JupyterLab application’s flexible and extensible interface can be used to configure and arrange machine learning (ML) workflows.
Recently, we’ve been witnessing the rapid development and evolution of generative AI applications, with observability and evaluation emerging as critical aspects for developers, data scientists, and stakeholders.
With metadata filtering now available in Knowledge Bases for Amazon Bedrock, you can define and use metadata fields to filter the source data used for retrieving relevant context during RAG. This helps improve the relevance and quality of retrieved context while reducing potential hallucinations or noise from irrelevant data.
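As a rough sketch of how such a filter can be applied with the AWS SDK for Python (boto3) — the knowledge base ID, attribute name, and values below are placeholders, not from the original post:

    import boto3

    client = boto3.client("bedrock-agent-runtime")

    # Hypothetical knowledge base ID and metadata attribute for illustration.
    response = client.retrieve(
        knowledgeBaseId="KB12345678",
        retrievalQuery={"text": "What were the Q3 revenue drivers?"},
        retrievalConfiguration={
            "vectorSearchConfiguration": {
                "numberOfResults": 5,
                # Only chunks whose 'department' metadata equals 'finance' are retrieved.
                "filter": {"equals": {"key": "department", "value": "finance"}},
            }
        },
    )

    for result in response["retrievalResults"]:
        print(result["content"]["text"])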
A data lakehouse architecture combines the performance of data warehouses with the flexibility of data lakes to address the challenges of today’s complex data landscape and scale AI. This combination surfaces new insights and relationships, all of which support the use of AI.
From Solo Notebooks to Collaborative Powerhouse: VS Code Extensions for Data Science and ML Teams In this article, we will explore the essential VS Code extensions that enhance productivity and collaboration for data scientists and machine learning (ML) engineers.
It stores information such as job ID, status, creation time, and other metadata. The following is a screenshot of the DynamoDB table where you can track the job status and other metadata related to the job.
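A minimal sketch of this tracking pattern with boto3, assuming a table named job-tracking keyed on job_id (both names hypothetical):

    import time
    import boto3

    # Hypothetical table name and schema for illustration.
    table = boto3.resource("dynamodb").Table("job-tracking")

    # Record a new job with its status and creation time.
    table.put_item(
        Item={
            "job_id": "job-001",
            "status": "IN_PROGRESS",
            "created_at": int(time.time()),
        }
    )

    # Later, look the job up to check its status.
    item = table.get_item(Key={"job_id": "job-001"})["Item"]
    print(item["status"])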
After some impressive advances over the past decade, largely thanks to the techniques of machine learning (ML) and deep learning, the technology seems to have taken a sudden leap forward. By augmenting a data warehouse with this solution for workload optimization, an organization can reduce costs by up to 50 percent. [1] But why now?
Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. Recent developments in generative AI models have further accelerated the need for ML adoption across industries.
Although AutoML rose to popularity a few years ago, the early work on AutoML dates back to the early 1990s, when scientists published the first papers on hyperparameter optimization. AutoML gained the attention of ML developers in 2014, when ICML organized the first AutoML workshop.
For instance, according to International Data Corporation (IDC), the world’s data volume is expected to increase tenfold by 2025, with unstructured data accounting for a significant portion. The custom metadata helps organizations and enterprises categorize information in their preferred way.
Real-world applications vary in inference requirements for their artificial intelligence and machine learning (AI/ML) solutions to optimize performance and reduce costs. SageMaker Model Monitor monitors the quality of SageMaker ML models in production. Your client applications invoke this endpoint to get inferences from the model.
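A minimal sketch of invoking such an endpoint from a client application with boto3; the endpoint name and payload shape are placeholders:

    import json
    import boto3

    runtime = boto3.client("sagemaker-runtime")

    # Hypothetical endpoint name and JSON payload for illustration.
    response = runtime.invoke_endpoint(
        EndpointName="my-model-endpoint",
        ContentType="application/json",
        Body=json.dumps({"features": [1.5, 2.3, 0.7]}),
    )

    prediction = json.loads(response["Body"].read())
    print(prediction)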
Employees and managers see different levels of company policy information, with managers getting additional access to confidential data like performance review and compensation details. The role information is also used to configure metadata filtering in the knowledge bases to generate relevant responses.
Amazon Bedrock offers fine-tuning capabilities that allow you to customize these pre-trained models using proprietary call transcript data, facilitating high accuracy and relevance without the need for extensive machine learning (ML) expertise. Architecture The following diagram illustrates the solution architecture.
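As a rough illustration, a fine-tuning job can be started through the Amazon Bedrock API; the job and model names, ARNs, S3 URIs, base model, and hyperparameter values below are all placeholders:

    import boto3

    bedrock = boto3.client("bedrock")

    # All names, ARNs, and S3 URIs below are hypothetical.
    bedrock.create_model_customization_job(
        jobName="transcript-finetune-job",
        customModelName="call-transcript-model",
        roleArn="arn:aws:iam::111122223333:role/BedrockFineTuneRole",
        baseModelIdentifier="amazon.titan-text-express-v1",
        trainingDataConfig={"s3Uri": "s3://my-bucket/transcripts/train.jsonl"},
        outputDataConfig={"s3Uri": "s3://my-bucket/finetune-output/"},
        hyperParameters={"epochCount": "2", "learningRate": "0.00001"},
    )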
Additionally, for every retrieval result you bring, you can provide a name and additional metadata in the form of key-value pairs; both are optional.
Many organizations choose SageMaker as their ML platform because it provides a common set of tools for developers and data scientists. This is usually in a dedicated customer AWS account, meaning there still needs to be cross-account access to the customer AWS account where SageMaker is running.
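A common way to implement that cross-account access is an IAM role in the customer account that the other account assumes via AWS STS; the role ARN here is a placeholder:

    import boto3

    # Hypothetical role ARN in the customer AWS account.
    creds = boto3.client("sts").assume_role(
        RoleArn="arn:aws:iam::111122223333:role/SageMakerCrossAccountRole",
        RoleSessionName="sagemaker-access",
    )["Credentials"]

    # Use the temporary credentials to call SageMaker in the customer account.
    sagemaker = boto3.client(
        "sagemaker",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    print(sagemaker.list_domains()["Domains"])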
Alignment to other tools in the organization’s tech stack Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, and data structures such as Pandas or Apache Spark DataFrames.
In this example, the ML engineering team is borrowing 5 GPUs for their training task. With SageMaker HyperPod, you can additionally set up observability tools of your choice. Data scientist experience Data scientists are the second persona interacting with SageMaker HyperPod clusters.
We recently announced the general availability of cross-account sharing of Amazon SageMaker Model Registry using AWS Resource Access Manager (AWS RAM) , making it easier to securely share and discover machine learning (ML) models across your AWS accounts.
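A sketch of creating such a share with boto3, assuming a model package group ARN and a consumer account ID (both hypothetical):

    import boto3

    ram = boto3.client("ram")

    # Hypothetical model package group ARN and consumer account ID.
    share = ram.create_resource_share(
        name="model-registry-share",
        resourceArns=[
            "arn:aws:sagemaker:us-east-1:111122223333:model-package-group/my-models"
        ],
        principals=["444455556666"],
    )
    print(share["resourceShare"]["resourceShareArn"])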
FMEval is an open source LLM evaluation library, designed to provide data scientists and machine learning (ML) engineers with a code-first experience to evaluate LLMs for various aspects, including accuracy, toxicity, fairness, robustness, and efficiency. This allows you to keep track of your ML experiments.
You can use Amazon SageMaker Model Building Pipelines to collaborate between multiple AI/ML teams. SageMaker Pipelines You can use SageMaker Pipelines to define and orchestrate the various steps involved in the ML lifecycle, such as data preprocessing, model training, evaluation, and deployment.
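A minimal sketch of defining and running a pipeline with the SageMaker Python SDK; the pipeline name, script name, and instance settings are placeholders:

    import sagemaker
    from sagemaker.sklearn.processing import SKLearnProcessor
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.steps import ProcessingStep

    role = sagemaker.get_execution_role()

    # Hypothetical preprocessing step; preprocess.py is a placeholder script.
    processor = SKLearnProcessor(
        framework_version="1.2-1",
        role=role,
        instance_type="ml.m5.xlarge",
        instance_count=1,
    )
    step_process = ProcessingStep(name="Preprocess", processor=processor, code="preprocess.py")

    # Register (or update) the pipeline and kick off an execution.
    pipeline = Pipeline(name="demo-pipeline", steps=[step_process])
    pipeline.upsert(role_arn=role)
    pipeline.start()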
IBM watsonx.ai is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. Automated development: Automates data preparation, model development, feature engineering and hyperparameter optimization using AutoAI.
Introduction to AI and Machine Learning on Google Cloud This course introduces Google Cloud’s AI and ML offerings for predictive and generative projects, covering technologies, products, and tools across the data-to-AI lifecycle. It includes labs on feature engineering with BigQuery ML, Keras, and TensorFlow.
This involves unifying and sharing a single copy of data and metadata across IBM® watsonx.data™, IBM® Db2®, IBM® Db2® Warehouse and IBM® Netezza®, using native integrations and supporting open formats, all without the need for migration or recataloging.
You can now register machine learning (ML) models built in Amazon SageMaker Canvas with a single click to the Amazon SageMaker Model Registry , enabling you to operationalize ML models in production. By keeping track of model versions, you can easily revert to a previous version if a new version causes issues.
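A sketch of listing registered versions so you can identify the previous one to fall back to; the model package group name is a placeholder:

    import boto3

    sm = boto3.client("sagemaker")

    # Hypothetical model package group holding models registered from Canvas.
    versions = sm.list_model_packages(
        ModelPackageGroupName="canvas-models",
        SortBy="CreationTime",
        SortOrder="Descending",
    )["ModelPackageSummaryList"]

    # Most recent version first; the next entry is the rollback candidate.
    for v in versions:
        print(v["ModelPackageVersion"], v["ModelPackageStatus"], v["ModelPackageArn"])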
Access to high-quality data can help organizations start successful products, defend against digital attacks, understand failures and pivot toward success. Emerging technologies and trends, such as machine learning (ML), artificial intelligence (AI), automation and generative AI (gen AI), all rely on good data quality.
An enterprise data catalog does all that a library inventory system does – namely streamlining data discovery and access across data sources – and a lot more. For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance.
Amazon DataZone makes it straightforward for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization so they can discover, use, and collaborate to derive data-driven insights.
This post explores how Amazon SageMaker AI with MLflow can help you as a developer and machine learning (ML) practitioner efficiently experiment, evaluate generative AI agent performance, and optimize your applications for production readiness.
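A rough sketch of logging an agent evaluation run with the MLflow client; the tracking server ARN, experiment name, parameter, and metric are placeholders:

    import mlflow

    # Hypothetical SageMaker managed MLflow tracking server ARN.
    mlflow.set_tracking_uri(
        "arn:aws:sagemaker:us-east-1:111122223333:mlflow-tracking-server/demo"
    )
    mlflow.set_experiment("agent-evaluation")

    with mlflow.start_run(run_name="prompt-v2"):
        # Hypothetical configuration and evaluation score for illustration.
        mlflow.log_param("prompt_template", "v2")
        mlflow.log_metric("answer_relevance", 0.87)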
In addition, the Amazon Bedrock Knowledge Bases team worked closely with us to address several critical elements, including expanding embedding limits, managing the metadata limit (250 characters), testing different chunking methods, and syncing throughput to the knowledge base.
In this comprehensive guide, we’ll explore the key concepts, challenges, and best practices for ML model packaging, including the different types of packaging formats, techniques, and frameworks. These teams may include but are not limited to data scientists, software developers, machine learning engineers, and DevOps engineers.
This guest post is co-written by Lydia Lihui Zhang, Business Development Specialist, and Mansi Shah, Software Engineer/Data Scientist, at Planet Labs. In this post, we illustrate how to use a segmentation machine learning (ML) model to identify crop and non-crop regions in an image.
Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services. Data engineers use data warehouses, data lakes, and analytics tools to load, transform, clean, and aggregate data.
Traditionally, developing appropriate data science code and interpreting the results to solve a use case is done manually by data scientists. Data scientists still need to review and evaluate these results. Ultimately, users benefit from a transparent and clear explanation of what ML predictions mean to them.
It involves breaking down the document into its constituent parts, such as text, tables, images, and metadata, and identifying the relationships between these elements. Metadata customization for .csv files Knowledge Bases for Amazon Bedrock now offers an enhanced .csv file processing feature that separates content and metadata.
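A sketch of what a metadata sidecar for a document could look like, written from Python; the file name and attributes are hypothetical, and the exact sidecar format should be checked against the Amazon Bedrock documentation:

    import json

    # Hypothetical attributes describing reports.csv; Knowledge Bases reads
    # a sidecar file named after the document with a .metadata.json suffix.
    metadata = {
        "metadataAttributes": {
            "department": "finance",
            "year": 2024,
        }
    }

    with open("reports.csv.metadata.json", "w") as f:
        json.dump(metadata, f)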
In this post, we discuss how the IEO developed UNDP’s artificial intelligence and machine learning (ML) platform—named Artificial Intelligence for Development Analytics (AIDA)— in collaboration with AWS, UNDP’s Information and Technology Management Team (UNDP ITM), and the United Nations International Computing Centre (UNICC).
Artificial intelligence (AI) and machine learning (ML) are becoming an integral part of systems and processes, enabling decisions in real time, thereby driving top and bottom-line improvements across organizations. However, putting an ML model into production at scale is challenging and requires a set of best practices.
We downloaded the data from AWS Data Exchange and processed it in AWS Glue to generate KG files. In Part 2 , we demonstrated how to use Amazon Neptune ML (in Amazon SageMaker ) to train the KG and create KG embeddings. The KG files were stored in Amazon Simple Storage Service (Amazon S3) and then loaded in Amazon Neptune.
What is MLOps? MLOps, or Machine Learning Operations, is a multidisciplinary field that combines the principles of ML, software engineering, and DevOps practices to streamline the deployment, monitoring, and maintenance of ML models in production environments.
    aws sagemaker create-cluster --cli-input-json file://cluster-config.json --region $AWS_REGION

You should be able to see your cluster by navigating to SageMaker HyperPod in the AWS Management Console, where a cluster named ml-cluster should be listed. After a few minutes, its status should change from Creating to InService.
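As a rough illustration, the cluster-config.json referenced above could be generated as below; the instance group settings, lifecycle script location, and role ARN are placeholders, not the original post's values:

    import json

    # Hypothetical HyperPod cluster configuration for illustration.
    cluster_config = {
        "ClusterName": "ml-cluster",
        "InstanceGroups": [
            {
                "InstanceGroupName": "worker-group",
                "InstanceType": "ml.g5.8xlarge",
                "InstanceCount": 2,
                "LifeCycleConfig": {
                    "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",
                    "OnCreate": "on_create.sh",
                },
                "ExecutionRole": "arn:aws:iam::111122223333:role/HyperPodExecutionRole",
            }
        ],
    }

    with open("cluster-config.json", "w") as f:
        json.dump(cluster_config, f, indent=2)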
When working on real-world machine learning (ML) use cases, finding the best algorithm/model is not the end of your responsibilities. Model-saving practices are needed for a number of reasons: Backup: A trained model can be saved as a backup in case the original data is damaged or destroyed. Frameworks such as Keras provide methods (for example, model.save()) for saving models.
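A minimal sketch of that backup pattern with Keras; the model architecture and file name are placeholders standing in for a real trained model:

    from tensorflow import keras

    # Hypothetical trained model; in practice this comes from model.fit(...).
    model = keras.Sequential([keras.Input(shape=(4,)), keras.layers.Dense(1)])

    # Persist the model as a backup artifact.
    model.save("model_v1.keras")

    # Restore it later, independent of the original training data.
    restored = keras.models.load_model("model_v1.keras")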
Training machine learning (ML) models to interpret this data, however, is bottlenecked by costly and time-consuming human annotation efforts. Additionally, each folder contains a JSON file with the image metadata. A detailed description of the data is provided in the BigEarthNet Guide.
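A boto3 sketch of downloading only selected band files from the dataset; the bucket, prefix, and band suffixes are placeholders:

    import boto3

    s3 = boto3.client("s3")

    # Hypothetical bucket and prefix; download only the band files of interest.
    bucket, prefix = "bigearthnet-data", "BigEarthNet-v1.0/"
    wanted_suffixes = ("_B02.tif", "_B03.tif")

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(wanted_suffixes):
                s3.download_file(bucket, key, key.split("/")[-1])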
And eCommerce companies have a ton of use cases where ML can help. The problem is, with more ML models and systems in production, you need to set up more infrastructure to reliably manage everything. And because of that, many companies decide to centralize this effort in an internal ML platform. But how to build it?