This article was published as a part of the Data Science Blogathon. Introduction AWS Glue helps data engineers prepare data for other data consumers through the Extract, Transform, and Load (ETL) process. The post AWS Glue for Handling Metadata appeared first on Analytics Vidhya.
Amazon Bedrock Knowledge Bases has a metadata filtering capability that allows you to refine search results based on specific attributes of the documents, improving retrieval accuracy and the relevance of responses. These metadata filters can be used in combination with the typical semantic (or hybrid) similarity search.
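For concreteness, here is a minimal sketch of combining semantic search with a metadata filter through the boto3 retrieve API for Amazon Bedrock Knowledge Bases; the knowledge base ID and the "year" metadata attribute are illustrative assumptions, not values from the post.

```python
import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.retrieve(
    knowledgeBaseId="KB12345678",  # hypothetical knowledge base ID
    retrievalQuery={"text": "What changed in the 2023 policy update?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            # Combine semantic similarity search with a metadata filter:
            # only chunks whose "year" attribute equals 2023 are considered.
            "filter": {"equals": {"key": "year", "value": 2023}},
        }
    },
)

for result in response["retrievalResults"]:
    print(result["content"]["text"][:100], result.get("metadata"))
```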
With a growing library of long-form video content, DPG Media recognizes the importance of efficiently managing and enhancing video metadata such as actor information, genre, episode summaries, the mood of the video, and more. AI-driven video analysis was required to generate detailed, accurate, and high-quality metadata at this scale.
This article was published as a part of the Data Science Blogathon. Any type of contextual information, like device context, conversational context, and metadata, […].
This article was published as a part of the Data Science Blogathon. A centralized location for research and production teams to govern models and experiments by storing metadata throughout the ML model lifecycle. The post A Metadata Store for MLOps appeared first on Analytics Vidhya. Keeping track of […].
For example, in the bank marketing use case, the management account would be responsible for setting up the organizational structure for the bank’s data and analytics teams, provisioning separate accounts for data governance, data lakes, and data science teams, and maintaining compliance with relevant financial regulations.
As a global leader in agriculture, Syngenta has led the charge in using data science and machine learning (ML) to elevate customer experiences with an unwavering commitment to innovation. It facilitates real-time data synchronization and updates by using GraphQL APIs, providing seamless and responsive user experiences.
This article was published as a part of the Data Science Blogathon. Introduction The purpose of a data warehouse is to combine multiple sources to generate different insights that help companies make better decisions and forecasts. It consists of historical and cumulative data from single or multiple sources.
From Solo Notebooks to Collaborative Powerhouse: VS Code Extensions for Data Science and ML Teams. In this article, we will explore the essential VS Code extensions that enhance productivity and collaboration for data scientists and machine learning (ML) engineers.
Typically, on their own, data warehouses can be restricted by high storage costs that limit AI and ML model collaboration and deployments, while data lakes can result in low-performing data science workloads. Combining the two surfaces new insights and relationships, all of which supports the use of AI.
It drives an AI governance solution without the excessive costs of switching from your current data science platform. The resulting automation drives scalability and accountability by capturing model development time and metadata, offering post-deployment model monitoring, and allowing for customized workflows.
Along with each document slice, we store the metadata associated with it using an internal Metadata API, which provides document characteristics like document type, jurisdiction, version number, and effective dates. This process has been implemented as a periodic job to keep the vector database updated with new documents.
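A hedged sketch of that pattern follows: each document slice is upserted into a vector store together with metadata fetched from a metadata service. The names fetch_metadata, embed, and vector_db are hypothetical stand-ins, not the internal Metadata API named in the post.

```python
def index_document(doc_id: str, slices: list[str], embed, vector_db, fetch_metadata) -> None:
    # Hypothetical metadata lookup, e.g. {"doc_type": "regulation",
    # "jurisdiction": "EU", "version": "3.2", "effective_date": "2024-01-01"}
    meta = fetch_metadata(doc_id)
    for i, text in enumerate(slices):
        vector_db.upsert(
            id=f"{doc_id}-{i}",
            vector=embed(text),
            # The metadata travels with each slice, so later retrieval can
            # filter by document type, jurisdiction, version, or dates.
            payload={"text": text, **meta},
        )
```

Run periodically (for example, as a scheduled job over newly published documents), this keeps the vector database in step with the document corpus, as the excerpt describes.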
Data security must begin by understanding whether the collected data is compliant with data protection regulations such as GDPR or HIPAA. In this case, the provenance of the collected data is analyzed and the metadata is logged for future audit purposes.
Users without data science or analytics experience can generate rigorous, data-backed predictions to answer big questions like time-to-fill for important positions or resignation risk for crucial employees. The data science team couldn’t roll out changes independently to production.
Traditionally, developing appropriate data science code and interpreting the results to solve a use case is done manually by data scientists. The integration allows you to generate intelligent data science code that reflects your use case. Data scientists still need to review and evaluate these results.
As AI models grow and data volumes expand, databases must scale horizontally to allow organisations to add capacity without significant downtime or performance degradation. We unify source data, metadata, operational data, vector data and generated data—all in one platform.
But most important of all, the dormant value assumed to reside in unstructured data remains a question mark that can only be answered after these sophisticated techniques have been applied. There is therefore a need to be able to analyze and extract value from the data economically and flexibly. The solution integrates data in three tiers.
Several items quickly rose to the top as table stakes: resilient and scalable storage that could satisfy the demand of an ever-increasing data scale; open data formats that kept the data accessible by all but optimized for high performance and with a well-defined structure; and comprehensive data security and data governance (i.e.
A/B testing and experimentation: Data science teams can systematically evaluate different model-tool combinations, measure performance metrics, and analyze response patterns in controlled environments. The role information is also used to configure metadata filtering in the knowledge bases to generate relevant responses.
It includes processes that trace and document the origin of data, models, and associated metadata and pipelines for audits. An AI governance toolkit lets you direct, manage, and monitor AI activities without the expense of switching your data science platform, even for models developed using third-party tools.
Through workload optimization, an organization can reduce data warehouse costs by up to 50 percent by augmenting with this solution. [1] Users can access data through a single point of entry, with a shared metadata layer across clouds and on-premises environments.
IBM Cloud Pak for Data Express solutions offer clients a simple on-ramp to start realizing the business value of a modern architecture. Data governance. The data governance capability of a data fabric focuses on the collection, management and automation of an organization’s data. Data science and MLOps.
Each text, including the rotated text on the left of the page, is identified and extracted as a stand-alone text element with coordinates and other metadata that makes it possible to render a document very close to the original PDF, but from a structured JSON format.
The advent of big data, affordable computing power, and advanced machine learning algorithms has fueled explosive growth in data science across industries. However, research shows that up to 85% of data science projects fail to move beyond proofs of concept to full-scale deployment.
By supporting open-source frameworks and tools for code-based, automated, and visual data science capabilities, all in a secure, trusted studio environment, we’re already seeing excitement from companies ready to use both foundation models and machine learning to accomplish key tasks.
SQL is one of the key languages widely used across businesses, and it requires an understanding of databases and table metadata. Streamlit: this open source Python library makes it straightforward to create and share beautiful, custom web apps for ML and data science. The following diagram illustrates the RAG framework.
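As a taste of how little code a Streamlit front end needs, here is a minimal sketch of a question-to-SQL page; generate_sql is a hypothetical placeholder for the RAG text-to-SQL step, not the post's actual implementation.

```python
import streamlit as st

def generate_sql(question: str) -> str:
    # Placeholder: the real flow would call an LLM with retrieved
    # table metadata as context and return the generated SQL.
    return f"-- SQL for: {question}\nSELECT ...;"

st.title("Ask your data")
question = st.text_input("Question", "How many users signed up last week?")
if st.button("Generate SQL"):
    st.code(generate_sql(question), language="sql")
```

Saved as app.py, this runs locally with `streamlit run app.py`.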
Structured Query Language (SQL) is a complex language that requires an understanding of databases and metadata. The solution in this post aims to bring enterprise analytics operations to the next level by shortening the path to your data using natural language. Today, generative AI can enable people without SQL knowledge to query data in plain language.
With built-in components and integration with Google Cloud services, Vertex AI simplifies the end-to-end machine learning process, making it easier for data science teams to build and deploy models at scale. Metaflow Metaflow helps data scientists and machine learning engineers build, manage, and deploy data science projects.
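To illustrate the kind of workflow Metaflow manages, here is a minimal flow sketch; the step bodies and flow name are illustrative stand-ins, not from the post.

```python
from metaflow import FlowSpec, step

class TrainFlow(FlowSpec):

    @step
    def start(self):
        self.data = [1, 2, 3]  # stand-in for loading a dataset
        self.next(self.train)

    @step
    def train(self):
        self.model = sum(self.data)  # stand-in for model training
        self.next(self.end)

    @step
    def end(self):
        print("trained:", self.model)

if __name__ == "__main__":
    TrainFlow()  # run with: python train_flow.py run
```

Metaflow versions each run's artifacts (here, self.data and self.model) automatically, which is what makes flows like this reproducible.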
Data Scientist at AWS, bringing a breadth of data science, ML engineering, MLOps, and AI/ML architecting experience to help businesses create scalable solutions on AWS. Publish the BYOC image to Amazon ECR. Create a script named model_quality_monitoring.py that configures the monitor with the BYOC image (…amazonaws.com/sm-mm-mqm-byoc:1.0), instance_count=1, and instance_type='ml.m5.xlarge'.
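The garbled fragment suggests a monitor constructed roughly as follows. This is a hedged reconstruction using the SageMaker Python SDK's generic ModelMonitor class for bring-your-own-container monitoring; the ECR account/region prefix and the IAM role are assumptions, since only the image tag and instance settings survive in the excerpt.

```python
from sagemaker.model_monitor import ModelMonitor

monitor = ModelMonitor(
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # hypothetical role
    # Only "sm-mm-mqm-byoc:1.0" appears in the original; the URI prefix is assumed.
    image_uri="111122223333.dkr.ecr.us-east-1.amazonaws.com/sm-mm-mqm-byoc:1.0",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)
```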
Data science tasks such as machine learning also greatly benefit from good data integrity. The more trustworthy and accurate the records a machine learning model is trained on, the better that model will be at making business predictions or automating tasks.
“Most data being generated every day is unstructured and presents the biggest new opportunity.” We wanted to learn more about what unstructured data has in store for AI. Donahue: We’re beginning to see data science and machine learning engineering teams work more closely with data engineering teams.
It stores models, organizes model versions, captures essential metadata and artifacts such as container images, and governs the approval status of each model. He solves complex organizational and technical challenges using data science and engineering. Nick Biso is a Machine Learning Engineer at AWS Professional Services.
Metadata tagging and filtering mechanisms safeguard proprietary data. Key takeaways: data quality is critical for effective RAG implementation; vector search alone is insufficient, and metadata, filtering, and retrieval agents improve accuracy.
We discuss the solution components to build a multimodal knowledge base, drive an agentic workflow, use metadata to address hallucinations, and also share the lessons learned through the solution development using multiple large language models (LLMs) and Amazon Bedrock Knowledge Bases. Yunfei Bai is a Principal Solutions Architect at AWS.
Mike Flaxman is currently the VP of Product at HEAVY.AI, having previously served as Product Manager and led the Spatial Data Science practice in Professional Services. What measures are in place to prevent metadata leakage when using HeavyIQ? This includes not only data but also several kinds of metadata.
ETL (Extract, Transform, Load) Pipeline: It is a data integration mechanism responsible for extracting data from data sources, transforming it into a suitable format, and loading it into a data destination such as a data warehouse. The pipeline ensures correct, complete, and consistent data.
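As an illustration of those three stages, here is a minimal ETL sketch in Python; the CSV source, cleaning rules, and SQLite destination are illustrative assumptions.

```python
import csv
import sqlite3

def extract(path: str) -> list[dict]:
    # Extract: read raw rows from the source system (here, a CSV file).
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[tuple]:
    # Transform: drop incomplete rows, normalize names, cast amounts.
    return [
        (r["id"], r["name"].strip().title(), float(r["amount"]))
        for r in rows
        if r.get("id") and r.get("amount")
    ]

def load(rows: list[tuple], db_path: str = "warehouse.db") -> None:
    # Load: write the cleaned rows into the destination store.
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sales (id TEXT, name TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()

load(transform(extract("sales.csv")))
```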
In this post, we provide best practices to maximize the value of SageMaker HyperPod task governance and make the administration and data science experiences seamless. Access control When working with SageMaker HyperPod task governance, data scientists will assume their specific role.
Most data scientists are familiar with the concept of time series data and work with it often. The time series database (TSDB), however, is still an underutilized tool in the data science community. You can also get data science training on-demand wherever you are with our Ai+ Training platform.
It automatically keeps track of model artifacts, hyperparameters, and metadata, helping you to reproduce and audit model versions. In this post, we show you how to convert Python code that fine-tunes a generative AI model in Amazon Bedrock from local files to a reusable workflow using Amazon SageMaker Pipelines decorators.
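A hedged sketch of that decorator pattern, using the SageMaker Python SDK's @step decorator: the step bodies and pipeline name are illustrative, whereas the post's actual steps fine-tune a Bedrock model.

```python
from sagemaker.workflow.function_step import step
from sagemaker.workflow.pipeline import Pipeline

@step(instance_type="ml.m5.xlarge")
def prepare_data():
    # Stand-in for data preparation; would normally return an S3 URI.
    return "s3://my-bucket/train-data"  # hypothetical bucket

@step(instance_type="ml.m5.xlarge")
def fine_tune(data_uri: str):
    # Stand-in for the fine-tuning call described in the post.
    return f"model-trained-on-{data_uri}"

# Chaining the decorated functions defines the DAG; SageMaker runs each
# step as a separate job and records its artifacts and metadata.
pipeline = Pipeline(name="FineTunePipeline", steps=[fine_tune(prepare_data())])
# pipeline.upsert(role_arn="arn:aws:iam::111122223333:role/SageMakerRole")
# pipeline.start()
```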
Finance Organizations Detect Fraud in a Fraction of a Second Financial organizations face a significant challenge in detecting patterns of fraud due to the vast amount of transactional data that requires rapid analysis. Additionally, the scarcity of labeled data for actual instances of fraud poses a difficulty in training AI models.
This post highlights how Twilio enabled natural language-driven data exploration of business intelligence (BI) data with RAG and Amazon Bedrock. Twilio’s use case Twilio wanted to provide an AI assistant to help their data analysts find data in their data lake.
With “Science of Gaming” as their core philosophy, they have enabled a vision of end-to-end informatics around game dynamics, game platforms, and players by consolidating orthogonal research directions of game AI, game data science, and game user research. We provided metadata to uniquely distinguish the models from one another.
Each frame will be analyzed using Amazon Rekognition and Amazon Bedrock for metadata extraction. Policy evaluation – using the extracted metadata from the video, the system conducts an LLM-based evaluation. An Amazon OpenSearch Service cluster stores the extracted video metadata and facilitates users’ search and discovery needs.
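A hedged sketch of the per-frame extraction step, using Amazon Rekognition label detection via boto3; the bucket and object key are illustrative assumptions.

```python
import boto3

rekognition = boto3.client("rekognition")

# Detect labels in a single extracted video frame stored in S3.
response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-video-frames", "Name": "frame-0001.jpg"}},
    MaxLabels=10,
    MinConfidence=80,
)

# The labels become part of the frame's metadata, feeding the policy
# evaluation step and the OpenSearch index described above.
frame_metadata = [
    {"label": l["Name"], "confidence": l["Confidence"]} for l in response["Labels"]
]
print(frame_metadata)
```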