Metadata, ML and Python - Artificial Intelligence Zone

Enrich your AWS Glue Data Catalog with generative AI metadata using Amazon Bedrock

Flipboard

NOVEMBER 15, 2024

Metadata can play a very important role in using data assets to make data driven decisions. Generating metadata for your data assets is often a time-consuming and manual task. This post shows you how to enrich your AWS Glue Data Catalog with dynamic metadata using foundation models (FMs) on Amazon Bedrock and your data documentation.

Metadata

Metadata Generative AI LLM AI

Google AI Introduces Croissant: A Metadata Format for Machine Learning-Ready Datasets

Marktechpost

MARCH 12, 2024

When building machine learning (ML) models using preexisting datasets, experts in the field must first familiarize themselves with the data, decipher its structure, and determine which subset to use as features. So much so that a basic barrier, the great range of data formats, is slowing advancement in ML.

Metadata

Metadata Machine Learning ML Data Discovery

Access control for vector stores using metadata filtering with Knowledge Bases for Amazon Bedrock

AWS Machine Learning Blog

JULY 2, 2024

With metadata filtering now available in Knowledge Bases for Amazon Bedrock, you can define and use metadata fields to filter the source data used for retrieving relevant context during RAG. Metadata filtering gives you more control over the RAG process for better results tailored to your specific use case needs.

Metadata

Metadata Generative AI Python Computer Vision

Webinars

The Intersection of AI and Sales: Personalization Without Compromise

How to Achieve High-Accuracy Results When Using LLMs

Relevance, Reach, Revenue: How to Turn Marketing Trends From Hype to High-Impact

MORE WEBINARS

Empower your generative AI application with a comprehensive custom observability solution

AWS Machine Learning Blog

OCTOBER 29, 2024

This solution uses decorators in your application code to capture and log metadata such as input prompts, output results, run time, and custom metadata, offering enhanced security, ease of use, flexibility, and integration with native AWS services. However, some components may incur additional usage-based costs.

Generative AI

Generative AI Metadata Data Scientist AI

Building a Multimodal Gradio Chatbot with Llama 3.2 Using the Ollama API

Flipboard

FEBRUARY 10, 2025

Whether youre new to Gradio or looking to expand your machine learning (ML) toolkit, this guide will equip you to create versatile and impactful applications. Gradio is an open-source Python library that enables developers to create user-friendly and interactive web applications effortlessly. and the Ollama API, just keep reading.

Chatbots

Chatbots Computer Vision Deep Learning Large Language Models

From Solo Notebooks to Collaborative Powerhouse: VS Code Extensions for Data Science and ML Teams

Towards AI

AUGUST 7, 2024

From Solo Notebooks to Collaborative Powerhouse: VS Code Extensions for Data Science and ML Teams Photo by Parabol | The Agile Meeting Toolbox on Unsplash In this article, we will explore the essential VS Code extensions that enhance productivity and collaboration for data scientists and machine learning (ML) engineers.

Data Science

Data Science ML ML Engineer Data Scientist

Customized model monitoring for near real-time batch inference with Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 28, 2024

Real-world applications vary in inference requirements for their artificial intelligence and machine learning (AI/ML) solutions to optimize performance and reduce costs. SageMaker Model Monitor monitors the quality of SageMaker ML models in production. Your client applications invoke this endpoint to get inferences from the model.

ML

ML Metadata Data Scientist DevOps

How to Save Trained Model in Python

The MLOps Blog

MAY 10, 2023

When working on real-world machine learning (ML) use cases, finding the best algorithm/model is not the end of your responsibilities. Reusability & reproducibility: Building ML models is time-consuming by nature. Save vs package vs store ML models Although all these terms look similar, they are not the same.

Python

Python Metadata ML Machine Learning

Read graphs, diagrams, tables, and scanned pages using multimodal prompts in Amazon Bedrock

AWS Machine Learning Blog

NOVEMBER 26, 2024

We add the following to the end of the prompt: provide the response in json format with the key as “class” and the value as the class of the document We get the following response: { "class": "ID" } You can now read the JSON response using a library of your choice, such as the Python JSON library. The following image is of a gearbox.

LLM

LLM Convolutional Neural Networks Metadata Explainability

Build a video insights and summarization engine using generative AI with Amazon Bedrock

AWS Machine Learning Blog

OCTOBER 29, 2024

This engine uses artificial intelligence (AI) and machine learning (ML) services and generative AI on AWS to extract transcripts, produce a summary, and provide a sentiment for the call. This post provides guidance on how you can create a video insights and summarization engine using AWS AI/ML services.

Generative AI

Generative AI ML AI AI

Metadata Metamorphosis: from plain Data to Enhanced insights with Retrieval Augmented Generation

Mlearning.ai

DECEMBER 3, 2023

Discover how metadata, the hidden gem of your knowledge base, can be transformed into a powerful tool for enriching your RAG pipeline and… Continue reading on MLearning.ai »

Metadata

Metadata ML Python Artificial Intelligence

Allen Institute for AI Released olmOCR: A High-Performance Open Source Toolkit Designed to Convert PDFs and Document Images into Clean and Structured Plain Text

Marktechpost

FEBRUARY 26, 2025

Researchers at the Allen Institute for AI introduced olmOCR , an open-source Python toolkit designed to efficiently convert PDFs into structured plain text while preserving logical reading order. Image Source The core innovation behind olmOCR is document anchoring, a technique that combines textual metadata with image-based analysis.

Metadata

Metadata Inference Engine Deep Learning AI

MBRS: A Python Library for Minimum Bayes Risk (MBR) Decoding

Marktechpost

AUGUST 13, 2024

The MBRS library is implemented primarily in Python and PyTorch and offers several key features. Additionally, MBRS provides metadata analysis capabilities, allowing users to analyze the origins of output texts and visualize the decision-making process of MBR decoding. If you like our work, you will love our newsletter.

Python

Python Metadata Algorithm ML

Fine tune a generative AI application for Amazon Bedrock using Amazon SageMaker Pipeline decorators

AWS Machine Learning Blog

AUGUST 22, 2024

In this post, we show you how to convert Python code that fine-tunes a generative AI model in Amazon Bedrock from local files to a reusable workflow using Amazon SageMaker Pipelines decorators. You can use Amazon SageMaker Model Building Pipelines to collaborate between multiple AI/ML teams.

Generative AI

Generative AI Metadata Python ML

Integrate SaaS platforms with Amazon SageMaker to enable ML-powered applications

AWS Machine Learning Blog

JULY 6, 2023

Many organizations choose SageMaker as their ML platform because it provides a common set of tools for developers and data scientists. There are a few different ways in which authentication across AWS accounts can be achieved when data in the SaaS platform is accessed from SageMaker and when the ML model is invoked from the SaaS platform.

ML

ML Data Scientist Metadata Python

PEFT fine tuning of Llama 3 on SageMaker HyperPod with AWS Trainium

AWS Machine Learning Blog

DECEMBER 24, 2024

aws sagemaker create-cluster --cli-input-json file://cluster-config.json --region $AWS_REGION You should be able to see your cluster by navigating to SageMaker Hyperpod in the AWS Management Console and see a cluster named ml-cluster listed. Step 1: Environment setup You first need to install the required Python packages for fine tuning.

Deep Learning

Deep Learning Generative AI Python Machine Learning

Meet Chroma: An AI-Native Open-Source Vector Database For LLMs: A Faster Way to Build Python or JavaScript LLM Apps with Memory

Marktechpost

AUGUST 19, 2023

Chroma can be used to create word embeddings using Python or JavaScript programming. Each referenced string can have extra metadata that describes the original document. Researchers fabricated some metadata to use in the tutorial. Metadata (or IDs) can also be queried in the Chroma database.

Python

Python Metadata LLM Big Data

Track LLM model evaluation using Amazon SageMaker managed MLflow and FMEval

AWS Machine Learning Blog

JANUARY 28, 2025

FMEval is an open source LLM evaluation library, designed to provide data scientists and machine learning (ML) engineers with a code-first experience to evaluate LLMs for various aspects, including accuracy, toxicity, fairness, robustness, and efficiency. This allows you to keep track of your ML experiments.

LLM

LLM Large Language Models ML Algorithm

This AI Paper from the Netherlands Introduce an AutoML Framework Designed to Synthesize End-to-End Multimodal Machine Learning ML Pipelines Efficiently

Marktechpost

JULY 24, 2024

An improvement in AutoML for dealing with complicated data modalities, including tabular-text, text-vision, and vision-text-tabular configurations, the proposed method simplifies and guarantees the efficiency and adaptability of multimodal ML pipelines. This collection was chosen after the pipeline variants had been designed.

Machine Learning

Machine Learning ML Natural Language Processing Computer Vision

Build your gen AI–based text-to-SQL application using RAG, powered by Amazon Bedrock (Claude 3 Sonnet and Amazon Titan for embedding)

AWS Machine Learning Blog

MARCH 18, 2025

SQL is one of the key languages widely used across businesses, and it requires an understanding of databases and table metadata. Embedding is usually performed by a machine learning (ML) model. Streamlit This open source Python library makes it straightforward to create and share beautiful, custom web apps for ML and data science.

LLM

LLM Metadata Large Language Models Python

Host ML models on Amazon SageMaker using Triton: CV model with PyTorch backend

AWS Machine Learning Blog

MAY 31, 2023

PyTorch is a machine learning (ML) framework based on the Torch library, used for applications such as computer vision and natural language processing. One of the primary reasons that customers are choosing a PyTorch framework is its simplicity and the fact that it’s designed and assembled to work with Python.

ML

ML Auto-classification Auto-complete Natural Language Processing

Amazon Personalize launches new recipes supporting larger item catalogs with lower latency

AWS Machine Learning Blog

MAY 2, 2024

Amazon Personalize makes it straightforward to personalize your website, app, emails, and more, using the same machine learning (ML) technology used by Amazon, without requiring ML expertise. If you use Amazon Personalize with generative AI, you can also feed the metadata into prompts. compared to previous versions.

Metadata

Metadata Software Engineer Machine Learning Large Language Models

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

Knowledge and skills in the organization Evaluate the level of expertise and experience of your ML team and choose a tool that matches their skill set and learning curve. For example, if your team is proficient in Python and R, you may want an MLOps tool that supports open data formats like Parquet, JSON, CSV, etc.,

Machine Learning

Machine Learning Metadata Data Scientist Data Quality

How Untold Studios empowers artists with an AI assistant built on Amazon Bedrock

AWS Machine Learning Blog

FEBRUARY 7, 2025

Our commitment to innovation led us to a pivotal challenge: how to harness the power of machine learning (ML) to further enhance our competitive edge while balancing this technological advancement with strict data security requirements and the need to streamline access to our existing internal resources.

LLM

LLM Python AI AI

DeepSeek AI Releases Fire-Flyer File System (3FS): A High-Performance Distributed File System Designed to Address the Challenges of AI Training and Inference Workload

Marktechpost

FEBRUARY 28, 2025

In addition, 3FS incorporates stateless metadata services that are supported by a transactional key-value store, such as FoundationDB. By decoupling metadata management from the storage layer, the system not only becomes more scalable but also reduces potential bottlenecks related to metadata operations.

Metadata

Metadata AI AI Artificial Intelligence

Build a robust text-to-SQL solution generating complex queries, self-correcting, and querying diverse data sources

AWS Machine Learning Blog

FEBRUARY 28, 2024

Structured Query Language (SQL) is a complex language that requires an understanding of databases and metadata. Third, despite the larger adoption of centralized analytics solutions like data lakes and warehouses, complexity rises with different table names and other metadata that is required to create the SQL for the desired sources.

Metadata

Metadata Generative AI LLM NLP

How Veritone uses Amazon Bedrock, Amazon Rekognition, Amazon Transcribe, and information retrieval to update their video search pipeline

AWS Machine Learning Blog

MAY 7, 2024

Veritone’s current media search and retrieval system relies on keyword matching of metadata generated from ML services, including information related to faces, sentiment, and objects. We use the Amazon Titan Text and Multimodal Embeddings models to embed the metadata and the video frames and index them in OpenSearch Service.

Metadata

Metadata Generative AI Machine Learning Large Language Models

Exploring the AI and data capabilities of watsonx

IBM Journey to AI blog

JULY 17, 2023

is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. IBM watsonx.ai With watsonx.ai, businesses can effectively train, validate, tune and deploy AI models with confidence and at scale across their enterprise.

Machine Learning

Machine Learning Metadata Automation AI

ML Model Packaging [The Ultimate Guide]

The MLOps Blog

APRIL 5, 2023

In this comprehensive guide, we’ll explore the key concepts, challenges, and best practices for ML model packaging, including the different types of packaging formats, techniques, and frameworks. Best practices for ml model packaging Here is how you can package a model efficiently. What is model packaging in machine learning?

ML

ML Machine Learning Deep Learning Metadata

Build a crop segmentation machine learning model with Planet data and Amazon SageMaker geospatial capabilities

AWS Machine Learning Blog

SEPTEMBER 29, 2023

In this post, we illustrate how to use a segmentation machine learning (ML) model to identify crop and non-crop regions in an image. Identifying crop regions is a core step towards gaining agricultural insights, and the combination of rich geospatial data and ML can lead to insights that drive decisions and actions.

Machine Learning

Machine Learning Data Scientist ML Python

How the UNDP Independent Evaluation Office is using AWS AI/ML services to enhance the use of evaluation to support progress toward the Sustainable Development Goals

AWS Machine Learning Blog

MARCH 29, 2023

In this post, we discuss how the IEO developed UNDP’s artificial intelligence and machine learning (ML) platform—named Artificial Intelligence for Development Analytics (AIDA)— in collaboration with AWS, UNDP’s Information and Technology Management Team (UNDP ITM), and the United Nations International Computing Centre (UNICC).

ML

ML Metadata Data Ingestion Data Extraction

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

Second, because data, code, and other development artifacts like machine learning (ML) models are stored within different services, it can be cumbersome for users to understand how they interact with each other and make changes. Data and AI governance Publish your data products to the catalog with glossaries and metadata forms.

Big Data Architect

Big Data Architect Big Data ML Generative AI

Build high-performance ML models using PyTorch 2.0 on AWS – Part 1

AWS Machine Learning Blog

JUNE 6, 2023

PyTorch is a machine learning (ML) framework that is widely used by AWS customers for a variety of applications, such as computer vision, natural language processing, content creation, and more. Our next generation release that is faster, more Pythonic and Dynamic as ever for details. With the recent PyTorch 2.0 Refer to PyTorch 2.0:

ML

ML Deep Learning BERT Python

How to Automate ML Experiment Management With CI/CD

The MLOps Blog

JUNE 4, 2024

TL;DR Using CI/CD workflows to run ML experiments ensures their reproducibility, as all the required information has to be contained under version control. The compute resources offered by GitHub Actions directly are not suitable for larger-scale ML workloads. ML experiments are, by nature, full of uncertainty and surprises.

ML

ML Automation Machine Learning Metadata

Power recommendations and search using an IMDb knowledge graph – Part 3

AWS Machine Learning Blog

JANUARY 6, 2023

In Part 2 , we demonstrated how to use Amazon Neptune ML (in Amazon SageMaker ) to train the KG and create KG embeddings. This mapping can be done by manually mapping frequent OOC queries to catalog content or can be automated using machine learning (ML). Creates an OpenSearch Service domain for the search application.

Metadata

Metadata Machine Learning Data Scientist ML

Implement unified text and image search with a CLIP model using Amazon SageMaker and Amazon OpenSearch Service

AWS Machine Learning Blog

APRIL 5, 2023

Amazon SageMaker Serverless Inference is a purpose-built inference service that makes it easy to deploy and scale machine learning (ML) models. Data overview and preparation You can use a SageMaker Studio notebook with a Python 3 (Data Science) kernel to run the sample code. We use the first metadata file in this demo.

Metadata

Metadata Neural Network ML Python

Astral Released uv with Advanced Features: A Comprehensive and High-Performance Tool for Unified Python Packaging and Project Management

Marktechpost

AUGUST 22, 2024

Astral, a company renowned for its high-performance developer tools in the Python ecosystem, has recently released uv: Unified Python packaging , a comprehensive tool designed to streamline Python package management. Developers can now use uv to generate and install cross-platform lockfiles based on standards-compliant metadata.

Python

Python Metadata ML AI Tools

How Games24x7 transformed their retraining MLOps pipelines with Amazon SageMaker

AWS Machine Learning Blog

APRIL 12, 2023

There was no mechanism to pass and store the metadata of the multiple experiments done on the model. We could re-use the previous Sagemaker Python SDK code to run the modules individually into Sagemaker Pipeline SDK based runs.

Metadata

Metadata Deep Learning ML Data Science

Boosting RAG-based intelligent document assistants using entity extraction, SQL querying, and agents with Amazon Bedrock

AWS Machine Learning Blog

DECEMBER 6, 2023

The search precision can also be improved with metadata filtering. To overcome these limitations, we propose a solution that combines RAG with metadata and entity extraction, SQL querying, and LLM agents, as described in the following sections. We use the following Python script to recreate tables as pandas DataFrames.

Metadata

Metadata LLM NLP Conversational AI

#35 Advanced prompting techniques are a myth…it’s all about good communication!

Towards AI

AUGUST 8, 2024

Featured Community post from the Discord Arwmoffat just released Manifest, a tool that lets you write a Python function and have an LLM execute it. Manifest relies on runtime metadata, such as a function’s name, docstring, arguments, and type hints. It uses this metadata to compose a prompt and sends it to an LLM.

Neural Network

Neural Network Metadata Explainability LLM

Google DeepMind Researchers Propose CaMeL: A Robust Defense that Creates a Protective System Layer around the LLM, Securing It even when Underlying Models may be Susceptible to Attacks

Marktechpost

MARCH 26, 2025

CaMeL further strengthens security by assigning metadata or “capabilities” to each data value, defining strict policies about how each piece of information can be utilized. Also,feel free to follow us on Twitter and dont forget to join our 85k+ ML SubReddit. increase in input tokens and a 2.73 Check out the Paper.

LLM

LLM Metadata Large Language Models Prompt Engineer

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases

AWS Machine Learning Blog

FEBRUARY 21, 2025

This setup uses the AWS SDK for Python (Boto3) to interact with AWS services. You then format these pairs as individual text files with corresponding metadata JSON files , upload them to an S3 bucket, and ingest them into your cache knowledge base.

LLM

LLM Large Language Models Natural Language Processing Machine Learning

CodeQueries: Answering Semantic Queries Over Code

Towards AI

FEBRUARY 15, 2024

An example of “Conflicting attributes in the base class” query In Python, when a class subclasses multiple base classes, attribute lookup is performed from left to right amongst the base classes following a method called “method resolution order.”¹ CodeQueries has 52 public CodeQL queries python source code files from a common corpus³.

Metadata

Metadata Python ML LLM

Streamline diarization using AI as an assistive technology: ZOO Digital’s story

AWS Machine Learning Blog

FEBRUARY 20, 2024

In this post, we discuss deploying scalable machine learning (ML) models for diarizing media content using Amazon SageMaker , with a focus on the WhisperX model. When selecting the Docker image, consider the following settings: framework (Hugging Face), task (inference), Python version, and hardware (for example, GPU).

Metadata

Metadata Auto-complete Machine Learning Deep Learning

Enrich your AWS Glue Data Catalog with generative AI metadata using Amazon Bedrock

Google AI Introduces Croissant: A Metadata Format for Machine Learning-Ready Datasets

Webinars

Trending Sources

Access control for vector stores using metadata filtering with Knowledge Bases for Amazon Bedrock

Webinars

Empower your generative AI application with a comprehensive custom observability solution

Building a Multimodal Gradio Chatbot with Llama 3.2 Using the Ollama API

From Solo Notebooks to Collaborative Powerhouse: VS Code Extensions for Data Science and ML Teams

Customized model monitoring for near real-time batch inference with Amazon SageMaker

How to Save Trained Model in Python

Read graphs, diagrams, tables, and scanned pages using multimodal prompts in Amazon Bedrock

Build a video insights and summarization engine using generative AI with Amazon Bedrock

Metadata Metamorphosis: from plain Data to Enhanced insights with Retrieval Augmented Generation

Allen Institute for AI Released olmOCR: A High-Performance Open Source Toolkit Designed to Convert PDFs and Document Images into Clean and Structured Plain Text

MBRS: A Python Library for Minimum Bayes Risk (MBR) Decoding

Fine tune a generative AI application for Amazon Bedrock using Amazon SageMaker Pipeline decorators

Integrate SaaS platforms with Amazon SageMaker to enable ML-powered applications

PEFT fine tuning of Llama 3 on SageMaker HyperPod with AWS Trainium

Meet Chroma: An AI-Native Open-Source Vector Database For LLMs: A Faster Way to Build Python or JavaScript LLM Apps with Memory

Track LLM model evaluation using Amazon SageMaker managed MLflow and FMEval

This AI Paper from the Netherlands Introduce an AutoML Framework Designed to Synthesize End-to-End Multimodal Machine Learning ML Pipelines Efficiently

Build your gen AI–based text-to-SQL application using RAG, powered by Amazon Bedrock (Claude 3 Sonnet and Amazon Titan for embedding)

Host ML models on Amazon SageMaker using Triton: CV model with PyTorch backend

Amazon Personalize launches new recipes supporting larger item catalogs with lower latency

MLOps Landscape in 2023: Top Tools and Platforms

How Untold Studios empowers artists with an AI assistant built on Amazon Bedrock

DeepSeek AI Releases Fire-Flyer File System (3FS): A High-Performance Distributed File System Designed to Address the Challenges of AI Training and Inference Workload

Build a robust text-to-SQL solution generating complex queries, self-correcting, and querying diverse data sources

How Veritone uses Amazon Bedrock, Amazon Rekognition, Amazon Transcribe, and information retrieval to update their video search pipeline

Exploring the AI and data capabilities of watsonx

ML Model Packaging [The Ultimate Guide]

Build a crop segmentation machine learning model with Planet data and Amazon SageMaker geospatial capabilities

How the UNDP Independent Evaluation Office is using AWS AI/ML services to enhance the use of evaluation to support progress toward the Sustainable Development Goals

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Build high-performance ML models using PyTorch 2.0 on AWS – Part 1

How to Automate ML Experiment Management With CI/CD

Power recommendations and search using an IMDb knowledge graph – Part 3

Implement unified text and image search with a CLIP model using Amazon SageMaker and Amazon OpenSearch Service

Astral Released uv with Advanced Features: A Comprehensive and High-Performance Tool for Unified Python Packaging and Project Management

How Games24x7 transformed their retraining MLOps pipelines with Amazon SageMaker

Boosting RAG-based intelligent document assistants using entity extraction, SQL querying, and agents with Amazon Bedrock

#35 Advanced prompting techniques are a myth…it’s all about good communication!

Google DeepMind Researchers Propose CaMeL: A Robust Defense that Creates a Protective System Layer around the LLM, Securing It even when Underlying Models may be Susceptible to Attacks

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases

CodeQueries: Answering Semantic Queries Over Code

Streamline diarization using AI as an assistive technology: ZOO Digital’s story

Stay Connected