Document, Metadata and ML Engineer - Artificial Intelligence Zone

Search enterprise data assets using LLMs backed by knowledge graphs

Flipboard

NOVEMBER 27, 2024

Customers want to search through all of the data and applications across their organization, and they want to see the provenance information for all of the documents retrieved. The application needs to search through the catalog and show the metadata information related to all of the data assets that are relevant to the search context.

Metadata

Metadata Auto-complete Data Discovery ML Engineer

Build a robust text-to-SQL solution generating complex queries, self-correcting, and querying diverse data sources

AWS Machine Learning Blog

FEBRUARY 28, 2024

Structured Query Language (SQL) is a complex language that requires an understanding of databases and metadata. Despite the wider adoption of centralized analytics solutions like data lakes and warehouses, complexity rises with the different table names and other metadata required to create the SQL for the desired sources.

Metadata

Metadata Generative AI LLM NLP

Data4ML Preparation Guidelines (Beyond The Basics)

Towards AI

NOVEMBER 8, 2024

Data preparation isn’t just a part of the ML engineering process; it’s the heart of it. This post dives into key steps for preparing data to build real-world ML systems. Data ingestion ensures that all relevant data is aggregated, documented, and traceable.

Data Ingestion

Data Ingestion Metadata ML Engineer ML

Top Artificial Intelligence AI Courses from Google

Marktechpost

MAY 30, 2024

Introduction to AI and Machine Learning on Google Cloud: This course introduces Google Cloud’s AI and ML offerings for predictive and generative projects, covering technologies, products, and tools across the data-to-AI lifecycle. It includes labs on feature engineering with BigQuery ML, Keras, and TensorFlow.

Artificial Intelligence

Artificial Intelligence Artificial Intelligence BERT Computer Vision

Deploy Amazon SageMaker pipelines using AWS Controllers for Kubernetes

AWS Machine Learning Blog

SEPTEMBER 4, 2024

In this post, we introduce an example to help DevOps engineers manage the entire ML lifecycle, including training and inference, using the same toolkit. Solution overview: We consider a use case in which an ML engineer configures a SageMaker model building pipeline using a Jupyter notebook.

DevOps

DevOps ML Engineer ML Metadata

Improve governance of models with Amazon SageMaker unified Model Cards and Model Registry

AWS Machine Learning Blog

NOVEMBER 13, 2024

You can now register machine learning (ML) models in Amazon SageMaker Model Registry with Amazon SageMaker Model Cards, making it straightforward to manage governance information for specific model versions directly in SageMaker Model Registry in just a few clicks. The Amazon DataZone project ID is captured in the Documentation section.

Metadata

Metadata ML Software Engineer Machine Learning

Top Large Language Models LLMs Courses

Marktechpost

JULY 25, 2024

LangChain Chat with Your Data (Difficulty Level: Beginner): This course teaches Retrieval Augmented Generation and building chatbots that respond based on document content. It covers topics like document loading, splitting, vector stores, embeddings, retrieval techniques, question answering, and chatbot development using LangChain.

Large Language Models

Large Language Models Prompt Engineer Prompt Engineering Chatbots

Vitech uses Amazon Bedrock to revolutionize information access with AI-powered chatbot

AWS Machine Learning Blog

MAY 30, 2024

To serve their customers, Vitech maintains a repository of information that includes product documentation (user guides, standard operating procedures, runbooks), which is currently scattered across multiple internal platforms (for example, Confluence sites and SharePoint folders).

Chatbots

Chatbots Prompt Engineering Prompt Engineer Large Language Models

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

User support arrangements: Consider the availability and quality of support from the provider or vendor, including documentation, tutorials, forums, and customer service. Kubeflow integrates with popular ML frameworks, supports versioning and collaboration, and simplifies the deployment and management of ML pipelines on Kubernetes clusters.

Machine Learning

Machine Learning Metadata Data Scientist Data Quality

From concept to reality: Navigating the Journey of RAG from proof of concept to production

AWS Machine Learning Blog

FEBRUARY 12, 2025

Machine learning (ML) engineers must make trade-offs and prioritize the most important factors for their specific use case and business requirements. For example, your input document might include tables within the PDF. Another important consideration is the way your source document is split up into chunks.
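The point about how a source document is split into chunks can be illustrated with a minimal sketch. It assumes fixed-size character chunks with overlap, which is only one of several possible strategies; the snippet above does not say which one the post recommends. The function name `split_into_chunks` and its parameters are hypothetical.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap.

    Assumption: character-based chunking. Production RAG systems often
    split on sentences, headings, or tokens instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

The overlap keeps some shared context between neighbouring chunks, at the cost of storing and embedding more text.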

Auto-classification

Auto-classification Metadata Generative AI Machine Learning

Track LLM model evaluation using Amazon SageMaker managed MLflow and FMEval

AWS Machine Learning Blog

JANUARY 28, 2025

By documenting the specific model versions, fine-tuning parameters, and prompt engineering techniques employed, teams can better understand the factors contributing to their AI systems’ performance. This allows you to keep track of your ML experiments.

LLM

LLM Large Language Models ML Algorithm

Use IP-restricted presigned URLs to enhance security in Amazon SageMaker Ground Truth

AWS Machine Learning Blog

AUGUST 20, 2024

You can use SageMaker Ground Truth to create labeling jobs, which are workflows where data objects (such as images, videos, or documents) need to be annotated by human workers. Workers can then use these URLs to directly access the necessary files, such as images or documents, in their web browsers for annotation purposes.

Software Engineer

Software Engineer ML Machine Learning Metadata

A guide to Amazon Bedrock Model Distillation (preview)

AWS Machine Learning Blog

DECEMBER 4, 2024

Document summarization: Process vast amounts of business content in real time, such as summarizing thousands of customer call transcripts daily, enabling insights at a scale previously limited by latency constraints. You can optionally add request metadata to these inference requests to filter your invocation logs for specific use cases.

Metadata

Metadata Generative AI Categorization Data Scientist

How to Save Trained Model in Python

The MLOps Blog

MAY 10, 2023

Cons of Saving ML Models with Joblib: 1. Joblib is optimized for numpy arrays, and may not work as well with other object types. 3. Compared to Pickle, Joblib is less well known, which can make it more difficult to locate help and documentation around it. You can find all of this information in the model metadata tab of a Neptune project.
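As a point of comparison with Joblib, here is a minimal sketch of saving and reloading a model with Pickle from the Python standard library. It is not taken from the article. A plain dict of parameters stands in for a trained model, and the helper names `save_model` and `load_model` are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize a model object to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Assumption: a dict of learned parameters stands in for a real model.
    model = {"weights": [0.5, -1.25], "bias": 0.1}
    with tempfile.TemporaryDirectory() as tmp_dir:
        model_path = Path(tmp_dir) / "model.pkl"
        save_model(model, model_path)
        restored = load_model(model_path)
        print(restored == model)
```

Unpickling can execute arbitrary code, so a saved model file should be treated like any other executable artifact.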

Python

Python Metadata ML Machine Learning

First ODSC Europe 2023 Sessions Announced

ODSC - Open Data Science

MARCH 27, 2023

ML Governance: A Lean Approach. Speakers: Ryan Dawson, Principal Data Engineer, Thoughtworks; Meissane Chami, Senior ML Engineer, Thoughtworks. During this session, you’ll discuss the day-to-day realities of ML Governance. Some of the questions you’ll explore include: How much documentation is appropriate?

Machine Learning

Machine Learning Data Ingestion Data Science Deep Learning

Use Amazon SageMaker Model Card sharing to improve model governance

AWS Machine Learning Blog

AUGUST 31, 2023

One of the tools available as part of the ML governance is Amazon SageMaker Model Cards, which has the capability to create a single source of truth for model information by centralizing and standardizing documentation throughout the model lifecycle. They provide a fact sheet of the model that is important for model governance.

ML

ML Data Scientist Machine Learning Data Science

Philips accelerates development of AI-enabled healthcare solutions with an MLOps platform built on Amazon SageMaker

AWS Machine Learning Blog

NOVEMBER 16, 2023

Amazon SageMaker provides purpose-built tools for machine learning operations (MLOps) to help automate and standardize processes across the ML lifecycle. In this post, we describe how Philips partnered with AWS to develop AI ToolSuite—a scalable, secure, and compliant ML platform on SageMaker.

Data Scientist

Data Scientist ML Data Science Automation

Build a crop segmentation machine learning model with Planet data and Amazon SageMaker geospatial capabilities

AWS Machine Learning Blog

SEPTEMBER 29, 2023

Planet and AWS’s partnership on geospatial ML: SageMaker geospatial capabilities empower data scientists and ML engineers to build, train, and deploy models using geospatial data. It also contains each scene’s metadata, its image ID, and a preview image reference.

Machine Learning

Machine Learning Data Scientist ML Python

Build an end-to-end MLOps pipeline using Amazon SageMaker Pipelines, GitHub, and GitHub Actions

AWS Machine Learning Blog

DECEMBER 13, 2023

ML operations, known as MLOps, focus on streamlining, automating, and monitoring ML models throughout their lifecycle. Data scientists, ML engineers, IT staff, and DevOps teams must work together to operationalize models from research to deployment and maintenance.

ML

ML Automation Metadata Software Development

How Did We Get to ML Model Reproducibility

The MLOps Blog

MARCH 14, 2023

This is where version control tools and documentation helped us as team members, and new joiners had access to specific versions of code, data, and ML models. The important part is, all these challenges are very common for any type of ML or DL use case. helped our ML developers better understand the ML project.

ML

ML Machine Learning Metadata ML Engineer

Exploring Generative AI in conversational experiences: An Introduction with Amazon Lex, Langchain, and SageMaker Jumpstart

AWS Machine Learning Blog

JUNE 8, 2023

A session stores metadata and application-specific data known as session attributes. Ryan Gomes is a Data & ML Engineer with the AWS Professional Services Intelligence Practice. This is your Custom Python Hook speaking!" A session persists over time unless manually stopped or timed out.

Generative AI

Generative AI LLM Machine Learning Large Language Models

Logging PyMC and Arviz Artifacts on Neptune

The MLOps Blog

JANUARY 24, 2024

Experiment trackers like neptune.ai help data scientists systematically record, catalog, and analyze modeling artifacts and experiment metadata. neptune.ai is an experiment tracker for ML teams that struggle with debugging and reproducing experiments, sharing results, and messy model handover.

Metadata

Metadata Python Data Scientist ML

Custom AI Solutions vs. Off-the-Shelf Products: Key Factors to Consider

Dlabs.ai

JUNE 11, 2019

Most of them are not directly computer-readable as they are scans and there is some metadata, but not complete enough to be sufficient. You want to digitize all of them and then group them into categories and have read about OCR (optical character recognition) software that could “read” a document scan and spew out a blob of text.

Machine Learning

Machine Learning Artificial Intelligence Artificial Intelligence Algorithm

MLflow: Simplifying Machine Learning Experimentation

Viso.ai

MARCH 29, 2024

MLflow is an open-source platform designed to manage the entire machine learning lifecycle, making it easier for ML Engineers, Data Scientists, Software Developers, and everyone involved in the process. Machine learning operations (MLOps) are a set of practices that automate and simplify machine learning (ML) workflows and deployments.

Machine Learning

Machine Learning ML Automation Data Scientist

Google experts on practical paths to data-centricity in applied AI

Snorkel AI

JULY 5, 2023

RC: I have had ML engineers tell me, “You didn’t need to do feature selection anymore, and that you could just throw everything at the model and it will figure out what to keep and what to throw away.” So does that mean feature selection is no longer necessary? If not, when should we consider using feature selection?

Large Language Models

Large Language Models Metadata Machine Learning AI

How to Build a CI/CD MLOps Pipeline [Case Study]

The MLOps Blog

MARCH 15, 2023

Cost and resource requirements: There are several cost-related constraints we had to consider when we ventured into the ML model deployment journey. Data storage costs: Storing the data used to train and test the model, as well as any new data used for prediction, can add to the cost of deployment. S3 buckets. Redshift, S3, and so on.

ETL

ETL Data Drift Machine Learning ML

Learnings From Building the ML Platform at Mailchimp

The MLOps Blog

OCTOBER 3, 2023

I see so many of these job seekers, especially on the MLOps side or the ML engineer side. There’s no component that stores metadata about this feature store? Mikiko Bazeley: In the case of the literal feature store, all it does is store features and metadata. The second part is documentation, which is a huge part.

ML

ML Data Scientist Machine Learning Data Science

Learnings From Building the ML Platform at Stitch Fix

The MLOps Blog

AUGUST 3, 2023

This is Piotr Niedźwiedź and Aurimas Griciūnas from neptune.ai, and you’re listening to ML Platform Podcast. Stefan is a software engineer, data scientist, and has been doing work as an ML engineer. Today we have invited a pretty unique and interesting guest, Stefan Krawczyk. Stefan: Yeah.

ML

ML Data Scientist Software Engineer Machine Learning

Real-World MLOps Examples: End-To-End MLOps Pipeline for Visual Search at Brainly

The MLOps Blog

MARCH 28, 2023

quality attributes) and metadata enrichment (e.g., This brings interpersonal challenges, and the AI/ML teams are encouraged to build good relationships with clients to help support the models by telling people how to use the solution instead of just exposing the endpoint without documentation or telling them how.

Machine Learning

Machine Learning Data Scientist Automation ML

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

After the completion of the research phase, the data scientists need to collaborate with ML engineers to create automations for building (ML pipelines) and deploying models into production using CI/CD pipelines. Security SMEs review the architecture based on business security policies and needs. words for English).

Generative AI

Generative AI Prompt Engineering Prompt Engineer ML

How to Build an End-To-End ML Pipeline

The MLOps Blog

MAY 9, 2023

One of the most prevalent complaints we hear from ML engineers in the community is how costly and error-prone it is to manually go through the ML workflow of building and deploying models. Building end-to-end machine learning pipelines lets ML engineers build once, rerun, and reuse many times.

ML

ML Machine Learning Metadata Data Science

Evaluate large language models for quality and responsibility

AWS Machine Learning Blog

NOVEMBER 30, 2023

It also integrates with Machine Learning and Operation (MLOps) workflows in Amazon SageMaker to automate and scale the ML lifecycle. Here you can provide the metadata for this model hosting information along with the input format/template your specific model expects. What is FMEval?

Large Language Models

Large Language Models Algorithm LLM Responsible AI

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing

AWS Machine Learning Blog

NOVEMBER 14, 2024

However, model governance functions in an organization are centralized and to perform those functions, teams need access to metadata about model lifecycle activities across those accounts for validation, approval, auditing, and monitoring to manage risk and compliance. These stages are applicable to both use case and model stages.

ML

ML Auto-complete Machine Learning Auto-classification

Definite Guide to Building a Machine Learning Platform

The MLOps Blog

MARCH 21, 2023

As the number of ML-powered apps and services grows, it gets overwhelming for data scientists and ML engineers to build and deploy models at scale. In this comprehensive guide, we’ll explore everything you need to know about machine learning platforms, including: Components that make up an ML platform.

Machine Learning

Machine Learning Data Scientist ML Metadata

Use Kubernetes Operators for new inference capabilities in Amazon SageMaker that reduce LLM deployment costs by 50% on average

AWS Machine Learning Blog

APRIL 19, 2024

Refer to the guidance provided in the API documentation for more details.

    environment:
      HF_MODEL_ID: databricks/dolly-v2-7b
      HF_TASK: text-generation

    apiVersion: sagemaker.services.k8s.aws/v1alpha1
    kind: Model
    metadata:
      name: flan-t5-xxl
    spec:
      modelName: flan-t5-xxl
      executionRoleARN:
      containers:
        - image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04

Metadata

Metadata LLM Software Development Machine Learning

Dive deep into vector data stores using Amazon Bedrock Knowledge Bases

AWS Machine Learning Blog

OCTOBER 11, 2024

Use cases for vector databases for RAG: In the context of RAG architectures, the external knowledge can come from relational databases, search and document stores, or other data stores. Knowledge bases are essential for various use cases, such as customer support, product documentation, internal knowledge sharing, and decision-making systems.

Metadata

Metadata Generative AI LLM Data Ingestion
