Rocket’s legacy data science environment challenges: Rocket’s previous data science solution was built around Apache Spark and combined a legacy version of the Hadoop environment with vendor-provided Data Science Experience development tools.
As AI models grow and data volumes expand, databases must scale horizontally to allow organisations to add capacity without significant downtime or performance degradation. Additionally, they accelerate time-to-market for AI-driven innovations by enabling rapid data ingestion and retrieval, facilitating faster experimentation.
You can implement this workflow in Forecast from the AWS Management Console, from the AWS Command Line Interface (AWS CLI), via API calls using Python notebooks, or via automation solutions. The console and AWS CLI methods are best suited for quick experimentation to check the feasibility of time series forecasting using your data.
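To make the API route concrete, here is a minimal boto3 sketch of the first two Forecast steps (create a dataset, import data from S3). The dataset name, bucket path, and role ARN are placeholders, not values from the article.

```python
import boto3

forecast = boto3.client("forecast")

# 1. Create a target time series dataset (CUSTOM domain; schema order
#    must match the CSV columns). All names here are hypothetical.
dataset = forecast.create_dataset(
    DatasetName="demand_history",
    Domain="CUSTOM",
    DatasetType="TARGET_TIME_SERIES",
    DataFrequency="D",
    Schema={"Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "target_value", "AttributeType": "float"},
    ]},
)

# 2. Import historical data from S3 (placeholder path and IAM role).
forecast.create_dataset_import_job(
    DatasetImportJobName="demand_history_import",
    DatasetArn=dataset["DatasetArn"],
    DataSource={"S3Config": {
        "Path": "s3://my-bucket/demand.csv",
        "RoleArn": "arn:aws:iam::123456789012:role/ForecastRole",
    }},
)
# Once the import completes, create_auto_predictor(...) and
# create_forecast(...) continue the workflow.
```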
These include data ingestion, data selection, data pre-processing, FM pre-training, model tuning to one or more downstream tasks, inference serving, and data and AI model governance and lifecycle management—all of which can be described as FMOps.
In June 2024, Databricks made three significant announcements that have garnered considerable attention in the data science and engineering communities. These announcements focus on enhancing user experience, optimizing data management, and streamlining data engineering workflows.
Forrester’s 2022 Total Economic Impact Report for Data Management highlights the impact Db2 and the IBM data management portfolio are having for customers: a return on investment (ROI) of 241% and payback in under 6 months. Both services offer independent compute and storage scaling, high availability, and automated DBA tasks.
Data scientists often spend up to 80% of their time on data engineering in data science projects. Objective of Data Engineering: The main goal is to transform raw data into structured data suitable for downstream tasks such as machine learning.
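As a toy illustration of that raw-to-structured step, here is a pandas sketch; every column name and value is invented for the example.

```python
import pandas as pd

# Hypothetical raw event log, as it might arrive from ingestion.
raw = pd.DataFrame({
    "ts": ["2024-01-01 09:00", "2024-01-01 09:05", None],
    "user": ["a", "b", "a"],
    "amount": ["10.5", "bad", "3.0"],
})

# Typical structuring: parse types, drop unusable rows, aggregate.
df = raw.assign(
    ts=pd.to_datetime(raw["ts"], errors="coerce"),
    amount=pd.to_numeric(raw["amount"], errors="coerce"),
).dropna(subset=["ts", "amount"])

features = df.groupby("user")["amount"].agg(["sum", "mean", "count"])
print(features)
```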
The process data lakes use to receive and store new data is generally considered to have three main layers. Data ingestion is when new data is introduced and absorbed into the lake. The processing layer is when data is managed and sorted into its storage category.
Brian Weston, Cloud Transformation for Mission Science Program Lead at LLNL, advised in the development of this assistant. There is also an automated ingestion job from Slack conversation data to the S3 bucket powered by an AWS Lambda function. In addition, he builds and deploys AI/ML models on the AWS Cloud.
Axfood has a structure with multiple decentralized data science teams with different areas of responsibility. Together with a central data platform team, the data science teams bring innovation and digital transformation through AI and ML solutions to the organization.
This post demonstrates how to seamlessly automate the deployment of an end-to-end RAG solution using Knowledge Bases for Amazon Bedrock and the AWS Cloud Development Kit (AWS CDK), enabling organizations to quickly set up a powerful question answering system. Choose Sync to initiate the data ingestion job.
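The console Sync action corresponds to starting an ingestion job through the bedrock-agent API, so the same step can be scripted. A minimal sketch, with placeholder knowledge base and data source IDs (the real ones would come from the CDK stack outputs):

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Placeholder IDs; use the knowledge base and data source the stack created.
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KB1234567890",
    dataSourceId="DS1234567890",
)
print(job["ingestionJob"]["status"])  # poll with get_ingestion_job until COMPLETE
```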
The right data architecture can help your organization improve data quality because it provides the framework that determines how data is collected, transported, stored, secured, used and shared for business intelligence and data science use cases. As previously mentioned, a data fabric is one such architecture.
This allows you to create rules that invoke specific actions when certain events occur, enhancing the automation and responsiveness of your observability setup (for more details, see Monitor Amazon Bedrock). The job could be automated based on a ground truth, or you could use humans to bring in expertise on the matter.
This includes AWS Identity and Access Management (IAM) or single sign-on (SSO) access, security guardrails, Amazon SageMaker Studio provisioning, automated stop/start to save costs, and Amazon Simple Storage Service (Amazon S3) setup. MLOps engineering – Focuses on automating the DevOps pipelines for operationalizing the ML use case.
Core features of end-to-end MLOps platforms: End-to-end MLOps platforms combine a wide range of essential capabilities and tools, which should include: Data management and preprocessing: Provide capabilities for data ingestion, storage, and preprocessing, allowing you to efficiently manage and prepare data for training and evaluation.
MLOps focuses on the intersection of data science and data engineering in combination with existing DevOps practices to streamline model delivery across the ML development lifecycle. MLOps requires the integration of software development, operations, data engineering, and data science.
Rather than requiring your data science and IT teams to build and maintain AI models, you can use pre-trained AI services that can automate tasks for you. Additionally, the solution must handle high data volumes with low latency and high throughput. Amazon Textract Service Quota Calculator.
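As an illustration of the pre-trained-service route, here is a minimal Amazon Textract call via boto3; the bucket and object name are placeholders.

```python
import boto3

textract = boto3.client("textract")

# Synchronous text detection on a document stored in S3 (placeholder names).
response = textract.detect_document_text(
    Document={"S3Object": {"Bucket": "my-bucket", "Name": "invoice.png"}}
)

# Each LINE block is one detected line of text.
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(block["Text"])
```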
Manager, Data Science at Marubeni Power International. Amazon Athena to provide developers and business analysts SQL access to the generated data for analysis and troubleshooting. Amazon EventBridge to trigger the data ingestion and ML pipeline on a schedule and in response to events. He holds a Ph.D.
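A sketch of what such a scheduled EventBridge trigger could look like in boto3; the rule name and the Lambda target ARN are invented for the example.

```python
import boto3

events = boto3.client("events")

# Hypothetical rule: kick off the ingestion/ML pipeline every hour.
events.put_rule(
    Name="ingestion-pipeline-schedule",
    ScheduleExpression="rate(1 hour)",
    State="ENABLED",
)

# Hypothetical target: a Lambda function that starts the pipeline.
events.put_targets(
    Rule="ingestion-pipeline-schedule",
    Targets=[{
        "Id": "pipeline-trigger",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:start-pipeline",
    }],
)
```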
Preceded by data analysis and feature engineering, a model is trained and ready to be productionized. We may observe a growing awareness among machine learning and data science practitioners of the crucial role played by pre- and post-training activities. But what happens next? What is LLMOps? Not the best combination, right?
One of the key challenges in AI development is building scalable pipelines that can handle the complexities of modern data systems and models. These challenges range from managing large datasets to automating model deployment and monitoring for performance drift. As datasets grow, scalable data ingestion and storage become critical.
Codify Operations for Efficiency and Reproducibility: By performing operations as code and incorporating automated deployment methodologies, organizations can achieve scalable, repeatable, and consistent processes. By centralizing datasets within the flywheel’s dedicated Amazon S3 data lake, you ensure efficient data management.
Vertex AI combines data engineering, data science, and ML engineering into a single, cohesive environment, making it easier for data scientists and ML engineers to build, deploy, and manage ML models. This unified approach enables seamless collaboration among data scientists, data engineers, and ML engineers.
At this level, the data science team will be small or nonexistent. Businesses will then require more information-literate staff, but they’ll need to contend with an ongoing shortage of data scientists. These features reduce the need for a large workforce of data professionals.
Summary: This blog provides a comprehensive roadmap for aspiring Azure Data Scientists, outlining the essential skills, certifications, and steps to build a successful career in Data Science using Microsoft Azure. Integration: Seamlessly integrates with popular Data Science tools and frameworks, such as TensorFlow and PyTorch.
This evolution underscores the demand for innovative platforms that simplify dataingestion and transformation, enabling faster, more reliable decision-making. Additionally, generative AI and agentic AI offer exciting possibilities for automated decision-making and trading.
A well-implemented MLOps process not only expedites the transition from testing to production but also offers ownership, lineage, and historical data about ML artifacts used within the team. For the customer, this reduces the time it takes to bootstrap a new data science project and get it to production.
Generative AI Track: Build the Future with GenAI. Generative AI has captured the world’s attention with tools like ChatGPT, DALL-E, and Stable Diffusion revolutionizing how we create content and automate tasks. Data Engineering Track: Build the Data Foundation for AI. Data engineering powers every AI system.
As the lifeline of an airport, a BHS is a linear asset that can exceed 34,000 meters in length (for a single airport), handling over 70 million bags annually, making it one of the most complex automated systems and a vital component of airport operations. It’s an easy way to run analytics on IoT data to gain accurate insights.
There are multiple DataRobot operators and sensors that automate the DataRobot ML pipeline steps. DataRobot MLOps automates the capabilities to deploy, manage, monitor, and govern models in production environments. Multipersona Data Science and Machine Learning (DSML) Platforms. DataRobot Provider Modules.
Data science teams often face challenges when transitioning models from the development environment to production. Usually, there is one lead data scientist for a data science group in a business unit, such as marketing. ML Dev Account: This is where data scientists perform their work.
Summary: Data transformation tools streamline data processing by automating the conversion of raw data into usable formats. These tools enhance efficiency, improve data quality, and support Advanced Analytics like Machine Learning. These tools automate the process, making it faster and more accurate.
Data Factory: Simplifies the creation of ETL pipelines to integrate data from diverse sources. Data Activator: Automates workflows, making data-triggered actions possible. These components work together to ensure businesses can manage their data efficiently in one place.
This mapping can be done by manually mapping frequent OOC queries to catalog content or can be automated using machine learning (ML). Creates two indexes for text (ooc_text) and kNN embedding search (ooc_knn) and bulk uploads data from the combined dataframe through the ingest_data_into_ops function.
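A sketch of what that index creation and bulk upload might look like with opensearch-py. The endpoint, field names, and embedding dimension are assumptions, and ingest_data_into_ops is the article's helper, approximated here rather than reproduced.

```python
from opensearchpy import OpenSearch, helpers

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # placeholder endpoint

# kNN index: 'index.knn' enables vector search; the dimension must
# match whatever embedding model produced the vectors (768 assumed).
client.indices.create(
    index="ooc_knn",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {"properties": {
            "embedding": {"type": "knn_vector", "dimension": 768},
            "query_text": {"type": "text"},
        }},
    },
)

def ingest_data_into_ops(df, index_name):
    """Approximation of the bulk-upload helper: one action per dataframe row."""
    actions = (
        {"_index": index_name, "_source": row.to_dict()}
        for _, row in df.iterrows()
    )
    helpers.bulk(client, actions)
```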
Bulk Data Load: Data migration to Snowflake can be a challenge. Snowflake provides Snowpipe for continuous data loading; however, sometimes it’s not the best option. There can be alternatives that expedite and automate data flows. Therefore, quick data ingestion for instant use can be challenging.
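One such alternative is a plain bulk COPY INTO issued from the Snowflake Python connector. A minimal sketch, assuming invented credentials, an EVENTS table, and its table stage:

```python
import snowflake.connector

# Placeholder credentials and object names.
conn = snowflake.connector.connect(
    account="my_account", user="loader", password="...",
    warehouse="LOAD_WH", database="ANALYTICS", schema="RAW",
)

with conn.cursor() as cur:
    # Stage the local file into the table stage, then bulk-copy it in one shot.
    cur.execute("PUT file:///tmp/events.csv @%EVENTS")
    cur.execute("""
        COPY INTO EVENTS
        FROM @%EVENTS
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)
```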
ETL also enhances data quality and consistency by performing necessary data cleansing and validation during the transformation stage. This ensures that the data loaded into the data warehouse is reliable and ready for analysis. Automate Processes: Automation is key to achieving consistent and efficient ETL operations.
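For example, a transform-stage validation pass in pandas might look like this; the column names and bounds are assumptions for the sketch.

```python
import pandas as pd

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative transform-stage checks: dedupe, require keys, bound values."""
    df = df.drop_duplicates()
    df = df.dropna(subset=["order_id", "amount"])      # assumed required columns
    df = df[df["amount"].between(0, 1_000_000)]        # assumed sanity bound
    return df

clean = validate(pd.DataFrame({
    "order_id": [1, 1, 2, None],
    "amount": [10.0, 10.0, -5.0, 7.5],
}))
print(clean)  # only order 1 survives: deduped, non-null key, amount in range
```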
As stated above, data pipelines represent the backbone of modern data architecture. These pipelines automate collecting, transforming, and delivering data, crucial for informed decision-making and operational efficiency across industries. Web Scraping: Automated extraction from websites using scripts or specialised tools.
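A minimal scraping sketch with requests and BeautifulSoup; the URL and the CSS selector are hypothetical.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL; check the site's robots.txt and terms before scraping.
html = requests.get("https://example.com/prices", timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Hypothetical markup: extract every table cell carrying class "price".
prices = [cell.get_text(strip=True) for cell in soup.select("td.price")]
print(prices)
```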
It takes something that’s hard to do but important to get right — forecasting — and supercharges data scientists. With automated feature engineering, automated model development, and more explainable forecasts, data scientists can build more models with more accuracy, speed, and confidence.
It works well with data visualisation platforms like Kibana for analytics and reporting. Rich Ecosystem Elasticsearch is part of the larger Elastic Stack, which includes tools like Logstash for dataingestion and Kibana for data visualisation. Thus, it offers an end-to-end solution for data processing and analysis.
Data as a Service (DaaS): DaaS allows organisations to access and integrate data from various sources without the need for complex data management. It provides APIs and data connectors to facilitate data ingestion, transformation, and delivery.
Data Governance: Establish data governance policies to define roles, responsibilities, and data ownership within your organization. ETL (Extract, Transform, Load) Processes: Enhance ETL processes to ensure data quality checks are performed during data ingestion.
Flexible Data Model: Supports a wide variety of data formats and allows for dynamic schema changes. Fast Writes: Optimised for high write throughput, making it suitable for applications requiring rapid data ingestion. What is MongoDB? MongoDB is another leading NoSQL database that operates on a document-oriented model.
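A short pymongo sketch of both points; the connection string, database, and field names are placeholders.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
events = client["iot"]["events"]                   # hypothetical database/collection

# Flexible schema: documents in the same collection need not share fields.
events.insert_many([
    {"sensor": "t-100", "temp_c": 21.4},
    {"sensor": "t-101", "temp_c": 19.8, "battery_pct": 87},
])

# Fast writes: insert_many batches documents in one round trip, and
# unordered inserts let the server parallelize for higher throughput.
events.insert_many(
    [{"sensor": f"t-{i}", "temp_c": 20.0} for i in range(1000)],
    ordered=False,
)
```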
Management software and tools: These are used to monitor and manage the data center’s performance, including tools for monitoring server and storage usage, and for automating routine tasks such as backups and software updates. You can also get data science training on-demand wherever you are with our Ai+ Training platform.
Image Source — Pixel Production Inc In the previous article, you were introduced to the intricacies of data pipelines, including the two major types of existing data pipelines. You also learned how to build an Extract Transform Load (ETL) pipeline and discovered the automation capabilities of Apache Airflow for ETL pipelines.
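As a refresher on that Airflow pattern, a minimal ETL DAG using the TaskFlow API; this assumes Airflow 2.x, and the task bodies are stubs rather than a real pipeline.

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def etl_pipeline():
    @task
    def extract() -> list[dict]:
        # Stub: pull rows from the source system.
        return [{"id": 1, "value": "raw"}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        return [{**r, "value": r["value"].upper()} for r in rows]

    @task
    def load(rows: list[dict]) -> None:
        print(f"loading {len(rows)} rows")  # stub: write to the warehouse

    # Passing outputs wires the dependency graph: extract -> transform -> load.
    load(transform(extract()))

etl_pipeline()
```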
The pipelines let you orchestrate the steps of your ML workflow that can be automated. The orchestration here implies that the dependencies and data flow between the workflow steps must be completed in the proper order. Reduce the time it takes for data and models to move from the experimentation phase to the production phase.