Rocket's legacy data science environment challenges: Rocket's previous data science solution was built around Apache Spark and combined a legacy version of the Hadoop environment with vendor-provided Data Science Experience development tools.
30% Off ODSC East, Fan-Favorite Speakers, Foundation Models for Time Series, and ETL Pipeline Orchestration: The ODSC East 2025 schedule is LIVE! Explore the must-attend sessions and cutting-edge tracks designed to equip AI practitioners, data scientists, and engineers with the latest advancements in AI and machine learning.
The field of data science has evolved dramatically over the past several years, driven by technological breakthroughs, industry demands, and shifting priorities within the community. 2021-2024: interest declined as deep learning and pre-trained models took over, automating many tasks previously handled by classical ML techniques.
Learn the basics of data engineering to improve your ML models. (Photo by Mike Benna on Unsplash.) It is not news that developing machine learning algorithms requires data, often a lot of data. Collecting this data is not trivial; in fact, it is one of the most relevant and difficult parts of the entire workflow.
Data Science: you hear this term all over the internet, and it is also the topic that most concerns newcomers who want to enter the world of data but don't know what it actually means. I'm not saying existing explanations are incorrect or wrong; every article simply has its own perspective on the term 'Data Science'.
This post was written in collaboration with Bhajandeep Singh and Ajay Vishwakarma from Wipro's AWS AI/ML Practice. Many organizations have been using a combination of on-premises and open source data science solutions to create and manage machine learning (ML) models.
Introduction: Machine learning has become an essential tool for organizations of all sizes to gain insights and make data-driven decisions. However, the success of ML projects depends heavily on the quality of the data used to train models. Poor data quality can lead to inaccurate predictions and poor model performance.
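As a rough illustration of the kind of data quality checks implied here, the sketch below uses pandas to surface missing values, duplicates, and implausible values before training; the file name and the "age" column are hypothetical.

```python
# A rough sketch of pre-training data-quality checks with pandas.
# "training_data.csv" and the "age" column are hypothetical.
import pandas as pd

df = pd.read_csv("training_data.csv")

# Missing values per column
missing = df.isna().sum()
print("Columns with missing values:\n", missing[missing > 0])

# Exact duplicate rows can silently bias a model
print("Duplicate rows:", df.duplicated().sum())

# Simple plausibility check on a numeric feature
implausible = df[(df["age"] < 0) | (df["age"] > 120)]
print("Rows with implausible 'age' values:", len(implausible))
```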
Introduction to Data Engineering. Data engineering challenges: data engineering involves obtaining, organizing, understanding, extracting, and formatting data for analysis, a tedious and time-consuming task. Data scientists often spend up to 80% of their time on data engineering in data science projects.
AI and machine learning (ML) models are incredibly effective at doing this but are complex to build and require data science expertise. HT: Today's marketers need to not only understand past customer behaviour but must also be able to anticipate and act on customers' future wants and needs.
This post presents a solution that uses a workflow and AWS AI and machine learning (ML) services to provide actionable insights based on those transcripts. We use multiple AWS AI/ML services, such as Contact Lens for Amazon Connect and Amazon SageMaker, in a combined architecture.
Summary: This article explores the significance of ETL data in data management. It highlights key components of the ETL process, best practices for efficiency, and future trends like AI integration and real-time processing, ensuring organisations can leverage their data effectively for strategic decision-making.
Programming for Data Science with Python: This course series teaches essential programming skills for data analysis, including SQL fundamentals for querying databases and Unix shell basics. Students also learn Python programming, from fundamentals to data manipulation with NumPy and Pandas, along with version control using Git.
In this article, we're going to look at what an Azure Function is and how we can use it to create a basic extract, transform, and load (ETL) pipeline with minimal code. Extract, Transform, and Load: before we begin, let's shed some light on what an ETL pipeline essentially is. ETL stands for extract, transform, and load.
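A minimal sketch of what such a function might look like with the Azure Functions Python v2 programming model: a timer trigger runs a tiny extract-transform-load pass. The schedule, file paths, column names, and filtering rule are all hypothetical placeholders, not taken from the article.

```python
# Sketch of a timer-triggered Azure Function performing a tiny ETL pass
# (Python v2 programming model). Paths and transform logic are hypothetical.
import csv
import azure.functions as func

app = func.FunctionApp()

@app.schedule(schedule="0 0 * * * *", arg_name="timer", run_on_startup=False)
def run_etl(timer: func.TimerRequest) -> None:
    # Extract: read raw rows (in practice this would be Blob Storage, a DB, etc.)
    with open("/tmp/raw_orders.csv") as f:
        rows = list(csv.DictReader(f))

    # Transform: keep completed orders and normalise the amount field
    cleaned = [
        {"order_id": r["order_id"], "amount": float(r["amount"])}
        for r in rows
        if r.get("status") == "completed"
    ]

    # Load: write the cleaned rows to the destination (again, a placeholder)
    with open("/tmp/clean_orders.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["order_id", "amount"])
        writer.writeheader()
        writer.writerows(cleaned)
```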
Automation has been a key trend in the past few years, spanning everything from designing and building a data warehouse to loading and maintaining it; all of that can be automated. Topics covered include speed, varying data formats, and data publishing. What are some ways that Astera has integrated AI into customer workflows?
Statistical methods and machine learning (ML) methods are actively developed and adopted to maximize the LTV. In this post, we share how Kakao Games and the Amazon Machine Learning Solutions Lab teamed up to build a scalable and reliable LTV prediction solution by using AWS data and ML services such as AWS Glue and Amazon SageMaker.
This post is a bitesize walk-through of the 2021 Executive Guide to Data Science and AI — a white paper packed with up-to-date advice for any CIO or CDO looking to deliver real value through data. Team: building the right data science team is complex. Download the free, unabridged version here.
IBM watsonx.ai is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. Watsonx.data allows customers to augment data warehouses such as Db2 Warehouse and Netezza and optimize workloads for performance and cost.
Specialist Data Engineering at Merck, and Prabakaran Mathaiyan, Sr. ML Engineer at Tiger Analytics. The large machine learning (ML) model development lifecycle requires a scalable model release process similar to that of software development. The input to the training pipeline is the features dataset.
The embeddings are captured in Amazon Simple Storage Service (Amazon S3) via Amazon Kinesis Data Firehose, and we run a combination of AWS Glue extract, transform, and load (ETL) jobs and Jupyter notebooks to perform the embedding analysis. Set the parameters for the ETL job as follows and run the job: Set --job_type to BASELINE.
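For context, a Glue ETL script typically reads job parameters like --job_type with getResolvedOptions; the sketch below shows only that plumbing, and the branch on BASELINE is an assumption about how the job might use the flag.

```python
# Sketch of reading the --job_type parameter inside an AWS Glue ETL script.
# The BASELINE branch is an assumed illustration, not the original job's logic.
import sys
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME", "job_type"])

if args["job_type"] == "BASELINE":
    # compute and persist baseline statistics over the stored embeddings here
    pass
else:
    # handle other (hypothetical) job types
    pass
```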
For budding data scientists and data analysts, there are mountains of information about why you should learn R over Python and the other way around. Though both are great to learn, what gets left out of the conversation is a simple yet powerful programming language that everyone in the data science world can agree on: SQL.
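To make the point concrete, here is the sort of bread-and-butter SQL that data scientists, analysts, and engineers all end up writing; it runs against an in-memory SQLite database purely so the example is self-contained, and the table and values are made up.

```python
# A self-contained SQL example using Python's built-in sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
)

# Aggregate revenue per customer: the bread-and-butter analytics query
for customer, total in conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY SUM(amount) DESC"
):
    print(customer, total)
```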
This situation is no different in the ML world. Data scientists and ML engineers typically write lots and lots of code. Building a mental model for ETL components: learn the art of constructing a mental representation of the components within an ETL process.
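One way to sketch that mental model in code is to treat extract, transform, and load as separate, swappable components; the Protocol-based structure and names below are illustrative, not taken from the article.

```python
# Illustrative sketch: ETL as three independent, swappable components.
from typing import Iterable, Protocol

class Extractor(Protocol):
    def extract(self) -> Iterable[dict]: ...

class Transformer(Protocol):
    def transform(self, rows: Iterable[dict]) -> Iterable[dict]: ...

class Loader(Protocol):
    def load(self, rows: Iterable[dict]) -> None: ...

def run_pipeline(extractor: Extractor, transformer: Transformer, loader: Loader) -> None:
    # Each stage only knows about the rows it receives, which keeps the
    # components independently testable and replaceable.
    loader.load(transformer.transform(extractor.extract()))
```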
In BI systems, data warehousing first converts disparate raw data into clean, organized, and integrated data, which is then used to extract actionable insights to facilitate analysis, reporting, and data-informed decision-making. These warehouses can contain structured, unstructured, or semi-structured data.
ML operationalization summary: As defined in the post MLOps foundation roadmap for enterprises with Amazon SageMaker, machine learning operations (MLOps) is the combination of people, processes, and technology to productionize machine learning (ML) solutions efficiently.
And we at deployr worked alongside them to find the best possible answers for everyone involved and build their data and ML pipelines. Building data and ML pipelines, from the ground to the cloud: it was the beginning of 2022, and things were looking bright after the lockdown's end.
This article was originally an episode of the ML Platform Podcast, a show where Piotr Niedźwiedź and Aurimas Griciūnas, together with ML platform professionals, discuss design choices, best practices, example tool stacks, and real-world learnings from some of the best ML platform professionals.
As the sibling of data science, data analytics is still a hot field that garners significant interest. Companies have plenty of data at their disposal and are looking for people who can make sense of it and make deductions quickly and efficiently.
A comparison of tools such as Dolt, LakeFS, Delta Lake, and Pachyderm covers Git-like versioning, database tooling, data lakes, data pipelines, experiment tracking, and integrations with cloud platforms and ML tools. Examples of data version control tools in ML: DVC (Data Version Control) is a version control system for data and machine learning teams.
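As a quick illustration of DVC in practice, the sketch below reads a specific tagged version of a dataset through DVC's Python API; the repository URL, file path, and tag are hypothetical placeholders.

```python
# Sketch: read a pinned version of a DVC-tracked dataset via the Python API.
# Repo URL, path, and tag are hypothetical.
import dvc.api
import pandas as pd

with dvc.api.open(
    "data/train.csv",                              # path tracked by DVC
    repo="https://github.com/example/ml-project",  # hypothetical repository
    rev="v1.0",                                    # Git tag, branch, or commit
) as f:
    df = pd.read_csv(f)

print(df.shape)
```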
IBM merged the critical capabilities of the vendor into its more contemporary Watson Studio running on the IBM Cloud Pak for Data platform as it continues to innovate. The platform makes collaborative datascience better for corporate users and simplifies predictive analytics for professional data scientists.
Over the past few years, data science has migrated from individual computers to cloud service platforms. I just finished learning Azure's cloud service platform using Coursera and the Microsoft Learning Path for Data Science. Be sure to create an Environment for the ML workspace.
11 key differences in 2023. (Photo by Jan Tinneberg on Unsplash.) Working in data science and machine learning (ML) professions can be very different from what you expect. A popular focus of the majority of data science courses, degrees, and online competitions is creating a model that has the highest accuracy or best fit.
Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. Amazon SageMaker notebook jobs allow data scientists to run their notebooks on demand or on a schedule with a few clicks in SageMaker Studio.
About the authors Samantha Stuart is a Data Scientist with AWS Professional Services, and has delivered for customers across generative AI, MLOps, and ETL engagements. He has touched on most aspects of these projects, from infrastructure and DevOps to software development and AI/ML.
Microsoft introduced a new AIOps unit, detailed in the following post: Cloud Intelligence/AIOps ("AIOps" for brevity) aims to innovate AI/ML technologies to help design, build, and operate complex cloud platforms and services at scale—effectively and efficiently.
Amazon SageMaker Data Wrangler reduces the time it takes to collect and prepare data for machine learning (ML) from weeks to minutes. Account A is the data lake account that houses all the ML-ready data obtained through extract, transform, and load (ETL) processes. Let’s go back to our data flow.
Confirmed sessions related to software engineering include: Building Data Contracts with Open-Source Tools; Chronon — Open Source Data Platform for AI/ML; Creating APIs That Data Scientists Will Love with FastAPI, SQLAlchemy, and Pydantic; Using APIs in Data Science Without Breaking Anything; and Don't Go Over the Deep End: Building an Effective OSS Management (..)
Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. In the process of working on their ML tasks, data scientists typically start their workflow by discovering relevant data sources and connecting to them.
Jupyter notebooks have been one of the most controversial tools in the data science community. Nevertheless, many data scientists will agree that they can be really valuable – if used well. I'll show you best practices for using Jupyter Notebooks for exploratory data analysis.
A unified data fabric also enhances data security by enabling centralised governance and compliance management across all platforms. Automated Data Integration and ETL Tools The rise of no-code and low-code tools is transforming data integration and Extract, Transform, and Load (ETL) processes.
In my 7-year data science journey, I've been exposed to a number of different databases, including but not limited to Oracle Database, MS SQL, MySQL, EDW, and Apache Hadoop. A lot of you who are already in the data science field must be familiar with BigQuery and its advantages.
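For readers less familiar with BigQuery, a typical interaction from Python looks roughly like the sketch below; the project, dataset, and table names are hypothetical.

```python
# Sketch of querying BigQuery with the official Python client.
# Project, dataset, and table names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")

query = """
    SELECT country, COUNT(*) AS users
    FROM `my-analytics-project.app.events`
    GROUP BY country
    ORDER BY users DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row.country, row.users)
```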
(Image source: Pixel Production Inc.) In the previous article, you were introduced to the intricacies of data pipelines, including the two major types of existing data pipelines. You also learned how to build an extract, transform, load (ETL) pipeline and discovered the automation capabilities of Apache Airflow for ETL pipelines.
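A minimal sketch of the kind of Airflow DAG such an article builds: three Python tasks wired into an extract-transform-load chain on a daily schedule (assuming Airflow 2.4+ for the schedule argument); the task bodies are placeholders, not the article's pipeline.

```python
# Sketch of a toy ETL DAG in Apache Airflow (2.4+). Task bodies are placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # return value is pushed to XCom automatically
    return [{"id": 1, "value": 10}]

def transform(ti):
    rows = ti.xcom_pull(task_ids="extract")
    return [{**r, "value": r["value"] * 2} for r in rows]

def load(ti):
    print(ti.xcom_pull(task_ids="transform"))

with DAG(
    dag_id="toy_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3
```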
It provided a platform for big data processing and machine learning, simplifying the process of building and deploying data pipelines. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. It helps data engineering teams by simplifying ETL development and management.
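As a quick illustration of those Delta Lake properties, the sketch below writes and reads a Delta table with PySpark; it assumes the delta-spark package is available, and the table path is a placeholder.

```python
# Sketch: write and read a Delta table with PySpark (assumes delta-spark is installed).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])

# Writes are ACID: readers never see a partially committed version
df.write.format("delta").mode("overwrite").save("/tmp/delta/events")

# Reading the same path later returns the latest committed snapshot
spark.read.format("delta").load("/tmp/delta/events").show()
```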
Tools such as Python's Pandas library, Apache Spark, or specialised data cleaning software streamline these processes, ensuring data integrity before further transformation. Step 3: Data Transformation. Data transformation focuses on converting cleaned data into a format suitable for analysis and storage.
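A compact sketch of that cleaning-then-transformation flow with pandas follows; the columns, normalisation rules, and aggregation are hypothetical examples rather than the article's own.

```python
# Sketch: clean raw records with pandas, then transform them into an
# analysis-ready summary. Column names and rules are hypothetical.
import pandas as pd

raw = pd.DataFrame(
    {
        "order_id": [1, 2, 2, 3],
        "amount": ["10.5", "n/a", "n/a", "7.0"],
        "country": ["US", "us", "us", "DE"],
    }
)

# Cleaning: drop duplicates, coerce types, normalise categorical values
clean = (
    raw.drop_duplicates(subset="order_id")
    .assign(
        amount=lambda d: pd.to_numeric(d["amount"], errors="coerce"),
        country=lambda d: d["country"].str.upper(),
    )
    .dropna(subset=["amount"])
)

# Transformation: aggregate into an analysis-ready shape
summary = clean.groupby("country", as_index=False)["amount"].sum()
print(summary)
```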
As the company continues to evolve to integrate AI into its existing and new product catalog, this requires sophisticated approaches to train and deploy multi-modal machine learning (ML) ensemble models for solving complex business needs. Daniel Suarez is a Data Science Engineer at CCC Intelligent Solutions.