Multimodal large language models (MLLMs) are evolving rapidly in artificial intelligence, integrating vision and language processing to enhance comprehension and interaction across diverse data types.
The goal of this blog post is to show you how a large language model (LLM) can be used to perform tasks that require multi-step dynamic reasoning and execution. Rushabh Lokhande is a Senior Data & ML Engineer with AWS Professional Services Analytics Practice.
Along the way, you'll gain insights into what Ollama is, where it stores models, and how it integrates seamlessly with Gradio for multimodal applications. Whether you're new to Gradio or looking to expand your machine learning (ML) toolkit, this guide will equip you to create versatile and impactful applications. Introducing Llama 3.2
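As a taste of what such a walkthrough builds, here is a minimal sketch that wires Gradio to a Llama 3.2 vision model served locally by Ollama. The `ollama` Python client, the `llama3.2-vision` model tag, and the interface layout are illustrative assumptions, not the post's exact code.

```python
# Minimal sketch: a Gradio app that sends an image and a prompt to a
# local Ollama server. Assumes `pip install gradio ollama` and that
# `ollama pull llama3.2-vision` has been run beforehand.
import gradio as gr
import ollama

def describe(image_path, prompt):
    # Ollama's chat API accepts image file paths alongside the text prompt.
    response = ollama.chat(
        model="llama3.2-vision",
        messages=[{"role": "user", "content": prompt, "images": [image_path]}],
    )
    return response["message"]["content"]

demo = gr.Interface(
    fn=describe,
    inputs=[gr.Image(type="filepath", label="Image"),
            gr.Textbox(label="Prompt", value="Describe this image.")],
    outputs=gr.Textbox(label="Model response"),
    title="Llama 3.2 Vision via Ollama",
)

if __name__ == "__main__":
    demo.launch()
```

Running the script starts a local web UI; Ollama keeps pulled model weights in its local store (typically under ~/.ollama/models), so no remote API key is needed.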
Large language models (LLMs) signify a revolutionary leap in numerous application domains, facilitating impressive accomplishments in diverse tasks. Yet, their immense size incurs substantial computational expenses. With billions of parameters, these models demand extensive computational resources for operation.
The ecosystem has rapidly evolved to support everything from large language models (LLMs) to neural networks, making it easier than ever for developers to integrate AI capabilities into their applications. Key Features: Hardware-accelerated ML operations using WebGL and Node.js environments.
Multimodal large language models (MLLMs) represent a cutting-edge intersection of language processing and computer vision, tasked with understanding and generating responses that consider both text and imagery.
Researchers from Shanghai AI Laboratory introduced HuixiangDou, a technical assistant based on large language models (LLMs), to tackle these issues, marking a significant breakthrough. HuixiangDou is designed for group chat scenarios in technical domains like computer vision and deep learning.
Large language models (LLMs) built on transformers, including ChatGPT and GPT-4, have demonstrated impressive natural language processing abilities. Their success has sparked advances in designing and applying transformer-based models in computer vision and other modalities.
Voxel51, a prominent innovator in data-centric computer vision and machine learning software, has introduced a remarkable breakthrough in the field with the launch of VoxelGPT. VoxelGPT offers several key capabilities that streamline computer vision workflows, saving time and resources.
In the evolving landscape of artificial intelligence and machine learning, the integration of visual perception with language processing has become a frontier of innovation. This integration is epitomized in the development of multimodal large language models (MLLMs), which have shown remarkable prowess in a range of vision-language tasks.
Multimodal large language models (MLLMs) focus on creating artificial intelligence (AI) systems that can interpret textual and visual data seamlessly. The NVLM-H model, in particular, strikes a balance between image processing efficiency and multimodal reasoning accuracy, making it one of the most promising models in this field.
Large language models (LLMs), with their strong generalization and reasoning powers, have significantly advanced the artificial intelligence (AI) community. The method demonstrates excellent performance and offers a useful way to improve the effectiveness and affordability of LLM-based applications.
It requires real engineering work and is a testament to our submitters’ commitment to AI, to their customers, and to ML.” MLPerf Inference is a critical benchmark suite that measures the speed at which AI systems can execute models in various deployment scenarios.
Computer vision focuses on enabling devices to interpret and understand visual information from the world. This involves tasks such as image recognition, object detection, and visual search, where the goal is to develop models that can process and analyze visual data effectively.
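To make one of those tasks concrete, here is a hedged sketch of object detection with an off-the-shelf torchvision model; the model choice, input file, and confidence threshold are illustrative assumptions, not tied to the paper above.

```python
# Sketch: off-the-shelf object detection with torchvision.
# Assumes `pip install torch torchvision pillow`.
import torch
from PIL import Image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights,
)
from torchvision.transforms.functional import to_tensor

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()

img = Image.open("street.jpg").convert("RGB")  # hypothetical input image
with torch.no_grad():
    preds = model([to_tensor(img)])[0]

# Keep confident detections and map label ids to class names.
categories = weights.meta["categories"]
for box, label, score in zip(preds["boxes"], preds["labels"], preds["scores"]):
    if score > 0.8:  # illustrative threshold
        print(categories[int(label)], box.tolist(), float(score))
```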
The popularity of NLP encourages a complementary strategy in computer vision. Unique obstacles arise from the need for broad perceptual capacity in a universal representation serving varied vision-related activities. Their method achieves such a universal representation and has wide-ranging use in many visual tasks.
Multimodal large language models (MLLMs) represent an advanced field in artificial intelligence where models integrate visual and textual information to understand and generate responses.
Recent advancements in multimodal large language models (MLLMs) have revolutionized various fields, leveraging the transformative capabilities of large-scale language models like ChatGPT.
Recently, large language models (LLMs) have played a crucial role in the field of natural language understanding, showcasing remarkable capabilities in generalizing across a wide range of tasks, including zero-shot and few-shot scenarios.
Existing work includes isolated computer vision techniques for image classification and natural language processing for textual data analysis. The difficulty lies in extracting relevant information from images and correlating it with textual data, essential for advancing research and applications in this field.
While large language models (LLMs) like ChatGPT and GPT-4 have demonstrated strong performance across several benchmarks, open-source projects like MMLU and OpenLLMBoard have quickly progressed in catching up across multiple applications and benchmarks.
With recent advances in large language models (LLMs), a wide array of businesses are building new chatbot applications, either to help their external customers or to support internal teams. This script can be acquired directly from Amazon S3 using aws s3 cp s3://aws-blogs-artifacts-public/artifacts/ML-16363/deploy.sh.
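For readers who prefer the SDK over the AWS CLI, a minimal boto3 sketch can fetch the same artifact; the unsigned-request configuration assumes the public bucket permits anonymous reads, which may not hold in every environment.

```python
# Sketch: download the deploy script with boto3 instead of `aws s3 cp`.
# Assumes `pip install boto3`; if the bucket rejects anonymous reads,
# drop the unsigned config and rely on your AWS credentials instead.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
s3.download_file(
    Bucket="aws-blogs-artifacts-public",
    Key="artifacts/ML-16363/deploy.sh",
    Filename="deploy.sh",
)
```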
Evaluating multimodal large language models (MLLMs) in text-rich scenarios is crucial, given their increasing versatility. MLLMs like GPT-4V, Gemini-Pro-Vision, and Claude-3-Opus showcase impressive capabilities but lack comprehensive evaluation in text-rich contexts.
The development of multimodal large language models (MLLMs) represents a significant leap forward. These advanced systems, which integrate language and visual processing, have broad applications, from image captioning to visual question answering.
The rise of powerful image editing models has further blurred the line between real and fake content, posing risks such as misinformation and legal issues.
Multimodal large language models (MLLMs) have made significant progress in various applications using the power of Transformer models and their attention mechanisms. Researchers are focusing on addressing these biases without altering the model’s weights.
Multimodal large language models (MLLMs), having contributed to remarkable progress in AI, face challenges in accurately processing and responding to misleading information, which can lead to incorrect or hallucinated responses.
A major challenge in computer vision and graphics is the ability to reconstruct 3D scenes from sparse 2D images.
Large language models (LLMs) have recently gained immense popularity due to their accessibility and remarkable ability to generate text responses for a wide range of user queries. More than a billion people have utilized LLMs like ChatGPT to get information and solutions to their problems.
The performance of multimodal large language models (MLLMs) in visual settings has been exceptional, attracting unmatched attention. However, their ability to solve visual math problems has yet to be fully assessed and understood.
VideoLLaMA 2 retains the dual-branch architecture of its predecessor, with separate Vision-Language and Audio-Language branches that connect pre-trained visual and audio encoders to a large language model.
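The dual-branch pattern can be summarized in a short schematic. This is an illustrative PyTorch sketch of the general idea, with made-up module names and dimensions, not VideoLLaMA 2's actual code.

```python
# Schematic sketch of a dual-branch multimodal connector: each modality
# gets its own projection branch into the language model's embedding
# space. Dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn

class DualBranchConnector(nn.Module):
    def __init__(self, vision_dim=1024, audio_dim=768, llm_dim=4096):
        super().__init__()
        # Vision-Language branch: projects frozen visual-encoder features.
        self.vision_proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )
        # Audio-Language branch: projects frozen audio-encoder features.
        self.audio_proj = nn.Sequential(
            nn.Linear(audio_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, vision_feats, audio_feats):
        # Map both modalities into token embeddings the LLM can consume,
        # then concatenate along the sequence dimension; downstream these
        # are prepended to the text token embeddings.
        v = self.vision_proj(vision_feats)   # (B, Tv, llm_dim)
        a = self.audio_proj(audio_feats)     # (B, Ta, llm_dim)
        return torch.cat([v, a], dim=1)
```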
The language models are capable of carrying out complex dialogues with reduced latency, while the vision models support various computer vision tasks, such as object detection and image captioning, in real time.
In recent years, the landscape of natural language processing (NLP) has been dramatically reshaped by the emergence of large language models (LLMs). Spearheaded by pioneers like ChatGPT and GPT-4 from OpenAI, these models have demonstrated an unprecedented proficiency in understanding and generating human-like text.
The emergence of multimodal large language models (MLLMs), such as GPT-4 and Gemini, has sparked significant interest in combining language understanding with various modalities like vision.
However, despite the innumerable sensors, plethora of cameras, and expensive computer vision techniques, this integration poses a few critical questions.
Machine learning (ML) technologies can drive decision-making in virtually all industries, from healthcare to human resources to finance, and in myriad use cases like computer vision, large language models (LLMs), speech recognition, self-driving cars, and more. What is machine learning?
Multimodal large language models (MLLMs) represent a cutting-edge area in artificial intelligence, combining diverse data modalities like text, images, and even video to build a unified understanding across domains. In conclusion, MM1.5 is poised to address key challenges in multimodal AI.
This system, the first Gym environment for ML tasks, facilitates the study of RL techniques for training AI agents. The benchmark, MLGym-Bench, includes 13 open-ended tasks spanning computer vision, NLP, RL, and game theory, requiring real-world research skills.
Computer vision models have made significant strides in solving individual tasks such as object detection, segmentation, and classification. Complex real-world applications such as autonomous vehicles, security and surveillance, and healthcare and medical imaging require multiple vision tasks.
A team of researchers from the Max Planck Institute for Intelligent Systems, ETH Zurich, Meshcapade, and Tsinghua University built a framework employing a large language model called PoseGPT to understand and reason about 3D human poses from images or textual descriptions.
ControlLLM integrates varied information sources to generate comprehensive and meaningful responses grounded in execution outcomes.
Contrastingly, agentic systems incorporate machine learning (ML) and artificial intelligence (AI) methodologies that allow them to adapt, learn from experience, and navigate uncertain environments. Embeddings like word2vec, GloVe, or contextual embeddings from large language models supply the semantic representations these systems work with, as the sketch below illustrates.
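Here is a minimal sketch of that embedding step using the sentence-transformers library; the model checkpoint and example strings are assumptions chosen for illustration.

```python
# Sketch: contextual embeddings for an agentic retrieval step.
# Assumes `pip install sentence-transformers`; the all-MiniLM-L6-v2
# checkpoint is an illustrative choice, not a recommendation.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Embed a small corpus of candidate actions and a user query
# into the same vector space.
corpus = ["reset the user's password", "provision a new GPU node"]
query = "how do I recover my account login?"

corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Cosine similarity lets the agent pick the most relevant candidate.
scores = util.cos_sim(query_emb, corpus_emb)
print(scores)  # higher score -> closer meaning
```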
This course covers what Generative Artificial Intelligence is, how it works, what its applications are, and how it differs from standard machine learning (ML) techniques. You’ll also learn about the Generative AI model types, unimodal and multimodal, and discover diffusion models, how they work, and how to implement them.
Introduction to AI and Machine Learning on Google Cloud: This course introduces Google Cloud’s AI and ML offerings for predictive and generative projects, covering technologies, products, and tools across the data-to-AI lifecycle. It includes labs on feature engineering with BigQuery ML, Keras, and TensorFlow.