Dynamo can also offload inference data to more cost-effective memory and storage devices while retrieving it rapidly when required, thereby minimising overall inference costs. Together AI, a prominent player in the AI Acceleration Cloud space, is also looking to integrate its proprietary Together Inference Engine with NVIDIA Dynamo.
Predibase announces the Predibase Inference Engine, its new infrastructure offering designed to be the best platform for serving fine-tuned small language models (SLMs). The Predibase Inference Engine addresses the challenges of serving fine-tuned models head-on, offering a tailor-made solution for enterprise AI deployments.
The MARS Lab at NTU has devised an innovative IoT-LLM framework that combats the limitations of LLMs in handling real-world tasks. For example, traditional LLMs like GPT-4 achieve only 40% accuracy in activity recognition and 50% in machine diagnosis when processing raw IoT data.
Layer-of-Thoughts Prompting (LoT) is a unique approach that uses large language model (LLM)-based retrieval with constraint hierarchies.
Researchers from Google Cloud AI, Google DeepMind, and the University of Washington have proposed a new approach called MODEL SWARMS, which utilizes swarm intelligence to adapt LLMs through collaborative search in the weight space. The research demonstrated superior results for both single-task adaptation and multi-task domains.
Stanford researchers propose LoLCATS, a cutting-edge AI method for efficient LLM linearization.
Research on the robustness of LLMs to jailbreak attacks has mostly focused on chatbot applications, where users manipulate prompts to bypass safety measures. However, LLM agents, which utilize external tools and perform multi-step tasks, pose a greater misuse risk, especially in malicious contexts like ordering illegal materials.
The key innovation in PAVs is using a “prover policy,” distinct from the base policy that the LLM is following. This enables the LLM to explore a wider range of potential solutions, even when early steps do not immediately lead to a correct solution.
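As a rough illustration of the prover-policy idea, the sketch below scores a candidate reasoning step by how much it changes a separate prover policy's estimated chance of finishing the problem correctly. The prover_solves stub, the rollout count, and the toy problem are all illustrative assumptions, not the paper's setup.

```python
# Hedged sketch of step-level advantage under a prover policy (PAV-style intuition).
import random

def prover_solves(partial_solution):        # stand-in for rolling out a prover policy
    # Toy prover: more completed steps -> higher chance of eventual success.
    return random.random() < 0.2 + 0.2 * len(partial_solution)

def success_rate(partial_solution, rollouts=200):
    return sum(prover_solves(partial_solution) for _ in range(rollouts)) / rollouts

def step_advantage(prefix_steps, candidate_step):
    # Advantage ~ P(success | prefix + step, prover) - P(success | prefix, prover)
    return success_rate(prefix_steps + [candidate_step]) - success_rate(prefix_steps)

prefix = ["compute the total distance"]
print(round(step_advantage(prefix, "divide distance by time"), 3))
```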
Specifically, while LLMs are becoming capable of handling longer input sequences, the increase in retrieved information can overwhelm the system. The challenge lies in making sure that the additional context improves the accuracy of the LLM’s outputs rather than confusing the model with irrelevant information.
The key problem, therefore, is how to effectively compress LLM weights without sacrificing accuracy or requiring calibration data. Researchers from Apple and Meta AI introduce SeedLM, a novel approach that aims to overcome the challenges associated with the deployment of large-scale LLMs by providing a data-free compression method.
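A minimal, hedged sketch of seed-based, data-free weight compression in the spirit of SeedLM: each weight block is approximated as a linear combination of pseudo-random basis vectors generated from a small seed. The paper uses an LFSR generator and quantized coefficients; the NumPy RNG, block size, and float coefficients below are simplifications purely for illustration.

```python
import numpy as np

def compress_block(w, n_seeds=64, n_basis=4):
    """Approximate a weight block as U(seed) @ c; store only (seed, c)."""
    best = None
    for seed in range(n_seeds):
        U = np.random.default_rng(seed).standard_normal((w.size, n_basis))
        c, *_ = np.linalg.lstsq(U, w, rcond=None)      # least-squares coefficients
        err = np.linalg.norm(U @ c - w)
        if best is None or err < best[0]:
            best = (err, seed, c)
    return best[1], best[2]

def decompress_block(seed, c, size):
    U = np.random.default_rng(seed).standard_normal((size, len(c)))
    return U @ c

block = np.random.randn(256)                  # toy "weight block"
seed, coeffs = compress_block(block)
approx = decompress_block(seed, coeffs, block.size)
print("relative error:", np.linalg.norm(approx - block) / np.linalg.norm(block))
```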
These workflows are modeled as graphs where nodes represent LLM-invoking actions, and edges represent the dependencies between these actions. The key to AFlow’s efficiency lies in its use of nodes and edges to represent workflows, allowing it to model complex relationships between LLM actions.
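A minimal sketch of the graph representation described above: nodes are LLM-invoking actions, edges are dependencies, and execution follows topological order. The node names, prompt templates, and call_llm stub are illustrative, not AFlow's actual interface.

```python
from graphlib import TopologicalSorter

def call_llm(prompt: str) -> str:          # stand-in for a real LLM call
    return f"<llm output for: {prompt[:40]}...>"

# nodes: action name -> prompt template; edges: action -> set of prerequisite actions
nodes = {
    "extract": "Extract the key facts from: {input}",
    "draft":   "Draft an answer using these facts: {extract}",
    "review":  "Review this draft for errors: {draft}",
}
edges = {"extract": set(), "draft": {"extract"}, "review": {"draft"}}

def run_workflow(user_input: str) -> dict:
    results = {"input": user_input}
    for action in TopologicalSorter(edges).static_order():   # respect dependencies
        results[action] = call_llm(nodes[action].format(**results))
    return results

print(run_workflow("Why is the sky blue?")["review"])
```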
Large language models (LLMs) have demonstrated significant reasoning capabilities, yet they face issues like hallucinations and the inability to conduct faithful reasoning. GCR introduces a trie-based index named KG-Trie to integrate KG structures directly into the LLM decoding process.
LLMs prefer contextual knowledge over their parametric knowledge, but during conflicts, existing solutions that require additional model interactions incur high latency, making them impractical for real-world applications. Representation engineering has emerged as a higher-level framework for understanding LLM behavior at scale.
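A toy sketch of the trie idea: knowledge-graph paths are indexed in a trie, and during decoding only continuations that stay on an indexed path are permitted. The whitespace "tokens" and tiny KG below are illustrative assumptions, not GCR's implementation.

```python
def build_trie(paths):
    root = {}
    for path in paths:
        node = root
        for tok in path.split():
            node = node.setdefault(tok, {})
        node["<end>"] = {}
    return root

def allowed_next_tokens(trie, prefix_tokens):
    node = trie
    for tok in prefix_tokens:
        if tok not in node:
            return []            # prefix leaves the KG: no faithful continuation
        node = node[tok]
    return list(node.keys())

kg_paths = ["Paris capital_of France", "Paris located_in Europe"]
trie = build_trie(kg_paths)
print(allowed_next_tokens(trie, ["Paris"]))   # ['capital_of', 'located_in']
```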
Large Language Models (LLMs) have shown remarkable potential in solving complex real-world problems, from function calls to embodied planning and code generation. Researchers from Zhejiang University and Alibaba Group have proposed WORFBENCH, a benchmark for evaluating workflow generation capabilities in LLM agents.
Katanemo’s Arch-Function transforms workflow automation by simplifying LLM deployment and reducing engineering overhead, making it accessible even for smaller enterprises. Arch-Function is optimized for speed and precision, completing tasks in minutes that previously took hours while effectively adapting to dynamic requirements.
In PROVE, researchers use a high-fidelity scene graph representation constructed from hyper-detailed image captions and employ a large language model (LLM) to generate diverse question-answer (QA) pairs along with executable programs to verify each QA pair. This approach allows the creation of a benchmark dataset of 10.5k verified QA pairs.
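A small sketch of what program-based verification of a QA pair over a scene graph could look like; the scene-graph schema, helper names, and example are hypothetical rather than PROVE's actual format.

```python
scene_graph = {
    "objects": {"dog": {"color": "brown"}, "ball": {"color": "red"}},
    "relations": [("dog", "chases", "ball")],
}

question = "What color is the ball the dog is chasing?"
expected_answer = "red"

def verify(graph, answer):
    # Executable check: find what the dog chases, then compare its color to the answer.
    targets = [o for s, r, o in graph["relations"] if s == "dog" and r == "chases"]
    return bool(targets) and graph["objects"][targets[0]]["color"] == answer

print(verify(scene_graph, expected_answer))   # True -> the QA pair is kept
```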
Large language models (LLMs) like GPT-4, Gemini, and Llama 3 have revolutionized natural language processing through extensive pre-training and supervised fine-tuning (SFT). However, these models come with high computational costs for training and inference.
By combining layer dropout, early exit loss, and self-speculative decoding, the researchers have proposed a novel approach that not only speeds up inference but also reduces memory requirements, making it feasible for large models to be deployed on commodity hardware.
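A toy sketch of the self-speculative decoding loop: a cheap early-exit pass drafts several tokens and the full model verifies them, keeping tokens up to the first mismatch. The integer "tokens" and next-token stubs stand in for real early-exit and full forward passes.

```python
import random

def draft_next(prefix):                    # early-exit submodel: fast, sometimes wrong
    return prefix[-1] + 1 if random.random() < 0.8 else prefix[-1]

def full_next(prefix):                     # full model: the reference prediction
    return prefix[-1] + 1

def generate(prefix, max_new=8, k=4):
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        drafts = []
        for _ in range(k):                 # cheap drafting pass
            drafts.append(draft_next(out + drafts))
        for i, d in enumerate(drafts):     # full-model verification pass
            target = full_next(out + drafts[:i])
            out.append(target)             # always keep the full model's token
            if d != target:                # first mismatch: discard remaining drafts
                break
            if len(out) - len(prefix) >= max_new:
                break
    return out

print(generate([0]))
```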
In light of these drawbacks, a trustworthy technique for determining when and how an LLM may be uncertain about its ability to follow instructions is necessary to reduce the risks involved in using these models.
Task superposition means that when an LLM is provided relevant examples for each task within the same input prompt, it can process and produce responses for several tasks at once. The team demonstrates this behavior across model families such as Llama-3 and Qwen.
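A hedged sketch of what a task-superposition prompt might look like, with in-context examples for two different tasks interleaved in a single input so one completion can serve both; the layout and the tasks themselves are illustrative, not taken from the paper.

```python
prompt = """\
Task A (translate to French) and Task B (classify sentiment) examples:

A: "good morning" -> "bonjour"
B: "I loved this film" -> positive
A: "thank you" -> "merci"
B: "terrible service" -> negative

Now answer both for the new inputs:
A: "see you soon" ->
B: "what a waste of time" ->
"""
# The prompt would then be sent to any instruction-tuned chat LLM.
print(prompt)
```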
The study also employed regularization schemes like Negative Log-Likelihood (NLL) to mitigate over-optimization and evaluated generalization performance using LLM-as-a-Judge, a framework for comparing model outputs with those from other leading models.
Recent advancements in LLM capabilities have increased their usability by enabling them to do a broader range of general activities autonomously. There are two main obstacles to effective LM program utilization, the first being that the non-deterministic character of LLMs makes writing LM programs tedious and complex.
A team of researchers from Carnegie Mellon University introduced PANGEA, a multilingual multimodal LLM designed to bridge linguistic and cultural gaps in visual understanding tasks. PANGEA is trained on a newly curated dataset, PANGEAINS, which contains 6 million instruction samples across 39 languages.
The Attack Generation and Exploration Module uses an attacker LLM to generate jailbreak prompts based on strategies from the Retrieval Module. These prompts target a victim LLM, with responses evaluated by a scorer LLM. This process generates attack logs for the Strategy Library Construction Module.
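A minimal sketch of that attacker/victim/scorer loop with stub model calls; the function names, strategy list, and scoring scale are illustrative placeholders rather than the system's actual components.

```python
strategies = ["role-play framing", "payload splitting"]     # retrieved strategies (toy)

def call_attacker(strategy):
    return f"<jailbreak prompt using {strategy}>"

def call_victim(prompt):
    return f"<victim response to {prompt}>"

def call_scorer(prompt, response):
    return 0.3                                              # toy harmfulness score in [0, 1]

attack_logs = []
for strategy in strategies:
    prompt = call_attacker(strategy)            # Attack Generation and Exploration Module
    response = call_victim(prompt)              # target the victim LLM
    score = call_scorer(prompt, response)       # scorer LLM evaluates the response
    attack_logs.append({"strategy": strategy, "prompt": prompt,
                        "response": response, "score": score})

# attack_logs feed the Strategy Library Construction Module
print(len(attack_logs))
```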
One of the critical problems faced by AI researchers is that many current methods for enhancing LLM reasoning capabilities rely heavily on human intervention. Finally, the LLM uses these reasoning structures to create detailed reasoning paths.
Current evaluation frameworks, such as LLM-as-a-Judge, which uses large language models to judge outputs from other AI systems, need to account for the entire task-solving process. The Agent-as-a-Judge framework achieved 90% alignment with human evaluators, compared to LLM-as-a-Judge's 70% alignment.
Traditionally, large language models (LLMs) used for building TTS pipelines convert speech to text using automatic speech recognition (ASR), process it using an LLM, and then convert the output back to speech via TTS.
GenRM leverages a large pre-trained LLM to generate reasoning chains that help decision-making. The model also outperformed LLM-based judges, which rely solely on AI feedback, showcasing a more balanced approach to feedback optimization.
This is a serious challenge for the broader and more effective scaling of LLM modalities to real-world applications. Current solutions to alignment involve methods such as RLHF and direct preference optimization (DPO).
Researchers from Westlake University and Zhejiang University introduced an omni-modal LLM Baichuan-Omni alongside a multimodal training scheme designed to facilitate advanced multimodal processing and better user interactions. It also provides multilingual support for languages such as English and Chinese.
The disconnect between laboratory performance and practical applications raises concerns about the true effectiveness of LLM-based provers. Current methodologies often fail to capture the intricate nature of mathematical reasoning required in authentic theorem-proving scenarios, limiting their practical utility.
While alternative methods like large language model (LLM)-based encoders can handle longer sequences, they fail to provide the same level of alignment as contrastive pre-training encoders do. The growing popularity of diffusion models has been driven by advancements in fast sampling techniques and text-conditioned generation.
Methods for testing reasoning capabilities include datasets like GSM8k, which contains arithmetic word problems that test LLMs on basic to intermediate logic tasks. However, these benchmarks must be revised to push the limits of LLM reasoning, as they often contain repetitive patterns and lack variety in problem structures.
The technology behind MIND works by prompting an LLM with a raw text from OpenWebMath and instructing it to break down the problem into a series of conversational turns. Each conversation style contributes to decomposing a mathematical problem into its core components, allowing the model to focus on each part in a detailed and logical manner.
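A hedged sketch of the kind of prompt such a pipeline might use to turn raw OpenWebMath-style text into conversational turns; the template wording and call_llm stub are illustrative, not MIND's exact prompts.

```python
raw_text = "A train travels 120 km in 2 hours. Its average speed is 60 km/h."

template = """Rewrite the following math text as a conversation between a STUDENT and a
TEACHER. Break the problem into small steps, with one step discussed per turn.

Text:
{text}

Conversation:"""

def call_llm(prompt):                     # stand-in for any instruction-tuned LLM
    return "STUDENT: How far does the train go?\nTEACHER: 120 km in 2 hours. ..."

print(call_llm(template.format(text=raw_text)))
```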
These methods rely on predefined rules or LLM (Large Language Model) judgments to identify potential vulnerabilities in code. The second stage uses LLM-based mutators to generate large-scale data from these seed samples, preserving the original security context. These samples contain insecure and patched code and associated test cases.
A team of researchers from Microsoft Responsible AI Research and Johns Hopkins University proposed Controllable Safety Alignment (CoSA), a framework for efficient inference-time adaptation to diverse safety requirements. The adapted strategy first produces an LLM that is easily controllable for safety.
This adaptation is essential, given the global user base that increasingly relies on LLMs across diverse languages for various tasks, including everyday information, safety guidelines, and nuanced conversations. A core issue in LLM development lies in adapting RMs to perform consistently across different languages.
These problems are particularly noticeable when deploying LLMs on NVIDIA Hopper GPUs, as balancing memory usage and computational speed becomes more challenging. Neural Magic introduces Machete: a new mixed-input GEMM kernel for NVIDIA Hopper GPUs, representing a major advancement in high-performance LLM inference.
This research offers a valuable tool for improving the quality and reliability of LLM-generated code and streamlining software development workflows. The approach is quite comprehensive, but its dependence on predefined tests limits adaptability to unconventional coding styles.
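For intuition, the NumPy sketch below shows what a mixed-input (4-bit weight, 16-bit activation) GEMM computes; a real kernel such as Machete fuses the unpacking, scaling, and matmul on the GPU, and the group size and packing layout here are illustrative choices, not Machete's.

```python
import numpy as np

def quantize_weights(w, group=64):
    """Per-group symmetric int4 quantization of a (k, n) weight matrix."""
    k, n = w.shape
    w = w.reshape(k // group, group, n)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0          # int4 range: [-7, 7]
    q = np.clip(np.round(w / scales), -7, 7).astype(np.int8)
    return q.reshape(k, n), scales

def mixed_input_gemm(x, q, scales, group=64):
    """fp16 activations times int4 weights: dequantize per group, then matmul."""
    k, n = q.shape
    w = (q.reshape(k // group, group, n).astype(np.float16) * scales).reshape(k, n)
    return x.astype(np.float16) @ w

x = np.random.randn(8, 256).astype(np.float16)      # activations
w = np.random.randn(256, 128).astype(np.float32)    # full-precision weights
q, s = quantize_weights(w)
print(mixed_input_gemm(x, q, s).shape)               # (8, 128)
```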
In RAG systems, an LLM creates a response based on the retrieved content after a retrieval step in which pertinent information or passages are gathered. The second kind, known as arena-based benchmarks, uses a high-performance LLM as a teacher to evaluate model outputs through direct model comparisons in a setting akin to a competition.
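A minimal sketch of that retrieve-then-generate flow; the toy word-overlap retriever and the call_llm stub stand in for a real embedding index and model.

```python
corpus = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "The Great Wall of China is over 13,000 miles long.",
]

def retrieve(query, docs, k=1):
    # Toy relevance score: number of words shared between query and document.
    score = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def call_llm(prompt):                     # stand-in for any chat/completions model
    return f"<answer grounded in: {prompt[:60]}...>"

query = "When was the Eiffel Tower completed?"
context = "\n".join(retrieve(query, corpus))
print(call_llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"))
```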
SynPO is a self-boosting method that enhances LLM alignment without heavily depending on human annotations by creating synthetic data. Instead of relying on complicated datasets or outside human inputs, it makes use of the LLM itself to provide a range of prompts that elicit various scenarios and replies.
Large Language Model (LLM)-based online agents have significantly advanced in recent times, resulting in unique designs and new benchmarks that show notable improvements in autonomous web navigation and interaction.
None of these traditional approaches capture key metrics, such as how much time programmers spend coding, how frequently programmers accept LLM suggestions, or the degree to which LLMs actually help solve complex problems.
A portion of this disparity might be attributed to the fact that LLMs have been used in a variety of fields, with differing goals and input-output configurations. Without this degree of information, it is challenging for researchers to fine-tune the application of LLMs to particular jobs or contexts.