
All Languages Are NOT Created (Tokenized) Equal

Topbots

70% of research papers published at a computational linguistics conference evaluated only English. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2340–2354, Dublin, Ireland. Association for Computational Linguistics. Are All Languages Created Equal in Multilingual BERT?


68 Summaries of Machine Learning and NLP Research

Marek Rei

They show state-of-the-art results on long-context language modelling, finding a hidden passcode in a 1M-token context and summarizing 500K-token books. BooookScore: A systematic exploration of book-length summarization in the era of LLMs. Yapei Chang, Kyle Lo, Tanya Goyal, Mohit Iyyer. Computational Linguistics 2022.


Trending Sources


The State of Multilingual AI

Sebastian Ruder

Research models such as BERT and T5 have become much more accessible, while the latest generation of language and multi-modal models demonstrates increasingly powerful capabilities. Citation: For attribution in academic contexts or books, please cite this work as: Sebastian Ruder, "The State of Multilingual AI", [link], 2022.


ML and NLP Research Highlights of 2021

Sebastian Ruder

…[6], such as W2v-BERT [7], as well as more powerful multilingual models such as XLS-R [8]. For each input chunk, nearest-neighbor chunks are retrieved using approximate nearest-neighbor search based on BERT embedding similarity. W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training.
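The retrieval step described above can be sketched in miniature. A production system would use an approximate nearest-neighbor index (e.g. FAISS or ScaNN) over real BERT embeddings; the toy version below does an exact cosine-similarity search over hand-made vectors just to show the lookup logic. All names here are illustrative, not from any paper's codebase.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest_chunks(query_emb, chunk_embs, k=2):
    """Return indices of the k chunks whose embeddings are most
    similar to the query embedding (exact search, not approximate)."""
    ranked = sorted(range(len(chunk_embs)),
                    key=lambda i: cosine(query_emb, chunk_embs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-d "embeddings": chunks 0 and 1 point in nearly the same
# direction as the query, chunk 2 is orthogonal.
chunks = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(nearest_chunks([1.0, 0.05], chunks, k=2))  # [0, 1]
```

The retrieved chunk texts would then be fed to the model alongside the input chunk; swapping the exact search for an ANN index changes only `nearest_chunks`, not the surrounding logic.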


2022: We reviewed this year’s AI breakthroughs

Applied Data Science

You can also provide negative prompts (what you don't want to see in the image), book a demo with Google's Imagen, or access the Discord server of Midjourney to give text prompts to their bot. Language generation: machines that can understand and communicate in our own language have always been hard to construct. What happened?


Multi-domain Multilingual Question Answering

Sebastian Ruder

Reading comprehension assumes a gold paragraph is provided. Standard approaches for reading comprehension build on pre-trained models such as BERT. Using BERT for reading comprehension involves fine-tuning it to predict a) whether a question is answerable and b) whether each token is the start or end of an answer span.
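The span-prediction step can be sketched without the model itself: assuming BERT has already produced a start score and an end score per token, decoding picks the (start, end) pair with the highest combined score, subject to the end not preceding the start and the span not exceeding a length cap. This is a minimal illustrative sketch, not any particular library's decoding code.

```python
def best_span(start_scores, end_scores, max_len=30):
    """Return (start, end) token indices maximizing
    start_scores[s] + end_scores[e], with s <= e < s + max_len."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

# Toy scores: token 2 looks like the span start, token 4 like the end.
start = [0.1, 0.2, 2.5, 0.1, 0.3]
end   = [0.1, 0.1, 0.2, 0.4, 2.0]
print(best_span(start, end))  # (2, 4)
```

Answerability is typically handled by comparing this span's score against the score of the null span (both indices at the [CLS] position); that comparison is omitted here for brevity.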
