Visual captions: Using large language models to augment video conferences with dynamic visuals
Google Research AI blog
JUNE 6, 2023
We fine-tuned a large language model to proactively suggest relevant visuals in open-vocabulary conversations, using a dataset we curated for this purpose. We open-sourced Visual Captions as part of the ARChat project, which is designed for rapid prototyping of augmented communication with real-time transcription.