Google Research, 2022 & Beyond: Language, Vision and Generative Models
Google Research AI blog
JANUARY 18, 2023
One trend that started with our work on Vision Transformers in 2020 is to use the Transformer architecture in computer vision models rather than convolutional neural networks. In object detection, for example, the neural network perceives an image and generates a sequence of tokens for each object, with the tokens corresponding to that object's bounding box and class label.
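To make the "sequence of tokens for each object" idea concrete, here is a minimal sketch of how a bounding box and class label might be turned into discrete tokens. The post does not specify the details, so everything here is an assumption for illustration: the number of coordinate bins (`NUM_BINS`), the `[ymin, xmin, ymax, xmax, class]` token order, the choice to place class tokens after the coordinate bins in a shared vocabulary, and the helper names `quantize`, `object_to_tokens`, and `image_to_sequence`.

```python
from typing import List, Tuple

# Hypothetical vocabulary layout: tokens 0..NUM_BINS-1 are coordinate bins,
# and class k is stored as token NUM_BINS + k. This is an assumption.
NUM_BINS = 1000

Box = Tuple[float, float, float, float]  # (ymin, xmin, ymax, xmax) in pixels


def quantize(coord: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    frac = min(max(coord / size, 0.0), 1.0)  # clamp to the image extent
    return min(int(frac * num_bins), num_bins - 1)


def object_to_tokens(box: Box, class_id: int, width: float, height: float,
                     num_bins: int = NUM_BINS) -> List[int]:
    """Encode one object as five tokens: four box coordinates, then the class."""
    ymin, xmin, ymax, xmax = box
    return [
        quantize(ymin, height, num_bins),
        quantize(xmin, width, num_bins),
        quantize(ymax, height, num_bins),
        quantize(xmax, width, num_bins),
        num_bins + class_id,  # shift class ids past the coordinate bins
    ]


def image_to_sequence(objects: List[Tuple[Box, int]], width: float,
                      height: float) -> List[int]:
    """Concatenate the per-object tokens into one target sequence for the image."""
    sequence: List[int] = []
    for box, class_id in objects:
        sequence.extend(object_to_tokens(box, class_id, width, height))
    return sequence


if __name__ == "__main__":
    # Two made-up objects in a 200x100 image (width x height).
    objects = [((50.0, 100.0, 75.0, 150.0), 3), ((0.0, 0.0, 100.0, 200.0), 7)]
    print(image_to_sequence(objects, width=200.0, height=100.0))
```

Under these assumptions, each object contributes a fixed five tokens, so a model trained to emit such sequences can describe any number of objects simply by generating a longer output.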