The State of Transfer Learning in NLP
Sebastian Ruder
AUGUST 18, 2019
Later approaches scaled these representations to sentences and documents (Le and Mikolov, 2014; Conneau et al., 2017).

In contrast, current models like BERT-Large and GPT-2 consist of 24 Transformer blocks, and recent models are even deeper.

Multilingual BERT in particular has been the subject of much recent attention (Pires et al., 2019).
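To give a sense of the scale of a 24-block model like BERT-Large, here is a rough back-of-the-envelope parameter count. The helper functions are hypothetical (not from any library); the dimensions are the publicly documented BERT-Large hyperparameters (hidden size 1024, feed-forward size 4096, 24 layers), and the formulas are a standard approximation for a Transformer encoder, omitting the pooler head.

```python
# Rough parameter-count sketch for a BERT-Large-style encoder.
# Helper functions are illustrative, not a library API; dimensions
# follow the public BERT-Large configuration.

def transformer_block_params(d_model: int, d_ff: int) -> int:
    """Approximate parameters in one Transformer encoder block."""
    attention = 4 * d_model * d_model + 4 * d_model  # Q, K, V, output projections + biases
    ffn = 2 * d_model * d_ff + d_ff + d_model        # two feed-forward layers + biases
    layer_norms = 2 * 2 * d_model                    # two LayerNorms (scale + shift each)
    return attention + ffn + layer_norms

def encoder_params(n_layers: int, d_model: int, d_ff: int,
                   vocab: int, max_pos: int) -> int:
    """Approximate total parameters: embeddings plus n_layers blocks."""
    # token + position + segment (2) embeddings, plus embedding LayerNorm
    embeddings = (vocab + max_pos + 2) * d_model + 2 * d_model
    return embeddings + n_layers * transformer_block_params(d_model, d_ff)

# BERT-Large: 24 blocks, hidden size 1024, feed-forward size 4096
total = encoder_params(n_layers=24, d_model=1024, d_ff=4096,
                       vocab=30522, max_pos=512)
print(f"{total / 1e6:.0f}M parameters")  # lands near the commonly cited ~340M
```

The estimate comes out around 334M parameters, close to the ~340M usually quoted for BERT-Large; the small gap is the pooler and other minor heads left out of this sketch.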