Modular Deep Learning
Sebastian Ruder
FEBRUARY 23, 2023
(b) Polytropon uses low-rank adapters with hard learned routing for few-shot task adaptation. (c) Computation Function We consider a neural network $f_theta$ as a composition of functions $f_{theta_1} odot f_{theta_2} odot ldots odot f_{theta_l}$, each with their own set of parameters $theta_i$. Learned routing. Learned
Let's personalize your content