FlashSigmoid: A Hardware-Aware and Memory-Efficient Implementation of Sigmoid Attention Yielding a 17% Inference Kernel Speed-Up over FlashAttention-2 on H100 GPUs
Marktechpost
SEPTEMBER 13, 2024
In supervised image classification and self-supervised learning, there’s a trend towards using richer pointwise Bernoulli conditionals parameterized by sigmoid functions, moving away from output conditional categorical distributions typically parameterized by softmax.
Let's personalize your content