Optimize mixed-precision linear, bias, and GELU CUDA computation.
Keywords: gemm
Optimize batched CUDA matrix multiplication across grouped shapes.
Optimize Triton GEMM with GELU for transformer-like CUDA shapes.
Optimize Triton GEMM with GELU for square CUDA matrix shapes.
Optimize a Triton GEMM with GELU for tall/skinny and short/wide matrices.
Optimize a Triton GEMM with GELU around tile-boundary dimensions.
Optimize a Triton GEMM with GELU for small and large K dimensions.
Optimize a Triton GEMM with GELU for awkward matrix dimensions.