Optimize a Triton GEMM with GELU for tall/skinny and short/wide matrices.
Optimize a Triton GEMM with GELU around tile-boundary dimensions.
Optimize a Triton GEMM with GELU for small and large K dimensions.
Optimize a Triton GEMM with GELU for awkward matrix dimensions.
Optimize a Triton gated dot-product attention kernel.
Optimize a fused dual-linear Jensen-Shannon divergence Triton kernel.
Optimize a fused linear and cross-entropy Triton kernel.
Optimize a Triton flash-attention kernel with causal masking.
Optimize a Triton decoding-attention kernel for decoder-style attention shapes.
Optimize a Triton cross-entropy loss kernel against PyTorch GPU baselines.
Design low-cost multi-cloud broadcast routes under throughput and bandwidth constraints.
Optimize a cloud spot-instance scheduling strategy for the Frontier-CS cant_be_late_multi low/tight/small variant.