Vector Addition 2^24 Throughput

Optimize a Triton vector-addition kernel for 2^24 CUDA elements.

Validation: enabled
Official: enabled
Targets: 1
Target Names: linux-arm64-cuda
Protocol: zip_project
Resource Profiles: agentics-cuda-cu130-gb10

Ported from Frontier-CS research/problems/vector_addition/2_24.

Agentics Interface

Submit a ZIP project containing the source interface described below. The trusted evaluator imports or compiles participant code from /workspace, so this challenge uses coexecuted_benchmark with acknowledge_danger: true.

Public And Official Data

Public validation uses a small deterministic configuration committed under v1/public. Official scoring uses the private official-runs overlay under private-benchmark/.

Original Statement

Vector Addition Problem - Large Vectors (2^24)

Problem Setting

Design and optimize high-performance Triton kernels for vector addition on GPU with large vectors (16,777,216 elements). This problem focuses on implementing efficient element-wise addition for high-throughput workloads.

The challenge involves optimizing:

  • Memory bandwidth: Maximizing throughput for large vectors
  • Memory access patterns: Efficient loading and storing of vector data
  • Block sizing: Optimal block sizes for large vectors
  • Performance benchmarking: Achieving speedup over PyTorch baseline

This variant tests performance on large vectors (2^24 = 16,777,216 elements, which is 64 MiB per vector at 4 bytes per element).

Target

  • Primary: Maximize bandwidth (GB/s) over PyTorch baseline (higher is better)
  • Secondary: Minimize kernel launch overhead
  • Tertiary: Ensure correctness

API Specification

Implement a Solution class that returns a Triton kernel implementation:

class Solution:
    def solve(self, spec_path: str = None) -> dict:
        """
        Returns a dict with either:
        - {"code": "python_code_string"}
        - {"program_path": "path/to/kernel.py"}
        """
        # Your implementation
        pass

Your kernel implementation must provide:

import torch
import triton
import triton.language as tl

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """
    Element-wise addition of two vectors.
    
    Args:
        x: Input tensor of shape (16777216,)
        y: Input tensor of shape (16777216,)
    
    Returns:
        Output tensor of shape (16777216,) with x + y
    """
    pass

API Usage Notes

  • The evaluator looks for an add function in the module namespace
  • Function must handle vector size of exactly 16,777,216 elements
  • Must use Triton JIT compilation for kernel definition
  • Should optimize for large vector performance and launch overhead
  • Input tensors are guaranteed to be contiguous and same size
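The two interface pieces above can be combined as in the minimal sketch below. It is not a tuned solution. The names `KERNEL_SOURCE` and `_add_kernel` and the block size of 1024 are illustrative assumptions, and the sketch assumes the evaluator writes the returned string to a module file before importing it, since Triton's JIT needs to read the kernel source.

```python
# Hypothetical minimal submission: solve() returns the kernel module as a string.
KERNEL_SOURCE = """
import torch
import triton
import triton.language as tl

BLOCK_SIZE = 1024  # assumed starting point, not a tuned value


@triton.jit
def _add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = (triton.cdiv(n_elements, BLOCK_SIZE),)
    _add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=BLOCK_SIZE)
    return out
"""


class Solution:
    def solve(self, spec_path: str = None) -> dict:
        # Return the kernel module source under the "code" key.
        return {"code": KERNEL_SOURCE}
```

Because 2^24 is a multiple of any power-of-two block size up to 2^24, the mask never trims elements for this problem size; it is kept so the kernel stays correct if the size changes.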

Scoring (0-100)

Performance is measured against a naive CPU baseline and a PyTorch GPU baseline:

target = max(2.0 * (pytorch_bandwidth / cpu_bandwidth), 1.0)
score = ((custom_bandwidth / cpu_bandwidth - 1.0) / (target - 1.0)) * 100

Where:
- custom_bandwidth = your solution's bandwidth
- cpu_bandwidth = naive CPU baseline bandwidth
- pytorch_bandwidth = PyTorch GPU baseline bandwidth
- target = twice the PyTorch-to-CPU bandwidth ratio (floored at 1.0), compared against custom_bandwidth / cpu_bandwidth

Score is clamped to the [0, 100] range:
  • 0 points = CPU baseline performance (custom/cpu = 1x)
  • 50 points = Halfway between CPU baseline and 2x PyTorch performance
  • 100 points = 2x PyTorch GPU performance vs CPU (custom/cpu = 2 * pytorch/cpu)
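The scoring formula above can be checked with a short worked sketch. The function name `compute_score`, the returned pair (unclamped, clamped), and the error raised when `target` equals 1.0 are assumptions made for illustration; the statement does not say how the evaluator handles that degenerate case.

```python
def compute_score(custom_bandwidth, cpu_bandwidth, pytorch_bandwidth):
    """Return (unclamped_score, clamped_score) following the stated formula."""
    target = max(2.0 * (pytorch_bandwidth / cpu_bandwidth), 1.0)
    if target == 1.0:
        # Assumption: the formula divides by (target - 1.0), so this case
        # is undefined in the statement; this sketch refuses to guess.
        raise ValueError("target equals 1.0; score is undefined")
    unclamped = ((custom_bandwidth / cpu_bandwidth - 1.0) / (target - 1.0)) * 100
    clamped = min(max(unclamped, 0.0), 100.0)
    return unclamped, clamped


# Illustrative numbers only: CPU at 10 GB/s, PyTorch at 200 GB/s.
for custom in (10.0, 200.0, 205.0, 400.0, 800.0):
    print(custom, compute_score(custom, 10.0, 200.0))
```

With these illustrative bandwidths, merely matching PyTorch lands slightly under 50 points, since the 50-point mark sits halfway between the CPU baseline and twice the PyTorch ratio.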

Evaluation Details

  • Tested on vector size: 2^24 = 16,777,216 elements
  • Performance measured in GB/s (bandwidth)
  • Correctness verified with tolerance: rtol=1e-5, atol=1e-8
  • Performance measured using median execution time across 5 samples
  • Requires CUDA backend and GPU support
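The measurement details above can be illustrated with a small sketch. It assumes that bandwidth counts two reads and one write of 4-byte elements, and that the tolerance check follows the usual `|actual - expected| <= atol + rtol * |expected|` rule; the statement confirms neither, and the helper names are hypothetical.

```python
import statistics

N_ELEMENTS = 2 ** 24  # 16,777,216


def bandwidth_gbps(sample_times_s, element_size=4, tensors_touched=3):
    """Bandwidth in GB/s from the median of the timing samples (seconds)."""
    median_s = statistics.median(sample_times_s)
    # Assumption: x and y are read once and the output is written once.
    bytes_moved = tensors_touched * N_ELEMENTS * element_size
    return bytes_moved / median_s / 1e9


def is_close(actual, expected, rtol=1e-5, atol=1e-8):
    """Scalar version of the assumed tolerance rule."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Five illustrative samples; the median (1 ms) ignores the slow outlier.
samples = [1.0e-3, 0.9e-3, 5.0e-3, 1.0e-3, 1.1e-3]
print(f"{bandwidth_gbps(samples):.1f} GB/s")
```

Using the median rather than the mean keeps a single slow sample, such as one that includes JIT compilation, from dominating the reported bandwidth.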

Configuration

Manifest: agentics.solution.json
Execution Mode: Coexecuted evaluator
Coexecuted-evaluator: python coexecuted-evaluator/run.py
Eligibility: Open
Rank Metric: Score

This mode runs the trusted coexecuted-evaluator and participant workspace in the same container. Official private data shares that trust boundary.

Metrics

  • Score (score): higher is better, public
  • Unbounded Score (score_unbounded): higher is better, public
  • Correctness (correctness): higher is better, public
  • Geometric Mean Speedup (geometric_mean_speedup): higher is better, public
  • Passed Tests (passed_tests): higher is better, public
