ImageNet Pareto 500K

Ported from Frontier-CS research/problems/imagenet_pareto/500k.

Agentics Interface

Submit a ZIP project containing the source interface described below. The trusted evaluator imports or compiles participant code from /workspace, so this challenge uses coexecuted_benchmark with acknowledge_danger: true.

Public And Official Data

Public validation uses a small deterministic configuration committed under v1/public. Official scoring uses the private official-runs overlay under private-benchmark/.

Original Statement

ImageNet Pareto Optimization - 500K Parameter Variant

Problem Setting

Train a neural network on a synthetic ImageNet-like dataset to maximize accuracy while staying within a parameter budget of 500,000 parameters.

Objective: Achieve the highest possible accuracy without exceeding the parameter constraint.

Target

Primary: Maximize test accuracy
Secondary: Maintain model efficiency (stay under parameter budget)

API Specification

Implement a Solution class:

import torch
import torch.nn as nn

class Solution:
    def solve(self, train_loader, val_loader, metadata: dict = None) -> torch.nn.Module:
        """
        Train a model and return it.
        
        Args:
            train_loader: PyTorch DataLoader with training data
            val_loader: PyTorch DataLoader with validation data
            metadata: Dict with keys:
                - num_classes: int (128)
                - input_dim: int (384)
                - param_limit: int (500,000)
                - baseline_accuracy: float (0.72)
                - train_samples: int
                - val_samples: int
                - test_samples: int
                - device: str ("cpu")
        
        Returns:
            Trained torch.nn.Module ready for evaluation
        """
        # Your implementation
        pass

Implementation Requirements:

Use metadata["input_dim"] and metadata["num_classes"] for model architecture
Keep model parameters <= 500,000 (hard constraint - models exceeding this receive 0 score)
Return a trained model ready for evaluation
Ensure model works with the provided device

Parameter Constraint

HARD LIMIT: 500,000 trainable parameters

This is an absolute constraint enforced during evaluation
Models exceeding 500,000 parameters will receive a score of 0.0
The constraint cannot be waived under any circumstances
You must design your architecture carefully to stay under this limit

Example: A model with 500,001 parameters → Score 0.0 (constraint violated)
Example: A model with 500,000 parameters → Score based on accuracy
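To make the budget concrete, the following sketch counts weights and biases for a plain fully connected stack and finds the widest single hidden layer that still fits. The helper names `mlp_param_count` and `max_hidden_width` are illustrative and not part of the evaluator; the arithmetic assumes only `nn.Linear`-style layers with biases and no normalization layers.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def max_hidden_width(input_dim, num_classes, param_limit):
    """Largest h such that input_dim -> h -> num_classes fits the limit."""
    # Total count is h * (input_dim + 1 + num_classes) + num_classes.
    return (param_limit - num_classes) // (input_dim + 1 + num_classes)


PARAM_LIMIT = 500_000  # hard limit from this problem statement

print(mlp_param_count([384, 384, 128]))      # 197120
h = max_hidden_width(384, 128, PARAM_LIMIT)
print(h, mlp_param_count([384, h, 128]))     # 974 499790
```

A hidden width of 975 would already give 500,303 parameters and a score of 0.0, so it is worth checking the count before training rather than after.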

Baseline Accuracy

Baseline Accuracy for this variant: 72%

This is the expected performance level for a simple model at this parameter budget
Solutions must achieve accuracy above this baseline to receive a positive score
Accuracy below baseline results in 0 points
Accuracy improvements are scored linearly

Scoring Formula

The scoring is based purely on linear accuracy scaling from baseline to 100%:

If model exceeds parameter limit (500,000):
    Score = 0.0  (constraint violation)

Else:
    Score = (accuracy - 0.72) / (1.0 - 0.72) × 100.0
    
    Where:
    - accuracy = achieved test accuracy (0.0 to 1.0)
    - 0.72 = baseline accuracy for this variant
    - 1.0 = target (100% accuracy = 100 points)
    
    Score is clamped to [0, 100] range

Linearly Scaled Scoring for 500K variant:

Accuracy	Score	Notes
72.0%	0	At baseline (0 points)
77.0%	~18	5 percentage points above baseline
82.0%	~36	10 percentage points above baseline
87.0%	~54	15 percentage points above baseline
100%	100	Perfect accuracy (max score)
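The formula and table above can be reproduced locally with a few lines of Python. This is a sketch of the stated rule, not the evaluator's own code; the function name `pareto_score` and the example parameter count of 400,000 are assumptions made for illustration.

```python
def pareto_score(accuracy, param_count, baseline=0.72, param_limit=500_000):
    """Score in [0, 100]; 0.0 whenever the parameter limit is exceeded."""
    if param_count > param_limit:
        return 0.0  # constraint violation
    raw = (accuracy - baseline) / (1.0 - baseline) * 100.0
    return max(0.0, min(100.0, raw))


# Print the score for each accuracy level listed in the table.
for acc in (0.72, 0.77, 0.82, 0.87, 1.00):
    print(f"{acc:.2f} -> {pareto_score(acc, 400_000):.1f}")
```

Because the denominator is 0.28, each percentage point of accuracy above the baseline is worth roughly 3.57 score points.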

Evaluation Process

The evaluator follows these steps:

1. Build Synthetic Dataset

# Generate synthetic ImageNet-like data
train_loader, val_loader, test_loader = make_dataloaders()
# Each sample: (384,) feature vector, label in [0, 127]

2. Call Solution

from solution import Solution
solution = Solution()
model = solution.solve(train_loader, val_loader, metadata)
# metadata contains: num_classes, input_dim, param_limit, baseline_accuracy, device

3. Validate Model

param_count = sum(p.numel() for p in model.parameters() if p.requires_grad)
if param_count > 500000:
    score = 0.0  # Constraint violation

4. Evaluate Accuracy

model.eval()
correct = 0
total = 0
for inputs, targets in test_loader:
    outputs = model(inputs)
    preds = outputs.argmax(dim=1)
    correct += (preds == targets).sum().item()
    total += targets.numel()
accuracy = correct / total

5. Calculate Score

score = (accuracy - 0.72) / (1.0 - 0.72) * 100.0
score = max(0.0, min(100.0, score))

Evaluation Details

128 classes, 384-dimensional feature vectors
Training: 2,048 samples (16 per class)
Validation: 512 samples (4 per class)
Test: 1,024 samples (8 per class)
Data generated synthetically with controlled noise
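For local experiments without the evaluator, a stand-in dataset with the same shapes and split sizes can be useful. The sketch below is an assumption throughout: the real `make_dataloaders` is not specified here, and the Gaussian class-center construction, the `noise` level, the batch size, and the name `make_standin_loaders` are all hypothetical. Accuracy on this data says nothing about accuracy on the official data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_standin_loaders(num_classes=128, input_dim=384, per_class=(16, 4, 8),
                         noise=1.0, batch_size=64, seed=0):
    """Hypothetical stand-in: one Gaussian blob per class (train, val, test)."""
    g = torch.Generator().manual_seed(seed)
    centers = torch.randn(num_classes, input_dim, generator=g)
    loaders = []
    for i, n in enumerate(per_class):
        # n samples per class, labels 0..num_classes-1
        labels = torch.arange(num_classes).repeat_interleave(n)
        feats = centers[labels] + noise * torch.randn(
            labels.numel(), input_dim, generator=g)
        loaders.append(DataLoader(TensorDataset(feats, labels),
                                  batch_size=batch_size,
                                  shuffle=(i == 0)))  # shuffle training only
    return tuple(loaders)


train_loader, val_loader, test_loader = make_standin_loaders()
print(len(train_loader.dataset), len(val_loader.dataset), len(test_loader.dataset))
```

The default `per_class=(16, 4, 8)` matches the 2,048 / 512 / 1,024 split sizes listed above.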

Environment Details

Device: CPU only (device="cpu")
Python Environment:
- Python 3
- PyTorch 2.2-2.4
- NumPy ≥1.24
- tqdm ≥4.64
Timeout: 1 hour (3600 seconds) for entire evaluation

Key Points

Parameter Constraint is Hard: Models exceeding 500,000 parameters always score 0
Baseline is Lower Bound: Accuracy must exceed 72% to score points (exactly 72% scores 0)
Linear Scoring: Every accuracy improvement scales linearly to the score
100% is Target: Achieving 100% accuracy gives full 100 points
Accuracy is Primary: Focus on accuracy within the parameter budget

Example: Simple Baseline

import torch
import torch.nn as nn

class Solution:
    def solve(self, train_loader, val_loader, metadata: dict = None):
        # Simple 2-layer MLP
        input_dim = metadata["input_dim"]      # 384
        num_classes = metadata["num_classes"]  # 128
        hidden_dim = 384

        model = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes)
        )

        # Parameter count: 384*384 + 384 + 384*128 + 128 = 197,120

        # Simple training loop
        optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
        criterion = nn.CrossEntropyLoss()

        for epoch in range(50):
            model.train()
            for inputs, targets in train_loader:
                optimizer.zero_grad()
                outputs = model(inputs)
                loss = criterion(outputs, targets)
                loss.backward()
                optimizer.step()

        return model

Note: This baseline achieves ~72% accuracy with ~197K parameters. To reach higher accuracy within the 500K budget, consider deeper networks or better optimization.
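One way to act on "better optimization" is to use `val_loader` for model selection, which the baseline above ignores. The sketch below is one possible approach, not the reference method: the choice of AdamW, cosine learning-rate decay, label smoothing, and the specific hyperparameters are assumptions, and the helper names `accuracy` and `train_with_selection` are illustrative.

```python
import copy

import torch
import torch.nn as nn


@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    """Fraction of correctly classified samples in `loader`."""
    model.eval()
    correct = total = 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        preds = model(inputs).argmax(dim=1)
        correct += (preds == targets).sum().item()
        total += targets.numel()
    return correct / max(total, 1)


def train_with_selection(model, train_loader, val_loader,
                         epochs=50, lr=1e-3, device="cpu"):
    """Train, then restore the weights with the best validation accuracy."""
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-2)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    best_acc, best_state = -1.0, copy.deepcopy(model.state_dict())
    for _ in range(epochs):
        model.train()
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()
        acc = accuracy(model, val_loader, device)
        if acc > best_acc:  # keep a copy of the best weights so far
            best_acc, best_state = acc, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    model.eval()
    return model, best_acc
```

Inside `Solution.solve`, this could be called as `model, _ = train_with_selection(model, train_loader, val_loader, device=metadata["device"])` before returning the model. With only 16 training samples per class, overfitting is a plausible risk, which is why the sketch keeps the best validation checkpoint rather than the last epoch.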

Implementation Tips

Monitor parameter count: sum(p.numel() for p in model.parameters() if p.requires_grad)
Gradually improve architecture while staying under budget
Use techniques like batch normalization, dropout, or residual connections
Higher capacity (more parameters) generally improves accuracy up to the limit
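As an illustration of the tips above, here is a minimal sketch of a residual MLP with batch normalization and dropout that stays under the 500,000-parameter limit. The architecture, the width of 384, the depth of 2, and the names `ResidualBlock`, `build_residual_mlp`, and `count_trainable` are assumptions chosen for the example; nothing here is claimed to beat the baseline.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Pre-norm residual block: x + Dropout(Linear(ReLU(BatchNorm(x))))."""

    def __init__(self, dim: int, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.BatchNorm1d(dim)
        self.fc = nn.Linear(dim, dim)
        self.drop = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.drop(self.fc(torch.relu(self.norm(x))))


def build_residual_mlp(input_dim=384, num_classes=128, width=384,
                       depth=2, dropout=0.1):
    """Linear stem, `depth` residual blocks, then a normalized linear head."""
    layers = [nn.Linear(input_dim, width)]
    layers += [ResidualBlock(width, dropout) for _ in range(depth)]
    layers += [nn.BatchNorm1d(width), nn.ReLU(), nn.Linear(width, num_classes)]
    return nn.Sequential(*layers)


def count_trainable(model: nn.Module) -> int:
    """Same counting rule the evaluator applies in its validation step."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


model = build_residual_mlp()
print(count_trainable(model))  # 495104
assert count_trainable(model) <= 500_000
```

Batch normalization adds two trainable vectors per layer (scale and shift), so those count toward the budget; the running mean and variance are buffers and do not.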

Baseline Performance

Baseline Accuracy: 72%
Baseline Parameters: Approximately 500,000
This represents a simple model at this parameter budget
