sql-parser-coverage-frontier-cs-grammar-fuzzing-seed-sql

SQL Parser Test Case Generation

Generate SQL seed statements that maximize coverage of the provided parser.

Validation: enabled
Official: enabled
Targets: 1
Target Names: linux-arm64-cpu
Protocol: zip_project
Resource Profiles: agentics-cpu-medium

Submit a ZIP project containing a solution.py that defines:

class Solution:
    def solve(self, resources_path: str) -> list[str]:
        ...
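A minimal solution.py might look like the sketch below. It returns a fixed, deduplicated list. The statements are illustrative placeholders drawn from common SQL forms; whether each one parses depends on the bundled sql_grammar.txt, so treat them as assumptions to check rather than known-good seeds.

```python
# solution.py: a minimal static sketch. The SQL strings are illustrative
# placeholders; they are not verified against this parser's grammar.
class Solution:
    def solve(self, resources_path: str) -> list[str]:
        # resources_path is unused here; a fuller version might read the
        # grammar or import sql_engine from it to check candidates locally.
        statements = [
            "SELECT 1",
            "SELECT a, b FROM t WHERE a > 1 ORDER BY b DESC",
            "INSERT INTO t (a, b) VALUES (1, 'x')",
            "UPDATE t SET a = a + 1 WHERE b IS NULL",
            "DELETE FROM t WHERE a IN (1, 2, 3)",
        ]
        # Drop accidental duplicates while keeping order: every extra
        # string lowers the efficiency bonus without adding coverage.
        return list(dict.fromkeys(statements))
```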

The resources_path directory contains the participant-visible source resources:

resources/
  sql_grammar.txt
  sql_engine/
    __init__.py
    parser.py
    tokenizer.py
    ast_nodes.py
    ast_to_sql.py
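For local experimentation, a helper along these lines can locate the resources and import the bundled package. It is a sketch under two assumptions: that resources_path points either at the directory shown above or at a parent containing resources/, and that sql_engine becomes importable as a top-level package once that directory is on sys.path. The helper names (resolve_resources_root, read_grammar, import_sql_engine) are hypothetical.

```python
import importlib
import sys
from pathlib import Path
from types import ModuleType


def resolve_resources_root(resources_path: str) -> Path:
    """Return the directory assumed to hold sql_grammar.txt and sql_engine/.

    Assumption: resources_path is either that directory itself or a parent
    that contains a resources/ subdirectory.
    """
    base = Path(resources_path)
    nested = base / "resources"
    return nested if (nested / "sql_engine").is_dir() else base


def read_grammar(resources_path: str) -> str:
    # The grammar's format is not described here, so return it as raw text.
    root = resolve_resources_root(resources_path)
    return (root / "sql_grammar.txt").read_text(encoding="utf-8")


def import_sql_engine(resources_path: str) -> ModuleType:
    # Put the resolved root on sys.path so the package imports as sql_engine.
    root = str(resolve_resources_root(resources_path).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return importlib.import_module("sql_engine")
```

Whether importing sql_engine inside solve() interacts with the evaluator's own coverage measurement is not specified, so this is best kept to offline experiments.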

Return a list of SQL statement strings. The trusted evaluator parses each string with sql_engine.parse_sql while measuring Python line and branch coverage of parser.py, tokenizer.py, and ast_nodes.py. Statements that raise parser exceptions do not improve coverage.
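Because only statements that parse contribute coverage, one simple step is to drop candidates that raise before returning them. The sketch assumes parse_sql takes a single SQL string and signals failure by raising; keep_parseable is a hypothetical helper, and it catches Exception broadly because the exact exception types are not listed here.

```python
from collections.abc import Callable, Iterable


def keep_parseable(candidates: Iterable[str], parse: Callable[[str], object]) -> list[str]:
    """Return candidates that parse without raising, in order, without duplicates."""
    kept: list[str] = []
    seen: set[str] = set()
    for sql in candidates:
        if sql in seen:
            continue  # duplicates only cost efficiency bonus
        seen.add(sql)
        try:
            parse(sql)
        except Exception:
            # A statement that raises adds no coverage under the stated rules.
            continue
        kept.append(sql)
    return kept


# int stands in for a parser that rejects non-numeric strings.
print(keep_parseable(["1", "x", "1", "2"], int))  # ['1', '2']
```

With the real package, parse would be sql_engine.parse_sql.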

The source score is:

weighted_cov = 0.6 * line_coverage + 0.4 * branch_coverage
coverage_score = 0.7 * (weighted_cov / 100)^3 * 100
efficiency_bonus = 30 * 2^(-N / 50)
score = coverage_score + efficiency_bonus

where line_coverage and branch_coverage are percentages from 0 to 100 and N is the number of statements returned.
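The formulas can be checked numerically with a direct transcription. This is only the arithmetic stated above, with coverage figures passed as percentages; the function names are illustrative.

```python
def weighted_coverage(line_cov: float, branch_cov: float) -> float:
    # Both inputs are percentages in the range 0 to 100.
    return 0.6 * line_cov + 0.4 * branch_cov


def coverage_score(line_cov: float, branch_cov: float) -> float:
    # Cubic in weighted coverage; at most 70 points.
    return 0.7 * (weighted_coverage(line_cov, branch_cov) / 100) ** 3 * 100


def efficiency_bonus(n: int) -> float:
    # 30 points at N = 0, halving every 50 statements.
    return 30 * 2 ** (-n / 50)


def score(line_cov: float, branch_cov: float, n: int) -> float:
    return coverage_score(line_cov, branch_cov) + efficiency_bonus(n)


# 80% line and 70% branch coverage with 50 statements:
# weighted 76 -> coverage score 30.72832, bonus 15.
print(round(score(80, 70, 50), 3))  # 45.728
```

The first statement costs 30 * (1 - 2^(-1/50)), about 0.41 points of bonus, and this marginal cost halves every 50 statements. On the coverage side the cube rewards gains near the top: moving weighted coverage from 76 to 77 adds about 1.23 points. A statement is worth keeping only when the coverage it adds outweighs its share of the bonus.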

This challenge runs as a coexecuted_benchmark with acknowledge_danger: true, because the trusted evaluator imports and executes participant Python code from /workspace. External network access is disabled during evaluation. Public validation uses the same parser resources with a small smoke-test configuration; official scoring uses the source benchmark settings, packaged outside Git as a private configuration.

Configuration

Manifest: agentics.solution.json
Execution Mode: Coexecuted evaluator
Coexecuted-evaluator: python coexecuted-evaluator/run.py
Eligibility: Open
Rank Metric: Score

This mode runs the trusted coexecuted evaluator and the participant workspace in the same container, and the official private data shares that trust boundary.

Metrics

Score (score) · higher is better · Public
Runs Successfully (runs_successfully) · higher is better · Public
Line Coverage (line_coverage) · higher is better · Public
Branch Coverage (branch_coverage) · higher is better · Public
Coverage Score (coverage_score) · higher is better · Public
Efficiency Bonus (efficiency_bonus) · higher is better · Public
Successful Parses (successful_parses) · higher is better · unit: statements · Public
Total Statements (total_statements) · lower is better · unit: statements · Public
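Since total_statements is lower-is-better and the efficiency bonus decays with N, one common way to shrink a candidate set is greedy set cover: given the lines (or branch arcs) each statement covers, repeatedly keep the statement that adds the most not-yet-covered items. This is a swapped-in technique, not something the evaluator prescribes. Collecting the per-statement sets (for example by measuring each statement separately with coverage.py) is assumed to happen elsewhere, and greedy_minimize is a hypothetical helper.

```python
from collections.abc import Hashable


def greedy_minimize(covered: dict[str, set[Hashable]]) -> list[str]:
    """Greedy set cover over per-statement coverage sets."""
    remaining = {sql: set(items) for sql, items in covered.items()}
    chosen: list[str] = []
    seen: set[Hashable] = set()
    while remaining:
        # Pick the statement with the largest gain; ties go to the earliest.
        best = max(remaining, key=lambda sql: len(remaining[sql] - seen))
        gain = remaining[best] - seen
        if not gain:
            break  # nothing left adds new coverage
        chosen.append(best)
        seen |= gain
        del remaining[best]
    return chosen


print(greedy_minimize({"A": {1, 2}, "B": {2, 3, 4}, "C": {4}, "D": {1}}))  # ['B', 'A']
```

Greedy set cover is an approximation, not an exact minimum. Because the score is nonlinear in coverage, a further refinement is to stop once a statement's marginal coverage gain is worth less than its cost in efficiency bonus.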
