SQL Parser Fuzzer Coverage
Generate SQL parser fuzz cases to maximize trusted coverage.
SQL Parser Fuzzer Coverage
Ported from Frontier-CS research/problems/grammar_fuzzing/fuzzer/sql.
Agentics Interface
Submit a ZIP project containing the source interface described below. The trusted evaluator imports or compiles participant code from /workspace, so this challenge uses coexecuted_benchmark with acknowledge_danger: true.
Public And Official Data
Public validation uses a small deterministic configuration committed under v1/public. Official scoring uses the private official-runs overlay under private-benchmark/.
Original Statement
SQL Parser Fuzzer Optimization Problem
Problem Setting
Design and implement a high-performance fuzzer that maximizes code coverage of a SQL parser within a fixed time budget. This problem focuses on implementing efficient test generation strategies that systematically explore parser behavior.
The challenge involves optimizing:
- Input generation: Efficiently generating or mutating SQL statements to trigger diverse parser paths
- Coverage exploration: Discovering inputs that exercise different parser branches and edge cases
- Time efficiency: Maximizing coverage achieved within the 60-second time budget
- Feedback utilization: Optionally using coverage feedback to guide generation strategies
Target
- Primary: Maximize line coverage percentage of the SQL engine (higher is better)
- Secondary: Maximize branch coverage percentage (used as tiebreaker)
- Tertiary: Achieve high coverage with fewer parser invocations (efficiency bonus)
API Specification
Implement a Solution class that returns fuzzer code:
class Solution:
def solve(self, resources_path: str) -> dict:
"""
Returns a dict with either:
- {"code": "python_code_string"}
- {"program_path": "path/to/fuzzer.py"}
"""
# Your implementation
pass
Your fuzzer implementation must provide:
def fuzz(parse_sql):
"""
Generate SQL statements and execute them through the parser.
This function will be called repeatedly by the evaluator until the time
budget (60 seconds) is exhausted. Each call should generate a batch of
SQL statements and pass them to parse_sql for execution.
Args:
parse_sql: A function that accepts a list[str] of SQL statements.
Call parse_sql(["SELECT * FROM t", "INSERT INTO t VALUES (1)"])
to execute statements through the parser. The parser will
attempt to parse each statement, contributing to coverage.
Returns:
bool: Return True to continue fuzzing, False to stop early.
The evaluator will keep calling fuzz() until either:
- The time budget (60 seconds) is exhausted
- fuzz() returns False
Example:
def fuzz(parse_sql):
# Generate some SQL statements
statements = [
"SELECT * FROM users",
"INSERT INTO orders (id, name) VALUES (1, 'test')",
"UPDATE products SET price = 100 WHERE id = 1",
]
# Execute through parser (this contributes to coverage)
parse_sql(statements)
return True # Continue fuzzing
"""
pass
Resources
The resources_path directory contains:
resources/
├── sql_grammar.txt # BNF-style grammar describing valid SQL syntax
└── sql_engine/ # Target SQL parser package
├── __init__.py
├── parser.py # Recursive descent parser
├── tokenizer.py # SQL tokenizer
└── ast_nodes.py # AST node definitions
You may explore these resources to understand the parser's structure and develop your fuzzing strategy. Various approaches can be effective:
- Grammar-based generation
- Coverage-guided mutation
- Random testing with heuristics
- Hybrid approaches
Fuzzer Interface Details
- parse_sql function: Accepts
list[str]of SQL statements- Each statement is parsed independently
- Exceptions during parsing are caught and do not halt fuzzing
- All executed statements contribute to the cumulative coverage measurement
- fuzz() calls: The evaluator calls
fuzz(parse_sql)repeatedly in a loop- Multiple calls allow for incremental fuzzing strategies
- Coverage accumulates across all calls
- Return
Trueto continue,Falseto stop early
- Stateful fuzzing: Your fuzzer can maintain state between calls (e.g., corpus, coverage map)
Scoring (0-100)
Performance is measured based on code coverage achieved:
# Coverage metrics (0-100 each)
line_coverage = lines_covered / total_lines * 100
branch_coverage = branches_covered / total_branches * 100
# Weighted coverage (0-100)
weighted_cov = 0.6 * line_coverage + 0.4 * branch_coverage
# Non-linear coverage score (0-70 points)
adjusted_cov = (weighted_cov / 100)^3 * 100
coverage_score = 0.7 * adjusted_cov
# Efficiency bonus (0-30 points): fewer parser calls = higher bonus
# N = number of parse_sql calls, N_ref = 500 (reference count)
efficiency_bonus = 30 * 2^(-N / N_ref)
score = coverage_score + efficiency_bonus
- Coverage determines 70% of the score (non-linear: high coverage is rewarded more)
- Efficiency bonus (30%) rewards achieving coverage with fewer parser invocations
- Achieving high coverage efficiently yields higher scores
Evaluation Details
- Time Budget: 60 seconds total for fuzzing execution
- Coverage Tool: Python
coveragemodule with branch coverage enabled - Target Files: parser.py, tokenizer.py, ast_nodes.py from sql_engine
- Timing: Starts when the first
fuzz()call begins
Additional Notes
- The evaluator handles all coverage measurement; your fuzzer only needs to generate inputs
- Parse errors during fuzzing are expected and do not penalize the score
- The
parse_sqlfunction catches exceptions internally; your fuzzer won't crash from bad SQL - Consider generating both valid and edge-case SQL to maximize coverage
- State can be maintained across
fuzz()calls for incremental exploration
Configuration
This mode runs the trusted coexecuted-evaluator and participant workspace in the same container. Official private data shares that trust boundary.
Metrics
Latest Submissions
View all →Nothing here yet
Top Rankings
View all →Nothing here yet