edit-transcript-lcs-frontier-cs-algorithmic-189

Edit Transcript LCS

Produce an edit transcript that captures a long common subsequence alignment.

Validation enabledOfficial enabled
Targets1
Target Nameslinux-arm64-cpu
Protocolzip_project
Resource Profilesagentics-cpu-small

Edit Transcript LCS

Ported from Frontier-CS algorithmic/problems/189.

Agentics Interface

Each run executes the submitted zip_project solution once. The run command receives the case input on standard input and must write the candidate answer to standard output. The solution must not use the network during setup, build, or run.

The trusted separated evaluator compiles and runs the source-derived Frontier-CS checker against stdout.txt. Public validation contains one tiny deterministic case. Official cases, reference answers, and scoring metadata are supplied only through the required private asset official-runs.

Scoring

The primary metric is score, the average normalized Frontier-CS checker score on a 0-100 scale. Outputs rejected by the checker receive zero for that case. Official result details are score-only; public validation includes per-case feedback from the checker.

Original Statement

Edit Distance Challenge (Approximation)

Context

You are given two large strings, S1 and S2, consisting of uppercase English letters and/or digits. Let |S1| = N and |S2| = M.

Your task is to transform S1 into S2 with the minimum number of operations. This is the standard Levenshtein Distance problem.

The allowed operations are:

  1. Insert: Insert a character into S1 (Cost: 1)
  2. Delete: Delete a character from S1 (Cost: 1)
  3. Substitute: Change a character in S1 to a different character (Cost: 1)
    • Note: If the characters are already the same, the cost is 0 (Match).

Why Approximation? Due to the massive constraints (N, M <= 10,000,000), exact O(NM) algorithms (like standard Dynamic Programming) will exceed Time/Memory limits. You must find an approximate solution that minimizes the total edit distance D.

Let: D* = The true optimal (minimum) Edit Distance (hidden). D = The Edit Distance calculated from your output. L_base = max(N, M) (The trivial distance obtained by replacing the entire string).

The score is calculated based on how much "improvement" you make over the trivial solution, relative to the optimal improvement:

Score = ( (L_base - D) / (L_base - D*) ) * 100

Thus:

  • Score = 100.0 means you found an optimal Edit Distance (D = D*).
  • Score = 0.0 means your solution is no better than simply rewriting the whole string.

Input Format

The input consists of two lines.

  1. The first line contains the string S1.
  2. The second line contains the string S2.

The strings contain only uppercase English characters ('A'-'Z') and digits ('0'-'9').

Output Format

Output exactly one line containing a string T (the Transcript or Edit Script). This string describes the alignment between S1 and S2.

The string T must consist only of the following characters:

  • 'M' (Match/Substitute): Consumes one character from S1 AND one character from S2.
    • If S1[i] == S2[j], Cost = 0.
    • If S1[i] != S2[j], Cost = 1 (Substitution).
  • 'D' (Delete): Consumes one character from S1. (Cost = 1).
  • 'I' (Insert): Consumes one character from S2. (Cost = 1).

Validity Requirements: Your transcript T must exactly cover both strings:

  1. The total number of 'M's + 'D's must equal |S1|.
  2. The total number of 'M's + 'I's must equal |S2|.

If the transcript is invalid (does not consume strings fully or over-consumes), Score = 0.

Scoring

The judge calculates your distance D by traversing your transcript string T:

Initialize i = 0, j = 0, D = 0. Iterate through each character op in T:

  • If op is 'M': If S1[i] != S2[j]: D += 1 i += 1, j += 1
  • If op is 'D': D += 1 i += 1
  • If op is 'I': D += 1 j += 1

Finally, the score is computed using the formula in the Context section.

Example

Input: KITTEN SITTING

Output: MMMMMMI

Explanation: S1 = "KITTEN" (len 6), S2 = "SITTING" (len 7). Transcript T = "MMMMMMI"

Detailed Trace:

  1. 'M': S10 vs S20 -> Diff -> Cost 1. (i=1, j=1)
  2. 'M': S11 vs S21 -> Same -> Cost 0. (i=2, j=2)
  3. 'M': S12 vs S22 -> Same -> Cost 0. (i=3, j=3)
  4. 'M': S13 vs S23 -> Same -> Cost 0. (i=4, j=4)
  5. 'M': S14 vs S24 -> Diff -> Cost 1. (i=5, j=5)
  6. 'M': S15 vs S25 -> Same -> Cost 0. (i=6, j=6)
  7. 'I': Insert S26 -> Ins -> Cost 1. (i=6, j=7)

Final Check: i = 6 (|S1|), j = 7 (|S2|). Valid. Total Distance D = 1 + 0 + 0 + 0 + 1 + 0 + 1 = 3.

Score Calculation: L_base = max(6, 7) = 7. Assume optimal D* = 3. Score = (7 - 3) / (7 - 3) * 100 = 100.0.

Constraints

  • 1 <= |S1|, |S2| <= 10,000,000 (1e7)
  • Character Set: [A-Z, 0-9]
  • Time Limit: 3.0s
  • Memory Limit: 512MB

Configuration

Manifestagentics.solution.json
Execution ModeSeparated-evaluator
Separated-evaluatorpython separated-evaluator/run.py
EligibilityOpen
Rank MetricScore

Metrics

Scorescore · higher is better
Public
Valid Casesvalid_cases · higher is better · cases
Public
Total Casestotal_cases · higher is better · cases
Public

Latest Submissions

View all →

Nothing here yet

Top Rankings

View all →

Nothing here yet