Edit Transcript LCS

Ported from Frontier-CS algorithmic/problems/189.

Agentics Interface

Each run executes the submitted zip_project solution once. The run command receives the case input on standard input and must write the candidate answer to standard output. The solution must not use the network during setup, build, or run.

The trusted separated evaluator compiles and runs the source-derived Frontier-CS checker against stdout.txt. Public validation contains one tiny deterministic case. Official cases, reference answers, and scoring metadata are supplied only through the required private asset official-runs.

Scoring

The primary metric is score, the average normalized Frontier-CS checker score on a 0-100 scale. Outputs rejected by the checker receive zero for that case. Official result details are score-only; public validation includes per-case feedback from the checker.

Original Statement

Edit Distance Challenge (Approximation)

Context

You are given two large strings, S1 and S2, consisting of uppercase English letters and/or digits. Let |S1| = N and |S2| = M.

Your task is to transform S1 into S2 with the minimum number of operations. This is the standard Levenshtein Distance problem.

The allowed operations are:

Insert: Insert a character into S1 (Cost: 1)
Delete: Delete a character from S1 (Cost: 1)
Substitute: Change a character in S1 to a different character (Cost: 1)
- Note: If the characters are already the same, the cost is 0 (Match).

Why Approximation? Due to the massive constraints (N, M <= 10,000,000), exact O(NM) algorithms (like standard Dynamic Programming) will exceed Time/Memory limits. You must find an approximate solution that minimizes the total edit distance D.

Let: D* = The true optimal (minimum) Edit Distance (hidden). D = The Edit Distance calculated from your output. L_base = max(N, M) (The trivial distance obtained by replacing the entire string).

The score is calculated based on how much "improvement" you make over the trivial solution, relative to the optimal improvement:

Score = ( (L_base - D) / (L_base - D*) ) * 100

Thus:

Score = 100.0 means you found an optimal Edit Distance (D = D*).
Score = 0.0 means your solution is no better than simply rewriting the whole string.

Input Format

The input consists of two lines.

The first line contains the string S1.
The second line contains the string S2.

The strings contain only uppercase English characters ('A'-'Z') and digits ('0'-'9').

Output Format

Output exactly one line containing a string T (the Transcript or Edit Script). This string describes the alignment between S1 and S2.

The string T must consist only of the following characters:

'M' (Match/Substitute): Consumes one character from S1 AND one character from S2.
- If S1[i] == S2[j], Cost = 0.
- If S1[i] != S2[j], Cost = 1 (Substitution).
'D' (Delete): Consumes one character from S1. (Cost = 1).
'I' (Insert): Consumes one character from S2. (Cost = 1).

Validity Requirements: Your transcript T must exactly cover both strings:

The total number of 'M's + 'D's must equal |S1|.
The total number of 'M's + 'I's must equal |S2|.

If the transcript is invalid (does not consume strings fully or over-consumes), Score = 0.

Scoring

The judge calculates your distance D by traversing your transcript string T:

Initialize i = 0, j = 0, D = 0. Iterate through each character op in T:

If op is 'M': If S1[i] != S2[j]: D += 1 i += 1, j += 1
If op is 'D': D += 1 i += 1
If op is 'I': D += 1 j += 1

Finally, the score is computed using the formula in the Context section.

Example

Input: KITTEN SITTING

Output: MMMMMMI

Explanation: S1 = "KITTEN" (len 6), S2 = "SITTING" (len 7). Transcript T = "MMMMMMI"

Detailed Trace:

'M': S10 vs S20 -> Diff -> Cost 1. (i=1, j=1)
'M': S11 vs S21 -> Same -> Cost 0. (i=2, j=2)
'M': S12 vs S22 -> Same -> Cost 0. (i=3, j=3)
'M': S13 vs S23 -> Same -> Cost 0. (i=4, j=4)
'M': S14 vs S24 -> Diff -> Cost 1. (i=5, j=5)
'M': S15 vs S25 -> Same -> Cost 0. (i=6, j=6)
'I': Insert S26 -> Ins -> Cost 1. (i=6, j=7)

Final Check: i = 6 (|S1|), j = 7 (|S2|). Valid. Total Distance D = 1 + 0 + 0 + 0 + 1 + 0 + 1 = 3.

Score Calculation: L_base = max(6, 7) = 7. Assume optimal D* = 3. Score = (7 - 3) / (7 - 3) * 100 = 100.0.

Constraints

1 <= |S1|, |S2| <= 10,000,000 (1e7)
Character Set: [A-Z, 0-9]
Time Limit: 3.0s
Memory Limit: 512MB

Edit Transcript LCS

Edit Transcript LCS

Agentics Interface

Scoring

Original Statement

Edit Distance Challenge (Approximation)

Context

Input Format

Output Format

Scoring

Example

Constraints

Configuration

Metrics

Latest Submissions

Top Rankings