Edit distance 0-index solution failing

May 25, 2024

I implemented a 0-index based solution for edit distance problem on cses, but it is giving wrong answer verdict on one of the test cases.

#include <bits/stdc++.h>
 
using namespace std;
# define MOD 1000000007
 
string s; string t;
 
int cost(int i,int j){
    return !(s[i] == t[j]);
}
 
void solve() {
    // Write your solution here
    cin>>s; cin>>t;
    vector<vector<int>> dp(s.size(),vector<int>(t.size(),0));
    // dp[i][j] -> the minimum edit distance required to equal A[0-i] and B[0-j]
    dp[0][0] = cost(0,0);
    for (int i=1;i<s.size();i++){
        dp[i][0] = dp[i-1][0] + 1;
    }
    for (int j=1;j<t.size();j++){
        dp[0][j] = dp[0][j-1] + 1; 
    }
    for (int i=1;i<s.size();i++){
        for (int j=1;j<t.size();j++){
            dp[i][j] = min({dp[i-1][j-1] + cost(i,j),dp[i-1][j] + 1, dp[i][j-1] + 1});
            //cout<<dp[i][j]<<" ";
        }
    }
 
    cout<<dp[s.size()-1][t.size() -1];
}   
 
int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
 
    int t=1;
    //cin >> t;
 
    while (t--) {
        solve();
    }
 
    return 0;
}

As the minimum length of the strings is given to be 1. I did not care for including the cases when one string is empty.
I have defined the base case as follows.
dp[0][0] -> edit distance of the strings formed by taking first character from string 1 and first character from string 2. This is clearly 0 when both characters are equal and 1 when they are different.
Now, for the first column

for (i in size(string1))
dp[i][0] -> dp[i-1][0] + 1

as I keep taking more characters from one string and just one character from the other, I would simply need to add one to the previous dp state.
Similarly for the first row.

The test case where this is failing is too large to check manually.

Can someone please explain where this might be going wrong?

>Solution :

The tactical mistake: base of the dynamic programming.
By saying dp[0][0] = cost(0,0), you essentially say "the first operation is changing the front letter of s into the front letter of t".
There are other possible operations as well: removing the front letter of s and adding the front letter of t.

The strategic mistake: problem space.
A good way to think about the problem is to solve subproblems.
A subproblem (i, j) has the form "we ordered all operations from left to right, and we processed i letters of s and j letters of t".
Note that subproblems are valid for 0 <= i <= s.size() and 0 <= j <= t.size().
So, the problem space is actually (s.size()+1) times (t.size()+1).
Whereas your code has both sizes one less.

Edit: some small test cases to demonstrate the error are s="abc", t="bcd", or vice versa. The actual answer is 2 (remove front of s, add back of t), but the current code says 3.