1408: String-Matching-in-an-Array
Easy 2025-01-07

1. Knuth-Morris-Pratt Algorithm:
code
- complexity
- 2. Z Algorithm:
code
- complexity
- Author’s Note:

Apparently, string searching is all the rage nowadays.

There are 2 algorithms (can be used interchangebly) that are best for finding occurences of a pattern string within a text string.

(brute-force is still an option if all-else fails)

1. Knuth-Morris-Pratt Algorithm:

This algorithm aims to generate an array of integers such that for a given index i, its integer value is equal to the length of the longest substring that ends at i, which also so happens to be the prefix substring of the entire string (Longest Prefix Suffix Array or LPS array). This is useful as, normally, when we compare a sub string to a main string, one mismatch would shift the index position by 1 and restart the comparison from the first character of sub. However, this does not take into account that some characters of sub may already match; the KMP algorithm aims to skip these unnecessary comparisons.

For example, with sub = ababaca:

Consequently, if a mismatch was to occur at, for example, index position 5, then we know that the previous 5 letters (index position 0 to 4) matched. When we look at the LPS array generated by the KMP algorithm, we can look at the value stored at index position 5-1 = 4 to see that the length of the longest prefix substring that was also the longest suffix substring was 3. This implies that the first 3 letters (at index position 0 to 2) matched the last 3 letters (at index position 2 to 4) meaning we can start the comparison of the sub string from index 3.

code

class Solution {
private:
    vector<int> LPS (string &s) {
        vector<int> LPS (s.size(), 0);
        int p = 0;
        for (int i = 1; i < s.size(); ++i) {
            while (p != 0 && s[p] != s[i]) {
                p = LPS[p-1];
            }
            if (s[p] == s[i]) {
                ++p;
            } 
            LPS[i] = p;
        }
        return LPS;
    }
    bool KMP (vector<int> &LPS, string &s1, string &s2) {
        int p = 0;
        for (int i = 0; i < s2.size(); ++i) {
            while (p != 0 && s1[p] != s2[i]) {
                p = LPS[p-1];
            }
            if (s1[p] == s2[i]) {
                ++p;
            }
            if (p == s1.size()) {
                return true;
            }
        }
        return false;
    }
public:
    vector<string> stringMatching(vector<string>& words) {
        sort(words.begin(), words.end(), [&](auto s1, auto s2){return s1.size() < s2.size();});
        vector<string> ans;
        for (int i = 0; i < words.size(); ++i) {
            vector<int> LPS_vt = LPS(words[i]);
            for (int j = i+1; j < words.size(); ++j) {
                if (KMP(LPS_vt, words[i], words[j])) {
                    ans.push_back(words[i]);
                    break;
                }
            }
        }
        return ans;
    }
};

complexity

2. Z Algorithm:

This algorithm aims to generate an array of integers, Z, such that for a given index i, its integer value is equal to the length of the longest prefix substring of the entire string, which also so happens to be equal to the to the substring starting at index position i.

For example, with string = aabxaayaab:

at index 0, Z_0 = 10 (aabxaayaab is both prefix and substring starting at index 0)
at index 1, Z_1 = 1 (a is both prefix and substring starting at index 1)
at index 2, Z_2 = 0
at index 3, Z_3 = 0
at index 4, Z_4 = 2 (aa is both prefix and substring starting at index 4)
at index 5, Z_5 = 1 (a is both prefix and substring starting at index 5)
at index 6, Z_6 = 0
at index 7, Z_7 = 3 (aab is both prefix and substring starting at index 7)
at index 8, Z_8 = 1 (a is both prefix and substring starting at index 8)
at index 9, Z_9 = 0

Taking a closer look at Z_7 = 3 is because the prefix substring “aabxaayaab” is also present at index 7 “aabxaayaab”.

An in-depth explanation of this can be found here.

code

class Solution {
public:
    bool Z(string &s1, string &s2) {
        vector<int> z(s1.size()+s2.size()+1);
        string s = s1 + " " + s2;
        int L = 0, R = 0;
        for (int i = 1; i < s.size(); ++i) {
            if (i > R) {
                L = R = i;
                while (R < s.size() && s[R-L] == s[R]) {
                    ++R;
                }
                z[i] = R-L;
            } else {
                int k = i-L;
                if (z[k] + i < R) {
                    z[i] = z[k];
                } else {
                    L = i;
                    while (R < s.size() && s[R-L] == s[R]) {
                        ++R;
                    }
                    z[i] = R-L;
                }
            }
            if (z[i] == s1.size()) return true;
        }
        return false;
    }
    vector<string> stringMatching(vector<string>& words) {
        sort(words.begin(), words.end(), [&](auto s1, auto s2){return s1.size() < s2.size();});
        vector<string> ans;
        for (int i = 0; i < words.size(); ++i) {
            for (int j = i+1; j < words.size(); ++j) {
                if (Z(words[i], words[j])) {
                    ans.push_back(words[i]);
                    break;
                }
            }
        }
        return ans;
    }
};

complexity

Author’s Note:

another link that is very useful for string searching

1408: String-Matching-in-an-Array Easy 2025-01-07

table of contents

1. Knuth-Morris-Pratt Algorithm:

code

complexity

2. Z Algorithm:

code

complexity

Author’s Note:

1408: String-Matching-in-an-Array
Easy 2025-01-07