1408: String-Matching-in-an-Array
Easy
Apparently, string searching is all the rage nowadays.
There are 2 algorithms (can be used interchangebly) that are best for finding occurences of a pattern string within a text string.
(brute-force is still an option if all-else fails)
1. Knuth-Morris-Pratt Algorithm:
This algorithm aims to generate an array of integers
such that for a given index i
, its integer value is
equal to the length of the longest
substring that ends at i
, which also so
happens to be the prefix substring of the
entire string (Longest Prefix Suffix Array or
LPS
array). This is useful as, normally, when we
compare a sub
string to a main
string,
one mismatch would shift the index position by 1 and restart the
comparison from the first character of sub
. However,
this does not take into account that some characters of
sub
may already match; the KMP
algorithm aims to skip these unnecessary
comparisons.
For example, with sub = ababaca
:
Consequently, if a mismatch was to occur at, for example, index
position 5
, then we know that the previous 5 letters
(index position 0
to 4
) matched. When we
look at the LPS
array generated by the KMP
algorithm, we can look at the value stored at index position
5-1 = 4
to see that the length of
the longest prefix substring that was also the
longest suffix substring was 3. This implies that the
first 3 letters (at index position 0
to
2
) matched the last 3 letters (at index position
2
to 4
) meaning we can start the
comparison of the sub
string from index
3
.
Code:
class Solution {
private:
<int> LPS (string &s) {
vector<int> LPS (s.size(), 0);
vectorint p = 0;
for (int i = 1; i < s.size(); ++i) {
while (p != 0 && s[p] != s[i]) {
= LPS[p-1];
p }
if (s[p] == s[i]) {
++p;
}
[i] = p;
LPS}
return LPS;
}
bool KMP (vector<int> &LPS, string &s1, string &s2) {
int p = 0;
for (int i = 0; i < s2.size(); ++i) {
while (p != 0 && s1[p] != s2[i]) {
= LPS[p-1];
p }
if (s1[p] == s2[i]) {
++p;
}
if (p == s1.size()) {
return true;
}
}
return false;
}
public:
<string> stringMatching(vector<string>& words) {
vector(words.begin(), words.end(), [&](auto s1, auto s2){return s1.size() < s2.size();});
sort<string> ans;
vectorfor (int i = 0; i < words.size(); ++i) {
<int> LPS_vt = LPS(words[i]);
vectorfor (int j = i+1; j < words.size(); ++j) {
if (KMP(LPS_vt, words[i], words[j])) {
.push_back(words[i]);
ansbreak;
}
}
}
return ans;
}
};
Complexity:
2. Z Algorithm:
This algorithm aims to generate an array of integers,
Z
, such that for a given index i
, its
integer value is equal to the length of the
longest prefix substring of the entire
string, which also so happens to be equal to the to the
substring starting at index position
i
.
For example, with string = aabxaayaab
:
Taking a closer look at Z_7 = 3
is because the
prefix substring “aabxaayaab” is also present at
index 7
“aabxaayaab”.
An in-depth explanation of this can be found here.
Code:
class Solution {
public:
bool Z(string &s1, string &s2) {
<int> z(s1.size()+s2.size()+1);
vector= s1 + " " + s2;
string s int L = 0, R = 0;
for (int i = 1; i < s.size(); ++i) {
if (i > R) {
= R = i;
L while (R < s.size() && s[R-L] == s[R]) {
++R;
}
[i] = R-L;
z} else {
int k = i-L;
if (z[k] + i < R) {
[i] = z[k];
z} else {
= i;
L while (R < s.size() && s[R-L] == s[R]) {
++R;
}
[i] = R-L;
z}
}
if (z[i] == s1.size()) return true;
}
return false;
}
<string> stringMatching(vector<string>& words) {
vector(words.begin(), words.end(), [&](auto s1, auto s2){return s1.size() < s2.size();});
sort<string> ans;
vectorfor (int i = 0; i < words.size(); ++i) {
for (int j = i+1; j < words.size(); ++j) {
if (Z(words[i], words[j])) {
.push_back(words[i]);
ansbreak;
}
}
}
return ans;
}
};
Complexity:
Author’s Note:
- another link that is very useful for string searching