1408: String-Matching-in-an-Array
Easy
table of contents
Apparently, string searching is all the rage nowadays.
There are 2 algorithms (can be used interchangebly) that are best for finding occurences of a pattern string within a text string.
(brute-force is still an option if all-else fails)
1. Knuth-Morris-Pratt Algorithm:
This algorithm aims to generate an array of integers such
that for a given index i
, its integer value is equal to the
length of the longest substring that
ends at i
, which also so happens to be the prefix
substring of the entire string (Longest Prefix Suffix
Array or LPS
array). This is useful as, normally, when we
compare a sub
string to a main
string, one
mismatch would shift the index position by 1 and restart the comparison
from the first character of sub
. However, this does not
take into account that some characters of sub
may
already match; the KMP algorithm aims to
skip these unnecessary
comparisons.
For example, with sub = ababaca
:
Consequently, if a mismatch was to occur at, for example, index
position 5
, then we know that the previous 5 letters (index
position 0
to 4
) matched. When we look at the
LPS
array generated by the KMP algorithm, we can
look at the value stored at index position 5-1 = 4
to see
that the length of the longest prefix
substring that was also the longest suffix substring was
3. This implies that the first 3 letters (at index position
0
to 2
) matched the last 3 letters (at index
position 2
to 4
) meaning we can start the
comparison of the sub
string from index 3
.
code
class Solution {
private:
<int> LPS (string &s) {
vector<int> LPS (s.size(), 0);
vectorint p = 0;
for (int i = 1; i < s.size(); ++i) {
while (p != 0 && s[p] != s[i]) {
= LPS[p-1];
p }
if (s[p] == s[i]) {
++p;
}
[i] = p;
LPS}
return LPS;
}
bool KMP (vector<int> &LPS, string &s1, string &s2) {
int p = 0;
for (int i = 0; i < s2.size(); ++i) {
while (p != 0 && s1[p] != s2[i]) {
= LPS[p-1];
p }
if (s1[p] == s2[i]) {
++p;
}
if (p == s1.size()) {
return true;
}
}
return false;
}
public:
<string> stringMatching(vector<string>& words) {
vector(words.begin(), words.end(), [&](auto s1, auto s2){return s1.size() < s2.size();});
sort<string> ans;
vectorfor (int i = 0; i < words.size(); ++i) {
<int> LPS_vt = LPS(words[i]);
vectorfor (int j = i+1; j < words.size(); ++j) {
if (KMP(LPS_vt, words[i], words[j])) {
.push_back(words[i]);
ansbreak;
}
}
}
return ans;
}
};
complexity
2. Z Algorithm:
This algorithm aims to generate an array of integers,
Z
, such that for a given index i
, its integer
value is equal to the length of the longest
prefix substring of the entire string, which also so
happens to be equal to the to the substring starting at
index position i
.
For example, with string = aabxaayaab
:
Taking a closer look at Z_7 = 3
is because the prefix
substring “aabxaayaab” is also present at index
7
“aabxaayaab”.
An in-depth explanation of this can be found here.
code
class Solution {
public:
bool Z(string &s1, string &s2) {
<int> z(s1.size()+s2.size()+1);
vector= s1 + " " + s2;
string s int L = 0, R = 0;
for (int i = 1; i < s.size(); ++i) {
if (i > R) {
= R = i;
L while (R < s.size() && s[R-L] == s[R]) {
++R;
}
[i] = R-L;
z} else {
int k = i-L;
if (z[k] + i < R) {
[i] = z[k];
z} else {
= i;
L while (R < s.size() && s[R-L] == s[R]) {
++R;
}
[i] = R-L;
z}
}
if (z[i] == s1.size()) return true;
}
return false;
}
<string> stringMatching(vector<string>& words) {
vector(words.begin(), words.end(), [&](auto s1, auto s2){return s1.size() < s2.size();});
sort<string> ans;
vectorfor (int i = 0; i < words.size(); ++i) {
for (int j = i+1; j < words.size(); ++j) {
if (Z(words[i], words[j])) {
.push_back(words[i]);
ansbreak;
}
}
}
return ans;
}
};
complexity
Author’s Note:
- another link that is very useful for string searching