# Strings Pattern Matching --- ## Problem 1 Longest Prefix String Given an array of strings. Find the longest string which is a prefix of all strings in the array ### Example: A = ["axdc", "axbc", "axdb"] Ans : **"ax"** Here **ax** is the longest string which is a prefix of all the strings in the array. --- ### Question What is the Longest Common Prefix for the given array of Strings? `A = ["aabc", "aaabc", "aabc"]` **Choices** - [ ] aabc - [ ] aaabc - [x] aa - [ ] aaa **Explanation:** The LCS is **aa**. The strings are, **aa**bc **aa**abc **aa**bc Here we can notice that the Common Prefix string is "aa" occurs in all the three strings. --- ### Longest Prefix String Solution Approach using Tries We will take the same example, A = ["axdc", "axbc", "axdb"] Lets construct a Trie, for the above strings ``` root | a | x / \ d b / \ \ c b c ``` We can observe that, the length of the common prefix is equal to the length of the path until which, it has only one path. ### Complexity **Time Complexity**: O(Sum of length of all words) **Space Complexity**: O(Sum of length of all words) --- ### Longest Prefix String Brute Force Approach ### Example A = ["axdc", "axbc", "axdb"] Ans = ax ### Approach A brute force approach to solve this problem would involve iterating through each character of the strings in the array simultaneously and comparing them until a mismatch is found. Absolutely! Sometimes, simplicity is the pinnacle of optimization. Just as you don't need a fancy tool to trim your nails when a simple nail clipper will suffice, there are situations where brute force methods prove to be the most efficient solution. It's a testament to the elegance of problem-solving: embracing the most straightforward approach can often yield the most effective results. So, don't overlook the power of brute force when it's the perfect fit for the task at hand. After all, why complicate things when simplicity gets the job done beautifully? ### Pseudo Code ```java Function longest_common_prefix(strings) min_length = length of shortest string in strings for i from 0 to min_length - 1 for j from 1 to length of strings - 1 if strings[j][i] is not equal to strings[0][i] return substring of strings[0] from index 0 to i - 1 return substring of strings[0] from index 0 to min_length - 1 ``` ### Complexity **Time Complexity**: O(length of strings) **Space Complexity**: O(length of 1 string) --- ## Boring String What is a Boring Substring ? 1. Length = 2 2. If ASCII value of 1st char is x, ASCII value of 2nd char can either be (x + 1) or (x - 1) Example: ab, bc, ba, bc, ef, ji --- ### Question Figure out the Boring String. **Choices** - [ ] aa - [ ] ac - [x] nm - [ ] tr **Explanation:** "nm" is the Boring Substring. --- ### Problem 2 No Boring Substring Given a String S (char array). Rearrange the character in such a way that their is no Boring Substring present. **Example:** S = "abcd" The Strings with no Boring substrings are cadb, bdac. --- ### Question Is it possible to rearrange the characters in the string, in such a way that their is no Boring Substring present. S = "abc" **Choices** - [ ] Yes - [x] No - [ ] Not sure --- **Explanation:** The possible permutations are "abc", "acb", "bac" and "bca", "cab" and "cba". All the possibilities has a Boring string present in it. So it is impossible to rearrange. --- ### Brute Force => Generate all permutations of string and for each permutation, Check whether the Boring String exists or not. Time Complexity = O(N! * N) Space Complexity = O(N!) ### Optimised Approach <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/068/020/original/string.img.jpeg?1710353139" width=600px> Here we can notice that the two corner points, may form a Boring Substring. <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/068/025/original/lines.jpeg?1710353881" width=500px> #### How to Find these two corner elements ? Now the problem is shortned into finding the two corner elements. So, what will be the possible Approaches. Lets think about the Brute Force. #### Brute Force Here, for each element in the even char, we can traverse all the elements in the odd char to find the possible pair. Time Complexity = O(N * N) #### Optimization Can we optimise finding the middle element ? <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/068/031/original/alphaline.jpeg?1710354652" width=500px> Here, we can notice that, by using the two end points, we can be able to reduce the formation of boring substring. Thus the solution will be one of the below, 1. Maximum of Odd, Minimum of Even 2. Minimum of Odd, Maximum of Even Overall Time Complexity = O(N + N) = O(N) --- ## Problem 3 Smallest Period Given a string S of size N. Find the smallest period of the string. **What is a period ?** If the period of a string is K . S[i] = S[i % K] ### Example: s = "abcaabcaabca" | i | 0 1 2 3 | 4 5 6 7 | 8 9 10 11 | | --- | --- | --- | --- | | S | a b c a | a b c a | a b c a | | i % 4 | 0 1 2 3 | 0 1 2 3 | 0 1 2 3 | | S[i%4] | a b c a | a b c a | a b c a | Now lets take another example, S = "abcaabcaab" Just removed the last two characters of the previous example. What is the smallest period now ? The smallest period is 4. | i | 0 1 2 3 | 4 5 6 7 | 8 9 10 11 | | --- | --- | --- | --- | | S | a b c a | a b c a | a b | | i%4 | 0 1 2 3 | 0 1 2 3 | 0 1 | | S[i%4] | a b c a | a b c a | a b | --- ### Question What is the smallest period of the given string? S = "abcabca" **Choices** - [ ] 1 - [ ] 7 - [x] 3 - [ ] 4 **Explanation:** The smallest period of the string "abcabca" is 3 | i | 0 1 2 | 3 4 5 | 6 | | --- | ---- | --- | - | | S | a b c | a b c | a | | i%3 | 0 1 2 | 0 1 2 | 0 | | S[i%3] | a b c | a b c | a | --- ### Brute Force Solution Minimum possible period = 1, for "aaaaaa" Maximum possible period = N, for "abcdef" So the minimum possible period can be between 1 to N. => Iterate & check for for the period condition S[i] = S[i % k] ### Pseudo Code ```cpp for (K = 1; K < N; K++) { boolean isPeriod = True; for (i = 0; i < N; i++){ if(s[i] != s[i % k]){ isPeriod = False; break; } } if (isPeriod == True){ return K; } } ``` Time Complexity = O(N$^2$) ### Optimisated Solution Lets try with Z-algo now, To solve any problem using Z-algo, we need to create a Z array of length N for an Input. Let Z[i] be the Length of the longest substring starting at index i which is also a prefix of the entire string. S = "abcaabcaabca" Lets create the Z-array. <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/679/original/upload_d7ceb734c6ea3caa332585c85cd401e1.png?1711937330" width="500px"> The 0th index is always equal to the length of the string. Now, lets find for each index one by one, <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/680/original/upload_36fc6f38d81b6f701cc02ecf2a840f77.png?1711937499" width="500px"> While finding the value for a particular index, Increment until you find a mismatch. <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/687/original/upload_577ee5f6718c53919b800ead148b0e03.png?1711947839" width="500px"> Finding the values one by one, <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/689/original/upload_a404812f17a09ea743a7815616a45edb.png?1711947917" width="500px"> This is the Final Z array, <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/690/original/upload_d93fce2089a2896dcd60b7a13ecea2a4.png?1711948027" width="500px"> How to find the periods? For now, lets find the what are all the periods, then will move on to finding the minimum one. For all index i where (i + Z[i] == N) are the periods of the string. <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/691/original/upload_2594fc405251b1a0ccb96136f01f1cbe_%281%29.png?1711948145" width="500px"> The indexes which has (i + Z[i] == N) are 0, 4, 8, and 11. We ingore 0, always. **How (i + Z[i] == N) are periods?** <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/692/original/upload_afafa42498c6697bbaa560e13158186c.png?1711948282" width="500px"> If the first part (0 - i)th index is alpha, then from i we have the alpha again and it will repeat. <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/693/original/upload_17f7a256d6366cd76b0b808b91a2de09_%281%29.png?1711948427" width=500px> Lets take the example, Consider "abca" as Alpha, <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/738/original/upload_8d9a457006bba8f016eba99fad035193.png?1711969581" width="500px"> We can see that, the pattern will go like, this <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/739/original/upload_7adb03582363d3c5eed109b9520f5a2b.png?1711969659" width="500px"> This makes the index as the period of the string. ### Psuedocode Psuedo code, when the Z array is given ```cpp for(i = 1; i<N; i++){ if(i + Z[i] == N){ return i; } } return N; ``` Time Complexity to find period = O(N) --- ### Smallest Period Z Array Calculation How to calculate Z Array ?? S = a b c a a b c a a b c a ### Brute force Approach ```java Z[N]; Z[0] = N; for (i = 1; i < n; i++) { len = 0; for(j = 1; j < N; j++){ if((S[j] == S[j-1])){ len++; } else{ break; } } Z[i] = len; } ``` ### Complexity **Time Complexity**: O(N$^2$) ### Observation If we're using O(N$^2$) for creating the Z array and O(N) for finding the smallest period, we can achieve the same results with our initial brute force solution. The key focus now is optimizing the time complexity for calculating the Z array. ### Optimised Approach Lets take a String, S. <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/740/original/upload_7d9e598cfbeeda253e09ee5381c16c23.png?1711969749" width="500px"> Right now, brute force the values. <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/741/original/upload_9c0adfe1b176e73fb2e5a3a9792e408f.png?1711969828" width="500px"> We copy the values until (i + Z[i]) is within the matching region. Because, if it exceeeds the matching region, we dont know what the value is actually there. <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/743/original/Untitled.png?1711970523" width="500px"> ![image](https://hackmd.io/_uploads/S1n7CuDJA.png) ![image](https://hackmd.io/_uploads/S1mo0_PyC.png) on index 13, (i + Z[i]) exceeds the matching region. We dont have the visiblity at index 17. There is a mismatch actually. So when (i + Z[i]) exceeds the matching region. We stop the current and recalculating from the start. ![image](https://hackmd.io/_uploads/SJ53RdvJC.png) ![image](https://hackmd.io/_uploads/SkxT1tvy0.png) ### Pseudo Code ```java Z[N], Z[0]=N; S = C = 0; Count = 0; for (i = 1; i<n; i++){ if (i>e) { e = s = i; Count = 0; while (e < N && S[e] == s[e-s]){ e++; Count++; } e--; z[i] = Count; }else{ if (i + Z[i-s] <= e) { Z[i] = Z[i - s]; } else{ s = i; e = i + Z[i - S]; Count = 0; while (e < N && S[e] == S[e - s]){ e++; Count++; } e--; Z[i] - Count; } } } ``` ### Complexity **Time Complexity**: O(2 * N) = O(N) --- ## Problem 4 String Pattern Matching You are given an String S and a Pattern P, You need to find how many times, the pattern occur in the String. **Example:** S = abcefbabedemababc (N) P = abc (M) ### Brute Force Approach Brute Force => (sliding window of size M) (char by char comparison) **Time Complexity:** <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/698/original/upload_0517eecf2b086ae6195ebf0b15f4c812.png?1711949394" width="500px"> ### Optimisation Using Zalgo How Z-algo helps here ? Recap: Z[i] be the Length of the longest substring starting at index i which is also a **prefix** of the entire string. Here, we need to ideally need to compare every substring with the pattern, P. So what we simply prepend pattern at the prefix of the main string. <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/697/original/upload_d21edc38c481721d633ecd2bc1673cd5.png?1711949204" width="500px"> But we cant, directly, calcuate the Z-array here. <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/696/original/upload_50bcd487bc862c25ea5b9198ff8a15ea.png?1711949122" width="500px"> So, We will introduce a special character (#, or any special character) Now we can find the Z-array for the above string. <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/695/original/upload_b3626b5bf5e2a1e719705e27aa042b34.png?1711948807" width="500px"> As we can see that, the indices which has a Z-value as 3, matches the pattern. <img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/069/694/original/upload_fc7042fd0f06a5e575b7061d5ae04738.png?1711948680" width="500px"> The highlighted indices are the matched patterns.