3. Longest Substring Without Repeating Characters

# 3. Longest Substring Without Repeating Characters ## me, set(), one-by-one ```python= class Solution: def lengthOfLongestSubstring(self, s: str) -> int: l = 0 r = 0 sofarmaxlen = 0 set_ = set() while r < len(s): if s[r] not in set_: set_.add(s[r]) r += 1 curlen = r - l sofarmaxlen = max(sofarmaxlen, curlen) else: set_.remove(s[l]) l += 1 return sofarmaxlen ``` ## use dict() to jump ```python= class Solution: def lengthOfLongestSubstring(self, s: str) -> int: l = 0 r = 0 sofarmaxlen = 0 dict_ = {} while r < len(s): if s[r] in dict_: l = max(l, dict_[s[r]] + 1) # jump to the next of the previously repeating char, # if that char is after the starting char we are using now; # that's why we need max() dict_[s[r]] = r curlen = r - l + 1 # if starting from index 0 and we are now at index 0, # the curlen should be 1 sofarmaxlen = max(sofarmaxlen, curlen) r += 1 return sofarmaxlen ``` ## h2 - problem clarification: - substring - need to be continuous, not subsequence - char type? - `s` consists of English letters, digits, symbols and spaces. - size - `0 <= s.length <= 5 * 10^4` - target at linear time or n log n - **“longest”** - greedy vs DP vs brute-force vs backtracking - Let’s check: how to answer a larger problem from a smaller problem - abc ⇒ abcd - Can get a larger solution - abc ⇒ abcc - any substring containing “cc” is nonpromising - abc ⇒ abca - any substring containing “abca” is nonpromising - simple relationship from subproblem to larger problem, no braching needed - Greedy - **“non-repeating”**; how to detect repeating efficiently? - Use a counter - hashmap vs direct-counter - English letters, digits, symbols and spaces - ASCII? Unicode? - random fact: number of unicode char? - ~3e5 - [https://www.unicode.org/versions/stats/charcountv14_0.html](https://www.unicode.org/versions/stats/charcountv14_0.html) - All ASCII or fixed range ⇒ handmade hashmap by array - Hard to enumerate ⇒ Use hashmap # brute-force - `for i in 0~n-1` - `for in i~n-1` - `get substring as string[i ~ j]` - `if nonrepeating(), update answer as the larger` - Even if nonrepeating() is constant-time, Square time will be 10^8, too slow. - Note that `get substring as string[i ~ j]`can be slow for like Python2/3 - 10^6 ⇒ good - 10^7,8,9 ⇒ it depends - 10^10 ⇒ out - what means slow: [https://discuss.codechef.com/t/how-to-estimate-if-an-algorithm-performs-within-the-time-limits/5743](https://discuss.codechef.com/t/how-to-estimate-if-an-algorithm-performs-within-the-time-limits/5743) Optimization: 1. Should try from longer candidates then stop early. Not helping the big-O. QQ 2. Substrings overlapped largely. I should reuse that info. 1. Sliding windows / two pointer 1. So we don’t have to literally create the substring 2. hashmap updated inplace (reuse) 1. the info of counter is the statistical description of the substring 2. and that’s extendible/transferable # sliding window + counter time: O(n) // n is number of char in string space: O(n) // b/c of counter; if range of char in constant, then space is constant ```python left_ptr = 0 right_ptr = 0 initiate sliding window initiate a counter while still inbound, right_ptr++: grow window LHS; update counter while window/counter is repeating, left ptr++: # no need to iterate whole counter, just check new comer shrink window LHS; update counter // at here, guaranteed the window is nonrepeating update the answer ``` # Follow up - The counter is actually only being used to check seen or not, can use some cheaper data structure. E.g., hashset. If really care about mem, go bitwise - And the left ptr will always move to the next position of the fist of the repeating pair. Memorize the position to jump ![](https://hackmd.io/_uploads/B1m3p84T5.gif)