Leetcode 804 "Unique Morse Code Words": Optimized Solution
I over-engineered my solution to this Leetcode easy problem and it's over twice as fast as the accepted optimal solution. I can't find anywhere else that mentions this approach and I thought my solution was cool, so I made a writeup on how it works.
Problem
International Morse Code defines a standard encoding where each letter is mapped to a series of dots and dashes, as follows:
- 'a' maps to ".-",
- 'b' maps to "-...",
- 'c' maps to "-.-.", and so on.
For convenience, the full table for the 26 letters of the English alphabet is given below:
[".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--.."]
Given an array of strings words where each word can be written as a
concatenation of the Morse code of each letter, how many unique Morse code
encodings are there?
For example, "cab" can be written as "-.-..--...", which is the concatenation of "-.-.", ".-", and "-...". We will call such a concatenation the transformation of a word. Return the number of different transformations among all words we have.
-
Example 1
Input:
words = ["gin","zen","gig","msg"]Output:
2Explanation: The transformation of each word is:
"gin" -> "--...-." "zen" -> "--...-." "gig" -> "--...--." "msg" -> "--...--."There are 2 different transformations:
"--...-."and"--...--.". -
Example 2
Input:
words = ["a"]Output:
1
Constraints:
- 1 <= words.length <= 100
- 1 <= words[i].length <= 12
- words[i] consists of lowercase English letters.
Accepted Optimal Solution
A top ranked solution in terms of speed runs in 0ms on Leetcode's test cases.
Approach:
- Convert each input string into morse code
- Count the number of unique morse codes by adding them all to a set.
morse = [
".-","-...","-.-.","-..",".","..-.","--.",
"....","..",".---","-.-",".-..","--","-.",
"---",".--.","--.-",".-.","...","-","..-",
"...-",".--","-..-","-.--","--.."
]
def toMorse(word: str) -> str:
charsToMorseCode = [morse[ord(n) - ord('a')] for n in a]
return ''.join(charsToMorseCode)
def uniqueMorseRepresentations(words: List[str]) -> int:
morseCodes = set(toMorse(w) for w in words)
return len(morseCodes)
My Solution
In the accepted solution, the input words go through the following
transformations hash(toMorse(word)):
- Each character in the word is mapped to its morse code equivalent
- The morse code equivalents are concatenated to create the full morse-code encoding.
- The full morse code encoding is hashed (when added to the set)
My solution proposes that we can compute the hash code of the morse-code
encoding directly from the original string (ie hashMorseEncoding(word)). We
don't have to first translate the word into morse code.
I'm framing this mathematically because algebra is necessary for my argument.
Definitions
In our case, f(c) is the mapping from letter to morse code. The operator
∥ denotes string concatenation.
Note that hash(s) is the
polynomial rolling hash function.
These definitions are the same as:
f = dict() # char -> str
def F(s: str) -> str:
return "".join(f[c] for c in s)
def hash(s: str) -> int:
output = 0
pow = 1
for i, c in enumerate(s):
v = 2 if c == '-' else 1
output = (output + v * pow) % m
pow *= p
return output
Theorem
def superPolynomialHash(s: str) -> int:
acc = 0
for i in range(len(s)):
pow = sum(len(f(c)) for c in s[:i])
acc += hash(f(s[i])) * p ** pow
return acc
We can compute hash(F(s)) without ever having to call F(s) by using the
expression on the right-hand side.
Proof
Suppose we want hash(F("abc")). Encoded in Morse code,
Each term in hash(s') is one Morse symbol. Group them by which letter they
belong to (the first two belong to "a") and factor:
Those grouped expressions are:
Substituting back:
The exponent on each p^n term is the number of Morse symbols before that
letter in s'.
Unoptimized Implementation
We can use the literal code translation to implement this, but this solution will be very slow (in fact, slower than the original!).
def hash(s: str) -> int:
output = 0
pow = 1
for i, c in enumerate(s):
v = 2 if c == '-' else 1
output = (output + v * pow) % m
pow *= p
return output
def superPolynomialHash(s: str) -> int:
acc = 0
for i in range(len(s)):
pow = sum(len(f(c)) for c in s[:i])
acc += hash(f(s[i])) * p ** pow
return acc
def uniqueMorseRepresentations(words: List[str]) -> int:
hashes = { superPolynomialHash(w) for w in words }
return len(hashes)
Optimized Solution Part 1 — Precomputing
Before running the actual solution, compute certain values which we will hard-code into the actual solution.
- The hash value of each morse code unit
- Must use the polynomial rolling hash function
- The powers of p (same p used in hashing)
- Additionally, we don't need to store the original morse code strings in our program anymore. To save memory, we can just save the lengths of each morse code string instead. The reason why will be apparent later.
morse_codes = [
".-","-...","-.-.","-..",".","..-.","--.",
"....","..",".---","-.-",".-..","--","-.",
"---",".--.","--.-",".-.","...","-","..-",
"...-",".--","-..-","-.--","--.."
]
m = 2 ** 31 - 1
p = 3
lengths = [len(mc) for mc in morse_codes]
print("lengths:", lengths)
max_code_length = max(lengths)
p_powers = [p ** i for i in range(max_code_length + 1)]
print("powers of p:", p_powers)
def polynomialHash(s: str) -> int:
output = 0
pow = 1
for i, c in enumerate(s):
v = 2 if c == '-' else 1
output = (output + v * pow) % m
pow *= p
return output
hashes = [polynomialHash(mc) for mc in morse_codes]
print("hashes:", hashes)
lengths: [2, 4, 4, 3, 1, 4, 3, 4, 2, 4, 3, 4, 2, 2, 3, 4, 4, 3, 3, 1, 3, 4, 3, 4, 4, 4]
powers of p: [1, 3, 9, 27, 81]
hashes: [7, 41, 50, 14, 1, 49, 17, 40, 4, 79, 23, 43, 8, 5, 26, 52, 71, 16, 13, 2, 22, 67, 25, 68, 77, 44]
Optimized Solution Part 2 — Solution
lengths = [2, 4, 4, 3, 1, 4, 3, 4, 2, 4, 3, 4, 2, 2, 3, 4, 4, 3, 3, 1, 3, 4, 3, 4, 4, 4]
hashes = [7, 41, 50, 14, 1, 49, 17, 40, 4, 79, 23, 43, 8, 5, 26, 52, 71, 16, 13, 2, 22, 67, 25, 68, 77, 44]
P = [1, 3, 9, 27, 81]
m = 2 ** 31 - 1
def superPolynomialHash(s: str) -> int:
output = 0
for i, c in reversed(s):
ii = ord(c) - ord('a')
l = lengths[ii]
output = (output * P[l] + hashes[ii]) % m
return output
def uniqueMorseRepresentations(words: List[str]) -> int:
hashes = { superPolynomialHash(w) for w in words }
return len(hashes)
Drawback
There are a lot of hash collisions with the p and m I chose, but it didn't seem to matter with the test cases Leetcode had.
Time and Space Complexity
N = len(words)
M = max(word[i].length)
H = maximum length of a string f(c) can output (4 in this case)
L = # of different letters that can be in a word (26 in this case)
| Solution | Time | Space |
|---|---|---|
| Accepted | O(NMH) | O(HL) |
| My unoptimized | O(NM(M+H+log(H))) = O(NM^2H) | O(L) |
| My optimized | O(NM) | O(L) |
Comparison
The Leetcode test cases are not very large so it's very easy to cap out at 0ms. I ran my own test cases 1,000,000 times.
| Case | Mine | Top Solution |
|---|---|---|
| Default test case | .096s | .236s |
| arr[i].len > 10000 | 40.59s | 96.18s |