I over-engineered my solution to this Leetcode easy problem and it's over twice as fast as the accepted optimal solution. I can't find anywhere else that mentions this approach and I thought my solution was cool, so I made a writeup on how it works.
International Morse Code defines a standard encoding where each letter is mapped to a series of dots and dashes, as follows:
For convenience, the full table for the 26 letters of the English alphabet is given below:
[".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--.."]
Given an array of strings words
where each word can be written as a
concatenation of the Morse code of each letter, how many unique Morse code
encodings are there?
For example, "cab" can be written as "-.-..--...", which is the concatenation of "-.-.", ".-", and "-...". We will call such a concatenation the transformation of a word. Return the number of different transformations among all words we have.
Example 1:Example 2:Input: words = ["gin","zen","gig","msg"]
Output: 2
Explanation: The transformation of each word is:"gin" -> "--...-." "zen" -> "--...-." "gig" -> "--...--." "msg" -> "--...--."
There are 2 different transformations: "--...-." and "--...--.".
Input: words = ["a"]
Output: 1
A top ranked solution in terms of speed runs in 0ms on Leetcode's test cases.
Approach:
morse = [
".-","-...","-.-.","-..",".","..-.","--.",
"....","..",".---","-.-",".-..","--","-.",
"---",".--.","--.-",".-.","...","-","..-",
"...-",".--","-..-","-.--","--.."
]
def toMorse(word: str) -> str:
charsToMorseCode = [morse[ord(n) - ord('a')] for n in a]
return ''.join(charsToMorseCode)
def uniqueMorseRepresentations(words: List[str]) -> int:
morseCodes = set(toMorse(w) for w in words)
return len(morseCodes)
In the accepted solution, the input words go through the following
transformations hash(toMorse(word))
:
My solution proposes that we can compute the hash code of the morse-code
encoding directly from the original string (ie hashMorseEncoding(word)
). We
don't have to first translate the word into morse code.
I'm framing this mathematically because algebra is necessary for my argument.
In our case, f(c)
is the mapping from letter to morse code.
Note that hash(s)
is just the
polynomial rolling hash function.
These definitions are the same as:
f = dict() # char -> str
def F(s: str) -> str:
return "".join(f[c] for c in s)
def hash(s: str) -> int:
output = 0
pow = 1
for i, c in enumerate(s):
v = 2 if c == '-' else 1
output = (output + v * pow) % m
pow *= p
return output
def superPolynomialHash(s: str) -> int:
acc = 0
for i in range(len(s)):
pow = sum(len(f(c)) for c in s[:i])
acc += hash(f(s[i])) * p ** pow
return acc
We can compute hash(F(s))
without ever having to call F(s)
by using the
expression on the right-hand side.
Suppose we wanted to find hash(F("abc"))
. Encoded in morse code, it is
s' = ".-" + "-..." + "-.-." = ".--...-.-."
Each term in hash(s')
represents a single char. Group them by which char they
belong to (ex. first two belong to "a") and factor them.
Note that these groups expressions are equivalent to:
So now we can swap these into the original equation like so:
It should be apparent that the n in the p^n factor for each term is the number of characters before it in the string.
We can use the literal code translation to implement this, but this solution will be very slow (in fact, slower than the original!).
def hash(s: str) -> int:
output = 0
pow = 1
for i, c in enumerate(s):
v = 2 if c == '-' else 1
output = (output + v * pow) % m
pow *= p
return output
def superPolynomialHash(s: str) -> int:
acc = 0
for i in range(len(s)):
pow = sum(len(f(c)) for c in s[:i])
acc += hash(f(s[i])) * p ** pow
return acc
def uniqueMorseRepresentations(words: List[str]) -> int:
hashes = { superPolynomialHash(w) for w in words }
return len(hashes)
Before running the actual solution, compute certain values which we will hard-code into the actual solution.
morse_codes = [
".-","-...","-.-.","-..",".","..-.","--.",
"....","..",".---","-.-",".-..","--","-.",
"---",".--.","--.-",".-.","...","-","..-",
"...-",".--","-..-","-.--","--.."
]
m = 2 ** 31 - 1
p = 3
lengths = [len(mc) for mc in morse_codes]
print("lengths:", lengths)
max_code_length = max(lengths)
p_powers = [p ** i for i in range(max_code_length + 1)]
print("powers of p:", p_powers)
def polynomialHash(s: str) -> int:
output = 0
pow = 1
for i, c in enumerate(s):
v = 2 if c == '-' else 1
output = (output + v * pow) % m
pow *= p
return output
hashes = [polynomialHash(mc) for mc in morse_codes]
print("hashes:", hashes)
lengths: [2, 4, 4, 3, 1, 4, 3, 4, 2, 4, 3, 4, 2, 2, 3, 4, 4, 3, 3, 1, 3, 4, 3, 4, 4, 4]
powers of p: [1, 3, 9, 27, 81]
hashes: [7, 41, 50, 14, 1, 49, 17, 40, 4, 79, 23, 43, 8, 5, 26, 52, 71, 16, 13, 2, 22, 67, 25, 68, 77, 44]
lengths = [2, 4, 4, 3, 1, 4, 3, 4, 2, 4, 3, 4, 2, 2, 3, 4, 4, 3, 3, 1, 3, 4, 3, 4, 4, 4]
hashes = [7, 41, 50, 14, 1, 49, 17, 40, 4, 79, 23, 43, 8, 5, 26, 52, 71, 16, 13, 2, 22, 67, 25, 68, 77, 44]
P = [1, 3, 9, 27, 81]
m = 2 ** 31 - 1
def superPolynomialHash(s: str) -> int:
output = 0
for i, c in reversed(s):
ii = ord(c) - ord('a')
l = lengths[ii]
output = (output * P[l] + hashes[ii]) % m
return output
def uniqueMorseRepresentations(words: List[str]) -> int:
hashes = { superPolynomialHash(w) for w in words }
return len(hashes)
There are a lot of hash collisions with the p and m I chose, but it didn't seem to matter with the test cases Leetcode had.
N = len(words)
M = max(word[i].length)
H = maximum length of a string f(c) can output
(4 in this case)
L = # of different letters that can be in a word
(26 in this case)
Accepted Solution: O(NMH)
time O(HL)
space
My unoptimized solution: O(NM(M+H+log(H))) = O(NM^2H)
time O(L)
space`
My optimized solution: O(NM)
time O(L)
space
The Leetcode test cases are not very large so it's very easy to cap out at 0ms. I ran my own test cases 1,000,000 times.
| Case | Mine | Top Solution |
| ------------------ | ------ | ------------ |
| Default test case | .096s | .236s |
| arr[i].len > 10000 | 40.59s | 96.18s |