I over-engineered my solution to this Leetcode easy problem and it's over twice
as fast as the accepted optimal solution. I can't find anywhere else that
mentions this approach and I thought my solution was cool, so I made a writeup
on how it works.
Given an array of strings words where each word can be written as a
concatenation of the Morse code of each letter, how many unique Morse code
encodings are there?
For example, "cab" can be written as "-.-..--...", which is the concatenation of
"-.-.", ".-", and "-...". We will call such a concatenation the transformation
of a word. Return the number of different transformations among all words we
have.
Example 1:
Input: words = ["gin","zen","gig","msg"] Output: 2 Explanation: The transformation of each word is:
There are 2 different transformations: "--...-." and "--...--.".
Example 2:
Input: words = ["a"] Output: 1
Constraints:
1 <= words.length <= 100
1 <= words[i].length <= 12
words[i] consists of lowercase English letters.
Accepted Optimal Solution
A top ranked solution in terms of speed runs in 0ms on Leetcode's test cases.
Approach:
Convert each input string into morse code
Count the number of unique morse codes by adding them all to a set.
morse = [ ".-","-...","-.-.","-..",".","..-.","--.", "....","..",".---","-.-",".-..","--","-.", "---",".--.","--.-",".-.","...","-","..-", "...-",".--","-..-","-.--","--.."]def toMorse(word: str) -> str: charsToMorseCode = [morse[ord(n) - ord('a')] for n in a] return ''.join(charsToMorseCode)def uniqueMorseRepresentations(words: List[str]) -> int: morseCodes = set(toMorse(w) for w in words) return len(morseCodes)
My Solution
In the accepted solution, the input words go through the following
transformations hash(toMorse(word)):
Each character in the word is mapped to its morse code equivalent
The morse code equivalents are concatenated to create the full morse-code
encoding.
The full morse code encoding is hashed (when added to the set)
My solution proposes that we can compute the hash code of the morse-code
encoding directly from the original string (ie hashMorseEncoding(word)). We
don't have to first translate the word into morse code.
Theory
I'm framing this mathematically because algebra is necessary for my argument.
Definitions
Let s be a string of character, ∣s∣=len(s)f:Characters→Strings,F(s)=i=0∏∣s∣−1f(s[i])hash(s)=s[0]+s[1]∗p+s[2]∗p2+...+s[n−1]∗pn−1modm=i=0∑n−1s[i]∗pimodm
In our case, f(c) is the mapping from letter to morse code.
f = dict() # char -> strdef F(s: str) -> str: return "".join(f[c] for c in s)def hash(s: str) -> int: output = 0 pow = 1 for i, c in enumerate(s): v = 2 if c == '-' else 1 output = (output + v * pow) % m pow *= p return output
It should be apparent that the n in the p^n factor for each term is the number
of characters before it in the string.
Code
Unoptimized Implementation
We can use the literal code translation to implement this, but this solution
will be very slow (in fact, slower than the original!).
def hash(s: str) -> int: output = 0 pow = 1 for i, c in enumerate(s): v = 2 if c == '-' else 1 output = (output + v * pow) % m pow *= p return outputdef superPolynomialHash(s: str) -> int: acc = 0 for i in range(len(s)): pow = sum(len(f(c)) for c in s[:i]) acc += hash(f(s[i])) * p ** pow return accdef uniqueMorseRepresentations(words: List[str]) -> int: hashes = { superPolynomialHash(w) for w in words } return len(hashes)
Optimized Solution Part 1 - Precomputing
Before running the actual solution, compute certain values which we will
hard-code into the actual solution.
The hash value of each morse code unit
Must use the polynomial rolling hash function
The powers of p (same p used in hashing)
Additionally, we don't need to store the original morse code strings in our
program anymore. To save memory, we can just save the lengths of each morse
code string instead. The reason why will be apparent later.
morse_codes = [ ".-","-...","-.-.","-..",".","..-.","--.", "....","..",".---","-.-",".-..","--","-.", "---",".--.","--.-",".-.","...","-","..-", "...-",".--","-..-","-.--","--.."]m = 2 ** 31 - 1p = 3lengths = [len(mc) for mc in morse_codes]print("lengths:", lengths)max_code_length = max(lengths)p_powers = [p ** i for i in range(max_code_length + 1)]print("powers of p:", p_powers)def polynomialHash(s: str) -> int: output = 0 pow = 1 for i, c in enumerate(s): v = 2 if c == '-' else 1 output = (output + v * pow) % m pow *= p return outputhashes = [polynomialHash(mc) for mc in morse_codes]print("hashes:", hashes)