How to prepare for software interview questions - Number of ways to decode a message

- 7 mins cosmin

Continuing the series on How to prepare for software interview questions.

My previous articles:

Contiguous subset of sum K

Simulate a queue with two stacks.

Today, I will break down the following question:

Given the mapping a = 1, b = 2, ... z = 26, and an encoded message, count the number of ways it can be decoded.

For example, the message '111' would give 3, since it could be decoded as 'aaa', 'ka', and 'ak'.

You can assume that the messages are decodable. For example, '001' is not allowed.

Dynamic Programming

Now this is a cool application of dynamic programming. Recently, I was talking to a close friend, Sorin, about how dynamic programming is a fairly abstract concept that’s not easy to grasp. And he said he thinks about these problems as simply recursion with memoization.

Recursion

I hope you know what recursion is, otherwise here is a simple hint:

Memoization

Memoization simply means you want to store the result for a specific input of your recursive function.

The classical example is the Fibonacci sequence.

fibo(n) = fibo(n - 1) + fibo(n - 2)
fibo(1) = 1
fibo(0) = 0

If we were to implement a recursive method:

def fibo(n):
  if n <= 1:
    return n  # fibo(0) = 0; fibo(1) = 1;

  return fibo(n - 1) + fibo(n - 2)

The problem? This has complexity O(2^N). If you don’t believe me, draw the tree of recursion calls.

To overcome this, we can add memoization

memo = {} # Will store the values already computed

def fibo(n):
  # if we already know the answer
  if n in memo:
    return memo[n]
  if n <= 1:
    return n

  # store the result
  memo[n] = fibo(n - 1) + fibo(n - 2)

  # return it
  return memo[n]

With this, we can compute fibo(n) in O(N) time and O(N) memory.

Back to our problem

Let’s get back to our original problem:

Given the mapping a = 1, b = 2, ... z = 26, and an encoded message, count the number of ways it can be decoded.

For example, the message '111' would give 3, since it could be decoded as 'aaa', 'ka', and 'ak'.

You can assume that the messages are decodable. For example, '001' is not allowed.

We can already sense this can be solved using a recursive and memorization strategy (aka recursive dynamic programming), so let’s start with an example.

For the basic strings 1-10 it’s clear there is exactly one way to decode them, easy.

count(message):
  - 1 if message is in 1-10

Next, the 2 digits strings 11-99. Well, for 11-26 we know we have exactly 2 ways to decode them. One is to take the corresponding one mapping and the other one is to break the string in two one-digit messages. For example, for 26 (z) we can decode z (26) and b(2) + f(6).

For all the other strings 27-99 there is only one way, to break the string into two 1 digit strings. For example, 31 can only be decoded as c(3) and a(1).

Here’s a breakdown of all the 2-digit strings.

11: (aa, k)
12: (ab, l)
13: (ac, m)
14: (ad, n)
15: (af, o)
...
23: (bc, w)
24: (bd, x)
25: (be, y)
26: (bf, z)
-----------
27: (bg)
28: (bh)
...
97: (ig)
98: (ih)
99: (ii)

Cool, let’s update our recursive method definition.

count(message):
  - 1 if message is in 1-10
  - 2 if message is in 11-26
  - 1 if message is in 27-99

Now, let’s try to derive some sort of recursive formula. It’s easy to see that, for a string of length N, the number of decodings is at least the decodings consisting of the first N - 1 string. The reason for that is that a one-digit string always has a direct mapping. For example, for the string 12345, the number of ways to decode this is at least the number of ways to decode 1234 because on each such decoding we add the decoding of 5 which has a direct one-on-one (bijection) decoding to the digit e.

Now, the only other case if if the last two digits have a one-character direct mapping. In other words, if the last two digits is between 11-26. For example, if we want the number of ocurrences of 12310, this is at least the number of decodings of 123 on which we add the 10 decoding j(10).

Now there is only one corner case to this, when the last digit of the string is 0 in which case, we know we can only map to something like 20, 30, 40, ... 90 (since we guarantee the string is correct) which all have only one direct mapping.

This covers all the cases. Let’s update the recursion.

count(message):
  - 1 if message is in 1-10
  - 2 if message is in 10-26
  - 1 if message is in 27-99

  - count(message[:-2]) if message[-1] = '0'  # corner case
  - count(message[:-1]) + count(message[:-2]) if message[-2:] is in `10-26`

Implementation in python using memoization:

memo = {}
def count(message):
  if int(message) >= 1 and int(message) <= 10:
    return 1
  if int(message) >= 11 and int(message) <= 26:
    return 2
  if int(message) >= 27 and int(message) <= 99:
    return 1

  if message in memo:
    return memo[message]

  if message[-1] == '0':
    memo[message] = count(message[:-2])
  else:
    memo[message] = count(message[:-1])
    if int(message[-2:]) >= 11 and int(message[-2:]) <= 26:
      memo[message] += count(message[:-2])

  return memo[message]

Unit testing

Now the cherry on the cake: unit testing. This is a great green flag, so go for it everytime.


assert(count("1") == 1)
assert(count("9") == 1)
assert(count("10") == 1)
assert(count("11") == 2)
assert(count("111") == 3)
assert(count("1111") == 5) # (aaaa, aaj, aja, jaa, jj)

# Test a corner case as well
assert(count("111110") == 5) # same as above plus k

## Add more, make sure you find all the bugs!

Code

That would be all for today, next time I’ll present a similar dynamic programming one.

Thanks,

Cosmin

Cosmin-Ionuț Rusu

Cosmin-Ionuț Rusu

A simple action-taker who likes to just do stuff

comments powered by Disqus
rss facebook twitter github gitlab youtube mail spotify lastfm instagram linkedin google google-plus pinterest medium vimeo stackoverflow reddit quora quora