Monthly Archives: August 2014

Using State Machine to Solve Problems

There is a popular interview problem in the web “Given a string, find the longest substring that contains at most 2 distinct characters” with the test string “ababcbcbaaabbdefggggg” and answer “baaabb”. Many sample solutions are available if you just search the test string.

But how about the question is changed to finding at most n distinct characters where n is a input? Let’s solve it with a state machine. The basic approach is very similar to the answers you can find online, we use two pointers, right and left, which point to the start and end of the substring. And we use a map to record how many distinct characters are in the substring. Regarding the state machine, it only has two states, valid (the number of distinct characters in the substring is less or equal than n) and invalid.

diagram

The iterating of the input string starts from the valid states and both the right and the left pointers are set to the beginning of the input string. In the valid state, we just move the right pointer to increase the substring. As a result, if the new substring is valid, the state is still valid. Otherwise, the state is changed to the invalid. In the invalid state, we can only move the left pointer to remove a character from the substring, and the state is changed accordingly. A record action is triggered each time the state goes to the valid, and the record action update the longest substring if the current substring is longer than the previous recorded one.

Because I’m using Lua recently, and Lua has built-in Map support, I provide Lua implementation. The code should be self-explained. The only special part of Lua is the substring uses inclusive indices.You can read more about the string library of Lua.

local args = {...}
local str = args[1] or "ababcbcbaaabbdefggggg"
local allowed = tonumber(args[2]) or 2

local tbl = {} -- The map to store char and its count

-- The following functions will directly use tbl
local function addchar(c)
  tbl[c] = (tbl[c] or 0 ) + 1
end

local function rmchar(c)
  assert(tbl[c], "Try to remove a non-existent char:"..c)
  tbl[c] = tbl[c] - 1
  if tbl[c] == 0 then tbl[c] = nil end --make sure move the key
end

local function isvalid()
  local numkeys = 0
  for k,_ in pairs(tbl) do
    numkeys = numkeys + 1
  end
  return numkeys <= allowed
end

local function main()
  print(string.format("Find the longest substring in \"%s\" with %d distinct chars:",str, allowed))
  local state = true -- The state valid at the beginning
  local left,right = 0,0

  local recleft, recright = 0,0 --used to record the final result
  local function record(l,r)
    if r-l > recright - recleft then
      recleft,recright = l,r
    end
  end

  while right <= #str do
    if state then --valid state
      right = right + 1
      addchar(str:sub(right,right))
      if isvalid() then --added success
        record(left, right)
      else
        state = false
      end
    else --invalid state
      left = left + 1
      rmchar(str:sub(left,left))
      if isvalid() then
        state = true
      end
    end
  end

  local res = str:sub(recleft+1, recright)
  print(string.format("The result substring is \"%s\" with length %d,\n", res, #res))
end

main()

In fact, there is another coding question. Translating all the CPP style comments “//” into C style “/* */” comments of a input source file. If you took the compiler course before, you should know it’s a parser problem, and you may use state machine to solve it. In fact, the state machine is more complex than the previous problem. For example, there are code state, and comment state, and there are sub states that is in string state of a code state. The state changes are triggered by the input character or characters. But the basic idea is similar to the above question.