139. Word Break I

Reference: LeetCode
Difficulty: Medium

My Post: Java Solutions Backtracking (Memoization) + DP with Detailed Exp.

Problem

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

Note:

The same word in the dictionary may be reused multiple times in the segmentation.
You may assume the dictionary does not contain duplicate words.

Example:

Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".

Input: s = "applepenapple", wordDict = ["apple", "pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple".
             Note that you are allowed to reuse a dictionary word.

Input: s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"]
Output: false

Analysis

Backtracking

First, we put wordSet into a hash set for quick contains examination.

For each character S[depth], we consider substrings S[depth, i] including S[depth], S[depth, depth + 1], …, S[depth, n - 1]. If one of them is in wordDict, we then redo the task on character S[i + 1]; otherwise, return false.

One of the most difficult part is to write the reject & accept code. Think about it carefully, we notice that if the recursive call goes into a situation where n == s.length(), it means the string s can be successfully segmented; otherwise, it won’t go into that situation. We don’t write reject case in this backtracking function.

public boolean wordBreak(String s, List<String> wordDict) {
  // assume s and wordDict are non-empty
  return backtracking(0, s, new HashSet<>(wordDict));
}

private boolean backtracking(int depth, String s, Set<String> wordSet) {
  int n = s.length();
  // accept
  if (depth == n) {
    return true;
  }
  
  for (int i = depth; i < n; ++i) {
    String str = s.substring(depth, i + 1); // substring[depth, i]
    if (wordSet.contains(str)) {
      if (backtracking(i + 1, s, wordSet)) return true;
    }
  }
  
  return false;
}

Time: $O(N^N)$ since each time it has at most $N$ choices and the depth (problem size) is $N$. (this is an upper bound)
Space: $O(N)$ (string length and call stack depth)

Marked For the time complexity, if we count substring operation ($O(N)$), it would be $O(N \times N^N) = O(N^(N + 1))$.

Backtracking (Memoization)

Let’s use an example to see if we can optimize the above method.

// String: "abcde" | wordDict: ["a", ...]

depth = 0 ('a')
we have substrings: "a", "ab", "abc", "abcd", "abcde"

For the first substring "a", it is in wordDict, so depth becomes 1 and we would examine if "bcde" is breakable.

In the process of checking if "bcde" is breakable, we may check if "cde", "de", "e" are breakable. Once we know the answers, we can cache them for future usage no matter they are true or false.

In future when we’ve done processing the first substring "a", we will examine "b", and you will see there could be a lot of repeated computation for "cde", "de", "e".

Difficulty: Using memoization in a backtracking-style recursive function is quite uncommon than other DP memoization. This is the new form I learned. There are three places that we need to set and get memo[].

Note: We use Boolean since initially we want values to be null.

public boolean wordBreak(String s, List<String> wordDict) {
  // assume s and wordDict are non-empty
  int n = s.length();
  Boolean[] memo = new Boolean[n]; // memo[i] --> S[i...] is breakable or not
  return backtracking(0, s, new HashSet<>(wordDict), memo);
}

private boolean backtracking(int depth, String s, Set<String> wordSet, Boolean[] memo) {
  int n = s.length();
  // accept
  if (depth == n) {
    return true;
  }
  // memoization
  if (memo[depth] != null) { // memo
    return memo[depth];
  }
  
  for (int i = depth; i < n; ++i) {
    String str = s.substring(depth, i + 1); // substring[depth, i]
    if (wordSet.contains(str)) {
      if (backtracking(i + 1, s, wordSet, memo)) {
        memo[depth] = true; // memo
        return true;
      }
    }
  }
  
  memo[depth] = false; // memo
  return false;
}

Time: $O(N^2)$
Space: $O(N)$

DP

The idea is that given a problem(s) we can divide it into two subproblems s1 and s2. If both of them are breakable, s is breakable (by saying breakable I mean it satisfies the required conditions).

Note: Substring s(i, j) (character i to j) in Java is denoted by s.substring(i, j + 1).

First, we define our dp[] array, where dp[i] is true if the substring s(0, i - 1) or s.substring(0, i) is breakable; otherwise, it should be false.

Then, we process string length from 1 to n in dp[i]. For each substring s(0, i - 1), we examine each combination of substrings s(0, j - 1) (s.substring(0, j)) and s(j, i - 1) (s.substring(j, i)). The first subproblem can be calculated directly by dp[j] while the second one can be checked by wordDict set. Question: Why don’t we break the second substring and examine it further? (e.g. abc is not in wordDict, but ab and c could be in wordDict)

// String: a b c d e f
In the final round, we are looking into the whole string. We would examine the follow pair of two substrings:
a     bcdef
ab    cdef
abc   def
abcd  ef
abcde f

The question is: what happen if "cdef" is not in wordDict while "cd" and "ef" are both in wordDict?

It is handled previously! If "cd" is in wordDict, in the previously fourth round for substring "abcd", we would examine "ab" and "cd". If dp("ab") is true and "cd" is in wordDict, we would mark dp("abcd") as true!

Then in the final round, dp("abcd") is true and "ef" is in wordDict, so we have the whole string breakable.

Note: In addition, please think about the initialization step and the break statement in the code.

public boolean wordBreak(String s, List<String> wordDict) {
  // assume s and wordDict are non-empty
  int n = s.length();
  Set<String> set = new HashSet<>(wordDict);
  
  boolean[] dp = new boolean[n + 1];
  dp[0] = true; // consider the begining! ("" + "a")
  
  for (int i = 1; i <= n; ++i) { // for each length
    for (int j = 0; j < i; ++j) {
      // s1 = substring(0, j) = dp[j]
      // s2 = substring(j, i) = s[j, i - 1]
      if (dp[j] && set.contains(s.substring(j, i))) {
        dp[i] = true;
        break;
      }
    }
  }
  
  return dp[n];
}

Time: $O(N^3)$ since substring takes $O(N)$.
Space: $O(N)$

BFS

Go To: LeetCode Solution