Reference: LeetCode
Difficulty: Easy
My Post: [Java] Summary of Usages of split() and replace() vs. replaceAll()
Problem
Given a paragraph and a list of banned words, return the most frequent word that is not in the list of banned words. It is guaranteed there is at least one word that isn’t banned, and that the answer is unique.
Words in the list of banned words are given in lowercase, and free of punctuation. Words in the paragraph are not case sensitive. The answer is in lowercase.
Note:
- 1 <= paragraph.length <= 1000.
- 0 <= banned.length <= 100.
- 1 <= banned[i].length <= 10.
- The answer is unique, and written in lowercase (even if its occurrences in paragraph may have uppercase symbols, and even if it is a proper noun).
- paragraph only consists of letters, spaces, or the punctuation symbols !?',;.
- There are no hyphens or hyphenated words.
- Words only consist of letters, never apostrophes or other punctuation symbols.
Example:
Input: "Bob hit a ball, the hit BALL flew far after it was hit."
Analysis
Hash Set + Hash Map
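Note: str.split("\\s+") is equivalent to str.split("\\s+", 0). It splits the string as many times as possible and discards trailing empty strings "". A leading empty string is still kept when the input starts with whitespace, so trim() can be skipped only when the string does not start with whitespace.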
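To check how split("\\s+") treats whitespace at either end of a string, a small sketch like the one below can help. The class and method names are made up for illustration. It shows that only trailing empty strings are dropped, while leading whitespace still yields a leading empty string unless the input is trimmed first.

```java
import java.util.Arrays;

class SplitWhitespaceDemo {
    // Hypothetical helper: splits on runs of whitespace, same as s.split("\\s+", 0).
    static String[] splitOnWhitespace(String s) {
        return s.split("\\s+");
    }

    public static void main(String[] args) {
        // Trailing whitespace: the trailing empty string is discarded.
        System.out.println(Arrays.toString(splitOnWhitespace("a  b ")));        // [a, b]
        // Leading whitespace: a leading empty string is kept.
        System.out.println(Arrays.toString(splitOnWhitespace(" a  b")));        // [, a, b]
        // Trimming first removes the leading empty string.
        System.out.println(Arrays.toString(splitOnWhitespace(" a  b".trim()))); // [a, b]
    }
}
```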
Original code:
public String mostCommonWord(String paragraph, String[] banned) {
Here is the preprocess function:
// "a, a, a, a, b,b,b,c, c"
Or:
private String preprocess(String s) {
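Putting the pieces together, a minimal sketch of the Hash Set + Hash Map approach might look like the following. It is one possible implementation under stated assumptions, not necessarily the original code: the class name is hypothetical, preprocess is assumed to lowercase the input and replace the punctuation symbols !?',;. with spaces, and the banned list {"hit"} used in main is assumed for the example paragraph.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class MostCommonWordSketch {
    // Assumed behavior: lowercase everything and turn punctuation into spaces.
    static String preprocess(String s) {
        return s.toLowerCase().replaceAll("[!?',;.]", " ");
    }

    static String mostCommonWord(String paragraph, String[] banned) {
        // Hash set for O(1) banned-word lookups.
        Set<String> bannedSet = new HashSet<>();
        for (String b : banned) {
            bannedSet.add(b);
        }

        // Hash map from word to its number of occurrences.
        Map<String, Integer> counts = new HashMap<>();
        String best = "";
        int bestCount = 0;
        // trim() guards against a leading empty token if the text starts with a space.
        for (String word : preprocess(paragraph).trim().split("\\s+")) {
            if (word.isEmpty() || bannedSet.contains(word)) {
                continue;
            }
            int count = counts.merge(word, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.";
        // The banned list here is an assumption for illustration.
        System.out.println(mostCommonWord(paragraph, new String[]{"hit"})); // ball
    }
}
```

Tracking the best word while counting avoids a second pass over the map; since the answer is guaranteed to be unique, ties do not need special handling.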
replace() vs. replaceAll():
- replace(char oldChar, char newChar)
- replace(CharSequence target, CharSequence replacement)
- replaceAll(String regex, String replacement)
Notice that they all replace all occurrences. The "All" in replaceAll does not mean that it is the only one that replaces every occurrence; the difference is that replaceAll treats its first argument as a regular expression, while replace matches it literally.
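The difference between the literal and the regex variants is easiest to see with a character that is special in regular expressions. The sketch below uses "." for that purpose; the class and method names are hypothetical.

```java
class ReplaceDemo {
    // Literal replacement of a single char: every '.' becomes '-'.
    static String replaceChar(String s) {
        return s.replace('.', '-');
    }

    // Literal replacement of a CharSequence: every "." becomes "-".
    static String replaceLiteral(String s) {
        return s.replace(".", "-");
    }

    // Regex replacement: "." matches any character, so everything becomes "-".
    static String replaceRegex(String s) {
        return s.replaceAll(".", "-");
    }

    // Regex replacement with the dot escaped: only literal dots are replaced.
    static String replaceEscapedRegex(String s) {
        return s.replaceAll("\\.", "-");
    }

    public static void main(String[] args) {
        System.out.println(replaceChar("a.b.c"));         // a-b-c
        System.out.println(replaceLiteral("a.b.c"));      // a-b-c
        System.out.println(replaceRegex("a.b.c"));        // -----
        System.out.println(replaceEscapedRegex("a.b.c")); // a-b-c
    }
}
```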
A succinct version:
- \\w+ matches one or more word characters: alphanumeric characters and _.
- \\W+ matches one or more characters other than alphanumeric characters and _.
- They are opposites.
private String preprocess(String s) {
A more succinct version:
String[] words = s.toLowerCase().split("\\W+"); // "\\W+" includes spaces
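As a quick check of the \\W+ tokenization, the sketch below wraps that one-liner in a hypothetical helper and runs it on the example paragraph. Because spaces and punctuation are all non-word characters, no separate punctuation-removal step is needed.

```java
import java.util.Arrays;

class TokenizeDemo {
    // Hypothetical helper: lowercase, then split on runs of non-word characters.
    static String[] words(String s) {
        return s.toLowerCase().split("\\W+");
    }

    public static void main(String[] args) {
        String paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.";
        // The trailing "." produces a trailing empty string, which split() drops.
        System.out.println(Arrays.toString(words(paragraph)));
        // [bob, hit, a, ball, the, hit, ball, flew, far, after, it, was, hit]
    }
}
```

If the paragraph could begin with punctuation or a space, the first token would be an empty string, so a caller may want to skip empty tokens when counting.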
Rules about split(): see 271. Encode and Decode Strings
"..".split("\\W+", -1); // ["", ""]
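The effect of the limit argument can be compared side by side with a short sketch. The class and helper names are hypothetical. A negative limit keeps trailing empty strings, a limit of 0 discards them, and a positive limit caps the number of pieces.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: split on runs of non-word characters with the given limit.
    static String[] splitNonWord(String s, int limit) {
        return s.split("\\W+", limit);
    }

    public static void main(String[] args) {
        // Negative limit: trailing empty strings are kept.
        System.out.println(Arrays.toString(splitNonWord("..", -1)));     // [, ]
        // Limit 0 (the default): trailing empty strings are removed, leaving nothing.
        System.out.println(Arrays.toString(splitNonWord("..", 0)));      // []
        System.out.println(Arrays.toString(splitNonWord("a..b..", -1))); // [a, b, ]
        System.out.println(Arrays.toString(splitNonWord("a..b..", 0)));  // [a, b]
        // Positive limit: at most 2 pieces, the last one holds the rest of the input.
        System.out.println(Arrays.toString(splitNonWord("a..b..", 2)));  // [a, b..]
    }
}
```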