### Getting a Set of strings from a text file

No. 182

#### Getting a text's set of words.

Q: Consider a text file `foo.txt` containing:

    A simple collection of words.
    Some words may appear multiple times.

We would like to retrieve a comma-separated list of all words contained in the file, excluding duplicates:

    of, multiple, collection, simple, words, may, Some, times, A, appear

The following rules shall apply:

- Arbitrary combinations of white space and the characters `.,:;?!"` shall be treated as word delimiters and are otherwise to be ignored.
- The order of appearance in the generated result does not matter.
- Duplicates like “words” in the current example shall show up only once in the output.

Hints:

- Your application shall read its input from a file whose name is provided as a command line argument. Provide appropriate error messages if:
  - The user enters either no argument at all or more than one command line argument.
  - The file in question cannot be read.

  You may reconsider the section called “Exercises” regarding file read access.
- Splitting input text lines at the word delimiters `.,:;?!"` or at white space characters may be achieved by means of `split(...)` and the regular expression `String regex = "[ \t\"!?.,'´:;]+";`. The “+” sign indicates a succession of one or more characters from the set ` \t\"!?.,'´:;`. Thus the text `That's it. Next try` will be split into the string array `{"That", "s", "it", "Next", "try"}`.
- Write a JUnit test which reads from a given input file and compares its result with a hard-coded set of expected strings.

A: The Maven module source code is available in the subdirectory `P/Sd1/Wordlist/Solution` below the lecture notes' source code root; see the hints regarding import. The API and the implementation can also be browsed online.
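The splitting rule from the hints can be tried in isolation before any file handling is added. The following is a minimal sketch, not the module's actual code: the class name `SplitDemo` and the method `splitWords` are made up for illustration, and the acute accent `´` is written as the escape `\u00B4` so the source compiles regardless of file encoding.

```java
import java.util.Arrays;

final class SplitDemo {

    // Delimiter set from the hints: one or more of these characters in a row.
    // \u00B4 is the acute accent character that appears in the original regex.
    static final String REGEX = "[ \t\"!?.,'\u00B4:;]+";

    // Hypothetical helper: splits a single line at the delimiters above.
    static String[] splitWords(final String line) {
        return line.split(REGEX);
    }

    public static void main(final String[] args) {
        // prints [That, s, it, Next, try]
        System.out.println(Arrays.toString(splitWords("That's it. Next try")));

        // A line starting with a delimiter yields an empty first element:
        // prints [, leading, blank]
        System.out.println(Arrays.toString(splitWords(" leading blank")));
    }
}
```

The second call shows a detail worth keeping in mind: `split` drops trailing empty strings but keeps a leading one, so empty strings should be filtered out before words are added to the result set.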
The input file `smalltest.txt` may be used to define a JUnit test:

    @Test
    public void testWordSet() throws FileNotFoundException, IOException {

        // Words expected in smalltest.txt; order is irrelevant for a Set.
        final Set<String> expectedStrings = new HashSet<String>(
            Arrays.asList(new String[]{
                "A", "simple", "collection", "of", "words",
                "Some", "may", "appear", "multiple", "times"
            }));

        final TextFileHandler tfh = new TextFileHandler("smalltest.txt");
        Assert.assertTrue(tfh.getWords().equals(expectedStrings));
    }
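The test above only fixes the constructor `TextFileHandler(String)` and the method `getWords()`; the real implementation lives in the Maven module mentioned earlier. As a rough sketch of how such a class might look, the following assumes that the file is read line by line in the constructor and that empty strings produced by `split` are discarded. The internals and the launcher class `WordList` with its messages are assumptions for illustration only.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; only the constructor and getWords() signatures
// are taken from the JUnit test.
final class TextFileHandler {

    // Delimiter set from the hints (\u00B4 is the acute accent).
    private static final String REGEX = "[ \t\"!?.,'\u00B4:;]+";

    private final Set<String> words = new HashSet<>();

    // FileReader throws FileNotFoundException (an IOException subclass)
    // if the file cannot be opened.
    TextFileHandler(final String fileName) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (final String word : line.split(REGEX)) {
                    if (!word.isEmpty()) { // skip empty leading element
                        words.add(word);   // a Set ignores duplicates
                    }
                }
            }
        }
    }

    Set<String> getWords() {
        return Collections.unmodifiableSet(words);
    }
}

// Hypothetical command line front end implementing the two error cases.
final class WordList {

    public static void main(final String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: WordList <filename> (exactly one argument expected)");
            return;
        }
        try {
            final TextFileHandler handler = new TextFileHandler(args[0]);
            System.out.println(String.join(", ", handler.getWords()));
        } catch (final IOException e) {
            System.err.println("Unable to read file '" + args[0] + "': " + e.getMessage());
        }
    }
}
```

Because the words are kept in a `HashSet`, duplicates such as “words” collapse automatically and the output order is unspecified, which matches the rules of the exercise. `FileReader` decodes with the platform default charset; a real solution may want to choose the encoding explicitly.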