So far we've extracted the set of words a given text
consists of. In addition, we'd like to know how often each
of these words appears. This frequency shall be the primary
sorting criterion of the report output.
Consider the following example text:
One day, Einstein, Newton, and Pascal meet up
and decide to play a game of hide and seek.
Einstein volunteered to be "It". As Einstein
counted, eyes closed, to 100, Pascal ran away
and hid, but Newton stood right in front of
Einstein and drew a one meter by one meter
square on the floor around himself. When
Einstein opened his eyes, he immediately saw
Newton and said "I found you Newton", but Newton
replied, "No, you found one Newton per square meter.
You found Pascal!"
Ignoring special characters and comparing words case-sensitively
(so the leading “One” is not counted as “one”), the following
result shall be created:
6: Newton
6: and
5: Einstein
3: Pascal
3: found
3: meter
3: one
3: to
2: a
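...
The first line tells us that the word “Newton” appears six
times in the analyzed document. Words of equal frequency are
listed in ascending String order, which places the capitalized
“Newton” before “and”.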
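The sample result depends on how the text is split into words. The extraction method from the earlier exercise is not shown here, so the following is only a minimal sketch under an assumption: a word is any maximal run of Unicode letters or digits (so “100” would count as a word), and case is preserved. The class name `Tokenizer` and the method `words` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits text into words, dropping special characters. */
public class Tokenizer {

    /**
     * Assumption: a word is a maximal run of letters or digits.
     * Case is preserved, so "One" and "one" are different words.
     */
    public static List<String> words(final String text) {
        final List<String> result = new ArrayList<>();
        for (final String token : text.split("[^\\p{L}\\p{N}]+")) {
            // split() yields a leading "" when the text starts with a separator
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Einstein volunteered to be \"It\". As Einstein"));
    }
}
```

Running `main` prints `[Einstein, volunteered, to, be, It, As, Einstein]`.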
Hints:

- Define a class WordFrequency containing a String
  attribute along with an integer number representing its
  frequency of appearance:
/**
 * A helper class to account for frequencies of words found in textual input.
 */
public class WordFrequency {
    /**
     * The word whose frequency of appearance is being counted.
     */
    public final String word;

    /** Number of occurrences of this word found so far. */
    private int frequency;

    // ...
}
Two instances of WordFrequency shall be equal if and
only if their “word” attribute values are equal,
regardless of their frequency values. In other words:
equality of WordFrequency instances is determined
solely by their word values; frequencies are ignored.
Override equals(...) and hashCode() accordingly, so
that hashCode() depends on the word only as well.
- Create a List<WordFrequency> (not a
  Set<WordFrequency>!) holding the words found in
  your input texts along with their frequencies of
  appearance.
  Process each input word as follows:
  - Create a corresponding instance of
    WordFrequency from it with an
    initial frequency of 1.
  - Check whether an equal instance has already
    been added to your List<WordFrequency>.
    There are two cases:
    - The current word already exists:
      look up the existing entry and increment its
      frequency by one.
    - The current word is new:
      add the previously created WordFrequency
      instance to your List<WordFrequency>.
- After processing the input text file, sort your
  List<WordFrequency> by means of Collections.sort(...)
  using a suitable Comparator<WordFrequency> instance.
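Putting the hints together, the following is one possible sketch, not a reference solution. It makes several assumptions:

- The constructor, `getFrequency()`, `incrementFrequency()`, the static `count(...)` driver and the `toString()` format are hypothetical names chosen here.
- The secondary sort key (ascending String order) is inferred from the sample output.
- The words are assumed to arrive as an already tokenized `List<String>`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

/** Sketch: counts words in a List (not a Set); equality depends on the word only. */
public class WordFrequency {

    /** The word whose frequency of appearance is being counted. */
    public final String word;
    private int frequency;

    /** Hypothetical constructor: a freshly seen word has frequency 1. */
    public WordFrequency(final String word) {
        this.word = Objects.requireNonNull(word);
        this.frequency = 1;
    }

    public int getFrequency() { return frequency; }

    public void incrementFrequency() { frequency++; }

    /** Equal if and only if the words are equal; frequency is ignored. */
    @Override
    public boolean equals(final Object other) {
        if (this == other) return true;
        if (!(other instanceof WordFrequency)) return false;
        return word.equals(((WordFrequency) other).word);
    }

    /** Consistent with equals(...): derived from the word only. */
    @Override
    public int hashCode() { return word.hashCode(); }

    /** Hypothetical report format matching the sample, e.g. "6: Newton". */
    @Override
    public String toString() { return frequency + ": " + word; }

    /** Hypothetical driver following the hints step by step. */
    public static List<WordFrequency> count(final List<String> words) {
        final List<WordFrequency> frequencies = new ArrayList<>();
        for (final String w : words) {
            final WordFrequency candidate = new WordFrequency(w);
            final int index = frequencies.indexOf(candidate); // relies on equals(...)
            if (index >= 0) {
                frequencies.get(index).incrementFrequency(); // word already exists
            } else {
                frequencies.add(candidate); // new word, frequency 1
            }
        }
        // Primary key: descending frequency.
        // Secondary key (assumed from the sample output): ascending String order.
        final Comparator<WordFrequency> byFrequencyThenWord =
            Comparator.comparingInt(WordFrequency::getFrequency).reversed()
                      .thenComparing(wf -> wf.word);
        Collections.sort(frequencies, byFrequencyThenWord);
        return frequencies;
    }

    public static void main(String[] args) {
        final List<String> words = List.of("and", "Newton", "and", "Pascal", "Newton", "and");
        for (final WordFrequency wf : count(words)) {
            System.out.println(wf);
        }
    }
}
```

Running `main` prints `3: and`, `2: Newton` and `1: Pascal` on separate lines.

`indexOf(...)` works here only because `equals(...)` ignores the frequency. This is the reason the hints insist on word-only equality. Note that each lookup scans the whole list, so a large input would run noticeably slower than with a map.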