Poor man's UNIX grep, Summer 2016

The UNIX operating system provides a command grep which allows for retrieving occurrences of a given string in text files. We consider an example text file input.txt containing four lines:

Roses are nice flowers.
Red wine is tasty
The red cross acts worldwide
Mayflower used to be a ship.

We search input.txt for the string flower, which occurs in lines 1 and 4:

> grep flower input.txt 
Roses are nice flowers.
Mayflower used to be a ship.

Thus the grep command echoes all lines containing the search string to standard output. Adding the command line option -i makes the search case-insensitive:

> grep -i red input.txt
Red wine is tasty
The red cross acts worldwide

This time all possible variants like Red, red, RED and so on will match.
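
The matching rule may be sketched in Java as follows. This is a minimal sketch assuming a plain substring search rather than grep's regular expressions; the names LineMatcher and matches are hypothetical and not part of any project skeleton:

```java
import java.util.Locale;

final class LineMatcher {

    /**
     * Tests whether a line contains the search string.
     *
     * @param ignoreCase true mimics grep's -i option
     */
    static boolean matches(String line, String search, boolean ignoreCase) {
        if (ignoreCase) {
            // Normalise both sides so that Red, red and RED compare equal.
            return line.toLowerCase(Locale.ROOT).contains(search.toLowerCase(Locale.ROOT));
        }
        return line.contains(search);
    }

    public static void main(String[] args) {
        final String[] lines = {
            "Roses are nice flowers.",
            "Red wine is tasty",
            "The red cross acts worldwide",
            "Mayflower used to be a ship."
        };
        // Case-insensitive search for "red" among the lines of input.txt
        for (final String line : lines) {
            if (matches(line, "red", true)) {
                System.out.println(line);
            }
        }
    }
}
```

Lower-casing with Locale.ROOT keeps the comparison independent of the machine's default locale.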

grep also allows for searching multiple files. Consider a second file inputSecond.txt:

Errors will show up in red.
Let's start bug fixing

We may search both files for case-insensitive (-i again) occurrences of red. Each matching line is then prefixed by the name of the file containing it:

> grep -i red input.txt  inputSecond.txt 
input.txt:Red wine is tasty
input.txt:The red cross acts worldwide
inputSecond.txt:Errors will show up in red.
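
Prefixing hits with their file name could look like the following sketch. It assumes a case-insensitive substring search and UTF-8 encoded files; MultiFileSearch and matchingLines are hypothetical names chosen for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class MultiFileSearch {

    /** Collects all lines containing search (ignoring case), each preceded by prefix. */
    static List<String> matchingLines(Reader input, String search, String prefix) throws IOException {
        final String needle = search.toLowerCase(Locale.ROOT);
        final List<String> hits = new ArrayList<>();
        final BufferedReader reader = new BufferedReader(input); // closed by the caller via input
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(prefix + line);
            }
        }
        return hits;
    }

    /** Hypothetical invocation: java MultiFileSearch searchString file1 [file2 ...] */
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java MultiFileSearch searchString file1 [file2 ...]");
            return;
        }
        // Label the hits only when more than one file is being searched.
        final boolean showFileName = args.length > 2;
        for (int i = 1; i < args.length; i++) {
            final String prefix = showFileName ? args[i] + ":" : "";
            try (Reader file = Files.newBufferedReader(Paths.get(args[i]), StandardCharsets.UTF_8)) {
                for (final String hit : matchingLines(file, args[0], prefix)) {
                    System.out.println(hit);
                }
            }
        }
    }
}
```

Passing a Reader rather than a file name keeps matchingLines usable for files, standard input and in-memory test data alike.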

Finally the -l option suppresses the matching lines themselves and shows only the names of files containing at least one match:

> grep -l Red input.txt  inputSecond.txt 
input.txt

In contrast, a case-insensitive search combining the -i and -l options yields:

> grep -i -l Red input.txt  inputSecond.txt 
input.txt
inputSecond.txt
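
A command line like the one above has to be split into options, search string and file names. The following is a minimal sketch, assuming that options always precede the search string and that combined forms such as -il need not be supported; the class GrepOptions and its fields are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GrepOptions {
    final boolean ignoreCase;     // -i was given
    final boolean fileNamesOnly;  // -l was given
    final String searchString;
    final List<String> fileNames; // empty: no files given on the command line

    private GrepOptions(boolean ignoreCase, boolean fileNamesOnly,
                        String searchString, List<String> fileNames) {
        this.ignoreCase = ignoreCase;
        this.fileNamesOnly = fileNamesOnly;
        this.searchString = searchString;
        this.fileNames = fileNames;
    }

    /** Parses arguments of the form [-i] [-l] searchString [file 1] [file 2] ... */
    static GrepOptions parse(String[] args) {
        boolean ignoreCase = false;
        boolean fileNamesOnly = false;
        int index = 0;
        // Leading arguments starting with '-' are treated as options.
        while (index < args.length && args[index].startsWith("-")) {
            switch (args[index]) {
                case "-i": ignoreCase = true; break;
                case "-l": fileNamesOnly = true; break;
                default: throw new IllegalArgumentException("Unknown option " + args[index]);
            }
            index++;
        }
        if (index == args.length) {
            throw new IllegalArgumentException("No search string given");
        }
        final String searchString = args[index++];
        // Everything after the search string is a file name.
        final List<String> fileNames =
            new ArrayList<>(Arrays.asList(args).subList(index, args.length));
        return new GrepOptions(ignoreCase, fileNamesOnly, searchString, fileNames);
    }

    public static void main(String[] args) {
        final GrepOptions options =
            parse(new String[] {"-i", "-l", "Red", "input.txt", "inputSecond.txt"});
        System.out.println(options.searchString + " " + options.fileNames);
    }
}
```

One consequence of this simplification is that a search string starting with - cannot be expressed.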

When no file name is given, grep reads from standard input, which allows for pipes: one command's output becomes the next command's input. As an example consider a recursive search for HTML files using the find command:

> find . -name \*.html
./Sd1/Wc/wc/Testdata/input.html
./Sda1/rdbmsXml2Html/TestData/climbingprice.html
./Sda1/NoCast/src/main/resources/gallery.html
./Sda1/Jdom/Html2Html/src/main/resources/imageExampleNew.html
./Sda1/Jdom/Html2Html/src/main/resources/imageExample.html
./Sda1/VerifyImgAccess/fileextref.html

We want to restrict the above list to pathnames containing the string Example. This may be achieved by piping the find command's output as input to grep searching for the occurrence of the string Example. Technically both processes get connected by means of the pipe symbol |:

> find . -name \*.html|grep Example
./Sda1/Jdom/Html2Html/src/main/resources/imageExampleNew.html
./Sda1/Jdom/Html2Html/src/main/resources/imageExample.html
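
The receiving end of such a pipe may be sketched in Java as a filter copying matching lines from an input stream to an output stream. This is only an illustration under the assumption of a case-sensitive substring search; StdinFilter and filter are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;

final class StdinFilter {

    /** Copies every line of in containing search to out. */
    static void filter(InputStream in, PrintStream out, String search) throws IOException {
        final BufferedReader source = new BufferedReader(new InputStreamReader(in));
        String line;
        while ((line = source.readLine()) != null) { // null signals end of input
            if (line.contains(search)) {
                out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java StdinFilter searchString");
            return;
        }
        // When used in a pipe, System.in delivers the preceding command's output.
        filter(System.in, System.out, args[0]);
    }
}
```

Assuming the class has been compiled in the current directory, it could replace grep in the pipe above: find . -name \*.html | java StdinFilter Example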

Tip

  1. Read about reading from files by using instances of java.io.BufferedReader.

  2. Reading from standard input may be achieved by:

    final BufferedReader source = new BufferedReader(new InputStreamReader(System.in));
    ...

  3. You may create an executable jar archive using Maven. Starting from the mi-maven-archetype-quickstart archetype, your pom.xml already contains a blueprint. Just insert the name of your class containing the main(...) entry method (de.hdm_stuttgart.mi.sd1.grep.Grep in the current example) accordingly:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.4.1</version>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <manifestEntries>
              <Main-Class>de.hdm_stuttgart.mi.sd1.grep.Grep</Main-Class>
            </manifestEntries>
          </transformer>
        </transformers>
      </configuration>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>

    Running mvn install will create an executable jar file such as ~/.m2/repository/de/hdm-stuttgart/mi/sd1/grep/0.9/grep-0.9.jar with ~ denoting your home directory:

    > mvn install
    [INFO] Scanning for projects...
    [INFO]                                                                         
    [INFO] ------------------------------------------------------------------------
    [INFO] Building grep 0.9
    [INFO] ------------------------------------------------------------------------
    
    ...
    -------------------------------------------------------
     T E S T S
    -------------------------------------------------------
    Running de.hdm_stuttgart.mi.sd1.grep.CommandLineTest
    Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.757 sec
    
    Results :
    
    Tests run: 5, Failures: 0, Errors: 0, Skipped: 0
    
    ...
    [INFO] Installing /home/goik/workspace/sd-project-summer/grep/target/grep-0.9.jar to 
         /home/goik/.m2/repository/de/hdm-stuttgart/mi/sd1/grep/0.9/grep-0.9.jar
    ...
    

    Due to our <Main-Class>de.hdm_stuttgart.mi.sd1.grep.Grep</Main-Class> declaration in pom.xml this jar file is executable:

    > java -jar ~/.m2/repository/de/hdm-stuttgart/mi/sd1/grep/0.9/grep-0.9.jar
    No search string given
    Usage: grep [-i] [-l] searchString [file 1] [file 2] ...

    There are further simplification steps:

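    1. On systems configured to run jar archives directly (e.g. Linux with a binfmt_misc handler for jar files), making the jar file executable using chmod allows for omitting the java command: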

      > chmod +x ~/.m2/repository/de/hdm-stuttgart/mi/sd1/grep/0.9/grep-0.9.jar
      > ~/.m2/repository/de/hdm-stuttgart/mi/sd1/grep/0.9/grep-0.9.jar
      No search string given
      Usage: grep [-i] [-l] searchString [file 1] [file 2] ...

      Notice ~ representing a user's home directory.

    2. We may copy the jar archive to a standard location containing executable commands:

      > mkdir ~/bin
      > 
      > cp ~/.m2/repository/de/hdm-stuttgart/mi/sd1/grep/0.9/grep-0.9.jar ~/bin/jgrep
      > 
      > ~/bin/jgrep 
      No search string given
      Usage: grep [-i] [-l] searchString [file 1] [file 2] ...

    3. We may add this directory to the set of directories searched by the operating system's command line interpreter for executable commands. This is achieved by creating or modifying the file ~/.profile in the user's home directory using a text editor. ~/.profile should contain:

      PATH="$HOME/bin:$PATH"

      After logging out and back in, your PATH environment variable should contain your ~/bin component:

      > echo $PATH
      /home/goik/bin:/usr/local/sbin:/usr/local/bin:/usr/...

      You should now be able to call jgrep from arbitrary filesystem locations:

      > cd Desktop/
      > cat Testdata/input.txt | jgrep red
      The red cross acts worldwide

  4. Testing requires capturing the output generated by e.g. System.out.println(...) calls. Consider the following code writing the string Hello World! to standard output:

    public class App {
        /**
         * @param args Unused
         */
        public static void main( String[] args ) {
            System.out.print( "Hello World!" );
        }
    }

    We want to set up a JUnit test which captures the output and compares it with the expected string value "Hello World!". Following http://stackoverflow.com/questions/1119385/junit-test-for-system-out-println we redirect the standard output stream to a private instance of java.io.ByteArrayOutputStream. Due to JUnit's @Before and @After annotations this instance replaces System.out during our tests:

    import java.io.ByteArrayOutputStream;
    import java.io.PrintStream;
    
    import org.junit.After;
    import org.junit.Assert;
    import org.junit.Before;
    import org.junit.Test;
    
    /**
     * Unit test for simple App.
     */
    public class AppTest {
        private final ByteArrayOutputStream outContent = new ByteArrayOutputStream();
        private final PrintStream originalOut = System.out; // Saved to be restored after each test

        @Before
        public void setUpStreams() {
            System.setOut(new PrintStream(outContent));
        }

        @After
        public void cleanUpStreams() {
            System.setOut(originalOut); // Restore the real standard output instead of null
            outContent.reset();
        }

        /**
         * Test method accessing output generated by System.out.print(...) calls.
         */
        @Test
        public void testApp() {
            App.main(new String[]{}); // Calling main() method printing "Hello World!"
            Assert.assertEquals("Hello World!", outContent.toString());
        }
    }
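
The capturing idea may also be packaged as a small helper which does not depend on JUnit. The following is a minimal sketch; OutputCapture and capture are hypothetical names, and the helper assumes that the code under test writes via System.out only:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

final class OutputCapture {

    /** Runs action while System.out is redirected and returns everything it printed. */
    static String capture(Runnable action) {
        final PrintStream original = System.out;
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true));
        try {
            action.run();
        } finally {
            System.setOut(original); // Restore standard output even if action throws
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        final String printed = capture(() -> System.out.print("Hello World!"));
        System.out.println("Captured: " + printed);
    }
}
```

Restoring the stream in a finally block ensures that a failing action does not leave later output redirected.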