Getting to know Groovy

The Groovy JVM scripting language has been around for many years now, but I never really had much interest in trying it. I finally read a bit more about it and watched a presentation, and I wanted to try it out myself by parsing a table on an HTML page and printing the output. Very little code was required, and the syntax was fairly familiar from Java. I used Groovy/Grails Tool Suite as my IDE, since it had better code completion than MyEclipse 10.7.1.

Here’s the “final” code for the test:

@Grab(group='org.ccil.cowan.tagsoup', module='tagsoup', version='1.2' )
def tagsoupParser = new org.ccil.cowan.tagsoup.Parser()
def slurper = new XmlSlurper(tagsoupParser)
def htmlParser = slurper.parse("data.html")
def myTable = htmlParser.'**'.find{ it.@class == 'my_div_class'}.'**'.find{ it.@class == 'my_table_class' }
myTable.tr.each { row ->
    println "${row.td[0]} ${row.td[3]} ${row.td[2]}"
}

Lines 1-3 grab the TagSoup package, which can parse HTML with missing end tags and similar defects, and create a parser. Line 4 loads and parses the HTML file. Line 5 extracts the table we are looking for by finding the element with the class “my_div_class” and, inside it, the table with the class “my_table_class”. Line 6 loops over all the rows in the table, passing each row to a closure that, on line 7, prints the first, fourth and third cells in that order. And that’s it!

Here’s roughly the same logic in Java, using the jsoup library (note that the file path and class names differ from the Groovy example):

import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
 
public class HtmlParser {
    public static void main(String[] args) {
        File htmlFile = new File("src/main/java/ama/test/mavenstuff/data.html");
        try {
            Document doc = Jsoup.parse(htmlFile, null);
            Element tableElement = doc.getElementsByClass("module").get(0).getElementsByClass("table_stockexchange").get(0);
            Elements tableRows = tableElement.select("tr");
            for (Element row : tableRows) {
                // Select the cells once per row instead of three times
                Elements cells = row.select("td");
                System.out.println(cells.get(0).text()
                    + " " + cells.get(3).text()
                    + " " + cells.get(2).text()
                );
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
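
One thing to watch for in the Java version: get(3) throws an IndexOutOfBoundsException for any row with fewer than four td cells, for example a header row built from th cells. Below is a minimal sketch of a guard, assuming the text of each row's td cells has already been collected into a List<String>. The RowFormatter class, the formatRow helper and the sample rows are made up for illustration and are not part of the original test. It uses only the standard library (Java 9+ for List.of).

```java
import java.util.List;
import java.util.Optional;

public class RowFormatter {
    // Hypothetical helper: returns the first, fourth and third cell texts
    // joined by spaces, or Optional.empty() when the row has fewer than four cells.
    public static Optional<String> formatRow(List<String> cells) {
        if (cells.size() < 4) {
            return Optional.empty();
        }
        return Optional.of(cells.get(0) + " " + cells.get(3) + " " + cells.get(2));
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for the td texts of each table row.
        List<List<String>> rows = List.of(
            List.of(),                                 // header row: only th cells, so no td texts
            List.of("ACME", "10.5", "+0.3", "12:00"),  // regular four-cell row
            List.of("short", "row"));                  // malformed row, skipped
        for (List<String> cells : rows) {
            formatRow(cells).ifPresent(System.out::println);
        }
        // prints "ACME 12:00 +0.3"
    }
}
```

In the jsoup loop, the same check would amount to testing the size of the selected td elements before calling get(3) and skipping the row when it is too small.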

Posted in Java.
