1 minute read

Software Package for Thesis Generic Relation Extraction

This package includes the following software tools:

  • RelationExtractor
  • PatternInspector
  • PatternClassifier
  • OpenCalaisClient


Is a command line tool, used to access the OpenCalais API for extracting entities and relations from articles. The result of this task forms our reference set. Its main class is


and does not have any parameters. Database connection is hardcoded.


Our prototype is a command line tool for extracting entities and relation by using StanfordNLP package. Its main class is


and can take two parameters:

-nXXX...the number of articles that should be processed
-firstXXX...the first article_id, the prototype should start with

If no parameters are provided, this tool processes all articles starting with the lowest article_id. Tested with Java VM argument -Xmx5000m.


This is the only GUI tool in this project and is used to identify good and bad patterns manually. It allows the user to search for patterns or concrete examples represented by a pattern. Furthermore, inapropriate patterns can be disabled. The main class of the inspector is


and does not use any parameters. Tested with Java VM argument -Xmx2000m.


Finally, the pattern classifier command line tool works on manually tagged patterns to learn how good and bad patterns look. Afterwards it tags remaining patterns to improve the precision of our prototype. The main class is


and does not use any parameters.