GenericRelationExtractionSoftwarePackage

1 minute read

Software Package for Thesis Generic Relation Extraction

This package includes the following software tools:

  • RelationExtractor
  • PatternInspector
  • PatternClassifier
  • OpenCalaisClient

OpenCalaisClient

Is a command line tool, used to access the OpenCalais API for extracting entities and relations from articles. The result of this task forms our reference set. Its main class is

weber.test.openCalais.TestOpenCalais

and does not have any parameters. Database connection is hardcoded.

RelationExtractor

Our prototype is a command line tool for extracting entities and relation by using StanfordNLP package. Its main class is

wu.j0125536.extraction.RelationExtraction

and can take two parameters:

-nXXX...the number of articles that should be processed
-firstXXX...the first article_id, the prototype should start with

If no parameters are provided, this tool processes all articles starting with the lowest article_id. Tested with Java VM argument -Xmx5000m.

PatternInspector

This is the only GUI tool in this project and is used to identify good and bad patterns manually. It allows the user to search for patterns or concrete examples represented by a pattern. Furthermore, inapropriate patterns can be disabled. The main class of the inspector is

wu.j0125536.inspector.PatternInspector

and does not use any parameters. Tested with Java VM argument -Xmx2000m.

PatternClassifier

Finally, the pattern classifier command line tool works on manually tagged patterns to learn how good and bad patterns look. Afterwards it tags remaining patterns to improve the precision of our prototype. The main class is

wu.j0125536.classify.PatternClassifier

and does not use any parameters.

Updated: