Information Extraction - Sequential Probability Models

Table of Contents / To Do

  • Introduction
  • Information Extraction (IE)
    • Definition
    • Distinction between Information Extraction, Information Retrieval, Data Mining, Text Mining
  • Methodology
    • Present and compare different IE approaches
  • Sequential Probability Models
    • Theory and application to IE in combination with trigger phrases (see the sketch after this list)
    • Extensions of Trigger Phrases (Meronymic Relations, …)
  • Implementation
    • Identify weaknesses in the current trigger phrase implementation
    • Apply and implement a sequential probability model
    • Evaluation
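
For orientation on the theory item above: the sequential probability models used by taggers such as Trigrams’n’Tags are second-order hidden Markov models over tag sequences. A minimal sketch of the underlying factorization (my notation, not taken verbatim from the cited papers), selecting the most probable tag sequence for an observed token sequence w_1 … w_n:

    \hat{t}_1^n = \arg\max_{t_1 \dots t_n} \prod_{i=1}^{n} P(t_i \mid t_{i-1}, t_{i-2}) \, P(w_i \mid t_i)

For IE, the same machinery can, for example, label tokens with relation-argument tags instead of plain part-of-speech tags, so that trigger phrases only propose candidate sentences and the model decides which tokens fill the relation slots.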

Practical points

  • Identify weaknesses in the current trigger phrase implementation (an illustrative baseline sketch follows below this list)
  • Make use of sequential probability models
  • Evaluation of the new approach
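
As announced in the first point, here is a minimal, purely hypothetical sketch of a pattern-based trigger-phrase extractor built around a single Hearst-style “such as” pattern. It is not the implementation from the repository; it only illustrates the typical weaknesses of such a baseline (single-word arguments, no disambiguation, no use of context) that a sequential probability model is meant to address:

    # Hypothetical baseline: one Hearst-style trigger phrase ("X such as Y, Z and W"),
    # matched with a plain regular expression. Not the repository implementation.
    import re

    SUCH_AS = re.compile(
        r"(?P<hypernym>\w+)\s+such\s+as\s+"    # naive: single-word hypernym only
        r"(?P<hyponyms>\w+(?:\s*,\s*\w+)*"     # "Y, Z, ..."
        r"(?:\s*,?\s*(?:and|or)\s+\w+)?)",     # optional "and/or W"
        re.IGNORECASE,
    )

    def extract_hyponym_pairs(sentence):
        """Return (hyponym, hypernym) pairs found by the 'such as' trigger phrase."""
        pairs = []
        for match in SUCH_AS.finditer(sentence):
            hypernym = match.group("hypernym")
            hyponyms = re.split(r"\s*,\s*|\s+and\s+|\s+or\s+", match.group("hyponyms"))
            pairs.extend((h, hypernym) for h in hyponyms if h)
        return pairs

    if __name__ == "__main__":
        print(extract_hyponym_pairs(
            "The gallery shows artists such as Monet, Degas and Renoir."))
        # -> [('Monet', 'artists'), ('Degas', 'artists'), ('Renoir', 'artists')]

Because the pattern can neither recognise multi-word noun phrases nor reject spurious matches, a tagger-style sequential model over the matched sentences is a natural next step.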

Literature

  • Weiss, S.: “Text Mining - Predictive Methods for Analyzing Unstructured Information”
  • Hearst, M. A.: “Automatic Acquisition of Hyponyms from Large Text Corpora”
  • Berland, M. and Charniak, E.: “Finding Parts in Very Large Corpora”
  • Grefenstette, G. and Hearst, M. A.: “A Method for Refining Automatically-Discovered Lexical Relations: Combining Weak Techniques for Stronger Results”
  • Litz, B., Langer, H. and Malaka, R.: “Trigrams’n’Tags for Lexical Knowledge Acquisition”, Proceedings of the First International Conference on Knowledge Mining and Information Retrieval (KDIR 2009), Madeira, Portugal
  • Cunningham, H.: “Software Architecture for Language Engineering”
  • Additional literature covering Hearst patterns

Student profile

  • good programming skills

Source

The sources can be checked out from the SVN repository:

https://svn.semanticlab.net/svn/oss/thesis/information-extraction/infoex-workspace
