Information Extraction - Sequential Probability Models

Table of Contents / To Do

  • Introduction
  • Information Extraction (IE)
    • Definition
    • Distinction between Information Extraction, Information Retrieval, Data Mining, Text Mining
  • Methodology
    • Present and compare different IE approaches
  • Sequential Probability Models
    • Theory and application to IE in combination with trigger phrases (see the sketch after this list)
    • Extensions of Trigger Phrases (Meronymic Relations, …)
  • Implementation
    • Identify weaknesses in the current trigger phrase implementation
    • Apply and implement a sequential probability model
    • Evaluation
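
For orientation on the theory item above: the sequential probability models used by taggers such as Trigrams’n’Tags are second-order hidden Markov models over tag sequences. A minimal sketch of the underlying factorization (my notation, not taken verbatim from the cited papers), selecting the most probable tag sequence for an observed token sequence w_1 … w_n:

    \hat{t}_1^n = \arg\max_{t_1 \dots t_n} \prod_{i=1}^{n} P(t_i \mid t_{i-1}, t_{i-2}) \, P(w_i \mid t_i)

For IE, the same machinery can, for example, label tokens with relation-argument tags instead of plain part-of-speech tags, so that trigger phrases only propose candidate sentences and the model decides which tokens fill the relation slots.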

Practical points

  • Identify weaknesses in the current trigger phrase implementation (an illustrative baseline sketch follows below this list)
  • Make use of sequential probability models
  • Evaluation of the new approach
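
As announced in the first point, here is a minimal, purely hypothetical sketch of a pattern-based trigger-phrase extractor built around a single Hearst-style “such as” pattern. It is not the implementation from the repository; it only illustrates the typical weaknesses of such a baseline (single-word arguments, no disambiguation, no use of context) that a sequential probability model is meant to address:

    # Hypothetical baseline: one Hearst-style trigger phrase ("X such as Y, Z and W"),
    # matched with a plain regular expression. Not the repository implementation.
    import re

    SUCH_AS = re.compile(
        r"(?P<hypernym>\w+)\s+such\s+as\s+"    # naive: single-word hypernym only
        r"(?P<hyponyms>\w+(?:\s*,\s*\w+)*"     # "Y, Z, ..."
        r"(?:\s*,?\s*(?:and|or)\s+\w+)?)",     # optional "and/or W"
        re.IGNORECASE,
    )

    def extract_hyponym_pairs(sentence):
        """Return (hyponym, hypernym) pairs found by the 'such as' trigger phrase."""
        pairs = []
        for match in SUCH_AS.finditer(sentence):
            hypernym = match.group("hypernym")
            hyponyms = re.split(r"\s*,\s*|\s+and\s+|\s+or\s+", match.group("hyponyms"))
            pairs.extend((h, hypernym) for h in hyponyms if h)
        return pairs

    if __name__ == "__main__":
        print(extract_hyponym_pairs(
            "The gallery shows artists such as Monet, Degas and Renoir."))
        # -> [('Monet', 'artists'), ('Degas', 'artists'), ('Renoir', 'artists')]

Because the pattern can neither recognise multi-word noun phrases nor reject spurious matches, a tagger-style sequential model over the matched sentences is a natural next step.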

Literature

  • Weiss, S.: “Text Mining - Predictive Methods for Analyzing Unstructured Information”
  • Hearst, M. A.: “Automatic Acquisition of Hyponyms from Large Text Corpora”
  • Berland, M. and Charniak, E.: “Finding Parts in Very Large Corpora”
  • Grefenstette, G. and Hearst, M. A.: “A Method for Refining Automatically-Discovered Lexical Relations: Combining Weak Techniques for Stronger Results”
  • Litz, B., Langer, H. and Malaka, R.: “Trigrams’n’Tags for Lexical Knowledge Acquisition”, Proceedings of the First International Conference on Knowledge Mining and Information Retrieval (KDIR 2009), Madeira, Portugal
  • Cunningham, H.: “Software Architecture for Language Engineering”
  • Additional literature covering Hearst patterns

Student profile

  • good programming skills

Source

The sources can be checked out from the SVN repository:

https://svn.semanticlab.net/svn/oss/thesis/information-extraction/infoex-workspace
