Information Extraction - Sequential Probability Models
Table of Contents / To Do
- Introduction
- Information Extraction (IE)
  - Definition
  - Distinction between Information Extraction, Information Retrieval, Data Mining, and Text Mining
- Methodology
  - Present and compare different IE approaches
  - Sequential Probability Models
    - Theory and application to IE in combination with trigger phrases (a pattern sketch follows this outline)
    - Extensions of trigger phrases (meronymic relations, …)
- Implementation
  - Identify weaknesses in the current trigger phrase implementation
  - Apply and implement a sequential probability model
  - Evaluation
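
As a point of reference for the trigger phrase items above, the following minimal sketch shows how a single Hearst-style trigger phrase (the “such as” hyponym pattern from the Hearst reference in the literature list) can be matched. The Python code, the function name extract_hyponyms, the regular expression details, and the example sentence are illustrative assumptions and not part of the existing implementation:

import re

# Hearst-style trigger phrase: "<hypernym> such as <hyponym> (, <hyponym>)* ((and|or) <hyponym>)?"
# For simplicity only single-word noun phrases are matched (see the note below the code).
SUCH_AS = re.compile(
    r"(?P<hypernym>\w+)\s+such\s+as\s+"
    r"(?P<hyponyms>\w+(?:\s*,\s*\w+)*(?:\s*,?\s*(?:and|or)\s+\w+)?)"
)

def extract_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs found via the 'such as' trigger phrase."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group("hypernym")
        hyponyms = re.split(r"\s*,\s*|\s+(?:and|or)\s+", match.group("hyponyms"))
        pairs.extend((hypo, hypernym) for hypo in hyponyms if hypo)
    return pairs

print(extract_hyponyms("The museum shows works by artists such as Picasso, Klee and Miro."))
# -> [('Picasso', 'artists'), ('Klee', 'artists'), ('Miro', 'artists')]

The restriction to single-word noun phrases is deliberate: it illustrates the kind of weakness (unreliable phrase boundaries, no use of surrounding context) that motivates combining such patterns with a sequential probability model.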
Practical points
- Identify weaknesses in the current trigger phrase implementation
- Make use of sequential probability models (see the sketch after this list)
- Evaluate the new approach
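
To make the second point above concrete, here is a minimal sketch of a first-order sequential probability model: an HMM-style tagger decoded with the Viterbi algorithm. The tag set, all probabilities, and the emission heuristic are invented for illustration; in a real system they would be estimated from an annotated corpus, e.g. along the lines of the Trigrams’n’Tags reference listed under Literature:

import math

TAGS = ["O", "HYPER", "HYPO"]  # outside / hypernym candidate / hyponym candidate

# Toy model parameters as log probabilities; all values are invented.
START = {"O": math.log(0.8), "HYPER": math.log(0.1), "HYPO": math.log(0.1)}
TRANS = {
    ("O", "O"): math.log(0.7), ("O", "HYPER"): math.log(0.2), ("O", "HYPO"): math.log(0.1),
    ("HYPER", "O"): math.log(0.6), ("HYPER", "HYPER"): math.log(0.2), ("HYPER", "HYPO"): math.log(0.2),
    ("HYPO", "O"): math.log(0.5), ("HYPO", "HYPER"): math.log(0.1), ("HYPO", "HYPO"): math.log(0.4),
}

def emission(tag, token):
    """Invented emission scores: trigger words favour O, capitalised tokens favour HYPO."""
    if token.lower() in {"such", "as"}:
        return math.log(0.9) if tag == "O" else math.log(0.05)
    if token[0].isupper():
        return math.log(0.6) if tag == "HYPO" else math.log(0.2)
    return math.log(0.1) if tag == "HYPO" else math.log(0.45)

def viterbi(tokens):
    """Return the most probable tag sequence under the toy model."""
    # best[t][tag] = (log score of best path ending in tag at position t, previous tag)
    best = [{tag: (START[tag] + emission(tag, tokens[0]), None) for tag in TAGS}]
    for t in range(1, len(tokens)):
        column = {}
        for tag in TAGS:
            score, prev = max(
                (best[t - 1][p][0] + TRANS[(p, tag)] + emission(tag, tokens[t]), p)
                for p in TAGS
            )
            column[tag] = (score, prev)
        best.append(column)
    # Follow the back pointers from the best final tag.
    tag = max(TAGS, key=lambda k: best[-1][k][0])
    path = [tag]
    for t in range(len(tokens) - 1, 0, -1):
        tag = best[t][tag][1]
        path.append(tag)
    return list(reversed(path))

tokens = "artists such as Picasso and Klee".split()
print(list(zip(tokens, viterbi(tokens))))

Unlike a bare pattern match, such a model scores complete tag sequences and can therefore weigh contextual evidence on both sides of a trigger phrase probabilistically; the concrete choice of model is left to the thesis.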
Literature
- Weiss, S. M. et al.: “Text Mining: Predictive Methods for Analyzing Unstructured Information”
- Hearst, M. A.: “Automatic Acquisition of Hyponyms from Large Text Corpora”
- Berland, M. and Charniak, E.: “Finding Parts in Very Large Corpora”
- Grefenstette, G. and Hearst, M. A.: “A Method for Refining Automatically-Discovered Lexical Relations: Combining Weak Techniques for Stronger Results”
- Litz, B., Langer, H., and Malaka, R.: “Trigrams’n’Tags for Lexical Knowledge Acquisition”, Proceedings of the First International Conference on Knowledge Discovery and Information Retrieval (KDIR 2009), Madeira, Portugal
- Cunningham, H.: “Software Architecture for Language Engineering”
- Literature covering Hearst patterns
Student profile
- Good programming skills
Source
The sources can be checked out from the SVN repository:
https://svn.semanticlab.net/svn/oss/thesis/information-extraction/infoex-workspace