Evaluating popular Information Retrieval Models


Introduction

Retrieval models rank documents by their relevance to a search query q. Popular models such as the Vector Space Model (VSM) and Explicit Semantic Analysis (ESA) are widely used in information retrieval, text mining, and natural language processing.
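To make the ranking idea concrete, the following is a minimal sketch of a VSM-style ranking, assuming raw term-frequency weights and cosine similarity. The class name `VsmSketch` and its methods are hypothetical and chosen for illustration only. They are not part of any existing library, and a real implementation would typically use tf-idf weights, a proper tokenizer, and an inverted index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not an existing API.
class VsmSketch {

    // Builds a raw term-frequency vector from lowercased, whitespace-separated tokens.
    static Map<String, Double> termFrequencies(String text) {
        Map<String, Double> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1.0, Double::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity of two sparse vectors; 0 if either vector is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        double lengths = norm(a) * norm(b);
        return lengths == 0.0 ? 0.0 : dot / lengths;
    }

    // Returns document indices ordered by descending similarity to the query.
    static List<Integer> rank(String query, List<String> docs) {
        Map<String, Double> q = termFrequencies(query);
        double[] scores = new double[docs.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            scores[i] = cosine(q, termFrequencies(docs.get(i)));
            order.add(i);
        }
        order.sort((i, j) -> Double.compare(scores[j], scores[i]));
        return order;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "vector space model for retrieval",
            "wikipedia concepts for semantic analysis",
            "cooking pasta at home");
        // The second document shares both query terms and is ranked first.
        System.out.println(rank("semantic analysis", docs));
    }
}
```

Documents that share no term with the query receive a score of 0. This is the limitation that concept-based models such as ESA aim to address.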

This thesis focuses on

  • the creation of a Java library which implements popular retrieval models, and
  • the design of a framework for evaluating and comparing these models to each other.

Table of Contents

  1. Theory
    • IR Models (Definition, Classifications)
    • Popular Models (VSM, ESA, LSI, …)
    • Computational and Memory Complexity
  2. Implementation
    • Retrieval Models (VSM, ESA, …)
    • Evaluation Framework (precision, recall, F1, processing time, computational complexity, memory complexity, data storage, …)
  3. Evaluation
  4. Outlook and Conclusions
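
As an illustration of the effectiveness measures named in the outline, the following minimal sketch computes precision, recall, and F1 for a single query from a set of retrieved document IDs and a set of relevant document IDs. The class name `EvalSketch`, its methods, and the document IDs are hypothetical and not part of the planned framework, which would also need to cover processing time and memory usage.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: names are illustrative, not an existing API.
class EvalSketch {

    // Number of retrieved documents that are also relevant.
    static int hits(Set<String> retrieved, Set<String> relevant) {
        Set<String> common = new HashSet<>(retrieved);
        common.retainAll(relevant);
        return common.size();
    }

    // Fraction of retrieved documents that are relevant; 0 if nothing was retrieved.
    static double precision(Set<String> retrieved, Set<String> relevant) {
        if (retrieved.isEmpty()) {
            return 0.0;
        }
        return (double) hits(retrieved, relevant) / retrieved.size();
    }

    // Fraction of relevant documents that were retrieved; 0 if nothing is relevant.
    static double recall(Set<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) hits(retrieved, relevant) / relevant.size();
    }

    // Harmonic mean of precision and recall; 0 if both are 0.
    static double f1(Set<String> retrieved, Set<String> relevant) {
        double p = precision(retrieved, relevant);
        double r = recall(retrieved, relevant);
        return (p + r == 0.0) ? 0.0 : 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        Set<String> retrieved = new HashSet<>(Arrays.asList("d1", "d2", "d3", "d4"));
        Set<String> relevant = new HashSet<>(Arrays.asList("d1", "d3", "d5"));
        System.out.println("precision = " + precision(retrieved, relevant));
        System.out.println("recall    = " + recall(retrieved, relevant));
        System.out.println("F1        = " + f1(retrieved, relevant));
    }
}
```

In this example two of the four retrieved documents are relevant, so precision is 1/2, recall is 2/3, and F1 is 4/7.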

Student Profile

  • An interest in information retrieval.
  • Good Java skills. The implementation is an integral part of this thesis.

Literature

  • Stein, Benno and Anderka, Maik (2009). Collection-Relative Representations: A Unifying View to Retrieval Models, Twentieth International Workshop on Database and Expert Systems Applications (DEXA 2009); Sixth International Workshop on Text-Based Information Retrieval (TIR 2009), pages 383–387
  • Salton, G., Wong, A. and Yang, C. S. (1975). A Vector Space Model for Automatic Indexing, Communications of the ACM, 18(11), pages 613–620
  • Gabrilovich, Evgeniy and Markovitch, Shaul (2009). Wikipedia-based Semantic Interpretation for Natural Language Processing, Journal of Artificial Intelligence Research, pages 443–498
