Parallelizing Text Processing
The goal of this work is to parallelize important text processing tasks, such as text preprocessing and co-occurrence analysis, using the Hadoop Map-/Reduce framework.
Tasks
- Familiarize yourself with the Hadoop Map-/Reduce framework
- Create a Hadoop hello-world application
- Port the text cleanup and preprocessing components to Map-/Reduce
- Port the co-occurrence components to Map-/Reduce
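To make the last two tasks concrete, the following is a minimal sketch of how cleanup and co-occurrence counting might be split into a map step and a reduce step. It simulates the framework in plain Python rather than using Hadoop itself. The function names (`map_cooccurrences`, `shuffle`, `reduce_counts`, `run_job`), the sentence-level co-occurrence window, and the lowercase, letters-only cleanup are assumptions made for illustration, not the components this work ports.

```python
import re
from collections import defaultdict
from itertools import combinations


def map_cooccurrences(sentence):
    """Map step: emit ((word_a, word_b), 1) for every unordered pair of
    distinct words that occur together in one sentence."""
    # Assumed minimal cleanup: lowercase, keep runs of ASCII letters only.
    tokens = sorted(set(re.findall(r"[a-z]+", sentence.lower())))
    for pair in combinations(tokens, 2):
        yield pair, 1


def shuffle(mapped):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the partial counts for one word pair."""
    return key, sum(values)


def run_job(sentences):
    """Run map, shuffle, and reduce sequentially on an in-memory input."""
    mapped = (kv for s in sentences for kv in map_cooccurrences(s))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    counts = run_job(["The cat sat.", "The cat ran."])
    print(counts[("cat", "the")])  # 2
```

Because each sentence is mapped independently and each word pair is reduced independently, both steps can in principle be distributed across machines, which is what makes this kind of analysis a natural fit for Map-/Reduce.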
Table of Contents
- Introduction
- Theoretical Background
  - Map-/Reduce
  - Natural Language Detection
  - Text Preprocessing
  - Co-occurrence Analysis
- Method
- Implementation
- Evaluation
- Outlook and Conclusions