Optimizing Geographic Tagging

The vision of the Geospatial Web combines geographic data, Internet technology and social change. Geospatial applications such as the IDIOM Media Watch on Climate Change use geo-annotation services to enrich Web pages and media articles with geographic tags.

Identifying a document’s target geography is difficult because of geo/geo ambiguities (e.g. Vienna/at versus Vienna/Virginia/us) and geo/non-geo ambiguities (e.g. turkey/bird versus Turkey/country). Most approaches to tagging the target geography therefore use machine learning, gazetteers, or a combination of both to identify geo-tags. The gazetteer’s size and its many internal tuning parameters determine the geo-tagger’s performance and its bias towards identifying smaller geographic entities or higher-level units. Designing a geo-tagger and choosing these parameters often involves trade-offs; improvements in one area do not necessarily yield better results in others.
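The two kinds of ambiguity can be illustrated with a minimal gazetteer lookup. This is only a sketch under stated assumptions: the gazetteer entries, the context cue words and the function name `tag` are hypothetical, and the two heuristics (case-sensitive matching, counting cue words in the text) are deliberately naive stand-ins for what a real geo-tagger would do.

```python
import re

# Hypothetical toy gazetteer: surface name -> candidate geo-tags.
GAZETTEER = {
    "Vienna": ["Vienna/at", "Vienna/Virginia/us"],
    "Turkey": ["Turkey/country"],
}

# Hypothetical context cues used to resolve geo/geo ambiguities.
CONTEXT_CUES = {
    "Vienna/at": {"Austria", "Danube"},
    "Vienna/Virginia/us": {"Virginia", "Washington"},
}


def tag(text):
    """Return the geo-tags found in text, in order of first appearance."""
    tokens = re.findall(r"[A-Za-z]+", text)
    token_set = set(tokens)
    tags = []
    for token in tokens:
        # Case-sensitive lookup: a naive guard against geo/non-geo
        # ambiguities such as "turkey" (bird) versus "Turkey" (country).
        candidates = GAZETTEER.get(token)
        if not candidates:
            continue
        # Geo/geo ambiguity: prefer the candidate with the most cue words
        # in the text; on a tie, the first gazetteer entry wins.
        best = max(
            candidates,
            key=lambda c: len(CONTEXT_CUES.get(c, set()) & token_set),
        )
        if best not in tags:
            tags.append(best)
    return tags


print(tag("A conference in Vienna, Virginia"))
print(tag("Roast turkey for dinner"))
```

Even this toy version shows the trade-offs mentioned above: the case-sensitive guard fails for a sentence-initial "Turkey" that refers to the bird, and the order of gazetteer entries acts as a hidden tuning parameter whenever no cue word is present.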

The goals of this thesis are

  1. designing a test case for evaluating geo-taggers
  2. implementing this test case as a unit test
  3. applying the resulting framework to different approaches to geo-tagging.
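The goals above could take roughly the following shape. This is a hedged sketch, not the framework the thesis will produce: `precision_recall`, `dummy_tagger`, `GeoTaggerTest` and the two annotated sentences are hypothetical, and a real test case would draw on a much larger annotated corpus.

```python
import unittest


def precision_recall(predicted, expected):
    """Compare two collections of geo-tags as sets."""
    predicted, expected = set(predicted), set(expected)
    hits = len(predicted & expected)
    # Convention assumed here: an empty set counts as perfect.
    precision = hits / len(predicted) if predicted else 1.0
    recall = hits / len(expected) if expected else 1.0
    return precision, recall


def dummy_tagger(text):
    """Hypothetical stand-in for the geo-tagger under test."""
    return ["Vienna/at"] if "Vienna" in text else []


class GeoTaggerTest(unittest.TestCase):
    # Hypothetical annotated documents: (text, expected geo-tags).
    CASES = [
        ("The Danube flows through Vienna.", ["Vienna/at"]),
        ("Roast turkey for dinner.", []),
    ]

    # Subclasses can override this attribute to evaluate another tagger.
    tagger = staticmethod(dummy_tagger)

    def test_annotated_documents(self):
        for text, expected in self.CASES:
            with self.subTest(text=text):
                precision, recall = precision_recall(
                    self.tagger(text), expected
                )
                self.assertEqual((precision, recall), (1.0, 1.0))
```

Saved as a module, the tests can be run with `python -m unittest`. Keeping the tagger as a class attribute means the same annotated cases can be reused for each geo-tagging approach by subclassing `GeoTaggerTest`; a less strict variant might assert minimum precision and recall thresholds instead of exact matches.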


  • Literature review
    • geo-tagging algorithms
    • public geocoding APIs (evaluation)
  • Design geo test cases (different gazetteer sizes, different regions)
  • Implement geo unit tests
  • Modify and measure the performance of different geo-tagging algorithms


