Document geolocation using language models built from lexical and geographic similarity
dc.contributor.advisor | Baldridge, Jason | en |
dc.contributor.committeeMember | Erk, Katrin | en |
dc.creator | Skiles, Erik David | en |
dc.date.accessioned | 2012-08-16T14:37:41Z | en |
dc.date.accessioned | 2017-05-11T22:27:02Z | |
dc.date.available | 2012-08-16T14:37:41Z | en |
dc.date.available | 2017-05-11T22:27:02Z | |
dc.date.issued | 2012-05 | en |
dc.date.submitted | May 2012 | en |
dc.date.updated | 2012-08-16T14:38:01Z | en |
dc.description | text | en |
dc.description.abstract | This thesis investigates the automatic identification of the location of documents. This process of geolocation aids in toponym resolution, document summarization, and geographic-based marketing. I focus on minimally supervised methods to examine both the lexical similarities and the geographic similarities between documents. This method predicts the location of a document as a single point on the earth’s surface. Three data sets are used to evaluate this method: a set of geotagged Wikipedia articles and two sets of Twitter feeds. For Wikipedia, the combined method obtains a median error of 12.1 kilometers and an improvement in mean error to 164 kilometers. The large Twitter data shows the greatest improvement from this method with a median error of 333 kilometers, down from the previous best of 463 kilometers. | en |
dc.description.department | Linguistics | en |
dc.format.mimetype | application/pdf | en |
dc.identifier.slug | 2152/ETD-UT-2012-05-5717 | en |
dc.identifier.uri | http://hdl.handle.net/2152/ETD-UT-2012-05-5717 | en |
dc.language.iso | eng | en |
dc.subject | Geolocation | en |
dc.title | Document geolocation using language models built from lexical and geographic similarity | en |
dc.type.genre | thesis | en |