Document geolocation using language models built from lexical and geographic similarity

Skiles, Erik David

dc.contributor.advisor	Baldridge, Jason	en
dc.contributor.committeeMember	Erk, Katrin	en
dc.creator	Skiles, Erik David	en
dc.date.accessioned	2012-08-16T14:37:41Z	en
dc.date.accessioned	2017-05-11T22:27:02Z
dc.date.available	2012-08-16T14:37:41Z	en
dc.date.available	2017-05-11T22:27:02Z
dc.date.issued	2012-05	en
dc.date.submitted	May 2012	en
dc.date.updated	2012-08-16T14:38:01Z	en
dc.description	text	en
dc.description.abstract	This thesis investigates the automatic identification of the location of documents. This process of geolocation aids in toponym resolution, document summarization, and geographic-based marketing. I focus on minimally supervised methods to examine both the lexical similarities and the geographic similarities between documents. This method predicts the location of a document as a single point on the earth’s surface. Three data sets are used to evaluate this method: a set of geotagged Wikipedia articles and two sets of Twitter feeds. For Wikipedia, the combined method obtains a median error of 12.1 kilometers and an improvement in mean error to 164 kilometers. The large Twitter data shows the greatest improvement from this method with a median error of 333 kilometers, down from the previous best of 463 kilometers.	en
dc.description.department	Linguistics	en
dc.format.mimetype	application/pdf	en
dc.identifier.slug	2152/ETD-UT-2012-05-5717	en
dc.identifier.uri	http://hdl.handle.net/2152/ETD-UT-2012-05-5717	en
dc.language.iso	eng	en
dc.subject	Geolocation	en
dc.title	Document geolocation using language models built from lexical and geographic similarity	en
dc.type.genre	thesis	en

Collections

University of Texas at Austin
