Document geolocation using language models built from lexical and geographic similarity

dc.contributor.advisor: Baldridge, Jason (en)
dc.contributor.committeeMember: Erk, Katrin (en)
dc.creator: Skiles, Erik David (en)
dc.date.accessioned: 2012-08-16T14:37:41Z (en)
dc.date.accessioned: 2017-05-11T22:27:02Z
dc.date.available: 2012-08-16T14:37:41Z (en)
dc.date.available: 2017-05-11T22:27:02Z
dc.date.issued: 2012-05 (en)
dc.date.submitted: May 2012 (en)
dc.date.updated: 2012-08-16T14:38:01Z (en)
dc.description: text (en)
dc.description.abstract: This thesis investigates the automatic identification of the location of documents. This process of geolocation aids in toponym resolution, document summarization, and geographic-based marketing. I focus on minimally supervised methods to examine both the lexical similarities and the geographic similarities between documents. This method predicts the location of a document as a single point on the earth’s surface. Three data sets are used to evaluate this method: a set of geotagged Wikipedia articles and two sets of Twitter feeds. For Wikipedia, the combined method obtains a median error of 12.1 kilometers and an improvement in mean error to 164 kilometers. The large Twitter data shows the greatest improvement from this method with a median error of 333 kilometers, down from the previous best of 463 kilometers. (en)
dc.description.department: Linguistics (en)
dc.format.mimetype: application/pdf (en)
dc.identifier.slug: 2152/ETD-UT-2012-05-5717 (en)
dc.identifier.uri: http://hdl.handle.net/2152/ETD-UT-2012-05-5717 (en)
dc.language.iso: eng (en)
dc.subject: Geolocation (en)
dc.title: Document geolocation using language models built from lexical and geographic similarity (en)
dc.type.genre: thesis (en)
