Browsing by Subject "natural language processing"
Determining and Mapping Locations of Study in Scholarly Documents: A Spatial Representation and Visualization Tool for Information Discovery (2013-03-21)
Creel, James; Weimer, Katherine; Texas A&M University
Theses and dissertations play a significant role in the scholarly literature and often refer to locations of interest or regions under study. Through geoparsing, which is the identification and disambiguation of place names, we have created a tool to generate interactive maps of the geographic locations referenced in theses and dissertations. Our visualization affords increased awareness of the numerous locations being researched and which departments and majors are studying each location. More broadly, the interface supports multidisciplinary research, student recruitment and faculty collaboration. Using geographic and gazetteer metadata and open-source mapping applications, this tool provides researchers with serendipitous geographic and interdisciplinary connections. The beta version consists of several DSpace curation tasks to take a given ETD through each step of the metadata creation and mapping processes. Once the tool has suggested geospatial metadata for an ETD, the DSpace administrative interface allows curators to approve the suggested metadata values. Our geoparser integrates various open-source tools as well as specialized heuristics to automate the name extraction and disambiguation tasks. We have employed the OpenNLP and Stanford NLP libraries for the name extraction task, and use the GeoNames gazetteer as our source for referenced entities. A preliminary evaluation of the tool indicates an accuracy of 84% with regard to the disambiguation of names to specific GeoNames IDs. Work toward improving the accuracy is ongoing. The visualization component of the tool reads geospatial metadata as KML and can render the referenced locations in any of three map visualization options selected by the reader: OpenLayers, OpenStreetMap and Google Maps.
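As an illustration of the KML step mentioned above, the following is a minimal sketch of how resolved place names could be serialized as KML placemarks. It is not the tool's actual code: the function name `places_to_kml` and the sample coordinates are assumptions made for this example, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Namespace of KML 2.2.
KML_NS = "http://www.opengis.net/kml/2.2"


def places_to_kml(places):
    """Serialize (name, latitude, longitude) tuples as a KML document string.

    Hypothetical helper for illustration; not part of DSpace or the tool
    described in the abstract.
    """
    # Emit the KML namespace as the default namespace (no prefix).
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in places:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML writes coordinates as longitude,latitude (in that order).
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Approximate coordinates, chosen only for the example.
kml_text = places_to_kml([("College Station", 30.628, -96.334)])
print(kml_text)
```

A file produced this way can be loaded by map clients that accept KML; which layers or styles the actual tool emits is not stated in the abstract.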
Once a site of interest is located on the map, the reader may select a link to the complete thesis or dissertation stored in the university's instance of DSpace, our institutional repository. The long-term goal of this project is to extend the content to include all TDL ETDs for a widely used search mechanism.

Search Text Based on Locations (2014-11-21)
Zhang, Weiwei
To satisfy the current need for finding queried information quickly, search engines, data mining systems, and many other applications have been in development in recent years. Some of those applications look for documents containing phrases of a particular topic, such as historical events from a certain time period. Among these applications, queries based on geographical data are receiving significant attention from the research community and industry. Therefore, this thesis studies text search based on locations, which contributes to Geographical Information Retrieval (GIR) systems. In addition to the traditional applications of GIR systems, which are used for finding locations in documents, GIR can be applied to other fields as well. Firstly, it can retrieve location information in text and search for answers to questions of a spatial nature (such as "Where is College Station?"). Location information can improve presentation of the search results, for example, by presenting the search results on a map. GIR also adds to the field of spatial diversity search, which allows users to express preferences and constrain the search results to a particular geographical region. In addition, it finds related documents based on location information from different sources of information and then represents the similarities graphically. In this way, readers can see the data visually, which helps them understand the document correlations in an intuitive way. However, most of the previous research involves keyword searches in spatial databases instead of raw (unlabeled) text.
Although there is some work on raw text processing, that work uses matching techniques and limits the geographical range to small regions such as a single country. Therefore, this thesis adopts a new clustering method, which utilizes a geographical dictionary to locate any place by its coordinates. This method reduces ambiguity and improves accuracy over the previous research. This study also implements a new word-clustering method to detect a combination of topics in raw text. This method is more accurate than latent Dirichlet allocation (LDA), a state-of-the-art method based on a probabilistic model. In addition, a novel graphic illustration is utilized to visually represent the relevance ranking between documents.
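To make the idea of resolving place names through a geographical dictionary and coordinates more concrete, here is a minimal sketch. It is not the thesis's clustering method, which the abstract does not specify. It assumes a toy in-memory gazetteer (`GAZETTEER`, with approximate coordinates) and a simple heuristic: an ambiguous name is resolved to the candidate closest, by great-circle distance, to the unambiguous places mentioned in the same text.

```python
import math

# Hypothetical toy gazetteer: place name -> candidate (label, lat, lon) entries.
# Coordinates are approximate and chosen only for illustration.
GAZETTEER = {
    "Paris": [("Paris, France", 48.857, 2.352), ("Paris, Texas", 33.661, -95.556)],
    "Dallas": [("Dallas, Texas", 32.777, -96.797)],
    "Houston": [("Houston, Texas", 29.760, -95.370)],
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def disambiguate(names, gazetteer=GAZETTEER):
    """Map each known name to one gazetteer label.

    Names with a single candidate act as anchors. An ambiguous name is
    assigned the candidate with the smallest total distance to the anchors.
    If there are no anchors, the first candidate is used as a fallback.
    Names missing from the gazetteer are skipped.
    """
    anchors = [gazetteer[n][0] for n in names if len(gazetteer.get(n, [])) == 1]
    resolved = {}
    for name in names:
        candidates = gazetteer.get(name, [])
        if not candidates:
            continue
        if len(candidates) == 1 or not anchors:
            resolved[name] = candidates[0][0]
        else:
            best = min(
                candidates,
                key=lambda c: sum(haversine_km(c[1:], a[1:]) for a in anchors),
            )
            resolved[name] = best[0]
    return resolved


result = disambiguate(["Paris", "Dallas", "Houston"])
print(result["Paris"])
```

In this example the two Texas cities pull "Paris" toward the Texas candidate. A real system would draw candidates from a full gazetteer and would likely need a stronger grouping criterion than summed distance to anchors.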