Folksonomy Based Question Answering System
Abstract
The volume of financial data is growing, and most of it is unstructured. Web articles are a major source of this data. As it grows, so does the need for techniques that can parse unstructured information. This thesis project presents an overview of a folksonomy-based approach to a Question Answering system. The proposed system works in two steps. The first step processes the contextual information using techniques such as document-to-vector, tf-idf and topic modelling; this forms the level 1 granularity. The second step uses paragraph2vec, a variant of word2vec, to reach sentence-level granularity (level 2). Various combinations of level 1 and level 2 techniques are explored to find the best-performing one. Folksonomy, that is, social and contextual tagging, is used here to reduce the search space, which is the set of all candidate answers in which the correct answer resides. The idea is to shrink the search space so that different algorithms are able to find the correct answer. The models are then stress tested by varying their parameters, with the parameter values obtained through a grid search. In the search for the best model, more than 12,000 models were generated and tested. The best model was evaluated on two test cases, where it achieved accuracies of 61% and 64%.
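The two-step idea can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that level 1 ranks whole documents, it uses a hand-written tf-idf with cosine similarity at both levels (standing in for paragraph2vec at level 2), and the function names, the `top_k` parameter and the two toy documents are hypothetical, introduced only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse tf-idf vector (a dict) per text, plus the idf table."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf, so terms that occur in every text keep a nonzero weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, texts):
    """Indices of texts, sorted by decreasing similarity to the query."""
    vectors, idf = tfidf_vectors(texts)
    # Query terms that never occur in the texts get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [cosine(q, v) for v in vectors]
    return sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)


def answer(query, documents, top_k=1):
    """Two-level lookup: shrink the search space, then pick a sentence."""
    # Level 1 (assumed document level): keep only the top_k documents.
    kept = [documents[i] for i in rank(query, documents)[:top_k]]
    # Level 2 (sentence level): rank the sentences of the kept documents.
    sentences = [
        s.strip()
        for d in kept
        for s in re.split(r"(?<=[.!?])\s+", d)
        if s.strip()
    ]
    return sentences[rank(query, sentences)[0]]


# Toy data, invented for this illustration only.
documents = [
    "The central bank raised interest rates by 25 basis points. "
    "Analysts expect inflation to slow.",
    "The company reported record quarterly revenue. "
    "Its shares rose after the earnings call.",
]
query = "What did the central bank do to interest rates?"
print(answer(query, documents))
```

With `top_k=1`, the sentence ranking only ever sees sentences from a single document, which is the search-space reduction described above. Raising `top_k` trades a larger search space for a lower risk of discarding the document that contains the answer.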