Computer Information Systems (M.S.)
School of Engineering
Although web search remains an active research area, interest in enterprise search has waned. This is despite the fact that the market for enterprise search applications is expected to triple within the next six years and that knowledge workers spend an average of 1.6 to 2.5 hours each day searching for information. To improve search relevancy, and hence reduce this time, an enterprise-focused application must be able to handle the unique queries and constraints of the enterprise environment. The goal of this thesis research was to develop, implement, and study the query expansion techniques that are most effective at improving relevancy in enterprise search.
The case-study instrument used in this investigation was a custom Apache Solr-based search application deployed at a local medium-sized manufacturing company. It was hypothesized that techniques specifically tailored to the enterprise search environment would prove most effective. Query expansion techniques leveraging entity recognition, alphanumeric term identification, intent classification, collection enrichment, and word vectors were implemented and studied using real enterprise data. They were evaluated with standard relevancy metrics, such as normalized discounted cumulative gain (nDCG), against a test set of queries built from relevance survey results collected from multiple users. Comprehensive analysis revealed that the current implementations of the collection enrichment and word vector query expansion modules did not demonstrate meaningful improvements over the baseline methods. However, the entity recognition, alphanumeric term identification, and query intent classification modules produced meaningful and statistically significant improvements in relevancy, allowing us to accept the hypothesis.
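For readers unfamiliar with the nDCG metric named above, the following is a minimal sketch of how it can be computed for a single query. It is not the thesis's evaluation code: the function names `dcg` and `ndcg`, the linear-gain form of DCG, and the sample relevance grades are all assumptions made for illustration. The ideal ranking is also taken from the returned list only, which is a simplification when not every relevant document is retrieved.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of graded relevances listed in rank order.

    Uses the linear-gain form rel / log2(rank + 1); other variants exist.
    """
    rels = relevances[:k] if k is not None else relevances
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))


def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering of the same grades."""
    ideal_dcg = dcg(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant results at all: define nDCG as 0
    return dcg(relevances, k) / ideal_dcg


# Hypothetical relevance grades (0 to 3) for the top results of one query,
# before and after some query expansion step. These numbers are made up.
baseline_ranking = [1, 0, 3, 2]
expanded_ranking = [3, 2, 1, 0]

print(f"baseline nDCG: {ndcg(baseline_ranking):.3f}")
print(f"expanded nDCG: {ndcg(expanded_ranking):.3f}")
```

A ranking that places the most relevant documents first scores 1.0, and scores drop as relevant documents move down the list. Averaging the per-query values over a test set gives one number per technique that can be compared against a baseline.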
Domke, Eric M., "Query Expansion Techniques for Enterprise Search" (2017). Masters Theses. 873.