Lead Author Type: CIS Masters Student


Dr. Jonathan Leidig

WikiSearch is an information retrieval system, based on the vector space model, for searching Wikipedia, one of the largest knowledge bases in the world. Clustering techniques group semantically related documents, which improves the efficiency of the search system. Clustering also allows the system to retrieve relevant documents that do not literally match the terms of a query. Cluster labels are generated automatically from document features and support a faceted browsing service for exploration and discovery. We also propose a storage scheme that uses a NoSQL database to create and manage the inverted index and the clustering information. Finally, we report performance results for the search system.
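The abstract does not describe WikiSearch's implementation. The following is only a minimal sketch, under our own assumptions, of the ideas it names. It builds an inverted index with tf-idf weights and ranks documents by cosine similarity, as in the vector space model. It then expands a ranking with other members of the top result's cluster, and labels a cluster with its highest-weighted terms. The following are all hypothetical:

- The names `VectorSpaceIndex`, `expand_with_clusters`, and `cluster_label`.
- The precomputed `clusters` mapping. The clustering algorithm itself is not shown.
- The "top summed tf-idf terms" labeling rule.

Plain in-memory dictionaries stand in for the NoSQL storage scheme.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters and digits as terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class VectorSpaceIndex:
    """Hypothetical in-memory inverted index with tf-idf weights and cosine ranking."""

    def __init__(self, docs):
        # docs: dict mapping doc_id -> raw text.
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        for doc_id, text in docs.items():
            for term, tf in Counter(tokenize(text)).items():
                self.postings[term][doc_id] = tf
        n_docs = len(docs)
        # Inverse document frequency: log(N / df).
        self.idf = {t: math.log(n_docs / len(p)) for t, p in self.postings.items()}
        # Length of each document's tf-idf vector, used for cosine normalisation.
        squares = defaultdict(float)
        for term, plist in self.postings.items():
            for doc_id, tf in plist.items():
                squares[doc_id] += (tf * self.idf[term]) ** 2
        self.norms = {d: math.sqrt(s) for d, s in squares.items()}

    def search(self, query):
        """Return (doc_id, cosine score) pairs, best first."""
        q_vec = {t: tf * self.idf[t]
                 for t, tf in Counter(tokenize(query)).items() if t in self.idf}
        q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
        scores = defaultdict(float)
        # Term-at-a-time: only documents in a query term's posting list are touched.
        for term, q_weight in q_vec.items():
            for doc_id, tf in self.postings[term].items():
                scores[doc_id] += q_weight * tf * self.idf[term]
        # A positive score implies both q_norm and the document norm are non-zero.
        ranked = [(d, s / (q_norm * self.norms[d]))
                  for d, s in scores.items() if s > 0]
        return sorted(ranked, key=lambda r: (-r[1], r[0]))


def expand_with_clusters(ranked, clusters, top_n=1):
    """Append unseen members of the top results' clusters to the ranking.

    clusters is a hypothetical precomputed mapping doc_id -> cluster_id.
    """
    result = [d for d, _ in ranked]
    seen = set(result)
    for doc_id in result[:top_n]:  # slice is a copy, so appending below is safe
        cid = clusters.get(doc_id)
        for other, c in clusters.items():
            if cid is not None and c == cid and other not in seen:
                result.append(other)
                seen.add(other)
    return result


def cluster_label(index, members, k=3):
    """Label a cluster with its k terms of highest summed tf-idf weight."""
    member_set = set(members)
    totals = Counter()
    for term, plist in index.postings.items():
        for doc_id, tf in plist.items():
            if doc_id in member_set:
                totals[term] += tf * index.idf[term]
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, w in ranked if w > 0][:k]


if __name__ == "__main__":
    docs = {
        "d1": "python programming language",
        "d2": "python snake species",
        "d3": "programming language compilers",
        "d4": "snake venom snake biology",
    }
    clusters = {"d1": "computing", "d3": "computing", "d2": "animals", "d4": "animals"}
    index = VectorSpaceIndex(docs)
    print(expand_with_clusters(index.search("compilers"), clusters))  # ['d3', 'd1']
    print(cluster_label(index, ["d2", "d4"], k=1))  # ['snake']
```

In this toy corpus, the query "compilers" matches only `d3`. Cluster expansion also returns `d1`, which shares a cluster with `d3` but does not contain the query term. This illustrates, in simplified form, how clustering can surface relevant documents that a pure term match would miss.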