Document Type

Project

Lead Author Type

CIS Masters Student

Advisors

Dr. Jonathan Leidig, jonathan.leidig@gvsu.edu

Embargo Period

8-5-2014

Abstract

WikiSearch is an information retrieval system, based on the vector space model, for searching Wikipedia, one of the largest knowledge bases in the world. Clustering techniques are used to group semantically related documents and improve the efficiency of the search system. Clustering also allows relevant documents that do not match a query’s explicit form to be retrieved. Cluster labels are generated automatically from document features to provide a faceted browsing service for exploration and discovery. We also propose a storage scheme for creating and managing the inverted index and clustering information in a NoSQL database. Finally, performance results are provided for the search system.
