Reverse Tree Clustering

Document Type

Project

Advisors

Dr. Jamal Alsabbagh, alsabbaj@gvsu.edu

Embargo Period

8-17-2010

Abstract

In 1998, Mock proposed a clustering algorithm based on a reverse tree model. Unlike standard top-down tree models, in which an item is clustered by descending from the root node to a leaf, Mock suggested a bottom-up approach in which evaluation starts at the leaves. Clusters are created at the leaf level first and are allowed to overlap, which introduces fuzziness. An item that fits exactly one cluster is excluded from further consideration at higher levels of the tree. The remaining items are then evaluated at the next higher level in a manner similar to the one used at the leaf level. The process repeats, climbing up the tree, and terminates when all clusters are disjoint.
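The bottom-up pass described above can be illustrated with a minimal sketch. It is not Mock's algorithm or the implementation used in this project. It assumes, purely for illustration, that each tree level is a mapping from node name to a keyword set, that an item "fits" a node when it shares at least one keyword with it, and that items fitting no node are carried upward together with the ambiguous ones. The names reverse_tree_cluster, levels, and items are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def reverse_tree_cluster(
    items: Dict[str, Set[str]],
    levels: List[Dict[str, Set[str]]],
) -> Tuple[Dict[str, str], Set[str]]:
    """Assign items to tree nodes, starting at the leaf level.

    items  : item id -> set of words describing the item
    levels : levels[0] is the leaf level, levels[-1] is nearest the root;
             each level maps node name -> keyword set (an assumed fit test)
    Returns (assignments, ids of items left unresolved).
    """
    assigned: Dict[str, str] = {}
    remaining = dict(items)

    for level in levels:
        if not remaining:
            break  # every item sits in exactly one cluster
        carried_up: Dict[str, Set[str]] = {}
        for item_id, words in remaining.items():
            # Nodes at this level that share at least one word with the item.
            matches = [node for node, keywords in level.items() if words & keywords]
            if len(matches) == 1:
                # Unambiguous fit: settle the item and drop it from higher levels.
                assigned[item_id] = matches[0]
            else:
                # Zero or several fits: re-evaluate one level higher (assumption).
                carried_up[item_id] = words
        remaining = carried_up

    return assigned, set(remaining)


# Hypothetical two-level tree and three items.
levels = [
    {"wheat": {"wheat", "grain"}, "corn": {"corn", "grain"}, "gold": {"gold"}},
    {"agriculture": {"wheat", "corn", "grain"}, "metals": {"gold", "silver"}},
]
items = {
    "a": {"wheat", "harvest"},
    "b": {"grain", "exports"},
    "c": {"gold", "grain"},
}
assigned, unresolved = reverse_tree_cluster(items, levels)
```

In this toy run, item "a" is settled at the leaf level, item "b" overlaps two leaf clusters and is settled one level up, and item "c" remains ambiguous at both levels, so a deeper tree or a tie-breaking rule would be needed to place it.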

In this study, we implemented a modification of Mock's algorithm and applied it to the classification of a collection of 21,578 Reuters news articles. A goal of this study was to develop a system that would lend itself to rapid clustering of documents or web pages. Therefore, we elected to cluster documents using only the words that appear in their titles. We feel that this simulates the reliance on website metadata to classify such sites.
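As an illustration of title-only features, the sketch below reduces a title to a set of words. The abstract does not say how titles were tokenized or filtered, so the lowercase alphabetic tokenization, the small stop-word list, and the minimum word length are all assumptions, and title_words is a hypothetical helper name.

```python
import re
from typing import Set

# Assumed, deliberately small stop-word list; a real system would likely use a larger one.
STOP_WORDS = frozenset({"a", "an", "and", "as", "at", "for", "in", "of", "on", "the", "to"})


def title_words(title: str, min_length: int = 3) -> Set[str]:
    """Return the set of lowercase alphabetic words in a title.

    Stop words and words shorter than min_length are discarded.
    """
    tokens = re.findall(r"[a-z]+", title.lower())
    return {t for t in tokens if len(t) >= min_length and t not in STOP_WORDS}


words = title_words("Gold Prices Fall as the Dollar Gains")
```

Because a title yields only a handful of words, each document can be reduced to a small set such as this one, which is part of what makes a title-only approach fast.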

Although further study is needed, our implementation performed satisfactorily and appears promising for the clustering of short text phrases.

This document is currently not available here.
