Welcome to CompClust Web

CompClust focuses on gaining a more quantitative and qualitative understanding of clustering results and the relationships between them and other diverse data.

As other large-scale data types mature (global chromatin immunoprecipitation assays, more complete and highly articulated protein-protein interaction maps, GO ontology categories, evolutionarily conserved sequence features, and diverse other covariates), the emphasis is rapidly shifting from analyzing and mining expression data alone to integrating disparate data types. A key feature of any system designed for integration is the ability to provide a many-to-many mapping of labels to data features, and of data features to other data features, in a global way. CompClust provides these capabilities through its labelings. Data transformation, merger, aggregation and linking are also needed, and in CompClust these needs are met through its dataset Views.

CompClust currently provides these abilities through a Python application programming interface (API) that is immediately and fully usable from the command line interface (CLI) of Python's interactive interpreter. The major capabilities illustrated in Hart et al., 2004 are also accessible through this web interface, which is convenient and requires no knowledge of Python commands. However, a tutorial on using CompClust fully from the command line is available at http://woldlab.caltech.edu/compClust

This web interface lets users perform the major classes of analyses shown in Hart et al., 2004, such as:

Basic clustering tools including: DiagEM (EM MoDG), KMeans, XClust.
Cluster comparisons using Confusion Arrays with quantitative scoring via normalized mutual information (NMI) and linear assignment (LA).
Receiver Operator Characteristic (ROC) analysis to assay cluster overlap and quality.
Preliminary PCA projection to better understand the dataspace.
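
To make the cluster comparison scores above more concrete, here is a minimal sketch in plain Python. It does not use the CompClust API; the function names (confusion_array, nmi, linear_assignment_score) are hypothetical and chosen only for illustration. It builds a confusion array from two labelings of the same items, then computes an NMI score and an LA score. Two assumptions are worth flagging: the NMI here is normalized by the larger of the two entropies, which is only one of several conventions and may differ from the one CompClust uses, and the LA score is found by brute force over all pairings, which is practical only for a handful of clusters.

```python
import math
from collections import Counter
from itertools import permutations


def confusion_array(labels_a, labels_b):
    """Count the items in each (cluster in A, cluster in B) pair."""
    rows = sorted(set(labels_a))
    cols = sorted(set(labels_b))
    counts = Counter(zip(labels_a, labels_b))
    return [[counts[(r, c)] for c in cols] for r in rows]


def nmi(table):
    """Mutual information divided by max(H(A), H(B)) (assumed convention)."""
    n = sum(map(sum, table))
    row_p = [sum(row) / n for row in table]
    col_p = [sum(col) / n for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count:
                p = count / n
                mi += p * math.log(p / (row_p[i] * col_p[j]))
    h_a = -sum(p * math.log(p) for p in row_p if p)
    h_b = -sum(p * math.log(p) for p in col_p if p)
    denom = max(h_a, h_b)
    # Two single-cluster labelings have zero entropy; treat them as identical.
    return mi / denom if denom else 1.0


def linear_assignment_score(table):
    """Fraction of items covered by the best one-to-one pairing of clusters."""
    n = sum(map(sum, table))
    n_rows, n_cols = len(table), len(table[0])
    size = max(n_rows, n_cols)
    # Pad with zeros to a square array so unmatched clusters pair with nothing.
    square = [[table[i][j] if i < n_rows and j < n_cols else 0
               for j in range(size)] for i in range(size)]
    # Brute force over all pairings: fine for a few clusters, not for many.
    best = max(sum(square[i][perm[i]] for i in range(size))
               for perm in permutations(range(size)))
    return best / n


# Two labelings of the same six items that agree up to renaming of clusters.
same_a = ["x", "x", "y", "y", "z", "z"]
same_b = [1, 1, 2, 2, 3, 3]
# Two labelings of four items that are statistically independent.
indep_a = [0, 0, 1, 1]
indep_b = [0, 1, 0, 1]

for a, b in [(same_a, same_b), (indep_a, indep_b)]:
    table = confusion_array(a, b)
    print(table, nmi(table), linear_assignment_score(table))
```

In this sketch both scores approach 1 when two clusterings agree up to a renaming of clusters. For independent clusterings the NMI falls to 0, while the LA score stays above 0 because some items always land on the best pairing by chance.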

Although the web interface provides many useful functions, we encourage users to learn the Python command line environment. It can be learned, at the level needed, in a short time (a few weeks of part-time effort) by users who have no prior computer programming experience. The reward is access to a remarkable flexibility for interrogating datasets that cannot be captured in GUIs or web interfaces. This flexibility matches the many diverse questions and comparisons that a biologist wants to make and visualize in order to meet the specific needs of each study and set of biological data mining aims.

Cell Cycle Example

The Cho yeast cell cycle data (Cho et al., 1998) is already loaded and ready to explore using CompClust Web.

Hart et al., 2004 describes in more detail the background and motivations of many of the analyses that are enabled by CompClust.

Continue to the Cho et al., 1998 Data