<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>CompClust</title>
  <link rel="stylesheet" href="/compclust_css" type="text/css"
  	tal:attributes="href compclust_css" />
</head>
<body>
<h1>Welcome to CompClust Web</h1>
<p id="intro">
CompClust focuses on gaining a more quantitative and qualitative
understanding of clustering results and the relationships between them
and other diverse data.
</p>

<p>
As other large-scale data types mature (global chromatin
immunoprecipitation assays, more complete and highly articulated
protein-protein interaction maps, Gene Ontology (GO) categories,
evolutionarily conserved sequence features, and diverse other
covariates), the emphasis is rapidly shifting from analyzing and mining
expression data alone to integrating disparate data types.  A key
feature of any system designed for integration is the ability to provide
a many-to-many mapping of labels to data features and data features to
other data features in a global way. CompClust provides these
capabilities through its use of powerful labelings.  Data
transformation, merger, aggregation and linking are also needed, and in
CompClust these needs are met through the use of its dataset Views.
CompClust currently provides these abilities through a <a
  href="http://www.python.org">Python</a> application programming interface
(API) that is immediately and fully usable from the command line interface (CLI)
provided by Python's interactive interpreter.  The major capabilities illustrated
in <a href="http://woldlab.caltech.edu/~hart/frameworkPaper_s2_final.pdf">Hart
  et al., 2004</a> are also accessible through this web interface, which is
convenient and requires no knowledge of Python commands.  A tutorial on using
the full capabilities of CompClust from the command line is available at
<a href="http://woldlab.caltech.edu/compClust">http://woldlab.caltech.edu/compClust</a>.
</p>

<p>
This web interface lets users perform the major classes of analyses shown
in <a href="http://woldlab.caltech.edu/~hart/frameworkPaper_s2_final.pdf">Hart
  et al., 2004</a>, such as:
</p>
<ul>
  <li>Basic clustering tools, including DiagEM (EM MoDG), KMeans, and
       XClust.</li>
  <li>Cluster comparisons using Confusion Arrays with quantitative
       scoring via normalized mutual information (NMI) and linear assignment (LA).</li>
  <li>Receiver Operating Characteristic (ROC) analysis to assay cluster
       overlap and quality.</li>
  <li>Preliminary PCA projection to better understand the data space.</li>
</ul>
<p>
Although the web interface provides many useful functions, we encourage
users to learn to use the Python command line environment.  It can be learned,
at the level needed, in a short time (a few weeks of part-time effort) by
users who have no prior computer programming experience.  The reward is access
to a remarkable flexibility for interrogating datasets that cannot be
captured in GUIs or web interfaces.  This flexibility matches the many diverse
questions and comparisons that a biologist wants to make and visualize in order
to meet the specific needs of each study and set of biological data mining aims.
</p>

<hr/>

<h2> Cell Cycle Example </h2>

<p>
 The yeast cell cycle data of Cho et al., 1998 is already loaded
 and ready to explore using CompClust Web.
</p>

<p>
 <a href="http://woldlab.caltech.edu/~hart/frameworkPaper_s2_final.pdf">Hart
   et al., 2004</a> describes in more detail the background and motivations of
 many of the analyses that are enabled by CompClust.
</p>

<p>
  <b><a href="compclust/ChoCycling.dat/"> Continue to the Cho et al., 1998 Data</a></b>
</p>

</body>
</html>
