About CMap

We are creating a genome-scale library of cellular signatures that catalogs transcriptional responses to chemical, genetic, and disease perturbation. To date, the library contains more than 1.3 million profiles resulting from perturbations of multiple cell types.

What is the Connectivity Map?

The Connectivity Map, or CMap, is a resource that uses transcriptional expression data to probe relationships between diseases, cell physiology, and therapeutics. The changes in gene expression, or “signatures,” that arise from a disease, genetic perturbation (knockdown or overexpression of a gene) or treatment with a small molecule are compared for similarity to all perturbational signatures in the database. Perturbations that elicit highly similar, or highly dissimilar, expression signatures are termed “connected”; their related transcriptional effects suggest they confer related physiological effects on the cell. Our goal is to use these connections to uncover novel treatments for a variety of diseases, including cancers, neurological diseases, and infectious diseases.
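To make the idea of "connected" signatures concrete, here is a minimal sketch that compares two signatures by rank correlation. It is an illustration only: the Spearman correlation, the 0.8 threshold, and the gene names and values are all assumptions made for this example and are not the similarity measure CMap itself uses.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(sig_a, sig_b):
    """Rank correlation over the genes present in both signatures."""
    genes = sorted(set(sig_a) & set(sig_b))
    return pearson(ranks([sig_a[g] for g in genes]),
                   ranks([sig_b[g] for g in genes]))


def classify(rho, threshold=0.8):
    """Label a pair; the threshold is an arbitrary illustrative cutoff."""
    if rho >= threshold:
        return "positively connected"
    if rho <= -threshold:
        return "negatively connected"
    return "not connected"


# Hypothetical differential-expression values keyed by placeholder gene names.
compound_sig = {"G1": 2.1, "G2": -1.5, "G3": 0.3, "G4": -2.8, "G5": 1.0}
knockdown_sig = {"G1": 1.8, "G2": -1.1, "G3": 0.2, "G4": -3.0, "G5": 0.7}
opposite_sig = {gene: -value for gene, value in compound_sig.items()}

rho_similar = spearman(compound_sig, knockdown_sig)
rho_opposite = spearman(compound_sig, opposite_sig)
print(classify(rho_similar), "/", classify(rho_opposite))
```

Both strongly similar and strongly dissimilar pairs are reported as connected, which mirrors the definition given above.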

CMap is a dynamic database and we will release new versions as new data becomes available; version numbers are identified on the home page. We invite you to use CMap and our tools to analyze your gene expression profiles for connectivity to known perturbagens.

What are examples of Connectivity Map applications?

For the biologist: use CMap to reveal connections between steps of biological pathways.

For the chemist: use CMap to uncover structure-function relationships between novel and well-studied compounds.

For the pharmacologist: use CMap as a first step in the drug discovery process.

What is CLUE?

The dramatic increase in high-dimensional perturbational datasets available to the biomedical community has revealed the need for intuitive and performant user interfaces to explore and query these data. We have developed a computational environment, called CLUE, that runs on state-of-the-art cloud-based systems. This environment makes data and tools available on the cloud, harmonizes datasets to facilitate interoperability between perturbational data types, and implements web applications with user-friendly graphical interfaces that give access to the sophisticated algorithms underneath.

Currently, CLUE contains over 1.3 million gene expression profiles, related perturbational datasets, analytical tools, and web-based applications, all of which are freely available to academic users. For drug-discovery companies that want to leverage this work for their proprietary research programs while maintaining confidentiality, we offer CLUE as a subscription. See details at subscribe.

How are the gene expression signatures in CLUE generated?

Gene expression is determined using the L1000 assay, which measures the mRNA of ~1000 “landmark” genes from cells treated with chemical or genetic perturbations. (While earlier versions of CMap contained gene expression signatures obtained using Affymetrix microarray chips, the current CLUE data contains only signatures prepared from the L1000 assay.)

L1000 Assay

Cells are lysed in 384-well plates (where each well is an experiment) and the mRNA transcripts are captured on oligo-dT-coated plates. cDNAs are synthesized from the captured transcripts and subjected to ligation-mediated amplification (LMA) using locus-specific oligonucleotides harboring a unique 24-mer barcode sequence and a 5′ biotin label. The biotinylated LMA products are detected by hybridization to polystyrene microspheres of distinct fluorescent color, each coupled to an oligonucleotide complementary to a barcode, followed by staining with streptavidin-phycoerythrin.

Tag Duo

Because only 500 bead colors are commercially available, we developed a strategy that allows two transcripts to be identified by a single bead color. Each bead is analyzed for its bead color (denoting the landmark gene identity) and the fluorescence intensity of the phycoerythrin signal (denoting the landmark transcript abundance). The final L1000 assay measures 978 landmark transcripts and 80 control transcripts chosen for their invariant expression, which serve as quality control indicators.
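When two transcripts share one bead color, the intensities read for that color form two groups that have to be separated. The sketch below shows one way this could be done, and it rests on assumptions not stated on this page: that the two genes are represented by unequal numbers of beads, and that a simple one-dimensional 2-means clustering is enough to split them. The function names, gene names, and intensity values are hypothetical.

```python
from statistics import mean, median


def two_means_1d(values, iterations=50):
    """Split 1-D values into a low and a high cluster with k-means (k=2)."""
    low_center, high_center = min(values), max(values)
    low, high = list(values), []
    for _ in range(iterations):
        low = [v for v in values
               if abs(v - low_center) <= abs(v - high_center)]
        high = [v for v in values
                if abs(v - low_center) > abs(v - high_center)]
        if not high:  # every value is identical, nothing to split
            break
        new_low, new_high = mean(low), mean(high)
        if (new_low, new_high) == (low_center, high_center):
            break  # assignments are stable
        low_center, high_center = new_low, new_high
    return low, high


def deconvolve(intensities, major_gene, minor_gene):
    """Assign the larger cluster to the gene assumed to have more beads."""
    low, high = two_means_1d(intensities)
    if not high:
        value = median(low)
        return {major_gene: value, minor_gene: value}
    if len(low) >= len(high):
        bigger, smaller = low, high
    else:
        bigger, smaller = high, low
    return {major_gene: median(bigger), minor_gene: median(smaller)}


# Hypothetical phycoerythrin intensities read from one bead color:
# six beads near 1000 and three beads near 200.
bead_intensities = [980, 1010, 1005, 995, 1020, 990, 210, 190, 200]
per_gene = deconvolve(bead_intensities, "GENE_A", "GENE_B")
print(per_gene)
```

Using the cluster median rather than the mean keeps a single outlying bead from shifting the reported value.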

Data processing

Data is processed through a computational pipeline that generates a gene expression signature for each experiment. The process begins with a deconvolution step that determines the raw fluorescence value for each of the two genes assigned to each bead. These values are then normalized using a set of control genes. In addition to the 978 directly measured landmark genes, we use a regression model to impute the expression of an additional 9,196 genes. We refer to this combination of landmark and inferred genes (10,174 genes) as a gene expression profile. To generate signatures, the expression of each gene in cells treated with a perturbagen is compared to the expression of that gene in untreated cells, and differential expression is computed using a Z-scoring method. Scores are ranked, revealing the most up-regulated and down-regulated genes in response to a treatment; this gene set, or signature, can then be used to query CMap.
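The final step, turning treated-versus-untreated expression into a ranked signature, can be sketched as follows. This is a simplified illustration and not the pipeline's code: it assumes a robust Z-score based on the median and the median absolute deviation (MAD) of the untreated wells, and the gene names, expression values, and `top_n` parameter are hypothetical.

```python
from statistics import median


def robust_zscore(treated_value, control_values):
    """Z-score of a treated value against untreated controls (median/MAD)."""
    center = median(control_values)
    mad = median([abs(v - center) for v in control_values])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    if scale == 0:
        return 0.0  # controls show no spread; treat the gene as unchanged
    return (treated_value - center) / scale


def make_signature(treated, controls, top_n=2):
    """Score every gene, rank the scores, and keep the extremes."""
    z = {gene: robust_zscore(treated[gene], controls[gene])
         for gene in treated}
    ranked = sorted(z, key=z.get, reverse=True)
    return {
        "z": z,
        "up": ranked[:top_n],           # most up-regulated first
        "down": ranked[-top_n:][::-1],  # most down-regulated first
    }


# Hypothetical normalized expression values for four placeholder genes.
controls = {
    "G1": [5.0, 5.2, 4.8, 5.1, 4.9],
    "G2": [8.0, 7.9, 8.1, 8.2, 7.8],
    "G3": [6.0, 6.1, 5.9, 6.0, 6.2],
    "G4": [3.0, 3.1, 2.9, 3.2, 2.8],
}
treated = {"G1": 7.0, "G2": 6.0, "G3": 6.0, "G4": 3.5}

sig = make_signature(treated, controls)
print(sig["up"], sig["down"])
```

The `up` and `down` lists are the kind of gene sets that can then be submitted as a query.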

How do the CLUE tools facilitate using gene expression signatures for biological discovery?

CLUE offers a number of web-based apps, described below, for analysis of gene expression signatures. Our API provides metadata about compounds, genes, cell lines, and signatures. We have also developed command line interfaces with tools for computationalists and developers.

Analysis Tools

Touchstone App

“Touchstone” is our term for compound and genetic perturbagens (~5000) that are well-studied and generate robust gene expression signatures in cells. Thus the Touchstone data set serves as a benchmark for assessing connectivity among perturbagens. Use the Touchstone app to learn more about these perturbagens and explore their connectivities.

Query App

Use the query app to find positive and negative connections between your gene expression signature of interest and all the signatures in CMap.
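As a rough illustration of what a query does, the sketch below scores a set of up- and down-regulated genes against a small library of reference signatures. The scoring rule (mean score of the up genes minus mean score of the down genes) is a deliberately simple stand-in chosen for this example and is not the algorithm used by the Query app; the library contents and names are hypothetical.

```python
from statistics import mean


def connection_score(up_genes, down_genes, reference):
    """Positive if the reference moves the query genes in the same direction."""
    up = [reference[g] for g in up_genes if g in reference]
    down = [reference[g] for g in down_genes if g in reference]
    if not up or not down:
        return 0.0  # not enough overlap to score
    return mean(up) - mean(down)


def run_query(up_genes, down_genes, library):
    """Score every reference signature and sort from most similar to most opposite."""
    scores = {name: connection_score(up_genes, down_genes, signature)
              for name, signature in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical reference signatures: per-gene differential expression scores.
library = {
    "compound_1": {"G1": 3.0, "G2": 2.0, "G3": -2.5, "G4": -3.0},
    "compound_2": {"G1": -2.0, "G2": -3.0, "G3": 2.0, "G4": 3.0},
    "compound_3": {"G1": 0.1, "G2": -0.1, "G3": 0.0, "G4": 0.2},
}

# Hypothetical query: genes up- and down-regulated in a signature of interest.
results = run_query(["G1", "G2"], ["G3", "G4"], library)
for name, score in results:
    print(f"{name}: {score:+.2f}")
```

Entries at the top of the list are positive connections, entries at the bottom are negative connections, and scores near zero suggest no clear relationship.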


The Integrated Connectivity Viewer (ICV) presents connectivity data as a matrix-based interactive heatmap that provides a comprehensive view of connections and makes it easy to explore relationships within the data.

Morpheus App

Morpheus is an interactive version of the ICV that lets you manipulate and annotate an existing dataset or one of your choice.

Repurposing App

Explore our repurposing collection of ~5000 tool compounds and drugs for drug discovery opportunities.

Where can I get help?

See our Knowledge Base, email us at, or call in to Office Hours on Thursdays from 1-2PM EST.

How can I collaborate with CMap?

Email us at