Access a suite of analysis apps by clicking the menu (or press Command-K to open it).

The first step in using the Query App to compute connections with your gene expression data is to assign a name to your query. Results will be stored in your Analysis History after your query is submitted.

Enter an up-regulated gene of interest, press Enter, and then type the remaining genes in the set you would like to query. If you also have down-regulated genes of interest, enter them in the box to the right.

Click Submit, and the query algorithm will find connections between your genes of interest and the CMap perturbagens whose signatures are most similar to your query. Results are generated in approximately 5 minutes and stored in your Analysis History.

The L1000 assay directly measures or infers the expression levels of 12,328 genes. By evaluating the current statistical model against a large compendium of RNA-Seq profiles from over 100 tissues from the GTEx consortium, we have identified a subset of 10,174 genes that are either measured or well inferred. This subset is known as the Best INferred Gene (BING) space. The Query App uses BING space to compute similarities between users' gene sets and the gene expression signatures in the CMap database. Each user entry is therefore assigned to one of the following three categories:
Invalid gene: Not a valid HUGO symbol or Entrez ID, and therefore not used in the query.
Valid gene: A valid HUGO symbol or Entrez ID that is also part of BING space, and therefore is used in the query.
Valid but not used in query: A valid HUGO symbol or Entrez ID that is not part of BING space, and therefore is not used in the query.
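The three-way classification above can be sketched as follows. This is a minimal illustration, not the Query App's implementation: it assumes a hypothetical lookup table `valid_ids` that maps every valid HUGO symbol and Entrez ID to a canonical symbol, and a hypothetical set `bing_genes` standing in for BING space. Matching entries case-insensitively is also an assumption, and the toy gene names are invented.

```python
def classify_entries(entries, valid_ids, bing_genes):
    """Sort user entries into the three categories described above.

    valid_ids: hypothetical dict mapping each valid HUGO symbol or Entrez ID
               (as a string) to a canonical gene symbol.
    bing_genes: hypothetical set of canonical symbols that make up BING space.
    """
    result = {"invalid": [], "valid": [], "valid_not_used": []}
    for entry in entries:
        key = entry.strip().upper()  # assumption: symbols match case-insensitively
        symbol = valid_ids.get(key)
        if symbol is None:
            result["invalid"].append(entry)  # not a HUGO symbol or Entrez ID
        elif symbol in bing_genes:
            result["valid"].append(entry)  # in BING space: used in the query
        else:
            result["valid_not_used"].append(entry)  # valid, but outside BING space
    return result


# Toy data for illustration only; these are not real BING genes.
valid_ids = {"GENE1": "GENE1", "1001": "GENE1", "GENE2": "GENE2", "GENE3": "GENE3"}
bing_genes = {"GENE1", "GENE2"}

print(classify_entries(["gene1", "1001", "GENE3", "NOTAGENE"], valid_ids, bing_genes))
```

Mapping Entrez IDs and symbols to one canonical symbol first means the BING membership test only has to be done once per gene, whichever identifier the user typed.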

Click on a perturbagen in this table to see a CLUE Card that contains all of the information available for this perturbagen. You can also select any compound in the table to query connections with all other compounds in Touchstone. Click on Detailed List to view connections in a table, or click Heatmap to see connections in a matrix powered by the Morpheus App.

Filter the Touchstone data table by selecting perturbagen type or perturbational classes of interest.

Average transcriptional impact

Impact is assessed as a transcriptional activity score, which is calculated as the mean of the median replicate correlation and the median signature strength of a perturbagen across multiple cell lines and doses. The score describes a perturbagen's transcriptional activity, relative to all other perturbagens, as derived from its replicate reproducibility and the magnitude of its differential gene expression.

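$$\mathrm{PCT}_{CC_i} = \frac{\mathrm{rank}\left(\mathrm{median}(CC_i)\right)}{N}$$

$$\mathrm{PCT}_{SS_i} = \frac{\mathrm{rank}\left(\mathrm{median}(SS_i)\right)}{N}$$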



$\mathrm{TAS}_i$ is the transcriptional activity score for the i-th perturbagen

$\mathrm{PCT}_{CC_i}$ is the percentile, relative to all other perturbagens, of the i-th perturbagen's median replicate correlation coefficient (CC) across all of its signatures

$\mathrm{PCT}_{SS_i}$ is the percentile, relative to all other perturbagens, of the i-th perturbagen's median signature strength (SS) across all of its signatures

$N$ is the total number of perturbagens
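The percentile definitions above can be sketched in Python. This is an illustration under stated assumptions, not CMap's code: ranks are ascending starting at 1, ties are broken by input order (the text does not say how ties are handled), and, following the prose description of the score as a mean, TAS is assumed to be the arithmetic mean of the two percentiles. The perturbagen names and values are hypothetical.

```python
from statistics import median


def percentile_ranks(values):
    """Return rank(v) / N for each value; rank 1 is the smallest, ties broken by order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return [r / n for r in ranks]


def transcriptional_activity_scores(cc, ss):
    """cc, ss: dicts mapping a perturbagen name to its per-signature CC or SS values."""
    names = list(cc)
    pct_cc = percentile_ranks([median(cc[p]) for p in names])
    pct_ss = percentile_ranks([median(ss[p]) for p in names])
    # Assumption: TAS_i is the arithmetic mean of PCT_CC_i and PCT_SS_i.
    return {p: (a + b) / 2 for p, a, b in zip(names, pct_cc, pct_ss)}


# Hypothetical perturbagens with three signatures each.
cc = {"A": [0.1, 0.2, 0.3], "B": [0.5, 0.6, 0.7], "C": [0.4, 0.4, 0.4]}
ss = {"A": [2, 3, 4], "B": [1, 1, 1], "C": [5, 6, 7]}
print(transcriptional_activity_scores(cc, ss))
```

Because both inputs are converted to percentiles first, CC (roughly between -1 and 1) and SS (unbounded) contribute on the same 0-to-1 scale.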

Signature diversity

Thick black bars signify Transcriptional Activity Scores (TAS) greater than or equal to 0.5; thinner black bars denote scores less than 0.5. Absence of a bar means that no data are available. Colored lines (chords) signify similar connectivity scores between cell lines: red for positive connectivity scores of 80 to 100 (shading from pale to intense as the score increases) and blue for negative connectivity. Chords are shown only when the TAS is greater than 0.5, so the absence of a chord means either that the perturbagen's TAS is very low or that no data are available. Hover over a cell line name to isolate the chords for that cell line from the rest of the figure.

Baseline expression of this gene in each cell line is represented as a z-score (top numbers). Scores were calculated using the robust z-score formula:

$$\text{z-score}_i = \frac{x_i - \mathrm{median}(X)}{\mathrm{MAD}(X) \times 1.4826}$$


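$x_i$ is the expression value of a given gene in the i-th cell line

$X = [x_1, x_2, \ldots, x_n]$ is the vector of expression values for a given gene across n cell lines

$\mathrm{MAD}(X)$ is the median absolute deviation of $X$, that is, $\mathrm{median}(|x_i - \mathrm{median}(X)|)$

1.4826 is a scaling constant that puts the score on the same scale as a conventional z-score computed with the standard deviation of $X$ (for normally distributed data, $1.4826 \times \mathrm{MAD}$ estimates the standard deviation)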
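The robust z-score formula above translates directly into Python. This is a minimal sketch, not the code used to produce the plots; the expression values are hypothetical, and raising an error when MAD is zero is a choice made here because the formula is undefined in that case.

```python
from statistics import median

MAD_SCALE = 1.4826  # rescales MAD to be comparable to a standard deviation


def robust_z_scores(values):
    """Robust z-score of each value: (x_i - median(X)) / (MAD(X) * 1.4826)."""
    med = median(values)
    mad = median([abs(x - med) for x in values])  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(x - med) / (mad * MAD_SCALE) for x in values]


# Hypothetical expression values of one gene across five cell lines.
print(robust_z_scores([1.0, 2.0, 3.0, 4.0, 100.0]))
```

With the mean and standard deviation, the single extreme value would inflate the denominator and shrink every score; the median and MAD are barely affected by it, so the outlying cell line stands out clearly.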

Median and MAD expression values were calculated using RNA-Seq profiles from a total of 1022 cell lines, comprising data from the Cancer Cell Line Encyclopedia (CCLE; Barretina et al.) and cell lines nominated by the CMap team. Plots show z-scores only for the core LINCS cell lines used by CMap in L1000 experiments. Light red or light blue regions indicate positive or negative outlier expression, respectively, of the gene relative to the other lines shown; the z-score of a positive outlier is shown in dark red and that of a negative outlier in dark blue.

Summary class connectivity shows a boxplot that summarizes the connectivity of a class. Each data point, shown as a light gray dot, represents the median connectivity of one member to the other class members. (This corresponds to the median of each row, excluding the main diagonal, in the heatmap shown below.) The box summarizes the distribution of those data points: the box boundaries mark the interquartile range, the vertical line within the box is the median, and the whiskers extend to the minimum and maximum values of the data (excluding extreme outliers, which may appear beyond the whiskers).
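The row-median step and the box statistics described above can be sketched in Python. This assumes the class heatmap is available as a square list of lists of connectivity scores; the "inclusive" quartile method is an assumption (the text does not name one), and because no rule for extreme outliers is given, the whisker values here are simply the minimum and maximum. The 4-member class is hypothetical.

```python
from statistics import median, quantiles


def member_connectivity_medians(matrix):
    """Median connectivity of each class member to the other members.

    matrix: square list of lists; row i, column j holds the connectivity
    score between members i and j. The diagonal (self-connectivity) is skipped.
    """
    return [median(row[:i] + row[i + 1:]) for i, row in enumerate(matrix)]


def box_summary(points):
    """Quartiles, IQR and range of the per-member medians (needs >= 2 points)."""
    q1, q2, q3 = quantiles(points, n=4, method="inclusive")  # method is an assumption
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1,
        # No outlier rule is specified, so none is applied to the whiskers.
        "min": min(points), "max": max(points),
    }


# Hypothetical 4-member class; values are illustrative only.
m = [
    [100, 90, 80, 10],
    [90, 100, 70, 20],
    [80, 70, 100, 30],
    [10, 20, 30, 100],
]
meds = member_connectivity_medians(m)
print(meds, box_summary(meds))
```

In this toy class the fourth member connects weakly to the others, so its row median (20) sits well below the box, which is exactly the pattern the boxplot is meant to expose.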

Connectivity between members of class is a standard heatmap of the connectivity scores, summarized across cell lines, between members of the class, where dark red represents the highest positive scores and deep blue the most negative scores. Hover over any cell of the map to reveal its individual score, displayed below the map on the left.

Class inter-cell line connectivity is a plot of the median (black line) and Q25-Q75 connectivity scores (blue area around black line) for each cell line as well as the summary scores across cell lines. In some cases perturbations have not been tested in every cell line; the absence of data is indicated by a “0” for that cell line. The example shown reveals that these estrogen agonists show the strongest connectivity to each other in MCF7, a human breast cancer cell line that expresses the estrogen receptor.
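The per-cell-line summary in this plot can be sketched as follows. It assumes a hypothetical dict mapping each cell line to the connectivity scores observed among class members in that line; the quartile method is an assumption, and cell lines with no data are reported as 0, mirroring the "0" shown in the plot. The summary across cell lines is not reproduced because the text does not say how it is computed.

```python
from statistics import median, quantiles


def cell_line_profile(scores_by_cell_line, cell_lines):
    """Return (Q25, median, Q75) of connectivity scores for each cell line.

    scores_by_cell_line: hypothetical dict mapping a cell line name to the
    connectivity scores among class members in that cell line.
    """
    profile = {}
    for cell in cell_lines:
        scores = scores_by_cell_line.get(cell, [])
        if not scores:
            profile[cell] = (0.0, 0.0, 0.0)  # not tested: displayed as "0"
        elif len(scores) == 1:
            profile[cell] = (scores[0],) * 3  # a single score has no spread
        else:
            q25, _, q75 = quantiles(scores, n=4, method="inclusive")
            profile[cell] = (q25, median(scores), q75)
    return profile


# Hypothetical scores; PC3 has no data in this example.
scores = {"MCF7": [90, 80, 70, 60], "A549": [10]}
print(cell_line_profile(scores, ["MCF7", "A549", "PC3"]))
```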

Profile status

The colored portion of the top bar indicates the Broad assays in which this compound has been profiled.

L1000 cell/dose coverage

For compounds profiled by L1000, the cell lines and doses for which signatures are available are indicated by dark gray bars (a lighter gray bar indicates that no data are available for that cell line and dose). The 6 rows correspond to 6 canonical doses: 20 nM, 100 nM, 500 nM, 1 uM, 2.5 uM, and 10 uM. A bar displayed one row above the 10 uM row indicates that doses higher than 10 uM were tested. (In some cases non-canonical doses were tested; these are rounded to the nearest canonical dose for the purpose of this display. For example, if the dose tested was 3.33 uM, the 2.5 uM bar is shown in dark gray.)
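The dose-to-row mapping described above can be sketched in Python. Two details are assumptions because the text leaves them open: "nearest" is measured as the absolute difference in uM (a log-scale distance would round some doses differently, for example 6 uM), and any dose strictly above 10 uM goes to the extra row. The row labels and signature tuples are hypothetical.

```python
# Canonical L1000 doses from the text, in micromolar, with display labels.
CANONICAL_DOSES = [(0.02, "20 nM"), (0.1, "100 nM"), (0.5, "500 nM"),
                   (1.0, "1 uM"), (2.5, "2.5 uM"), (10.0, "10 uM")]
ABOVE_TOP_ROW = "above 10 uM"  # hypothetical label for the extra top row


def dose_row(dose_um):
    """Map a tested dose (in uM) to the row in which its bar is displayed."""
    if dose_um > 10.0:
        return ABOVE_TOP_ROW  # assumption: any dose strictly above 10 uM
    # Assumption: "nearest" means the smallest absolute difference in uM.
    _, label = min(CANONICAL_DOSES, key=lambda dose: abs(dose[0] - dose_um))
    return label


def coverage(signatures):
    """signatures: iterable of (cell_line, dose_um); returns the dark gray cells."""
    return {(cell, dose_row(dose)) for cell, dose in signatures}


print(dose_row(3.33))
print(coverage([("MCF7", 10.0), ("MCF7", 3.33), ("A549", 20.0)]))
```

The 3.33 uM case reproduces the example in the text: it lands on the 2.5 uM row.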

Drug Repurposing Hub
Resource for the advancement of therapeutic discovery
Drug repurposing experiments are limited by the lack of comprehensive screening libraries. We introduce a best-in-class drug screening collection with more than 3,000 clinical drugs. The Repurposing Hub information resource contains extensive curated annotations for each drug, including a complete blueprint for library assembly. We invite you to explore the Hub information resource as a first step towards discovering new therapeutic applications for existing drugs.
The Library
8,622 total samples
2,172 protein targets
5,628 unique compounds
660 drug indications

About the Drug Repurposing Hub

Drug repurposing aims to discover new indications for existing drugs with known safety profiles. The advent of high-throughput technologies to systematically evaluate drug effects across different cellular contexts enables such discoveries. However, determining a complete list of clinical drugs and obtaining these compounds for laboratory use is surprisingly challenging. To address this challenge, we integrated multiple public and proprietary data sources with chemical vendor catalogs.

The Drug Repurposing Hub is a close collaboration between the Broad Institute Cancer Program, the Center for the Development of Therapeutics, and the Connectivity Map group. The Hub consists of 1) a physical drug screening library available in multiple plate formats at the Broad Compound Management facility, 2) manually curated annotations as part of a comprehensive, publicly accessible information resource with a data API, and 3) experimental results, including LC-MS tracings, with cellular assay results to follow. Active engagement of the user community is planned through drug library additions and deposition of new assay results.

Screening Library

More than 6,000 compound samples were purchased from 50 different suppliers. Each compound has been assayed for purity, registered in the Broad compound management system, and annotated for structure, clinical development status, vendor information, mechanism of action, protein targets, and approved indications. Explore library contents now via the Repurposing Hub web application.

Data Access


We encourage use of the data API or custom web export function (login required) for real-time updates and additional annotation fields (including drug indications and information sources). Basic exports of a subset of the underlying database are provided below. Repurposing Hub data is provided for non-commercial use only. Please contact to discuss any commercial applications.

API Access

The Repurposing Hub database is available via a RESTful web service using the JSON text-based message format. Registration is required to obtain a user-specific API key. The examples below use a limited demonstration key.

API services


The rep_fda_exclusivity service returns information about the exclusivity period of a given drug. This information was obtained from the FDA Orange Book publication.



The rep_drug_moa service returns information about the mechanism of action of a drug.



The rep_fda_orange-book_term service returns information describing abbreviations used in the Orange Book.



The rep_fda_patent service returns information about the patents covering a given drug, extracted from the Orange Book.



The rep_sample service returns information about the purity, chemical structure, source, and various textual identifiers of the compound.



The rep_drug service returns information about drug synonyms, clinical status, corresponding FDA Orange Book ingredient(s), and external database identifiers.



The rep_drug_indication service returns information about the indications and disease areas for approved drugs.



The rep_fda_product service returns information about a product extracted from the FDA Orange Book publication.



The rep_drug_target service returns information about the gene targets of a compound.
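A request to one of the services above can be sketched with the Python standard library. This only builds the request and does not send it. The base URL `https://api.clue.io/api/`, the JSON `filter` query parameter with `where` and `limit`, and the `user_key` header reflect our understanding of the CLUE API and are assumptions to verify against the Repurposing CLUE API documentation; `pert_iname` is a hypothetical field name, and `YOUR_API_KEY` stands for the user-specific key obtained at registration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL of the CLUE API; confirm against the API documentation.
API_BASE = "https://api.clue.io/api/"


def build_request(service, where, user_key, limit=10):
    """Build (but do not send) a GET request for a Repurposing Hub service.

    Assumes a JSON 'filter' query parameter and the user's API key passed
    in a 'user_key' header.
    """
    query_filter = {"where": where, "limit": limit}
    url = API_BASE + service + "?" + urllib.parse.urlencode(
        {"filter": json.dumps(query_filter)})
    return urllib.request.Request(
        url, headers={"user_key": user_key, "Accept": "application/json"})


# 'pert_iname' is a hypothetical field name used for illustration.
req = build_request("rep_drug_moa", {"pert_iname": "imatinib"}, "YOUR_API_KEY")
print(req.full_url)
```

To send the request with a real key, use `with urllib.request.urlopen(req) as resp: data = json.load(resp)`; the body is JSON, presumably a list of matching records.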


See the Repurposing CLUE API documentation for more information.

Information sources

Proprietary sources

Several proprietary databases were used in the initial library design and summary analyses. Due to licensing restrictions we cannot provide exports from these resources.


We thank the curators of public drug databases, our chemical vendors, and assay teams. We gratefully acknowledge funding sources including the Broad Next10 initiative, NIH LINCS Program grant 3U54 HG006093, NIH BD2K Program grant 5U01HG008699, NIH training grant T32 CA009172, NIH/Harvard Catalyst training award KL2 TR001100, and Conquer Cancer Foundation of ASCO Young Investigator Award.


Please cite usage as: Corsello SM, Bittker JA, Liu Z, Gould J, McCarren P, Hirschman JE, Johnston SE, Vrcic A, Wong B, Khan M, Asiedu J, Narayan R, Mader CC, Subramanian A, Golub TR. The Drug Repurposing Hub: a next-generation drug library and information resource. Nature Medicine. 23, 405–408 (2017).
