To analyze relationships between perturbations, we use the framework of connectivity. A connectivity score between two perturbations quantifies the similarity of the cellular responses evoked by these perturbations. A score of 1 means that the two perturbations are more similar to each other than 100% of other perturbation pairs. A score of -1 means that the two perturbations are more dissimilar to each other than 100% of other perturbation pairs.

See a heatmap of connections between individual perturbagens in cell lines and all other perturbagens used for the P100 assay or the GCP assay. The tutorial describes the features of the heatmap.

Bring data, in GCT format, from your own P100 or GCP studies to query against our datasets.

Introspect means querying your dataset against itself. Make sure to "Include Introspect" if you would like to see connections within your dataset (in addition to connections between your dataset and Touchstone-P).

In computing connectivity, biological or technical replicates can be aggregated together. Please select which metadata fields should be used to recognize replicates. For example, if you wish to distinguish between different doses of the same compound, make sure to select "pert_dose" (or something similar) as one of the metadata fields by which to group replicates. The possible metadata fields by which to group replicates only appear after you have uploaded your GCT and selected "Yes" for "Are there replicates in your data?".
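As a rough illustration of how the choice of metadata fields changes which samples count as replicates, here is a minimal sketch. It is not the CLUE implementation: the function name, the sample IDs, and the "pert_iname" field are assumptions made for the example; only "pert_dose" comes from the text above.

```python
from collections import defaultdict


def group_replicates(samples, group_fields):
    """Group sample IDs whose metadata match on every field in group_fields."""
    groups = defaultdict(list)
    for sample_id, meta in samples.items():
        key = tuple(meta[field] for field in group_fields)
        groups[key].append(sample_id)
    return dict(groups)


# Hypothetical per-sample metadata (illustrative values only).
samples = {
    "s1": {"pert_iname": "drugA", "pert_dose": "1 uM"},
    "s2": {"pert_iname": "drugA", "pert_dose": "1 uM"},
    "s3": {"pert_iname": "drugA", "pert_dose": "10 uM"},
}

# Grouping by compound alone merges all three samples into one group.
by_compound = group_replicates(samples, ["pert_iname"])

# Adding pert_dose keeps the 1 uM and 10 uM samples apart.
by_compound_and_dose = group_replicates(samples, ["pert_iname", "pert_dose"])
```

With "pert_dose" included, s1 and s2 are treated as replicates of each other while s3 forms its own group.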


Matched mode: When running GUTC, cell-line information is used to match query data against the same cell types in Touchstone. Currently this includes the following 9 cell types: A375, A549, HEPG2, HCC515, HA1E, HT29, MCF7, PC3, VCAP.
Unmatched mode (recommended): When running GUTC, does not incorporate cell-line information when querying the data against Touchstone signatures.


L-Build ("Light" Build):  All levels of L1000 data up to aggregated signatures.
Full Build:  All levels of L1000 data up to aggregated signatures, as well as all relevant additional analyses of the data (Introspect, t-SNE, PCA, etc.).

When querying Touchstone, Feature Space determines what set of genes to query against. When perturbagens are profiled on the L1000 platform, Landmark is recommended. When the genes you wish to query with are not landmarks, use BING instead.

The name of the build used when generating all associated files and folders (e.g. <BUILD_ABBREVIATION>_metadata). For this reason, the abbreviation must be filename compatible.

When merging replicates for L1000, several versions of the merged data are made. This parameter determines which version to use when creating your build. by_rna_well is the default and is recommended.

Access a suite of analysis apps by clicking on the menu (or press command-K to open it).

Give each query a descriptive name that will help you identify your results.

Tip: Each list can have a different number of genes; in fact, you can run a query with only one list (up OR down).

Your query will take about 5 minutes to process; check the History section in the Menu for your results!

Valid genes used in the query have HUGO symbols or Entrez IDs and are well-inferred or directly measured by L1000 (member of the BING gene set). Valid genes not used in a query are those that have a valid HUGO or Entrez identifier but are not part of the BING set. Invalid genes do not have HUGO or Entrez IDs.
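The three categories above amount to two membership checks. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the two sets are toy stand-ins for the real HUGO/Entrez identifier list and the real BING gene set, neither of which is reproduced here.

```python
def classify_genes(query_genes, known_ids, bing_set):
    """Sort query genes into the three categories described above."""
    report = {"valid_used": [], "valid_not_used": [], "invalid": []}
    for gene in query_genes:
        if gene not in known_ids:
            # No recognized HUGO symbol or Entrez ID.
            report["invalid"].append(gene)
        elif gene in bing_set:
            # Recognized identifier and a member of the BING set.
            report["valid_used"].append(gene)
        else:
            # Recognized identifier, but not part of the BING set.
            report["valid_not_used"].append(gene)
    return report


# Toy stand-ins (placeholder names, not real gene-set membership).
known_ids = {"GENE_A", "GENE_B", "GENE_C"}
bing_set = {"GENE_A", "GENE_B"}

report = classify_genes(["GENE_A", "GENE_C", "not_a_gene"], known_ids, bing_set)
```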

Filter datasets by category to see only those of interest.

Data Icons identify published and proprietary datasets.

Click on a row to see a summary of that dataset, including cell lines and treatment conditions, assay type, and dates.

Arrange the table to display the information most important for your work, and add key datasets to favorites.

View details about the collection as a whole and about individual compounds.

View subsets of compounds based on mechanism, drug target, or known disease application.

Purity is assessed by ultra-performance liquid chromatography-mass spectrometry (UPLC-MS) of compounds after receipt from the vendor.

Status as of publication of this resource (March 2017). We will be updating this, but please let us know if you notice a discrepancy.

Click on a compound to see details about its structure, mechanism, targets, approval status, and vendor.

Mouse over this graphic to see the classes of proteins targeted by drugs in the hub.

This is the current count of perturbagens in the reference (touchstone) dataset.

Select data from perturbagens grouped by their MoA or role in the cell.

Choose a perturbagen type, or view them all.

Touchstone is our reference dataset, made from well-annotated perturbagens profiled in a core set of 9 cell lines.

Detailed List is unavailable for Touchstone v1.1.1.1. A new data visualization approach is in development, but to get results in a table format (similar to Detailed View), please click on Heat Map and download the dataset as a GCT file that can be viewed in Excel or similar apps. Please see here for a detailed explanation.

Articles are tagged with topics. Click on a topic tag to see all related articles.

Look it up! A quick reference guide of CMap terms and their meanings.

Email us with your questions.

Click on the heading to read all the articles in this section on a single page, or open each article separately.

Click on a heading to open a menu of articles.

Each article is tagged with key words that describe its content.

Underlined words link to their definition in the CMap glossary.

Your feedback helps us make Connectopedia more useful.

Average transcriptional impact

Impact is assessed as a transcriptional activity score, which is calculated as a mean value of median replicate correlation and median signature strength of a perturbagen across multiple cell lines and doses. The score describes a perturbagen’s transcriptional activity, relative to all other perturbagens, as derived from its replicate reproducibility and magnitude of differential gene expression.

$$\mathrm{PCTCC}_i = \frac{\mathrm{rank}(\mathrm{median}(\mathrm{CC}_i))}{N}$$

$$\mathrm{PCTSS}_i = \frac{\mathrm{rank}(\mathrm{median}(\mathrm{SS}_i))}{N}$$

$$\mathrm{TAS}_i = \frac{\mathrm{PCTCC}_i + \mathrm{PCTSS}_i}{2}$$

where:

$\mathrm{TAS}_i$ is the transcriptional activity score for the i-th perturbagen

$\mathrm{PCTCC}_i$ is the percentile, relative to all other perturbagens, of the i-th perturbagen’s median replicate correlation coefficient (CC) across all of its signatures

$\mathrm{PCTSS}_i$ is the percentile, relative to all other perturbagens, of the i-th perturbagen’s median signature strength (SS) across all of its signatures

$N$ is the total number of perturbagens
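The calculation defined above can be sketched in a few lines. This is an illustration only, not the CMap pipeline: the function names and input values are made up, rank 1 is assumed to go to the smallest median, and ties are not given any special treatment.

```python
from statistics import median


def percentile_ranks(values):
    """Map each perturbagen to rank / N, where rank 1 is the smallest value."""
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    return {name: (position + 1) / n for position, name in enumerate(ordered)}


def transcriptional_activity_scores(cc_by_pert, ss_by_pert):
    """Average the percentile of median CC and the percentile of median SS."""
    pct_cc = percentile_ranks({p: median(v) for p, v in cc_by_pert.items()})
    pct_ss = percentile_ranks({p: median(v) for p, v in ss_by_pert.items()})
    return {p: (pct_cc[p] + pct_ss[p]) / 2 for p in cc_by_pert}


# Made-up per-signature values for three hypothetical perturbagens.
cc = {"A": [0.1, 0.2, 0.3], "B": [0.5, 0.6, 0.7], "C": [0.8, 0.9, 0.85]}
ss = {"A": [2, 3, 4], "B": [10, 12, 11], "C": [5, 6, 7]}

scores = transcriptional_activity_scores(cc, ss)
# A ranks lowest on both measures, so its score is (1/3 + 1/3) / 2 = 1/3.
```

Because both inputs are converted to percentiles before averaging, the score depends only on how a perturbagen ranks against the others, not on the raw scale of CC or SS.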

Signature diversity

Thick black bars signify Transcriptional Activity Scores (TAS) greater than or equal to 0.5; thinner black bars denote scores less than 0.5. Absence of a bar means that no data is available. Colored lines (chords) signify similar connectivity scores between cell lines: red for positive connectivity scores of 80-100 (pale to intense color according to the score) and blue for negative connectivity. Chords are only shown when TAS is > 0.5; thus absence of a chord means either that the perturbagen's TAS is too low or that no data is available. Chords for individual cell lines can be isolated from the rest of the figure by hovering over the cell line name.

Baseline expression of this gene in each cell line is represented as a z-score (top numbers). Scores were calculated using the robust z-score formula:

$$\text{z-score}_i = \frac{x_i - \mathrm{median}(X)}{\mathrm{MAD}(X) \times 1.4826}$$

where:

$x_i$ is the expression value of a given gene in the i-th cell line

$X = [x_1, x_2, \ldots, x_n]$ is the vector of expression values for a given gene across n cell lines

$\mathrm{MAD}(X)$ is the median absolute deviation of X

1.4826 is a constant that rescales the MAD so that the score is comparable to one computed with the standard deviation of X
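A minimal sketch of the robust z-score formula above, using only the Python standard library. The function name and input values are illustrative, and the guard against a zero MAD is an addition the text does not discuss.

```python
from statistics import median


def robust_z_scores(values):
    """Robust z-score of each value: (x - median) / (MAD * 1.4826)."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        # Assumption: the source does not say how a zero MAD is handled.
        raise ValueError("MAD is zero; robust z-score is undefined")
    return [(x - med) / (mad * 1.4826) for x in values]


# Made-up expression values for one gene across five cell lines.
expression = [1.0, 2.0, 3.0, 4.0, 100.0]
z = robust_z_scores(expression)
# The median is 3.0 and the MAD is 1.0, so the middle value scores 0.0
# and the extreme value 100.0 stands out as a strong positive outlier.
```

Because the median and MAD are barely moved by the single extreme value, the outlier receives a very large score instead of inflating the scale for every other cell line, which is the usual reason for preferring this form over a mean and standard deviation.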

Median and MAD expression values were calculated using RNA-Seq profiles from a total of 1022 cell lines, comprising data from the Cancer Cell Line Encyclopedia (CCLE; Barretina, et al.) and cell lines nominated by the CMap team. Plots show z-score values only for the core LINCS lines used by CMap in L1000 experiments. Light red or light blue regions indicate positive or negative outlier expression, respectively, of the gene relative to the other lines shown; z-score of a positive outlier in the corresponding cell line is in dark red and a negative outlier is in dark blue.

Summary class connectivity shows a boxplot that summarizes the connectivity of a class. Each data point, shown as a light gray dot, represents the median value of connectivity of one member to the other class members. (This corresponds to the median for each row, excluding the main diagonal, in the heatmap shown below.) The box is the distribution of those data points, where the box boundary represents the interquartile range, the vertical line within the box is the median, and the whiskers reflect the minimum and maximum values of the data (exclusive of extreme outliers, which may appear beyond the whiskers).
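Each gray dot described above is a row median that leaves out the diagonal entry. A minimal sketch, assuming the connectivity scores are held as a square list of lists (the function name and values are made up for illustration):

```python
from statistics import median


def member_medians(matrix):
    """Median connectivity of each member to the other class members.

    The diagonal entry (a member's connectivity to itself) is skipped.
    """
    return [
        median([score for j, score in enumerate(row) if j != i])
        for i, row in enumerate(matrix)
    ]


# Made-up symmetric connectivity scores for a three-member class.
scores = [
    [100, 90, 80],
    [90, 100, 70],
    [80, 70, 100],
]

dots = member_medians(scores)
```

The boxplot is then drawn over these per-member values.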

Connectivity between members of class is a standard heat map of the connectivity scores, summarized across cell lines, between members of the class, where dark red represents the highest positive scores and deep blue the highest negative scores. Individual scores are revealed to the left below the map by hovering over each cell of the map.

Class inter-cell line connectivity is a plot of the median (black line) and Q25-Q75 connectivity scores (blue area around black line) for each cell line as well as the summary scores across cell lines. In some cases perturbations have not been tested in every cell line; the absence of data is indicated by a “0” for that cell line. The example shown reveals that these estrogen agonists show the strongest connectivity to each other in MCF7, a human breast cancer cell line that expresses the estrogen receptor.

Profile status

Colored portion of top bar indicates the Broad assays in which this compound has been profiled.

L1000 cell/dose coverage

For compounds profiled by L1000, cell lines and dose range for which signatures are available are indicated by dark gray bars (lighter gray bar indicates no data is available for that cell line/dose combination). A bar displayed one row above the 10 uM row indicates that doses higher than 10 uM were tested. The 6 rows correspond to 6 canonical doses: 20 nM, 100 nM, 500 nM, 1 uM, 2.5 uM, and 10 uM. (In some cases non-canonical doses were tested; these are rounded to the nearest canonical dose for the purpose of this display. For example, if the dose tested was 3.33 uM, the 2.5 uM bar is shown in dark gray here.)
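The rounding rule can be sketched as follows. This is an assumption-laden illustration: the text does not say whether "nearest" is measured on a linear or a logarithmic scale, so the sketch uses plain absolute difference in uM, and the function name and the "above 10 uM" label are invented for the example.

```python
# The six canonical doses from the text, expressed in uM.
CANONICAL_DOSES_UM = [0.02, 0.1, 0.5, 1.0, 2.5, 10.0]


def display_row(dose_um):
    """Return the canonical dose (uM) whose row would be shaded.

    Doses above the highest canonical dose map to the extra row that
    sits above the 10 uM row.
    """
    if dose_um > CANONICAL_DOSES_UM[-1]:
        return "above 10 uM"
    # Assumption: nearest by absolute difference on a linear scale.
    return min(CANONICAL_DOSES_UM, key=lambda dose: abs(dose - dose_um))


row = display_row(3.33)  # 2.5, matching the example in the text
```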

Drug Repurposing Hub
Resource for the advancement of therapeutic discovery
Drug repurposing experiments are limited by the lack of comprehensive screening libraries. We introduce a best-in-class drug screening collection with more than 3,000 clinical drugs. The Repurposing Hub information resource contains extensive curated annotations for each drug, including details about commercial sources of all compounds. We invite you to explore the Hub information resource as a first step towards discovering new therapeutic applications for existing drugs.
Explore the Library
8,622 Total Samples
2,172 Protein Targets
5,628 Unique Compounds
660 Drug Indications

About the Drug Repurposing Hub

Drug repurposing aims to discover new indications for existing drugs with known safety profiles. The advent of high-throughput technologies to systematically evaluate drug effects across different cellular contexts enables such discoveries. However, determining a complete list of clinical drugs and obtaining these compounds for laboratory use is surprisingly challenging. To address this challenge, we integrated multiple public and proprietary data sources with chemical vendor catalogs.

The Drug Repurposing Hub is a close collaboration between the Broad Institute Cancer Program, Center for the Development of Therapeutics, and the Connectivity Map group.

Screening Library

More than 8,000 compound samples were purchased from 50 different suppliers. Each compound has been assayed for purity, registered in the Broad compound management system, and annotated for structure, clinical development status, vendor information, mechanism of action, protein targets, and approved indications. Explore library contents now via the Repurposing Hub web application.

Frequently Asked Questions

Can I purchase the drug repurposing library from the Broad Institute?

We do not offer copies of the library for purchase. However, we collaborate with many groups, both at the Broad and at non-Broad affiliates. If you are interested in collaborating, please email repurposing@broadinstitute.org.

Do I need to register to access information from the repurposing hub?

No, the annotations provided in the hub are freely available for research use by any organization. The information in the Repurposing Hub may not be repackaged or redistributed for commercial purposes without permission.

Is screening data available through the repurposing hub?

No. The drug library created as part of the repurposing hub is being profiled by other projects such as the Connectivity Map and LINCS. Please review project websites for information on data release.

What information is currently available as part of the drug repurposing hub?

The dataset available in the Repurposing Hub contains comprehensive annotations for a total of 5,627 compounds: 2,350 FDA-approved drugs, 1,600 drugs that reached phases 1-3 of clinical development, 95 compounds that were previously approved but withdrawn from use, and 1,582 preclinical or tool compounds. Annotations specifically include compound name, chemical structure, clinical trial status, mechanism of action, protein targets, disease areas, approved indications (where applicable), purity of the purchased sample, and vendor ID.

  • Sample information (updated 3/27/2017): Contains physical sample-level information including Broad Institute ID, compound name, QC confirmation, purity, vendor catalog number, vendor name, expected mass, SMILES, InChIKey, and PubChem ID.

The information in the Repurposing Hub may not be repackaged or redistributed for commercial purposes without permission.

Is there an API for repurposing annotation?

The Repurposing Hub annotations are available via a RESTful web service as part of the Connectivity Map CLUE compute platform. See here for the CLUE API.

Note that you do not need the API to download repurposing annotations in batch format: the Drug information and Sample information downloadable files (see above) contain the relevant information.

Information sources

Proprietary sources

Several proprietary databases were used in the initial library design and summary analyses. Due to licensing restrictions we cannot provide exports from these resources.

Acknowledgements

We thank the curators of public drug databases, our chemical vendors, and assay teams. We gratefully acknowledge funding sources including the Broad Next10 initiative, NIH LINCS Program grant 3U54 HG006093, NIH BD2K Program grant 5U01HG008699, NIH training grant T32 CA009172, NIH/Harvard Catalyst training award KL2 TR001100, and Conquer Cancer Foundation of ASCO Young Investigator Award.

Citing this work

Please cite usage as: Corsello SM, Bittker JA, Liu Z, Gould J, McCarren P, Hirschman JE, Johnston SE, Vrcic A, Wong B, Khan M, Asiedu J, Narayan R, Mader CC, Subramanian A, Golub TR. The Drug Repurposing Hub: a next-generation drug library and information resource. Nature Medicine. 23, 405–408 (2017). Online view only access via Springer Nature SharedIt is here.

Contact Drug Repurposing

Email: repurposing@broadinstitute.org