A dimensionality reduction technique particularly well suited for visualizing data. (For references, see https://lvdmaaten.github.io/tsne)

The parameters that were used for running t-SNE here are: 50 initial dimensions, perplexity of 30, and theta of 0.5. For datasets with <= 5000 samples, the standard t-SNE algorithm is used. For larger datasets, the Barnes-Hut algorithm is employed.
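
The algorithm choice described above can be sketched as a small helper. This is an illustration only: the function name `tsne_settings` and the dictionary keys are hypothetical, and only the numeric values and the 5000-sample cutoff come from the text.

```python
# Hypothetical helper: picks t-SNE settings following the rule described above.
EXACT_MAX_SAMPLES = 5000  # datasets up to this size use the standard algorithm


def tsne_settings(n_samples):
    """Return the t-SNE settings stated in the text for a given dataset size."""
    if n_samples < 1:
        raise ValueError("n_samples must be positive")
    algorithm = "standard" if n_samples <= EXACT_MAX_SAMPLES else "barnes_hut"
    return {
        "initial_dims": 50,  # initial dimensions, as stated in the text
        "perplexity": 30,
        "theta": 0.5,        # Barnes-Hut speed/accuracy trade-off
        "algorithm": algorithm,
    }


print(tsne_settings(5000)["algorithm"])  # standard
print(tsne_settings(5001)["algorithm"])  # barnes_hut
```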

A dimensionality reduction technique in which the two principal components are chosen to have the largest possible variance.

To analyze relationships between perturbations, we utilize the framework of connectivity. A connectivity score between two perturbations quantifies the similarity of the cellular responses evoked by these perturbations. A score of 1 means that these two perturbations are more similar to each other than 100% of other perturbation pairs. A score of -1 means that these two perturbations are more dissimilar to each other than 100% of other perturbation pairs.

See a heatmap of connections between individual perturbagens in cell lines and all other perturbagens used for the P100 assay or the GCP assay. The tutorial describes the features of the heatmap.

Bring data, in GCT format, from your own P100 or GCP studies to query against our datasets.

Introspect means querying your dataset against itself. Make sure to "Include Introspect" if you would like to see connections within your dataset (in addition to connections between your dataset and Touchstone-P).

In computing connectivity, biological or technical replicates can be aggregated together. Please select which metadata fields should be used to recognize replicates. For example, if you wish to distinguish between different doses of the same compound, make sure to select "pert_dose" (or something similar) as one of the metadata fields by which to group replicates. The possible metadata fields by which to group replicates only appear after you have uploaded your GCT and selected "Yes" for "Are there replicates in your data?".
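
To see why the choice of fields matters, here is a minimal sketch of grouping samples by selected metadata fields. It is not the site's implementation: `group_replicates`, the `id` and `pert_iname` fields, and the sample values are assumptions made for illustration; only `pert_dose` is named in the text.

```python
from collections import defaultdict


def group_replicates(samples, group_fields):
    """Group sample IDs whose values match on every selected metadata field."""
    groups = defaultdict(list)
    for sample in samples:
        key = tuple(sample[field] for field in group_fields)
        groups[key].append(sample["id"])
    return dict(groups)


# Toy metadata (made up); "pert_iname" is an assumed compound-name field.
samples = [
    {"id": "s1", "pert_iname": "drugA", "pert_dose": "1"},
    {"id": "s2", "pert_iname": "drugA", "pert_dose": "1"},
    {"id": "s3", "pert_iname": "drugA", "pert_dose": "10"},
]

by_name = group_replicates(samples, ["pert_iname"])
by_name_and_dose = group_replicates(samples, ["pert_iname", "pert_dose"])
print(len(by_name))           # 1 group: doses are pooled together
print(len(by_name_and_dose))  # 2 groups: doses are kept apart
```

Leaving the dose field out pools all three samples as replicates of one another, which is what the tip above warns about.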


Matched mode: When running GUTC, incorporates cell-line information to match query data against matching cell types in Touchstone. Currently this includes the following 9 cell types: [A375, A549, HEPG2, HCC515, HA1E, HT29, MCF7, PC3, VCAP].
Unmatched mode (recommended): When running GUTC, does not incorporate cell-line information when querying the data against Touchstone signatures.


L-Build ("Light" Build):  All levels of L1000 data up to aggregated signatures.
Full Build:  All levels of L1000 data up to aggregated signatures, as well as all relevant additional analyses of the data (Introspect, t-SNE, PCA, etc.).

When querying Touchstone, Feature Space determines what set of genes to query against. When perturbagens are profiled on the L1000 platform, Landmark is recommended. When the queries you wish to use are not landmarks, use BING instead.

Root location within a brew folder that contains the instance matrices and the brew_group folder. Default is brew/pc

List of expected treatment doses in micromolar as a listmaker list. If provided, dose discretization is applied to the pert_dose metadata field to generate a canonicalized pert_idose field. Note this assumes that the pert_dose annotations are in micromolar.

Generates TAS plots and a connectivity heatmap of preliminary calibration plates to identify the most suitable experimental conditions for the specified parameters. The tool should be run on small pilot experiments that vary experimental parameters such as seeding density and time point. Plots can also be decoupled by parameters such as cell id.

Column filter to sig_build_tool as a listmaker collection

The name of the build used when generating all associated files and folders (e.g. <BUILD_CODE>_metadata). For this reason, the code must be filename compatible.

When merging replicates for L1000, several versions of the merged data are made. This parameter determines which version to use when creating your build. by_rna_well is the default and is recommended.

All data is from the Cancer Cell Line Encyclopedia resource. Expression data was released 15-Aug-2017, copy number data is dated 27-May-2014, and mutational data is dated 15-Aug-2017.


Feature Mapping: Ensembl Ids from the source data were mapped to Entrez Gene Ids using gene annotations from NCBI (downloaded on 02-Mar-2016).
Normalization:  RNAseq RPKM values were log2 transformed using log2(max(RPKM, eps)). The data were then normalized such that the expression values were comparable across cell lines, by minimizing technical variation and equalizing their distributions (for details of the normalization, see LISS and QNORM entries in the Connectopedia glossary). Post-normalization, the expression values range between 4 and 15 log2 units, with 4 indicating that a gene is minimally or not expressed and 15 indicating the maximum readout.
Z-scores: The number of standard deviations that a gene is above or below the population mean is called its z-score. The "robust" z-score is resistant to outliers by using median instead of mean and median absolute deviation (MAD) instead of standard deviation. The reference population used to compute the median and MAD for a particular gene is all CCLE lines with data for that gene.
Z-scores Within Primary Site: Similar to z-scores, but the reference population used to compute the median and MAD is all CCLE lines from the same lineage with data for that gene.

All scores indicated are log2 ratios to reference, binned using the heuristics described in CNVkit.

Deletion:  score < -1.1
Loss:  -1.1 ≤ score ≤ -0.25
No change:  -0.25 < score < +0.2
Gain: +0.2 ≤ score < +0.7
Amplification: +0.7 ≤ score
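
The thresholds above translate directly into a lookup function. The sketch below is illustrative; the function name `cnv_call` is hypothetical, and the boundaries are copied from the list above.

```python
def cnv_call(score):
    """Map a log2 copy-number ratio to the call listed above."""
    if score < -1.1:
        return "Deletion"
    if score <= -0.25:
        return "Loss"
    if score < 0.2:
        return "No change"
    if score < 0.7:
        return "Gain"
    return "Amplification"


# Boundary values fall into the bin whose inequality includes them.
for s in (-1.5, -1.1, -0.25, 0.0, 0.2, 0.7):
    print(s, cnv_call(s))
```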

Access a suite of analysis apps by clicking on the menu (or type command-K to open)

Switch between running a single query and running a batch query.

Give each query a descriptive name that will help you identify your results.

Tip: Each list can have a different number of genes; in fact, you can run a query with only one list (up OR down).

Your query will take about 5 minutes to process; check the History section in the Menu for your results!

Valid genes used in the query have HUGO symbols or Entrez IDs and are well-inferred or directly measured by L1000 (member of the BING gene set). Valid genes not used in a query are those that have a valid HUGO or Entrez identifier but are not part of the BING set. Invalid genes do not have HUGO or Entrez IDs.
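
The three categories can be sketched as a simple set-membership check. This is an illustration under stated assumptions: `classify_genes`, the category keys, and the placeholder identifiers are hypothetical, and real lookups would use HUGO/Entrez annotations and the BING gene list.

```python
def classify_genes(query, known_ids, bing_ids):
    """Split query identifiers into the three categories described above."""
    result = {"valid_used": [], "valid_not_used": [], "invalid": []}
    for gene in query:
        if gene not in known_ids:
            result["invalid"].append(gene)         # no HUGO or Entrez ID
        elif gene in bing_ids:
            result["valid_used"].append(gene)      # recognized and in BING
        else:
            result["valid_not_used"].append(gene)  # recognized, not in BING
    return result


# Placeholder identifiers, made up for illustration only.
known_ids = {"GENE_A", "GENE_B", "GENE_C"}
bing_ids = {"GENE_A", "GENE_B"}
calls = classify_genes(["GENE_A", "GENE_C", "GENE_X"], known_ids, bing_ids)
print(calls)
```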

The sig_fastgutc_tool is a reimplementation of our query algorithm that enables faster query results, especially at larger batch sizes. It is the result of a crowd-sourced contest. It is currently in beta mode.

Filter datasets by category to see only those of interest.

Data Icons identify published and proprietary datasets.

Click on a row to see a summary of that dataset, including cell lines and treatment conditions, assay type, and dates.

Arrange the table to display the information most important for your work, and add key datasets to favorites.

View details about the collection as a whole and about individual compounds.

View subsets of compounds based on mechanism, drug target, or known disease application.

Purity is assessed by ultra-performance liquid chromatography-mass spectrometry (UPLC-MS) of compounds after receipt from the vendor.

Status as of publication of this resource (March 2017). We will be updating this but let us know if you notice a discrepancy.

Click on a compound to see details about its structure, mechanism, targets, approval status, and vendor.

Mouse over this graphic to see the classes of proteins targeted by drugs in the hub.

This is the current count of perturbagens in the reference (touchstone) dataset.

Select data from perturbagens grouped by their MoA or role in the cell.

Choose a perturbagen type, or view them all.

Touchstone is our reference dataset, made from well-annotated perturbagens profiled in a core set of 9 cell lines.

Detailed List is unavailable for Touchstone v1.1.1.1. A new data visualization approach is in development, but to get results in a table format (similar to Detailed View), please click on Heat Map and download the dataset as a GCT file that can be viewed in Excel or similar apps. Please see here for a detailed explanation.

Articles are tagged with topics. Click on a topic tag to see all related articles.

Look it up! A quick reference guide of CMap terms and their meanings.

Email us with your questions.

Click on the heading to read all the articles in this section on a single page, or open each article separately.

Click on a heading to open a menu of articles.

Each article is tagged with key words that describe its content.

Underlined words link to their definition in the CMap glossary.

Your feedback helps us make Connectopedia more useful.

Average transcriptional impact

TAS is a metric that incorporates the signature strength (the number of significantly differentially expressed transcripts) and signature concordance (the reproducibility of those changes across biological replicates) to capture activity of a compound. The score is computed as the geometric mean of the signature strength and the 75th quantile of pairwise replicate correlations for a given signature. Prior to computing the geometric mean, the signature strength is multiplied by the square root of the number of replicates. This serves to mitigate score shrinkage with increasing replicate number and allows TAS values derived from signatures of different numbers of replicates to be compared with each other.
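
The description above can be sketched as follows. This is a literal reading of the text, not the production code: the function names are hypothetical, the quantile uses linear interpolation (an assumption), negative correlations are clamped to zero so the square root is defined (an assumption), and the text does not say how signature strength is scaled, so the input here is assumed to be on whatever scale the real pipeline uses.

```python
import math


def quantile(values, q):
    """Quantile of values using linear interpolation between sorted points."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def tas(signature_strength, replicate_correlations, n_replicates):
    """Geometric mean of replicate-adjusted strength and q75 correlation."""
    cc_q75 = max(quantile(replicate_correlations, 0.75), 0.0)  # clamp: assumption
    adjusted_strength = signature_strength * math.sqrt(n_replicates)
    return math.sqrt(adjusted_strength * cc_q75)


# Made-up inputs: 4 replicates give 6 pairwise correlations.
pairwise = [0.2, 0.4, 0.5, 0.5, 0.6, 0.8]
print(tas(0.25, pairwise, 4))
```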

Signature diversity

Thick black bars signify Transcriptional Activity Scores greater than or equal to 0.5; thinner black bars denote scores less than 0.5. Absence of a bar means no data available. Colored lines (chords) signify similar connectivity scores between cell lines; red for positive connectivity scores of 80-100 (pale to intense color according to the score); blue for negative connectivity. Chords are only shown when TAS scores are > 0.5; thus absence of a chord either means that the perturbagen TAS score is very low, or that no data is available. Chords for individual cell lines can be isolated from the rest of the figure by hovering over the cell line name.

Baseline expression of this gene in each cell line is represented as a z-score (top numbers). Scores were calculated using the robust z-score formula:

z-score_i = ( x_i - median( X ) ) / ( MAD( X ) * 1.4826 ),

where:

x_i is the expression value of a given gene in the i-th cell line

X = [ x_1, x_2, ..., x_n ] is the vector of expression values for a given gene across n cell lines

MAD( X ) is the median absolute deviation of X

1.4826 is a constant that rescales the MAD so that the score is comparable to one computed with the standard deviation of X

Median and MAD expression values were calculated using RNA-Seq profiles from a total of 1022 cell lines, comprising data from the Cancer Cell Line Encyclopedia (CCLE; Barretina, et al.) and cell lines nominated by the CMap team. Plots show z-score values only for the core LINCS lines used by CMap in L1000 experiments. Light red or light blue regions indicate positive or negative outlier expression, respectively, of the gene relative to the other lines shown; z-score of a positive outlier in the corresponding cell line is in dark red and a negative outlier is in dark blue.
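
The robust z-score formula can be written in a few lines. This is a minimal sketch with made-up expression values; the function name `robust_zscores` is hypothetical, and raising an error when the MAD is zero is a choice made here, not something the text specifies.

```python
from statistics import median

MAD_SCALE = 1.4826  # rescales the MAD, as in the formula


def robust_zscores(values):
    """Return (x_i - median(X)) / (MAD(X) * 1.4826) for every x_i in X."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(v - med) / (mad * MAD_SCALE) for v in values]


# Made-up expression values for one gene across five cell lines.
expression = [4.0, 5.0, 6.0, 7.0, 12.0]
z = robust_zscores(expression)
print(z)  # the last line stands out as a positive outlier
```

In this toy vector the median is 6 and the MAD is 1, so the outlying value 12 scores about 4, whereas a mean and standard deviation would be pulled toward the outlier itself.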

Summary class connectivity shows a boxplot that summarizes the connectivity of a class. Each data point, shown as a light gray dot, represents the median value of connectivity of one member to the other class members. (This corresponds to the median for each row, excluding the main diagonal, in the heatmap shown below.) The box is the distribution of those data points, where the box boundary represents the interquartile range, the vertical line within the box is the median, and the whiskers reflect the minimum and maximum values of the data (exclusive of extreme outliers, which may appear beyond the whiskers).
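
The per-member data points can be sketched as row medians that skip the main diagonal. The matrix values and the function name `member_medians` are made up for illustration.

```python
from statistics import median


def member_medians(matrix):
    """Median connectivity of each class member to the other members."""
    medians = []
    for i, row in enumerate(matrix):
        # Skip the diagonal entry (the member's connectivity to itself).
        others = [score for j, score in enumerate(row) if j != i]
        medians.append(median(others))
    return medians


# Toy symmetric connectivity matrix for a class with 4 members.
scores = [
    [100, 90, 80, 70],
    [90, 100, 60, 50],
    [80, 60, 100, 40],
    [70, 50, 40, 100],
]
points = member_medians(scores)
print(points)  # [80, 60, 60, 50]
```

Each value in `points` corresponds to one gray dot; the box and whiskers then summarize the distribution of those values.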

Connectivity between members of class is a standard heat map of the connectivity scores, summarized across cell lines, between members of the class, where dark red represents the highest positive scores and deep blue the highest negative scores. Individual scores are revealed to the left below the map by hovering over each cell of the map.

Class inter-cell line connectivity is a plot of the median (black line) and Q25-Q75 connectivity scores (blue area around black line) for each cell line as well as the summary scores across cell lines. In some cases perturbations have not been tested in every cell line; the absence of data is indicated by a “0” for that cell line. The example shown reveals that these estrogen agonists show the strongest connectivity to each other in MCF7, a human breast cancer cell line that expresses the estrogen receptor.

Profile status

Colored portion of top bar indicates the Broad assays in which this compound has been profiled.

L1000 cell/dose coverage

For compounds profiled by L1000, cell lines and dose range for which signatures are available are indicated by dark gray bars (a lighter gray bar indicates that no data is available for that cell line/dose combination). A bar displayed one row above the 10 uM row indicates that doses higher than 10 uM were tested. The 6 rows correspond to 6 canonical doses: 20 nM, 100 nM, 500 nM, 1 uM, 2.5 uM, and 10 uM. (In some cases non-canonical doses were tested; these are rounded to the nearest canonical dose for the purpose of this display. For example, if the dose tested was 3.33 uM, the 2.5 uM bar is shown in dark gray here.)
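
Rounding to the nearest canonical dose can be sketched as below. Measuring "nearest" on a log scale is an assumption (the text does not say which scale is used), the function name `canonical_dose` is hypothetical, and the extra row for doses above 10 uM is not modeled: this sketch would simply snap such doses to 10 uM.

```python
import math

# The six canonical doses from the text, expressed in micromolar.
CANONICAL_DOSES_UM = [0.02, 0.1, 0.5, 1.0, 2.5, 10.0]


def canonical_dose(dose_um):
    """Return the canonical dose (uM) closest to dose_um on a log scale."""
    if dose_um <= 0:
        raise ValueError("dose must be positive")
    return min(CANONICAL_DOSES_UM, key=lambda c: abs(math.log(dose_um / c)))


print(canonical_dose(3.33))  # 2.5
```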

Past Release Notes

CLUE Release Notes

  • Build Request
    • Ability to save a draft of a build before submitting
    • Improvements to UI and ordering of fields
    • Rerun sig_tools on completed builds
  • Updated repurposing with latest qc and structure images
  • Various bug fixes
  • Touchstone app has been updated to v1.1; users can now view results in Detailed View
  • Query validation is now up
  • New interactive Connectopedia article
  • Build Request available for internal users
  • Various bug fixes
  • Ability to upload .grp (.gmx and .gmt) files using Batch Query and Individual Query
  • CCLE collections updated in Cell App
  • Updates on t-SNE in Data Library
  • Various bug fixes
  • Ability to run batch queries using list maker collection
  • Ability to run with sig_fastgutc_tool
  • Ability to download full query results
  • Listmaker modal can now be used in apps that are applicable
  • Updates to the Cell App annotations
  • Command for Proteomics released
  • Various bug fixes
  • Collections in list maker have been updated, including the ability to rename and delete collections
  • Introduced new Cell App V.1.0.0.0 including ~3,000 cell lines and their annotations
  • Various bug fixes
  • Introduced the concept of collections to Listmaker, which allows users to group lists together for multi-list uses
  • Expanded multi-list upload to include .gmx files
  • Improvements to the build request form
  • Sig_calib tool is available in Data Library
  • Expanded Repurposing library, see announcement for details
  • Various bug fixes
  • Command
    • A new command /gex has been released. /gex is used to look up the expression of one or more genes across 1000+ cell lines from CCLE
  • Query
    • Gene lists can be provided using Listmaker, rather than only by copying and pasting
  • Data Library
    • From the landing page, individual datasets can be directly navigated to, rather than having to click on a project and then selecting a dataset of interest
  • Various bug fixes
  • Listmaker
    • GMT files can now be used to upload multiple lists at a time
    • Items in a list can now be viewed in a side card when the list is selected in the table. This side card is also where edits will be made from now on.
    • Lists are now grouped by a "Group name"
    • Download the contents of any individual list as a GRP file
  • Data Library
    • Improvements to build request form
    • Lassoing items in the TAS and t-SNE plots now has the option to create/edit lists from selection
  • Admin tool updates
  • Various bug fixes
  • New Data Library including the Mike Table, Detailed View including downloads, data table, and analysis tools
  • Updated proteomics query
  • Various bug fixes
  • We have released the Command Launch Bar, which guides users through their analysis by suggesting other, related commands based on their selection
  • Various bug fixes
  • Improvements to code structure and storage to allow for faster loading times
  • New command /assay has been added to the Command app
  • CMap users may now copy and clone build requests in the Data Library
  • ICV tutorial is up to orient users on how to use the heat map
  • Command /conn now uses the latest version of Touchstone
  • CLUE is now faster because of performance enhancements
  • Various bug fixes
  • New and improved content on our Code page, including links to repositories, tutorials, and documentation for Clue's open source software packages
  • Improvements to Clue card loading and error reporting
  • Admin tool updates
  • Various bug fixes
  • Tox landing page has been added
  • Admin tool updates
  • Various bug fixes
  • Data Library
    • Improved Detailed View UI
    • Clue Card can now be accessed in Detailed View, and you can toggle between this and the Build Card
    • Global analyses can now be accessed when available from Detailed View
  • Command
    • Redesigned home tab
    • Improvements to table and heat map UI
    • Additional display and input interpretation options
    • Clue Cards are available for relevant commands when not logged in
  • Connectopedia
    • Improved glossary hover functionality in articles
    • CMap team members can now edit Connectopedia articles
  • Various bug fixes
  • Data Library
    • The build cards originally only viewable on the All Builds tab are now available in Detailed Views of builds
    • Facet and table titles have been replaced with more readable labels
    • The Detailed View is being redesigned! Choose a build to explore and take a look at the new page layout, including redesigned buttons and a link to our glossary underneath the table
  • Connectopedia now has a Proteomics category for articles pertaining to GCP and P100
  • Various bug fixes
  • Detailed list option is not available for queries run on version 1.1.1.1. Please see below for detailed explanation:

We have mothballed the “long ranked list” approach to viewing connectivities from a query. As the CMap dataset grew larger and larger, presenting a linear ranking of results was proving less useful (e.g. the most informative signal is at the positive and negative extremes of the ranked list so it’s not necessary to weed through the entire list). Therefore, we decided to work on new tools to help visualize meaningful connections.

However, we realize many CMap users like the lists, so in the interim we encourage you to use the Morpheus heat map-based user interface to analyze results from your queries (Note: An added advantage is that a heat map browser lets you view results from many queries, as well as individual cell line results, side-by-side in a concise format). You can download the ranked list from this view, as described below.

1. Click the Heatmap button to use Morpheus.

2. To download the list, select “Save Dataset” from the File menu of the heatmap.

3. Save the file in GCT version 1.3 format (see this article in Connectopedia for information about GCT files).

4. The file will download to your computer. Open it in a text editor, such as Microsoft Word, save it as a txt file, and then open it in Excel to see the data in column format: the ranked list of connections shown in the heatmap, with connectivity scores for each cell line. Row 3 shows the column labels.
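
As an alternative to opening the file by hand, the downloaded GCT file can be read with a short script. The sketch below assumes the standard GCT v1.3 layout (version line, a dimensions line, the column labels on row 3, then column-metadata rows followed by data rows); the function name `read_gct` and every value in the toy file are made up for illustration.

```python
import csv
import io


def read_gct(text):
    """Parse GCT v1.3 text into (column_labels, data_rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows or not rows[0] or rows[0][0] != "#1.3":
        raise ValueError("expected a GCT v1.3 file")
    # Row 2: data rows, data columns, row-metadata fields, column-metadata fields.
    n_rows, n_cols, n_row_meta, n_col_meta = (int(x) for x in rows[1][:4])
    labels = rows[2]                      # row 3 holds the column labels
    data = rows[3 + n_col_meta:]          # skip the column-metadata rows
    return labels, data[:n_rows]


# Toy file: 2 data rows, 2 data columns, 1 row-metadata and 1 column-metadata field.
toy = "\n".join("\t".join(fields) for fields in [
    ["#1.3"],
    ["2", "2", "1", "1"],
    ["id", "pert_type", "query_1", "query_2"],
    ["cell_id", "na", "MCF7", "PC3"],
    ["pertA", "trt_cp", "95.1", "-20.3"],
    ["pertB", "trt_cp", "-90.4", "10.0"],
])

labels, data = read_gct(toy)
col = labels.index("query_1")
# Rank connections by score in one column, highest first.
ranked = sorted(data, key=lambda r: float(r[col]), reverse=True)
for row in ranked:
    print(row[0], row[col])
```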

Note that the conventional list view continues to be available for viewing results from CMap touchstone v1.0 (via the Touchstone app in the Tools menu).

  • Data Library
    • You can now favorite builds on Data Library
    • You can also view the email list for those who have access to the datasets
  • There is now an About CMap and an About CLUE page
  • We also incorporated many more coaching tips to make your navigation on CLUE easier
  • Various bug fixes
  • We have launched a new and improved version of the Data Library!
    • Improvements to cards for each dataset so users can gather important information about the projects
    • Added facets for easy sorting and navigation
    • Explore the datasets further in the Detailed View page
  • Query results later than Version 1.1.1.0 cannot currently be viewed as Detailed List
  • Various bug fixes
  • Terms from our glossary can now be seen in Connectopedia articles. Hovering over any blue-underlined term will display a tooltip of that term’s definition.
  • Various bug fixes
  • Added inchiKey to table in Repurposing
  • Fixed logic behind menu to decrease loading time
  • Added section for comments when voting NO for Connectopedia articles
  • Removed ICV from the main menu (though it is still accessible through direct links and through the Data Library app)
  • Various bug fixes
  • Official release of the Proteomics landing page and Proteomics App
    • Explore P100 or GCP data within Touchstone-P
    • Query your own data against Touchstone-P
  • Data is now an analysis tool where users can access CMap and related datasets
  • Various bug fixes
  • Clicking outside toolbar menus will dismiss them
  • The tool that runs queries has been updated
  • PCL version has been updated; to access the old version, use the endpoint /v1/pcls
  • Official release of Connectopedia, the new CLUE knowledge base
    • Search Connectopedia via the search bar in the header
    • Complete with glossary of terms
    • View articles one by one, or read an entire category on one page
  • Added contact modal accessible through Connectopedia, header and footer, and various links on pages
  • Various admin tools improvements
  • Other bug fixes
  • Homepage and Header
    • Announcements modal that informs users of CMap and CLUE related news
    • Menu has been split in three for desktop screens
    • Tools: includes analysis webapps, utility tools, developer tools, and internal admin tools
    • Projects: includes information on CMap, NIH LINCS, the Repurposing Hub, and contests
    • Partnering: includes information about ways to collaborate and work with CMap
    • Help dropdown that allows users to access coaching tips, the CLUE knowledgebase (Connectopedia), and the Contact Us modal.
    • A description of the CMap dataset and an updated reference dataset figure has been added
    • User will know if they are in an app if it says CLUE “app name” and a static page if it says only “CLUE”
  • Various admin tools improvements
  • Other bug fixes
  • Added new /conn command to Command
  • Command is now a standalone app, no longer a homepage utility. However, it can still be accessed via the box on the homepage.
  • Fixed issues preventing query results from being displayed in ICV and Connections
  • Fixed bugs that required login on all pages
  • Improved loading times for homepage and static pages
  • Introduced a gene limit to Query. Queries can now use up to 150 genes.
  • Various admin tools improvements
  • Other bug fixes
  • Bugs that prevented tables from being sorted by dates and numbers have been fixed
  • The size of the table in the cart window of Touchstone has been increased for better visibility
  • /clue and /cmap standalone pages have been deprecated, and their contents have been moved to /about
  • The information about the repurposing API has been moved to the API page
  • Various admin tools improvements
  • Other bug fixes
  • The URL for clue cards has been updated (they will now be addressed as /cards/[card type]/[id]; see sirolimus example)
  • The repurposing pages have been updated to reflect information about the recently released repurposing paper (see links here)
  • Various admin tools improvements
  • Other bug fixes
  • Command App
    • Browse results for moa, target, and gene-space
    • Access the app via the search bar on the homepage
    • Type in a perturbagen or a command to get started
    • Type in a / to see all available commands
  • API
    • Updated api and Repurposing API with new metadata and endpoints
    • Added examples and instructions to API pages
    • Redesigned API Explorer
  • Clue Cards
    • Removed bounding box from dose plot
    • Added Broad ID and L1000 Type to gene cards
    • Added new card for errors and timeouts
  • Touchstone and Connections
    • Fixed bug that prevented users from seeing items at the bottom of the tables
  • Users will now be prompted to login when attempting to view a login-only page while logged out
  • Fixed spacing issues in header and some static pages
  • Fixed and restyled tooltips in the main menu
  • Adjustments to table elements
  • Minor bug fixes
  • Tables in Touchstone and Connections no longer overlap with clue cards and instruction text
  • Clue cards are now addressable (see Sirolimus example)
  • Clue API 1.0 has been released
  • Added coaching tips for homepage, Touchstone, and Query (access by clicking the help icon at the top or pressing CONTROL ALT H in any browser on Mac, or ALT H in IE, Chrome, and Safari on Windows, or ALT SHIFT H in Firefox on Windows)
  • Improved and added new Admin Tools
  • Created new page for requesting builds
  • Backend infrastructure work
  • Connections
    • Viewing results by perturbagen type in the dropdown will disable the CMap Class facet
    • Fixed bug that would display count for CMap classes but would not show classes in the table
  • Usernames for logging in, signing up, and recovering passwords have been made case insensitive
  • New data curation tools for admins
  • Proprietary MOA data has been removed
  • Touchstone
    • The number of possible selections in the table has been limited to 100; selecting more than this will prevent users from viewing their results as a detailed list or heatmap
    • Tooltips are now available for plots on the CLUE cards
  • Connections
    • Tooltips are now available for plots on the CLUE cards and class cards
    • The connections table can now be exported the same way touchstone tables are
    • Results from queries are no longer labelled as UPTAG
  • Query
    • One sided queries (where all perts are uptag, for example) will now work and can be viewed as detailed list or heatmap
  • History
    • Multiple query results can now be viewed as heatmap or detailed list
  • API
    • Users can now pull a limited amount of results from API calls, with a limit of 1000 records at a time
  • General bug fixes and enhancements to admin pages and tools
  • Touchstone
    • In the column selector, we got rid of Activity, OE KD, and Selected as options and added Target as an option.
    • In the table, the description for compounds is the mechanism of action of that particular compound while the description for genes is the gene family name.
    • Users can search by MoA and BRD ID, in addition to compound name.
    • An interactive introspect plot has been added to the compound card as a visual aid for homing in on connections between cell lines.
  • Connections
    • Under the CMap Classes, the connecting score is shown next to each class, sorted by descending score. The names for each class are also human-readable, meaning that they are correctly capitalized.
    • CMap Classes are added to the Connections table.
    • Connections have rows in the table for CMap Class
    • ‘CMap Class’ is added as Perturbagen Type
  • Query
    • After a user hits ‘Submit’ there is a link that takes them to the History page to look at their most recent queries.
  • History
    • “View as detailed list” for query results has been grayed out.