A dimensionality reduction technique particularly well suited for visualizing data. (For references, see https://lvdmaaten.github.io/tsne)
The parameters that were used for running t-SNE here are: 50 initial dimensions, perplexity of 30, and theta of 0.5. For datasets with <= 5000 samples, the standard t-SNE algorithm is used. For larger datasets, the Barnes-Hut algorithm is employed.
A dimensionality reduction technique in which the two principal components are chosen to have the largest possible variance.
To analyze relationships between perturbations, we utilize the framework of connectivity. A connectivity score between two perturbations quantifies the similarity of the cellular responses evoked by these perturbations. A score of 1 means that these two perturbations are more similar to each other than 100% of other perturbation pairs. A score of -1 means that these two perturbations are more dissimilar to each other than 100% of other perturbation pairs.
See a heatmap of connections between individual perturbagens in cell lines and all other perturbagens used for the P100 assay or the GCP assay. The tutorial describes the features of the heatmap.
Bring data, in GCT format, from your own P100 or GCP studies to query against our datasets.
Introspect means querying your dataset against itself. Make sure to "Include Introspect" if you would like to see connections within your dataset (in addition to connections between your dataset and Touchstone-P).
In computing connectivity, biological or technical replicates can be aggregated together. Please select which metadata fields should be used to recognize replicates. For example, if you wish to distinguish between different doses of the same compound, make sure to select "pert_dose" (or something similar) as one of the metadata fields by which to group replicates. The possible metadata fields by which to group replicates only appear after you have uploaded your GCT and selected "Yes" for "Are there replicates in your data?".
Matched mode: When running GUTC, incorporates cell-line information to match query data against matching cell types in Touchstone. Currently this includes the following 9 cell types: [A375, A549, HEPG2, HCC515, HA1E, HT29, MCF7, PC3, VCAP].
Unmatched mode (recommended): When running GUTC, does not incorporate cell-line information when querying the data against Touchstone signatures.
L-Build ("Light" Build): All levels of L1000 data up to aggregated signatures.
Full Build: All levels of L1000 data up to aggregated signatures, as well as all relevant additional analyses of the data (Introspect, t-SNE, PCA, etc.).
When querying Touchstone, Feature Space determines what set of genes to query against. When perturbagens are profiled on the L1000 platform, Landmark is recommended. When the queries you wish to use are not landmarks, use BING instead.
Root location within a brew folder that contains the instance matrices and the brew_group folder. Default is brew/pc
List of expected treatment doses in micromolar as a listmaker list. If provided, dose discretization is applied to the pert_dose metadata field to generate a canonicalized pert_idose field. Note this assumes that the pert_dose annotations are in micromolar.
Generates TAS plots and a connectivity heatmap of preliminary calibration plates to identify the most suitable experimental conditions for the specified parameters. The tool should be run on small pilot experiments with a variety of experimental parameters, such as seeding density and time point. Plots can also be decoupled by parameters such as cell id.
Column filter to sig_build_tool as a listmaker collection
The name of the build used when generating all associated files and folders (e.g. <BUILD_CODE>_metadata). For this reason, the code must be filename compatible.
When merging replicates for L1000, several versions of the merged data are made. This parameter determines which version to use when creating your build. by_rna_well is the default and is recommended.
All data is from the Cancer Cell Line Encyclopedia resource. Expression data was released 15-Aug-2017, copy number data is dated 27-May-2014, and mutational data is dated 15-Aug-2017.
Feature Mapping: Ensembl Ids from the source data were mapped to Entrez Gene Ids using gene annotations from NCBI (downloaded on 02-Mar-2016).
Normalization: RNAseq RPKM values were log2 transformed using log2(max(RPKM, eps)). The data were then normalized such that the expression values were comparable across cell lines, by minimizing technical variation and equalizing their distributions (for details of the normalization, see LISS and QNORM entries in the Connectopedia glossary). Post-normalization, the expression values range between 4 and 15 log2 units, with 4 indicating that a gene is minimally or not expressed and 15 indicating the maximum readout.
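The log2 transform described above can be sketched as follows. This is a minimal illustration only: the value of eps is not stated in the text, so the `EPS` constant below is a hypothetical placeholder, and the subsequent LISS/QNORM normalization is not shown.

```python
import math

# Hypothetical floor value; the actual eps used for the CCLE data is not stated here.
EPS = 0.01


def log2_rpkm(rpkm: float, eps: float = EPS) -> float:
    """Return log2(max(rpkm, eps)) so that zero RPKM values do not yield -inf."""
    return math.log2(max(rpkm, eps))


# Illustrative RPKM values (not real data).
values = [0.0, 0.5, 8.0, 1024.0]
transformed = [log2_rpkm(v) for v in values]
```

Flooring at eps before taking the logarithm keeps unexpressed genes at a finite minimum instead of negative infinity.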
Z-scores: The number of standard deviations that a gene is above or below the population mean is called its z-score. The "robust" z-score is resistant to outliers by using median instead of mean and median absolute deviation (MAD) instead of standard deviation. The reference population used to compute the median and MAD for a particular gene is all CCLE lines with data for that gene.
Z-scores Within Primary Site: Similar to z-scores, but the reference population used to compute the median and MAD is all CCLE lines from the same lineage with data for that gene.
All scores indicated are log2 ratios to reference, binned using the heuristics described in CNVkit.
Deletion: score < -1.1
Loss: -1.1 ≤ score ≤ -0.25
No change: -0.25 < score < +0.2
Gain: +0.2 ≤ score < +0.7
Amplification: +0.7 ≤ score
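The bins listed above can be expressed as a small lookup function. This is a sketch that transcribes the thresholds exactly as written here; the function name is hypothetical and it is not taken from CNVkit itself.

```python
def classify_copy_number(score: float) -> str:
    """Map a log2 copy-number ratio to the bin labels listed above."""
    if score < -1.1:
        return "Deletion"
    if score <= -0.25:
        return "Loss"
    if score < 0.2:
        return "No change"
    if score < 0.7:
        return "Gain"
    return "Amplification"


# Illustrative scores (not real data), one per bin.
labels = [classify_copy_number(s) for s in (-1.5, -0.5, 0.0, 0.3, 1.0)]
```

Checking the thresholds in ascending order means each score falls into exactly one bin, including the boundary values.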
Access a suite of analysis apps by clicking on the menu (or type command-K to open)
Explore the Connectivity Map by typing here and pressing Enter (see instructions below the search box)
Switch between running a single query and running a batch query.
Give each query a descriptive name that will help you identify your results.
Tip: Each list can have a different number of genes; in fact, you can run a query with only one list (up OR down).
Your query will take about 5 minutes to process; check the History section in the Menu for your results!
Valid genes used in the query have HUGO symbols or Entrez IDs and are well-inferred or directly measured by L1000 (member of the BING gene set). Valid genes not used in a query are those that have a valid HUGO or Entrez identifier but are not part of the BING set. Invalid genes do not have HUGO or Entrez IDs.
The sig_fastgutc_tool is a reimplementation of our query algorithm that returns query results faster, especially at larger batch sizes. It is the result of a crowd-sourced contest and is currently in beta.
Filter datasets by category to see only those of interest.
Data Icons identify published and proprietary datasets.
Click on a row to see a summary of that dataset, including cell lines and treatment conditions, assay type, and dates.
Arrange the table to display the information most important for your work, and add key datasets to favorites.
View details about the collection as a whole and about individual compounds.
View subsets of compounds based on mechanism, drug target, or known disease application.
Purity is assessed by ultra-performance liquid chromatography-mass spectrometry (UPLC-MS) of compounds after receipt from the vendor.
Status as of publication of this resource (March 2017). We will be updating this, but please let us know if you notice a discrepancy.
Click on a compound to see details about its structure, mechanism, targets, approval status, and vendor.
Mouse over this graphic to see the classes of proteins targeted by drugs in the hub.
Search CMap for connections. First, search for a perturbagen and check the box to select it. Then view its connections as a list or heatmap.
This is the current count of perturbagens in the reference (touchstone) dataset.
Select data from perturbagens grouped by their MoA or role in the cell.
Choose a perturbagen type, or view them all.
Touchstone is our reference dataset, made from well-annotated perturbagens profiled in a core set of 9 cell lines.
Detailed List is unavailable for Touchstone v1.1.1.1. A new data visualization approach is in development, but to get results in a table format (similar to Detailed View), please click on Heat Map and download the dataset as a GCT file that can be viewed in Excel or similar apps. Please see here for a detailed explanation.
Use the Command app to retrieve the most up-to-date CMap information. Type a slash (/) to see the list of commands available. Select a command, and type the MoA, gene, perturbagen, or other keyword after it to specify your request.
Articles are tagged with topics. Click on a topic tag to see all related articles.
Look it up! A quick reference guide of CMap terms and their meanings.
Email us with your questions.
Click on the heading to read all the articles in this section on a single page, or open each article separately.
Click on a heading to open a menu of articles.
Articles with the same tag(s) as this article can be opened here. To see all articles associated with any tag, click on the tag from the list below.
Each article is tagged with key words that describe its content.
Underlined words link to their definition in the CMap glossary.
Your feedback helps us make Connectopedia more useful.
Average transcriptional impact
TAS is a metric that incorporates the signature strength (the number of significantly differentially expressed transcripts) and signature concordance (the reproducibility of those changes across biological replicates) to capture activity of a compound. The score is computed as the geometric mean of the signature strength and the 75th quantile of pairwise replicate correlations for a given signature. Prior to computing the geometric mean, the signature strength is multiplied by the square root of the number of replicates. This serves to mitigate score shrinkage with increasing replicate number and allows TAS values derived from signatures of different numbers of replicates to be compared with each other.
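A literal reading of the description above can be sketched as follows. This is an assumption-laden illustration, not the production formula: the function name is hypothetical, the 75th-quantile correlation is taken as a precomputed input, negative correlations are clamped to zero (an assumption), and any normalization that places TAS on a 0 to 1 scale is not described here and is therefore omitted.

```python
import math


def tas_sketch(signature_strength: float, n_replicates: int, cc_q75: float) -> float:
    """Geometric mean of replicate-scaled signature strength and replicate correlation.

    signature_strength: number of significantly differentially expressed transcripts.
    cc_q75: 75th quantile of pairwise replicate correlations (precomputed).
    """
    # Scale signature strength by sqrt(number of replicates), as described above.
    scaled_strength = signature_strength * math.sqrt(n_replicates)
    # Assumption: treat negative correlations as zero so the square root is defined.
    concordance = max(cc_q75, 0.0)
    # Geometric mean of two quantities is the square root of their product.
    return math.sqrt(scaled_strength * concordance)


# Illustrative inputs (not real data).
example = tas_sketch(signature_strength=100, n_replicates=4, cc_q75=0.5)
```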
Signature diversity
Thick black bars signify Transcriptional Activity Scores greater than or equal to 0.5; thinner black bars denote scores less than 0.5. Absence of a bar means no data available. Colored lines (chords) signify similar connectivity scores between cell lines; red for positive connectivity scores of 80-100 (pale to intense color according to the score); blue for negative connectivity. Chords are only shown when TAS scores are > 0.5; thus absence of a chord either means that the perturbagen TAS score is very low, or that no data is available. Chords for individual cell lines can be isolated from the rest of the figure by hovering over the cell line name.
Baseline expression of this gene in each cell line is represented as a z-score (top numbers). Scores were calculated using the robust z-score formula:
$$ z_i = \frac{x_i - \operatorname{median}(X)}{1.4826 \cdot \operatorname{MAD}(X)} $$
where:
$x_i$ is the expression value of a given gene in the i-th cell line
$X = [x_1, x_2, \ldots, x_n]$ is the vector of expression values for a given gene across n cell lines
$\operatorname{MAD}(X)$ is the median absolute deviation of X
1.4826 is a constant that rescales the score as if the standard deviation of X had been used instead of the MAD
Median and MAD expression values were calculated using RNA-Seq profiles from a total of 1022 cell lines, comprising data from the Cancer Cell Line Encyclopedia (CCLE; Barretina, et al.) and cell lines nominated by the CMap team. Plots show z-score values only for the core LINCS lines used by CMap in L1000 experiments. Light red or light blue regions indicate positive or negative outlier expression, respectively, of the gene relative to the other lines shown; z-score of a positive outlier in the corresponding cell line is in dark red and a negative outlier is in dark blue.
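The robust z-score formula above can be sketched with the standard library. The function name and the input values are illustrative only; the zero-MAD check is an added safeguard that the text does not mention.

```python
from statistics import median

# Constant that rescales the MAD to be comparable to a standard deviation.
MAD_SCALE = 1.4826


def robust_zscores(values):
    """Return (x_i - median(X)) / (MAD(X) * 1.4826) for each x_i in values."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Safeguard (assumption): the score is undefined when MAD is zero.
        raise ValueError("MAD is zero; robust z-score is undefined")
    return [(v - med) / (mad * MAD_SCALE) for v in values]


# Illustrative expression values for one gene across five cell lines.
scores = robust_zscores([1.0, 2.0, 3.0, 4.0, 100.0])
```

Because the median and MAD ignore the magnitude of the extreme value, the outlier (100.0) receives a very large score while the other lines keep small scores.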
Summary class connectivity shows a boxplot that summarizes the connectivity of a class. Each data point, shown as a light gray dot, represents the median value of connectivity of one member to the other class members. (This corresponds to the median for each row, excluding the main diagonal, in the heatmap shown below.) The box is the distribution of those data points, where the box boundary represents the interquartile range, the vertical line within the box is the median, and the whiskers reflect the minimum and maximum values of the data (exclusive of extreme outliers, which may appear beyond the whiskers).
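The per-member data points described above (the median of each heatmap row, excluding the main diagonal) can be sketched as follows. The function name and the small matrix are hypothetical illustrations, not real connectivity scores.

```python
from statistics import median


def member_medians(matrix):
    """For each row i, return the median of all entries except the diagonal entry (i, i)."""
    return [
        median([value for j, value in enumerate(row) if j != i])
        for i, row in enumerate(matrix)
    ]


# Illustrative symmetric connectivity matrix for a three-member class.
connectivity = [
    [1.0, 0.2, 0.4],
    [0.2, 1.0, 0.6],
    [0.4, 0.6, 1.0],
]
points = member_medians(connectivity)
```

Each value in `points` would correspond to one light gray dot in the boxplot.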
Connectivity between members of class is a standard heat map of the connectivity scores, summarized across cell lines, between members of the class, where dark red represents the highest positive scores and deep blue the highest negative scores. Individual scores are revealed to the left below the map by hovering over each cell of the map.
Class inter-cell line connectivity is a plot of the median (black line) and Q25-Q75 connectivity scores (blue area around black line) for each cell line as well as the summary scores across cell lines. In some cases perturbations have not been tested in every cell line; the absence of data is indicated by a “0” for that cell line. The example shown reveals that these estrogen agonists show the strongest connectivity to each other in MCF7, a human breast cancer cell line that expresses the estrogen receptor.
Profile status
Colored portion of top bar indicates the Broad assays in which this compound has been profiled.
L1000 cell/dose coverage
For compounds profiled by L1000, cell lines and dose range for which signatures are available are indicated by dark gray bars (a lighter gray bar indicates that no data is available for that cell line/dose combination). A bar displayed one row above the 10 uM row indicates that doses higher than 10 uM were tested. The 6 rows correspond to 6 canonical doses: 20 nM, 100 nM, 500 nM, 1 uM, 2.5 uM, and 10 uM. (In some cases non-canonical doses were tested; these are rounded to the nearest canonical dose for the purpose of this display. For example, if the dose tested was 3.33 uM, the 2.5 uM bar is shown in dark gray here.)
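Rounding to the nearest canonical dose can be sketched as below. The text does not say how "nearest" is measured, so comparing on a log scale is an assumption, as is the function name; doses above 10 uM, which the display shows in a separate row, are not handled specially here.

```python
import math

# The six canonical doses listed above, in micromolar.
CANONICAL_DOSES_UM = [0.02, 0.1, 0.5, 1.0, 2.5, 10.0]


def nearest_canonical_dose(dose_um: float) -> float:
    """Return the canonical dose closest to dose_um on a log scale (assumption)."""
    if dose_um <= 0:
        raise ValueError("dose must be positive")
    return min(
        CANONICAL_DOSES_UM,
        key=lambda canonical: abs(math.log(dose_um) - math.log(canonical)),
    )


# The example from the text: 3.33 uM is displayed in the 2.5 uM row.
displayed = nearest_canonical_dose(3.33)
```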
The Connectivity Map is a growing resource of over 3 million perturbational profiles available to the wider scientific community. The clue.io platform offers a range of apps to facilitate the analysis of these data, but many researchers in our community prefer working from the command line to ask deep scientific questions of this large dataset.
Our team has assembled a collection of developer resources to facilitate access and analysis:
To get started, please explore the relevant modules, or click on one of the case studies to learn more.
Compare a given query gene set to all the signatures in the CMap database to find matching drug, gene, or disease signatures.
Identify a subset of the CMap signature matrix (dataset) to analyze in-depth. Example: study small molecules that are annotated in the literature as kinase inhibitors
Look up the relationship between compounds and genes
Convert drug names to CMap universal id system
Format my gene set queries properly
Compare a given query gene set to all the signatures in the CMap database to find matching drug, gene, or disease signatures.
A query consists of a set of genes that are up- and down-regulated in your experiment of interest.
Hint: Use Entrez gene IDs to enter genes. Do not use Affymetrix identifiers.
The CMap connectivity algorithm uses an enrichment score to compute the similarity between the input signature and each signature in the database. The clue environment offers four modes of running queries, each tailored to a different modality of use:
The API has an endpoint for querying the database using a list of genes. This method is suitable for running a large batch query.
The following is the cURL command for submitting a query against L1000 Touchstone by uploading a file.
Replace the values for user_key, uptag-cmapfile, and dntag-cmapfile with your user key from Clue and the paths to your up and down files. The files for uptag and dntag should be in GMT format.
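As a sketch of preparing the up and down files, the snippet below builds GMT lines in the commonly used layout (tab-delimited: set name, description, then one gene per field). The set names, descriptions, and Entrez IDs are hypothetical placeholders, and whether the query service expects any particular naming is not stated here.

```python
def gmt_line(set_name: str, description: str, genes) -> str:
    """Build one tab-delimited GMT record: name, description, then the gene identifiers."""
    return "\t".join([set_name, description, *[str(g) for g in genes]])


# Hypothetical gene sets using placeholder Entrez gene IDs.
up_record = gmt_line("my_query_UP", "up-regulated genes", [5720, 466, 6009])
down_record = gmt_line("my_query_DN", "down-regulated genes", [10237, 2547])

# Each record would be written as a single line of its own .gmt file.
up_file_contents = up_record + "\n"
down_file_contents = down_record + "\n"
```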
Coming soon!
Coming soon!
The algorithm returns a rank-ordered list of reference perturbagens correlated to the input signature. The identities of the compounds and the mechanisms that they represent are hypotheses for the targets or pathways that the user's input signature impinges on. Common techniques to assist interpretation include:
Identify a subset of the CMap signature matrix (dataset) to analyze in-depth. Example: study small molecules that are annotated in the literature as kinase inhibitors
Drugs are acquired from various sources and formatted for profiling in Connectivity Map. As part of this, each small-molecule profile receives a unique BRD-based identifier (which identifies the physical samples used in the experiment) as well as a universal id (signature id) that represents the unique identifier associated with the data produced in the experiment.
See utility use case 1 for more information.
Now I have a list of universal ids. Every signature bears a unique identifier (“sig_id”) and the set of signatures that meet some criteria are identified.
Level | Description |
---|---|
Level 3 (NORM) | Data that has been normalized so that wells within a plate and across plates are adjusted for batch effects (note that batch effects are hard to remove completely) |
Level 4 (ZS) | Z-scores for each gene based on Level 3 with respect to the entire plate population. This comparison of profiles to their appropriate population control generates a list of differentially expressed genes. |
Level 5 (MODZ) | Replicate-collapsed z-score vectors based on Level 4. Replicate collapse generates one differential expression vector, which we term a signature. Connectivity analyses are performed on signatures. |
Coming soon!
Coming soon!
Use the UI below to create a SQL statement to run against our data
After you have run the query, you may save the results to a file or view them in our ICV App.
Use the UI below to select a data level and query using the id values you have obtained.
After you have run the query, you may save the results to a file or view them in our ICV App.
Convert drug names to CMap universal id system
Use the box below to convert a pert_name or pert_id into a sig_id. When using this method, you need to specify the data level you will be using. Data levels 3 and 4 use a different id format than level 5.
Perturbagen | Metadata level | Service endpoint | UID |
---|---|---|---|
BRD-K34157611 | Level 3 | /instinfo | CRCGN008_HEPG2_24H_X2_B29:A16 |
BRD-K34157611 | Level 4 | /instinfo | CRCGN008_HEPG2_24H_X2_B29:A16 |
BRD-K34157611 | Level 5 | /siginfo | CRCGN008_HEPG2_24H:A16 |
CLUE is a cloud-based software platform for the analysis of perturbational datasets generated using gene expression (L1000) and proteomic (P100 and GCP) assays. The CLUE platform provides integrated access to datasets, results from the processing and analysis of these data, and software tools that the community can leverage to advance their research.
In addition to apps, such as Touchstone and Query, that can facilitate analysis of these perturbational datasets, our team also offers easy access to open source software packages available in Python, R, Matlab, and Java. We will continue to expand the availability of code to further enable command line access of CLUE tools and data.
Python modules integrated with pandas for analysis, io of a variety of file formats (including GCTx, GCT, GRP, GMT files), and programmatic access to the CLUE API.
R modules integrated with tidyverse for analysis and io of a variety of file formats (including GCTx and GCT files).
Matlab modules for analysis, io of a variety of file formats (including GCTx, GCT, GRP, GMT files), and the L1000 data processing pipeline.
Java classes for analysis and io of a variety of file formats (including GCTX and LXB).
cmapBQ allows for targeted retrieval of relevant gene expression data from the resources provided by The Broad Institute and LINCS Project hosted on Google BigQuery
All software packages above are optimized for IO and analysis of GCTx files. In addition to accessing GCTx files from our datasets associated with the CLUE page, you can access GCTx files of annotated perturbational datasets from L1000, proteomic, ARCHS4, and other projects from our Data Library. If you use these packages for analysis of GCTx files, please follow the policies and citation instructions below.
All code here is made available under a 3-Clause BSD License. Citation guidelines are as follows:
For additional support, please refer to Connectopedia: The CLUE Knowledge Base for documentation on the file formats frequently used in CLUE (GCTx, GCT, GRP, GMT, GMX) or send us a message. If you would like to discuss a topic further with a member of our team, we also offer weekly office hours for all CLUE users. For issues with code, please file an issue with a test that reproduces the error in the relevant repository on GitHub.
The CLUE API offers programmatic access to annotations and perturbational signatures in the CMap L1000 dataset via a collection of HTTP-based RESTful web services. These services support complex queries via simple HTTP GET requests that can be executed in a web browser or any programming language. If you are using a web browser to display results, it is best to add your favorite JSON viewer add-on or plugin. The results are returned as standard JSON objects. Click on the links on the side for usage instructions and examples. API requests are based on the LoopBack framework syntax.
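A GET request in LoopBack filter syntax can be sketched by building the URL as below. The snippet only constructs the URL and does not contact the service. The endpoint name `perts`, the field `pert_iname`, the example compound, and passing `user_key` as a query parameter are assumptions for illustration; check the endpoint pages for the exact names.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://api.clue.io/api"


def build_get_url(endpoint: str, where: dict, user_key: str, limit=None) -> str:
    """Build a GET URL whose `filter` parameter is a JSON-encoded LoopBack filter."""
    loopback_filter = {"where": where}
    if limit is not None:
        loopback_filter["limit"] = limit
    params = {"filter": json.dumps(loopback_filter), "user_key": user_key}
    return f"{API_BASE}/{endpoint}?{urlencode(params)}"


# Hypothetical endpoint, field name, and key, used only to show the URL shape.
url = build_get_url(
    "perts",
    where={"pert_iname": "sirolimus"},
    user_key="YOUR_USER_KEY",
    limit=5,
)
```

The resulting URL can be pasted into a browser or passed to any HTTP client; the response is a JSON array.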
Select an endpoint below to read more about it and see code examples.
The pert service returns meta-information for perturbagens in the CMap L1000 dataset.
Examples
The cell service returns cell line information.
Examples
The gene service returns meta-information for measured and inferred genes in the CMap L1000 dataset.
Examples
The PCL service returns meta-information for perturbational classes in the CLUE dataset.
Examples
The plate service returns plate information.
Examples
The probeset_to_entrez_id service is used to convert Affymetrix (affy) IDs to Entrez gene IDs.
Examples
This is an HTTP POST request, using the endpoint https://api.clue.io/api/probeset_to_entrez_id/convert. The method takes a list of affy IDs in the form of an array of strings and returns a string representing the associated Entrez gene ID.
Input:
Output:
The CURL of this example is as follows:
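The same kind of request can be sketched in Python. The snippet builds the request object without sending it. Sending the array as the bare JSON body, passing the key in a `user_key` header, and the probe ID shown are all assumptions for illustration; the exact body shape expected by the service is not given here.

```python
import json
from urllib.request import Request

CONVERT_URL = "https://api.clue.io/api/probeset_to_entrez_id/convert"


def build_convert_request(affy_ids, user_key: str) -> Request:
    """Build (but do not send) a POST request carrying an array of affy IDs as JSON."""
    body = json.dumps(list(affy_ids)).encode("utf-8")
    return Request(
        CONVERT_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "user_key": user_key},
    )


# Placeholder probe ID and key, for illustration only.
request = build_convert_request(["200814_at"], user_key="YOUR_USER_KEY")
# To send it: urllib.request.urlopen(request), then parse the JSON response.
```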
The profile service returns meta-information for profiles in Clue. This includes data for the CMap L1000 datasets as well as the proteomic GCP and P100 datasets.
Examples
Using various API endpoints, users can submit queries and poll for their status. A full article describing how this is done can be found in the Query API tutorial in Connectopedia.
The rep_fda_exclusivity service returns information about the exclusivity period of a given drug. This information was obtained from the FDA Orange Book publication.
Examples
The rep_drug_moa service returns information about the mechanism of action of a drug.
Examples
The rep_fda_orange-book_term service returns information describing abbreviations used in the Orange Book.
Examples
The rep_fda_patent service returns information about the patent of a given drug extracted from the Orange Book.
Examples
The rep_sample service returns information about the purity, chemical structure, source, and various textual identifiers of the compound.
Examples
The rep_drug service returns information about drug synonyms, clinical status, corresponding FDA Orange Book ingredient(s), and external database identifiers.
Examples
The rep_drug_indication service returns information about the indications and disease areas for approved drugs.
Examples
The rep_fda_product service returns information about a product extracted from the FDA Orange Book publication.
Examples
The rep_drug_target service returns information about the gene target of a compound.
Examples
The sig service returns meta-information for signatures in the CMap L1000 dataset.
Examples
Throughout this guide, variables will be indicated by brackets, i.e. {variable_name}.
Many of the Docker images require the user to mount volumes inside the container, indicated by the -v argument. A local directory is mounted inside the container, giving the container access to those local files. There are two use-cases of this flag present in this document:
To download a docker container from the command line: docker pull DOCKERNAME
Select a docker below to read more about it and see code examples.
Use this docker to convert GCTX files to the GCT file format.
filepath path (mounted within docker) to GCTX file to be converted
outdir path (mounted within docker) to subdirectory where output will be (requires trailing / )
outpath output filename (will be located within the outdir)
docker run -it --rm \
  --name {name} \
  -v {local path to mount}:{desired alias} \
  cmap/gctx-to-gct \
  --filepath {filepath} \
  --outdir {outdir} \
  --outpath {outpath}
Input full path: ~/my_directory/example.gctx
Docker command:
docker run -it --rm --name gctx_converter -v ~/my_directory/:/mnt/ \
  cmap/gctx-to-gct --filepath /mnt/example.gctx --outdir /mnt/converted \
  --outpath example.gct
Output full path: ~/my_directory/converted/example.gct
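The relationship between local paths and the paths passed to the tool can be sketched as below. The helper name is hypothetical; it simply rewrites a path under the mounted local directory into the corresponding path under the alias given to -v.

```python
from pathlib import PurePosixPath


def container_path(local_path: str, local_mount: str, alias: str) -> str:
    """Translate a local path under `local_mount` into its path under the mount alias."""
    # relative_to raises ValueError if local_path is not inside local_mount.
    relative = PurePosixPath(local_path).relative_to(PurePosixPath(local_mount))
    return str(PurePosixPath(alias) / relative)


# With -v ~/my_directory/:/mnt/ as in the example above:
inside = container_path("~/my_directory/example.gctx", "~/my_directory/", "/mnt/")
output = container_path("~/my_directory/converted/example.gct", "~/my_directory/", "/mnt/")
```

Only files under the mounted directory are visible inside the container, which is why every path argument in the example begins with the alias /mnt/.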
Extract a subset from a larger dataset (can also be used to convert GCT↔GCTX if cid and rid are full grps of all col/row ids).
create_subdir (boolean) whether or not to create a subdirectory for output
cid path (mounted within docker) to .grp file of column ids to extract from input
rid path (mounted within docker) to .grp file of row ids to extract from input
ds path (mounted within docker) to input file to be sliced
out path (mounted within docker) where output will be saved, including the file name
use_gctx boolean whether to save output as .gctx filetype, 0 returns .gct filetype
docker run --rm \
  --name sig_slice_tool \
  -v {local path to mount}:{desired alias} \
  -it cmap/sig_slice_tool \
  --create_subdir {0 or 1} \
  --cid {cid} \
  --rid {rid} \
  --ds {ds} \
  --out {out} \
  --use_gctx {use_gctx}
My local directory structure:
Docker command:
docker run --rm --name sig_slice_tool \
  -v ~/my_directory/:/mnt/ -it cmap/sig_slice_tool --create_subdir 1 \
  --cid /mnt/input/cid.grp --rid /mnt/input/rid.grp --ds /mnt/input/input.gct \
  --out /mnt/output/ --use_gctx 1
Output full path: ~/my_directory/output/subset.gctx
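The .grp inputs for cid and rid can be prepared with a few lines of Python. This sketch assumes the usual GRP layout of one identifier per line; the column ids shown are hypothetical placeholders, and a temporary directory stands in for the local input folder.

```python
import tempfile
from pathlib import Path


def write_grp(path, ids) -> None:
    """Write identifiers to a .grp file, one per line (assumed GRP layout)."""
    Path(path).write_text("".join(f"{identifier}\n" for identifier in ids))


# Hypothetical column ids to extract.
column_ids = ["sample_A01", "sample_A02", "sample_B07"]

with tempfile.TemporaryDirectory() as tmp:
    grp_path = Path(tmp) / "cid.grp"
    write_grp(grp_path, column_ids)
    # Read the file back to confirm the one-id-per-line layout.
    round_trip = grp_path.read_text().splitlines()
```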
This tool combines multiple GCT(X) files into a single GCT(X) file.
files path (mounted within docker) to .grp file of names of files to be collated
parent_folder path (mounted within docker) to parent directory where files listed in grp are located
out path (mounted within docker) to output directory
docker run --name sig_collate_tool \
  -v {local path to mount}:{desired alias} \
  -w {working directory is desired alias} \
  -t cmap/sig_collate_tool \
  --files {files} \
  --parent_folder {parent_folder} \
  --out {out}
My local directory structure:
Docker command:
docker run --name sig_collate_tool -v ~/my_directory/:/mnt/ \
  -w /mnt/ -t cmap/sig_collate_tool --files /mnt/input/files.grp \
  --parent_folder /mnt/input/uncollated --out /mnt/output
Output full path: ~/my_directory/output/result.gctx
Use this tool to run proteomics Query for connectivity of custom GCT with Touchstone-P.
assay (P100 || GCP)
name (string)
introspect (true or false, whether or not to compute internal connectivity)
input_file (path to input mounted within docker)
fields_to_aggregate [(list of strings referring to set of columns which will be aggregated to identify unique perturbagens)]
out_dir (path to output directory mounted within docker)
psp_on_clue_yml clue/psp_on_clue.yml
config path (mounted within docker) to yml configuration
out path (mounted within docker) where output is saved, including the filename [NB: will override out_dir argument in yml]
docker run --rm \
  --name sig_prot_query_tool \
  -v {local path to mount}:{desired alias} \
  -it cmap/sig_slice_tool \
  --config {config} --out {out}
My local directory structure:
~/my_directory/
  input/
    my_configuration.yml
    input.gct
my_configuration.yml contents:
assay: P100
name: my_query
introspect: true
input_file: /mnt/input/input.gct
fields_to_aggregate: ["pert_id", "cell_id", "pert_time"]
out_dir: /mnt/this_is_overridden
psp_on_clue_yml: clue/psp_on_clue.yml
Docker command:
docker run --rm --name sig_prot_query_tool \
  -v ~/my_directory/:/mnt/ -it cmap/sig_slice_tool \
  --config /mnt/input/my_configuration.yml \
  --out /mnt/output
Output full path: ~/my_directory/output/
Expected files:
INTROSPECT_CONN.gct
CONCATED_CONN.gct
The conda environment for CMapPy
None. Running docker on its own will put the user in a shell environment with cmappy_env activated.
docker run -it \
  -v {local path to mount}:{desired alias} \
  cmap/cmappy \
  {any additional command will be run in the shell}
Input full path:-
Docker command: docker run -it cmap/cmappy python
Output: Docker is now running python, with the ability to import cmapPy
Compare replicate signatures to assess similarity.
ds_list path (mounted within docker) to, for single replicate set, grp of file paths, else tsv with column names: group_id and file_path
metric ['spearman', 'wtcs']
set_size (for wtcs metric only) recommended 50
docker run --rm \
  --name sig_recall \
  -v {local path to mount}:{desired alias} \
  -it cmap/sig_recall_tool \
  --ds_list {ds_list} \
  --metric {metric}
My local directory structure:
~/my_directory/
  input.tsv
  classA/
    input1.gctx
    input2.gctx
  classB/
    input1.gctx
    input2.gctx
    input3.gctx
input.tsv file contents:
group_id | file_path |
---|---|
A | /cmap/input/classA/input1.gctx |
A | /cmap/input/classA/input2.gctx |
B | /cmap/input/classB/input1.gctx |
B | /cmap/input/classB/input2.gctx |
B | /cmap/input/classB/input3.gctx |
Docker command:
docker run --rm --name sig_recall -v ~/my_directory/:/mnt/ \
  -it cmap/sig_recall_tool --ds_list /mnt/input.tsv --metric 'spearman'
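A ds_list file with the two required columns can be generated as sketched below. The function name is hypothetical, and the paths are copied from the example table above; they must match wherever the files are visible inside the container.

```python
import csv
import io


def build_ds_list(groups: dict) -> str:
    """Return tab-separated text with group_id and file_path columns, one row per file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["group_id", "file_path"])
    for group_id, paths in groups.items():
        for path in paths:
            writer.writerow([group_id, path])
    return buffer.getvalue()


# Replicate groups from the example table above.
tsv_text = build_ds_list({
    "A": ["/cmap/input/classA/input1.gctx", "/cmap/input/classA/input2.gctx"],
    "B": [
        "/cmap/input/classB/input1.gctx",
        "/cmap/input/classB/input2.gctx",
        "/cmap/input/classB/input3.gctx",
    ],
})
```

Writing `tsv_text` to input.tsv in the mounted directory produces the file passed to --ds_list.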
Given a build directory, this tool generates a report containing functional and technical QC plots.
--inpath path to the build directory
--out the output directory [default:.]
--rpt prefix to append to output directory. Only applies if --create_subdir is passed as well [default: my_analysis]
--opts RDS file containing argument values
--title title for the report [default:]
docker run \
  -it \
  -v /path/to/output/ \
  cmap/sig_build_synopsis_tool \
  --runtests \
  --out /output
Run the tool in standard mode: sig_build_synopsis_tool --inpath /path/to/L-build --title BuildName