Batch Query Tutorial
This tutorial walks through running batch queries from the CMap Query app using some example files (downloadable from the "Example gene lists" section below).
CLUE lets users query CMap in two ways: through individual queries, or in batches of up to 25 queries run at once. For an individual query, the input is a list of upregulated genes and an optional list of downregulated genes. For a batch query, the input is a collection of "up" gene lists, each list representing one query, and an optional collection of "down" lists. CMap usually stores these collections of gene lists as GMT files, where each row in the file represents a different list of upregulated or downregulated genes.
Example gene lists
For this tutorial, we will be using a set of four signatures. The upregulated and downregulated gene lists are stored in these two GMT files: example_uptag_CRCGN009.gmt (for upregulated genes) and example_dntag_CRCGN009.gmt (for downregulated genes). If you're curious, these signatures are from the CRCGN dataset.
Download both of these files by clicking on their links. Each row in a file is a distinct query, where the first column is the name of that query, the second is its description, and the remaining columns are Entrez Gene IDs. For more information on the required format for these lists, see the "Other Tips" section at the bottom of this tutorial.
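The row layout described above can be sketched as a small parser. This is only a minimal sketch, assuming tab-delimited rows as in the GSEA GMT format; the function name parse_gmt, the constant MAX_BATCH_SIZE, and the sample rows are hypothetical and are not part of CLUE.

```python
MAX_BATCH_SIZE = 25  # batch limit stated in this tutorial


def parse_gmt(text):
    """Parse GMT text into {query name: (description, [gene IDs])}."""
    gene_lists = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"row needs a name, a description, and at least one gene: {line!r}"
            )
        name, description = fields[0], fields[1]
        genes = [g for g in fields[2:] if g]  # drop empty trailing columns
        if name in gene_lists:
            raise ValueError(f"duplicate query name: {name!r}")
        gene_lists[name] = (description, genes)
    if len(gene_lists) > MAX_BATCH_SIZE:
        raise ValueError(f"{len(gene_lists)} queries exceeds the batch limit")
    return gene_lists


# Made-up rows for illustration only; real rows come from the example files.
sample = (
    "QUERY_A\tfirst example\t5720\t466\t6009\n"
    "QUERY_B\tsecond example\t2309\t1021\n"
)
parsed = parse_gmt(sample)
for name, (description, genes) in parsed.items():
    print(name, description, len(genes))
```

Running a check like this on both the uptag and dntag files before uploading is one way to catch malformed rows early.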
In the following steps, we will use these two files together to submit a batch of 4 queries with an "up" (uptag) and "down" (dntag) collection.
Optional: Creating gene set collections using Listmaker
The Query app uses GMT/GMX files or the Listmaker app to load gene lists for queries. If you'd prefer to upload a file directly to Query without saving your gene lists in Listmaker, you can skip ahead to the next section.
To create a collection, go to Listmaker. From this page, you can upload a GMT file to create your collection of "up" gene lists and your collection of "down" gene lists. For this tutorial, we will upload multiple lists at a time using one GMT file, but you can also add lists individually. We will first upload the GMT file with our up lists (uptag) and create a new collection. To do so, click the "+ Add" button, and drag and drop the GMT file into the box. Specify the type of your collection as "Gene" (this is so we can use it in the Query app), and create a new collection by typing "CRCGN up" into the Collection Name field and pressing Enter. You may leave the tags field empty. Click "Create Lists", then click "Finish" to refresh the page and view your new collection.
Screenshots showing the steps from the section above:
We then repeat this process and create a separate collection "CRCGN_down" with our down sets (dntag). At the end of the process, we should have two new collections with four lists each.
Submitting a batch query in Query
You can launch batch queries from the Query app by selecting the dropdown at the top of the page and switching to "Batch query". Add a name for your query, and load your gene lists. If you are using file upload, you can drag and drop the files into the box. If you are using Listmaker, click the Load Collection button under "UP-regulated genes" and under "DOWN-regulated genes" to load each collection. If your collection is not listed as available, you may not have set the type of the lists to "Gene" when creating the collection. You can also check the "Compute with sig_fastquery_tool" option, which will reduce the runtime of your query substantially. After you have loaded both UP and DOWN, click the submit button to submit your queries.
Viewing your results through History
After you've submitted your query, you can check its progress from the History page. Once the status is marked as "complete", you can check the box next to it and click the "Heat Map" button to view its connectivity results. If your query's status is "error" and you are using your own gene sets, check out the section below for debugging tips.
Other Tips
GMT format: For more information on the GMT format, please see the section on the GMT file format on the GSEA wiki.
Entrez Gene IDs: L1000 queries use Entrez Gene IDs as input. Before submitting a batch query, all genes must be converted to Entrez Gene IDs. For individual genes, we recommend using the NCBI gene database and looking at the "Gene ID" field. To convert many genes at once, several online tools and packages may be useful, including DAVID, MyGene.info, and the clue.io API gene service.
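Since Entrez Gene IDs are plain positive integers, a quick check can flag entries such as gene symbols that still need converting. The following is a minimal sketch; the function name find_non_entrez and the example lists are hypothetical, and it only tests whether each entry looks like an integer, not whether the ID actually exists in the NCBI database.

```python
def find_non_entrez(gene_lists):
    """Return {list name: [entries that are not plain positive integers]}."""
    problems = {}
    for name, genes in gene_lists.items():
        bad = [
            g for g in genes
            if not (g.isascii() and g.isdigit() and int(g) > 0)
        ]
        if bad:
            problems[name] = bad
    return problems


# Made-up lists for illustration; "TP53" is a gene symbol, not an Entrez Gene ID.
example = {
    "QUERY_A": ["5720", "TP53", "466"],
    "QUERY_B": ["2309", "1021"],
}
problems = find_non_entrez(example)
for name, bad in problems.items():
    print(f"{name}: convert these entries before submitting: {bad}")
```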
BING genes: L1000 queries are run against BING space signatures (roughly 10,000 genes), so when you submit your batch query, only the BING space genes are used. Genes that are not in BING space will not affect the result. To determine which genes are in BING space and are used in the query, you may use the /gene-space command in Command or the clue.io API gene service and look for genes marked "landmark" or "best inferred".
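To see how much of a gene list will actually contribute to a query, you can split it against a set of BING space gene IDs. This is a rough sketch that assumes you have already obtained the BING gene IDs yourself (for example from the gene service mentioned above); the function name split_by_bing and the IDs below are placeholders and make no claim about real BING membership.

```python
def split_by_bing(genes, bing_ids):
    """Split a gene list into (used in query, ignored) by BING membership."""
    used = [g for g in genes if g in bing_ids]
    ignored = [g for g in genes if g not in bing_ids]
    return used, ignored


# Placeholder values: in practice bing_ids would hold roughly 10,000 IDs.
bing_ids = {"5720", "466"}
used, ignored = split_by_bing(["5720", "99999999", "466"], bing_ids)
print(f"{len(used)} genes used, {len(ignored)} ignored: {ignored}")
```

A list in which most genes fall outside BING space is effectively a much shorter query than it appears.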
Gene list names in collections: In order to match up and down collection contents, the names of your up and down gene lists must match. You can either match with identical names (e.g. "IMATINIB_LOW_DOSE" in the up collection matches to "IMATINIB_LOW_DOSE" in the down collection) or by including _UP and _DN suffixes (e.g. "IMATINIB_LOW_DOSE_UP" in the up collection matches to "IMATINIB_LOW_DOSE_DN" in the down collection).
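The two naming rules above can be checked before submitting. The sketch below is one interpretation of those rules, not CLUE's actual matching code; the function name match_lists and all list names other than the IMATINIB example are hypothetical.

```python
def match_lists(up_names, down_names):
    """Pair up and down list names by identical name or _UP/_DN suffix."""
    down_set = set(down_names)
    pairs = {}
    for up in up_names:
        if up in down_set:
            pairs[up] = up  # rule 1: identical names
        elif up.endswith("_UP") and up[:-3] + "_DN" in down_set:
            pairs[up] = up[:-3] + "_DN"  # rule 2: _UP matches _DN
    matched_down = set(pairs.values())
    unmatched_up = [n for n in up_names if n not in pairs]
    unmatched_down = [n for n in down_names if n not in matched_down]
    return pairs, unmatched_up, unmatched_down


up = ["IMATINIB_LOW_DOSE_UP", "DRUG_B", "DRUG_C_UP"]
down = ["IMATINIB_LOW_DOSE_DN", "DRUG_B", "DRUG_D_DN"]
pairs, unmatched_up, unmatched_down = match_lists(up, down)
print("paired:", pairs)
print("up lists without a partner:", unmatched_up)
print("down lists without a partner:", unmatched_down)
```

Any names reported as unmatched are worth renaming in the GMT files before creating the collections.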