CONNECTOPEDIA

How are replicates collapsed into signatures?

This step in the CMap data pipeline collapses replicate profiles into signatures, the fundamental unit of connectivity. For each perturbation, the Level 4 z-score vectors of the replicates are combined into a single differential expression vector by a weighted average, with weights based on the Spearman correlation between the replicates (this is also referred to as MODZ: moderated z-score). Each replicate has one differential expression value for each of the 12,328 genes (978 landmark plus 11,350 inferred). Signatures therefore provide a representation of the biological response of the genome to the perturbation. For the L1000 assay, each signature is designated by its sig_id identification tag. Below is a more detailed summary of collapsing replicates to signatures.

Weights are determined from the Spearman correlation between each pair of replicate profiles from a given perturbagen experiment in the Level 4 data. Since Spearman correlation operates on ranked lists, the raw z-scores are first converted to ranks from 1 to n within each replicate, where n is the number of genes in the replicate. The weight of each replicate is then calculated as the normalized sum of its correlations with the other replicates. These normalized values act as multipliers for the respective replicate vectors.
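The weighting described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the CMap pipeline code: the function names are hypothetical, replicates are assumed to be equal-length lists of z-scores, and the `floor` parameter (which clips very low or negative correlations to a small positive value so that weights stay positive) is an assumption that the text above does not describe.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def replicate_weights(replicates, floor=0.01):
    """Weight each replicate by its summed correlation with the others."""
    n = len(replicates)
    if n == 1:
        return [1.0]
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        # Assumption: clip low or negative correlations to `floor`.
        rho = max(spearman(replicates[i], replicates[j]), floor)
        totals[i] += rho
        totals[j] += rho
    grand_total = sum(totals)
    return [t / grand_total for t in totals]  # normalized to sum to 1


def collapse(replicates, floor=0.01):
    """Weighted average of replicate z-score vectors, gene by gene."""
    weights = replicate_weights(replicates, floor)
    n_genes = len(replicates[0])
    return [
        sum(w * rep[g] for w, rep in zip(weights, replicates))
        for g in range(n_genes)
    ]
```

With hypothetical inputs such as `collapse([[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [4.0, 1.0, 3.0, 2.0]])`, the third replicate correlates poorly with the other two and therefore contributes very little to the collapsed vector.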

What if all replicates look similar?

There are a few different scenarios that can occur with the replicate data. The first is that all replicate profiles look similar. In this case, the weights will be roughly equal across the replicates and the resulting signature will look similar to each of them. Signatures in this category also have the potential to produce a high TAS score, which is determined in part by the replicate correlation.

What if all replicates are dissimilar?

In this case, all replicate correlations will be very low and the weights will be essentially equal. The resulting signature will likely be dissimilar to all the others. It should be noted that these signatures will produce a low TAS score as a consequence of low replicate correlation.

What if two replicates are alike but not the third?

If two replicate profiles look similar but the third is different, the most dissimilar replicate will be weighted much less than the others, resulting in a collapsed signature that looks more similar to the better-correlated replicates.

This example highlights why we take a weighted average rather than a standard average. In the latter case, the dissimilar replicate would have the same weight as the other two. We want concordance in the data to have a stronger effect on the signature.
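A small numeric comparison makes the difference concrete. The z-scores and weights below are made-up values for a single gene across three replicates, chosen only for illustration; in practice the weights would come from the replicate correlations.

```python
def weighted_average(values, weights):
    """Weighted average; the weights are assumed to sum to 1."""
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical z-scores for one gene: two concordant replicates, one outlier.
z_scores = [2.0, 2.2, -1.5]
# Hypothetical weights: the outlier replicate is down-weighted.
weights = [0.45, 0.45, 0.10]

plain = sum(z_scores) / len(z_scores)           # approximately 0.90
weighted = weighted_average(z_scores, weights)  # approximately 1.74

print(f"plain average:    {plain:.2f}")
print(f"weighted average: {weighted:.2f}")
```

The plain average is pulled toward the outlier, while the weighted average stays close to the two replicates that agree.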

Are there cases where there are fewer than three replicates?

Most perturbations will have three replicates, but some will have fewer as a result of quality control filtering in previous steps. For example, some wells could produce low-quality data that is filtered out in the pipeline, leaving only one or two replicates. If there are only two replicates, they go through the same data processing, although this always results in equal weighting between the two. If there is only one replicate, then the collapsed signature is equal to that replicate.
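The one- and two-replicate cases reduce to something very simple, as the following sketch shows. The function name is hypothetical, and the code assumes replicates are equal-length lists of z-scores. With two replicates there is only one pairwise correlation, so each replicate receives the same share of the weight and the result is a plain mean.

```python
def collapse_few(replicates):
    """Collapse one or two replicates into a signature."""
    if len(replicates) == 1:
        # A single replicate is its own signature.
        return list(replicates[0])
    if len(replicates) == 2:
        # Equal weights of 0.5 each, i.e. the gene-wise mean.
        first, second = replicates
        return [(a + b) / 2 for a, b in zip(first, second)]
    raise ValueError("three or more replicates need correlation-based weights")


print(collapse_few([[1.0, -2.0]]))              # [1.0, -2.0]
print(collapse_few([[1.0, 3.0], [3.0, 5.0]]))   # [2.0, 4.0]
```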

Last modified: Fri Jan 11 2019 15:35:13 GMT-0500 (EST)