Scaling a chip means multiplying the signals (intensity measures) for all genes by a common scale factor. The reason to do this is that the total brightness often differs significantly between the two channels. If the same total weight of RNA is hybridized in both channels, differences in total brightness between the channels must be due to differences in how well the hybridized RNA takes up each label (dye bias). In fact, microarray technology can only measure relative levels of expression: signal per mg of RNA. For a two-color chip, we have two measures for each gene, one from each channel. For each chip we compute scale factors Cred and Cgreen by:
$$C_{\mathrm{red}} = \frac{1}{N}\sum_{i=1}^{N} R_i, \qquad C_{\mathrm{green}} = \frac{1}{N}\sum_{i=1}^{N} G_i$$
where Gi and Ri are the measured intensities for the i-th array element (for example, the green and red intensities in a two-color microarray assay) and N is the total number of elements represented on the microarray. To compare ratios, both intensities are scaled, for example:
$$r_i = \frac{R_i / C_{\mathrm{red}}}{G_i / C_{\mathrm{green}}}$$
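On the log scale this subtracts the same constant, log2(Cred/Cgreen), from every log2(ratio). When the scale factors are geometric means rather than arithmetic means of the intensities, that constant is exactly the average log2(ratio), so the normalized log2(ratios) have mean zero, or equivalently the (geometric) mean ratio is equal to 1.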
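A minimal Python sketch of this global scaling, assuming Cred and Cgreen are the arithmetic mean intensities of each channel (a median or trimmed mean could be substituted). The second function shows the log-scale variant: subtracting the average log2(ratio), which is the same as dividing each channel by its geometric-mean intensity. Function names are illustrative only.

```python
import math
from statistics import mean


def total_intensity_scale(red, green):
    """Return log2 ratios after dividing each channel by its scale factor.

    Assumes C_red and C_green are the arithmetic mean intensities of the
    N elements on the chip; a median or trimmed mean could be used instead.
    """
    c_red = mean(red)
    c_green = mean(green)
    return [math.log2((r / c_red) / (g / c_green)) for r, g in zip(red, green)]


def center_log_ratios(red, green):
    """Subtract the average log2 ratio from every log2 ratio.

    Equivalent to using geometric-mean scale factors, so the result has
    mean 0 (geometric mean ratio 1).
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    m_bar = mean(m)
    return [x - m_bar for x in m]
```

Both functions shift every log2(ratio) by a single constant; they differ only in which constant is used.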
To make individual channels more comparable across chips, every channel on every chip is scaled to the same target constant. In practice there are often anomalies at the top end; for example, a number of probes may be saturated. One gets more consistent results by using a robust estimator, such as the median or a 1/3-trimmed mean: take the mean of the middle two-thirds of the probes, and scale all probes so that this value is the same for every channel. (John Quackenbush suggested this originally, but TIGR now uses lowess; see below.)
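A sketch of robust scaling to a common target, assuming a hypothetical target value of 1000 shared by all chips. The 1/3-trimmed mean drops the lowest and highest sixth of the probes, so a few saturated spots cannot pull the scale factor; `statistics.median` can be passed in instead.

```python
from fractions import Fraction
from statistics import median


def trimmed_mean(values, trim=Fraction(1, 3)):
    """Mean of the middle (1 - trim) fraction of the values.

    trim=1/3 drops the lowest and highest 1/6 of the probes,
    so the mean is taken over the middle two-thirds.
    """
    xs = sorted(values)
    k = int(len(xs) * trim / 2)  # probes dropped at each end
    middle = xs[k:len(xs) - k]
    return sum(middle) / len(middle)


def scale_to_target(values, target=1000.0, summary=trimmed_mean):
    """Multiply every probe so the robust summary of the channel equals target.

    The same target (a hypothetical default here) is used for every
    channel on every chip.
    """
    factor = target / summary(values)
    return [v * factor for v in values]
```

Using `Fraction` for the trim keeps the number of dropped probes exact, avoiding floating-point off-by-one errors in `k`.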
Whereas normalization adjusts the mean of the log2(ratio) measurements, it is also common to find that the variance of the measured log2(ratio) values differs between arrays. One approach to this problem is to rescale the log2(ratio) values so that every array has the same variance. This often succeeds in reducing the variance between arrays, but it sometimes overcorrects, so that the variance of individual measures is actually increased. A partial adjustment is probably optimal, but choosing how much to adjust seems unprincipled.
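A sketch of variance equalization across arrays. Two choices here are assumptions, not taken from the text: the common target spread is the geometric mean of the per-array standard deviations, and the hypothetical exponent `alpha` gives the partial adjustment mentioned above (`alpha=1` equalizes fully, `alpha=0` leaves the data unchanged).

```python
import math
from statistics import mean, pstdev


def scale_arrays(log_ratios_by_array, alpha=1.0):
    """Rescale each array's log2 ratios toward a common spread.

    log_ratios_by_array: one list of log2 ratios per array, assumed already
    centred (mean ~ 0) by normalization, so scaling changes only the spread.
    alpha: hypothetical partial-adjustment exponent in [0, 1].
    """
    sds = [pstdev(m) for m in log_ratios_by_array]
    # Common target: geometric mean of the per-array standard deviations.
    target = math.exp(mean([math.log(s) for s in sds]))
    out = []
    for m, s in zip(log_ratios_by_array, sds):
        factor = (target / s) ** alpha
        out.append([x * factor for x in m])
    return out
```

A robust spread measure such as the median absolute deviation could replace `pstdev` in the same way that the median replaces the mean for scaling.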
Another two-parameter approach is a linear regression of one channel on the other. This doesn’t seem to do as well.
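One way to read this regression approach, sketched under the assumption that log2(R) is regressed on log2(G) by ordinary least squares and the residuals are taken as the normalized log2 ratios. The two fitted parameters are the intercept `a` and slope `b`; the function name is illustrative.

```python
import math


def regression_normalize(red, green):
    """Regress log2(R) on log2(G); return residuals and the fitted (a, b).

    Assumption: a + b*log2(G) is treated as the expected red signal,
    so a residual of 0 means 'no differential expression'.
    """
    x = [math.log2(g) for g in green]
    y = [math.log2(r) for r in red]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return residuals, (a, b)
```

A single straight line cannot follow a curved intensity trend, which is one plausible reason this approach does less well than lowess.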
Intensity-Dependent Normalization with Lowess
With a little experience it becomes clear to a researcher that these approaches do not compensate for all the systematic differences between chips that obscure and bias the analysis of real biological differences. Several statisticians have tried to identify variables that systematically bias expression ratios. For example, one commonly observes that the log2(ratio) values depend systematically on intensity, most often deviating from zero for low-intensity spots. Weakly expressed genes appear up-regulated in the red channel, while moderately expressed genes appear up-regulated in the green channel. No known biological process would regulate genes that way, so this must be an artefact. The explanation appears to be chemical: the dyes do not fluoresce proportionally at different concentrations because of ‘quenching’, a phenomenon in which dye molecules in close proximity re-absorb light from each other, thus diminishing the signal. Quenching affects the two dyes to different degrees.
The easiest way to visualize intensity-dependent effects is to plot the measured log2(Ri/Gi) for each element on the array as a function of the product intensity log2(Ri*Gi). This 'R-I' (ratio-intensity) plot can reveal intensity-specific artifacts in the log2(ratio) measurements. Terry Speed’s group calls these variables ‘M’ and ‘A’ (with A defined as half of log2(Ri*Gi), i.e. the average log2 intensity of the two channels), and the plot an ‘MA plot’.
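A sketch of the R-I coordinates, plus a simple numeric check for intensity dependence that does not need a plotting library: average the log2(ratio) within equal-width intensity bins. The bin count is an arbitrary illustrative choice.

```python
import math


def ri_coordinates(red, green):
    """Return (x, y) = (log2(R*G), log2(R/G)) for each array element.

    In Speed's MA notation, M = y and A = x / 2.
    """
    return [(math.log2(r * g), math.log2(r / g)) for r, g in zip(red, green)]


def binned_mean_ratio(points, n_bins=10):
    """Mean log2 ratio within equal-width intensity bins.

    A flat profile near 0 suggests no intensity-dependent bias; a curved
    profile is the numeric counterpart of the 'banana' shape.
    Empty bins are reported as None.
    """
    xs = [x for x, _ in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0  # avoid dividing by zero
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in points:
        b = min(int((x - lo) / width), n_bins - 1)  # top edge goes in last bin
        sums[b] += y
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

The pairs returned by `ri_coordinates` can be passed directly to any scatter-plot routine to draw the R-I plot itself.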
Figure 1. Ratio-Intensity plot showing characteristic ‘banana’ shape of cDNA ratios; log scale on both axes. (courtesy Terry Speed)
We would like a normalization method that can remove such intensity-dependent effects from the log2(ratio) values. The functional form of this dependence is unknown and presumably depends on many variables we don’t measure. An ad hoc statistical approach widely used in such situations is to fit a smooth curve through the points. One example of such a smooth curve is a locally weighted linear regression (lowess) curve. Terry Speed’s group at Berkeley used this approach.
To calculate a lowess curve fit to a group of points (x1,y1),…,(xN,yN), we calculate at each point xi a weighted linear regression of y on x that uses only the points nearest to xi. Here the neighbourhood is the 30% of the data points closest to xi; within it, the weight function gives the most weight to the closest points and falls smoothly to zero at the edge of the neighbourhood, and points outside it get zero weight. We can think of the fitted value as a kind of local mean. For each observation i on a two-color chip, set xi = log2(Ri*Gi) and yi = log2(Ri/Gi). The lowess approach first estimates ŷ(xi), the mean log2(ratio) as a function of the log2(intensity). Lowess normalization then corrects systematic deviations in the R-I plot by subtracting this best-fit average log2(ratio) from the experimentally observed log2(ratio) of each data point.
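A minimal from-scratch sketch of the procedure, assuming tricube weights over the nearest 30% of points (the usual lowess weight function). It omits the robustness iterations of full lowess, which further down-weight outliers; library implementations such as `statsmodels.nonparametric.smoothers_lowess.lowess` (whose `frac` argument plays the role of the 30%) include them. Function names are illustrative.

```python
import math


def lowess_fit(x, y, frac=0.3):
    """One-pass locally weighted linear regression (no robustness iterations).

    For each x[i], points within the distance h to its k-th nearest
    neighbour (k = ceil(frac * n)) get tricube weight (1 - (d/h)**3)**3;
    all other points get weight 0. Returns the fitted y at each x[i].
    """
    n = len(x)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for xi in x:
        d = [abs(xj - xi) for xj in x]
        h = sorted(d)[k - 1]  # distance to the k-th nearest point
        if h > 0:
            w = [(1 - (dj / h) ** 3) ** 3 if dj < h else 0.0 for dj in d]
        else:
            w = [1.0 if dj == 0 else 0.0 for dj in d]
        sw = sum(w)  # > 0 because the point itself has weight 1
        xw = sum(wj * xj for wj, xj in zip(w, x)) / sw
        yw = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - xw) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - xw) * (yj - yw) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a local mean
        fitted.append(yw + slope * (xi - xw))
    return fitted


def lowess_normalize(red, green, frac=0.3):
    """Subtract the lowess trend of log2(R/G) against log2(R*G)."""
    x = [math.log2(r * g) for r, g in zip(red, green)]
    y = [math.log2(r / g) for r, g in zip(red, green)]
    trend = lowess_fit(x, y, frac)
    return [yi - ti for yi, ti in zip(y, trend)]
```

This direct version compares every point with every other point, so its cost grows quadratically with the number of spots; it is meant to show the steps, not to process a full chip quickly.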
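The normalized ratios r* are given by
$$\log_2 r_i^{*} = \log_2\frac{R_i}{G_i} - \hat{y}(x_i), \qquad x_i = \log_2(R_i G_i),$$
where ŷ(xi) is the lowess estimate of the mean log2(ratio) at intensity xi.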
The result is that the log2(ratios) at all intensities have a mean of approximately 0, as seen in Figure 2.
Figure 2. As in Figure 1, but corrected by lowess normalization.
Global versus local normalization.
Most normalization algorithms, including lowess, can be applied either globally (to the entire data set) or locally (to some physical subset of the data). For spotted arrays, local normalization is often applied to each group of array elements deposited by a single spotting pen (sometimes referred to as a 'pen group' or 'subgrid'). Local normalization has the advantage that it can help correct for systematic spatial variation in the array, including inconsistencies among the spotting pens used to make the array, variability in the slide surface, and slight local differences in hybridisation conditions across the array. There is some controversy among biotechnologists about how likely it is that a single print tip will cause a systematic variation.
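A sketch of local normalization by pen group. For brevity it swaps lowess for the simplest location correction, subtracting each group's median log2(ratio); fitting a separate lowess curve per group follows the same pattern. The group labels and the `min_size` threshold are hypothetical; small groups fall back to the global median because a correction estimated from only a few spots is unreliable.

```python
from collections import defaultdict
from statistics import median


def normalize_by_pen_group(log_ratios, pen_groups, min_size=50):
    """Subtract each pen group's median log2 ratio from its elements.

    log_ratios: list of log2(R/G) values.
    pen_groups: matching list of group labels (e.g. print-tip index).
    Groups with fewer than min_size elements (a hypothetical threshold)
    use the global median instead.
    """
    groups = defaultdict(list)
    for m, g in zip(log_ratios, pen_groups):
        groups[g].append(m)
    global_centre = median(log_ratios)
    centre = {
        g: (median(v) if len(v) >= min_size else global_centre)
        for g, v in groups.items()
    }
    return [m - centre[g] for m, g in zip(log_ratios, pen_groups)]
```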
Another approach is to look for a smooth correction to uneven hybridisation. The thinking behind this approach is that most spatial variation is caused by uneven fluid flow. Flow is continuous, and hence the correction should be continuous as well.
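As an illustration of a continuous spatial correction, the sketch below fits the simplest smooth surface, a plane in slide row and column, to the log2(ratios) by least squares and subtracts it. A plane is an assumption made for brevity; a two-dimensional loess surface would be a more flexible choice in the same spirit.

```python
def plane_correction(rows, cols, log_ratios):
    """Fit M ~ c0 + b_row*row + b_col*col by least squares; return residuals.

    rows, cols: spot positions on the slide; log_ratios: log2(R/G) values.
    Residuals have mean 0 because the fitted plane includes an intercept.
    """
    n = len(log_ratios)
    r_bar = sum(rows) / n
    c_bar = sum(cols) / n
    m_bar = sum(log_ratios) / n
    dr = [r - r_bar for r in rows]
    dc = [c - c_bar for c in cols]
    dm = [m - m_bar for m in log_ratios]
    # Normal equations for the centred slopes (2x2 system).
    srr = sum(a * a for a in dr)
    scc = sum(b * b for b in dc)
    src = sum(a * b for a, b in zip(dr, dc))
    srm = sum(a * m for a, m in zip(dr, dm))
    scm = sum(b * m for b, m in zip(dc, dm))
    det = srr * scc - src * src
    if det == 0:
        raise ValueError("spot positions do not span two dimensions")
    b_row = (srm * scc - src * scm) / det
    b_col = (srr * scm - src * srm) / det
    return [m - (m_bar + b_row * a + b_col * b)
            for m, a, b in zip(log_ratios, dr, dc)]
```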
When a particular normalization algorithm is applied locally, all the conditions and assumptions that underlie the validity of the approach must be satisfied within each local subset. For example, the elements in any pen group should not be preferentially selected to represent differentially expressed genes, and a sufficiently large number of elements should be included in each pen group or spatial area for the approach to be valid.
A good design will place all contrasts of interest directly on chips, but sometimes that is impossible, or just not done. In that case we may want to compare parallel measures, i.e. measures that are not directly contrasted on an array. We observe that the variance between parallel measures is very high. We need a kind of normalisation that works across arrays as well as within arrays. It turns out that quantile normalization works quite well at reducing variance between arrays, without losing any of the benefits of lowess normalization.
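A sketch of quantile normalization across arrays: every value is replaced by the mean, across arrays, of the values that share its rank, so all arrays end up with the same distribution. Which quantities are normalized (for example, per-channel log intensities after lowess) is an assumption left to the caller, and ties are broken by position for simplicity.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length value lists (one per array).

    Each value is replaced by the mean of the values with the same rank
    across all arrays. Ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of elements")
    sorted_cols = [sorted(a) for a in arrays]
    # Reference distribution: mean of the i-th smallest value over arrays.
    rank_means = [sum(col[i] for col in sorted_cols) / len(arrays)
                  for i in range(n)]
    out = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices by rank
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        out.append(normalized)
    return out
```

After this step, differences between parallel measures reflect changes in rank within each array rather than differences in overall brightness or spread between arrays.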