Preprocessing of expression data

We used our own algorithm to condense the probe-level data from the Affymetrix CEL files of each chip experiment. Background intensity was computed as the mean of the 2% darkest feature intensities, and this background value was subtracted from each feature value. Subsequently, each feature value was divided by the median of all feature values. As a representative expression value (PMQ) for each probe set, the third quartile (75th percentile) of the intensities of all its perfect match oligonucleotides was used. Furthermore, to distinguish real expression signals from noise, the Wilcoxon signed rank test was applied to each probe set. A probe set was called detectable if the Wilcoxon signed rank test applied to its 11 probe pairs (perfect match versus mismatch oligonucleotide) yielded a significance level of p < 0.1 and its relative expression value (PMQ) was > 4.0. We chose these thresholds for deciding whether a gene is expressed on the basis of validation of several gene expression patterns by quantitative RT-PCR and/or Northern blot analysis in our lab (data not shown).
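The per-chip steps described above can be illustrated with a minimal sketch. It is not the original implementation: the function names are hypothetical, the quartile uses linear interpolation between order statistics (the text does not say which quartile definition was used), and the Wilcoxon p-value is assumed to be computed elsewhere and passed in.

```python
from statistics import mean, median


def normalize_chip(features):
    """Subtract the background and scale by the median (one chip)."""
    ordered = sorted(features)
    # Background: mean of the 2% darkest features (at least one feature).
    n_dark = max(1, len(ordered) // 50)
    background = mean(ordered[:n_dark])
    corrected = [value - background for value in features]
    # Divide every corrected feature by the median of all corrected features.
    scale = median(corrected)
    return [value / scale for value in corrected]


def third_quartile(values):
    """75th percentile with linear interpolation (an assumed definition)."""
    ordered = sorted(values)
    position = 0.75 * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def is_detectable(pmq, p_value, p_cutoff=0.1, pmq_cutoff=4.0):
    """Detection call: Wilcoxon p-value below and PMQ above the cutoffs."""
    return p_value < p_cutoff and pmq > pmq_cutoff
```

Here `third_quartile` would be applied to the normalized perfect match intensities of one probe set to obtain its PMQ, and `is_detectable` combines that value with the p-value of the signed rank test on the probe pairs.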

For each patient and probe set, an expression ratio was calculated according to the following rules. If expression was detectable in both the normal and the tumor sample (Wilcoxon test p <= 0.10 and relative expression value PMQ >= 4), the ratio PMQ(T)/PMQ(N) was used as the expression ratio (hereafter called T/N). If expression was undetectable in only one of the two samples, the expression ratio was set to T/N = 2 (normal absent) or to T/N = 0.5 (tumor absent). If expression was undetectable in both the normal and the tumor sample, no expression ratio was calculated and the probe set was called not informative. For each probe set, we counted the number of cases showing up-regulation (T/N >= 2), down-regulation (T/N <= 0.5), or unchanged transcription levels (0.5 < T/N < 2). Probe sets that were not informative in any patient were removed, reducing the number of probe sets to 19,404. To eliminate redundancy of probe sets with respect to genes, we kept only the most informative probe set for each gene, i.e. the probe set that was informative in the highest number of matched sample pairs. Additionally, only probe sets that could be linked unambiguously to a particular genomic locus were considered (chromosome band and position; see Affymetrix U133A/B annotation files). After pre-processing, a total of 10,935 probe sets remained, which were the basis of all further analyses.
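The ratio rules and the per-probe-set counting can be sketched as follows. This is an illustrative sketch only: the function names and the input layout (one tuple per matched sample pair) are assumptions, and the detection calls are taken as given.

```python
from collections import Counter


def expression_ratio(pmq_tumor, tumor_detectable, pmq_normal, normal_detectable):
    """Return T/N for one matched pair, or None if not informative."""
    if tumor_detectable and normal_detectable:
        return pmq_tumor / pmq_normal
    if tumor_detectable:
        return 2.0  # normal absent
    if normal_detectable:
        return 0.5  # tumor absent
    return None  # undetectable in both samples


def classify(ratio):
    """Map a T/N ratio to the category used for counting."""
    if ratio is None:
        return "not informative"
    if ratio >= 2:
        return "up"
    if ratio <= 0.5:
        return "down"
    return "unchanged"


def count_cases(pairs):
    """Count categories for one probe set over all matched sample pairs.

    `pairs` holds (pmq_tumor, tumor_detectable, pmq_normal, normal_detectable).
    """
    return Counter(classify(expression_ratio(*pair)) for pair in pairs)
```

A probe set would then be dropped when its count of "not informative" cases equals the number of patients, and among several probe sets of one gene the one with the fewest "not informative" cases would be kept.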

