Human Genome U133 data set

This data set consists of 3 technical replicates of 14 separate hybridizations of 42 spiked transcripts in a complex human background at concentrations ranging from 0.125 pM to 512 pM. Thirty of the spikes are isolated from a human cell line, four spikes are bacterial controls, and eight spikes are artificially engineered sequences believed to be unique in the human genome.

The data set is expected to be useful for the development and comparison of expression analysis methods. Unlike the Human Genome U95 data set below, this data set includes many more spikes, a lower minimum spike concentration (0.125 pM), a larger background population, 18-micron features scanned using the GeneChip™ Scanner 3000, and some foreign and artificial clones expected to exhibit little, if any, specific cross-hybridization.

This data set requires a special, alternate chip description file (CDF), available below, containing information about the eight artificial clones. The exact spiked sequences are found in the Excel file describing the experimental design.


File name               Size

U133 Description        19 KB
U133 CDF                6.9 MB
U133 Data               141 MB
U133 Probe tabular      4.2 MB
U133 Complete           14 MB

Human Genome U95 data set

The human data set consists of a series of genes spiked in at known concentrations and arrayed in a Latin Square format. These data represent a subset of the data used to develop and validate the expression algorithm.

These data are provided for use in conjunction with data from other groups to establish a set of common or standardized data sets that can be used by the scientific community to develop and validate expression algorithms. 

The Latin Square design for the human data set consists of 14 spiked-in gene groups in 14 experimental groups. The concentrations of the 14 gene groups in the first experiment are 0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 pM. Each subsequent experiment rotates the spike-in concentrations by one group: experiment 2 begins with 0.25 pM and ends with 0 pM, and so on up to experiment 14, which begins with 1024 pM and ends with 512 pM. Each experiment contains at least 3 replicates. Additional information can be obtained by examining the data files below.
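The rotation described above can be sketched in a few lines of Python. This is only an illustration of the cyclic design as stated in the text, not the layout of the description file; the names `CONCENTRATIONS_PM` and `latin_square` are hypothetical.

```python
# Spike-in concentrations (pM) of the 14 gene groups in experiment 1,
# as listed in the text.
CONCENTRATIONS_PM = [0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]


def latin_square(levels):
    """Build a cyclic Latin square: row = experiment, column = gene group.

    Each experiment shifts the concentration list by one position, so every
    gene group sees every concentration exactly once across the experiments.
    """
    n = len(levels)
    return [[levels[(exp + grp) % n] for grp in range(n)] for exp in range(n)]


design = latin_square(CONCENTRATIONS_PM)

# Experiment numbers in the text are 1-based; list indices are 0-based.
for number in (1, 2, 14):
    row = design[number - 1]
    print(f"experiment {number}: begins with {row[0]} pM, ends with {row[-1]} pM")
```

Indexing `design[e - 1][g - 1]` gives the nominal concentration of gene group `g` in experiment `e`, which is convenient when comparing estimated expression values against the known spike-in amounts.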

The human data set contains 14 human genes in each of 14 experimental groups. Most groups contain 1 gene. Exceptions are group 1, which contains 2 genes, and group 12, which is empty. Specifically, transcript 407_at listed as present in group 12 is actually included in group 1 (together with 37777_at). Replicates within each group result in a total of 59 CEL files.
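When building a group-to-transcript table for analysis, the correction above can be applied programmatically. The following is a minimal sketch that assumes a hypothetical dictionary layout (group number mapped to a list of transcript IDs) and includes only the two groups named in the text; the remaining groups would come from the description file.

```python
def apply_group_correction(groups):
    """Return a copy of the mapping with 407_at moved from group 12 to group 1."""
    corrected = {group: list(ids) for group, ids in groups.items()}
    if "407_at" in corrected.get(12, []):
        corrected[12].remove("407_at")
        corrected.setdefault(1, [])
        if "407_at" not in corrected[1]:
            corrected[1].append("407_at")
    return corrected


# Assignment as listed (hypothetical excerpt: only groups 1 and 12 shown).
listed = {1: ["37777_at"], 12: ["407_at"]}
corrected = apply_group_correction(listed)
print(corrected)
```

The function copies its input rather than modifying it, so the listed assignment remains available for comparison.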

Certain probe pairs for transcripts 407_at and 36889_at have been found to perform poorly and should be excluded from the analysis.


File name               Size

U95 Description         38 KB
U95 - Part 1            24 MB
U95 - Part 2            30 MB
U95 - Part 3            32 MB
U95 - Part 4            26 MB
U95 - Part 5            27 MB
U95 - Part 6            25 MB
U95 - Part 7            7 MB

Total size              ~171 MB

For Research Use Only. Not for use in diagnostic procedures.