Latin Square Data for Expression Algorithm Assessment

Human Genome U133 data set

This data set consists of 3 technical replicates of 14 separate hybridizations of 42 spiked transcripts in a complex human background at concentrations ranging from 0.125 pM to 512 pM. Thirty of the spikes are isolated from a human cell line, four spikes are bacterial controls, and eight spikes are artificially engineered sequences believed to be unique in the human genome.

The data set is expected to be useful for the development and comparison of expression analysis methods. Unlike the Human Genome U95 data set below, this data set includes many more spikes, a lower minimum spike concentration (0.125 pM), a larger background population, 18-micron features scanned using the GeneChip™ Scanner 3000, and some foreign and artificial clones expected to exhibit little, if any, specific cross-hybridization.

This data set requires a special, alternate chip description file (CDF), available below, containing information about the eight artificial clones. The exact spiked sequences are found in the Excel file describing the experimental design.

Description	File name	Size
U133 Description	HG-U133A_tag_description.zip	19 KB
U133 CDF	HG-U133A_tag_CDF.zip	6.9 MB
U133 Data	HG-U133A_tag_Latin_Square.zip	141 MB
U133 Probe tabular	HG-U133A_tag_ProbeSequence.zip	4.2 MB
U133 Complete	HG-U133A_tag_libraryfiles.zip	14 MB

Human Genome U95 data set

The human data set consists of a series of genes spiked in at known concentrations and arrayed in a Latin Square format. The data represent a subset of the data used to develop and validate the expression algorithm.

These data are provided for use in conjunction with data from other groups to establish a set of common or standardized data sets that can be used by the scientific community to develop and validate expression algorithms.

The Latin Square design for the human data set consists of 14 spiked-in gene groups in 14 experimental groups. The concentrations of the 14 gene groups in the first experiment are 0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 pM. Each subsequent experiment rotates the spike-in concentrations by one group; that is, experiment 2 begins with 0.25 pM and ends with 0 pM, and so on up to experiment 14, which begins with 1024 pM and ends with 512 pM. Each experiment contains at least 3 replicates. Additional information can be obtained by examining the data files below.
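
The rotation described above can be sketched in a few lines of Python. This is only an illustration of the design, not part of the distributed files: the names BASE and latin_square are hypothetical, and the order of gene groups within each row is assumed to follow the concentration list given for experiment 1.

```python
# Concentrations (pM) of the 14 gene groups in experiment 1, as listed in the text.
BASE = [0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]


def latin_square(base):
    """Return one row per experiment; row i is the base list rotated left by i.

    Hypothetical helper: it assumes each experiment shifts every gene group
    to the next concentration in the series, wrapping around at the end.
    """
    n = len(base)
    return [[base[(exp + grp) % n] for grp in range(n)] for exp in range(n)]


design = latin_square(BASE)

# Print the concentration assigned to each gene group, one experiment per line.
for number, row in enumerate(design, start=1):
    print(f"experiment {number:2d}: " + " ".join(f"{c:g}" for c in row))
```

In this layout each gene group is observed once at every concentration across the 14 experiments, and each experiment contains every concentration exactly once, which is the defining property of a Latin Square.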

The human data set contains 14 human genes in each of 14 experimental groups. Most gene groups contain 1 gene. The exceptions are group 1, which contains 2 genes, and group 12, which is empty. Specifically, transcript 407_at, listed as present in group 12, is actually included in group 1 (together with 37777_at). Replicates within each experimental group result in a total of 59 CEL files.

Certain probe pairs for transcripts 407_at and 36889_at have been found to perform poorly and should be excluded from the analysis.

Description	File name	Size
U95 Description	u95.xls	38 KB
U95 - Part 1	1521lt.zip	24 MB
U95 - Part 2	1521ak.zip	30 MB
U95 - Part 3	1532ak.zip	32 MB
U95 - Part 4	1532lt.zip	26 MB
U95 - Part 5	2353ak.zip	27 MB
U95 - Part 6	2353lt.zip	25 MB
U95 - Part 7	U95a.zip	7 MB
Total size		~171 MB

For Research Use Only. Not for use in diagnostic procedures.