Human Genome U133 data set

This data set consists of 3 technical replicates of 14 separate hybridizations of 42 spiked transcripts in a complex human background at concentrations ranging from 0.125 pM to 512 pM. Thirty of the spikes are isolated from a human cell line, four spikes are bacterial controls, and eight spikes are artificially engineered sequences believed to be unique in the human genome.

The data set is expected to be useful for the development and comparison of expression analysis methods. Unlike the Human Genome U95 data set below, this data set includes many more spikes, a lower minimum spike concentration (0.125 pM), a larger background population, 18-micron features scanned using the GeneChip™ Scanner 3000, and some foreign and artificial clones expected to exhibit little, if any, specific cross-hybridization.

This data set requires a special, alternate chip description file (CDF), available below, containing information about the eight artificial clones. The exact spiked sequences are found in the Excel file describing the experimental design.

Microarray Suite (MAS) users can analyze these data after downloading the U133 Complete library file listed below.

The DAT files for this data set are not available.

Description          File name                         Size
U133 Description     HG-U133A_tag_description.zip      19 KB
U133 CDF             HG-U133A_tag_CDF.zip              6.9 MB
U133 Data            HG-U133A_tag_Latin_Square.zip     141 MB
U133 Probe tabular   HG-U133A_tag_ProbeSequence.zip    4.2 MB
U133 Complete        HG-U133A_tag_libraryfiles.zip     14 MB

Human Genome U95 data set

The human data set consists of a series of genes spiked in at known concentrations and arrayed in a Latin Square format. The data represent a subset of the data used to develop and validate the Microarray Suite (MAS) 5.0 algorithm.

These data are provided for use in conjunction with data from other groups to establish a set of common or standardized data sets that can be used by the scientific community to develop and validate expression algorithms. For other available data sets, please see: http://www.stat.berkeley.edu/users/terry/zarray/Affy/affy_index.html

The Latin Square design for the human data set consists of 14 spiked-in gene groups in 14 experimental groups. The concentrations of the 14 gene groups in the first experiment are 0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 pM. Each subsequent experiment rotates the spike-in concentrations by one group: experiment 2 begins with 0.25 pM and ends with 0 pM, and so on up to experiment 14, which begins with 1024 pM and ends with 512 pM. Each experiment contains at least 3 replicates. Additional information can be obtained by examining the data files below.
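The rotation described above can be sketched in a few lines of Python. This is only an illustration of the idealized design, not a substitute for the experimental design file: the names BASE, latin_square, and design are hypothetical, and the sketch assumes that each experiment shifts the concentration list by exactly one gene group.

```python
# Concentrations (pM) of gene groups 1..14 in experiment 1, as listed in the text.
BASE = [0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]


def latin_square(base):
    """Return one row per experiment; row e is `base` rotated left by e groups."""
    n = len(base)
    return [[base[(group + e) % n] for group in range(n)] for e in range(n)]


# design[e][g] is the concentration of gene group g+1 in experiment e+1.
design = latin_square(BASE)

for number, row in enumerate(design, start=1):
    print(f"experiment {number:2d}: first group {row[0]} pM, last group {row[-1]} pM")
```

Under this assumption every gene group is hybridized at each of the 14 concentrations exactly once across the 14 experiments, which is the defining property of a Latin Square.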

The human data set contains 14 human genes in each of the 14 experimental groups. Most gene groups contain 1 gene. The exceptions are gene group 1, which contains 2 genes, and gene group 12, which is empty: transcript 407_at, listed as present in group 12, is actually included in group 1 (together with 37777_at). Replicates within each experimental group result in a total of 59 CEL files.

Certain probe pairs for transcripts 407_at and 36889_at have been found to perform poorly and should be excluded from the analysis.

The DAT files for this data set are not available.

Description        File name     Size
U95 Description    u95.xls       38 KB
U95 - Part 1       1521lt.zip    24 MB
U95 - Part 2       1521ak.zip    30 MB
U95 - Part 3       1532ak.zip    32 MB
U95 - Part 4       1532lt.zip    26 MB
U95 - Part 5       2353ak.zip    27 MB
U95 - Part 6       2353lt.zip    25 MB
U95 - Part 7       U95a.zip      7 MB
Total size                       ~171 MB