NetAffx Data Analysis Support—Getting Started
Find valuable information.
Optimize your experiments to get the best results. We’ve compiled a detailed knowledge base of the top tips and tricks to meet your research needs.
NetAffx™ Analysis Center
The NetAffx™ Analysis Center is the most comprehensive resource of integrated array contents and functional annotations available. The flexible query capabilities provided help you retrieve biological information for specific probe sets. Please read our Analysis Center tutorial for more information.
The NetAffx™ Analysis Center is available to both customers and non-customers at no cost. The only requirement for use is the completion of a short registration form.
By registering, you receive unprecedented access to array content information, including probe sequences and gene annotations. This information will enable you to derive the maximum value from your GeneChip™ array data.
The security of your valuable scientific data is very important to us. We neither store nor track any of the data you enter into our web site, including sequence information, probe set IDs, gene names, or identifiers. Also, we do not individually profile the external links you are using. All scientific data transferred to and from the NetAffx™ Analysis Center is secured by Secure Sockets Layer (SSL) level encryption. Information collected on forms in the Analysis Center (including name, address, and billing and shipping information) is also fully secured by SSL. This industry-standard Internet security protocol ensures that your data is fully protected while in transit, in the same way as Internet banking and standard credit card transactions.
The NetAffx™ Analysis Center contains both public and in-house generated data. Unless otherwise noted, public data representations are updated once every quarter. Because we have considerable control over our proprietary internal data, we will keep you informed as new data and databanks become available.
Our Database Information Resource describes the functions and contents of the various databases used in the Analysis Center. Additionally, you can use the hyperlinks of the databases listed in the "top page" of the Analysis Center to obtain summaries of the databases.
Yes, you can. To do this, use the quick search and standard search methods within the Analysis Center. When performing this type of query, separate the values using the "|" symbol. Use the Batch Query capability to query a larger number of values. This option allows you to enter multiple accession numbers, gene names, or probe set IDs. For more information, please download the NetAffx™ Analysis Center User's Guide.
Batch Query is the query process that allows you to enter up to 500 query values at once. Batch Query helps you retrieve multiple annotations and query array contents using a large number of probe set IDs, gene names, or accession numbers.
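As a rough illustration of preparing query values before submitting them, the sketch below splits a long list of identifiers into groups of at most 500 (the Batch Query limit stated above) and joins a short list with the "|" separator used by the quick and standard searches. The function names are hypothetical helpers, not part of any NetAffx interface, and the de-duplication step is an added convenience rather than a documented requirement.

```python
from typing import Iterable, List

MAX_BATCH = 500  # Batch Query limit stated in the text above


def batch_queries(values: Iterable[str], size: int = MAX_BATCH) -> List[List[str]]:
    """Split identifiers into lists of at most `size` items.

    Blank entries are dropped and duplicates are removed while keeping
    the original order (an illustrative choice, not a documented rule).
    """
    cleaned: List[str] = []
    seen = set()
    for value in values:
        value = value.strip()
        if value and value not in seen:
            seen.add(value)
            cleaned.append(value)
    return [cleaned[i:i + size] for i in range(0, len(cleaned), size)]


def quick_search_string(values: Iterable[str]) -> str:
    """Join a small number of values with the '|' separator."""
    return "|".join(v.strip() for v in values if v.strip())
```

For example, a list of 1,203 unique probe set IDs would be split into three groups of 500, 500, and 203 values, each of which fits in one Batch Query.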
The NetAffx™ Analysis Center supports most of our current catalog GeneChip™ arrays and several discontinued designs.
The NetAffx™ Analysis Center does not support spotted array users. The NetAffx Analysis Center is intended for users of GeneChip™ arrays. Because our spotted array users are very important to us, we have generated many ideas for ways in which we could expand the NetAffx Analysis Center to include spotted array support. If you have any suggestions for ways we could do this, or if you would like to request a feature that is not currently on our site, please contact us.
Yes, you can. The Target sequence (the sequence from which probes are selected) information is integrated with the probe set records in the Affymetrix Array Target Sequences databanks. The Consensus/Exemplar sequences (Consensus sequences are derived from sub-clustering UniGene clusters, and Exemplar sequences refer to the longest member of an Affymetrix sub-cluster) can be obtained from the Affymetrix Array Consensus and Exemplar Sequences databanks. The probe sequences for the catalog GeneChip™ probe arrays are now available.
The GO consortium provides controlled vocabularies for the description of the molecular function, biological process, and cellular component of gene products. Currently, GO classifications are available for the Human (U133, U95, HuGeneFL, HC-G110), Murine (U74, Mu11k, Mu19k), Drosophila, Arabidopsis, and Yeast GeneChip™ arrays. The Analysis Center also provides hyperlinks to the AMIGO and QuickGO browsers, enabling you to view the complete hierarchy for all the GO terms associated with the probe sets.
Several BLAST search options are supported in the NetAffx™ Analysis Center. These options include the ability to turn on/off the Filter Sequence and Perform Gapped Alignment options, and the ability to select E-values from a drop-down menu. You can choose to BLAST your sequence against a particular target sequence database, or against all of the databases at once.
The Affymetrix sub-cluster sequences are the building blocks of probe selection. In other words, an Affymetrix sub-cluster is a group of sequences all representing the same transcript, denoted by the transcript ID field in a Target Databank record. The sub-cluster databanks, in the NetAffx™ Analysis Center, provide a list of all the accession numbers of the public sequences used for probe selection. Sub-cluster databanks may also be queried with accession numbers to identify your favorite genes on GeneChip™ arrays. Use the sub-cluster information to precisely identify the correlation of probes with your favorite EST or mRNA sequences in public databases. This analysis is also relevant in the interpretation of GeneChip™ expression results.
Obtain the accession number for your gene of interest from a public database (such as UniGene, GenBank, or dbEST) and query the corresponding sub-cluster databanks in the NetAffx™ Analysis Center. You may also perform a sequence alignment of your favorite gene with Affymetrix probe sequences, using the Probe Match tool.
You can now link directly to NetAffx™ probe set summaries from within your own applications or websites using the following URL:
To link to information for individual probe sets, use the following URL format:
To link to information for a list of probe sets, use the following URL format:
For details on the valid values of the ARRAYNAME and PROBESET parameters, please refer to the Direct Access To Probe Set Information Manual.
Examples of Deep Links:
When citing the NetAffx Analysis Center, please refer to:
Liu G, Loraine AE, Shigeta R, Cline M, Cheng J, Valmeekam V, Sun S, Kulp D, Siani-Rose MA. NetAffx: Affymetrix probesets and annotations. Nucleic Acids Res. 2003;31(1):82-6.
The annotation and sequence files contain the complete entries for all probe sets on the array, taken from the NetAffx Analysis Center. They are intended to be used primarily in spreadsheet applications and database programs (such as SQL databases). Interactive and Batch queries can be performed in the NetAffx Analysis Center to find information for individual probe sets of interest.
Annotation files are available for most Affymetrix GeneChip™ Arrays. Please select your array of interest.
Manual, Alignments to Genome in PSL Format
Genome alignments are currently provided for Human/Mouse/Rat/Drosophila/C. elegans arrays. We align the target sequences against the genome sequence downloaded from the UCSC website using BLAT. While some of the target sequences do not align, perhaps due to the draft nature of several genomes, some targets align at multiple locations on the genome. We apply a filter to select the best hit for each target sequence. We use the following procedure:
- calculate a score for each alignment as follows:
score=matches - (mismatches+5*qbaseinsert)
where matches = number of bases that match (including both repeat and non-repeat regions)
mismatches = number of bases that do not match in the alignment
qbaseinsert = number of bases inserted in the query
It is therefore possible that some of the scores are negative.
The pcgood metric is provided on the web site and in the download files.
- Select the alignment with the best score.
- Derive genomic coordinates for the probes (25-mers) from the "best" target sequence alignment.
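The scoring and best-hit selection described above can be sketched as follows. This is a minimal illustration, not the production pipeline. It assumes the standard UCSC PSL column layout, in which column 1 counts non-repeat matches and column 3 counts repeat matches, so the two are summed to get "matches" as defined above. The function names and the rule that the first alignment wins a tie are assumptions.

```python
from typing import Dict, Iterable, NamedTuple


class Alignment(NamedTuple):
    query: str      # target sequence name (PSL qName)
    chrom: str      # genome sequence name (PSL tName)
    t_start: int    # alignment start on the genome (PSL tStart)
    t_end: int      # alignment end on the genome (PSL tEnd)
    score: int


def score_psl_line(line: str) -> Alignment:
    """Score one PSL record: matches - (mismatches + 5 * qbaseinsert)."""
    f = line.rstrip("\n").split("\t")
    matches = int(f[0]) + int(f[2])   # non-repeat + repeat matching bases
    mismatches = int(f[1])
    q_base_insert = int(f[5])         # bases inserted in the query
    score = matches - (mismatches + 5 * q_base_insert)
    return Alignment(f[9], f[13], int(f[15]), int(f[16]), score)


def best_hits(lines: Iterable[str]) -> Dict[str, Alignment]:
    """Keep the highest-scoring alignment for each target sequence."""
    best: Dict[str, Alignment] = {}
    for line in lines:
        # PSL header lines and blank lines do not start with a digit.
        if not line.strip() or not line[0].isdigit():
            continue
        aln = score_psl_line(line)
        if aln.query not in best or aln.score > best[aln.query].score:
            best[aln.query] = aln
    return best
```

Because inserted query bases are penalized five-fold, an alignment with few matches and long query insertions receives a negative score, consistent with the note above.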
We use the genomic coordinates for each probe (25-mer) from above and search for transcripts (RefSeq and GenBank mRNA alignments to the genome from the UCSC genome database) that overlap with the alignment of the probes. In the NetAffx summary report, we provide the transcript whose genomic alignment overlaps with the maximum number of probes from that particular probe set. We also provide the total number of probes from the probe set that overlap with the transcript as a measure of the ability of the probe set to detect the corresponding transcript.
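The transcript assignment described above can be illustrated with a small sketch. It assumes that probes and transcript alignment blocks are given as half-open (start, end) intervals on the same chromosome and strand, and that a probe counts as overlapping when it shares at least one base with any block of the transcript. The exact overlap criterion used for the summary report is not stated here, so treat this definition, the function names, and the transcript labels as hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # half-open genomic interval [start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def best_transcript(
    probes: List[Interval],
    transcripts: Dict[str, List[Interval]],
) -> Optional[Tuple[str, int]]:
    """Return (transcript name, number of overlapping probes) for the
    transcript that overlaps the most probes, or None if no transcripts."""
    counts = {
        name: sum(any(overlaps(p, block) for block in blocks) for p in probes)
        for name, blocks in transcripts.items()
    }
    if not counts:
        return None
    name = max(counts, key=counts.get)
    return name, counts[name]
```

With three probes and two candidate transcripts, the function reports the transcript covering two of the three probes together with that count, which corresponds to the "number of probes overlapping the transcript" value described above.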
When scaling the data, you designate an arbitrary target signal and the Microarray Suite software scales the average intensity of all genes on each array, within a data set, to the target signal you specified. This process enables you to compare multiple arrays within a data set. We advise that you use the same target signal across all arrays being compared. Scaling can be performed independently of the comparison analysis. On the other hand, normalization can only be done when doing the comparison analysis in the Microarray Suite. In this case, the software compares an experimental array with a baseline array, and normalizes the average intensity of the experimental array to the average intensity of the baseline array during normalization. The normalization factor for a particular array changes when you change the comparison baseline array.
Scaling Factor is the multiplication factor applied to each Signal value on an array. A Scaling Factor of 1.0 indicates that the average array intensity is equal to the Target Intensity. Scaling Factors will vary across different samples and there are no set guidelines for any particular sample type. However, if they differ by too much within a set of experiments, approximately 3-fold or more, this indicates wide variation in the .dat files. Therefore, the analyzed data (in the .chp file) should be treated with caution.
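The arithmetic behind the Scaling Factor can be sketched as below. This is a simplified illustration that uses a plain arithmetic mean as the "average intensity"; the exact averaging performed by the Microarray Suite software may differ, and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def scaling_factor(signals: Sequence[float], target: float) -> float:
    """Factor that brings the array's average signal to the target signal."""
    return target / mean(signals)


def scale_array(signals: Sequence[float], target: float) -> List[float]:
    """Multiply every Signal value on the array by the scaling factor."""
    sf = scaling_factor(signals, target)
    return [s * sf for s in signals]


def scaling_factor_spread(factors: Sequence[float]) -> float:
    """Ratio of the largest to the smallest scaling factor in a data set."""
    return max(factors) / min(factors)
```

An array whose average signal already equals the target gets a factor of 1.0. A spread of roughly 3 or more across the arrays in an experiment corresponds to the "approximately 3-fold" warning threshold mentioned above.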
Sample variability, which arises mainly from biological heterogeneity, is certainly higher than assay variability, and has been estimated to be at least 10-fold greater. We recommend that researchers run multiple samples per data point to account for sample-to-sample variability. In addition, carefully design the experiment in order to minimize potential variation associated with the samples.
Yes, probe sequences for all GeneChip™ probe arrays are available as an independent databank in the NetAffx™ Analysis Center. Query these databanks by probe set ID to search for probe sequences of interest. You may also "link" to this information from the Target or Sub-cluster databanks in the Analysis Center. The Download Center has compressed (*.zip) files of probe sequences to make it convenient for you to perform bulk downloads, such as all the probe sequences for a probe array of interest (for example, the HG-U133A array).
- Serial Order: The relative order of probe sequences as they align with the consensus/exemplar sequence.
- Probe Interrogation Position: The position of the 13th ("middle") nucleotide of the probe sequence as it aligns on the consensus/exemplar sequence.
- Probe X/Probe Y coordinates: The X and Y coordinates of the probe sequence on the GeneChip™ array.
- Target Strandedness: The sense/antisense orientation of the target sequence that can hybridize with the probe sequence.
To quickly determine matching probes on GeneChip™ arrays, paste or upload a text file with the DNA sequence corresponding to your gene of interest in the Probe Match tool. To learn more about the Probe Match Tool, review the Probe Match Tool User's Guide.
Yes, you can upload or paste a text file into the Probe Match tool with multiple sequences in FASTA format. Please note: The size of the uploaded file must not exceed 20 KB. (To determine the file size on a Windows-based system, right-click the text file icon, and select the "Properties" option).
The Probe Match tool does not accept protein sequences as input. However, you may reverse-translate the protein sequence into DNA, and then use it for analysis with the Probe Match tool.
Alignments by the Probe Match tool produce positive results only if every base in a probe sequence matches perfectly (that is, the aligned bases are identical) with those in the query (input) sequence without any gaps in the query sequence. This algorithm is different from the BLAST algorithm, because the BLAST algorithm allows mismatches and gaps within the query sequence to produce a positive alignment.
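The difference from BLAST can be made concrete with a short sketch of perfect, ungapped matching. This is not the Probe Match tool's implementation; it is an illustrative exact substring search, and the function name and the 1-based position convention are assumptions.

```python
from typing import Dict, List


def find_perfect_matches(query: str, probes: Dict[str, str]) -> Dict[str, List[int]]:
    """Report probes that occur in the query with every base identical.

    Returns a mapping from probe name to the 1-based start positions of
    each exact occurrence. Probes with any mismatch or gap are not reported.
    """
    q = query.upper()
    hits: Dict[str, List[int]] = {}
    for name, seq in probes.items():
        p = seq.upper()
        positions = []
        start = q.find(p)
        while start != -1:
            positions.append(start + 1)      # convert to 1-based
            start = q.find(p, start + 1)     # allow overlapping occurrences
        if positions:
            hits[name] = positions
    return hits
```

A probe that differs from the query by a single base produces no hit here, whereas a BLAST search would typically still report that alignment.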
The probe interrogation position, provided with the probe sequence information, indicates the base position on the consensus/exemplar sequence where the central base of the probe aligns (the 13th base of a 25mer probe). The position information generated by the Probe Match tool is strictly based on the alignment against the query (input) sequence. Because the consensus/exemplar may not be co-linear with your input sequence, the probe interrogation position in a probe sequence record may not match the output from the Probe Match tool.
Yes, you can. To do this, use all the sequences corresponding to the splice variants of a gene as input for the Probe Match tool. Please note: Use caution when interpreting positive results, and when considering the total number of probes in a probe set that actually match the various isoforms. The total number of probes matching the query sequence is provided in the "# Probes Matching Query" column of the Probe Match output. Discard the results if the number of matching probes constitutes less than 70% of the total number of probes in a probe set.
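The 70% rule above amounts to a simple fraction check, sketched here. The function name is hypothetical, and the matching count is assumed to come from the "# Probes Matching Query" column of the Probe Match output.

```python
def keep_probe_set_hit(matching_probes: int, total_probes: int,
                       min_fraction: float = 0.70) -> bool:
    """Keep a result only if at least 70% of the probes in the set match.

    Results below the threshold should be discarded, as recommended above.
    """
    if total_probes <= 0:
        raise ValueError("total_probes must be positive")
    return matching_probes / total_probes >= min_fraction
```

For an 11-probe set, 8 matching probes (about 73%) passes the check, while 7 matching probes (about 64%) does not.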
The probe sequences on the anti-sense arrays are designed to hybridize with the anti-sense strand of the corresponding gene sequence. This means that all the probe sequences are in the sense orientation. Select the "Search With Reverse Complement of Query" option if you do not obtain any matches (or "hits") with the query sequence, or if you are not sure about the orientation of your query (input) sequence. Please note: Use caution when interpreting results from such a query to avoid selecting false positive "hits" to your gene of interest.
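To show what searching with the reverse complement of a query means, here is a minimal sketch. It is only an illustration of the concept: the function names are hypothetical, and only the unambiguous bases A, C, G, T plus N are handled.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def match_either_strand(query: str, probe: str) -> Optional[str]:
    """Report which orientation of the query contains the probe exactly."""
    q = query.upper()
    p = probe.upper()
    if p in q:
        return "forward"
    if p in reverse_complement(q):
        return "reverse_complement"
    return None
```

A hit found only in the reverse-complement orientation suggests the input sequence was supplied on the opposite strand, which is the situation the option above is meant to cover.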
The probe sequence download file in the Download Center is a tab-delimited file containing the following columns:
- Probe Set Name: For example, 1007_s_at.
- Probe X: The X coordinate of the probe sequence on the GeneChip™ probe array.
- Probe Y: The Y coordinate of the probe sequence on the GeneChip™ probe array.
- Probe Interrogation Position: The position of the 13th ("middle") nucleotide of the probe sequence as it aligns on the consensus/exemplar sequence.
- Probe Sequence: The 25-base perfect match sequence.
- Target Strandedness: The sense/antisense orientation of the target sequence that can hybridize with the probe sequence.
Note: Probe Sequence databanks in the NetAffx™ Analysis Center have an additional column called, "Serial Order". This column provides the relative order of probe sequences as they align with the consensus/exemplar sequence.
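A tab-delimited file with the columns listed above can be read with a few lines of code. The sketch below is a hedged example: it assumes the columns appear in the order listed, that an optional header row repeats the column names, and it uses made-up sample rows rather than real file contents. The function name and the output field names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO

COLUMNS = [
    "Probe Set Name", "Probe X", "Probe Y",
    "Probe Interrogation Position", "Probe Sequence", "Target Strandedness",
]


def read_probe_table(handle: TextIO) -> Dict[str, List[dict]]:
    """Group the probe records in a tab-delimited file by probe set name."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    by_set: Dict[str, List[dict]] = defaultdict(list)
    for row in reader:
        if row["Probe Set Name"] == "Probe Set Name":
            continue  # skip a header row if the file has one (assumption)
        by_set[row["Probe Set Name"]].append({
            "x": int(row["Probe X"]),
            "y": int(row["Probe Y"]),
            "interrogation_position": int(row["Probe Interrogation Position"]),
            "sequence": row["Probe Sequence"],
            "strandedness": row["Target Strandedness"],
        })
    return dict(by_set)


# Made-up rows for illustration only; not taken from a real download file.
SAMPLE = (
    "\t".join(COLUMNS) + "\n"
    "1007_s_at\t467\t181\t3330\tCACCCAGCTGGTCCTGTGGATGGGA\tAntisense\n"
    "1007_s_at\t531\t299\t3443\tGCCCCACTGGACAACACTGATTCCT\tAntisense\n"
)

table = read_probe_table(io.StringIO(SAMPLE))
```

For a real file, open it with `open(path, newline="")` and pass the handle to `read_probe_table` in place of the in-memory sample.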
The probes on the 3' IVT platform are in the sense orientation.
The probe interrogation position indicates the base position on the consensus/exemplar sequence where the central base of the probe aligns, which is the 13th base of a 25mer probe.
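The relationship between a probe's alignment start and its interrogation position is simple arithmetic, sketched below. The sketch assumes 1-based coordinates and a probe that aligns contiguously, in the same orientation, to the consensus/exemplar sequence; the function names are hypothetical.

```python
from typing import Tuple


def interrogation_position(probe_start: int, probe_length: int = 25) -> int:
    """Position of the central (13th of 25) base, given the 1-based start."""
    return probe_start + probe_length // 2


def probe_span(position: int, probe_length: int = 25) -> Tuple[int, int]:
    """Inclusive 1-based (start, end) of a probe from its interrogation position."""
    half = probe_length // 2
    return position - half, position + half
```

A 25-mer that starts at base 1 of the consensus/exemplar sequence therefore has an interrogation position of 13 and covers bases 1 through 25.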
The probe set names never change, but they can give you an idea of what was known about the sequence at the time of design:
- _at = all the probes hit one known transcript.
- _a = all probes in the set hit alternate transcripts from the same gene.
- _s = all probes in the set hit transcripts from different genes.
- _x = some probes hit transcripts from different genes.
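When working with long lists of probe set IDs, the suffix conventions above can be applied programmatically. The sketch below is illustrative only. It assumes the qualifier letter appears immediately before a final `_at`, as in the example name 1007_s_at; names that follow other patterns are reported as unrecognized. The function name is hypothetical.

```python
import re

# Descriptions taken from the suffix meanings listed above.
SUFFIX_MEANING = {
    "a": "all probes hit alternate transcripts from the same gene",
    "s": "all probes hit transcripts from different genes",
    "x": "some probes hit transcripts from different genes",
}

_SUFFIX_RE = re.compile(r"_(?:([asx])_)?at$")


def describe_probe_set(name: str) -> str:
    """Return the design-time meaning implied by a probe set name suffix."""
    match = _SUFFIX_RE.search(name)
    if match is None:
        return "unrecognized suffix"
    qualifier = match.group(1)
    if qualifier is None:
        return "all probes hit one known transcript"
    return SUFFIX_MEANING[qualifier]
```

For example, `describe_probe_set("1007_s_at")` returns the description for the _s qualifier, while a plain `_at` name maps to the single-transcript case.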
The .dat files are approximately 750 MB in size, and the binary .cel files are approximately 60 MB in size.
The standard Affymetrix expression control sets (i.e., bacterial spikes) including both the hybridization control spikes and poly-A RNA control spikes are present on the Human Exon 1.0 ST Array and can be used to assess quality in a qualitative manner as used previously. As a new feature unique for the Exon Array, the Expression Console software supports the extraction of residuals from the PLIER model when running a multiple-array analysis. Such residuals can be used to identify outlier arrays and poor performing probes. In addition, for most of the 100 normalization control genes used in the design of the GeneChip™ Human Genome U133 Array, probe sets representing both their intron and exon regions have been tiled on the Exon Array. The exon and intron probe set metrics (i.e., % detection p-value <= 0.05, mean/median signal) can be used as an additional method to identify problematic outlier arrays or analysis problems. For these new experimental analysis metrics, there are no simple standard numeric threshold values that we can recommend at this moment as cutoffs for identifying good-performing or poor-performing arrays. However, it has been found that they are valuable when comparing relative values across a set of experiments for identifying outlier arrays. See the Quality Assessment of Exon Arrays white paper for more information.
No. The new stain-dispensing script was designed to increase productivity and reduce the chance of error during pipetting of the stains. However, some may feel more comfortable dispensing stains by hand. For manual and automated stain-dispensing procedures, please consult the latest version of the GeneChip™ Expression Analysis Technical Manual for HT Array Plates Using the GeneChip™ Array Station (login required).
GeneChip™ Command Console (AGCC) is provided to control the fluidics station and 7G Scanner and to generate .dat and .cel files. Probe-level analysis of the .cel files is carried out by Expression Console. This software allows for the generation of signal estimates and detection p-values at the probe set level for either exon-level or gene-level analysis. Expression Console Software is freely downloadable from our web site and supported by the Technical Support team. For additional higher levels of analysis, such as alternative splicing detection, a few experimental algorithms are published as methods in respective white papers. Users will need to implement these methods in advanced statistical analysis software packages. These methods have been developed based on experience with limited sample data sets, and further fine-tuning may be required depending on the user's unique biological systems. It is anticipated that new methods will continue to emerge to better support Exon Array analysis in the near future.
Normalization of .cel files takes 1-3 minutes and generation of DABG and PLIER probe set level summaries takes about 40 minutes per .cel file. Further downstream analysis will depend on which data analysis techniques are being applied.
Two primary factors can affect the length of analysis time: 1) the number of arrays analyzed at one time, and 2) the number of probe sets to be included in the analysis. Reducing the size of either one of these two factors will increase the analysis speed.
DABG stands for "detection above background" and is a detection metric generated by comparing Perfect Match probes to a distribution of background probes. This comparison yields a p-value for each probe; the probe-level p-values are then combined into a probe set level p-value using the Fisher equation. PLIER stands for "Probe Logarithmic Intensity Error" and is a model-based signal estimator which benefits from multi-array analysis. For more information on DABG, see the "Exon Array Background Correction" white paper; for more information on PLIER, refer to the PLIER Technical Note.
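The step that combines probe-level p-values into one probe set level p-value can be illustrated with Fisher's method for combining independent p-values. This is a sketch under the assumption that the equation referred to above is the standard Fisher combination; the exact DABG computation is described in the white paper and may differ in detail. The statistic is X = -2 * sum(ln p_i), which follows a chi-square distribution with 2k degrees of freedom for k p-values. For even degrees of freedom the upper-tail probability has a closed form, so only the standard library is needed. The function name is hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    Uses the closed-form chi-square survival function for 2k degrees of
    freedom: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must be in the interval (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i       # (x/2)^i / i!
        total += term
    return math.exp(-half_x) * total
```

A single p-value is returned unchanged, and several small probe-level p-values combine into a probe set level p-value smaller than any of them individually.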
The current exon array analysis software allows for the aggregation of multiple exon level probe sets into a larger "meta probe set". PLIER signal estimates and DABG detection values are then computed for these meta probe sets. The definition and grouping of the exons into a gene can have a significant impact on the final signal value of a particular gene. Affymetrix recommends using the "core gene" grouping or the "full gene" grouping files to derive the gene-level signal that should most resemble the expression of the constitutive exons. See the "Gene Signal Estimates from Exon Arrays" white paper for more information.
Associations between SNPs and exon probe sets can be obtained by using genome assembly position information which is provided for both the mapping array and the exon array. One useful tool for doing this is the UCSC Table Browser.
The most likely explanation is that a different version of the genome assembly has been used to display the array design information and the array results. At launch, two versions of the library files are provided for array analysis, corresponding to Human Genome Builds 34 and 35. Take care to use a consistent version number to match the array design with the actual array data for visualization in IGB.
GCOS users must use DTT v1.1, using the Flat File option, to transfer files to be analyzed by the Expression Console software from the GCOS database to an independent folder.
No. The methods in the white papers are not ANOSVA, which was published in the referenced paper. Note that ANOSVA was developed on a combination exon/junction array. Preliminary work comparing ANOSVA and PAC (one of the two methods described in the white papers) on the exon array tissue panel suggests that the MiDAS and robust PAC methods presented in the white papers are the better choice.
One way to do this is to open an IGB window displaying the probe sets of interest, and then right-click on the probe set and select the Get Info menu item which will open up the corresponding NetAffx™ Analysis Center page, revealing the annotations associated with that probe set.
Yes, but not easily. You need to get the IGB jar file, which can be downloaded from the IGB page. You will probably also want to grab all the data in the quickload folder from: http://netaffxdas.affymetrix.com/quickload_data/. Put the data in this folder on the local computer. Start up IGB using Java and the IGB jar file. Change the URL for the quickload folder (on QuickLoad tab, select Quickload Options) and point it to the local folder with the quickload data. Please note that we will provide limited technical support for this local IGB workflow. Additional features in IGB, such as the "Get Info" annotation retrieval function, require an active network connection to our web site. Users may not be aware of significant feature improvements and enhancements to IGB and updated sequence information and annotations when a local version of IGB is implemented. Local deployment should be used with discretion.
Signal and detection values from the exon arrays cannot be directly compared to those from the HG-U133 arrays. Major differences in array design and assay prevent meaningful comparisons at the signal level. Splice variation and polyadenylation variation can confound comparisons at the biological level (i.e., direction of change) due to differences in probe placement and bias in the target preparation assays.
For Research Use Only. Not for use in diagnostic procedures.