Find valuable information.

Optimize your experiments to get the best results. We’ve compiled a detailed knowledge base of the top tips and tricks to meet your research needs.

CytoScan Cytogenetics Suite

CytoScan™ Cytogenetics Suite consists of CytoScan™ arrays, the CytoScan™ Reagent Kit, GeneChip™ Command Console™ Software (AGCC), and Chromosome Analysis Suite (ChAS). It enables high-resolution, genome-wide DNA copy number analysis. It also provides genotyping information for detecting copy-neutral loss/absence of heterozygosity (LOH/AOH), which can be used to detect uniparental isodisomy (UPD). The combination of high-resolution DNA copy number data and the ability to detect gains, losses, and UPDs on a single array makes CytoScan™ Cytogenetics Suite ideal for cytogenetics studies.

The Reference Model file in CytoScan™ Cytogenetics Suite includes 380 samples, which were run as part of a larger set of microarrays by nine operators. These operators processed ~48 unique samples in two rounds each, with random placement of sample DNAs across the PCR plates and with random use of instruments and reagents. The source DNA includes the following samples:

  • 284 HapMap samples, including at least one replicate of each of 270 HapMap samples (90 from each of the Yoruban, Asian, and Caucasian ethnic groups), from cell line-derived DNAs from the Coriell Institute for Medical Research
  • 96 DNA samples from the blood of phenotypically healthy male and female individuals, obtained from BioServe Biotechnologies

Chromosome Analysis Suite

CytoScan™ array CEL files are processed and analyzed in ChAS 3.1 software, which is a free download from our website. Processing CEL files requires a 64-bit computer with a minimum of 8 GB of RAM. You can view the resulting CYCHP files on a 64-bit computer. You can also view them on a 32-bit computer running Chromosome Analysis Suite 2.1 (ChAS 2.1) or earlier, which needs a minimum of 3 GB of RAM. The recommended system specifications are included in the Chromosome Analysis Suite User Guide.

The CEL file is ~66 MB, and the CYCHP file is ~120 MB.

MAPD (median absolute pairwise difference) is a per-microarray estimate of variability, like standard deviation (SD) or interquartile range (IQR). It measures the variability in log2 ratios by taking the absolute differences between pairs of adjacent probes and then the median of those differences. Using a median rather than a mean removes the effect of an occasional large difference in log2 ratios between probes.

This variability can come from several sources:
  • intrinsic variability in the starting material
  • hybridization cocktail preparation
  • the microarray
  • the scanner
  • apparent variability caused by systematic differences between the reference and the sample on this microarray

Regardless of the source, increased variability decreases the quality of the CN calls. A high MAPD can be attributed to any of the above factors. It indicates that CN calls may be inaccurate, leading to a higher false positive/negative rate.
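The pairwise-difference idea behind MAPD can be sketched in a few lines of Python. This is an illustration, not the ChAS implementation. It assumes the log2 ratios are already ordered by genomic position, so each pair is two neighboring probes, and it applies no additional filtering or scaling.

```python
from statistics import median


def mapd(log2_ratios):
    """Median absolute difference between log2 ratios of neighboring probes.

    Assumes `log2_ratios` is ordered by genomic position.
    """
    if len(log2_ratios) < 2:
        raise ValueError("need at least two probes")
    # Absolute difference for every pair of neighboring probes.
    diffs = [abs(b - a) for a, b in zip(log2_ratios, log2_ratios[1:])]
    # The median ignores the occasional large jump that would inflate a mean.
    return median(diffs)


# Toy data (not real array output): a noisy baseline followed by a gain.
ratios = [0.05, -0.02, 0.10, 0.01, 1.50, 1.45, 1.52, 1.48]
print(round(mapd(ratios), 3))
```

In this toy series, the single step at the copy number change contributes one difference of 1.49. That step pulls the mean of the seven differences up to about 0.28 but leaves the median at 0.07.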

SNPQC is a measure of how well genotype alleles are resolved in the microarray data. In other words, it estimates the distributions of homozygous AA, heterozygous AB, and homozygous BB genotypes and calculates the distance between them. The better these distributions are separated, the more reliably a genotype can be identified from its cluster position. A low SNPQC value indicates that the SNP allele data is compromised by higher noise within the array. This noise reduces the overall quality and clarity of the results.
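The sketch below illustrates what "separation of the distributions" means. It scores how far apart genotype clusters are, in units of their pooled standard deviation. This is a hypothetical score chosen for illustration and is not the SNPQC formula. The toy values sit on a scale where AA, AB, and BB center near +1, 0, and -1.

```python
from math import sqrt
from statistics import mean, variance


def separation(cluster_a, cluster_b):
    """Distance between two cluster means, in units of their pooled SD."""
    pooled_sd = sqrt((variance(cluster_a) + variance(cluster_b)) / 2)
    return abs(mean(cluster_a) - mean(cluster_b)) / pooled_sd


def genotype_separation_score(aa, ab, bb):
    """Hypothetical score: the weaker of the two neighboring-cluster separations."""
    return min(separation(aa, ab), separation(ab, bb))


# Toy clusters (not real data): same centers, different noise levels.
tight = ([0.95, 1.00, 1.05], [-0.05, 0.00, 0.05], [-1.05, -1.00, -0.95])
noisy = ([0.70, 1.00, 1.30], [-0.30, 0.00, 0.30], [-1.30, -1.00, -0.70])
print(round(genotype_separation_score(*tight), 1))  # well-resolved genotypes
print(round(genotype_separation_score(*noisy), 1))  # noisier array
```

The cluster centers are identical in both cases. Only the spread differs, and the wider clusters produce a much lower score.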

Waviness-SD is a global measure of variation of microarray probes that is insensitive to short-range variation and focuses on long-range variation. Based on an empirical testing dataset, we have determined that array data with Waviness-SD >0.12 has either sample or processing batch effects that will reduce the quality of the copy number calls. Elevated Waviness-SD is not always an indication of too much noise. Elevated waviness with good MAPD and SNPQC metrics can occur in samples with many copy number changes. Therefore, it is advised to check the data when observing elevated waviness with good MAPD and SNPQC. The Waviness-SD metric is applicable to constitutional blood and cell line data. The Waviness-SD metric is not intended for alternative sample types such as cancer samples in which the results may vary as a result of the biological complexity. For these sample types, it is recommended to use the ndwavinessSD.

CytoScan™ HD and 750K arrays use the following QC metrics:

  • SNPQC ≥15
  • MAPD ≤0.25
  • Waviness-SD ≤0.12

CytoScan™ Optima Array uses the following QC metrics:

  • SNPQC ≥8.5
  • MAPD ≤0.29
  • Waviness-SD ≤0.12

For CytoScan™ HD Suite and CytoScan™ 750K Suite, QC metrics have been fine-tuned for blood-derived constitutional samples. The SNPQC and Waviness-SD metrics are based on an assumption of a relatively normal diploid genome for which the majority of the genome is not mosaic. For cancer-related samples, the baseline assumption pertaining to constitutional samples is violated with regard to aberration frequency and high levels of mosaicism, which will likely trigger the SNPQC and Waviness-SD metrics to fail. Only the MAPD metric should be considered for non-constitutional samples. A failure of any one of these metrics for constitutional blood samples is a failure for that array result. There is no direct correlation between the absolute passing numeric value for any one of the metrics and the quality of a sample.
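The thresholds above lend themselves to a simple pass/fail check. The sketch below encodes the published cut-offs and the rule that any single failure fails a constitutional array. The following are illustrative choices, not part of ChAS:
  • the dictionary layout
  • the function name
  • the `constitutional` switch, which evaluates only MAPD, following the guidance for non-constitutional HD and 750K samples

```python
# Published cut-offs: ("min", x) means value >= x passes; ("max", x) means value <= x passes.
QC_THRESHOLDS = {
    "HD/750K": {"SNPQC": ("min", 15.0), "MAPD": ("max", 0.25), "Waviness-SD": ("max", 0.12)},
    "Optima": {"SNPQC": ("min", 8.5), "MAPD": ("max", 0.29), "Waviness-SD": ("max", 0.12)},
}


def failed_metrics(array_type, metrics, constitutional=True):
    """Return the names of the QC metrics that fail for one array.

    For non-constitutional samples (e.g., cancer samples on HD/750K arrays),
    only MAPD is evaluated.
    """
    limits = QC_THRESHOLDS[array_type]
    names = list(limits) if constitutional else ["MAPD"]
    failures = []
    for name in names:
        kind, limit = limits[name]
        value = metrics[name]
        passed = value >= limit if kind == "min" else value <= limit
        if not passed:
            failures.append(name)
    return failures


# Hypothetical metric values for one constitutional HD array.
result = failed_metrics("HD/750K", {"SNPQC": 16.2, "MAPD": 0.27, "Waviness-SD": 0.08})
print("PASS" if not result else "FAIL: " + ", ".join(result))
```

An empty list means the array passes. Because any one failure fails a constitutional array, a single entry such as MAPD is enough to reject the result.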

For CytoScan™ Optima Suite, QC metrics have been fine-tuned for amniocytes, chorionic villi, cultured cells, products of conception (POC), and blood samples. The SNPQC and Waviness-SD metrics are based on an assumption of a relatively normal diploid genome.

The CytoScan™ HD and 750K reference model files comprise the same set of samples: 380 microarrays run as part of a larger set of microarrays by nine operators. The operators processed ~48 unique samples in two rounds each, with randomized placement of DNA samples across the PCR plates and randomized use of reagents and instruments. The source DNA includes:
  • 284 HapMap samples, including at least one replicate of each of 270 HapMap samples (90 from each of the Yoruban, Asian, and Caucasian ethnic groups), from cell line-derived DNA from the Coriell Institute for Medical Research
  • 96 DNA samples from the blood of phenotypically healthy male and female individuals, obtained from BioServe Biotechnologies

The samples in this set were chosen to have been run by different operators and with different kits and reagents. The set still covers all the HapMap cell line ethnic groups, plus normal blood samples of both sexes.

The normal diploid analysis for CytoScan™ HD and 750K arrays is recommended for cancer samples in which >50% of the genome is likely to be rearranged. This analysis automatically determines the normal diploid regions and normalizes the rest of the sample based on those regions, resulting in properly centered data. The normal diploid analysis is NOT recommended for CytoScan™ Optima Array.

For samples run through the normal single sample analysis, the following QC metrics are recommended:

  • MAPD <0.25
  • SNPQC OR ndSNPQC ≥15
  • Waviness-SD OR ndwavinessSD <0.12

ChAS 3.1 includes a table that assigns each mutation one of the following calls: “high confidence,” “lower confidence,” or “undetected.” The call thresholds were established in our spike-in experiments, based on the separation between normal reference and mutant calls. Lower-confidence calls correspond to 95% sensitivity and 99% specificity, and high-confidence calls to 95% sensitivity and 99.9% specificity. The software visualizes mutant probes versus reference probes representing the wild type, so users can visually assess this separation for their sample. There is no overlay with copy number data (other than through table summaries).

Celpaircheckstatus checks whether the results from the AT channel are consistent with the results from the GC channel. It also checks whether each file represents the results of its respective channel. This metric will be “out of bounds,” for example, when the AT and GC channels were mispaired between samples or when a sample is contaminated. In that case, troubleshoot as described in the OncoScan™ Console™ User Manual.

The algorithm identifies normal diploid markers in the cancer samples, which is particularly important in highly aberrant samples. The normal diploid markers are used to calibrate the signals so that a log2 ratio of 0 (i.e., copy number 2) is achieved. In about 2% of samples, the algorithm cannot identify a sufficient number of “normal diploid” markers, and no normal diploid calibration occurs. This event triggers “low diploid flag = YES.” In this case, carefully examine the log2 ratios and determine whether re-centering is necessary.
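Once the normal diploid markers are known, the calibration step amounts to subtracting a common offset. The sketch below makes three assumptions, none taken from the OncoScan or ChAS implementation:
  • the markers have already been identified (the real algorithm performs this step itself)
  • the median of their log2 ratios is used as the offset
  • a hypothetical `min_diploid_markers` cut-off decides when calibration is skipped

```python
from statistics import median


def recenter_on_diploid(log2_ratios, is_normal_diploid, min_diploid_markers):
    """Shift log2 ratios so the normal diploid markers center on 0 (copy number 2).

    Returns (ratios, low_diploid_flag). When fewer than `min_diploid_markers`
    markers are flagged as normal diploid, the ratios are returned unchanged
    and the flag is True.
    """
    diploid = [r for r, d in zip(log2_ratios, is_normal_diploid) if d]
    if len(diploid) < min_diploid_markers:
        return list(log2_ratios), True
    offset = median(diploid)  # how far the diploid baseline sits from 0
    return [r - offset for r in log2_ratios], False


# Toy example: the diploid baseline sits at about +0.2 before calibration.
ratios = [0.20, 0.25, 0.15, 0.80, -0.60]
flags = [True, True, True, False, False]
centered, low_diploid = recenter_on_diploid(ratios, flags, min_diploid_markers=3)
print([round(r, 2) for r in centered], low_diploid)
```

Raising `min_diploid_markers` above the number of flagged markers reproduces the "low diploid flag" case: the data are left uncentered for manual review.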

OncoScanGeneBoundaries.r1 lists the ~900 OncoScan cancer genes and includes +10 kb on each side of the gene. OncoScanCancerGeneOnly.r1 lists the same ~900 OncoScan cancer genes using the start and stop positions of the gene (no additional +10 kb on each side of the gene).

A total of six files are produced that are key to the process:
ARR file—this file includes sample information.
AUDIT file—this file is a log of the sample history.
DAT file—this file is the raw data from the scanner.
CEL file—this file is the gridded and processed data.
xxCHP file—this file is the output of ChAS 3.1 and contains all of the analysis data.
CHPCAR file—this file stores user-annotated calls, interpretations, and modifications made to CHP file segment data (ChAS v2.0 and higher).

Out of the six data files mentioned in the preceding question, it is important to back up and archive the ARR, CEL, CYCHP, and CHPCAR files at a minimum. This will allow you to maintain the ability to either reanalyze from the CEL file or re-visualize the results using the CYCHP/CHPCAR files.
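The sketch below filters a data folder down to the file types named above. It matches only on the final file extension, so a compound name such as the hypothetical `S1.cyhd.cychp` still counts as a CYCHP file. The folder layout and file names are assumptions. DAT and AUDIT files are left out because they are not in the minimum set.

```python
from pathlib import Path

# Minimum set of file types to back up and archive.
ARCHIVE_SUFFIXES = {".arr", ".cel", ".cychp", ".chpcar"}


def should_archive(filename):
    """True if the file's final extension marks it for archiving (case-insensitive)."""
    return Path(filename).suffix.lower() in ARCHIVE_SUFFIXES


def files_to_archive(data_folder):
    """All files under `data_folder` (searched recursively) that should be archived."""
    return sorted(p for p in Path(data_folder).rglob("*")
                  if p.is_file() and should_archive(p.name))


# Hypothetical file names for one sample.
for name in ["S1.ARR", "S1.AUDIT", "S1.DAT", "S1.CEL", "S1.cyhd.cychp", "S1.cyhd.chpcar"]:
    print(name, should_archive(name))
```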

A 64-bit system is required to generate and view CytoScan™ CYCHP data files. The recommended operating systems are Windows™ 7 Professional SP1 and Windows™ 8.1. ChAS 3.1 requires Affymetrix’ GeneChip™ Command Console™ Software (AGCC) 3.2.4 or higher to produce CytoScan™ CEL files. For more details on the system and hardware requirements, please refer to the ChAS 3.1 User Guide or the minimum software and hardware requirements documents on our web page.

Due to the amount of memory that ChAS 3.1 needs to operate, Affymetrix™ VERY STRONGLY recommends that you DO NOT install ChAS 3.1 on production AGCC computers being used for scanning and operating fluidics systems.

NA33 (Hg19) is only compatible in ChAS 3.1. ChAS 3.1 automatically prevents you from selecting an incompatible NetAffx™ analysis file version for analysis or when viewing analysis results.

Yes. With ChAS 3.1, you can export table data to Word (.docx), PDF, or text format, or copy it to the clipboard.

You can export graphic view data in PDF, DOCX, and PNG formats.

For a whole chromosome or a region, go to the Graphs tab in ChAS 3.1 and use the “copy to clipboard” or “export to txt” functions.

For the whole genome (ChAS 2.1 or later):
  1. Go to the analysis manager/analysis dashboard and select the “QC results” tab.
  2. Open and/or select the file(s) to export.
  3. Press the “Generate Report” button and select the “Export probe level data” function.

Data will be exported in .txt format.

We recommend the following display settings:
  • Weighted log2 ratio graph: change from the default to a minimum of -1.5 and a maximum of 1.5, with the data type set to points.
  • Allele peaks graph: change from the default to a minimum of -2 and a maximum of 2, with the data type set to points.

The other graphs can stay at the default values.

AA markers would be distributed about +1.

AB markers would be distributed about 0.

BB markers would be distributed about -1.
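A simplified calculation shows why the three genotypes land at these positions. The sketch assumes an allele difference of (A - B)/(A + B), computed from idealized A- and B-allele signals. The values ChAS plots come from its own algorithm, so this only illustrates the expected pattern.

```python
def allele_difference(signal_a, signal_b):
    """Simplified allele difference: (A - B) / (A + B), ranging from +1 to -1."""
    total = signal_a + signal_b
    if total <= 0:
        raise ValueError("total allele signal must be positive")
    return (signal_a - signal_b) / total


# Idealized, noise-free allele signals for each genotype (toy values).
for genotype, (a, b) in {"AA": (1000, 0), "AB": (500, 500), "BB": (0, 1000)}.items():
    print(genotype, allele_difference(a, b))
```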

Yes. Public databases such as CAGdb, ICCG, DECIPHER, and ClinVar, among others, may assist in data analysis.

Mosaicism analysis is currently only available for CytoScan™ array files. However, xxCHP files for the other array types contain the SmoothSignal data type, which displays non-integer copy number changes.

You must be logged into the ChAS database (ChAS DB) to view histogram data. The histograms are only available for NetAffx™ genomic annotation files for genome build Hg19. The browser produces an error message if you try to load Hg19-based histograms while an Hg18-based NetAffx™ genomic annotation is displayed.

A segment can be queried against the ChAS 3.1 database for intersecting segments from previously published samples. Using both an overlap threshold and a coverage threshold, you can restrict the query results to segments of roughly the same size as the segment in the current sample.
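The effect of combining the two thresholds can be sketched as follows. The definitions used here are assumptions for illustration: overlap is the fraction of the query segment covered by a database segment, and coverage is the fraction of the database segment covered by the query. The 0.8 defaults are also assumed. Consult the ChAS 3.1 User Guide for how the software defines and applies its thresholds.

```python
def shared_length(seg_a, seg_b):
    """Length shared by two (start, end) segments on the same chromosome."""
    return max(0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))


def similar_size_match(query, hit, overlap_min=0.8, coverage_min=0.8):
    """Keep `hit` only if it covers most of `query` and `query` covers most of `hit`.

    overlap:  fraction of the query segment covered by the hit
    coverage: fraction of the hit covered by the query segment
    """
    shared = shared_length(query, hit)
    overlap = shared / (query[1] - query[0])
    coverage = shared / (hit[1] - hit[0])
    return overlap >= overlap_min and coverage >= coverage_min


query = (1_000_000, 2_000_000)
hits = {
    "similar size": (1_050_000, 2_050_000),
    "much larger": (0, 10_000_000),
    "much smaller": (1_400_000, 1_500_000),
}
for label, hit in hits.items():
    print(label, similar_size_match(query, hit))
```

A large database segment that swallows the query passes the overlap test but fails coverage. A small one inside the query fails overlap. Only the similar-size segment is kept.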

The ChAS 3.1 server home page requires a web browser (Chrome and Internet Explorer v11 are recommended) and an active internet connection. If you are using the local ChAS DB, an active internet connection is not required.

If an xxCHP file has been previously published to the database, you will receive a warning indicating this sample already exists in the database. You can choose to overwrite the existing information or cancel to keep the existing information. It is important to back up and archive the ARR, CEL, CYCHP, and CHPCAR files at a minimum. This will allow you to maintain the ability to either reanalyze from the CEL file or re-visualize the results using the CYCHP/CHPCAR files.

During the installation of ChAS 3.1 on a system with ChAS 3.0, the database will automatically be updated to the current ChAS 3.1 version. Older backup database files can still be updated to ChAS 3.1 during the restore process. Please see the ChAS 3.1 User Guide for instructions on how to restore a backup file.

No, the ChAS 3.0 database will need to be updated to ChAS 3.1.

No, ChAS 3.1 has not been validated on Windows™ 10.

The library file location for ChAS 2.1 and above is C:\Affymetrix\ChAS\Library.

The algorithm is designed to detect mosaicism only at levels of ~30–70%, for copy numbers between 1 and 3, in regions of roughly 5,000 markers or larger. The endpoint locations of mosaic segments are less precise than those from CN segmentation. Endpoint variation within 500 markers is typical for segments of 5,000 markers or larger.

The following regions shorter than 5,000 markers may be incorrectly called as mosaic segments:
  • some regions of full integer CN 1 or 3
  • some regions of CN below 1 or above 3, mosaic or otherwise
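The limits above can be turned into a screening step for reviewing mosaic calls. The helper below is hypothetical. It treats the approximate values from the text (5,000 markers, 30%, and 70%) as hard cut-offs, and it assumes each call carries a marker count and an estimated mosaic level.

```python
def mosaic_call_review_notes(n_markers, mosaic_fraction):
    """Return reasons a mosaic segment call may deserve a closer look (empty if none)."""
    notes = []
    if n_markers < 5000:
        notes.append("under ~5,000 markers: may be a full-integer CN change or a CN below 1 or above 3")
    if not 0.30 <= mosaic_fraction <= 0.70:
        notes.append("mosaic level outside the ~30-70% design range")
    return notes


print(mosaic_call_review_notes(n_markers=12_000, mosaic_fraction=0.45))
print(mosaic_call_review_notes(n_markers=3_200, mosaic_fraction=0.85))
```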

Expression Console software

To see the NetAffx™ information associated with a particular probe set, view the probe level summaries under the REPORT menu item and double-click the probe ID of interest.

The SST-RMA analysis algorithm applies pre-processing steps to the CEL files before normalization and summarization with RMA. No changes have been made to RMA itself. Through Expression Console™ Software version 1.3, RMA was the default analysis for human and mouse transcriptome arrays. Beginning with Expression Console™ Software version 1.4, SST-RMA is the default algorithm for human and mouse transcriptome arrays.

Historically, microarrays were perceived to underestimate fold change values compared with other methods such as RT-PCR. For customers who filter using a fold change cutoff, the new algorithm addresses this “fold change compression” in two ways. It applies a GC correction, and it transforms the microarray signal into a signal space similar to that of other methods.

When filtering by fold change cutoffs, expect a larger number of differentially expressed genes compared to the standalone RMA. When comparing the number of differentially expressed genes (defined by fold change filters) to other methods, such as RT-PCR and RNA-Seq, expect this number to align more with these methods when using GCCN-SST. There is no impact on sensitivity or specificity of data. SST-RMA was designed to address comparability to other technologies. The same experimental design recommendations still apply when designing an expression study with microarrays.

With SST-RMA, more significant changes are observed in the number of expressed genes with alternatively spliced events compared to RMA.

The Relative Log Expression (RLE) boxplots are the only controls that might look different in Expression Console™ Software.

No. The SST-RMA analysis algorithm will not be available in Expression Console™ Software for legacy array designs, due to the vast amount of publicly available data for these designs. Customers who wish to use this analysis method for other array types should use SST-RMA with the Affymetrix™ Power Tools (APT) software package.

Yes, the SST-RMA analysis algorithm will be available in APT.

When should data analyzed with RMA be re-analyzed using SST-RMA?

Positive controls are probe sets designed against putative exons of about 100 housekeeping genes shown to be expressed at detectable levels across a variety of tissues. Since the extent of alternative splicing and transcript expression is not known for all tissues, not all exons are expected to be expressed in all tissues.

Negative controls are putative introns of the same 100 housekeeping genes chosen for positive controls. These probe sets may be expressed in certain tissues through intron retention, so they are not true negative controls. Overall, the positive and negative control probe sets provide a medium-size dataset with expected high and low signal values, respectively. This dataset is useful for estimating overall data quality through the Pos_vs_neg_auc value.
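As its name suggests, Pos_vs_neg_auc can be read as the area under the ROC curve for separating positive from negative control probe sets by signal. Under that assumption, it can be computed with the pairwise (Mann-Whitney) formulation below. This is a sketch for intuition, and Expression Console™ Software's exact computation may differ.

```python
def pos_vs_neg_auc(pos_signals, neg_signals):
    """ROC AUC: chance that a random positive control outscores a random negative.

    Ties count as half. 1.0 means perfect separation; 0.5 means none.
    """
    score = 0.0
    for p in pos_signals:
        for n in neg_signals:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(pos_signals) * len(neg_signals))


# Toy log2 signals (not real data): one negative control overlaps the positives.
positives = [8.1, 9.5, 7.2, 10.0]
negatives = [3.0, 4.2, 7.5]
print(round(pos_vs_neg_auc(positives, negatives), 3))
```

Values close to 1.0 indicate that the expected high and low signals are well separated. Values drifting toward 0.5 suggest poorer data quality.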

Genotyping Console software

GTC is available to genotype the following arrays:

  • Genome Wide SNP 5
  • Genome Wide SNP 6
  • Mapping 500K
  • Mapping 100K
  • Mouse Diversity array

Copy number and LOH analysis can be done on the:

  • Genome Wide SNP 6
  • Mapping 500K
  • Mapping 100K

For genotyping, more samples generally give better SNP calling; 44 unique samples per batch is recommended.

For copy number, there is no minimum sample requirement, because each sample is analyzed individually (single-sample analysis).

There are 4 options available to choose from when analyzing:

  1. Fixed Genotype Boundaries - Version 2
  2. Dynamic Genotype Boundaries - Version 2
  3. Fixed Genotype Boundaries
  4. Dynamic Genotype Boundaries

Option 1 - Fixed Genotype Boundaries - Version 2 should be used. Version 2 of the Fixed Genotype Boundaries method differs from the original version in that a larger set of samples was used to inform the boundaries.

Option 2 - Dynamic Genotype Boundaries - Version 2 processing may be useful if using low-quality input DNA, if deviating from the assay protocol, or if reagents have changed.

The Fixed Genotype Boundaries and Dynamic Genotype Boundaries methods (options 3 and 4) are version 1 algorithms and are considered legacy. They would typically be used only in a large project that started with these algorithm versions and continues to use them for consistency across the dataset.

GeneChip™ Command Console software

The Library File Importer is installed during the Data Exchange Console™ (DEC) installation.

  • Media file—contains information about the media type on which the array is printed, such as a cartridge or 96-well plate.
  • Master file—contains array-specific design information and replaces the cif file. It also identifies the analysis library file packages and scan parameter files.
  • Workflow file—lists the actions to be performed on each probe array type, such as gridding and CEL file generation. The file also includes references to the appropriate algorithm parameter file.
  • Algorithm parameter file—contains the parameters for an algorithm.

Sample/array registration is the process of associating a sample record with a physical array or set of physical arrays. With this feature, users can link a sample to:
  • a single array
  • an array set, such as the GeneChip™ HG-U133A and B Arrays or the GeneChip™ Human Mapping 500K Array™ Set
  • an array type in replicate (one sample run on multiple arrays of the same type)

Yes. Please see the demonstration in the training and tutorials section of the website for additional information.

The fluidics scripts are required in order to run the Fluidics Station. Please see the demonstration in the training and tutorials section of the website for additional information.

The use of the scanner requires a number of updated library files. Library files are installed in Command Console™ Software using the Library File Converter Tool. Please see the demonstration in the training and tutorials section of the website.

DAT and CEL data are analyzed in the Command Console™ Viewer. By default, Command Console™ Software automatically grids the DAT file and creates the CEL data. Please see the demonstration in the training and tutorials section of the website.

No. DAT files are available for viewing after the scan has completed.

This functionality is supplied through the Command Console™ Viewer. Please see the demonstration in the training and tutorials section of the website.

Array registration is not required to process the physical array through the Fluidics Station and scanner.

Allows for filtering, sorting, and searching on user-defined attributes.

Allows users to define specific attributes associated with the physical arrays.

Allows managers to require the input of specific attributes to be associated with physical arrays.

Allows the saving of data to user-defined project folders as opposed to the system default folder.

Command Console™ Software itself does not create CHP files. Instead, we offer software tools specific to the various array applications. Please visit the software index page or product listing for information on available CHP writing software.

Command Console™ CEL data will first have to be converted through Data Exchange Console™ into GCOS format. Afterwards, the data will be imported into GCOS and processed through the GCOS publishing feature.

Yes, Expression Console™ Software can produce report files similar to those that were generated in GCOS.

Please see the demonstration in the training and tutorials section of the website.

This is optional. Please see the previous question regarding the benefits of sample/array registration and the training and tutorials section of the website on how to run arrays through fluidics and scanning on different computer workstations.

DAT and CEL files can be opened and viewed in the Command Console™ image viewer. CHP files can be opened and viewed in Expression Console™ Software. For instructions on using the Data Exchange Console, which moves data between GCOS and Command Console™ Software, please see the demonstration in the training and tutorials section of the website.

No. Command Console™ Software is file based and data files are accessed directly through the file system. Data is simply transferred or copied as required. If users wish to edit sample and array attributes, view or adjust image grids, generate CEL files, or use the data search capabilities, they will need to install Command Console™ Software.

Click the download link on the Command Console™ Software download page, extract the contents of the downloaded zip file, and follow the installation instructions included.

Please see the demonstration in the training and tutorials section of the website for the matrix on installation needs.

Axiom™ Microbial Detection Analysis Software (MiDAS)

Families are represented within a Superkingdom in alphabetical order, where the last tile represents targets with Unassigned Families (U).

The superkingdom tiles shown are Archaea, Bacteria, Eukaryota, Fungi (Eukaryota), Protozoa, Viroids, and Viruses.

A minimum of 16 GB of RAM and 30 GB of available disk space are required to analyze Axiom™ Microbiome .cel files.

The taxonomic IDs are pulled from the long annotations for the target sequences in HybDB.

Affymetrix™ Transcriptome Analysis Console software (TAC)

Fold change is a number describing how much the signal changes from an initial condition group to a final condition group. These changes are represented in linear space. There are a couple of ways to describe fold change. One way is simply to divide the signal of Sample A by that of Sample B and assess the result. For example:

  • Array 1: Gene X = 1000, Gene Y = 5000, Gene Z = 500
  • Array 2: Gene X = 10000, Gene Y = 1000, Gene Z = 500

Comparing Array 1 vs Array 2:

  • Gene X: 1000 / 10000 = 0.1
  • Gene Y: 5000 / 1000 = 5
  • Gene Z: 500 / 500 = 1

TAC displays the straight fold change value if it is greater than or equal to 1.0, and -(1/fold change) for values between 0 and 1. Here is the same example in TAC format, again comparing Array 1 vs Array 2:

  • Gene X: 1000 / 10000 = 0.1, so -(1/0.1) = -10
  • Gene Y: 5000 / 1000 = 5
  • Gene Z: 500 / 500 = 1

Doing the comparison the other way (Array 2 vs Array 1):

  • Gene X: 10000 / 1000 = 10
  • Gene Y: 1000 / 5000 = 0.2, so in TAC, -(1/0.2) = -5
  • Gene Z: 500 / 500 = 1
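The display rule can be written as a small function that reproduces the worked example. It works directly on the linear signals, as the example does. TAC itself may derive the ratio from group-level log2 signals, so treat this as an illustration of the display convention only.

```python
def tac_fold_change(signal_1, signal_2):
    """Linear fold change of condition 1 vs condition 2, displayed TAC-style.

    Ratios >= 1 are shown as-is; ratios between 0 and 1 are shown as -(1/ratio).
    """
    if signal_1 <= 0 or signal_2 <= 0:
        raise ValueError("signals must be positive")
    ratio = signal_1 / signal_2
    return ratio if ratio >= 1.0 else -1.0 / ratio


array_1 = {"Gene X": 1000, "Gene Y": 5000, "Gene Z": 500}
array_2 = {"Gene X": 10000, "Gene Y": 1000, "Gene Z": 500}
for gene in array_1:
    print(gene, tac_fold_change(array_1[gene], array_2[gene]),
          tac_fold_change(array_2[gene], array_1[gene]))
```

Because ratios below 1 are inverted and negated, displayed fold changes never fall strictly between -1 and 1. Reversing the comparison flips the sign, as Gene X and Gene Y show.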

The Splicing Index algorithm measures how much exon-specific expression differs between two conditions after excluding gene-level influences. The algorithm first normalizes the exon and junction expression values by the level of gene expression. It then creates a ratio of the normalized signal estimates from one condition relative to the other.

The algorithm is simply: Splicing Index = (exon-level intensity in condition 1 / gene-level intensity in condition 1) / (exon-level intensity in condition 2 / gene-level intensity in condition 2).

However, two key criteria must be met to perform a Splicing Index calculation:

  • Criterion 1: A transcript cluster (gene) must be expressed in both conditions, so for each condition you need to determine whether the gene is expressed.
  • Criterion 2: A PSR (probe selection region) or junction can only be analyzed by Splicing Index if it is expressed in at least one condition.

For more information, please see the associated white paper.
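The formula and the two criteria can be combined in a short function. How expression is called in each condition is left to the user, as noted above, so the sketch takes those calls as boolean inputs. The `ConditionSignals` type and its field names are hypothetical. Some tools report this ratio on a log2 scale, which this sketch does not do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConditionSignals:
    exon: float            # PSR or junction intensity, linear scale
    gene: float            # transcript cluster (gene-level) intensity, linear scale
    gene_expressed: bool   # expression call for the gene in this condition
    exon_expressed: bool   # expression call for the PSR/junction in this condition


def splicing_index(c1: ConditionSignals, c2: ConditionSignals) -> Optional[float]:
    """Splicing Index of condition 1 relative to condition 2, or None if not analyzable."""
    # Criterion 1: the gene must be expressed in both conditions.
    if not (c1.gene_expressed and c2.gene_expressed):
        return None
    # Criterion 2: the PSR/junction must be expressed in at least one condition.
    if not (c1.exon_expressed or c2.exon_expressed):
        return None
    return (c1.exon / c1.gene) / (c2.exon / c2.gene)


# Toy values: equal gene-level signal, but the exon is 4x more included in condition 1.
cond1 = ConditionSignals(exon=400, gene=2000, gene_expressed=True, exon_expressed=True)
cond2 = ConditionSignals(exon=100, gene=2000, gene_expressed=True, exon_expressed=False)
print(splicing_index(cond1, cond2))
```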

Axiom Analysis Suite

The file sizes listed below are per array:

  • Axiom 96 DAT: 600 MB
  • Axiom 96 CEL: 28 MB
  • Axiom 96 ARR: 6 KB
  • Axiom 96 JPG: 16 MB
  • Axiom 96 AUDIT: 25 KB
  • Axiom 384 DAT: 66 MB
  • Axiom 384 CEL: 3 MB
  • Axiom 384 ARR: 6 KB
  • Axiom 384 JPG: 2 MB
  • Axiom 384 AUDIT: 26 KB
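Because the sizes are per array, storage for a run scales linearly with the number of arrays. The sketch below does that arithmetic. The 96-array example and the decimal conversion between KB, MB, and GB are assumptions for illustration, and actual file sizes vary.

```python
# Per-array file sizes from the list above, in MB (assuming 1 KB = 0.001 MB).
PER_ARRAY_MB = {
    "Axiom 96": {"DAT": 600, "CEL": 28, "ARR": 0.006, "JPG": 16, "AUDIT": 0.025},
    "Axiom 384": {"DAT": 66, "CEL": 3, "ARR": 0.006, "JPG": 2, "AUDIT": 0.026},
}


def storage_mb(array_format, n_arrays, file_types=None):
    """Estimated storage in MB for `n_arrays` arrays, optionally for selected file types only."""
    sizes = PER_ARRAY_MB[array_format]
    types = sizes.keys() if file_types is None else file_types
    return n_arrays * sum(sizes[t] for t in types)


total = storage_mb("Axiom 96", 96)
cel_and_arr = storage_mb("Axiom 96", 96, ["CEL", "ARR"])
print(f"All files: {total / 1000:.1f} GB; CEL + ARR only: {cel_and_arr / 1000:.2f} GB")
```

Most of the space goes to DAT files, so the file types you choose to retain dominate long-term storage needs.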

It is not recommended or supported to genotype Axiom 24 format arrays in the same batch as Axiom 96 arrays, even if they are the same array type.