All Affymetrix probes were regrouped into unique Entrez gene IDs using a custom library file downloaded from the BRAINARRAY database. The raw data in .CEL files were normalized and summarized by the RMA (Robust Multi-array Average) method to generate an N by M matrix, where N is the number of unique Entrez genes and M is the number of samples. The normalized data are log2-transformed, giving final measurements that mostly range between 1 and 16, so every increase or decrease of 1.0 in a measurement corresponds to a 2-fold difference. All data processing steps were performed within the R environment. The following customized code can be applied to any type of Affymetrix platform (3' IVT, Exon, Gene, etc.) as long as the raw data are stored in CEL format and BRAINARRAY provides the custom library file in CDF format.
# Install and load the libraries
library(devtools);
install_github("zhezhangsh/rchive");
library(rchive);
library(affy); # Provides rma() and attaches Biobase for exprs() and sampleNames()
# Download a GEO data series
dir.create(local_folder); # Create a local folder to save downloaded files
ParseGSE(GSE_ID, local_folder); # Download the data series with metadata tables and supplemental files into a local folder
# Load and process
raw <- LoadAffyCel(fn.cel); # Load the CEL files into R; fn.cel holds the paths to the downloaded CEL files
raw@cdfName <- InstallBrainarray(raw@cdfName); # Install the BRAINARRAY custom library and point the data object at it
expr <- exprs(rma(raw)); # Normalize and summarize by RMA to generate an N by M matrix of log2 values
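# Optionally, rename gene and sample IDs
expr <- expr[grep('_at$', rownames(expr)), , drop=FALSE]; # Keep rows whose IDs end in '_at'
rownames(expr) <- sub('_at$', '', rownames(expr)); # Strip the '_at' suffix from the row IDs
cnm <- sub('\\.CEL(\\.gz)?$', '', sampleNames(raw), ignore.case=TRUE); # Drop the .CEL or .CEL.gz extension
cnm <- basename(cnm); # Drop any leading directory path
colnames(expr) <- cnm;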
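
To make the log2 interpretation concrete, the following minimal sketch uses a small made-up matrix in place of the real `expr`. The gene IDs, sample names, and values are all hypothetical and chosen only for illustration; no Bioconductor packages are needed. It shows how a difference of 1.0 on the log2 scale converts to a 2-fold difference.

```r
# Toy stand-in for the N by M matrix (N = 3 genes, M = 2 samples), log2 scale.
# IDs and values are invented for illustration, not taken from any data set.
expr <- matrix(c(8.0, 5.5, 10.25,   # sampleA
                 9.0, 5.5,  8.25),  # sampleB
               nrow = 3,
               dimnames = list(c("1000", "10000", "100"),
                               c("sampleA", "sampleB")))

# Difference on the log2 scale (sampleB minus sampleA)
log2.diff <- expr[, "sampleB"] - expr[, "sampleA"]

# Convert back to a linear fold change: 2 raised to the log2 difference
fold <- 2^log2.diff

print(cbind(log2.diff, fold))
# log2.diff of  1 gives fold 2    (2-fold higher in sampleB)
# log2.diff of  0 gives fold 1    (no change)
# log2.diff of -2 gives fold 0.25 (4-fold lower in sampleB)
```

The same two lines (`log2.diff` and `fold`) can be applied to any pair of columns of the real matrix, or to group means computed with `rowMeans()`.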