There are many types of cancer that can affect humans; they can arise in many different organs and from many different cell types within these organs. GIST is a relatively rare type of malignancy that arises from cells within the muscle wall of the digestive tract. Despite the wide variety of tumor types that exist in human pathology, what they all have in common is that once a tumor has originated somewhere within the body, it can start growing in a destructive manner. A major discriminating feature of malignant cells compared to benign cells is that they can grow in an unrestricted manner and can form metastases throughout the body. One of the ways to study cancer is by looking at the manner in which malignant cells differ from benign cells in the proteins that they express.
To take a step back, the human body consists of cells in which the genetic information resides in the nucleus in the form of DNA. This DNA contains about twenty thousand different genes. Each gene can be “transcribed” into a unique mRNA that in its turn is “translated” into the specific protein for which that gene carries the genetic code. The proteins are the actual building blocks of the cells and determine to a great extent the behavior of the cells, including the malignant behavior seen in cancer cells. Here I would like to describe some of the new technologies developed in the past years that will allow for a much more detailed analysis of the repertoire of proteins that are expressed by GIST cells.
In the past, the expression levels of proteins encoded by individual genes had to be examined one gene at a time using laborious techniques. In the April 2007 issue of this newsletter, I described how a new technique (developed in 1995 by Pat Brown at Stanford University) allowed us to use high density “microarrays” to examine the differences in mRNA for all human proteins in a single experiment. There is a good (though not absolute) correlation between mRNA levels and protein levels in tissue, and studying the mRNA is technically much simpler and more quantitative than looking directly at large numbers of proteins. The gene microarray technique was a tremendous advance over the techniques that were previously available. In a single overnight experiment one could look at the mRNA levels for essentially all human genes. The analysis of these levels is called “gene expression profiling”.
Recent developments in sequencing techniques now allow for an approach to quantifying the mRNA for all proteins that differs from microarrays. Rather than relying on the specific hybridization of mRNA sequences (that first have been reverse transcribed into cDNA) to probes on microarrays, scientists can now sequence all cDNAs that are derived from all mRNA species in a cell. In this so-called “Ultra High Throughput Sequencing (UHTS)”, the number of base pairs that can be sequenced is several hundred-fold higher than with conventional sequencing techniques. A base pair consists of two nucleotides on opposite complementary DNA strands that are connected via hydrogen bonds. The approach to determine mRNA levels by cDNA sequencing rather than hybridization is referred to as “RNA-seq”. The number of times a certain cDNA (that is unique for a particular protein) is found to be sequenced during this procedure is a direct indication of the number of mRNA molecules for this protein that were present in the sample analyzed.
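The counting idea behind RNA-seq can be illustrated with a small Python sketch. This is only a toy example under stated assumptions: it presumes that each sequenced cDNA fragment has already been matched to a gene, and the gene names, the read list, and the "reads per million" scaling are hypothetical choices made for illustration, not the procedure used in our laboratory.

```python
from collections import Counter

def count_reads_per_gene(read_gene_assignments):
    """Count how many sequenced cDNA fragments were assigned to each gene."""
    return Counter(read_gene_assignments)

def reads_per_million(counts):
    """Scale raw counts so that samples with different total numbers of
    sequenced fragments can be compared with each other."""
    total = sum(counts.values())
    return {gene: 1_000_000 * n / total for gene, n in counts.items()}

# Hypothetical input: the gene to which each sequenced fragment was matched.
reads = ["gene_A", "gene_A", "gene_A", "gene_B",
         "gene_C", "gene_C", "gene_A", "gene_C"]

counts = count_reads_per_gene(reads)
rpm = reads_per_million(counts)

for gene in sorted(counts):
    print(gene, counts[gene], rpm[gene])
```

In this toy data set, a gene that is sequenced more often than another is taken to have had more mRNA molecules in the sample, which is the principle described above.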
There are several approaches to UHTS that can be used for RNA-seq, and these techniques can be used in a complementary fashion. The sequencing technique by Illumina, Inc. (www.solexa.com) can analyze millions of cDNA fragments per run, but generates relatively short fragments of sequence (about 40 base pairs per sequence). Another technique, the 454 system by Roche (www.454.com), yields fewer sequences per experiment but can produce longer sequences of about 400 base pairs. As a comparison, the Sanger method, currently the most commonly used, can sequence lengths of DNA up to 800 base pairs long but can perform far fewer runs per experiment. To be more specific, the technique from Illumina can do 100,000 times as many runs in a single experiment. Thus, while the fragments of DNA that can be sequenced in the UHTS approach are shorter, this is overcome many times over by the massive increase in the number of DNA fragments that can be sequenced.
An interesting aspect of the Illumina approach is that, unlike the Sanger method, it does not require long stretches of intact mRNA (and the cDNA that we generate from that) as the start material to determine the base pair sequence. This is important because the field of research in GIST (and most other tumors) is still frustrated by a lack of availability of fresh frozen tumor samples, which are needed to generate long stretches of mRNA. While large tumor resections often have a sufficient amount of material to allow freezing, this is often not done routinely, and small needle core biopsy samples (such as those performed to diagnose a recurrence) are often submitted entirely for paraffin embedding. Essentially all surgery specimens are sent to a pathology department where the tissue is fixed in formalin and embedded in paraffin so that thin tissue sections can be obtained that can be examined under the microscope. Thus we hope to apply the RNA-seq technique not only to newly diagnosed GIST tumors but also to tumors that recur during imatinib or other therapies. In addition, we hope to be able to analyze samples for which no frozen tissue is available from rare subsets of GIST tumors such as pediatric GIST, wild-type GIST, etc.
In preliminary experiments to assess the ability of RNA-seq to perform reliably on archival formalin fixed paraffin embedded (FFPE) tissue, we performed RNA-seq on 5 matched fresh frozen and FFPE samples. Additionally, we performed gene expression profiling with microarrays on 3 of the same matched fresh frozen and FFPE samples. We then compared the performance of RNA-seq vs. microarrays for reliably quantifying gene expression on archival FFPE tissue, using the correlation between the fresh frozen and FFPE tissue measurements as a metric of reliability. We found a significantly higher correlation between the gene expression measurements from the matched fresh frozen and FFPE samples using RNA-seq than using the microarrays. These preliminary data suggest that RNA-seq is a more robust platform for quantifying gene expression from archival FFPE tissue than the gene microarrays. Such strong correlation is of critical importance because it shows that archival specimens retain the characteristics of the original state even though they have been stored for extended periods of time, up to several years. Moreover, this technique allows us to perform exploratory experiments on archival material, which gives us access to many more specimens and a greater variety of specimens than fresh frozen tissue banks.
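For readers curious about what "correlation as a metric of reliability" means in practice, the following Python sketch shows one way such a comparison could be set up. It is an illustration under stated assumptions only: the numbers are invented and are not our data, the labels "platform_a" and "platform_b" are hypothetical, and the choice of a Pearson correlation on log-transformed values is an assumption rather than a description of the analysis we performed.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def log2_expr(values):
    """Log-transform expression values; 1 is added so that zero stays defined."""
    return [math.log2(v + 1) for v in values]

# Invented expression values for five genes in one matched pair of samples.
frozen          = [120, 15, 980, 0, 45]    # fresh frozen tissue
ffpe_platform_a = [100, 12, 870, 1, 50]    # FFPE tissue, hypothetical platform A
ffpe_platform_b = [60, 40, 300, 20, 90]    # FFPE tissue, hypothetical platform B

r_a = pearson(log2_expr(frozen), log2_expr(ffpe_platform_a))
r_b = pearson(log2_expr(frozen), log2_expr(ffpe_platform_b))

print("platform A, frozen vs. FFPE:", round(r_a, 3))
print("platform B, frozen vs. FFPE:", round(r_b, 3))
```

A value close to 1 means that the FFPE measurements track the fresh frozen measurements closely; in this invented example the platform with the higher value would be considered the more reliable one on archival tissue.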