Structure of RNA splice variants
In addition to the cDNA sequence encoding the 1132-amino-acid telomerase (referred to as “reference” or “wild-type” telomerase), cDNAs were found that encode telomerase proteins of differing lengths. In fact, the initial cDNA clones characterized by Nakamura et al. and Meyerson et al. (Nakamura et al. 1997; Meyerson et al. 1997a) carried a 182 bp deletion that shifts the reading frame, introducing a premature stop codon and thus encoding a truncated telomerase.
As it turns out, transcription of the single-copy gene produces a number of variant telomerase transcripts. While assaying telomerase expression by amplifying cDNA synthesized from the RNA of normal, immortalized and tumor cells, Kilian et al. noticed multiple amplification products (Kilian et al. 1997). The products were isolated and sequenced. Some contained insertions and others deletions relative to the reference sequence. One of the deletions (β, 182 bp) corresponded to the deletion found by Nakamura et al. and Meyerson et al.
The nucleotides that are inserted or deleted relative to the reference telomerase sequence are called “alternative sequences”; because they are derived from partial or entire exons and introns, “alternative sequences” was chosen as a neutral term. Based on cDNA analysis, at least six different alternative sequences can be present in or absent from mRNAs (see Figure 3, which shows five of the six). The sixth, 5′-most alternative sequence (not shown in Fig. 3) is of unknown length.
The human telomerase gene is composed of 16 exons and 15 introns spanning approximately 40 kb of chromosome 5 (Wick et al. 1999; Leem et al. 2002). Exons range from 62 to 1354 bp in length. The reference hTERT mRNA (Fig. 4) contains all 16 exons.
For each of the known RNA splice variants, the effect of the presence or absence, and the location, of each alternative sequence is described on the assumption that it is the only alteration. Note that a given alternative sequence can alter the translated product regardless of whether other alternative sequences are spliced in or out. For example, the presence of alternative sequence 1 causes a frameshift and a truncated protein whether or not alternative sequences a, β, 2 or 3 are spliced in or out.
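Whether an alternative sequence shifts the reading frame follows from simple arithmetic: an insertion or deletion whose length is a multiple of 3 preserves the downstream frame, and any other length shifts it. The minimal Python sketch below applies that rule to the alternative sequences whose lengths are given in this section (38, 36, 182 and 159 bp). The function and dictionary names are hypothetical, chosen for illustration only. Note that "in-frame" here says nothing about stop codons: an in-frame insertion can still truncate the protein if it carries a stop codon, as alternative sequence 3 does.

```python
# Hypothetical helpers: classify an insertion or deletion by its effect on
# the reading frame. Only the length matters for this check: a change whose
# length is a multiple of 3 keeps the downstream codons in frame.


def frame_effect(length_bp: int) -> str:
    """Return 'in-frame' if length_bp is a multiple of 3, else 'frameshift'."""
    if length_bp <= 0:
        raise ValueError("length must be a positive number of base pairs")
    return "in-frame" if length_bp % 3 == 0 else "frameshift"


def codons_changed(length_bp: int) -> int:
    """Number of whole codons added or removed by an in-frame change."""
    if frame_effect(length_bp) != "in-frame":
        raise ValueError("only defined for in-frame changes")
    return length_bp // 3


# Lengths (bp) as stated in the variant descriptions of this section.
ALTERNATIVE_SEQUENCES = {
    "1 (insertion, intron 4)": 38,
    "a (deletion, part of exon 6)": 36,
    "beta (deletion, exons 7-8)": 182,
    "3 (insertion, intron 14)": 159,
}

if __name__ == "__main__":
    for name, length in ALTERNATIVE_SEQUENCES.items():
        effect = frame_effect(length)
        # Report codon count only where the frame is preserved.
        detail = f", {codons_changed(length)} codons" if effect == "in-frame" else ""
        print(f"{name}: {length} bp -> {effect}{detail}")
```

The same rule explains why the 182 bp β deletion truncates the protein while the 36 bp deletion of alternative sequence a removes exactly 12 amino acids.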
Alternative sequence X is derived from intron 3. It was found inserted between bp 1769 and 1770, but its length is unknown (Kilian and Bowtell – WO 99/01560). Because intron 3 is 2089 bp, sequence X could be up to that long. Because alternative sequence X contains stop codons in all three reading frames, an hTERT mRNA containing X would encode a truncated protein of approximately 600 N-terminal amino acids that lacks all of the RTase motifs.
Alternative sequence 1 (Kilian and Bowtell – WO 99/01560), inserted at nucleotide 1950/1951, consists of the first 38 bp of intron 4, which is 687 bp long. Its presence in mRNA causes a frameshift, so translation terminates at a stop codon at nt 1973 and yields a truncated protein that contains only RTase domains 1 and 2.
Alternative sequence a (Kilian and Bowtell – WO 99/01560), located at bp 2131-2166 in the reference hTERT sequence, is frequently spliced out of telomerase mRNA. Because these 36 bp are only part of exon 6, this in-frame deletion presumably results from use of an alternative 3′ splice acceptor site within exon 6. A protein translated from such an RNA lacks 12 amino acids, removing nearly all of RTase motif A. This motif appears to be critical for RT function: a single amino acid mutation within it in the yeast EST2 protein produces a dominant-negative protein that causes telomere shortening and cellular senescence.
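Converting the nucleotide coordinates used here into amino acid positions makes the "12 amino acids" arithmetic explicit. The minimal sketch below rests on an assumption not stated in the source: that nucleotide 1 is the A of the ATG start codon. That numbering is consistent with the 948-amino-acid product described for the insertion at bp 2843/2844, but the original coordinate system should be checked before relying on the codon numbers the sketch reports. All function names are hypothetical.

```python
# Hypothetical helpers: map coding-sequence nucleotide positions to codons.
# ASSUMPTION: nucleotide 1 is the A of the ATG start codon, so codon k spans
# nucleotides 3k-2 .. 3k.


def codon_of(nt: int) -> int:
    """Return the 1-based codon number containing nucleotide position nt."""
    if nt < 1:
        raise ValueError("positions are 1-based")
    return (nt - 1) // 3 + 1


def codon_phase(nt: int) -> int:
    """Return the position (1, 2 or 3) of nucleotide nt within its codon."""
    return (nt - 1) % 3 + 1


def deleted_codons(start: int, end: int) -> range:
    """Codons removed by deleting nucleotides start..end (inclusive).

    Only meaningful when the deletion begins at codon position 1 and its
    length is a multiple of 3, i.e. whole codons are removed.
    """
    if codon_phase(start) != 1 or (end - start + 1) % 3 != 0:
        raise ValueError("deletion does not remove whole codons")
    return range(codon_of(start), codon_of(end) + 1)


if __name__ == "__main__":
    # Alternative sequence a: bp 2131-2166 of the reference sequence.
    removed = deleted_codons(2131, 2166)
    print(f"codons {removed.start}-{removed.stop - 1} removed ({len(removed)} aa)")
```

Run as a script, it prints `codons 711-722 removed (12 aa)`: under the stated numbering assumption, the deletion starts exactly at a codon boundary and removes 12 whole codons, which is why no frameshift results.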
Another RNA splice variant lacks alternative sequence β (Kilian and Bowtell – WO 99/01560). The 182 bp deletion encompasses all of exons 7 and 8 (bp 2287-2468) and causes a reading frameshift at bp 2287 that leads to a termination codon at bp 2605, so the variant encodes a truncated protein. This protein has RTase domains 1, 2, A, B, and part of C, but lacks a motif (AVRIRGKS; SEQ ID NO: 90) encoded by the β sequence. The motif matches the P-loop consensus AXXXXGK(S) found in a large number of protein families, including kinases, bacterial DnaA, RecA, RecF and MutS, and ATP-binding helicases (Saraste et al. 1990). The importance of the P-loop in hTERT remains to be investigated.
Alternative sequence 2, of unknown length and derived from intron 11 (3801 bp), is inserted between bp 2843 and 2844 (Kilian and Bowtell – WO 99/01560). It contains an in-frame termination codon at its extreme 5′ end, resulting in a truncated protein of 948 amino acids that retains the entire RTase domain region but lacks the C-terminus. Mutations constructed in the C-terminal region (from about aa 926 to 1132) revealed that regions E-I to E-IV (aa 926-1100) are essential for the catalytic and biological function of hTERT, whereas mutations at the extreme C-terminus (aa 1127) generated a catalytically active but biologically inactive protein (Banik et al. 2002). Cells expressing the +1127 mutant failed to become immortal because their telomeres shortened. Whatever the exact mechanism behind this phenotype, the C-terminus clearly plays a critical regulatory role in human telomerase, and the product of the splice variant containing alternative sequence 2 lacks it.
In addition to the variant above, the product of a second splice variant also lacks the reference C-terminal domain (Kilian and Bowtell – WO 99/01560). In this variant, alternative sequence 3 (the 3′-most 159 bp of the 781 bp intron 14) is inserted at bp 2157/2158. An in-frame stop codon within alternative sequence 3 yields a protein with an alternative C-terminal domain. Furthermore, the coding region contributed by alternative sequence 3 contains a potential SH3-binding site, SGQPEMEPPRRPSGCVG, which matches the consensus c-Abl SH3-binding peptide (PXXXXPXXP) found in proteins such as ataxia telangiectasia mutated (ATM). Curiously, the same motif also occurs in the N-terminal region of hTERT (HAGPPSTSRPPRPWDTP, amino acids 304-320).
A transcript lacking all of exon 11 (189 bp, nucleotides 2655 to 2843) was found in hepatocellular carcinoma cell lines (Hisatomi et al. 2003). Because 189 bp is a multiple of three, the deletion does not shift the reading frame, and the encoded protein is 63 amino acids shorter than the reference protein. Exon 11 contains RT motifs D and E, suggesting that this variant lacks residues from the catalytic site of the protein.
Finally, an RNA splice variant was found that contains 600 bp of intron 14 fused directly to exon 16 (Wick et al. 1999). An in-frame stop codon close to the 5′ end of the intron 14 sequence causes premature termination of the hTERT protein.