Summary of the “non-coding sequence” invention

As much as 97% of the human genome does not encode protein. Researchers initially referred to this non-coding DNA as “junk” because no function could be found for it. 4 More recently, however, this wasteland has proved to be a repository for a variety of functions that are part of normal, and even critical, cellular processes. For example, non-coding DNA contains regulatory elements such as enhancers and silencers of expression, and may function to promote exon shuffling in evolution. What subsequently surprised many, however, was that non-coding DNA, even beyond these regulatory elements, can have sequence that is conserved among individuals.

While studying haplotypes of the human MHC (major histocompatibility complex), which encodes products critical for self-recognition and is extremely useful in myriad diagnostic tests, Dr Malcolm Simons discovered that non-coding DNA in the region of MHC genes, both intronic and intergenic, had sufficient non-random sequence variation to be informative in individuals for surrogate typing of MHC genes. 5 Moreover, the polymorphisms in the non-coding DNA were also informative of MHC haplotypes. 6 From these findings, he recognized that the discovery was universal, namely, that the non-random, haplotypic structure of non-coding DNA would be a characteristic of the genomes of all eukaryotic organisms and not, as the alternative interpretation would have it, a peculiarity of HLA genes. This conclusion was influenced by previous RFLP (restriction fragment length polymorphism) studies of genes from other human loci, from other animals, and from plants, in which restriction endonuclease cut sites had been shown to be linked to coding gene mutations/allelic characteristics.

Dr Simons also realized that various consequences flowed from these discoveries. To begin with, when combined with amplification, previously unknown polymorphisms in non-coding DNA could be captured. Moreover, the restricted heterogeneity in the haplotypic structure of DNA in eukaryotic genomes would enable genome-wide mapping of gene regions associated with phenotypic traits, such as disease and drug responsiveness in humans and commercially desired characteristics in animals and plants. A major advantage of this technique is that mapping by linkage disequilibrium (gametic association) can be carried out in single individuals not known to be related by descent, thus avoiding the requirement for pedigrees.

To illustrate the discovery, haplotype patterns of polymorphisms can be envisaged as in the exemplary drawing below. In this drawing of a chromosomal region, there are four polymorphic sites, P1 through P4. Each polymorphic site has three alleles, a, b, and c. If all possible haplotype patterns were found in a population, there would be 3^4, or 81, different patterns. In the drawing below, three of the possible patterns are shown.
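Independently of the drawing, the count of 81 can be checked by brute-force enumeration. The following is a minimal sketch in Python; the function name and the string encoding of a pattern (one letter per site, in the order P1 to P4) are illustrative assumptions, not part of the patents.

```python
from itertools import product

# Labels taken from the example in the text: four sites, three alleles each.
SITES = ["P1", "P2", "P3", "P4"]
ALLELES = ["a", "b", "c"]


def all_haplotype_patterns(n_sites, alleles):
    """Return every possible combination of alleles across n_sites sites.

    Hypothetical helper: each pattern is encoded as a string with one
    letter per polymorphic site.
    """
    return ["".join(combo) for combo in product(alleles, repeat=n_sites)]


patterns = all_haplotype_patterns(len(SITES), ALLELES)
print(len(patterns))  # 81, i.e. 3 ** 4
print(patterns[:3])   # ['aaaa', 'aaab', 'aaac']
```

In a real population only a small fraction of these theoretical patterns is expected to occur, which is the restricted heterogeneity the text refers to.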

Thus, to identify the chromosomal region containing the trait, haplotype patterns are determined for selected chromosomal regions. When a population is diverse, the haplotype patterns are expected to be diverse as well, as in the drawing above. When the members of a population share a trait, the region to which the trait maps should show less diversity of haplotype patterns, as shown below. The trait is therefore mapped by comparing, region by region, the haplotype patterns of the diverse population with those of the population with the trait.
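The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the region names and haplotype strings are made-up toy data, the function names are hypothetical, and counting distinct patterns is a crude stand-in for the statistical measures a real mapping study would use.

```python
def distinct_patterns(haplotypes):
    """Number of different haplotype patterns observed in a sample."""
    return len(set(haplotypes))


def rank_regions_by_diversity_loss(diverse, trait):
    """Rank regions by how much less diverse the trait population is.

    Both arguments map a region name to a list of haplotype strings.
    Returns (region, loss) pairs, largest loss first.
    """
    losses = {}
    for region in diverse:
        loss = distinct_patterns(diverse[region]) - distinct_patterns(trait[region])
        losses[region] = loss
    return sorted(losses.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration only.
diverse_population = {
    "region_1": ["abca", "bcab", "cabc", "aabb", "ccba"],
    "region_2": ["abcc", "bbca", "cacb", "acab", "babc"],
}
trait_population = {
    "region_1": ["abca", "bcab", "ccba", "aabb", "cabc"],
    "region_2": ["abcc", "abcc", "abcc", "abcc", "bbca"],
}

ranking = rank_regions_by_diversity_loss(diverse_population, trait_population)
print(ranking)  # [('region_2', 3), ('region_1', 0)]
```

In this toy data, region_2 shows five patterns in the diverse population but only two in the trait population, so it is the candidate region for the trait. No pedigree information is needed for the comparison.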

These realizations were captured in two patent application families initially filed in 1989 in the United States. Briefly, the two patent families are directed to (i) single gene allele and linked gene haplotype diagnostics, where the gene/region was known, and (ii) genome-wide gene discovery, where trait-associated genes had not been mapped. These patent families are discussed in more detail below.