Overlap of Claims From Other Genomes

How Does Claim Language Affect the Genome Maps?

During the patent application process the applicant generally attempts to achieve the widest possible claims. Often this is facilitated by the use of similarity or hybridisation language such as:

The mapping analyses performed above detect only sequences that are ~100% identical to the claimed sequences, and miss altogether those that are less than ~100% identical. This fact, together with the existence of such claim language, implies that many other sequence claims do not appear on our maps. The maps above therefore underestimate the true patenting activity against the Arabidopsis genome.

Remember that the converse is also true: claims based on Arabidopsis genes that use similarity and hybridisation language are likely to be claims to the same or similar genes in other organisms!

Many of the sequences mapped to the chromosomes above will have homologous sequences elsewhere in the genome. Similarly, sequences from other related organisms (e.g. Brassica napus, canola) that are <100% identical to the corresponding Arabidopsis gene are probably not present on the map (although some that are almost 100% identical are there already).

The use of hybridisation language also contributes to these cross-genome claims. For example, the following claim would almost certainly claim similar sequences from many other genomes:

(Patent Application US20040249146: Nucleic acid compositions conferring altered visual phenotypes, DOW AGROSCIENCES)

Because claim language is complex and variable, an exhaustive claim-by-claim analysis of all patents is not possible. For this reason we have chosen what we believe to be a reasonable computational approximation of the problem. The choices made with respect to this new method of analysis were:

1. Sequence Database Used as the Query

The query sequences chosen were those DNA sequences that appear in the claims of both US patent applications and granted patents.

2. Sequence Database Used as the Target

The target sequences for this analysis were the DNA sequences from all 5 Arabidopsis chromosomes.

3. Initial Sequence Comparison

Instead of megaBlast, a method optimised for the fast discovery of almost identical sequences, we chose the Blastn algorithm for this analysis. Blastn is better suited than megaBlast to recognising more divergent sequences, and we believe that it is more appropriate for this analysis.

4. Filters and Variables Used For Blastn Analysis

To obtain the largest and most comprehensive dataset for analysis in subsequent steps, we set the Expect cutoff value to 1 x 10^-20 for Blastn (compared with 1 x 10^-200 for the previous analyses). Furthermore, we retained only those sequence alignments that were >= 150 bases in length (the same length used in the previous analyses). This dataset forms the core for future analyses based on similarity filtering, and can thus be re-analysed using various degrees of similarity (see Step 5 below).
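The two filters described in this step can be sketched as follows. This is a minimal illustration only, not the pipeline actually used: it assumes the Blastn results are available in the standard 12-column tab-separated format, which the text does not state, and the names Hit, parse_tabular_line and filter_hits are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One alignment taken from a tab-separated BLAST result line."""
    query_id: str
    subject_id: str
    pct_identity: float
    aln_length: int
    evalue: float


def parse_tabular_line(line: str) -> Hit:
    # Assumed column order (standard 12-column tabular output):
    # qseqid sseqid pident length mismatch gapopen
    # qstart qend sstart send evalue bitscore
    fields = line.rstrip("\n").split("\t")
    return Hit(
        query_id=fields[0],
        subject_id=fields[1],
        pct_identity=float(fields[2]),
        aln_length=int(fields[3]),
        evalue=float(fields[10]),
    )


def filter_hits(
    hits: Iterable[Hit],
    max_evalue: float = 1e-20,  # Expect cutoff from Step 4
    min_length: int = 150,      # minimum alignment length from Step 4
) -> List[Hit]:
    """Keep alignments that pass both the Expect and the length filter."""
    return [
        h for h in hits
        if h.evalue <= max_evalue and h.aln_length >= min_length
    ]
```

Because the percent identity of each alignment is kept on the Hit record, the filtered list could later be re-filtered at any identity threshold without repeating the search.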

5. Similarity Value

At this point we chose what we believe to be a useful and reasonable definition of patent claims to “similar” sequences: 85% identity was chosen as the value to use in the similarity filter for this analysis. This choice is based on our experience with patent documents that claim sequences on the basis of sequence identity. Technically, it is possible to re-filter our dataset using higher or lower values of % identity.

6. Discovery of Arabidopsis Sequences with >=85% Identity to the Query Sequence

The filter was applied and alignments with >=85% identity were identified. The sequences identified in this way were then aligned over their full length with the corresponding Arabidopsis sequence. Only those with a full-length alignment of >= 85% identity were retained.
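As an illustration of the full-length check in this step, the sketch below computes percent identity from a pair of already-aligned sequences and applies the 85% threshold. The text does not say how identity was calculated, so the definition used here (identical, non-gap columns divided by the total number of alignment columns) is an assumption, and the function names are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_target: str) -> float:
    """Percent identity over all columns of a pairwise alignment.

    Both strings must come from the same alignment, with '-' marking gaps.
    Assumption: gap columns count in the denominator but never as matches.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        return 0.0
    matches = sum(
        1
        for q, t in zip(aligned_query.upper(), aligned_target.upper())
        if q == t and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def passes_similarity_filter(
    aligned_query: str,
    aligned_target: str,
    min_identity: float = 85.0,  # threshold chosen in Step 5
) -> bool:
    """True if the full-length alignment meets the identity threshold."""
    return percent_identity(aligned_query, aligned_target) >= min_identity
```

Raising or lowering min_identity corresponds to the re-filtering at other % identity values mentioned in Step 5.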

7. Mapping to the Arabidopsis Genome

Mapping was carried out as in the previous analysis: matching sequences were grouped into 300 kb regions, and the totals for these regions were then plotted as a histogram along the chromosomes.
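The grouping into 300 kb regions might be sketched as below. This is only an illustration: it assumes 1-based chromosome coordinates and assigns each match to a region by its start position, neither of which is specified in the text, and bin_matches is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Tuple

BIN_SIZE = 300_000  # 300 kb regions, as described in Step 7


def bin_matches(
    matches: Iterable[Tuple[str, int]],
    bin_size: int = BIN_SIZE,
) -> Counter:
    """Count matches per (chromosome, region index).

    Each match is a (chromosome, start) pair with a 1-based start
    coordinate (an assumption). Region 0 covers positions 1..bin_size,
    region 1 covers bin_size+1..2*bin_size, and so on.
    """
    counts: Counter = Counter()
    for chromosome, start in matches:
        region = (start - 1) // bin_size
        counts[(chromosome, region)] += 1
    return counts
```

The resulting counts, ordered by region index within each chromosome, give the bar heights for a histogram of the kind described above.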