Sequence Applications

Mapping Claimed Sequences to the Arabidopsis Genome

As we saw in the previous analysis of claimed sequences appearing in granted US patents (US-B patent documents), it is possible to obtain an overview of patent activity in Arabidopsis by:

  1. Generating a query database consisting of the sequences appearing in the claims section of granted US patents,
  2. Generating a target database consisting of the entire sequences of the chromosomes from Arabidopsis,
  3. Performing a megaBLAST analysis with this data (with Expect <= 1 x 10^-200 to filter results),
  4. Filtering the megaBLAST results to obtain matches of >=150 bp in length (and taking only the match with the highest bit score), and
  5. Marking the approximate positions of these results onto a map of the Arabidopsis chromosomes.
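
The filtering in steps 3 to 5 can be illustrated with a minimal Python sketch. It is not the script used for the original analysis. It assumes the megaBLAST results were saved in the standard 12-column tab-delimited format (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, Expect value, bit score). The function names `best_hits` and `approximate_position` are hypothetical, and taking the midpoint of the match as its "approximate position" is likewise an assumption.

```python
import csv

MAX_EVALUE = 1e-200   # Expect threshold from step 3
MIN_LENGTH = 150      # minimum match length in bp from step 4


def best_hits(lines, max_evalue=MAX_EVALUE, min_length=MIN_LENGTH):
    """Return {query id: hit}, keeping the highest bit-score hit per query.

    `lines` is any iterable of tab-delimited result lines (assumed to be
    in the 12-column tabular layout described above).
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        hit = {
            "query": row[0],
            "subject": row[1],
            "length": int(row[3]),
            "sstart": int(row[8]),
            "send": int(row[9]),
            "evalue": float(row[10]),
            "bitscore": float(row[11]),
        }
        # Discard weak or short matches.
        if hit["evalue"] > max_evalue or hit["length"] < min_length:
            continue
        # Keep only the highest-scoring match for each claimed sequence.
        current = best.get(hit["query"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["query"]] = hit
    return best


def approximate_position(hit):
    """Return (chromosome, midpoint of the match) for plotting on a map."""
    return hit["subject"], (hit["sstart"] + hit["send"]) // 2
```

Passing an open results file to `best_hits` and then calling `approximate_position` on each retained hit yields one chromosome coordinate per claimed sequence, which is the information needed to mark the map.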

From this we were able to detect patenting “hot-spots” and to identify various Arabidopsis genes of interest to those involved in exploiting related genes.

One limitation of this analysis is that many Arabidopsis sequences appear in patent applications, not in granted patents. To overcome this limitation, a further query database was constructed, consisting of sequences appearing in the claims section of US patent applications (US-A patent documents). This dataset included sequences from bulk sequence applications for Arabidopsis.

(Note:  The construction of this database required considerable effort.  The database itself can be found here.)

We repeated the analysis described above on this new dataset to determine the extent of patent-pending claims on the Arabidopsis genome.

This analysis does not deal with those sequences that are claimed through similarity or hybridisation language.
Only those sequences that are essentially identical to Arabidopsis sequences were mapped.
Sequences from granted US patents were not included in these maps.