Patent Sequence Databases

Four major patent offices provide publicly accessible sequence data:

  • USPTO provides sequence data to the National Center for Biotechnology Information (NCBI) database
  • European Patent Office provides sequence data to the European Bioinformatics Institute (EBI) database
  • Japan Patent Office provides sequence data to the DNA Database of Japan (DDBJ)
  • World Intellectual Property Organisation (WIPO) provides sequence data (PCTGEN) from PCT applications via its web site

Two commercial online search services and one non-commercial service allow combined searches against some or all of the above sequence databases. These are:

Thomson/Derwent “GENESEQ”

Thomson/Derwent provides the largest single database of sequences extracted from patents, GENESEQ, and an “early availability” database called “GENESEQ FASTAlert” that gives timely access to newly published patent sequences before they are fully annotated and added to the full GENESEQ database. For data from 1981 onward, GENESEQ includes all nucleic acid sequences of 10 or more bases, all amino acid sequences of 4 or more residues, and all PCR primers and probes disclosed in the claims, examples or general disclosure of a patent. Coverage extends to basic patents from 41 issuing authorities, including US government and PCT patent applications.
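The inclusion rule described above can be expressed as a simple length check. The following is a minimal sketch only, not Thomson/Derwent's actual processing: the function name, the molecule type labels and the treatment of primers and probes as exempt from the length thresholds are all assumptions made for illustration.

```python
# Length thresholds as stated in the text; all names here are hypothetical.
MIN_NUCLEIC_ACID_LENGTH = 10  # bases
MIN_PROTEIN_LENGTH = 4        # residues


def meets_inclusion_rule(sequence: str, molecule_type: str,
                         is_primer_or_probe: bool = False) -> bool:
    """Return True if a sequence would qualify under the stated thresholds.

    Assumption: PCR primers and probes qualify regardless of length,
    since the text says all of them are included.
    """
    if is_primer_or_probe:
        return True
    length = len(sequence.replace(" ", ""))  # ignore spacing within the sequence
    if molecule_type == "nucleic_acid":
        return length >= MIN_NUCLEIC_ACID_LENGTH
    if molecule_type == "protein":
        return length >= MIN_PROTEIN_LENGTH
    raise ValueError(f"unknown molecule type: {molecule_type!r}")
```

For example, a 9-base fragment would be excluded unless it is flagged as a primer or probe, while a 4-residue peptide would be included.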

GENESEQ is available on the web, via STNweb, and in flat-file (EMBL) format, which can be integrated into in-house bioinformatics systems. Many IP offices prefer the flat-file option for confidential searching of pre-publication sequences.
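To illustrate what integrating a flat file into an in-house system might involve, the sketch below reads records in the general EMBL flat-file layout: two-letter line codes, a sequence block introduced by an SQ line, and a "//" terminator. The exact fields used in the GENESEQ flat file are not described here, so the sample record and its identifiers are invented for illustration, and the parser is a simplification rather than a complete implementation of the format.

```python
from typing import Dict, Iterator, List

# Invented sample record in the general EMBL layout (not real GENESEQ data).
SAMPLE_RECORD = """\
ID   EXAMPLE01 standard; DNA; 25 BP.
XX
AC   EXAMPLE01;
XX
DE   Hypothetical primer sequence.
XX
SQ   Sequence 25 BP;
     acgtacgtac gtacgtacgt acgta                                        25
//
"""


def parse_embl_records(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per record, keyed by line code plus a 'sequence' key."""
    fields: Dict[str, List[str]] = {}
    sequence_parts: List[str] = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end of record
            if fields or sequence_parts:
                entry = {code: " ".join(vals) for code, vals in fields.items()}
                entry["sequence"] = "".join(sequence_parts)
                yield entry
            fields, sequence_parts, in_sequence = {}, [], False
        elif in_sequence:
            # Keep residue letters; drop spacing and the trailing position count.
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))
        elif line[:2].strip():
            code, value = line[:2], line[5:].strip()
            if code == "XX":  # spacer line, carries no data
                continue
            if code == "SQ":
                in_sequence = True
            fields.setdefault(code, []).append(value)


records = list(parse_embl_records(SAMPLE_RECORD))
```

Each parsed record could then be loaded into a local database and searched without sending the query sequence to an external service, which is the property that matters for pre-publication searching.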

Chemical Abstracts Service “STN”

STN provides access to the Thomson/Derwent GENESEQ database (called DGENE in STN) as well as to the PCTGEN database. Polypeptide and nucleic acid sequences in DGENE and PCTGEN are searchable using three search tools:

  • BLAST® sequence similarity searching from the National Center for Biotechnology Information (NCBI)
  • GETSIM FASTA-based sequence similarity searching from FIZ Karlsruhe GmbH
  • GETSEQ sequence code match searching from FIZ Karlsruhe GmbH. Useful for short and/or highly conserved sequence queries.
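The difference between similarity searching and code match searching can be illustrated with a plain exact-match search, which suits short or highly conserved queries because no scoring or alignment is needed. This is a minimal sketch of the general idea only; it is not the GETSEQ implementation or its query syntax, and the function name and sample sequences are hypothetical.

```python
from typing import Dict, List


def find_exact_matches(query: str, subjects: Dict[str, str]) -> Dict[str, List[int]]:
    """Map each subject ID to the 1-based positions where the query occurs."""
    query = query.upper()
    if not query:
        raise ValueError("query must not be empty")
    hits: Dict[str, List[int]] = {}
    for seq_id, sequence in subjects.items():
        sequence = sequence.upper()
        positions: List[int] = []
        start = sequence.find(query)
        while start != -1:
            positions.append(start + 1)              # report 1-based positions
            start = sequence.find(query, start + 1)  # allow overlapping matches
        if positions:
            hits[seq_id] = positions
    return hits


# Invented subject sequences for illustration.
example_hits = find_exact_matches("RGDS", {"SEQ_A": "MKRGDSRGDS", "SEQ_B": "MKVLAA"})
```

Here only SEQ_A is reported, with the motif found at positions 3 and 7.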

Input sequences can be manually entered (up to 200 residues), read from a file uploaded to the server, or may be recalled from a previously saved search.

PatentInformatics “PatGen”

PatentInformatics [46] provides an online service for searching sequences extracted from patents, and also advertises an in-house database solution that can be installed at a user’s site and kept current with weekly data updates. The sequence data are extracted from the publicly available databases listed above (NCBI, EBI, DDBJ, PCTGEN).

PatentInformatics maintains the following online services:

  • PatGen DB Lib – A basic keyword search tool for bibliographic data. The user can search by title, abstract, inventor, applicant and date.
  • PatGen DB Tax – A tool for searching patent genetic sequences by the known taxonomy of the sequence. It supports queries such as how many patents disclose Fowlpox virus sequences.
  • PatGen DB Blast – Uses the Basic Local Alignment Search Tool (BLAST) for sequence searching, so that patents can be found by matching related genetic sequences.
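A taxonomy query of the kind described for PatGen DB Tax amounts to counting distinct patents per source organism. The sketch below shows that counting step under stated assumptions: the record layout (patent identifier, organism name) and the placeholder identifiers are invented, and nothing here reflects PatGen's actual schema or interface.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_patents_by_organism(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct patent identifiers per organism (case-insensitive)."""
    patents: Dict[str, Set[str]] = defaultdict(set)
    for patent_id, organism in records:
        # A patent disclosing several sequences from one organism counts once.
        patents[organism.strip().lower()].add(patent_id)
    return {organism: len(ids) for organism, ids in patents.items()}


# Placeholder records: one row per sequence disclosed in a patent.
example_records = [
    ("PATENT-A", "Fowlpox virus"),
    ("PATENT-A", "Fowlpox virus"),
    ("PATENT-B", "fowlpox virus"),
    ("PATENT-C", "Homo sapiens"),
]
example_counts = count_patents_by_organism(example_records)
```

With these placeholder rows, two distinct patents are counted for Fowlpox virus and one for Homo sapiens.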

PatentInformatics, like Thomson/Derwent GENESEQ, also provides an in-house search tool that can be installed for confidential searches of pre-publication sequences, for example by an examiner of an Australian patent application that did not enter via the PCT. Unlike GENESEQ, this in-house solution (PatLAMP) runs on a “LAMP” (Linux, Apache, MySQL, PHP/Perl) server, which includes:

  • Linux – Suse 9.2
  • Web services: SOAP
  • XML processing: DOM and XSLT
  • PatentInformatics – PatIndex, PatBLAST, PatServ
  • Bioinformatics – BioPerl, BioPerl DB, BLAST, CLUSTALW

While this service has clear limitations (for example, the search output is limited to ten results, no matter how broad the claim around the sequence being searched), PatGen’s presence in the market suggests that a national office such as IP Australia could also consider providing such a search tool as a public good, and the tools needed for an open-source solution are available.

_________________

[46] http://patentinformatics.fdns.net/