General features of trade mark searching

Although trade mark searching has other aims, a common goal for both examiners and other stakeholders (e.g. businesses making applications, and anti-infringement and anti-counterfeiting enforcement agents) is to uncover marks, or aspects of marks, that may be deceptively similar to existing marks covering similar products and services (a search often narrowed by Nice classifications).

Distinguishing characteristics and similarities may be of both figurative and verbal types.  Verbal similarities may include visual, phonological, and synonymic aspects, all of which involve important language-specific factors.  Thus, one section below relates to pattern recognition, with a focus on searching visual images, and another relates to particularities of foreign characters as marks, for example distinguishing characters of similar appearance, or characters that are phonetically equivalent, when the script used is non-Roman.

Pattern recognition

Pattern recognition is a field within machine learning that aims to classify data (patterns) based either on a priori knowledge or on statistical information extracted from the patterns. More specifically, a pattern recognition system consists of:

  • A sensor, which detects a distinguishing pattern such as high density marks within a low density background, or a pattern of vibrations;
  • A feature extraction mechanism, which computes information from the observations gathered by the sensor;
  • A classification or description scheme, which analyses the extracted information either on the basis of probabilities extrapolated from its statistical regularities, or by comparing its structure to a set of patterns that have already been classified or described, known as the training set.46  Mechanisms such as associative neural networks are commonly used for training.
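The three components listed above can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the tiny text "images", the three features, and the nearest-neighbour rule stand in for real sensors, feature extractors, and classifiers, and are not drawn from any system discussed in this report.

```python
from math import dist

# "Sensing": tiny binary images, '#' = dark mark pixel, '.' = background.
# These images are hypothetical and exist only for illustration.
HORIZONTAL = [".....", "#####", "#####", "....."]
VERTICAL = [".##..", ".##..", ".##..", ".##.."]
QUERY = ["..#..", "..#..", "..#..", "....."]


def extract_features(image):
    """Feature extraction: reduce an image to a short numeric vector."""
    rows, cols = len(image), len(image[0])
    dark = [(r, c) for r, line in enumerate(image)
            for c, ch in enumerate(line) if ch == "#"]
    density = len(dark) / (rows * cols)
    height = max(r for r, _ in dark) - min(r for r, _ in dark) + 1
    width = max(c for _, c in dark) - min(c for _, c in dark) + 1
    return (density, height / rows, width / cols)


# Training set: patterns that have already been classified.
TRAINING_SET = [
    ("horizontal bar", extract_features(HORIZONTAL)),
    ("vertical bar", extract_features(VERTICAL)),
]


def classify(image, training_set):
    """Classification: return the label of the nearest training example."""
    features = extract_features(image)
    label, _ = min(training_set, key=lambda item: dist(features, item[1]))
    return label


print(classify(QUERY, TRAINING_SET))
```

A statistical classifier or a neural network would replace the nearest-neighbour comparison in a realistic system, but the division into sensing, feature extraction, and classification stays the same.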

46http://en.wikipedia.org/wiki/Pattern_recognition

Specific applications of pattern recognition

Specific applications of pattern recognition include speech recognition, Optical Character Recognition (OCR) of machine printed or handwritten text, text pattern classification (e.g. E-mail spam filters, or a search engine finding “similar” documents based on frequent co-occurrence of words), and image analysis, which ranges from bar-code reading to identification of human faces, fingerprints or iris patterns and identifying similar patterns in images (e.g. trade marks of various sorts including shape marks).

In Chapter Two of this report, there was discussion of Optical Character Recognition.  This was in the context of full-text and other text-based searching of patent documents, but the comments made in that context about the accuracy of OCR algorithms in general, and OCR of Chinese, Japanese, and Korean (CJK) characters in particular, are certainly relevant to searching trade marks.47
Accuracy is affected by certain stylistic elements in the calligraphy such as brushy or indistinct edges, colours, and incorporated pictorial features, but based on the training set chosen, OCR engines using mechanisms such as associative memory neural networks can be “trained” to handle such elements.

Image analysis is the subtopic of pattern recognition concerned specifically with processing digital images.  In Chapter One of this report, design patent searching was addressed in connection with a survey of patent information search sites, and it was noted that visual image searching would be useful but is not used much in design patent searching because most searching is done using classifications. Image searching may, however, be as useful or more useful for ordinary trade marks and shape marks.  It was also noted that, although this has been a field of informatics research since the 1950s, the software available still has significant technical limitations, and although there has been some valuable research work related to and tested on trade mark recognition,48 it is still essentially at a prototype level.  The greatest technical progress has tended to focus on security-related images such as faces and fingerprints rather than IP industry uses.49

Similarly, algorithms very much like those used in speech recognition can be used for processing sound marks, but again most of the emphasis has been on developing training sets for the human vocal range, and less on other types of sounds.  In general, except for patterns with simple characteristics such as bar codes, the human eye and ear are still much more sophisticated pattern recognition tools than any software.

However, with the expansion of computing power at reasonable cost, this is an area of endeavour that merits periodic re-examination.

47Outside the scope of this contract report, CAMBIA’s informatics team had recently conducted extensive trials of the latest OCR technology with a focus on the recognition of Chinese and Japanese as well as European languages (evaluations of accuracy were done by native speakers currently employed by CAMBIA as IP research analysts), and was able to identify OCR engines that with our applications achieved accuracy levels of over 99% on patent documents provided in Chinese and Japanese.

48For example, Jain AK, Vailaya A (1998) Shape-based Retrieval: A Case Study with Trade mark Image Databases. Pattern Recognition 31:1369-90; Eakins JP, Boardman JM, Graham ME (1998) Similarity Retrieval of Trade mark Images. Multimedia 5:55-63, which also uses shape similarity; and Zhao T, Tang HL, Ip HHS, Qi F (2002) Content-based trade mark recognition and retrieval based on discrete synergetic neural network. Distributed Multimedia Databases: Techniques and Applications, pp. 58-72. A useful survey of the trade mark visual matching literature may be found in Leung WH, Chen T (2002) Trade mark retrieval using contour-skeleton stroke classification. IEEE International Conference on Multimedia and Expo. Part of a thesis submitted to Carnegie Mellon University in 2003, it also describes a method to retrieve trade marks using query by sketches. Trade mark images are filtered to remove noise, then segmented based on pixel connectivity. Either thinning or edge extraction is applied to each region to produce a stroke sketch.

49Fall CJ, Giraud-Carrier C (2005) Searching trade mark databases for verbal similarities.  World Patent Information 27: 135-143

Developments in image recognition technologies

The European eMAGE project, which will use an associative memory neural network for pattern characterisation, is a significant development on the horizon for image recognition technologies aimed specifically at trade mark and design searching.50

This project, first reported in 1998, has already undergone delivery delays and has yet to materialise.51 However, the French INPI, as lead agency, is committed to using and marketing it.  Development in the current phase has cost about 2 million Euros,52 approximately half of it funded by the European Commission under the eContent programme,53 largely through a contract with the French company LTU Technologies.54  This company, already known for its image processing software for law enforcement and intelligence, was acquired in March of this year by JASTEC International Inc., a US subsidiary of JASTEC Co., Ltd., a Japanese software development and systems integration company.55  The business model of the eMAGE consortium is to out-license this software to other national IP offices, and to package and market it to IP divisions in private companies and to anti-counterfeiting authorities such as those associated with ports and border controls.56

The searches will be run against European databases of registered logos and industrial designs, and development has used sample data provided by the French and Portuguese INPI offices and the Österreichisches Patentamt. Search capacities are also being set up for French, Portuguese, German and English.  The project is not currently envisioned to encompass other languages, such as those written in non-Roman characters.57

Better results may often arise from combining several methods rather than using a single method such as statistical analysis of Fourier descriptors, invariant moments or Zernike moments.58 Ad hoc weighting functions may be used in the combination,59 and image analysis methods may even be used in combination with full-text searching methods, which, if used as an initial filter for the images, may reduce the computational cost.60
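The combination described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the register entries, the weights, and the per-method dissimilarity scores are all hypothetical, and the sketch does not reproduce the weighting schemes or filters of the cited papers. Computing the scores themselves from real images is outside its scope.

```python
# Hypothetical weights for the three shape measures named in the text; in
# practice they would be tuned, for example against human judgements.
WEIGHTS = {"fourier": 0.5, "invariant_moments": 0.3, "zernike": 0.2}

# Hypothetical register entries. "scores" holds each method's dissimilarity
# between the stored image and the query image (0 = identical).
REGISTER = [
    {"id": "TM-1", "text": "golden lion brewery",
     "scores": {"fourier": 0.10, "invariant_moments": 0.20, "zernike": 0.15}},
    {"id": "TM-2", "text": "lion tea",
     "scores": {"fourier": 0.40, "invariant_moments": 0.10, "zernike": 0.30}},
    {"id": "TM-3", "text": "blue anchor",
     "scores": {"fourier": 0.05, "invariant_moments": 0.05, "zernike": 0.05}},
]


def text_filter(register, query_words):
    """Initial filter: keep marks whose text shares a word with the query."""
    wanted = {word.lower() for word in query_words}
    return [mark for mark in register
            if wanted & set(mark["text"].lower().split())]


def combined_dissimilarity(scores, weights):
    """Weighted average of the per-method dissimilarity scores."""
    total = sum(weights[name] for name in scores)
    return sum(weights[name] * value for name, value in scores.items()) / total


def search(register, query_words, weights):
    """Filter on text first, then rank the shortlist by image dissimilarity."""
    shortlist = text_filter(register, query_words)
    ranked = [(mark["id"], combined_dissimilarity(mark["scores"], weights))
              for mark in shortlist]
    return sorted(ranked, key=lambda pair: pair[1])


results = search(REGISTER, ["lion"], WEIGHTS)
print(results)
```

Because the text filter runs first, the image comparison is applied only to the shortlist. The trade-off is that a visually similar mark with unrelated text (TM-3 in this sketch) is never compared at all.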

50http://www.eu-projects.com/emage/Details.htm
51http://www.eu-projects.com/emage/results.htm
52http://www.eu-projects.com/emage/Details.htm
53http://www.cordis.lu/econtent/
54http://www.ltutech.com/en/
55http://www.ltutech.com/en/news.press-release-2005-04-18-00.html
56http://www.eu-projects.com/emage/objectives.htm
57http://www.eu-projects.com/emage/Details.htm

58Fall CJ, Giraud-Carrier C (2005), ibid.
59Chan DYM, King KCI (1999) Genetic algorithm for weights assignment in dissimilarity function for trade mark retrieval. Third International Conference on Visual Information and Information Systems, Lecture Notes in Computer Science 1614: 557-565, uses an algorithm to determine weightings consistent with human judgements in order to classify 1360 monochromatic trade marks.
60Ravela S, Manmatha R (1999) Multi-modal retrieval of trade mark images using global similarity. U. Massachusetts Computer Science Technical Report TR99-32, describes initially narrowing the set of possibly similar trade mark images by using full text search on the text associated with trade marks. The images are then filtered with Gaussian derivatives used to calculate geometric curvature and phase, and the similarity in the distribution of these statistics is then used as a measure of the similarity of the images.

Special features related to foreign word marks, particularly CJK characters

In English and many other languages in which it is possible to find similarities based on letter order, or on similar length with a number of letters in common, “fuzzy search” algorithms have been designed to focus on starting and ending letters with letters added, transposed, or deleted, similar to applications in molecular biology for aligning similar DNA sequences,61 as discussed in Chapter Two.  However, with regard to the visual similarities of words, the common algorithms used to determine the similarity of words written in Roman letters are not applicable to languages written in ideographic scripts, such as Japanese or Chinese.
In many languages a given character can be written in different ways.  Chinese characters, for example, can be written in a handwritten style; simplified characters are now the standard in mainland China, while the more complex traditional characters are used in Hong Kong and overseas Chinese communities.  In addition, calligraphy is often a form of artistic expression with diverse styles. Consequently, two characters that look quite different may be recognised as equivalent by a native speaker.  While employing staff with specialised cultural and linguistic skills is one solution to this challenge for examiners, access to native speakers of many different languages and dialects would be required to address it fully.

To at least partially address this complexity when searching for visual similarities using informatics, applicants should be required to submit non-Roman letter-based word marks, in addition to the actual planned logo, using a standard character encoding system such as Unicode.62

If a regulatory change is not the preferred way to address this, the same result could still be facilitated through the use of Optical Character Recognition (OCR) techniques and/or skilled staff to extract text from the images submitted according to current requirements. The use of a standard character encoding would allow machine translation and other computational linguistic techniques to be applied to more common languages such as Chinese and Japanese.
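As a minimal sketch of what a standard encoding makes possible, the Python below lists the Unicode code points of a word mark and folds a traditional character onto its simplified counterpart before comparison. The one-entry mapping table and both function names are hypothetical and exist only for illustration; a real system would need a complete, curated table and would have to handle characters that have several counterparts.

```python
import unicodedata

# Hypothetical, deliberately tiny traditional-to-simplified table.
TRADITIONAL_TO_SIMPLIFIED = {"樂": "乐"}


def code_points(mark):
    """List the Unicode code point of each character in a word mark."""
    return [f"U+{ord(ch):04X}" for ch in mark]


def fold_variants(mark):
    """Normalise compatibility forms, then map traditional to simplified."""
    normalised = unicodedata.normalize("NFKC", mark)
    return "".join(TRADITIONAL_TO_SIMPLIFIED.get(ch, ch) for ch in normalised)


simplified = "可口可乐"
traditional = "可口可樂"
print(code_points(simplified))
print(fold_variants(traditional) == fold_variants(simplified))
```

Once marks are stored as encoded text rather than only as images, comparisons of this kind become simple string operations and need no image analysis.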

Skilled trade mark searchers often use a variety of techniques such as part word searches to identify phonetically similar word marks. These rely on understanding of the peculiarities of spelling and pronunciation to predict the likely pronunciation of “made up” words such as “h@ppy” or “STOCKX” or “OZZEE”.

Numerous algorithms have been developed to aid in the automatic identification of similar words, using phonetic and/or letter sequence-based approaches. These approaches are obviously also language dependent, but considerable advances have been made, particularly in countries with multilingual cultures, such as Switzerland.  Some of the better known approaches are edit distance (also known as the Levenshtein algorithm), N-grams, and Soundex.63 There are open source software packages such as ht://Dig that incorporate these algorithms and provide rules for handling accented characters and capabilities for fuzzy searching once words are reduced to phonetic codes. However, these approaches still require that the pronunciation of the words in the mark be known, so that it can be compared with the pronunciations of possibly similar words in a reference database.63
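The three approaches named above can be sketched briefly in Python. These are textbook versions written for illustration, not the implementations used in ht://Dig or in the cited paper. The Soundex function follows the common American variant and assumes unaccented Roman letters, and the N-gram measure shown (a Dice coefficient over padded letter pairs) is one of several possible choices.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def ngrams(word, n=2):
    """Set of letter n-grams, padded so word boundaries also count."""
    padded = " " * (n - 1) + word + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Dice coefficient over the two n-gram sets (1.0 = same n-grams)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


SOUNDEX_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(word):
    """American Soundex: first letter plus three digits."""
    letters = [ch for ch in word.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # a vowel resets this, so a repeat is coded again
    return (code + "000")[:4]


print(levenshtein("stockx", "stocks"))
print(soundex("STOCKX") == soundex("stocks"))
print(ngram_similarity("night", "nacht"))
```

In this sketch the made-up word “STOCKX” mentioned earlier receives the same Soundex code as “stocks” and lies one edit away from it, which is the kind of match a searcher would want flagged. Soundex was designed around English names, so it is a poor guide for other languages.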

61Fischer I, Zell A (2000) String averages and self-organizing maps for strings.  Proceedings of the Second ICSC Symposium on Neural Computation, pp. 208-15.

62http://www.unicode.org/

63For more details see Fall CJ, Giraud-Carrier C (2005), ibid.

Current practices for foreign word mark searches

In current practice at Trade Mark Offices, for example in Australia, if the representation of a trade mark in an application for registration includes words in a language other than English, the applicant must file a translation of the words into English in support of the application.  Further, if the representation of the trade mark includes words in characters that are not Roman letters, the applicant must file both the translation and a transliteration of the characters into Roman letters, using the recognised system of Romanisation of the characters.
Unfortunately, transliteration of non-Roman characters, even if performed to a high standard using recognised systems such as Pinyin, is problematic as a guide to pronunciation, particularly if characters are pronounced differently in different regions and dialects.   Furthermore, speakers from different Chinese-speaking regions may legitimately transliterate or Romanise the same Chinese name differently, particularly because of the variety of ways in which a single combination of characters can be pronounced. This can create difficulties in searching all variations.  Fortunately, as mentioned in Chapter Two, each region uses a relatively small and consistent set of characters when transliterating.

Thus, it may be useful to consider extending the transliteration requirement to include the common transliterative variants, as well as a phonetic rendering of the words with any common variants. This would be important for detecting an element that appears prominently in a trade mark but is pronounced similarly to a generic term or a previously trade marked name (for example, 可口可乐, which can legitimately be transliterated “kekoukele”, refers to “Coca Cola”64).
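If several transliterative variants were filed, a search system could reduce them to a common key so that any one variant retrieves the mark. The Python below is a minimal sketch of that idea. The function names, the register entry, and the variant spellings are hypothetical, and the key only removes tone marks, spacing, punctuation, and capitalisation; it does not reconcile different Romanisation systems or dialect pronunciations.

```python
import unicodedata


def romanisation_key(text):
    """Lower-case, strip tone marks, and drop spaces and punctuation."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed
                   if ch.isalnum() and not unicodedata.combining(ch))


def build_index(register):
    """Map each normalised variant to the identifiers of marks that use it."""
    index = {}
    for mark_id, variants in register.items():
        for variant in variants:
            index.setdefault(romanisation_key(variant), set()).add(mark_id)
    return index


# Hypothetical register entry listing variants filed for one mark.
REGISTER = {"TM-100": ["kekoukele", "Ke Kou Ke Le", "kěkǒukělè"]}
INDEX = build_index(REGISTER)

print(INDEX.get(romanisation_key("ke-kou-ke-le"), set()))
```

A key of this kind could then be passed to phonetic or edit-distance comparison to catch near matches as well as exact ones.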

64See http://www.daochinasite.com/eng/study/brand.shtml for some examples of Chinese versions of well known brand names.