A case study: Soya versus Arabidopsis

Predicting the influence of cross-genome claims from Arabidopsis gene patents on important dicot crops: Soybean versus Arabidopsis

Soya Bean (Soybean/Soya/Soy/Glycine max):

Soybean, or Glycine max, has a long history of domestication, and the seed has been harvested for food for many hundreds of years in Eastern Asia. This long history has produced a very large number of soybean varieties and cultivars. More recently, soybean has become an important global food crop, grown in Europe, Asia, Australia, Africa, and North and South America. The plant is a legume and will grow in many soil types, but it requires a hot summer growing season. Being a legume, it establishes a symbiotic relationship with nitrogen-fixing bacteria and is therefore able to grow in relatively nitrogen-poor soils.

Soybean is rich in both protein and oil. Present-day crops are grown mainly for use in processed foods, with only a relatively minor proportion being consumed directly as food. Soybean and its processed components find their way into products such as breakfast cereals, oils, milk replacements, infant formulas, biodiesel, cosmetics, and animal feed.

Soybean and Biotechnology

Due to its economic importance and growing popularity in processed foods, G. max has been the focus of much research, including breeding studies and the production of GMOs (Genetically Modified Organisms). One important example is the Roundup Ready form of G. max marketed by Monsanto, which is resistant to glyphosate-based herbicides (e.g. Monsanto's Roundup herbicides). This modification offers farmers a simple way to plant seed directly into unploughed fields: spraying with glyphosate-based herbicides eliminates competing weeds without the need to plough the field prior to planting.

Patenting and genomics
The soybean genome has not yet been sequenced. The Purdue University consortium has begun early work to develop the basis for sequencing the genome of G. max, so the project is only in its earliest stages. However, there are presently 463,101 nucleotide and 2,615 protein sequences deposited in GenBank at the NCBI for Glycine max (as of 24th July 2006). Although this dataset is far smaller than that expected for the entire genome, the available sequence data can be used to predict the degree of similarity between Glycine max and Arabidopsis thaliana genes and to determine the extent of cross-genome claims in patents.
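As a rough illustration of what "similarity" between two genes can mean in practice, the sketch below computes percent identity over a pair of sequences that have already been aligned. It is a minimal example under stated assumptions, not the method used in this analysis: the function name percent_identity, the gap character, and the two toy sequences are all hypothetical and are not real Glycine max or Arabidopsis data.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes the two sequences were aligned beforehand (equal length,
    gaps marked with `gap`). Hypothetical helper for illustration only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where the residues are identical
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy, made-up aligned fragments (not real gene sequences).
toy_soy = "ATGGC-TACGT"
toy_ath = "ATGGCATTCGA"
identity = percent_identity(toy_soy, toy_ath)
print(f"{identity:.1f}% identity over ungapped columns")
```

Gapped columns are excluded from the denominator here; other conventions (for example, dividing by the full alignment length) give different figures, so any threshold for "similar" depends on the convention chosen.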

Glycine max has a genome size of 1,100 Mb, almost 10 times larger than that of Arabidopsis thaliana, and a diploid chromosome number of 40 (Genetics. 2005 Jul;170(3):1221-30.). It belongs to the pea family, Fabaceae, along with peas, beans, and other legumes. We chose soybean as the Arabidopsis comparison for this analysis for the following reasons:

  • Glycine max is an important crop dicot
  • Large companies such as Monsanto are already interested in developing Glycine max
  • The genome is at an early stage of sequencing, but a significant amount of nucleotide sequence has already been deposited in GenBank
  • Glycine max is not as closely related to Arabidopsis as are rape, cabbage and lettuce, and is therefore a more realistic comparison for dicots in general.