The CATMA project was initiated by French and Belgian laboratories in December 1999 and joined by additional groups from Germany (July 2000), the Netherlands, Switzerland, the United Kingdom (October 2000), Spain (January 2001) and Sweden (September 2001).
The identification of each gene in the five Arabidopsis chromosomes is at the root of a genome-wide effort to study their expression. The choices made for launching the project reflect the state of our knowledge in February 2001, when the structure of only a minority of Arabidopsis genes (about 2000) had been determined experimentally. Therefore, the project also had to rely on gene prediction to identify the boundaries of each transcription unit and of the exon(s) within it.
At the start of the large-scale GST synthesis conducted within the CATMA project, the chromosome annotations published by the Arabidopsis Genome Initiative (AGI) sequencing centres were not homogeneous. Different tools had been adopted by different centres and had evolved over time. Also, according to our evaluation of the gene prediction algorithms used for the annotation of the Arabidopsis nuclear genome, the EuGène package developed by Thomas Schiex (INRA, Toulouse) offered the most reliable results.
Therefore, we originally chose to design the CATMA GSTs on a complete, updated annotation of the Arabidopsis nuclear genome, provided by EuGène and based on a uniform set of parameters (21,120 in silico GSTs; v1). In a second phase, we updated the collection with 3,463 additional in silico tags (24,583 in total; v2). These new tags were selected according to alternative criteria for primer selection, taking into consideration added 3' UTRs as well as AGI gene models located in regions where no genes were predicted by EuGène. A third set of 5,760 GSTs was prepared in the framework of the CAGE project and took into consideration the latest AGI models provided by TIGR (TIGR5) and the latest EuGène gene models (Eugene040917) (30,343 in total; v3). A final set of 543 GSTs was added in 2006, taking TAIR6 as the annotation reference and relaxing primer and probe design constraints (30,886 in total; v4). The design algorithm used at that stage also allowed a probe to tag multiple members of a gene family, leading to an additional 990 GFTs (Gene Family Tags). The description of these GFTs can be found on the CATMA Gene Family Tag website.
On 21st June 2002 the CATMA database was made public, allowing full searching of the first 21,120 validated GSTs (v1). In March 2004, the database was updated to a total of 24,576 GSTs. In March 2007, the 5,760 version 3 and 543 version 4 GSTs were added to the database. The update history of the CATMA database and website is described on the CATMA status page. If you would like to know when further updates take place, please use the "contact me on update" link.