CATMA: A Complete Arabidopsis transcriptome Microarray


Help page

If you have any questions about using the database that are not answered below, please e-mail the database administrator for help.

Searching the database

The database can be searched in a number of ways:

Database field descriptions

Below is a verbose description of the more important database fields and of those fields where the explanation is slightly different for the different versions of the GST repository. The verbose description is followed by a complete list of all database tables and fields together with a concise description thereof.

GST
ID The id number of the GST. Format is CATMAxyzzzzz, where x is the chromosome number and y is the version letter.
Repository version 1-2 GST letter is 'a' or 'b'; 'letter b' probes are re-designed and improved 'letter a' probes; control probes have 'ctrl' in their name.
Repository version 3 GST letter is c.
Repository version 4 GST letter is d.
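The ID format described above can be checked mechanically. The following is a minimal sketch, assuming that 'zzzzz' stands for exactly five digits and that regular GST IDs carry no extra suffix; the function and type names are hypothetical and not part of the database. Control probes (with 'ctrl' in their name) do not match the pattern and yield None.

```python
import re
from typing import NamedTuple, Optional

# Version letter -> repository version, as listed in the field description.
LETTER_TO_REPOSITORY = {"a": "1-2", "b": "1-2", "c": "3", "d": "4"}

# CATMA + chromosome digit (x) + version letter (y) + serial (zzzzz).
# Assumption: zzzzz is exactly five digits.
GST_ID_PATTERN = re.compile(r"^CATMA(\d)([a-d])(\d{5})$")


class GstId(NamedTuple):
    chromosome: int
    letter: str
    repository_version: str
    serial: str


def parse_gst_id(gst_id: str) -> Optional[GstId]:
    """Split a GST ID into its parts; return None if it does not fit the format."""
    match = GST_ID_PATTERN.match(gst_id)
    if match is None:
        return None
    chromosome, letter, serial = match.groups()
    return GstId(int(chromosome), letter, LETTER_TO_REPOSITORY[letter], serial)
```

For example, a made-up ID such as 'CATMA3c57001' would be read as chromosome 3, letter c, repository version 3.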
GST Location The position of the GST within the gene model. e.g. exon 2-2/4 means completely within exon 2 out of a total of 4 exons, while exon 4-5/10 means that the GST starts in exon 4, spans an intron, and finishes in exon 5 of 10.
Repository version 1-2 The value originally given by SPADS at the time of design, and thus referring to the structure of the design template used (in most cases equal to the exon structure of the transcript sequence given in gene_sequence).
Repository version 3 The value presented was calculated based on the transcript sequence of the target gene (TIGR5 or Eugène040917), extended with an artificial 3' UTR. In case of target genes with multiple splice variants (only for TIGR5), an NA value is given.
Repository version 4 The value presented was calculated based on the transcript sequence of the target TAIR6 gene, without artificial 3' UTR extension. In case of target genes with multiple splice variants, an NA value is given. GSTs of v4 were not constrained to start or stop within an exon: if a GST starts before an exon, the '<' sign is added before the exon number; if it stops after an exon, the '>' sign is added after the exon number.
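The location notation can be parsed as follows. This is a sketch only: the exact placement of the '<' and '>' signs (assumed here to look like 'exon <1-2>/4') and the literal 'NA' string are assumptions based on the description above, and all names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# "exon a-b/n", optionally with '<' before a and '>' after b (v4 only).
LOCATION_PATTERN = re.compile(r"^exon\s+(<?)(\d+)-(\d+)(>?)/(\d+)$")


class GstLocation(NamedTuple):
    first_exon: int
    last_exon: int
    total_exons: int
    starts_before_exon: bool  # '<' present
    stops_after_exon: bool    # '>' present

    @property
    def spans_intron(self) -> bool:
        # Different start and end exons mean at least one intron is spanned.
        return self.first_exon != self.last_exon


def parse_gst_location(value: str) -> Optional[GstLocation]:
    """Parse a gst_location value; return None for 'NA' or unrecognised text."""
    match = LOCATION_PATTERN.match(value.strip())
    if match is None:
        return None
    lt, first, last, gt, total = match.groups()
    return GstLocation(int(first), int(last), int(total), lt == "<", gt == ">")
```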
GST Type E means completely within one exon, I means spanning or overlapping with an intron. '1' means a similarity < 40%, '2' a similarity between 40% and 70% and '3' a similarity > 70%.
Repository version 1-2 Value originally given by SPADS, with the number corresponding to the initial similarity value. Note that this similarity value has since been recalculated.
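As an illustration of how the type code combines the two pieces of information, the sketch below derives a code from an intron flag and a similarity percentage. The function name is hypothetical, and treating exactly 40% and 70% as class '2' follows the wording above. For repository version 1-2, the stored type reflects the initial SPADS similarity, so a code recomputed from the current gst_similarity value may differ.

```python
def gst_type_code(spans_intron: bool, similarity_percent: float) -> str:
    """Build a GST type code such as 'E1' or 'I3' from its two components."""
    prefix = "I" if spans_intron else "E"
    if similarity_percent < 40:
        similarity_class = "1"
    elif similarity_percent <= 70:
        similarity_class = "2"
    else:
        similarity_class = "3"
    return prefix + similarity_class
```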
GST Intron % The percentage of the GST corresponding to an intronic region.
Repository version 1-2 The intron percentage originally given by SPADS at the time of design. This percentage is with respect to the structure of the design template used (in most cases equal to the exon structure of the transcript sequence given in gene_sequence).
Repository version 3 The intron percentage was calculated based on the transcript sequence of the target gene (TAIR6 or Eugène040917), extended with an artificial 3' UTR. In case of target genes with multiple splice variants (only for TAIR6), an average value is given.
Repository version 4 The intron percentage was calculated based on the transcript sequence of the target TAIR6 gene, without artificial 3' UTR extension. In case of target genes with multiple splice variants, an average value is given.
GST Similarity % GST specificity expressed as the percentage of sequence identity in the best non-trivial blast hit, i.e. the best hit not matching the target gene sequence, using the genome sequence (ATH1_chrX.1con.01222004, X=1,2,3,4,5) as a blast database.
Repository version 1-2 Some GSTs could not be mapped to a current gene model, which is why the sequence identity of the second best blast hit was taken without performing the triviality check.
Repository version 3 Values were taken from SPADS output. Triviality was checked taking into account the minimum and maximum coordinate of the target gene transcript. SPADS also takes into account the sequence identity of the second-best, third-best and fourth-best non-trivial blast hit, in case they are within close proximity of the best non-trivial blast hit.
Repository version 4 Values were calculated manually. Triviality was checked taking into account the minimum and maximum coordinate of the target gene transcript.
Baldino Flag Advisory flag warning for potential off-target hybridization. Any GST that has a blast hit with a calculated hybridization temperature equal to or higher than 45 degrees Celsius is flagged. The Tm calculation was performed according to the formula Tm = 16.6 (log molar concentration Na+) + 0.41 (%GC) + 81.5 - 675/length - 0.65 (% formamide) - % mismatch (after Baldino, F., Jr., Chesselet, M. F. & Lewis, M. E. (1989) Methods Enzymol. 168, 761-777), with % formamide = 50% and [Na+] = 0.666 M. The trivial blast hit was not taken into account, nor were those blast hits entirely enclosed by an exon of a target gene of the GST (this last criterion was only used for GSTs of class GST3, GST4 and GST5).
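The Tm formula and the 45 degree threshold can be written out as a short sketch. It assumes that 'log' means the base-10 logarithm and that GC content, mismatch and formamide are given as percentages (0-100); the function names are hypothetical, and how length, %GC and %mismatch were derived from each blast hit is not specified here.

```python
import math


def baldino_tm(gc_percent: float, length: int, mismatch_percent: float,
               na_molar: float = 0.666, formamide_percent: float = 50.0) -> float:
    """Hybridization temperature (degrees Celsius) following the formula above.

    Assumption: the logarithm is base 10.
    """
    return (16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            + 81.5
            - 675.0 / length
            - 0.65 * formamide_percent
            - mismatch_percent)


def baldino_flag(gc_percent: float, length: int, mismatch_percent: float) -> bool:
    """True if a (non-trivial) blast hit reaches the 45 degree threshold."""
    return baldino_tm(gc_percent, length, mismatch_percent) >= 45.0
```

With the default salt and formamide values, a hypothetical 150 bp hit with 50% GC and no mismatches gives a Tm of about 62 degrees and would be flagged, whereas the same hit with 20% mismatch (about 42 degrees) would not.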
GST Start and Stop The start and stop coordinates of the probe's blast hit on the ATH1_chrX.1con.01222004 sequence, with X the corresponding chromosome number.
Repository version 1-2 These GSTs were originally designed using an older release of the Arabidopsis sequence. As a consequence, for a small fraction of these probes the blast hits on the latest Arabidopsis sequence release do not cover 100% of the probe sequence. In these cases the difference between stop and start will be smaller than the recorded probe sequence length.
GST 96 Well Plate/GST Coordinates The CATMA GSTs of versions 1, 2 and 3 are stored on 317 96-well mother plates, numbered from 96101 to 96480. The first two digits are always '96', the last two digits form the number of the corresponding group of 96-well plates, and the third digit represents the ordering number (to be) used when rearraying this plate group onto 384-well plates. The GST Coordinates format is xy(y), where x is the row letter and y(y) the column number.
Repository version 4 These probes are stored separately, together with the Gene Family Tags, at UNIL. No GST 96 Well Plate/GST Coordinates information is available.
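The plate barcode and well coordinate formats for versions 1 to 3 can be decoded as sketched below. The names are hypothetical, and the row range A-H and column range 1-12 are assumptions based on the standard 96-well plate layout rather than on anything stated in the database.

```python
import re
from typing import Optional, Tuple

# '96' + ordering digit (third digit) + two-digit plate group number.
PLATE_PATTERN = re.compile(r"^96(\d)(\d{2})$")
# Row letter followed by a one- or two-digit column number, e.g. 'A1', 'H12'.
WELL_PATTERN = re.compile(r"^([A-H])(\d{1,2})$")


def parse_plate_code(code: str) -> Optional[Tuple[int, int]]:
    """Return (ordering_number, group_number), or None if the code does not fit."""
    match = PLATE_PATTERN.match(code)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_well(coords: str) -> Optional[Tuple[str, int]]:
    """Return (row_letter, column_number), or None if outside a 96-well layout."""
    match = WELL_PATTERN.match(coords)
    if match is None:
        return None
    row, column = match.group(1), int(match.group(2))
    if not 1 <= column <= 12:
        return None
    return row, column
```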
Gene Sequence The + strand of the transcript sequence of the targeted gene.
Repository version 1-2 The transcript sequence of the target gene as it was known at design time. For consistency, this transcript sequence was not updated. Moreover, some GSTs could not be mapped to a current gene model. This mapping information is stored in the gene_mappings table and in the gene_mapping field of this table.
Repository version 3 When different splice variants were available, the GST was designed upon an intersected gene model. Nevertheless, the transcript of only one of the possible splice variants is shown, without artificial 3' UTR extension.
Repository version 4 When different splice variants were available, the GST was designed upon one of the possible splice variants. The transcript of this chosen splice variant is shown here, without artificial 3' UTR extension.
Amplification Results Results of the primary PCR amplification of the GST. B means amplification from BACs, G from genomic. 0 means no product was detectable in gel electrophoresis analysis, 1 is a product of the right size, 2 is a smear or multiple bands and 3 means the product appears to be of the wrong size.
Repository version 4 Currently no complete amplification results are available.
Sequence Verified A small percentage of the CATMA GSTs have been verified by sequencing. This field indicates whether a particular GST has been sequence validated.
Model Type Indication of the Arabidopsis genome annotation version of the original target gene. Possible values are 'TIGR5', 'TAIR6' and 'EUGENE170904'.
Repository version 1-2 Version 2 probes were designed based on TIGR3, TIGR4 or an earlier Eugène annotation release. Information about the exact annotation used was not retrieved and the value for this column is left blank.
Template Sequence The sense strand of that part of the transcript sequence used for GST design.
Repository version 1-2 Information about the template sequence used was not retrieved and the value for this column is left blank.
Repository version 3 The template sequence was derived as follows: an artificial 3' UTR extension of 150 bp was added if no 3' UTR was present in the original transcript; regions overlapping with transcripts of other genes were removed; in case of splice variants, the intersection of the different gene models was taken.
Repository version 4 The Representative Family Sequence (RFS) was used as template sequence. The RFS design is described in Sclep, G. et al. 2007.
Primers
TM Melting temperature calculation using the nearest neighbour (NN) method as performed by Primer3 (Rychlik, Spencer and Rhoads, Nucleic Acids Research, vol 18, no 21, page 6410, eqn 2 with NN table from Breslauer, Frank, Bloecker and Marky, Proc. Natl. Acad. Sci. USA, vol 83, page 3748, table 2).
Start and Stop Start and stop of primer with respect to the template sequence. The start coordinate of the 3' primer is always higher than the stop coordinate. The start coordinate of the 5' primer is always lower than the stop coordinate.
Repository version 1-2 As the template sequence is empty for these probes, the primer start and stop coordinates were consistently left blank.
Gene Mapping
TAIR 7 Comma separated list of TAIR7 AGI code(s) of the nuclear protein-encoding gene model(s) tagged by GSTs (mapping classified as GST3, GST4 or GST5, see Sclep, G. et al. 2007 for more details on the mapping algorithm). Taking into consideration that a small fraction of GSTs does not tag all the alternative splice forms of a certain gene, the name of the gene model is given instead of the gene name. When no TAIR 7 gene models are tagged, the field is left blank.
TAIR 7 Gene Description A list of textual descriptions of the genes listed in the 'TAIR 7' field. One description is given per gene, not distinguishing between different gene models/splice variants. In case of multiple genes (genes, not gene models) listed in the tair_7 column, the different gene descriptions are separated by a ‘@’ character. The descriptions correspond to a concatenation of the ‘COM_NAME’ and ‘PUB_COMMENT’ fields from the TAIR7 annotation files. When no TAIR 7 gene models are tagged, the field is left blank.
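Because 'TAIR 7' lists gene models while the description field holds one entry per gene, the two fields need a small amount of processing before they can be paired. The sketch below assumes that a gene model name is the gene's AGI code followed by a '.N' suffix (as in 'AT1G01010.1') and that descriptions appear in the same order as the genes in the tair_7 column; the function name and the example values are hypothetical.

```python
from typing import Dict, List


def pair_genes_with_descriptions(tair_7: str, descriptions: str) -> Dict[str, str]:
    """Map each tagged gene (not gene model) to its description."""
    # Comma-separated gene models; a blank field means nothing is tagged.
    models = [m.strip() for m in tair_7.split(",") if m.strip()]

    # Collapse gene models to genes, keeping the order of first appearance.
    genes: List[str] = []
    for model in models:
        gene = model.split(".")[0]
        if gene not in genes:
            genes.append(gene)

    # Descriptions of different genes are separated by '@'.
    parts = [d.strip() for d in descriptions.split("@")] if descriptions.strip() else []
    if len(parts) != len(genes):
        raise ValueError("number of descriptions does not match number of genes")
    return dict(zip(genes, parts))
```

For instance, the made-up input 'AT1G01010.1,AT1G01010.2,AT1G01020.1' with 'description A@description B' yields two genes, each paired with one description.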
Eugene 040917 Comma separated list of Eugène040917 IDs of the nuclear protein-encoding gene model(s) tagged by GSTs (mapping classified as GST3, GST4 or GST5, see Sclep, G. et al. 2007 for more details on the mapping algorithm). When no Eugène040917 gene models are tagged, the field is left blank.
GST Class GST class code when GST was mapped against TAIR7 and Eugène040917 gene models collectively. When taking more genome annotations into account, the likelihood of finding a cognate gene -and thus of classifying a probe as GST5- obviously increases. A GST of class GST5 can be considered as sufficiently covering its target gene, without showing risk of cross-hybridization. See Sclep, G. et al. 2007 for more details on the mapping algorithm.
Repository version 4 Due to a different design process for the v4 repository, some probes overlap with exonic regions of their target gene over fewer than 100 bp. The mapping algorithm classified these probes as GST2. Whereas for the other repository versions the corresponding genes are not listed in case of a GST2 classification, an exception was made for the v4 GSTs.

Full listing of database tables and fields

Items marked with an asterisk (*) are explained more fully in the description above

complete_sequence (Main table)
id Format - CATMAxyzzzzz *
actual_sequence GST sequence
sequence_length GST length
gst_location E.g. exon n-n/n *
gst_type E.g. E1, E2, I1 *
gst_class Coverage classification of GST *
gst_gc %GC
gst_intron %intron *
gst_similarity % of sequence identity with best non-trivial blast hit *
baldino_flag Advisory flag warning for potential off-target hybridization *
gst_start Start position in chromosome *
gst_stop Stop position in chromosome *
location Will be removed from database
96_plate_code 96 well plate barcode number *
96_coords 96 well plate row and column *
gene_sequence (Predicted) transcript sequence of target gene model *
chromosome Number, 1-5
amplification_results Results of GST PCR amplification *
sequence_verified Y/N *
model_type Annotation type of target gene model *
gene_mapping List of genes sufficiently covered by GST *
tair_7 Added for database technical reasons, currently empty
eugene_2004_09_17 Added for database technical reasons, currently empty
primer5_id 5' primer - corresponds to primer table id field
primer3_id 3' primer - corresponds to primer table id field
bac_id BAC info - corresponds to bac table id field
template_sequence Part of target gene sequence used as template for probe design *
primer (Specific primer descriptions)
id Format - gst_id followed by 5 or 3
sequence Sequence of primer
length Length of primer
gc %GC
tm Predicted melting temperature *
start Gene base number *
stop Gene base number *
extension_id Corresponds to extension table id field
template_sequence Part of target gene sequence used as template for probe design. Identical to the template_sequence column for the corresponding probe record in the complete_sequence table.
extension (Extension primer sequences)
id Format C1-24/R1-16
sequence See this page for more detail about the purpose and nature of the extension sequences.
bac (BAC information)
id Initial F means IGF series, T means TAMU
chromosome 1-5
start Start position along chromosome
stop Stop position
length In bp
embl_id Accession number of BAC
gene_mapping
catma_id Format - CATMAxyzzzzz
gst_class Coverage classification of GST *
gene_mappings List of both TAIR7 and Eugène040917 genes sufficiently covered by a GST. A combination of tair_7 and eugene_040917 columns, added for technical reasons.
tair_7 List of TAIR7 annotated gene models sufficiently covered by GST *
tair_7_describe List of descriptions of the covered TAIR7 genes. One description per gene (as opposed to per gene model), in same order as in tair_7 column. *
eugene_040917 List of Eugène040917 annotated genes sufficiently covered by GST *
gene_ontology (Gene Ontology)
agi_match AGI accession number (~ TAIR6)
annotation Annotation (from ftp://ftp.arabidopsis.org/home/tair/Ontologies/Gene_Ontology/ATH_GO_GOSLIM.20070630.txt)
ontology Standardized gene ontology (from ftp://ftp.arabidopsis.org/home/tair/Ontologies/Gene_Ontology/ATH_GO_GOSLIM.20070630.txt)
member (Contributing parties)
id Format - surname, first initial
name Name of contact person
organization Name of contributing institute/organization
country Country of the institute/organization
added Added for database-technical purposes only.
REMOVED Added for database-technical purposes only.