Content, aim and frequently asked questions

Contents and aims of the Progenetix data collection

[Figure: overall Progenetix database growth]

The Progenetix database collects information about genomic abnormalities in human neoplasias. The main aim of the project is to facilitate the identification of genes relevant to tumorigenesis, as well as the development of genomic profiling tools for differential diagnosis and prognostic evaluation.

Progenetix was developed as a repository for published Comparative Genomic Hybridization (CGH) data; a growing number of array CGH (aCGH) profiles is now being added. In contrast to array data repositories (e.g. GEO, ArrayExpress), Progenetix provides pre-computed genomic profiles with whole-genome copy number gain/loss information. Generally, the inclusion of all interpreted (a)CGH data in publications or supplementary material should be part of good scientific practice.

[Figure: change of genomic screening techniques by publication date]

Progenetix data can be freely accessed for academic research projects; please contact Michael Baudis to get access to the XML-formatted source file and data mining features.

During data collection, the input data is automatically converted from the various styles found in publications to Golden Path mapped copy number status information. Although data validity checks are performed, aberrant annotations in the source may not be detected in all instances. Please check the original publications when performing data analysis.

FAQs

What are composite karyotypes, and how are they treated? If annotations from different techniques are available (CGH, aCGH, banding), the combination of all found abnormalities is used.
Which criteria are used for the inclusion of publications? The main criteria are the inclusion of band-specific aberration data on a per-case basis, and online availability of the article with extractable text data. Only in some instances were the aberrations (manually) transposed from ideogram format.
How is the data converted? Usually, CGH data tables are copied from the articles and reformatted to an ISCN "ish cgh" format, using a version of the ISCN2matrix converter. During data entry, a band validation algorithm checks for non-existent bands or aberrant formats in the original annotations.
aCGH data is treated depending on the original format: a) segment files are used for assigning the unaltered segments to individual cases, with subsequent ISCN transcription; b) log2 data tables are converted to segment lists, mostly using DNAcopy (from Bioconductor).
What is the mandatory format for CGH annotations? CGH aberrations should be annotated according to the ISCN (1995) recommendations (F. Mitelman, ed.). However, the ISCN parser will accept some modifications of the format, e.g. any aberrations enclosed in parentheses with the correct prefix ("enh" or "+" for gains, "dim" or "-" for losses, "amp" or "++" for high-level gains).
Is the scientific quality of the data evaluated prior to inclusion? No. Discussions about the validity of certain CGH results are left to the reviewers of the publications, and to the user's judgement.
Is the data included "as is", or are corrections made? Obvious errors (mostly typing errors) are corrected if unambiguous. Sometimes "amp" annotations are changed to "enh", if whole chromosomes or chromosomal arms are denoted as "amplified". However, "amp" annotations should be considered inherently inconsistent.
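As an illustration of the prefix rules and the composite karyotype handling described in the FAQs above, the following is a minimal Python sketch. It is not the Progenetix ISCN parser or the ISCN2matrix converter. The function names, the status labels and the simplified input strings are assumptions made for this example only, and no band validation is attempted.

```python
import re

# Prefix -> copy number status, following the prefixes listed in the FAQ.
# The status labels themselves are hypothetical names for this sketch.
PREFIX_STATUS = {
    "enh": "gain", "+": "gain",
    "dim": "loss", "-": "loss",
    "amp": "high_level_gain", "++": "high_level_gain",
}

# "++" is listed before "+" so that a high-level gain is not read as a gain.
TOKEN = re.compile(r"(enh|dim|amp|\+\+|\+|-)\(([^()]*)\)")


def parse_annotation(text):
    """Return the set of (status, region) pairs found in one annotation string."""
    found = set()
    for prefix, body in TOKEN.findall(text):
        for region in body.split(","):
            region = region.strip()
            if region:
                found.add((PREFIX_STATUS[prefix], region))
    return found


def composite(*annotations):
    """Combine annotations from several techniques into one set (union)."""
    combined = set()
    for text in annotations:
        combined |= parse_annotation(text)
    return combined


if __name__ == "__main__":
    # Simplified, made-up annotation strings; real ISCN strings may differ.
    cgh = "enh(1q21q31,8q),dim(9p21)"
    acgh = "+(8q),++(11q13)"
    for status, region in sorted(composite(cgh, acgh)):
        print(status, region)
```

Using a set for the combined result means that an abnormality reported by more than one technique is kept only once, which is one plausible reading of "the combination of all found abnormalities".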
Progenetix collects information about the genomic copy number profiles of individual cancer and leukemia cases. It consists mainly of a compilation of published data from chromosomal and array/matrix Comparative Genomic Hybridization (CGH) experiments. With 17255 CGH and 2443 array CGH experiments, Progenetix is the largest public CGH database. The software and database content are maintained by Michael Baudis.