ChemGenome - A Prokaryotic Gene Prediction Software


About the ChemGenome Downloadable Version

ChemGenome is ab initio gene evaluation and prediction software that uses the physico-chemical properties of DNA to construct a 3D vector for predicting genes in prokaryotic genomes [1].
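The idea of turning a sequence into a 3D vector can be illustrated with a minimal sketch. It assumes a lookup table that assigns each codon three physico-chemical property values, and it averages those triples over the in-frame codons of a candidate gene. The CODON_PROPERTIES values and the sequence_vector name are hypothetical placeholders, not the parameters ChemGenome uses (those are derived in reference [1]), and the classification step that would follow is not shown.

```python
from typing import Dict, Tuple

Vector3 = Tuple[float, float, float]

# HYPOTHETICAL per-codon property triples, for illustration only.
# A real table would hold one triple for each of the 64 codons.
CODON_PROPERTIES: Dict[str, Vector3] = {
    "ATG": (1.0, 0.5, -0.2),
    "GCT": (0.8, 0.9, 0.1),
    "AAA": (0.3, 0.2, 0.4),
    "TAA": (0.0, 0.1, 0.0),
}


def sequence_vector(seq: str, table: Dict[str, Vector3] = CODON_PROPERTIES) -> Vector3:
    """Average the property triples of all in-frame codons of seq.

    Codons missing from the table and any trailing partial codon are skipped.
    """
    seq = seq.upper()
    totals = [0.0, 0.0, 0.0]
    count = 0
    for i in range(0, len(seq) - 2, 3):
        props = table.get(seq[i:i + 3])
        if props is None:
            continue  # unknown codon (e.g. containing N)
        for k in range(3):
            totals[k] += props[k]
        count += 1
    if count == 0:
        return (0.0, 0.0, 0.0)
    return (totals[0] / count, totals[1] / count, totals[2] / count)


if __name__ == "__main__":
    print(sequence_vector("ATGGCTAAATAA"))
```

Averaging (rather than summing) keeps vectors from ORFs of different lengths on a comparable scale; whether ChemGenome normalizes this way is not stated here.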

Ab initio genome analysis classifies a genome sequence into coding and non-coding regions without any extrinsic comparison with known datasets. Advanced gene finders for both prokaryotic and eukaryotic genomes typically use complex probabilistic models, such as Hidden Markov Models, to combine information from a variety of signal and content measurements.

The current downloadable version runs on Linux. Windows and Solaris versions will be made available soon.

The ChemGenome software is distributed together with a ReadMe file containing instructions for using the program.

ReadMe File

Follow these steps to run ChemGenome 2.0 on Linux (the ChemGenome2.0 executable is compiled for Linux).

Installing and Running ChemGenome
ChemGenome was written and compiled in a Linux environment, and the following instructions assume a Linux system.

1. Installation
To install ChemGenome, download the archive Chemgenome2.0.tar (about 30 KB) from the website. Copy the tar file into your current directory and extract it with this command:

$ tar -xvf Chemgenome2.0.tar

The extracted package contains three items: the Chemgenome2.0 executable, the data directory, and readme.txt.
For ChemGenome2.0 to run properly, the data directory must be present in your current directory before you run Chemgenome2.0. After execution, all result files are written to the current directory.

2. Running
Run Chemgenome2.0 with the genome file and the remaining arguments as parameters:

$ ./Chemgenome2.0 <genome_file_name> <orf_length> <method> <Start Codon(ATG OR|AND CTG OR|AND GTG OR|AND TTG)>
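For scripted runs, the invocation can be wrapped in a small Python helper that first checks the prerequisites from the installation step (the Chemgenome2.0 executable and the data directory in the current directory) and then calls the program. This is a minimal sketch: the function names are hypothetical, and because the accepted values for <method> and the exact syntax for passing several start codons are not documented here, both are passed through verbatim as strings.

```python
import subprocess
from pathlib import Path
from typing import List

EXECUTABLE = "./Chemgenome2.0"


def missing_prerequisites(workdir: Path) -> List[str]:
    """Return the required items that are absent from workdir."""
    missing = []
    if not (workdir / "Chemgenome2.0").is_file():
        missing.append("Chemgenome2.0")
    if not (workdir / "data").is_dir():
        missing.append("data")
    return missing


def build_command(genome_file: str, orf_length: int, method: str, start_codons: str) -> List[str]:
    """Assemble the argument list in the order shown in the usage line."""
    return [EXECUTABLE, genome_file, str(orf_length), method, start_codons]


def run_chemgenome(genome_file: str, orf_length: int, method: str, start_codons: str) -> int:
    """Run ChemGenome in the current directory and return its exit code."""
    workdir = Path.cwd()
    missing = missing_prerequisites(workdir)
    if missing:
        raise FileNotFoundError(f"missing in {workdir}: {', '.join(missing)}")
    cmd = build_command(genome_file, orf_length, method, start_codons)
    # Result files are written to the current directory, so run from there.
    return subprocess.run(cmd, cwd=workdir).returncode
```

For example, run_chemgenome("genome.fasta", 300, "<method>", "ATG") would run from the current directory, where genome.fasta is a hypothetical FASTA file and "<method>" stands for whichever method value the program accepts.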

3. Arguments
Threshold Value (<orf_length>): Specify a lower threshold value to find smaller genes, or a higher threshold value to weed out false positives.
Start Codon: Specify the start codon(s) used to find genes: ATG, CTG, GTG, TTG, or a combination of these.
Method:
DNA Space: Takes a complete or partial genome sequence of a prokaryotic species in FASTA format as the input file and searches for genes based on the physico-chemical properties of double-helical deoxyribonucleic acid (DNA).
Protein Space: Takes the result generated by DNA Space as the input file and acts as a filter, based on the stereochemical properties of protein sequences, to reduce false positives.
Swissprot Space: Takes the result generated by Protein Space as the input file and calculates the standard deviation between a query nucleotide sequence (a predicted gene sequence) and the Swiss-Prot proteins, based on the frequency of occurrence of amino acids. A threshold standard deviation is chosen to minimize false positives and maximize precision.
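The Swissprot Space filter can be pictured with a minimal sketch. It assumes the predicted gene has already been translated into a protein sequence, and it reads "standard deviation" as the root-mean-square difference between the query's amino acid frequencies and a reference composition; the exact statistic ChemGenome computes may differ. The flat 5% REFERENCE below is a placeholder, not Swiss-Prot data, and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# HYPOTHETICAL reference composition: a flat 5% per amino acid.
# Real use would substitute frequencies computed from Swiss-Prot.
REFERENCE: Dict[str, float] = {aa: 1 / 20 for aa in AMINO_ACIDS}


def composition(protein: str) -> Dict[str, float]:
    """Relative frequency of each of the 20 standard amino acids."""
    counts = Counter(aa for aa in protein.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def composition_deviation(protein: str, reference: Dict[str, float] = REFERENCE) -> float:
    """Root-mean-square difference between query and reference frequencies."""
    query = composition(protein)
    squared = sum((query[aa] - reference[aa]) ** 2 for aa in AMINO_ACIDS)
    return sqrt(squared / len(AMINO_ACIDS))


def passes_filter(protein: str, threshold: float, reference: Dict[str, float] = REFERENCE) -> bool:
    """Keep a predicted gene only if its deviation is at or below threshold."""
    return composition_deviation(protein, reference) <= threshold
```

Lowering threshold keeps only predictions whose composition closely matches the reference, trading sensitivity for fewer false positives.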

4. Output of the Program
The online version produces graphical output. The downloadable version creates the following output files:
1. 1main_orfs - Genes predicted in 1st main reading frame
2. 2main_orfs - Genes predicted in 2nd main reading frame
3. 3main_orfs - Genes predicted in 3rd main reading frame
4. 1complementary_orfs - Genes predicted in 1st complementary reading frame
5. 2complementary_orfs - Genes predicted in 2nd complementary reading frame
6. 3complementary_orfs - Genes predicted in 3rd complementary reading frame
7. gene_sequences - Sequences of the predicted genes
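The six *_orfs files correspond to the three reading frames on each strand. A minimal sketch of that bookkeeping is shown below, assuming an ORF runs from an allowed start codon to the first in-frame stop codon (TAA, TAG, TGA) and that <orf_length> is a minimum length in nucleotides. It only enumerates candidate ORFs and does not apply ChemGenome's physico-chemical scoring; the helper names are hypothetical, and mapping offsets 0, 1, 2 to the 1st, 2nd and 3rd frames is an assumption.

```python
from typing import Dict, Iterable, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, offset: int, starts: Iterable[str], min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) spans, 0-based and end-exclusive, of ORFs in one frame.

    An ORF runs from an allowed start codon to the next in-frame stop codon
    (stop included); only ORFs of at least min_len nucleotides are kept.
    """
    starts = set(starts)
    found = []
    open_at = None  # position of the current ORF's start codon, if any
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if open_at is None and codon in starts:
            open_at = i
        elif open_at is not None and codon in STOP_CODONS:
            if i + 3 - open_at >= min_len:
                found.append((open_at, i + 3))
            open_at = None
    return found


def six_frame_orfs(seq: str, starts: Iterable[str] = ("ATG",), min_len: int = 90) -> Dict[str, List[Tuple[int, int]]]:
    """Group ORFs by frame, labelled like ChemGenome's output files.

    Spans for the complementary frames are coordinates on the reverse complement.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    starts = tuple(starts)  # may be iterated six times
    result = {}
    for frame in range(3):
        result[f"{frame + 1}main_orfs"] = orfs_in_frame(seq, frame, starts, min_len)
        result[f"{frame + 1}complementary_orfs"] = orfs_in_frame(rc, frame, starts, min_len)
    return result
```

Passing starts=("ATG", "GTG", "TTG") mirrors choosing several start codons on the command line.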

5. Speed
Run time depends on the genome size and the speed of the system on which the program runs. With the Swissprot Space method, a 1 MB genome typically takes 1-2 minutes on a Pentium 4 (2.40 GHz CPU, 248 MB RAM).

References

[1] "Prokaryotic Gene Finding based on Physicochemical Characteristics of Codons Calculated from Molecular Dynamics Simulations", Singhal P, Jayaram B, Dixit SB and Beveridge DL, Biophys. J., 2008, 94(11), 4173-4183.

[2] "A Physico-Chemical model for analyzing DNA sequences", Dutta S, Singhal P, Agrawal P, Tomer R, Kritee, Khurana E and Jayaram B, J. Chem. Inf. Model., 2006, 46(1), 78-85.

[3] "Beyond the Wobble: The rule of conjugates", Jayaram B, J. Mol. Evol., 1997, 45, 704-705.

For suggestions or to report exceptions, please contact us at scfbio@scfbio-iitd.res.in.

© Copyright 2004-2007, Prof B. Jayaram & Co-workers