About ChemGenome2.1 Downloadable Version
ChemGenome is a physico-chemical method that accepts a DNA sequence in FASTA format and predicts genes based on the hydrogen bonding energy, stacking energy and protein-nucleic acid interaction parameter of each trinucleotide (codon).
ChemGenome is an ab initio method. It has been tested on 372 prokaryotic genomes, giving sensitivity, specificity and correlation coefficient values of 97.5%, 97.20% and 94.25% respectively, averaged over 356208 genes and an equal number of frame-shifted genes (non-genes). The software can be downloaded from the following link: http://www.scfbio-iitd.res.in/chemgenome/chemgenomenew.jsp
The download includes the ChemGenome software and a ReadMe file with instructions for using the program.
Follow the steps below to run ChemGenome2.1 on Linux (ChemGenome2.1 is compiled for Linux).
Installing and Running ChemGenome2.1
ChemGenome was written and compiled in a Linux environment. The following instructions are meant to be run on a Linux system.
To install ChemGenome, download the archive ChemGenome2.1.tar from the website. The size of the archive is 2.9 MB.
Copy the tar file into your current directory and extract it with this command:
$ tar -xvf ChemGenome2.1.tar
The ChemGenome2.1 archive contains five items: ChemGenome2.1.sh, Protein_Score.exe, Chemgenome2.0, a data directory and readme.txt.
To run ChemGenome2.1 properly, copy the data directory into your current directory before running ChemGenome2.1. After ChemGenome2.1 finishes, all result files are copied into the current directory.
ChemGenome2.1 is called with two arguments: the first is the chromosome file in FASTA format, and the second is 1 or 2 depending on the organism (1 for a sequence from a prokaryotic organism, 2 for a sequence from a eukaryotic organism or of unknown origin):
$ sh ChemGenome2.1.sh <genome_file_name> <1 or 2>
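The requirements above (a FASTA input, an organism flag of 1 or 2, and the data directory in the current directory) can be checked before launching the program. The following is a minimal sketch; the function name check_inputs is hypothetical and is not part of the ChemGenome2.1 distribution, and the FASTA test only looks for a '>' header on the first line.

```shell
#!/bin/sh
# Hypothetical pre-flight check; NOT shipped with ChemGenome2.1.
# Usage: check_inputs <genome_file_name> <1 or 2>
check_inputs() {
    genome_file=$1
    organism=$2

    # The second argument must be 1 (prokaryote) or 2 (eukaryote/unknown).
    case "$organism" in
        1|2) ;;
        *) echo "organism flag must be 1 or 2" >&2; return 1 ;;
    esac

    # The genome file must exist as a regular file.
    if [ ! -f "$genome_file" ]; then
        echo "genome file not found: $genome_file" >&2
        return 1
    fi

    # Assumption: a FASTA file starts with a '>' header line.
    if ! head -n 1 "$genome_file" | grep -q '^>'; then
        echo "first line is not a FASTA header: $genome_file" >&2
        return 1
    fi

    # The data directory must be present in the current directory.
    if [ ! -d data ]; then
        echo "data directory not found in $(pwd)" >&2
        return 1
    fi
    return 0
}
```

It could be combined with the run command, for example: check_inputs genome.fasta 1 && sh ChemGenome2.1.sh genome.fasta 1 (genome.fasta is a placeholder file name).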
For advanced features, users can modify the ChemGenome2.1.sh file.
In ChemGenome2.1.sh, the first program executed is Chemgenome2.0, called with the following parameters:
$ ./Chemgenome2.0 <genome_file_name> <orf_length> <method> <Start Codon (ATG OR|AND CTG OR|AND GTG OR|AND TTG) >
ORF Length: For a small genome, you can specify a lower threshold value to find smaller genes. For a large genome, you can specify a higher threshold value to weed out false positives.
Start Codon: You can specify which start codon(s) to use when searching for genes.
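To illustrate how the ORF length threshold and the choice of start codons affect what is reported, here is a minimal sketch of a plain ORF scan. It is not ChemGenome's physico-chemical method: the function name find_orfs is hypothetical, only the three forward reading frames are scanned, and the stop codons are assumed to be TAA, TAG and TGA.

```shell
#!/bin/sh
# Hypothetical illustration only; this is NOT how Chemgenome2.0 scores genes.
# Usage: find_orfs <sequence> <min_length_in_bases> <comma_separated_start_codons>
# Prints one line per ORF: frame start end length (1-based, stop codon included).
find_orfs() {
    printf '%s\n' "$1" | awk -v min_len="$2" -v starts="$3" '
    BEGIN {
        n = split(starts, s, ",")
        for (i = 1; i <= n; i++) is_start[s[i]] = 1
        is_stop["TAA"] = 1; is_stop["TAG"] = 1; is_stop["TGA"] = 1
    }
    {
        seq = toupper($0)
        for (frame = 1; frame <= 3; frame++) {
            orf_start = 0
            # Walk the frame codon by codon.
            for (pos = frame; pos + 2 <= length(seq); pos += 3) {
                codon = substr(seq, pos, 3)
                if (!orf_start && (codon in is_start)) {
                    orf_start = pos            # first start codon opens the ORF
                } else if (orf_start && (codon in is_stop)) {
                    len = pos + 2 - orf_start + 1
                    if (len >= min_len) print frame, orf_start, pos + 2, len
                    orf_start = 0              # stop codon closes the ORF
                }
            }
        }
    }'
}
```

For example, find_orfs CCATGAAATAGGG 9 ATG prints "3 3 11 9" (an ORF in frame 3 spanning bases 3 to 11), while raising the threshold to 10 prints nothing. Likewise, find_orfs GTGAAATAA 9 ATG finds no ORF, but adding GTG to the start codon list (ATG,GTG) reports one.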
DNA Space: The method takes the complete or partial genome sequence of a prokaryotic species in FASTA format as its input file. It searches for genes based on the physico-chemical properties of double-helical deoxyribonucleic acid (DNA).
Protein Space: The method takes the result generated from DNA space as its input file and works as a filter, based on stereochemical properties of protein sequences, to reduce false positives.
Swissprot Space: The method takes the result generated from protein space as its input file and calculates the standard deviation of a query nucleotide sequence (predicted gene sequence) with the Swissprot proteins, based on the frequency of occurrence of amino acids. A threshold standard deviation is chosen to keep false positives at a minimum and precision at a maximum.
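The Swissprot space description does not give the exact formula, so the following is only one plausible reading, sketched under stated assumptions: the amino acid frequencies of a translated query are compared with those of a reference protein set, and the root-mean-square difference over the 20 standard amino acids is reported. The function name composition_deviation, the use of a reference sequence passed as an argument, and the formula itself are all assumptions, not ChemGenome's implementation.

```shell
#!/bin/sh
# Hypothetical illustration; the formula is an ASSUMPTION, not taken from ChemGenome.
# Usage: composition_deviation <query_protein> <reference_protein>
# Prints sqrt( sum over 20 amino acids of (f_query - f_ref)^2 / 20 ).
composition_deviation() {
    awk -v query="$1" -v ref="$2" '
    # Count residues of seq into freq[]; returns the sequence length.
    function count(seq, freq,    i, aa) {
        for (i = 1; i <= length(seq); i++) {
            aa = substr(seq, i, 1)
            freq[aa]++
        }
        return length(seq)
    }
    BEGIN {
        alphabet = "ACDEFGHIKLMNPQRSTVWY"
        nq = count(toupper(query), fq)
        nr = count(toupper(ref), fr)
        if (nq == 0 || nr == 0) {
            print "empty sequence" > "/dev/stderr"
            exit 1
        }
        for (i = 1; i <= 20; i++) {
            aa = substr(alphabet, i, 1)
            d = fq[aa] / nq - fr[aa] / nr   # difference in relative frequency
            sum += d * d
        }
        printf "%.4f\n", sqrt(sum / 20)
    }'
}
```

Under this reading, a predicted gene would be kept only if its deviation falls below a chosen threshold. Identical compositions give 0.0000, for example composition_deviation ACDE ACDE.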
Output of the Program
The output of Chemgenome2.0 is passed through protein-based filters to produce the final output.
The version available online provides graphical output.
The downloadable version creates the following output files:
1. 1main_orfs.txt - Genes predicted in 1st main reading frame
2. 2main_orfs.txt - Genes predicted in 2nd main reading frame
3. 3main_orfs.txt - Genes predicted in 3rd main reading frame
4. 1complementary_orfs.txt - Genes predicted in 1st complementary reading frame
5. 2complementary_orfs.txt - Genes predicted in 2nd complementary reading frame
6. 3complementary_orfs.txt - Genes predicted in 3rd complementary reading frame
7. Gene_sequences.txt - Gene Sequences of the predicted genes along with position.
8. Protein_sequences.txt - Protein sequences of the predicted genes along with position.
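After a run, it can be useful to confirm that all eight files listed above were written. The sketch below only checks for their presence; the function name check_outputs is hypothetical, and no assumption is made about the internal format of the files.

```shell
#!/bin/sh
# Hypothetical helper; NOT part of the ChemGenome2.1 distribution.
# Usage: check_outputs [directory]   (defaults to the current directory)
# Returns 0 only if all eight expected result files exist.
check_outputs() {
    dir=${1:-.}
    missing=0
    for f in 1main_orfs.txt 2main_orfs.txt 3main_orfs.txt \
             1complementary_orfs.txt 2complementary_orfs.txt \
             3complementary_orfs.txt Gene_sequences.txt Protein_sequences.txt
    do
        if [ -f "$dir/$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Since the result files are copied into the current directory, calling check_outputs with no argument from that directory is usually enough.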
The time taken by the program depends on the genome size and the speed of the system on which it is run. It usually takes 1-2 minutes for a 1 MB genome on a Pentium 4 (2.40 GHz CPU, 248 MB RAM) with the Swissprot space method.
"Prokaryotic Gene Finding based on Physicochemical Characteristics of Codons Calculated from Molecular Dynamics Simulations", Singhal P, Jayaram B, Dixit SB and Beveridge DL, Biophys. J., 2008, 94(11), 4173-4183.
"A Physico-Chemical model for analyzing DNA sequences", Dutta S, Singhal P, Agrawal P, Tomer R, Kritee, Khurana E and Jayaram B, J. Chem. Inf. Mod., 2006, 46(1), 78-85.
"Beyond the Wobble : The rule of conjugates", Jayaram B, J. Mol. Evol., 1997, 45, 704-705.
In case of any suggestions or exceptions, please contact us at