PDB

In a PDB file, some regions of a macromolecule may have no known coordinates; that is, the residues present in the ATOM records may not exactly match the sequence given in the SEQRES records, which list the primary sequence of the polymeric molecules (Ref: PDB-An Educational Resource for Exploring a Structural View of Biology). The probable reasons are technical failures, disordered regions, or highly flexible regions. In PDB format, these missing residues are listed in REMARK 465 (wwPDB Processing Procedures and Policies Document, December 2012, Version 2.6). In mmCIF format, the _pdbx_poly_seq_scheme category contains the information about missing residues, which are denoted there by '?'. Knowing the missing residues is useful for modeling gaps or missing stretches of amino acids before the protein is used in molecular dynamics simulation. We have developed utilities for finding and extracting these residues.
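As an illustration of how the REMARK 465 listing can be read, here is a minimal Python sketch. It is not the code behind the utilities on this page. The function name `missing_residues_from_remark_465` and the sample lines are hypothetical, and the regular expression assumes the usual layout of the data lines (optional model number, residue name, chain identifier, sequence number with optional insertion code).

```python
import re

# Assumed layout of a REMARK 465 data line:
#   REMARK 465   [model] RES C SSSEQ[I]
# Header and explanatory lines of the remark do not match this pattern.
REMARK_465 = re.compile(
    r"^REMARK 465\s+(?:(\d+)\s+)?([A-Z0-9]{1,3})\s+(\S)\s+(-?\d+)([A-Za-z]?)\s*$"
)

def missing_residues_from_remark_465(lines):
    """Return (chain, sequence number, insertion code, residue name) tuples."""
    missing = []
    for line in lines:
        m = REMARK_465.match(line)
        if m is None:
            continue  # not a data line (header text, other records)
        _model, res_name, chain, seq_num, icode = m.groups()
        missing.append((chain, int(seq_num), icode, res_name))
    return missing

# Hypothetical excerpt of a PDB file header.
sample = [
    "REMARK 465",
    "REMARK 465 MISSING RESIDUES",
    "REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE",
    "REMARK 465   M RES C SSSEQI",
    "REMARK 465     MET A     1",
    "REMARK 465     GLY A     2",
    "ATOM      1  N   SER A   3      11.104  13.207   2.100  1.00 20.00           N",
]
print(missing_residues_from_remark_465(sample))
```

For a real file, the lines can be passed in directly from an open file handle.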

1) In the first method, we build a FASTA sequence of amino acids from the ATOM section of a PDB file and perform a pairwise alignment (Needle program) against the corresponding UniProt amino acid sequence for that PDB file or for a specific chain of the molecule. The gaps in the alignment give the residues missing from the coordinate section.

2) In the second method, we use the information in REMARK 465 of PDB files and _pdbx_poly_seq_scheme of mmCIF files to extract the missing residues in the coordinate section.
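The idea behind the alignment-based method (method 1) can be sketched in Python as follows. This is a simplified stand-in, not the Needle program: it uses a basic Needleman-Wunsch global alignment with linear gap penalties and a plain match/mismatch score, whereas Needle uses a substitution matrix and affine gap penalties. The function names, scoring values, and the two short sequences are hypothetical.

```python
def global_align(ref, obs, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with linear gap penalties."""
    n, m = len(ref), len(obs)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == obs[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_ref, aligned_obs = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and ref[i - 1] == obs[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned_ref.append(ref[i - 1])
            aligned_obs.append(obs[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_ref.append(ref[i - 1])
            aligned_obs.append("-")  # reference residue with no coordinates
            i -= 1
        else:
            aligned_ref.append("-")
            aligned_obs.append(obs[j - 1])
            j -= 1
    return "".join(reversed(aligned_ref)), "".join(reversed(aligned_obs))

def missing_positions(aligned_ref, aligned_obs):
    """List (1-based reference position, residue) pairs aligned to a gap."""
    missing, pos = [], 0
    for r, o in zip(aligned_ref, aligned_obs):
        if r != "-":
            pos += 1
            if o == "-":
                missing.append((pos, r))
    return missing

# Hypothetical sequences: full reference vs. sequence built from ATOM records.
reference = "MKTAYIAKQR"
observed = "TAYIQR"
aligned_ref, aligned_obs = global_align(reference, observed)
print(aligned_ref)
print(aligned_obs)
print(missing_positions(aligned_ref, aligned_obs))
```

The positions reported here are counted along the reference sequence, so they need not coincide with the residue numbering used in the PDB file.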


Utilities to find missing residues:
 

Alignment based method 
General method

 


© Bioinformatics centre, Savitribai Phule Pune University, India