Methods

Methodology of the DR_bind1 DNA and RNA binding residue prediction server

DNA and RNA binding residues generally possess electropositive atoms that interact with the DNA/RNA electronegative atoms or water oxygen atoms. In the absence of DNA/RNA or water, these DNA/RNA binding residues would be in an unfavorable electrostatic environment due to the electrostatic repulsion among the electropositive atoms and would therefore be energetically unstable. On the other hand, DNA/RNA binding residues within the same family are known to be highly conserved. They would be expected to preserve not only their physico-chemical features (i.e., aa type and solvent accessibility), but also their energetic features due to their critical functional roles. Hence, solvent-accessible residues that share the highest evolutionary conservation of aa type, as well as structural and energetic features within the same family are predicted to bind DNA/RNA.

Searching for homologous proteins

The SAS database was used to search for all sequences in the PDB that are homologous to the target protein. Homologous proteins sharing ≥90% sequence identity were deemed to be similar and grouped together using CD-HIT, and the longest protein was selected as the representative of that group. A homologous protein representative sharing <30% pairwise sequence identity with the target protein sequence was excluded, because proteins belonging to the same family generally exhibit pairwise residue identities ≥30%.
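The representative selection and identity filter described above can be sketched as follows. This is only an illustration: the function name, the input layout (clusters already grouped at ≥90% identity, plus a precomputed table of percent identities to the target) and the tie-breaking behaviour are assumptions, not part of the server's published code.

```python
def select_homolog_representatives(clusters, identity_to_target, min_identity=30.0):
    """Pick one representative per cluster and drop distant ones.

    clusters: list of clusters, each a list of (protein_id, sequence) tuples
              already grouped at >=90% identity (e.g. from CD-HIT output).
    identity_to_target: dict mapping protein_id to percent sequence identity
              with the target protein (hypothetical, precomputed input).
    """
    representatives = []
    for cluster in clusters:
        # The longest sequence represents the cluster; on a length tie the
        # first member is kept (an assumption, the text does not specify).
        rep_id, _ = max(cluster, key=lambda member: len(member[1]))
        # Representatives below the 30% identity cutoff are excluded.
        if identity_to_target[rep_id] >= min_identity:
            representatives.append(rep_id)
    return representatives
```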

Definition of solvent-accessible residues

An aa X is considered to be solvent accessible if its relative solvent-accessible surface area, computed by NACCESS, is ≥15%.

Electrostatic ranking of each residue

Given an l-residue DNA/RNA-binding protein structure, all Asp/Glu residues were deprotonated, while Arg/Lys residues were protonated; His residues were protonated or deprotonated depending on the availability of hydrogen-bond acceptors in the structure. Next, l mutant structures were generated, one per residue position, by replacing each Ala, Asn, Asp, Cys, Gly, Ser, Thr, or Val in the wild-type structure with Asp⁻ and each of the other residues with Glu⁻. The side chain replacements were carried out using SCWRL, followed by energy minimization with heavy constraints on all heavy atoms using AMBER to relieve any bad contacts. Based on the wild-type/mutant structures, the gas-phase (ε = 1) electrostatic energy of the wild-type (E^elec_wt) or mutant (E^elec_mut) protein in the folded state relative to that in an extended reference state (E′^elec_wt or E′^elec_mut) was computed using AMBER with the all-hydrogen-atom AMBER force field. In this extended reference state, the residues do not interact with one another; hence, the electrostatic energy difference between the mutant (E′^elec_mut) and wild-type (E′^elec_wt) unfolded proteins is equal to the difference between the electrostatic energies of the corresponding mutant Asp⁻/Glu⁻ (E′^elec_D/E) and the native residue at position i (E′^elec_i). The change in the gas-phase electrostatic energy, ΔΔE^elec_i, upon mutation of residue i to Asp⁻/Glu⁻ is given by:

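$$\Delta\Delta E^{\mathrm{elec}}_{i} = \left(E^{\mathrm{elec}}_{\mathrm{mut},i} - E^{\mathrm{elec}}_{\mathrm{wt}}\right) - \left(E^{\prime\,\mathrm{elec}}_{\mathrm{D/E}} - E^{\prime\,\mathrm{elec}}_{i}\right) \tag{1}$$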

A negative ΔΔE^elec_i means that residue i is electrostatically stabilized upon mutation to Asp⁻/Glu⁻ and would likely bind to the electronegative DNA/RNA atoms. Hence, residues with the top 10% most negative <ΔΔE^elec>_i values were assigned Rank^ele = 10, residues with the next 10% most negative <ΔΔE^elec>_i values were assigned Rank^ele = 9, and so on down to the least likely DNA/RNA-binding residues, which were assigned Rank^ele = 1.
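As a rough illustration of Eq. 1 and the decile ranking, the sketch below takes precomputed electrostatic energies and returns Rank^ele for every residue. The function names, the dictionary-based inputs, and the handling of ties and of proteins with fewer than ten residues are assumptions made for the example; the energies themselves would come from the AMBER calculations described above.

```python
def delta_delta_e(e_mut_folded, e_wt_folded, e_mutant_ref, e_native_ref):
    """Eq. 1: folded-state energy change minus reference-state energy change."""
    return (e_mut_folded - e_wt_folded) - (e_mutant_ref - e_native_ref)


def electrostatic_ranks(dd_e):
    """Map each residue to Rank_ele in 1..10 from its ddE value.

    dd_e: dict mapping residue id to ddE (hypothetical input layout).
    The most negative 10% of residues receive rank 10, the next 10% rank 9,
    and so on down to rank 1.
    """
    # Sort residue ids from most negative to most positive ddE.
    ordered = sorted(dd_e, key=dd_e.get)
    total = len(ordered)
    ranks = {}
    for position, residue in enumerate(ordered):
        decile = (position * 10) // total  # 0 for the top 10%, 9 for the last
        ranks[residue] = 10 - decile
    return ranks
```

Residues with equal ΔΔE values are ordered arbitrarily here, so two tied residues that straddle a decile boundary can receive different ranks.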

Evolutionary ranking of each residue

For a given protein, the conservation score of residue i, C_i, was obtained from the ConSurf-DB database. The C_i score is an integer number ranging from 9 for a slowly evolving, conserved residue to 1 for a rapidly evolving, highly variable residue.

Cleft assignment of each residue

Given the 3D protein structure, the 10 largest clefts were found using SURFNET, where cleft 1 is the biggest and cleft 10 is the smallest. If any atom of a residue was assigned as a constituent of the cleft by the SURFNET program, then this residue was regarded as a component of the cleft. When atoms of a residue were assigned to two different clefts, the residue was assigned to the larger of the two clefts. Residues not in any of these ten clefts were assigned to cleft 11.
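The assignment rule can be sketched as below, assuming the SURFNET output has already been parsed into a mapping from cleft number to the residues that contribute at least one atom to that cleft. The function and argument names are hypothetical.

```python
def assign_clefts(residue_ids, cleft_members, no_cleft=11):
    """Assign every residue to a single cleft number.

    residue_ids: iterable of all residue ids in the protein.
    cleft_members: dict mapping cleft number (1 = largest ... 10 = smallest)
        to the residue ids with at least one atom in that cleft.
    Residues found in no cleft receive the number given by no_cleft.
    """
    assignment = {residue: no_cleft for residue in residue_ids}
    for cleft_number, residues in cleft_members.items():
        for residue in residues:
            if residue in assignment:
                # A residue spanning two clefts keeps the larger cleft,
                # i.e. the one with the smaller number.
                assignment[residue] = min(assignment[residue], cleft_number)
    return assignment
```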

Detecting DNA/RNA-binding residues

Given the structures of protein X and its homologs, DNA/RNA-binding residues were detected as follows. For each residue in protein X, the sum of Rank^ele and C was computed. Let Max denote the largest value of Rank^ele + C in protein X. Based on the structure of protein X, the n solvent-accessible residues with Rank^ele + C = Max were identified. If n is <3, we included the m solvent-accessible residues with Rank^ele + C = Max − 1 that are in van der Waals contact with these n residues. If n + m is still <3, then Rank^ele + C was successively decreased by one until n + m is ≥3. Max was then redefined as the value of Rank^ele + C for which n + m is ≥3. Let N denote n or n + m.
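A minimal sketch of this candidate-selection step is given below. It assumes integer scores (Rank^ele + C), a precomputed set of solvent-accessible residues, and a precomputed van der Waals contact map; all names are hypothetical. It also assumes that contacts are always measured against the original n residues rather than against the growing set, which the text does not state explicitly.

```python
def select_candidates(scores, accessible, contacts, min_count=3):
    """Return (candidate residues, redefined Max).

    scores: dict mapping residue id to the integer Rank_ele + C.
    accessible: set of solvent-accessible residue ids.
    contacts: dict mapping residue id to the set of residue ids in
        van der Waals contact with it.
    """
    top = max(scores.values())
    lowest = min(scores.values())
    # The n solvent-accessible residues carrying the maximum score.
    seeds = {r for r in accessible if scores.get(r) == top}
    # All residues touching at least one seed residue.
    neighbours = set().union(*(contacts.get(s, set()) for s in seeds))
    selected = set(seeds)
    threshold = top
    # Lower the score one unit at a time until at least min_count residues
    # are collected, or no lower score exists in the protein.
    while len(selected) < min_count and threshold > lowest:
        threshold -= 1
        selected |= {
            r for r in neighbours
            if r in accessible and scores.get(r) == threshold
        }
    return selected, threshold
```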

Next, the structure of protein X was aligned with that of each homologous protein representative using the MAPSCI program to determine the correspondence between the N residues of protein X and the respective residues in the homologous proteins. N′ of the N residues of protein X were selected if their corresponding residues in any of the homologous proteins were also solvent accessible with Rank^ele + C ≥ Max. If N′ = 0, then the original N residues of protein X were chosen. The N′ or N residues were grouped according to their cleft number, and the cleft containing the most residues was predicted to be the DNA/RNA-binding site. If two or more clefts contained the same number of residues, then the residues comprising these clefts were predicted to bind DNA/RNA.
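The final filtering and cleft vote might be expressed as in the sketch below. It assumes the structural alignment has already been reduced to a set of protein X residues whose aligned counterpart in at least one homolog is solvent accessible with Rank^ele + C ≥ Max. The names are hypothetical, and treating cleft 11 (no cleft) like any other group in the vote is an assumption.

```python
from collections import Counter


def predict_binding_residues(candidates, supported, cleft_of):
    """Return the residues predicted to bind DNA/RNA.

    candidates: the N candidate residue ids of protein X.
    supported: residue ids of protein X confirmed in at least one homolog
        (hypothetical, precomputed from the structural alignment).
    cleft_of: dict mapping residue id to its cleft number (1..11).
    """
    # N' residues: candidates that are also supported by a homolog.
    confirmed = {r for r in candidates if r in supported}
    # Fall back to the original N residues when N' = 0.
    chosen = confirmed or set(candidates)
    if not chosen:
        return set()
    # Count the chosen residues per cleft and keep every top-scoring cleft,
    # so tied clefts are all reported.
    counts = Counter(cleft_of[r] for r in chosen)
    best = max(counts.values())
    winning_clefts = {cleft for cleft, count in counts.items() if count == best}
    return {r for r in chosen if cleft_of[r] in winning_clefts}
```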

DR_bind is hosted at The Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan.