A new off-lattice HP model with side-chains for the protein folding problem

An HP-like nonlinear programming model and a “brute-force” algorithm are proposed. The model accounts for a high degree of complexity: side-chains (referred to below as radicals) of different radii, and a two-stage algorithm that first generates a random structure and then folds it purposefully according to physical forces and energies. Very preliminary computational runs indicate that the model is adequate and the algorithm is efficient.


Introduction
The 3D structure of proteins is the major factor that determines their biological activity. Synthesizing new proteins and analyzing their 3D structure crystallographically is a very slow and very expensive process. If we can predict the 3D structure of many proteins, then only proteins with the expected properties have to be synthesized. That will increase the number of known structures in the protein databases, and they can be used for drug design. Predicting the 3D structure of a protein when only its primary structure (the amino acid sequence) is known is the protein folding problem. The reason for folding in a water environment is the interaction between water molecules and between amino acids and water molecules. Because a water molecule is more polar than the amino acids, the energy is at a minimum when the protein is folded so that it does not disrupt the water-to-water interactions. The way of folding is determined by the polarity or hydrophobicity of the different amino acids, so the 3D structure with minimum energy is the one that occurs in reality [1,2,3]. The energy is lower when more hydrophobic (H) amino acids (the hydrophobic type of an amino acid depends on the middle nucleotide of the codon [4]) are in contact in the core of the folded 3D structure and more polar (P) amino acids are in contact with water. Since we know the amino acid sequence and the hydrophobicity of every amino acid, we can predict the 3D structure; this method is called HP folding [5]. Of the HP model types, the off-lattice one is the closest to our model, which follows the approach known as HP folding.

Structure and contact defining constraints
Let $x_i, y_i, z_i \in \mathbb{R}$ be the unknown coordinates of the alpha carbon atoms, $xr_i, yr_i, zr_i \in \mathbb{R}$ the coordinates of the centers of the radicals, and $r_i$ the normalized radii of the radicals, for the $i$-th ($j$-th) amino acid in the peptide chain, where $n$ is the number of amino acids. Constraints (1) and (2) are on the corresponding Euclidean distances between the $i$-th and $j$-th alpha carbon atoms and centers of radicals. The goal of the Willpower folding is to maximize the contacts between the radicals of hydrophobic amino acids and the contacts between the alpha carbon atoms of all amino acids, while keeping the radicals with electrical charge out of contact, that is, to find the maximum of the objective function (3), where $h_i$ is the hydrophobicity value of the amino acid and $wtw$ is a parameter for the influence of hydrogen bonds.
ITM Web of Conferences 16, 02007 (2018), AMCSE 2017, https://doi.org/10.1051/itmconf/20181602007

Model description
The main purpose of the developed algorithm is to create a structure with low potential energy starting from a randomly folded form, instead of looking for the best structure among wholly randomly generated forms. Each amino acid is represented by the position of its alpha carbon atom and the position of the center of its radical, both in three-dimensional coordinates. The first stage generates a three-dimensional shape by randomly turning the peptide chain by 90 degrees, with a distance of 1 between consecutive alpha carbon atoms in the peptide chain and between each alpha carbon and the radical center of the same amino acid. To prevent the structure from becoming too spread out or extremely tight, a variable spreading constraint is used [6]. The second stage, which is called "Willpower folding" [7], purposefully modifies the 3D structure in order to minimize the energy, now using non-integer coordinates.
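The first stage can be illustrated with a minimal sketch, assuming that the random 90-degree turns amount to a random walk on a cubic lattice with unit steps (straight continuation allowed). The names `neighbours` and `random_initial_fold`, the rule that no two centers share a lattice site, and the restart on a dead end are assumptions made for illustration, not details taken from the paper. The variable spreading constraint [6] is not modelled.

```python
import random

# The six unit steps along the axes of a cubic lattice.
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def neighbours(p):
    """Return the six lattice sites at distance 1 from point p."""
    return [(p[0] + d[0], p[1] + d[1], p[2] + d[2]) for d in DIRECTIONS]


def random_initial_fold(n, rng, max_tries=1000):
    """Return (alpha, radicals): two lists of n integer 3D points.

    Consecutive alpha carbons are at distance 1, and each radical center is
    at distance 1 from its own alpha carbon.
    Assumption: no lattice site is used twice; a dead end restarts the walk.
    """
    for _ in range(max_tries):
        alpha = [(0, 0, 0)]
        radicals = []
        used = {(0, 0, 0)}
        for i in range(n):
            # Place the radical of amino acid i next to its alpha carbon.
            free = [p for p in neighbours(alpha[i]) if p not in used]
            if not free:
                break
            radicals.append(rng.choice(free))
            used.add(radicals[-1])
            if i == n - 1:
                return alpha, radicals
            # Place the next alpha carbon one unit step away.
            free = [p for p in neighbours(alpha[i]) if p not in used]
            if not free:
                break
            alpha.append(rng.choice(free))
            used.add(alpha[-1])
    raise RuntimeError("no valid fold found within max_tries")
```

Passing a seeded generator, for example `random_initial_fold(20, random.Random(0))`, makes a run reproducible.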

Willpower folding
The steps of the algorithm for maximizing (3) subject to (1) and (2) are given below:
1. Locate the conditional center as the arithmetic mean of the coordinates in each of the three directions.
2. Displace the coordinates of the radicals in proportion to their hydrophobicity value $h_i$: if $h_i > 0$ the direction is toward the center of the molecule, and if $h_i < 0$ it is the opposite.
3. Displace the coordinates of the alpha carbon atoms toward the center of the molecule.
The new coordinates of the radicals are computed as:
$$
\begin{aligned}
xr_{i,\mathrm{new}} &= xr_i - \frac{0.1\,(xr_j - x_i)}{(xr_j - x_i)^2 + (yr_j - y_i)^2 + (zr_j - z_i)^2},\\
yr_{i,\mathrm{new}} &= yr_i - \frac{0.1\,(yr_j - y_i)}{(xr_j - x_i)^2 + (yr_j - y_i)^2 + (zr_j - z_i)^2},\\
zr_{i,\mathrm{new}} &= zr_i - \frac{0.1\,(zr_j - z_i)}{(xr_j - x_i)^2 + (yr_j - y_i)^2 + (zr_j - z_i)^2}.
\end{aligned}
$$
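One pass over the center-directed displacements (locating the center, moving the radicals, moving the alpha carbons) could look like the sketch below. It assumes that the conditional center is the mean of the alpha carbon coordinates, that every displacement is taken along the unit vector toward the center, and that the step length is a hypothetical parameter `step`. The function names are illustrative, and constraints (1) and (2) are not enforced here.

```python
import math


def centroid(points):
    """Arithmetic mean of the points in each of the three directions."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def move_toward(p, target, amount):
    """Move p by `amount` along the unit vector to target (negative: away)."""
    d = [target[k] - p[k] for k in range(3)]
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0.0:
        return tuple(p)  # already at the target, no direction defined
    return tuple(p[k] + amount * d[k] / norm for k in range(3))


def willpower_step(alpha, radicals, h, step=0.1):
    """One illustrative pass of the center-directed displacements.

    alpha, radicals: lists of (x, y, z); h: hydrophobicity values h_i.
    Assumption: `step` and the use of unit direction vectors are hypothetical.
    """
    center = centroid(alpha)
    # Radicals: toward the center if h_i > 0, away from it if h_i < 0.
    new_radicals = [move_toward(r, center, step * hi) for r, hi in zip(radicals, h)]
    # Alpha carbons: always toward the center.
    new_alpha = [move_toward(a, center, step) for a in alpha]
    return new_alpha, new_radicals
```

Returning new lists rather than updating in place keeps the previous structure available, so a step that worsens the objective can be discarded.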

Thermo effect
What we call the "Thermo effect" is a small random move made in every step of Willpower folding, which corresponds to the real environment. It prevents identical structures from being built when the initial random fold is the same, and gives a better chance of finding a better structure.
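A minimal sketch of such a perturbation is given below, assuming a uniform random offset in each coordinate. The distribution, the function name `thermo_effect` and the parameter `amplitude` are hypothetical; the paper does not state the size or distribution of the move.

```python
import random


def thermo_effect(points, rng, amplitude=0.01):
    """Return a copy of `points` with a small random move applied to each.

    Assumption: each coordinate is shifted by a uniform random offset in
    [-amplitude, amplitude]; the value 0.01 is illustrative only.
    """
    return [
        tuple(c + rng.uniform(-amplitude, amplitude) for c in p)
        for p in points
    ]
```

Called once per folding step with a shared generator, this makes two runs that start from the same initial fold diverge unless they also share the same seed.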

Computational Results
First, using our definition of contact (2), we find the contacts in the real structure of protein 1UUB, using the alpha carbon coordinates from its PDB file in the Protein Data Bank (Figure 1).

Fig. 1. Real structure of protein 1UUB
There are 125 contacts according to the constraints (2) above.
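Counting contacts in a PDB file can be sketched as follows. The parser relies on the fixed-column layout of PDB `ATOM` records (atom name in columns 13 to 16, coordinates in columns 31 to 54). The contact test is only a stand-in: it assumes that two non-adjacent alpha carbons are in contact when their distance does not exceed a hypothetical `cutoff`, which is not the definition (2) used in the paper. Alternate locations and multiple chains are not handled.

```python
import math


def read_alpha_carbons(pdb_lines):
    """Extract the (x, y, z) coordinates of every CA atom from ATOM records."""
    coords = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def find_contacts(coords, cutoff):
    """Return the set of index pairs (i, j), j > i + 1, within `cutoff`.

    Assumption: a plain distance threshold replaces the paper's contact
    definition (2); chain neighbours (j = i + 1) are skipped.
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts
```

With a file on disk, `find_contacts(read_alpha_carbons(open(path)), cutoff)` would return the contact set, where `path` and `cutoff` are supplied by the user.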
The test run of the program implementing the algorithm on 1UUB gave the following result: among only 100000 randomly generated structures, the best structure found has an evaluation function value of 1279.2 and 206 contacts, 59 of which match the real structure (Figure 2). The time needed was 70 min on a PC with an Intel i5 at 2.26 GHz. Remark: for the source code in C++, the list of coordinate values and the list of contact pairs, mail to itodorin@gmail.com. In the other experiment we carried out, we examined how much better a structure could be built by ranking folds not by the evaluation function but by the contact matching ratio. In this way we obtained the following structure, with 62 matching contacts out of 120 (Figure 3).
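The comparison with the real structure can be sketched as a set intersection, assuming contacts are stored as index pairs `(i, j)` with `i < j`. The function name `contact_match` and the definition of the ratio as matches divided by the number of real contacts are assumptions; the paper does not spell out the ratio.

```python
def contact_match(predicted, real):
    """Return (number of matching contacts, matching ratio).

    predicted, real: sets of index pairs (i, j) with i < j.
    Assumption: the ratio is matches / len(real); it is 0.0 if real is empty.
    """
    matches = predicted & real
    ratio = len(matches) / len(real) if real else 0.0
    return len(matches), ratio
```

Under this assumed definition, 59 matches against the 125 contacts of the real structure would correspond to a ratio of 59/125 = 0.472.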

Conclusions
This approach to protein folding prediction, using "Willpower folding" with the "Thermo effect", appears to be faster than models that search for the best structure among randomly built structures. It is therefore possible to process considerably more protein structures in the same time. Moreover, the model is new (less than a year old) and might be developed further toward finer granularity, placing every atom, with an evaluation function that takes into account the real impact of the electron density distribution. Such a degree of granularity is hard to achieve, in terms of computational time, for models that search for the best structure among randomly built structures.