Analysis of docking algorithms by HPC methods generated in bioinformatics studies

High-performance computing (HPC) is an important domain of the computer science field. For more than 30 years, it has allowed finding solutions to problems and enhanced progress in many scientific areas such as bioinformatics and drug design. The binding of small molecule ligands to large protein targets is central to numerous biological processes. The accurate prediction of the binding modes between the ligand and protein (the docking problem) is of fundamental importance in modern structure-based drug design. The interactions between the receptor and ligand are quantum mechanical in nature, but due to the complexity of biological systems, quantum theory cannot be applied directly. Consequently, most methods used in docking and computational drug discovery are more empirical in nature and usually lack generality.


Introduction
Quantum mechanical phenomena, such as the formation of a covalent bond between the protein and the ligand upon binding during the transition state of the reaction, cannot be predicted or evaluated using these empirical methods. In the field of molecular modeling, docking is a method that predicts the preferred orientation of one molecule relative to a second when the two are bound to each other to form a stable complex. Knowledge of the preferred orientation may in turn be used to predict the strength of association, or binding affinity, between the two molecules using, for example, scoring functions. Docking is frequently used to predict the binding orientation of small-molecule drug candidates to their protein targets in order to predict the affinity and activity of the small molecule. Hence docking plays an important role in the rational design of drugs. Given the biological and pharmaceutical significance of molecular docking, considerable effort has been directed towards improving docking methods. Each docking program makes use of one or more specific search algorithms, which are the methods used to predict the possible conformations of a binary complex. The present benchmark is built from an existing test set (the CCDC/Astex Validation Set) and run on a typical HPC system. Selected examples were docked with the GOLD software.

Molecular docking
Molecular docking is a computer simulation procedure that predicts the conformation of a receptor-ligand complex, where the receptor is usually a protein or a nucleic acid molecule (DNA or RNA) and the ligand is either a small molecule or another protein. It can also be defined as a simulation process in which a ligand position is estimated in a predicted or pre-defined binding site. Molecular docking research focuses on computationally simulating the molecular recognition process. It aims to achieve an optimized conformation for both the protein and the ligand, and a relative orientation between them, such that the free energy of the overall system is minimized. Computational docking of a small molecule to a biological target involves efficient sampling of possible poses of the former in the specified binding pocket of the latter in order to identify the optimal binding geometry, as measured by a user-defined fitness or score function. X-ray crystallography and NMR spectroscopy continue to be the primary sources of 3-dimensional structural data for protein and nucleic acid targets. In favourable cases where proteins of unknown structure have high sequence homology to known structures, homology modelling can provide a viable alternative by generating a suitable starting point for "in silico" discovery of high-affinity ligands. The potential energy in a molecular force field model is a function of the atomic positions (x, y, z), normally expressed in Cartesian space. The potential energy of a system of atoms in a molecular force field, as commonly used in molecular modelling, is the sum of bonded terms (bond stretching, angle bending and torsional rotation) and non-bonded terms (van der Waals and electrostatic interactions).
The complexity of computational docking increases in the following order: (a) rigid-body docking, where both the receptor and the small molecule are treated as rigid; (b) flexible ligand docking, where the receptor is held rigid but the ligand is treated as flexible; (c) flexible docking, where both receptor and ligand flexibility are considered. Rigid-body docking has been employed in virtual-screening initiatives as the fastest way to perform an initial screening of a small-molecule database. It has a relatively high accuracy when compared against crystallographic structures. The accuracy improves further when the best results from rigid-body docking are re-evaluated with an empirical scoring function. Flexible docking and/or scoring functions are usually applied for more specific refinement and lead optimization after an initial rigid-body docking step, since these methods demand more computational power and CPU time. Flexible docking methods can consider several possible conformations of the ligand, of the receptor, or of both molecules at the same time, at a higher computational cost.
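The force-field potential energy referred to above is commonly written as a sum of bonded and non-bonded terms. The expression below is a generic textbook form, given here as an assumption for illustration only. The symbols $k_b$, $k_\theta$, $V_n$, $\gamma$, $\varepsilon_{ij}$, $\sigma_{ij}$ and $q_i$ do not come from the original text, and the functional form used by any particular docking program may differ.

```latex
V(\mathbf{r}^N) =
    \sum_{\text{bonds}} k_b \,(l - l_0)^2
  + \sum_{\text{angles}} k_\theta \,(\theta - \theta_0)^2
  + \sum_{\text{torsions}} \frac{V_n}{2}\,\bigl[1 + \cos(n\phi - \gamma)\bigr]
  + \sum_{i<j} \left\{
        4\varepsilon_{ij}\left[
            \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
          - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}
        \right]
      + \frac{q_i q_j}{4\pi\varepsilon_0\, r_{ij}}
    \right\}
```

Here $l$, $\theta$ and $\phi$ are bond lengths, bond angles and torsion angles, $l_0$ and $\theta_0$ are their reference values, and $r_{ij}$ is the distance between non-bonded atoms $i$ and $j$, all of which follow from the Cartesian atomic positions.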
The root-mean-square deviation (RMSD) is calculated between two sets of atomic coordinates, in this case those of the crystallographic structure (x_c, y_c, z_c) and those obtained from the docking simulation (x_d, y_d, z_d). The summation is taken over all N atoms being compared:
$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[(x_{c,i}-x_{d,i})^2 + (y_{c,i}-y_{d,i})^2 + (z_{c,i}-z_{d,i})^2\right]} \qquad (2)$$
Before ligands can be docked against a receptor, the binding site generally has to be identified. This limits the search space on the receptor surface and thus minimizes the degrees of freedom that have to be searched. The active site is often known from crystal structures of ligand-bound receptors, but it can also be predicted. The largest cavity on a protein surface is frequently the active site, but this is not always the case, and different active-site prediction and analysis methods have been developed. The genetic algorithm (GA) adopted by GOLD requires as input the approximate size and location of the receptor active site, together with the coordinates of the protein and a ligand conformation. The active site may be defined by several techniques.
A GA is also implemented in the program DOCK, which is able to dock either the whole ligand inside the active site or a rigid fragment of the ligand. A "Lamarckian" GA (LGA) is also implemented in docking algorithms. The LGA switches between "genotypic space" and "phenotypic space". Mutation and crossover occur in genotypic space, while phenotypic space is determined by the energy function to be optimized. Energy minimization (local sampling) is performed in phenotypic space after genotypic changes have been made to the population (global sampling), which is conceptually similar to Monte Carlo (MC) minimization.
After a successful docking procedure, the most important parameters are:
• Binding Energy (BE), kcal/mol:
BE = ImolE + IE + TE + UE (3)
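The RMSD comparison between a crystallographic pose and a docked pose described in this section can be sketched in a few lines of Python. This is a minimal illustration, assuming both coordinate sets list the same atoms in the same order and ignoring ligand symmetry; the function name `rmsd` and the sample coordinates are hypothetical and are not taken from any docking package.

```python
import math


def rmsd(crystal, docked):
    """RMSD between two equally long lists of (x, y, z) coordinates."""
    if len(crystal) != len(docked):
        raise ValueError("coordinate sets must contain the same number of atoms")
    if not crystal:
        raise ValueError("at least one atom is required")
    total = 0.0
    # Sum the squared distance between each pair of matching atoms.
    for (xc, yc, zc), (xd, yd, zd) in zip(crystal, docked):
        total += (xc - xd) ** 2 + (yc - yd) ** 2 + (zc - zd) ** 2
    return math.sqrt(total / len(crystal))


# Hypothetical coordinates (in Å), for illustration only:
# the docked pose is the crystal pose shifted by 1 Å along z.
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(crystal_pose, docked_pose))  # 1.0
```

A rigid shift of every atom by 1 Å gives an RMSD of exactly 1 Å, which makes a convenient sanity check before applying the function to real poses.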

Methods
The GOLD software uses a genetic algorithm (GA) for protein-ligand docking. The docking workflow includes the following steps:
1. Selecting a Protein
2. Defining the Protein Binding Site
5. Selecting a Fitness Function
7. Starting the Docking Run
8. Analysis of Output
The population of chromosomes is iteratively optimised. At each step, a point mutation may occur in a chromosome, or two chromosomes may mate to give a child. The selection of parent chromosomes is biased towards fitter members of the population, i.e. chromosomes corresponding to ligand dockings with good fitness scores. The GOLD validation test set is one of the most comprehensive among the docking methods reviewed; on it, GOLD achieved a 71% success rate, based primarily on a visual inspection of the docked structures. 66 of the complexes had an RMSD of 2.0 Å or less, while 71 had an RMSD of 3.0 Å or less. The omission of hydrophobic interactions and of a solvent model may explain some of the docking failures, which included highly flexible, hydrophobic ligands and complexes containing poorly resolved active sites. However, recent extensions to GOLD include the addition of hydrophobic fitting points that are used in the least-squares fitting algorithm to generate the ligand orientation.
In this paper, the benchmark test set is based on the CCDC/Astex Validation Set, developed by the Cambridge Crystallographic Data Centre (CCDC) for the docking software GOLD. There are 60 entries, and protonation states have been set in all cases.
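The iterative optimisation described above (point mutation or mating at each step, with parent selection biased towards fitter chromosomes) can be illustrated with a toy genetic algorithm. The sketch below is not GOLD's implementation. The fitness function, the tournament selection, the one-point crossover, the replace-the-worst rule and all parameter values are assumptions chosen only to keep the example short and runnable.

```python
import random


def toy_fitness(chromosome):
    # Hypothetical stand-in for a docking score: higher is better,
    # with the optimum at all genes equal to 0.5.
    return -sum((gene - 0.5) ** 2 for gene in chromosome)


def select_parent(population, rng):
    # Tournament selection: of three random members, keep the fittest.
    # This biases selection towards fitter chromosomes.
    return max(rng.sample(population, 3), key=toy_fitness)


def mutate(parent, rng):
    # Point mutation: replace one randomly chosen gene.
    child = list(parent)
    child[rng.randrange(len(child))] = rng.random()
    return child


def crossover(mother, father, rng):
    # One-point crossover: two parents "mate" to give one child.
    cut = rng.randrange(1, len(mother))
    return mother[:cut] + father[cut:]


def evolve(pop_size=20, n_genes=6, n_steps=2000, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_genes)]
                  for _ in range(pop_size)]
    initial_best = max(map(toy_fitness, population))
    for _ in range(n_steps):
        if rng.random() < 0.5:
            child = mutate(select_parent(population, rng), rng)
        else:
            child = crossover(select_parent(population, rng),
                              select_parent(population, rng), rng)
        # Steady-state replacement (assumed): the child replaces
        # the least fit member of the population.
        worst = min(range(pop_size),
                    key=lambda i: toy_fitness(population[i]))
        population[worst] = child
    return initial_best, max(map(toy_fitness, population))


initial_best, final_best = evolve()
print(initial_best, final_best)
```

Because only the least fit member is ever replaced, the best fitness in the population cannot decrease from one step to the next. In a real docking run the chromosome would encode ligand pose and conformation, and the fitness would come from the docking scoring function.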

Results
After running the 60 docking procedures, 10 conformations were obtained for each case. From these 10 conformations, the most energetically favourable one, i.e. the one with the lowest binding energy (BE), was chosen; this is the main criterion for the results presented in Table 1. All conformations were saved and inspected. The most energetically favourable conformation for each test case is summarised in Table 1.
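The selection rule described above, keeping the conformation with the lowest binding energy, amounts to a single minimum search. In the sketch below the dictionary layout, the function name `most_favourable` and the energy values are hypothetical placeholders and are not results from this study.

```python
def most_favourable(conformations):
    """Return the conformation with the lowest binding energy (kcal/mol)."""
    return min(conformations, key=lambda c: c["binding_energy"])


# Hypothetical placeholder values, not results from this study.
poses = [
    {"pose": 1, "binding_energy": -6.2},
    {"pose": 2, "binding_energy": -7.9},
    {"pose": 3, "binding_energy": -5.4},
]
best = most_favourable(poses)
print(best["pose"], best["binding_energy"])  # 2 -7.9
```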
Table 1. The results from the docking procedure for each test case.
The ligand is docked a number of times, so a set of files is written to the output directory, each containing the result of a separate docking attempt. The result of each docking attempt is written out as gold_soln_ligand_m1_n.mol2, where n is the number of the docking solution (1, 2, 3, ...) and m1 is an index to the ligand (in this example, only one ligand was docked). Note that the file gold_soln_ligand_m1_1.mol2 is not the best GOLD prediction; it is just the solution found in the first docking attempt. However, as GOLD proceeds, symbolic links are created: ranked_ligand_m1_1.mol2 will point to the current top-ranked solution, ranked_ligand_m1_2.mol2 will point to the second-best solution, and so on. The Hermes 3D view makes it possible to inspect the solutions predicted by GOLD. The docking solutions are listed in the order in which they were docked, with their corresponding fitness scores. If required, the solutions can be ordered by Fitness to determine which is the highest scoring.
A simple test of the effectiveness of a docking program is to take a protein-ligand complex from the PDB and extract the ligand. The docking program can then be used to predict the binding mode of the ligand, and a comparison can be made with the crystallographically observed position. The crystallographically observed conformation is stored in the ligand file that was extracted from the protein and subsequently reloaded; this can be compared with the solution predicted by GOLD.
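The output naming scheme described above can be handled programmatically. The sketch below parses file names of the form gold_soln_ligand_m1_n.mol2 and orders solutions by fitness. The helper names and the fitness values are hypothetical, and the scores are assumed to have been read from the output beforehand. No GOLD API is used.

```python
import re

# Pattern for the solution file names quoted in the text:
# gold_soln_ligand_m<ligand index>_<solution number>.mol2
SOLUTION_NAME = re.compile(r"^gold_soln_ligand_m(\d+)_(\d+)\.mol2$")


def parse_solution_name(filename):
    """Return (ligand_index, solution_number), or None if the name does not match."""
    match = SOLUTION_NAME.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def rank_by_fitness(fitness_by_file):
    """Sort solution file names from highest to lowest fitness score."""
    return sorted(fitness_by_file, key=fitness_by_file.get, reverse=True)


# Hypothetical fitness scores, for illustration only.
scores = {
    "gold_soln_ligand_m1_1.mol2": 52.1,
    "gold_soln_ligand_m1_2.mol2": 61.7,
    "gold_soln_ligand_m1_3.mol2": 58.3,
}
print(parse_solution_name("gold_soln_ligand_m1_2.mol2"))  # (1, 2)
print(rank_by_fitness(scores)[0])  # gold_soln_ligand_m1_2.mol2
```

In this made-up example the first docking attempt is not the top-ranked one, which mirrors the point that gold_soln_ligand_m1_1.mol2 is not necessarily the best prediction.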
After a successful docking procedure, each test case is characterised by its Fitness (scoring function), best ranking time and total run time. The results for the 22 fastest docking cases out of the initial 60 experiments are presented in Table 2. In the field of molecular modelling, scoring functions are fast approximate mathematical methods used to predict the strength of the non-covalent interaction (also referred to as binding affinity) between two molecules after they have been docked. Most commonly, one of the molecules is a small organic compound such as a drug, and the second is the drug's biological target, such as a protein receptor.

Discussion
Comparisons suggest that the best algorithm for docking is probably a hybrid of various types of algorithm, encompassing novel search and scoring strategies. The most useful docking method will not only perform well, but will also be easy to use and parametrise, and sufficiently adaptable that different functionality may be selected depending on the number of structures to be docked, the available computational resources, and the complexity of the problem. If the parameters cannot be generated quickly, then even a computationally efficient algorithm is of limited practical use. Conversely, a rapid scoring function may not necessarily be able to model some specific interactions. Moreover, although current docking methods show great promise, fast and accurate discrimination between different ligands based on binding affinity, once the binding mode has been generated, is still a significant problem.