Protein pK(a) calculation methods are developed partly to provide fast non-experimental estimates of the ionization constants of protein side chains. However, the most significant reason for developing such methods is that a good pK(a) calculation method is presumed to provide an accurate physical model of protein electrostatics, which can be applied in methods for drug design, protein design, and other structure-based energy calculations. We explore the validity of this presumption by simulating the development of a pK(a) calculation method using artificial experimental data derived from a human-defined physical reality. We examine the ability of an RMSD-guided development protocol to retrieve the correct (artificial) physical reality and find that a rugged optimization landscape and a huge parameter space prevent its identification. We also examine the importance of the training set and the effect of experimental noise, and find that both have a significant and detrimental impact on the physical realism of the optimal model identified. Our findings are relevant to all structure-based methods for protein energy calculation and simulation, and have major implications for all current types of pK(a) calculation methods. Our analysis furthermore suggests that careful and extensive validation on many types of experimental data can go some way toward making current models more realistic.
University College Dublin
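The idea of fitting a model to artificial data generated from a known "physical reality" can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the study: the three-parameter linear model, the two structural descriptors, the values in `TRUE_PARAMS`, the noise level, and the greedy search. Unlike the rugged, high-dimensional landscape described above, this toy landscape is smooth and tiny, so it only shows the mechanics of an RMSD-guided search, not the difficulties the abstract reports.

```python
import math
import random

# Hypothetical "true" physical reality (illustrative values only): each
# residue's pKa is a model value shifted by a weighted sum of two made-up
# structural descriptors.
TRUE_PARAMS = (4.0, -1.5, 0.8)  # (model pKa, desolvation weight, charge weight)


def predict(params, features):
    """pKa predicted for one residue from its (desolvation, charge) descriptors."""
    model_pka, w_desolv, w_charge = params
    desolv, charge = features
    return model_pka + w_desolv * desolv + w_charge * charge


def rmsd(params, dataset):
    """Root-mean-square deviation between predicted and 'experimental' pKa values."""
    squared = [(predict(params, feats) - pka) ** 2 for feats, pka in dataset]
    return math.sqrt(sum(squared) / len(squared))


def make_dataset(n, noise_sd, rng):
    """Artificial experimental data: TRUE_PARAMS predictions plus Gaussian noise."""
    data = []
    for _ in range(n):
        feats = (rng.uniform(0.0, 1.0), rng.uniform(-1.0, 1.0))
        data.append((feats, predict(TRUE_PARAMS, feats) + rng.gauss(0.0, noise_sd)))
    return data


def hill_climb(dataset, start, steps, step_size, rng):
    """Greedy RMSD-guided search: keep a random perturbation only if RMSD drops."""
    best, best_score = tuple(start), rmsd(start, dataset)
    for _ in range(steps):
        trial = tuple(p + rng.gauss(0.0, step_size) for p in best)
        score = rmsd(trial, dataset)
        if score < best_score:
            best, best_score = trial, score
    return best, best_score


rng = random.Random(0)                   # fixed seed so runs are repeatable
clean = make_dataset(20, 0.0, rng)       # noise-free artificial data
noisy = make_dataset(20, 0.5, rng)       # same reality, noisy "experiments"
start = (7.0, 0.0, 0.0)                  # arbitrary starting guess
fitted, fitted_rmsd = hill_climb(noisy, start, 5000, 0.05, rng)

print("RMSD of true parameters on noisy data:  ", round(rmsd(TRUE_PARAMS, noisy), 3))
print("RMSD of fitted parameters on noisy data:", round(fitted_rmsd, 3))
print("fitted parameters:", [round(p, 2) for p in fitted])
```

Comparing the printed fitted parameters with `TRUE_PARAMS`, and the two RMSD values with each other, gives a feel for how a low RMSD on a noisy training set need not coincide with recovering the parameters that generated the data.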