Hollas, Boris: On the Redundancy of Topological Indices / Boris Hollas. With a Preface by Prof. Dr. Ivan Gutman. – Taunusstein : Driesen, 2005 (Driesen Edition Wissenschaft). – 102 pp.; 19 cm. Also: Ulm, University, dissertation, 2005. ISBN 3-936328-38-2. Softcover, 18.00 euros.

Topological indices are molecular descriptors based on the graph of a molecule. Numerous topological indices have been proposed, including the popular connectivity index and the Zagreb indices, and many of them were found to be strongly correlated. This is a serious problem for QSAR/QSPR studies, as data processing may give meaningless results or fail completely.
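As a minimal illustration (not taken from the book), the sketch below computes the first and second Zagreb indices and the Randić connectivity index from their standard definitions, using the hydrogen-suppressed graph of 2-methylbutane. The edge-list representation and all function names are assumptions chosen for this example.

```python
from collections import Counter
from math import sqrt


def degrees(edges):
    """Return the degree of every vertex in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def zagreb_m1(edges):
    """First Zagreb index: sum of squared vertex degrees."""
    return sum(d * d for d in degrees(edges).values())


def zagreb_m2(edges):
    """Second Zagreb index: sum over edges of the product of end-vertex degrees."""
    deg = degrees(edges)
    return sum(deg[u] * deg[v] for u, v in edges)


def randic(edges):
    """Randić connectivity index: sum over edges of 1 / sqrt(deg(u) * deg(v))."""
    deg = degrees(edges)
    return sum(1 / sqrt(deg[u] * deg[v]) for u, v in edges)


# Carbon skeleton of 2-methylbutane: C1-C2, C2-C3, C3-C4, C2-C5
ISOPENTANE = [(1, 2), (2, 3), (3, 4), (2, 5)]

print(zagreb_m1(ISOPENTANE), zagreb_m2(ISOPENTANE), round(randic(ISOPENTANE), 3))
```

All three indices are built from the same vertex degrees, which already hints at why they tend to move together across a set of molecules.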

Using random graphs as a mathematical model for molecular graphs, Hollas shows that certain topological indices are necessarily correlated. This correlation can be arbitrarily close to a perfect linear relationship, in which the information provided by either index is completely redundant. The underlying reasons are identified and simple transformations of the considered topological indices are proposed, reducing or eliminating unwanted correlations. Hollas also shows that the variance of certain topological indices depends on the number of atoms, and how a uniform variance throughout the data set can be obtained. Experimental results with chemical graphs support these findings, which are of practical importance for QSAR/QSPR studies and other applications of topological indices.
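The effect described above can be observed in a small simulation. The sketch below is only an illustration under stated assumptions: it grows random trees by attaching each new vertex to a randomly chosen vertex of degree below 4 (a crude stand-in for carbon skeletons, not the random graph models used in the book), then measures the correlation between the two Zagreb indices over trees of mixed size and the variance of the first Zagreb index at two fixed sizes. The model, the size range, and all names are hypothetical.

```python
import random
from math import sqrt


def random_tree(n, rng, max_degree=4):
    """Grow a tree on n vertices; each new vertex attaches to a random vertex of degree < max_degree."""
    deg = [0] * n
    edges = []
    for v in range(1, n):
        u = rng.choice([w for w in range(v) if deg[w] < max_degree])
        edges.append((u, v))
        deg[u] += 1
        deg[v] += 1
    return edges, deg


def zagreb(edges, deg):
    """Return the first and second Zagreb indices (M1, M2)."""
    m1 = sum(d * d for d in deg)
    m2 = sum(deg[u] * deg[v] for u, v in edges)
    return m1, m2


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Unbiased sample variance."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is reproducible

# Correlation of M1 and M2 over trees whose size varies between 5 and 60 vertices.
sizes = [rng.randint(5, 60) for _ in range(500)]
pairs = [zagreb(*random_tree(n, rng)) for n in sizes]
r_mixed = pearson([p[0] for p in pairs], [p[1] for p in pairs])


def m1_variance(n, trials=300):
    """Sample variance of M1 over random trees of fixed size n."""
    return variance([zagreb(*random_tree(n, rng))[0] for _ in range(trials)])


var_small, var_large = m1_variance(10), m1_variance(60)
print(f"corr(M1, M2) over mixed sizes: {r_mixed:.3f}")
print(f"Var(M1) at n=10: {var_small:.1f}, at n=60: {var_large:.1f}")
```

Because both indices grow with the number of vertices, a data set that mixes molecule sizes shows a high correlation in this toy model, and the spread of the index widens as the trees get larger. The sketch illustrates the phenomenon only; it does not reproduce the book's theorems or its proposed transformations.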

The author, Boris Hollas, studied mathematics and computer science at Gießen University. He works as a research assistant at the department of theoretical computer science at Ulm University.

Preface

The considerations outlined in this book are connected with the following serious problem in contemporary (and future) chemistry. Until now a total of 2 · 10^7 chemical compounds have been either isolated from naturally occurring materials or synthesized in the laboratory. This is negligibly small compared to the number of possible chemical compounds that are (in principle, at least) capable of existing. The author speaks in his book of 10^100 possible chemical compounds. In reality this number is much, much greater and, from today’s point of view, may be considered infinite. For instance, there are 4^10000 possible DNA chains consisting of (only!!!) 10 000 base pairs. There are more than 20^100 peptides consisting of (only!!!) 100 amino acids. According to a recently published theoretical enumeration, there are 7 · 10^21 possible benzenoid hydrocarbons with up to 35 six-membered rings. Of these only about 1000 have actually been prepared.

This enormous imbalance between the number of existing and possible chemical species has the consequence that the true problem of chemistry is no longer to synthesize a given compound, but to decide which compound to synthesize. Today the most evident deadlock of this kind is seen in the pharmaceutical industry. In order to choose the potentially interesting compounds from combinatorial libraries consisting of millions or billions of imagined (but not yet existent) compounds, one must employ some very fast and computationally inexpensive method. One such approach is based on the so-called topological indices.

Nowadays several hundred topological indices are being used in the so-called QSPR (quantitative structure–property relations) and QSAR (quantitative structure–activity relations) studies. The idea is the following: one tries to construct a simple (usually linear) mathematical model that reproduces the experimentally available data, expecting that the same model will correctly (in a statistical sense) predict the same data for not-yet-synthesized compounds. Some of the parameters employed in QSPR/QSAR models are topological indices.
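A minimal sketch of this idea, not taken from the book: fit a straight line relating the Randić connectivity index of the unbranched alkanes butane through heptane to their boiling points, then use the fitted line to predict the boiling point of octane as if it were a not-yet-measured compound. The boiling points are approximate literature values, and the choice of descriptor, compounds, and function names is an assumption made for illustration.

```python
from math import sqrt


def randic_path(n):
    """Randić index of the path graph on n >= 3 vertices (carbon skeleton of an n-alkane).

    The two terminal edges each contribute 1/sqrt(2); the n - 3 inner edges each contribute 1/2.
    """
    return 2 / sqrt(2) + (n - 3) / 2


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Approximate literature boiling points in degrees Celsius, keyed by number of carbon atoms.
BOILING_POINT_C = {4: -0.5, 5: 36.1, 6: 68.7, 7: 98.4}

xs = [randic_path(n) for n in BOILING_POINT_C]
ys = list(BOILING_POINT_C.values())
slope, intercept = fit_line(xs, ys)

# Predict a compound that was left out of the fit: n-octane.
predicted_octane = slope * randic_path(8) + intercept
print(f"predicted boiling point of n-octane: {predicted_octane:.1f} °C")
```

The prediction lands a few degrees above the measured boiling point of n-octane (about 125.6 °C), which shows both the appeal and the limits of such a simple linear model. A second descriptor would only help here if it carried information not already contained in the first.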

A long-known vexation of the QSPR/QSAR approach is the fact that many of the currently used topological indices are mutually correlated, and thus useless from a practical point of view.

In a series of published papers, the results of which constitute the present book, Boris Hollas has established general results showing the following:

·  Certain pairs of topological indices, when applied to reasonably constructed sets of random (molecular) graphs, are linearly correlated, with correlation coefficients approaching unity.

·  It is clarified why (because of which property) these topological indices are correlated.

·  Based on the above, simple transformations of these topological indices can be envisaged, after which their mutual correlation vanishes.

In our opinion, the above results are of fundamental value for the theory of topological indices and for their applications in chemistry. One could even say that, before Hollas arrived at his results, the chemists who used topological indices in QSPR/QSAR studies acted blindly. Once the chemical community realizes the value of Hollas’ theorems (which, for sure, will require a few more years), the outlook of QSPR/QSAR efforts will become quite different. The present book will certainly help in achieving this goal.

Until now topological indices were designed based on “chemical intuition”, which often was not far from arbitrariness. From now on, one would need to check whether newly proposed indices meet the “Hollas criteria”.

Readers who want to skip the more difficult mathematics-based parts of this book are advised to go directly to Chapter 7, where the results of a few numerical experiments are reported. These convincingly show how the theory developed by Hollas looks in practical, real-world settings. The results of Chapter 7 might be of particular value for chemists, that is, for scholars who have not mastered the sophisticated details of probability theory.

 

Kragujevac, March 2005

Prof. Dr. Ivan Gutman

Acknowledgment

This Ph.D. thesis was defended on February 23, 2005 at Ulm University, Faculty of Computer Science.

I thank Prof. Uwe Schöning (Ulm), Prof. Rainer Schuler (Ulm) and Prof. Ivan Gutman (Kragujevac) for their reports on this work. Last but not least, I thank Ivan Gutman, who showed so much interest in this work, for writing the preface and for his support.

 

Ulm, March 2005

Boris Hollas

Contents:

Preface

Overview

List of Figures

1 Basics from Probability Theory

1.1 Convergence Theorems and Inequalities for Expectations

1.2 Measures of Dependence

1.3 Conditional Expectation

1.4 Convergence in Distribution

1.5 Random Graphs

1.6 Generating Non-Uniform Random Numbers

2 Methods of Data Analysis

2.1 Linear Regression

2.2 Principal Component Analysis

2.3 Classification and Regression Trees

2.4 Self-Organizing Maps

2.5 Learning Vector Quantization

3 Important Concepts & Issues in Chemoinformatics

3.1 Introduction

3.2 Molecular Descriptors & Topological Indices

3.3 Chemical Similarity

3.4 QSAR and QSPR

3.5 Correlated Descriptors and Descriptors with Non-Uniform Variance

4 Independent Vertex Properties

4.1 Introduction

4.2 The Model

4.3 Correlations for the General Model

4.4 Correlations for Model G_{n,p_n}

4.5 The Variance

4.6 Experiment with Chemical Structures

4.7 Summary

5 Topological Indices that Depend on the Degree of a Vertex

5.1 Introduction

5.2 The Model

5.3 Expectations for n Fixed

5.4 Convergence of δ^{(i,j)}_{f,n}

5.5 Expectations for Poisson-Distributed N

5.6 Covariance with I_1

5.7 Zero-Order Indices

5.8 The Variance

5.9 Summary

6 Random Trees and Asymptotic Independence

6.1 Characteristic Functions

6.2 The Random Tree Model

6.3 Asymptotic Independence

6.4 Summary

7 Experimental Results

7.1 Introduction

7.2 Correlation

7.3 Variance

7.4 Summary

Discussion

Zusammenfassung (summary in German)

Bibliography

Index

Notations

Bibliography

[1]   B. Hollas. Correlation properties of the autocorrelation descriptor for molecules. MATCH Commun. Math. Comput. Chem., 45:27–33, 2002.

[2]   B. Hollas. An analysis of the autocorrelation descriptor for molecules. J. Math. Chem., 33(2):91–101, 2003.

[3]   B. Hollas. Correlations in distance-based descriptors. MATCH Commun. Math. Comput. Chem., 47:79–86, 2003.

[4]   B. Hollas. An asymptotically independent topological index on random trees. J. Math. Chem., submitted.

[5]   B. Hollas. The covariance of topological indices that depend on the degree of a vertex. MATCH Commun. Math. Comput. Chem., 54(1):177–187, 2005.

[6]   B. Hollas. An analysis of the redundancy of graph invariants used in chemoinformatics. Discr. Appl. Math., submitted.

[7]   B. Hollas. On the variance of topological indices that depend on the degree of a vertex. MATCH Commun. Math. Comput. Chem., 54(2):0–0, 2005.

[8]   B. Hollas, I. Gutman, and N. Trinajstić. On reducing correlations between topological indices. Chem. Phys. Lett., submitted.

[9]   H. Bauer. Probability Theory. Gruyter, 1996.

[10] W. Feller. An Introduction to Probability Theory and its Applications. Wiley, 1970.

[11] P. Billingsley. Probability and Measure. Wiley, 1995.

[12] G. Grimmett and D. Stirzaker. Probability and Random Processes. Oxford, 2001.

[13] R. Roman. Coding and Information Theory. Springer, 1992.

[14] B. Bollobás. Random Graphs. Academic Press, 1984.

[15] B. Bollobás. Modern Graph Theory. Springer, 1998.

[16] E. Palmer. Graphical evolution. Wiley, 1985.

[17] I. Gutman, T. Soldatović, and D. Vidović. The energy of a molecular graph and its size dependence. A Monte Carlo approach. Chem. Phys. Lett., 297:428–432, 1998.

[18] I. Gutman and D. Vidović. Quest for molecular graphs with maximal energy: A computer experiment. J. Chem. Inf. Comput. Sci., 41:1002–1005, 2001.

[19] D. Knuth. The Art of Computer Programming. Addison Wesley, 1998.

[20] D. Livingstone. Data Analysis for Chemists. Oxford University Press, 1995.

[21] J. Gasteiger and J. Zupan. Neural Networks for Chemists. Wiley-VCH, 1999.

[22] S. F. Arnold. Mathematical Statistics. Prentice Hall, 1990.

[23] M. Kendall, J. Ord, and A. Stuart. The Advanced Theory of Statistics. Charles Griffin & Company, 1979.

[24] W. Mendenhall and T. Sincich. Statistics for the Engineering and Computer Sciences. Collier Macmillan Publishers, 1989.

[25] A. Afifi and S. Azen. Statistical Analysis - A Computer Oriented Approach. Academic Press, 1979.

[26] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, 1984.

[27] T. Kohonen. The self-organizing map. Proc. IEEE, 78:1464–1480, 1990.

[28] L. Fausett. Fundamentals of Neural Networks. Prentice Hall, 1994.

[29] F. Farata and R. Walker. A study of the application of Kohonen-type neural networks to the travelling salesman problem. Biol. Cyb., 64:463–468, 1991.

[30] M. Benaïm, J. C. Fort, and G. Pagès. Convergence of the one-dimensional Kohonen algorithm. Adv. in Appl. Probab., 30:850–869, 1998.

[31] E. Erwin, K. Obermayer, and K. Schulten. Self-organizing maps: ordering, convergence properties and energy functions. Biol. Cyb., 67:47–55, 1992.

[32] R. Horowitz and L. Alvarez. Convergence properties of self-organizing neural networks. In Proceedings of the 1995 American Control Conference (IEEE Cat. No. 95CH35736), volume 2, pages 1339–44, Evanston, IL, USA, 1995. American Autom. Control Council.

[33] M. Cottrell, J. C. Fort, and G. Pagès. Two or three things that we know about the Kohonen algorithm. In M. Verleysen, editor, Proc. ESANN’94, European Symp. on Artificial Neural Networks, pages 235–244, Brussels, Belgium, 1994. D facto conference services.

[34] T. Kohonen. Improved versions of learning vector quantization. In Proc. Int. Joint Conf. Neural Networks, pages 545–550, 1990.

[35] E. Kosmatopoulos and M. Christodoulou. Convergence properties of a class of learning vector quantization algorithms. IEEE Transactions on Image Processing, 5:361–368, 1996.

[36] R. Hertzberg and A. Pope. High-throughput-screening: New technology for the 21st century. Curr. Opinion Chem. Biol., 4:445–451, 2000.

[37] W. Walters, M. Stahl, and M. Murcko. High-throughput virtual chemistry. In P. von Ragué Schleyer, editor, Encyclopedia of Computational Chemistry, Vol. 2, pages 1225–1237. Wiley, 1998.

[38] D. Boyd. Drug design. In P. von Ragué Schleyer, editor, Encyclopedia of Computational Chemistry, Vol. 1, pages 795–804. Wiley, 1998.

[39] P. Jurs. Quantitative structure-property relationships (QSPR). In P. von Ragué Schleyer, editor, Encyclopedia of Computational Chemistry, Vol. 4, pages 2320–2330. Wiley, 1998.

[40] R. Todeschini and V. Consonni. Handbook of Molecular Descriptors. Wiley, 2000.

[41] R. Brown and Y. Martin. Use of structure-activity data to compare structure-based clustering methods and descriptors for use in compound selection. J. Chem. Inf. Comput. Sci., 36:572–584, 1996.

[42] R. Brown and Y. Martin. The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding. J. Chem. Inf. Comput. Sci., 37:1–9, 1997.

[43] N. Trinajstić. Chemical Graph Theory. CRC Press, Boca Raton, 1992.

[44] D. Bonchev and D. Rouvray, editors. Chemical Graph Theory. Abacus Press/Gordon & Breach, 1991.

[45] J. Koča, M. Kratochvíl, V. Kvasnička, L. Matyska, and J. Pospíchal. Synthon Model of Organic Chemistry and Synthesis Design. Springer, 1989.

[46] M. Randić. Topological indices. In P. von Ragué Schleyer, editor, Encyclopedia of Computational Chemistry, Vol. 5, pages 3018–3022. Wiley, 1998.

[47] S. Basak, B. Gute, and G. Grunwald. Use of topostructural, topochemical, and geometric parameters in the prediction of vapor pressure. J. Chem. Inf. Comput. Sci., 39:255–260, 1999.

[48] I. Gutman. A formula for the Wiener number of trees and its extension to graphs containing cycles. Graph Theory Notes, 27:9–15, 1994.

[49] A. Dobrynin and I. Gutman. Solving a problem connected with distances in graphs. Graph Theory Notes, 28:21–23, 1995.

[50] I. Gutman, S. Klavzar, and A. Rajapakse. The Szeged and the Wiener index of graphs. Appl. Math. Lett., 9:45–49, 1996.

[51] I. Gutman and N. Trinajstić. Graph theory and molecular orbitals. Total π-electron energy of alternant hydrocarbons. Chem. Phys. Lett., 17:535–538, 1972.

[52] I. Gutman, B. Ruščić, N. Trinajstić, and C. Wilcox. Graph theory and molecular orbitals. XII. Acyclic polyenes. J. Chem. Phys., 62:3399–3405, 1975.

[53] J. Devillers and A. Balaban, editors. Topological Indices and Related Descriptors in QSAR and QSPR. Gordon & Breach, Amsterdam, 1999.

[54] M. Karelson. Molecular Descriptors in QSAR/QSPR. Wiley, New York, 2000.

[55] A. Balaban, editor. From Chemical Topology to Three-Dimensional Geometry. Plenum, New York, 1997.

[56] S. Nikolić, N. Trinajstić, I. Tolić, G. Rücker, and C. Rücker. On molecular complexity indices. In D. Bonchev and D. Rouvray, editors, Complexity - Introduction and Fundamentals, pages 29–89. CRC Press, 2003.

[57] M. Randić. On characterization of molecular branching. J. Am. Chem. Soc., 97:6609–6615, 1975.

[58] M. Randić, P. Hansen, and P. Jurs. Search for useful graph theoretical invariants of molecular structure. J. Chem. Inf. Comput. Sci., 28:60–68, 1988.

[59] M. Diudea, editor. QSAR/QSPR Studies by Molecular Descriptors. Nova, New York, 2001.

[60] A. Katritzky and E. Gordeeva. Traditional topological indices vs. electronic, geometrical, and combined molecular descriptors in QSAR/QSPR research. J. Chem. Inf. Comput. Sci., 33:835–857, 1993.

[61] D. Morales and O. Araujo. On the search for the best correlation between graph theoretical invariants and physicochemical properties. J. Math. Chem., 13:95–106, 1993.

[62] L. Clark and J. Moon. On the general Randić index for certain families of trees. Ars Comb., 54, 1999.

[63] D. Vukičević and N. Trinajstić. Modified Zagreb M2 index - comparison with the Randić connectivity index for benzenoid systems. Croatica Chemica Acta, 76(2):183–187, 2003.

[64] S. Nikolić, G. Kovačević, A. Miličević, and N. Trinajstić. The Zagreb indices 30 years after. Croatica Chemica Acta, 76(2):113–124, 2003.

[65] I. Gutman and M. Lepović. Choosing the exponent in the definition of the connectivity index. J. Serb. Chem. Soc., 66(9):605–611, 2001.

[66] L. Kier and L. Hall. Molecular Connectivity in Chemistry and Drug Research. Academic Press, 1976.

[67] L. Kier and L. Hall. Molecular Connectivity in Structure Activity Analysis. Wiley, 1986.

[68] G. Moreau and P. Broto. Autocorrelation of a topological structure: A new molecular descriptor. Nouv. J. Chim., 4:359–360, 1980.

[69] J. Devillers, D. Domine, and W. Karcher. Estimating n-octanol/water partition coefficients from the autocorrelation method. SAR QSAR Environ. Res., 33:301–306, 1995.

[70] J. Devillers, D. Domine, C. Guillon, S. Bintein, and W. Karcher. Prediction of partition coefficients using autocorrelation descriptors. SAR QSAR Environ. Res., 7:151–172, 1997.

[71] J. Devillers and D. Domine. Comparison of reliability of log p values calculated from a group contribution approach and from the autocorrelation method. SAR QSAR Environ. Res., 7:195–232, 1997.

[72] M. Wagener, J. Sadowski, and J. Gasteiger. Autocorrelation of molecular properties for modelling corticosteroid binding globulin and cytosolic ah receptor activity by neural networks. J. Am. Chem. Soc., 117:7769–7775, 1995.

[73] H. Bauknecht, A. Zell, H. Bayer, P. Levi, M. Wagener, J. Sadowsky, and J. Gasteiger. Locating biologically active compounds in medium-sized heterogeneous datasets by topological autocorrelation vectors: Dopamine and benzodiazepine agonists. J. Chem. Inf. Comput. Sci., 36:1205–1213, 1996.

[74] J. Devillers. Autocorrelation descriptors for modeling (eco)toxicological endpoints. In A. Balaban and J. Devillers, editors, Topological Indices and Related Descriptors in QSAR and QSPR, pages 595–612. Gordon & Breach, Amsterdam, 1999.

[75] J. Gasteiger and T. Engel, editors. Chemoinformatics. Wiley-VCH, 2003.

[76] M. Johnson and G. Maggiora, editors. Concepts and Applications of Molecular Similarity. Wiley, 1987.

[77] A. Leach and V. Gillet. An Introduction to Chemoinformatics. Kluwer Academic Publishers, 2003.

[78] K. Kovačević, D. Plavšić, N. Trinajstić, and D. Horvat. On the intercorrelation of topological indices. Stud. Phys. and Theoret. Chem., 63:213–224, 1989.

[79] N. Trinajstić, S. Nikolić, S. Basak, and I. Lukovits. Distance indices and their hyper-counterparts: Intercorrelation and use in the structure-property modeling. SAR QSAR Environ. Res., 12(1-2):31–54, 2001.

[80] S. Basak, B. Gute, and A. Balaban. Interrelationship of major topological indices evidenced by clustering. Croatica Chemica Acta, 77:331–344, 2004.

[81] S. Basak, V. Magnuson, G. Niemi, R. Regal, and G. Veith. Topological indices: Their nature, mutual relatedness, and applications. Math. Model., 8:300–305, 1987.

[82] I. Motoc, A. Balaban, O. Mekenyan, and D. Bonchev. Topological indices: Inter-relations and composition. MATCH Commun. Math. Comput. Chem., 13:369–404, 1982.

[83] A. Madansky. Prescriptions for Working Statisticians. Springer, 1988.

[84] M. Randić. Resolution of ambiguities in structure-property studies by use of orthogonal descriptors. J. Chem. Inf. Comput. Sci., 31:311–320, 1991.

[85] National Cancer Institute. Connection tables for 127k structures. ftp://helix.nih.gov/ncidata/2D/nciopen.mol.Z.

[86] M. Randić. Generalized molecular descriptors. J. Math. Chem., 7:155–168, 1991.

 

Associated website: www.driesen-antiquariat.de