Hollas, Boris: On the Redundancy of Topological Indices / Boris Hollas. With a
Preface by Prof. Dr. Ivan Gutman. Taunusstein: Driesen, 2005 (Driesen
Edition Wissenschaft). 102 pp.; 19 cm. Also a dissertation, Ulm University,
2005. ISBN 3-936328-38-2. Softcover, 18.00 euros.
Topological indices are molecular descriptors based on the graph of a molecule. Numerous
topological indices, including the popular connectivity index and the Zagreb
indices, have been proposed; many of these were found to be strongly
correlated. This is a serious problem for QSAR/QSPR studies, as data processing
may give meaningless results or fail completely.
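The blurb names these indices without defining them. Here is a minimal sketch using the standard definitions from the chemical graph theory literature. For a hydrogen-suppressed molecular graph with vertex degrees d_v, the Randić connectivity index is χ = Σ over edges uv of (d_u·d_v)^(-1/2). The first Zagreb index is M1 = Σ over vertices of d_v², and the second Zagreb index is M2 = Σ over edges uv of d_u·d_v. The edge-list representation and the function names are illustrative choices, not taken from the book.

```python
from math import sqrt


def degrees(n, edges):
    """Vertex degrees of a simple graph on vertices 0..n-1 given as an edge list."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def randic_index(n, edges):
    """Randić connectivity index: sum over edges of (d_u * d_v) ** -0.5."""
    deg = degrees(n, edges)
    return sum(1 / sqrt(deg[u] * deg[v]) for u, v in edges)


def zagreb_m1(n, edges):
    """First Zagreb index: sum over vertices of d_v ** 2."""
    return sum(d * d for d in degrees(n, edges))


def zagreb_m2(n, edges):
    """Second Zagreb index: sum over edges of d_u * d_v."""
    deg = degrees(n, edges)
    return sum(deg[u] * deg[v] for u, v in edges)


# Hydrogen-suppressed graph of n-butane: the path C1-C2-C3-C4.
butane = [(0, 1), (1, 2), (2, 3)]
print(randic_index(4, butane), zagreb_m1(4, butane), zagreb_m2(4, butane))
# chi is about 1.914 (= sqrt(2) + 1/2), M1 = 10, M2 = 8
```

All three indices depend only on vertex degrees. This shared ingredient is one intuitive reason why such indices tend to move together.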
Using random graphs as a mathematical model for
molecular graphs, Hollas shows that certain topological indices are necessarily
correlated. This correlation can be arbitrarily close to a perfect linear
relationship, in which case the information provided by either index is completely
redundant. The underlying reasons are identified, and simple transformations of
the considered topological indices are proposed that reduce or eliminate
unwanted correlations. Hollas also shows that the variance of certain
topological indices depends on the number of atoms, and how a uniform variance
throughout the data set can be obtained. Experimental results with chemical
graphs support these findings, which are of practical importance for QSAR/QSPR
studies and other applications of topological indices.
The author, Boris Hollas, studied
mathematics and computer science at Gießen University. He works as a research
assistant in the Department of Theoretical Computer Science at Ulm University.
Preface
The considerations outlined in this book are connected with the following serious
problem in contemporary (and future) chemistry. Until now, a total of 2 · 10⁷ chemical compounds have been either isolated
from naturally occurring materials or synthesized in the laboratory. This is
negligibly small compared to the number of possible chemical compounds that are
(in principle, at least) capable of existing. The author speaks in his book of 10¹⁰⁰ possible chemical compounds. In reality
this number is much, much greater and, from today's point of view, may be
considered infinite. For instance, there are 4¹⁰⁰⁰⁰ possible DNA chains consisting of
(only!!!) 10 000 base pairs. There are > 20¹⁰⁰ peptides consisting of (only!!!) 100
amino acids. According to a recently published theoretical enumeration, there
are 7 · 10²¹ possible benzenoid hydrocarbons with up
to 35 six-membered rings. Of these, only about 1000 have actually been prepared.
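The orders of magnitude quoted above can be checked with exact integer arithmetic. This minimal sketch counts exactly 4 choices per base pair and 20 choices per amino acid. The preface's "> 20¹⁰⁰" is stated as a lower bound on the peptide count. The helper name `decimal_digits` is an illustrative choice.

```python
def decimal_digits(x: int) -> int:
    """Count the decimal digits of a positive integer using exact integer arithmetic.

    str() is avoided because Python 3.11+ by default refuses int-to-str
    conversion for integers with more than 4300 digits.
    """
    digits, power = 1, 10
    while power <= x:
        power *= 10
        digits += 1
    return digits


dna_chains = 4 ** 10_000  # one of 4 bases at each of 10 000 positions
peptides = 20 ** 100      # one of 20 amino acids at each of 100 positions

print(decimal_digits(dna_chains))  # 6021, so 4^10000 is about 10^6020
print(decimal_digits(peptides))    # 131, so 20^100 is about 10^130
print(dna_chains > 10 ** 100, peptides > 10 ** 100)  # True True
```

Both counts exceed 10¹⁰⁰ by a wide margin. This is the sense in which the real number of possible compounds is "much, much greater".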
This enormous imbalance between the
number of existing and possible chemical species has the consequence that the
true problem of chemistry is no longer to synthesize a given compound, but to
decide which compound to synthesize. Today the most evident predicament of this
kind is seen in the pharmaceutical industry. In order to choose the potentially
interesting compounds from combinatorial libraries consisting of millions or
billions of imagined (but not yet existent) compounds, one must employ some
very fast and computationally inexpensive method. One such approach is based on
the so-called topological indices.
Nowadays several hundred topological
indices are being used in so-called QSPR (quantitative structure–property relations)
and QSAR (quantitative structure–activity relations) studies. The idea is the
following: one tries to construct a simple (usually linear) mathematical
model that reproduces the experimentally available data, expecting that the
same model will correctly (in a statistical sense) predict the same data for
not-yet-synthesized compounds. Some of the parameters employed in QSPR/QSAR
models are topological indices.
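The simplest case of the linear model described above can be sketched as an ordinary least-squares fit of a property y on a single topological index x. This is a generic illustration, not a procedure taken from the book. The numbers below are hypothetical, chosen to lie exactly on y = 1 + 2x, and are not experimental data. With several parameters, the same idea becomes multiple linear regression.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y ≈ a + b*x; returns the intercept a and slope b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx  # requires at least two distinct x values
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x_new):
    """Apply the fitted model to a compound that was not in the training data."""
    return a + b * x_new


# Hypothetical index values and property values (not experimental data).
index_values = [1.0, 2.0, 3.0, 4.0]
property_values = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(index_values, property_values)
print(a, b, predict(a, b, 5.0))  # 1.0 2.0 11.0
```

If two predictors are almost perfectly correlated, the multi-parameter version of this fit becomes numerically unstable. This is one concrete way in which redundant indices can make data processing "fail completely".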
A long-known vexation of the QSPR/QSAR
approach is the fact that many of the currently used topological indices are
mutually correlated, and thus useless from a practical point of view.
In a series of published papers, the
results of which constitute the present book, Boris Hollas has established
general results showing the following:
· Certain pairs of topological indices, when applied to reasonably constructed sets of
random (molecular) graphs, are linearly correlated, with correlation coefficients
approaching unity.
· It is clarified why (because of which property) these topological indices are correlated.
· Based on the above, simple transformations of these topological indices can be
envisaged, after which their mutual correlation vanishes.
In our opinion, the above results are of fundamental value for the
theory of topological indices and for their applications in chemistry. One
could even say that before Hollas arrived at his results, the chemists who
used topological indices in QSPR/QSAR studies acted in a blind manner. Once
the chemical community realizes the value of Hollas' theorems (which,
for sure, will take a few more years), the outlook of QSPR/QSAR efforts will
become quite different. The present book will certainly help achieve this
goal.
Until now, topological indices were
designed based on “chemical intuition”, which often was not far from
arbitrariness. From now on, one will need to check whether newly proposed
indices meet the “Hollas criteria”.
Those readers of this book who want to
skip its more difficult, mathematics-based parts are advised to go directly
to Chapter 7. There, the results of a few numerical experiments are reported.
These convincingly show how the theory developed by Hollas applies in practical,
real-world settings. The results of Chapter 7 might be of particular value for
chemists, that is, for scholars who have not mastered the sophisticated details of
probability theory.
Kragujevac, March 2005
Prof. Dr. Ivan Gutman
Acknowledgment
This Ph.D. thesis was defended on February 23, 2005, at Ulm University, Faculty of Computer
Science.
I thank Prof. Uwe
Schöning (Ulm), Prof. Rainer Schuler (Ulm) and Prof. Ivan Gutman (Kragujevac)
for their reports on this work. Last but not least, I thank Ivan Gutman, who
showed so much interest in this work, for writing the preface and for his
support.
Ulm, March 2005
Boris Hollas
Contents:
Overview
List of Figures
1 Basics from Probability Theory
1.1 Convergence Theorems and Inequalities for Expectations
1.2 Measures of Dependence
1.3 Conditional Expectation
1.4 Convergence in Distribution
1.5 Random Graphs
1.6 Generating Non-Uniform Random Numbers
2 Methods of Data Analysis
2.1 Linear Regression
2.2 Principal Component Analysis
2.3 Classification and Regression Trees
2.4 Self-Organizing Maps
2.5 Learning Vector Quantization
3 Important Concepts & Issues in Chemoinformatics
3.1 Introduction
3.2 Molecular Descriptors & Topological Indices
3.3 Chemical Similarity
3.4 QSAR and QSPR
3.5 Correlated Descriptors and Descriptors with Non-Uniform Variance
4 Independent Vertex Properties
4.1 Introduction
4.2 The Model
4.3 Correlations for the General Model
4.4 Correlations for Model G_{n,p_n}
4.5 The Variance
4.6 Experiment with Chemical Structures
4.7 Summary
5 Topological Indices that Depend on the Degree of a Vertex
5.1 Introduction
5.2 The Model
5.3 Expectations for n Fixed
5.4 Convergence of δ^{(i,j)}_{f,n}
5.5 Expectations for Poisson-Distributed N
5.6 Covariance with I_1
5.7 Zero-Order Indices
5.8 The Variance
5.9 Summary
6 Random Trees and Asymptotic Independence
6.1 Characteristic Functions
6.2 The Random Tree Model
6.3 Asymptotic Independence
6.4 Summary
7 Experimental Results
7.1 Introduction
7.2 Correlation
7.3 Variance
7.4 Summary
Discussion
Zusammenfassung (Summary in German)
Bibliography
Index
Notations
Bibliography
[1] B. Hollas. Correlation properties of the autocorrelation descriptor for molecules. MATCH Commun. Math. Comput. Chem., 45:27–33, 2002.
[2] B. Hollas. An analysis of the autocorrelation descriptor for molecules. J. Math. Chem., 33(2):91–101, 2003.
[3] B. Hollas. Correlations in distance-based descriptors. MATCH Commun. Math. Comput. Chem., 47:79–86, 2003.
[4] B. Hollas. An asymptotically independent topological index on random trees. J. Math. Chem., submitted.
[5] B. Hollas. The covariance of topological indices that depend on the degree of a vertex. MATCH Commun. Math. Comput. Chem., 54(1):177–187, 2005.
[6] B. Hollas. An analysis of the redundancy of graph invariants used in chemoinformatics. Discr. Appl. Math., submitted.
[7] B. Hollas. On the variance of topological indices that depend on the degree of a vertex. MATCH Commun. Math. Comput. Chem., 54(2):0–0, 2005.
[8] B. Hollas, I. Gutman, and N. Trinajstić. On reducing correlations between topological indices. Chem. Phys. Lett., submitted.
[9] H. Bauer. Probability Theory. de Gruyter, 1996.
[10] W. Feller. An Introduction to Probability Theory and its Applications. Wiley, 1970.
[11] P. Billingsley. Probability and Measure. Wiley, 1995.
[12] G. Grimmett and D. Stirzaker. Probability and Random Processes. Oxford, 2001.
[13] S. Roman. Coding and Information Theory. Springer, 1992.
[14] B. Bollobás. Random Graphs. Academic Press, 1984.
[15] B. Bollobás. Modern Graph Theory. Springer, 1998.
[16] E. Palmer. Graphical Evolution. Wiley, 1985.
[17] I. Gutman, T. Soldatović, and D. Vidović. The energy of a molecular graph and its size dependence. A Monte Carlo approach. Chem. Phys. Lett., 297:428–432, 1998.
[18] I. Gutman and D. Vidović. Quest for molecular graphs with maximal energy: A computer experiment. J. Chem. Inf. Comput. Sci., 41:1002–1005, 2001.
[19] D. Knuth. The Art of Computer Programming. Addison Wesley, 1998.
[20] D. Livingstone. Data Analysis for Chemists. Oxford University Press, 1995.
[21] J. Gasteiger and J. Zupan. Neural Networks for Chemists. Wiley-VCH, 1999.
[22] S. F. Arnold. Mathematical Statistics. Prentice Hall, 1990.
[23] M. Kendall, J. Ord, and A. Stuart. The Advanced Theory of Statistics. Charles Griffin & Company, 1979.
[24] W. Mendenhall and T. Sincich. Statistics for the Engineering and Computer Sciences. Collier Macmillan Publishers, 1989.
[25] A. Afifi and S. Azen. Statistical Analysis: A Computer Oriented Approach. Academic Press, 1979.
[26] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, 1984.
[27] T. Kohonen. The self-organizing map. Proc. IEEE, 78:1464–1480, 1990.
[28] L. Fausett. Fundamentals of Neural Networks. Prentice Hall, 1994.
[29] F. Farata and R. Walker. A study of the application of Kohonen-type neural networks to the travelling salesman problem. Biol. Cyb., 64:463–468, 1991.
[30] M. Benaïm, J. C. Fort, and G. Pagès. Convergence of the one-dimensional Kohonen algorithm. Adv. in Appl. Probab., 30:850–869, 1998.
[31] E. Erwin, K. Obermayer, and K. Schulten. Self-organizing maps: ordering, convergence properties and energy functions. Biol. Cyb., 67:47–55, 1992.
[32] R. Horowitz and L. Alvarez. Convergence properties of self-organizing neural networks. In Proceedings of the 1995 American Control Conference (IEEE Cat. No. 95CH35736), volume 2, pages 1339–44, Evanston, IL, USA, 1995. American Autom. Control Council.
[33] M. Cottrell, J. C. Fort, and G. Pagès. Two or three things that we know about the Kohonen algorithm. In M. Verleysen, editor, Proc. ESANN’94, European Symp. on Artificial Neural Networks, pages 235–244, Brussels, Belgium, 1994. D facto conference services.
[34] T. Kohonen. Improved versions of learning vector quantization. In Proc. Int. Joint Conf. Neural Networks, pages 545–550, 1990.
[35] E. Kosmatopoulos and M. Christodoulou. Convergence properties of a class of learning vector quantization algorithms. IEEE Transactions on Image Processing, 5:361–368, 1996.
[36] R. Hertzberg and A. Pope. High-throughput-screening: New technology for the 21st century. Curr. Opinion Chem. Biol., 4:445–451, 2000.
[37] W. Walters, M. Stahl, and M. Murcko. High-throughput virtual chemistry. In P. von Ragué Schleyer, editor, Encyclopedia of Computational Chemistry, Vol. 2, pages 1225–1237. Wiley, 1998.
[38] D. Boyd. Drug design. In P. von Ragué Schleyer, editor, Encyclopedia of Computational Chemistry, Vol. 1, pages 795–804. Wiley, 1998.
[39] P. Jurs. Quantitative structure-property relationships (QSPR). In P. von Ragué Schleyer, editor, Encyclopedia of Computational Chemistry, Vol. 4, pages 2320–2330. Wiley, 1998.
[40] R. Todeschini and V. Consonni. Handbook of Molecular Descriptors. Wiley, 2000.
[41] R. Brown and Y. Martin. Use of structure-activity data to compare structure-based clustering methods and descriptors for use in compound selection. J. Chem. Inf. Comput. Sci., 36:572–584, 1996.
[42] R. Brown and Y. Martin. The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding. J. Chem. Inf. Comput. Sci., 37:1–9, 1997.
[43] N. Trinajstić. Chemical Graph Theory. CRC Press, Boca Raton, 1992.
[44] D. Bonchev and D. Rouvray, editors. Chemical Graph Theory. Abacus Press/Gordon & Breach, 1991.
[45] J. Koča, M. Kratochvíl, V. Kvasnička, L. Matyska, and J. Pospíchal. Synthon Model of Organic Chemistry and Synthesis Design. Springer, 1989.
[46] M. Randić. Topological indices. In P. von Ragué Schleyer, editor, Encyclopedia of Computational Chemistry, Vol. 5, pages 3018–3022. Wiley, 1998.
[47] S. Basak, B. Gute, and G. Grunwald. Use of topostructural, topochemical, and geometric parameters in the prediction of vapor pressure. J. Chem. Inf. Comput. Sci., 39:255–260, 1999.
[48] I. Gutman. A formula for the Wiener number of trees and its extension to graphs containing cycles. Graph Theory Notes, 27:9–15, 1994.
[49] A. Dobrynin and I. Gutman. Solving a problem connected with distances in graphs. Graph Theory Notes, 28:21–23, 1995.
[50] I. Gutman, S. Klavžar, and A. Rajapakse. The Szeged and the Wiener index of graphs. Appl. Math. Lett., 9:45–49, 1996.
[51] I. Gutman and N. Trinajstić. Graph theory and molecular orbitals. Total π-electron energy of alternant hydrocarbons. Chem. Phys. Lett., 17:535–538, 1972.
[52] I. Gutman, B. Ruščić, N. Trinajstić, and C. Wilcox. Graph theory and molecular orbitals. XII. Acyclic polyenes. J. Chem. Phys., 62:3399–3405, 1975.
[53] J. Devillers and A. Balaban, editors. Topological Indices and Related Descriptors in QSAR and QSPR. Gordon & Breach, Amsterdam, 1999.
[54] M. Karelson. Molecular Descriptors in QSAR/QSPR. Wiley, New York, 2000.
[55] A. Balaban, editor. From Chemical Topology to Three-Dimensional Geometry. Plenum, New York, 1997.
[56] S. Nikolić, N. Trinajstić, I. Tolić, G. Rücker, and C. Rücker. On molecular complexity indices. In D. Bonchev and D. Rouvray, editors, Complexity - Introduction and Fundamentals, pages 29–89. CRC Press, 2003.
[57] M. Randić. On characterization of molecular branching. J. Am. Chem. Soc., 97:6609–6615, 1975.
[58] M. Randić, P. Hansen, and P. Jurs. Search for useful graph theoretical invariants of molecular structure. J. Chem. Inf. Comput. Sci., 28:60–68, 1988.
[59] M. Diudea, editor. QSAR/QSPR Studies by Molecular Descriptors. Nova, New York, 2001.
[60] A. Katritzky and E. Gordeeva. Traditional topological indices vs. electronic, geometrical, and combined molecular descriptors in QSAR/QSPR research. J. Chem. Inf. Comput. Sci., 33:835–857, 1993.
[61] D. Morales and O. Araujo. On the search for the best correlation between graph theoretical invariants and physicochemical properties. J. Math. Chem., 13:95–106, 1993.
[62] L. Clark and J. Moon. On the general Randić index for certain families of trees. Ars Comb., 54, 1999.
[63] D. Vukičević and N. Trinajstić. Modified Zagreb M2 index - comparison with the Randić connectivity index for benzenoid systems. Croatica Chemica Acta, 76(2):183–187, 2003.
[64] S. Nikolić, G. Kovačević, A. Miličević, and N. Trinajstić. The Zagreb indices 30 years after. Croatica Chemica Acta, 76(2):113–124, 2003.
[65] I. Gutman and M. Lepović. Choosing the exponent in the definition of the connectivity index. J. Serb. Chem. Soc., 66(9):605–611, 2001.
[66] L. Kier and L. Hall. Molecular Connectivity in Chemistry and Drug Research. Academic Press, 1976.
[67] L. Kier and L. Hall. Molecular Connectivity in Structure Activity Analysis. Wiley, 1986.
[68] G. Moreau and P. Broto. Autocorrelation of a topological structure: A new molecular descriptor. Nouv. J. Chim., 4:359–360, 1980.
[69] J. Devillers, D. Domine, and W. Karcher. Estimating n-octanol/water partition coefficients from the autocorrelation method. SAR QSAR Environ. Res., 33:301–306, 1995.
[70] J. Devillers, D. Domine, C. Guillon, S. Bintein, and W. Karcher. Prediction of partition coefficients using autocorrelation descriptors. SAR QSAR Environ. Res., 7:151–172, 1997.
[71] J. Devillers and D. Domine. Comparison of reliability of log p values calculated from a group contribution approach and from the autocorrelation method. SAR QSAR Environ. Res., 7:195–232, 1997.
[72] M. Wagener, J. Sadowski, and J. Gasteiger. Autocorrelation of molecular properties for modelling corticosteroid binding globulin and cytosolic Ah receptor activity by neural networks. J. Am. Chem. Soc., 117:7769–7775, 1995.
[73] H. Bauknecht, A. Zell, H. Bayer, P. Levi, M. Wagener, J. Sadowski, and J. Gasteiger. Locating biologically active compounds in medium-sized heterogeneous datasets by topological autocorrelation vectors: Dopamine and benzodiazepine agonists. J. Chem. Inf. Comput. Sci., 36:1205–1213, 1996.
[74] J. Devillers. Autocorrelation descriptors for modeling (eco)toxicological endpoints. In J. Devillers and A. Balaban, editors, Topological Indices and Related Descriptors in QSAR and QSPR, pages 595–612. Gordon & Breach, Amsterdam, 1999.
[75] J. Gasteiger and T. Engel, editors. Chemoinformatics. Wiley-VCH, 2003.
[76] M. Johnson and G. Maggiora, editors. Concepts and Applications of Molecular Similarity. Wiley, 1987.
[77] A. Leach and V. Gillet. An Introduction to Chemoinformatics. Kluwer Academic Publishers, 2003.
[78] K. Kovačević, D. Plavšić, N. Trinajstić, and D. Horvat. On the intercorrelation of topological indices. Stud. Phys. and Theoret. Chem., 63:213–224, 1989.
[79] N. Trinajstić, S. Nikolić, S. Basak, and I. Lukovits. Distance indices and their hyper-counterparts: Intercorrelation and use in the structure-property modeling. SAR QSAR Environ. Res., 12(1-2):31–54, 2001.
[80] S. Basak, B. Gute, and A. Balaban. Interrelationship of major topological indices evidenced by clustering. Croatica Chemica Acta, 77:331–344, 2004.
[81] S. Basak, V. Magnuson, G. Niemi, R. Regal, and G. Veith. Topological indices: Their nature, mutual relatedness, and applications. Math. Model., 8:300–305, 1987.
[82] I. Motoc, A. Balaban, O. Mekenyan, and D. Bonchev. Topological indices: Inter-relations and composition. MATCH Commun. Math. Comput. Chem., 13:369–404, 1982.
[83] A. Madansky. Prescriptions for Working Statisticians. Springer, 1988.
[84] M. Randić. Resolution of ambiguities in structure-property studies by use of orthogonal descriptors. J. Chem. Inf. Comput. Sci., 31:311–320, 1991.
[85] National Cancer Institute. Connection tables for 127k structures. ftp://helix.nih.gov/ncidata/2D/nciopen.mol.Z.
[86] M. Randić. Generalized molecular descriptors. J. Math. Chem., 7:155–168, 1991.
Associated website: www.driesen-antiquariat.de