Chris Hillman et al on Frieden Chris Hillman wrote: >On Wed, 10 Mar 1999, Erik Van Nimwegen wrote: > >> > B. Roy Frieden's work came up recently in sci.physics.research, and I was >> > able to download several of his papers on this subject from the Physical >> > Review on-line archive >> > >> > http://prola.aps.org/search.html >> > >> > Unfortunately, I haven't had a chance to read them, but they do look >> > interesting! >> >> I read through two of them and unless I'm missing something subtle it >> seems like a lot of noise about very little. > >Oh dear. Well, he certainly makes some very strong claims. > >> As far as I could tell the whole thing boils down to the observation >> that Fisher information is equal to the square of a derivative, >> integrated over a manifold and that kinetic energy terms appearing in >> actions in physics are also integrals over the square of a derivative. > >I agree, that is hardly news! And so well known it is unlikely to help >physicists come up with new laws of physics. > >> One of the things that put me off completely is that at a certain point >> he derives the free particle action by *assuming* translation >> invariance.. He goes then on to say how momentum conservation magically >> comes out of this action. This to me shows that Frieden has never heard >> of Noether's theorems. Anybody that is making the kind of strong claims >> that Frieden makes without knowing such essential things is "out" in my >> opinion. > >Hmmm.... that doesn't sound good. > >> Also, I didn't see any of the more interesting actions derived in his >> papers (such as the ones that describe electromagnetism or general >> relativity). > >I -still- haven't gotten around to reading his papaers, arghgh, but I did >glance at a table in one and noticed that omission too, although he does >include the Lagrangian formulation for geodesics. 
> >> There's a lot of gobbledygook about nature playing some sort of >> information game with observers where each of them are winning or losing >> information and nature tries to minimize the information that it loses. >> I couldn't make any sense of it I'm afraid. > >[snip] > >> If reading the papers leads you to any of such insights, let me know. I >> think that it WOULD be interesting if general physical theories could be >> cast in an information theoretic frame work (especially quantum >> mechanics), but I found Frieden's papers very disappointing.. bordering >> on crackpotish I would say. > >Wow, I guess I really better try to actually read some of his papers so I >can report a first hand impression. Maybe this weekend. > >> > For instance, Fisher > ^^^^^^ > >Arghgh, I meant Roy Frieden, not Ronald Fisher! > >> > claims that his method shows why the laws of physics take the form of >> > -differential equations- of a particular form (with a squared gradient >> > term). Heady stuff! :-) >> >> I don't think he shows any such thing. Obviously, for an action over a >> continuous manifold the Euler-Lagrange equations are differential >> equations. >> >> > Readers of this n.g. should be interested to know that the Fisher >> > information and related ideas can be expressed in terms of a connection >> > and curvature (as in a generalization of Riemannian geometry); this is >> > sometimes called "information geometry", and there are at least two books >> > on this subject. In general, my impression is that all these ideas suffer >> > from the drawback that they assume some sort of parametrized family of >> > possible probability distributions (e.g. Gaussians) from which you want to >> > pick the parameter values "most consistent with the data". >> >> This is interesting stuff. I believe that there are versions where the >> space of distributions is something like a Hilbert space, including >> "all" possible distributions. 
But I know much less about this field than >> I would like to. > >It's been a -loooong- time since I looked at any of this stuff, and >unfortunately I can't come up with the authors or titles of the books just >now. All I remember is that this field is also called "statistical >manifolds" and that one of the books is in English but by a Japanese >author. I just skimmed the book so you may well be right about the scope >of his inquiry being broader than I appreciated. > >> > In >> > addition, as Frieden knows, the Fisher information has a well known >> > connection with a generalization of the Heisenberg uncertainty principle, >> > via the Cramer-Rao inequality. > >Hmmm... now I can't find a reference for this "well-known connection" :-( >but read on... > >> This seems to be one of the things Frieden got hooked on. However, to me >> it seems that they are hardly related at all. The Heisenberg uncertainty >> principle states a feature of nature: One cannot measure this and that >> quantity, or such and such quantities to arbitrary precision. One could >> in principle imagine that position and momentum did NOT obey an >> uncertainty relation. > >Actually, no--- that's just my point! This relationship actually is a >pretty trivial fact about operators on a Hilbert space. Any operators >which fail to commute must obey some sort of uncertainty relationship! My >formal physics background is nil, so don't assume that I really know what >I'm talking about :-( But for what it's worth (in the following, my math >is probably OK; it's the physical interpretation I might have messed up): > >Begin with a (simplified!) 
dictionary:
>
>  Quantum Physics                      Your Favorite Hilbert Space, X
>                                       (abstractly speaking, all
>                                       separable Hilbert spaces are
>                                       "the same")
>
>  quantum state (perhaps "mixed")      x in X, taken up to complex scalar
>                                       multiple, so set ||x|| = 1
>
>  physical "quantity" s                Hermitian operator S on X
>
>  allowed values for s                 eigenvalues of S
>                                       (real numbers)
>
>  "pure" state for s (measuring s      eigenfunction x_j for S, up to
>  while in this state gives            complex scalar multiple, so set
>  corresponding eigenvalue s_j)        ||x_j|| = 1
>
>  decompositions into sum of           x = sum_j (x, x_j) x_j
>  pure states for s                    Sx = sum_j s_j (x, x_j) x_j
>                                       (case of a compact operator S)
>
>  probability measuring s while        squared cosine of angle btwn
>  in state x gives value s_j           x and x_j = |(x, x_j)|^2
>
>  mean value of s when measured        sum_j s_j |(x, x_j)|^2
>  in state x                             = (x, Sx)
>                                         = (cos angle) ||Sx||
>                                       (a real number)
>
>  variation from mean value of s       || Px ||^2, where
>  when measured in state x             P = S - (x, Sx) I
>
>  standard deviation of measurements   || Px ||
>  of s while in state x
>
>Before everyone complains that I've dropped all kinds of factors, note
>that I have -normalized- the state and eigenstates, ||x|| = ||x_j|| = 1.
>This can get a bit tricky, since then Sx may not be normalized.
>
>Also, in this dictionary I am pretending all our Hermitian operators are
>compact; for the "real" uncertainty principle this isn't true, which
>complicates things, but this can be fixed up. I'm just trying to get
>across the idea that the HUP is an immediate consequence of a simple fact
>about Hermitian operators on Hilbert spaces.
>
>Folklore Theorem: Let X be a complex Hilbert space. Let T,S be Hermitian
>operators. Given x in X with ||x|| = 1, define new (Hermitian) operators
>P,Q by
>
>  P = S - (x, Sx) I
>
>  Q = T - (x, Tx) I
>
>Then
>
>  || Px || || Qx || >= 1/2 | (x, [S,T] x) |
>
>Proof: Observe that [P,Q] = [S,T].
Then
>
>  | (x, [S,T]x ) | = | (x, [P,Q]x) |              (since [S,T] = [P,Q])
>                   = | (x, PQx) - (x, QPx) |
>                   = | (Px, Qx) - (Qx, Px) |      (P, Q are Hermitian)
>                   = 2 | Im (Px, Qx) |
>                  <= 2 | (Px, Qx) |
>                  <= 2 || Px || || Qx ||          (Cauchy-Schwarz)
>Q.E.D.
>
>Corollary: whenever [S,T] = k I, we must have an inequality of the form
>
>  (standard deviation of s) (standard deviation of t)
>  ( measured in state x   ) ( measured in state x   )  >= |k|/2
>
>On the other hand, whenever S,T commute, there is no such inequality.
>
>Note: no analysis in sight! I just used essentially algebraic facts about
>Hilbert spaces--- the analysis comes in when you try to realize X as the
>space of square integrable functions on some other space.
>
>Actually, I've seen many generalizations of the uncertainty principle. For
>one which is quite a bit less abstract, and more analytical, than the
>"algebraic" version offered above, see problem 32 on p. 189 of Folland,
>Real Analysis: Modern Techniques and Their Applications, Wiley, 1984.
>
>> The Cramer-Rao inequality on the other hand, says
>> something about how accurately one can estimate some parameter in a family
>> of probability distributions, given "random samples" from the
>> distribution. What do these two things have to do with each other, other
>> than that their mathematical formulations look similar?
>> I'd be interested to know though, if there is some deeper connection.
>
>One version of the Cramer-Rao inequality says that the variance of an
>unbiased estimator is greater than or equal to the reciprocal of the
>Fisher information (variance of a certain logarithmic derivative). Since
>I can't find a reference to the "well-known connection" right now, I guess
>I'll have to try to work it out myself...
>
>More later, I hope! :-)
>
>Chris Hillman
>
**
rossini@biostat.washington.edu (A.J. Rossini) wrote:
>>>>>> "CH" == Chris Hillman writes:
>
>  CH> On Wed, 10 Mar 1999, Erik Van Nimwegen wrote:
>
>  >> > Readers of this n.g. 
should be interested to know that the > >> > Fisher information and related ideas can be expressed in > >> > terms of a connection and curvature (as in a generalization > >> > of Riemannian geometry); this is sometimes called > >> > "information geometry", and there are at least two books on > >> > this subject. In general, my impression is that all these > >> > ideas suffer from the drawback that they assume some sort of > >> > parametrized family of possible probability distributions > >> > (e.g. Gaussians) from which you want to pick the parameter > >> > values "most consistent with the data". > >> > >> This is interesting stuff. I believe that there are versions > >> where the space of distributions is something like a Hilbert > >> space, including "all" possible distributions. But I know much > >> less about this field than I would like to. > > CH> It's been a -loooong- time since I looked at any of this > CH> stuff, and unfortunately I can't come up with the authors or > CH> titles of the books just now. All I remember is that this > CH> field is also called "statistical manifolds" and that one of > CH> the books is in English but by a Japanese author. I just > CH> skimmed the book so you may well be right about the scope of > CH> his inquiry being broader than I appreciated. > >You are probably thinking of Amari. There are other books on the link >between differential geometry and statistics. Note that the fisher >information is the Reimannian metric on a manifold describing a >probability model. Amari showed that a non-torsion free class of >metrics provides a number of different bounds for other (non-maximum >likelihood) optimization criteria for fitting statistical models. > > >> > In addition, as Frieden knows, the Fisher information has a > >> > well known connection with a generalization of the Heisenberg > >> > uncertainty principle, via the Cramer-Rao inequality. > > CH> Hmmm... 
now I can't find a reference for this "well-known > CH> connection" :-( but read on... > >any references for this connection? > >best, >-tony > >-- >A.J. Rossini >UW Biostatistics & Center for AIDS Research >206-543-1044 / 206-720-4282 >rossini@biostat.washington.edu >http://www.biostat.washington.edu/~rossini/ ** Erik Van Nimwegen wrote: >Chris Hillman wrote: >> > >> It's been a -loooong- time since I looked at any of this stuff, and >> unfortunately I can't come up with the authors or titles of the books just >> now. All I remember is that this field is also called "statistical >> manifolds" and that one of the books is in English but by a Japanese >> author. I just skimmed the book so you may well be right about the scope >> of his inquiry being broader than I appreciated. > >His name is Amari. I can't say I did much more than skim it either. > >> >> > This seems to be one of the things Frieden got hooked on. However, to me >> > it seems that they are hardly related at all. The Heisenberg uncertainty >> > principle states a feature of nature: One cannot measure this and that >> > quantity, or such and such quantities to arbitrary precision. One could >> > in principle imagine that position and momentum did NOT obey an >> > uncertainty relation. >> >> Actually, no--- that's just my point! This relationship actually is a >> pretty trivial fact about operators on a Hilbert space. Any operators >> which fail to commute must obey some sort of uncertainty relationship! My >> formal physics background is nil, so don't assume that I really know what >> I'm talking about :-( But for what it's worth (in the following, my math >> is probably OK; it's the physical interpretation I might have messed up): > >Of course you're right. But the essential point is that in quantum >mechanics the non-commuting of position and momentum is a POSTULATE. > > > >> Q.E.D. 
>>
>> Corollary: whenever [S,T] = k I, we must have an inequality of the form
>>
>>  (standard deviation of s) (standard deviation of t)
>>  ( measured in state x   ) ( measured in state x   )  >= |k|/2
>>
>> On the other hand, whenever S,T commute, there is no such inequality.
>
>yes yes and yes. However, who is to decide what operators (representing
>physical quantities) commute and which ones don't? Again, quantum
>mechanics is essentially built on assuming that operators for position
>and momentum don't commute but that their commutator gives -i hbar I
>(hope I have the sign right).
>
>I have been aware, for some time, of the following (admittedly rather
>vague) fact:
>Assume that the position and momentum of a particle are both given by a
>probability distribution. Assume further that the momentum probability
>distribution is the Fourier transform of the position probability
>distribution and vice versa. Then one automatically has "uncertainty
>relations". For instance, a delta-peak in momentum gives a flat
>distribution in position and vice versa. Gaussian distributions minimize
>the product of uncertainty in position and momentum.
>Somebody once tried to convince me that THIS is the essential reason for
>the uncertainty relations and that one doesn't need to postulate them.
>However, I wasn't convinced. What is the compelling argument for
>assuming that position and momentum distributions are Fourier transforms
>of each other? And where the hell does Planck's constant come in?
>
>Anyway.. I still believe that there might be something deeper to it.
>Maybe the above may help in thinking about it (I realize that it's just
>a more specific example of the general argument: noncommuting operators
>on a Hilbert space -> uncertainty relations)
>
>Regards,
>
>Erik
**
Chris Hillman wrote:
>
>On 11 Mar 1999, A.J. Rossini wrote:
>
>> You are probably thinking of Amari. 
> >Yes, thanks, now I can find the books I had in mind: > > Differential geometry in statistical inference > S.-I. Amari, ed. > Hayward, Calif. : Institute of Mathematical Statistics, 1987. > > Amari, Shunichi. > Differential-geometrical methods in statistics > Springer-Verlag, 1985. > >This should also be relevant: > > Michael K. Murray and John W. Rice. > Differential geometry and statistics > Chapman & Hall, 1993 > >> Note that the fisher information is the Reimannian metric on a >> manifold describing a probability model. Amari showed that a >> non-torsion free class of metrics provides a number of different >> bounds for other (non-maximum likelihood) optimization criteria for >> fitting statistical models. > >As I recall, points in this manifold represent a specific type of >probability distribution with particular values of the parameters. >Continuously changing the parameters corresponds to moving on a curve >through the manifold. Then various quantities which come up in problem of >finding the specific parameter values which provide the best fit to a >given data set can be identified with purely geometrical concepts like the >curvature. > >The point I was trying to emphasize is that IIRC, you do assume some type >of general statistical model (with a finite number of parameters) at the >outset. If one were limited to the usual finite parameter families >treated in probability theory, this would stand in contrast to >parameter-free methods such as maximal entropy techniques. However, >infinite dimensional Riemannian manifolds also make sense and apparently I >may overlooked the fact that Amari and his followers have also >investigated such manifolds. This might allow them to consider such >general families of probability distributions that their methods, while >not exactly parameter free, would be somewhat immunized against the >objection I had in mind. > >> CH> Hmmm... now I can't find a reference for this "well-known >> CH> connection" :-( but read on... 
>> >> any references for this connection? > >That's the problem, I can't find any right now. Nor can I find any >references for the folklore theorem I mentioned either (I think I read >about this in some long-forgotten book--- maybe the monumental functional >analysis book by Yosida?) and actually spent quite a bit of time coming up >with the citation to the problem in Folland's textbook :-/ Nonetheless, I >think this is all "well-known" to the appropriate mathematicians, I just >can't think who those people might be right now :-( > >Chris Hillman > ** rossini@biostat.washington.edu (A.J. Rossini) wrote: >>>>>> "CH" == Chris Hillman writes: > > CH> On 11 Mar 1999, A.J. Rossini wrote: > > >> You are probably thinking of Amari. > > CH> Yes, thanks, now I can find the books I had in mind: > > CH> Differential geometry in statistical inference S.-I. Amari, > CH> ed. Hayward, Calif. : Institute of Mathematical > CH> Statistics, 1987. > > CH> Amari, Shunichi. Differential-geometrical methods in > CH> statistics Springer-Verlag, 1985. > >These are what I was thinking of. > > CH> This should also be relevant: > > CH> Michael K. Murray and John W. Rice. Differential geometry > CH> and statistics Chapman & Hall, 1993 > >Depends on who you are :-). It's tech transfer from statistical >theory to mathematical theory, unlike Amari's work (and there are >related works by Efron, Lauritzen, etc -- add to your first list, a >book on statistics and differential geometry published as the results >of a conference sponsored by the Navy in the late 80s/early 90s). > > >> Note that the fisher information is the Reimannian metric on a > >> manifold describing a probability model. Amari showed that a > >> non-torsion free class of metrics provides a number of > >> different bounds for other (non-maximum likelihood) > >> optimization criteria for fitting statistical models. 
> > CH> As I recall, points in this manifold represent a specific type > CH> of probability distribution with particular values of the > CH> parameters. Continuously changing the parameters corresponds > CH> to moving on a curve through the manifold. Then various > CH> quantities which come up in problem of finding the specific > CH> parameter values which provide the best fit to a given data > CH> set can be identified with purely geometrical concepts like > CH> the curvature. > >Correct. > > CH> The point I was trying to emphasize is that IIRC, you do > CH> assume some type of general statistical model (with a finite > CH> number of parameters) at the outset. If one were limited to > CH> the usual finite parameter families treated in probability > CH> theory, this would stand in contrast to parameter-free methods > CH> such as maximal entropy techniques. However, infinite > CH> dimensional Riemannian manifolds also make sense and > CH> apparently I may overlooked the fact that Amari and his > CH> followers have also investigated such manifolds. This might > CH> allow them to consider such general families of probability > CH> distributions that their methods, while not exactly parameter > CH> free, would be somewhat immunized against the objection I had > CH> in mind. > >Amari had (looked at the infinite-dim'l parameter case), but it's >wrong (this was first mentioned to me by Paul Vos, a (bio)statistician >who knows a good bit more about this type of stuff). It also was on a >specialized model, applicable more to those who study asymptotic >theory for statistical estimators. The key problem was the assumption >that parallel-transport held in the particular infinite-dimensional >parameterization. Not clear at all... > >However, if you look at the work done by people like Wellner (at UW >:-), etc on semiparametric models, there are striking similarities. 
>In particular, it appears that one could construct an information >theory for semiparametric models (say, a finite parameterizaton >describes the first or second moment (mean/variance), and you don't >care about the error/variability structure, or you parameterize the >difference in odds, but not the underlying odds structure. > >While there is a reasonable geometric theory (metrics, almost, if I >recall correctly) for differential manifold-like structures in Hilbert >spaces (which ought to carry into the semiparametric problem), the >topology needed for convergence theory (and nice probabilistic >behaviour) requires one to admit a Hadamard-space (I can't recall the >exact definition, but its a bit weaker, admits a psuedo-metric, and is >a bit more difficult to work with. In particular, constructing a >theory similar to that of the differential manifolds is apparently >more difficult, but might've been done? > >_That_ quickly got beyond my abilities (or at least abilities+free >time :-) a few years ago, and I've not looked at it recently. > >best, >-tony > >-- >A.J. Rossini >UW Biostatistics & Center for AIDS Research >206-543-1044 / 206-720-4282 >rossini@biostat.washington.edu >http://www.biostat.washington.edu/~rossini/ ** Chris Hillman wrote: > > >On 12 Mar 1999, A.J. Rossini wrote: > >> However, if you look at the work done by people like Wellner (at UW >> :-), etc on semiparametric models, there are striking similarities. >> In particular, it appears that one could construct an information >> theory for semiparametric models (say, a finite parameterizaton >> describes the first or second moment (mean/variance), and you don't >> care about the error/variability structure, or you parameterize the >> difference in odds, but not the underlying odds structure. 
>> >> While there is a reasonable geometric theory (metrics, almost, if I >> recall correctly) for differential manifold-like structures in Hilbert >> spaces (which ought to carry into the semiparametric problem), the >> topology needed for convergence theory (and nice probabilistic >> behaviour) requires one to admit a Hadamard-space (I can't recall the >> exact definition, but its a bit weaker, admits a psuedo-metric, and is >> a bit more difficult to work with. In particular, constructing a >> theory similar to that of the differential manifolds is apparently >> more difficult, but might've been done? > >That sounds quite interesting! I don't know very much about what tools >are available here, but the first book I'd check is > >Author: Warner, Frank W. (Frank Wilson), 1938-. >Title: Foundations of differentiable manifolds and Lie groups / >Pub. Info.: New York : Springer, c1983. >LC Subject: Differentiable-manifolds. > Lie-groups. >Status: Mathematics Research General Stacks > QA614.3 .W37 DUE 03-19-99 > Mathematics Research Reserve > QA614.3 .W37 CHECK THE SHELVES 4 HOUR RESERVE > >which I think treats infinite dimensional manifolds, presumably including >connections, from the outset. (As you probably know, the Math Library is >in Padelford Hall, just North of the Hall Health, which is right across >from the HUB. Readers at other universities should be able to obtain this >book also, since it is pretty standard.) > >Unfortunately, time is always a constraint :-( > >Chris Hillman ** daizadeh@FAS.HARVARD.EDU (Iraj Daizadeh) wrote: > >Hello. > >Now, we have to be careful...Just because one is able to derive a first >order in time (second in space) d.e. from a primitive looking function >does not mean that this function is a superset of Q.M..Further, just >because we derive an equation that is qualitatively similar to the >Bohm formulism does not mean we have reinterpreted Q.M. in a new >informational form; This is ridiculous. 
> >Indeed, one needs to have a physical problem >that can only be addressed rigorously via an information function to >really say anything. >If we cannot do this, then we are in the realm of philosophy, >which is NOT what we want. I do not want to address problems in >``consciousness'' or whatever. I just noticed something slightly >interesting whose formulism I thought I could steal for my own >calculations. The problems I am trying to solve are not theoretical they >are phenomenological as is most, if not all, of the formulisms utilized in >Biology; simply due to the number of degrees of freedom that are required >for a rigorous treatment of biological processes. > >Best, Iraj. > > >Iraj Daizadeh, Ph. D. >Harvard University >Department of Cellular and Molecular Biology >The Biological Laboratories, Box #140 >16 Divinity Avenue >Cambridge, MA 02138 >Phone: (617) 495-0783 > (617) 495-0560 >Fax: (617) 496-4313 >Email: daizadeh@fas.harvard.edu > In article <7h73pp$l77$2@pravda.ucr.edu>, baez@galaxy.ucr.edu (john baez) wrote: > In article <01be9aec$b623a020$2897cfa0@sj816bt720500>, > Philip Wort wrote: > > >I find this book is very difficult to follow. It is full of forward > >references and has a confusing section structure ... anyway any help > >would be appreciated, since it seems that there is something interesting > >here beneath the confusion (perhaps just mine). > > I wish I could help you, but I can't. When people started talking about > Friedan's work I went down to the library to look it up, but I couldn't > make much sense out of his papers. I'd hoped his book would be easier > to read, but from what you say it sounds like maybe not.... > > Does *anyone* understand this stuff? If so, could they explain it? > > I don't even understand what "Fisher information" really is or why > people (not just Friedan) are interested in it. I'll take a shot at this, having studied parameter estimation a little. 
Consider the problem of estimating a parameter vector, A, from a vector
of random variables, R, whose probability density depends on the
parameter vector. (A is a parameter because it's not a random variable
with a known probability density.) That is, we know the conditional
density, p(R|A), and want to estimate A given R. One common way to do
this is maximum likelihood estimation (choose the A that maximizes
p(R|A)), but other methods are possible.

The question arises, how good is my estimation scheme? If this were a
Bayesian problem, I could tell you, because I can get p(A|R) from p(R|A)
and p(A). However, in this kind of problem, there is no prior
information p(A). So that doesn't work.

Instead, there is a bound on how well we can do. Suppose that we come up
with an unbiased estimator, a(R). This just means that E(a(R)-A)=0, for
all A. Then the covariance of the estimation error satisfies

   covariance(a(R)-A) >= J^{-1}

where J is the Fisher information matrix and ">=" is meant in the matrix
sense (the left side minus the right side is positive semidefinite). The
bound above is known as the Cramer-Rao inequality. When the equality
holds, the estimate is called "efficient". An efficient estimate does
not always exist, but when it does, the estimator that does this is the
maximum likelihood estimator.

The Fisher information matrix is given by

   J_{ij} = E[(d log[p(R|A)]/d A_{i})(d log[p(R|A)]/d A_{j})]
          = -E[(d^{2} log[p(R|A)]/d A_{i} d A_{j})]

One of my pet peeves is that researchers keep coming up with new, ad hoc
estimation schemes, but few bother to compare their results to the
Cramer-Rao inequality to see if the idea is a good one. Of course, any
unbiased estimator will be at best as good as the bound, but some are
much worse!

In any event, I have no idea how this relates to physics, so maybe I
haven't been much help.
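
As a rough illustration of the bound described above, here is a minimal
Monte Carlo sketch for the simplest case I could think of. It is not from
any of the posts; the model and all names are assumptions made for the
example. R is taken to be n independent Gaussian samples with unknown
mean A and known standard deviation sigma, so the Fisher information is
the scalar J = n / sigma^2 and the Cramer-Rao bound on the error variance
is sigma^2 / n. The function names (fisher_information_gaussian_mean,
error_variance, demo) and the default numbers are hypothetical. The
sketch compares two unbiased estimators of A, the sample mean (which is
the maximum likelihood estimate here) and the sample median.

```python
import random
import statistics


def fisher_information_gaussian_mean(n, sigma):
    """Fisher information about the mean in n i.i.d. N(A, sigma^2) samples.

    Assumed model for this sketch: for one sample, d log p / dA equals
    (R - A) / sigma^2, whose variance is 1 / sigma^2; independent samples add.
    """
    return n / sigma**2


def error_variance(estimator, true_mean, sigma, n, trials, rng):
    """Monte Carlo estimate of E[(a(R) - A)^2] for an estimator a(R).

    For an unbiased estimator this mean squared error is the variance
    of the estimation error.
    """
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        err = estimator(sample) - true_mean
        total += err * err
    return total / trials


def demo(true_mean=1.0, sigma=2.0, n=25, trials=4000, seed=0):
    """Return (Cramer-Rao bound, variance of sample mean, variance of median)."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    bound = 1.0 / fisher_information_gaussian_mean(n, sigma)
    var_mean = error_variance(statistics.fmean, true_mean, sigma, n, trials, rng)
    var_median = error_variance(statistics.median, true_mean, sigma, n, trials, rng)
    return bound, var_mean, var_median


if __name__ == "__main__":
    bound, var_mean, var_median = demo()
    print(f"Cramer-Rao bound 1/J     : {bound:.4f}")
    print(f"variance of sample mean  : {var_mean:.4f}")
    print(f"variance of sample median: {var_median:.4f}")
```

Under these assumptions the sample mean should land close to the bound
(it is the efficient estimator for this model), while the median, also
unbiased, should come out noticeably worse. That gap is the kind of
comparison the pet peeve above is asking for.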