Galaxy Zoo Starburst Talk

# How to derive physically real error bars for fractions near 0 and 1?

• Those are four plots, by jules, from the Mass Dependent Merger Fraction (Control vs Post-quenched Sample) thread (near the top of page 6).

I could not find, with a quick search, jules' write-up of how she derived those error bars, but it's not directly relevant to my question anyway.

In these plots, the vertical (y) axis is "merger fraction"; by definition, "fractions" here must be reals^1, with values bounded by 0 (or 0%) and 1 (or 100%). And all the estimated fractions plotted - the blue diamonds and pink squares - do lie within those bounds.

The same limits apply to error bars: all such bars must lie in [0, 1]. Yet some clearly do not. How to derive 'error bars' so that they remain physically real?

Error bars are often "1σ", one standard deviation. If the 'errors' are distributed normally, these bars indicate that there's a ~68% chance the 'true' value lies between 'the mean ±1σ'^2. But close to 0 or 1, the errors are most certainly not distributed normally, nor anywhere close to it. As with the values themselves - the means - there are hard limits (0 and 1).

What approximations to these 'hard limit' error distributions are commonly used, in astronomy?

^1 They're actually rational numbers; again, by definition. However, if they're weighted fractions, I guess sometimes they could be irrational.

^2 Yes, I know these statements are very sloppy, and mlpeck, for one, would surely hope that my understanding of this stuff is better than that!

Posted

• Made it. Which is not a given when traveling includes a transfer through O'Hare this winter.

The same limits apply to error bars: all such bars must lie in [0, 1].
Yet some clearly do not. How to derive 'error bars' so that they
remain physically real?

The simple way to approach this is to become a Bayesian. The reason it's simple is that the Beta probability distribution, which has support on [0,1], is a conjugate prior for a binomial probability. What that means specifically is that if you have a prior Beta(α,β) for the probability, flip a coin N times, and observe H heads and T = N-H tails, the posterior for the probability of heads is

Beta(α+H, β+N-H)

Now getting credible intervals for the proportion that are correctly limited to [0,1] just requires a table lookup.

A conservative vague prior is Beta(1,1), which is just a uniform distribution. Others that have been proposed are Beta(1/2,1/2) and Beta(0,0).

As a numerical example, suppose you have a prior of Beta(1,1), you flip a coin 10 times and observe no heads. Your posterior is Beta(1,11), which looks like the graph below. A 95% upper credible limit is about 0.24, so a sensible error bar might stretch from 0 to 0.24.

This example shows that a Bayesian approach gives sensible results even when the observed proportion is near 0 or 1 or with
small sample sizes. Sorry, I'm being concise to a fault, but I won't have a lot of free time for a few weeks.
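mlpeck's numbers are easy to check without R or any table lookup: for a Beta(1, β) distribution the CDF is 1 − (1 − x)^β, so the quantile inverts in closed form. A quick sketch (my own code and naming, not mlpeck's):

```python
def beta1_upper_limit(beta_param, level=0.95):
    """Upper credible limit of a Beta(1, beta_param) posterior.
    For alpha = 1 the CDF is 1 - (1 - x)**beta_param, so it inverts in closed form."""
    return 1.0 - (1.0 - level) ** (1.0 / beta_param)

# flat Beta(1,1) prior + 10 flips with 0 heads -> posterior Beta(1, 11)
upper = beta1_upper_limit(11)   # ≈ 0.238
```

For general (α, β) you'd use a numerical quantile function instead (R's `qbeta`, or an equivalent), but the α = 1 special case covers the zero-heads example here.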

Posted

• Thank you, mlpeck! 😃 😃

It'll take me a while to understand this - and considerably longer to be able to apply it, I'm sure - but it looks very cool indeed.

Would I be correct in guessing that number crunching using this approach is fully supported in R? If so, yet more motivation for me to stop procrastinating ... 😛

Oh, and glad to hear (read) that you made it OK; good thing it wasn't Atlanta 😉

Posted

• Would I be correct in guessing that number crunching using this
approach is fully supported in R? If so, yet more motivation for me to
stop procrastinating ... 😛

Yes, the graph and supporting calculations were done in R. But I'm sure those could be done in Excel as well, although I don't know if an add-on would be required.

Actually everything graphical or quantitative I've posted here has been done in R. Most if not all of it could be done in Python with appropriate extensions as well. Spreadsheets, maybe not in some cases.

Posted

• Another way to compute proper confidence intervals for a binomial distribution, which describes the uncertainties for fractions, is to use something called the Wilson score interval. There are some resources online which describe its computation (which I have written down somewhere but can't locate just now) and if I recall it's a relatively straightforward calculation.

Incidentally, this is the same approach that Reddit uses to compute the significance of its vote statistics.
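The Wilson score calculation really is straightforward; here's a minimal sketch (my own code; z = 1.96 for a ~95% interval is my choice, and the clamping to [0,1] is just floating-point caution):

```python
import math

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes in n trials (z = 1.96 -> ~95%)."""
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
    return max(0.0, centre - half), min(1.0, centre + half)

# e.g. 0 mergers out of 10 galaxies: the bar stays inside [0, 1]
low, high = wilson_interval(0, 10)   # lower pinned at (or very near) 0, upper ≈ 0.28
```

Note the interval is centred on a shrunk estimate, not on k/n itself, which is exactly why it behaves sensibly at the 0 and 1 ends.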

Posted

• Wilson score interval. There are some resources online which describe
its computation

The Jeffreys interval discussed in the article is a Bayesian credible interval with Beta(1/2,1/2) prior.

Posted

• Thank you jtmendel (and it's really good to see you are still around!) and mlpeck. 😃

If we stick to being frequentist, it seems that the Wilson score interval is likely to be sufficient for our purposes, right? If we decide to go Bayesian ...

But I'm sure those could be done in Excel as well, although I don't know if an add-on would be required. [...] Most if not all of [what I've posted here] could be done in Python with appropriate extensions as well. Spreadsheets, maybe not in some cases.

I once had a copy of Excel with add-ons/plug-ins/whatever; it was very rich in what it seemed to be able to do (not that I understood even 1% of all it had). Today, however, I use vanilla OpenOffice*; some of these can certainly be done with that tool (and also, likely, whatever Google offers for free, with Docs), some may be possible with some effort, some not practical even if possible.

Separate question: if you could somehow do some bootstrapping, on this sort of Quench data, would your estimated variances ('errors') be good enough and/or close to what you'd get using the other estimators mentioned so far?

If there's an easy-to-use bootstrap tool (Windows, free, etc), that might be an even simpler way to go. I've no doubt R has such a capability; right?

*I have Python on my Linux box, but for Quench I'm trying to stick with just what any ordinary zooite can easily get/use; specifically, what can be done if you have some flavor of Windows as your OS, and have not paid for MS Office

Posted

• Separate question: if you could somehow do some bootstrapping, on this sort of Quench data, would your estimated variances ('errors') be good enough and/or close to what you'd get using the other estimators mentioned so far?

Sorry for being so slow to get back on this, JeanTate. My experience is that jackknife uncertainty estimates are essentially identical to some sort of binomial confidence interval, as long as you allow for asymmetric uncertainties (i.e. computing percentiles of the jackknife sample distribution rather than a simple r.m.s.).

As for how to do this, I think there's an add-on for Excel that can do resampling statistics, and I wouldn't be surprised if there is something similar for OpenOffice (though I don't know). It's a very good question, actually.
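For anyone wanting to try resampling without a spreadsheet add-on, here is a minimal bootstrap-percentile sketch (my own illustration, using made-up bin counts; the jackknife variant described above is analogous):

```python
import random

def bootstrap_fraction_interval(flags, n_boot=5000, lo_pct=0.16, hi_pct=0.84, seed=42):
    """Percentile bootstrap for the fraction of 1s in a 0/1 classification list.
    Returns an asymmetric ~1-sigma-like interval, guaranteed to lie in [0, 1]."""
    rng = random.Random(seed)
    n = len(flags)
    fractions = sorted(
        sum(rng.choice(flags) for _ in range(n)) / n   # one resampled fraction
        for _ in range(n_boot)
    )
    return fractions[int(lo_pct * n_boot)], fractions[int(hi_pct * n_boot)]

# hypothetical bin: 3 "mergers" out of 30 galaxies
sample = [1] * 3 + [0] * 27
low, high = bootstrap_fraction_interval(sample)
```

Because each resampled fraction is itself in [0, 1], the percentile bar can never spill outside the physical range, and it comes out asymmetric near the ends automatically.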

Posted

• And me too, for being so tardy in thanking you.

As for how to do this, I think there's an add-on for Excel that can do resampling statistics, and I wouldn't be surprised if there is something similar for OpenOffice (though I don't know). It's a very good question, actually.

As I no longer have a copy of Excel, I can't check out the first option; maybe another zooite reading this can?

How to do this in OpenOffice? Well, it's now on my list of things to look into! 😄

Thanks again.

Posted

• Astronomers rarely use Microsoft products for analysis. I would use IDL (but that is paid software). There are probably Python packages in astropy which could do it - Brooke (vrooje) would be the best person to ask about that.

There's a paper by Cameron (? don't remember the year, maybe 2010) which discussed methods for calculating errors on fractions near 0 and 1. Did you find that? I think we cite it in the Galaxy Zoo: Bars in Disc galaxies paper.

Posted

• Thanks Karen.

For the Quench project, I intend to stick to using tools any ordinary zooite has access to (and likely familiarity with), hence things like MS Excel and Open Office; for my own research outside Quench, I am always on the lookout for (free!) tools and approaches I can fairly easily master, such as those based on Python. IDL is, obviously, not on my radar. 😦

There's a paper by Cameron (? don't remember the year, maybe 2010) which discussed methods for calculating errors on fractions near 0 and 1. Did you find that? I think we cite it in the Galaxy Zoo: Bars in Disc galaxies paper.

No, I did not find that; thanks for the pointer! 😄

Posted

• There's a paper by Cameron (? don't remember the year, maybe 2010) which discussed methods for calculating errors on fractions near 0 and 1. Did you find that? I think we cite it in the Galaxy Zoo: Bars in Disc galaxies paper.

The "Galaxy Zoo: Bars in Disc galaxies paper" is Melvin+ 2014, which cites Cameron+ 2010.

I did not find anything in Cameron+ 2010 on "methods for calculating errors on fractions near 0 and 1", perhaps it was in another paper Melvin+ 2014 cited?

In any case, in Melvin+ 2014, there's the following re error bars (formatting not copied faithfully! but I hope the equation is correct despite that):

Each point is labelled with the number of barred disc galaxies (Nbar) over the total number of disc galaxies (Ndisc) observed within the given bin. We show 1σ errors for each point (grey), with the errors calculated as follows;

σ_f = sqrt( f_bar (1 − f_bar) / N_disc )
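For concreteness, that formula in code (a sketch; the bin numbers are hypothetical, not from the thread's data):

```python
import math

def normal_approx_sigma(n_bar, n_disc):
    """1-sigma error on the fraction f = n_bar / n_disc, per the Melvin+ 2014 formula."""
    f = n_bar / n_disc
    return math.sqrt(f * (1.0 - f) / n_disc)

# hypothetical bin: 1 barred galaxy out of 10 discs
sigma = normal_approx_sigma(1, 10)   # ≈ 0.095, giving a symmetric bar f ± sigma
```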

Posted

• Digging a bit deeper, Cameron+ 2010 say this about the error bars (in their Figure 3, which is similar to Melvin+ 2014's Figure 7*):

We also note that the error bars shown in Fig. 3 reflect the Poisson uncertainties only;

And Melvin+ 2014 also refer to Poisson (re their Figure 9, which is also similar):

green tramlines which represent Poissonian errors

Hmm, where did I read about a problem with Poisson error bars? Here: Those Deceiving Error Bars, from a blog by Tommaso Dorigo (one of my fave physics blogs). The context is HEP (high energy physics), but it seems quite pertinent here. Consider this (my emphasis):

So what's the problem with Poisson error bars ? The problem is that those error bars are not representing exactly what we would want them to. A "plus-or-minus-one-sigma" error bar is one which should "cover", on average, 68% of the time the true value of the unknown quantity we have measured: 68% is the fraction of area of a Gaussian distribution contained within -1 and +1 sigma. For Gaussian-distributed random variables a 1-sigma error bar is always a sound idea, but for Poisson-distributed data it is not typically so. What's worse, we do not know what the true variance is, because we only have an estimate of it (N), while the variance is equal to the true value (m). In some cases this makes a huge difference. [...] Worse still is the fact that the Poisson distribution, for small (m<50 or so) expected counts, is not really symmetric. This causes the +-sqrt(N) bar to misrepresent the situation very badly for small N. Let us see this with an example.

He goes on:

The solution, of course, is to try and draw error bars that correspond more precisely to the 68% coverage they should be taken to mean. But how to do that ? We simply cannot: as I explained above, we observe N, but we do not know m, so we do not know the variance. Rather, we should realize that the problem is ill-posed. If we observe N, that measurement has NO uncertainty: that is what we saw, with 100% probability. Instead, we should apply a paradigm shift, and insist that the uncertainty should be drawn around the model curve we want to compare our data points to, and not around the data points!

While that may not be quite appropriate to the part of observational astronomy of interest in the Quench project, no doubt something similar is called for (Dorigo goes on to present a couple of solutions). The blog post ends with this:

Despite the soundness of the approach advocated in the paper [containing a proposed solution], though, I am willing to bet that it will be very hard to impossible to convince the HEP community to stop plotting Poisson error bars as sqrt(N) and start using central intervals around model expectations.

A bunch of ordinary zooites doing an experimental project well off the usual path surely has zero chance of making a difference. Let's stick with the formula Melvin+ 2014 used, and even call it σ (no matter how misleading that is).

*Both note a very important caveat ("cosmic variance potentially contributes an additional ~13% uncertainty for populations of this number density in the volume sampled", and "Our shaded errors do not account for any systematic errors that may be present, especially in the higher redshift bins", respectively); however, here I'm focusing on how the error bars are calculated, from the data
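Dorigo's point about small counts can be checked exactly, without simulation. A sketch (my own code): `coverage` sums Poisson probabilities over all observed counts N whose naive N ± sqrt(N) bar happens to contain the true mean m.

```python
import math

def coverage(m, kmax=400):
    """Exact probability that the naive interval [N - sqrt(N), N + sqrt(N)],
    drawn around an observed Poisson count N, contains the true mean m."""
    pmf = math.exp(-m)                 # P(N = 0)
    total = pmf if m == 0 else 0.0     # k = 0 gives the degenerate interval [0, 0]
    for k in range(1, kmax):
        pmf *= m / k                   # Poisson recurrence: P(k) = P(k-1) * m / k
        if k - math.sqrt(k) <= m <= k + math.sqrt(k):
            total += pmf
    return total

print(coverage(1))    # ≈ 0.55, well short of the nominal 0.68
print(coverage(50))   # ≈ 0.68 once counts are large
```

So for a true mean of 1 the "1σ" bar covers the truth only about 55% of the time, which is exactly the misrepresentation Dorigo describes.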

Posted

• This "σ_f" has a very desirable property: it's symmetric around f=0.5. 😃 As f gets close to zero, σ_f does likewise; as f gets close to 1, σ_f also approaches zero. Further, if N_bar (or N_merger or N_asymmetrical in our project) is 1 - rather common in Quench! - then the "-sqrt(N)" arm of σ_f will reach almost to zero*, also rather desirable.

*assuming N_objects - the total number of objects/galaxies in the bin - is three or more.

Posted

• I'll reiterate my earlier suggestion that binomial and multinomial proportions are as good a place as any to use Bayesian reasoning, because the mathematics are incredibly simple and you will automatically end up with confidence* intervals confined to the range [0,1]. Also, unless you start with a dogmatic prior, confidence* intervals will never collapse to a point even when the votes are unanimous.

Alternately you could use the Wilson score interval or Jeffreys interval as described in this Wikipedia entry: http://en.wikipedia.org/wiki/Binomial_confidence_interval. The Jeffreys interval is based on Bayesian reasoning dressed up in frequentist clothing.

*Bayesians call these credible intervals.
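Assuming SciPy is available, the Jeffreys interval from that Wikipedia entry is just two Beta quantiles; a sketch (my code, with the usual convention of pinning the lower end to 0 when k = 0 and the upper end to 1 when k = n):

```python
from scipy.stats import beta

def jeffreys_interval(k, n, level=0.95):
    """Jeffreys interval: equal-tailed credible interval from a Beta(1/2, 1/2) prior."""
    a = (1.0 - level) / 2.0
    lo = 0.0 if k == 0 else beta.ppf(a, k + 0.5, n - k + 0.5)
    hi = 1.0 if k == n else beta.ppf(1.0 - a, k + 0.5, n - k + 0.5)
    return lo, hi

# e.g. 0 successes out of 10 trials
low, high = jeffreys_interval(0, 10)
```

In a spreadsheet the same quantiles come from the inverse-Beta function (BETAINV or equivalent), which is what makes this approach feasible without R or Python.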

Posted

• I'm conflicted! 😦

On the one hand, I would much prefer to use either of these approaches, not least because I think they are based on more solid statistical foundations. However, I'm not sure I could calculate confidence intervals using either of these approaches, using only Open Office's Calc spreadsheet tool. And for this project, I would like to continue to use only tools which any ordinary zooite both has and is likely to be familiar with.

On the other hand, the method Melvin+ 2014 uses is both very easy to apply to our data, and (apparently) widely used (not least by at least one member of the Quench project Science Team). However, it clearly has undesirable features, not the least being that it would train ordinary zooites (like me, and perhaps ChrisMolloy) to use a technique that is statistically unsound.

So .... I'll see how far I get trying to use either of the approaches you recommend, using just OpenOffice (in this regard, MS' Excel is the same, I think, not counting any extra packages or extensions you can buy).

Posted

• However, I'm not sure I could calculate confidence intervals using either of these approaches, using only Open Office's Calc spreadsheet tool.

After some googling, and reading, I came across this webpage. While its field of study is linguistics (the analysis of corpora, specifically), I found it a very easy-to-read introduction (including some nice stuff on the history!). And so was able to understand the Wikipedia webpage mlpeck provided a link to.

More important, however, is the fact that I think I can calculate Wilson score intervals, for the 'fraction' classification data we're working with! 😄 So, once again, a big thank you to mlpeck, and to jtmendel! 😃

Posted

• I'll reiterate my earlier suggestion that binomial and multinomial proportions are as good a place as any to use Bayesian reasoning, because the mathematics are incredibly simple and you will automatically end up with confidence* intervals confined to the range [0,1]. Also, unless you start with a dogmatic prior, confidence* intervals will never collapse to a point even when the votes are unanimous.

Turns out OpenOffice Calc has both a built-in/standard BETA function and (more importantly) a BETAINV function (no add-ons, etc, required). 😮 Yay! 😄

After some trial and error, I think I may have the hang of this. If so, then yes, "the mathematics are incredibly simple and you will automatically end up with confidence intervals confined to the range [0,1]" (I don't - yet - understand 'credible' intervals and 'dogmatic priors', much less 'conjugate priors', but all I need to understand - at least for now - is how to do the calculations, given the data).

Which leads me to this question: why do astronomers (and high energy particle physicists, apparently) stick with the awkward/misleading/(and worse) Poissonian approach?

Posted

• Yes... I believe so

Posted

• Cameron, E. 2011: On the Estimation of Confidence Intervals for Binomial Population Proportions in Astronomy: The Simplicity and Superiority of the Bayesian Approach.

I had not read this when I suggested using a Bayesian approach to estimate binomial confidence limits many months ago. The recent paper by Cheung et al. on the AGN/Bar (lack of?) connection cites this paper. They botched the explanation of what they were doing in Table 2, where one of their subsamples has 0 barred AGN, although I guess their confidence interval is OK (I haven't checked yet). It is: they effectively use a Beta(1,1) prior, there were 0 bars in 9 candidates so the posterior is Beta(1,10).
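That check is quick to reproduce: with α = 1 the Beta quantile has a closed form, so no R (or SciPy) is needed. A sketch (my own code):

```python
def beta1_upper_limit(beta_param, level=0.95):
    """Upper credible limit of a Beta(1, beta_param) posterior.
    For alpha = 1 the CDF is 1 - (1 - x)**beta_param, so it inverts in closed form."""
    return 1.0 - (1.0 - level) ** (1.0 / beta_param)

# Beta(1,1) prior, 0 barred AGN out of 9 candidates -> posterior Beta(1, 10)
upper = beta1_upper_limit(10)   # ≈ 0.259
```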

Cameron has a rather amusing blog too that I just discovered. He seems to have a pretty low opinion of the quality of statistical reasoning in astronomy.

Posted

• Incidentally here are two more recent submissions to arxiv on which Cameron is a co-author. I did some logistic regressions for this project and thought I had posted on one of the data analysis results threads, but I can't find anything.

de Souza, R.S. et al. 2014: The Overlooked Potential of Generalized Linear Models in Astronomy - I: Binomial Regression and Numerical Simulations.

Elliott, J. et al. 2014: The Overlooked Potential of Generalized Linear Models in Astronomy - II: Gamma regression and photometric redshifts.

The first one in particular is highly relevant to studying classification data.

Posted