Galaxy Zoo Starburst Talk

# When binning, what's the best measure to use (mean, median, ...)?

• In our analyses, we will be using bins; we will be binning by redshift, log_mass, etc.

To represent each bin's 'average' or 'central' (redshift, log_mass, ...) value, what is the standard practice in astronomy?

In particular, is it better, in some way or another, to use the arithmetic mean? Or the median? Or something else (and if so, what)? And why?

In Quench, does it matter, as long as all analyses use the same measure?

(and is this called a 'measure'? or is it a 'statistic'?)


• Good question. I'm going to rephrase, just so I'm sure I'm answering what you want answered:

Where do you place your data point (along the x-axis) in each bin for
a plot showing binned data?

Figure 5 of Wong et al. 2012 shows "Percentage of PSB galaxies versus Stellar Mass". She has binned her data by mass (in increments of 10^-0.5 Msun) and placed each data point at the mean value in its bin. This keeps the plot looking clean and reinforces the idea that the data are binned in equal increments.

However, if one of the science goals for a plot is to show how the values are skewed within each bin, then you'd want to use the median value. For example, you might want to show that there's a funny trend in the redshift values for our sources, with the median in each redshift bin differing significantly from the mean (this isn't the case; I just picked a random example).
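To make the mean-versus-median comparison concrete, here is a minimal sketch that groups values into equal-width bins and reports both statistics per bin. The function name `bin_centres`, the bin width of 0.5, the starting edge of 9.0, and the `log_mass` values are all made up for illustration; none of them come from the Quench data or from Wong et al. 2012.

```python
import math
import statistics
from collections import defaultdict


def bin_centres(values, bin_width, start):
    """Group values into equal-width bins and summarise each bin.

    Returns one (bin_low, bin_high, n, mean, median) tuple per
    non-empty bin, ordered by bin_low. Hypothetical helper, not
    part of any Quench tool.
    """
    groups = defaultdict(list)
    for v in values:
        # Index of the half-open bin [low, low + bin_width) containing v.
        index = math.floor((v - start) / bin_width)
        groups[index].append(v)

    rows = []
    for index in sorted(groups):
        members = groups[index]
        low = start + index * bin_width
        rows.append((low, low + bin_width, len(members),
                     statistics.mean(members),
                     statistics.median(members)))
    return rows


# Made-up log_mass values, for illustration only (not real data).
log_mass = [9.05, 9.10, 9.12, 9.45, 9.55, 9.60, 9.70, 9.95,
            10.1, 10.2, 10.45]

for low, high, n, mean, median in bin_centres(log_mass, bin_width=0.5, start=9.0):
    print(f"[{low:.1f}, {high:.1f})  n={n}  mean={mean:.3f}  median={median:.3f}")
```

In this toy sample the lowest bin has a mean above its median, because one value sits well away from the other three. If the two statistics agree closely in every bin, the choice between them makes little practical difference to where the points land on the x-axis.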

The question you've posed comes out of the thread: http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS000022o

Is there a science motivation for this asymmetry study to use the median rather than the mean? If not, use the mean, which tends to be the convention for binned data unless, as stated above, there's good motivation to not.

BTW, good to see this asymmetry study and the double-checking you and Chris are doing of the results.


• "Is there a science motivation for this asymmetry study to use the median rather than the mean? If not, use the mean, which tends to be the convention for binned data unless, as stated above, there's good motivation to not."

Thanks for clarifying this. There's no science motivation for using the median. I was about to update the asymmetry study using the median, but will switch to the mean.
