Galaxy Zoo Starburst Talk

Redshift and size cuts: Proposal and discussion

  • JeanTate

    In this post, I said I'd look into redshift and size cuts (the post following this has links and some background):

    follow up on mlpeck's suggestion (somewhere!) that there should be a redshift cut: objects with redshift < some value should be excluded (many good reasons)

    similarly, perhaps do a size cut: exclude all objects with petro_r50 > some_value (one reason: the part of the galaxy the spectrum is of is too small, as a proportion of the whole galaxy, to be representative)

    The redshift cuts I propose are:

    1. exclude QS and QC objects with redshifts < 0.0265
    2. ditto, for redshifts > 0.313

    The first will reduce the number of QS objects by 77, and QC by 58; the second excludes just four QS galaxies, and six QC ones.

    With the low-redshift cut, the biggest¹ QS object (AGS00000lc) has a Petro_R50 of just 7.52", so the covering fractions are all ≳ 4%; and the biggest¹ QC one (AGS000031u) has a Petro_R50 of 8.54" (a covering fraction of ~3%). This cut also removes many of the obvious outliers.
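
    A minimal sketch of that area-based covering fraction, assuming the standard SDSS spectroscopic fiber (3" diameter, i.e. 1.5" radius) and treating Petro_R50 as the galaxy's radius; the two objects above come out at roughly 4% and 3%:

    ```python
    # Sketch only: area covering fraction = (fiber area) / (Petro_R50 area),
    # assuming the standard 3"-diameter (1.5" radius) SDSS fiber.
    FIBER_RADIUS_ARCSEC = 1.5

    def area_covering_fraction(petro_r50_arcsec):
        """Fraction of the Petro_R50 circle covered by the fiber aperture."""
        return (FIBER_RADIUS_ARCSEC / petro_r50_arcsec) ** 2

    # The two 'biggest' objects quoted above:
    for name, r50 in [("AGS00000lc", 7.52), ("AGS000031u", 8.54)]:
        print(f"{name}: {100 * area_covering_fraction(r50):.1f}%")
    # -> AGS00000lc: 4.0%, AGS000031u: 3.1%
    ```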

    The difference in redshift between a QS object and the one with the next greatest redshift (in the v4 QS catalog) is very small, < 0.01 (and in most cases much less) ... until AGS00001jf. This galaxy has a redshift of 0.3243, and the redshift of the one with the closest (smaller) redshift is 0.3115. And the highest-redshift QS object (AGS000017k) has a redshift that is also > 0.01 above that of its neighbor. {I'll add a plot later, showing this}.

    In later posts I'll discuss what obvious outliers remain after these two redshift cuts, how many 'too big' galaxies are removed by the first cut, and the feasibility of applying a 'size cut' as well.

    ¹ Not counting objects with obviously wrong Petro_R50 values (e.g. AGS00000s1); these should be excluded as well. See the "Galaxies which are too big (Petro_R50 >> fiber aperture)" thread for more details, and also the "Objects - Galaxies? - indistinguishable from point sources" one.

  • mlpeck

    Why precisely those redshift cuts? You explained the consequences of making them, but that's not the question I'm asking. If you were going to propose redshift cuts (or any others) without peeking at the data, what cuts would you propose, and why?

    How do you justify making them after the fact?

    Give a concise and objective definition of an outlier. Again, how do you justify exclusion of data after the fact?

    It's been a long time since I've taken a formal statistics course but I'm pretty sure it's still verboten to cull "outliers" by hand. I think in general appropriate ways to deal with outliers include:

    a) Use quantile-based descriptive statistics (a toy illustration follows this list).

    b) Use robust methods for inferential statistics. Extra credit for using robust hierarchical Bayesian methods.

    c) In graphs, adjust axes to exclude extreme outliers. In plots of large samples use shading or contour lines to indicate the most densely populated regions.
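
    For (a), a toy illustration (made-up numbers, not Quench data) of why quantile-based summaries are far less sensitive to a single wild value than the mean and standard deviation:

    ```python
    import numpy as np

    # Toy data with one wild value standing in for a bogus measurement:
    x = np.array([1.8, 2.1, 2.3, 2.4, 2.6, 2.9, 3.1, 120.0])

    print("mean, std dev :", x.mean(), x.std())   # both dragged by the outlier
    print("median        :", np.median(x))        # barely moves
    q1, q3 = np.percentile(x, [25, 75])
    print("IQR           :", q3 - q1)             # robust measure of spread
    ```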

  • JeanTate, in response to mlpeck's comment.

    Excellent questions! 😃 Simply by reading them, I have learned something new.

    It's been a long time since I've taken a formal statistics course but I'm pretty sure it's still verboten to cull "outliers" by hand.

    Fortunately - or not - I have never taken any such course (I'm entirely self-taught, in statistics). 😮

    My proposal is still a WIP (Work In Progress), but the motivation for the low-z cut was to ensure that the covering fraction (area of fiber aperture over 'Petro_R50' area) is 'reasonable'. Without any knowledge of the relevant literature - or generally accepted practices - I guessed that 10% would be OK; unfortunately, such a cut would - if converted to a redshift - remove a very large fraction of the QS and QC galaxies.

    More broadly, as the aim of this project (well, at least one aim, as I understand it) is to investigate post-quenched galaxies in the local universe in a consistent manner, then picking a low-z cut that ensures that very few objects will have covering fractions less than ~10% (and, very likely, also ensures that the fibers will be approximately centered on the galaxies' nuclei) goes a long way towards locking in such consistency. From that perspective, what does it matter that the threshold is chosen 'after the fact'?

    Most of all, however, I'm happy that someone took the trouble to read, and respond to, my proposal.

    Are any SCIENTISTs still reading posts in Quench Talk?

  • jtmendel (scientist, moderator)

    I think if there's good reason to believe the data are unreliable, then removal after the fact is okay. In a perfect world you would be able to decide how to construct the sample right at the outset and never think about it again. My experience is that this is rarely the case, and refining the sample is almost always an iterative process. The most important part of any cuts you make is to ensure that they aren't driven by a genuine physical property of the objects you're interested in!

    With regard to aperture effects, I think the current wisdom is that requiring a luminosity (rather than area) covering fraction >20% goes a long way to removing the worst aperture problems (the reference I'm thinking of for this is Kewley et al. 2005), at least in terms of emission lines. I would expect something like this to hold for the continuum spectra as well, although I would be hard pressed to give a reference for it.

    The translation between the fractional area covered by the fiber and the fractional luminosity depends on how light is distributed within the galaxy; because of this it is often easier to remove individual objects with covering fractions less than 20% rather than adjust the redshift limits to remove them (and sometimes throw out lots of science objects, as you've found!).

  • JeanTate, in response to jtmendel's comment.

    Thanks very much, jtmendel! 😃

    Kewley et al. (2005) is a pretty daunting paper, at least for this zooite. However, the main conclusions - of direct relevance to the Quench project - are pretty clear, and a "covering fraction cut of 20%" would be prudent (yes, exactly how that's calculated is important).

    The translation between the fractional area covered by the fiber and the fractional luminosity depends on how light is distributed within the galaxy

    This is especially true for the QS objects: there are ~200 objects with 'merging' or 'both' classifications; how light is distributed within such galaxies is unlikely to be well modeled with either of the default SDSS models (ellipsoidal, with deVaucouleurs or exponential radial profile), as they surely have "high spatial frequency substructure" (what an amazing term! 😮), not to mention more than one nucleus (in at least some cases).

    it is often easier to remove individual objects with covering fractions less than 20% rather than adjust the redshift limits to remove them (and sometimes throw out lots of science objects, as you've found!).

    Indeed. Assuming that the QS and QC galaxies have the same distribution of the relevant properties as the galaxies in Kewley et al. (2005) - which we already know at least the QS ones don't - a redshift cut of ~0.04 (which is what Kewley et al. (2005) conclude is reasonable for SDSS galaxies) would remove ~250 objects ... which is likely considerably more than the number of objects whose covering fractions are < 20%.

    Now to crunch some numbers ...

  • JeanTate

    What can be used to estimate the percentage of flux (luminosity) that the fiber covers, compared with the total galaxy flux? SDSS DR7 has this covered! 😛

    The flux entering the fiber is estimated by the fiber magnitudes (one per band), and the total galaxy flux by the composite model magnitudes (the cModelMags), which include a very nice fudge (or kludge) to deal with profiles that are not well fitted by either a de Vaucouleurs or an exponential profile. Details here. Now to make use of the CasJobs database mlpeck created ...
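
    A rough sketch of what that fudge amounts to: the composite ('cmodel') flux is the de Vaucouleurs model flux and the exponential model flux combined with a weight (fracDeV, in SDSS parlance). The function and the numbers below are illustrative only, not CasJobs column names or Quench values:

    ```python
    import math

    def cmodel_mag(dev_mag, exp_mag, frac_dev):
        """Composite 'cmodel' magnitude: combine the de Vaucouleurs and
        exponential model fluxes with weight frac_dev, then convert back."""
        f_dev = 10 ** (-0.4 * dev_mag)   # magnitude -> relative flux
        f_exp = 10 ** (-0.4 * exp_mag)
        return -2.5 * math.log10(frac_dev * f_dev + (1.0 - frac_dev) * f_exp)

    # Made-up numbers: a galaxy fitted half-way between the two profiles.
    print(cmodel_mag(dev_mag=15.5, exp_mag=15.9, frac_dev=0.5))  # ~15.68
    ```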

  • JeanTate, in response to JeanTate's comment.

    Preliminary results: of the 2999 objects in mlpeck's "quenchdb" (which is some version of the QS catalog), 2793 have both fiber and cmodel mags. Among these, a whopping 904 have covering fractions¹ < 20% in at least one band, and 359 are < 20% in all five bands (a sketch of the tally follows the footnote).

    ¹ Luminosity/flux basis, taking the fiber luminosity/flux as 'fiberMag' and the total luminosity/flux as 'cModelMag'; one for each band (u, g, r, i, z).
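
    A sketch of that tally, assuming the fiber and cmodel magnitudes have already been pulled out of CasJobs into a pandas DataFrame; the column names (fiberMag_u ... fiberMag_z, cModelMag_u ... cModelMag_z) are hypothetical stand-ins, not the actual query output:

    ```python
    import pandas as pd

    BANDS = ["u", "g", "r", "i", "z"]

    def covering_fractions(df):
        """Per-band luminosity covering fraction, 10^(-0.4*(fiberMag - cModelMag)).
        Assumes (hypothetical) columns fiberMag_u..z and cModelMag_u..z."""
        cf = pd.DataFrame(index=df.index)
        for b in BANDS:
            cf[b] = 10.0 ** (-0.4 * (df[f"fiberMag_{b}"] - df[f"cModelMag_{b}"]))
        return cf

    def tally_below(df, threshold=0.20):
        cf = covering_fractions(df)
        n_any = int((cf < threshold).any(axis=1).sum())  # < 20% in at least one band
        n_all = int((cf < threshold).all(axis=1).sum())  # < 20% in all five bands
        return n_any, n_all
    ```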

  • JeanTate, in response to JeanTate's comment.

    A worked example, to see if anyone can spot any mistakes.

    AGS00001n8 (DR7 587738946657518031) has fiberMag_{x} and cModelMag_{x} - where {x} is (u, g, r, i, z) - of (21.50171, 18.7285), (19.72613, 16.32575), (18.62684, 15.57101), (17.96169, 15.15122), and (17.43097, 14.60935), respectively.

    The ratio of the 'fiber' luminosity to the 'whole galaxy' luminosity (as measured by cModelMag) is 10^(-0.4*(fiberMag - cModelMag)). Expressed as percentages, these are: 7.8%, 4.4%, 6.0%, 7.5%, and 7.4%, all of which are waaay below 20%. This example also illustrates why a redshift cut as a proxy for fiber covering would be a BAD IDEA (assuming I've made no mistakes in my calculation) ... AGS00001n8 has a redshift of 0.042.
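
    The same arithmetic in a few lines of Python, re-doing the check with the values quoted above:

    ```python
    # Re-doing the worked example for AGS00001n8 (DR7 587738946657518031),
    # per band: covering fraction = 10^(-0.4 * (fiberMag - cModelMag)).
    mags = {  # band: (fiberMag, cModelMag), values as quoted above
        "u": (21.50171, 18.7285),
        "g": (19.72613, 16.32575),
        "r": (18.62684, 15.57101),
        "i": (17.96169, 15.15122),
        "z": (17.43097, 14.60935),
    }
    for band, (fiber_mag, cmodel_mag) in mags.items():
        frac = 10 ** (-0.4 * (fiber_mag - cmodel_mag))
        print(f"{band}: {100 * frac:.1f}%")
    # -> u: 7.8%, g: 4.4%, r: 6.0%, i: 7.5%, z: 7.4% -- all well below 20%
    ```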

  • zutopian

    I had posted the following statement in the topic "Sample Selection" on 12 Oct:

    The Mendel et al. paper gives the following:

    "consider only those galaxies with z=<0.2, ensuring that Halpha stays blue-ward of significant night-sky emission at  lambda> 8000ร…" and a "lower redshift limit of z = 0.01".

    http://arxiv.org/abs/1211.6115

    As far as I know, however, the GZQ sample also contains galaxies with redshifts greater than 0.2. Nonetheless, the GZQ sample contains just 3000 galaxies, while the Mendel et al. sample contains 12,105.

    http://quenchtalk.galaxyzoo.org/#/boards/BGS0000001/discussions/DGS00001xy?page=2&comment_id=5260327972c1092dda0000ca

  • zutopian

    New topic by Jean:

    What do BPT diagrams look like, if you select for fiber covering fraction? http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS000021y?page=1&comment_id=528405fe72c1094f2200034c

  • zutopian

    From Goto's paper:

    The figure shows that at z ≲ 0.04, there are significant number of E+A galaxies a few times larger than the fiber size. Therefore, we recommend to use a low redshift cut (e.g., z > 0.05; see also Gómez et al. 2003; Goto et al. 2003) when one performs statistical analysis on the sample.

    http://arxiv.org/abs/0801.1106
