Galaxy Zoo Starburst Talk

Asymmetrical Classifications

  • ChrisMolloy by ChrisMolloy

    This is my first look at the data, in particular looking at Log Mass and Asymmetrical classifications. Low mass is 0-10.6, med is 10.6-10.9, and high is 10.9 and above.

    1. Overview Asymmetrical: QS 975 Items of 2890, Percentage 33.73%; QC 642 Items of 2998, Percentage 21.41%
      http://tools.zooniverse.org/#/dashboards/galaxy_zoo_starburst/525753d772c1093227000003

    2. Log Mass Asymmetrical Low; 0-10.6; QS 307 Items, Percentage 31.48% of the 975 asymmetrical QS items; QC 307 Items, Percentage 47.81% of the 642 asymmetrical QC items
      http://tools.zooniverse.org/#/dashboards/galaxy_zoo_starburst/52575890749832701100016d

    3. Log Mass Asymmetrical Med; 10.6-10.9; QS 313 Items, Percentage 32.10% of the 975 asymmetrical QS items; QC 171 Items, Percentage 26.63% of the 642 asymmetrical QC items
      http://tools.zooniverse.org/#/dashboards/galaxy_zoo_starburst/52575df772c10941fb0002a8

    4. Log Mass Asymmetrical High; 10.9+; QS 355 Items, Percentage 36.41% of the 975 asymmetrical QS items; QC 164 Items, Percentage 25.54% of the 642 asymmetrical QC items
      http://tools.zooniverse.org/#/dashboards/galaxy_zoo_starburst/5257619774983207dd000158

    Observations: More QS items are asymmetrical. A higher share of the asymmetrical QC objects is in the lowest log mass bin, while the QS objects are more evenly spread across the log mass bins. The QS share increases slightly with higher log mass; the QC share decreases.
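
    For anyone who wants to cross-check the bin percentages outside Tools, here is a minimal Python sketch. It is only an illustration: the function names are made up, and sending a value that sits exactly on a bin edge to the higher bin is an assumption. It recomputes each bin's share of the asymmetrical total for QS and QC; the last digit can differ from the figures above because of rounding.

```python
# Bin edges as stated in the post: low is 0-10.6, med is 10.6-10.9, high is 10.9+.
# Assumption: a value exactly on an edge goes into the higher bin.
def mass_bin(log_mass):
    if log_mass < 10.6:
        return "low"
    if log_mass < 10.9:
        return "med"
    return "high"

# Asymmetrical item counts per bin, copied from the numbered list above.
asym_counts = {
    "QS": {"low": 307, "med": 313, "high": 355},
    "QC": {"low": 307, "med": 171, "high": 164},
}

def bin_shares(counts):
    """Percentage of a sample's asymmetrical items that falls in each bin."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}

shares = {sample: bin_shares(counts) for sample, counts in asym_counts.items()}

for sample, by_bin in shares.items():
    for name, pct in by_bin.items():
        print(f"{sample} {name}: {pct:.2f}%")
```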

    One further thing, those total numbers seem high. Am I using the right version of tools?

    Am aware this is also not a clean dataset. Just wanted to see what these initial numbers looked like.

    Posted

  • JeanTate by JeanTate in response to ChrisMolloy's comment.

    Nice! 😃

    Perhaps next might be a test of statistical significance of these results (as an exercise; until the catalogs are clean, the results of such tests would have unknown - and unknowable - validity). I'd be more than happy to run one or two, and walk you - and any other zooite still reading these posts - through how I do such tests (that might also prompt any SCIENTIST reading it too, to step in and provide context and background, fix any mistakes, and perhaps suggest better tests). Or wait for you. Or ...

    What do you think?

    Posted

  • ChrisMolloy by ChrisMolloy

    Statistical significance tests sound good to me. Glad to be walked through how you do them.

    Posted

  • ChrisMolloy by ChrisMolloy in response to ChrisMolloy's comment.

    Just an update on the above post on Asymmetrical classifications. I thought these numbers were too high.

    When displaying Log Mass classifications in a Histogram you need to enter the BPT code at the prompt first, before filtering for log mass and classifications. Otherwise the figures won't match (I noticed this just by chance while doing a BPT scatterplot on the overview above). I think Blog Post 4 on Tools needs to be amended to reflect this. Will update the above.

    Posted

  • JeanTate by JeanTate in response to ChrisMolloy's comment.

    GIANT CAVEAT, which applies to everything I'll write about statistics: I am entirely self-taught, so my explanations etc are what make sense to me; it is possible - likely? almost certain? - that you'd get different explanations from those who teach this stuff. Also, there may be errors and mistakes; I hope any SCIENTIST reading my posts, and who notices any, will correct those, with alacrity.

    Overview Asymmetrical QS 975 Items of 2890, Percentage 33.73%; QC 642 Items of 2998, Percentage 21.41 %

    This looks like a good case where a simple contingency test would be applicable. There are many, many websites which explain what this test is; this one is good because it describes the test (albeit using a quite different field) and points to a different test which can be used in a 2 x 2 case. Oh, and it has a link to an online calculator, so numbers can be crunched. I'll stick with the general case.

    It's difficult to format tables nicely in Talk, but here's what we have:

    class    QC    QS  Total
    Asym    642   975   1617
    !Asym  2356  1915   4271
    Total  2998  2890   5888
    

    Plugging those numbers in, I get:

    chi-square = 112.
    degrees of freedom = 1
    probability = 0.000
    

    Which basically says that there's essentially zero chance (the probability is so small that it rounds to "0.000") that the much greater number of asymmetric QS objects (than QC ones) could arise by chance (e.g. by randomly picking from each catalog). Or - assuming clean samples and unbiased selection of QC galaxies (both of which we know to be untrue) - post-quenched galaxies are truly more likely to have asymmetrical morphology, per the GZ Quench classification exercise, than the general SDSS population of galaxies with "the same" ('matched') redshifts and stellar masses.
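
    For anyone who would rather crunch the numbers offline than use the online calculator, here is a minimal Python sketch of the same test. It assumes the calculator reports the plain Pearson chi-square with no continuity correction (an assumption on my part, though it does reproduce the 112 above); the function names are made up for illustration, and the probability formula used is only valid for one degree of freedom.

```python
import math

def chi_square(table):
    """Pearson chi-square for an r x c table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if class and catalog were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

def p_value_dof1(chi2):
    """Upper-tail probability of the chi-square distribution, for dof = 1 only."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Rows: Asym, !Asym; columns: QC, QS (the table above).
overview = [[642, 975], [2356, 1915]]
chi2, dof = chi_square(overview)
p = p_value_dof1(chi2)
print(f"chi-square = {chi2:.1f}, dof = {dof}, p = {p:.3g}")
```

    The probability that comes out is tiny but not literally zero, which is why a calculator that shows three decimal places prints 0.000.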

    Questions?

    Would you like to calculate the chi-square and probability[1] of another 2 x 2 table, ChrisMolloy? Anyone else (ordinary zooite) care to try too?

    [1] The 'degrees of freedom' ('dof') will be the same; do you know why?

    Posted

  • JeanTate by JeanTate

    I just realized that there are analyses you can do, with contingency tables like this, to see the extent to which QS and/or QC not being 'clean' will matter.

    For example, suppose we decide there are 42 true outliers in the QS catalog[1]; what's the biggest change to the distribution of the 'asymmetry' feature this could make? Well, the true outliers could all be Asym, and the matches in QC which we'd have to remove - this project is based on each QS galaxy having exactly one matched QC one - could all be !Asym (i.e. be classified as 'not asymmetric'):

    class   QC       QS       Total                  QC    QS  Total
    Asym     642      975-42  1617-42   which is:   642   933   1575
    !Asym   2356-42  1915     4271-42   which is:  2314  1915   4229
    Total   2998-42  2890-42  5888-84   which is:  2956  2848   5804
    

    This reduces the "chi-square" to 93.2, but the probability remains 0.000

    Similarly, we could insist that the total Asym + !Asym be the same for both QS and QC, and remove 108 QC galaxies. To do so properly, we'd need to find the QC object which matches the 'missing' QS one for each of the 'missing' 108, but what's the worst (best?) that could happen; the distribution of Asym/!Asym among the 108 QC objects that would make the overall distribution more probable? No prize for guessing: it's if all 108 'to be removed' QC objects were !Asym ones:

    class   QC        QS    Total                   QC    QS  Total
    Asym     642       975  1617       which is:   642   975   1617
    !Asym   2356-108  1915  4271-108   which is:  2248  1915   4163
    Total   2998-108  2890  5888-108   which is:  2890  2890   5780
    

    Same conclusion: chi-square becomes 95.2, but probability remains 0.000
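
    The two worst-case adjustments can be re-run with a few lines of Python. This is a minimal sketch, assuming the plain Pearson chi-square with no continuity correction (the function and variable names are made up for illustration). For one degree of freedom, any chi-square above about 10.83 corresponds to a probability below 0.001, so both scenarios stay far inside "probability 0.000" territory. Run on the first adjusted table as printed, it gives a value nearer 89 than the 93.2 quoted, so that figure may have come from slightly different inputs; the conclusion is the same either way.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Original counts: Asym (QC, QS) and !Asym (QC, QS).
asym_qc, asym_qs, not_qc, not_qs = 642, 975, 2356, 1915

# Scenario 1: 42 QS outliers, all Asym, whose matched QC objects are all !Asym.
outliers = chi_square_2x2(asym_qc, asym_qs - 42, not_qc - 42, not_qs)

# Scenario 2: 108 QC objects removed, all !Asym, so that both totals are 2890.
equalised = chi_square_2x2(asym_qc, asym_qs, not_qc - 108, not_qs)

print(f"scenario 1 (42 outliers): chi-square = {outliers:.1f}")
print(f"scenario 2 (108 removed): chi-square = {equalised:.1f}")
```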

    Of course, galaxies are not electrons - which can be in either a 'spin up' or a 'spin down' state, and ONLY in either state - and the distributions of zooite classifications on the 'asymmetry' feature are not binary. So to do this analysis properly, we should be working with the 'consensus asymmetry values' (one for each QS and each QC object)[2]. Is a contingency table test appropriate if we had such values? What do you think?

    [1] Leaving aside, for now, the question of how we actually decide what a 'true outlier' is!

    [2] If Asym is 1 and !Asym 0, then the consensus values will all be in the range [0, 1]
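
    To make the 'consensus asymmetry value' idea concrete, here is a tiny Python sketch. The votes are hypothetical (they are not taken from either catalog) and the function name is made up; it only shows that averaging votes coded as 1 and 0 always lands in the range [0, 1].

```python
def consensus_asymmetry(votes):
    """Mean of votes coded Asym = 1, !Asym = 0; always lies in [0, 1]."""
    if not votes:
        raise ValueError("no votes for this object")
    return sum(votes) / len(votes)

# Hypothetical votes for one object: 7 of 10 classifiers chose Asym.
example_votes = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
value = consensus_asymmetry(example_votes)
print(value)
```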

    Posted

  • ChrisMolloy by ChrisMolloy in response to JeanTate's comment.

    This is my first foray into statistics. This is fascinating. Especially the second table. Can you send me the link to your table or another which would be acceptable. I've looked but don't know where to begin.

    Here's the numbers I want to crunch:
    QS Asym 890, !Asym 1806 of 2696 items, Asym Percentage 33.01%;
    QC Asym 576, !Asym 2011 of 2587 items, Asym Percentage 22.26%

    So Degree of freedom is the number variation allowed which won't change the results? Chi Square is what?
    I need to do a bit more reading on this before going further.

    Posted

  • JeanTate by JeanTate in response to ChrisMolloy's comment.

    Oops! 😦 I forgot to include a link in my earlier post (I've now edited it to do so). Here it is again, as a URL:

    http://www.physics.csbsju.edu/stats/contingency.html

    Hope that helps. Anyway, as I said, there are a bazillion websites on contingency tables, and I don't have any particular recommendations. That one is good because you can enter your data and it'll calculate the 'chi-square' statistic and its probability for you. If you are even somewhat familiar with a spreadsheet, it's pretty straightforward to calculate chi-square (though it gets cumbersome for larger tables); the probability isn't so easy to write a formula for, in a spreadsheet.

    Posted

  • ChrisMolloy by ChrisMolloy in response to JeanTate's comment.

    Thank you Jean. This did help. Looking at expected and observed contingency tables was interesting. I got Chi Square 76.0, dof=1, and Probability 0.000 for the above numbers. That is a really good website. Will have a go at setting up the s/sheet. Shouldn't be too hard. Thanks again.

    Posted

  • jules by jules moderator in response to ChrisMolloy's comment.

    "Just an update on the above post on Asymmetrical classifications. I
    thought these numbers were too high.

    When displaying Log Mass classifications in Histogram you need to
    input in prompt the BPT code first before filtering for log mass and
    classifications. Otherwise, (I was just by chance doing a BPT
    scatterplot on the overview above) the figures won't match. I think,
    Blog Post 4 on Tools needs to be amended to reflect this. Will update
    the above."

    Sorry if I'm missing the obvious Chris - I'm slightly addled after spending some time following up on Blog post 4 - but what's the "BPT code" you refer to?

    Thanks!

    Posted

  • ChrisMolloy by ChrisMolloy in response to jules's comment.

    The same code you used for your mergers BPT diagram. The same in Quench Boost blog post 3.

    Posted

  • ChrisMolloy by ChrisMolloy in response to ChrisMolloy's comment.

    Also, if you go straight into blog post 4 it just goes straight into the filtering of mergers and log mass. So you're not getting the emission line ratios (used to identify AGN), which reduces the number of galaxies. Does this help?

    Posted

  • jules by jules moderator in response to ChrisMolloy's comment.

    Yes, I think. You need the emission line ratios if you are looking at AGN activity. But the how-to guide part 4 just looks at mergers and log mass and not AGNs. This is what I am doing, and when I come to produce BPT diagrams for each merger bin I'll need the ratios.

    Nice work by the way - your dashboards are showing up perfectly.

    Posted

  • ChrisMolloy by ChrisMolloy in response to jules's comment.

    Thanks for the response. That's good to know.

    Posted

  • ChrisMolloy by ChrisMolloy

    Here’s a brief look at asymmetrical galaxies with or without merging features. This is based on my tools version. The tables listed below are those with BPT ratios, as the numbers correspond more easily.

    enter image description here

    Above is a chart of the BPT FoD/Smooth overview for both categories, with the total number of galaxies for both datasets. Galaxies classified as Smooth predominate in both QS and QC. The percentages are QS Smooth 74%, FoD 26% of 2689, and QC Smooth 70.1%, FoD 29.9% of 2581.

    enter image description here

    Here’s another table of the same features with asymmetrical signatures. Approximately 26.7% of QS Smooth galaxies are asymmetrical, and 51.3% of QS FoD galaxies. For the QC set the figures are 16.3% for Smooth and 36.5% for FoD. In total, 33.1% of QS galaxies are asymmetrical, as opposed to 22.3% of QC galaxies.

    enter image description here

    Asymmetrical merging signatures as classified are listed above. Included is the neither category. For the QS sample 12.8% of the Smooth and 32% of the FoD galaxies have asymmetrical merger signatures. The corresponding numbers for QC are 4.9% Smooth and 15.7% FoD.

    The Neither category, those purely asymmetrical is the following; QS Smooth 13.9% and 19.3% FoD. For QC 11.4% Smooth and 20.9% FoD respectively.

    enter image description here

    Here are the actual numbers for the different categories as listed above.

    Observations
    In looking at the above charts there are many observations that could be drawn. Three things strike me the most. Firstly, what if we added the asymmetrical merging and neither categories together? Is this evidence of merging being a greater factor than initially observed? I think this might have been mentioned previously but I can't track down the post. Secondly, for the QS sample over 50% of FoDs are asymmetrical and 32% have asymmetrical merger signatures. For QC, 36.5% of FoDs are asymmetrical and 15.7% have asymmetrical merger signatures. Thirdly, the smoothness of the Smooth galaxies is also striking: their asymmetry percentages are low. Darg notes that early type galaxy mergers are harder to detect due to the longer time scale involved in spiral mergers. There is less disturbance. Are we witnessing this?

    enter image description here

    Here’s a line graph of mass-dependent merger fractions for galaxies with asymmetrical merger signatures. The QS fraction rises gradually from a low of 6 at log mass 9.30 to 10 at 10.47. It then rises dramatically to 25 at 10.78 and 29 at 11.0, peaking at 40 at 11.29.

    QC galaxies start with a fraction of 10 at log mass 9.48, dip to 5 at 10.20 and 10.49, then rise slowly to 7 at log mass 10.71, levelling off at 11 at log mass 11.06 and 11.26.

    For the QS galaxies the fraction rises dramatically from log mass 10.47 onwards. For the QC galaxies, however, something causes the fraction to dip between log mass 9.4 and 11.2, starting at 10 and finishing at 11. The increase in mergers, albeit small, begins at log mass 10.49, but there is no significant rise like the one seen in the QS sample.

    I think what can be drawn from this chart is that there is a difference between low and high mass asymmetrical mergers for the QS. Something changes around log mass 10.5. Kaviraj discussed the change in quenching mechanisms between lower and higher log mass: supernovae at lower log mass, AGN at higher log mass. For the QC galaxies I’m unsure how to read this, as there is no significant change in the merger fraction and there just seems to be a small wave in the line.

    enter image description here

    Above is a plot of the mass-dependent asymmetrical fractions for asymmetrical galaxies with no merging signatures. The QS fraction starts at 14 at log mass 9.48 and dips to 9 at 10.17. It then increases to 12 at log mass 10.48 and 17 at 10.77, levelling off at 22 at log mass 11.01 and 11.27.

    The QC fraction starts at 18, log mass 9.58 and then dips to a fraction of 10 at log mass 10.16 rising slowly to 11 at 10.44 and 14 at log mass 10.75. It then decreases to a fraction of 12 at log mass 11.01 and 10 at log mass 11.3.

    Around log mass 10.1 there is a change for both QS and QC. But the QC galaxies rise and then dip at 10.7, as opposed to the QS galaxies, which continue a gradual rise before levelling off. I think the change in quenching mechanisms between lower and higher log mass discussed by Kaviraj et al. (2007), and the different mechanisms involved, is relevant here again for the QS galaxies, and possibly for the QC galaxies. There is a gradual decline in QC galaxies from the low mass to the high mass. And vice versa for the QS.

    But I’m postulating also whether these galaxies are further advanced in the evolutionary time scale of quenching. They have had a merger incident with the features of this mostly passed but the merger (asymmetrical) signature still present. Wong makes an interesting comment on these types of galaxies that they appear to occupy an "intermediate morphology", as yet to be determined. Some do appear to have clear morphology while others appear very disturbed and asymmetrical. While these galaxies look intermediate they do appear to look in some cases as precursors to early type galaxies.

    enter image description here

    enter image description here

    Finally, here are BPT diagrams of both the asymmetrical quenched and control galaxies. QS galaxies appear mostly embedded in the composite/AGN side of the chart, with few in the star forming region. For QC the results are similar but a little more scattered towards the star forming region.

    Conclusion
    I suppose a final question I have here is: are we observing more possible mergers than initially observed? And if yes, could this then explain the predominance of the QS asymmetrical galaxies in the composite/AGN side of the BPT diagram?

    One further point. Someone more adept at statistical significance tests than I might want to look at the tables. I’d be intrigued at the results. I did some tests but was unsure of the significance of the results.

    Dashboards attached of Log Mass bins.

    http://tools.zooniverse.org/#/dashboards/galaxy_zoo_starburst/52966ef0134bb3054b0000af

    http://tools.zooniverse.org/#/dashboards/galaxy_zoo_starburst/529468f872c1093e9e000459

    http://tools.zooniverse.org/#/dashboards/galaxy_zoo_starburst/528dbdeafe16d9408e0000b5

    Feedback appreciated. Please excuse any glaring errors.

    Posted

  • jules by jules moderator

    Fascinating read Chris! I'll read that again when I have more time but will offer a few initial comments:

    1. Certainly looks like merging / asymmetry is definitely playing a part in our Quench sample - good line of enquiry to pursue.

    2. I was glad to see you found the same sudden gradient increase at log mass 10.5 ish in your Asymmetrical Mergers merger fraction plot that I found here. (Page 4 if the link doesn't take you there.) Something does happen at log mass 10.5. I noticed that your filtered sample showing asymmetric galaxies shows the effect to be less noticeable in QC compared with my (unfiltered) plot. Can't fathom out why that should be yet.

    3. How are you defining disturbed in your table above? I just went back to the decision tree (it's so long since I looked at it!) and couldn't quite work it out.

    4. Finally - I now see the benefit of error bars having added them to my plots as Laura recommended. I think it would be useful if you could add them to your line graphs too.

    Sterling work Chris. I really need to read that again.

    Posted

  • ChrisMolloy by ChrisMolloy in response to jules's comment.

    Hi Jules,

    Thanks for the response. I found your posts on mergers really helpful.

    The disturbed category is discussed here. It was a new category added. It's not in the decision tree.

    Will add error bars. What percentage did you use for this?

    Log Mass 10.5 is interesting. Need to think about it a bit more.

    Posted

  • jules by jules moderator

    Thanks for the link - I knew it was somewhere!

    I used Laura's error calculation:

        error = square_root(# of sources in this mass_bin showing merger signatures)
                ----------------------------------------------------------------------
                            the total number of sources in this mass_bin
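
    Written out as a minimal Python sketch (the function name and the example counts are hypothetical, purely for illustration), the error for one mass bin is:

```python
import math

def merger_fraction_with_error(n_merger, n_total):
    """Return (fraction, error) for one mass bin, with error = sqrt(n_merger) / n_total."""
    if n_total <= 0:
        raise ValueError("mass bin is empty")
    return n_merger / n_total, math.sqrt(n_merger) / n_total

# Hypothetical mass bin: 25 of 100 sources show merger signatures.
fraction, error = merger_fraction_with_error(25, 100)
print(f"{fraction:.2f} +/- {error:.2f}")
```

    The error bar for that bin then runs from fraction - error to fraction + error.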

    Posted

  • JeanTate by JeanTate in response to ChrisMolloy's comment.

    This is pretty cool, well done!

    I haven't yet had a chance to go through it all in detail, but - as jules has already noted - at ~10.5 (in log_mass), something does seem to change. And, given what I found re Eos (edge-on spirals), I wonder how the trends you report here relate to Eos fraction?

    Posted

  • ChrisMolloy by ChrisMolloy in response to JeanTate's comment.

    Thanks for that. I'm going to look at redshift next but have followed that Eos thread with interest. Really interesting. Will back track after that and have a look. I am intrigued by that FOD fraction (and EoS within it) after having read your posts.

    Posted

  • JeanTate by JeanTate in response to jules's comment.

    I've now had a chance to read your post in more detail Chris ... some intriguing results; well done! 😃

    To tidy up a few things (much like my questions on jules' work, here):

    • you're working with the v4 QS and QC catalogs, right?
    • how did you treat the 'Star or artifact' objects? Sure, there are very few of them, in either catalog, but still ...

    "Finally here are BPT diagrams of both the asymmetrical quenched and control galaxies."

    Did you plot every object with all four emission line fluxes > 0? Or did you filter out those with S/N < 3?

    In light of the various redshift and mass dependencies that have been found (one of Laura's posts has a nice compilation of links; can't find it just now 😦), it will be very interesting to see how the trends you've found change when the analyses are re-done using just 'the 778'.

    Posted

  • ChrisMolloy by ChrisMolloy

    Hi Jean,

    Thanks for the response.

    Star artifacts I haven't treated yet. Aware of them especially in the Log Mass Plots. How do we deal with this?

    Pretty sure working on v 4. Replaced and reloaded all files on hard drive of tools from this post here. Had the disturbed category so presumed this was current. Is this so?

    All four emission line fluxes are > 0. Haven't changed a thing since Laura's Quench Boost Post here.
    Haven't filtered S/N < 3. Here's a dashboard as an example.

    http://tools.zooniverse.org/#/dashboards/galaxy_zoo_starburst/5274cec272c109125a00003a

    However, after having read this post here I wondered whether there had been a change in the BPT code. I tried replicating this in tools but got stuck on the latter lines. I put this aside to revisit later. However, what I was interested in was whether it was possible in tools to isolate starforming, composite and AGN. I know this could be onerous but was intrigued by it. And reading your latest threads, especially EoS, you've been referencing a more detailed analysis of the galaxies.

    Is this possible in tools and is there a change in the BPT code now?

    And yes, I am itching to look at redshift but will await your response.

    Posted

  • JeanTate by JeanTate in response to ChrisMolloy's comment.

    "Star artifacts I haven't treated yet. Aware of them especially in the Log Mass Plots. How do we deal with this?"

    It's one of those annoying details ... in 'the 778' there is only one in QS (AGS00000l1) and two in QC (AGS00000z2 and AGS00001c3), so whether you include them in any analysis, or not, very likely makes no difference to the result. However, all analyses should be done consistently, which means they all should include them, or all omit them (and we should say so). And if we omit them, do we also omit their partners (for example, omit AGS00002wf, the QC counterpart of AGS00000l1)?

    It's also annoying because for some fields - e.g. 'merging' - the catalogs have values for 'Star or artifact' objects (they're set as 'Neither'), but for others they're left blank ... e.g. 'symmetrical' is blank. So to be consistent we need to first check, then decide how to treat the fields (consistently), then proceed with the analyses.
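
    If we do decide to omit them together with their partners, the bookkeeping could look something like this minimal Python sketch. It assumes we have a one-to-one table of QS-to-QC matches; the function name is made up, and apart from the one pair named above the IDs are hypothetical placeholders.

```python
def drop_with_partners(ids, pairs, flagged):
    """Remove flagged objects and their matched partners from a list of IDs.

    pairs maps each QS ID to its matched QC ID (one-to-one).
    """
    # Build a lookup that works in both directions (QS -> QC and QC -> QS).
    partner = dict(pairs)
    partner.update({qc: qs for qs, qc in pairs.items()})
    to_drop = set(flagged)
    to_drop.update(partner[i] for i in flagged if i in partner)
    return [i for i in ids if i not in to_drop]

# The one pair named above, plus a hypothetical placeholder pair.
pairs = {"AGS00000l1": "AGS00002wf", "QS_example": "QC_example"}
ids = ["AGS00000l1", "AGS00002wf", "QS_example", "QC_example"]
kept = drop_with_partners(ids, pairs, flagged=["AGS00000l1"])
print(kept)
```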

    "Pretty sure working on v 4. Replaced and reloaded all files on hard drive of tools from this post here. Had the disturbed category so presumed this was current. Is this so?"

    I think you're right.

    "Is this possible in tools and is there a change in the BPT code now?"

    It should be possible in Tools, and I vaguely remember successfully producing a BPT diagram, complete with cuts on S/N ... but it was quite a while ago, and I also remember it being exceedingly difficult and error prone (perhaps that's just my lack of familiarity?).

    I don't think there's a change in 'BPT code'; however, we do need to agree on what lines to use to separate AGN from Composite, and SFR from Composite. mlpeck's plot included LINERs, a class which is also possible to identify using Tools, I think (but I don't know the equations to use). And his 'unclassifiable' class is simply a cut on S/N (though I'm not sure which).
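
    As a starting point for that discussion, here is a minimal Python sketch (not Tools code) of one common choice of dividing lines: the Kauffmann et al. (2003) line between star-forming and composite, and the Kewley et al. (2001) line between composite and AGN. Using these two lines is an assumption, not something we have agreed; the function name is made up, LINERs are lumped in with AGN, and no S/N cut is applied beyond requiring all four fluxes to be positive.

```python
import math

def bpt_class(nii, halpha, oiii, hbeta):
    """Classify one object on the [NII]/H-alpha vs [OIII]/H-beta BPT diagram."""
    if min(nii, halpha, oiii, hbeta) <= 0:
        return "unclassifiable"  # the log ratios need all four fluxes > 0
    x = math.log10(nii / halpha)
    y = math.log10(oiii / hbeta)
    # Kauffmann et al. (2003) empirical line: star-forming objects lie below it.
    if x < 0.05 and y < 0.61 / (x - 0.05) + 1.3:
        return "star-forming"
    # Kewley et al. (2001) maximum-starburst line: composites lie below it.
    if x < 0.47 and y < 0.61 / (x - 0.47) + 1.19:
        return "composite"
    return "AGN"

# Made-up flux values, one per region of the diagram.
print(bpt_class(0.316, 1.0, 0.316, 1.0))
print(bpt_class(0.631, 1.0, 1.0, 1.0))
print(bpt_class(1.0, 1.0, 10.0, 1.0))
```

    A cut on S/N would go in front of the ratio calculation, once we decide which threshold to use.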

    Sorry for the delay in responding; I hope I haven't held you back from starting your analyses ...

    Posted

  • ChrisMolloy by ChrisMolloy in response to JeanTate's comment.

    Thanks for the response. Really good points here. For now I will include the star/artifacts and reference this as such and I agree with your comment that:

    "all analyses should be done consistently, which means they all should include them, or all omit them (and we should say so)".

    You haven't held up my work. Going to leave tools for a day or so till whatever is happening with it is fixed.

    Posted

  • jules by jules moderator

    "For now I will include the star/artifacts and reference this as such"

    Same here - I'm sure eventually we'll have a ready-filtered, reliable dataset to work with.

    Posted