Galaxy Zoo Starburst Talk

What differences in 'consensus classification' are there if 'duplicates' are included?

  • JeanTate by JeanTate

    This post, the last on page 7 of the "Quench project: a proposal aimed at reviving and completing it." thread, has the background. In a nutshell, many Quench project objects were classified more than once by some zooites; to ensure consistency, our analyses should be based on the rule 'no zooite classifies an object more than once', or 'consensus classifications should include no more than one 'vote' per zooite'. A link to the 'nodups' catalogs is in that post.

    As we have both the 'nodup' and the 'withdup' classifications, it is possible to work out what differences there are, in the 'consensus' view. I have just completed an analysis of one set of such differences, and will present the results in this thread.

    Posted

  • JeanTate by JeanTate

    Objects with no duplicates

    There are 310 (out of 3002) QS objects whose 'total_votes' values are the same in the gzquench_sample_consensus_nodups and the most recent gzquench_sample_consensus catalogs.

    And there are 365 (out of 3002) QC ones (in gzquench_control_consensus_nodups and gzquench_control_consensus).

    I have checked that the values in all the "tnn_a0m_count" fields are indeed the same in the 'nodup' and 'withdup' catalogs, for both QS and QC. And that the "tnn_a0m_fraction" values are correct, given the "tnn_a0m_count" values.
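    A check of this kind can be sketched as follows. This is only a minimal illustration, not the procedure actually used: the helper name, the dictionary-per-object layout and the rounding tolerance are all assumptions; only the "tnn_a0m_count" / "tnn_a0m_fraction" field naming comes from the catalogs.

```python
import math

def fractions_consistent(row, question, n_answers, tol=1e-3):
    """Return True if every tNN_a0M_fraction equals its count / question total."""
    counts = [row[f"{question}_a{m:02d}_count"] for m in range(n_answers)]
    total = sum(counts)
    for m, count in enumerate(counts):
        # Assumption: a question nobody reached is stored with fraction 0.
        expected = count / total if total else 0.0
        stored = row[f"{question}_a{m:02d}_fraction"]
        if not math.isclose(stored, expected, abs_tol=tol):
            return False
    return True

# Hypothetical row, built from Q1 votes of (10, 7, 0) for illustration.
row = {
    "t00_a00_count": 10, "t00_a00_fraction": 0.588,
    "t00_a01_count": 7,  "t00_a01_fraction": 0.412,
    "t00_a02_count": 0,  "t00_a02_fraction": 0.0,
}
print(fractions_consistent(row, "t00", 3))
```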

    In the remaining posts in this thread I will ignore these objects.

    Posted

  • JeanTate by JeanTate

    Analysis of differences, with thresholds (introduction)

    As the paper we will eventually publish will present results on differences in feature frequency, between QS and QC objects^, and as 'feature frequency' will be defined using a classification threshold (a 'fraction'), my analysis looks at how including 'duplicate classifications' changes the numbers of objects in each of the 31 "tnn_a0m" fields, given a consistent "tnn_a0m_fraction" threshold.

    Perhaps some examples will help:

    758882024636285296 (AGS00001cn), a QS object, has 17 nodup 'votes' and 20 withdup ones. The numbers of votes for the three answers to the first question ("Is the galaxy simply smooth and rounded with no sign of a disk?") are (10,7,0) and (10,9,1), respectively. For a threshold of ">0.5", on the first answer ("Smooth"), this object changes its classification; with only 'nodup' votes, it is over the threshold (10/17, or 0.588), but not over it if duplicates are included (10/20, or exactly 0.5). There is no change for a threshold of ">0.7".

    588848900991942796 ( AGS000001q), also a QS object, has 19 nodup 'votes' and 21 withdup ones. The numbers of votes for the two answers to the third question ("Does the galaxy have a bulge at its centre?") are (0,0) and (0,1), respectively. For a threshold of ">0.5", on the second answer ("No"), this object changes its classification; with only 'nodup' votes, it is under the threshold (0, because there are no votes for "Eos" in Q2), but over it if duplicates are included (1/1, or 1.0, because there is one vote for "Eos"). There is also a change for a threshold of ">0.7".

    (In the last example there's a subtlety I hadn't appreciated before: "zero" - both count and fraction - has two meanings, depending on whether that part of the tree was reached by at least one zooite-classification or not; if it was, then the values are true non-negative integers and reals (actually just the subset, rationals), otherwise they are more accurately described as 'N/A' (or something similar). I did not treat these two differently, nor do I intend to.)

    For the rest of the results I'll be reporting, I will focus on 'nodup to withdup' classification changes of the kind in the first example; namely, a 'nodup' classification that is over a threshold but the 'withdup' one is under it. I will look at two thresholds, ">0.5" and ">0.7".
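    The 'over the threshold in nodup, not over it in withdup' test can be sketched in a few lines. The function names are hypothetical, and treating an unreached question as fraction 0 is the convention described in the aside above, not a property of the catalogs.

```python
def fraction(count, total):
    # Unreached questions (total == 0) are treated as fraction 0.
    return count / total if total else 0.0

def drops_below(nodup, withdup, threshold):
    """True if the 'nodup' fraction is over the threshold but the 'withdup' one is not.

    nodup and withdup are (count_for_answer, total_votes_for_question) pairs.
    """
    return fraction(*nodup) > threshold and not fraction(*withdup) > threshold

# First example above: 10 'Smooth' votes out of 17 (nodup) and out of 20 (withdup).
print(drops_below((10, 17), (10, 20), 0.5))  # 0.588 > 0.5, but 0.5 is not > 0.5
print(drops_below((10, 17), (10, 20), 0.7))  # neither fraction exceeds 0.7
```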

    Note: none of the differences in the two examples above are relevant for the features we have chosen to examine ("merging" and "asymmetry"). I will write about those in a later post.

    ^there may be other results presented too, of course, but these results are likely to be the major ones

    Posted

  • JeanTate by JeanTate

    Numbers of classification changes, by the 31 answers ('atoms')

    "the 31 answers" are the "tnn_a0m" fields; they are the 'atoms' of classification.

    Note that only two questions - t00 ("Is the galaxy simply smooth and rounded with no sign of a disk?"), three possible answers, and t11 ("Would you like to discuss this object?"), two answers - are asked for every object. Two more - t09 ("Is the galaxy currently merging or is there any sign of tidal debris?"), four possible answers, and t10 ("Does the galaxy appear symmetrical?"), two - are asked for all except those classified as soa ("Star or artifact").

    For both QS and QC catalogs, the minimum number of changes is zero. It's the same atom, t00a02 (Star or artifact) in both catalogs. And it's the same for both thresholds (>0.5 and >0.7). It's also zero for the atom t11a00 (Yes in answer to the question "Would you like to discuss this object?") for the threshold >0.7, in both catalogs.

    The maximum number of classification changes is for the t00a00 atom (Smooth), in both QS and QC catalogs, and for both thresholds: 152 and 113 (QS and QC, respectively) for >0.5; 337 and 281 (ditto) for >0.7.

    The average number of classification changes: 27.0 and 20.3 (QS and QC, respectively) for >0.5; 37.8 and 32.1 (ditto) for >0.7.

    Those are the numerators; what about the ratios (i.e. % of answers whose classification changes)*?

    The minima do not change, obviously.

    For both catalogs, for the threshold >0.7, the t00a00 atom (Smooth) is again the maximum; 29.9% (QS; 337/1126) and 24.6% (QC; 281/1144) (actually, for QC it's the t09a01 atom, at 25.0%, 2/8).

    However, for the threshold >0.5 it's different: the maximum is the t11a00 atom (Yes, would like to discuss), with 37.5% (3/8) for QS, and 100% (2/2) for QC. Ignoring this atom, the maxima are both 20%, but for different atoms: t08a00 (QS, 21/105; 1 off-centre bright clumps embedded within the galaxy) and t09a01 (QC, 11/55; tidal debris). For the t00a00 atom (Smooth), the ratios are 7.8% (QS, 152/1953) and 6.1% (QC, 113/1860).

    The averages: QS 6.9% and 9.0% (for >0.5 and for >0.7, respectively), and QC 8.2% and 8.2% (ditto).

    At first glance, this seems pretty disturbing: in our analyses of "merging" and "asymmetry" differences of ~±10% would dramatically affect our results, irrespective of what threshold we choose! When these differences are examined in detail, especially above the 'atom' level, the picture is not so bad ... as I'll show in later posts.

    *a word about the denominators: these are, for each of the 31 'atoms', the number of 'nodup' objects with "tnn_a0m_fraction" above the threshold
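    The ratios quoted in this post are just changes divided by those denominators. A small sketch that reproduces the t00a00 (Smooth) percentages from the numbers given above; the function name is made up for illustration.

```python
def percent_changed(n_changes, n_over_threshold_nodup):
    """Changes as a percentage of 'nodup' objects over the threshold."""
    return 100.0 * n_changes / n_over_threshold_nodup

# (changes, denominator) pairs for the t00a00 atom, as quoted in this post.
quoted = {
    ("QS", ">0.5"): (152, 1953),
    ("QC", ">0.5"): (113, 1860),
    ("QS", ">0.7"): (337, 1126),
    ("QC", ">0.7"): (281, 1144),
}
for key, (changes, denom) in quoted.items():
    print(key, f"{percent_changed(changes, denom):.1f}%")
```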

    Posted

  • JeanTate by JeanTate

    'Asymmetric' objects

    In this post I will examine how the number of objects classified as 'asymmetric' changes if 'withdup' classifications are included.

    Question 10 is "Does the galaxy appear symmetrical?" Except for partial classifications (where a zooite does not complete all questions presented, for an object, which are excluded from all catalogs), this question is asked (and answered) for all objects other than those classified as soa ("Star or artifact") in Question 1 ("Is the galaxy simply smooth and rounded with no sign of a disk?").

    A change in the number of objects classified as soa - e.g. 9/17 (0.529) 'nodup' cf 9/20 (0.45) 'withdup', for a threshold of >0.5; 12/16 (0.75) 'nodup' cf 13/20 (0.65) 'withdup', for a threshold of >0.7 - can produce a change in the number (and fraction) of 'asymmetrical galaxies' even if there are no changes to the t10a00 or t10a01 atoms ... depending on how we define 'asymmetrical galaxy'.

    Fortunately, we do not have to consider this indirect effect*; there is just one QS object (587725505557823574, AGS000004v) that is a soa, for both thresholds (>0.5 and >0.7), and three QC ones (587734894362493076, AGS00004ny; 587735236882727375, AGS00003hs; and 587739115244748811, AGS00003ag), all soa for >0.5 but not for >0.7. These four are all soa in both 'nodup' and 'withdup' catalogs: QS object: (0.882, 15/17; 0.85, 17/20); QC objects: (0.538, 7/13; 0.6, 12/20), (0.563, 9/16; 0.55, 11/20), and (0.529, 9/17; 0.6, 12/20), respectively.

    Perhaps the simplest definition of an asymmetric galaxy is "a QS or QC object for which the majority of t10 ("Does the galaxy appear symmetrical?") votes is for a01 (No)"; i.e. for a threshold of >0.5 for t10a01. Using this definition, how does the number of asymmetric galaxies differ, if the 'withdup' catalog is used, rather than the 'nodup' one?

    There are 865 QS objects (out of 2692, 32.1%) which satisfy this criterion in the 'nodup' catalog (ignoring the 310 QS objects for which N_vote is the same in the 'nodup' and 'withdup' catalogs; see the second post in this thread), and 516 QC objects (out of 2637, 19.6%; ditto (365)). And there are 833 (30.9%) QS objects in the 'withdup' catalog, and 497 (18.8%) QC ones (same caveats). The differences, -32 (QS, -1.2%) and -19 (QC, -0.7%), are the net(s) of 55 QS 'nodup' asymmetric galaxies being 'not asymmetric' in the 'withdup' catalog and 27 'withdup' asymmetric galaxies being 'not asymmetric' in the 'nodup' catalog (and 51/32 QC galaxies).

    What if the threshold is >0.7 (rather than >0.5), with everything else the same (or ceteris paribus)?

    By this definition, the number of asymmetric galaxies in the 'nodup' QS catalog is 459 (17.1%), and 201 (7.6%) in the QC one. In the 'withdup' catalogs, the numbers are 428 (15.9%) and 182 (6.9%), respectively. The net differences are again ~-1% (fewer asymmetric galaxies in the 'withdup' catalogs than in the 'nodup' ones).
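    The percentages and net differences in this post follow directly from the counts. A minimal sketch, assuming (as stated above) that the denominators are the objects whose vote totals differ between the two catalogs; the function and constant names are hypothetical.

```python
def summarize(n_nodup, n_withdup, n_objects):
    """Return (nodup %, withdup %, net change in objects, net change in % points)."""
    pct_nodup = 100.0 * n_nodup / n_objects
    pct_withdup = 100.0 * n_withdup / n_objects
    net = n_withdup - n_nodup
    return pct_nodup, pct_withdup, net, 100.0 * net / n_objects

QS_OBJECTS = 3002 - 310   # QS objects whose vote totals differ between catalogs
QC_OBJECTS = 3002 - 365   # QC objects, same criterion

for label, args in [
    ("QS >0.5", (865, 833, QS_OBJECTS)),
    ("QC >0.5", (516, 497, QC_OBJECTS)),
    ("QS >0.7", (459, 428, QS_OBJECTS)),
    ("QC >0.7", (201, 182, QC_OBJECTS)),
]:
    pct_nd, pct_wd, net, net_pct = summarize(*args)
    print(f"{label}: {pct_nd:.1f}% -> {pct_wd:.1f}%, net {net} ({net_pct:+.1f}%)")
```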

    Such a small difference (~-1%) is too small to worry about, right? Unfortunately, probably not ... but that's the topic of a later post ...

    *Of course, we still have to decide whether to include these four (or one) objects in any analyses of 'asymmetric galaxies' (and 'merging galaxies' too): do we consider them 'galaxies' at all? if yes, how do we include the Q10 (Q9) votes by zooites (the minority who did not vote 'soa')? However, these considerations apply whether we analyze data in the 'nodup' or the 'withdup' catalogs.

    Posted

  • JeanTate by JeanTate

    More on asymmetric galaxies

    [Image: KS plots of the Q10 "No" fractions]

    Those are KS plots, of the "No" answer to Q10 (Does the galaxy appear symmetrical?). In the top plot, for all 3002 QS objects and all 3002 QC ones (i.e. including those ~300+ for which there are no duplicates). In the bottom plot it's QS, and only those with three or more duplicates (per object), 1157 objects.

    The darker blue lines (both plots, "Snd1") are the QS 'nodup' fractions, and the green lines ("Sd1") the QS 'withdup' ones.

    In the top plot, the orange line ("Cnd1") is the QC 'nodup' fractions, and the dark red one ("Cd1") the QC 'withdup' ones. The lighter blue points (not a line, "Snd0") in the top plot are the QS 'nodup' 'Yes' answer fractions.

    The top plot shows:

    • for almost every fraction greater than ~0.15 (and less than ~0.95), there are fewer asymmetric QC objects than QS ones
    • there is, at first glance, no difference between the 'nodup' and 'withdup' distributions, for either QS or QC objects
    • the distribution of QS 'nodup' Yes fractions is a simple transform of the QS 'nodup' No ones (as you would expect)

    Except near the two ends, the 'nodup' distributions seem to show more asymmetry than the corresponding 'withdup' ones: the green line is nearly everywhere above the darker blue one, and the dark red the orange. While the results in earlier posts in this thread (above) confirm this - at least for thresholds of > 0.5 and 0.7 - I doubt that a KS statistic would show this difference to be statistically significant.

    The bottom plot - of ~1/3 of the QS objects, those with three or more duplicates - shows this difference more clearly, with the green line above the darker blue one for all fractions between ~0.25 and ~0.95 (or at least not below it).

    In some sense, the 'nodup' data is a subset of the 'withdup' data. It may be that that means a KS test is invalid, and to do a valid KS test I would need to compare the 'nodup' distribution with the ('withdup' minus 'nodup', in some sense) one. Does anyone reading this (mlpeck 😉 ) know?

    [Image: stacked bar plots of the Q10 "No" fractions for QS objects]

    This is also a (pair of stacked bar) plot(s) of Q10 No fractions, for QS objects. In it I am exploring an 'odd-even' effect.

    The data excludes objects for which there are no duplicates (310), and fractions ≤ 0.4 and ≥ 0.6; 571 objects in the top ('nodup') plot, 533 in the bottom ('withdup').

    The bars are bins of equal 'fraction' width, infinitesimally less than 0.01. The first bar ("1") is the bin (0.40, 0.41]*, "2" is (0.41, 0.42], ... "10" is (0.49, 0.50], ... "19" is (0.58, 0.59]. I think quasi-code for a transform which works is ROUND(100*fraction)-40.
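    For what it's worth, here is one way to sketch the binning in Python. Two things in it are assumptions rather than anything taken from the plots: it uses a ceiling instead of a round (with the half-open bins described above, a fraction such as 0.402 belongs in bar "1", whereas rounding 40.2 would give bar "0"), and it uses exact rationals so that values sitting on a bin edge are not disturbed by floating-point error. The function name is hypothetical.

```python
from fractions import Fraction
import math

def bin_number(no_count, total_count):
    """Map a Q10 'No' fraction in (0.40, 0.60) to the bar number used in the plot."""
    frac = Fraction(no_count, total_count)
    return math.ceil(100 * frac) - 40

print(bin_number(9, 22))   # 9/22 ~ 0.409, inside (0.40, 0.41]
print(bin_number(10, 20))  # exactly 0.50, the upper edge of (0.49, 0.50]
print(bin_number(10, 17))  # 10/17 ~ 0.588, inside (0.58, 0.59]
```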

    The y-axes are percentages; summing the bars in each plot gives 100.

    Orange bars are fractions for objects whose total Q10 counts (i.e. t10_a00_count + t10_a01_count) are odd integers; blue bars, even.

    For the vote totals in the Quench project, a fraction of 0.5 (bin 10) can be produced by only an even numbered Q10 total count. It follows that a difference in the number of objects, for two distributions that are otherwise the same except for the ratio of even-numbered to odd-numbered total votes, which survive a 'fraction threshold cut' will depend on the threshold. In short, there will be an odd/even ratio systematic effect.
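    The odd/even point can be made concrete by listing which fractions a given vote total can produce. A small sketch; the range of totals (15 to 20) is only an illustrative assumption, not the actual distribution of Q10 totals.

```python
from fractions import Fraction

def achievable_fractions(total):
    """All 'No' fractions a question can take when it has `total` votes."""
    return {Fraction(k, total) for k in range(total + 1)}

half = Fraction(1, 2)
for total in range(15, 21):
    parity = "even" if total % 2 == 0 else "odd"
    # Smallest fraction strictly over 0.5, i.e. the first one to survive a '>0.5' cut.
    first_over = min(f for f in achievable_fractions(total) if f > half)
    print(total, parity, half in achievable_fractions(total), float(first_over))
```

    For an odd total the smallest fraction over 0.5 sits closer to 0.5 than for the neighbouring even totals (9/17 is 0.529, while 9/16 is 0.5625), so a shift in the mix of odd and even totals moves objects across the cut even if nothing else changes.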

    This effect can be seen in the two plots above, albeit not very clearly (can you think of a way to show it more clearly?): the "7" and "13" bins are smaller in the lower plot than the upper, and the "10" bin bigger.

    *I vaguely remember this notation: it's a range of reals, with the two numbers being the lower and upper bounds. Ordinary brackets denote the bound is excluded from the range, square brackets included.

    Posted

  • JeanTate by JeanTate

    'Merging' objects

    This post is like the 'Asymmetric objects' one above, except that the topic is 'merging objects'. I can repeat the second para, and skip the third and fourth (and footnote); the details/explanations are the same.

    Question 9 is "Is the galaxy currently merging or is there any sign of tidal debris?" Except for partial classifications (where a zooite does not complete all questions presented, for an object, which are excluded from all catalogs), this question is asked (and answered) for all objects other than those classified as soa ("Star or artifact") in Question 1 ("Is the galaxy simply smooth and rounded with no sign of a disk?").

    Unlike Q10, where the two answers are Yes and No, Q9 has four choices: Merging, Tidal debris, Both, Neither.

    Perhaps the simplest definition of a merging galaxy is "a QS or QC object for which the minority of t09 ("Is the galaxy currently merging or is there any sign of tidal debris?") votes is for a03 (Neither)"; i.e. for a threshold of <0.5 for t09a03*. Using this definition, how does the number of merging galaxies differ, if the 'withdup' catalog is used, rather than the 'nodup' one?
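    In code, this definition might look like the sketch below. The function name and the example vote counts are hypothetical; the answer order (Merging, Tidal debris, Both, Neither) follows the t09 atoms a00 to a03. Returning False when Q9 was never reached is an assumption made only for this illustration.

```python
def is_merging(t09_counts, threshold=0.5):
    """t09_counts = (merging, tidal debris, both, neither) vote counts."""
    total = sum(t09_counts)
    if total == 0:
        # Assumption: an object with no Q9 votes at all is not called 'merging'.
        return False
    neither_fraction = t09_counts[3] / total
    return neither_fraction < threshold

# Hypothetical vote counts, for illustration only.
print(is_merging((5, 3, 2, 8)))        # 'Neither' fraction 8/18 = 0.444
print(is_merging((1, 1, 0, 16)))       # 'Neither' fraction 16/18 = 0.889
print(is_merging((5, 3, 2, 8), 0.3))   # same votes, stricter '<0.3' cut
```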

    There are 593 QS objects (out of 2692, 22.0%) which satisfy this criterion in the 'nodup' catalog (ignoring the 310 QS objects for which N_vote is the same in the 'nodup' and 'withdup' catalogs; see the second post in this thread), and 324 QC objects (out of 2637, 12.3%; ditto (365)). And there are 591 (22.0%) QS objects in the 'withdup' catalog, and 321 (12.2%) QC ones (same caveats). The differences, -2 (QS, -0.1%) and -3 (QC, -0.1%), are the net(s) of 22 QS 'nodup' merging galaxies being 'not merging' in the 'withdup' catalog and 20 'withdup' merging galaxies being 'not merging' in the 'nodup' catalog (and 19/16 QC galaxies).

    What if the threshold is <0.3 (rather than <0.5), with everything else the same (or cet. par.)?

    By this definition, the number of merging galaxies in the 'nodup' QS catalog is 324 (12.0%), and 155 (5.9%) in the QC one. In the 'withdup' catalogs, the numbers are 313 (11.6%) and 150 (5.7%), respectively. The net differences are again ~-0.2% (fewer merging galaxies in the 'withdup' catalogs than in the 'nodup' ones).

    Even more than with the asymmetric galaxies, such a small difference (~-0.2%) is too small to worry about, right? Unfortunately, probably not ... but that's the topic of a later post ...

    *this may seem a bit backwards; however it's perfectly logical, and consistent, given that the four t09 fractions sum to 1.000, for all QS and QC objects (I checked). After all, if astronomers are perfectly happy to work with a system in which the brighter an object is the more negative its magnitude is ... 😉

    Posted

  • JeanTate by JeanTate

    More on merging galaxies

    [Image: KS plots of the Q9 "Neither" fractions]

    Those are KS plots, of the "Neither" answer to Q9 (Is the galaxy currently merging or is there any sign of tidal debris?). In the top plot, for all 3002 QS objects and all 3002 QC ones (i.e. including those ~300+ for which there are no duplicates). In the bottom plot it's QS, and only those with three or more duplicates (per object), 1157 objects.

    The darker blue lines (both plots, "Snd3") are the QS 'nodup' fractions, and the green lines ("Sd3") the QS 'withdup' ones.

    In the top plot, the orange line ("Cnd3") is the QC 'nodup' fractions, and the dark red one ("Cd3") the QC 'withdup' one.

    The top plot shows:

    • for almost every fraction, except perhaps those greater than ~0.95, there are fewer merging QC objects than QS ones
    • there is, at first glance, no difference between the 'nodup' and 'withdup' distributions, for either QS or QC objects

    Unlike for asymmetric galaxies, the QS 'nodup' merging galaxies distribution does not seem different from the 'withdup' one. That's a contrast with the QC distributions: while the differences seem small, the dark red line is almost everywhere above the orange one. This is in line with the much smaller differences between the numbers of 'nodup' and 'withdup' merging galaxies for thresholds of < 0.3 and 0.5 (per the post above).

    The bottom plot - of ~1/3 of the QS objects, those with three or more duplicates - shows this lack of difference more clearly, with the green line above the darker blue one only for (most) fractions greater than ~0.60 (or at least not below it).

    Posted

  • JeanTate by JeanTate

    Sidebar: A somewhat different KS test

    In some sense, the 'nodup' data is a subset of the 'withdup' data. It may be that that means a KS test is invalid, and to do a valid KS test I would need to compare the 'nodup' distribution with the ('withdup' minus 'nodup', in some sense) one. Does anyone reading this (mlpeck 😉 ) know?

    [Image: KS plots of the 'withdup' minus 'nodup' fractions]

    Those are KS plots of "'withdup' minus 'nodup', in some sense"; the upper one is of 'asymmetrical galaxies' (per the definition in this post, above), the lower 'merging galaxies' (this post). The darker blue lines are the same as in those two posts above (QS 'nodup' fractions, excluding Ndup = 0, 1, or 2); the cyan lines are the QS fractions for the 'duplicate difference' votes.

    There are only 1132 objects.

    "In the bottom plot it's QS, and only those with three or more duplicates (per object), 1157 objects."

    Huh? There are 25 QS objects with three or more duplicates (per object), BUT the Q9 (and Q10; they're the same) total number of 'withdup' votes (counts) is equal to the number of 'nodup' votes (counts). For all these the 'extra' 'withdup' votes are all soa ("Star or artifact")* 😮

    Clearly the small number of 'duplicate difference' votes results in a very lumpy distribution of fractions. Does that make a KS test invalid? inconclusive? I don't know; do you?

    *one object (587730773872869598, AGS00000jt) has five 'extra' votes ... all of which are soa (and it's not the only object like this) 😮 😮

    Posted

  • JeanTate by JeanTate

    "Too small to worry about, right?" Wrong! 😦

    [re asymmetric galaxies] Such a small difference (~-1%) is too small to worry about, right? Unfortunately, probably not ... but that's the topic of a later post ...

    .

    Even more than with the asymmetric galaxies, such a small difference (~-0.2%) is too small to worry about, right? Unfortunately, probably not ... but that's the topic of a later post ...

    Take asymmetric galaxies (per the definition given), and the > 0.7 threshold:

    [...] the number of asymmetric galaxies in the 'nodup' QS catalog is 459 (17.1%), and 201 (7.6%) in the QC one. In the 'withdup' catalogs, the numbers are 428 (15.9%) and 182 (6.9%), respectively.

    The differences, ignoring sign, are 31 (QS) and 19 (QC). These are, indeed, differences of ~1% ... if the denominator is the total number of galaxies. However, the analyses we are interested in (well, at least some such analyses) concern the distribution of asymmetric galaxies by attributes such as redshift and log_mass; for such analyses the denominator(s) are 459 and 201 (or similar) ... and the differences become much larger (~7% for QS, ~9% for QC).
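    The effect of the choice of denominator is easy to see with the QS numbers quoted above. A minimal sketch; the variable names are made up, and 2692 is the number of QS objects whose vote totals differ between the catalogs (3002 minus 310).

```python
def percent(part, whole):
    return 100.0 * part / whole

nodup_asym, withdup_asym = 459, 428   # QS, threshold > 0.7, as quoted above
n_objects = 2692                      # QS objects with differing vote totals

diff = abs(withdup_asym - nodup_asym)
print(f"{diff} objects")
print(f"{percent(diff, n_objects):.1f}% of all objects")
print(f"{percent(diff, nodup_asym):.1f}% of 'nodup' asymmetric galaxies")
```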

    It may be that the ('extra') duplicate classifications are not significantly different than the 'nodup' ones, being distributed across any and all attributes we are interested in in the same way (with a general, slight, tendency towards boring; somewhat fewer asymmetric galaxies, fewer merging galaxies, for example). If so, we could use the 'withdup' catalogs (they have more classifications).

    Sadly, there's strong evidence that the duplicate classifications are actually very different.

    Consider Q1 (t00, "Is the galaxy simply smooth and rounded with no sign of a disk?"), and the third answer in particular (t00_a02, "Star or artifact" (soa)):

    There are 53,690 nodup QS classifications in all*, of which 1,960 were soa; that's 3.7% of the 53,690. In the 'withdup' QS catalog, there are 60,310 classifications, 6,620 more than in the 'nodup' QS catalog. How many of those 'extra' 6,620 classifications were soa? If these 'extra' classifications were distributed across the three answers to Q1 similarly to the nodup ones, there should be ~245 (which is 3.7% of 6,620).

    There are, however, 2,009 ... over 30% of the 'extra' classifications were soa^! 😮

    *said another way: there are 53,690 answers to Q1, across all 3002 QS objects; each zooite answering Q1 did so no more than once for each object

    ^the split of the remaining 4,611 'extra' classifications between a00 ("Smooth") and a01 ("Features or disk") is also very different than that for the nodup ones: 2719/33274 ('extra'/nodup; a00) and 1892/18456 (ditto; a01)
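    The arithmetic behind the soa comparison, as a short sketch using only the counts quoted in this post (variable names are hypothetical). Note that using the unrounded nodup rate gives an expected count of about 242, slightly below the ~245 obtained from the rounded 3.7%.

```python
nodup_total, nodup_soa = 53_690, 1_960   # QS 'nodup' Q1 answers, and soa among them
withdup_total = 60_310                   # QS 'withdup' Q1 answers
extra_soa = 2_009                        # soa among the 'extra' classifications

extra_total = withdup_total - nodup_total
nodup_rate = nodup_soa / nodup_total
expected_extra_soa = nodup_rate * extra_total   # if extras behaved like nodup votes
extra_rate = extra_soa / extra_total

print(f"extra classifications: {extra_total}")
print(f"nodup soa rate: {100 * nodup_rate:.1f}%")
print(f"expected soa among extras: {expected_extra_soa:.0f}")
print(f"observed soa among extras: {extra_soa} ({100 * extra_rate:.1f}%)")
```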

    Posted

  • mlpeck by mlpeck

    In some sense, the 'nodup' data is a subset of the 'withdup' data. It
    may be that that means a KS test is invalid, and to do a valid KS test
    I would need to compare the 'nodup' distribution with the ('withdup'
    minus 'nodup', in some sense) one.

    The two sample KS test assumes the samples are independent. It also assumes the draws for each sample are independent. So the first assumption is definitely violated and the second probably is for the data set with duplicates. Better not use a KS test.

    I'd suggest instead a parametric model, namely that the vote totals follow binomial distributions (for the binary choice questions), with the null hypothesis being that those who were fed repeats made votes that were independent of their first exposure. If the null hypothesis is true the vote totals for any given question on any given galaxy will still be binomial with the same probability p and a larger (in general) number of trials.

    How to devise a test for that null hypothesis is a problem I do not intend to think about.
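    A very rough illustration of the binomial idea, applied to the pooled QS 'soa' counts reported earlier in the thread. This is not the per-galaxy, per-question test described above, and it is not a valid significance test: it pools all objects, treats the 'nodup' soa rate as a known p, and ignores any dependence between votes. All names are hypothetical.

```python
import math

nodup_total, nodup_soa = 53_690, 1_960   # QS 'nodup' Q1 answers, soa among them
extra_total, extra_soa = 6_620, 2_009    # 'extra' (duplicate) Q1 answers, soa among them

# Assumption: under the null, each extra vote is soa with the pooled nodup probability.
p = nodup_soa / nodup_total
mean = extra_total * p
sd = math.sqrt(extra_total * p * (1 - p))
z = (extra_soa - mean) / sd

print(f"binomial mean {mean:.1f}, sd {sd:.1f}")
print(f"observed {extra_soa} is {z:.0f} standard deviations above the mean")
```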

    Posted

  • JeanTate by JeanTate in response to mlpeck's comment.

    Thanks! 😃

    Once I found that the fraction of 'soa' classifications among the 'extras' is so dramatically different from that among the 'nodup' ones, I stopped being interested in finding good statistical tests (for now, for this purpose). Including duplicate classifications very obviously introduces an important, far-from-trivial bias, even if for some specific questions that bias is apparently small. IMHO the classification databases should have had duplicates removed BEFORE being uploaded to Tools, let alone 'published' as FITS and CSV files ...

    Posted