Galaxy Zoo Starburst Talk

Quench project: a proposal aimed at reviving and completing it.

  • JeanTate by JeanTate

    Summary:

    Write to the eight SCIENTISTs who have posted here since 1 August, 2013, asking them to make public the Quench project clicks database. When that is done, let us ordinary zooites complete Stage 2, then Stage 3, as originally proposed. To help ensure timely completion, in writing to the eight SCIENTISTs, ask each for an explicit commitment to work actively with us, right through to the end of Stage 3.

    Background

    The Galaxy Zoo Quench project is described in this "Project Overview" document.

    Stage 1 - "Classification" - took somewhat longer than expected, partly because the Control sample contained duplicates which had to be removed and replaced (and classified, in the Quench Boost phase), but was completed before the end of August, 2013.

    Stage 2 - "Data Analysis & Discussion" - was pretty rocky, with a lot of good discussion on many topics, but also at least one major revision to the classifications parts of the catalogs used in Tools (which are, to date, the only complete databases available to zooite participants); see this post by SCIENTIST mzevin1, dated September 26 2013. By mid December, 2013, there were five zooites still active - ChrisMolloy, JeanTate (me), jules, mlpeck, and zutopian - one SCIENTIST (trouille), and one Development Team member (edpaget).

    Between 17 and 20 December, 2013 the Quench project Tools databases ("dataset" or "Quench tables") were changed, at least twice; these contain thousands of differences, when compared with the databases made available in late September. Many of those differences are in the summary zooite classifications, which are at the heart of this project (details are in this thread). The Quench project came to a halt.

    The Quench project as science

    In launching Quench, ltrouille wrote "this new Galaxy Zoo Quench project provides the opportunity to take part in the ENTIRE scientific process – everything from classifying galaxies to analyzing results to collaborating with astronomers to writing a scientific article!"

    Within the Zooniverse, there are only two other comparable projects, SpaceWarps and Planet Hunters. Not being a professional scientist, I cannot say - from first-hand experience - to what extent our experience in Quench is typical of collaborative astronomical research; however, by comparing what's in the public record - mostly the Talks of those two other Zooniverse projects - there seem to be two huge differences with what we have experienced:

    • participants in those projects have direct access to the data they need in order to do the data analysis which leads to the writing of professional journal articles
    • professional astronomers are very actively engaged.

    Further, anyone wishing to do research based on zooites' Galaxy Zoo 2 clicks - to take just one example - can easily and freely access a much richer dataset of classifications than those provided in Quench (see here for details).

    So, if we have access to Quench classifications data to the same level of detail as is available in the Galaxy Zoo 2 catalog, we should be able to complete Stage 2.

    And if we have a comparable level of involvement by professional astronomers as in SpaceWarps and Planet Hunters - and as was clearly intended ("Throughout, [volunteers will] discuss with the science team their interpretation of the results. At the end of the process, volunteers and the science team will collaboratively write a 4-page Astrophysical Journal article") - we should also be able to complete Stage 3 successfully.

    Notes

    The eight SCIENTISTs, and the date (in 2013) they last posted here (if there are any errors, please let me know!), are:

    • vrooje: August 9
    • klmasters: August 9
    • KWillett: August 29
    • astropixie: September 6
    • jtmendel: October 10
    • mzevin1: October 11
    • ivywong: October 28
    • trouille: December 20

    I propose that we write to them by sending them an email, and that we find their email addresses from their institutional webpages. I also propose that we use this thread to collaboratively agree on what the email we send to them should say.

    What do you think?

    Posted

  • mlpeck by mlpeck

    I think you could send exactly this document, minus the first paragraph, to the scientists you listed¹ plus Ed Padgett (sp?) and any other relevant GZ technical people we can identify.

    One thing that might be worth adding is links to discussion threads that seem scientifically promising.

    ¹ AFAICT M. Zevin is most likely a beginning grad student or senior undergrad -- a check late last year showed no papers on ADS or astro-ph on which he was a co-author.

    ² Data reduction software for GZ data is available on GitHub. According to the description "most" of it is written in Python and IDL (ugh). Given access to the raw data it should probably be feasible to reduce it ourselves if Zooniverse staffers no longer have the time or interest, depending on how compatible the IDL code is with the open source GDL.

    Posted

  • ChrisMolloy by ChrisMolloy in response to JeanTate's comment.

    I agree with what you've written. I can't really add too much to this. It covers all areas succinctly and clearly. I concur with mlpeck's suggestion to look at discussion threads that look scientifically promising, and at other areas of interest that could lead to promising avenues of pursuit. And I’m sure that there are a few areas of interest not listed in the discussion boards that could be pursued.

    I do think maybe Quench could have been micromanaged a bit more, in the sense that maybe those of us less adept at some areas could have gathered and analysed the data on, say, all areas of morphology. Maybe the science team could have broken the participants down into groups to look at different areas and guided us through. You did this for me looking at asymmetrical galaxies. If this had been done from the beginning of the data analysis phase then all of the morphology could have been collected and analysed quite quickly. However, as most of the science team departed, one could be left wondering what if. And that’s not taking into account the integrity of the data in Tools. I’m not discussing this.

    Maybe the 4-week turnaround time frame was ambitious, with Tools being in beta. Maybe three months was more realistic. Also, I thought after the last Quench Boost post in September there could have been more posts on, say, redshift, absmag etc. It might have kept more people around. At one stage I was going to suggest to you to write a Quench Boost post. Another what if.

    I really think that Tools needs to be made usable and workable and that we need to push for this. I know I have said this before, as have others, but it just has so much potential. If we had had a dedicated techie working on Tools from Quench’s inception, would we be where we are now? And I do think that Tools, or something similar, is the future if you want to do a lot of mass-produced research of a high quality, if it is managed well. And talking of managing things well, I do think that you, mlpeck, zutopian and Jules have managed things well. And I hope zutopian does check in and does not leave permanently.

    So, I hope this helps. Not really adding anything new to what you’ve written. Maybe a sense of frustration, a bit garbled, and it’s late here and we had a big earthquake today so my nerves are a bit frayed. I would really like something to come from this. I have learnt so much from this project. I think your letter is good, so go for it, and add anything of value from this post if you see fit - it’s mostly a retrospective analysis of what could have been and should have been.

    Posted

  • zutopian by zutopian in response to ChrisMolloy's comment.

    And talking of managing things well, I do think that you, mlpeck, zutopian and Jules have managed things well. And I hope zutopian does check in and does not leave permanently.

    Thanks for your kind comment. Besides, I am pleased that you are back and want to continue.
    I had started to participate on 18th Sept, so I had missed a lot and it took a while to get an overview. I was surprised by the high-level discussions: you, Jean, Jules and mlpeck did great work.

    Maybe the science team could have broken the participants down into groups to look at different areas and guided us through.

    I think that the volunteers should have been divided into 3 groups at the beginning of the project: Advanced, Intermediate, Newbies.
    Besides, there should have been at least one scientist per group to guide the volunteers.
    You, Jean, Jules and mlpeck would have been in the Advanced group, of course! I would have been in the Newbies group, even if I had started to participate at the beginning of the project.

    Maybe a sense of frustration, a bit garbled, and it’s late here and we had a big earthquake today so my nerves are a bit frayed.

    I guess that you live in New Zealand. I am glad that you are fine, apart from the fact that your nerves are a bit frayed.

    It would be fine if you, Jean or mlpeck informed me and also Jules by PM (Talk or GZ forum) when there is progress in this project.

    Posted

  • JeanTate by JeanTate

    Thanks mlpeck, ChrisMolloy, and zutopian. 😃

    I'll have a go at drafting an email, and will share it here before I send it. OK if it reads - perhaps explicitly - as coming from the four of us (and not just me)? I'll PM jules to ask her if she's OK with that too.

    [mlpeck] One thing that might be worth adding is links to discussion threads that seem scientifically promising.

    [ChrisMolloy] I concur with mlpeck's suggestion to look at discussion threads that look scientifically promising, and at other areas of interest that could lead to promising avenues of pursuit.

    I agree that an example or two would be a good idea; any faves? On the other hand, I think the focus should be on enabling the remaining stages, per the original vision, rather than selling possible scientifically valuable results (I think we've shown that - collectively - we're more than capable of doing this work, and that the possible results are interesting and valuable).

    [ChrisMolloy] we had a big earthquake today so my nerves are a bit frayed

    Hope your nerves have steadied by now; big earthquakes are surely scary!

    [zutopian] It would be fine if you, Jean or mlpeck informed me and also Jules by PM (Talk or GZ forum) when there is progress in this project.

    Will do.

    Posted

  • ChrisMolloy by ChrisMolloy in response to zutopian's comment.

    Thanks for the kind words. Yes Wellington. Nerves have settled. And I am definitely a newbie!

    Posted

  • ChrisMolloy by ChrisMolloy in response to JeanTate's comment.

    I agree that an example or two would be a good idea; any faves? On the other hand, I think the focus should be on enabling the remaining stages, per the original vision, rather than selling possible scientifically valuable results (I think we've shown that - collectively - we're more than capable of doing this work, and that the possible results are interesting and valuable).

    I still want to finish looking at the asymmetrical galaxies. Tools is still hanging. However, your second sentence is pertinent and we should run with this. Look forward to reading the draft e-mail. No problems from me with it coming from all of us.

    Posted

  • zutopian by zutopian

    Jean wrote:

    The eight SCIENTISTs, and the date (in 2013) they last posted here (if there are any errors, please let me know!), are:
    ◦ vrooje: August 9
    ◦ klmasters: August 9
    ◦ KWillett: August 29
    ◦ astropixie: September 6
    ◦ jtmendel: October 10
    ◦ mzevin1: October 11
    ◦ ivywong: October 28
    ◦ trouille: December 20

    That's probably right. I had started to participate on 18th Sept. I had discussed with mzevin1, trouille and ivywong*.
    jtmendel did a post (on Oct 10) during my participation, and he had done posts before.
    vrooje, klmasters, astropixie and KWillett hadn't done any posts during my participation! I read posts which they had done before.
    As far as I know, there are no posts by other scientists. Well, there are posts by Ed Paget**, as mentioned by mlpeck.

    *BTW, ivywong had done her 1st post on 16 Oct: http://quenchtalk.galaxyzoo.org/#/users/ivywong
    **Profile of Ed Paget: http://quenchtalk.galaxyzoo.org/#/users/edpaget

    Posted

  • JeanTate by JeanTate

    I'll have a go at drafting an email, and will share it here before I send it.

    Here is the first draft. I've got placeholders for the two (more?) scientifically promising areas; I've yet to share the email addresses (or should I keep them to myself?). Also, the salutation ("Dear X,"?) is missing. Oh, and the links will have to be formatted differently (my email addy is a Gmail account). Feedback please!


    We earnestly request that you make public the Quench project clicks database, in a form similar to the Galaxy Zoo 2 data release.

    We would also urge you to publicly commit to work actively with us on the Quench project, following the original project design, right through to the end of Stage 3.

    As of December, 2013, there were several different investigations under way, most of which seem scientifically promising; for example {one} and {two}. Once the Quench project is revived, we intend to continue these.

    The remainder of this email gives a brief description of the background to, and current status of, the Quench project, and its scientific nature.

    ("Signed")
    {Jean Tate, Chris Molloy, Jules, mlpeck} We are ordinary zooites who were active in the Galaxy Zoo Quench project at the time the serious problem with the integrity of the classifications in the Quench Control and Quench Sample databases (used in Tools) was reported.


    The Galaxy Zoo Quench project is described in this "Project Overview" document.

    Stage 1 - "Classification" - took somewhat longer than expected, partly because the Control sample contained duplicates which had to be removed and replaced (and classified, in the Quench Boost phase), but was completed before the end of August, 2013.

    Stage 2 - "Data Analysis & Discussion" - was pretty rocky, with a lot of good discussion on many topics, but also at least one major revision to the classifications parts of the catalogs used in Tools (which are, to date, the only complete databases available to zooite participants); see this post by SCIENTIST mzevin1, dated September 26 2013. By mid December, 2013, there were five zooites still active - ChrisMolloy, JeanTate (me), jules, mlpeck, and zutopian - one SCIENTIST (trouille), and one Development Team member (edpaget).

    Between 17 and 20 December, 2013 the Quench project Tools databases ("dataset" or "Quench tables") were changed, at least twice; these contain thousands of differences, when compared with the databases made available in late September. Many of those differences are in the summary zooite classifications, which are at the heart of this project (details are in this thread). The Quench project came to a halt.

    The Quench project as science

    In launching Quench, ltrouille wrote "this new Galaxy Zoo Quench project provides the opportunity to take part in the ENTIRE scientific process – everything from classifying galaxies to analyzing results to collaborating with astronomers to writing a scientific article!"

    Within the Zooniverse, there are only two other comparable projects, SpaceWarps and Planet Hunters. Not being a professional scientist, I cannot say - from first-hand experience - to what extent our experience in Quench is typical of collaborative astronomical research; however, by comparing what's in the public record - mostly the Talks of those two other Zooniverse projects - there seem to be two huge differences with what we have experienced:

    • participants in those projects have direct access to the data they need in order to do the data analysis which leads to the writing of professional journal articles
    • professional astronomers are very actively engaged.

    Further, anyone wishing to do research based on zooites' Galaxy Zoo 2 clicks - to take just one example - can easily and freely access a much richer dataset of classifications than those provided in Quench (see here for details).

    So, if we have access to Quench classifications data to the same level of detail as is available in the Galaxy Zoo 2 catalog, we should be able to complete Stage 2.

    And if we have a comparable level of involvement by professional astronomers as in SpaceWarps and Planet Hunters - and as was clearly intended ("Throughout, [volunteers will] discuss with the science team their interpretation of the results. At the end of the process, volunteers and the science team will collaboratively write a 4-page Astrophysical Journal article") - we should also be able to complete Stage 3 successfully.

    Posted

  • JeanTate by JeanTate

    I agree that an example or two would be a good idea; any faves?

    I've got one nomination so far, from ChrisMolloy: Asymmetrical Classifications. My personal fave is What bias does the varying fraction of Eos - in the QS catalog - introduce?, but I'm biased 😉

    Posted

  • mlpeck by mlpeck

    I think the piece that Jules was working on should be mentioned, especially since it's another area where the GZ classifiers possibly added some value: there is a larger fraction of galaxies in the quench sample that display merger signatures of some sort, and there is an intriguing correlation with stellar mass that seems to be lacking in the control sample.

    Posted

  • JeanTate by JeanTate

    Thanks.

    the piece that Jules was working on should be mentioned

    Is this it? Mass Dependent Merger Fraction (Control vs Post-quenched Sample) If so, I agree.

    Anyone else have a fave (or two)? Other comments on the draft email?

    Re email addys: I've got them all except for Michael Zevin (mzevin1; his name is mentioned in this GZ blog post), and Ed Paget (but we can send him the proposal in 'Talk format', via PM, which I'm sure he checks often).

    Posted

  • zutopian by zutopian in response to JeanTate's comment.

    Is this it? Mass Dependent Merger Fraction (Control vs Post-quenched Sample) If so, I agree.

    I think that it might actually be unnecessary to mention it in the e-mail, because it was mentioned in the fastcolabs article. (Just my humble opinion.)

    Re email addys: I've got them all except for Michael Zevin (mzevin1; his name is mentioned in this GZ blog post), and Ed Paget

    You could ask in your e-mail that they forward the e-mail to mzevin and Ed Paget.

    Posted

  • zutopian by zutopian in response to JeanTate's comment.

    I've got one nomination so far, from ChrisMolloy: Asymmetrical Classifications. My personal fave is What bias does the varying fraction of Eos - in the QS catalog - introduce?, but I'm biased

    I suggest that you mention both in your e-mail.
    I would also like you to mention the following one: Potentially problematic sources in "the 778"

    Posted

  • zutopian by zutopian

    I suggest that the final version of the letter be posted as an "Open Letter to the Scientists" in a new topic in "Just Chat" or "Questions for the Scientists!". In the e-mail you could ask them to read it and say that their replies are awaited in Talk.

    Posted

  • ChrisMolloy by ChrisMolloy in response to JeanTate's comment.

    I think it's good. Faves: list all four cited above. I can't really think of anything else. Interestingly, DR10 is listing GZ classifications and the clicks. http://skyserver.sdss3.org/public/en/tools/explore/galaxyzoo.aspx?id=1237665179524923492&spec=2415191435407026176&apid= You probably already know this though. Also, I think you should keep the email addresses private.

    Posted

  • zutopian by zutopian in response to JeanTate's comment.

    ("Signed") {Jean Tate, Chris Molloy, Jules, mlpeck} We are ordinary zooites who were active in the Galaxy Zoo Quench project at the time the serious problem with the integrity of the classifications in the Quench Control and Quench Sample databases (used in Tools) was reported.

    After doing the above posts today, I noticed that my name isn't listed as a signer. Well, it doesn't matter, because I am not continuing and didn't do scientific analysis. I had just guessed that my name would also be listed, and I wonder if you simply forgot to list it?

    You had made the following comment before:

    I'll have a go at drafting an email, and will share it here before I send it. OK if it reads - perhaps explicitly - as coming from the four of us (and not just me)? I'll PM jules to ask her if she's OK with that too.

    So I guess that "the four of us" actually means those persons who are listed as signers? If so, I nonetheless won't delete the posts I did today, because they might be somehow useful.

    Posted

  • zutopian by zutopian in response to zutopian's comment.

    I would also like you to mention the following one: Potentially problematic sources in "the 778"

    I noticed that my above suggestion actually doesn't match the following statement in the draft:

    As of December, 2013, there were several different investigations under way, most of which seem scientifically promising; for example {one} and {two}.

    Well, I think that it is an important topic, but it isn't about the analysis of the nature of the QS galaxies.

    Posted

  • JeanTate by JeanTate

    Thanks for your comments and suggestions, zutopian and ChrisMolloy! 😃

    [zutopian] You could ask in your e-mail that they forward the e-mail to mzevin and Ed Paget.

    Very good suggestion!

    [zutopian] I suggest that the final version of the letter be posted as an "Open Letter to the Scientists" in a new topic in "Just Chat" or "Questions for the Scientists!". In the e-mail you could ask them to read it and say that their replies are awaited in Talk.

    And another one! I'll combine them: the email will contain the full text (plus links) as well as a link to a new thread (I prefer Questions for the Scientists!). And I'll invite responses both in the email and by them joining the new Talk thread. I'll also add that - contrary to my usual position - everything they say in an email reply I'll consider open for me to repeat/copy in the new Talk thread (unless they explicitly request that I don't).

    [ChrisMolloy] I think you should keep the email addresses private.

    The email addresses I'll be using are all in the public domain (except my own, of course), but yes, posting them here serves no useful purpose.

    [zutopian] my name isn't listed as a signer

    I'm more than happy to have all five names on the email (and would also seriously consider adding any other ordinary zooite who had been active here earlier), and would also be delighted if you changed your mind about not continuing with the project, once (not "if"!) it's live again ... you've made a great contribution so far, and I feel strongly that you have more to contribute yet.


    I hope to have the next draft up later today (maybe tomorrow, depends on where you are in the world). With luck that'll be the final one, and the email can go out on Monday, with the new Talk thread going up at about the same time.

    Posted

  • JeanTate by JeanTate

    Here is the second draft. The salutation ("Dear X,"?) is (still) missing, and the links will have to be formatted differently in the email.

    How would it appear, as a stand-alone new thread, in the Questions for the Scientists! section?

    The only changes (except perhaps for some minor formatting, and the way the URLs are displayed):

    • "The remainder of this email This post gives ...";

    • the addition of "We have posted the content of this email in the
      Quench project Talk thread,
      {Quench project: a proposal aimed at reviving and
      completing it. URL} We urge you to log on and discuss this proposal
      online. If you would prefer to respond with a private email, please
      do so; however, please state explicitly if you do not wish me (Jean
      Tate) to copy the content of any email replies to the Talk
      thread.
      "

    • also "if you have an email address for either edpaget (Ed Paget) or mzevin1 (Michael Zevin), or both, would you mind forwarding this email to them please?"

    The subject of the email, and the title of new Talk thread: Quench project: a proposal aimed at reviving and completing it.

    Feedback please!


    We earnestly request that you make public the Quench project clicks database, in a form similar to the Galaxy Zoo 2 data release.

    We would also urge you to publicly commit to work actively with us on the Quench project, following the original project design, right through to the end of Stage 3.

    As of December, 2013, there were several different investigations under way, most of which seem scientifically promising; for example Mass Dependent Merger Fraction (Control vs Post-quenched Sample), Potentially problematic sources in "the 778", and Asymmetrical Classifications. Once the Quench project is revived, we intend to continue these.

    The remainder of this email gives a brief description of the background to, and current status of, the Quench project, and its scientific nature.

    ("Signed")
    {Chris Molloy, Jean Tate, Jules, mlpeck, zutopian} We are the ordinary zooites who were active in the Galaxy Zoo Quench project at the time the serious problem with the integrity of the classifications in the Quench Control and Quench Sample databases (used in Tools) was reported.


    The Galaxy Zoo Quench project is described in this "Project Overview" document.

    Stage 1 - "Classification" - took somewhat longer than expected, partly because the Control sample contained duplicates which had to be removed and replaced (and classified, in the Quench Boost phase), but was completed before the end of August, 2013.

    Stage 2 - "Data Analysis & Discussion" - was pretty rocky, with a lot of good discussion on many topics, but also at least one major revision to the classifications parts of the catalogs used in Tools (which are, to date, the only complete databases available to zooite participants); see this post by SCIENTIST mzevin1, dated September 26 2013. By mid December, 2013, there were five zooites still active - ChrisMolloy, JeanTate (me), jules, mlpeck, and zutopian - one SCIENTIST (trouille), and one Development Team member (edpaget).

    Between 17 and 20 December, 2013 the Quench project Tools databases ("dataset" or "Quench tables") were changed, at least twice; these contain thousands of differences, when compared with the databases made available in late September. Many of those differences are in the summary zooite classifications, which are at the heart of this project (details are in this thread). The Quench project came to a halt.

    The Quench project as science

    In launching Quench, ltrouille wrote "this new Galaxy Zoo Quench project provides the opportunity to take part in the ENTIRE scientific process – everything from classifying galaxies to analyzing results to collaborating with astronomers to writing a scientific article!"

    Within the Zooniverse, there are only two other comparable projects, SpaceWarps and Planet Hunters. Not being a professional scientist, I cannot say - from first-hand experience - to what extent our experience in Quench is typical of collaborative astronomical research; however, by comparing what's in the public record - mostly the Talks of those two other Zooniverse projects - there seem to be two huge differences with what we have experienced:

    • participants in those projects have direct access to the data they need in order to do the data analysis which leads to the writing of professional journal articles
    • professional astronomers are very actively engaged.

    Further, anyone wishing to do research based on zooites' Galaxy Zoo 2 clicks - to take just one example - can easily and freely access a much richer dataset of classifications than those provided in Quench (see here for details).

    So, if we have access to Quench classifications data to the same level of detail as is available in the Galaxy Zoo 2 catalog, we should be able to complete Stage 2.

    And if we have a comparable level of involvement by professional astronomers as in SpaceWarps and Planet Hunters - and as was clearly intended ("Throughout, [volunteers will] discuss with the science team their interpretation of the results. At the end of the process, volunteers and the science team will collaboratively write a 4-page Astrophysical Journal article") - we should also be able to complete Stage 3 successfully.

    Posted

  • zutopian by zutopian in response to JeanTate's comment.

    I'm more than happy to have all five names on the email

    Due to that incident and its background story, I don't agree that my name should be listed as a signer.

    Posted

  • ChrisMolloy by ChrisMolloy in response to JeanTate's comment.

    Your What bias does the varying fraction of Eos - in the QS catalog - introduce? should be included. A fascinating post.

    How would it appear, as a stand-alone new thread, in the Questions for the Scientists! section?

    I think it should be ok.

    the addition of "We have posted the content of this email in the Quench project Talk thread, {Quench project: a proposal aimed at reviving and completing it. URL} We urge you to log on and discuss this proposal online. If you would prefer to respond with a private email, please do so; however, please state explicitly if you do not wish me (Jean Tate) to copy the content of any email replies to the Talk thread."

    This is good too. So, if they respond to you by email but don't want it made public, does that mean you send us a response by PM with a summary?

    Posted

  • jules by jules moderator

    Hi all,
    Just popping back in to read the proposed e-mail - which is a good, succinct summary.

    Just a few comments:

    1. I am happy with whichever of the examples you think fit to choose - they are all relevant and all discussed on Talk anyway.

    2. "The subject of the email, and the title of new Talk thread: Quench
      project: a proposal aimed at reviving and completing it."

    That is the title of this discussion, so something a little different for the proposed new discussion is required. Maybe "Quench Project revival - a proposal" for example.

    3. I have checked back to see if anyone else was involved or made comments etc in the last few months, just to make sure we have included everyone. I can't find anyone else, so in summary there have been the 5 contributors you have listed - and now just 4 names to include with the e-mail.

    Thanks Jean!

    Posted

  • JeanTate by JeanTate

    Thanks everyone.

    I think we're very close. I'll try to get another draft up later today; however, it will have only minor edits. I will also email it either very late tonight (my time) or very early tomorrow morning, so the recipients will have it at or near the top of their email in-boxes when they log on first thing on Monday.

    On the specific comments:

    [zutopian] I don't agree that my name should be listed as a signer.

    The email, and new thread, will not have your name on it.

    [ChrisMolloy] Your What bias does the varying fraction of Eos - in the QS catalog - introduce? should be included.
    [jules] I am happy with whichever of the examples you think fit to choose - they are all relevant and all discussed on Talk anyway.

    One reason why I want to keep the number low - preferably only two - is to hint that we understand that the paper will have a scope compatible with the description of Stage 3: "We will collaboratively write a 4-page article on the results from our study, ..." Within a ~four page limit, we won't have space to cover more than a small number of interesting findings. A successful completion of this project - a published paper - will have given all of us practical experience in how to do relevant data analysis, how to collaboratively write a paper, and how to get it published. That means that should we - or any subset of us - wish to follow any interesting finding separately, we will have a good template to follow. Myself, I certainly intend to do just that, and would be very happy to work with any other zooite(s).

    [ChrisMolloy] So, if they respond to you by email but don't want it made public, does that mean you send us a response by PM with a summary?

    No. I may be considered an old fossil, but I regard PMs and emails as confidential and private, even to the point of being reluctant to publicly acknowledge receipt of any such. If any private response contained something I considered worthy of being shared, I'd write back and ask explicit permission ... my request would very likely contain the exact text I proposed to use, so there could be no doubt about what I was asking (FWIW, when I wrote for Universe Today, that was my MO).

    [jules] so something a little different for the proposed new discussion is required. Maybe "Quench Project revival - a proposal" for example

    Good point. How about "Reviving and Completing the Quench Project: A Proposal"?

    Posted

  • ChrisMolloy by ChrisMolloy in response to JeanTate's comment.

    Have to go away for a couple of days. I trust your judgement on the final content.

    How about "Reviving and Completing the Quench Project: A Proposal"?

    I like this.

    Thanks for the effort you've put into this.

    Posted

  • JeanTate by JeanTate

    Here is the third, and final, draft. How would it appear, as a stand-alone new thread, in the Questions for the Scientists! section? The only changes (except perhaps for some minor formatting, and the way the URLs are displayed):

    • no salutation

    • "The remainder of this email This post gives ...";

    • the addition of "We have posted the content of this email in the
      Quench project Talk thread,
      {Reviving and Completing the Quench Project: A Proposal URL} We urge you to log on and discuss this proposal
      online. If you would prefer to respond with a private email, please
      do so; however, please state explicitly if you do not wish me (Jean
      Tate) to copy the content of any email replies to the Talk
      thread.
      "

    • also "if you have an email address for either edpaget (Ed Paget) or mzevin1 (Michael Zevin), or both, would you mind forwarding this email to them please?"

    The subject of the email, and the title of new Talk thread: Reviving and Completing the Quench Project: A Proposal

    Feedback please!


    Quench project Science Team members,

    We earnestly request that you make public the Quench project clicks database, in a form similar to the Galaxy Zoo 2 data release.

    We would also urge you to publicly commit to work actively with us on the Quench project, following the original project design, right through to the end of Stage 3.

    As of December, 2013, there were several different investigations under way, most of which seem scientifically promising; for example Mass Dependent Merger Fraction (Control vs Post-quenched Sample), and Asymmetrical Classifications. Once the Quench project is revived, we intend to continue these.

    The remainder of this email gives a brief description of the background to, and current status of, the Quench project, and its scientific nature.

    ("Signed")
    {Chris Molloy, Jean Tate, Jules, mlpeck} We are ordinary zooites who were active in the Galaxy Zoo Quench project at the time the serious problem with the integrity of the classifications in the Quench Control and Quench Sample databases (used in Tools) was reported.


    The Galaxy Zoo Quench project is described in this "Project Overview" document.

    Stage 1 - "Classification" - took somewhat longer than expected, partly because the Control sample contained duplicates which had to be removed and replaced (and classified, in the Quench Boost phase), but was completed before the end of August, 2013.

    Stage 2 - "Data Analysis & Discussion" - was pretty rocky, with a lot of good discussion on many topics, but also at least one major revision to the classifications parts of the catalogs used in Tools (which are, to date, the only complete databases available to zooite participants); see this post by SCIENTIST mzevin1, dated September 26 2013. By mid December, 2013, there were five zooites still active - ChrisMolloy, JeanTate (me), jules, mlpeck, and zutopian - one SCIENTIST (trouille), and one Development Team member (edpaget).

    Between 17 and 20 December, 2013 the Quench project Tools databases ("dataset" or "Quench tables") were changed, at least twice; these contain thousands of differences, when compared with the databases made available in late September. Many of those differences are in the summary zooite classifications, which are at the heart of this project (details are in this thread). The Quench project came to a halt.

    The Quench project as science

    In launching Quench, ltrouille wrote "this new Galaxy Zoo Quench project provides the opportunity to take part in the ENTIRE scientific process – everything from classifying galaxies to analyzing results to collaborating with astronomers to writing a scientific article!"

    Within the Zooniverse, there are only two other comparable projects, SpaceWarps and Planet Hunters. Not being a professional scientist, I cannot say - from first-hand experience - to what extent our experience in Quench is typical of collaborative astronomical research; however, by comparing what's in the public record - mostly the Talks of those two other Zooniverse projects - there seem to be two huge differences with what we have experienced:

    • participants in those projects have direct access to the data they need in order to do the data analysis which leads to the writing of professional journal articles
    • professional astronomers are very actively engaged.

    Further, anyone wishing to do research based on zooites' Galaxy Zoo 2 clicks - to take just one example - can easily and freely access a much richer dataset of classifications than those provided in Quench (see here for details).

    So, if we have access to Quench classifications data to the same level of detail as is available in the Galaxy Zoo 2 catalog, we should be able to complete Stage 2.

    And if we have a comparable level of involvement by professional astronomers as in SpaceWarps and Planet Hunters - and as was clearly intended ("Throughout, [volunteers will] discuss with the science team their interpretation of the results. At the end of the process, volunteers and the science team will collaboratively write a 4-page Astrophysical Journal article") - we should also be able to complete Stage 3 successfully.

    Posted

  • jules by jules moderator

    I think that's good to go as a stand-alone thread. Once done I'll make it a featured post so that it will be easier to find.

    Thanks for putting this together Jean - I sincerely hope it has the desired effect. 😃

    Posted

  • JeanTate by JeanTate

    Thanks again jules.

    Email has been sent, and the new thread created (actually done in the opposite order): Reviving and Completing the Quench Project: A Proposal

    In the email I made a couple of minor edits, and there's one in the new post too. I'll summarize these later.

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    The only change of any significance (I think) is this:

    "Not being a professional scientist, I cannot say - from first-hand experience - to what extent our experience in Quench is typical of collaborative astronomical research" -> "Not being professional scientists, we cannot say - from first-hand experience - to what extent our experience in Quench is typical of collaborative astronomical research"

    [me] I will also email it either very late tonight (my time) or very early tomorrow morning, so the recipients will have it at or near the top of their email in-boxes when they log on first thing on Monday.

    Well, it didn't work out quite as I had hoped 😦 The Aussies - Ivy and Amanda - very likely won't read the email until Tuesday morning (their time); the Europeans - Brooke, Karen, jtmendel - got it well after 'first thing'. So it would have been 'first thing' for only Kyle and Laura (assuming everyone is, today, at their home institution).

    Posted

  • JeanTate by JeanTate

    I've had two responses so far, Laura's (the same as her post here), and Ivy Wong's ("I'll try to be more active on Quench too. Apologies again for being slightly distracted recently.") Both mzevin1 (Michael Zevin) and edpaget (Ed Paget) have now been sent the email; I had also sent edpaget a PM, the content of which is the same as the email.

    [trouille] I tried to find the post in Quench talk from a few months ago where I shared the full results from the classification clicks. No luck

    On reading that, I too went looking, and likewise came up empty-handed. 😦

    [trouille] The NSF proposal I'm working on includes budget for a web developer to work on improving Zooniverse infrastructure for projects like Quench, including improvements to the discussion forum format (so we can more easily find related posts).

    How does the saying go? Music to my - and I hope your - ears! 😄

    Time to get back to Dealing with Sample Selection Issues; in particular, mlpeck's last two posts (last one here).

    Posted

  • ChrisMolloy by ChrisMolloy in response to JeanTate's comment.

    How does the saying go? Music to my - and I hope your - ears!

    Let's hope so!

    Posted

  • klmasters by klmasters scientist

    Hey guys - what can we say - you've discovered that scientists are human, get distracted, get busy etc. etc.! Thanks for the prod (excellent behaviour to encourage in PhD students, to bug their supervisors when needed, so why not here).

    For my part, if you are requiring excuses: I've been busy touring Australia and New Zealand talking about Galaxy Zoo (Sept and October), then Chile (to talk about Galaxy Zoo, in November), and then having a Christmas holiday. And I've been working on astronomy surveys - I'm heavily involved in something called MaNGA, which is a next generation Sloan Digital Sky Survey project (http://www.sdss3.org/future/manga.php). We had a major review of that, which we hosted in Portsmouth last week. I've also taken on some responsibility to co-ordinate better outreach and public communication in SDSS generally - which includes being involved in writing grant proposals to fund that.

    Oh, and there are my responsibilities to actually do my own scientific research based on Galaxy Zoo (e.g. advising Tom Melvin on his recently accepted paper).

    Life is busy! 😉

    But I really hope we can get something published out of Quench at some point, on a realistic time scale. I'll try to pop back into this talk more often if I can. 😃

    Posted

  • JeanTate by JeanTate in response to klmasters's comment.

    Very nice to have you back, Karen! 😃

    As you can see, we're all fired up and ready to go ... we're just waiting to get access to the zooites' classifications, at a level comparable to that of the published GZ2 ones. So there's not much for you to do just now.

    Or maybe there is! 😄

    Dealing with Sample Selection Issues is a - by now - rather long thread. In it is some discussion of sample selection (duh!). Perhaps you could assist us, to make the decisions we need to make?

    Posted

  • ivywong by ivywong scientist

    Hi all, Apologies for getting distracted with other commitments and projects too. Thanks for the prod to action.

    In any case, I was wondering if you can point me to more summaries of the results which you have all found on the various topics, such as Asymmetries etc. I found the one on Mass Dependent Merger Fraction but was not able to find some of the others. I think that it would be useful to have a list of the summaries on this thread so that we have a framework we can base our paper on, because we can then see how our results compare with previous studies.

    What do you all think?

    Posted

  • JeanTate by JeanTate in response to ivywong's comment.

    Very nice to have you back too, Ivy! 😃

    In any case, I was wondering if you can point me to more summaries of the results which you have all found on the various topics

    As Laura said, in her post here, the current version of Talk seems, at times, almost to have been deliberately designed to make it hard to find what you want! 😦

    I'll have a go at providing - yet another - list, perhaps tomorrow (my, 'Sandy territory', time). If you're impatient, there are links to many - perhaps most - of the threads on those interim results, upthread.

    I think that it would be useful to have a list of the summaries on this thread

    Sure. However, do keep in mind that whatever has been reported so far could be completely blown out of the water when we take a look at the actual classification data (to the level of GZ2). Except, of course, for those results which are completely independent of the zooites' hard (classification) work ...

    Posted

  • ivywong by ivywong scientist

    Thanks heaps Jean. No worries. Yes, I have found some links but I was not sure how complete my randomised search was. I'm in a fairly odd timezone myself... literally stuck on a desert island 😉 I agree that the results could change a little but my suspicion is that they may only change at the <~20% level. But I look forward to being wrong and corrected on this 😃 Apologies for asking for yet another list, I am hoping to see how the paper could be structured. Sometimes it helps me to figure out the next steps/missing things in data analyses if I can get everything into a pseudo-paper format structure. No rush on this though so take your time 😃

    Posted

  • KWillett by KWillett scientist

    Hi everyone,

    As Laura, Ivy, and Karen have said, we're glad that you're still passionate about the project and continuing to work with the data. I have the latest dataset, but it's not in a very convenient form for analysis. I don't have time to work on it in the next couple of days, but promise to generate a catalog of the consensus user votes by this weekend. I will post a link to it here on Talk once it's done.

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    Thanks Kyle, and it's very nice to have you back too. 😃

    ... promise to generate a catalog of the consensus user votes by this weekend

    Great! 😄 Just two days to go ...

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    I'll have a go at providing - yet another - list, ... If you're impatient, there are links to many - perhaps most - of the threads on those interim results, upthread.

    In no particular order, but starting with this thread (this is an eclectic mix, some you may consider as something other than 'results'):

    The following are from the Data Analysis Results board. There may be other results, from outside that board (that are not already referenced in one of these threads), but I don't think so. As with the above, this is in no particular order, and is likely a more eclectic mix:

    Final note: there's a non-stickied thread which has an even more eclectic mix, including links to some threads not in my last list: Index of common analysis topics

    Posted

  • JeanTate by JeanTate in response to ivywong's comment.

    In compiling the above lists, one thing in particular struck me. Consider what Ivy wrote (my bold):

    In any case, I was wondering if you can point me to more summaries of the results which you have all found on the various topics, such as Asymmetries etc. [...] I think that it would be useful to have a list of the summaries on this thread so that we have a framework we can base our paper on, because [...]

    IIRC (if I remember correctly), Laura made several attempts to 'herd the cats', to bring some focus and structure to the many analyses we ordinary zooites had embarked on (Dealing with Sample Selection Issues is a notable example). And this is - obviously - one of the most important things which you professional astronomers can do! Speaking solely for myself, I'm a real butterfly: there are so many dazzlingly pretty flowers among the Quench project data that I could happily waste huge numbers of hours on scientifically marginal (or worse) questions.

    The upshot of this is that very few of the analyses ever got very far (several of mlpeck's are notable exceptions); there are, consequently, not many results (much less summaries of them).


    There's another aspect, hinted at in the Proposal which you - and Laura and Karen and Kyle - responded to; here's the context:

    I agree that the results could change a little but my suspicion is that they may only change at the <~20% level. But I look forward to being wrong and corrected on this

    You may be - most likely are - right. Consider this though: you are coming at this from years, perhaps decades, of experience of doing astronomy-as-a-science, after (including?) spending ~a decade learning just what astronomy-as-a-science is. From that perspective, it likely takes no more than a glance to have your highly-trained gut tell you that the unreliable data aren't too bad, that the thousands of inconsistencies surely don't amount to much.

    For a moment, please try to put yourself in my shoes*: I start by trusting the data, but to warm up I spend some time trying to get a handle on how consistent (etc) the data is (see, for example, Detailed investigation: outliers and anomalies in 'redshift bin #10'). Others report various inconsistencies too (especially zutopian). And, one by one, we learn that the data cannot be trusted; the Control sample is changed; summary classifications changed (more than once); even key IDs are changed! I get into the habit of immediately downloading every new version of the data, and checking it against older versions; I cease using Tools, partly because I no longer trust the data. Perhaps naively, I expect each version will have fewer and fewer differences/inconsistencies when compared with the previous one; instead, I find that, in general, each has more!! 😮

    So how does one go about developing your kind of instinct, to be able to form such a '< 20%' opinion ('suspicion')?

    *I'm speaking solely for myself; I have little idea how any other ordinary zooite, who's been active in this project, truly feels

    Posted

  • KWillett by KWillett scientist

    Hi everyone,

    I have made the new files for the Quench sample. The data are stored in two formats: a CSV file that can be read by Excel or any other text reader, and a FITS file that's more specialized for astronomical data. I really like using the TOPCAT software for analyzing data like this, but you can use whichever tools you're most comfortable with. http://www.star.bris.ac.uk/~mbt/topcat/

    The data now include for each task both the count and vote fraction for every response. To understand which tasks/responses correspond to morphologies, I've included two additional files. questions.csv gives the text for each task in GZ: Quench, and answers.csv gives the responses. For example: if you were looking at the column "t01_a01_count" in the Quench data, you can use these files to find out that Task 01 asked "How rounded is the galaxy?" and Answer 01 was "In between". For every response, the data contains both a count (the number of votes) and the vote fraction (out of the total number of responses for this question).

    I hope the rest of the columns are self-explanatory, but please post here if you have questions. I will continue to participate as much as I can.

    https://www.dropbox.com/sh/ku7o03rb0vmm689/EUkW25peMx
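
    To make the column scheme concrete, here is a minimal sketch of decoding those column names (Python with pandas). The 'task'/'answer'/'text' column names inside questions.csv and answers.csv are assumptions; only the tNN_aNN naming scheme comes from the description above.

        import pandas as pd

        # Lookup tables described above; the column names inside them are assumed.
        questions = pd.read_csv("questions.csv")  # assumed columns: task, text
        answers = pd.read_csv("answers.csv")      # assumed columns: task, answer, text

        def decode(column):
            # Map e.g. 't01_a01_count' to (question text, answer text, kind).
            task, answer, kind = column.split("_", 2)
            q = questions.loc[questions["task"] == task, "text"].iloc[0]
            a = answers.loc[(answers["task"] == task)
                            & (answers["answer"] == answer), "text"].iloc[0]
            return q, a, kind

        # Per the example above, this should give
        # ('How rounded is the galaxy?', 'In between', 'count'):
        print(decode("t01_a01_count"))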

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    Thanks Kyle.

    I've downloaded all the files, and discovered that the gzquench_control.csv (QZC) file contains 3001 objects, but that the gzquench_sample.csv (QZS) contains 3002 (I did not check the .fits files). When I checked the QZC sdss_id values (ObjIds) against those in my first downloaded Quench Control file (QC1), I found that QC1 has, indeed, one more object: 587739827133939896, which has a uid of AGS00002bf (neither QZC nor QZS has a 'uid' field).

    Could you please look into what happened to AGS00002bf?

    Also, QC1 does not contain the 56 Quench Boost (QB) objects; neither does QZC. Would you please provide files in the same format for the QB objects?
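
    For anyone wanting to repeat the object-count check, here's a rough sketch (Python with pandas; "QC1.csv" is a stand-in name for my first downloaded Quench Control file):

        import pandas as pd

        # Read the ObjIds as text so the 18-digit values aren't mangled.
        qzc = pd.read_csv("gzquench_control.csv", dtype={"sdss_id": str})
        qc1 = pd.read_csv("QC1.csv", dtype={"sdss_id": str, "uid": str})

        print(len(qzc), len(qc1))  # 3001 vs 3002

        # Which object(s) did the new control file lose?
        missing = set(qc1["sdss_id"]) - set(qzc["sdss_id"])
        print(missing)  # expected: {'587739827133939896'}, i.e. AGS00002bf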

    Posted

  • JeanTate by JeanTate

    I'm in the process of checking the data in the two .csv files - I'll get around to confirming what I expect re the .fits ones (namely that there are no inconsistencies between the .csv and .fits files) later - and would like to post what I've found so far (in addition to what's in my last post).

    • DR8 ObjId 587732591182020753 is AGS00004cy; it seems that the errors we identified in some fields for this record are still present in QZC (see this thread for more details).

    • in an early version of QS (from Tools), the _flux_err values for the six* lines were the same as the _flux values (i.e. an error). This was fixed in a later version of QS (except for nad_abs_flux); however, QZS contains the incorrect _flux_err values.

    Not so much an inconsistency, but a note of caution:

    a CSV file that can be read by Excel or any other text reader

    In addition to making sure that the sdss_id field is read in as "text" (i.e. not the default, which would treat the string as an integer or general number), note that many values in the redshift_err field may be read as text, not as a number. For example, "1.45904e-05" is a number in 'scientific notation', but because it contains the letter "e", Excel or another spreadsheet may treat the value as a text string. How you deal with this is likely app-specific.

    Finally, while neither QZC nor QZS contains the AGS identifier ("uid"), it's straightforward to find those identifiers, and add them as a field, by using the QSCXIDv3 file (Google spreadsheet) I created and uploaded to Google Docs as a shared resource (click here).
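
    A minimal sketch covering both points (Python with pandas; "QSCXIDv3.csv" stands in for a local export of the shared Google spreadsheet, and its column names are assumptions):

        import pandas as pd

        qzs = pd.read_csv(
            "gzquench_sample.csv",
            dtype={"sdss_id": str},              # keep the 18-digit ObjId as text
            converters={"redshift_err": float},  # force 'scientific notation' strings to numbers
        )

        # Add the AGS identifier ("uid") by joining on sdss_id.
        xid = pd.read_csv("QSCXIDv3.csv", dtype={"sdss_id": str, "uid": str})
        qzs = qzs.merge(xid[["sdss_id", "uid"]], on="sdss_id", how="left")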

    *halpha, hbeta, nii, oii, oiii, and nad_abs

    Posted

  • ivywong by ivywong scientist

    Hi Jean et al,

    You are all certainly doing the right thing in checking up on the inconsistencies with the sample selection. This is one of the first and most important things most projects need: an understanding of the selection biases and shortfalls. So for that, you and all the active Quenchers are to be congratulated, because you have all done an outstanding job on this. 😃 Also, this is how serendipitous discoveries are usually made, by going down each and every rabbit hole (so to speak). So it's perfectly alright to explore all the parameter spaces, as you all have, for anything odd or weird.

    However, I think it's time to put together the paper, and so here is where the summaries would be useful; I think we already have a few results with respect to the mass dependences and asymmetries. I'm still going through each of the links that you put together. Thanks heaps for that. I will get back to you all when I've gone through all of them. I'm changing institutes this month (moving across the sandy desert) and may get delayed a little, so if you can all go through some previous studies and compare them with the results you found... we can start to put together a picture of how our results are similar or different to previous studies. How does this sound?

    In my experience, observational astronomy typically harbours about a 10% uncertainty in the measured magnitudes/fluxes, so when two or more are used to calculate a physical property, the uncertainties are added in quadrature, and this brings the final uncertainty up to the 10-20% mark (e.g. two independent 10% uncertainties combine as √(0.10² + 0.10²) ≈ 14%). Therefore I would expect a scatter in our results of roughly this order.

    Posted

  • JeanTate by JeanTate in response to ivywong's comment.

    Thanks Ivy.

    re checking up on the inconsistencies with the sample selection: this is one aspect of doing a project which is expected to lead to a published paper that I have found extremely helpful/educational. And I'm wondering to what extent our experiences with Quench are typical of (extra-galactic) astronomy projects/research? A topic for another time perhaps ...

    re I think we already have a few results with respect to the mass dependencies and asymmetries: I need a 'puzzled' smilie here ... whatever those results may be, they are only as good as the summary zooite classifications, per what was in Tools. Quench ground to a halt in December because it became obvious that the data in the two Quench Tools databases was unreliable. Unless and until we can - independently - produce reliable, consistent classification summaries, those provisional results are no better than subjective hunches or guesses.

    In terms of sample selection, I think we're very close (sorry I can't point to any particular post within it, but the Dealing with Sample Selection Issues thread has the full discussion, and the tentative conclusion). The plan is also pretty straightforward: once we agree on the selection rules, we re-run the analyses we're most likely to include in the paper - on trends (etc) that we've already done - using summary classification data we now know to be reliable. The analyses include the following (not necessarily complete, nor would we necessarily include all of these):

    • merger fraction versus mass
    • BPT diagrams: AGN vs SFR trends with merger fraction and mass (etc)
    • asymmetry trends

    Good luck with your institutional move; what will be your new home?

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    Aside from checking that what's in the .fits files is the same as that in the .csv files, I think I've finished checking for inconsistencies. There's plenty more I could do, but I think any further checks should be driven by the specific analyses/results we wish to write up.

    In addition to what I've already noted, I found only one other inconsistency worth reporting. It's in QZC; first the background: both QZS and QZC lack fields that are in one version or another of the QS and QC catalogs that were available from Tools; for example, the RA and Dec fields. This is no big deal, because we have the downloaded QS and QC catalogs*, and a database containing all fields (filled with what we understand to be the correct values for all objects) can be fairly easily put together.

    However, for the fields which QZC and QZS do have, some are wrong (e.g. five _flux_err ones, as noted above), and some do not contain data (even though such data had been added to at least one version of Tools catalogs). In particular, in QZC the fields d4000, d4000_err, halpha_flux, halpha_flux_err, hbeta_flux, hbeta_flux_err, nii_flux, nii_flux_err, oii_flux, oii_flux_err, oiii_flux, oiii_flux_err, nad_abs_flux, and nad_abs_flux_err are all null, but various versions of the QC catalog contain non-null values for them (except for the 56 objects replaced by QB).
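
    One way to put such a merged database together - a minimal sketch in pandas, with illustrative file names, and with the literal string "None" treated as missing:

    ```python
    import pandas as pd

    # Fill QZC's null spectroscopic fields from a downloaded Tools QC catalog.
    qzc = pd.read_csv("QZC.csv", dtype={"sdss_id": str},
                      na_values=["None"]).set_index("sdss_id")
    qc_tools = pd.read_csv("QC_tools_v4.csv", dtype={"sdss_id": str},
                           na_values=["None"]).set_index("sdss_id")

    spec_cols = ["d4000", "d4000_err", "halpha_flux", "halpha_flux_err"]  # ... etc

    # combine_first keeps QZC's value where present, and falls back to the
    # Tools value where QZC is null.
    qzc[spec_cols] = qzc[spec_cols].combine_first(qc_tools[spec_cols])
    ```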

    I have some questions about the classification data; see my next post.

    To close, then, a summary request for classification data not yet provided (or which should be double-checked):

    1. the 56 Quench Boost objects
    2. the sdss_id 587739827133939896 ( AGS00002bf ) object
    3. the sdss_id 587732591182020753 ( AGS00004cy ) object

    *well, at least I have them, six different versions of each in fact. I can upload any, or any combo, of them, pretty much at any time (just ask)

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    [Kyle] please post here if you have questions


    [me] I have some questions about the classification data

    Ten of the QZS objects have "vote_total" values of 40 (eight objects) or 41 (two); for the others, "vote_total" is 21 (68 objects) or 20 (the rest). For QZC objects, two have "vote_total" values of 22, 69 have 21, and the rest have 20.

    Is there any particular reason for these distributions of "vote_total"? Specifically, what is special about the ten QZS objects?

    Can you please confirm that the 'counts' are raw, i.e. no weighting has been applied to any of these?

    In Quench Boost, were any quench sample objects classified (if so, which ones)? What about quench control objects: were any of those classified in Quench Boost (other than the 56 replacement objects)?
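
    (For anyone wanting to reproduce the tallies above, a tabulation along these lines is all it takes; file names are illustrative:)

    ```python
    import pandas as pd

    # Distribution of vote_total in each catalog.
    for name in ("gzquench_sample.csv", "gzquench_control.csv"):
        df = pd.read_csv(name)
        print(name)
        print(df["vote_total"].value_counts().sort_index())
    ```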

    Posted

  • mlpeck by mlpeck in response to JeanTate's comment.

    Sorry, I've had no time to work on this.

    However, for the fields which QZC and QZS do have, some are wrong (e.g. five _flux_err ones, as noted above), and some do not contain data (even though such data had been added to at least one version of Tools catalogs). In particular, in QZC the fields

    Everything but the classifications and identifiers should simply be discarded and the classifications should be merged into the late September (v?) data tables. I'm pretty sure I verified that emission line fluxes and errors were consistent with SDSS database entries.

    One remaining error is that Lick Hδ indexes have never been incorporated into the data tables. Somebody read and heeded my posts on the importance of this measurement early on but added Hδ emission line flux values instead of absorption line equivalent widths, which are not at all the same thing.

    As yet another reminder of the importance of the Lick HδA index see Kauffmann 2014, which showed up on astro-ph Monday.

    Posted

  • JeanTate by JeanTate in response to mlpeck's comment.

    Everything but the classifications and identifiers should simply be discarded and the classifications should be merged into the late September (v?) data tables.

    That would be both the most straight-forward and likely least error-prone approach.

    I'm pretty sure I verified that emission line fluxes and errors were consistent with SDSS database entries.

    I too have that recollection. However, the quick search I did failed to turn up the relevant post(s). Talk's Search capability is truly awful! 😦

    One remaining error is that Lick Hδ indexes have never been incorporated into the data tables.

    Is there a quick and easy - but still robust - way to do this? Perhaps for just the 1149 galaxies in your two v2 datasets (near the top of page 8 of the Dealing with Sample Selection Issues thread) ...

    Posted

  • mlpeck by mlpeck

    I too have that recollection. However, the quick search I did failed to turn up the relevant post(s). Talk's Search capability is truly awful! 😦

    Yes, this has stretched out long enough that my memory is fading too - which makes better search, and easier linking to individual messages, a much-desired feature of Talk.

    One remaining error is that Lick Hδ indexes have never been incorporated into the data tables.
    

    Is there a quick and easy - but still robust - way to do this? Perhaps for just the 1149 galaxies in your two v2 datasets (near the top of page 8 of the Dealing with Sample Selection Issues thread) ...

    Sure, they're in my Dropbox account:

    Quench Hδ

    Control Hδ

    These are CSV files with 4 entries for each object: ra, dec, Lick HδA, and the error on same. The files are dated 9/14/2013, so you can work out which version of the data tables they match; I think it was v4 by your count.

    One tiny caveat: these were obtained by querying the DR10 database. Most observational quantities from the MPA pipeline based on the spectroscopy changed little between DR7 and DR8 (AFAICT), but there could be some differences.
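
    If it helps, folding these into a data table is a straightforward positional match - a sketch, assuming the table carries ra/dec, and using illustrative file and column names:

    ```python
    import pandas as pd
    import astropy.units as u
    from astropy.coordinates import SkyCoord

    hd = pd.read_csv("quench_hdelta.csv")      # ra, dec, lick_hd_a, lick_hd_a_err
    cat = pd.read_csv("gzquench_sample.csv")   # must carry ra, dec

    hd_c = SkyCoord(ra=hd["ra"].values * u.deg, dec=hd["dec"].values * u.deg)
    cat_c = SkyCoord(ra=cat["ra"].values * u.deg, dec=cat["dec"].values * u.deg)

    # Nearest Hdelta entry for each catalog row; keep matches within 1 arcsec.
    idx, sep2d, _ = cat_c.match_to_catalog_sky(hd_c)
    good = sep2d < 1.0 * u.arcsec
    cat.loc[good, "lick_hd_a"] = hd["lick_hd_a"].values[idx[good]]
    cat.loc[good, "lick_hd_a_err"] = hd["lick_hd_a_err"].values[idx[good]]
    ```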

    Posted

  • KWillett by KWillett scientist in response to JeanTate's comment.

    Sorry for my absence the last couple of days. Will try to answer some of the pressing questions.

    The 56 additional galaxies should have been included in these data. I'll look into this in more detail.

    I would guess that the objects with 40 or 41 votes were classified twice. Was there a previous issue with a few galaxies appearing as duplicates in both the control and quenched samples?

    Correct, Jean - I have not weighted the counts or vote fractions for these galaxies in any way. We could apply a correction based on the GZ2 algorithms, but we can't derive it from this dataset itself (way too few galaxies). Since the classification tree is slightly different as well, I'm hesitant to endorse that as a robust method - especially for the merger question, which is critical for this project.

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    I would guess that the objects with 40 or 41 votes were classified twice. Was there a previous issue with a few galaxies appearing as duplicates in both the control and quenched samples?

    There are 29 objects which appear in both the original QS and QC catalogs. The QC ones were removed in post-QB versions of the QC catalog (the QS ones remain). The intersection of "the [QZS] objects with 40 or 41 votes" and the 29 QS 'duplicate' objects is the null set (i.e. none of QZS objects with 40 or 41 votes are objects which are in the original QC catalog).

    There are 13 'QC-QC duplicates' (26 objects in the original QC catalog). None of these are in the post-QB QC catalogs; all of them are in QZC.

    There is one 'QS-QS duplicate' (two objects in the original QS catalog). Both of these appear in all post-QB QS catalogs, and in QZS. The sdss_ids are 587731514231619685 and 587731514231619686. There's some discussion of this galaxy towards the bottom of page one of the Duplicates - summary thread.
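
    (The 'null set' claim is easy to verify with a set-based cross-check; file names are illustrative:)

    ```python
    import pandas as pd

    qs = pd.read_csv("quench_sample_original.csv", dtype={"sdss_id": str})
    qc = pd.read_csv("quench_control_original.csv", dtype={"sdss_id": str})
    qzs = pd.read_csv("gzquench_sample.csv", dtype={"sdss_id": str})

    qs_qc_dups = set(qs["sdss_id"]) & set(qc["sdss_id"])           # the 29 objects
    high_votes = set(qzs.loc[qzs["vote_total"] >= 40, "sdss_id"])  # the ten 40/41-vote objects

    print(len(qs_qc_dups & high_votes))   # 0 - the intersection is empty
    ```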

    I have not weighted the counts or vote fractions for these galaxies in any way. We could apply a correction based on the GZ2 algorithms, but we can't derive it from this dataset itself (way too few galaxies). Since the classification tree is slightly different as well, I'm hesitant to endorse that as a robust method - especially for the merger question, which is critical for this project.

    Thanks. It seemed obvious to me that there was no weighting, but it also seemed prudent to ask. I agree that any attempt at weighting would very likely produce less-than-robust, possibly quite unreliable, results.

    Posted

  • KWillett by KWillett scientist

    Hi Jean,

    My apologies for the long delay. Laura has rightly prodded me into fixing the issues you and the other Quenchites have identified with the data. I have found the missing galaxy AGS00002bf, as well as the Quench boost objects - I had an outdated version of the metadata I was matching on, for some reason. Both samples now have exactly 3002 galaxies - hopefully the ones that are supposed to be there.

    Attached is a link to a Dropbox folder hosting the data. The column names haven't changed from my last post. Let me and Laura know ASAP if there are any remaining issues you or anyone else spot. Since you clearly have a keen interest and more time to spend on this than I do right now, I want to get you all the correct data and let you go to work!

    https://db.tt/ZtTIQvn7

    cheers,
    Kyle

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    Thanks Kyle.

    Laura has rightly prodded me into fixing the issues ...

    I had prodded you too, but by a Talk PM rather than an email. When I first read this post of yours, I was ... less than happy, wondering why you ignored my PM but responded (with alacrity?) to Laura's email.

    But then I realized that v2 Talk's PM icon no longer works (I reported this as a bug, over in GZ Talk), so you very likely never even knew I'd sent you a PM! 😮 That made me feel better. A little.

    Let me and Laura know ASAP if there are any remaining issues you or anyone else spot.

    Will do. Thanks again. 😃

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    The files in that Dropbox all say they were Modified "25 days ago", so my first thought was that you hadn't actually uploaded the updated QC files (CSV and FITS).

    To be sure, I downloaded the gzquench_control.csv one, and compared it with the file of the same name that I'd downloaded before. They are identical. 😦

    Posted

  • KWillett by KWillett scientist

    Hi Jean,

    Oops - sorry. Dropbox hadn't synced correctly yet.

    The files are slightly renamed - they should be gzquench_sample_consensus and gzquench_control_consensus, available in both CSV and FITS formats. The old versions should now be gone.

    Kyle

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    Thanks.

    I've downloaded all four files - two CSV, two FITS - and have finished an initial comparison of the CSV QC (gzquench_control_consensus; "B" hereafter) with gzquench_control.csv ("A" hereafter).

    There are, as expected, 57 objects in B that are not in A, and 56 vice versa; these are the 56 Quench Boost objects, plus the "missing" AGS00002bf (though I have yet to confirm that the key parameter values match what was available earlier from the QC Tools catalog).

    For the 2,945 objects in common - matched by sdss_id - the values in all fields in common, in A and B, also match, for all objects ... with the following exceptions:

    • quite a few field names are different, between A and B, but that's just cosmetic

    • there are 17 fields in A whose content, for all 2,945 objects, is "None", but which have non-"None" content in B

    • one object has non-"None", non-classification data (e.g. redshift, umag) with the same values in A and B, but whose classification data (e.g. t02_a01_count, t10_a00_fraction) is different.

    What is this errant QC object? 587732591182020753 is its sdss_id. From my cross-ID file, I see that its uid (AGS ID) is AGS00004cy. I haven't yet checked which classification data (i.e. the votes and vote fractions) matches the reality of the object itself, but it should be very easy to decide. If it turns out that the values in B are wrong, a correction will be needed before that file can be used.
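
    For anyone who wants to repeat the A-vs-B comparison, a minimal sketch (matching on sdss_id, and comparing only the fields the two files share; this does exact comparisons, so in practice floats and NaNs need a small tolerance):

    ```python
    import pandas as pd

    a = pd.read_csv("gzquench_control.csv", dtype={"sdss_id": str})
    b = pd.read_csv("gzquench_control_consensus.csv", dtype={"sdss_id": str})

    # Only fields present (with the same name) in both files can be compared.
    common = [c for c in a.columns if c in b.columns and c != "sdss_id"]
    merged = a.merge(b, on="sdss_id", suffixes=("_A", "_B"))

    for col in common:
        diff = merged[merged[col + "_A"] != merged[col + "_B"]]
        if len(diff):
            print(col, len(diff), "mismatching objects")
    ```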

    I'll post the results of further data consistency and integrity checks as I complete them.

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    (though I have yet to confirm that the key parameter values match what was available earlier from the QC Tools catalog).

    I've now done so, and they do.

    there are 17 fields in A whose content, for all 2,945 objects, is "None", but which have non-"None" content in B

    The values in those fields in B ("gzquench_control_consensus") match the values in the latest (actually last few) versions of the Tools QC catalog I downloaded.

    I have begun to compare gzquench_sample_consensus ("QSB") with gzquench_sample.csv ("QSA"), and have found some, um, strange differences. First, though, the trivial ones, and a general comment:

    • quite a few field names are different, between QSA and QSB, but that's just cosmetic

    • there are five fields in QSA whose content, for all but two objects, is wrong; to repeat what I wrote in an earlier post: "in an early version of QS (from Tools), the _flux_err values for the six* lines were the same as the _flux values (i.e. an error). This was fixed in a later version of QS (except for nad_abs_flux); however, QSA contains the incorrect _flux_err values."

    To repeat the general caution I posted earlier (page 5):

    In addition to making sure that the sdss_id field is read in as "text" (i.e. not the default, which would treat the string as an integer or general number), many values in the redshift_err field may be read as text, not a number. For example "1.45904e-05" is a number in 'scientific notation', but because it contains the letter "e", Excel or another spreadsheet may treat the value as a text string. How you deal with this is likely app-specific.

    And it's not just the redshift_err field; several other fields contain at least some value in scientific notation.

    Now for the more troubling differences.

    For two QS objects the vote count ("vote_total" and "total_votes" fields) values are different: 😮

    • 587736919433281840 ( AGS000019f ): 21 in QSA, 20 in QSB
    • 587727178451320944 ( AGS00000hz ): 20 in QSA, 21 in QSB

    As you would expect, if the vote count is different, the values in many classification fields will also be different. And they are. 😦

    Just two strange inconsistencies?

    Sadly, no. There are quite a few more; I'll write them up in a later post.

    *the six are: halpha, hbeta, nii, oii, oiii, and nad_abs; however, nad_abs_err is not one of the fields in QSB.

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    Just two strange inconsistencies?

    Sadly, no. There are quite a few more; I'll write them up in a later post.

    There are, as far as I can tell so far, 15 in total. In terms of the first three 'classification' fields - vote_total, most_common_path, and t00_a00_count - the breakdown is as follows:

    • two differ in vote_total (and also in most_common_path and t00_a00_count)

    • seven more in most_common_path (and also in t00_a00_count); nine in all

    • five more in t00_a00_count; 14 in all

    • for one, the 'earliest' inconsistency is in t02_a00_count

    What is the cause of these inconsistencies? I'll send Kyle an email and a PM to ask him (I guess no other member of either the Science Team or Development Team would know).


    What other checks to do?

    As I had checked that the "fraction" values are consistent with the "count" ones, for QSA, and as all values for the other 2,987 QS objects are the same* in QSA and QSB, the only 'calculation' consistency check I need to do is on these 15 objects, for QSB.

    There are more checks I could do, but I can't think of any that would be relevant to our analyses; can you?

    I need to write up what to do about the errant QC object 587732591182020753 ( AGS00004cy ), and perhaps some suggestions on what to do about the above 15 QS ones.

    *to within < 0.001, or < 1 vote; the two sets have different numbers of significant digits for "fraction" values
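
    (A sketch of that 'calculation' check, for the top-level task; deeper tasks would need the number of votes actually reaching that task as the denominator. File name illustrative, tolerance per the footnote:)

    ```python
    import pandas as pd

    df = pd.read_csv("gzquench_sample_consensus.csv", dtype={"sdss_id": str})

    # t00 fractions should equal the matching counts divided by vote_total.
    for fc in [c for c in df.columns if c.startswith("t00_") and c.endswith("_fraction")]:
        cc = fc.replace("_fraction", "_count")
        bad = (df[fc] - df[cc] / df["vote_total"]).abs() >= 0.001
        if bad.any():
            print(fc, int(bad.sum()), "inconsistent objects")
    ```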

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    Of the 15 objects with inconsistent classifications, only five are members of 'the 1149', a.k.a. 'subset_2'. And of these five, I think the vote counts for just one change sufficiently, between QSA and QSB, that we might put it into a different 'merger' or 'asymmetry' classification bin. While I should check this out in more detail - and write it up - I think the classification databases are very likely robust and stable enough for the three (?) analyses we're going to proceed with for the paper.

    Possible caveat: once Kyle has had a chance to investigate how these classification inconsistencies arose, it's possible that many more than 15 objects' classifications are found to be unreliable.

    I can't think of any that would be relevant to our analyses; can you?

    Here's one: there are six non-classification fields in QSB that are not in QSA; I will check that their values match those in the most recent Tools QS catalog.

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    What other checks to do?

    As I had checked that the "fraction" values are consistent with the "count" ones, for QSA, and as all values for the other 2,987 QS objects are the same* in QSA and QSB, the only 'calculation' consistency check I need to do is on these 15 objects, for QSB.

    Now done; no inconsistencies.

    I will check that their values match those in the most recent Tools QS catalog.

    Also done; no inconsistencies.

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    What is the cause of these inconsistencies? I'll send Kyle an email and a PM to ask him (I guess no other member of either the Science Team or Development Team would know).

    I sent an email, to Kyle and Laura, cc Brooke and Karen, on 1 March. I did not send a PM.

    Posted

  • klmasters by klmasters scientist

    Hey Jean, Are these differences mostly in the total vote counts? It's possible that Kyle just did a new download of the Quench classifications which added a handful of clicks. It's unlikely these would significantly change any of the consensus answers. Are you concerned that they did?

    Posted

  • JeanTate by JeanTate in response to klmasters's comment.

    Hi Karen,

    Are these differences mostly in the total vote counts?

    No; of the 15 inconsistencies (QS objects with inconsistent values), only two differ in total vote count. And the inconsistency with the QC object is unrelated to vote count (or any 'click count').

    It's possible that Kyle just did a new download of the Quench classifications which added a handful of clicks.

    The distribution of values (clicks) for various tx_ay_count fields rules that explanation out.

    It's unlikely these would significantly change any of the consensus answers.

    Yes; as I said in my email, the changes affect only ~one consensus answer (the 'QC object inconsistency' is quite different).

    Are you concerned that they did?

    Not really. My main concern - and it's a deep one - is with data integrity; unless the root cause of these inconsistencies is tracked down and explained, how can anyone trust the data? A related concern: no one on the Science or Development Team seems to find this troubling.

    More generally: over in Radio Galaxy Zoo (RGZ), I started a Talk thread on repeat images (Anyone getting any repeat images?). On January 30 2014 11:22 PM, ADMIN bumishness wrote:

    After some investigating, we are pretty sure this is an incident isolated to the m83 project. m83 definitely is having problems with repeats, but we've been unable to find a systemic problem with any other project.

    There is always potential for the isolated incident, but this should be in the very low, hundredths-of-a-percent range.

    WizardHowl, I honestly can't explain why you received a dupe. If it's any consolation to the community, we've so far received nearly zero repeats from over 400k classifications.

    Since then, WizardHowl reported three new RGZ repeats/dups, and I reported two, so the incidence is surely waaay higher than "hundredths of a percent". No one on the Science or Development Team has responded, despite the fact that bumishness had earlier said "... we take this kind of problem very seriously. Our entire model depends upon each user only seeing each subject once."

    Posted

  • klmasters by klmasters scientist

    I wouldn't worry too much about repeats. We can post-process that out. It's obviously undesirable to waste people's time showing them repeats, but it's hopefully not a huge fraction.

    I'm sorry you are doubting our data integrity. Many people are genuinely working very hard to get the best out of Zooniverse data (we spent an entire week in Taiwan recently discussing just that), so it is upsetting that you feel that way.

    Posted

  • JeanTate by JeanTate in response to klmasters's comment.

    I wouldn't worry too much about repeats.

    Sure; for this project its only relevance is whether there are systematic differences between the classifications of N_vote = [20, 22] vs [40,41] objects*.

    I'm sorry you are doubting our data integrity.

    I think the public history of this project shows that 'doubting the integrity of the data' was, scientifically, the right thing to do.

    Within the scope of this project - whose ultimate aim is to publish a paper suitable for submission to a relevant peer-reviewed journal - one of the most important things I hope to learn is what is 'best practice' among professional (extra-galactic) astronomers, when it comes to 'signing off' on content (so they are willing to have their names listed as co-authors). Specifically, what steps do they usually take to convince themselves of the integrity of the key data that is used in the analyses whose results are reported in the paper?

    Many people are genuinely working very hard to get the best out of Zooniverse data (we spent an entire week in Taiwan recently discussing just that), so it is upsetting that you feel that way.

    Let's continue discussion on this elsewhere. 😃

    *While Kyle has yet to confirm this, it's highly likely that the latter group are - or include - the 'repeats' for this project

    Posted

  • klmasters by klmasters scientist

    We are working on a very explicit description of the different weighting and user contribution combinations we have used across the history of GZ, and actually also looking into how much of a difference the different methods make. This was a large part of what we talked about at the recent Taiwan Citizen Science in Astronomy meeting.

    Posted

  • JeanTate by JeanTate

    A couple of days ago ("March 20 2014 3:13 PM"), I reported the discovery of the (very likely) existence of "repeats", or "dups", among the Quench Project classifications*.

    The existence of repeats/dups has, potentially, huge implications for the Quench project: with the modal number of classifications per Subject only 20, repeats have the potential to seriously bias the classifications. Why? Because we have - implicitly or explicitly - assumed each classification is independent; that if there are 20 classifications of a Quench Project Subject, then 20 independent Users performed the Task of classifying it.

    My question for the Scientists: in your normal, non-Zooniverse, observational astronomy-based research, what lengths do you go to in tracking down potential sources of bias such as this (and characterizing them, quantitatively)?

    I'll also be sending an email, with this question, to some of the SCIENTISTS.

    *These are instances where a User - in the language Developers etc use, a zooite, or citizen scientist - is given the same Subject - image of a galaxy, for us - to classify, more than once

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    I'll also be sending an email, with this question, to some of the SCIENTISTS.

    I did; my email had the same content as this post.

    I got some replies, and have Kyle Willett's permission to quote his comments.

    Here's the first:

    That's a good question overall. I can tell you that for Galaxy Zoo 1 and Galaxy Zoo 2, we removed such repeat classifications from the final catalogues. These were a very small amount (much less than 1%) of the total number of classifications, since we had hundreds of thousands of galaxies and tens of thousands of users. We're using the same approach to the in-progress reduction of GZ: Hubble and the current projects.

    For GZ: Quench, it's clear that we have fewer independent data points since it's only a small number of galaxies and users. We do have the ability to remove duplicate classifications from the catalog - I would predict that might affect a few points for each galaxy. That data might be used to test a couple interesting questions: which users are consistent when shown the same galaxy again, and whether treating these classifications independently affects the final conclusions.

    I can tell you that less than 5% of Quench galaxies (273) had more than 2 duplicate classifications, and 0.3% had more than 5. None have more than 9. And individual classifications aren't necessarily bad in and of themselves - it's the equivalent of changing the weighting for fewer classifications. I don't see from the data that it's necessarily a problem, unless there are particular galaxies we want to examine more closely.

    There's a plot attached; however, a more appropriate one was attached to a later comment (so I'll omit the first one here).

    I replied (key content):

    Thanks Kyle.

    To be consistent, duplicate classifications should be removed, that seems clear; e.g. "In order to treat each vote as an independent measurement, classifications repeated by the same user were removed from the data, keeping only their votes from the last submission." (Willett+ 2013).

    But which classification to keep? Again, to be consistent, it should be either the first (as was done, for example, in the 'bias study'; Land+ 2008) or the last (GZ2; Willett+ 2013). As Quench is more like GZ2 than GZ1, I guess the latter.
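
    (In practice, 'keep the last' is a one-liner on the raw click table; a sketch with illustrative column names, retaining not-logged-in classifications since they can't be de-duplicated:)

    ```python
    import pandas as pd

    clicks = pd.read_csv("quench_classifications.csv", parse_dates=["created_at"])

    # Keep each logged-in user's last classification of each subject.
    logged_in = clicks[clicks["user_id"].notna()]
    anonymous = clicks[clicks["user_id"].isna()]

    deduped = (logged_in.sort_values("created_at")
                        .drop_duplicates(subset=["user_id", "subject_id"], keep="last"))
    clicks_clean = pd.concat([deduped, anonymous], ignore_index=True)
    ```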

    Here's Kyle's second comment (again, key content only):

    That's a good question. I've made a new version of the GZQ data set where I removed duplicate classifications by the same users. I kept classifications from users that weren't logged in, since we don't have enough information to determine whether they were duplicates or not. The average number of classifications removed per galaxy was about 2/20 (see attached histogram, which revises my estimate from a couple of days ago). That certainly could be enough to sway the morphological classifications for individual galaxies --- however, I'm keen to know whether the properties of the sample as a whole are consistent with the dataset where we kept the duplicate votes. If you have time to look at this data, Jean, hopefully you and the other Zooites can start answering this question.

    The new, duplicate-removed data sets are in the same Dropbox location as before, under the folder "removed_duplicates/". I'm looking forward to seeing what you find.

    https://db.tt/ZtTIQvn7

    I've downloaded the files, but haven't done any significant analyses on the data yet.

    (I'll edit this post, to attach the histogram, later) Done

    [attached: histogram of the number of duplicate classifications removed per galaxy]

    Posted

  • KWillett by KWillett scientist

    Hi Jean, Quench Zooites -

    I have some additional data made at Laura Trouille's request that may help our joint analysis of the data. Specifically, Laura asked for tables that not only include the metadata and vote fractions, but also labels for each category, similar to what appears on Talk (i.e., "Galaxy is smooth; galaxy is symmetrical; galaxy has 1 off-center bright clump"; etc). Attached are links to versions of those tables. I'll describe in detail here what I did.

    • After talking with Laura, our philosophy (in the absence of evidence against it so far) is that for the boost galaxies and those with repeated classifications, we will use the first 20 votes for each galaxy. For the 10 galaxies that received 40 votes in the initial Quench phase, then, these classifications are only the consensus of the first 20 timestamps. I have examined the vote fractions by hand and verified that they don't change by more than a couple of votes, well within what I expect the variance to be.
    • Similarly, there are 30 galaxies that had multiple Zooniverse IDs but identical images and SDSS IDs. Jean and other volunteers correctly identified these. In this case, I've selected one of the two sets (randomly, but with this list fixed so we can always replicate the results from now on). As before, I've verified for these 30 that neither the overall consensus morphology nor the individual vote fractions change significantly; out of 30 galaxies, ~50% were identical in consensus morphology, and only 3 switched their top-level morphology from smooth to featured (or vice versa). In those cases, the first set had something like 11 votes for smooth and 9 for featured, and the votes happened to go the other way in the second classification. The list of duplicate IDs is in the folder.
    • I removed galaxy AGS00004n1 from the sample; that's the only one where the image (for some reason) didn't match the objid and metadata. It's been replaced with the classifications from AGS00004cy.
    • Per earlier requests, these classifications also removed duplicate classifications from users who did the same galaxy more than once. So the average number of classifications is now 17-18 for each galaxy. It's easy for me to put them back in if requested.
    • The labels in the new table now are given by what I call the "most common path". Starting at the top task in the tree (features/disk or smooth or star/artifact), I ask which response had the most votes. From there, we proceed through the tree and select the response to each task receiving the most votes. Ties are broken randomly. If a task didn't get a plurality response, the label is left blank. So the threshold for inclusion is 0.50 * 2^n, where n is the number of tasks. This is not perfect (I still suggest using vote counts and fractions), but it's one way to easily select specific morphological categories without each of us coming up with our own recipes.

    Example: a long string of the labels is given as mcp = "s0a0;s1a1;s8a2;s9a3;s10a0;s11a1;". This corresponds to: a smooth galaxy; in-between roundness; no off-center clumps; neither merging nor tidal debris; symmetrical; and users didn't want to discuss it in Talk. Individual labels now each have their own columns in the tables I attached.
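
    (Kyle's actual code is in the GitHub repo linked below; for readers who just want the idea, here is a schematic sketch of "most common path" - the vote and tree structures are illustrative, not GZQ's real schema:)

    ```python
    import random

    def most_common_path(votes, tree, task="s0"):
        """votes: {task: {answer: count}}; tree: {(task, answer): next task or None}."""
        path = []
        while task is not None:
            counts = votes.get(task)
            if not counts:
                break
            top = max(counts.values())
            winners = [a for a, n in counts.items() if n == top]
            answer = random.choice(winners)    # ties broken randomly
            path.append(task + answer)
            task = tree.get((task, answer))    # None ends the walk
        return ";".join(path) + ";" if path else ""

    # Toy example with two tasks:
    votes = {"s0": {"a0": 12, "a1": 8}, "s1": {"a1": 9, "a0": 3}}
    tree = {("s0", "a0"): "s1", ("s0", "a1"): None,
            ("s1", "a0"): None, ("s1", "a1"): None}
    print(most_common_path(votes, tree))   # -> "s0a0;s1a1;"
    ```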

    The files are attached in this Dropbox folder: https://db.tt/JOX8Og8n

    And, in the interests of transparency and enabling anyone who wants to work more on the consensus, here is a link that houses all of my code for this: https://github.com/willettk/quench

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    Thanks Kyle (and Laura)! 😃

    Shortly after you posted this - and completely unaware that you'd done so! - I began to report the results of my analyses of differences between the 'nodup' and 'withdup' catalogs: What differences in 'consensus classification' are there if 'duplicates' are included?.

    Laura asked for tables that not only included the metadata and vote fractions, but also labels for each category similar to what appears on Talk (ie, "Galaxy is smooth; galaxy is symmetrical; galaxy has 1 off-center bright clump"; etc).

    That is most welcome!! 😄

    One quick analysis I'll be doing - on the dataset(s) you just posted - will be to see which objects are among the ppo-removed QS (N=1084 (1149-65)) and QC (N=1131 (1196-65)) targets (see this post, and the ones before it, in the Potentially Problematic Sources in 'Subset 2 -- 1149 Source Sample' thread),* and whether any "merging" or "asymmetric" level2^ classifications are affected.

    So the threshold for inclusion is 0.50 * 2^n, where n is the number of tasks. This is not perfect ...

    Indeed. A subtle bias this introduces is that instances with even-numbered and odd-numbered vote counts are treated differently (e.g. 10/20 passes the threshold, as does 10/19, but 9/19 does not; this matters if, say, there are 20 Q1/morphology votes and one is for soa (star or artifact), cf. 20 votes and zero for soa).

    *Redshifts between 0.02 and 0.10 AND estimated z-band absolute magnitudes brighter than -20.0

    ^by 'level2' I mean classifications which combine 'atomic' count/fraction values, to which a threshold has also been applied

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    One quick analysis I'll be doing - on the dataset(s) you just posted - will ... and whether any "merging" or "asymmetric" level2^ classifications are affected.

    I've just started a thread, Clean "021020" galaxies: 11 April catalogs, comparisons, and discussion, in which I plan to present the results of those analyses (etc). Starting with a link to two files (Google spreadsheets) which contain the IDs of those 1084 and 1131 objects.

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    In the thread I just started - "021020" catalogs: "the 1149" and "the 1196", with extra fields (BPT, ppo, ...) - I have posted links to subsets of the two catalogs Kyle posted (the name should be pretty self-explanatory).

    Posted