Statistical comparisons, Quench subset 1 vs. subset 2

by mlpeck

~~Placeholder for now until I have time to post.~~

I still don't have time to post. Scientist ivywong suggested doing bootstrap resampling to look for differences between our proposed subsets of the original quench sample, which are defined by the following selection criteria:

Subset 1: 0.02 < z < 0.08, Mz < -19.5.

Subset 2: 0.02 < z < 0.10, Mz < -20.

There are 792 objects in quench sample subset 1 and 1149 in subset 2.
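For anyone wanting to reproduce the two cuts, here is a minimal sketch in Python (not the R used for the actual analysis). The field names `z` and `abs_z` and the toy rows are assumptions made for illustration, not the real catalog columns.

```python
# Illustrative only: field names "z" and "abs_z" and the toy rows are assumptions.
def in_subset1(row):
    """Subset 1: 0.02 < z < 0.08 and M_z < -19.5."""
    return 0.02 < row["z"] < 0.08 and row["abs_z"] < -19.5

def in_subset2(row):
    """Subset 2: 0.02 < z < 0.10 and M_z < -20."""
    return 0.02 < row["z"] < 0.10 and row["abs_z"] < -20.0

toy_catalog = [
    {"id": "a", "z": 0.05, "abs_z": -19.8},  # subset 1 only (too faint for subset 2)
    {"id": "b", "z": 0.09, "abs_z": -21.0},  # subset 2 only (too distant for subset 1)
    {"id": "c", "z": 0.03, "abs_z": -20.5},  # passes both cuts
    {"id": "d", "z": 0.12, "abs_z": -22.0},  # passes neither cut
]

subset1 = [r["id"] for r in toy_catalog if in_subset1(r)]
subset2 = [r["id"] for r in toy_catalog if in_subset2(r)]
```

Note that the two subsets overlap: an object that is both nearby and bright enough belongs to both.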

So far I've looked at medians, weighted medians, and all percentiles from 1st to 99th of some variables of interest, and proportions of some categorical variables.

The plot below is the result of a bootstrap of percentiles of stellar masses for subset 1 (black and light blue) and subset 2 (red). These are obviously different, as they should be given the selection criteria. But the offset in the median mass is only about 0.13 dex, which is no larger than the typical statistical error.
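A bootstrap of a percentile can be sketched roughly as below. This is a Python illustration only, not the R code behind the plot; the function names, the number of resamples, and the synthetic masses are all assumptions.

```python
import random

def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in 0..100) of an already sorted list."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)

def bootstrap_percentile(values, p, n_boot=1000, seed=0):
    """Return the 2.5%, 50% and 97.5% points of the bootstrap distribution
    of the p-th percentile of `values`."""
    rng = random.Random(seed)
    n = len(values)
    stats = []
    for _ in range(n_boot):
        resample = sorted(rng.choices(values, k=n))  # n draws with replacement
        stats.append(percentile(resample, p))
    stats.sort()
    return percentile(stats, 2.5), percentile(stats, 50), percentile(stats, 97.5)

# Synthetic log stellar masses (made up, not the quench sample).
rng = random.Random(42)
toy_log_mass = [rng.gauss(10.5, 0.4) for _ in range(200)]
low, mid, high = bootstrap_percentile(toy_log_mass, 50)
```

Repeating this for each percentile from 1 to 99, and for each subset, would give bands that can be compared between the two subsets.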

Just as an aside I will try to post R source code for everything I've done on this project when I return home next week (I hope!). Please remind me if nothing has shown up on my dropbox folder by 20 Feb. or thereabouts.

Posted February 11, 2014 5:14 PM
by klmasters scientist

Can you remind us what subset 1 and subset 2 are? How about having a go at writing a figure caption for this plot? Also, units on the x-axis would be great. I assume it's log stellar mass given the range of numbers. 😃

Posted February 12, 2014 12:36 PM
by JeanTate in response to klmasters's comment.

Talk makes it extraordinarily difficult to find anything, once there are multiple threads each with over ~two pages of posts, on a topic.

mlpeck's post/thread (above) derives from ivywong's January 29 2014 6:14 PM post, on p8 of the Dealing with Sample Selection Issues thread. Here's the key part:

My thoughts on the redshift extent: (warning: these are just suggestions and will probably require some coding and/or slightly more advanced tools such as "R") - One way to explore the effect of extending the redshift out to z~0.1 from z~0.08 could be to use a bootstrap resampling method on various key parameters within your sample (such as mass, colour, morphological type etc.) for both the z~0.08 sample and the z~0.1 sample. Basically you randomly pick a property (e.g. mass) 10,000 times from both samples, then you have 2 distributions of masses. You can then do a Kolmogorov-Smirnov test of the 2 cumulative distributions to work out the statistical likelihood that both distributions come from the same parent distribution. Hypothetically, if you find that all your main properties central to the results of the paper do not differ between the 2 samples, then you are free to use the larger sample. Conversely, if one key property is affected, then you also have a number on what the statistical difference is. Is this helpful?
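The Kolmogorov-Smirnov comparison described in the quote could look something like the Python sketch below. It is an illustration under stated assumptions, not anything posted in the thread: the function names and synthetic masses are made up, and the p-value uses the standard asymptotic series, which is only an approximation.

```python
import bisect
import math
import random

def ks_statistic(a, b):
    """Two-sample KS statistic: largest vertical gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the largest gap always occurs at one of the data points
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def ks_pvalue(d, n_a, n_b, terms=100):
    """Approximate (asymptotic) p-value for a two-sample KS statistic d."""
    n_eff = n_a * n_b / (n_a + n_b)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:  # series converges poorly here; the true value is ~1
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

# Synthetic log masses (made up), offset by 0.13 dex between the two samples.
rng = random.Random(1)
toy_sub1 = [rng.gauss(10.40, 0.4) for _ in range(300)]
toy_sub2 = [rng.gauss(10.53, 0.4) for _ in range(400)]
d = ks_statistic(toy_sub1, toy_sub2)
p = ks_pvalue(d, len(toy_sub1), len(toy_sub2))
```

The p-value depends on the sample sizes supplied, so feeding it 10,000 resampled values per sample instead of the original sample sizes would change the answer.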

The two subsets being referred to are those in links in mlpeck's January 25 2014 4:00 PM post on the same page of that thread (I'm not going to even try to reproduce the links):

I created a version 2 of possible quench-control matches by decreasing the weight given to (squared) mass differences to 0.001 relative to redshift differences. This obviously decreases the range of redshift differences while increasing the range of mass differences. Whether this matters for any other property we might care to look at, I know not. As I said earlier I never figured out a way to exploit the supposed matching of control to quench objects. This exercise was just a fun application of something I know how to do that I didn't immediately realize was applicable here.

There's a page and a bit of discussion on how to create the matched control subsets (pages 6 to 8); you have to go back to page 5 to find the proposed cuts: trouille's December 17 2013 3:47 PM post:

Mlpeck and/or others: Could you make a .csv file available that has all the meta data but cut down to only sources in the subsample you've suggested (Abs_Z LT -20 and redshift between 0.02 and 0.1)? Could you also provide a .csv with the subsample with Abs_Z LT -19.5 and redshift between 0.02 and 0.08?

Producing those two for the QS objects was, obviously, quite straightforward (though along the way we had to deal with the 'missing mass' QS objects that made the cuts); for the QC ones it was anything but. What mlpeck produced - in v2 - is a good subset of QC objects which are closely matched - pair-wise - with the QS ones in both redshift and log_mass.

I assume it's log stellar mass given the range of numbers.

Yes, per what was downloaded from the Tools catalogs ("log_mass"), supplemented by values mlpeck obtained (from DR8?) for those with 'missing mass'.

Posted February 12, 2014 2:38 PM
by mlpeck in response to klmasters's comment.

Can you remind us what subset 1 and subset 2 are? How about having a go
at writing a figure caption for this plot? Also, units on the x-axis
would be great. I assume it's log stellar mass given the range of
numbers.

Sorry, that post was a placeholder that I intend to edit sometime soon. Subsets 1 and 2 were proposed in the sample selection thread -- both are intended to be close to volume limited. Selection criteria:

Subset 1: 0.02 < z < 0.08, M_z < -19.5.

Subset 2: 0.02 < z < 0.10, M_z < -20.

There are 792 objects in quench sample subset 1 and 1149 in subset 2.

I solved a slightly modified version of the assignment problem to pick control objects closely matched in redshift and stellar mass to each quench object as discussed in more detail by JeanTate above.

Posted February 12, 2014 4:15 PM
by trouille scientist, moderator, admin

Nice work on thinking about the differences between the two subsamples.

I'm leaning towards mlpeck's subset 2 selection. All the metadata for this subset2 sample that mlpeck compiled is on p5 of http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS0000223 (with a link to mlpeck's .csv file that you can download).

Note that mlpeck's subset2 .csv file has 27 sources with mass values EQ 'NA' (which is my fault because the original dataset I posted had sources with mass LE 0). Jean made another file that provides the mass values for all subset2 sources, including the 27 sources that originally were missing it. I believe this is because mlpeck made a file for the full quench sample where he got missing masses from the SDSS database (from dr9?).

In any case, just so it's in one place, here is a .tab file with objID, redshift, and mass for this Quench subset2 sample. https://vault.it.northwestern.edu/let412/GZQuench/TalkData/quench_subset2.tab

Jean -- could you expand your work from http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS0000226 (which was on the 778 subsample) to check for potentially problematic sources in this sample of 1149? Might be easiest to find if you start a new thread on that. That will help us know if we need to think about throwing out any of these 1149 sources before re-making plots like merger fraction vs. galaxy mass.

Starting from mlpeck's subset 2 selection, I then identified the matching control sources. I did this by finding which of the control sources from the original control sample* matched up to this subset of quench sources in subset2.

The one tricky part was finding control sources for the 27 quench sources that originally had mass LE 0. For these I needed to find new control sources (but ideally ones for which we already determined classifications) that are in the correct mass and redshift range. To do this, I looked at all the control sources that weren't matched to the subset2 quench sources and identified sources with the correct mass and redshift range. I.e.,

ratio = mass_control / mass_quench ;using actual mass, not log mass
options = where( ratio ge 0.8 and ratio le 1.2 and abs(redshift_control - redshift_quench) le 0.02)
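
The two IDL-style lines above can be read as the following Python sketch. The function name, the field names `log_mass` and `z`, and the toy values are assumptions for illustration; only the 0.8 to 1.2 mass-ratio window and the 0.02 redshift tolerance come from the post.

```python
# Python rendering of the selection above; names and toy values are assumptions.
def candidate_controls(quench, controls, max_dz=0.02, ratio_lo=0.8, ratio_hi=1.2):
    """Return indices of controls passing the mass-ratio and redshift cuts."""
    picks = []
    for i, c in enumerate(controls):
        # Ratio of actual (linear) masses, computed from the log masses.
        ratio = 10.0 ** (c["log_mass"] - quench["log_mass"])
        if ratio_lo <= ratio <= ratio_hi and abs(c["z"] - quench["z"]) <= max_dz:
            picks.append(i)
    return picks

quench = {"log_mass": 10.50, "z": 0.050}
controls = [
    {"log_mass": 10.52, "z": 0.055},  # ratio ~1.05, dz = 0.005: passes
    {"log_mass": 10.80, "z": 0.050},  # ratio ~2.0: fails the mass cut
    {"log_mass": 10.45, "z": 0.085},  # dz = 0.035: fails the redshift cut
]
options = candidate_controls(quench, controls)
```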

I was able to identify good control sources to use for 13 of the 27 sources that needed new control sources. 14 quench sources in subset2 still need control sources (i.e., didn't have a good match in the existing remaining control sample).

Here is the file with the matching control sample for subset2: https://vault.it.northwestern.edu/let412/GZQuench/TalkData/control_subset2.tab
You'll see that there are 14 sources with ObjID = NoMatch. The corresponding Quench sources are therefore the ones that currently have no good matching control source.

It may be that these remaining 14 quench sources will serendipitously turn out to be bad sources that need to be taken out anyway. Jean will let us know 😃 Once we know how many of the 14 remaining control sources we still need, we'll decide whether to identify control sources in the full SDSS sample and get their classifications from the public, or whether we just take these 14 quench sources out of our final sample and explain why in the article. I'd lean towards the latter since 14 sources is a small enough number to not significantly affect the population statistics.

Mlpeck -- could you post a file for just these control galaxies that provides their Hdelta_a for these sources? I believe you've gotten these from the full SDSS database?

Note: In other posts, mlpeck and possibly Jean as well have culled the full SDSS database for new matching control galaxies. We are constrained however to our current sample of 3002 control galaxies -- i.e., we can't reopen a new classification request for hundreds of new control galaxies.

*Here are the full quench and control samples. Line 1 (i.e., source 1) in the quench file corresponds to line 1 in the control file, etc.
https://vault.it.northwestern.edu/let412/GZQuench/TalkData/sample_091113.tab
https://vault.it.northwestern.edu/let412/GZQuench/TalkData/control_091113.tab

Posted February 15, 2014 3:39 AM
by trouille scientist, moderator, admin

Question for you all -- would you prefer to leave the discussion forum and move to direct emails between mlpeck, jean, jules, and the science team? It might be easier to follow the thread of the conversation and have all materials in one place.

Or, I could open a new discussion forum section that will contain all our conversations from here on out. That might also help so that we all know to go to just one place when we log in.

Or some other option. Let me know what you think. We're now at the normal size of a research team and it may be that the discussion forum isn't needed/ideal anymore.

We are quite close to being able to start solidifying results. I see the next steps as follows (and have listed who could take lead responsibility for each):
1. Identify problematic sources within the subset2 quench sample (Jean lead)
2. Check the classification results for the final quench sample & matching control sample (Kyle + Jean lead)
3. Redo promising results to include in the article:
-- Mergers' role in quenching star formation (i.e., difference seen in merger_fraction vs. galaxy mass for Quench vs. Control)
(Jules + Laura lead)
http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS000021w
http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS0000203
http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS00001zt

-- Differences between Quench and Control in terms of AGN - BPT properties (mlpeck lead)
http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS00001y6
http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS00001yi
http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS000021y

(Note, for this, check out recent Galaxy Zoo red spirals article -- might have relevant information on this: http://blog.galaxyzoo.org/tag/red-spirals/)

-- The impact of environment (mlpeck lead)
http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS000021x

--Others?
4. Once we have redone these promising results with our clean final sample, we'll decide which few plots we want to include in the article, what the captions would read, and how we can use them as the backbone around which we'll organize the flow of the article. We'll then decide as a group on how we want to proceed with the actual writing and I'll open up the Authorea framework for us to work on.
Note: In tandem with this work, mzevin is working on recreating the Yanmei sample selection and will post on the process and how it works out. That way we'll have work within the discussion forum that makes that clear and we can build on his summary of the process to write it up for the article. http://quenchtalk.galaxyzoo.org/#/boards/BGS0000001/discussions/DGS00001xy

Posted February 15, 2014 3:52 AM
by JeanTate in response to trouille's comment.

Starting from mlpeck's subset 2 selection, I then identified the matching control sources. I did this by finding which of the control sources from the original control sample* matched up to this subset of quench sources in subset2.

The one tricky part was finding control sources for the 27 quench sources that originally had mass LE 0. For these I needed to find new control sources (but ideally ones for which we already determined classifications) that are in the correct mass and redshift range. To do this, I looked at all the control sources that weren't matched to the subset2 quench sources and identified sources with the correct mass and redshift range. I.e.,

ratio = mass_control / mass_quench ;using actual mass, not log mass
options = where( ratio ge 0.8 and ratio le 1.2 and abs(redshift_control - redshift_quench) le 0.02)

I was able to identify good control sources to use for 13 of the 27 sources that needed new control sources. 14 quench sources in subset2 still need control sources (i.e., didn't have a good match in the existing remaining control sample).

Here is the file with the matching control sample for subset2: https://vault.it.northwestern.edu/let412/GZQuench/TalkData/control_subset2.tab You'll see that there are 14 sources with ObjID = NoMatch. The corresponding Quench sources are therefore the ones that currently have no good matching control source.

We all know, only too well, that it's sometimes nigh on impossible to find anything in Quench Talk (and Talks in general). It seems that you may have missed the work mlpeck did on re-assigning matched control (pairs), within the two (z, abs_z) cuts¹. Admittedly it's somewhat scattered, but it is all within the one thread: Dealing with Sample Selection Issues (starting around p5 or 6). Here's the key post², which is on p8:

I created a version 2 of possible quench-control matches by decreasing the weight given to (squared) mass differences to 0.001 relative to redshift differences. This obviously decreases the range of redshift differences while increasing the range of mass differences. Whether this matters for any other property we might care to look at, I know not. As I said earlier I never figured out a way to exploit the supposed matching of control to quench objects. This exercise was just a fun application of something I know how to do that I didn't immediately realize was applicable here.

quench-control match, subset 1, v2

quench-control match, subset 2, v2

If I get a chance, later this weekend, I'll compare your file with mlpeck's (and report what I find).

Jean -- could you expand your work from http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS0000226 (which was on the 778 subsample) to check for potentially problematic sources in this sample of 1149? Might be easiest to find if you start a new thread on that. That will help us know if we need to think about throwing out any of these 1149 sources before re-making plots like merger fraction vs. galaxy mass.

Sure. In fact I have been planning to do just that. However, I want to do it in one go, or as close to one go as I can. So I've been waiting for KWillett to post the detailed classifications of the 56 Quench Boost galaxies (at least 20 of these will be members of the matched controls), and the one 'missing' QC object (AGS00002bf - this is certainly a member of the matched controls, in both subsets).

¹ 0.02 < z < 0.08 AND abs_z brighter than -19.5

0.02 < z < 0.10 AND abs_z brighter than -20.0

² Another shortcoming of Talk: it is quite difficult, and very error-prone, to copy/paste the content of posts

Posted February 15, 2014 1:35 PM
by JeanTate in response to trouille's comment.

Question for you all -- would you prefer to leave the discussion forum and move to direct emails between mlpeck, jean, jules, and the science team? It might be easier to follow the thread of the conversation and have all materials in one place.

Or, I could open a new discussion forum section that will contain all our conversations from here on out. That might also help so that we all know to go to just one place when we log in.

Or some other option. Let me know what you think. We're now at the normal size of a research team and it may be that the discussion forum isn't needed/ideal anymore.

I'm willing to try anything, particularly as Talk is clearly making things far more difficult for us than it should (compared with its stated aims). Small correction: jules is no longer participating, but ChrisMolloy is.

We are quite close to being able to start solidifying results. I see the next steps as follows, ...

Two things: a) we're all waiting for Kyle to post the detailed classifications of the 56 Quench Boost objects (and also AGS00002bf), and b) given the history of poor data integrity (let's say), I'd like to do some extensive 'insanity checking' (as Ivy calls it) before (re-)starting any analyses (this is more extensive than your step 2).
3. Redo promising results to include in the article:
I suggest we include the tentative asymmetry results ChrisMolloy found (and posted).

I'd also like to have a discussion - among the small group you proposed (plus ChrisMolloy, and let's invite zutopian to join too) - of what we do with the provisional results we've already found/posted but which will not go into the paper (I fully agree that we should focus on just a few things, perhaps only three, for the paper!)

(gotta run; more later, maybe)

Posted February 15, 2014 1:50 PM
by mlpeck in response to trouille's comment.

I'm leaning towards mlpeck's subset 2 selection. All the metadata for
this subset2 sample that mlpeck compiled is on p5 of
http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS0000223
(with a link to mlpeck's .csv file that you can download).

OK, that works for me and will save some work looking for differences in the subsets. Except for the obvious mass differences noted above I haven't found anything too dramatic yet.

I believe this is because mlpeck made a file for the full quench
sample where he got missing masses from the SDSS database (from dr9?).

Just to be completely clear all I did was query the SDSS database for MPA pipeline mass values. I used the DR10 "context" as they call it on CasJobs, but the data must be from DR8 or DR9. According to the SDSS website the MPA pipeline is "deprecated" as of DR10.

All of the non-missing values in the original dataset agree with the retrieved values to within at least 4 digits, so we should have as much confidence in the filled in values as we do for anything from the MPA pipeline.

Re matching control objects: As JeanTate explained I figured out how to do the matching objectively and optimally and we have a candidate set of matching controls drawn from the original set. Yes, I have to agree Talk is now hindering communication.

My AHA post was made on 23 January on page 7 of the sample selection thread, with some followups and tweaks over the next few pages.

I think JeanTate is happy with version 2 of the Control-Quench match, but there's no reason why there couldn't be a v3, v4, ...

I will post my code so the process will be transparent and reproducible.

Later, ...

Posted February 15, 2014 4:06 PM
by mlpeck in response to trouille's comment.

Question for you all -- would you prefer to leave the discussion forum
and move to direct emails between mlpeck, jean, jules, and the science
team? It might be easier to follow the thread of the conversation and
have all materials in one place.

I'm not quite as unhappy with the Talk interface as JeanTate, and I can see some value in keeping the analysis process public. One of the original stated aims of this project after all was to use Talk as the primary communications medium to carry out the analysis phase.

Switching to email is fine with me though. I'll just need to get a new email address that won't be flooded with spam for a while.

-- Differences between Quench and Control in terms of AGN - BPT properties (mlpeck lead)

-- The impact of environment (mlpeck lead)

OK, I can work on those. I will have questions about resources to use, but I will save the questions for a bit.

--Others?

Well this may fit in the "AGN-BPT properties" topic, but I think it will be important to address. Close to half of the QS objects in subset 2 (subset 1 as well) are, spectroscopically, starforming. As far as I can tell they fall squarely on the local universe "star forming main sequence" (ref?).

I'll try to get a more detailed discussion of this topic started elsewhere.

Posted February 15, 2014 9:43 PM
by mlpeck in response to mlpeck's comment.

-- The impact of environment (mlpeck lead)

I think I have a strategy for describing the environment of quench sample galaxies, which is not quite the same thing as discussing the impact of environment. In the topic "Can we say anything about the quench sample environments?" I used a catalog of Tempel, Tago, & Liivamaegi (2011) that contains density estimates at several characteristic distance scales to estimate the density distribution of quench galaxies, and I briefly compared some other catalogs. But I noticed some systematics that I didn't really understand in their density estimates, and for that and other reasons the topic fell off my radar.

Very recently Tempel et al. (2014) published an updated catalog based on DR10 data. What I plan to do is apply exactly the same redshift and magnitude cuts to their sample as we have to subset 2 of ours. The systematic trends I noticed earlier haven't gone away, but any bias they introduce should be the same for their subset as ours.

I will also divide their sample into red and blue sequence galaxies, but this time I'll try to be more consistent with other recent literature (specifically Tojeiro et al. and Masters et al.). What we will see is that the distribution of densities is the same for the quench sample as for blue sequence galaxies in the Tempel+ sample.

Posted February 17, 2014 11:49 PM
by JeanTate in response to mlpeck's comment.

This looks to be a very interesting result! 😃

I will also divide their sample into red and blue sequence galaxies, but this time I'll try to be more consistent with other recent literature (specifically Tojeiro et al. and Masters et al.).

Hmm ... IIRC, zkKevin mentioned (presented?) at the GZ conference in Sydney, Australia recently that dust correction (i.e. accounting for internal reddening) made some interesting changes to the traditional red sequence/blue cloud (and green valley) bimodal distributions; something to the effect that "X is a red herring!" As Karen was there, maybe she could give us a quick summary of this?

Posted February 19, 2014 1:50 PM
by trouille scientist, moderator, admin in response to mlpeck's comment.

Hi all,

We can't reopen a new classification request for hundreds of new control galaxies. The approach mlpeck used to cull the wider SDSS database for new matching control galaxies is just not an option.

That is why we have to use the existing sample of 3002 Control galaxies for which we have classifications. This is the file with the original 3002 Control galaxies (as in my post on p1 of this thread):
https://vault.it.northwestern.edu/let412/GZQuench/TalkData/control_091113.tab

As in my post on p1 of this thread, here is the file I created that provides the matching control sample for subset2: https://vault.it.northwestern.edu/let412/GZQuench/TalkData/control_subset2.tab

You'll see that there are just 14 sources with ObjID = NoMatch. The corresponding Quench sources are therefore the ones that currently have no good matching control source. With this approach, we would only need to get classifications for 14 new Control sources.

And just so all the info is in this 1 post, here again is the file for the Quench sample that was mlpeck's selection:
https://vault.it.northwestern.edu/let412/GZQuench/TalkData/quench_subset2.tab

The source in line 1 of control_subset2.tab is the Control source for the Quench source in line 1 of quench_subset2.tab.
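Because the two files are aligned line by line, pairing them up and listing the quench sources without a control can be sketched as below. This is a Python illustration with toy inline data; the column names, the tab-separated layout with a header row, and the empty fields on NoMatch rows are assumptions, not a description of the actual files.

```python
import csv
import io

# Toy stand-ins for quench_subset2.tab and control_subset2.tab (contents made up).
quench_tab = "objID\tredshift\tmass\nQ1\t0.031\t10.2\nQ2\t0.064\t10.9\nQ3\t0.088\t11.1\n"
control_tab = "objID\tredshift\tmass\nC7\t0.030\t10.2\nNoMatch\t\t\nC9\t0.090\t11.0\n"

def read_tab(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

quench_rows = read_tab(quench_tab)
control_rows = read_tab(control_tab)

# Line i of one file corresponds to line i of the other, so lengths must agree.
if len(quench_rows) != len(control_rows):
    raise ValueError("quench and control files are not line-aligned")

pairs = list(zip(quench_rows, control_rows))
unmatched = [q["objID"] for q, c in pairs if c["objID"] == "NoMatch"]
```

Sorting or filtering either file on its own would silently break the pairing, so any cuts should be applied to the zipped pairs.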

The control source identification is done using the following rules:

--> abs(redshift_control - redshift_quench) le 0.02

--> ratio ge 0.8 and ratio le 1.2, where ratio = mass_control / mass_quench ;using actual mass, not log mass

Posted February 26, 2014 9:42 PM
by mlpeck

We can't reopen a new classification request for hundreds of new control galaxies. The approach mlpeck used to cull the wider SDSS database for new matching control galaxies is just not an option.

Nobody proposed reopening classifications. I performed a match of existing control galaxies to quench galaxies using an algorithm that guarantees an optimal match given a specified cost function. There turned out to be 1196 control galaxies meeting the redshift and magnitude cuts of subset 2. Therefore there are P(1196, 1149) possible ways to match control to quench galaxies. That's a rather large number, but it just happens that I know how to solve the problem optimally. So there's no problem, except for the small one that one of the control galaxies is in the background of M101 (fortuitously seen through an interarm region, but it's quite possible the photometry is contaminated).
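To make the matching idea concrete, here is a toy Python sketch of an optimal assignment under a cost function. It is not the algorithm or code actually used: the cost form (squared redshift difference plus squared log-mass difference weighted by 0.001) is an assumption based on the v2 description, and the exhaustive search only works for a handful of objects. At the real problem size a proper assignment solver would be needed, for example the Hungarian algorithm or `scipy.optimize.linear_sum_assignment`.

```python
import itertools
import math

def match_cost(q, c, mass_weight=0.001):
    """Assumed cost: squared redshift difference plus down-weighted
    squared log-mass difference (form and weight are assumptions)."""
    return (q["z"] - c["z"]) ** 2 + mass_weight * (q["log_mass"] - c["log_mass"]) ** 2

def best_assignment(quench, controls):
    """Try every way of giving each quench object its own distinct control."""
    best_total, best_perm = float("inf"), None
    for perm in itertools.permutations(range(len(controls)), len(quench)):
        total = sum(match_cost(q, controls[j]) for q, j in zip(quench, perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_perm, best_total

# Toy objects (made up): 2 quench galaxies, 3 available controls.
quench = [{"z": 0.030, "log_mass": 10.2}, {"z": 0.060, "log_mass": 10.8}]
controls = [{"z": 0.061, "log_mass": 10.7},
            {"z": 0.031, "log_mass": 10.3},
            {"z": 0.045, "log_mass": 10.5}]
assignment, total_cost = best_assignment(quench, controls)

n_toy = math.perm(len(controls), len(quench))     # P(3, 2) candidate matchings
n_digits_real = len(str(math.perm(1196, 1149)))   # digits in P(1196, 1149)
```

P(1196, 1149) = 1196!/47! has more than 3,000 digits, which is why exhaustive search is out of the question and a dedicated solver matters.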

What we're still missing are the classification data (vote totals and vote fractions) for the 56 galaxies that were added in the "quench boost" phase plus I think one other one (per JeanTate). Kyle Willett is the custodian of that data, and he should be aware of the issue.

Posted February 26, 2014 10:11 PM
by trouille scientist, moderator, admin in response to mlpeck's comment.

Excellent! I see that in http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS000022m you include plots explaining what you've written here. Nice work.

Ramin Skibba is a Galaxy Zoo astronomer (http://cass.ucsd.edu/~rskibba/) who has worked on galaxy environment studies. He wrote the following in a recent email to me about advice on measuring environment around our Quench galaxies:

I work on environments and large-scale structure, so if you want to include some things about that, I could give some input. The Baldry density (using 4th & 5th nearest neighbors) is a popular and pretty good one. If you want to see comparisons between environment measures, you might want to look at this paper I was involved in:
http://arxiv.org/abs/1109.6328

The Baldry densities are correlated with halo mass at M GT 1e13 Msun, but there is a lot of scatter below that. In our other "Measures of Galaxy Environment" papers, we showed that the Baldry densities are primarily sensitive to small-scale environments (a few hundred kpc/h scales) and that they can be affected by projection effects and redshift errors, but those effects are probably small (unless the redshift errors are large).

The Tempel et al. group catalog seems like a robust one. If you're interested, the Yang et al. and Tinker et al. group catalogs seem to be a little more popular, and more people will be familiar with them. But I think all three should yield approximately similar results.

I posted this here, but I'm also posting it in the more relevant thread about environment. Best to continue the environment related discussion over there.

Posted February 26, 2014 10:31 PM
by trouille scientist, moderator, admin in response to mlpeck's comment.

Nobody proposed reopening classifications. I performed a match of existing control galaxies to quench galaxies using an algorithm that guarantees an optimal match given a specified cost function. There turned out to be 1196 control galaxies meeting the redshift and magnitude cuts of subset 2.

Strange. When I tried to find matches between the ObjIDs of your proposed Control sample of 1149 sources (qcmatch2.csv)* with the ObjIDs of the original 3002 Control sample, I found none. Perhaps that's not the right control sample of yours for subset2 I should be checking? If it is the right sample, then do you know what's going on?

I just also looked at qcmatch1.csv from that same post, and there the Control SDSS IDs do match up with ObjIDs in the original Control sample of 3002 galaxies.

*from p7 of http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS0000223

Posted February 26, 2014 10:37 PM
by trouille scientist, moderator, admin in response to mlpeck's comment.

What we're still missing are the classification data (vote totals and vote fractions) for the 56 galaxies that were added in the "quench boost" phase plus I think one other one (per JeanTate). Kyle Willett is the custodian of that data, and he should be aware of the issue.

Yes, this is a problem. I will email him again to request this. I had thought he had provided a file with all the classification results, but clearly this is not the case. My apologies for not having seen this in the threads and taking action earlier!

Could someone point out in which thread Willett posted the latest version of the classification results? And could you point me to the thread where it explains what the problem currently is (or just repost here, if possible)? That way I can be very clear in my email request to him on what is needed. Thanks!

Posted February 26, 2014 10:38 PM
by mlpeck

Here was my proposed match of control objects to quench subset 2. Actually this was version 2, which JeanTate and I agreed was better than my first go. Producing a version 3 will be no problem if it's needed. I will post the code that does the matching later.

quench-control match, v2

This has 8 columns: uid, DR7 ID, log_mass, and redshift for each quench object and its matching control. If using a spreadsheet remember the sdss_id fields must be specified as text!
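The reason the ID fields must be kept as text is that 18-digit integers do not survive conversion to double-precision numbers. A small Python sketch of the problem follows; the ID below is made up, and the column names are assumptions rather than the actual header of the file.

```python
import csv
import io

# Made-up 18-digit ID and uid, for illustration only.
toy_csv = "uid,sdss_id,log_mass,redshift\nAGS0000xyz,587722982829850805,10.4,0.052\n"

row = next(csv.DictReader(io.StringIO(toy_csv)))
id_as_text = row["sdss_id"]                  # the csv module keeps fields as strings
id_via_float = str(int(float(id_as_text)))   # what a numeric (double) column would keep

ids_differ = id_as_text != id_via_float      # True: the trailing digits are altered
```

Once an ID has been rounded this way it will no longer match the same object in another file, so the column has to be imported as text from the start.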

Posted February 26, 2014 11:55 PM
by mlpeck

Could someone point out in which thread Willett posted the latest
version of the classification results? And could you point me to the
thread where it explains what the problem currently is (or just repost
here, if possible)? That way I can be very clear in my email request
to him on what is needed. Thanks!

Look on pages 5-6 of this thread: http://quenchtalk.galaxyzoo.org/#/boards/BGS000000e/discussions/DGS000022f. I think the original announcement and relevant followups are all there.

Posted February 27, 2014 12:01 AM
by trouille scientist, moderator, admin in response to mlpeck's comment.

Here was my proposed match of control objects to quench subset 2. Actually this was version 2, which JeanTate and I agreed was better than my first go. Producing a version 3 will be no problem if it's needed. I will post the code that does the matching later: quench-control match, v2

Excellent. Now I see that the ObjIDs from your control sample are included in the 3002 original Control sample galaxies. Wonderful.

And just to be doubly sure, you were making sure that every Quench galaxy has a Control counterpart that fulfills the following criteria?

--> abs(redshift_control - redshift_quench) le 0.02

--> ratio ge 0.8 and ratio le 1.2, where ratio = mass_control / mass_quench ;using actual mass, not log mass

Thanks!

Posted February 27, 2014 12:03 AM
by trouille scientist, moderator, admin in response to mlpeck's comment.

Perfect. I've emailed Willett a reminder and will ping him again tomorrow if he doesn't respond by the afternoon. Thanks for the quick post helping guide me to where I needed to look!

Posted February 27, 2014 12:12 AM
by mlpeck in response to trouille's comment.

And just to be doubly sure, you were making sure that every Quench
galaxy has a Control counterpart that fulfills the following criteria?

--> abs(redshift_control - redshift_quench) le 0.02

--> ratio ge 0.8 and ratio le 1.2, where ratio = mass_control / mass_quench ;using actual mass, not log mass

The first constraint is easy to meet. The second constraint may not be feasible. Version 2 of the QS-QC match had all redshift differences in the range (-.0056, +.0057) and all log_mass differences in the range (-0.39, +0.22) dex, with 98% in the range ± 0.11 dex. The number of outlying mass differences can be reduced by weighting them more heavily in the cost function at the expense of making the outliers look more outlierish.
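To compare the proposed mass-ratio window with the dex ranges quoted here, it helps to convert it to log units. The Python sketch below does that conversion; the list of differences is made up for illustration and is not the v2 match.

```python
import math

# Convert the proposed linear mass-ratio window (0.8 to 1.2) to dex.
ratio_lo, ratio_hi = 0.8, 1.2
dex_lo, dex_hi = math.log10(ratio_lo), math.log10(ratio_hi)  # about -0.097 and +0.079

def fraction_within(diffs, lo, hi):
    """Fraction of log-mass differences (control minus quench) inside [lo, hi]."""
    return sum(lo <= d <= hi for d in diffs) / len(diffs)

# Toy log-mass differences in dex (made up, not the v2 match).
toy_diffs = [-0.39, -0.10, -0.05, 0.00, 0.03, 0.07, 0.09, 0.22]
frac_ratio_window = fraction_within(toy_diffs, dex_lo, dex_hi)
frac_pm_011 = fraction_within(toy_diffs, -0.11, 0.11)
```

The 0.8 to 1.2 ratio window is therefore narrower than the ± 0.11 dex range, and it is asymmetric in log space. That is consistent with the concern that not every pair can satisfy it.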

Since outliers seem to be something of an issue for JeanTate, she liked version 2 better even though the overall dispersion of mass differences is a little larger than in the first version I ran. I agree. One reason is that redshifts are measured with relatively high precision (and often even high accuracy), while masses are estimated from models that even optimists would not claim are precise to better than around ± 0.1 dex (and whose accuracy is essentially unknown).

Posted February 27, 2014 3:31 PM
by mlpeck

Just a quick comment here that I've started a topic in the Chat/Tools and analysis section with links to R code that I've used for this project. The link is http://quenchtalk.galaxyzoo.org/#/boards/BGS000000b/discussions/DGS000022p.

The second post has a link to the matching algorithm I used to find control partners to quench objects, along with very brief installation instructions. If any R users need more detailed instructions please ask.

Posted February 28, 2014 10:23 PM