Tuesday, August 6th -- Quench Talk Office Hours

by trouille scientist, moderator, admin

Dear all, just a quick note to let you know I'm online and looking forward to today's exchange.

Posted August 6, 2013 7:00 PM

by trouille scientist, moderator, admin

Yes, this 'office hours' experiment may not pan out, but it's worth a try. Just an FYI: while I'm online, I'm also browsing the rest of Quench Talk, responding to questions and following up on suggestions. Right now I'm specifically looking into the negative/placeholder flux values and seeing how best to handle them for use in Tools.

Maybe tomorrow I'll test out bringing questions from other boards within Quench Talk over here and responding to them both here and at their original location.

Other ideas on how to better use 'office hours'?

Posted August 6, 2013 7:15 PM

by trouille scientist, moderator, admin

Right now I'm looking at the ObjIDs of sources with Halpha fluxes less than 0. I found these by applying a filter within Tools. You can see my dashboard: http://tools.zooniverse.org/#/dashboards/galaxy_zoo_starburst/52013ee5be70a6742e000163

I want to double-check that their spectra at Halpha (6563 Angstrom) don't show a clean emission line. I'm using http://cas.sdss.org/dr7/en/tools/explore, and I entered each objID into the "Search by" box in the left-hand menu.

The first one I looked at is 588848899366977698 (http://cas.sdss.org/dr7/en/tools/explore/obj.asp?id=588848899366977698). It has a weird Halpha line: it slants down in a bizarre way. So this reassures me that the Halpha flux is tagged as being less than or equal to 0 for a good reason.
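For anyone who wants to repeat this check outside Tools, here is a minimal Python sketch that picks out objIDs with Halpha flux <= 0 and turns them into DR7 Explore links, using the URL pattern from the link above. The column names objid and halpha_flux are hypothetical; rename them to match your export. Only the first objID comes from this post, and its flux value is illustrative.

```python
# Sketch: list objIDs with Halpha flux <= 0 and build SDSS DR7 Explore links.
# Column names "objid" and "halpha_flux" are hypothetical placeholders.

EXPLORE_URL = "http://cas.sdss.org/dr7/en/tools/explore/obj.asp?id={objid}"


def nonpositive_halpha(rows, flux_key="halpha_flux"):
    """Return objIDs whose Halpha flux is <= 0 (negative or placeholder)."""
    return [row["objid"] for row in rows if float(row[flux_key]) <= 0]


def explore_links(objids):
    """Turn each objID into a DR7 Explore URL for eyeballing the spectrum."""
    return [EXPLORE_URL.format(objid=objid) for objid in objids]


rows = [
    # Illustrative flux; the post only says this object's flux is <= 0.
    {"objid": "588848899366977698", "halpha_flux": "-1.0"},
    {"objid": "1002", "halpha_flux": "150.2"},  # made-up comparison row
]
for url in explore_links(nonpositive_halpha(rows)):
    print(url)
```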

Posted August 6, 2013 7:42 PM

by JeanTate

Speaking personally, not all questions are created equal. 😉

One of my most burning (getting the results of Stage 1, the classifications, into the database in a form we can all now use) has already been answered (thanks Laura!). How do I let you - MODERATOR, SCIENTIST yous - know that my next most burning is "What, in detail, were the selection criteria you used to select the QS and QC objects?" It's a question that at least two others have already asked ...

After that? Probably "When will the Multiwavelength Viewer tool become available?"

Posted August 6, 2013 7:44 PM

by trouille scientist, moderator, admin

Yanmei Chen developed a principal component analysis program to identify post-quenched galaxies. http://en.wikipedia.org/wiki/Principal_component_analysis helps to explain this method, but more discussion should follow here.

In brief, the method looks for features in the galaxy spectra (emission and absorption lines, the size of the 4000 Angstrom break, etc.) that indicate a galaxy is a member of this special sub-group.

The features the program looks for are based on those described in http://postquench.blogspot.com/2013/06/wong-et-al-article-galaxy-zoo-building.html. But it takes this to the next level by creating a whole range of model spectra with different star formation histories, times since quenching, quenching rates, stellar population models, metallicities, etc., and correlating the SDSS galaxy spectra against all the spectral components of these model spectra.

The power of the principal component analysis is that it uses many spectral features to determine if a galaxy is or isn't a post-quenched galaxy.
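To make the "model spectra plus principal components" idea concrete, here is a generic Python/NumPy sketch: it derives eigenspectra from a library of model spectra via SVD and projects an observed spectrum onto them. This is a textbook PCA illustration, not Yanmei Chen's actual pipeline; the toy templates are invented, and real spectra would first need to be shifted to rest frame, resampled onto a common wavelength grid and normalized.

```python
# Generic PCA sketch (not the project's pipeline): eigenspectra from models,
# then PC amplitudes of observed spectra measured against that basis.
import numpy as np


def pca_basis(model_spectra, n_components):
    """Return (mean spectrum, top eigenspectra) from an (n_models, n_pix) array."""
    mean = model_spectra.mean(axis=0)
    # Rows of vt are orthonormal eigenspectra, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(model_spectra - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(spectra, mean, basis):
    """PC amplitudes of each observed spectrum (one spectrum per row)."""
    return (spectra - mean) @ basis.T


def reconstruction_error(spectra, mean, basis):
    """RMS residual after rebuilding each spectrum from its PC amplitudes."""
    rebuilt = mean + project(spectra, mean, basis) @ basis
    return np.sqrt(np.mean((spectra - rebuilt) ** 2, axis=1))


# Toy library: every "model" is a random mix of two made-up template shapes.
rng = np.random.default_rng(0)
wave = np.linspace(3700.0, 4200.0, 200)
templates = np.vstack([np.ones_like(wave), (wave - 4000.0) / 300.0])
models = rng.uniform(0.5, 1.5, size=(50, 2)) @ templates
mean, basis = pca_basis(models, n_components=2)

observed = np.array([0.8, 1.2]) @ templates
print(project(observed[None, :], mean, basis))
print(reconstruction_error(observed[None, :], mean, basis))
```

A small reconstruction error means the observed spectrum is well described by the model components; in a real analysis it is the PC amplitudes that would be compared with where known post-quenched models fall.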

Posted August 6, 2013 7:54 PM

by trouille scientist, moderator, admin

This site is quite helpful in describing a bit more about the PCA (principal component analysis) method. http://www.sedfitting.org/SED08/Paper_vs1.0_online/walcher_mssu9.html

Posted August 6, 2013 7:55 PM

by trouille scientist, moderator, admin

I also find the first couple paragraphs in Section 3 of this article helpful:
http://arxiv.org/pdf/astro-ph/9805130v1.pdf

"Suppose we have a sample of N galaxy spectra, all covering the same rest-frame wavelength range. Each spectrum is described by an M-dimensional vector X containing the galaxy flux at M uniformly sampled wavelengths. Let S be the M-dimensional space spanned by the ‘spectral’ vectors X. A given spectrum, then, is a point in S-space, and the spectra in the sample form a cloud of points in S.

The position of a galaxy spectrum point in this space depends on parameters such as age, star formation history and metallicity. However it is impossible to visualise directly how the data are distributed in high dimensional spaces. An alternative is to employ some technique for dimension reduction by projecting the data in, say, two dimensions."
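The quoted idea (N spectra as points in an M-dimensional space S, projected down to two dimensions) can be sketched in a few lines of NumPy. This version eigendecomposes the M x M covariance matrix and keeps the two directions of largest variance; the data here are random numbers standing in for spectra, not SDSS data.

```python
# Sketch: project N spectra (points in M-dimensional space S) onto the two
# directions of largest variance via the covariance-matrix eigendecomposition.
import numpy as np


def project_to_2d(X):
    """X: (N, M) fluxes on a common rest-frame grid.

    Returns (N, 2) coordinates and the fraction of total variance
    captured by each of the two kept components.
    """
    centered = X - X.mean(axis=0)
    cov = centered.T @ centered / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coords = centered @ eigvecs[:, :2]
    return coords, eigvals[:2] / eigvals.sum()


rng = np.random.default_rng(1)
X = rng.normal(size=(300, 40))   # 300 synthetic "spectra" at 40 wavelengths
X[:, :5] *= 5.0                  # give a few pixels much more variance
coords, frac = project_to_2d(X)
print(coords.shape, frac)
```

Plotting coords[:, 0] against coords[:, 1] gives the kind of two-dimensional view the quote describes; frac shows how much of the spread that view actually captures.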

Posted August 6, 2013 8:01 PM

by trouille scientist, moderator, admin

Speaking of classifications, I just talked with our awesome Zooniverse programmer, and he's working on transforming the classification results into a palatable form within Quench Tools. That should go up either by the end of today or tomorrow.

Kyle Willett is the astronomer who took the original classifications from GZ Quench and ran them through his existing software from previous GZ projects to create the aggregate results.

Posted August 6, 2013 8:11 PM

by JeanTate in response to trouille's comment.

Thanks! 😄

So that method was set loose on what, the DR7¹ spectroscopic database? Yet the flux values seem to be from galSpecline, in DR9! 😮

How were the QC objects selected? In one post you said (going from memory) each one matches a QS galaxy, in redshift (within 0.02) and stellar mass. But there are ~100 QC objects with no log(mass) values (and ~four in QS), so how were they selected?

¹ because in Examine, all (?) the QS and QC objects have DR7 (Photo) ObjIds (not all are correct tho!)
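For readers wondering how a redshift-and-mass matched control sample might be built, here is a minimal Python sketch. It follows the matching as recalled in this post (redshift within 0.02, similar stellar mass); the mass tolerance, the greedy "closest unused candidate" rule and the field names id, z and log_mass are assumptions, not the team's documented procedure.

```python
# Hypothetical sketch of redshift + stellar-mass matched control selection.
# dz_max follows the thread; dmass_max and the greedy rule are assumptions.


def match_controls(quench, candidates, dz_max=0.02, dmass_max=0.1):
    """Pair each quench galaxy with the unused candidate closest in mass.

    quench, candidates: lists of dicts with keys "id", "z", "log_mass".
    Galaxies with the placeholder log_mass == -1 cannot be mass-matched
    and are returned in the unmatched list.
    """
    used, pairs, unmatched = set(), {}, []
    for gal in quench:
        if gal["log_mass"] == -1:
            unmatched.append(gal["id"])
            continue
        best, best_dm = None, None
        for cand in candidates:
            if cand["id"] in used or cand["log_mass"] == -1:
                continue
            dm = abs(cand["log_mass"] - gal["log_mass"])
            if abs(cand["z"] - gal["z"]) < dz_max and dm <= dmass_max:
                if best_dm is None or dm < best_dm:
                    best, best_dm = cand, dm
        if best is None:
            unmatched.append(gal["id"])
        else:
            used.add(best["id"])
            pairs[gal["id"]] = best["id"]
    return pairs, unmatched
```

In a scheme like this, a sample galaxy with log_mass = -1 has nothing to match on, which is why the placeholder masses matter for comparing the two samples.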

Posted August 6, 2013 8:13 PM

by trouille scientist, moderator, admin

http://tools.zooniverse.org/#/dashboards/galaxy_zoo_starburst/520150bfbe70a6481a0002dd

Above is the link to my dashboard showing the filters I used to determine which of the control galaxies have bad mass values. It looks like there are 5. I'll follow up on those 5 and see what went wrong.

Could you post how you got ~100 QC objects with no log-mass? That would be really helpful.

Posted August 6, 2013 8:40 PM

by JeanTate in response to trouille's comment.

Sorry, it was the other way round 😦 (I really should stop trying to do so much from memory)

5 QC objects have log_mass = -1 (per the navigator.csv table I downloaded, following your instructions).

112 QS objects have log_mass = -1 (same method)
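Counts like these can be reproduced from a download with a short Python script. This is a sketch: only the log_mass column name and the -1 placeholder come from the thread; the rest of the file layout (and the tiny demo table) is assumed.

```python
# Sketch: count placeholder stellar masses (log_mass = -1) in a Tools export.
import csv
import io


def count_missing_mass(csv_file, column="log_mass", sentinel=-1.0):
    """Return (number of rows, number of rows whose column equals sentinel)."""
    total = missing = 0
    for row in csv.DictReader(csv_file):
        total += 1
        if float(row[column]) == sentinel:
            missing += 1
    return total, missing


# With a real download:
#   with open("navigator.csv", newline="") as f:
#       print(count_missing_mass(f))
demo = io.StringIO("objid,log_mass\n1,10.3\n2,-1\n3,9.8\n4,-1.0\n")
print(count_missing_mass(demo))  # (4, 2)
```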

Posted August 6, 2013 8:46 PM

by wassock moderator in response to trouille's comment.

I think Jean had it the wrong way about: the sample has a hundred or so negative masses and the control ~5. (Once you've downloaded either set of data, Tools refers to it as Quench-n, so it's difficult to keep track of which plot is which.) Anyway, the point is (I think) that if the two sets have been matched for redshift and mass, there shouldn't be this difference.

Posted August 6, 2013 8:54 PM

by trouille scientist, moderator, admin

It turns out these 5 control galaxies with bad masses correspond to 5 post-quenched galaxies with bad masses. In our results, we'll filter out those bad post-quenched galaxies from our sample.

Posted August 9, 2013 6:49 PM

by trouille scientist, moderator, admin

I'm really glad and excited you are talking about the 112 post-quenched galaxies with negative mass values and that you're interested in their implications for the data analysis.

While Yanmei (my collaborator who did the sample selection) was able to use the spectra for these galaxies to carry out her principal component analysis method (to identify them as post-quenched galaxies), it's not too surprising that for some galaxies she was not able to get an accurate mass estimate through her automated pipeline. It may be that these sources have something odd about their spectra. Or her pipeline came across something it didn't know how to handle and rejected these sources.

It would be really helpful if someone could examine the spectra, redshifts, and flux values for these 112 galaxies and see if there's a common thread that may have caused this problem.

Someone else could see if these sources are in the 2MASS catalog. Another way to get stellar mass for a galaxy is to use its near-infrared flux, which is a good proxy for stellar mass (since long-lived, low-mass old stars give off much of their light in the near-infrared).
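The near-infrared route boils down to log M* = log(M/L_K) + log(L_K/L_sun), with the luminosity taken from the K-band absolute magnitude. Here is a minimal Python sketch; the mass-to-light ratio and the Sun's K-band absolute magnitude (3.28 is used as a default; published values differ slightly by photometric system) are assumed inputs, not values from this project. M_K itself must first be computed from the apparent 2MASS magnitude and the galaxy's distance.

```python
# Sketch: stellar mass from a K-band absolute magnitude.
# ml_ratio_k and sun_abs_mag_k are assumptions; M/L_K depends on the age and
# metallicity of the stellar population and should be chosen with care.
import math


def log_stellar_mass_from_k(abs_mag_k, ml_ratio_k=1.0, sun_abs_mag_k=3.28):
    """log10(M*/Msun) = log10(M/L_K) + log10(L_K / Lsun_K)."""
    # Luminosity in solar K-band units from the magnitude difference.
    log_lum_k = -0.4 * (abs_mag_k - sun_abs_mag_k)
    return math.log10(ml_ratio_k) + log_lum_k


print(log_stellar_mass_from_k(-21.72))  # about 10.0 with the default assumptions
```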

Cheers,
Laura

Posted August 9, 2013 7:04 PM

by JeanTate in response to trouille's comment.

Hmm ... not sure I understand. In what way do they 'correspond'?

Posted August 9, 2013 8:07 PM

by JeanTate in response to trouille's comment.

There are ~25 QS objects in the outliers thread already; among them are 4 with log_mass = -1. Small numbers, but if the "negative mass" QS objects were distributed randomly, you'd expect just one.

In addition to the outliers, I excluded another ~200 in order to create my first (QS) BPT diagram. Among these are six "negative mass" QS objects. Somewhat better numbers; if distributed randomly (actually independently), you'd expect seven.
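The "you'd expect just one" and "you'd expect seven" estimates are the mean of a hypergeometric distribution, n × K / N. The Python sketch below computes that mean and also the chance of seeing 4 or more flagged objects among 25 purely at random. K = 112 is the count reported in this thread; the total QS size N ≈ 3000 is an approximate assumption.

```python
# Sketch: expected number of "log_mass = -1" objects in a random subset,
# and the probability of at least k of them, via the hypergeometric distribution.
from math import comb


def expected_count(n_subset, n_flagged, n_total):
    """Mean of the hypergeometric distribution: n * K / N."""
    return n_subset * n_flagged / n_total


def prob_at_least(k, n_subset, n_flagged, n_total):
    """P(X >= k) for X ~ Hypergeometric(N=n_total, K=n_flagged, n=n_subset)."""
    upper = min(n_subset, n_flagged)
    hits = sum(
        comb(n_flagged, i) * comb(n_total - n_flagged, n_subset - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_total, n_subset)


N_TOTAL, N_FLAGGED = 3000, 112   # N_TOTAL is approximate (assumed)
print(expected_count(25, N_FLAGGED, N_TOTAL))    # outliers thread: ~0.9
print(prob_at_least(4, 25, N_FLAGGED, N_TOTAL))  # chance of 4 or more
print(expected_count(200, N_FLAGGED, N_TOTAL))   # BPT exclusions: ~7.5
```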

On it! 😃

ETA: It would help greatly if you could re-do (re-load) the QS catalog, with the flux error fields correctly populated 😉

Posted August 9, 2013 8:14 PM

by jules moderator

So these negative mass galaxies are "special"? I have a filtered sample of Quench "mergers" (tidal tails + merging) and am trying to add another filter to exclude these negative mass galaxies so that I can compare the remaining galaxies with similar plots from the control sample. At the moment they are getting in the way and I'll have a look at them later!

My command for removing the negative mass galaxies is not working, however. I've tried

filter .Log_Mass !='-1'

which, though accepted (in that it appears as an applied filter), is clearly wrong, as the table remains unchanged. Help!

Posted August 9, 2013 9:00 PM

by lpspieler moderator

Hi Jules,

I guess the correct command would have been

filter .log_mass != -1

But don't worry about the table not changing. Tables seem to display only a few rows of the entire data set they contain.
The difference will show when you create, say, a histogram of log_mass.
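Why a filter on a wrongly capitalized column name can be "accepted" yet remove nothing: if an unknown column simply yields an empty value, then "empty != -1" is true for every row. The Python sketch below mimics that kind of lenient filter; whether Tools handles unknown column names exactly this way internally is an assumption.

```python
# Sketch (Python, not the Tools filter language): a lenient filter where a
# misspelled or wrongly capitalized column name silently keeps every row.

rows = [{"log_mass": 10.4}, {"log_mass": -1.0}, {"log_mass": 9.9}]


def apply_filter(rows, column, bad_value):
    """Keep rows whose column differs from bad_value.

    dict.get() returns None for a column that does not exist, and
    None != bad_value is always True, so nothing gets removed.
    """
    return [r for r in rows if r.get(column) != bad_value]


print(len(apply_filter(rows, "Log_Mass", -1)))  # 3: wrong case, nothing removed
print(len(apply_filter(rows, "log_mass", -1)))  # 2: placeholder row removed
```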

Posted August 9, 2013 11:10 PM

by jules moderator in response to lpspieler's comment.

Ah - lower case variable names. I think I even read about that somewhere too. Thanks, Lionel, that worked! 😃 I just discovered that the greater-than/less-than symbols on the table headers page through the table data.

Posted August 10, 2013 12:12 AM

by lpspieler moderator

Ah - lower case variable names. I think I even read about that somewhere too.

You can see the column names that need to be used for specifying filters or new columns when choosing the data associated with the X and Y axis in scatter plots or histograms. They are (unfortunately) not equal to the column headings in the tables.

Posted August 10, 2013 12:31 AM

by jules moderator

So I see. I'm getting there... 😉

Posted August 10, 2013 12:48 AM

by mlpeck in response to trouille's comment.

Someone else could see if these sources are in the 2MASS catalog. Another way to get stellar mass for a galaxy is to use its near-infrared flux, which is a good proxy for stellar mass (since long-lived, low-mass old stars give off much of their light in the near-infrared).

Laura:

I looked at all the spectra, and almost all of them are unremarkable. There was one object that I had already thrown out of my personal database because DR9 had classified it as a QSO with z = 0.96 ± 0.90 (!) and the photo showed two point-like objects in close proximity. That object is somewhere in JeanTate's list of outliers.

Anyway, there's another way to fill in those missing mass values using only data you already have. There's lots of literature claiming that stellar masses (or equivalently mass-to-light ratios) can be estimated with reasonable precision using visible and near-IR magnitudes and colors alone (I can dig up references as needed). Here are the results of a linear regression of the non-missing log_mass values on absolute r magnitudes and r-z colors (this is just cut and pasted from R output):

Call:
lm(formula = cat.quench$log_mass ~ Mr.quench + rz.quench, na.action = na.exclude)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.434138   0.032977   43.49   <2e-16 ***
Mr.quench   -0.392020   0.001555 -252.06   <2e-16 ***
rz.quench    1.173519   0.011388  103.05   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1069 on 2887 degrees of freedom
  (111 observations deleted due to missingness)
Multiple R-squared: 0.9653,     Adjusted R-squared: 0.9653

Wow, this talk interface is just awful.

Anyway, given this linear relationship between log_mass, absolute R magnitude and color the missing values can be predicted.
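The same fit-then-predict step can be sketched in Python with NumPy's least-squares solver: fit log_mass = b0 + b1 × M_r + b2 × (r - z) on galaxies that have a mass, then apply it to those with the -1 placeholder. This mirrors the R lm() call in spirit only; the function names and argument layout are illustrative assumptions.

```python
# Sketch: ordinary least squares for log_mass ~ M_r + (r - z), skipping the
# placeholder masses, then prediction for the galaxies that lack a mass.
import numpy as np


def fit_mass_relation(abs_r, color_rz, log_mass, missing=-1.0):
    """Return [b0, b1, b2] fitted only on rows where log_mass != missing."""
    abs_r, color_rz, log_mass = map(np.asarray, (abs_r, color_rz, log_mass))
    ok = log_mass != missing
    design = np.column_stack([np.ones(ok.sum()), abs_r[ok], color_rz[ok]])
    coef, *_ = np.linalg.lstsq(design, log_mass[ok], rcond=None)
    return coef


def predict_mass(coef, abs_r, color_rz):
    """Apply the fitted relation, e.g. to galaxies with log_mass = -1."""
    return coef[0] + coef[1] * np.asarray(abs_r) + coef[2] * np.asarray(color_rz)


# Coefficients as printed in the R output in this post.
r_coef = np.array([1.434138, -0.392020, 1.173519])
print(predict_mass(r_coef, -21.0, 0.4))  # about 10.14
```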

I used H0 = 70 km/s/Mpc and Omega_m = 0.27 to calculate absolute magnitudes from the tabulated r magnitudes. (r-z) gave a tighter relationship than (g-i) or (u-r), which were the other colors I tried.
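The conversion from apparent to absolute magnitude with these parameters can be sketched in plain Python. Flatness (Omega_Lambda = 1 - Omega_m) is an assumption, the comoving-distance integral is done with a simple trapezoid rule, and K-corrections are ignored; the exact recipe used in the post may differ.

```python
# Sketch: absolute magnitude M = m - 5 log10(D_L / 10 pc) in flat LambdaCDM
# with H0 = 70 km/s/Mpc and Omega_m = 0.27. No K-correction; requires z > 0.
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance_mpc(z, h0=70.0, omega_m=0.27, steps=1000):
    """Line-of-sight comoving distance (Mpc) by the trapezoid rule."""
    omega_l = 1.0 - omega_m  # flat universe assumed

    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1 + zp) ** 3 + omega_l)

    dz = z / steps
    total = 0.5 * (inv_e(0.0) + inv_e(z))
    total += sum(inv_e(i * dz) for i in range(1, steps))
    return (C_KM_S / h0) * total * dz


def absolute_magnitude(app_mag, z, **cosmo):
    """Luminosity distance D_L = (1 + z) * D_C in a flat universe."""
    d_lum_mpc = (1 + z) * comoving_distance_mpc(z, **cosmo)
    return app_mag - 5 * math.log10(d_lum_mpc * 1e6 / 10.0)


print(absolute_magnitude(17.0, 0.05))  # roughly -19.7
```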

Posted August 10, 2013 5:26 PM

by JeanTate in response to mlpeck's comment.

Comparing the "log_mass = -1" QS objects with all the other QS objects, on each of the 'observational parameter' fields¹, shows nothing remarkable, in the following sense:

- the max, min, range, mean, σ, ... of the two distributions are very similar for all parameters, for all non-outliers (exception below)
- more formally, the means are not significantly different, nor are the σ's
- the exception is Oii_flux, because I have not yet tried to identify outliers specific to this field
- V_disp needs more detailed analysis

While there are, so far, only ten 'outliers' among the "-1"s, this is far, far more than would be expected if they were distributed randomly (there are, so far, only 35 QS outliers). Not too much should be read into this, however; my criteria for deciding whether an object is an outlier are far from unbiased! 😉

¹ As I have been unable to download either QS or QC since the classifications were added, I cannot check the 'classification' observational parameters.
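A "more formal" comparison like the one above can be sketched with Python's statistics module: Welch's t statistic for the means and a variance ratio (F statistic) for the spreads. Turning these into p-values needs t and F distribution tables or a library such as SciPy. Note that failing to find a significant difference is not the same as showing the means are equal. The sample values below are made up.

```python
# Sketch: Welch's t (means) and an F ratio (spreads) for one parameter,
# comparing the log_mass = -1 objects with the rest. Data are illustrative.
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof


def variance_ratio(a, b):
    """F statistic: larger sample variance over smaller."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


flagged = [0.061, 0.083, 0.072, 0.090, 0.066]        # made-up redshifts
others = [0.070, 0.075, 0.068, 0.081, 0.077, 0.059]  # made-up redshifts
print(welch_t(flagged, others), variance_ratio(flagged, others))
```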

Posted August 12, 2013 12:09 AM

by JeanTate in response to trouille's comment.

I for one am quite interested in doing this sort of investigation.

However, among other things, I am hampered by a complete lack of knowledge concerning how you (or Yanmei) derived the Log_mass values in the QS (and QC) catalogs.¹

At one level, it's all pretty unremarkable (e.g. mlpeck's "Estimating the missing stellar masses" thread), once you exclude a very small number of stars and overlaps.

At another level, a significant fraction of all objects - whether it's those with Log_mass = -1 or not - are 'outliers' ... parts of the spectra that are masked; line fluxes that are clearly wrong; fiber covering fractions that are ridiculously low; objects whose spectra do not match the SDSS DR7 ObjIds in Examine; etc, etc, etc.

From another perspective, who cares? Simply remove all the "Log_mass = -1" objects (and their corresponding QC counterparts) from the catalog, and base the (eventual) paper on what's left (96+% of the original QS)! 😃
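Dropping the "Log_mass = -1" objects together with their control counterparts is a small bookkeeping step. Here is a Python sketch; it assumes a hypothetical pair_id field shared by each QS galaxy and its QC match (the thread doesn't say how pairs are recorded), and it drops a pair if either member has the placeholder mass, which covers the 5 bad controls that correspond to bad QS galaxies.

```python
# Sketch: remove every QS/QC pair in which either member has log_mass = -1.
# "pair_id" is a hypothetical linking field, not a documented catalog column.


def drop_bad_pairs(qs_rows, qc_rows, key="pair_id", missing=-1.0):
    """Return (kept QS rows, kept QC rows) with bad-mass pairs removed."""
    bad = {r[key] for r in qs_rows + qc_rows if r["log_mass"] == missing}
    keep_qs = [r for r in qs_rows if r[key] not in bad]
    keep_qc = [r for r in qc_rows if r[key] not in bad]
    return keep_qs, keep_qc


qs = [{"pair_id": 1, "log_mass": 10.1}, {"pair_id": 2, "log_mass": -1.0}]
qc = [{"pair_id": 1, "log_mass": 10.0}, {"pair_id": 2, "log_mass": 10.3}]
keep_qs, keep_qc = drop_bad_pairs(qs, qc)
print(len(keep_qs) / len(qs))  # fraction of the QS sample kept
```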

More than anything, however, let's discuss.

¹ The same is true of all the other fields/parameters (other than the classifications, of course). Perhaps everything, including Petro_R50 and all the line fluxes, is simply whatever some CasJobs query returns ... but in what context? And were such queries prefaced by (the equivalent of) "only science primaries"?

Posted August 12, 2013 9:49 PM