Galaxy Zoo Starburst Talk

Clean "021020" galaxies: 11 April catalogs, comparisons, and discussion

  • JeanTate

    In this thread I will post the results of the comparisons I made with earlier catalogs, of the clean "021020" galaxies in the "11 April catalogs", and discuss them. Please join in! 😃

    What are the "11 April catalogs"? The QS and QC catalogs Kyle posted on 11 April, here:

    I have some additional data made at Laura Trouille's request that may help our joint analysis of the data. Specifically, Laura asked ... I'll describe in detail here what I did.

    What are the clean "021020" galaxies? They are the QS and QC galaxies which:

    • have redshifts between 0.02 and 0.10, AND
    • have estimated absolute z-band magnitudes brighter than -20.0, AND
    • are not among the potentially problematic objects posted in the Potentially Problematic Sources in 'Subset 2 -- 1149 Source Sample' thread (specifically, excluding the 65 QS and 65 QC objects identified on page 4 (QS) and page 3 (QC)).

    There are 1083 (originally 1084; see UPDATE below) QS "clean 021020" galaxies, and 1131 QC ones. I have uploaded two CSV files with OBJID, uid, redshift, and Z_ABSMAG values for these, from the 11 April catalogs, to Google spreadsheets: 0210QS_excl_ppo and 0210QC_excl_ppo (please let me know if you are unable to access/download them, or have any problems opening them).
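    For concreteness, the selection above can be sketched in Python/pandas. This is only a sketch, not the tool actually used to build the catalogs; the column names follow the catalog fields discussed in this thread, and the ppo OBJID set is whatever was posted in the ppo thread:

```python
import pandas as pd

def clean_021020(catalog: pd.DataFrame, ppo_objids: set) -> pd.DataFrame:
    """Apply the clean '021020' cuts: redshift range, absolute z-band
    magnitude limit, and exclusion of potentially problematic objects."""
    keep = (
        catalog["REDSHIFT"].between(0.02, 0.10)   # redshift between 0.02 and 0.10
        & (catalog["Z_ABSMAG"] < -20.0)           # brighter than -20.0
        & ~catalog["OBJID"].isin(ppo_objids)      # drop the listed ppos
    )
    return catalog[keep]
```

    The same function would serve for both the QS and QC catalogs, with the respective ppo lists.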

    UPDATE: per my post down-thread, I have removed the QS object (OBJID 587735241176514586, AGS000010e) from the file 0210QS_excl_ppo.


  • JeanTate

    Non-classification data

    These are all the fields, and the values in those fields, other than the vote counts and fractions: things like REDSHIFT and HDELTA_FLUX_ERR.

    For the clean "021020" galaxies all the values, in both QS (N=1084) and QC (N=1131) "11 April catalogs", are the same as those in the 'nodup' catalogs.

    I think the only fields we will be using for analyses* are the four BPT line fluxes (and their errors)^, LOG_MASS, and SFR.

    There are 27 QS galaxies for which LOG_MASS has a value of -1; for each of these 27 the value of SFR is -99. There are no other QS galaxies for which SFR is -99.

    As mlpeck had obtained reliable LOG_MASS values for all 27, we will need to substitute them into the clean "021020" catalog we use for analyses. I need to check if he also found reliable SFR estimates for those.
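    The substitution could be sketched like this (a hypothetical helper, assuming mlpeck's estimates are available as a dict mapping OBJID to LOG_MASS; not his actual code):

```python
import pandas as pd

def substitute_masses(catalog: pd.DataFrame, replacement: dict) -> pd.DataFrame:
    """Replace LOG_MASS == -1 sentinel values with externally derived
    estimates (e.g. mlpeck's), keyed by OBJID; other rows are untouched."""
    out = catalog.copy()
    missing = out["LOG_MASS"] == -1
    out.loc[missing, "LOG_MASS"] = out.loc[missing, "OBJID"].map(replacement)
    return out
```

    The same approach would work for the -99 SFR values, if reliable estimates exist for those.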

    Here is a list of those 27:

    587726015627985271, AGS00002b2

    587729227690868785, AGS00002ar

    587729231986229860, AGS00002bd

    587729232519954460, AGS00002ba

    587729748448772448, AGS00002b1

    587731185118216312, AGS00000wi

    587731185125163170, AGS00000xm

    587731186194383200, AGS00000wn

    587731187804733598, AGS00000wl

    587731187814826058, AGS000018a

    587731512080269478, AGS00000n5

    587732703406915765, AGS00001b5

    587732703409406115, AGS00001bb

    587734303269257493, AGS00000wk

    587734303803637967, AGS00000wj

    587734304343458056, AGS00000wm

    587734304877314241, AGS00000wg

    587734305412612416, AGS00000xo

    587742953860759655, AGS00002am

    587745969464475872, AGS00002b9

    588015507656015980, AGS00000ix

    588017723864907947, AGS00001bm

    588017724933013652, AGS00001b9

    588017724934717581, AGS00001be

    588017726548213811, AGS00001bj

    588017728152338543, AGS00001bn

    588017728153518234, AGS00001bq

    All 1131 QC galaxies have positive LOG_MASS values. Only one QC galaxy has an SFR of -99: 587730815749324948 (AGS000034j). I need to check if mlpeck found a reliable SFR estimate for this galaxy.

    *in addition to the ID fields (OBJID, a.k.a. sdss_id; and uid), REDSHIFT (and its error), and Z_ABSMAG (there is no error field for this). The fields V_DISP and V_DISP_ERR were used to identify ppos, but are not used in any analyses.

    ^dropping the suffixes _FLUX (and _FLUX_ERR), these four are OIII, HBETA, HALPHA, and NII


  • JeanTate

    Classification data

    These are all the fields, and the values in those fields, which relate to zooites' classifications. In addition to fields like t01_a01_count and t01_a01_fraction, this includes total_votes and most_common_path (and several other 'label' fields).

    I compared the values in the _count fields, the _fraction fields, and the total_votes ones, in the "11 April catalogs" and 'nodup' catalogs, for the clean "021020" galaxies. There are very few objects for which there is even one difference: ten in the QS catalogs, and four in the QC ones.

    All four of the QC objects with differences, and seven of the ten QS ones, differ in the 'total_votes' field. For the remaining three QS objects the number of votes does not change; what does change is the distribution of vote counts among the 2/3/4 answers to each of the 11 questions.
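    The comparison itself can be sketched as follows (a sketch only, assuming both catalogs are pandas DataFrames sharing an OBJID column, and passing in whichever _count, _fraction, and total_votes field names are of interest):

```python
import pandas as pd

def changed_objects(old: pd.DataFrame, new: pd.DataFrame, fields: list) -> list:
    """Return OBJIDs whose values differ in any of `fields` between two
    catalogs (e.g. the 'nodup' and '11 April' ones), matched on OBJID."""
    merged = old.set_index("OBJID")[fields].join(
        new.set_index("OBJID")[fields],
        lsuffix="_old", rsuffix="_new", how="inner",
    )
    diff = pd.Series(False, index=merged.index)
    for f in fields:
        diff |= merged[f + "_old"] != merged[f + "_new"]
    return sorted(merged.index[diff])
```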

    Here is a list of these 14 objects:

    The seven QS objects with different total_votes:

    587727180607258696, AGS00000in

    587735241176514586, AGS000010e

    587735347477741856, AGS00001hg

    587738569779576937, AGS00001ht

    587739721385247087, AGS00001wh

    587739811560161342, AGS00001tp

    588017991774503148, AGS00001kd

    The three QS objects with the same total_votes, but differences in other _counts:

    587733398649503982, AGS000012x

    587739407321071728, AGS00001qr

    588007004165701773, AGS000009s

    The four QC objects with different total_votes:

    587734949130535106, AGS00003fh

    587741489820074277, AGS00003ue

    588017110757867596, AGS00002nj

    588298664116682946, AGS0000450

    In subsequent posts I'll look at how these changes affect the level2 classifications 'merging galaxy' and 'asymmetric galaxy'.


  • JeanTate, in response to JeanTate's comment.

    Two of the seven QS objects with different total_votes are duplicates, in the sense of 'same object/two different AGS identifiers', as reported in the Duplicates - summary thread (from 8 months ago), and in Kyle's post (last week):

    there are 30 galaxies that had multiple Zooniverse IDs but identical images and SDSS IDs. Jean and other volunteers correctly identified these. In this case, I've selected one out of two sets (randomly, but with this list fixed so we can always replicate the results from now on).

    The two are:

    The three QS objects with the same total_votes, but differences in other _counts are all duplicates in this sense (QC AGS ID in brackets):

    None of the four QC objects with different total_votes are duplicates in this sense.

    This seems somewhat strange; naively I'd've expected there to be more. So later I will check the z (redshift) and Z (z-band absolute magnitude) of all in the Duplicates - summary thread, to see if any others should have survived the cut.

    In subsequent posts I'll look at how these changes affect the level2 classifications 'merging galaxy' and 'asymmetric galaxy'.

    Asymmetric galaxies

    Using a definition of 'asymmetric galaxy' as one whose t10a01 fraction ("Does the galaxy appear symmetrical?" "No") exceeds a threshold, two of the 14 objects would be classified differently (comparing the 'nodup' catalogs with the '11 April' ones), for all likely choices of threshold* (three significant digits):

    • AGS000010e: 0.778 ('nodup') and 0.438 ('11Apr'); this is a QS object with different total_votes
    • AGS000012x: 0.500 (ditto) and 0.737 (ditto): a QS object with the same total_votes, but differences in other _counts

    Only three others, among the 14, have fractions of 0.5 or greater, in either catalog; they're all QS objects with different total_votes ('nodup' then '11Apr'):

    How else might an asymmetric galaxy be defined?

    One possibility - a true 'level2' type classification - is to include only those objects with a 'soa fraction' (t00a02: "Is the galaxy simply smooth and rounded with no sign of a disk?", "Star or artifact") below a (modest) threshold, 0.3 say. There are no QS objects with a soa fraction of 0.3 or above, in either catalog, and only one QC one (in both catalogs; it's not one of the four): 587729781738570015, AGS00002y2, with a soa fraction of 0.313. So the addition of this requirement has no effect for our analyses.
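    For concreteness, this thresholded definition (with the optional 'soa' cut) could be sketched as below. This is just one reading, not an agreed definition; the footnote's "0.5 as 'over'" is taken to mean the threshold value itself counts as asymmetric:

```python
def is_asymmetric(t10a01_fraction: float, soa_fraction: float,
                  threshold: float = 0.5, soa_max: float = 0.3) -> bool:
    """Thresholded 'asymmetric galaxy' definition: the 'not symmetrical'
    vote fraction must reach the threshold (threshold itself counts as
    'over'), and the 'star or artifact' fraction must stay below a
    modest cap."""
    return t10a01_fraction >= threshold and soa_fraction < soa_max
```

    With this sketch, AGS000010e flips between catalogs (0.778 vs 0.438 at a 0.5 threshold), which is the behavior noted above.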

    Generally, however, determining whether an object meets the definition of a class requires an independent application of the relevant criteria.

    Merging galaxies

    Using a definition of 'merging galaxy' as one whose t09a03 fraction ("Is the galaxy currently merging or is there any sign of tidal debris?" "Neither") does not exceed a threshold, only one - perhaps two - of the 14 objects would be classified differently (comparing the 'nodup' catalogs with the '11 April' ones), for all likely choices of threshold^ (three significant digits):

    • AGS000010e: 0.278 ('nodup') and 0.438 ('11Apr'); clearly likely to be classified differently
    • AGS00001ht: 0.524 and 0.500, respectively; marginally likely

    Both these are QS objects with different total_votes, as is the only other of the 14 with fractions of 0.5 or less, in either catalog ('nodup' then '11Apr'): AGS00001kd, 0.143 and 0.150.

    How else might a merging galaxy be defined?

    As with alternatives for asymmetric galaxies, one possibility - a true 'level2' type classification - is to include only those objects with a 'soa fraction' (t00a02: "Is the galaxy simply smooth and rounded with no sign of a disk?", "Star or artifact") below a (modest) threshold, 0.3 say. Q9 and Q10 sit behind the same, single, 'cut' in the decision tree - every classification includes both questions, except those in which a zooite classed the object as 'soa' - so the same conclusion applies: the addition of this requirement has no effect for our analyses.
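    As a sketch, mirroring the asymmetric case (again only one reading: "0.5 as 'under'" is taken to mean the threshold value itself counts as merging):

```python
def is_merging(t09a03_fraction: float, soa_fraction: float,
               threshold: float = 0.5, soa_max: float = 0.3) -> bool:
    """Thresholded 'merging galaxy' definition: the 'Neither' (no merger
    or tidal debris) vote fraction must NOT exceed the threshold, and
    the 'star or artifact' fraction must stay below a modest cap."""
    return t09a03_fraction <= threshold and soa_fraction < soa_max
```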


    *a minimum fraction of 0.5, with 0.5 as 'over'; fractions of 0.5, 0.6, and 0.7, both inclusive and not. Why are these "all likely choices of threshold"? Let's discuss! 😃

    ^a maximum fraction of 0.5, with 0.5 as 'under'; fractions of 0.5, 0.4, and 0.3, both inclusive and not. Why are these "all likely choices of threshold"? Let's discuss! 😃


  • JeanTate

    To wrap up: does this set of comparisons lead to suggestions for excluding QS and/or QC objects from the clean "021020" galaxies (1084 QS ones, and 1131 QC ones)? And are there further actions to be taken, before beginning to (re-)analyze the data*?

    excluding QS and/or QC objects from the clean "021020" galaxies

    I would recommend excluding just one more object, the QS ObjId 587735241176514586, AGS000010e (originally also QC object AGS00003b8). Why? Because its secure classification as an asymmetric galaxy, and as a merging galaxy, would require us to understand why the vote fractions are so different, in the 'nodup' and '11Apr' catalogs. For just one object, I don't think it's worth the effort^.

    If we agree, then the remaining action - which I'll take - is to add this to the list on page 4 of the Potentially Problematic Sources in 'Subset 2 -- 1149 Source Sample' thread.

    Are there further actions to be taken, before beginning to analyze the data?

    Yes:

    • find the values of LOG_MASS, from one of mlpeck's posts, for the 27 QS objects for which it's -1, in the '11Apr' catalog, and replace
    • ditto for the "-99" SFR values (same 27 QS objects, one QC one), if they have indeed been posted
    • when all other actions have been completed, prepare new catalogs, containing all relevant data on all 1083 QS objects and 1083 (or 1131) QC ones; publish them as CSV files (maybe FITS too)

    Maybe:

    • re-run the QS-QC matching exercise which mlpeck did, to select the 1083 QC objects which best match the 1083 QS ones (pair-wise)

    Did I miss anything?

    Anything else?

    One thing I will do, as a separate exercise which will have no effect on the analyses, is what I mentioned above:

    This seems somewhat strange; naively I'd've expected there to be more. So later I will check the z (redshift) and Z (z-band absolute magnitude) of all in the Duplicates - summary thread, to see if any others should have survived the cut.

    Of course, there's deciding what definition of 'asymmetric galaxy' to use, ditto 'merging galaxy'. That will very likely entail deciding fraction thresholds to use. Also BPT classes. Etc. However, I consider these to be part of 'analyzing the data', not to mention that in many cases it is simply necessary to write down what we've already decided (and check to see that it's 100% unambiguous, etc).

    What do you think?


    *Per Laura's Dealing with Sample Selection Issues thread; from the OP:

    What now? There are 2 clear steps to take:

    #1 - Revisit our results with this stricter sample selection applied. Do we still get statistically significant results?

    For starters, Jules - can you give this a try for your merger fraction versus mass plots? Mlpeck - can you see if environmental effects can be seen? Jean or Mlpeck - can you replot the Quench and Control BPT diagrams for this sample?

    #2 - Identify if any problematic sources are still in this stricter sample selection of 778 Quench and 778 Control sources. As done previously, let's group the remaining problematic sources into categories and list their ObjIDs. That way we can make it very clear in the article why we have done any additional removal of sources (if we find that we need to).

    Note: we subsequently changed the "778 Quench and 778 Control sources" to 1149 QS and 1196 QC objects, by changing the z (redshift) and Z (r-band absolute magnitude) selection criteria.


    ^Of course, it's surely well worth the effort for all other Zooniverse projects of this kind: such a case suggests that there may be very significant biases, or systematic effects, in deriving classifications based on decision tree vote fractions; biases or systematic effects of which Science Team members are quite unaware.


  • JeanTate

    This seems somewhat strange; naively I'd've expected there to be more. So later I will check the z (redshift) and Z (z-band absolute magnitude) of all in the Duplicates - summary thread, to see if any others should have survived the cut.

    I've now checked. There are seven "021020" QS-QC duplicates (using the terminology of that thread); five are as already noted. The other two are:

    There are 19 QC-QC duplicates, in total (i.e. not just "021020" galaxies) but they were all removed from the QC catalog once the Quench Boost phase was done.

    Lastly, there's a bit of a mystery concerning the sole QS-QS object, which has AGS IDs of AGS0000080 (DR7 ObjId 587731514231619686) and AGS00000j6 (587731514231619685). In DR7, both have spectra, and both have estimated redshifts between 0.02 and 0.10. If you can find the DR7 images, you'll see that AGS00000j6 seems to be centered on the galaxy's nucleus, while AGS0000080 is a clump in the disk (the galaxy is an edge-on spiral, or close to one). As both are QS objects, both spectra show post-quench attributes. However, in DR9 and DR10, there's just one spectrum! 😮

    Curiously, the estimated Z (z-band absolute magnitude) of AGS00000j6 (the DR7 object apparently centered on the nucleus) is fainter than the -20.0 threshold (it's -19.29), while that of the clump is brighter (-22.01). No surprise, then, to learn that AGS0000080 is the object in the "021020" QS database. Exclude this object too? I'd say no, even though there's another mystery, to do with the BPT class (or type) of each, as determined by a calculation using the MPA DR7 pipeline input values ...


  • JeanTate, in response to JeanTate's comment.

    Under Maybe as the answer to "Are there further actions to be taken, before beginning to analyze the data?":

    re-run the QS-QC matching exercise which mlpeck did, to select the 1083 QC objects which best match the 1083 QS ones (pair-wise)

    I do not have the tool mlpeck used, so I can't, myself, re-run the exercise. However, here's an interesting result: of the 65 QC ppos, just two are matched with the 66 QS ppos (in the qcmatch2_v2 CSV file he posted). Why is this interesting? Because it is consistent with the QS and QC ppos being independent! 😃


  • mlpeck

    I haven't had much time to post, but...

    I have at least three issues with this exercise:

    • Missing data and outliers happen, and it is very much not standard practice to throw out entire objects in a multidimensional data set because some items are missing or suspect.
    • Your methodology is subjective; it's not reproducible; it's not scalable.
    • As far as I can remember you haven't shown that any object is actually problematic.

    I'm actually quite shocked that no practicing scientist has had a word to say about this, although sadly I'm also not at all surprised since there has been precious little communication from any of the "science team."

    I do have one recommendation, perhaps for JeanTate but certainly for any working scientists who wander by: get the recently published book Statistics, Data Mining, and Machine Learning in Astronomy by Ivezic, Connolly, VanderPlas and Gray (2014, Princeton Univ. Press, ISBN 978-0-691-15168-7).

    Calculus and some basic matrix algebra are prerequisites to understand the text, so it's probably not for the average zoo-ite. I would guess that most working scientists would find something to learn from the book even if they are experts in some aspect of data analysis just because the authors cover a huge amount of ground (mostly superficially to be sure).

    Another thing that's useful about the book for astronomers is they make use of non-toy astronomical datasets from SDSS and other large surveys. Every chapter has some discussion of robust methods and methods for outlier detection in large datasets, and some of their ideas are certainly applicable to this project.

    Right now I am skipping my way through the text. When I have access to more computing resources than the laptop and tablet I have with me at the moment I plan to dig into some of their data and algorithms. Techniques for cross-validation are pretty new to me, and I have some data I want to try out some ideas on.

    The book uses Python code throughout, but its introduction to the language is too brief to be really useful. So, for the non-Python programmer, another resource is needed to learn Python. Something else fun to do!


  • JeanTate, in response to mlpeck's comment.

    Thanks! 😃

    I have at least three issues with this exercise:

    Hmm ... I don't understand ... in this thread I report the results of just one "exercise": field-by-field I compare the values contained in two catalogs ("11 April catalogs") with those in two others (the 'nodup' ones), for 1084 QS objects and 1131 QC ones.

    Based on what I found - findings which are objective, quantitative, and independently verifiable - I suggest that just one QS object be excluded from our analyses, as it is a ppo (potentially problematic object; Laura's own term).

    Further, the apparent cause of 587735241176514586 (AGS000010e) being problematic is how the raw classification data has been processed, as Kyle explained here: the implementation of the Zooniverse code for the Quench project did not prevent a (registered, logged-in) zooite from classifying the same object more than once*, and ~40 SDSS objects (unique ObjIds) were each given two different AGS IDs. As the raw classification data has not been made available to us, we cannot tell if

    • some entire objects (in multidimensional data sets) have been thrown out because some items are missing or suspect
    • the methodology used to process the data is subjective, reproducible, and scaleable (or not)

    However, based on what has been made available to us, there are (or at least were) strong grounds for doubting the integrity of the processed data, with respect to both (and more).

    But maybe I am misunderstanding what you wrote; perhaps you are referring to the 65 ppo QS objects, and 65 ppo QC objects, I posted in the Potentially Problematic Sources in 'Subset 2 -- 1149 Source Sample' thread (on pages 3 and 4)?

    I do have one recommendation, perhaps for JeanTate but certainly for any working scientists who wander by: get the recently published book Statistics, Data Mining, and Machine Learning in Astronomy by Ivezic, Connolly, VanderPlas and Gray (2014, Princeton Univ. Press, ISBN 978-0-691-15168-7).

    Cool! 😃

    Over the last few weeks I've been looking for books of this kind, and found Modern Statistical Methods for Astronomy: With R Applications by Feigelson and Babu (2014, Cambridge University Press, ISBN 978-0-521-76727-9), and Statistical Data Analysis, by Cowan (1998, Oxford University Press, ISBN 978-0-198-50156-5). Are you - or any other reader - familiar with either? Would you recommend either? If your budget can stretch to just one, which would you recommend?

    *despite various public statements to the contrary, for example here.


  • johnfairweather, in response to mlpeck's comment.

    I bought it a few weeks ago, see my thread here - http://www.galaxyzooforum.org/index.php?topic=281790.0.


  • mlpeck, in response to JeanTate's comment.

    But maybe I am misunderstanding what you wrote; perhaps you are referring to the 65 ppo QS objects, and 65 ppo QC objects, I posted in the Potentially Problematic Sources in 'Subset 2 -- 1149 Source Sample' thread (on pages 3 and 4)?

    Well yes, that is what I was directly referring to. But this topic seems, to borrow a phrase, to firmly assume that those ~130 objects are of no further interest to the analysis. I disagree, but I see you have apparently rethought that.

    Over the last few weeks I've been looking for books of this kind, and had found Modern Statistical Methods for Astronomy: With R Applications by Feigelson and Babu (2014, Cambridge University Press, ISBN 978-0-521-76727-9), and Statistical Data Analysis, by Cowan (1998, Oxford University Press, ISBN 978-0-198-50156-5). Are you - or any other reader - familiar with either? Would you recommend either? If your budget can stretch to just one, which would you recommend?

    I've only looked at the table of contents of Feigelson & Babu, but I am seriously considering buying it. Just from perusing the TOC the book covers the same ground as Ivezic et al. with perhaps a bit more emphasis on statistics and less on data mining and machine learning. If I had the budget for just one I'd choose based on my preferred data analysis environment -- R or Python.

    I'm not familiar with Cowan.


  • JeanTate, in response to mlpeck's comment.

    Well yes, that is what I was directly referring to. But this topic seems, to borrow a phrase, to firmly assume that those ~130 objects are of no further interest to the analysis. I disagree, but I see you have apparently rethought that.

    Thanks for clarifying that.

    I'm not sure whether you've had a chance to check out the "021020" catalogs: "the 1149" and "the 1196", with extra fields (BPT, ppo, ...) thread yet, but you'll see that what I published is data on all 1149 (QS) and 1196 (QC) objects which have redshifts between 0.02 and 0.10 AND estimated z-band absolute magnitudes brighter than -20.0, with a flag field (ppo) to identify those I reckon are ppos (66 QS, 65 QC).

    My hope is that we - you, me, ChrisMolloy (and jules and zutopian if they choose to re-join), and the science team (or at least three members of it) - have a discussion on how to treat these objects, in the various analyses we undertake.

    For example, 27 of the 1149 QS objects have values of -1 in the LOG_MASS field, yet you were able to find apparently comparable (with the other 1122) estimates; should we add those in to the catalog? Replace the original values? Or search for a different source of LOG_MASS estimates, one which includes non-N/A values for all 1149 objects?

    Another: in any analysis involving BPT types, do we remove those objects for which at least one of the four emission lines is masked? Or do we include them, as "other" (or similar)? Or both?

    The question of how to treat objects with missing data, outliers, etc is surely one every observational astronomer who writes a paper on the results of their research has to address (even though - oddly - it's not mentioned in the description of Stage 2 of the Quench project). For me, this is one of the main things I'd like to learn, in this part of the project.

    Thanks too for your comments on the books.

    If I had the budget for just one I'd choose based on my preferred data analysis environment -- R or Python.

    Gotta laugh; I am somewhat familiar with Python, and not at all familiar with R ... but I'd really like to learn R! Now where's that jar of quarters I've been keeping for a rainy day? ... 😄


  • mlpeck, in response to JeanTate's comment.

    For example, 27 of the 1149 QS objects have values of -1 in the LOG_MASS field, yet you were able to find apparently comparable (with the other 1122) estimates; should we add those in to the catalog? Replace the original values? Or search for a different source of LOG_MASS estimates, one which includes non-N/A values for all 1149 objects?

    I thought we all understood and agreed on what to do with the originally missing stellar mass values. I retrieved them from the MPA pipeline tables in the DR10 "context" of the SDSS CasJobs database -- which means in practice that the values date from the DR8 release. All non-missing values in common between DR7 and DR8 agreed to at least 4 significant digits beyond the decimal point, so simply substituting the DR8 values for the originally missing ones seems an innocuous choice.

    There's a reasonable audit trail in the topic Estimating the missing stellar masses and there was probably some discussion elsewhere. Since I took the lead on this I can re-document what was done if needed.

    I've downloaded the stellar mass estimates from all other groups that contributed to DR10, and I also have them from the outside "VESPA" database. I don't have any of them with me, but I can compare them in detail if there is interest. None of these will be complete for our data set and there will be scattered differences of 1 dex or more that I'm sure will be cause for consternation.


  • JeanTate, in response to mlpeck's comment.

    I thought we all understood and agreed on what to do with the originally missing stellar mass values.

    Me too. Which is one reason why I was somewhat surprised to find "missing mass" objects in the 11 April catalogs (or that there was no extra field, containing these). As members of the Science Team post here so rarely, we can only guess as to why.

    It used to be that we were all working with a single, periodically updated, catalog (or actually pair of catalogs, one for QS and one for QC). Now?

    Slightly OT (off-topic), but might as well be here as anywhere else: as we'll be working with BPT types, which list do you think we should use (associating each of the 1149 QS and 1196 QC objects with a unique BPT type)? In "021020" catalogs: "the 1149" and "the 1196", with extra fields (BPT, ppo, ...) I posted links to files which contain values assigned by the method I described; however, you have posted plots in which objects have been assigned BPT types, and in at least one (that I recall) the method you used is different than the one I used.


  • mlpeck, in response to JeanTate's comment.

    however, you have posted plots in which objects have been assigned BPT types, and in at least one (that I recall) the method you used is different than the one I used.

    Travel day today, so I have to be quick. I used an AGN/LINER division that was proposed by Schawinski et al. (2007?) in an early GZ related paper or perhaps a pre-GZ one. I've also suggested an ad hoc alternative to Brinchmann's low s/n SF/AGN classes.


  • JeanTate, in response to mlpeck's comment.

    Thanks! I remember reading the Schawinski+ reference (but didn't remember the specifics), but not the "ad hoc alternative to Brinchmann's". I confess I didn't try too hard to find them ... I find Talk's Search capability next to useless (except when you have an AGZ ID).

    When you have more time, which method for assigning BPT class/type do you suggest we use? I assume you agree that we should all use just one, consistently ...


  • mlpeck, in response to JeanTate's comment.

    Me too. Which is one reason why I was somewhat surprised to find "missing mass" objects in the 11 April catalogs (or that there was no extra field, containing these). As members of the Science Team post here so rarely, we can only guess as to why.

    It used to be that we were all working with a single, periodically updated, catalog (or actually pair of catalogs, one for QS and one for QC). Now?

    It appears that KWillett has done what he's directly responsible for, which is update the clicks summaries, but he's joining them to an outdated basic data set that nobody seems to be actively maintaining. That leaves version control up to us I guess.

    When you have more time, which method for assigning BPT class/type do you suggest we use? I assume you agree that we should all use just one, consistently ...

    Sure, we should use a consistent method. The Kewley and Kauffmann curves for separating AGN, "Composite," and star-forming spectra are widely used, and I think we're using the same expressions. I've seen different AGN/LINER dividing lines used, but Schawinski's seems as good as any and it will probably be an uncontroversial choice among GZ related scientists. It could also be that we don't really care about the AGN/LINER distinction. I think there's some evidence for an evolutionary process in which LINERs are either older or had less massive starbursts than more active galaxies, but the case might not be all that convincing.
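    The scheme described here could be sketched as follows. This is a minimal sketch, not the code either of us actually used; the Kauffmann (2003), Kewley (2001), and Schawinski et al. (2007) coefficients are the commonly quoted published values, but worth double-checking against the papers before relying on them:

```python
def bpt_class(log_nii_ha: float, log_oiii_hb: float) -> str:
    """Assign a BPT class from x = log10([NII]/Halpha) and
    y = log10([OIII]/Hbeta), using the Kauffmann (2003) and Kewley (2001)
    curves, with the Schawinski et al. (2007) line dividing Seyferts
    from LINERs."""
    x, y = log_nii_ha, log_oiii_hb
    # Star-forming: below the Kauffmann curve (defined only for x < 0.05)
    if x < 0.05 and y < 0.61 / (x - 0.05) + 1.30:
        return "star-forming"
    # Composite: between the Kauffmann and Kewley curves (latter: x < 0.47)
    if x < 0.47 and y < 0.61 / (x - 0.47) + 1.19:
        return "composite"
    # AGN region: Schawinski's straight line separates Seyfert from LINER
    return "Seyfert" if y > 1.05 * x + 0.45 else "LINER"
```

    Spectra with masked or very low s/n lines would still need separate handling, as discussed above.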

    I'm not sure there's a strong consensus about what to do with spectra that aren't classifiable in BPT diagrams. If some scientist wants to make a decree I'm fine with that.


  • JeanTate, in response to mlpeck's comment.

    That leaves version control up to us I guess.

    I guess so.


  • trouille (scientist, moderator, admin)

    Hello again. Good to see this discussion. And good that we've converged on the foundational piece of our selection:

    • have redshifts between 0.02 and 0.10
    • have estimated absolute z-band magnitudes brighter than -20.0

    There is precedent for removing sources from a sample selection in which it is possible to visually inspect each source individually, as Jean has done. We should remove sources whose photometry is clearly contaminated by foreground stars, diffraction spikes, and/or other catastrophic errors in the image (like having half the galaxy fall off the edge of an image).

    But we should not remove sources that have, for example, 'galaxy overlap'. Here we get into a more subjective regime that we'll have difficulty accounting for (for example, many of our sources may have fainter overlapping galaxies that the depth of our images just don't allow us to pick up -- we won't be able to model this properly and so we should avoid introducing this bias/error).

    We will definitely take advantage of Jean's efforts to identify spectra where masking/problems are visible and make sure we set values to NA for individual problematic emission/absorption lines in a given source (for example, if a given source has the [OIII] line masked but the pipeline incorrectly gave an [OIII] flux value). Jean, are there sources that have this problem?

    But, unless the entire spectrum for a source is clearly problematic, I haven't found examples in the literature where those sources are removed from a sample. There's usable information (redshift, emission/absorption fluxes from lines outside of the masked regions, etc.). Again, we just need to be sure that any lines that do fall in the problematic areas are correctly accounted for in our analysis.

    I would very much like to hear your thoughts on this.

    Posted

  • trouille by trouille scientist, moderator, admin

    For sources with classification result issues, those shouldn't be removed outright from the sample selection either. We should, of course, be very clear on our procedure for determining what ultimate classification to assign to each source.

    Jean -- I didn't quite understand your post titled 'classification data' earlier in this thread (p1 of http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS0000231). I've emailed Kyle and will prompt his response to be sure we're all on the same page and clear about those results.

    Posted

  • trouille by trouille scientist, moderator, admin

    A few additional questions on specific categories:

    One source I wondered about -- http://quenchtalk.galaxyzoo.org/#/subjects/AGS00002ha is one that you, Jean, noted as having an unreliable redshift. Is this based on the warning 'small delta chi2'? Is there an automatic search through the Quench and control samples that you ran to find sources with this error? If so, could you post the result from that search? We'll want to be sure our redshift assignments reflect this. Those sources will then fall out of our sample selection.

    Another source type I wondered about: http://quench.galaxyzoo.org/#/examine/AGS00002et has 'unrecognized star contaminates spectrum, photometry'. Could you describe what you mean by this? Is it that you've deconvolved the spectrum and resolved both a galaxy and a foreground star? Or is it that in the photometry you suspect there to be a foreground star? Unless we see the diffraction spikes of a foreground star, our identification is too subjective to give us strong enough grounds to remove this from our sample. Let us know what your thoughts are on this category.

    Posted

  • JeanTate by JeanTate in response to trouille's comment.

    Hi Laura,

    I'm not sure what's not clear, but it will be good to get Kyle's take on the cause(s) of the differences.

    In some more detail, then, here's what I did.

    IMPORTANT NOTE: I analyzed the 1084 "clean" QS and the 1131 "clean" QC objects ONLY. There are surely at least some similar differences among the other 1918 QS and 1871 QC objects.

    I start with two different versions of the QS and QC catalogs, the "11 April catalogs", which Kyle posted on 11 April, here; and the 'nodup' catalogs, which I posted on 28 March, here (Kyle is the source for these too, but he emailed me, rather than posting).

    I compared the values in all 64 'classification' fields* in each pair of files. I found only 14 objects (10 QS, 4 QC) for which there is at least one difference.

    Perhaps an example or two might help.

    The QS object 587727180607258696 ( AGS00000in): in the 11 April QS catalog, the value in the total_votes field is 20; in the nodup one, it's 29. With such a large difference in total_votes, it's no surprise that there are differences in many of the other 63 'classification' fields!

    The QC object 587734949130535106 ( AGS00003fh): in the 11 April QC catalog, the value in the total_votes field is 20; in the nodup one, it's 21. As this one 'extra' vote is for "Smooth", there are 11April-nodup differences in only a few other fields.

    The QS object 587733398649503982 ( AGS000012x): total_votes is the same in both catalogs (19). However, the values in the three t00_count fields (i.e. the number of votes for "Smooth", "Features or disk", and "Star or artifact") are different: (15,4,0) and (17,1,1), for the 11April and nodup catalogs, respectively. Naturally, these vote differences produce many other differences, in other classification fields.

    *i.e. the fields which contain data directly derived from zooites' clicks
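A field-by-field comparison like the one described above can be sketched with pandas (the file layout and column names here are assumptions; the idea is simply to align the two catalog versions on OBJID and report the objects where any shared field differs):

```python
import pandas as pd

def diff_catalogs(file_a, file_b, key="OBJID", fields=None):
    """Return the list of key values for which at least one of the
    compared fields differs between the two catalog files."""
    a = pd.read_csv(file_a).set_index(key).sort_index()
    b = pd.read_csv(file_b).set_index(key).sort_index()
    # default: compare every column present in both catalogs
    fields = fields or [c for c in a.columns if c in b.columns]
    mismatch = (a[fields] != b[fields]).any(axis=1)
    return a.index[mismatch].tolist()
```

Running this over the 64 'classification' fields of the two catalog versions would reproduce the kind of 14-object difference list reported above.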

    Posted

  • JeanTate by JeanTate in response to trouille's comment.

    But we should not remove sources that have, for example, 'galaxy overlap'.

    While I take your point about there almost certainly being plenty of overlaps that are difficult to discern (and so choice will be subjective), in the four I posted the overlaps are pretty extreme. And in at least one case you can see features (lines) in the spectrum which are surely due to the other galaxy.

    Also, earlier I thought we had agreed to remove AGS00003ky for exactly this reason ("one of the control galaxies is in the background of M101 (fortuitously seen through an interarm region, but it's quite possible the photometry is contaminated)", to quote mlpeck, page 2 of this thread).

    for example, if a given source has the [OIII] line masked but the pipeline incorrectly gave an [OIII] flux value. Jean, are there sources that have this problem?

    Yes. For example, QS object 587741726574444657 ( AGS000022s): H-alpha is masked, but HALPHA_FLUX is 10039.3 (±6878.4). Of course, several of the 13 QC objects -- and the other 3 QS ones (all three, in fact) -- have _FLUX values of 0 where lines are masked. I'll compile a list.

    But, unless the entire spectrum for a source is clearly problematic, I haven't found examples in the literature where those sources are removed from a sample. There's usable information (redshift, emission/absorption fluxes from lines outside of the masked regions, etc.). Again, we just need to be sure that any lines that do fall in the problematic areas are correctly accounted for in our analysis.

    How do we decide?

    For example, early on in the project we removed an object (I forget which; I'll dig it up later) whose spectrum was clearly flawed (the red and blue arms are mismatched; the continuum has a 'jump'), but whose redshift etc values should be robust.

    What concerns me most is use of LOG_MASS estimates derived from clearly 'bad' spectra ... fitting stellar models to spectra with, for example, several hundred nm missing surely gives results of dubious value, doesn't it?

    Posted

  • mlpeck by mlpeck in response to trouille's comment.

    We will definitely take advantage of Jean's efforts to identify
    spectra where masking/problems are visible and make sure we set values
    to NA for individual problematic emission/absorption lines in a given
    source (for example, if a given source has the [OIII] line masked but
    the pipeline incorrectly gave an [OIII] flux value.

    Early on we identified some objects with unphysically large emission line flux values in the MPA pipeline that weren't flagged as unreliable. Most of those also had very large uncertainty estimates, so they wouldn't necessarily pass an S/N > 3 threshold test. As far as I can tell there were 4 of those in the quench sample and 3 in the control. The way the MPA pipeline signals that an emission line flux shouldn't be used is a flux value identically = 0 with a corresponding error ≤ 0.

    Unreliable velocity dispersion values are signalled with entries of 400, 500, or 850 for the dispersion value with a negative error estimate -- either -3 or -50 seem to be the only values used.

    In the SDSS pipeline I've assumed that a stellar velocity dispersion value of 0 with a corresponding error estimate of 0 indicates an unreliable value also; however, I don't see any examples from the MPA pipeline in the quench sample. There are dispersion values of 0 with positive error estimates, which I take to indicate that the actual velocity dispersion is smaller than the SDSS spectroscopic resolution.

    While we might use stellar velocity dispersions to calculate dynamical masses in at least early type galaxies no one has actually suggested doing so for this project, so even if one failed to recognize when velocity dispersions should be set to "NA" these objects would be unproblematic.
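The flagging conventions described above could be encoded along these lines (a sketch only; the helper names are hypothetical, and the S/N > 3 cut is the threshold mentioned earlier):

```python
import numpy as np

def usable_line_flux(flux, flux_err, snr_min=3.0):
    """True where an emission-line flux is usable: not carrying the MPA
    pipeline's "do not use" signal (flux identically 0 with error <= 0)
    and passing an S/N > snr_min cut."""
    flux = np.asarray(flux, dtype=float)
    flux_err = np.asarray(flux_err, dtype=float)
    flagged = (flux == 0) & (flux_err <= 0)
    good_snr = (flux_err > 0) & (flux > snr_min * flux_err)
    return ~flagged & good_snr

def usable_vdisp(sigma_err):
    """True where a stellar velocity dispersion is usable: unreliable
    values carry a negative error estimate (-3 or -50), with the
    dispersion itself set to a sentinel value of 400, 500, or 850."""
    return np.asarray(sigma_err, dtype=float) > 0
```

For example, the AGS000022s H-alpha flux of 10039.3 ± 6878.4 quoted earlier would fail the S/N cut even though it isn't flagged.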

    Posted

  • JeanTate by JeanTate in response to trouille's comment.

    One source I wondered about -- http://quenchtalk.galaxyzoo.org/#/subjects/AGS00002ha is one that Jean you noted as having an unreliable redshift. Is this based on the warning 'small delta chi2'?

    Yes, and visual inspection of the spectrum.

    Is there an automatic search through the Quench and control samples that you ran to find sources with this error? If so, could you post the result from that search?

    No, but it should be easy enough to do.

    We'll want to be sure our redshift assignments reflect this. Those sources will then fall out of our sample selection.

    Hmm ... we will have a challenge: quite a few QS objects have the 'small delta chi2' warning, for the obvious - and perfectly understandable - reason that the automated spectroscopic pipeline is really stupid! Any galaxy spectrum with strong Balmer absorption lines causes the pipeline to get confused about the redshift; when you see one of these you want to scream at it "can't you SEE that this is a classic strong Balmer absorption spectrum!" But of course, the pipeline works on just ~30 templates, none of which is an E+A galaxy (with strong Balmer absorption).

    Another source type I wondered about: http://quench.galaxyzoo.org/#/examine/AGS00002et has 'unrecognized star contaminates spectrum, photometry'. Could you describe what you mean by this?

    Sure.

    The (DR10) image contains what looks like a star, a star which is within the spectroscopic fiber, but the photometric pipeline does not identify an object at this position.

    Is it that you've deconvolved the spectrum and resolve both a galaxy and a foreground star? Or is it that in the photometry you suspect there to be a foreground star?

    Neither. However, checking the 'fiber magnitudes' (I forget what the name of this set of fields is, in the photometric database) and the estimated photometry (from the spectroscopic database) would be a sensible thing to do.

    Unless we see the diffraction spikes of a foreground star, our identification is too subjective to be able to have strong enough grounds to remove this from our sample. Let us know what your thoughts are on this category.

    In several - but not all - cases, the spectrum has obvious features arising from a foreground star. Unfortunately, the SDSS spectroscopic pipeline does not attempt to estimate something equivalent to 'probability of a compound spectrum' (i.e. an overlap, or spectroscopic binary for stars). The technique mlpeck developed would certainly help here: it can show clearly if a spectrum is composite if the flux from the contaminating star is large enough.

    A star with diffraction spikes which did NOT get its own photometric pipeline ID would be a catastrophic failure! Many (most?) of the objects in my "unrecognized star contaminates spectrum, photometry" class are quite unusual in this regard; the photometric pipeline is normally too aggressive in smashing single objects into several pieces, rather than the other way round.

    Posted

  • mlpeck by mlpeck in response to JeanTate's comment.

    What concerns me most is use of LOG_MASS estimates derived from
    clearly 'bad' spectra ... fitting stellar models to spectra with, for
    example, several hundred nm missing surely gives results of dubious
    value, doesn't it?

    The MPA stellar mass estimates are based on the photometry (apparently with corrections for emission lines, which seems to be the only use they made of spectra). See http://www.mpa-garching.mpg.de/SDSS/DR7/Data/stellarmass.html.

    Missing bits of spectra aren't necessarily problematic for fitting stellar models but it doesn't matter anyway as far as the mass estimates are concerned.

    Posted

  • JeanTate by JeanTate in response to mlpeck's comment.

    Thanks.

    You know, I know I know that ... yet somehow my brain cannot rid itself of this misconception! Sometimes it seems some knowledge has a hard time sticking ... 😦

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    I'll compile a list.

    Of the four QS objects I classed as "One or more of the key BPT lines masked", just one has a non-zero value for the flux of the masked line, 587741726574444657 ( AGS000022s), as already noted.

    Of the 13 QC objects so classed, seven have a non-zero value for the flux of the masked lines. These are:

    • 587736543089328388 ( AGS00002r8): OIII 9.46136±2.4643
    • 587741421636878645 ( AGS000031u): NII 197.83±5.06923
    • 587739305289449546 ( AGS00003bn): NII 16.353±8.96766
    • 588017977800327312 ( AGS00003kr): NII 24.8234±7.24363
    • 587736781995639006 ( AGS00003lz): NII 640.406±9.716831
    • 587732482733637809 ( AGS00003x6): H-beta 36.1614±3.56132
    • 587739719754514434 ( AGS0000495): H-alpha 1250.25±12.93863

    For all masked lines in all the other QS (three) and QC (six) objects, the FLUX is zero, and the FLUX_ERR negative.

    1 this line is not masked in the interactive spectrum (ETA: the sky has zero flux near the wavelength of this line (~two pixels only?))

    2 in the interactive spectrum, the sky has zero flux at the wavelength of this line

    3 the blue wing of the NII line may be masked too

    Posted

  • mlpeck by mlpeck in response to trouille's comment.

    Is there an automatic search through the Quench and control samples
    that you ran to find sources with this error?

    Here is a breakdown of values of the zWarning flag for the full* quench and control samples and for the 1149 and 1196 objects satisfying the redshift and magnitude cuts 0.02 ≤ z ≤ 0.1, Mz ≤ -20:

    Quench full

    • No warning 2959
    • Small Delta chi-square 3
    • Many outliers 33
    • Negative emission 4
    • small Δχ² + many outliers 1

    Quench subset

    • No warning 1120
    • Small Delta chi-square 2
    • Many outliers 24
    • Negative emission 2
    • small Δχ² + many outliers 1

    Control full

    • No warning 2987
    • Small Delta chi-square 9
    • Many outliers 4

    Control subset

    • No warning 1195
    • Small Delta chi-square 1

    A couple comments: the official advice from SDSS is that the "many outliers" flag rarely indicates a real problem.

    "Small delta chi-squared" means that more than one set of templates with different redshifts gave nearly equally good fits, and this might or might not be a problem. The three + 1 with two flags set in the full quench sample all have strong Balmer lines that are correctly identified, so the redshifts are fine. Seven of the 9 in the control sample, including the one in the 1196 subset, have very low (≤ 3) S/N spectra. For future reference you might want to consider applying the same S/N cut to both control and program samples.

    The "negative emission" flag is only applied to objects classified as QSO, of which there are 4 in the quench sample. All 4 of those have broad Hα lines in emission and their redshifts are all correct. One of those, AGS000007y, is among my favorite objects in the entire sample. I'd guess that a pretty good science case could be made for applying for telescope time to do integral field spectroscopy on this system.

    * These were downloaded from CasJobs on 7 May 2014. I dropped two objects with z < 0 from my personal data set, so there are 3000 objects in my copy of the full quench sample. I also lost 3 from the 3,003 in the September 2013 version of the control sample for reasons that I can no longer remember, so that also has 3000 total objects.
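Since zWarning is a bitmask, a breakdown like the one above can be produced by decoding the set bits. A minimal decoder, assuming the standard SDSS spectroscopic pipeline bit assignments (the dictionary below is my transcription of those, not taken from this thread), might look like:

```python
# SDSS zWarning bit assignments (bit number -> warning name)
ZWARNING_BITS = {
    0: "SKY",
    1: "LITTLE_COVERAGE",
    2: "SMALL_DELTA_CHI2",
    3: "NEGATIVE_MODEL",
    4: "MANY_OUTLIERS",
    5: "Z_FITLIMIT",
    6: "NEGATIVE_EMISSION",
    7: "UNPLUGGED",
}

def decode_zwarning(zwarning):
    """Return the list of warning names whose bits are set in zWarning."""
    return [name for bit, name in ZWARNING_BITS.items() if zwarning & (1 << bit)]
```

An object flagged both "small delta chi2" and "many outliers", as in the tables above, would have zWarning = 4 + 16 = 20.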

    Posted

  • JeanTate by JeanTate in response to mlpeck's comment.

    Very cool! 😃

    The three + 1 with two flags set in the full quench sample all have strong Balmer lines that are correctly identified, so the redshifts are fine.

    Just as I remembered .... except for the fact I remembered there being more of them. For an unusual class of (spectroscopic) objects, outliers may simply be a failure of the pipeline (to include at least one template closer to the unusual class). Do you have the IDs of these objects handy? And of the other "negative emission" ones?

    ... the one in the 1196 subset have very low (โ‰ค 3) S/N spectra.

    That's AGS00002ha, right? Where does the S/N value, for the whole spectrum, come from?

    A couple comments: the official advice from SDSS is that the "many outliers" flag rarely indicates a real problem.

    What seems odd - to me, at least - is that Quench (both full and subset) has far more of them than Control (ditto). Is it, for example, something about an E+A spectrum that greatly increases the chances of the spectroscopic pipeline getting indigestion?

    One of those, AGS000007y, is among my favorite objects in the entire sample. I'd guess that a pretty good science case could be made for applying for telescope time to do integral field spectroscopy on this system.

    Among other things, the lines seem to have interesting offsets from the pipeline's values of where they should be, given a single redshift ... 😉

    I also lost 3 from the 3,003 in the September 2013 version of the control sample for reasons that I can no longer remember, so that also has 3000 total objects.

    IIRC, all three are extreme outliers; in fact, one is the strange chimera, the partial duplicate of another object (there were always only 3002 unique QC objects).

    Posted

  • mlpeck by mlpeck in response to JeanTate's comment.

    Here are the uids of the 4 quench sample objects with the small Δχ² flag set:

    AGS000006u
    AGS00001ng
    AGS00001p3
    AGS00001ta
    

    The 4 with "negative emission"

    AGS000007y
    AGS00000v8
    AGS00000w5
    AGS000013n
    

    And finally the 33 with "many outliers":

    AGS000000h
    AGS000001k
    AGS000004b
    AGS000005d
    AGS00000ad
    AGS00000cl
    AGS00000da
    AGS00000ea
    AGS00000so
    AGS00000un
    AGS00000v9
    AGS000019r
    AGS00001am
    AGS00001ao
    AGS00001du
    AGS00001ht
    AGS00001jw
    AGS00001ne
    AGS00001ol
    AGS00001oy
    AGS00001p6
    AGS00001sf
    AGS00001sj
    AGS00001uw
    AGS00001w3
    AGS00001xa
    AGS00001yk
    AGS0000206
    AGS000022l
    AGS000023v
    AGS000025h
    AGS000025q
    AGS00002ba
    

    I think you nailed the cause: they probably lack suitable templates for "K+A"-like spectra. You might remember some months ago I discovered that a number of galaxies in Goto's catalog of 564 "E+A" galaxies had effectively gone missing in DR8+ because of redshift measurement failures. Many of those were caused by misidentifying Balmer absorption lines as QSO emission lines. Fitting templates upside down is something I'd expect from much less reputable disciplines, and I hope this particular source of error is corrected before the final data release.

    Where does the S/N value, for the whole spectrum, come from?
    

    The median S/N over all good pixels is tabulated in the SpecObj and SpecObjAll tables and it's also in the SPECOBJ hdu in the spectrum files. They also tabulate S/N over the wavelength ranges of the photometric filters.

    Posted

  • KWillett by KWillett scientist

    Hi @JeanTate, @mlpeck, Laura -

    Sorry for being away for a bit. It's been my first semester teaching at the university, and it has taken up an obscene amount of time (although also rewarding). I'm looking forward to returning to GZ research full-time after the end of next week.

    I think @mlpeck has phrased my current role pretty well; I've been mostly working on updating the click counts, and making that as transparent as possible (more so than I did initially). I absolutely agree that reproducing my steps is an essential part of GZQ and of good science. This is the reason that I posted the code on Github (https://github.com/willettk/quench) - I believe everything contained there allows reproduction of the latest datasets posted.

    The metadata question is not something I've worked much on. After finalizing the issues we had with both repeat subjects and repeat classifications, I matched the objIDs against the metadata files that Laura provided me with several months ago. If this is deprecated (particularly for non-existent stellar mass values), they should definitely be updated.

    Jean - unless I misunderstand, I think we're OK on the latest click set I provided (I know I found a bug or two in earlier versions, so it's not necessarily "wrong" if they differ from previous datasets I produced). Aside from galaxy AGS000010e, do you have any other requests or questions on the clicks themselves?

    Posted

  • KWillett by KWillett scientist

    Also, a book recommendation (and I'm delighted that we're using more advanced stats/tools): I think the Ivezic et al. book is particularly good, and it comes with a built-in website with many of the tools that you can try out. Python is also my primary tool for data analysis these days. http://www.astroml.org/index.html

    If you're looking for a particular book (and I know academic books are expensive), many library systems should have it. I don't know the specifics of where all of you live, but in Minnesota (for example) anyone can request books from the University library, such as these, via ILL for free. Might be worth a try.

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    Thanks Kyle.

    I absolutely agree that reproducing my steps is an essential part of GZQ and of good science. This is the reason that I posted the code on Github (https://github.com/willettk/quench) - I believe everything contained there allows reproduction of the latest datasets posted.

    I'm not sure I thanked you for that, when you posted it. It really made my day to read it! Unfortunately, I have not had a chance to try to reproduce what you did (and I'm not sure I could, without first learning a good deal more about the code, and how to execute/implement it). So this next question may be really foolish: does that mean there is, in some sense, access to the 'raw clicks data'?

    Jean, ... I think we're OK on the latest click set I provided

    Yes, we are.

    Aside from galaxy AGS000010e, do you have any other requests or questions on the clicks themselves?

    No, I have no more requests or questions on the clicks themselves (at this time anyway). With the caveat that I checked only a subset of all the data (1084 QS and 1131 QC objects). If what I found is representative, then it may be reasonable to expect that there are a few (one to four?) objects similar to AGS000010e among the remaining ~4k objects.

    Oh, and glad to hear that your first semester went well, and even gladder to hear that it was so rewarding. 😃

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    Thanks.

    I'll almost certainly get both the "R" and the "Python" books, and will start learning in earnest in a couple of weeks' time. I do not intend to use anything beyond what's in widely available tools like spreadsheets (I use Open Office's) for the Quench project, because I want to prove-by-doing that ordinary zooites can do the data analysis for what ends up as a published paper, using nothing more than what they learned in high school (assuming they took mathematics classes at an appropriate level in their final years there) and tools which are widely used in the general community (mostly spreadsheets).

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    No, I have no more requests or questions on the clicks themselves (at this time anyway). With the caveat that I checked only a subset of all the data (1084 QS and 1131 QC objects). If what I found is representative, then it may be reasonable to expect that there are a few (one to four?) objects similar to AGS000010e among the remaining ~4k objects.

    I've just finished checking the 65 "ppo" QS objects, and the 65 QC ones* (if we decide to include even one of the ppos in any analysis, it would be A GOOD IDEA to understand what differences there are, if any, between the 'nodups' and '11 April' catalogs).

    Here's what I found:

    • none of the 130 objects have LOG_MASS = -1 (good)
    • one has SFR = -99, QS object 587726878344741087 ( AGS00000j9)
    • all 65 QC objects' classification values (in the 64 'classification' fields) match (are the same in the two catalogs)
    • two QS objects have at least one value which does not match (details below)
    • the 'Soa' (Star or artifact) classifications of these two do not differ, assuming a realistic fraction threshold
    • the 'merging' and 'asymmetric' classifications of these two do not differ, assuming realistic fraction thresholds

    The two QS objects with differences in classification data are:

    • 587736942525415485 ( AGS00001c9), total votes 21 in the 'nodups' catalog, 20 in the '11 April' one; as the vote removed was for Soa, only Q1 and Q11 values change
    • 588007004179660890 ( AGS00000fq), also 21 -> 20 total votes; the vote difference is in Fod (features or disk), 14 -> 13, so values in many classification fields changed

    *reminder: selection criteria are 0.02 < z < 0.10 AND Z brighter than -20.0

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    For example, early on in the project we removed an object (I forget which; I'll dig it up later) whose spectrum was clearly flawed (the red and blue arms are mismatched; the continuum has a 'jump'), but whose redshift etc values should be robust.

    It's AGS00001w5, and is discussed in the Bad spectrum: clearly not a z=0.327 galaxy! thread (but the thread title concerns a quite different object). It meets the "0.02<z<0.10 AND Z brighter than -20.0" criteria, and is one of the 65 QS ppos.

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    However, checking the 'fiber magnitudes' (I forget what the name of this set of fields is, in the photometric database) and the estimated photometry (from the spectroscopic database) would be a sensible thing to do.

    I've made a start on this.

    Some time ago, mlpeck used the (DR7) plate, mjd, fiberid field values in a QS catalog to select objects for some analysis or other (I don't remember the details, and they are unimportant - here - anyway). Curiously, quite a few of the DR7 spectra are not science primary in DR8 (DR9, ...). If I have done my matching correctly, there are 92 such QS objects among the 1149 "021020" galaxies, which is 8%. Of these 92, six are ppos, which is 9% of all ppos; the same proportion, statistically.

    I identified four QS ppos as being primarily ppo because of "Unrecognized star' contaminates spectrum, photometry"; two of these are among the six ppos whose DR8 (etc) spectrum is no longer science primary: AGS00000hm (primary science spectrum is M3 STAR) and AGS00001dn (spectrum is, to me, obviously a galaxy/star composite).
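A sketch of that cross-check, assuming pandas DataFrames carrying the (plate, mjd, fiberid) identifiers and a sciencePrimary column (names here follow the usual CasJobs conventions, but are assumptions as far as this thread goes):

```python
import pandas as pd

def not_science_primary(catalog, specobj):
    """Return the catalog rows whose (plate, mjd, fiberid) spectrum is
    not flagged as science primary in the specObj table."""
    key = ["plate", "mjd", "fiberid"]
    merged = catalog.merge(specobj[key + ["sciencePrimary"]], on=key, how="left")
    # unmatched rows get NaN, which also counts as "not science primary"
    return merged[merged["sciencePrimary"] != 1]
```

Applied to the 1149 "021020" QS galaxies, a match like this is what turned up the 92 objects (8%) whose DR7 spectra are no longer science primary in DR8+.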

    Posted

  • Peter_Dzwig by Peter_Dzwig

    I wish I'd discovered this thread earlier. What is the current feeling about the R book as opposed to the Ivezic book? I have been wondering which to acquire. Neither is cheap, with the R book cheaper here in the UK; but then I program in Python... On the other hand, I think that at base I am more interested in the statistical techniques covered. I'd like to hear what people currently think.

    Posted

  • mlpeck by mlpeck

    Peter:

    Are you referring to the book by Feigelson & Babu? I've thought about buying it but haven't yet. Judging from the table of contents and short excerpt available on Amazon it appears the two books' range of subject matter (broad) and depth of coverage (not very) are nearly identical. Even the datasets they use appear to be similar.

    If I had the budget for just one I'd choose based on my preferred data analysis system. Right now I'm trying to learn Python -- I'm already fairly proficient with R. Ivezic's introduction to Python is inadequate to get a good start, and I'm sure Feigelson & Babu is similarly deficient wrt R.

    One possible consideration is that more astronomers seem to be adopting Python than R, so there are more astronomically oriented packages out there in Python than R. For example I've been able to process images stored in FITS files from several sources using pyfits and scikit-image, something that would be difficult or maybe impossible in R.

    Posted