Potentially Problematic Sources in 'Subset 2 -- 1149 Source Sample'

by trouille scientist, moderator, admin

Let's use this new Discussion Thread as the place to put all information about potentially problematic sources in the 'Subset 2 - 1149 Source Sample'.

Posted February 26, 2014 11:17 PM
by trouille scientist, moderator, admin

For example, on p10 of http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS0000223 mlpeck points out a problematic source:

AGS00003ky. When you zoom out, you see that it's a spot off-center from a larger galaxy. Its SDSS ID is 587735695913320628.

This source is in my control_subset2.tab file as well (posted on p1 of http://quenchtalk.galaxyzoo.org/#/boards/BGS0000008/discussions/DGS000022j). Once I know of all the problematic sources in that control sample, I'll find replacements for them.

Posted February 26, 2014 11:28 PM
by JeanTate in response to trouille's comment.

Happy to do that.

However, it seems that Kyle did not upload-and-replace the updated QC file 😦 So I won't be starting on finding potentially problematic sources just yet ...

Posted February 27, 2014 5:08 PM
by mlpeck

If AGS00003ky is removed from the list of control candidates and the matching algorithm is run on the remaining 1195 controls that meet the redshift and magnitude cuts, AGS0002g6 will be added. 14 other matches will be shuffled around, but there are no other additions or subtractions from the matched controls.

I'm going to defer further investigations of this sort until I'm sure we have a final sample selected.

Posted March 1, 2014 6:47 PM
by JeanTate

I have searched Talk for QS objects ("the 1149") which have been mentioned as possibly being problematic; I found 99 altogether (listed below, in uid - AGS ID - order).

It's very likely that only a handful of these are, in fact, problematic with respect to the particular analyses we intend to do for the paper; I'll work on identifying that (likely just a) handful later.

AGS0000005

AGS0000008

AGS000000w

AGS000003c - bad spectrum; H-alpha and [NII] lines not recorded

AGS000003j

AGS000005g

AGS000006v

AGS000007c

AGS000007f

AGS000007r

AGS000008i

AGS000008x

AGS000009f

AGS00000ag

AGS00000c3 - merger; spectrum not that of main galaxy's nucleus/bulge?

AGS00000c5

AGS00000cz - M3 star overlaps outskirts of galaxy; galaxy spectrum contaminated by star?

AGS00000d6

AGS00000dk

AGS00000e9

AGS00000fs

AGS00000gf

AGS00000gg

AGS00000gs

AGS00000hn

AGS00000iu

AGS00000j9

AGS00000jq

AGS00000js

AGS00000jt

AGS00000or

AGS00000rc

AGS00000tb

AGS00000uq

AGS00000vo

AGS00000x6

AGS00000y8

AGS00000yn

AGS00000zb

AGS00000zi

AGS00000zu

AGS0000103

AGS0000141

AGS000014f - bad spectrum; H-alpha and [NII] lines not recorded

AGS000014z - near bright star; photometry unreliable

AGS0000153 - star (?) overlaps galaxy; spectrum may be contaminated by 'star'

AGS000015m

AGS0000161

AGS0000198

AGS00001c8

AGS00001c9 - skewered by diff spike; photometry likely unreliable

AGS00001dn - star overlaps nucleus/bulge; spectrum is composite

AGS00001dz

AGS00001f7

AGS00001fx

AGS00001g0

AGS00001ho

AGS00001hr

AGS00001it

AGS00001j3

AGS00001kg

AGS00001l4

AGS00001la

AGS00001mb - near very bright star; is photometry reliable?

AGS00001mh - bright star overlaps galaxy (it's not a merger); photometry likely unreliable

AGS00001nu

AGS00001q9

AGS00001qc - spectrum is of a non-nuclear region, possibly contaminated by overlapping star; no DR9 spectrum

AGS00001rm - beautiful merger; spectrum is of a non-nuclear region, possibly contaminated by overlapping star

AGS00001ry

AGS00001sf

AGS00001sv - so dust-choked its colors are extreme; an outlier of the most interesting kind!

AGS00001uc

AGS00001vw

AGS00001w3

AGS00001w5 - bad spectrum; continuum misaligned (red/blue arms of spectrograph)

AGS00001x1

AGS00001x5

AGS00001xa - bad line fluxes; all Balmer lines are absorption

AGS00001xe

AGS00001xi

AGS00001yl

AGS000020f

AGS000020v

AGS0000226 - bad spectrum; H-alpha and [NII] lines not recorded

AGS000022s - bad spectrum; 'missing' part include H-alpha, [NII] (so their reported fluxes and errors are wrong)

AGS0000242 - spectrum is of a non-nuclear region; color outlier; corrected u-band anomaly

AGS000025u

AGS000026b

AGS000026p

AGS000026x

AGS000027y

AGS000028a

AGS000028h

AGS0000294

AGS000029m

AGS000029w

AGS00002ab

AGS00002ar - bad line flux; H-beta is absorption, not emission

Posted March 7, 2014 10:07 PM
by JeanTate in response to JeanTate's comment.

To keep the length of this thread under control, and so make it easier to actually find stuff, I'll be editing the post above with details of why particular QS objects may be considered problematic, in terms of the analyses we will be doing ('problematic' is, in other words, context-specific).

Posted March 8, 2014 12:44 PM
by JeanTate in response to JeanTate's comment.

I have searched Talk for QC objects, among "the 1196" (i.e. those whose redshifts are between 0.02 and 0.10 AND whose estimated z-band absolute magnitudes are brighter than -20.0) which have been mentioned as possibly being problematic; I found 12 altogether (listed below, in uid - AGS ID - order).
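
The cut that defines "the 1196" can be written down compactly. This is only a sketch: the field names `z` and `abs_mag_z` and the three example rows are made up for illustration and are not columns or values from the actual QC file. Note that "brighter than -20.0" means a numerically smaller magnitude.

```python
# Sketch of the "the 1196" selection: 0.02 < z < 0.10 AND z-band absolute
# magnitude brighter than -20.0. Field names and rows are hypothetical.

Z_MIN, Z_MAX = 0.02, 0.10
ABS_MAG_LIMIT = -20.0  # "brighter than" means numerically smaller


def passes_cut(redshift, abs_mag_z):
    """Return True if an object meets both the redshift and magnitude cuts."""
    return Z_MIN < redshift < Z_MAX and abs_mag_z < ABS_MAG_LIMIT


# Made-up example rows (not real catalogue values).
catalogue = [
    {"uid": "example_a", "z": 0.058, "abs_mag_z": -21.3},  # passes both cuts
    {"uid": "example_b", "z": 0.058, "abs_mag_z": -19.2},  # too faint
    {"uid": "example_c", "z": 0.150, "abs_mag_z": -22.0},  # redshift too high
]

selected = [row["uid"] for row in catalogue if passes_cut(row["z"], row["abs_mag_z"])]
print(selected)
```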

While it's likely that only a handful of these are, in fact, problematic with respect to the particular analyses we intend to do for the paper, it's also quite likely there are other - potentially problematic - objects which have not yet been identified as such.

So considerably more work remains to be done here.

As with the QS list, to keep the length of this thread under control, and so make it easier to actually find stuff, I'll be editing this post with details of why particular QC objects may be considered problematic, in terms of the analyses we will be doing ('problematic' is, in other words, context-specific).

AGS00002j8 - bad spectrum; blue arm only

AGS00002o8 - overlap; seen through/in front of body of an elliptical

AGS00002vf - spectacular merger; spectrum not that of main galaxy's nucleus/bulge?

AGS000031r - near very bright star, skewered by diff spike; photometry likely unreliable

AGS00003ky - overlap; seen through an inter-arm region of M101

AGS00003nh - skewered by diff spike; photometry likely unreliable

AGS00003xb - near very bright star, skewered by diff spike; photometry likely unreliable

AGS000040k - bad spectrum; H-alpha and [NII] lines not recorded

AGS000044x - bad spectrum; H-alpha and [NII] lines not recorded

AGS0000474 - not in DR9; photometry unreliable (and it's a polar ring galaxy too?)

AGS00004h4 - star overlaps nucleus/bulge; spectrum is composite

AGS00004o4 - skewered by diff spike; photometry likely unreliable

Posted March 10, 2014 3:58 PM
by JeanTate in response to JeanTate's comment.

I've started to go through "the 1196" (QC objects), looking for potentially problematic objects (ppo). As I find them, I'll update/edit this post periodically.

Here are some of the ppos I've found so far:

AGS00002c5 - star (?) overlaps galaxy; spectrum may be contaminated, photometry unreliable; poor quality image

AGS00002c6 - star (?) overlaps nucleus/bulge; spectrum is composite

AGS00002et - star, which is not a separate photo object, overlaps galaxy; contaminated spectrum, unreliable photometry

AGS00002gs - not in DR9; very strange

AGS00002ha - unreliable redshift; noisy spectrum, "small delta chi^2"

AGS00002nu - not in DR9; near bright star

AGS00002o4 - skewered by diff spike; photometry unreliable

AGS00002qi - red star overlaps center; spectrum contaminated, photometry unreliable

AGS00002r8 - noisy spectrum; [OIII] affected by bad pixels

AGS00002ra - star overlaps center; spectrum contaminated, photometry unreliable

AGS00002sv - bad spectrum, H-beta not recorded; bright star overlaps galaxy, spectrum contaminated, photometry unreliable

AGS00002w5 - skewered by diff spike; photometry likely unreliable

AGS00002wb - not a photometric object in DR9

AGS00002xj - skewered by diff spike, poor quality image; photometry likely unreliable

AGS0000309 - near bright star; photometry unreliable; poor quality image

AGS000031u - [NII] masked; too big; star overlaps galaxy

AGS000038m - near two bright stars, may be skewered by diff spikes; contaminated spectrum, unreliable photometry

AGS00003bn - bad spectrum; H-alpha and [NII] not recorded and/or affected

AGS00003gf - bad spectrum; H-alpha and [NII] not recorded

AGS00003ji - bad spectrum; H-alpha and [NII] not recorded

AGS00003kf - double overlap; photometry unreliable

AGS00003ko - skewered by diff spike; photometry likely unreliable

AGS00003n7 - poor quality image; bad spectrum, chunk missing (but BPT lines OK)

AGS00003x6 - H-beta masked

AGS00003wz - poor quality image; bad spectrum, big chunk missing (but BPT lines OK)

AGS00004gb - really poor quality image

AGS00004mf - near bright star; poor quality image; star overlaps nucleus: contaminated spectrum, unreliable photometry

Posted March 11, 2014 2:36 PM
by JeanTate

I have started to put the various ppos into different Collections. Here are two I have already populated (not necessarily complete!): Bad BPT (QC), "the 1196", and Bad BPT (QS) "the 1149".

zutopian has a very useful collection, poor colour quality images, although it doesn't distinguish QS from QC, nor whether they make the 0.02 < z < 0.10 AND z-band abs_mag brighter than -20.0 cut or not.

I intend to create more Collections, containing 'pure' subsets of the ppos I list here.

Posted March 15, 2014 10:38 PM
by JeanTate

I have noted rather a lot of 'poor quality image' QC objects. Many, perhaps most, seem to have lots of flags on the Explore page; conversely, only a few which I did not note as 'poor quality image' have lots of flags*.

Here's an example, AGS000039u (DR9 image next to it):

Why did I flag it as a poor quality image, before clicking on the "View on SkyServer" link? Because the galaxy is fringed green on top and purple below (many 'poor quality image' galaxies have such features). Sometimes the strange colors disappear in the DR9 images, or become very muted; sometimes - like this example - they don't.

This particular galaxy has the following (DR9) flags: DEBLEND_DEGENERATE DEBLENDED_AT_EDGE BAD_MOVING_FIT MOVED BINNED1 INTERP COSMIC_RAY NODEBLEND CHILD BLENDED. Does any one or two of these say "poor quality image!"? Or is it more like 'the more flags, the lower the image quality'? I don't know; anyone?

I haven't yet gone through zutopian's poor colour quality images, but here's one much like the above, AGS00001fx:

This has fewer flags (DEBLENDED_AT_EDGE BAD_MOVING_FIT MOVED BINNED1 INTERP CHILD)

One interesting difference: the photocenter, and the center of the spectroscopic fiber, is clearly displaced/offset from the nucleus for AGS000039u, but not for AGS00001fx:

One more, with two rather different flags, AGS000032r:

The flags are PSF_FLUX_INTERP INTERP_CENTER STATIONARY BINNED1 INTERP MANYPETRO.

Absent anything else - such as the fiber likely not including the nucleus, as in AGS000039u - should we even seriously think about 'poor quality images' as ppos? Do any of the flags, or any combo of flags, suggest ppo?

*and my brain says rather a lot of those are Eos (edge-on spirals) ... is there a correlation? I might dig into this later, after we're done with the formal Quench project.
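
One way to start on the "which flags matter?" question is to compare the flag lists quoted in this post as sets. The sketch below uses only the three objects and flags given above; it says nothing about what any flag means, and three objects are far too few to draw conclusions from.

```python
# DR9 photometric flags exactly as quoted in this post, one set per object.
flags = {
    "AGS000039u": set(
        "DEBLEND_DEGENERATE DEBLENDED_AT_EDGE BAD_MOVING_FIT MOVED BINNED1 "
        "INTERP COSMIC_RAY NODEBLEND CHILD BLENDED".split()
    ),
    "AGS00001fx": set(
        "DEBLENDED_AT_EDGE BAD_MOVING_FIT MOVED BINNED1 INTERP CHILD".split()
    ),
    "AGS000032r": set(
        "PSF_FLUX_INTERP INTERP_CENTER STATIONARY BINNED1 INTERP MANYPETRO".split()
    ),
}

# How many flags does each object carry?
counts = {uid: len(obj_flags) for uid, obj_flags in flags.items()}

# Flags shared by all three objects.
common = set.intersection(*flags.values())

# Flags set for AGS000039u but not for AGS00001fx.
only_39u = flags["AGS000039u"] - flags["AGS00001fx"]

print(counts)
print(sorted(common))
print(sorted(only_39u))
```

With a larger table of objects, each marked 'poor quality image' or not, the same set operations would show whether any single flag separates the two groups.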

Posted March 18, 2014 4:19 PM
by JeanTate in response to JeanTate's comment.

I intend to create more Collections, containing 'pure' subsets of the ppos I list here.

One new one: Not in DR9 (QC), "the 1196". All four in it so far are identified as "Objects with spectra" in DR9 Navigate. Two (of the four) are close to bright stars, which may be why they're dropped as photometric objects in DR9. While none of the QS objects in my various 'outliers' lists has an equivalent 'not in DR9' note, I'm pretty sure there's at least one such QS object. So I'll almost certainly be starting another 'not in DR9' Collection ...

The 'pure' subsets idea didn't survive its contact with reality; rather a lot of the most pp of ppos are problematic for more than one reason! 😮

Posted March 18, 2014 7:08 PM
by mlpeck

A fairly common problem in Navigate is a failure to find spectroscopic objects even when they are marked in the maps. None of the spectra are lost though!

I have a total of 10 spectroscopic objects in subset 2, v.2 of the control sample that have no associated bestObjid and therefore no DR9 photometry and related quantities. All of those have spectroscopy related measurements from the MPA pipeline as well as stellar mass and star formation rate estimates.

There are 6 objects in subset 2 of the quench sample with the same issue.

This isn't really problematic as far as I can see. I've gone off the reservation by digging around in DR10 databases, but the investigation is using DR7 photometry and data products from MPA that don't appear to have changed much between DR7 and DR8.

Posted March 19, 2014 2:32 PM
by JeanTate in response to mlpeck's comment.

A fairly common problem in Navigate is a failure to find spectroscopic objects even when they are marked in the maps. None of the spectra are lost though!

SDSS has some strange ~~nooks and crannies~~ idiosyncrasies; for example, I found an object Navigate insisted (and still does, for all I know) has a spectrum, but there is none (it's the feature of my My Dog Ate My Homework OOTD).

This isn't really problematic as far as I can see.

As I've been checking all 1196 of the QC objects (see next post), I've wondered what sort of approach we should take to ppos.

Two high-level principles should be OK, but it's very important to state them clearly (I do not claim the following is sufficiently clear!):
1. whatever cuts we apply, to remove pos (problematic objects), we should apply the cuts consistently, including to QS and QC equally. If we decide to throw out a QS object which has the part of its spectrum containing H-alpha and [NII] masked, then we throw out all such QS objects, and all such QC ones too (to make an example)
2. a kinda corollary: the cuts should be unambiguous, and preferably quantitative.
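
The two principles above can be sketched in code: define each cut once, as an unambiguous test, and run the identical list of cuts over both samples. Everything here is hypothetical (the field names `halpha_masked` and `nii_masked`, the example objects, and the choice of cut); it only illustrates the "same cuts for QS and QC" idea.

```python
def bpt_lines_masked(obj):
    """Example cut: H-alpha or [NII] region of the spectrum is masked."""
    return obj["halpha_masked"] or obj["nii_masked"]


# One shared list of cuts, used for every sample.
CUTS = [bpt_lines_masked]


def apply_cuts(sample, cuts=CUTS):
    """Split a sample into (kept, removed) using the same cuts for every object."""
    kept, removed = [], []
    for obj in sample:
        if any(cut(obj) for cut in cuts):
            removed.append(obj)
        else:
            kept.append(obj)
    return kept, removed


# Made-up objects, not real QS/QC entries.
qs_sample = [
    {"uid": "qs_1", "halpha_masked": False, "nii_masked": False},
    {"uid": "qs_2", "halpha_masked": True, "nii_masked": True},
]
qc_sample = [
    {"uid": "qc_1", "halpha_masked": False, "nii_masked": True},
    {"uid": "qc_2", "halpha_masked": False, "nii_masked": False},
]

qs_kept, qs_removed = apply_cuts(qs_sample)
qc_kept, qc_removed = apply_cuts(qc_sample)
print([o["uid"] for o in qs_removed], [o["uid"] for o in qc_removed])
```
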
Posted March 20, 2014 11:52 PM
by JeanTate in response to mlpeck's comment.

I've now finished going through all 1196 of the QC objects which have redshifts greater than 0.02 but less than 0.10 AND which have estimated z-band absolute magnitudes brighter than -20.0. Now I need to go through my notes and summarize what I found.

First, though, by "going through" I mean eyeballing the Quench image, the DR9 one which comes up when you click the View on SkyServer link, and the PNG spectrum in the Explore page. I also checked for any threads which mention each object, and read all comments made by zooites. I also checked the general field (using Navigate), and/or the detailed spectrum (using the Interactive spectrum tool) for ~20% of the objects. I have not yet checked for color, abs_mag, or size outliers.

I have notes on ~600 of these ( 😮 ); however, I gave only ~60 a red flag (i.e. my initial impression is 'almost certainly a ppo').

Given that these 1196 objects are a fairly random selection of SDSS galaxies with spectra, AND redshifts between 0.02 and 0.10 AND estimated abs_z mags brighter than -20.0, I found it, um, interesting to see what turned up. For example:
- AGS00004ld, noted by super-zooite c_cld as containing a previously unknown supernova candidate
- AGS0000432, a galaxy whose nucleus seems to have diffraction spikes (written up here)
- AGS00002m3, easily mistaken for a strong gravitational lens, a false positive (written up here)
- AGS000034z, a strange, apparently face-on spiral with an asteroid crossing its disk, one that no zooite had noticed before (see here and here).
That seems to be rather a lot ... 😃

Posted March 21, 2014 12:22 AM
by JeanTate in response to JeanTate's comment.

That describes a task, or set of tasks, which would surely take quite some time to complete.

Here's an alternative: I will compile two lists - one for "the 1149", one for "the 1196" - of objects which I think are the most p (problematic) of the ppos, and post them. Together with an explanation of why I think they are problematic. Then we discuss and agree. Then we can get on with the analyses into merger fractions and asymmetry fractions (and any other questions/preliminary results we agree should go into our paper).

Posted March 21, 2014 11:51 AM
by mlpeck

I made a post here earlier today, realized shortly after that I had made an elementary mistake that made every word of my post wrong, and erased the post. Probably nobody noticed, but here's the corrected post.

I've been doing galaxy SED fitting using the MIUSCAT extension of the MILES library of stellar population models. Just for fun I decided to add a small library of star spectra to check for star/galaxy overlaps. That was done by downloading ~5000 stellar spectra from SDSS and stacking them into a dozen color bins. Here are results for 3 spectra that JeanTate has flagged as having a star most likely within the fiber.

AGS00000hm is interesting because it has two spectra, one of which was identified as an M3 star by the SDSS spectro pipeline, the other as a z=0.058 galaxy. Neither spectrum has any warning flags set. I have to admit I didn't believe the M3 classification because even though the spectrum has some odd wiggles an M star doesn't really work in detail. There's also the obvious detail that there's really a z=0.058 galaxy there.

I'd say there's a rather robust detection of a star here that contributes a maximum of about 40% of the monochromatic flux in the spectrum. The other 2 spectra have about 80% maximum contributions from a star and the detection in AGS00001dn at least is certainly robust. AGS00001ka on the other hand has rather marginal S/N.

Here is another look at AGS00000hm -- this plots the observed flux along with model fits using SSPs only and with added stars. Notice the odd little wiggles from ~6000-8000Å that the SSP only models can't quite fit are captured perfectly with the addition of a cool star.

Posted March 24, 2014 8:20 PM
by JeanTate in response to JeanTate's comment.

I had good intentions. 😦

With regard to photometrically ppos, I think I can fairly easily compile such a list (or pair of lists).

However, for spectroscopically ppos, I ran into a big problem, as I describe in How to decide which objects' MPA-JHU derived parameters are unreliable because ...

Posted March 26, 2014 2:25 AM
by JeanTate in response to mlpeck's comment.

Well, as I missed the now-deleted post, I have no idea how ... it was!

This is cool stuff, mlpeck! 😄 Especially as I've spent some time reading the key MPA-JHU papers ...

But perhaps because it's late here, there are some things I don't quite follow:
- what is "SSP"?
- how did you decide what sort of 'star spectrum' to apply (to subtract)?
- how did you decide how bright the star to subtract should be?
- did you ensure that the star and galaxy have similar foreground (galactic) extinctions (i.e. you 'corrected' for dust uniformly)*?
Not all blobby things near target nuclei are foreground stars - though I've now got several others for you to play with, if you'd like - some will be large, distant background galaxies, and some may be small, compact foreground ones ... Oh, and then there are bright stars some distance away - up to several arcmins perhaps - whose light was scattered into the spectroscopic fiber, especially if the night was well below photometric quality, and doubly so if the fiber got 'diffspiked' (simple stellar spectra won't work for these; the diffspike is itself a low-grade spectrum!).

*this is not necessarily sensible; the foreground star could be 'this side' of much of the galactic dust, especially if it's a cool dwarf

Posted March 26, 2014 2:50 AM
by mlpeck in response to JeanTate's comment.

I'm being pithy to a fault. Sorry.
- SSP is "simple stellar population" or "synthetic stellar population." To build one you create (in the computer!) a population of stars with some assumed initial mass function and follow the properties of the population as it evolves. There are several publicly available libraries of spectra. I'm using an extension of the MILES stellar library.
- I don't "decide" anything. I use a (weighted) non-negative least squares routine to pick an optimal combination of SSP models and individual stars.
- I correct the galaxy fluxes for foreground extinction using the Schlegel, Finkbeiner & Davis 1998 extinction maps as amended by Schlafly & Finkbeiner 2011. This is the same procedure NED uses and pretty standard practice. I did not correct the star spectra for extinction, since they are presumably somewhere inside the dust column rather than beyond it. Extinction corrections should be small anyway and uncorrected extinction can be compensated for (in the nnls fit) by selecting a redder star. Since I don't really care about the star beyond trying to quantify how much it's contributing to the spectrum I am untroubled by this choice.
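
For readers who have not met weighted non-negative least squares before, here is a minimal sketch of the idea using `scipy.optimize.nnls`. It is not the code used for the plots above: the "templates" are made-up smooth functions rather than MILES models or stacked SDSS stars, the error array is invented, and the test spectrum is noise-free so that the recovered mix can be checked against the input.

```python
import numpy as np
from scipy.optimize import nnls

# Toy wavelength grid (Angstrom) and a 0-1 coordinate for building templates.
wave = np.linspace(3800.0, 9000.0, 300)
x = (wave - wave[0]) / (wave[-1] - wave[0])

# Made-up, non-negative templates: two smooth SSP-like continua and two
# star-like spectra. Real work would load model and stellar spectra instead.
ssp_templates = np.column_stack([np.exp(-x), 1.0 + x])
star_templates = np.column_stack([
    1.0 + 0.3 * np.cos(40.0 * x),    # "cool star" with wiggles
    np.exp(-3.0 * (x - 1.0) ** 2),   # smooth red continuum
])
A = np.hstack([ssp_templates, star_templates])
n_ssp = ssp_templates.shape[1]

# Synthetic "observed" spectrum built from a known, all-positive mix.
true_coeffs = np.array([0.6, 0.3, 0.4, 0.1])
flux = A @ true_coeffs
sigma = 0.02 * (1.0 + x)             # assumed per-pixel 1-sigma errors

# Weighting: divide each row (pixel) by its sigma, then solve
# min || (A c - flux) / sigma ||  subject to  c >= 0.
coeffs, resid_norm = nnls(A / sigma[:, None], flux / sigma)

# Fraction of the model flux contributed by the star templates at each pixel.
star_flux = star_templates @ coeffs[n_ssp:]
model_flux = A @ coeffs
max_star_fraction = float(np.max(star_flux / model_flux))

print(np.round(coeffs, 4), round(max_star_fraction, 3))
```

The last few lines show one way to get a "maximum contribution to the monochromatic flux" number of the kind quoted earlier in the thread: take the star part of the fit, divide by the full model pixel by pixel, and report the largest ratio.
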
Posted March 26, 2014 2:38 PM
by JeanTate in response to mlpeck's comment.

Thanks! 😃

More questions, some more to be sure I've actually understood (rather than just guess):
- what, exactly, is (are) the red set ("SSP w/stars - SSP no stars")? In particular, does either (or both!) "stars" mean "star" (for an individual plot)?
- does each "Star contribution" (green) correspond to just one (synthetic, but mimics real) star? Or is it some combo of up to 5,000 quasi-real stars?
- if just one, what sort of star is it (in each case)?
- centuries of work by astronomers have shown that an extremely high percentage of galactic stars are binaries (and some triples, and more); also, doubles (line-of-sight/overlapping, but not gravitationally bound) are not uncommon. Does your method allow for the possibility that the foreground 'star' may be a binary or double?
- if you had a library of supernova templates, I guess you'd be able to repeat this analysis, for example to check that c_cld did, in fact, discover that the spectrum of AGS00004ld contains something very much like a supernova*
Looking at AGS00001dn, these are some/most of the obvious absorption lines in the SDSS spectrum that show up in the green (approx wavelength, in Å, per the interactive spectrum):
- 6564: H-alpha
- 4862: H-beta
- 5896: NaD
- 3934 and 3969: H&K (CaII)
These also show up in the red. As there's no doubt of their reality, the failure to match the absorption lines better is likely due to insufficiently precise modelling of metallicity/temperature (and gravity?), with - perhaps - the possibility that the star is binary/double. Which in turn suggests that a two-step match might produce even better results: match the obvious absorption (or, in other stars, emission) lines, then match the continuum/rest of the spectrum.

AGS00000hm is neat because the star's notable spectral features are molecular bands, which also suggests a two-step matching process (albeit one that would surely be more difficult to do).

Amazing what spectra can tell you, eh? 😛

*you can't see it in the image, because the epochs are too far apart

Posted March 26, 2014 4:13 PM
by JeanTate in response to JeanTate's comment.

Nonetheless, here's an initial list of 65 potentially problematic objects (ppos) among "the 1196"*, by category. "Category" is my subjective judgement of the principal reason why the object is problematic; for many - indeed, most - there's at least one other reason why I consider it problematic. And in at least one case (AGS00002sv), it's problematic because there are so many such reasons. Many, but nowhere near all, of these I have posted earlier in this thread.

In a later post I'll provide an ID-RA-Dec list, which can be copy/pasted into the SDSS Image List Tool to get DR10 images (and more; this follows mlpeck's suggestion (see page 10 of this thread)).

~~To keep this list in a single post, I'll be posting, then editing it; when done I'll delete this line. UPDATE: I've just one category to add (9 objects)~~ Changed my mind; when done I'll 'strikethrough' that text (yes, all 65 QC ppos now posted).

V_DISP_ERR negative; and/or V_DISP unrealistic

(an excellent proxy for "MPA-JHU model fit is unacceptable")

AGS00003sv

AGS00003vz parts of spectrum blue-ward of ~740nm masked (several dozen nm in all); yellow 19.81 r-band star 0.052' from nucleus; poor quality image

AGS000048e contaminated by nearby bright star; poor quality image

AGS000048m

AGS00004ld supernova candidate

One or more of the key BPT lines masked

(H-alpha, H-beta, [NII] 6583, [OIII] 5007. BPT class cannot be determined. In general, ~dozens to ~a hundred nm of the spectrum masked)

AGS00002j8 red half of spectrum missing, from ~620 nm

AGS00002r8

AGS00002sv

AGS000031u another part of spectrum masked; 17.11 r-band star 0.179' from nucleus; too big

AGS00003bn galaxy overlap

AGS00003gf 15.54 r-band star 0.066' from nucleus

AGS00003ji galaxy overlap (same redshift)

AGS00003kr

AGS00003lz

AGS00003x6

AGS000040k

AGS000044x

AGS0000495

Unreliable redshift

AGS00002ha

Large part of spectrum masked, or missing

(this is the most subjective category, as I have no idea what the answers to my questions in this thread will be)

AGS00002cy region missing includes the 4000Å break

AGS00002ki everything redward of ~800 nm is missing; 'star' overlap (likely a galaxy)

AGS000032k

AGS00003i4 poor quality image

AGS00003wz poor quality image

AGS00004aj poor quality image

Not in DR9

(a good proxy for "unreliable photometry"; spectroscopy may be OK though)

AGS00002gs this is very strange!

AGS00002nu contaminated by nearby bright star

AGS00002wb

AGS00003ky M101 is in the foreground, so spectrum and photometry contaminated by stars/gas/dust in M101

Diffspiked

(A diffraction spike from a nearby very bright star crosses part of the image of the galaxy, making the photometry unreliable. The spectroscopy may also be unreliable, even if the fiber was not diffspiked (which we can't determine anyway) ... many spectra were obtained on non-photometric nights)

AGS00002w5 poor quality image

AGS00002xj poor quality image

AGS000038m contaminated by two nearby (very) bright stars

AGS00003bm poor quality image

AGS00003ko poor quality image

AGS00003kq overlap? poor quality image

AGS00003m2 contaminated by two nearby (very) bright stars; overlap?

AGS00003nh

AGS00003xn

AGS0000419

AGS00004hq

AGS00004o4

'Smashed' galaxies

(highly unreliable photometry because the SDSS pipeline 'smashed' a galaxy into several different photometric objects (a simple example here). This 'smashing' is particularly common for Eos (edge-on spirals), but usually has only a minor effect; for the ppos here it's major)

AGS00002c5 Eos smashed into three POs; (16.41, 0.108') overlapping 'star'; poor quality image

AGS00002o4 smashed into two POs; near two very bright stars; double diffspiked

AGS000031r smashed into two POs; near very bright star; diffspiked

AGS00003ax Eos smashed into two POs; diffspiked; poor quality image

AGS00003p8 Eos smashed into several POs

AGS00003x8 smashed into two POs; diffspiked

AGS00003xb smashed into ~10 POs; no DR9 spectrum; near very bright star; diffspiked

AGS0000472 merger smashed into two POs; (15.55, 0.111') overlapping 'star'

AGS00004lc smashed into two POs; diffspiked?; small (~an nm or two) region of spectrum masked

'Unrecognized star' contaminates spectrum, photometry

(an object which looks like a star, not part of the galaxy, is close enough to the center of the spectroscopic fiber that its light was recorded in the spectrum; however, the photometric pipeline does not recognize this 'overlap star' as a separate photometric object)

AGS00002c6 I estimate the distance is ~0.02', well within the fiber

AGS00002et merger?

AGS000043e I estimate the distance is ~0.04'; also contaminated by nearby bright star; poor quality image

'Star' contaminates spectrum, photometry

(unlike the previous category, the contaminating object is recognized by the SDSS pipeline as a separate photometric object; numbers are the r-band magnitude of this 'star' and its distance from the location of the galaxy's photocenter, which is - presumably - the same as the center of the spectroscopic fiber)

AGS00002ra (19.99, 0.026')

AGS000039s (15.98, 0.041')

AGS000039v (21.71, 0.050'); diffspiked; near very bright star; small region of spectrum near 410nm masked; poor quality image

AGS000043a (14.94, 0.085'); noisy spectrum

AGS000044n (21.81, 0.061')

AGS00004af (18.16, 0.096'); poor quality image

AGS00004h4 (16.18, 0.036')

AGS00004mf (19.08, 0.033'); near bright star; poor quality image
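
To put the offsets in this category on a more familiar scale, the sketch below converts them from arcminutes to arcseconds and compares them with the fiber radius. The 1.5" radius (a 3"-diameter fiber) is an assumption stated in the code, and the comparison is purely geometric: it ignores seeing, which spreads a star's light well beyond its nominal position.

```python
ARCSEC_PER_ARCMIN = 60.0
FIBER_RADIUS_ARCSEC = 1.5  # assumption: 3-arcsec-diameter spectroscopic fiber

# (uid, r-band magnitude of the 'star', offset from photocenter in arcmin),
# copied from the list above.
overlaps = [
    ("AGS00002ra", 19.99, 0.026),
    ("AGS000039s", 15.98, 0.041),
    ("AGS000039v", 21.71, 0.050),
    ("AGS000043a", 14.94, 0.085),
    ("AGS000044n", 21.81, 0.061),
    ("AGS00004af", 18.16, 0.096),
    ("AGS00004h4", 16.18, 0.036),
    ("AGS00004mf", 19.08, 0.033),
]


def offset_arcsec(offset_arcmin):
    """Convert an offset in arcminutes to arcseconds."""
    return offset_arcmin * ARCSEC_PER_ARCMIN


# Sort by offset so the closest 'stars' come first.
by_offset = sorted(overlaps, key=lambda row: row[2])

for uid, r_mag, arcmin in by_offset:
    arcsec = offset_arcsec(arcmin)
    in_radii = arcsec / FIBER_RADIUS_ARCSEC
    print(f"{uid}  r={r_mag:5.2f}  {arcsec:4.2f} arcsec  ({in_radii:.1f} fiber radii)")
```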

'Stars', both recognized and not, contaminate spectrum, photometry

(a bit of both the above categories)

AGS00003jg (15.87, 0.218') star; 'unrecognized green star' ~0.10' from galaxy's photocenter

Galaxy overlap

(light from the other galaxy - which is not a merger - contaminates the photometry, spectrum, or both ... even if the two have very similar redshifts)

AGS00002o8 ~dozen or so nm of spectrum masked; other galaxy has very similar redshift

AGS00004la

AGS00004lz poor quality image; other galaxy has redshift of 0.243

*QC objects with 0.02 < z < 0.10 AND estimated z-band absolute magnitude brighter than -20.0

Posted March 26, 2014 5:37 PM
by mlpeck in response to JeanTate's comment.

what, exactly, is (are) the red set ("SSP w/stars - SSP no stars")? In particular, does either (or both!) "stars" mean "star" (for an individual plot)?

The red lines show the difference between fits with the library of stars added and fits without. What this mostly shows is that by adjusting the mix of ages and metallicities in the SSP fit it's possible to match the overall continuum slope of the spectrum, but in cases where there's convincing evidence for foreground star contamination the SSP fit will fail to capture specific features. In the first example for instance there are extra absorption lines that are nicely accounted for with a z=0 stellar spectrum added to the mix, and in the third molecular absorption troughs in a cool star nicely account for the extra "wiggles" in the red part of the spectrum that no mix of SSP models can fit.

Since I don't impose any constraints on the individual star contributions other than non-negativity the algorithm could add spectra of disparate temperatures, but it could also just be interpolating between the fairly broad temperature bins that I used.

Here are the spectra, which I created by stacking ~5100 SDSS stellar spectra in color bins about 0.33 magnitude wide in g-i:

The SDSS pipeline has some physical data and classifications for stars in the spectrum files and I had initially planned to use those for binning. It turns out that data isn't very good though and the bins ended up not being homogeneous enough for stacking to work well. So instead I synthesised g,r,i magnitudes by projecting the spectrum fluxes onto the filter response functions. Here is a color-color diagram for the entire sample. The color coding is by the values of "bv" in the spectrum files which presumably is B-V color. If "bv" were an accurate measure of color there should be a rainbow in this diagram, which obviously is not quite the case.
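
"Projecting the spectrum fluxes onto the filter response functions" amounts to a weighted integral of the spectrum. The sketch below shows one common way to do it (AB magnitudes for a photon-counting detector). It is an illustration under stated assumptions, not the code used for the plots: the top-hat "g" and "i" responses and their wavelength limits are crude stand-ins for the real SDSS filter curves.

```python
import numpy as np

C_ANGSTROM_PER_S = 2.99792458e18  # speed of light in Angstrom per second


def trapezoid(y, x):
    """Trapezoidal integral of y over x (avoids NumPy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def synthetic_ab_mag(wave, f_lambda, response):
    """AB magnitude of f_lambda [erg/s/cm^2/A] through a filter response.

    Mean f_nu = int(f_lambda * R * lambda) / (c * int(R / lambda)).
    """
    numerator = trapezoid(f_lambda * response * wave, wave)
    denominator = C_ANGSTROM_PER_S * trapezoid(response / wave, wave)
    return -2.5 * np.log10(numerator / denominator) - 48.60


def top_hat(wave, lo, hi):
    """Assumed stand-in for a real filter curve: 1 inside [lo, hi], else 0."""
    return ((wave >= lo) & (wave <= hi)).astype(float)


wave = np.linspace(3800.0, 9200.0, 5401)  # 1 Angstrom steps
g_resp = top_hat(wave, 4000.0, 5400.0)    # rough g-like band (assumption)
i_resp = top_hat(wave, 6900.0, 8200.0)    # rough i-like band (assumption)

# Check 1: a source with constant f_nu = 3631 Jy has AB magnitude 0 in any band.
flat_f_lambda = 3.631e-20 * C_ANGSTROM_PER_S / wave**2
g_flat = synthetic_ab_mag(wave, flat_f_lambda, g_resp)
i_flat = synthetic_ab_mag(wave, flat_f_lambda, i_resp)

# Check 2: a spectrum rising to the red should come out with g - i > 0.
red_f_lambda = 1e-17 * (wave / 5000.0) ** 2
g_minus_i_red = (synthetic_ab_mag(wave, red_f_lambda, g_resp)
                 - synthetic_ab_mag(wave, red_f_lambda, i_resp))

print(round(g_flat, 3), round(i_flat, 3), round(g_minus_i_red, 3))
```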

And here is a histogram of synthesized g-i colors. I think F stars were favored for spectrophotometric calibration, which probably accounts for the first big peak in the frequency distribution.

I didn't really see any point in getting too fine-grained about binning the data and I don't really know how to estimate surface gravity or metallicity anyway, so I settled for a dozen color bins each of which is 1/3 magnitude wide in g-i (with the outliers to the blue of -1 and red of +3 thrown into the first and last bin).
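
The binning described here (a dozen bins, each 1/3 magnitude wide in g-i from -1 to +3, with outliers folded into the first and last bin) can be sketched as below. How the spectra were actually combined within a bin is not stated in the post; the plain mean used here, and the tiny made-up "spectra", are assumptions for illustration only.

```python
import numpy as np

GI_MIN, GI_MAX, N_BINS = -1.0, 3.0, 12
BIN_WIDTH = (GI_MAX - GI_MIN) / N_BINS  # 1/3 magnitude


def color_bin(g_minus_i):
    """Bin index 0..11 for each g-i color; outliers go to the end bins."""
    raw = np.floor((np.asarray(g_minus_i, dtype=float) - GI_MIN) / BIN_WIDTH)
    return np.clip(raw, 0, N_BINS - 1).astype(int)


def stack_by_bin(spectra, g_minus_i):
    """Mean-stack spectra (one per row) within each occupied color bin."""
    bins = color_bin(g_minus_i)
    return {int(b): spectra[bins == b].mean(axis=0) for b in np.unique(bins)}


# Made-up colors and three-pixel "spectra", just to exercise the functions.
colors = np.array([-1.5, 0.5, 0.6, 2.9, 3.4])
spectra = np.array([
    [1.0, 1.0, 1.0],
    [2.0, 4.0, 6.0],
    [4.0, 6.0, 8.0],
    [5.0, 5.0, 5.0],
    [7.0, 7.0, 7.0],
])

bin_indices = color_bin(colors)
stacks = stack_by_bin(spectra, colors)
print(bin_indices, sorted(stacks))
```

In practice each spectrum would normally be put on a common wavelength grid and normalized before averaging; those steps are left out to keep the sketch short.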

Posted March 26, 2014 7:53 PM
by JeanTate in response to mlpeck's comment.

I think F stars were favored for spectrophotometric calibration,

From Tremonti+ 2004:

The SDSS spectrographs do not employ an atmospheric dispersion corrector, and the spectra are frequently acquired under nonphotometric conditions. The survey has nevertheless been able to obtain a remarkable level of spectrophotometric precision by the simple practice of observing multiple standard stars simultaneously with the science targets. (The artifice in this case is that the “standards” are not classical spectrophotometric standards but are halo F subdwarfs that are calibrated to stellar models; see Abazajian et al. 2004 for details.) To quantify the quality of the spectrophotometry, we have compared magnitudes synthesized from the spectra with SDSS photometry obtained with an aperture matched to the fiber size. The 1σ error in the synthetic colors is 5% in g − r and 3% in r − i (λg ~ 4700 Å; λr ~ 6200 Å; λi ~ 7500 Å). At the bluest wavelengths (3800 Å) we estimate the error to be 12% based on repeat observations. There is also a systematic error in the sense that the spectra are bluer than the imaging by 2% in the g band, but it is unclear at present whether this represents an error in the absolute calibration of the photometry or the spectroscopy.

As the SDSS spectrum of a star of a given spectral class (e.g. M3V) is, ideally, clean, and as foreground stars will almost always be of just one (common) spectral class*, wouldn't a better approach be to find a good template spectrum (or library of templates)? Uncommon classes aside (and ignoring things like novae and planetary nebulae), within a given class the only variable is metallicity (I think; temperature and gravity are 'built in' to the spectral class).

Perhaps an iterative approach might work well: first estimate a broad class (e.g. OBAFGKM, basically just temperature), then try a finer class, perhaps using interpolation?

*of course, spectroscopic binaries complicate this (nice GZ forum thread: Spectroscopic Binaries)

Posted March 28, 2014 12:02 AM

by JeanTate in response to JeanTate's comment.

In a later post I'll provide an ID-RA-Dec list, which can be copy/pasted into the SDSS Image List Tool to get DR10 images (and more; this follows mlpeck's suggestion (see page 10 of this thread)).

Here it is (not quite the same order as above):

V_DISP_ERR negative; and/or V_DISP unrealistic

QCuid,ra,dec
AGS00003sv,122.97152,41.80555
AGS00003vz,125.37005,41.2649
AGS000048e,230.98415,5.255537
AGS000048m,236.36224,7.1136274
AGS00004ld,172.25229,48.733124

One or more of the key BPT lines masked

QCuid,ra,dec
AGS00002j8,242.01917,0.11980407
AGS00002r8,209.1272,9.5846376
AGS00002sv,141.7901,53.820187
AGS000031u,128.05977,19.622227
AGS00003bn,181.15939,36.057137
AGS00003gf,53.203636,0.73268747
AGS00003ji,197.59135,3.5347576
AGS00003kr,134.41769,27.284994
AGS00003lz,239.95284,33.722469
AGS00003x6,135.97768,37.39502
AGS000040k,186.37794,9.3944168
AGS000044x,216.11134,16.639608
AGS0000495,196.09451,28.810776

Large part of spectrum masked, or missing

QCuid,ra,dec
AGS00002cy,184.94919,14.286131
AGS00002ki,173.01382,49.17387
AGS00004aj,205.80527,16.655584
AGS000032k,234.08218,57.518658
AGS00003i4,117.47579,41.203808
AGS00003wz,231.52452,29.238565

Unreliable redshift

QCuid,ra,dec
AGS00002ha,204.29787,9.3562222

Not in DR9

QCuid,ra,dec
AGS00002gs,182.92636,5.6259999
AGS00002nu,132.86823,33.239185
AGS00002wb,192.97311,4.5770364
AGS00003ky,210.72018,54.40913

Diffspiked

QCuid,ra,dec
AGS00002w5,217.27522,22.035683
AGS00002xj,137.17714,33.912479
AGS000038m,150.4267,52.200794
AGS00003bm,241.37042,8.0898228
AGS00003ko,167.25092,30.621782
AGS00003kq,213.76962,2.8734627
AGS00003m2,121.33378,12.431553
AGS00003nh,142.24089,12.590751
AGS00003xn,115.81785,24.280655
AGS0000419,255.9035,33.109158
AGS00004hq,158.15909,4.6106997
AGS00004o4,119.63489,20.611591

'Smashed' galaxies

QCuid,ra,dec
AGS00002c5,219.98843,13.826877
AGS00002o4,235.66699,13.706963
AGS000031r,229.42706,6.5128703
AGS00003ax,227.04459,1.9497555
AGS00003p8,197.79723,43.72633
AGS00003x8,238.37627,53.796959
AGS00003xb,238.8474,9.6105852
AGS0000472,162.48299,56.830578
AGS00004lc,226.72464,48.299641

'Unrecognized star' contaminates spectrum, photometry

QCuid,ra,dec
AGS00002c6,248.04549,22.138491
AGS00002et,229.0529,19.741594
AGS000043e,340.15994,-10.099327

'Star' contaminates spectrum, photometry

QCuid,ra,dec
AGS00002ra,133.88885,37.434757
AGS000039s,192.05965,50.983063
AGS000039v,121.07872,14.789177
AGS000043a,214.09982,29.287455
AGS000044n,251.09805,44.029934
AGS00004af,243.64807,16.701731
AGS00004h4,323.32028,11.620528
AGS00004mf,30.816111,0.1919311

'Stars', both recognized and not, contaminate spectrum, photometry

QCuid,ra,dec
AGS00003jg,341.5311,-10.369423

Galaxy overlap

QCuid,ra,dec
AGS00002o8,230.00021,32.771442
AGS00004la,119.9402,18.403969
AGS00004lz,26.651016,-0.48102674

Posted March 28, 2014 12:38 AM

by mlpeck in response to JeanTate's comment.

As the SDSS spectrum of a star of a given spectral class (e.g. M3V)
is, ideally, clean, and as foreground stars will almost always be of
just one (common) spectral class*, wouldn't a better approach be to
find a good template spectrum (or library of templates)?

I must be missing something. This is a library of templates. Just eyeballing the spectra they span roughly the spectral type range B5-M5 with, again roughly, around 1/2 spectral class resolution.

Since I'm not interested in doing stellar classification here (and would use other methods if I were interested) a dozen or so broad temperature bins seemed sufficient for the task at hand. If you think otherwise I'm more than happy to share my code and point you to some sources of data. Some stellar libraries that try to cover spectral parameter space as fully as possible include MILES, Elodie, and the Indo-US library.

Posted March 28, 2014 5:51 PM
by JeanTate in response to mlpeck's comment.

Hmm ... so "~5100 SDSS stellar spectra" refers to templates, not the observed spectra of ~5100 individual stars?

Here is a color-color diagram for the entire sample. The color coding is by the values of "bv" in the spectrum files which presumably is B-V color. If "bv" were an accurate measure of color there should be a rainbow in this diagram, which obviously is not quite the case.

If the inputs were templates, the outliers at least would seem to be hard to explain; if individual spectra, perhaps not so much (e.g. the 'star' observed is actually a binary/double, esp if one is red/cold the other blue/hot), although I think I read that white dwarfs can also be 'off the beaten track' (esp DZ ones).

If you think otherwise I'm more than happy to share my code and point you to some sources of data.

Thanks. I am interested ... however, it's something I'd rather look at after we've put the draft of our paper to bed ... 😉

Posted March 28, 2014 7:37 PM
by mlpeck in response to JeanTate's comment.

Hmm ... so "~5100 SDSS stellar spectra" refers to templates, not the
observed spectra of ~5100 individual stars?

I must really be too pithy. The dozen spectra in the long skinny graph a few posts up are the template library. They were created by

a) Downloading a random selection of SDSS spectra that are classified as stars, have measured redshifts in narrow limits around z=0, and met a S/N cut. I think I imposed a zwarning cut too. I hope so.

Edit: Yes I did. Here is the entire query:
```
SELECT TOP 5000 specObjID, plate, mjd, fiberid INTO mydb.stspec
FROM SpecObj 
WHERE class = 'STAR' AND zWarning = 0 AND z > -0.001 AND z < 0.001 AND snMedian > 15
```
b) Each spectrum is reduced to a restframe wavelength scale by dividing by (1+z) and synthetic colors are calculated by projecting the spectrum fluxes onto filter response curves for g, r, and i (which were downloaded from some SDSS PI's website as I recall).

c) These were sorted into a dozen broad bins in g-i color.

d) Finally, all spectra are adjusted to a common wavelength grid by interpolation and inverse variance weighted means are calculated for each wavelength in each color bin. The mean spectra are normalized to have an average value of 1 in the wavelength range [5200,5800] Å, which is a relatively blank and flat part of most spectra.
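Steps b) and d) above can be sketched roughly as follows. This is not mlpeck's code, just an illustration of the technique under some assumptions: the spectra have already been interpolated onto one common rest-frame wavelength grid, bad pixels carry an inverse variance of zero, and the function names are hypothetical.

```python
def to_rest_frame(observed_wave, z):
    """Shift observed wavelengths to the rest frame by dividing by (1 + z)."""
    return [w / (1.0 + z) for w in observed_wave]


def stack_spectra(wave, fluxes, ivars, norm_range=(5200.0, 5800.0)):
    """Inverse-variance weighted mean of spectra on a shared wavelength grid.

    wave   : rest-frame wavelengths in Angstrom (the common grid)
    fluxes : one list of fluxes per spectrum, same length as wave
    ivars  : one list of inverse variances per spectrum (0 = bad pixel)

    The result is scaled so that its average over norm_range is 1.
    """
    stacked = []
    for j in range(len(wave)):
        weight_sum = sum(iv[j] for iv in ivars)
        if weight_sum > 0:
            weighted = sum(f[j] * iv[j] for f, iv in zip(fluxes, ivars))
            stacked.append(weighted / weight_sum)
        else:
            stacked.append(float("nan"))  # no good data at this wavelength

    lo, hi = norm_range
    # s == s is False only for NaN, so empty pixels are skipped here
    window = [s for w, s in zip(wave, stacked) if lo <= w <= hi and s == s]
    scale = sum(window) / len(window)
    return [s / scale for s in stacked]
```

One such stack would be made per g-i color bin. The sketch raises a division error if no good pixel falls inside the normalization window, which seemed preferable to silently returning an unnormalized template.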

Yes, the outliers in the color-color plot could be binaries, oddball stars, overlaps with galaxies, misclassified galaxies, misclassified QSOs, bad data, ...

Sorry if I've hijacked your topic.

Posted March 28, 2014 9:35 PM
by JeanTate in response to mlpeck's comment.

Thanks! 😄

I got tripped up over "template"; to me that means (meant) something like "some team somehow associated with SDSS picked a bunch of stars widely accepted by the relevant observational community (i.e. professionals who've spent a decade or more studying stars and their optical spectra) as having archetypical spectra, of all main spectral classes (including the dwarf-to-supergiant dimension), extracted their SDSS Legacy (i.e. not BOSS) spectra, removed any wonky ones, transformed them to z=0 (etc), and published them".

Kinda like what's available from the Spectral cross-correlation templates webpage, only more so (and only for stars).

Posted March 28, 2014 9:46 PM
by JeanTate in response to JeanTate's comment.

I've started analyzing these, with the aim of creating a list of strong ppos, selected by similar criteria as the QC ones I posted above, and sorted into the same categories (with the possibility of adding one or two).

Posted March 28, 2014 9:50 PM
by JeanTate in response to JeanTate's comment.

Or like what's in "The Gaia FGK Benchmark Stars - High resolution spectral library" (Blanco-Cuaresma+, 2014, arXiv:1403.3090), but for all major classes:

Context. An increasing number of high resolution stellar spectra is available today thanks to many past and ongoing spectroscopic surveys. Consequently, numerous methods have been developed in order to perform an automatic spectral analysis on a massive amount of data. When reviewing published results, biases arise and they need to be addressed and minimized.
Aims. We are providing a homogeneous library with a common set of calibration stars (known as the Gaia FGK Benchmark Stars) that will allow to assess stellar analysis methods and calibrate spectroscopic surveys.
Methods. High resolution and signal-to-noise spectra were compiled from different instruments. We developed an automatic process in order to homogenize the observed data and assess the quality of the resulting library.
Results. We built a high quality library that will facilitate the assessment of spectral analyses and the calibration of present and future spectroscopic surveys. The automation of the process minimizes the human subjectivity and ensures reproducibility. Additionally, it allows us to quickly adapt the library to specific needs that can arise from future spectroscopic analyses.

Posted April 7, 2014 9:23 PM
by JeanTate in response to JeanTate's comment.

Here's an initial list of 71 potentially problematic objects (ppos) among "the 1149"* QS objects, by category. "Category" is my subjective judgement of the principal reason why the object is problematic; for many - indeed, most - there's at least one other reason why I consider it problematic.

I have kept the same categories as I used for the QC ppos, but had to add two ("not main" and "other/multi"), which I explain below. Almost all these objects have been posted at least once before, with a note that it may be an outlier/unusual/etc; but not all have been posted in this thread before.

In a later post I'll provide an ID-RA-Dec list, which can be copy/pasted into the SDSS Image List Tool to get DR10 images (and more; this follows mlpeck's suggestion (see page 10 of this thread)).

In a separate, later, post, I'll compare and contrast the QS and QC lists of ppos.

~~To keep this list in a single post, I'll be posting, then editing it; when done I'll 'strikethrough' this text.~~ (yes, all 71 QS ppos now posted)

V_DISP_ERR negative; and/or V_DISP unrealistic

(an excellent proxy for "MPA-JHU model fit is unacceptable")

AGS00000an

AGS00000l1 color anomaly; near bright star; V_DISP too large; bad Petro_Rad

AGS00000so

AGS000013n 'negative emission line' warning

AGS00001ho

AGS00001nu

One or more of the key BPT lines masked

(H-alpha, H-beta, [NII] 6583, [OIII] 5007. BPT class cannot be determined. In general, ~dozens to ~a hundred nm of the spectrum masked)

AGS000003c diffspiked? near bright star

AGS000014f

AGS0000226

AGS000022s

Unreliable redshift

(None; I included this category only for consistency with that of the QC objects)

Large part of spectrum masked, or missing; other gross spectroscopic problem

(this is perhaps the most subjective category, as I have no idea what the answers to my questions in this thread will be)

AGS00000j9 all line fluxes are zero, but the spectrum has obvious emission lines

AGS00001hr ~50nm masked, red-ward of [SII]; Eos smashed into three POs; (18.3, 0.069') overlapping G2 STAR

AGS00001w5 continuum misaligned across blue/red spectroscope's arms

AGS00001xa very poor model match

Not in DR9

AGS00001qc center of fiber for spectrum is not a DR9 PO (and does not contain the nucleus); smashed into six POs; (18.3, 0.069') overlapping 'star'

Diffspiked

(A diffraction spike from a nearby very bright star crosses part of the image of the galaxy, making the photometry unreliable. The spectroscopy may also be unreliable, even if the fiber was not diffspiked (which we can't determine anyway) ... many spectra were obtained on non-photometric nights)

AGS000003j

AGS000005g

AGS000008x

AGS00000jq

AGS00000js

AGS000014z

AGS00001c9

AGS00001f7

AGS00001j3 very poor quality image

AGS00001l4 very poor quality image

AGS00001la

AGS00001ry

AGS00001uc poor quality image

'Smashed' galaxies

(highly unreliable photometry because the SDSS pipeline 'smashed' a galaxy into several different photometric objects (a simple example here). This 'smashing' is particularly common for Eos (edge-on spirals), but usually has only a minor effect; for the ppos here it's major)

AGS000006r Eos smashed into x POs; color outlier; galaxy overlap?

AGS000006v bless (bulge-less) Eos smashed into six POs; near very bright star

AGS000007c Eos; clump/'star' in disk is separate PO

AGS00000b9 smashed into two POs; color outlier; near bright star; 'noisy sky'

AGS00000fq smashed into two POs

AGS00000gf smashed into eight POs; near two very bright stars

AGS00000hn Eos smashed into two POs; few pix near 660nm masked; (18.83, 0.169') overlapping 'star'

AGS000020v smashed into two POs; (14.73, 0.233') overlapping 'star'

AGS000025u smashed into two POs; near two bright stars; near Field boundary

AGS000029w smashed into two POs; (16.70, 0.123') overlapping 'star'

'Unrecognized star' contaminates spectrum, photometry

(an object which looks like a star, not part of the galaxy, is close enough to the center of the spectroscopic fiber that its light was recorded in the spectrum; however, the photometric pipeline does not recognize this 'overlap star' as a separate photometric object)

AGS00000hm M3 STAR (per spectrum) overlaps nucleus; I estimate the distance is ~0.01'

AGS00000zi I estimate the distance is ~0.09'

AGS0000153 color outlier; I estimate the distance is ~0.03'

AGS00001dn spectrum is a mix of star and galaxy; I estimate the distance is ~0.02'

'Star' contaminates spectrum, photometry

(unlike the previous category, the contaminating object is recognized by the SDSS pipeline as a separate photometric object; numbers are the r-band magnitude of this 'star' and its distance from the location of the galaxy's photocenter, which is - presumably - the same as the center of the spectroscopic fiber)

AGS000004e (14.76, 0.206'), very red

AGS000008i (17.75, 0.078')

AGS00000ag (22.22, 0.059'), (20.52; galaxy overlap)

AGS00000cz (19.54, 0.064'), an M3 STAR

AGS00000dk (20.11, 0.038')

AGS00000y2 (16.39, 0.055'); bad Petro_Rad

AGS0000198 (19.21, 0.067'), (22.01, 0.080'; galaxy overlap)

AGS00001c8 (16.76, 0.090')

AGS00001dz (19.65, 0.106')

AGS00001g0 (19.79, 0.097'), (16.71, 0.189'), faint diffspike through nucleus

AGS00001mb near bright star

AGS00001mh near bright star

AGS00001sf (18.61, 0.084'); "many outliers" (spectrum)

AGS00001w3 (18.59, 0.056'), (21.98, 0.105'); "many outliers" (spectrum)

AGS00001xi (14.61, 0.193')

AGS000026b near very bright star; possibly diffspiked; poor quality image

AGS000026p near very bright star; possibly diffspiked; poor quality image

AGS00002ab (16.84, 0.130')

Galaxy overlap

(light from the other galaxy - which is not a merger - contaminates the photometry, spectrum, or both ... even if the two have very similar redshifts)

AGS000028a (15.96, 0.152') 'star'; small region of spectrum near NaD lines masked; poor quality image

Spectrum does not include nucleus

(Fiber aperture does not include the main galaxy's nucleus, so the 'post-quenched' feature refers solely to a clump within the disk or merger feature (possibly a fading starburst); among QC objects, there are only three which might, marginally, be like this)

AGS00000c3 near bright star; ~few nm of spectrum near 480 nm masked

AGS00001rm merger; many separate POs; three 'objects with spectra', none of which includes a galactic nucleus

AGS0000242 merger; two separate POs; two 'objects with spectra', the one which includes a galactic nucleus is not 'post-quenched'; color anomaly; near bright star

Multi/other

(problematic because there are many reasons to categorize the object as such; however, none on their own are strong enough to warrant being included here)

AGS000007f near very bright star; a few pixels near H-delta masked; poor quality image

AGS00000f1 color anomaly (u-band)

AGS00000jt Eos near very bright star; different Field, so may be diffspiked

AGS00000or color anomaly; poor quality image

AGS00000zu Eos near two very bright stars; poor quality image

AGS00001vw near very bright star; poor quality image ('noisy sky')

AGS000029m color anomaly; near very bright star; poor quality image ('noisy sky')

*QS objects with 0.02 < z < 0.10 AND estimated z-band absolute magnitude brighter than -20.0

Posted April 7, 2014 9:29 PM

by JeanTate in response to JeanTate's comment.

In a later post I'll provide an ID-RA-Dec list, which can be copy/pasted into the SDSS Image List Tool to get DR10 images (and more; this follows mlpeck's suggestion (see page 10 of this thread)).

Here it is:

V_DISP_ERR negative; and/or V_DISP unrealistic

QSuid,ra,dec
AGS00000an,137.90992,-0.71499759
AGS00000l1,117.42611,33.923579
AGS00000so,160.33014,57.75002
AGS000013n,151.81002,39.32988
AGS00001ho,184.09475,14.298068
AGS00001nu,148.30424,30.856234

One or more of the key BPT lines masked

QSuid,ra,dec
AGS000003c,175.877,-3.5993068
AGS000014f,188.38286,43.81658
AGS0000226,163.90967,23.107597
AGS000022s,174.23003,24.890424

Large part of spectrum masked, or missing; other gross spectroscopic problem

QSuid,ra,dec
AGS00000j9,331.51203,-8.4202393
AGS00001hr,189.53453,14.034299
AGS00001w5,245.10924,14.762785
AGS00001xa,175.4975,26.950018

Not in DR9

QSuid,ra,dec
AGS00001qc,195.62202,34.733757

Diffspiked

QSuid,ra,dec
AGS000003j,178.10641,-1.2674584
AGS000005g,327.82533,-0.95701783
AGS000008x,27.305306,14.096745
AGS00000jq,317.04809,9.4422283
AGS00000js,322.70788,10.636529
AGS000014z,232.14244,30.855626
AGS00001c9,232.28023,30.49568
AGS00001f7,223.57399,11.98018
AGS00001j3,192.97821,7.884574
AGS00001l4,226.89694,6.2965129
AGS00001la,231.84046,6.6545883
AGS00001ry,181.87656,33.477182
AGS00001uc,221.29835,25.202274

'Smashed' galaxies

QSuid,ra,dec
AGS000006r,26.625349,-0.61915449
AGS000006v,28.614,-0.79548225
AGS000007c,39.538851,0.86762158
AGS00000b9,164.22383,65.269306
AGS00000fq,198.48143,63.756318
AGS00000gf,239.75179,49.810236
AGS00000hn,326.51212,-7.7836288
AGS000020v,122.25983,11.137602
AGS000025u,195.25068,16.663229
AGS000029w,232.53735,14.812482

'Unrecognized star' contaminates spectrum, photometry

QSuid,ra,dec
AGS00000hm,328.60819,-7.3671116
AGS00000zi,179.80899,10.369002
AGS0000153,234.41569,30.663386
AGS00001dn,255.95089,24.707944

'Star' contaminates spectrum, photometry

QSuid,ra,dec
AGS000004e,252.70968,62.753657
AGS000008i,13.704997,13.952345
AGS00000ag,135.4816,-0.017969568
AGS00000cz,216.61479,1.6116015
AGS00000dk,121.31291,39.099961
AGS00000y2,231.59102,48.704023
AGS0000198,245.91302,26.322251
AGS00001c8,231.69747,28.95394
AGS00001mb,116.22677,47.937679
AGS00001mh,119.86461,53.630081
AGS00001sf,204.46376,35.79883
AGS00001w3,240.76515,21.697465
AGS00001xi,182.16548,28.402867
AGS000026b,185.3928,21.311354

Galaxy overlap

QSuid,ra,dec
AGS000028a,224.78466,12.476627

Spectrum does not include nucleus

QSuid,ra,dec
AGS00000c3,167.22802,2.6765546
AGS00001rm,118.63381,16.804202
AGS0000242,238.46782,10.315059

Multi/other

QSuid,ra,dec
AGS00000f1,218.39609,4.0567239
AGS00000jt,322.99613,10.850892
AGS00000or,190.48442,4.4600355
AGS00000zu,155.00088,8.226057
AGS000029m,216.30363,18.259904

Posted April 8, 2014 2:39 PM

by JeanTate in response to JeanTate's comment.

In a separate, later, post, I'll compare and contrast the QS and QC lists of ppos.

Here is that comparison:
```
  QC   QS Tot. Category
   5    6   11 V_DISP_ERR negative; and/or V_DISP unrealistic
  13    4   17 One or more of the key BPT lines masked
   1    0    1 Unreliable redshift
   6    4   10 Large part of spectrum masked, or missing; other gross spectroscopic problem
   4    1    5 Not in DR9
  12   13   25 Diffspiked
   9   10   19 'Smashed' galaxies
   3    4    7 'Unrecognized star' contaminates spectrum, photometry
   8   18   26 'Star' contaminates spectrum, photometry
   1    0    1 'Stars', both recognized and not, contaminate spectrum, photometry
   3    1    4 Galaxy overlap
   0    3    3 Spectrum does not include nucleus
   0    7    7 Multi/other

  65   71  136 Total
1196 1149 2345 population
```
To me, perhaps the most remarkable thing is that the QC and QS distributions are so similar! 😮

The searches for ppos, among QC and QS objects*, were many and varied; however, while they were certainly not independent - I did a lot of the searching, for example - ppos in the largely subjective categories (which is almost all of them!) were found without consciously directly comparing objects.

A formal, 2x2 chi-square contingency test on each category (excluding those with counts < 5) - and the total - says only three have p < 0.05:
- One or more of the key BPT lines masked; p = 0.035
- 'Star' contaminates spectrum, photometry; p = 0.038
- Multi/other; p = 0.007 (not, strictly speaking, a valid test; E is < 5)
The totals are statistically indistinguishable (p=0.440).
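For anyone who wants to re-run these numbers, here is a minimal sketch of a 2x2 test on the counts in the table above. It assumes the test is the plain Pearson chi-square with 1 degree of freedom and no continuity correction, with rows QC and QS and columns "in this category" and "not in this category", using the populations 1196 and 1149; the post does not say which variant was used, so treat that as a guess. The function name `chi2_2x2` is made up.

```python
import math


def chi2_2x2(flagged_a, total_a, flagged_b, total_b):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns (chi2, p), where p is the upper tail of the chi-square
    distribution with 1 degree of freedom.
    """
    a, b = flagged_a, total_a - flagged_a
    c, d = flagged_b, total_b - flagged_b
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 d.o.f. the survival function reduces to erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


QC_POP, QS_POP = 1196, 1149
rows = [
    ("One or more of the key BPT lines masked", 13, 4),
    ("'Star' contaminates spectrum, photometry", 8, 18),
    ("Multi/other", 0, 7),
    ("Total", 65, 71),
]
for label, qc, qs in rows:
    chi2, p = chi2_2x2(qc, QC_POP, qs, QS_POP)
    print(f"{label}: chi2 = {chi2:.2f}, p = {p:.3f}")
```

Under those assumptions the p-values should land close to the ones quoted above. A version with Yates' continuity correction would give somewhat larger p-values for the small-count categories.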

Is it possible that the subjective selection for "'Star' contaminates spectrum, photometry" is markedly different, between QC and QS? Yes ... and I'll check that shortly.

What about "One or more of the key BPT lines masked"? Harder to say ... in my spare time I'll re-check the spectra of "the 1149", to see if there are any I missed (there's no doubt that 13 QC objects have one or more of the key BPT lines masked).

Are there ~10 QC objects which are borderline, in terms of possibly being "Multi/other"? Yes, that's possible.

However, unless anyone still active in the project - especially one of the SCIENTISTs - suggests otherwise, I'll check only the QS "'Star' contaminates spectrum, photometry" for consistency with the QC ones. Once that's done, I'll proceed to analyze the effects of removing "repeats/dups", excluding the ppos (see here for details of the "repeats/dups").

*objects with 0.02 < z < 0.10 AND estimated z-band absolute magnitude brighter than -20.0

Posted April 8, 2014 4:13 PM
by JeanTate in response to JeanTate's comment.

Is it possible that the subjective selection for "'Star' contaminates spectrum, photometry" is markedly different, between QC and QS? Yes ... and I'll check that shortly.

I compared all 18 QS "'Star' contaminates spectrum, photometry", and all 7 QS "Multi/other" with the 8 QC "'Star' contaminates spectrum, photometry"; my subjective judgement is that 6 of the QS ppos are fairly clearly less problematic than any of the 8 QC ones.

"'Star' contaminates spectrum, photometry":
- AGS00001dz
- AGS00001g0
- AGS000026p
- AGS00002ab
"Multi/other":
- AGS000007f
- AGS00001vw
~~I'll remove them from the two lists of ppos above (later).~~ Done.

Posted April 8, 2014 8:22 PM
by JeanTate

From the Clean "021020" galaxies: 11 April catalogs, comparisons, and discussion thread, one more object - a QS galaxy - to exclude, as a ppo. Details are in this post in that thread:

I would recommend excluding just one more object, the QS ObjId 587735241176514586, AGS000010e (originally also QC object AGS00003b8). Why? Because its secure classification as an asymmetric galaxy, and as a merging galaxy, would require us to understand why the vote fractions are so different, in the 'nodup' and '11Apr' catalogs. For just one object, I don't think it's worth the effort.

Posted April 21, 2014 1:46 PM

by mlpeck

For the sake of completeness I've run my SSP+stars fitting routine on all of the spectra that JeanTate suspected of being contaminated with a foreground star. I've added a little bit to the estimates returned -- the routine now calculates the median star contribution over all good pixels and the average percentage contribution (relative to the total fit flux) in the wavelength intervals [3750,4150], [4840,5030], [6520, 6620] Å. These wavelength ranges include the one that was used to select the quench sample, which also includes the 4000 Å break and Hδ; the area around Hβ - [O III] 5007; and the area around Hα and the [N II] doublet.

I also calculate the "Bayesian Information criterion" for each fit both with and without foreground stars included in the fit. The BIC is defined as

BIC ≈ -2 ln(L) + k ln(N)

where ln(L) is the log-likelihood of the data given a model, k is the number of parameters and N the number of data points (this is eqn. 5.35 in Ivezic et al.). In the tables below delta_bic is the difference in BIC between the fit with stars and the one without. A negative value indicates an improvement after adjusting for the additional number of parameters in the fit, which is 12.
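As a small worked illustration of how delta_bic behaves (not the actual fitting code; the function names and the example numbers are hypothetical):

```python
import math


def bic(log_likelihood, n_params, n_points):
    """Bayesian Information Criterion: -2 ln(L) + k ln(N)."""
    return -2.0 * log_likelihood + n_params * math.log(n_points)


def delta_bic(loglike_with, loglike_without, k_without, n_points, n_extra=12):
    """BIC of the fit with stars minus BIC of the fit without them.

    n_extra is the number of parameters added by the star templates
    (12 in the post). Negative values favor including the stars.
    """
    with_stars = bic(loglike_with, k_without + n_extra, n_points)
    without_stars = bic(loglike_without, k_without, n_points)
    return with_stars - without_stars


# If the stars do not improve the likelihood at all, the penalty term
# alone remains: 12 * ln(N). The values here are purely illustrative.
print(delta_bic(-100.0, -100.0, k_without=5, n_points=1000))
```

In other words, the fit with stars has to raise ln(L) by more than 6 ln(N) before delta_bic goes negative.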

First, here are the controls:

         uids  med pct_3750_4150 pct_4840_5030 pct_6520_6620 delta_bic
1  AGS00002c6 17.2         34.74          21.9          15.5        39
2  AGS00002et  3.0          0.82           1.8           2.8        86
3  AGS000043e  1.7          1.46           1.4           1.6        89
4  AGS00002ra 12.4          3.77           7.9          11.1      -221
5  AGS000039s 14.8         19.35          16.7          12.9        34
6  AGS000039v  5.8          8.51           6.3           5.1        85
7  AGS000043a 21.6         14.31          16.6          21.2        68
8  AGS000044n  3.7          2.22           2.7           3.5        84
9  AGS00004af  5.3          3.57           4.3           5.3        60
10 AGS00004h4  7.2         10.20           6.9           7.2        38
11 AGS00004mf 17.0          5.36          11.9          18.1      -214
12 AGS00003jg  2.3          0.86           1.6           2.1        87

And here are the quench objects with suspected foreground stars, plus the ones JeanTate categorized as "multi/other":

         uids   med pct_3750_4150 pct_4840_5030 pct_6520_6620 delta_bic
1  AGS00000hm 18.33          7.01         12.15         20.96     -1990
2  AGS00000zi  4.26          3.06          3.66          4.16        85
3  AGS0000153 22.28         14.14         18.27         20.75       -89
4  AGS00001dn 38.14         60.68         43.76         36.61     -1592
5  AGS000004e  1.85          1.19          1.62          1.84        91
6  AGS000008i  0.00          0.00          0.00          0.00        91
7  AGS00000ag  2.73          0.81          1.49          2.46        91
8  AGS00000cz  4.37          2.16          2.99          4.16        68
9  AGS00000dk  5.15          2.59          3.83          4.79        85
10 AGS00000y2  7.57          3.63          5.21          7.12        33
11 AGS0000198  3.58          2.33          2.67          3.51        88
12 AGS00001c8  5.09          3.25          4.24          4.80        84
13 AGS00001mb  0.00          0.00          0.00          0.00        92
14 AGS00001mh  4.28          1.86          2.88          3.82        72
15 AGS00001sf  4.20          2.35          3.37          4.24        50
16 AGS00001w3  2.95          0.94          1.64          3.15        76
17 AGS00001xi  0.84          0.30          0.57          0.83        91
18 AGS000026b  3.61          2.39          3.20          3.33        88
19 AGS00000f1  3.03          1.37          2.26          2.81        89
20 AGS00000jt  6.82          2.99          4.48          5.94        82
21 AGS00000or  7.06          3.76          5.83          6.44        79
22 AGS00000zu  9.44          7.58          8.10          9.40       -18
23 AGS000029m 11.67          8.63         10.76         11.93        47

I will look at objects with suspected diffraction spikes in the pictures later.

If any scientist co-authors think it is important to use these fits in the final analysis I will document what I did in as much detail as requested (and/or supply code and data). Personally I think you might have a hard time selling this to a referee.

Posted May 21, 2014 4:41 PM

by mlpeck

One more post and I'm done with this topic unless I need to document my fitting procedure.

These are the star contributions for the suspected diffraction spike groups. I'm assuming here that if a diffraction spike happens to cross the fiber during a spectroscopic exposure it will contribute a copy of the spectrum of the star that produced it.

First the controls:

         uids  med pct_3750_4150 pct_4840_5030 pct_6520_6620 delta_bic
1  AGS00002w5 3.60          2.65           3.0          3.70        79
2  AGS00002xj 0.00          0.00           0.0          0.00        91
3  AGS000038m 2.62          0.87           1.5          2.26        89
4  AGS00003bm 4.78          5.98           4.8          4.09        80
5  AGS00003ko 4.88         11.00           5.8          4.46        85
6  AGS00003kq 0.00          0.00           0.0          0.00        91
7  AGS00003m2 3.34          3.68           2.7          3.30        59
8  AGS00003nh 8.15         10.50           7.9          8.12        57
9  AGS00003xn 2.93          1.36           2.2          2.82        90
10 AGS0000419 0.76          3.93           1.5          0.52        91
11 AGS00004hq 2.27          0.92           1.5          2.08        91
12 AGS00004o4 3.70          1.25           2.7          3.60        83

Next quench:

         uids   med pct_3750_4150 pct_4840_5030 pct_6520_6620 delta_bic
1  AGS000003j 2.516         0.985         1.595         2.486        87
2  AGS000005g 5.373         2.587         3.851         5.396      -122
3  AGS000008x 0.016         0.016         0.017         0.014        91
4  AGS00000jq 1.993         1.477         1.795         1.928        90
5  AGS00000js 0.929         0.840         0.835         0.895        91
6  AGS000014z 5.742         3.388         4.513         5.244        84
7  AGS00001c9 7.343         4.277         5.539         5.987        27
8  AGS00001f7 8.566         7.627         7.267         7.485        64
9  AGS00001j3 6.911         2.815         4.334         6.011        26
10 AGS00001l4 6.988         2.819         4.563         6.682        78
11 AGS00001la 3.853         2.501         3.376         3.706        88
12 AGS00001ry 4.046         2.411         3.064         3.684        64
13 AGS00001uc 2.700         1.930         2.323         2.512        90

I checked the one object where there was formally an improvement in the fit with foreground stars added, and frankly I don't see how that result isn't spurious unless there happens to be a completely unresolved foreground star right on top of the galaxy nucleus (which is quite bright).

Posted May 21, 2014 11:00 PM

by JeanTate

I'm slowly checking the 'star within spectroscopic fiber' ppos; here's a very preliminary plot:

The y-axis is frequency, the x-axis is normalized z-score (i.e. mean = 0, stdev = 1), normalized on the subset of 993 "OK !ppo" QS objects (all of which have 0.02<z<0.10 AND Z brighter than -20.0), with bins 0.3 wide; the lowest bin is < -1.35, the highest > 1.35 to 5. The parameter calculated is '3D color distance from mean' derived from the fiber magnitudes (I'll explain later). The blue line shows that the distribution of these colors is very far from being normal! Although there are only 40 in the second subset (orange line, "OK ppo1"), the distribution of their colors is quite similar to that of the 993. The purple "!OK" subset (N=10) may be marginally different (hard to say); the green ("OK ppo2"; N=12) is very likely quite different.
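A rough sketch of the normalization and binning described above, in case it helps anyone reproduce the plot. It assumes the sample standard deviation of the reference subset is used, and that the inner bins run from -1.35 to +1.35 in nine steps of 0.3 with one open bin at each end; the function names are invented, and the '3D color distance' itself is not computed here.

```python
import math
import statistics


def zscores(values, reference):
    """Normalize values using the mean and stdev of a reference subset."""
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)  # sample stdev; an assumption
    return [(v - mean) / sd for v in values]


def bin_index(z, first_edge=-1.35, width=0.3, n_inner=9):
    """Bin 0 holds z < first_edge; the last bin holds everything above
    the final inner edge (+1.35 with the defaults)."""
    if z < first_edge:
        return 0
    return min(1 + math.floor((z - first_edge) / width), n_inner + 1)


def frequencies(values, reference, n_bins=11):
    """Fraction of values falling in each z-score bin."""
    counts = [0] * n_bins
    for z in zscores(values, reference):
        counts[bin_index(z)] += 1
    return [c / len(values) for c in counts]
```

Each subset (the 993, the 40, the 10 and the 12) would be passed as `values` in turn, always with the 993 as `reference`.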

Why is this relevant? Because the 'green' objects are all the QS objects (within this subset of 1084) that I had identified as having a star within (or very close to) the fiber that feeds the SDSS spectrographs. In other words, if there's a bright enough foreground star (within our own galaxy) whose light ends up in the spectrum, the fiber magnitudes (actually colors) will, on average, be different from those of galaxies of the same kind which do not have such foreground stars.

I haven't shown this very clearly yet, not even for the 1083 QS subset; stay tuned! 😃

Posted May 22, 2014 3:16 PM
by JeanTate in response to mlpeck's comment.

This is cool! 😃 As is the previous post! 😄

On the second (diffspikes affecting spectra): about the only circumstance I can think of - off the top of my head - in which anything but a blindingly bright diffspike would affect an SDSS spectrum would be if the 'star' contained a very, very strong emission line.

The geometry of the SDSS telescope and plate (which has the holes in it, to which fibers are attached, to feed the spectrographs) is such that any diffspike will 'sweep over' a fiber, so adding light to the spectrum for just a small fraction of the 90 (?) minute exposure. The 2.5m SDSS telescope mount is alt-azimuth, so if a star is tracked, its diffspikes will appear to rotate (the secondary mirror's support structure - which produces the diffspikes - is fixed, relative to the telescope 'tube').

What is the fraction? It's a function of the (RA, Dec) coordinates of the target galaxy, the time the spectrum was obtained (-> the altitude and azimuth range), the 'distance' of the bright star producing the diffspikes from the target, and the 'width' of the diffspike (these last two are related, but not completely degenerate), and some other - minor? - factors too.

Photometry? Different story.

I'll comment on the first ("BIC") post later.

Posted May 22, 2014 3:36 PM
by JeanTate in response to JeanTate's comment.

Some time ago I used a CasJobs query to obtain the fiber magnitude estimates (and errors) for the QS objects whose Quench spectra (per their DR7 SpecObjId) are still primary in DR8; many thanks to mlpeck for the work he did which made this easy to get.

'Fiber magnitudes' are estimates of the flux (expressed as magnitudes) through the fiber that feeds the spectrographs, in each of the five bands (u, g, r, i, and, confusingly, z). With five bands, there are four independent colors; here are three color-color plots of these, for 993 QS objects (0.02<z<0.10 AND Z brighter than -20.0), the 'OK !ppo' subset:

"OK" refers to a classification of fiber and model magnitudes that I did some time ago (I'll provide details later, if anyone's interested); among the 1085 (of 1149) QS objects for which I have fiber mags, only 11 are "!OK" (not OK). There is no single reason why; some objects have at least one model mag that is fainter than the fiber mag (indicative of some kind of failure of the photometric pipeline; all these QS objects are much bigger than ~1.5" in radius, in all bands); some have huge estimated errors; some have crazy colors; some ...

Here "ppo" stands for "potentially problematic object", and "!ppo" means an object that is not one; of the 66 QS I identified earlier in this thread, 60 are among the 1057. There are 993 "OK !ppo" objects.

While there is probably some coherent structure to the distribution of points in the second and third plots (other than an obvious linear trend), it is not as obvious as the bifurcation in the first plot (what's that due to? My guess: the 4000 Å break, and I'm going to have fun checking this out! 😃). I don't know how this structure in the first plot might affect my analyses, so to be a bit conservative I proceeded with just three colors, g-r, r-i, and i-z.

I transformed each color so that the 993 have zero means; then I calculated the distance - in 3D color space - of each point (representing an object) from the (3D) mean. Here is the distribution of those distances, in bins (thick blue line; x-axis is color distance, y is fraction):

There are 41 "OK ppo1" objects; these are ppos, whose colors I had classified as "OK", and which are not 'star contaminates spectrum, photometry'; the distribution of their color distances is plotted as a thin orange line. That distribution is very similar to that represented by the thick blue line.

There are 11 "!OK" objects; the distribution of their color distances is plotted as a thin purple line; there is an excess in one distance range (and a deficit in another), but otherwise this is even closer to the thick blue line. This is not as odd as it seems; many of the 'problematic' colors/magnitudes for these 11 arise from u-band estimates only, and the u-band is not used in calculating color distance.

The thick green line - "OK ppo2" - represents the distribution of color distances of 12 (of 13) ppo 'star contaminates spectrum, photometry' objects; the 13th is one of the "!OK" objects. Sure looks different from the thick blue, thin orange, and thin purple lines, doesn't it? 😃

Not finished: I need to repeat this analysis for the QC objects, and add the remaining 92 QS objects. And also perform some relevant statistical tests, to see just how different the thick green distribution is ... (oh, and maybe also try to add the fourth color, u-g).
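
For anyone who wants to reproduce the color-distance calculation described above, here is a minimal Python sketch. It assumes the fiber magnitudes are available as (g, r, i, z) tuples, that each color is only shifted to zero mean over the reference subset (no rescaling), and that the z-score is taken on the resulting distances. The function names are illustrative; this is not the code used to make the plots.

```python
import math
from statistics import mean, pstdev


def colors(mags):
    """Return the three colors (g-r, r-i, i-z) from (g, r, i, z) magnitudes."""
    g, r, i, z = mags
    return (g - r, r - i, i - z)


def color_distances(all_mags, reference_mags):
    """Euclidean distance of each object from the reference subset's mean color."""
    ref_colors = [colors(m) for m in reference_mags]
    # Mean of each color over the reference subset (e.g. the "OK !ppo" objects).
    centre = tuple(mean(c[k] for c in ref_colors) for k in range(3))
    return [math.dist(colors(m), centre) for m in all_mags]


def z_scores(values, reference_values):
    """Normalize values using the mean and stdev of the reference subset."""
    mu = mean(reference_values)
    sigma = pstdev(reference_values)
    return [(v - mu) / sigma for v in values]
```

The same reference subset would be passed to both functions, so that every other subset is measured against the same zero point and scale.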

Posted May 23, 2014 10:29 AM
by JeanTate in response to mlpeck's comment.

I don't think I said this before ... this post (and the next) is AWESOME! 😄

Excluding the single 'diffspike' hit (AGS000005g), and the single 'multi/other' one (AGS00000zu), here's a plot of deltaBIC against 'distance' (the estimated distance from the star's photocenter to the center of the fiber, in arcmins), for distances up to 0.06' (both QS and QC combined):

Some details: "unrecognized" means that I think there's a star, but the SDSS pipeline doesn't; "brighter" means an r-band mag of 20 or brighter; "fainter", well, fainter.

Yes, not many data points, but:
- all stars, recognized or not, at distances < ~0.035' have a quantitatively demonstrable effect on the galaxy's spectrum (one exception)
- no stars at distances > ~0.035' have any such effect
I wonder:
- why does the unrecognized star in AGS00002c6 not have a demonstrable effect?
- why does the 16.18 mag star in AGS00004h4, at a distance of just 0.036', have no effect, yet the 19.08 one in AGS00004mf does (distance 0.033')?
A moot quibble: four objects in one or other of my lists of 'star contaminated spectrum, photometry' ppos are not among your outputs: AGS00001g0, AGS00001dz, AGS00002ab, and AGS000026p. Why moot? Because the estimated distances are all >> 0.04'.

To any SCIENTIST reading this: do you know of anything in the literature like this? Is it something which could be generalized and used to identify galaxies whose spectra are potentially contaminated by stars, with or without first selecting suspects (e.g. by selecting for 'STAR closer than 0.04')?

Oh, and did I say these two posts are AWESOME? 😄 😄

Posted May 24, 2014 9:23 AM
by JeanTate in response to mlpeck's comment.

the routine now calculates the median star contribution over all good pixels and the average percentage contribution (relative to the total fit flux) in the wavelength intervals [3750,4150], [4840,5030], [6520, 6620] Å. These wavelength ranges include the one that was used to select the quench sample, which also includes the 4000 Å break and Hδ; the area around Hβ - [O III] 5007; and the area around Hα and the [N II] doublet.

Follow-up: are these wavelength intervals dynamically adjusted for the redshifts of each galaxy you checked against?

Or maybe I simply do not understand ... for any galactic star, these intervals will certainly include the features mentioned. However, for galaxies, there will be only a limited redshift range in which the (distant) galaxy's features will fall within these (observed frame) intervals.

For example: [OIII]5007 will be observed at 5030 Å at z~0.0044, and all galaxies in our selection (0.02<z<0.10) will have any [OIII]5007 lines affected by local stars outside the [4840,5030] Å interval.

Posted May 24, 2014 5:49 PM
by mlpeck in response to JeanTate's comment.

Thanks. I'm just cleaning up loose ends.

do you know of anything in the literature like this?

Tsalmantza and Hogg (2012) did something essentially similar to verify some already known dual redshift systems. Since the paper was mostly algorithm development the application was basically just a proof of concept.

Just for fun I might try to replicate their exact approach (yes I have the code to do so, at least most of it). More interesting would be to see if it can find contaminated spectra in a randomly selected blind sample.

are these wavelength intervals dynamically adjusted for the redshifts
of each galaxy you checked against?

The wavelength intervals are in the galaxy rest frames.
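
To illustrate what rest-frame intervals imply, this sketch shifts the three quoted intervals into the observed frame with λ_obs = λ_rest × (1 + z), at the two ends of the sample's redshift range. The function name is hypothetical and this is not the fitting routine itself.

```python
# Rest-frame intervals quoted above, in Angstroms.
REST_INTERVALS = [(3750.0, 4150.0), (4840.0, 5030.0), (6520.0, 6620.0)]


def to_observed(interval, z):
    """Shift a rest-frame wavelength interval to the observed frame."""
    lo, hi = interval
    return (lo * (1.0 + z), hi * (1.0 + z))


if __name__ == "__main__":
    for z in (0.02, 0.10):  # redshift limits of the sample
        shifted = [to_observed(iv, z) for iv in REST_INTERVALS]
        print(z, [(round(lo, 1), round(hi, 1)) for lo, hi in shifted])
```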

Posted May 24, 2014 8:03 PM
by JeanTate in response to mlpeck's comment.

Thanks! 😄

More interesting would be to see if it can find contaminated spectra in a randomly selected blind sample.

As I understand it, you'll find 'contamination' only if the 'contaminants' are stars, of the kind in your templates. However, if you had a library of 'galaxy' spectra, you could also find overlaps ... except that you'd also have to add an arbitrary redshift (not a problem with foreground stars, in SDSS spectra, as they all have an effective redshift of zero).

Here's a suggestion: what regions within SDSS' spectral wavelength coverage contain highly distinctive features, ones found in particular classes of star? For example, in cool stars there are the various molecular bands; in A stars, nice Balmer absorption lines; ... A pity that foreground planetary nebulae and supernova remnants have surely all long since been found (their spectra are very distinctive).

Posted May 26, 2014 3:38 PM
by JeanTate in response to JeanTate's comment.

Not finished: I need to repeat this analysis for the QC objects, and add the remaining 92 QS objects.

I've (finally!) managed to obtain the necessary data to do these analyses. Using a CasJobs database mlpeck had set up (a big THANK YOU once again), I was able to get the 'fiber data' on QC objects quite easily; of the 1196 with 0.02<z<0.10 AND Z brighter than -20.0, only 13 were missing (vs 92 QS ones¹).

Using the SDSS CrossID tool, I was able to track down 86 (of 92) "missing" QS objects, and 11 (of 13) QC ones. Of those, five QS and four QC ones turned out to be different photometric objects, so of no use in my analyses. Manual searching turned up five (of six) "doubly missing" QS objects, and one of the QC ones (which turned out to be a different photometric object). Leaving one QS and one QC object that simply do not exist in DR8. I'll reconcile these various missing/different objects with the "Not in DR9" ppos later.

I've begun an analysis of the complete sets of 'fiber data', and will write up what I find later. For now, a preliminary plot, to show why astronomy is so terrific (and fun):

Why does that scream 'astronomy is fun!'? Well, that's ~1k QC objects, and here's the (somewhat) corresponding plot for ~1k QS ones (posted earlier/elsewhere):

Notice anything different?

¹ This is probably telling us something important; an investigation for the future perhaps?

Posted May 26, 2014 4:02 PM
by JeanTate in response to JeanTate's comment.

I've begun an analysis of the complete sets of 'fiber data', and will write up what I find later.

There are five bands, so four independent colors. Here's a KS plot for the (r-i) colors, derived from the fiber magnitudes; there are 1073 QS objects, and 1080 QC ones:

I'll crunch the numbers later, but I have no doubt that these two distributions are, statistically speaking, different. And that's not the least bit surprising ... QS objects have nuclear/bulge spectra which differ considerably from the matched QC galaxies.

What does 'clean' mean, here?

For this analysis, I divided the objects in each of the two datasets (QS and QC) into four, independent, subsets:
- ppos which I earlier classed as having a recognized, or unrecognized, star which contaminates the photometry, spectra; called 'star'
- other ppos, but excluding those with 'wonky photometry'; called 'ppo !st !wp'
- objects with 'wonky photometry', excluding those in either of the above subsets; called 'wp !st'
- everything else; called 'clean'
What criteria did I use for 'wonky photometry'?
- in any band, if the estimated fiber magnitude error > 0.3
- ditto, for the 'cModel' magnitude error
- in any band, if a fiber magnitude is brighter than the corresponding cModel magnitude*
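
The three criteria above can be written as a small check. This is a sketch under assumptions: the per-band values are held in dicts keyed by band name, and the optional `tolerance` argument is a hypothetical addition for ignoring trivially small fiber-vs-cModel differences; neither is taken from the actual spreadsheet work.

```python
BANDS = ("u", "g", "r", "i", "z")
MAX_ERR = 0.3  # magnitude-error cut quoted in the criteria


def is_wonky(fiber_mag, fiber_err, cmodel_mag, cmodel_err, tolerance=0.0):
    """Return True if any band fails one of the 'wonky photometry' criteria."""
    for band in BANDS:
        # Criteria 1 and 2: a large estimated error on either magnitude.
        if fiber_err[band] > MAX_ERR or cmodel_err[band] > MAX_ERR:
            return True
        # Criterion 3: fiber brighter than cModel (smaller magnitude = brighter).
        if cmodel_mag[band] - fiber_mag[band] > tolerance:
            return True
    return False
```

With `tolerance=0.0` this is the strict "one strike and you're out" cut; a small positive value lets a difference of a few hundredths of a magnitude in a single band pass.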
Here are the KS plots for all 1143 QS objects and all 1190 QC ones:

I think these two plots are consistent with what I posted earlier; namely, that
- 'non-star' ppos have a fiber color distribution close to that of the 'clean' subsets
- 'star' ppos have a fiber color distribution that is quite different from that of the 'clean' subsets
And one result which is somewhat different:
- objects with 'wonky photometry' have color distributions different from those of the 'clean' subsets.
All three other colors that I looked at - (u-g), (g-r), and (i-z) - seem to show the same patterns; however, the patterns I described seem much clearer in the (r-i) color.

Are these differences statistically significant? And can the same patterns be seen in the cModel colors? Stay tuned! 😃

Oh, and comments welcome, of course (and if you want a copy of my data, just ask!)

*it might be argued that if the difference is only ~0.1 mag, in only one band, this cut is too radical; for those with 'wonky photometry' there are very few which are 'problematic' on just one criterion, in just one band, and none close to being marginal (in this sense)

ETA: something went wrong with some of the plots; ~~I'll replace them with the correct ones later (and note that)~~ I have now replaced them. While I'm at it, I changed the 'co' label to 'wp' (wonky photometry).

Posted May 28, 2014 7:41 PM
by JeanTate in response to JeanTate's comment.
For this analysis, I divided the objects in each of the two datasets (QS and QC) into four, independent, subsets:
- ppos which I earlier classed as having a recognized, or unrecognized, star which contaminates the photometry, spectra; called 'star'
- other ppos, but excluding those with 'wonky photometry'; called 'ppo !st !wp'
- objects with 'wonky photometry', excluding those in either of the above subsets; called 'wp !st'
- everything else; called 'clean'
This isn't all that clear; re-phrasing:
- there are three flags, 'star', 'ppo', and 'wp'
- 'star' is a proper subset of 'ppo', and one of the four subsets plotted; some 'star' are 'wp', but most are '!wp'
- the subset 'ppo !star !wp' is the objects with the 'ppo' flag set, but neither the 'star' nor the 'wp' one
- 'wp !star': objects with the 'wp' flag set, but not the 'star' one; can be ppo or !ppo
- 'clean' objects have neither the 'ppo' nor the 'wp' flag set; by definition, the 'star' flag is not set
Some examples:
- AGS0000153 is an 'unrecognized star' QS object, so a ppo; however it is not a wp one (all its photometry is good). 'star'
- AGS000004e is a 'recognized star' QS object, so a ppo; its i-band fiber magnitude error is 2.47, its z-band one 3.9, etc. Definitely a wp, but 'star' has precedence
- AGS00000b9 is a 'smashed galaxy' QS object, a ppo; while its u-band fiber magnitude is slightly brighter than its corresponding cModel one (the only such 'wonky photometry'), the difference is trivial (21.71 vs 21.72), so it's not a wp; 'ppo !st !wp'
- AGS000014z is a ppo ('diffspike'), but not a 'star'; it's a wp (the u-band fiber error is 9.091 mags, the z-band one 1.779). 'wp !st'
- AGS00002l5* is not a ppo, but a wp (fiber g and cModel z band errors >1; four others between 0.3 and 1; cModel g and z band mags fainter than corresponding fiber ones). 'wp !st'
- AGS00001qg is not a ppo (so also not a star), nor a wp. 'clean'
*the only QC example (but there are both QS and QC objects in all six classes)
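
The precedence among the three flags can be summarised in a few lines of Python. This is only a sketch of the rules as re-phrased above; the function name and the boolean arguments are illustrative.

```python
def subset_label(star, ppo, wp):
    """Map the three flags to one of the four plotted subsets."""
    if star and not ppo:
        # 'star' is a proper subset of 'ppo', so this combination is invalid.
        raise ValueError("a 'star' object must also be a ppo")
    if star:
        return "star"          # 'star' has precedence, even if wp is set
    if wp:
        return "wp !st"        # can be ppo or !ppo
    if ppo:
        return "ppo !st !wp"
    return "clean"


if __name__ == "__main__":
    # Flags (star, ppo, wp) for the six examples listed above.
    examples = {
        "AGS0000153": (True, True, False),
        "AGS000004e": (True, True, True),
        "AGS00000b9": (False, True, False),
        "AGS000014z": (False, True, True),
        "AGS00002l5": (False, False, True),
        "AGS00001qg": (False, False, False),
    }
    for uid, flags in examples.items():
        print(uid, subset_label(*flags))
```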

Posted May 29, 2014 6:31 PM
by JeanTate in response to JeanTate's comment.

I'll reconcile these various missing/different objects with the "Not in DR9" ppos later.

There's just one "Not in DR9" QS object, and four QC ones. In the end, I managed to 'find' 1143 QS objects, and 1190 QC ones. The differences are thus six (QS, 1149-1143) and also six (QC 1196-1190).

Huh?

Well, one of the six QS objects is a ppo, in a different class: AGS00001rm is a 'not main'. And so is one (of six) QC ones: AGS00003xb is a 'smashed galaxy'. And yes, I did not 'find' the sole QS "Not in DR9", nor the four QC ones.

So what's up with AGS00001dx, AGS00001hw, AGS00001s9, and AGS00001xu (QS); and AGS0000474 (QC)?

I don't know. Two of the QS objects - and the one QC one - are not, in fact, in DR9 (AGS00001hw and AGS00001xu; AGS0000474, respectively, so should definitely be added to the ppos); one is part of the same galaxy, but 'not main' (AGS00001dx, ditto), and one is in DR9, but has "-9999" as the value for every (?) photometric parameter (AGS00001s9, ditto).

Posted May 29, 2014 8:46 PM
by JeanTate

I'll crunch the numbers later, but I have no doubt that these two distributions are, statistically speaking, different.

Yep, for all four colors, the QS and QC 'clean' distributions compared have P = 0.000, for the D_n from the cumulative fractions (KS statistic).

To perform the calculations, I used the online form here (I don't know how to calculate P, given D_n, using just a spreadsheet; however, I was able to find D_n from first principles ... it was, um, tiresome). As the max number of values is 1024, I had to first find a way to randomly remove ~50 QS 'clean' and ~60 QC 'clean' objects, which I did (holler if you'd like to know how).
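
For anyone without access to an online calculator, both steps can be sketched in plain Python: D_n from the two empirical cumulative fractions, and P from the standard asymptotic Kolmogorov series with the small-sample correction given in Numerical Recipes. This is an assumption about what such a calculator does, not a description of the form itself; the function names and the fixed seed are illustrative.

```python
import math
import random


def subsample(values, limit=1024, seed=0):
    """Randomly keep at most `limit` values (the form's stated maximum)."""
    values = list(values)
    if len(values) <= limit:
        return values
    return random.Random(seed).sample(values, limit)


def ks_statistic(a, b):
    """Largest gap D_n between the two empirical cumulative fractions."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both samples past x so that tied values are handled together.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided P for a two-sample D_n (an approximation)."""
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.1:
        return 1.0  # the series converges too slowly here; P is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))
```

A P reported as 0.000 simply means the value rounds to zero at three decimal places.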

Are these differences statistically significant? And can the same patterns be seen in the cModel colors? Stay tuned! 😃

The one word answer to the first question is 'no'. 😮

In more detail:
- all four of the QC 'wp !st' distributions have P < 0.05 for the KS statistic, when compared with QC 'clean'; wonky photometry makes for ppos! For QC objects anyway
- for two colors the QC 'st' distributions are marginally statistically significant, when compared with the 'clean' one: fr-fi has P = 0.105, and fi-fz 0.050.
I haven't looked at cModel colors yet. Nor have I tried combining the different color distributions into a 3D or 4D form, and checking it to see if 'star's are distributed differently (statistically speaking).

Posted May 30, 2014 8:25 PM
by JeanTate in response to JeanTate's comment.

I haven't looked at cModel colors yet. Nor have I tried combining the different color distributions into a 3D or 4D form, and checking it to see if 'star's are distributed differently (statistically speaking).

I've now finished almost all these (a couple of loose ends to take care of); here's what I found:
- the cModel color distributions are very similar to the fiber color ones, in terms of what pairs have KS statistics that are significant
- in particular, all the 'clean' QS distributions are different from the corresponding 'clean' QC ones (P=0.000 for all four)
- also, all four cModel QC 'clean' vs 'wp !st' distribution pairs are different, per their KS statistics (though the ci-cz pair has P=0.078)
- of the 20 other pairs - QS/QC (2), four colors, 'clean' vs 'st'/'ppo !st !wp', so 2x4x2, plus four QS 'wp !st' - there are four for which P < 0.10; three of these are 'ci-cz' ones (of five such)
- I also compared the ~50 'extra' 'clean' QS objects with the ~1024 ones chosen (at random), and similarly the ~60 'extra' 'clean' QC ones; of the 16 pairs, just one has P < 0.10 (as you'd expect, right?)
- of the 14 3D and 4D combined color parameter distributions I checked, three have P < 0.10; the most interesting one has a P of 0.230 (more on this later)
- one of the QC 'clean' vs 'wp !st' pairs has a P of < 0.05; the only other with such a low P is one of the 'clean' QS vs 'clean' QC pairs (P=0.000).
In my next post I'll do a wrap-up and tidy the loose ends. The conclusion from my analyses seems to be that while 'stars', as a class, do not seem to have colors that are different from the 'clean' objects that they are otherwise drawn from, those QC objects with 'wonky photometry' do.

Posted May 31, 2014 5:01 PM
by JeanTate in response to JeanTate's comment.

Some loose ends.

Start with this:

I also compared the ~50 'extra' 'clean' QS objects with the ~1024 ones chosen (at random), and similarly the ~60 'extra' 'clean' QC ones; of the 16 pairs, just one has P < 0.10 (as you'd expect, right?)

Here are the 16 P values, from low to high: 0.083, 0.279, 0.363, 0.363, 0.386, 0.434, 0.471, 0.500, 0.702, 0.702, 0.779, 0.780, 0.840, 0.847, 0.881, 0.927. Nicely randomly distributed in [0,1], eh?
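
As a rough check of "as you'd expect": if the 16 P values were drawn uniformly from [0,1], the expected number below 0.10 would be 16 × 0.10 = 1.6. The sketch below counts the observed number and also computes the binomial probability of seeing that few or fewer; treating the 16 tests as independent is an assumption made only for illustration.

```python
from math import comb

p_values = [0.083, 0.279, 0.363, 0.363, 0.386, 0.434, 0.471, 0.500,
            0.702, 0.702, 0.779, 0.780, 0.840, 0.847, 0.881, 0.927]
threshold = 0.10

n = len(p_values)
observed = sum(p < threshold for p in p_values)  # how many fall below the cut
expected = threshold * n                         # uniform expectation


def prob_at_most(k, n, q):
    """Binomial probability of k or fewer successes in n trials."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    print(observed, expected, prob_at_most(observed, n, threshold))
```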

Other loose ends, as bullets:
- among the 86 KS tests I did, 13 gave P=0.000; if the P's are expected to be distributed randomly in [0,1]* there should be exactly zero of these. Nine of these 13 are QS-QC 'clean' pairs; the other four are QC-wp pairs
- Among the remaining 73, those with P<0.110 seem overrepresented; perhaps this points to some differences that would be real but for small numbers of objects in the relevant class? For example, there are 61 QC wp objects, but only 17 QS ones; if there were more QS wp objects...?
- the QS 'clean' distribution is the same as that of the QC one, in the 'fiber 3D color space' I used (P=0.230). Remarkable, and unexpected. Worth exploring further?
- I combined the QS and QC 'star' datasets, and compared the resulting distribution with both the QS and QC clean ones, in this fiber 3D color space. P fell, to 0.126 compared with QS 'clean', and 0.062 (QC 'clean'). This suggests that in a bigger universe with ~5k QS and QC objects, and the same fraction of 'stars', 'stars' would have statistically different distributions of colors
- combining QS and QC 'wp !st' produces a fiber 3D color distribution that is statistically different from both the QS 'clean' (P=0.097) and QC 'clean' (P=0.046) ones
- doing the same with the 'ppo !st !wp' objects reduces P, but leaves it well over 0.100
- bad photometry and bad spectroscopy are essentially independent. In QS and QC combined, there are 39 'bad spectroscopy' ppos, among them just three with 'wonky photometry', which is about the same percentage as 'wp' among the total population (7.7% vs 3.3%); for 'bad photometry' ppos the wp fraction is 22.6%
What's the deal with 'wonky photometry' objects? Why are there so many more QC ones (61) than QS (17)? Why does having wonky photometry mean the distributions of fiber (and cModel) colors are different from those of 'clean' objects (the two should be essentially independent)? I have looked into this, and it's going to take me well away from ppos, so I'll start a new thread on it.

Wrap-up next post.

*they're not, for several different reasons; however it's not a bad zero-th level approximation, is it?

Posted June 1, 2014 3:49 PM
by JeanTate in response to JeanTate's comment.

Wrap-up next post.

Here's something mlpeck wrote, on page 1 of the Clean "021020" galaxies: 11 April catalogs, comparisons, and discussion thread; it will serve as a good way to introduce my wrap-up:
I have at least three issues with this exercise:
- Missing data and outliers happen, and it is very much not standard practice to throw out entire objects in a multidimensional data set because some items are missing or suspect.
- Your methodology is subjective; it's not reproducible; it's not scaleable.
- As far as I can remember you haven't shown that any object is actually problematic.
I'm actually quite shocked that no practicing scientist has had a word to say about this, although sadly I'm also not at all surprised since there has been precious little communication from any of the "science team."
Take "this exercise" to refer to the methods I used to identify the 65 QS and 65 QC ppos, in this thread.

With the benefit of what's been posted here in the last couple of months or so, what might be the reasons for throwing out entire objects? Here are some, with reference to the class of ppo to which they refer:
- QS objects were selected by an analysis of SDSS spectra. We do not know how 'bad spectra' (however defined) might affect the selection of such objects; as far as we know, no analysis was done to estimate 'completeness and contamination' (etc). In the absence of such inputs, we can be conservative - e.g. throw out all objects with 'bad spectra' - or not (accept everything), or something in between. If we take the first approach, then "V_DISP_ERR negative; and/or V_DISP unrealistic" objects get thrown out, as do "Large part of spectrum masked, or missing; other gross spectroscopic problem" and "Unreliable redshift".
- In any analyses involving BPT diagrams, we can throw out objects whose spectra are problematic - in an objective (etc) sense - with regard to one or more of the four lines we use to construct such diagrams; i.e. "One or more of the key BPT lines masked". Of course, for other analyses, these objects do not need to be thrown out; however, we should first agree whether we will use a "one strike and you're out" cut, or do analyses with different numbers of objects (or something in between). We have not had such a discussion.
- If the only data we had was from DR7, there would be no "Not in DR9" class of ppo. If we accept that the processing done to produce DR9 included the removal of ppos (in general), then we either decide to use this as an objective (etc) cut or not; either way, we should be clear what we are doing.
- mlpeck's analysis (upthread) shows that overlapping stars* can and do contaminate the regions around the four lines used to make BPT diagrams; for those objects where this was shown - objectively - they get tossed (the rest lose their ppo hats, unless they are ppos for another reason).
- As described (upthread), the method used to select objects in remaining ppo classes^ is not objective, reproducible, or scalable. If we decide those attributes are important, then all these objects lose their ppo hats.
- However, as I discovered quite recently, cutting on fiberMagErr and/or cModelMagErr - a quite objective, reproducible, and scalable method - puts the ppo hats back on: 22 ppos 'change hats' (and 63 previously 'clean' objects now become ppos).
I'll gladly provide lists, and data, upon request.

However, unless the "science team" returns, I'm done.

*'Star' contaminates spectrum, photometry, 'Unrecognized star' contaminates spectrum, photometry, and 'Stars', both recognized and not, contaminate spectrum, photometry

^Diffspiked, 'Smashed' galaxies, Galaxy overlap, Spectrum does not include nucleus, and Multi/other

Posted June 3, 2014 4:33 PM