Duplicates
-
by JeanTate
DEFINITION: A duplicate is a single object (usually a galaxy) that appears more than once in the combined QS+QC database.
What good are duplicates?
- They give a quasi-independent estimate of classification variation.
- other benefits?
Why are duplicates bad news?
- {Insert reasons here}
This thread is devoted to listing them, by "uid" (a.k.a. AGS ID), and noting which database they're in (either QS or QC).
How many are there? I have a list of ~50, so far.
Posted
-
by JeanTate
QS: AGS00000hz (DR7 ObjId 587727178451320944)
QC: AGS00002i4 (same)
Posted
-
by JeanTate
QS: AGS00000j6 (DR7 ObjId 587731514231619685)
QS: AGS0000080 (587731514231619686):
Posted
-
by JeanTate
QC: AGS00003xd (587736545780629664)
QS: AGS00002aw (same):
Posted
-
by JeanTate
QC: AGS00002q4 (587734690889335050)
QC: AGS00003na (588023046935085316):
Posted
-
by JeanTate
QC: AGS00002kt (588017729229488185)
QC: AGS000034b (587739405719437440):
Posted
-
by JeanTate
QS: AGS00000yl (587732771574055241)
QC: AGS00004ho (same):
Posted
-
by JeanTate
QC: AGS00004gs (588017991234879703)
QS: AGS00001k4 (same):
Posted
-
by JeanTate
QS: AGS00001hg (587735347477741856)
QC: AGS00002w0 (same):
Posted
-
by JeanTate
QC: AGS00003ll (587736477056434335)
QS: AGS00001k6 (same):
Posted
-
by JeanTate
QC: AGS00003ep (588017625609404654)
QC: AGS00002x2 (587744874785079427):
Posted
-
by JeanTate
QS: AGS00001je (588017991226032299)
QC: AGS00004c3 (same):
Posted
-
by JeanTate
QC: AGS00003tu (587742061047775665)
QS: AGS000020x (same):
Posted
-
by JeanTate
QC: AGS00003wr (587744873716384075)
QS: AGS0000215 (same):
Posted
-
by JeanTate
QC: AGS00003fg (587726033847320693)
QC: AGS000032d (587742590404067614):
Posted
-
by JeanTate
QS: AGS00000kg (587727220869366024)
QC: AGS00003tr (same):
Posted
-
by JeanTate
QC: AGS00003xw (587742576459251895)
QS: AGS0000294 (same):
Posted
-
by JeanTate
QC: AGS00003h9 (587742773489107099)
QS: AGS000025r (same):
Posted
-
by JeanTate
QS: AGS000020j (587742013271572599)
QC: AGS00002p1 (same):
Posted
-
by JeanTate
QC: AGS00003ts (587741602571747497)
QC: AGS00003cf (587742572148818133):
Posted
-
by JeanTate
QC: AGS00003pn (588023721246392440)
QC: AGS000040l (588017726556471481):
Posted
-
by JeanTate
QS: AGS00001nc (587739377231855822)
QC: AGS00004c5 (same):
Posted
-
by JeanTate
QS: AGS00001v8 (587739829281816750)
QC: AGS00003is (same):
Posted
-
by JeanTate
QS: AGS000026z (587742013285007551)
QC: AGS00003w2 (same):
Posted
-
by JeanTate
QS: AGS00001u8 (587739809953284249)
QC: AGS00004cv (same):
Posted
-
by JeanTate
QS: AGS0000236 (587742062143144019)
QC: AGS00002z3 (same):
Posted
-
by JeanTate
QS: AGS00001ci (587736942531510660)
QC: AGS00003tg (same):
Posted
-
by JeanTate
QS: AGS00001dp (587733399190307310)
QC: AGS00002xx (same):
Posted
-
by JeanTate
QS: AGS0000233 (587741725502275691)
QC: AGS00003dj (same):
Posted
-
by JeanTate
QS: AGS000027k (587741727114395852)
QC: AGS00002wy (same):
Posted
-
by JeanTate
QS: AGS000019f (587736919433281840)
QC: AGS00003c2 (same):
Posted
-
by JeanTate
QC: AGS00002j1 (587739720293023848)
QC: AGS00003fb (587732578299281748):
Posted
-
by JeanTate
QS: AGS000012x (587733398649503982)
QC: AGS00002qc (same):
Posted
-
by JeanTate
QC: AGS000046u (587739296163037307)
QC: AGS00003sp (587739507162480738):
Posted
-
by JeanTate
QS: AGS000010e (587735241176514586)
QC: AGS00003b8 (same):
Posted
-
by JeanTate
QC: AGS00002mx (588017605238521897)
QC: AGS00002sy (587731511538745451):
Posted
-
by JeanTate
QS: AGS00001sa (587739303139475618)
QC: AGS00002v5 (same):
Posted
-
by JeanTate
QS: AGS00001qr (587739407321071728)
QC: AGS0000449 (same):
Posted
-
by JeanTate
QS: AGS000013g (588017625631752405)
QC: AGS00003e3 (same):
Posted
-
by JeanTate
QC: AGS00002bi (587732484345036995)
QC: AGS000039z (587730846890787554):
Posted
-
by JeanTate
QC: AGS00002l2 (588017720649384015)
QC: AGS00003oa (587733609627254987):
Posted
-
by JeanTate
QC: AGS00002og (588298663584596167)
QC: AGS00002v3 (588009372832301090):
Posted
-
by JeanTate
QC: AGS00004cr (587731868021686336)
QC: AGS00004lx (587724197205835909):
Posted
-
by JeanTate
QS: AGS000009s (588007004165701773)
QC: AGS00004mc (same):
Posted
-
by jules moderator
Nice work Jean. Surely this is a priority. We need clean datasets to work with before touching Tools. Maybe the team has a clever way of identifying and removing them?
Posted
-
by JeanTate in response to jules's comment.
Thanks jules.
I think I've identified them all - there are 43 pairs.
So now, how to decide which to remove?
QS-QC pairs are easy: remove the QC object.
QC-QC ones are more difficult: pick one at random? Or analyze them, to see if there's a systematic effect?
Anyway, removing can only be done by the Science Team (except in your own copy) ...
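A minimal sketch of that removal rule, for anyone working on their own copy. The uids are pairs already posted in this thread; the tuple layout, the function name choose_removals and the fixed seed are assumptions made for illustration. A QS-QS pair is not covered by the rule above, so the sketch treats it like a QC-QC pair, which is also an assumption.

```python
import random

# Each pair is ((sample, uid), (sample, uid)); uids are taken from posts in this thread.
pairs = [
    (("QS", "AGS00000hz"), ("QC", "AGS00002i4")),  # QS-QC pair
    (("QC", "AGS00002q4"), ("QC", "AGS00003na")),  # QC-QC pair
    (("QS", "AGS00000j6"), ("QS", "AGS0000080")),  # QS-QS pair
]

def choose_removals(pairs, seed=0):
    """Return one uid to drop per duplicate pair (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the random picks are reproducible
    drop = []
    for a, b in pairs:
        if {a[0], b[0]} == {"QS", "QC"}:
            # Mixed pair: keep the QS object, drop the QC one.
            drop.append(a[1] if a[0] == "QC" else b[1])
        else:
            # Same-sample pair: no preference stated, so pick one at random.
            drop.append(rng.choice([a, b])[1])
    return drop

removals = choose_removals(pairs)
```

Picking at random keeps the choice unbiased, but it throws away whatever the two sets of measurements could say about a systematic effect, so it may be worth analyzing the QC-QC pairs first.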
Posted
-
by JeanTate
I think the ST may have tried to identify duplicates before they made the final selection. If so, they may have used a method similar to that of Land et al. (2008) - the 'spin bias' paper - namely, "We use the maximum PETRORAD_R of a pair to determine if their OBJIDs actually point to the same object". However, this method fails when the measure of size - petro_R50 in our case - is wrong (too low), as it is for many of the objects in the "Outliers - collect them here!" thread.
The method I used is more robust, in terms of identifying duplicates, although it certainly involves more manual effort.
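To make the failure mode concrete, here is a rough sketch of one possible reading of the size-based test: two catalogue entries count as the same object if their angular separation is smaller than the larger of their two radii. This is an interpretation of the quoted sentence, not the paper's actual procedure, and the coordinates, radii and dictionary keys below are made up for illustration.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (inputs in degrees), haversine form."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600

def looks_like_same_object(a, b):
    """Assumed criterion: separation below the larger of the two radii (arcsec)."""
    radius = max(a["petro_r"], b["petro_r"])
    return angular_sep_arcsec(a["ra"], a["dec"], b["ra"], b["dec"]) < radius

# Made-up entries 1 arcsec apart on the sky.
good_a = {"ra": 180.0, "dec": 0.0, "petro_r": 5.0}
good_b = {"ra": 180.0 + 1 / 3600, "dec": 0.0, "petro_r": 4.0}

# Same positions, but both radii badly underestimated.
bad_a = dict(good_a, petro_r=0.5)
bad_b = dict(good_b, petro_r=0.4)

matched = looks_like_same_object(good_a, good_b)  # radii are sensible
missed = looks_like_same_object(bad_a, bad_b)     # radii are too low
```

With sensible radii the pair is flagged; with both radii too low the same pair slips through, which is the weakness described above.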
Posted
-
by mlpeck in response to JeanTate's comment.
I haven't looked at the duplicates, but one way you might get them is that some objects were observed more than once and therefore have multiple spectra available. There is supposed to be exactly one "science primary" spectrum for any given object. If that is the cause of some or all of the duplications, the obvious choice is to pick the primary one. There is a bitfield for that in the primary HDU (I think) of the spectrum FITS files.
Posted
-
by JeanTate
One thing which these duplicates can tell us is how reliable the various parameter estimates are.
In the case of the 13 QC-QC duplicates:
- all the Log_mass (pairs of) values are the same
- ditto the redshifts
- the u-band mag differences range from 0.03 to 5.09 (!)
- the minimum model mag differences are all ~<0.03 (except for the g-band, 0.11)
- the max model mag differences decrease monotonically with wavelength (u, g, r, i, z): 5.09 -> 1.4 -> 1.18 -> 1.16 -> 0.78
- there is a correlation between the absolute mag diffs and the Petro_R50 diffs, in the direction you'd expect (bigger Petro_R50, brighter mag), R² = 0.56
HOWEVER, three (of only 13!) duplicate pairs have the model mags increasing (i.e. getting fainter) with increasing Petro_R50!! 😮
Sure, 13 isn't such a big sample, and 3/13 is rather underwhelming (and for one of these three the differences are pretty marginal) ... I wonder what a detailed look at the (considerably bigger) QS-QC duplicates might show?
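For anyone wanting to repeat this on the QS-QC pairs, a small sketch of the two checks: the squared Pearson correlation between the Petro_R50 differences and the model mag differences, and a count of pairs going the "wrong" way. The numbers below are invented for illustration (they are not the values from the 13 pairs), and the sign convention (second object minus first) is an assumption.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Invented (delta Petro_R50, delta model mag) per duplicate pair,
# each taken as second object minus first object.
deltas = [(1.0, -0.5), (2.0, -1.1), (0.5, -0.2), (1.5, 0.3)]

d_r50 = [d[0] for d in deltas]
d_mag = [d[1] for d in deltas]
r_squared = pearson_r(d_r50, d_mag) ** 2

# Expected direction: bigger Petro_R50 goes with a brighter (smaller) mag,
# so the two differences should have opposite signs. Same sign = unexpected.
unexpected = [(dr, dm) for dr, dm in deltas if dr * dm > 0]
```

Taking signed differences, rather than absolute ones, is what lets the sign test pick out the pairs that get fainter with increasing Petro_R50.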
Posted
-
by JeanTate in response to JeanTate's comment.
I started to do some more digging, and came up with "Oh dear!"
Posted
-
by lpspieler moderator
Hmm, when choosing the QS and QC samples according to some general properties, it is of course possible that some galaxies fulfill both sets of constraints. I don't know if this alone would spoil the statistics and stochastics.
Things get weird, though, when QS and QC are not totally independent otherwise. And indeed they aren't: according to Laura, QC is designed to provide a partner with similar z and mass for every galaxy in QS. Hope it didn't happen that some galaxies served as partners for themselves?
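If the partner assignments were available, the self-partner question could be checked directly. A sketch, assuming a table that maps each QS uid to its QC partner uid: the partner_of entries below are purely hypothetical (no such table has been posted here), while the same_galaxy entries are QS-QC duplicate pairs from this thread.

```python
# QS-QC duplicate pairs from this thread: QS uid -> QC uid of the same galaxy.
same_galaxy = {
    "AGS00000hz": "AGS00002i4",
    "AGS00000yl": "AGS00004ho",
}

# HYPOTHETICAL partner table (QS uid -> QC partner uid); not real matching data.
partner_of = {
    "AGS00000hz": "AGS00002i4",
    "AGS00000yl": "AGS00003xd",
}

# A QS galaxy is its own partner if its assigned QC partner is its duplicate.
self_partnered = [qs for qs, qc in partner_of.items()
                  if same_galaxy.get(qs) == qc]
```

With the made-up table above, only the first galaxy would be flagged; the real answer depends on the actual matching, which only the Science Team has.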
Posted