Galaxy Zoo Starburst Talk

R code

  • mlpeck by mlpeck

    I'm going to try to make as much as possible of the R code that I've used for this project publicly available. These will be functions that I've written for repetitive tasks that I want to be reproducible, not stuff I've done interactively.

    I haven't yet added lines indicating authorship or license. Consider them licensed under the most permissive terms you can find. I'm OK with the WTFPL myself.

    Here's the first tranche. More may follow. All of these may be modified at any time and I'll try to keep the dropbox copies in sync with my local ones.

    • index_id - indexes one table against another on a common id.
    • index_pos.r - index by position match on ra, dec supplied in decimal degrees. Note: the default position tolerance of 1" may be too tight.
    • jdplot - scatterplot with contours to indicate density of points. Axis limits are adjustable and can be reversed (for color-magnitude plots, for example). Also, some pretty rainbow palettes.
    • median.i - functions for bootstrapping. Requires the package boot, which is part of the standard R distribution.
    • cosmo - cosmological distance calculator. Also luminosities in various forms.
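
    To give a sense of what matching by position involves, here is a minimal sketch in base R. It is not the code in index_pos.r: the function names ang_sep_arcsec and match_by_position, their arguments, and the toy coordinates are made up for illustration. It assumes coordinates in decimal degrees and a tolerance in arcseconds, and uses the haversine formula with a brute-force nearest-neighbor search.

```r
# Illustrative sketch only; names and interface are assumptions, not index_pos.r.

# Angular separation in arcseconds (haversine formula); inputs in decimal degrees.
ang_sep_arcsec <- function(ra1, dec1, ra2, dec2) {
  d2r <- pi / 180
  h <- sin((dec2 - dec1) * d2r / 2)^2 +
    cos(dec1 * d2r) * cos(dec2 * d2r) * sin((ra2 - ra1) * d2r / 2)^2
  2 * asin(pmin(1, sqrt(h))) / d2r * 3600
}

# For each (ra1, dec1), return the index of the nearest (ra2, dec2),
# or NA if even the nearest one is farther away than tol arcseconds.
match_by_position <- function(ra1, dec1, ra2, dec2, tol = 1) {
  vapply(seq_along(ra1), function(k) {
    sep <- ang_sep_arcsec(ra1[k], dec1[k], ra2, dec2)
    j <- which.min(sep)
    if (sep[j] <= tol) j else NA_integer_
  }, integer(1))
}

# Toy catalogs (invented values)
ra_ref <- c(10, 20, 30)
dec_ref <- c(0, 5, -5)
ra_new <- c(20 + 0.5 / 3600, 30 + 3 / 3600, 10)
dec_new <- c(5, -5, 0)

idx_tight <- match_by_position(ra_new, dec_new, ra_ref, dec_ref, tol = 1)
idx_loose <- match_by_position(ra_new, dec_new, ra_ref, dec_ref, tol = 5)
```

    In this toy example the second object sits about 3" from its counterpart, so it goes unmatched at the 1" tolerance and is picked up at 5". Matching on a common id is simpler: base R's match() applied to the two id columns returns, for each row of the first table, the corresponding row of the second.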

    As mentioned below, I'm going to try maintaining my code with a real version control system. To get started, I've created a repository on github for R software that I either wrote for this project or that was inspired by it. Right now there are routines for cross-indexing data tables in various ways, three different routines for spectroscopic classification, several functions for nonparametric bootstrapping, and the quench/control matching algorithm discussed in more detail below.

    The URL for the repository is https://github.com/mlpeck/gzquench_code. The software files on dropbox will probably go away since it is much easier to keep files in sync using git.

    I'm planning to turn the cosmological distance calculator (cosmo, listed in the struck out portion of this post) into an R package. I hope that will be ready in the near future.


  • mlpeck by mlpeck

    This is the function I wrote to match quench control objects to their quench sample counterparts by solving a variant of the assignment problem. The R code is at https://github.com/mlpeck/gzquench_code/blob/master/qcmatch.r (originally posted at https://www.dropbox.com/s/sq2pstgmdfz2grv/qcmatch.r).

    Also required is the R package lppuw, which I authored and maintain. The Windows binary is at http://wildlife-pix.com/rpackages/lppuw_1.0.2.zip. Source is at http://wildlife-pix.com/rpackages/lppuw_1.0.2.tar.gz.

    The Windows binary should work out of the box. If you install from source you will also need the open source linear program solver lpsolve. That can be downloaded at http://sourceforge.net/projects/lpsolve/files/lpsolve/5.5.2.0/. You probably want to build from source. The library file liblpsolve55.a should be copied to a location where your C compiler can find it.
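
    For readers unfamiliar with the assignment problem: given a cost for pairing each sample object with each candidate control, the goal is the one-to-one pairing with the smallest total cost. The sketch below is a toy illustration in base R and is not qcmatch.r. The function names and the cost matrix are invented, and it simply enumerates every permutation, which is only feasible for a handful of objects; a realistic problem needs a proper solver such as lpsolve.

```r
# Toy illustration of the assignment problem; not the method used in qcmatch.r.

# All permutations of the vector v, returned as a list of vectors.
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (k in seq_along(v)) {
    for (p in permutations(v[-k])) out[[length(out) + 1]] <- c(v[k], p)
  }
  out
}

# Brute-force search for the one-to-one assignment with minimum total cost.
# cost[i, j] is the cost of pairing row object i with column object j.
solve_assignment_bruteforce <- function(cost) {
  stopifnot(nrow(cost) == ncol(cost))
  perms <- permutations(seq_len(ncol(cost)))
  totals <- vapply(perms,
                   function(p) sum(cost[cbind(seq_along(p), p)]),
                   numeric(1))
  best <- which.min(totals)
  list(assignment = perms[[best]], total = totals[best])
}

# Invented cost matrix: rows are sample objects, columns are candidate controls.
cost <- rbind(c(1,  2, 9),
              c(1, 10, 9),
              c(8,  7, 3))
res <- solve_assignment_bruteforce(cost)
```

    Here picking the cheapest control for each row in turn would pair row 1 with column 1 and force row 2 into an expensive match (total 14), while the optimal assignment pairs row 1 with column 2 and row 2 with column 1 for a total of 6. That difference is the reason to treat matching as an assignment problem rather than a greedy search.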

    Note added 13 May 2014: I may at some point in the near future move all of my R software to github. I will update the address if that happens.


  • mlpeck by mlpeck

    I've decided to try using a real version control system to help maintain my software, and as a first step in doing so I've put R software written mostly for this project into a git repository, with a public version on github at https://github.com/mlpeck/gzquench_code.

    Open source software advocates generally recommend explicitly licensing all software even -- perhaps especially -- if you don't care what people do with it. I've decided to use the MIT license because the license text is short and the terms are about as permissive as possible.


  • JeanTate by JeanTate in response to mlpeck's comment.

    Thanks for this! 😃

    A chance to learn about version control, and a fair bit more too.

    Hope you don't mind, but I've got a few questions ...

    • Why git (and not some other version control system; there are plenty out there)?
    • What's "Blame" (when you click on one of the documents/files, one of the options - along with Open, Raw, and History)?
    • If I understand correctly, most of the 11 files/documents are R code, which can be copy/pasted (or downloaded), and should run (assuming appropriate environments, etc); is that right?


  • mlpeck by mlpeck in response to JeanTate's comment.

    why git

    I've never used a version control system, so I'm not invested in any. Basically git caught my eye because github is popular with Python developers and also with GZ scientists and technical people. Also, git is installed on my Linux machine. So is svn, and probably others that I don't know about.

    What's "Blame"

    According to the man page, "git blame" shows which revision and author last modified each line of a file.

    and should run (assuming appropriate environments, etc); is that right?

    Well, I hope so. Most of these routines were intended either to perform a line-by-line match of data sets I've downloaded from SDSS to the ones we were initially provided through Zooniverse tools, or to work directly on those downloaded data sets. Right now the only file that might be useful outside the context of this project is the one containing functions for bootstrapping.
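
    As a rough idea of what a nonparametric bootstrap of a median looks like, here is a minimal base R sketch. It is not taken from the repository: the names median_i and bootstrap_stat and the simulated data are assumptions. The statistic is written in the (data, indices) form that the boot package expects, so the same function could also be passed to boot::boot(x, median_i, R = 999).

```r
# Illustrative sketch only; function names and data are invented.

# Statistic in the (data, indices) form used by the boot package.
median_i <- function(x, i) median(x[i])

# Plain base R resampling: draw R samples of indices with replacement
# and evaluate the statistic on each one.
bootstrap_stat <- function(x, statistic, R = 999) {
  n <- length(x)
  vapply(seq_len(R),
         function(r) statistic(x, sample.int(n, n, replace = TRUE)),
         numeric(1))
}

set.seed(42)
x <- rexp(200, rate = 1)                  # simulated skewed data
reps <- bootstrap_stat(x, median_i, R = 999)
ci <- quantile(reps, c(0.025, 0.975))     # 95% percentile interval
```

    The percentile interval in ci is the simplest choice; the boot package offers several other interval types through boot.ci().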


  • mlpeck by mlpeck

    As promised, I created an R package for my little cosmological distance calculator. It is hosted at https://github.com/mlpeck/cosmo. The Windows binary can be downloaded from https://github.com/mlpeck/cosmo/releases/download/v0.1.1/cosmo_0.1.0.zip, and the source in tar.gz or .zip form can be downloaded by clicking on the "releases" tab on the main project page.

    The main function is "dcos", which calculates the distances given in Hogg's cookbook paper on arxiv.org: Hogg, D.W., 2000, "Distance measures in cosmology".
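
    For orientation, the distance measures in Hogg's paper can be sketched in a few lines of base R for the special case of a flat universe. This is not the dcos interface: the function name flat_distances, its arguments, and the default values H0 = 70 km/s/Mpc and omega_m = 0.3 are assumptions chosen for illustration. It uses the Hubble distance D_H = c/H0, the comoving distance D_C = D_H times the integral of 1/E(z) from 0 to z with E(z) = sqrt(omega_m (1+z)^3 + omega_lambda), and then D_A = D_C/(1+z) and D_L = (1+z) D_C.

```r
# Illustrative sketch for a flat universe only; not the interface of dcos.
c_km_s <- 299792.458  # speed of light in km/s

# Distances in Mpc for a single redshift z > 0.
# H0 in km/s/Mpc; omega_m is the matter density parameter (assumed defaults).
flat_distances <- function(z, H0 = 70, omega_m = 0.3) {
  omega_l <- 1 - omega_m                       # flatness: no curvature term
  inv_E <- function(zz) 1 / sqrt(omega_m * (1 + zz)^3 + omega_l)
  d_h <- c_km_s / H0                           # Hubble distance
  d_c <- d_h * integrate(inv_E, 0, z)$value    # line-of-sight comoving distance
  d_l <- (1 + z) * d_c                         # luminosity distance
  c(comoving = d_c,
    angular = d_c / (1 + z),                   # angular diameter distance
    luminosity = d_l,
    modulus = 5 * log10(d_l) + 25)             # distance modulus, d_l in Mpc
}

d <- flat_distances(0.1)
eds <- flat_distances(3, omega_m = 1)          # Einstein-de Sitter check case
```

    The Einstein-de Sitter case is a handy sanity check because the integral has a closed form, D_C = 2 D_H (1 - 1/sqrt(1+z)), which equals D_H exactly at z = 3.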
