Using Galaxy Zoo for Science

What is Galaxy Zoo?

Galaxy Zoo was a project designed to classify galaxies in the SDSS database by their morphology. Because of the large number of galaxies involved, visual classification by a small number of scientists is impractical, while automated classification tends to introduce biases that are difficult to understand. Instead, Galaxy Zoo relied on more than 100,000 volunteers, who logged on to the Galaxy Zoo website to view SDSS images of 893,212 objects and classify them. Most of these galaxies (738,175) come from the SDSS Main Galaxy Sample; the remaining 155,037 are galaxies with spectroscopic data in SDSS. Volunteers voted to place each object in one of six categories: elliptical galaxies, clockwise spiral galaxies, anticlockwise spiral galaxies, edge-on spiral galaxies, stars/don't know, and mergers. The results are available in the Galaxy Zoo 1 data release.

The Galaxy Zoo 1 Data Release

Data from the project are available on the Galaxy Zoo data page. The data release is described in more detail in Lintott et al. 2011.

Greater/Clean/Superclean and Bias

When classifying galaxies by their vote fractions, there is a trade-off between the number of galaxies left unclassified and the number that are misclassified. The data release defines three criteria that classify galaxies with increasing degrees of certainty, but notes that the best threshold to use depends on what the data are being used for. The "greater" criterion places every galaxy in the category that received the most votes. The "clean" criterion requires at least 80% of the votes to be in a single category, and the "superclean" criterion requires at least 95%. Galaxies that cannot be called either spirals or ellipticals under a given criterion are unclassified.
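
The three criteria can be illustrated with a short Python sketch. It assumes the six vote fractions have already been reduced to four (elliptical "EL", combined spiral "CS", don't know "DK", merger "MG"), and it reads "requiring 80%" as "at least 80%". The function, constant, and key names are hypothetical and are not part of the data release.

```python
# Hypothetical thresholds matching the "clean" (80%) and "superclean" (95%) criteria.
THRESHOLDS = {"clean": 0.80, "superclean": 0.95}

# Only these two categories map to a morphological class; all others are unclassified.
CLASS_NAMES = {"EL": "elliptical", "CS": "spiral"}


def classify(fractions: dict[str, float], criterion: str = "clean") -> str:
    """Classify one galaxy from its vote fractions.

    fractions: maps "EL", "CS", "DK", "MG" to vote fractions, where CS is the
    combined spiral fraction (CW + ACW + EDGE).
    criterion: "greater", "clean", or "superclean".
    """
    if criterion == "greater":
        # Category with the most votes; on an exact tie, max() keeps the first key.
        winner = max(fractions, key=fractions.get)
    else:
        threshold = THRESHOLDS[criterion]
        # At most one category can reach a threshold above 50%.
        winner = next((k for k, v in fractions.items() if v >= threshold), None)
    return CLASS_NAMES.get(winner, "unclassified")
```

For example, a galaxy with 85% of its votes for elliptical is an elliptical under "greater" and "clean" but is unclassified under "superclean". The same function can be applied to raw or debiased fractions.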

Due to the limitations of the SDSS images, faint, distant and/or small spiral galaxies often appear elliptical. The Galaxy Zoo data release accounts for this by providing "debiased" vote fractions. The correction assumes that the fractions of elliptical and spiral galaxies do not vary with distance within the SDSS sample. It is applied to distant galaxies so that, in each bin of size and luminosity, the fractions match the local values. Because the bias calculation uses each object's redshift, it can only be performed for objects with spectroscopic information in SDSS. No correction is applied to account for human error in identifying galaxy types.
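
The idea behind the debiasing step can be sketched in Python. This is a simplified stand-in, not the procedure published in Lintott et al. 2011. It assumes galaxies have already been assigned to hypothetical magnitude/size bins ("bin") and integer redshift bins ("z_bin", with 0 nearest). It then shifts each galaxy's elliptical fraction by the difference between the mean elliptical fraction in its cell and the mean in the nearest populated redshift bin, moving that amount to the spiral fraction.

```python
from collections import defaultdict
from statistics import mean


def debias(galaxies: list[dict]) -> list[tuple[float, float]]:
    """Return (p_el_debiased, p_cs_debiased) for each galaxy, in input order.

    Each galaxy is a dict with hypothetical keys: "bin" (magnitude/size bin
    label), "z_bin" (integer redshift bin, 0 = nearest), "p_el", "p_cs".
    """
    # Mean elliptical fraction in each (magnitude/size bin, redshift bin) cell.
    cells = defaultdict(list)
    for g in galaxies:
        cells[(g["bin"], g["z_bin"])].append(g["p_el"])
    cell_mean = {key: mean(vals) for key, vals in cells.items()}

    # Local reference: the mean in the lowest populated redshift bin of each magnitude/size bin.
    local = {}
    for (b, _z), m in sorted(cell_mean.items(), key=lambda kv: kv[0][1]):
        local.setdefault(b, m)

    result = []
    for g in galaxies:
        # Excess elliptical votes relative to nearby galaxies of the same size and luminosity.
        bias = cell_mean[(g["bin"], g["z_bin"])] - local[g["bin"]]
        total = g["p_el"] + g["p_cs"]
        # Move the excess from elliptical to spiral, keeping both within [0, total].
        p_el = min(max(g["p_el"] - bias, 0.0), total)
        result.append((p_el, total - p_el))
    return result
```

Galaxies in the nearest redshift bin are not shifted, and the combined elliptical-plus-spiral fraction of every galaxy is preserved.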

DR1 Data Tables

Table 2 contains data for 667,945 galaxies. All of these galaxies have redshifts between 0.001 and 0.25, and all have spectra and u- and r-band photometry in SDSS DR7. Galaxies that are extreme outliers in absolute magnitude and/or size are excluded because biases cannot be reliably calculated for them. The table includes the following information:

OBJID - SDSS PhotoObjID for the galaxy
RA - right ascension (HH:MM:SS.ss)
DEC - declination (sDD:MM:SS.s)
NVOTE - total number of votes for this object
P_EL - fraction of votes for elliptical
P_CW - fraction of votes for clockwise spiral
P_ACW - fraction of votes for anticlockwise spiral
P_EDGE - fraction of votes for edge-on spiral
P_DK - fraction of votes for star/don't know
P_MG - fraction of votes for merger
P_CS - fraction of votes for combined spiral (CW+ACW+EDGE)
P_EL_DEBIASED - debiased fraction of votes for elliptical
P_CS_DEBIASED - debiased fraction of votes for combined spiral (CW+ACW+EDGE)
SPIRAL - flag: 1 if identified as spiral by the "clean" criterion using the debiased results, 0 otherwise
ELLIPTICAL - flag: 1 if identified as elliptical by the "clean" criterion using the debiased results, 0 otherwise
UNCERTAIN - flag: 1 if identified as uncertain by the "clean" criterion using the debiased results, 0 otherwise
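
As an illustration of working with these columns, the Python sketch below converts the sexagesimal RA and DEC strings to decimal degrees and selects the OBJIDs of galaxies with a given flag set. It assumes each row has already been read into a dictionary keyed by the column names above (for example with csv.DictReader); the function names are hypothetical.

```python
from typing import Iterable, Iterator


def sexagesimal_to_degrees(ra: str, dec: str) -> tuple[float, float]:
    """Convert RA "HH:MM:SS.ss" and DEC "sDD:MM:SS.s" to decimal degrees."""
    h, m, s = (float(x) for x in ra.split(":"))
    ra_deg = 15.0 * (h + m / 60.0 + s / 3600.0)  # 1 hour of RA = 15 degrees

    dec = dec.strip()
    # Read the sign from the string itself: a numeric test such as d < 0 would miss "-00".
    sign = -1.0 if dec.startswith("-") else 1.0
    d, dm, ds = (abs(float(x)) for x in dec.split(":"))
    dec_deg = sign * (d + dm / 60.0 + ds / 3600.0)
    return ra_deg, dec_deg


def select_flagged(rows: Iterable[dict], flag: str = "SPIRAL") -> Iterator[str]:
    """Yield the OBJID of every row whose flag column (SPIRAL, ELLIPTICAL or UNCERTAIN) is 1."""
    for row in rows:
        if int(row[flag]) == 1:
            yield row["OBJID"]
```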

The other tables are as follows (as described in Lintott et al. 2011):

Table 3 contains the same information for galaxies without spectroscopic data in DR7, minus the debiased results and flags since redshifts are not available.
Table 4 gives confidence measures for the classifications, such as the difference between raw and debiased vote fractions and the probability that an object was misclassified.
Table 5 is similar to Table 3, but for a bias study that used mirror images to study systematic errors in human classifications.
Table 6 is similar to Table 3, but for a bias study that used monochromatic images to study systematic errors in human classifications.
Table 7 includes vote fractions from the main study and the bias studies. Votes from the bias studies are excluded from Tables 2-4 above because the bias studies affected the behavior of the volunteers.