Using Galaxy Zoo for Science

What is Galaxy Zoo?

Galaxy Zoo was a project designed to classify galaxies in the SDSS database based on their morphology. Because of the large number of galaxies involved, visual classification by a small number of scientists is impractical, and automated classification tends to introduce biases that are difficult to understand. Instead, Galaxy Zoo relied on over 100,000 volunteers who logged onto the Galaxy Zoo website to view SDSS images of 893,212 objects and classify them. Most (738,175) of these galaxies come from the SDSS Main Galaxy Sample. The remaining 155,037 are galaxies with spectroscopic data in SDSS. Volunteers voted to place objects in one of six categories: elliptical galaxies, clockwise spiral galaxies, anticlockwise spiral galaxies, edge-on spiral galaxies, stars/don't know, and mergers. The results are available in the data release for Galaxy Zoo 1.

The Galaxy Zoo 1 Data Release

Data from the project are available on the Galaxy Zoo data page. The data release is described in more detail in Lintott et al. 2011.

Greater/Clean/Superclean and Bias

When making a classification based on a fraction of votes, there is a trade-off between the number of unclassified galaxies and the number that are misclassified: a stricter threshold misclassifies fewer galaxies but leaves more of them unclassified. The data release defines three classification criteria with varying degrees of certainty, but notes that the best fraction to use depends on what the data are being used for. The "greater" criterion places every galaxy in the category in which it received the greatest number of votes. The "clean" sample requires at least 80% of the votes to fall in a single category, and the "superclean" sample requires at least 95%. Galaxies that cannot be called either spirals or ellipticals according to a particular criterion are unclassified.
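The three criteria can be sketched in a few lines of Python. This is only an illustration, not code from the data release: the function name classify, the dictionary keys, and the choice to treat the thresholds as inclusive ("at least" 80% or 95%) are assumptions made here, and the input is assumed to already combine the three spiral vote categories into a single "spiral" fraction.

```python
# Hypothetical sketch of the greater/clean/superclean criteria.
# The thresholds come from the text; everything else is an assumption.
THRESHOLDS = {"clean": 0.80, "superclean": 0.95}


def classify(fractions, criterion):
    """Return 'elliptical', 'spiral', or 'unclassified'.

    fractions: dict mapping a category name to its vote fraction.
    criterion: 'greater', 'clean', or 'superclean'.
    """
    # Category with the largest vote fraction (ties go to the first key).
    best = max(fractions, key=fractions.get)

    if criterion == "greater":
        passed = True  # the top category always wins
    else:
        passed = fractions[best] >= THRESHOLDS[criterion]

    # Only spirals and ellipticals count as classified.
    if passed and best in ("elliptical", "spiral"):
        return best
    return "unclassified"


votes = {"elliptical": 0.85, "spiral": 0.10, "merger": 0.03, "star_or_unknown": 0.02}
for name in ("greater", "clean", "superclean"):
    print(name, classify(votes, name))
```

With these example fractions the galaxy counts as elliptical under "greater" and "clean" but is left unclassified under "superclean", which shows the trade-off: the stricter the threshold, the more galaxies are left unclassified.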

Due to the limitations of the SDSS images, faint, distant and/or small spiral galaxies often appear elliptical. This is accounted for in the Galaxy Zoo data release through the use of "debiased" vote fractions. It is assumed that the fractions of elliptical and spiral galaxies do not vary with distance within the SDSS sample. A correction is applied to distant galaxies so that the fractions for each size and luminosity bin match the local values. Because the object's redshift is used in the bias calculation, this procedure can only be performed for objects with spectroscopic information in SDSS. No correction is applied to account for human error in identifying galaxy types.
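The idea of matching a bin's fractions to the local values can be illustrated with a toy calculation. The sketch below is not the procedure used in the data release (see Lintott et al. 2011 for that); it simply rescales one galaxy's elliptical-to-spiral vote ratio by the factor that would bring its bin's ratio back to the local one. The function debias_pair and all of its parameters are hypothetical names introduced here.

```python
# Toy illustration only: NOT the debiasing procedure from the data release.
def debias_pair(p_el, p_sp, ratio_bin, ratio_local):
    """Rescale one galaxy's elliptical/spiral vote fractions.

    p_el, p_sp  : raw elliptical and spiral vote fractions
    ratio_bin   : elliptical-to-spiral ratio measured in this galaxy's
                  size/luminosity bin at its redshift (assumed input)
    ratio_local : the same ratio for the same bin at low redshift
    """
    # Leave the votes alone if a ratio cannot be formed.
    if p_el <= 0 or p_sp <= 0 or ratio_bin <= 0:
        return p_el, p_sp

    total = p_el + p_sp  # votes shared between the two types are conserved
    adjusted = (p_el / p_sp) * (ratio_local / ratio_bin)
    new_el = total * adjusted / (1.0 + adjusted)
    return new_el, total - new_el


# A bin with twice the local elliptical-to-spiral ratio pulls votes
# back toward spiral: roughly (0.45, 0.45) here.
print(debias_pair(0.6, 0.3, ratio_bin=2.0, ratio_local=1.0))
```

The redshift enters through ratio_bin, which is why a calculation of this kind needs a spectroscopic redshift for each object.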

DR1 Data Tables

Table 2 contains data for 667,945 galaxies. These galaxies all lie between redshift 0.001 and 0.25 and have spectroscopic information and u and r photometry in SDSS DR7. Galaxies that are extreme outliers in terms of absolute magnitude and/or size are excluded because biases cannot be reliably calculated for these objects. The table includes the following information:

The other tables are as follows (as described in Lintott et al. 2011):