Novice Enum Use In Java

An enum in Java is a type with a fixed set of named constant values. For example:

enum Direction { LEFT, RIGHT, UP, DOWN }

Each variable of type Direction can take on only one of those four values. (Or null, alas.)
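As a small illustration of what that restriction buys you, here is a minimal sketch. Only the Direction enum comes from this post; the DirectionDemo class and its opposite method are hypothetical names made up for the example.

```java
// Only the Direction enum is from the post; everything else is illustrative.
enum Direction { LEFT, RIGHT, UP, DOWN }

class DirectionDemo {
    // Maps each direction to its opposite. Because the type has exactly four
    // values, the switch can list them all. A null argument makes the switch
    // itself throw a NullPointerException (the "alas" case).
    static Direction opposite(Direction d) {
        switch (d) {
            case LEFT:  return Direction.RIGHT;
            case RIGHT: return Direction.LEFT;
            case UP:    return Direction.DOWN;
            case DOWN:  return Direction.UP;
            default:    throw new IllegalArgumentException("Unknown direction: " + d);
        }
    }

    public static void main(String[] args) {
        // values() returns the constants in declaration order.
        for (Direction d : Direction.values()) {
            System.out.println(d + " is opposite to " + opposite(d));
        }
        // valueOf() looks a constant up by its exact name.
        System.out.println(Direction.valueOf("UP") == Direction.UP);
    }
}
```

The values() and valueOf() methods are generated by the compiler for every enum, so no extra code is needed to enumerate or look up the constants.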

I recently got the chance to talk to Andy Stefik, who does really interesting work on evidence-based language design in Quorum (check out this talk of his for more info). He is interested in adding enums to Quorum, but lamented that there is little data about enum use in other languages. He knew of our Blackbox data set (collected from users of the BlueJ beginners’ Java IDE), and asked about enum use in Java in the educational wild. So: here it is. All data is from the beginning of the project in mid-2013 up until the end of February 2016, a few days ago.

Enum Use in Java

On its initial release in 1996, the Java programming language had no support for enums. They were added in Java 5 in 2004 (although of course adoption of new language versions is never instant). This matters when looking at enum use in Java, because although they have been in Java for 12 years, many software projects were started before they existed, many course instructors may have trained before enum use was widespread, and so on. So we will probably see less enum use in Java than if they had been present from the outset.

In the data we have 11,666,331 source files which have been successfully compiled at least once. I looked at the most recent successful compilation of each of those source files, to see if it contained an enum. 20,333 (0.2%) of those source files contained an enum (either as a top-level declaration or as an inner type), with 22,693 enums overall.

It’s hard to decide what value we would have expected there. Obviously not every type is going to be an enum. If we’d found, say, that 10% of all types were enums then that would be weirdly high. I did a quick count on our own BlueJ/Greenfoot code base, which began life before Java 5. We have only used Java 5 features since 2008, so would not have used enums before then. Across 2129 source files we have 70 enums in 61 files, so about 2.9% of our source files have an enum. (I tried searching GitHub to get some data, but a lot of the enum results seemed to be IDE test-cases. Interesting!)

If anything I would expect that enum use should be higher in teaching than professional code, because I expect that educators will deliberately show enums in order to teach the concept, even though they are not used with particularly high frequency in full programs. So I am surprised to see enums used quite so infrequently in Blackbox.

Size of enums

I made a prediction about enums to Andy: that if we plotted the frequency of each number of enum values (e.g. four for my direction example at the top of the post) there would be a spike at seven, because many examples would be very artificial and use the days of the week. I think this is just about confirmed, although not as pronounced as I had expected:

[Figure: enum-sizes, the frequency of enums by number of values]

Looks like the most common number of items in an enum is 3 or 4. The data tails off after 10 as you would expect (that tail is omitted from the graph above).
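For anyone wanting to produce the same kind of tally on their own code, here is a minimal sketch of the counting step. It is not the analysis used for Blackbox, which worked on source files; this version uses reflection on already-loaded classes purely to show the idea. The class name EnumSizeHistogram and the four sample enums are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class EnumSizeHistogram {
    // Hypothetical sample enums standing in for a real corpus.
    enum Direction { LEFT, RIGHT, UP, DOWN }
    enum Suit { CLUBS, DIAMONDS, HEARTS, SPADES }
    enum Answer { YES, NO, MAYBE }
    enum Day { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY }

    // Maps "number of values in an enum" to "how many enums have that size".
    // A TreeMap keeps the sizes in ascending order, ready for plotting.
    static Map<Integer, Integer> sizeFrequencies(List<Class<? extends Enum<?>>> enumTypes) {
        Map<Integer, Integer> frequencies = new TreeMap<>();
        for (Class<? extends Enum<?>> type : enumTypes) {
            int size = type.getEnumConstants().length;
            frequencies.merge(size, 1, Integer::sum);
        }
        return frequencies;
    }

    static List<Class<? extends Enum<?>>> sampleTypes() {
        List<Class<? extends Enum<?>>> types = new ArrayList<>();
        types.add(Direction.class);
        types.add(Suit.class);
        types.add(Answer.class);
        types.add(Day.class);
        return types;
    }

    public static void main(String[] args) {
        // Prints {3=1, 4=2, 7=1} for the four sample enums above.
        System.out.println(sizeFrequencies(sampleTypes()));
    }
}
```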

Enum Features

Something else that is of interest about enums is how often sub-features of enums are used in Java. I showed above the use of a simple plain enum, but there are several other allowable features:

enum Color
{
    RED(255,0,0), // Constructor arguments
    GRAY(128,128,128)  // Individual body:
    {
        public String toString() { return "gray/grey"; }
    },
    BLACK(0,0,0);

    // Group body:
    int red, green, blue;
    private Color(int red, int green, int blue)
    {
        this.red=red; this.green=green; this.blue=blue;
    }
}

So: how many of the enums in Blackbox use each of these features? Of the 22,693 enums, 4,597 (20.3%) have a group body, i.e. fields, methods or constructors as well as just the item list. 3,264 (14.4%) have items which use constructors with arguments. Just 190 (0.8%) have an individual body for any of the items.
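To make the three features concrete, here is a small runnable sketch built around the same kind of Color enum. The brightness method and the ColorDemo class are hypothetical additions for illustration only. One detail worth knowing: an item with an individual body is compiled as an anonymous subclass of the enum, which is why getClass() and getDeclaringClass() disagree for GRAY.

```java
enum Color {
    RED(255, 0, 0),               // constructor arguments
    GRAY(128, 128, 128) {         // individual body overriding a method
        @Override
        public String toString() { return "gray/grey"; }
    },
    BLACK(0, 0, 0);

    // Group body: fields, a constructor and a method shared by all items.
    private final int red, green, blue;

    Color(int red, int green, int blue) {
        this.red = red;
        this.green = green;
        this.blue = blue;
    }

    // Hypothetical helper, not from the post: average of the three channels.
    int brightness() {
        return (red + green + blue) / 3;
    }
}

class ColorDemo {
    public static void main(String[] args) {
        for (Color c : Color.values()) {
            // name() is always the declared identifier; toString() may be overridden.
            System.out.println(c.name() + " -> " + c + ", brightness " + c.brightness());
        }
        // GRAY is an instance of an anonymous subclass, so this prints false...
        System.out.println(Color.GRAY.getClass() == Color.class);
        // ...while getDeclaringClass() still reports the enum type: prints true.
        System.out.println(Color.GRAY.getDeclaringClass() == Color.class);
    }
}
```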

Names of Enums

A final item of interest is the most popular names for enums, which gives hints as to what they are being used for. Here’s a graph of the top ones. The y-axis is purely cosmetic to separate out items with similar frequencies, and I only show frequency 900 upwards because it gets congested below that:

[Figure: enum-popular, the most popular enum item names by frequency]

The left/right/up/down directions are the most frequent use, colours are another use, and omitted from this graph are monday/tuesday/etc on about 700. This isn’t relevant to enums specifically, but I was interested to see the grad/rad/deg/degmin/degminsec pattern. This is not some extra-common pattern you’ve never heard of, but I believe it is instead an artifact of one of the MOOCs which have taken place using BlueJ. If one of their examples uses a particular enum, this gets multiplied up by the number of users, and thus shows up prominently in our data. This probably also explains “border_collie” and “pitbull” at around frequency 600. It’s something we need to be mindful of when analysing the Blackbox data; not all observations in the data set are as independent as one might think.
