There are lots of reasons to contribute to open source software projects. The simplest is probably to fix a bug, but more importantly it’s a good way to get exposed to other coding practices and learn things you wouldn’t see just scripting alone on your own pet projects. Here I’m going to go over my first submission to an open source software project and what I learned from it. As a beginner, the easiest way in is to use a package and, if you encounter a simple bug, try to fix it. At least, that’s how I went about it.
Recently, I’ve been using NLTK (NLTK 3) a lot at work for my research on ngrams. I’ve been wanting to contribute to an open source project for a while now, and when I eventually found a (small) bug in NLTK, I wanted to try contributing a fix for it. It went embarrassingly poorly (but with a happy ending), so I thought I’d document some things to do, and not do, and what I learned.
If you want to see some hilariously bad coding, you can take a look at my first ever pull request to an OSS project. I made tons of really dumb mistakes, but I learned a lot in the process. If you’re a total newbie like me, this might be useful for you.
What I Did Wrong
I made a lot of poor decisions during this entire affair. However, I am a firm believer in mistakes being something to learn from.
My process went something like this:
- Locate the bug (setting the JAVAHOME or JAVA_HOME env vars doesn’t help NLTK locate Java on Windows)
- Post a question about my bug to a related NLTK mailing list
- No answer; send an email with the fixed source file to the dev mailing list
- Get reply about github repo; submit a pull request there from my local copy of NLTK
- Get told that I had tried to overwrite the NLTK 3 repo with NLTK 2
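To make the bug in the first step concrete, here is a minimal sketch of the kind of lookup I expected to work. This is not NLTK’s actual code: the function name `find_java`, its parameters, and the `bin/java` layout under the Java home directory are my own assumptions for illustration.

```python
import os
import shutil


def find_java(env_vars=("JAVAHOME", "JAVA_HOME"), environ=None):
    """Return a path to a Java executable, or None if none is found.

    Hypothetical helper: checks each environment variable in turn,
    then falls back to searching PATH.
    """
    environ = os.environ if environ is None else environ
    # Windows executables carry an .exe suffix; other systems do not.
    exe = "java.exe" if os.name == "nt" else "java"
    for var in env_vars:
        home = environ.get(var)
        if not home:
            continue  # variable unset or empty, try the next one
        candidate = os.path.join(home, "bin", exe)
        if os.path.isfile(candidate):
            return candidate
    # Nothing usable in the environment variables: search PATH instead.
    return shutil.which("java")


print(find_java())
```

Passing `environ` explicitly makes the function easy to try out without touching the real environment.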
1 Find the Dev Version
The very first thing you should do is find the development version of the software. In the case of NLTK, the NLTK website has no link to that version, and Googling only turned up their mailing list. It is quite possible, or even likely if the software has long update cycles, that your bug is already fixed in the dev version.
This brings me to another point. You should probably be using the dev version. Dev versions usually have more features and fewer known bugs, though this of course depends on the software.
If you can’t find a link to the dev project home, send someone an email asking about how to contribute. Not tracking down the dev version was the first and biggest mistake I made, and I feel dumb for it. Once you do find the home for the project’s development version, get that version. My second mistake was that I didn’t download the dev version and look for the error there. I naively assumed that what you download from NLTK.org is the most recent version. Of course it isn’t.
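A quick way to see which version you actually have installed is to ask Python’s packaging metadata. This is only a sketch: `installed_version` is a name I made up, and it relies on `importlib.metadata` from the standard library (Python 3.8 or later). You still have to compare the result by hand against whatever the dev repository reports.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed NLTK version, or None if NLTK is not installed.
print(installed_version("nltk"))
```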
2 Find the Ticket
If the system they use has ticketing, search for a ticket for the issue. You may find that it is already resolved, or that nobody has filed it yet. This is a good chance to start discussion of the bug. If there is no ticket, submit one, and start talking about possible fixes. This will help you avoid making poorly informed coding decisions later on, and can save you a lot of trouble.
This is a good chance to talk to the project manager(s) and to find out who wrote the code originally. You might need to check with them. Why is that weird-looking uncommented block of code on line 233 there? Find out who wrote it and ask them! Maybe there is a good reason for it, and they just forgot to add a comment.
3 Find How to Contribute
NLTK uses github, so contributing is easy once you’ve figured out forks and pulls. Make sure you find out what the standard procedure is for your project. Do they have svn, mercurial, or git, or something else that you’ve never heard of? They almost surely have a standard procedure, so find out what it is, in case it affects the way you need to fix the bug.
In my case github meant I just needed to fork their repo and clone my fork onto my machine; then I could start fixing the bug.
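For reference, the usual fork workflow boils down to a handful of git commands. The sketch below only builds and prints them rather than running anything. The function name, the URLs, the branch name, and the base branch `develop` are all placeholders I picked for illustration, so check which branch your project actually develops on.

```python
import shlex


def fork_workflow_commands(fork_url, upstream_url, branch, base):
    """Return the git commands for a typical fork-based contribution.

    After the clone, the remaining commands are meant to be run
    inside the newly cloned directory.
    """
    return [
        # Copy your fork to your machine.
        ["git", "clone", fork_url],
        # Register the original project so you can track its changes.
        ["git", "remote", "add", "upstream", upstream_url],
        # Download the latest development history.
        ["git", "fetch", "upstream"],
        # Start the fix on its own branch, based on the current dev code.
        ["git", "checkout", "-b", branch, "upstream/" + base],
    ]


commands = fork_workflow_commands(
    "https://github.com/your-username/nltk.git",  # placeholder fork URL
    "https://github.com/nltk/nltk.git",           # assumed upstream URL
    "fix-java-home",                              # hypothetical branch name
    "develop",                                    # assumed base branch
)
for command in commands:
    print(shlex.join(command))
```

Basing the branch on the upstream dev code, rather than on whatever happens to be installed locally, is exactly what would have saved me from submitting NLTK 2 on top of NLTK 3.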
Once you (a) have the dev version of the software, (b) know the bug exists in the dev software, and (c) know how to contribute a fix, then it’s time to…
4 Fix the Bug
This sounds easy but I still got it wrong. Well, sort of. Here is what I learned:
- Don’t make assumptions
- Don’t break backwards compatibility without a discussion with the project manager(s)
- Update documentation
I made a bunch of incorrect assumptions about what the method was meant to do. I assumed that weird uncommented code was probably wrong, when it did indeed serve a purpose (albeit a weird one). This led me to make a bunch of changes that broke a lot of backward compatibility, and when review time came, I had to go back and undo all those changes, causing myself a lot of extra work. This is closely related to “2 Find the Ticket” above: have a dialogue, it will save you time.
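One common way to add behaviour without breaking existing callers is to put the new behaviour behind an optional argument whose default keeps the old result. Here is a minimal sketch of that idea. The function `binary_candidates` and its `extra_dirs` parameter are hypothetical and not taken from NLTK.

```python
import os


def binary_candidates(name, extra_dirs=()):
    """List the places where a binary called ``name`` might live.

    Old behaviour (still the default): only the directories on PATH.
    New behaviour (opt-in): also any directories in ``extra_dirs``.
    """
    path_dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    return [os.path.join(d, name) for d in path_dirs + list(extra_dirs)]


# Existing calls keep working and return exactly what they did before.
old_style = binary_candidates("java")
# New calls can opt in to the extra search locations.
new_style = binary_candidates("java", extra_dirs=["/opt/java/bin"])
print(len(new_style) - len(old_style))
```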
This brings us to testing. First, make sure the code works on your machine. Second, if they have a test suite, run the tests against your new code. Third, recall that there are many other users with different systems. Get someone else to check that your changes do not break anything on their systems. In my case there was some worry that it would break on UNIX, so I made sure to test it on both my work computer (Windows) and my home Mac.
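If the project has no test covering your change, a small one is worth writing. The sketch below uses the standard library’s `unittest` and `unittest.mock` to fake environment variables, so the test does not depend on how any particular machine is set up. The `java_home` function is a stand-in I wrote for the example, not NLTK code.

```python
import os
import unittest
from unittest import mock


def java_home(env_vars=("JAVAHOME", "JAVA_HOME")):
    """Return the value of the first environment variable that is set."""
    for var in env_vars:
        value = os.environ.get(var)
        if value:
            return value
    return None


class JavaHomeTest(unittest.TestCase):
    def test_reads_java_home(self):
        # Replace the whole environment for the duration of the test.
        with mock.patch.dict(os.environ, {"JAVA_HOME": "/opt/jdk"}, clear=True):
            self.assertEqual(java_home(), "/opt/jdk")

    def test_earlier_variable_wins(self):
        fake = {"JAVAHOME": "/first", "JAVA_HOME": "/second"}
        with mock.patch.dict(os.environ, fake, clear=True):
            self.assertEqual(java_home(), "/first")

    def test_returns_none_when_unset(self):
        with mock.patch.dict(os.environ, {}, clear=True):
            self.assertIsNone(java_home())
```

Saved in a file, this can be run with `python -m unittest` followed by the file name.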
You should be updating the documentation as you edit. You need to:
- Write readable code (if you’re not sure what this means, Google for coding standards for your language)
- Include line comments as you write code, especially if the code is complex, obscure, etc.
- Python has docstrings. You need to update them for the method(s) you edit and you may need to update them for the module as well
- You might need to check if there are other kinds of documentation that will need to be updated as well
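As an illustration of the docstring point, here is roughly what an updated docstring might look like for a small function. The function `first_set` is invented for this example, and the `:param:` field style is just one common convention, so follow whatever style the project already uses.

```python
def first_set(names, environ):
    """Return the value of the first variable in ``names`` that is set.

    :param names: variable names to try, in order
    :param environ: mapping of variable names to their values
    :return: the first non-empty value found, or None

    >>> first_set(["JAVAHOME", "JAVA_HOME"], {"JAVA_HOME": "/opt/jdk"})
    '/opt/jdk'
    """
    for name in names:
        value = environ.get(name)
        if value:
            return value
    return None
```

The example inside the docstring doubles as a test if the project runs doctests.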
5 Review, Review, Review
Finally, review your code, and have other people review it. Hopefully they are using a version tracking system like git(hub) that will make it easy for people to see, comment on, and test out your code. If you’re like me, there are probably tons of mistakes you didn’t see, because you’re not experienced. Take on the advice of other members of the project. They know the code base and have experience, and at the end of the day, they will decide whether or not to merge your changes. With luck, if you followed step 2 and discussed with project members in advance, you will have avoided most possible problems.
Things I took away from this experience were:
- Find out the correct procedures and dev source code
- Talk to people in the project about the bug
- Talk to people about the code if it looks weird; find the original/previous author if possible
- Most of all, take your time and think carefully about what you’re doing