How to read code – a primer

I like programming – it’s what I do and I am blessed in that I get to spend most of my waking hours developing software. Like a lot of programmers I obsess over how good my code is and how I can get better at it.

Over the years I have read a lot of articles and books on software development. A lot of ink (both physical and virtual) has been spent on ways to improve your “programming foo” and become a super ninja programmer! There are some common pearls in all this ink, and one of them is the advice to read code. This advice is usually a one-liner couched in the midst of a bunch of other recommendations, usually along the lines of – find some great open source software or any piece of software that you admire, open up the source code (or print it out) and read it. While this is, on the whole, great advice, there are some problems with actually putting it into practice. In this post I endeavor to give some practical suggestions on reading code, but first let us enumerate the problems.

  • The usual impression conveyed (in the posts that advise one to read code) is that the dispenser of the advice is a programming guru who can literally sit back in their chair with a page of code and read it like a novel. Well, I am sure there are some superb programmers out there who enjoy looking at pages of cryptic English-like statements over a cup of coffee and can hold entire class hierarchies and architectures in their heads. This post is not meant for them – this post is for poor slobs like me who find staring at reams of code a boring, frustrating and ultimately pointless exercise. Of course, it can be argued that one can learn simply by reading a single class or even a single function out of the entire project, but, IMO, except for the simplest problems, most software is interdependent. It is often impossible to appreciate the design decisions and the rationale behind a particular function or class layout without knowing the rest of the system…
  • The next problem is getting code to read (actually, before that you need to be able to identify code worth reading – check out this post for details on that). There is a lot of great software out there, both open source and freely available, and licensed or proprietary. There are huge open source directories like Sourceforge and Google Code, and huge pieces of software like Open Office and Linux. If you are working in a software development company, you can probably get access to the proprietary code in your source control repository. A third common avenue is the programs distributed along with books on software development or as part of resources for education (Minix being the canonical example). Indeed, we are spoiled for choice, and identifying good candidates for our purpose from this universe of software is a hard but essential task.
  • Another problem is the language in which the program is written – reading someone else’s code is tough enough as it is; adding the burden of familiarizing yourself with the quirks and syntax of a new language at the same time is, IMO, a recipe for disaster and immense frustration. You need to find code written in a language that you are familiar with. This particular problem is not relevant if you are going through the code distributed as part of a book or as an educational resource, since you would have the book or your mentor to explain things and set out the context. If, despite this forewarning, you are planning to read code written in a language you are not used to (without the benefit of a book or a mentor), I would advise that you at least learn enough of the language to create your own programs in it (“Hello World” does not count :-)).
  • The bit about context brings me to the next problem – figuring out what the code is doing is a lot harder if you are not familiar with the software itself. For example, it is far more difficult to go through the Linux code and figure out the concept of runlevels if you don’t use Linux daily and see the Linux boot sequence. Using the software gives one a context with which to read the code – this context includes the common terminology used, the functionality and features of the software, even the quirks and bugs that you experience.

I have realized that for me ‘reading code’ does not really describe the activities that I undertake – a better phrase for what I do is ‘code comprehension’. It is quite difficult for me to sit back with a laptop screen (or a printout) full of code and simply read through it. I need a lot more than simply a piece of code – I like to be able to look at documentation, play with the software, step through the code and even write tests for it before I really appreciate it. This is a significant investment of my time and effort, so I have to be very picky about the software I want to “read” (comprehend).

  • The first filter I place on the code directory when looking for code is the language filter – for me this means C#, VB.NET, Python or Javascript (while I am familiar with C++, Ruby and F# as well, I do not consider myself at a level where I can understand other people’s code in them). Next is to look for software that I have used – this gives me a bit of a leg up, since I know what the code is meant to do, what it cannot do and (if I am familiar enough) its limitations. Good candidates are open source software that you use in your day job (e.g. I use CruiseControl.NET, NAnt and NUnit, which are open source tools written in C#).
  • I happen to work in a software product company (a Microsoft shop), so one of the candidates for my reading list is the code in my company’s source repository. If you happen to work in a software company, you can look at other projects, and even older versions of the software you are working on. In addition to providing insight into the code, this gives you a pretty good idea of what was tried before and since. There are a few caveats though:
    • First, if you don’t have direct access to other projects, you need to ask permission – some companies are very touchy about their “intellectual property”.
    • Second, the quality of the software may not be as high as you think, since, in general, proprietary code does not get the kind of scrutiny open source code does. One warning sign to look out for is a lack of regular code reviews – if the code is not reviewed, the odds are that it is not of good quality.
    • Third (this point is inspired by feedback provided by my friend Praseed), if the code in your company is business software (HR, Finance, ERP, etc.) there is a lot of business context that needs to be understood first. Also, since most of this code tends to be factored by business functionality, it generally seems less modular than utility code or APIs.
  • Look for well documented projects (this applies to open source as well as proprietary code). By this I mean that the documentation should highlight the overall design and the rationale for the way the code is. Simply having auto-generated Javadoc-style documents cannot be considered documentation :-). One useful avenue to explore is software created as an educational resource (like Minix). Since the aim is to teach through the software, such projects are usually quite clearly documented and have plenty of material explaining the design rationale behind the code.

So, you have identified the software and downloaded the source code and documentation. Let’s get down to it and start spelunking ;-)

  • Go through the design documentation and try to get a feel for the way the code has been built. Good software projects follow certain architectural patterns – these dictate the code organization. Once you get a handle on this, understanding the code becomes a whole lot easier. If you can create a class diagram of the code you can get a good idea of the layout.
  • The next thing to do is to compile it and run it. This can be straightforward or tough depending on the process followed in the project and its documentation.
  • Now it’s time to fire up your favorite IDE and go exploring. A good place to start your code exploration would be to trace a feature of the project that you are familiar with. This lets you go through the various layers and sub-systems and get a handle on how they inter-connect. For example, when I was exploring NUnit, I started by writing a test and looking at the classes I needed to do that.
  • Try to identify the design patterns used in the code. If you do not know what design patterns are, then you need to stop reading this post right now and read this book. Familiarize yourself with design patterns – they are a great way to recognize and understand the design of well written code. This makes it easier to keep the design in your head while reading code. It also helps you identify nuances and customizations made by the programmers more easily.
  • Try to write tests for the code to fully understand it – this is a really useful way to understand the dependencies between different parts of the code. When you try to write a test for the code, you first need to satisfy (mock) all its dependencies. Next you need to understand the possible entry points as well as the exit values for the code. This improves your understanding of the code and gets you to the next level.
  • Finally, try to refactor the code. In this step you have moved from simply understanding the code to becoming familiar enough to be able to modify it. As the sophistication of your refactoring increases, so too does your understanding. At this point you can, if needed, contribute your own code to the project :-)
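
To make the test-writing step a little more concrete, here is a minimal sketch in Python using the standard unittest and unittest.mock modules. Everything in it is hypothetical: InvoiceService, its repository and mailer dependencies, and the orders_for and send methods are invented stand-ins for whatever class you happen to be reading, not code from any real project. The point is only to show how mocking the dependencies and pinning down the return values forces you to learn what the code needs and what it hands back.

```python
import unittest
from unittest.mock import Mock


class InvoiceService:
    """Hypothetical stand-in for a class found in the code being read."""

    def __init__(self, repository, mailer):
        self.repository = repository
        self.mailer = mailer

    def send_invoice(self, customer_id):
        orders = self.repository.orders_for(customer_id)
        total = sum(order["amount"] for order in orders)
        if total == 0:
            return None  # an exit value you only notice once you test for it
        self.mailer.send(customer_id, f"Total due: {total}")
        return total


class InvoiceServiceTest(unittest.TestCase):
    def setUp(self):
        # Just constructing the object tells you what it depends on.
        self.repository = Mock()
        self.mailer = Mock()
        self.service = InvoiceService(self.repository, self.mailer)

    def test_mails_the_sum_of_all_orders(self):
        self.repository.orders_for.return_value = [{"amount": 40}, {"amount": 2}]

        result = self.service.send_invoice("c-1")

        self.assertEqual(result, 42)
        self.repository.orders_for.assert_called_once_with("c-1")
        self.mailer.send.assert_called_once_with("c-1", "Total due: 42")

    def test_no_orders_means_no_mail(self):
        self.repository.orders_for.return_value = []

        self.assertIsNone(self.service.send_invoice("c-1"))
        self.mailer.send.assert_not_called()
```

Saved to a file, this can be run with python -m unittest. In real code you would not write InvoiceService yourself, of course; you would import the class you are studying and let the failing tests tell you which dependencies and return values you have not understood yet.
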
“Code reading”, IMO, is more than just reading – it is a distinct set of activities that together help one understand code. It might seem more intimidating than simply “reading code”, but it is well worth the effort.
Happy “code reading” :-)
Update: I came across this post by Joel Spolsky where he quotes Seth Gordon as saying code reading “Is just like reading the Talmud”… Yup, code reading is definitely not easy.

15 comments on “How to read code – a primer”

  1. I can’t remember the last time I enjoyed an article as much as this one. You have gone beyond my expectations on this topic and I agree with your points. You’ve done well with this.

  2. Thanks for the information regarding this. You really helped me understand some things I did not before.

  3. Really like the post that you wrote, actually. It just isn’t that simple to discover even remotely good stuff to actually read (you know.. READ and not just going through it like some uninterested and flesh eating zombie before going somewhere else), so cheers man for not wasting my time! :)

  4. Pretty good write-up. I just found your blog and wanted to say I’ve genuinely loved reading your opinions. By all means I’ll be subscribing to your feed, and I hope you post again soon….

  5. When I stumble upon a great blog post I do one of three things: 1. Share it with all my close friends. 2. Bookmark it in all my common bookmarking sites. 3. Be sure to come back to the same blog where I first read the article. After reading this article I’m really thinking of doing all three…

  6. I mistyped this website and luckily I found it again. Presently I am at my university; I added this to favorites so that I can re-read it later. Regards.

  7. Pingback: lessons from “Coders at work” ~ numerodix blog

  8. Hi There! Very nice post on a very interesting subject. Reading code is challenging. In my opinion the most important thing is to understand what is going on in terms of abstraction.

    First of all, you have to know what the piece of code you are reading does. The best way to do that is to understand the domain of the application. For example, if you are working on a financial application and you don’t know what an Options contract is, you will have a hard time even understanding what the application is supposed to do.

    Second, you have to get the idea behind the implementation: how does the code achieve the goal? In order to do that I think there are two approaches.

    Firstly, if you are very experienced, chances are that you will already have seen some similar way of solving the given problem before and therefore you will pick it up real quick.

    The other approach is to find out what the flow is and which classes are involved. If you are a UML guy you would draw sequence and class diagrams. The way I do that is to find the event that triggers the code (thanks to swingspy I have been able to do that in Swing apps), put a breakpoint there and start debugging. During the process, keep writing down the flow and the classes using pencil and paper. Once you are done, start analyzing your notes and draw a nice diagram. After that I usually have a rough understanding of what is going on and who is doing what and when.

    I think it’s very useful sometimes because in OO systems there is a tendency to go back and forth between superclasses, subclasses, interface implementations, etc. It is very easy to get lost or confused.

    And finally, and most important, you have to be fluent in the vocabulary. By vocabulary I mean the elements, algorithms and data structures used in the language. For example, it would be very difficult for a person to read code that uses hash tables to solve a problem if this person does not know what a hash table is and how it works.

    More on that here: http://rcforte.wordpress.com/2010/07/10/my-2-cents-on-reading-code/

  9. Pingback: My 2 cents on reading code « S.I.M.P.L.E

  10. Nice post…
    Coincidentally, we (Shine, Viby and myself) had a session on diagrams and design patterns last week, and Shine covered a lot of your points…

  11. Pingback: Tweets that mention How to read code – a primer « Technikhil Writing -- Topsy.com

  12. Excellent article :)

    I recently had to go through the core code of the new project I was assigned to. At first everything was a big mess. I figured out that the project used things like Spring and Apache XML Beans. So I read the documentation of both projects (the Spring documentation is very educational). Even then the code remained a mystery because I didn’t know how a Java EE application operated. So I had to find that out. A colleague spent half an hour explaining what was going on with J2EE. And then the design document (just one diagram, really) made sense. From then on, reading code was a matter of drilling down to the implementation of concepts. Design patterns were there, but as in good implementations, the code was mostly solving problems rather than trying to implement particular patterns (i.e. they solved the problem and the solution happened to be some design pattern).

    Of all this, the most important document was the little diagram in which the architecture of the application was explained. Without it, I would have taken a lot of time to figure out what was going on. And what many opensource projects lack is this – a proper design/architecture document. I think _that_ is the most important bit of documentation you have to look for when reading code.

    • I have found creating a good class diagram helps a lot. One of the nice things about modern IDEs is that the good ones have ways to auto-generate them out of the code (at least Eclipse and VS.NET 2010 do). After all, a picture is worth a thousand words :-)
      As regards the use of design patterns – your experience is why I encourage looking for them. Once you can recognize the design pattern and the problem, you tend to make that association in your mind. This is useful for the next time you encounter a similar problem :-)
