Open source writing using Git

Of late I have noticed a trend – writers, especially technical writers, are embracing the software development process and incorporating it into their writing process. There’s a great podcast about this on Herding Code where some technical writers talk about this trend.

Writers are embracing the agile philosophy of software development. One of its core tenets is to publish often and get feedback so you can improve and adjust. Authors are posting the content for their book projects online to get feedback, to the extent that publishers like Manning and O’Reilly offer programs that give people access to the book content as soon as the author creates it, at a discounted price. The pioneers of this practice were the Pragmatic Programmers, who offer “beta” versions of their books to customers at discounted prices. The idea, of course, is that it lets authors get valuable feedback on what is important and what to focus on. It also gives authors an incentive not to procrastinate over their writing.

Another major aspect of this trend is the embrace of source control in the writing process. I think this is a great development, and it makes a lot of sense if you think about it. The book writing process, like software development, is subject to a lot of chopping and changing as the book evolves. Writing collaboratively is, I imagine, similar to writing code in a team and probably has the same issues: multiple people need to be able to work on the same document without worrying about overwriting someone else’s hard work. Finally, a version control system provides a fine-grained backup that allows you to rewind and replay the evolution of the book, which should be invaluable during the editing process. If there is a conflict somewhere, version control also offers tools to detect and fix it in a safe manner.

Some authors have posted their content on Github, which I think is an awesome idea. A powerful, distributed version control system like Git, combined with Github, is well suited to writing projects. Authors can leverage the tools and capabilities of Github -

  • On Github, feedback can easily be provided in the form of commit notes, line notes and issues – heck, you can even fork the project, make the corrections and submit a pull request back to the author.
  • You can use Git as a fine-grained distributed backup for your writing project that allows you to roll back specific portions of your project independently.
  • You can use the social features of Github to collaborate on the project and even drum up interest by distributing the link online.
  • You can maintain your code samples and other resources separately from your content and correct and upgrade them as needed. This can be very useful if the book is about specific technical frameworks that get upgraded after the book is completed and published.
  • Github even has a basic editor so you can work on your project online.

While browsing through Github for examples of books that use it as a version control system for writers, I came across this tool, which is in my mind a good first step, but I think there is a lot more that we can do to improve this experience. A whole lot of tools are being built around this concept, which is quite encouraging.

Writing and programming have a lot in common. Both involve text, both are creative activities and both are ways to express ideas (one to a machine and the other to an audience). There is scope for a lot of cross-pollination of ideas between these fields. I think the main challenge currently is a lack of knowledge and a fear of the technical nature of a tool like Git.

How to read code – a primer

I like programming – it’s what I do and I am blessed in that I get to spend most of my waking hours developing software. Like a lot of programmers I obsess over how good my code is and how I can get better at it.

Over the years I have read a lot of articles and books on software development. A lot of ink has been spent (both physical and virtual) on ways to improve your “programming foo” and become a super ninja programmer! There are some common pearls in all this ink, and one of them is the advice on reading code. This advice is usually a one-liner couched in the midst of a bunch of other recommendations, usually along the lines of – find some great open source software or any piece of software that you admire, open up the source code (or print it out) and read it. While this is, on the whole, great advice, there are some problems with actually putting it into practice. In this post I endeavor to give some practical suggestions on reading code, but first let us enumerate the problems.

  • The usual impression conveyed (in the posts that advise one to read code) is that the dispenser of the advice is a programming guru who can literally sit back in their chair with a page of code and read it like a novel. Well, I am sure there are some superb programmers out there who enjoy looking at pages of cryptic English-like statements over a cup of coffee and can hold entire class hierarchies and architectures in their heads. This post is not meant for them – this post is for poor slobs like me who find staring at reams of code a boring, frustrating and ultimately pointless exercise. Of course, it can be argued that one can learn simply by reading a single class or even a single function out of the entire project code, but, IMO, except for the simplest problems, most software is interdependent. It is often impossible to appreciate the design decisions and the rationale behind a particular function or class layout without knowing the rest of the system…
  • The next problem is getting code to read (actually, before that you need to be able to identify code worth reading – check out this post for details on that). There is a lot of great software out there – open source and freely available, as well as licensed or proprietary. There are huge open source directories like Sourceforge and Google Code, and huge pieces of software like Open Office and Linux. If you are working in a software development company, you can probably get access to the proprietary code in your source control repository. A third common avenue is the programs distributed along with books on software development or as part of resources for education (Minix being the canonical example). Indeed, we are spoiled for choice, and identifying good candidates for our purpose from this universe of software is a hard but essential task.
  • Another problem is the language in which the program is written – reading someone else’s code is tough enough as it is; adding the burden of familiarizing yourself with the quirks and syntax of a new language at the same time is, IMO, a recipe for disaster and immense frustration. You need to find code written in a language that you are familiar with. This particular problem is not relevant if you are going through the code distributed as part of a book or as an educational resource, since you would have the book or your mentor to explain things and set out the context. If, despite this forewarning, you are planning to read code written in a language you are not used to (without the benefit of a book or a mentor), I would advise that you at least learn enough of the language to create your own programs in it (“Hello World” does not count :-)).
  • The bit about context brings me to the next problem – figuring out what the code is doing is a lot harder if you are not familiar with the software itself. For example, it is far more difficult to go through the Linux code and figure out the concept of runlevels if you don’t use Linux daily and see the Linux boot sequence. Using the software gives one a context with which to read the code – this context includes the common terminology used, the functionality and features of the software, even the quirks and bugs that you experience.

I have realized that for me ‘reading code’ does not really describe the activities that I undertake – a better phrase for what I do is ‘code comprehension’. It is quite difficult for me to sit back with a laptop screen (or a printout) full of code and simply read through it. I need a lot more than simply a piece of code – I like to be able to look at documentation, play with the software, step through the code and even write tests for it before I really appreciate it. This is a significant investment of my time and effort, so I have to be very picky about the software I want to “read” (comprehend).

  • The first filter I place on the code directory when looking for code is the language filter – for me this means C#, VB.NET, Python or Javascript (while I am familiar with C++, Ruby and F# as well, I do not consider myself at a level where I can understand other people’s code in them). Next is to look for software that I have used – this gives me a bit of a leg up, since I know what the code is meant to do, what it cannot do and (if I am familiar enough) its limitations. Good candidates are open source software that you use in your day job (e.g. I use CruiseControl.NET, NAnt and NUnit, which are open source tools written in C#).
  • I happen to work in a software product company (a Microsoft shop), so one of the candidates for my reading list is the code in my company’s source repository. If you happen to work in a software company, you can look at other projects, and even older versions of the software you are working on. In addition to providing insight on code, this gives you a pretty good idea of what was tried before and what has changed since. There are a few caveats though -
    • First, if you don’t have direct access to other projects, you need to ask permission – some companies are very touchy about their “intellectual property”.
    • Second, the quality of the software may not be as high as you think, since, in general, proprietary code does not get the kind of scrutiny open source code does. A warning sign to look out for is a lack of regular code reviews – if the software is not code reviewed, the odds are that it is not of good quality.
    • Third (this point is inspired by feedback from my friend Praseed), if the code in your company is business software (HR, Finance, ERP, etc.) there is a lot of business context that needs to be understood first. Also, since most of this code tends to be factored by business functionality, it generally seems less modular than utility code or APIs.
  • Look for well documented projects (this applies to open source as well as proprietary code). By this I mean that the documentation should highlight the overall design and the rationale for the way the code is. Simply having auto-generated Javadoc-type documents cannot be considered documentation :-). One useful avenue to explore is software created as educational resources (like Minix). Since the aim is to teach through the software, such projects are usually quite clearly documented and have plenty of material explaining the design rationale behind the code.

So, you have identified the software and downloaded the source code and documentation. Let’s get down to it and start spelunking ;-)

  • Go through the design documentation and try to get a feel for the way the code has been built. Good software projects follow certain architectural patterns – these dictate the code organization. Once you get a handle on this, understanding the code becomes a whole lot easier. If you can create a class diagram of the code you can get a good idea of the layout.
  • The next thing to do is to compile it and run it. This can be straightforward or tough depending on the process followed in the project and its documentation.
  • Now it’s time to fire up your favorite IDE and go exploring. A good place to start your code exploration  would be to try to trace a functionality of the project that you are familiar with. This would let you go through the various layers and sub-systems and get a handle on how they inter-connect. For example when I was exploring NUnit – I started by writing a test and looking at the code classes I needed to do that.
  • Try and identify the design patterns used in the code. If you do not know what design patterns are, then you need to stop reading this post right now and read this book. Familiarize yourself with design patterns – they form a great way to recognize and understand the design of well written code. This makes it easier to keep it in your head while reading code. It also helps you identify nuances and customizations made by the programmers more easily.
  • Try to write tests for the code to fully understand it – this is a really useful way to understand the dependencies between different parts of the code. When you try to write a test for the code you first need to satisfy (mock) all its dependencies. Next you need to understand the possible entry points as well as the exit values for the code. This improves your understanding of the code and gets you to the next level.
  • Finally, try to refactor the code. In this step you have moved from simply understanding the code to becoming familiar enough to be able to modify it. As the sophistication of your refactoring increases so too does your understanding. At this point you can if needed contribute your own code to the project :-)
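
To make the test-writing step above a little more concrete, here is a minimal sketch in Python (one of the languages I am comfortable reading). Everything in it is hypothetical: ReportBuilder, its repository dependency and the orders_for method are invented stand-ins for whatever code you are trying to comprehend, not part of any real project. The point is only to show how mocking a dependency forces you to find out what the code asks of its collaborators and what it hands back.

```python
from unittest import mock


class ReportBuilder:
    """Hypothetical class under study; stands in for the code being read."""

    def __init__(self, repository):
        # The dependency we have to satisfy before we can test anything.
        self.repository = repository

    def summary(self, user_id):
        orders = self.repository.orders_for(user_id)
        if not orders:
            return "no orders"
        return f"{len(orders)} orders, total {sum(orders)}"


def explore_report_builder():
    """Pin down what the code does today by exercising it with a mock."""
    # Satisfy the dependency with a mock instead of a real repository.
    repo = mock.Mock()
    repo.orders_for.return_value = [10, 20, 5]
    builder = ReportBuilder(repo)

    # Entry point: summary(user_id). Exit value: a formatted string.
    result = builder.summary(user_id=42)

    # The mock records how the code talked to its dependency.
    repo.orders_for.assert_called_once_with(42)
    assert result == "3 orders, total 35"

    # A second path through the code: no orders at all.
    repo.orders_for.return_value = []
    assert builder.summary(user_id=42) == "no orders"
    return result
```

Calling explore_report_builder() runs the checks. Each assertion that passes is one more fact about the code that you no longer have to hold in your head.
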

“Code reading”, IMO, is more than just reading – it is a distinct set of activities that together help one understand code. It might seem more intimidating than simply “reading code”, but it is well worth the effort.

Happy “code reading” :-)

Update: I came across this post by Joel Spolsky where he quotes Seth Gordon as saying code reading “Is just like reading the Talmud”… Yup, code reading is definitely not easy.

The Humble Programmer – Edsger W Dijkstra

I first heard of Edsger W Dijkstra in the context of agile programming. I was having a discussion about agile programming with some friends, explaining Test Driven Development and the concept of first creating tests that can show the correctness of the code before writing the code, when a friend mentioned that this sounded a lot like some of the arguments put forward by Prof. Dijkstra in his Turing award lecture in 1972. I found that hard to believe; after all, if this was known in 1972, why is it only becoming popular now?

So I started looking up Edsger W Dijkstra and realized that this man was one of the pioneering giants of software programming. He is the father of structured programming and one of the guiding heads responsible for much of the way we program computers today. There is a lot written about him all over the place – I shall focus on his Turing award lecture, which was titled “The Humble Programmer”. In this lecture, Prof. Dijkstra puts forth six arguments on the way software programming should be done. On reading these six arguments I cannot help but feel that this lecture was one of the main inspirations for the people behind the agile programming movement and the design patterns community.

The six arguments put forward in the lecture are as follows -

  1. “A study of program structure had revealed that programs – even alternative programs for the same task and with the same mathematical content – can differ tremendously in their intellectual manageability. A number of rules have been discovered, violation of which will either seriously impair or totally destroy the intellectual manageability of the program. I now suggest that we confine ourselves to the design and implementation of intellectually manageable programs. As the programmer only needs to consider intellectually manageable programs, the alternatives he is choosing from are much, much easier to cope with.”
  2. “As soon as we have decided to restrict ourselves to the subset of intellectually manageable programs, we have achieved once and for all a drastic reduction of the solution space to be considered. This argument is distinct from argument 1.”
  3. “If one first asks oneself what the structure of a convincing proof would be and, having found this, then constructs a program satisfying this proof’s requirements, then these correctness concerns turn out to be a very effective heuristic guidance. By definition this approach is only applicable if we restrict ourselves to intellectually manageable programs.”
  4. “The only mental tool by which a very finite piece of reasoning can cover a myriad of cases is called an “abstraction”. There are a number of patterns of abstraction that play a vital role in the construction of programs. Knowledge of these patterns of abstraction is essential.”
  5. “A programmer is fully aware of the limited size of his own skull; so he approaches the task of programming in full humility and avoids clever tricks.”
  6. “The only problems we can solve in a satisfactory manner are those that finally admit a nicely factored solution.”

When you go through these arguments you see the seeds for the various movements in programming software today.

I really enjoyed reading his lecture – I have the lecture here (The_Humble_Programmer)  if you want to read it. He must have been an extremely engaging speaker – a lot of his quotes are available here.

Till next time – Happy Programming !

Update: I have made a couple of corrections based on some of the comments here. When I wrote that argument 3 is the basis for TDD, I meant that in the lecture Dijkstra talks about first finding the structure of a proof and then constructing the program that satisfies the proof’s requirements. This is similar to the TDD approach of first writing a test and then writing code that satisfies the test.
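
For anyone who has not seen the test-first ordering in practice, here is a tiny sketch in Python. The word_count function and its test are made up purely for illustration. Keep in mind that a test only checks a handful of examples while Dijkstra is talking about proofs, so the similarity is in the ordering (state what correctness means first, then write the program), not in the strength of the guarantee.

```python
def test_word_count():
    # Step 1: written first. It states what a correct word_count must do.
    assert word_count("") == 0
    assert word_count("humble") == 1
    assert word_count("the  humble   programmer") == 3


def word_count(text):
    # Step 2: written afterwards, with just enough code to satisfy the test.
    return len(text.split())


# Step 3: run the test; it raises AssertionError if the code falls short.
test_word_count()
```

Writing the test first forced a decision about edge cases (the empty string, repeated spaces) before a single line of the function existed.
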

Google Android and the CLI

This morning I was reading about Brad Abrams going to Google and ruminating about what he wanted to do there when several earlier articles I had read suddenly came together and I had an inspiration that I thought I’d blog.

It would be really cool if Brad and Tim Bray (who recently left Sun and joined Google Android) got together and implemented the CLI on the Android OS.

Brad is one of the architects of the CLI and has been one of the main driving forces behind the development of the .NET framework and its adoption. He is currently looking at what he wants to do when he starts at Google. He mentioned that he thinks the cloud plus devices is one of the dominant trends of the future and I agree.

I think the Android OS is an important part of this future, and currently I am frustrated that there is only Java support for developing on it. Tim shares my frustrations and he is looking at getting other languages supported on the Android OS. He is looking at Ruby right now – it’s open source and a dynamic language, and it makes sense. But I feel the CLI (which is an open ECMA specification) is a great fit for the Android OS because it can be used as a basis to quickly support a lot of languages. Besides, as a .NET developer myself, I think that having the capability to develop Android applications in C# or IronPython is a far more palatable proposition than doing it in Java :-)

I saw this article by Miguel de Icaza where he puts out an idea to incorporate the CLI into the browser engine so that we could use languages other than Javascript in our client-side scripting (my take on that is here). It occurs to me that it should be similarly possible to bring it into the Android OS as well.  There are currently efforts to port Mono that could be used as a starting point.

So what say guys – can we get the .NET CLI in Google Android ?  Become the opposite of the Apple iPhone and embrace developers instead of driving them away :-)

Ganesha – the original lateral thinker

There is an ancient tale from Hindu mythology that illustrates lateral thinking (also known as “out of the box” thinking) that I would like to share -

One day Lord Siva and His consort Parvati were sitting atop their abode on Mt. Kailash with their sons Ganesha and Karthikeyan when the sage Narada dropped by for a visit. Narada had with him a special mango of knowledge, to offer to Siva. After accepting the mango from Narada,  Siva and Parvati decided to have a contest between their sons.

The first one to circumnavigate the world three times would get the mango of knowledge. Without further ado Karthikeyan jumped on his peacock and started off. Ganesha, on the other hand, was busy eating his favorite ladoos and decided to finish them first. Karthikeyan had completed two rounds by the time Ganesha finally got ready to compete :-)

Ganesha simply approached Siva and Parvati and deliberately walked around them –  He circled them once, twice and three times and then claimed the mango.

When Siva and Parvati asked him how he could claim the mango when he had not circled the world even once, Ganesha replied – “You both are my world”. Delighted by the answer, Siva and Parvati gave Ganesha the mango, which he immediately gobbled up with relish.

Two of the important traits of good software developers are “enlightened laziness” and “out of the box” thinking. This tale is an example of both – confronted by the immense task of circumnavigating the world, Ganesha, by simply thinking a little and restating the problem, comprehensively defeated his brother Karthikeyan.

So, my eager friends – the ones who are chomping at the bit after the initial presentation of a project, eager to rush into coding it – please spend some time contemplating your problem. Another homily you might want to consider is “Think twice, code once” – you might just save yourself a LOT of time and effort !! :-)

Juggling code – the coding zone and burnouts..

Software programming is a very mentally intensive activity. In any non-trivial software system the coder has to juggle a large number of mental models. Like a juggler, a coder has to mentally juggle not only the actual code that he/she is writing, but also the details of the code that it is related to, the details of the data being manipulated, the possible errors to be handled, the reliability and performance of the code, its security characteristics, the requirement that is being implemented and its design and usability, etc. (depending on the code there may be more to think of or, if you are lucky, less :-)). Unlike a juggler, who generally juggles things of similar size, a coder mentally juggles problems whose complexity varies by several orders of magnitude (1 to 10^9).

Given all this, it takes time for coders to become truly productive when they sit down and start working on something. Once you get into what I call the coding zone, you find the ideas flowing through you seamlessly – coders in the zone lose their sense of time and place – the problems and solutions are clear and you find beautiful code coming from your keyboard. Coding when you are in the zone is an immensely satisfying task – it’s like the zone that sportspeople talk about – when they are breaking records, it seems like they are unstoppable and every movement is a beautiful ballet…

This is also why almost all good coders HATE BEING INTERRUPTED !! Whether it is a simple phone call or even a well meaning colleague coming over to tap you on your shoulder and ask a question – the effect is the same as though the coder was invited to a long meeting. It takes time and effort to get back to being productive  after the interruption.

There are several other things that contribute to this problem -

  1. “Open Office” plans where you are compelled to hear your neighbors’ conversations.
  2. Having one phone for several people in your area, so you can’t disconnect it and have to answer it on the off chance that the call is for you.
  3. Conversations over information that can be sent by email or IM or SMS or any of the multitude of asynchronous forms of communication available today.

I have seen several ways to combat this  -

  1. Some people wear head-phones to block the ambient noise and subtly indicate to people they are working on something and interruptions are not encouraged (YMMV – I have seen people ignore the subtle indication and come over anyway).
  2. Some people deal with all their email and IM at scheduled intervals – this way everyone gets their reply and people learn to come with the questions at those times.
  3. If you are lucky enough to have a cabin then disconnecting the phone and leaving a message on the door is often effective.
  4. Some companies even plan their meetings to happen only on certain days so everyday disruption is minimized.
  5. A common inclination is to work at times when no-one else is around to bother you. This is a reason why coders are night owls :-)

Another effect of software programming is burn-out… This is the opposite of being in the coding zone, but it seems to be a consequence of being in one… As I mentioned at the beginning of this post, software programming is a very mentally intensive activity. Coders frequently feel mentally burned-out after intensive coding sessions. This happens more quickly on projects which you don’t find interesting or enjoyable. Sometimes you can continue only for a day, other times it’s a month, but invariably burn-out happens. The key is to recognize it for what it is and deal with it.

Indeed, when I previously mentioned that coders lose all sense of time when in the zone, I did not mean that they should spend all their time coding. I am not applauding coders who brag about sitting for 36 hours at a computer churning out code. Those who spend 16 hours a day at the terminal and spend their nights dreaming about code are, invariably, the ones whose work the rest of the team has to spend the rest of the month fixing. As in all things, there is a balance that needs to be maintained. Spending long periods in intense concentration is tiring, and it is important to give things a rest. It is usually great to take some time off doing something else, like mountain biking, mixed martial arts, flying a plane or playing an instrument (these are pastimes of some of my friends :-)). Some people like physical activity; others like mental activities such as video games or chess. The important thing is to have a balance. Sometimes, when you are grappling with a hard problem, it is useful to stop thinking about it consciously and let your sub-conscious chew over it.

When you are no longer in the zone and are spinning your wheels, a break is the most productive thing you can do.

The Zen of Programming is being able to get into the zone and more importantly to recognize when you are no longer in it and take that break ! :-)

Update: I came across this article the other day that got me thinking about burn-out. I mentioned before that burn-out happens more quickly when doing something you don’t find interesting or enjoyable – this applies in spades when you are doing something you feel is morally wrong or that goes against your conscience. Guilt is a catalyst that will accelerate both the speed and the intensity of your burn-out.

IMHO if given a choice it is much more satisfying to do something you believe in at a lower salary than something you don’t at a higher one.