Learning to Read!

by anandeep on November 16, 2006 07:39pm

One of the great things about my job at the Open Source Software Lab (OSSL) here at Microsoft (besides being able to work with both Linux and Windows!) is that I get to go to computer science research conferences. I try not to attend the purely academic ones, but rather the ones in which both industry and academic research issues are addressed.

I just got back from ISSRE (pronounced “is-ree”), i.e. the 17th IEEE International Symposium on Software Reliability Engineering, 2006. This conference covers everything that impacts the reliability of computers – from “drivers of reliability” to “testing to ensure reliability” to “doing static analysis of programs”.

Skeptical that anything they talk about here would be useful to y’all? Well, think again! They have all kinds of practical advice on doing things right. The talks I really enjoyed included:

    • empirical evidence on the positive impact of using assertions during development (a very development-oriented talk)
    • considering stabilization time of an application (i.e. stability during installation and immediately after) as part of a reliability metric
    • the use of “operational profiles” to reduce the number of test cases by over 30% without significantly impacting the reliability of the tested product

Only one of the above talks was from an academic institution; the other two were based on experience with software that is widely used in the consumer and application server space.
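To make the “operational profile” idea a bit more concrete, here is a minimal sketch in Python. It is not the method presented in the talk; it just assumes that an operational profile records how often each operation is used in the field, and that each test case exercises exactly one operation. The operation names, usage counts and test names are all made up for illustration.

```python
def select_by_profile(usage_counts, tests, coverage_percent=95):
    """Keep tests for the most-used operations until the kept operations
    account for at least `coverage_percent` of observed usage.

    usage_counts: dict mapping operation name -> observed usage count
    tests: list of (test_name, operation_name) pairs
    Returns (selected_test_names, covered_usage_count).
    """
    total = sum(usage_counts.values())
    kept_ops = set()
    covered = 0
    # Walk operations from most used to least used.
    for op, count in sorted(usage_counts.items(), key=lambda kv: kv[1], reverse=True):
        # Integer comparison avoids floating point rounding surprises.
        if covered * 100 >= coverage_percent * total:
            break
        kept_ops.add(op)
        covered += count
    selected = [name for name, op in tests if op in kept_ops]
    return selected, covered


# Hypothetical profile: usage counts observed per 100 user sessions.
USAGE = {"login": 50, "search": 30, "checkout": 15, "export_report": 4, "admin_purge": 1}

# Hypothetical test suite: each test is tagged with the operation it exercises.
TESTS = [
    ("t_login_ok", "login"),
    ("t_login_bad_password", "login"),
    ("t_search_basic", "search"),
    ("t_checkout_card", "checkout"),
    ("t_export_csv", "export_report"),
    ("t_admin_purge", "admin_purge"),
]

SELECTED, COVERED = select_by_profile(USAGE, TESTS)
print(SELECTED, COVERED)
```

With these made-up numbers the two rarely used operations drop out, so four of the six tests remain while still covering 95 of every 100 observed uses. A real profile would of course have to come from field data.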

The one thing that I enjoyed the most was a tutorial on “Software Productivity and Reliability – Tools and Techniques” given by Prof S C Kothari of Iowa State University. The tutorial title is appropriate, but I think it should have been “Learn to Read Programs Properly!”

Kothari believes that a lot of attention has been paid to what he calls “Program Writing” – developer tools and such. This has resulted in the creation of very complex software artifacts, and most real-world applications today are built on top of these existing complex software systems.

The problem is that almost all academic institutions and programs focus on the inventive aspects of programming.  This means that they teach algorithms and techniques assuming that everything will be written from scratch.  Real life is of course never like this – it is difficult if not impossible to be a computer software professional these days and work just with your own code.  More often than not, most developers have to wade through other people’s code to understand, use or modify it.  Developing software today involves a lot more than just writing it.

The skills to “read programs” are acquired the hard way – and sometimes never fully mastered.  Kothari suggested that there needs to be an emphasis on program reading in training and that tools need to be built to aid in reading programs and forming the proper mental model of them.  The barrier to future software productivity is not machines or algorithms but human mastery of the complexity of the vast amount of critical software out there. 

Program reading is not easy, as most people in open source know! This is due to:

  1. The complexity of the semantic analysis of the program – figuring out what a module is trying to do: is it part of the scaffolding put in place to support execution, or is it domain knowledge embodied in the module?
  2. A lack of domain knowledge – how would a programmer know how a complex business or legal transaction needs to be done, or how a certain application-level protocol should be executed? Yet this information has to be embodied in the code written by the programmer.
  3. Non-localized relations between software artifacts – a module does one thing in one context and another thing in another; in some it has to maintain data integrity, and in others it has to undo something that has occurred somewhere else in the program.

Some tools are available to assist in program reading, such as CScope (BTW, Hank Janssen of our lab wrote parts of CScope), but not much attention has been paid to WHAT program reading needs in order to address the complexity issues raised above. Kothari has a company, Ensoft, that provides some very cool tools for the kinds of things that are needed to read complex programs. The tools are based on abstractions that are used in program comprehension (there is an IEEE Conference on Program Comprehension held every year).

Kothari illustrated one abstraction that he called “matching pair” (MP). Matching pairs are defined by a syntactic pattern, which could consist of artifacts (such as matching parentheses) or events (such as locking and unlocking a resource). There are many types of matching pairs, and for a program to be correct a matching pair may have to hold with respect to control flow, data flow or both. A control flow matching pair means that a call to a function f must be followed by a call to a function f-inverse on EVERY execution path that the program could take. Looking through every execution path is hard, especially in something like the Linux kernel (and doing it via automated static analysis of programs is known to be an intractable problem in general).
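As a rough illustration of a control flow matching pair, here is a small Python sketch. It is my own toy version, not Ensoft's tooling: it assumes the program has already been reduced to a control flow graph given as a dictionary, that some nodes are tagged with a "lock" or "unlock" event, and that there is a single exit node. All node names are hypothetical. For each "lock" node it searches for a path to the exit that never passes through an "unlock".

```python
def unmatched_paths(cfg, events, exit_node, open_event="lock", close_event="unlock"):
    """For each node tagged `open_event`, look for one path to `exit_node`
    that avoids every node tagged `close_event`.

    cfg: dict mapping node -> list of successor nodes
    events: dict mapping node -> event tag (untagged nodes are omitted)
    Returns a dict: open node -> offending path (empty dict if none found).
    """
    violations = {}
    for start, event in events.items():
        if event != open_event:
            continue
        stack = [(start, [start])]
        seen = {start}  # the visited set keeps loops from running forever
        while stack:
            node, path = stack.pop()
            if node == exit_node:
                violations[start] = path
                break
            for nxt in cfg.get(node, []):
                # Do not walk through a closing event: that path is matched.
                if nxt in seen or events.get(nxt) == close_event:
                    continue
                seen.add(nxt)
                stack.append((nxt, path + [nxt]))
    return violations


# Hypothetical function: the error branch returns without unlocking.
BUGGY_CFG = {
    "entry": ["acquire"],
    "acquire": ["check"],
    "check": ["work", "error"],
    "work": ["release"],
    "error": ["exit"],
    "release": ["exit"],
}
# Same function with the error branch routed through the unlock.
FIXED_CFG = dict(BUGGY_CFG, error=["release"])

EVENTS = {"acquire": "lock", "release": "unlock"}

print(unmatched_paths(BUGGY_CFG, EVENTS, "exit"))
print(unmatched_paths(FIXED_CFG, EVENTS, "exit"))
```

The first call reports the path through the error branch; the second reports nothing. This toy ignores nested locks, data flow and calls into other functions, which is exactly where the real difficulty lives.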

With the tool that Kothari demonstrated, a call graph was generated and a “query language” was defined over call graphs. Looking for matching pairs using the tool became unbelievably simple. This was just one of the things that can be done to reduce the complexity and time taken to figure out what a very complex program is doing.
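I can't reproduce the tool's query language here, but the flavor of querying a call graph can be sketched in a few lines of Python. Everything below is an assumption for illustration: the call graph is a hand-written dictionary, the function names (including mutex_lock and mutex_unlock) are hypothetical, and the single "query" asks which functions call the opener directly but can never reach the closer, directly or through their callees.

```python
def reachable(graph, start):
    """Return every function reachable from `start` through one or more calls."""
    seen = set()
    stack = [start]
    while stack:
        fn = stack.pop()
        for callee in graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def calls_without_partner(graph, opener, closer):
    """Functions that call `opener` directly but cannot reach `closer` at all."""
    return sorted(
        fn
        for fn, callees in graph.items()
        if opener in callees and closer not in reachable(graph, fn)
    )


# Hypothetical call graph: caller -> list of direct callees.
CALLS = {
    "main": ["read_config", "update_record", "flush_cache"],
    "read_config": ["mutex_lock", "parse", "mutex_unlock"],
    "update_record": ["mutex_lock", "write_row", "release_all"],
    "release_all": ["mutex_unlock"],
    "flush_cache": ["mutex_lock", "evict"],
    "parse": [],
    "write_row": [],
    "evict": [],
    "mutex_lock": [],
    "mutex_unlock": [],
}

SUSPECTS = calls_without_partner(CALLS, "mutex_lock", "mutex_unlock")
print(SUSPECTS)
```

Here only flush_cache is flagged: update_record is fine because it reaches mutex_unlock through release_all. A query like this only narrows down where to read; it says nothing about whether the unlock happens on every path.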

I think this is a real breakthrough – and I am now a confirmed advocate of program reading. I am hoping to work with Prof Kothari to do some more stuff with this – I hope to share the results if I do end up doing that.

Why do I mention this on this forum? This is something that open source developers and IT Pros have been doing for a long time. Open source developers have a culture in which a lot of code reading is encouraged. And IT Pros have to constantly update and upgrade the scripts that they use to control and run their infrastructure. The cultural advantage lies with open source developers and IT Pros, but given that the complexity of software is increasing exponentially, everyone could do with a little help.