Online Journal

  • 29.9.2006

    Side effects, and more on indirect programming

    Programming (writing source code) is about communication. It is not just laying down instructions which a computer then executes; writing source code also (and, as I would argue, primarily) means explaining what happens when the code is executed.

    The audience of that explanation includes maintainers, other programmers on the team, and, last but not least, your own future self. In short, everybody who has to deal at a later time with your source code is someone whom you talk to through the way that you write your code.

    This is why there are such things as coding style and formatting guidelines. Most people who are in charge of software projects (of whatever sort, for example open source projects, or commercial software product development) know from experience how hard it is to understand a substantial code base a few months or even years after it was originally written; the task gets all the more difficult if the code base is heterogeneous with respect to coding style, formatting and other such things. The project leaders therefore ask developers to follow some rules which render the code base more uniform and thus easier to comprehend. (This does not mean that these rules have to be forced on the developers 'from above', of course; more often, a team that is made up of engaged developers will develop its own rules by consensus.)

    The fact that writing source code is about communication with other developers is also one of the reasons why side effects are bad. If you write a program sequence that exploits side effects, this means that your code actually does something different than what you communicate it does. Reading or debugging such code can be incredibly hard if done some time later, or by a different person (or both). And the reason for this is not that such code is more difficult to grasp (in the sense in which a tricky recursive algorithm may be more difficult to grasp than, say, a simple loop). It is because the original programmer made a misleading statement, through his source code, as to what the code was supposed to do; and the programmer who later had to debug that code relied on him and was thus misled.

    Take an example: A while ago I was in a project that had some very old database access code in a peripheral module, and that code suddenly stopped working. Nobody had changed that code (or the database), and no errors occurred. It simply returned empty result lists all the time. We had long debugging sessions and couldn't find anything wrong in the code. It worked quite well whenever we executed passages of it separately - only in context, integrated in the system, did it always return empty results. Then we double-checked that nobody had changed that code. We examined the entire revision history in the source code control system. Nothing had been changed - nothing, that is, but a few logging statements which had been removed. We weren't suspicious of these logging statements at first, but then one of the team members noticed: one of the removed logging statements had printed the return value of a function, and that function was - a bit of initialization code which made the database connection and retrieved all the data!

    Now, this is a somewhat crass example. Most of the time the code that exploits side effects is a bit more subtle - but that is only a difference of degree. What matters is that the programmer who originally wrote this neglected the communication aspect of coding. Very probably he first wrote the statement which caused the initialization, and at a later time (presumably during some debugging) he added a return code and wrapped the log statement around it. This makes sense only as long as one doesn't consider how someone else would read the code. One simply focuses on the task at hand, and for that it makes no difference where the initialization statement is called. But the very moment that someone else (or the same programmer a couple of months later) has to understand that code, the log statement is quickly skipped when reading - because log statements are, by convention, something that doesn't add functionality. (Many systems are built so that logging may be disabled entirely when production mode is entered.)
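
    The remark about disabling logging can be illustrated with Python's standard logging module. This is only a sketch under assumptions: initialize, start and the state dictionary are hypothetical names, and the isEnabledFor guard stands in for any mechanism that skips log statements in production. (Without such a guard, Python would still evaluate the argument, so the side effect would survive; with it, the initialization silently depends on the log level.)

```python
import logging

# Hypothetical program state, invented for illustration.
state = {"initialized": False}


def initialize():
    """Stand-in for the essential initialization step."""
    state["initialized"] = True
    return 0  # return code, added only so that it could be logged


def start(logger):
    # A common idiom: skip the log call entirely when debug output is off.
    # Here it also skips the initialization hidden in the argument.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("initialize returned %s", initialize())
    return state["initialized"]
```

    With a logger set to DEBUG, start reports that the state is initialized; with the level raised to WARNING, as one might do in production, it does not.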

    One might argue that the root problem in this case was not that this developer neglected a fundamental of programming; he merely violated a basic rule that applies to log statements: they shouldn't be written in a way that changes the program state or the content of data structures; they should be 'read-only'.

    True, but that's exactly the point: log statements should not trigger side effects on which the main control flow depends. And that is precisely because nobody expects log statements to do that. It is a convention that helps to read source code (because it enables the reader to skip inessential lines of code). It is a convention, that is, that facilitates communication between developers (or between earlier and later selves of the same developer) about the program code in question. The compiler and the runtime system do not need it at all in order to execute the code.
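
    A version that respects the convention might look like the following sketch. The DataSource class and the run function are hypothetical, invented for illustration; the point is only that the essential call stands on its own line and the log statement reads a value that has already been computed.

```python
class DataSource:
    """Hypothetical stand-in for the database access module."""

    def __init__(self):
        self.rows = []

    def load(self):
        # The essential step: fetch the data and report how much was loaded.
        self.rows = ["alice", "bob"]
        return len(self.rows)

    def query(self):
        return list(self.rows)


def run(source, log=None):
    count = source.load()  # explicit, visible in the main control flow
    if log is not None:
        log(f"loaded {count} rows")  # read-only: safe to skip or delete
    return source.query()
```

    Here run returns the same rows whether or not a log function is passed in, so a later reader can skip, disable or remove the log line without changing what the program does.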

    In one of my earlier posts, I wrote about indirect programming, and what makes it so bad. The discussion above shows an additional aspect of indirect programming: it is the analogue, on a software design level, to what side effects are on the more basic coding level.

    Passages of a program that have been generated by indirect programming are much more difficult to understand because what they actually do is not what they seem to do. (Often enough, they misleadingly seem to do nothing sensible at all - in that respect they are very similar to the logging statement in the example above.)


 

All content on this site is Copyright (c) 2005-2010 by Leif Frenzel. All rights reserved.