Rules for Commenting Code Revisited, v2

Posted on January 2, 2012



CC-by-SA

This is a revision of a post I made on the same subject back in March, 2011. This new one renders the old one obsolete.

I’ve seen questions about comments every once in a while, and I found this one on SO very interesting: what tense should we use when writing comments?

The question is very simple (simple enough for many l33t hax0rs to dismiss it), but it does raise the question of how to write good comments (an art in itself). There are rules, guidelines and best-practice suggestions on how to name classes and methods (or procedures and data types in procedural programming). More importantly, there are such things as code naming anti-patterns, like the pernicious “get” and “set” dumb-ass naming conventions found in Java (and to a lesser, but no less useless, extent in some C++ implementations).

But before we continue with this line of thought, let me explain what my suggestions are not.

What This is Not.

Given that there are guidelines on how to name classes, methods, data types and procedures, it would seem, as an obvious consequence, that there should be guidelines on how to comment code. For instance, the father of the SOLID principles, Robert “Uncle Bob” Martin, makes compelling arguments in favor of writing code in blocks that 1) are small, 2) are explicit, and 3) carry few comments. This is very important for tackling complexity (cyclomatic, Halstead, cohesion).

So far so good. Unfortunately, this is sometimes taken as a rule that sufficiently explicit code should have no need for comments. In fact, Mr. Martin himself takes the position that a comment is an apology for bad code. I don’t necessarily agree with that. Code can never explain the why, only the how.

At the other extreme, we can be working under coding guidelines that demand we produce, say, a minimum 50% code-to-comment ratio. I don’t subscribe to that position either.

The former position breaks down under a variety of conditions: the size of the team increases, or we go lower in the syntactic power of the language in use, or we do not have control over the quality of programming talent at our disposal. Furthermore, I strongly believe (objectively, I hope) that it is not universally tenable to produce code that is expressive enough to not require comments.

The latter position (mandated comment ratio quotas) by itself is no guarantee that the code is well constructed. This position requires additional effort (in the form of peer review) to guarantee that the code and comment combinations are correct.

Those are things I will write about later, and my current suggestions on how to comment your code are of a different nature.

So, What Are We Talking About Then?

The rules (rules for me, suggestions for you) are of a semantic and stylistic nature. More to the point, my rules (suggestions for you) deal with the proper usage of grammatical tense to convey actions, descriptions and statements of fact (assertions, invariants, pre-conditions and post-conditions). The worst thing about code comments is comments that are obsolete, uninformative or wrong.

The second worst thing, however, is a combination of code and comments that actually conveys information, but in such a way (typically structurally) that it forces the programmer to become a paleo-linguist/cryptanalyst to pry the intended meaning from the maws of unintended complexity. Such efforts take valuable time away from the development process. Ergo, proper usage of grammar, in particular grammatical tense, is imperative.

This might seem like minutiae compared to the oh-so-l33t-haxor and more glamorous concerns involved in the art, craft and discipline of writing and shipping workable code. But guess what? Writing comments has an ROI (or cost) associated with it. So it is your job (the thing you do for which you get a paycheck) to make a reasonable effort to do it well, to the benefit of your employer.

So, How Do Comment with teh gramm3r?

I tend to follow some rules of thumb that I’ve derived by trial and error over the years.

To me, comments are (or should be), like anything written, expressions of something, and they should simply follow the same rules of technical writing in natural languages (taking into account shorthand and abbreviations specific to the situation or artifact being documented).

Comments in the present tense (i.e. “it changes” or “it is changing”) can be used to indicate that something (a datum, for example) is being altered or manipulated by the execution of the algorithm being documented. They can also be used to indicate side effects or consequences of the algorithm in question, in particular at the place where the comment resides. That is, present tense comments explain either what the code is doing or what is occurring to the data being manipulated.

For example:

With user credentials, we process updates to matching records in the general ledger table.
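To see how that reads next to actual code, here is a minimal sketch in Python. The names (LedgerRecord, apply_updates) and the data layout are made up for illustration; only the present tense comments matter.

```python
from dataclasses import dataclass


@dataclass
class LedgerRecord:
    # Hypothetical record type, invented for this example.
    account_id: str
    balance: float


def apply_updates(user_id, ledger, updates):
    """Apply balance updates to the ledger records owned by user_id."""
    updated = 0
    for record in ledger:
        # We skip records that belong to other accounts.
        if record.account_id != user_id:
            continue
        # The balance changes in place here, so any caller holding this
        # record sees the new value (a side effect).
        record.balance += updates.get(record.account_id, 0.0)
        updated += 1
    return updated
```

Both comments describe what is happening at the exact spot where they sit: one an action, the other a side effect.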

Comments in the past tense should indicate an assertion, precondition or post-condition of something that has happened prior to the point where the comment resides. That is, they document a temporal/logical happens-before condition that is critical or important for code to execute properly at this point. For example:

Input has already been validated before entering this block of code

Past tense can also be used to indicate a post-condition that must be true after code has been executed up to that point:

Data has been written to X file at this point in the execution of this method.
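A small Python sketch of both uses, with a hypothetical save_report function (the name, the arguments and the claim that the caller validates the input are all assumptions made for the example):

```python
def save_report(lines, path):
    """Write each line to the file at path and return the line count."""
    # Input has already been validated by the caller: lines is a list of
    # strings with no embedded newlines.
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")
    # The data has been written and the file has been closed at this point.
    return len(lines)
```

The first comment is a pre-condition (something that happened before we got here), the second a post-condition (something that is guaranteed to have happened once we get there).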

One important caveat: comments in the past tense that indicate a change to the code itself (i.e. “X was changed to Y”) should not exist in the code. That’s what source control is for. That is, they should exist as revision comments in the source code repository. Examples of these types of comments to avoid are the following:

Method X has been renamed method Y.

or

This code has been commented out because it is no longer needed.

Comments in the future tense should indicate a condition that needs to be met or addressed but that, for X or Y reason, is not being addressed right now (now as in at the current point of execution or at the point in the source code where the comment resides). For example:

When we finally migrate to the new db, we will have to change this logic

or

TODO: asap, revisit validation of input – it might fail for X or Y type of input, might require massive changes that cannot be implemented right now.
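In code, that could look like the following Python sketch. The function validate_amount and the business rules mentioned in its comments are hypothetical, invented only to give the future tense comments something to hang on to.

```python
def validate_amount(raw):
    """Parse a user-supplied amount string into a non-negative float."""
    # TODO: this will have to accept locale-specific decimal separators
    # (e.g. "1,50"); for now such input raises ValueError.
    value = float(raw.strip())
    # When we migrate to the new db, we will have to revisit this check;
    # until then, negative amounts are rejected here.
    if value < 0:
        raise ValueError("amount must be non-negative")
    return value
```

Each comment says what will need to happen and what the code does in the meantime, so the reader is not left guessing about the current behavior.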

Danger, Danger Will Robinson!!!

For the latter TODO type of comments, some other form of documentation should exist to make sure that such changes actually take place. The last thing you want is TODOs lost in time and space. :P
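One cheap way to feed that other form of documentation is to harvest the TODOs from the source and paste them into whatever tracker you use. The Python sketch below is only an illustration: collect_todos is a made-up helper, and its pattern is deliberately naive (it only looks for "# TODO" and will also match inside string literals).

```python
import re

# Naive pattern: a "#" comment marker, the word TODO, an optional colon.
TODO_PATTERN = re.compile(r"#\s*TODO\b[:\s]*(.*)")


def collect_todos(source):
    """Return (line_number, text) pairs for each TODO comment in source."""
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = TODO_PATTERN.search(line)
        if match:
            found.append((number, match.group(1).strip()))
    return found
```

Run over a file's contents, this gives you a list you can review or export, which beats hoping somebody stumbles on the comment again.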

Take it with a grain of salt, but those are the rules I usually follow when I write my own comments. I hope you find them useful.
