I Don't Trust Statistics That I Haven't Faked Myself

Václav Novotný has prepared an infographic comparing developer activity in Nette and Symfony. I'm eager and curious to look at it, but without an explanation of the metric, the numbers can be treacherously misleading. Exaggerating a bit: with a certain workflow and naive measurement, I could appear in the statistics as the author of 100% of the code without having written a single line.

Even with straightforward workflows, comparing the number of commits is tricky. Not all commits are equal. If you add five important commits and, in the meantime, ten people each fix a typo in your comments, then by commit count you are the author of one-third of the code. That is not true, of course: you are the author of all of it, because typo fixes are not usually considered authorship as we normally understand it.
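The arithmetic is easy to reproduce. The following is a minimal Python sketch of naive commit counting; the author names and the `commit_share` helper are made up for illustration and do not describe how any particular tool computes its numbers.

```python
from collections import Counter
from fractions import Fraction


def commit_share(commit_authors):
    """Return each author's share of the commits as an exact fraction."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    return {author: Fraction(n, total) for author, n in counts.items()}


# Hypothetical history: five substantial commits by "you",
# plus one typo fix from each of ten other people.
history = ["you"] * 5 + [f"typo-fixer-{i}" for i in range(10)]

shares = commit_share(history)
print(shares["you"])  # 1/3
```

Every commit weighs the same here, which is exactly the problem: the metric cannot tell a new feature from a corrected comma.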

In Git, merge commits complicate matters further. If someone prepares an interesting commit and you approve it (thus creating a merge commit), you are credited with half of the commits. But what is your actual contribution? Usually none, since approval is a single click on GitHub. Sometimes, though, you spend more time discussing the change than it would take to write the code yourself, but you don't write it, because you need to train developers.
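One way to take merges out of the count is to skip commits that have more than one parent, which is what distinguishes a merge commit in Git. Here is a small Python sketch under that assumption; the history, hashes, and the `count_commits` helper are hypothetical.

```python
from collections import Counter

# Hypothetical history: a contributor's commit and the maintainer's merge of it.
history = [
    {"hash": "c1", "author": "contributor", "parents": ["m0"]},
    {"hash": "m1", "author": "maintainer", "parents": ["m0", "c1"]},  # merge commit
]


def count_commits(history, skip_merges=False):
    """Count commits per author, optionally ignoring merge commits."""
    return Counter(
        commit["author"]
        for commit in history
        if not (skip_merges and len(commit["parents"]) > 1)
    )


naive = count_commits(history)                       # one commit each, so half and half
without_merges = count_commits(history, skip_merges=True)  # only the contributor remains
print(naive)
print(without_merges)
```

On a real repository, `git log --no-merges` applies the same filter. Note that it also hides the review effort, which this kind of counting cannot capture either way.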

Therefore, instead of counting commits, it is more appropriate to analyze their content. The simplest measure is the number of changed lines. But even this can be misleading: if you create a 100-line class and someone else merely renames the file (or splits it in two), they have effectively “changed” 200 lines (100 removed and 100 added), and once again you are the author of one-third.
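The rename case can be sketched as follows. The numbers assume a tool that does not detect renames and so reports the old file as fully deleted and the new one as fully added; the commit records and the `line_share` helper are hypothetical.

```python
from fractions import Fraction

# Hypothetical per-commit line counts, as reported without rename detection.
commits = [
    {"author": "you", "added": 100, "deleted": 0},        # wrote the class
    {"author": "renamer", "added": 100, "deleted": 100},  # only renamed the file
]


def line_share(commits):
    """Return each author's share of changed lines (added + deleted)."""
    changed = {}
    for commit in commits:
        lines = commit["added"] + commit["deleted"]
        changed[commit["author"]] = changed.get(commit["author"], 0) + lines
    total = sum(changed.values())
    return {author: Fraction(n, total) for author, n in changed.items()}


shares = line_share(commits)
print(shares["you"])      # 1/3
print(shares["renamer"])  # 2/3
```

Git can detect renames when asked (for example with the `-M` option of `git diff` and `git log`), which helps with a pure rename but not necessarily with a file split in two.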

If you spend a week debugging several commits locally before sending them to the repository, you are at a disadvantage in the number of changed lines compared to someone who pushes immediately and only then fine-tunes the result with follow-up commits. It might therefore be wiser to analyze, say, whole-day summaries. It is also necessary to filter out maintenance commits, especially those that change the year or version in the header of every file.
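A rough Python sketch of both ideas follows. It approximates a daily summary by counting the distinct lines an author touched per day, so that repeatedly polishing the same lines counts only once, and it treats any commit that touches every file as maintenance. Both rules are simplifying assumptions, and all names, dates, and data are hypothetical.

```python
from collections import defaultdict
from datetime import date

ALL_FILES = {"A.php", "B.php", "C.php"}  # hypothetical project


def is_maintenance(commit):
    """Heuristic (an assumption): a commit touching every file is a header update."""
    return {path for path, _ in commit["lines"]} == ALL_FILES


def naive_changed_lines(commits):
    """Sum the changed lines of every commit per author."""
    totals = defaultdict(int)
    for commit in commits:
        totals[commit["author"]] += len(commit["lines"])
    return dict(totals)


def daily_changed_lines(commits):
    """Count distinct lines touched per (author, day), skipping maintenance commits."""
    touched = defaultdict(set)
    for commit in commits:
        if not is_maintenance(commit):
            touched[(commit["author"], commit["day"])] |= commit["lines"]
    return {key: len(lines) for key, lines in touched.items()}


day1, day2 = date(2024, 1, 8), date(2024, 1, 9)
commits = [
    # Alice debugged locally and pushed once.
    {"author": "alice", "day": day1, "lines": {("A.php", n) for n in range(1, 51)}},
    # Bob pushed early, then fine-tuned the same lines twice.
    {"author": "bob", "day": day1, "lines": {("B.php", n) for n in range(1, 51)}},
    {"author": "bob", "day": day1, "lines": {("B.php", n) for n in range(1, 31)}},
    {"author": "bob", "day": day1, "lines": {("B.php", n) for n in range(20, 51)}},
    # Bob bumped the year in the header of every file.
    {"author": "bob", "day": day2, "lines": {(path, 1) for path in ALL_FILES}},
]

naive = naive_changed_lines(commits)
daily = daily_changed_lines(commits)
print(naive)  # bob appears to have changed 114 lines, alice 50
print(daily)  # both end up with 50 lines on the first day
```

The naive count rewards Bob for pushing unfinished work; the daily summary puts the two authors on equal footing. A real analysis would need to diff the state of the repository at day boundaries rather than rely on line numbers, which shift as code is edited.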

Then there are situations where commits are automatically copied from one branch to another, or to a different repository. This effectively makes it impossible to compile any global statistics.

Analyzing a single project is a science in itself, let alone a comparative analysis. It very much reminds me of the excellent analytical quiz by Honza Tichý.

Related: How the ‘Hall of Fame’ on is calculated

phpFashion © 2004, 2024 David Grudl | about the blog

You may use the source code samples without further restrictions, provided you credit the author and the URL of this website.