# Thoughts on code metrics

## Cautions

* [McNamara fallacy](https://en.wikipedia.org/wiki/McNamara_fallacy) - one reason the U.S. is viewed as having "lost" the Vietnam war is its focus on quantifiables (in this case, body count).
* [Perverse incentive](https://en.wikipedia.org/wiki/Perverse_incentive) - an incentive that produces results contrary to the intentions of its designers.
* [Goodhart's law](https://en.wikipedia.org/wiki/Goodhart%27s_law) - when a measure becomes a target, it ceases to be a good measure.

## Principles

1. **Effectiveness ultimately must be measured in relationship to end goals.** All current business key performance indicators relate *directly* to an end goal, but typical code metrics (which focus on LOC) do not. Since code metrics do not directly relate to *any* end goal, they should not be used as *key* indicators. Said another way, we could maximize or minimize almost every typical LOC code metric and it would likely tell us nothing about whether our software was going to be "successful".
1. **Effectiveness is more closely related to *organization* than to metabolic effort** (and most LOC code metrics seem to measure metabolic effort).
   * Trying to increase the number of PowerPoint presentations given (or slides per presentation) is probably counterproductive for an executive. An executive who gives no presentations maybe isn't doing their job, but fewer presentations with fewer slides may actually be a sign of *better* organization and effectiveness.
   * Neurons *must* fire for a researcher to create a business plan, but the overall effectiveness of the plan may be entirely unrelated to the *number* of neurons that fired during its creation. Two individuals may fire the same number of neurons, yet one plan may be thousands or millions of times more effective.
   * Monkeys and humans have a high degree of DNA sequence similarity. Relatively minor differences in how that information gets organized and used across time and space in a developing embryo result in humans possessing far greater intelligence.
   * A pile of compost and a human are both highly metabolically active and generate large amounts of heat, but metabolism alone says very little about whether that activity results in higher-order functioning.
1. **Controlled studies?** What controlled (or even observational) studies exist supporting the idea that individual or organizational LOC metrics (or their use as KPIs) result in more effective software? Are these measurements used at "high-performing" organizations? Is there any consensus on which metrics best predict effective code or coders?
1. **Complexity of a task vs. complexity of code?** Unless we know how complex a task *is*, we have no way of knowing how optimal a solution is. A solution to a simple task that shows a high *measured* (Kolmogorov-style) complexity is a *poor* solution, but a solution to a task that *requires* a high level of complexity and fails to achieve a commensurate level of code complexity is likely incomplete or incorrect (see the short illustration after this list). Also, we should not expect code metrics from people who build different *types* of components in different languages to be directly comparable, since the activities may require different *kinds* of optimizations (consider the variance in heart structure and function between [rowers, runners, and swimmers](https://www.frontiersin.org/articles/10.3389/fphys.2018.01700/full)).
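A minimal, hypothetical illustration of that last point (the "is it even?" task and both function names are invented here, not taken from any real code base): both functions below are correct, but only the first is a good solution to a task of low intrinsic complexity, even though the second would score higher on LOC and cyclomatic-complexity counts.

```python
# Hypothetical illustration: both functions correctly answer "is n even?",
# but the second piles up extra lines and branches (higher cyclomatic
# complexity) for a task whose intrinsic complexity is tiny. Metrics that
# reward volume or raw complexity would score the worse solution higher.

def is_even(n: int) -> bool:
    """The task is simple, so the optimal solution is simple."""
    return n % 2 == 0


def is_even_overbuilt(n: int) -> bool:
    """Same behavior, needlessly complex: more LOC, more branches."""
    if n < 0:
        n = -n                      # extra branch that changes nothing
    digits = [int(d) for d in str(n)]
    last = digits[-1] if digits else 0
    if last in (0, 2, 4, 6, 8):
        result = True
    elif last in (1, 3, 5, 7, 9):
        result = False
    else:
        result = False              # unreachable, but inflates the branch count
    return result


if __name__ == "__main__":
    assert all(is_even(i) == is_even_overbuilt(i) for i in range(-50, 50))
    print("Both are 'correct'; only one is a good solution to a simple task.")
```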
## LOC-based metrics may be counter-productive

These are some realistic examples where typical LOC code metrics may be counter-productive:

1. A person might spend hours profiling and debugging and find only a single line of code to change for all of that effort. Their code metrics would look terrible for that expenditure, yet that *one* line of code may result in roughly 100X performance gains across the entire system for months or even years.
2. Person A might dive into a project head-first and code a suboptimal solution, which then requires a significant amount of code and dev-hours to correct and *maintain* over time. Person B, on the other hand, may spend two days researching, reflecting, and *planning* a solution to the problem, producing an optimal solution that requires 1/100th the amount of code Person A wrote.
3. Person A might dive into a project and code a solution. The solution solves the problem but requires a month of coding, produces thousands of lines of code, and takes significant time and effort to get right. Person B spends several hours researching and finds an existing library that accomplishes the task, spends several more hours learning how to use it, and commits a handful of lines of code to solve the problem.
4. Person A duplicates their code across hundreds of controllers. Person B factors the redundant code out into a single library that is reused across those controllers. Person C writes their code so that a single controller handles every case in a few lines. Most LOC metrics will award the most to A, then B, then C, when the reverse ordering is what we actually want.
5. Person A spends time carefully reviewing others' code. This helps others write more maintainable code now and in the future and catches potential problems before code has even been written. Most code metrics award zero points to this effort, incentivizing cursory code reviews.

## Proposals for consideration:

1. Perhaps we should direct measures of code health at projects and code bases, not at individuals or teams? The product is the code, not the team, so the target should be code-base metrics, not aggregate team measures. Individuals and teams should be working together to achieve optimal code-base metrics.
2. Maybe we shouldn't call these KPIs, since they aren't really key indicators and shouldn't be used that way. Call developer metrics "vital signs" and code-base measures "code health indicators".

## Appendix

### Indicators of code health, *all of which* can be managed with CI/CD thresholds:

(A minimal sketch of enforcing these thresholds in CI appears at the end of this appendix.)

* Code coverage (%).
* All tests passing.
* Maintainability:
  * Consistency (flake8 + pylint)
  * Small methods/functions (i.e., the distribution of cyclomatic complexity across functions)
  * The degree of repetition in code (can be measured with pylint)
  * Total lines (or characters) of code that must be maintained: more LOC is **worse**.

### Developer "vital signs"

(A sketch for tallying these from git history also appears at the end of this appendix.)

* Commits per day
* Work days with 1 or more commits

### Impact points?

If we really want to try to measure developer productivity, it might be with "impact points": take a JIRA task and estimate its *Product Impact* + *Code Impact* + *Performance Impact*. Developers complete tasks and are awarded the points. The points a person earns should then be roughly proportional to the impact they are making. Will this improve productivity, though?
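A minimal sketch of how such impact points might be tallied. The three impact categories come from the proposal above, but the task data, point values, developer names, and the `impact_points` helper are invented for illustration.

```python
# Hypothetical sketch: summing per-task impact estimates by developer.
# The categories mirror the proposal above; the data are placeholders.
from collections import defaultdict

# Each completed task carries an estimate for each impact category.
completed_tasks = [
    {"assignee": "alice", "product_impact": 5, "code_impact": 2, "performance_impact": 0},
    {"assignee": "bob",   "product_impact": 1, "code_impact": 1, "performance_impact": 8},
    {"assignee": "alice", "product_impact": 0, "code_impact": 3, "performance_impact": 1},
]

def impact_points(task: dict) -> int:
    """Product Impact + Code Impact + Performance Impact, per the proposal."""
    return task["product_impact"] + task["code_impact"] + task["performance_impact"]

totals: dict[str, int] = defaultdict(int)
for task in completed_tasks:
    totals[task["assignee"]] += impact_points(task)

for developer, points in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{developer}: {points} impact points")
```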
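As a sketch of the "managed with CI/CD thresholds" idea from the code-health list: a script along these lines could run as a CI step and fail the build when an indicator slips. The package name `mypackage` and the threshold values are placeholders; `--cov-fail-under` (pytest-cov), `--max-complexity` (flake8), and `--fail-under` (pylint) are existing options of those tools, though exact behavior varies by version.

```python
# Hypothetical CI "code health" gate: run each check and fail the build if
# any threshold is missed. Package name and thresholds are placeholders.
import subprocess
import sys

CHECKS = [
    # Coverage threshold + all tests passing (pytest-cov's --cov-fail-under).
    ["pytest", "--cov=mypackage", "--cov-fail-under=80"],
    # Style consistency plus a cap on cyclomatic complexity per function.
    ["flake8", "--max-complexity=10", "mypackage"],
    # Overall lint score, which includes pylint's duplicate-code (repetition) checker.
    ["pylint", "--fail-under=9.0", "mypackage"],
]

failed = False
for cmd in CHECKS:
    print("running:", " ".join(cmd))
    if subprocess.run(cmd).returncode != 0:
        failed = True

sys.exit(1 if failed else 0)
```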
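And a sketch of tallying the developer "vital signs" from git history. It assumes the script runs inside a git repository and uses standard `git log` options; the author email is a placeholder.

```python
# Hypothetical "vital signs" tally from git history: total commits, number of
# days with at least one commit, and average commits per active day.
import subprocess
from collections import Counter

def vital_signs(author: str) -> None:
    # One short date (YYYY-MM-DD) per commit by this author.
    out = subprocess.run(
        ["git", "log", f"--author={author}", "--pretty=format:%ad", "--date=short"],
        capture_output=True, text=True, check=True,
    ).stdout
    dates = [line for line in out.splitlines() if line]
    per_day = Counter(dates)
    days_with_commits = len(per_day)
    total_commits = len(dates)
    avg = total_commits / days_with_commits if days_with_commits else 0.0
    print(f"{author}: {total_commits} commits over {days_with_commits} active days "
          f"(avg {avg:.1f} commits per active day)")

if __name__ == "__main__":
    vital_signs("alice@example.com")  # placeholder author
```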