Somewhere, right now, a technology executive tells their directors: “we
need a way to measure the productivity of our engineering teams.” A working
group assembles to explore potential solutions, and weeks later, proposes
implementing the metrics: lead time, deployment frequency, and number of
pull requests created per engineer.
Soon after, senior engineering leaders meet to review their newly created
dashboards. Immediately, questions and doubts are raised. One leader says:
“Our lead time is two days, which is ‘low performing’ according to those
benchmarks – but is there actually a problem?” Another leader says: “It’s
unsurprising to see that some of our teams are deploying less often than
others. But I’m not sure if this spells an opportunity for improvement.”
If this story arc is familiar to you, don’t worry – it’s familiar to
most, including some of the biggest tech companies in the world. It is not uncommon
for measurement programs to fall short when metrics like DORA fail to provide
the insights leaders had hoped for.
There is, however, a better approach: one that focuses on
capturing insights from developers themselves, rather than solely relying on
basic measures of speed and output. We’ve helped many organizations make the
leap to this human-centered approach. And we’ve seen firsthand the
dramatically improved understanding of developer productivity that it
provides.
What we are referring to here is qualitative measurement. In this
article, we provide a primer on this approach derived from our experience
helping many organizations on this journey. We begin with a definition of
qualitative metrics and how to advocate for them. We follow with practical
guidance on how to capture, track, and utilize this data.
Today, developer productivity is a critical concern for businesses amid
the backdrop of fiscal tightening and transformational technologies such as
AI. In addition, developer experience and platform engineering are garnering
increased attention as enterprises look beyond Agile and DevOps
transformation. What all these concerns share is a reliance on measurement
to help guide decisions and track progress. And for this, qualitative
measurement is key.
Note: when we say “developer productivity”, we mean the degree to which
developers can do their work in a frictionless manner – not the individual
performance of developers. Some organizations find “developer productivity”
to be a problematic term because of the way it can be misinterpreted by
developers. We recommend that organizations use the term “developer
experience,” which has more positive connotations for developers.
What is a qualitative metric?
We define a qualitative metric as a measurement comprised of data
provided by humans. This is a practical definition – we haven’t found a
singular definition within the social sciences, and the alternative
definitions we’ve seen have flaws that we discuss later in this
section.
Figure 1: Qualitative metrics are measurements derived from humans
The definition of the word “metric” is unambiguous. The term
“qualitative,” however, has no authoritative definition, as noted in the
2019 journal paper “What is Qualitative in Qualitative Research”:
There are many definitions of qualitative research, but if we look for
a definition that addresses its distinctive feature of being
“qualitative,” the literature across the broad field of social science is
meager. The main reason behind this article lies in the paradox, which, to
put it bluntly, is that researchers act as if they know what it is, but
they cannot formulate a coherent definition.
An alternate definition we’ve heard is that qualitative metrics measure
quality, while quantitative metrics measure quantity. We’ve found this
definition problematic for two reasons. First, the term “qualitative
metric” includes the term metric, which implies that the output is a
quantity (i.e., a measurement). Second, quality is typically measured
through ordinal scales that are translated into numerical values and
scores – which again, contradicts the definition.
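As a rough illustration of how an ordinal scale becomes a number, the sketch below maps answer labels to values and summarizes them. The five-point scale, its labels, and the "favorable" threshold of 4 are assumptions made for this example, not something the article prescribes.

```python
from statistics import mean

# Hypothetical five-point satisfaction scale; labels and values are assumptions.
SCALE = {
    "Very dissatisfied": 1,
    "Dissatisfied": 2,
    "Neutral": 3,
    "Satisfied": 4,
    "Very satisfied": 5,
}


def score_responses(responses):
    """Return (mean score, share of favorable answers) for a list of labels."""
    values = [SCALE[label] for label in responses]
    # "Favorable" here means Satisfied or Very satisfied (an assumed cutoff).
    favorable = sum(1 for v in values if v >= 4) / len(values)
    return mean(values), favorable


responses = ["Satisfied", "Neutral", "Very satisfied", "Dissatisfied"]
avg, favorable = score_responses(responses)
print(avg, favorable)
```

Either summary is a quantity, even though the underlying question asks about quality. Averaging ordinal values is a common simplification; reporting the share of favorable answers avoids treating the gaps between labels as equal.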
Another argument we have heard is that the output of sentiment analysis
is quantitative because the analysis results in numbers. While we agree
that the data resulting from sentiment analysis is quantitative, based on
our original definition this is still a qualitative metric (i.e., a quantity
produced qualitatively) unless one were to take the position that
“qualitative metric” is altogether an oxymoron.
Aside from the problem of defining what a qualitative metric is, we’ve
also encountered problematic colloquialisms. One example is the term “soft
metric”. We caution against this phrase because it harmfully and
incorrectly implies that data collected from humans is weaker than “hard
metrics” collected from systems. We also discourage the term “subjective
metrics” because it misconstrues the fact that data collected from humans
can be either objective or subjective – as we discuss in the next
section.
Type | Definition | Example |
---|---|---|
Attitudinal metrics | Subjective feelings, opinions, or attitudes toward a specific subject. | How satisfied are you with your IDE, on a scale of 1–10? |
Behavioral metrics | Objective facts or events pertaining to an individual’s work experience. | How long does it take for you to deploy a change to production? |
Later in this article we provide guidance on how to collect and use
these measurements, but first we’ll provide a real-world example of this
approach put into practice.
Peloton is an American technology company
whose developer productivity measurement strategy centers around
qualitative metrics. To collect qualitative metrics, their organization
runs a semi-annual developer experience survey led by their Tech
Enablement & Developer Experience team, which is part of their Product
Operations organization.
Thansha Sadacharam, head of tech learning and insights, explains: “I
very strongly believe, and I think a lot of our engineers also really
appreciate this, that engineers aren’t robots, they’re humans. And just
looking at basic numbers doesn’t drive the whole story. So for us, having
a really comprehensive survey that helped us understand that entire
developer experience was really important.”
Each survey is sent to
a random sample of roughly half of their developers. With this approach,
individual developers only need to participate in one survey per year,
minimizing the overall time spent on filling out surveys while still
providing a statistically representative set of results.
The Tech Enablement & Developer Experience team is also responsible for
analyzing and sharing the findings from their surveys with leaders across
the organization.
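One way such a sampling scheme could be set up is sketched below. It assumes the two semi-annual surveys go to complementary random halves of the developer list, which is our reading of the description rather than a confirmed detail of Peloton's process. The function name and the `dev-N` identifiers are hypothetical.

```python
import random


def split_into_survey_cohorts(developers, seed=None):
    """Randomly split developers into two halves, one per semi-annual survey."""
    rng = random.Random(seed)   # a fixed seed makes the split reproducible
    shuffled = list(developers)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


developers = [f"dev-{i}" for i in range(1, 11)]
first_half, second_half = split_into_survey_cohorts(developers, seed=42)
print(len(first_half), len(second_half))
```

Because the halves do not overlap, each developer receives exactly one survey per year, while each survey still draws on a random sample of the whole population.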
For more on Peloton’s developer experience survey, listen to this
interview with Thansha Sadacharam.
Advocating for qualitative metrics
Executives are often skeptical about the reliability or usefulness of
qualitative metrics. Even highly scientific organizations like Google have
had to overcome these biases. Engineering leaders are inclined toward
system metrics since they are accustomed to working with telemetry data
for inspecting systems. However, we cannot rely on this same approach for
measuring people.
Avoid pitting qualitative and quantitative metrics against each other.
We’ve seen some organizations get into an internal “battle of the
metrics,” which is not a good use of time or energy. Our advice for
champions is to present the two as complementary tools rather than as an
either/or choice – as we cover at the end of this article.
We’ve found that the underlying causes of opposition to qualitative data
are misconceptions, which we address below. Later in this article, we
outline the distinct benefits of self-reported data such as its ability to
measure intangibles and surface critical context.
Misconception: Qualitative data is only subjective
Traditional workplace surveys typically focus on the subjective
opinions and feelings of their employees. Thus many engineering leaders
intuitively believe that surveys can only collect subjective data from
developers.
As we describe in the following section, surveys can also capture
objective information about facts or events. Google’s DevOps Research and
Assessment (DORA) program is an excellent concrete
example.
Some examples of objective survey questions:
- How long does it take to go from code committed to code successfully running in production?
- How often does your organization deploy code to production or release it to end users?
Misconception: Qualitative data is unreliable
One challenge of surveys is that people with all manner of backgrounds
write survey questions with no special training. As a result, many
workplace surveys do not meet the minimum standards needed to produce
reliable or valid measures. Well designed surveys, however, produce
accurate and reliable data (we provide guidance on how to do this later in
the article).
Some organizations have concerns that people may lie in surveys, which
can happen in situations where there is fear around how the data will be
used. In our experience, when surveys are deployed as a tool to help
understand and improve bottlenecks affecting developers, there is no
incentive for respondents to lie or game the system.
While it’s true that survey data isn’t always 100% accurate, we
remind leaders that system metrics are often imperfect too. For example,
many organizations attempt to measure CI build times using data aggregated
from their pipelines, only to find that it requires significant effort to
clean the data (e.g. excluding background jobs, accounting for parallel
jobs) to produce an accurate result.
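To make the cleaning effort concrete, here is a minimal sketch of the two adjustments mentioned: dropping background jobs and measuring elapsed wall-clock time instead of summing parallel jobs. The `Job` record, its fields, and the sample data are hypothetical. Real pipeline exports vary by CI system and usually need more handling than this.

```python
from dataclasses import dataclass


@dataclass
class Job:
    pipeline_id: str
    name: str
    start: float  # seconds since the pipeline was triggered
    end: float
    background: bool = False  # assumed flag; real systems may not expose one


def pipeline_wall_clock(jobs):
    """Wall-clock seconds per pipeline, ignoring background jobs."""
    spans = {}
    for job in jobs:
        if job.background:
            continue
        lo, hi = spans.get(job.pipeline_id, (job.start, job.end))
        # Parallel jobs overlap, so track earliest start and latest end.
        spans[job.pipeline_id] = (min(lo, job.start), max(hi, job.end))
    return {pid: hi - lo for pid, (lo, hi) in spans.items()}


jobs = [
    Job("p1", "lint", 0, 60),
    Job("p1", "unit-tests", 0, 300),  # runs in parallel with lint
    Job("p1", "deploy", 300, 420),
    Job("p1", "nightly-cache-warm", 0, 3600, background=True),
]
durations = pipeline_wall_clock(jobs)
naive_total = sum(j.end - j.start for j in jobs)
print(durations, naive_total)
```

In this made-up data the naive sum of job durations is roughly ten times the time a developer actually waits, which is the kind of gap that makes unprocessed system metrics misleading.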