In-line assistance – when is it more useful?
The most widely used form of coding assistance in Thoughtworks at the moment is in-line code generation in the IDE, where an IDE extension generates suggestions for the developer as they are typing in the IDE.
The short answer to the question, “Is this useful?” is: “Sometimes it is, sometimes it isn’t.” ¯_(ツ)_/¯ You will find a wide range of developer opinions on the internet, from “this made me so much faster” all the way to “I switched it off, it was useless”. That is because the usefulness of these tools depends on the circumstances. And the judgment of usefulness depends on how high your expectations are.
What do I mean by “useful”?
For the purposes of this memo, I’m defining “useful” as “the generated suggestions are helping me solve problems faster and at comparable quality than without the tool”. That includes not only the writing of the code, but also the review and tweaking of the generated suggestions, and dealing with rework later, should there be quality issues.
Factors that impact usefulness of suggestions
Note: This is mostly based on experiences with GitHub Copilot.
More prevalent tech stacks
Safer waters: The more prevalent the tech stack, the more discussions and code examples will have been part of the training data for the model. This means that the generated suggestions are more likely to be useful for languages like Java or Javascript than for a newer or less discussed language like Lua.
However: My colleague Erik Doernenburg wrote about his experience of “Taking Copilot to difficult terrain” with Rust. His conclusion: “Overall, though, even for a not-so-common programming language like Rust, with a codebase that uses more complicated data structures I found Copilot helpful.”
Simpler and more commonplace problems
Safer waters: This one is a bit hard to define. What does “simpler” mean, what does “commonplace” mean? I’ll use some examples to illustrate.
- Common problems: In a previous memo, I discussed an example of generating a median function. I would consider that a very commonplace problem and therefore a good use case for generation.
- Common solution patterns applied to our context: For example, I have used it successfully to implement problems that needed list processing, like a chain of mapping, grouping, and sorting of lists.
- Boilerplate: Create boilerplate setups like an ExpressJS server, or a React component, or a database connection and query execution.
- Repetitive patterns: It helps speed up typing of things that have very common and repetitive patterns, like creating a new constructor or a data structure, or a repetition of a test setup in a test suite. I traditionally use a lot of copy and paste for these things, and Copilot can speed that up.
When a colleague who had been working with Copilot for over 2 months was pairing with somebody who did not have a license yet, he “found having to write repetitive code by hand excruciating”. This autocomplete-on-steroids effect can be less useful though for developers who are already very good at using IDE features, shortcuts, and things like multiple cursor mode. And beware that when coding assistants reduce the pain of repetitive code, we might be less motivated to refactor.
However: You can use a coding assistant to explore some ideas when you are getting started with more complex problems, even if you discard the suggestion afterwards.
Smaller size of the suggestions
Safer waters: The smaller the generated suggestion, the less review effort is needed, the easier the developer can follow along with what is being suggested.
The larger the suggestion, the more time you will have to spend to understand it, and the more likely it is that you will have to change it to fit your context. Larger snippets also tempt us to go in larger steps, which increases the risk of missing test coverage, or introducing things that are unnecessary.
However: I suspect a lot of interplay of this factor with the others. Small steps particularly help when you already have an idea of how to solve the problem. So when you do not have a plan yet because you are less experienced, or the problem is more complex, then a larger snippet might help you get started with that plan.
More experienced developer(s)
Safer waters: Experience still matters. The more experienced the developer, the more likely they are to be able to judge the quality of the suggestions, and to be able to use them effectively. As GitHub themselves put it: “It’s good at stuff you forgot.” This study even found that “in some cases, tasks took junior developers 7 to 10 percent longer with the tools than without them”.
However: Most of the observations I have collected so far have been made by more experienced developers. So this is one where I am currently least sure about the trade-offs at play. My hypothesis is that the safer the waters are from the other factors mentioned above, the less likely it is that the tools would lead less experienced developers down the wrong path, and the higher the chance that it will give them a leg up. Pair programming and other forms of code review further mitigate the risks.
Higher margin for errors
I already touched on the importance of being able to judge the quality and correctness of suggestions. As has been widely reported, Large Language Models can “hallucinate” information, or in this case, code. When you are working on a problem or a use case that has a higher impact when you get it wrong, you need to be particularly vigilant about reviewing the suggestions. For example, when I was recently working on securing cookies in a web application, Copilot suggested a value for the Content-Security-Policy
HTTP header. As I have low experience in this area, and this was a security related use case, I did not just want to accept Copilot’s suggestions, but went to a trusted online source for research instead.
In conclusion
There are safer waters for coding assistance, but as you can see from this discussion, there are multiple factors at play and interplay that determine the usefulness. Using coding assistance tools effectively is a skill that is not simply learned from a training course or a blog post. It’s important to use them for a period of time, experiment in and outside of the safe waters, and build up a feeling for when this tooling is useful for you, and when to just move on and do it yourself.
Thanks to James Emmott, Joern Dinkla, Marco Pierobon, Paolo Carrasco, Paul Sobocinski and Serj Krasnov for their insights and feedback