Adopting GenAI in day-to-day tasks at Mobile Engineering
Niki Belokopytov · April 8, 2024 · 7 min read
“You are just a machine. An imitation of life. Can a robot write a symphony? Can a robot turn a… canvas into a beautiful masterpiece?
“Can you?”
- I, Robot (2004)
(For a higher-level description of AutoScout24’s approach to using GenAI, see GenAI at AutoScout24.)
We will all remember 2023 as the year the long-awaited AI breakthrough happened. Instead of conquering the world with an army of killer robots, AI became ubiquitous in our day-to-day tasks, establishing itself as the next step in productivity (if not creativity).
Generating Code
We tried GenAI for the first time back in March 2023. The first use case was generating unit tests for a freshly refactored part of a Swift project. The results were promising: the coverage was sufficient and the tests appeared to exercise the functionality correctly. There were challenges, though, and the generated tests did not work as delivered. Our engineers needed to review them case by case, fix a number of imports, and provide correct mocks. In another case, the generated unit tests worked out of the box. Overall, it was a good experience and something that we’ll be using more.
Volodymyr Grytsenko, iOS Engineer: “A recent example involved iOS widgets. I wanted to test all possible layout combinations for the title, description, button, and number of results. ChatGPT helped me save a ton of time by providing all possible combinations.”
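Enumerating layout combinations like this is a Cartesian product, so the generated list can be cross-checked mechanically. Below is a minimal sketch in Python; the option names and sample values (`title`, `description`, `show_button`, `result_count`) are assumptions made for illustration and are not taken from the actual widget.

```python
from itertools import product

# Hypothetical widget options: the real widget's fields and values are assumptions.
OPTIONS = {
    "title": [None, "Short title", "A very long title that may wrap or truncate"],
    "description": [None, "Description"],
    "show_button": [False, True],
    "result_count": [0, 1, 99],
}


def layout_combinations(options):
    """Return one dict per combination of option values (Cartesian product)."""
    keys = list(options)
    value_lists = [options[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


combos = layout_combinations(OPTIONS)
print(len(combos))  # 3 * 2 * 2 * 3 = 36
```

Each resulting dict could then drive one preview or snapshot test case, which makes it easy to confirm that no combination was skipped.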
Helpfulness rating: 7/10
Parsing Complex Content: Logs, Classes, Functions, Unclear Idiomatic Constructions
Whether we needed to describe what a legacy implementation does or to summarize a complex log file, GenAI was able to lend a hand in seconds. In our experience, its accuracy was not significantly worse than what we would expect from a manual evaluation.
Yahia Allam, Android Engineer: “I used Bing Chat to extract info from a log text. Gave it a log with execution times of multiple functions and asked it for the average time of each function, it was ~90% accurate because it skipped some values”
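Because the model skipped some values, a small deterministic script is a useful cross-check for this kind of task. The sketch below assumes a hypothetical log format of `<timestamp> <function> took <N>ms`; the real log layout is not given in this post, so the pattern and the sample lines are illustrative only.

```python
import re
from collections import defaultdict

# Assumed line format (hypothetical): "12:00:01 loadListings took 120ms"
LINE_RE = re.compile(r"(?P<name>\w+) took (?P<ms>\d+(?:\.\d+)?)\s*ms")


def average_times(log_text):
    """Return the mean execution time in milliseconds for each function name."""
    durations = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:  # lines that do not fit the assumed format are ignored
            durations[match.group("name")].append(float(match.group("ms")))
    return {name: sum(values) / len(values) for name, values in durations.items()}


SAMPLE_LOG = """\
12:00:01 loadListings took 120ms
12:00:02 renderWidget took 30ms
12:00:03 loadListings took 80ms
12:00:04 renderWidget took 50ms
"""

averages = average_times(SAMPLE_LOG)
print(averages)  # {'loadListings': 100.0, 'renderWidget': 40.0}
```

Comparing the script's numbers with the chat answer shows quickly whether any entries were dropped.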
People less experienced with ObjC code or with RxJava/Kotlin were also able to use GenAI as a pairing partner that explained what was going on in a particularly convoluted piece of code.
Helpfulness rating: 9/10
Autocomplete for Code
Copilot and GenAI have been very helpful for us when generating enum and dictionary entries based on specific patterns. Here is what our engineers have to say:
Alex Codreanu, Android Engineer: “Copilot is great for repetitive tasks, i.e. generating an enum with some default values, especially when the value in the enum is not present on a keyboard: THIRD(position = “³”), and it could go on until the end of time, probably”
Mehmet Emin Deniz, Engineering Lead: “When populating some data - let’s say I have “key1: val1” and I need this in a repeating pattern. I usually use GenAI to get “key2: val2”, “key3: val3”…”
Volodymyr Grytsenko, iOS Engineer: “ChatGPT is now a valuable colleague for drafting SwiftUI views. Since the approach is code declarative, it’s basically a language that ChatGPT can understand and follow.”
When the author of this article had to do some emergency hands-on work, Copilot made his life easier with its code suggestions, including naming and parameter lists for methods. In some cases, the bodies of boilerplate methods were also auto-generated and required only minor adjustments to work as expected. For someone who is a little rusty around the IDE, it was unexpected but valuable help.
Helpfulness rating: 8/10
Refactoring Suggestions
Beyond purely academic questions such as “How can we make this code run faster or consume less memory?”, GenAI has been helpful with its more tactical refactoring suggestions.
Alex Codreanu, Android Engineer: “Copilot is also nice for refactoring, has some ideas on how I want code to look. I found it really useful when writing methods with very clear namings.”
Volodymyr Grytsenko, iOS Engineer: “Copilot Chat recently helped me draft the switch from PromiseKit (a third-party library) to async functions (built into Swift itself).”
Helpfulness rating: 7/10
Autocomplete for ReadMe Files
We were adding a ReadMe file to a newly written module in our Android project. Quite unexpectedly, Copilot was accurate in describing the purpose and the implementation details of certain methods. Unfortunately, when we wanted to add URLs pointing to more information, it suggested non-existent destinations and hallucinated the contents of these URLs. More complex explanations also needed to be written manually. On the plus side, it had no problems working with Markdown formatting.
Helpfulness rating: 6/10
Debugging
Sending a stack trace to ChatGPT and asking it what seems to be going sideways is going to be the new Stack Overflow, mark our words. We had a lot of success in finding issues both in manually written code and in auto-generated classes (for example, those generated by Dagger). Not every issue was identified correctly right away, but some of the trickier ones were cracked after we provided additional context on the implementation.
Helpfulness rating: 8/10
Solving Odd Tasks - SQL, Regex, Excel, Translations, Pictures
At some point, every engineer encounters a problem they are not familiar with, on a timeline too tight to find an expert or to get upskilled on the matter. In these cases, we’ve been using both ChatGPT and Bing Chat to generate code, text, and pictures for us based on a description of the problem. More often than not, the code worked without any additional magic.
Mehmet Emin Deniz, Engineering Lead: “I can generate translations easily. Let’s say I have “hello_string_key:‘Hello!’”, I can quickly generate the same key in 18 different languages easily. Of course, I ask for confirmation from the actual person, but it’s still useful.”
Volodymyr Grytsenko, iOS Engineer: “I can easily transform a JSON from a Swagger file into any language I need. For me, it converts to a Swift Codable model. With a few extra prompts, I can decide if fields are optional and create custom decoders if necessary.”
Vadim, Android Platform Lead: “When I tried to create a new constant for minimum phone number length it suggested the value immediately - 5. I had another one in my mind - 6, but I believe we can trust this result as it is based on knowledge of a large number of developers. I also generated this image for my article with Bing chat. It has some artifacts”
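The JSON-to-model conversion described in the quotes above targets Swift’s Codable. As a rough analogue, here is a minimal Python sketch of the same idea: a typed model with one optional field and a small custom decoding step. The `Listing` model, its fields, and the sample payload are hypothetical and do not come from any real Swagger file.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    """Hypothetical model: the field names are assumptions for illustration."""
    id: int
    title: str
    price: Optional[float] = None  # optional: may be missing or null in the JSON

    @classmethod
    def from_dict(cls, data: dict) -> "Listing":
        # Custom decoding: coerce types and tolerate a missing or null price.
        raw_price = data.get("price")
        return cls(
            id=int(data["id"]),
            title=str(data["title"]),
            price=float(raw_price) if raw_price is not None else None,
        )


payload = '{"id": 42, "title": "Compact car", "price": null}'
listing = Listing.from_dict(json.loads(payload))
print(listing)
```

Deciding which fields are optional is the part that still needs a human: the model can only guess it from the sample, while the API contract is what actually defines it.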
Two of the author’s personal high points were flattening a data structure in BigQuery to write a nice report, and using crosstab formulas in Excel. Both worked out within minutes, quicker than grabbing a coffee.
Helpfulness rating: 10/10
Achilles Heel of GenAI — GenericAI
“What has the world come to when you can’t even trust a program?”
- Matrix Resurrections (2021)
Looking at the cases we mentioned, we’ve seen a trend: GenAI did better when it had more detailed input and a more specific question. We’ve also noticed a sharp rise in its effectiveness when the engineer was facing a task outside their own expertise.
Every now and then, after hours of being really helpful, GenAI would generate something that would have required the same amount of time to rework as to do from scratch. While annoying, it was still serviceable in the vast majority of the cases it was applied to.
What was not serviceable, though, was that GenAI’s output was always missing the fine detail: facts, numbers, the author’s personality, and all the other qualities that signal care and professionalism. We did not find GenAI helpful when generating promotion pitches, writing status updates to stakeholders, or making internal announcements, because all of them read as generic, manufactured, and redundant.
When queried with additional parameters for a better fit to a task, GenAI would start to imagine or misinterpret context, embellishing some elements and obscuring others. Of course, GenAI’s output in these cases was better than nothing, but when we modelled a high-risk, high-impact decision, the content provided by GenAI was detrimental to the quality of that decision.
Judging by how the core technology, Large Language Models (LLMs), works, it seems natural that the lower the variability of potential responses to a question, the more accurate the response will be. This side of GenAI will become a major, if not revolutionary, multiplier in productivity for the professionals who use it. We’ll find ourselves more capable, more enabled than ever before, at the expense of being a part of an even bigger pool of competition. Now everyone who knows the word “JavaScript” can produce JavaScript code. Now everyone who knows the word “COBOL” can produce COBOL code.
GenAI was not used when writing this blog post.