How can we build impactful mass-market products with technology that makes mistakes? And how do we account for, and solve, those mistakes in product development?
Written by
Thijs Verreck
Over the weekend, I was trying to come up with a new training plan for my marathon later this year. After fiddling around in Notion for a bit and looking at my Strava, I decided to ask ChatGPT. Even though the result looked right at first glance, the plan was clearly wrong.
I think it's fair to say this is an example of a ‘bad’ way to use a large language model (LLM). Rule 1: LLMs are not databases. They don’t look up precise factual answers; they generate responses based on probabilities, not certainties. The answer might be right, but today’s LLMs can't guarantee it.
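To make "probabilities, not certainties" a bit more concrete, here is a minimal toy sketch in Python. It assumes a made-up next-token distribution; the numbers and the names `NEXT_TOKEN_PROBS`, `sample_next_token`, and `most_likely_token` are hypothetical illustrations, not taken from any real model or library. It shows that sampling the same "prompt" can return different answers, and that even the single most likely answer may carry well under 50% probability.

```python
import random

# Hypothetical next-token distribution for a prompt like
# "My long run this week should be ... km". The numbers are invented;
# a real LLM computes probabilities over a vocabulary of many thousands of tokens.
NEXT_TOKEN_PROBS = {"18": 0.40, "21": 0.30, "25": 0.20, "32": 0.10}


def sample_next_token(probs: dict[str, float], rng: random.Random) -> str:
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


def most_likely_token(probs: dict[str, float]) -> str:
    """Greedy choice: always return the highest-probability token."""
    return max(probs, key=probs.get)


if __name__ == "__main__":
    rng = random.Random(0)
    # Asking the same "question" five times can give different answers.
    samples = [sample_next_token(NEXT_TOKEN_PROBS, rng) for _ in range(5)]
    print("Sampled answers:", samples)

    # Even the single most likely answer is only 40% likely here.
    best = most_likely_token(NEXT_TOKEN_PROBS)
    print(f"Most likely: {best} (p = {NEXT_TOKEN_PROBS[best]:.0%})")
```

Real chat products add more on top of this (settings such as temperature, for example), but the core point is the same: the output is a draw from a distribution, not a lookup in a table of facts.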
Quite a few people I speak with dismiss LLMs as useless, drawing parallels to the skepticism around crypto and NFTs. I think this is a misunderstanding. LLMs are extremely good at generating what a good answer might look like. There are use cases where ‘looks like a good answer’ is exactly what you want, and others where ‘roughly right’ is ‘precisely wrong’.
For instance, even though my training plan was completely wrong, the initial framework was there. It generated all the right building blocks; it just couldn't put them together correctly. After working on it for 15 minutes with my notes and Strava, I now have a passable plan.
Similarly, when I'm working on Prototyper, an answer that looks just right can be exactly what's needed. It gives you the skeleton of your next feature or app, leaving the finer details to you, the professional.
As such, I see two ways to approach this issue. One is to treat it as a science problem: we know that models will improve over time.
The other, and my preferred method, is to treat it as a product problem. The question then becomes: how do we build useful products around models that we know will make mistakes?
AI experts often respond to situations like this by saying “you’re holding it wrong”: I asked the wrong kind of question in the wrong way, and I should have done more prompt engineering. However, I don't believe new technology gets adopted by forcing users to learn complex commands. Products get adopted because they make cutting-edge technology user-friendly.
There are two main product design problems here. First, the product communicates certainty when the model itself is uncertain. Google provides ten blue links, which suggests “it’s probably one of these.” In contrast, LLMs give one ‘right’ answer, which can be misleading.
Second, the product doesn’t guide users on what kinds of questions it can answer well. If the product tries to answer anything, it’s harder for the model to be accurate and harder for the interface to communicate what a good question looks like. This is something I'm still working on for Prototyper, and it's hard to get right.
One approach is the completely general-purpose chatbot-as-product, which has its challenges; I experienced them first-hand while building Frodo (https://frodo.getaprototype.com). Another approach is to narrow the domain of the product, creating a custom UI that communicates what the model can and cannot do. This is why coding assistants and ‘copilot’ tools have been successful recently: they show up when the context and setting are right.
A third approach is abstracting the AI so the user doesn’t even know it’s there. The model powers some capability, making it faster and easier to build that capability without the user knowing it’s AI. This is how most machine learning has been integrated into software: as new features or better, faster capabilities that aren’t labeled as AI. Apple is one company that is very bullish on this approach. Many of its product features, such as crash detection, are based on on-device (edge) ML models, even though Apple rarely brings it up.
New technologies start by solving existing problems. Incumbents integrate these technologies as features. Startups then use the technology to unbundle incumbents and create something truly native to the new technology. Think of Instagram using smartphone cameras and filters, or Snap and TikTok using touch screens, video, and location to create native experiences.
This creates a paradox: a general-purpose technology needs to be deployed as single-purpose tools and experiences. Electric motors are general-purpose, but you buy a car, a washing machine, and a blender—not a box of motors and batteries. Similarly, computers and smartphones replaced many single-purpose tools, but each function is achieved through specialized software.
Quite a few people in the tech industry believe LLMs might eventually bypass this pattern and become all-encompassing AGIs. Whilst I agree that’s an exciting possibility, it’s not where we are today. So I think we need to start focusing on what we can build today to change the world, just in case LLMs never reach that level of sophistication.