UI is the limiting factor for AI
Why the chat box is constraining the capabilities of AI and how new UI elements can unleash novel use cases
Recently, I got to spend a day in Madrid with my seven-year-old son. As I normally do, I asked a few friends what we should do, started making a list, and saved their suggestions to a map.
But then I turned to ChatGPT to help us plan the day. I shared my son’s age and interests, a couple of specific places we wanted to visit, some constraints, and asked it to create an itinerary. Spoiler: It failed. But not because of the content. It was the UI that was the problem.
Human vs LLM expertise
When evaluating any new technology or tool, I find it useful to compare the experience to what it would be like to do this same task with a human.
So as I used ChatGPT to plan my day, I mentally compared it to what I would have expected from an “expert” (in this case, someone who knew what to do in Madrid with kids).
I would have expected a human “expert” to ask questions, offer some ideas, listen to my feedback, and help me craft a great plan for our day. They may have offered things like:
“Chocolatería San Ginés is touristy, but the churros are always a hit with kids.”
“This area of El Retiro Park has an amazing playground for older kids.”
“The zoo is far, but if he really loves animals, it is worth spending a couple of hours there. It would probably be worth just taking a taxi there and back.”
In this case, my expectations of ChatGPT were even higher. Not only did I expect it to be an expert on kids and Madrid, I also expected it to act like a thoughtful, skilled travel agent. I also knew it had the whole Internet at its disposal, so I assumed it would know things like locations, opening hours, and admission prices.
Where things fell short
ChatGPT did offer some good ideas, and the recommendations were all reasonable. The problem was the UI: it didn’t support a conversation that built on itself, and it didn’t provide any reasonable way to co-create an itinerary. Two key UI changes could have turned this nonfunctional experience into an extraordinary one:
1. Provide a better interface for progressive dialog
A chat interface seems like a natural fit for dialog. But conversations generally aren’t linear. In my case, the dialog wasn’t a simple Q&A; it was an ongoing conversation that required many back-and-forths.
Because the flow was usually my input > long response > my input > long response, it was hard to reference previous content. I had to scroll a lot to get back to things and couldn’t pin or pull out any items.
It was also hard to move between the macro view (the plan for the full day) and specific details (like transportation options for getting to the zoo). I wanted to drill down into specific questions, but then be able to resurface to the main flow of the conversation.
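To make this concrete, here is a minimal sketch of what a non-linear conversation model could look like: a main flow, drill-down side threads, and pinnable messages. It is purely illustrative; all the type names and fields are assumptions, not a description of how any existing chat product is built.

```typescript
// Hypothetical model of a non-linear conversation:
// a main thread, drill-down side threads, and pinnable messages.

interface Message {
  id: string;
  role: "user" | "assistant";
  text: string;
  pinned: boolean;            // surfaced in a persistent "pinned" panel, no scrolling needed
}

interface Thread {
  id: string;
  topic: string;              // e.g. "getting to the zoo"
  parentMessageId?: string;   // where this drill-down branched off the main flow
  messages: Message[];
}

interface Conversation {
  mainThread: Thread;
  sideThreads: Thread[];      // drill-downs that can be collapsed back into the main flow
}

// Pin a message so it can be retrieved without scrolling back through the history.
function pinMessage(conversation: Conversation, messageId: string): void {
  for (const thread of [conversation.mainThread, ...conversation.sideThreads]) {
    const msg = thread.messages.find((m) => m.id === messageId);
    if (msg) msg.pinned = true;
  }
}

// Open a drill-down thread anchored to a specific message in the main flow.
function drillDown(conversation: Conversation, parentMessageId: string, topic: string): Thread {
  const thread: Thread = {
    id: `thread-${conversation.sideThreads.length + 1}`,
    topic,
    parentMessageId,
    messages: [],
  };
  conversation.sideThreads.push(thread);
  return thread;
}
```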
2. Provide a better interface for co-creating
Many of my points from #1 would be moot if there were a better interface for co-creating. Effectively, I was trying to co-create a stable document inside of chat.
I expected ChatGPT to keep track of the items we were considering as we honed the itinerary. But each time I provided feedback, ChatGPT’s responses drifted (some places dropped off while new ones appeared).
A chat UI doesn’t naturally lend itself well to the creation of an artifact. Chat is a supporting feature — just a means to an end.
Fundamentally, there should have been a prominent, persistent view of the artifact that we were co-creating. The artifact should have been front and center. The conversation’s purpose was to support the creation of the itinerary, so chat should have been a secondary element.
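As a rough sketch of the alternative, imagine the itinerary living as persistent state that each chat turn edits, rather than a plan that gets regenerated (and drifts) with every response. The shapes below are hypothetical, not a description of how ChatGPT actually works.

```typescript
// Hypothetical sketch: the itinerary is persistent state; each chat turn
// emits small edit operations against it instead of re-generating the plan.

interface ItineraryItem {
  id: string;
  name: string;          // e.g. "El Retiro Park playground"
  startTime?: string;    // e.g. "10:30"
  notes?: string;
}

type ItineraryEdit =
  | { kind: "add"; item: ItineraryItem }
  | { kind: "remove"; itemId: string }
  | { kind: "update"; itemId: string; changes: Partial<ItineraryItem> };

// Apply only the edits a turn proposes; items the turn didn't touch never drift.
function applyEdits(itinerary: ItineraryItem[], edits: ItineraryEdit[]): ItineraryItem[] {
  let next = [...itinerary];
  for (const edit of edits) {
    switch (edit.kind) {
      case "add":
        next.push(edit.item);
        break;
      case "remove":
        next = next.filter((item) => item.id !== edit.itemId);
        break;
      case "update":
        next = next.map((item) =>
          item.id === edit.itemId ? { ...item, ...edit.changes } : item
        );
        break;
    }
  }
  return next;
}
```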
What would make a better UI?
These changes would have made this experience dramatically better:
Show the artifact as the primary element and chat as a supporting, secondary element. Many different UIs could support this, such as a central document with side comments or overlays.
Provide an affordance to drag elements of our conversation into the document. For example, if the LLM suggests a place like La Campana, let a person pull this “object” into the core document. It would also be great to be able to pin or otherwise mark an item for easy retrieval (see the sketch after this list).
Enable multiple views of the content. In this case, I’d love to see the itinerary in a list view and on a map. Other cases might be better supported by different views. (By the way, I still haven’t seen any UIs that do both of these well simultaneously — if you know of any, please share!)
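A hypothetical data model tying these three ideas together might look like the sketch below: suggestions from chat become structured objects that can be pulled into the itinerary, and the same underlying items can be rendered as a list or as map markers. Names and fields are illustrative assumptions, not any product’s actual API.

```typescript
// Hypothetical sketch: chat suggestions are structured "objects" that can be
// dragged into the itinerary and viewed as a list or as map markers.

interface PlaceSuggestion {
  id: string;
  name: string;                 // e.g. "La Campana"
  lat?: number;
  lng?: number;
  sourceMessageId: string;      // which chat message suggested it
}

interface Itinerary {
  items: PlaceSuggestion[];
}

// "Drag" a suggestion out of the conversation into the core document.
function addToItinerary(itinerary: Itinerary, suggestion: PlaceSuggestion): void {
  if (!itinerary.items.some((item) => item.id === suggestion.id)) {
    itinerary.items.push(suggestion);
  }
}

// Two views over the same artifact: an ordered list and map markers.
function asListView(itinerary: Itinerary): string[] {
  return itinerary.items.map((item, i) => `${i + 1}. ${item.name}`);
}

function asMapMarkers(itinerary: Itinerary): { name: string; lat: number; lng: number }[] {
  return itinerary.items
    .filter((item) => item.lat !== undefined && item.lng !== undefined)
    .map((item) => ({ name: item.name, lat: item.lat!, lng: item.lng! }));
}
```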
What UIs exist today?
ChatGPT’s Canvas feature, Cove.ai, and NotebookLM have all started to evolve beyond a chat-first UI. When I came across ChatGPT’s Canvas feature, I was hopeful it would work for this use case. But I can’t quite figure out the relationship between chat and a “canvas.” A canvas seems like a one-off document I can spawn from chat, but can’t easily continue to iterate on with chat.
On the other hand, I was blown away by how well Cove.ai solved this use case. In fact, it solved it so well, I considered re-writing this post entirely! Cove.ai starts with chat as the primary input mechanism. But then, like a human, it asks me for more context:
What time do you want to start your day?
How do you want to get around?
Do you have any budget for activities?
At the same time, it suggests an initial itinerary and recommends activities. I can use chat to modify the itinerary, or I can manipulate it directly. Cove.ai doesn’t treat my task as linear; it gives me multiple UI affordances to manipulate and organize the content. The experience feels much more natural, intuitive, and human.
Cove.ai’s UI is a great leap forward — simply from expanding the UI elements and thinking of chat as a tool, rather than as the interface.
We all have the opportunity to think beyond the chat box and consider people, their use cases, and what interfaces can best support them. As we introduce new UI primitives and components, we will spawn new use cases and ignite a flywheel that generates new UIs and new possibilities.