The building blocks that OpenAI, Google, AWS, and Microsoft now offer for voice-enabled experiences have drastically reduced the complexity of building voice assistant apps with conversational capabilities.
Background
Suppose a customer had come to us five years ago asking for a mobile app where users could ask questions by voice and get natural answers, in a friendly voice, drawn from a knowledge base specific to the customer's domain. For example, a supermarket chain whose customers could ask the app, 'In which store can I find organic eggs?' I would have quoted a huge project, reached out to many of our partners for assistance, and marked the project as highly risky.
That changed once OpenAI made its models available to third parties through APIs, which let any business integrate generative AI. Other major players in the market later released their own conversational models and added voice support, for example through Google's Vertex AI and Amazon Lex.
Use Case
We recently had the opportunity to build on top of OpenAI's Realtime API to deliver an app for a customer whose knowledge base was difficult for many people to understand, search, and learn from.
The goal: create a mobile app to help users navigate the knowledge base and allow them to use their voice to query it.
The budget was limited, so we could allocate only a small team of two people—a backend developer and a mobile developer—for a few weeks. We chose React Native for building the app, as we needed to go to market quickly, and this customer did not require a native-grade experience. Additionally, we used a UI Kit for UX/UI development, WebRTC for voice transport, and a Django backend to connect the app to the knowledge base.
Out-of-the-box Generative AI
Before choosing an AI vendor, we compared features against cost, and OpenAI's Realtime API was the winner (roughly $0.40 per five minutes of audio with GPT-4o mini).
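To put that rate in perspective, here is a rough back-of-the-envelope estimate in Python. It assumes cost scales linearly with audio duration at the figure quoted above; actual billing is usage-based and will vary with the conversation, so treat the function name and the numbers as illustrative only.

```python
# Approximate rate quoted in this post: about $0.40 per five minutes of audio.
COST_PER_FIVE_MINUTES_USD = 0.40


def estimate_audio_cost(minutes: float) -> float:
    """Return a rough cost estimate in USD, assuming linear scaling."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return round(minutes / 5 * COST_PER_FIVE_MINUTES_USD, 2)


if __name__ == "__main__":
    # One hour of conversation at the quoted rate.
    print(estimate_audio_cost(60))
```

At this rate, an hour of voice conversation comes to a little under five dollars, which helped keep a small-budget project viable.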
We leveraged their agent features to create an agent guiding users through conversations until they reached the desired content.
The heavy lifting of the project included designing the app’s UX, building the API layer to access the knowledge base, and prompt engineering to create the agent, which leveraged the 'functions' feature to pull data from external sources.
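As an illustration of the 'functions' feature, the sketch below shows what a function definition for a knowledge base search might look like. The function name, its parameters, and the category values are hypothetical, not the ones we shipped, and the event layout should be checked against OpenAI's current Realtime API documentation before use.

```python
import json

# Hypothetical function definition: the name, description, and parameters
# are illustrative assumptions, not the project's actual schema.
SEARCH_TOOL = {
    "type": "function",
    "name": "search_knowledge_base",
    "description": "Search the customer's knowledge base for relevant articles.",
    "parameters": {
        "type": "object",
        "properties": {
            "keywords": {
                "type": "string",
                "description": "Key terms extracted from the user's question.",
            },
            "category": {
                "type": "string",
                "enum": ["billing", "shipping", "returns"],
                "description": "Optional topic area to narrow the search.",
            },
        },
        "required": ["keywords"],
    },
}


def build_session_update(instructions: str) -> str:
    """Serialize a session configuration message that registers the function.

    The message layout here is an assumption based on the Realtime API's
    session update event; verify field names against the official docs.
    """
    event = {
        "type": "session.update",
        "session": {"instructions": instructions, "tools": [SEARCH_TOOL]},
    }
    return json.dumps(event)


if __name__ == "__main__":
    print(build_session_update("Answer only from the knowledge base."))
```

Describing the parameters with a JSON Schema lets the model return structured arguments instead of free text, which is what makes the backend lookup practical.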
When writing the prompt, we had to make sure the agent didn't stray from responses grounded in the knowledge base, while staying flexible enough to let users search with a dozen different question styles.
Connecting the agent to the knowledge base was also a challenge, as we had to make sure the natural language phrases users searched with were converted into structured queries that we could send to a database.
Using OpenAI's building blocks made this project possible, reducing the complexity to "just" building a regular mobile app. After a few weeks and demos to the customer, the app was live.
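A minimal sketch of that conversion step follows. It assumes the agent's function call arrives as a JSON string with hypothetical `keywords` and `category` fields, and it queries a hypothetical `articles` table. It uses Python's built-in `sqlite3` module to stay self-contained; in a Django backend the same idea would normally go through the ORM.

```python
import json
import sqlite3

# Hypothetical whitelist of categories the backend is willing to query.
ALLOWED_CATEGORIES = {"billing", "shipping", "returns"}


def build_query(arguments_json: str) -> tuple[str, list]:
    """Turn the agent's function-call arguments into parameterized SQL."""
    args = json.loads(arguments_json)
    keywords = str(args.get("keywords", "")).strip()
    if not keywords:
        raise ValueError("keywords are required")

    sql = "SELECT title FROM articles WHERE body LIKE ?"
    params = [f"%{keywords}%"]

    category = args.get("category")
    if category is not None:
        # Reject values outside the whitelist instead of trusting model output.
        if category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        sql += " AND category = ?"
        params.append(category)

    sql += " ORDER BY title LIMIT 5"
    return sql, params


def search(conn: sqlite3.Connection, arguments_json: str) -> list:
    """Run the structured query and return the matching article titles."""
    sql, params = build_query(arguments_json)
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (title TEXT, body TEXT, category TEXT)")
    conn.execute(
        "INSERT INTO articles VALUES (?, ?, ?)",
        ("Refund policy", "How to request a refund", "returns"),
    )
    print(search(conn, '{"keywords": "refund", "category": "returns"}'))
```

Passing values as query parameters rather than concatenating them into the SQL string matters here, since the arguments ultimately originate from free-form user speech.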
At Nimble Gravity, we deliver cost-effective solutions to quickly validate products. Reach out if you think we can help you with similar challenges.