Building AI Assistants for your Business: Models

What is a model exactly? It’s the starting point for developing your AI assistant. Performance varies, and so does cost and latency. Most companies start with OpenAI’s Chatgpt model for language, but there are many more choices. Let’s dive into this topic and give you a good understanding of how they work, and what trade-offs affect the performance of your assistant.

Types of Models

Large Language Models – Also called Foundational models, receive an input and generate a response. These generative models can perform many tasks including

Completion – Finish the sentence
Summarization – Simplify a large string of text
Translation – Translate from English to French
Extraction – Find the offensive words, or parse PII

Image Models – Take a text input and generate an image.

Audio Models

These can be a fun, yet dangerous technology. It’s novel to simulate a voice of a famous person, but has obvious pitfalls. Consider using generative audio if you’d like the voice of your company president, or spokesperson simulated for flexibility when creating a voice assistant, or voice overs for video for example.

Text (or Image) to Video

Many companies like Colossyan and HourOne are working hard on this. It’s not too much of a stretch to imagine a person, or small group could use a combination of language models, and AI video generation to produce a high quality movie for a fraction of the budget. Whose the next Steven Spielberg working from their bedroom to create a major hit with a $10,000 budget. Case in point, The Marvels movie cost $274 million to produce.

Multi-Modal Models

Next generation models from OpenAI and Google’s Gemini allow users to interact in any ‘modality’. Recent versions of ChatGPT for instance would be in a certain mode to generate an image, or text-to-speech, or language. Now, users can interact with a single model that handles all of the above.