What does the LLM engine (inside the llama) do?
Our engine runs and optimizes your LLM.

It brings to bear several of the latest technologies and research advances, the same kinds that turned GPT-3 into ChatGPT and Codex into GitHub Copilot.

These include, among others, fine-tuning, RLHF, information retrieval specialized for LLMs, and GPU optimization.
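To give a flavor of one of these techniques: "information retrieval specialized for LLMs" generally means selecting the most relevant context from your data and feeding it to the model alongside the question. The toy sketch below uses simple word overlap for scoring; the function names and approach are purely illustrative and are not our engine's actual API.

```python
import re

# Toy sketch of LLM-oriented retrieval: pick the document that best
# matches a query, then build a prompt with it as context.
# Names here are illustrative, not the engine's real implementation.

def tokens(text: str) -> set:
    """Lowercase a string and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query: str, doc: str) -> int:
    """Count how many word tokens the query and document share."""
    return len(tokens(query) & tokens(doc))

def retrieve(query: str, docs: list) -> str:
    """Return the document with the highest word overlap with the query."""
    return max(docs, key=lambda d: score(query, d))

def build_prompt(query: str, docs: list) -> str:
    """Prepend the best-matching document as context for the LLM."""
    context = retrieve(query, docs)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Our refund policy allows returns within 30 days.",
    "Shipping takes 5-7 business days within the US.",
]
print(build_prompt("What is the refund policy?", docs))
```

Production retrieval systems typically use learned embeddings rather than word overlap, but the shape is the same: find relevant data, put it in the prompt.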
How is this different from just using a single provider’s (e.g. OpenAI’s) APIs off the shelf?
Three major reasons we hear from our customers:

1. Data: Use all of your data, rather than what fits into a prompt.
2. Ownership: Own the generative AI you build, rather than give your usage and development data to an external party.
3. Control (cost, latency, throughput): With ownership, you also have more control over the model’s cost and latency, both of which can be much lower. We expose these controls in an easy interface that your engineering team can customize.
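For illustration only, here is the kind of knobs such a deployment interface might expose. Every field name below is hypothetical, not our actual API:

```python
# Hypothetical example of cost/latency controls a deployment interface
# might expose; all field names are illustrative, not a real API.
deployment_config = {
    "max_latency_ms": 200,            # upper bound on per-request latency
    "max_cost_per_1k_tokens": 0.002,  # budget ceiling per 1k tokens
    "throughput_rps": 50,             # target requests per second
    "model_size": "small",            # smaller models: cheaper and faster
}

def validate(config: dict) -> bool:
    """Sanity-check that the knobs are positive and the size is known."""
    return (
        config["max_latency_ms"] > 0
        and config["max_cost_per_1k_tokens"] > 0
        and config["throughput_rps"] > 0
        and config["model_size"] in {"small", "medium", "large"}
    )

print(validate(deployment_config))
```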
Do you build me my own large model?
Yes, the resulting model is very large!

However, what’s exciting is that it builds on the great work before it, such as GPT-3 or ChatGPT. These general purpose models know English and can answer in the general vicinity of your tasks.

We take it to the next level, using your data to teach it your company’s language and specific use cases.

This means it will do better than a general purpose model on tasks that matter to you, but it won’t be as good as a general purpose model on generic tasks without any data, e.g. putting together a grocery list (unless that’s what your company does).
What LLMs does the LLM engine use under the hood?
We build on the latest generation of models to make your LLM perform best. We work with customers to determine which models are appropriate, depending on your specific use cases and data constraints.

For example, OpenAI’s well-known GPT-3 and ChatGPT models are a great starting point for some customers, but not others. We believe in following your needs and using the best models for you.
If I want to export the model and run it myself, can I do that?
Yes, we can deploy to any cloud service. This is on the roadmap with our early customers!

This includes setup for running our LLM engine in your own environment, e.g. on your AWS/Azure/GCP instances. If you want, you can export the weights from our engine and host the LLM yourself.
How expensive is using your engine to build and use my model?
In the range of $5,000 per use case. This depends heavily on your usage: how often you call the model, the complexity of the task, and the amount of data. The costs come from compute on the GPU clusters we use to train the model during development.

There’s also an ongoing cost to run the model after it’s been trained. That depends on how much the model is used, e.g. by your users, as well as the cost and latency targets your use case requires.

We are currently 50% of the price of using OpenAI for your own internal development.

Have other questions?

Contact Us