Why are we building this?

Hey there - welcome to our first blog post. We’ll post more content here soon. For now, let’s start with a little about us and what we’re building.

Who are we?

We are a small team of AI enthusiasts and engineers who want to make it easy for application developers to build applications powered by generative AI.

Pre-trained foundation models have gotten so good that individuals can combine them with normal software to create cool applications. We think that’s a big deal.

We also think developers who want to create useful applications powered by state-of-the-art generative AI models should be able to do it without specialist knowledge about transformer architectures or GPUs.

The problem: developers need better tools to build AI-powered applications

If you’ve tried building a useful and scalable application with generative AI models, you know that, right now, things don’t just work. There are plenty of obstacles in the way:

Choosing the right AI model to build with is difficult - there are thousands of options with unclear tradeoffs across response quality, cost, latency and scalability. Measuring model performance across multiple notebooks and browser tabs is awkward.
Off-the-shelf AI models underperform or hallucinate on domain-specific tasks - and customizing models is tricky. Data extraction, preparation and vetting are hard and time-consuming. Getting it wrong is expensive.
Running AI models at scale is hard - endless cycles of profiling, confusing CUDA errors, GPU shortages. If you’ve tried, you know there are countless failure modes.

Today, getting around these obstacles is difficult and requires specialist knowledge across data science, machine learning and infrastructure engineering.

Building traditional software also used to be difficult and to require specialist knowledge. In the late 1990s and early 2000s, web development primarily involved writing raw HTML, CSS, and JavaScript code from scratch. On-premises servers had to be purchased, configured and maintained. Authentication systems, user interface components, security features and database interactions all had to be built by hand.

But things changed. We got fully managed clouds, extensible frameworks (like Ruby on Rails and React) and reusable component libraries. We built tools and abstractions that made it much easier to build increasingly sophisticated, dynamic, robust and useful software applications.

Generative AI works a little differently from traditional software, so we need to figure out new tools and abstractions that make it easier to build sophisticated, dynamic, robust and useful applications with it.

A generative AI cloud platform for application developers

Konko AI is a place for application developers to test, customize and run AI models at scale. Konko AI has 3 core modules right now, each designed to solve one of the 3 problems we encountered when building applications powered by large language models ourselves:

Model selection
Model customization
Production infrastructure

#1: Model selection

Within our model selection module you can access and play with state-of-the-art foundation models (e.g., GPT-4-turbo, llama-2-70b, mistral-7b) hand-picked by our team. We like to focus on quality over quantity, so we’ve selected the best models we could find for each category (e.g., chat, code, text-to-SQL, function-calling).

Every model within Konko is supported by our production infrastructure. You can prompt several models simultaneously in our playground and compare the results side-by-side. You can adjust system prompts and model parameters such as temperature to find the optimal configuration for your use case. You can also save your experiments and share the results with your teammates.
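
For readers new to the temperature parameter: in the standard formulation, a model’s output scores (logits) are divided by the temperature before the softmax, so lower values concentrate probability on the most likely tokens and higher values flatten the distribution. The Python sketch below shows only that textbook calculation; the logits are made-up numbers, and it is not a description of how any particular model or the Konko playground samples tokens.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw logits to probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical logits for three candidate next tokens.
    logits = [2.0, 1.0, 0.5]
    for t in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Running it shows the top token’s probability shrinking as temperature rises, which is why low temperatures tend to give more repeatable answers and high temperatures more varied ones.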

#2: Model customization

Our customization module allows you to fine-tune a foundation model on your proprietary data so that it better understands the specific context and language patterns of your task or domain.

You may prompt your fine-tuned model and compare its responses side-by-side against leading foundation models or other variants of your fine-tuned model in our playground.

Any model you create on our platform is yours to own and use as you please.

#3: Production Infrastructure

Our production infrastructure module enables you to run inference quickly and cost-effectively at scale. Unlike hyperscalers such as AWS, our stack is optimized for inference from the ground up and built to run commercial applications. We incorporate the latest techniques, including PagedAttention, CUDA graphs, optimized CUDA kernels and continuous batching. Our infrastructure is built to arbitrage compute across cloud vendors and data centers, which means faster inference at a lower price. You don’t need to know anything about GPUs or infrastructure configuration; we’ve got that part covered.
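
Of the techniques above, continuous batching is the easiest to illustrate in a few lines. With static batching, a batch runs until its longest request finishes, so slots freed by short requests sit idle; with continuous (iteration-level) batching, the scheduler refills free slots after every decoding step. The toy Python simulation below counts decoding steps under both policies. The request lengths and batch size are hypothetical, and it models scheduling only, not the GPU work or the PagedAttention memory management a real serving stack performs.

```python
from collections import deque


def static_batching_steps(lengths: list[int], batch_size: int) -> int:
    """Each batch runs until its longest request finishes before the next batch starts."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i : i + batch_size])
    return steps


def continuous_batching_steps(lengths: list[int], batch_size: int) -> int:
    """Free slots are refilled from the waiting queue after every decoding step."""
    queue = deque(lengths)  # tokens each waiting request needs to generate
    running: list[int] = []  # tokens still remaining for each in-flight request
    steps = 0
    while queue or running:
        # Admit waiting requests into any free slots before the next step.
        while queue and len(running) < batch_size:
            running.append(queue.popleft())
        # One decoding step produces one token for every running request.
        running = [r - 1 for r in running]
        # Finished requests leave the batch immediately, freeing their slot.
        running = [r for r in running if r > 0]
        steps += 1
    return steps


if __name__ == "__main__":
    # Hypothetical output lengths (in tokens) for six queued requests.
    lengths = [100, 10, 10, 10, 90, 10]
    print("static:", static_batching_steps(lengths, batch_size=2))          # 200
    print("continuous:", continuous_batching_steps(lengths, batch_size=2))  # 120
```

Under these assumptions the continuous policy finishes the same work in 120 steps instead of 200, because short requests no longer leave their slots empty while a long request in the same batch is still generating.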

Just tell us your choice of model and your application’s response speed, throughput and concurrency requirements, and we’ll make it happen. When many users are on your app, we automatically scale up to meet demand. During periods of low usage we scale down, so you only pay for what you use. Focus on your application code while we handle the infrastructure, and drastically reduce your time-to-market.
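
The scale-up and scale-down behavior described above can be approximated by a simple target-tracking rule: run just enough replicas to serve the current number of concurrent requests, within a configured floor and ceiling. The Python sketch below illustrates that general idea only; the function name, the per-replica capacity and the replica limits are hypothetical assumptions, not Konko’s actual scaling policy.

```python
import math


def desired_replicas(
    concurrent_requests: int,
    capacity_per_replica: int,
    min_replicas: int = 1,
    max_replicas: int = 20,
) -> int:
    """Return how many model replicas to run for the current load (hypothetical policy)."""
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    # Round up so every in-flight request has a slot.
    needed = math.ceil(concurrent_requests / capacity_per_replica)
    # Clamp to the configured floor and ceiling.
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    # Hypothetical: each replica can serve up to 16 concurrent requests.
    for load in (0, 5, 40, 1000):
        print(load, "->", desired_replicas(load, capacity_per_replica=16))
```

With these assumed numbers, 40 concurrent requests need 3 replicas and a spike to 1,000 is capped at 20. Setting `min_replicas=0` would let the deployment scale to zero when idle, at the cost of a cold start on the next request.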

Looking ahead

By building useful tools, we think we can help developers create increasingly compelling and useful applications powered by generative AI as easily as they build with normal software today.

Lots of people want to build applications powered by generative AI. Soon, generative AI models will be another standard, configurable and re-usable building block in software.

What will you build?

Are you working on an application powered by large language models that you’d like to bring to life? Let us know about it!

You can start testing models on our platform right now.

You can try our web app here. You can also join our Discord server to chat with our team, or send us an email at support@konko.ai. We’d love to hear from you.

The Konko team
