Building a ChatGPT clone in minutes with Semantic Kernel and Ollama
Large language models (LLMs) and the tools built on them, such as ChatGPT, Claude, and Copilot, have taken the world by storm. Everyone is talking about them, technical professionals and non-technical people alike, and there is an enormous amount of interesting stuff you can do with them.
But did you know that, if you happen to be a .NET developer, there is an easy way of integrating any of these models into your application, regardless of what kind of application you are building and where you intend to run it? You can build any kind of AI functionality into your application, as long as the LLM supports it. You can use advanced features such as retrieval-augmented generation (RAG) and the Model Context Protocol (MCP). You can build chatbots, fully autonomous AI agents, you name it.
So, how do you do it, you may ask? The answer is Semantic Kernel. It is a collection of .NET libraries that provide a simple interface for connecting LLMs with your own code.
But it gets even better than this. Not all LLMs cost extortionate amounts of money to use. Some of them are completely free and open source with no strings attached. Others, like Meta’s Llama, are free under certain conditions. These are the models you can host yourself without relying on someone else’s server. And yes, Semantic Kernel supports them all!
Today, I will show you how easy it is to use Semantic Kernel to build a fully functional intelligent chatbot that is as good as an early version of ChatGPT. You’ll be surprised by how little code it takes!
Hosting an LLM anywhere
So, how exactly can you host an LLM locally (and for free)? Well, you can do so by using Ollama.
If you haven’t heard of it before, Ollama is a lightweight, developer-friendly platform for running large language models (LLMs) locally or in controlled environments, without relying on cloud-hosted AI APIs. It provides a streamlined way to download, manage, and execute models like LLaMA (made by Meta), DeepSeek, Mistral, and Gemma directly on your machine, with simple commands and minimal setup.
One of Ollama’s key benefits is local execution, which gives you full control over your data and model behavior. Since the models run on your own hardware, you avoid sending sensitive information to external servers, which can be critical for privacy, security, and regulatory compliance. Running locally also eliminates recurring API costs and reduces latency, since inference happens right on your system instead of making network calls to a remote service.
Another advantage is portability and flexibility. Ollama works on macOS, Linux, and Windows (via WSL), and it can run a variety of open-source models, letting you choose the best fit for your project rather than being locked into a single vendor’s offering. Moreover, it can run inside a Docker container, which makes it deployable virtually anywhere.
And the good news is, Semantic Kernel can be fully integrated with Ollama. Let’s see how.
Launching the Ollama container
To launch Ollama locally (or on absolutely any other machine), you will need Docker. If you aren’t familiar with what it is, you can find more info here.
Assuming you have Docker set up, this is the command to execute to get the Ollama container running:
docker run -d -p 11434:11434 --name ollama ollama/ollama
If you are new to Docker commands, the -p 11434:11434 option maps a port inside the container to a port on your host machine. The port numbers can be anything, but 11434 is the port Ollama listens on by default. If we keep this default mapping, the container will be reachable at the following address on your local machine:
http://localhost:11434
If you want your LLMs to use the GPU instead of the CPU, which is generally faster, you can add the --gpus=all option (this requires GPU support to be configured for Docker, for example via the NVIDIA Container Toolkit), so your command will look like this:
docker run -d --gpus=all -p 11434:11434 --name ollama ollama/ollama
The -d option, which stands for “detached”, means the container runs in the background.
Next, you will need to get inside the Ollama container. This is the command that will get you there:
docker exec -it ollama /bin/bash
Now, all we need to do is pull the actual models we want to use. This is achieved by executing ollama pull followed by the model name. The full list of models you can use with Ollama can be found here.
You may now ask: how do you choose a model? Well, there are some tradeoffs to consider. If you want a capable model that supports advanced functionality, like RAG and MCP, then Llama4 would be a good option, which you can pull by executing the following command:
ollama pull llama4
However, there’s an important caveat. The more advanced the model is, the bigger it is. Even the light version of it, known as Scout, will occupy 67 GB. The full version, known as Maverick, is 245 GB!
You probably don’t want to give up that much space on your development machine. The good news is that you don’t have to. If you just want something that is good enough to build a chatbot with and you don’t care about the additional capabilities, you can use phi3, which only occupies 3 GB and can be pulled with the following command:
ollama pull phi3
You also don’t have to stick to a single model. You can have as many models as you want, and you can switch between them in Semantic Kernel.
Now that our container is up and running, it’s time to build our Semantic Kernel app in .NET.
Installing the required dependencies
In this exercise, we will create a simple .NET console app. Semantic Kernel comes with several NuGet packages, each designed to work with a specific LLM technology. Since we are working with Ollama, the package we need to install is as follows:
Microsoft.SemanticKernel.Connectors.Ollama
The important point to note is that, at the time of writing, this package is only available as a pre-release. Therefore, if you are in Visual Studio or Rider, ensure that you check the option to include pre-releases while looking for this NuGet package. If you prefer to use the CLI, you can install it by executing the following command in your project folder:
dotnet add package Microsoft.SemanticKernel.Connectors.Ollama --prerelease
That’s it. You are now set to start building your chatbot.
Building the chatbot
Our app will be fairly simple, so we will not need any files other than the default Program.cs. We will start by removing all existing content from this file and replacing it with the following namespace references:
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
These namespaces are needed for the core chat completion functionality.
Next, we add the following code, which will build an Ollama kernel:
#pragma warning disable SKEXP0070
IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddOllamaChatCompletion(
    modelId: "phi3",
    endpoint: new Uri("http://localhost:11434")
);
Kernel kernel = kernelBuilder.Build();
Please note the #pragma warning disable directive. Since this is a pre-release package version and some features aren’t finalized, the directive is needed so that the experimental-API warnings don’t prevent your application from compiling.
The other important points to note are the modelId and endpoint parameters of the AddOllamaChatCompletion() method. modelId is the identifier of the model we want to use, such as llama4 or phi3. The endpoint parameter is the URI of the system that hosts the model, which, in this case, is a Docker container accessible via port 11434.
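Earlier I mentioned that you don’t have to stick to a single model. As a rough sketch of what registering two models side by side could look like, here is one possible variant of the builder code above. It uses the same using directives as before and assumes that AddOllamaChatCompletion() accepts a serviceId parameter for keyed registrations (worth double-checking against the overloads in your version of the pre-release connector), and that you have pulled both phi3 and llama4:
#pragma warning disable SKEXP0070
IKernelBuilder multiModelBuilder = Kernel.CreateBuilder();
// Register two models against the same Ollama endpoint, each under its own service key.
multiModelBuilder.AddOllamaChatCompletion(
    modelId: "phi3",
    endpoint: new Uri("http://localhost:11434"),
    serviceId: "phi3");
multiModelBuilder.AddOllamaChatCompletion(
    modelId: "llama4",
    endpoint: new Uri("http://localhost:11434"),
    serviceId: "llama4");
Kernel multiModelKernel = multiModelBuilder.Build();
// Resolve whichever registered chat completion service you need at runtime by its key.
var phi3Chat = multiModelKernel.GetRequiredService<IChatCompletionService>("phi3");
var llama4Chat = multiModelKernel.GetRequiredService<IChatCompletionService>("llama4");
For the rest of this article, though, a single phi3 registration is all we need.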
The kernel is the core object of the Semantic Kernel system. This is where all types of AI interactions arise. Since we are building a chatbot, the type of interaction we need is called chat completion. We will use the following code to instantiate a chat completion service from the kernel, instantiate chat history, and output the console prompt to instruct the user on what to do next:
var chatCompletionService = kernel
    .GetRequiredService<IChatCompletionService>();

ChatHistory history = [];

Console.WriteLine(
    "Please enter a chat message or type exit to leave the application");
So, as the prompt suggests, anything the user types will be treated as a chatbot prompt, except the word exit, which will stop and close the application.
To enable this, we will create the following loop that reads user messages from the console and outputs the model’s replies to them:
bool exit = false;

while (!exit)
{
    string userMessage = Console.ReadLine()!;

    if (userMessage.ToLowerInvariant() == "exit")
    {
        exit = true;
        break;
    }

    history.AddUserMessage(userMessage);

    var response = chatCompletionService
        .GetStreamingChatMessageContentsAsync(
            chatHistory: history,
            kernel: kernel
        );

    // Collect the streamed chunks so the full reply can be stored in the history.
    string assistantReply = string.Empty;

    await foreach (var chunk in response)
    {
        Console.Write(chunk);
        assistantReply += chunk.Content;
    }

    // Keep the model's reply in the history so it retains the conversation context.
    history.AddAssistantMessage(assistantReply);

    Console.WriteLine();
    Console.WriteLine();
}
Let’s break down what’s going on here:
- We check if the message entered by the user is exit and, if so, the program terminates.
- If the message is anything else, it is added to the chat history object.
- The history is passed into the GetStreamingChatMessageContentsAsync() method of the chat completion service, which returns a stream.
- We iterate through all chunks from the stream and output them one by one, while also collecting them into the full reply.
- The full reply is then added back to the history as an assistant message, so the model keeps the context of its own answers on the next turn.
So, why are we using a stream? Well, the main reasons are the typing-style output animation and better perceived responsiveness. It’s similar to how ChatGPT and Copilot present their outputs: you don’t have to wait for the entire reply to appear all at once; you see it being written out piece by piece as it’s generated.
Of course, if you work with an agentic chat completion service that runs somewhere in the back end and doesn’t involve any direct human interaction, streaming output is less important. For that purpose, we can use the GetChatMessageContentAsync() method instead of GetStreamingChatMessageContentsAsync() to return the entire output in one go as soon as it’s ready.
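As a minimal sketch, the body of our loop could then look roughly like this, using the same history, chatCompletionService, and kernel objects as before:
history.AddUserMessage(userMessage);

// Wait for the complete reply and print it in one go, with no streaming.
ChatMessageContent reply = await chatCompletionService.GetChatMessageContentAsync(
    chatHistory: history,
    kernel: kernel);

Console.WriteLine(reply.Content);

// As before, keep the reply in the history so the conversation context is preserved.
history.AddAssistantMessage(reply.Content ?? string.Empty);
For an interactive console chatbot like ours, though, the streaming version gives a much nicer experience.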
Wrapping up
That’s it. We have completed a fully functioning chatbot by using Semantic Kernel. That wasn’t hard, was it?
Next time, we will do something way more exciting with it. We will integrate it with .NET Aspire, and we will build a fully functioning ChatGPT clone that can run in the browser. So, watch this space!
In the meantime, if you want to know more about me and what I do, you can have a look at my website. If you need hands-on technical help, expert advice, or training, you can book a brief call with me.
Until next time!