Getting Started with LlamaSharp

If you want to experiment with open-source large language models (LLMs) privately, or you want to build your own custom solution in .NET, one option is to use LlamaSharp. Read on to learn how to set up a Visual Studio 2022 project.

Create a new console application in Visual Studio 2022 and select .NET 8 as the target framework. Using either the Edit Project File option or the Manage NuGet Packages for Solution menu option, add the LLamaSharp NuGet package. If you don’t have a GPU, or your GPU isn’t sufficiently powerful, you will also need to add the LLamaSharp.Backend.Cpu NuGet package.
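If you prefer the command line, the same setup can be sketched with the dotnet CLI (project name is just an example; NuGet resolves the latest package versions):

```shell
# Create a new .NET 8 console project
dotnet new console -n LocalLLM -f net8.0
cd LocalLLM

# Add LLamaSharp, plus the CPU backend if you won't be using a GPU
dotnet add package LLamaSharp
dotnet add package LLamaSharp.Backend.Cpu
```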

If you have a GPU and you would like to run inference on it, follow these instructions, since the NuGet packages for GPU backends appear to be broken.

If you have an NVIDIA GPU, download the latest CUDA Toolkit 12.3. Here is the link.

Download and install the following files:

If you haven’t already downloaded a model, go to Hugging Face and download a supported model. I suggest one of the quantized versions of Llama-2-7b-chat. As you can see below, I’m using the Q5_K_M quantization.
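One way to fetch just that file is with the huggingface-cli tool (a sketch; the repository and file names below match TheBloke's GGUF build used in the code later in this article):

```shell
# Install the Hugging Face CLI, then download only the Q5_K_M file
pip install -U "huggingface_hub[cli]"
huggingface-cli download TheBloke/Llama-2-7B-Chat-GGUF \
    llama-2-7b-chat.Q5_K_M.gguf --local-dir ./models
```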

Replace the contents of Program.cs with the following:

using LLama.Common;
using LLama;
using LLama.Native;

namespace LocalLLM;

internal class Program
{
    static async Task Main(string[] args)
    {
        string modelPath = @"D:\LM Studio\TheBloke\Llama-2-7B-Chat-GGUF\llama-2-7b-chat.Q5_K_M.gguf"; // change it to your own model path
        var prompt = "Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.\r\n\r\nUser: Hello, Bob.\r\nBob: Hello. How may I help you today?\r\nUser: Please tell me the largest city in Europe.\r\nBob: Sure. The largest city in Europe is Moscow, the capital of Russia.\r\nUser:"; // use the "chat-with-bob" prompt here.
        NativeLibraryConfig.Instance.WithLogs(); // log which native backend library gets loaded

        // Load a model
        
        var parameters = new ModelParams(modelPath)
        {
            ContextSize = 4096,   // Llama-2 models were trained with a 4096-token context
            GpuLayerCount = 35,   // layers to offload to the GPU; set to 0 for CPU-only inference
            UseMemoryLock = true, // lock the model in RAM
            UseMemorymap = true   // memory-map the model file
        };
        using var model = LLamaWeights.LoadFromFile(parameters);

        // Initialize a chat session
        using var context = model.CreateContext(parameters);
        var ex = new InteractiveExecutor(context);
        ChatSession session = new ChatSession(ex);

        // show the prompt
        Console.WriteLine();
        Console.Write(prompt);

        // run the inference in a loop to chat with LLM
        while (prompt != "stop")
        {
            await foreach (var text in session.ChatAsync(
                new ChatHistory.Message(AuthorRole.User, prompt),
                new InferenceParams { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } }))
            {
                Console.Write(text);
            }
            prompt = Console.ReadLine() ?? "";
        }

        // save the session
        session.SaveSession("SavedSessionPath");
    }
}
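After updating modelPath to point at your downloaded .gguf file, the app can be built and run from the project folder:

```shell
dotnet run
```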

Have fun playing with your own private AI!
