How to Run Your Own Local LLMs: Updated for Early 2025


With Privacy Support

This article has become a yearly favorite, so I plan to add extra value by publishing two editions this year.

All of the tools below are free, many are open-source, and there is a wide range of LLMs, SLMs, and LMMs out there to run with them.

For the uninitiated:

  1. LLMs - Large Language Models that work only with text.

  2. SLMs - Small Language Models that typically have less than 10B parameters.

  3. LMMs - Large Multi-modal Models that work with text, images, audio, and video.

Use perplexity.ai to learn new terms.

I prefer Perplexity over Google for practically everything these days.

I might do a follow-up article about the best models to use with these tools.

Below are 16 tools for running and interacting with your local LLMs.

We will take a look at each of them in turn.

Explore as many of them as you can.

All of them are useful in their own way.

And, finally: Enjoy!


H2O LLM Studio

A data center? Really?

  • Offers a no-code GUI for fine-tuning and deploying LLMs locally.

  • Includes H2OGPT, a user-friendly open-source LLM.

  • Focuses on enterprise-grade features with ease of use.

  • Provides server components for API access and local deployment.

  • Focuses on enterprise use cases, with SLMs, customization, and data privacy as priorities.

  • H2O.ai is a complete toolkit for deploying LLMs.

  • It provides many additional tools - but their discussion would be out of scope for this article.

  • You can find the tool at https://h2o.ai/products/h2o-llm-studio/

LM Studio

Yes - Coffee is The Drink of the Developer!

  • Popular desktop application with a very user-friendly GUI.

  • Easily download models from Hugging Face and chat with them.

  • Operates as a local server in the background for API access.

  • Simple setup and configuration for local LLM experimentation.

  • Many popular tutorials for local LLMs use LM Studio.

  • This tool has an active and vibrant community around it.

  • It also provides compatibility checks and full integration with HuggingFace - see the bonus section.

  • With quantization, even large models can run on modest hardware with as little as 8 GB of VRAM.

  • The link to go to is https://lmstudio.ai/
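To show what the local server looks like in practice, here is a minimal sketch that queries LM Studio's OpenAI-compatible endpoint (default port 1234) using only the Python standard library. The model name `"local-model"` and the sampling settings are assumptions; use whatever model you have loaded in the GUI.

```python
import json
import urllib.request

# LM Studio's local server listens on port 1234 by default and
# exposes an OpenAI-compatible chat completions endpoint.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask(model: str, prompt: str) -> str:
    """Send one prompt to the local server and return the reply text."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        LMSTUDIO_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]

# Example (requires LM Studio running with a model loaded):
# print(ask("local-model", "Explain quantization in one sentence."))
```

Because the API mirrors OpenAI's, any OpenAI client library can also be pointed at this URL.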

Ollama

The return of the Llama! (I hope it maintains its hygiene)

  • Designed for incredibly simple command-line and GUI (evolving) LLM serving.

  • Focuses on ease of use and quick deployment of models.

  • Acts as a local server for API interactions.

  • It runs the LLM completely offline.

  • Growing GUI support to enhance user-friendliness.

  • However, the local server is unauthenticated by default, so exposing it beyond your own machine requires careful setup.

  • Ollama is widely regarded today as the best tool for developers to set up LLMs locally.

  • While it is developer-friendly, even non-developers can use it with existing tutorials.

  • And you can get it from the following website: https://ollama.ai/
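For a taste of the developer experience, here is a minimal sketch against Ollama's native REST API (default port 11434). It assumes you have already pulled a model, e.g. `ollama pull llama3.2`; the model name is just an example.

```python
import json
import urllib.request

# Ollama serves a REST API on port 11434 once `ollama serve` is
# running (the desktop app starts it automatically).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Non-streaming generate request for Ollama's native API."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send one prompt to a running Ollama instance and return the text."""
    body = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama instance with the model pulled):
# print(generate("llama3.2", "Why is the sky blue?"))
```

Setting `"stream": False` returns the full answer in one JSON object; the default streams token-by-token chunks instead.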

GPT4All

The current stereotype is that engineers are men. More women need to enter STEM.

  • Provides a free and open-source desktop application with a user-friendly UI.

  • Offers server components for programmatic API access as well.

  • Runs powerful LLMs on consumer-grade hardware locally.

  • Easy to download and interact with models through the GUI.

  • This tool is friendly for non-developers.

  • This is a powerful tool with thousands of models available.

  • It provides a complete all-in-one setup for running LLMs locally.

  • And you can find this tool at https://gpt4all.io/
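Beyond the GUI, GPT4All also ships an official Python binding. The sketch below assumes `pip install gpt4all`; the model filename is one example from the GPT4All catalog, and the import is done lazily so the module loads even without the package installed.

```python
# Example GGUF model name from the GPT4All catalog (an assumption --
# pick any model the application offers).
DEFAULT_MODEL = "orca-mini-3b-gguf2-q4_0.gguf"

def chat_once(prompt: str, model_file: str = DEFAULT_MODEL) -> str:
    """Run one prompt through a local GPT4All model.

    The gpt4all package (pip install gpt4all) is imported lazily so
    this module still loads when the binding is not installed.
    """
    from gpt4all import GPT4All
    model = GPT4All(model_file)  # downloads the model on first use
    with model.chat_session():
        return model.generate(prompt, max_tokens=200)

# Example (downloads the model on first run):
# print(chat_once("Summarize the benefits of local LLMs."))
```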

LocalAI

Yes, LLMs can be a second brain, but this is going too far!

  • Self-hosted, community-driven alternative to OpenAI, using Docker.

  • Offers an OpenAI-compatible API for easy integration.

  • While backend-focused, community GUIs enhance user interaction.

  • This is a very versatile server supporting a wide range of models and hardware.

  • Since it ships as a Docker image, it can run on any platform or device that supports Docker.

  • This is a powerful tool with several community GUIs available.

  • There are detailed instructions on the website.

  • This tool requires a little technical expertise.

  • You can find this tool at https://localai.io/
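Because LocalAI speaks the OpenAI API, a running instance can be inspected with a few lines of standard-library Python. The Docker invocation in the comment and the default port 8080 are assumptions to verify against the site's current instructions.

```python
import json
import urllib.request

# LocalAI exposes an OpenAI-compatible API, by default on port 8080
# when started through Docker, e.g.:
#   docker run -p 8080:8080 localai/localai
# (check the website for current image tags and model setup).
BASE_URL = "http://localhost:8080/v1"

def extract_model_ids(listing: dict) -> list:
    """Pure helper: pull model ids out of an OpenAI-style /models reply."""
    return [m["id"] for m in listing.get("data", [])]

def list_models(base_url: str = BASE_URL) -> list:
    """Ask a running LocalAI server which models it currently serves."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return extract_model_ids(json.loads(resp.read()))

# Example (requires a running LocalAI container):
# print(list_models())
```

The same `/v1/chat/completions` calls shown for other OpenAI-compatible servers work here unchanged.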

Jan

Jan.ai looks prettier than I imagined!

  • Cross-platform AI client application with an intuitive GUI.

  • This tool has gained immense popularity of late.

  • Operates as a local server with API access for developers.

  • It markets itself as the future of personal AI computing.

  • User-friendly interface for interacting with various AI models.

  • Clean and easy-to-navigate application for local AI tasks.

  • You can access this tool at https://jan.ai/

text-generation-webui (oobabooga)

Well - it got the letters right!

  • Feature-rich web UI for local LLM interaction.

  • Offers a comprehensive API in addition to the web interface.

  • Supports various backends and extensive customization options.

  • Popular for its balance of features and user-friendliness.

  • This is a powerful tool and forks of this tool are very popular.

  • It can run several LLM models locally and is user friendly.

  • You can find the tool at https://github.com/oobabooga/text-generation-webui

PrivateGPT

VR/AR glasses for development? This is a new one on me!

  • Privacy-focused tool for querying local documents using LLMs.

  • Often includes a user-friendly web UI for document interaction.

  • Processes data locally without sending it to the cloud.

  • Ideal for secure and private document analysis.

  • This tool lives up to its name - your data never leaves your machine.

  • Document analysis with LLMs can now be performed locally.

  • This tool is available at https://github.com/imartinez/privateGPT

vLLM

The more displays, the better, taken to the ultimate level!

  • High-throughput inference server designed for performance.

  • Deployable locally on a single machine for development or local use.

  • Focuses on speed and efficiency in serving LLMs.

  • Suitable for applications requiring fast local inference.

  • vLLM was a breakthrough for LLM performance with the innovation of PagedAttention.

  • Be warned: this particular tool requires technical expertise.

  • You can find this tool at https://vllm.ai/
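vLLM's offline Python API makes batch inference very direct. This is a minimal sketch assuming `pip install vllm` and a CUDA-capable GPU; the model id is only an example, and the import is done lazily because the library is heavy.

```python
def batch_generate(prompts: list,
                   model: str = "Qwen/Qwen2.5-0.5B-Instruct") -> list:
    """Offline batch inference with vLLM's Python API.

    vLLM (pip install vllm) needs a CUDA-capable GPU, so it is
    imported lazily; the model id above is just an example.
    """
    from vllm import LLM, SamplingParams  # heavy import, done lazily
    llm = LLM(model=model)
    params = SamplingParams(temperature=0.8, max_tokens=128)
    outputs = llm.generate(prompts, params)
    # Each RequestOutput holds one or more completions; take the first.
    return [o.outputs[0].text for o in outputs]

# Example (requires a GPU and downloads the model on first run):
# print(batch_generate(["What is PagedAttention?"]))
```

PagedAttention is what lets vLLM batch many such prompts efficiently: KV-cache memory is managed in pages rather than contiguous blocks.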

MLC LLM

I hope those are coolers and not magnetic tapes!

  • Server for serving optimized, compiled LLMs for specific hardware.

  • This tool can run LLMs everywhere - CPUs, tablets, and even mobile phones.

  • Provides example web UI demos for interacting with compiled models.

  • Focuses on efficient execution through model compilation.

  • Offers performance benefits through hardware-specific optimizations.

  • This is a promising trend for the future and fully deserves its place on this list.

  • It has universal deployment with native code, and is a highly promising project.

  • You can find this tool at https://mlc.ai/mlc-llm/

llama.cpp

No llamas again, but at least it got the noise-cancelling headphones right!

  • This original project by Georgi Gerganov was the innovation that led to an explosion of local LLMs.

  • It is a highly optimized C++ server for efficient Llama model inference.

  • Widely used as a backend for other tools and standalone servers.

  • Offers robust server API and excellent performance.

  • A foundational tool for local LLM serving.

  • It changed LLM serving forever.

  • The remarkable repository is at https://github.com/ggerganov/llama.cpp
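llama.cpp also ships a small HTTP server with a native `/completion` endpoint, which the sketch below queries with the standard library. The binary name and flags in the comment have changed across versions (it was once `./server`), so treat them as assumptions and check the repository.

```python
import json
import urllib.request

# After building llama.cpp, start its built-in server with something
# like:
#   ./llama-server -m ./models/model.gguf --port 8080
# (binary and flag names vary across versions -- check the repo).
SERVER_URL = "http://localhost:8080/completion"

def build_request(prompt: str, n_predict: int = 128) -> dict:
    """Request body for llama.cpp's native /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict}

def complete(prompt: str) -> str:
    """Send a prompt to a running llama-server and return the text."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

# Example (requires a running llama-server):
# print(complete("The capital of France is"))
```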

ExLlamaV2

Does it get more local than this?

  • Server optimized for extremely fast inference of quantized models.

  • Can be deployed as a dedicated server for high-speed applications.

  • Known for its speed when running quantized LLMs.

  • Ideal for resource-constrained environments needing fast inference.

  • This is ideal for edge computing and mobile computing.

  • Embedded systems are also a good use case.

  • You can find this tool at https://github.com/turboderp/exllamav2

llamafile

This looks like a spaceship!

  • Packages everything to run and serve an LLM into a single executable.

  • Extremely simple deployment for local server scenarios.

  • This is a highly innovative tool which can be used in multiple environments.

  • Self-contained server for easy distribution and execution.

  • User-friendly for quick setup and running of LLMs.

  • And you can get this tool from https://github.com/Mozilla-Ocho/llamafile

WebLLM

Does not look like a web, but I'll take it!

  • Enables running large language models directly in the web browser using WebGPU.

  • This is a real innovation: models run entirely in the browser, with no native installation or dedicated GPU setup required.

  • Focuses on client-side, in-browser inference for privacy and offline capabilities.

  • Provides JavaScript APIs to load and execute LLMs within web applications.

  • Demonstrates impressive performance for running models directly in the browser environment.

  • However, you will need to download large models initially.

  • You can get this tool at https://webllm.mlc.ai/

Hugging Face Transformers

Does not look like a transformer, but I'll take it!

  • Core Python library for building and using NLP models, including LLMs.

  • While not a server itself, it's the foundation for creating custom LLM servers.

  • Extremely flexible and widely used by developers in the LLM space.

  • Enables building highly tailored local LLM serving solutions.

  • This library, together with the Hugging Face Hub, has democratized AI development for millions.

  • It is the backbone behind nearly every tool listed here.

  • To not include it would make this list incomplete.

  • And the Hugging Face Hub hosts nearly 1.5M models, downloadable for free.

  • The website is https://huggingface.co/docs/transformers/index
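A few lines of the library's `pipeline` API are enough to generate text locally. This sketch imports lazily because `transformers` and `torch` are large downloads; `gpt2` is used only because it is tiny, so swap in any model id from the Hub.

```python
def generate_text(prompt: str, model_id: str = "gpt2") -> str:
    """Minimal local text generation with Hugging Face Transformers.

    transformers (pip install transformers torch) is imported lazily;
    gpt2 is chosen only for its small size -- any Hub model id works.
    """
    from transformers import pipeline
    generator = pipeline("text-generation", model=model_id)
    result = generator(prompt, max_new_tokens=40, num_return_sequences=1)
    # The pipeline returns a list of dicts with a "generated_text" key.
    return result[0]["generated_text"]

# Example (downloads the model on first run):
# print(generate_text("Local LLMs are useful because"))
```

The same three-line pattern scales from GPT-2 up to the largest open models your hardware can hold.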

Hugging Face Spaces

It looks like an app market - but where are the people?

  • A platform for discovering and exploring AI apps, including LLM demos.

  • There are more than 400,000 apps on Hugging Face Spaces today.

  • Spaces showcase user interfaces for interacting with models.

  • Provides a marketplace to find inspiration and examples of LLM applications.

  • Can be used to test and prototype UI ideas for local LLM projects.

  • Offers monetization options so developers can earn from their AI work.

  • All of this allows entry to the AI market with minimal technical knowledge, democratizing AI.

  • You can access the app market at the following link: https://huggingface.co/spaces

There is no sector that LLMs will not disrupt.

With the correct guidance, Generative AI will reshape the world as we know it.

Often, you may find yourself in a situation where you do not want your data to leave your system.

This is especially true for enterprises and governments.

At such times, these tools will be invaluable.

Cutting-edge research is another area where you do not want your data to leave your organization.

You can deploy the tool of your choice to a centralized server hosted by the company.

The server must be air-gapped from the public internet and run one of the tools on this list.

I sincerely hope you change the world with your Generative AI research.

All the best for your journey!

The future of AI is world changing and magical - although this picture overdid it!

References

  1. Unite:

    https://www.unite.ai/best-llm-tools-to-run-models-locally/

    Unite.AI article reviewing top tools for running LLMs locally, updated for 2025.

  2. DataCamp:

    https://www.datacamp.com/tutorial/run-llms-locally-tutorial

    DataCamp tutorial on methods and tools for running LLMs locally, with practical guidance.

  3. Getstream:

    https://getstream.io/blog/best-local-llm-tools

    GetStream blog listing the best tools for local LLM execution, with detailed insights.

  4. H2O LLM Studio:

    https://h2o.ai/products/h2o-llm-studio/

    Official product page for H2O LLM Studio, a no-code GUI for LLM fine-tuning and deployment.

    https://github.com/h2oai/h2ogpt

    GitHub repository for H2OGPT, H2O.ai's open-source large language model.

  5. LM Studio:

    https://lmstudio.ai/

    Official website for LM Studio, a user-friendly desktop application for running LLMs locally.

  6. Ollama:

    https://ollama.ai/

    Official website for Ollama, designed for simple command-line and GUI-based local LLM serving.

  7. GPT4All:

    https://gpt4all.io/

    Official website for GPT4All, providing a free and open-source ecosystem for local LLMs.

  8. LocalAI:

    https://localai.io/

    Official website for LocalAI, a self-hosted, community-driven local AI server compatible with OpenAI API.

  9. text-generation-webui (oobabooga):

    https://github.com/oobabooga/text-generation-webui

    GitHub repository for text-generation-webui (oobabooga), a feature-rich web UI for local LLMs.

  10. Jan:

    https://jan.ai/

    Official website for Jan, a cross-platform AI client application with local LLM support.

  11. PrivateGPT:

    https://github.com/imartinez/privateGPT

    GitHub repository for PrivateGPT, a privacy-focused tool for local document Q&A using LLMs.

  12. FastChat:

    https://github.com/lm-sys/FastChat

    GitHub repository for FastChat, a research platform for training, serving, and evaluating LLMs.

  13. vLLM:

    https://vllm.ai/

    Official website for vLLM, a high-throughput and efficient LLM inference server.

  14. MLC LLM:

    https://mlc.ai/mlc-llm/

    Official website for MLC LLM, focusing on machine learning compilation for efficient LLM execution.

    https://github.com/mlc-ai/mlc-llm

    GitHub repository for MLC LLM, containing code and examples for local execution.

  15. llama.cpp:

    https://github.com/ggerganov/llama.cpp

    GitHub repository for llama.cpp, a project focused on efficient C++ inference of Llama models.

  16. ExLlamaV2:

    https://github.com/turboderp/exllamav2

    GitHub repository for ExLlamaV2, known for fast inference of quantized LLMs.

  17. WebLLM:

    https://webllm.mlc.ai/

    Official website for WebLLM, enabling in-browser LLM execution using WebGPU.

  18. llamafile:

    https://github.com/Mozilla-Ocho/llamafile

    GitHub repository for llamafile, packaging LLMs into single executable files for easy deployment.

  19. Hugging Face Transformers:

    https://huggingface.co/docs/transformers/index

    Documentation for Hugging Face Transformers library, a core Python library for NLP models.

  20. Hugging Face App Market (Spaces):

    https://huggingface.co/spaces

    Hugging Face Spaces, a platform for hosting and discovering AI application demos.

I still don't understand why AI is depicted as feminine!

First published on https://hackernoon.com/you-can-run-these-16-llms-locally-no-questions-asked

Google AI Studio was used in this article. It is available at this link: https://ai.google.dev/aistudio

All images created by the Flux AI Art Generation Models at Night Cafe Studio: https://creator.nightcafe.studio/explore

While I do not monetize my writing directly, your support helps me continue putting out articles like this one without a paywall or a paid subscription.

If you want articles like this one for your own publication, you can commission them!

Contact me at:

https://linkedin.com/in/thomascherickal

For your article! (Prices are negotiable and I offer country-wise parity pricing.)

If you want to support my writing, consider a contribution at Patreon on this link:

https://patreon.com/c/thomascherickal/membership

Alternatively, you could buy me a coffee on this link:

https://ko-fi.com/thomascherickal

Cheers!
