How to Run Your Own Local LLMs: Updated for Early 2025


With Privacy Support

This article has become a yearly favorite, so I plan to add extra value by publishing two editions this year.

All of the tools below are free, many are open-source, and there is a wide range of LLMs, SLMs, and LMMs out there to run with them.

For the uninitiated:

  1. LLMs - Large Language Models that work only with text.

  2. SLMs - Small Language Models that typically have less than 10B parameters.

  3. LMMs - Large Multi-modal Models that work with text, images, audio, and video.

Use perplexity.ai to learn new terms.

I prefer Perplexity over Google for practically everything these days.

I might do a follow-up article about the best models to use with these tools.

Below are 16 tools for running and interacting with your local LLMs.

We will take a look at each of them in turn.

Explore as many of them as you can.

All of them are useful in their own way.

And, finally: Enjoy!


H2O LLM Studio

A data center? Really?

  • Offers a no-code GUI for fine-tuning and deploying LLMs locally.

  • Includes H2OGPT, a user-friendly open-source LLM.

  • Focuses on enterprise-grade features with ease of use.

  • Provides server components for API access and local deployment.

  • Focuses on enterprise use cases, with SLMs, customization, and data privacy as priorities.

  • H2O.ai is a complete toolkit for deploying LLMs.

  • It provides many additional tools - but their discussion would be out of scope for this article.

  • You can find the tool at https://h2o.ai/products/h2o-llm-studio/

LM Studio

Yes - Coffee is The Drink of the Developer!

  • Popular desktop application with a very user-friendly GUI.

  • Easily download models from Hugging Face and chat with them.

  • Operates as a local server in the background for API access.

  • Simple setup and configuration for local LLM experimentation.

  • Many popular tutorials for local LLMs use LM Studio.

  • This tool has an active and vibrant community around it.

  • It also provides compatibility checks and full integration with HuggingFace - see the bonus section.

  • With quantization, even large models can run on modest hardware with as little as 8 GB of VRAM.

  • The link to go to is https://lmstudio.ai/
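To show what the local server looks like in practice, here is a minimal sketch that queries LM Studio's OpenAI-compatible endpoint (default port 1234) using only the Python standard library. The model name `"local-model"` and the sampling settings are assumptions; use whatever model you have loaded in the GUI.

```python
import json
import urllib.request

# LM Studio's local server listens on port 1234 by default and
# exposes an OpenAI-compatible chat completions endpoint.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask(model: str, prompt: str) -> str:
    """Send one prompt to the local server and return the reply text."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        LMSTUDIO_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]

# Example (requires LM Studio running with a model loaded):
# print(ask("local-model", "Explain quantization in one sentence."))
```

Because the API mirrors OpenAI's, any OpenAI client library can also be pointed at this URL.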

Ollama

The return of the Llama! (I hope it maintains its hygiene)

  • Designed for incredibly simple command-line and GUI (evolving) LLM serving.

  • Focuses on ease of use and quick deployment of models.

  • Acts as a local server for API interactions.

  • It runs the LLM completely offline.

  • Growing GUI support to enhance user-friendliness.

  • However, the local server is unauthenticated by default, so exposing it beyond your own machine requires careful setup.

  • Ollama is widely regarded today as the best tool for developers to set up LLMs locally.

  • While it is developer-friendly, even non-developers can use it with existing tutorials.

  • And you can get it from the following website: https://ollama.ai/
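For a taste of the developer experience, here is a minimal sketch against Ollama's native REST API (default port 11434). It assumes you have already pulled a model, e.g. `ollama pull llama3.2`; the model name is just an example.

```python
import json
import urllib.request

# Ollama serves a REST API on port 11434 once `ollama serve` is
# running (the desktop app starts it automatically).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Non-streaming generate request for Ollama's native API."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send one prompt to a running Ollama instance and return the text."""
    body = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama instance with the model pulled):
# print(generate("llama3.2", "Why is the sky blue?"))
```

Setting `"stream": False` returns the full answer in one JSON object; the default streams token-by-token chunks instead.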

GPT4All

The current stereotype is that engineers are men. More women need to enter STEM.

  • Provides a free and open-source desktop application with a user-friendly UI.

  • Offers server components for programmatic API access as well.

  • Runs powerful LLMs on consumer-grade hardware locally.

  • Easy to download and interact with models through the GUI.

  • This tool is friendly for non-developers.

  • This is a powerful tool with thousands of models available.

  • It provides a complete all-in-one setup for running LLMs locally.

  • And you can find this tool at https://gpt4all.io/
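Beyond the GUI, GPT4All also ships an official Python binding. The sketch below assumes `pip install gpt4all`; the model filename is one example from the GPT4All catalog, and the import is done lazily so the module loads even without the package installed.

```python
# Example GGUF model name from the GPT4All catalog (an assumption --
# pick any model the application offers).
DEFAULT_MODEL = "orca-mini-3b-gguf2-q4_0.gguf"

def chat_once(prompt: str, model_file: str = DEFAULT_MODEL) -> str:
    """Run one prompt through a local GPT4All model.

    The gpt4all package (pip install gpt4all) is imported lazily so
    this module still loads when the binding is not installed.
    """
    from gpt4all import GPT4All
    model = GPT4All(model_file)  # downloads the model on first use
    with model.chat_session():
        return model.generate(prompt, max_tokens=200)

# Example (downloads the model on first run):
# print(chat_once("Summarize the benefits of local LLMs."))
```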

LocalAI

Yes, LLMs can be a second brain, but this is going too far!

  • Self-hosted, community-driven alternative to OpenAI, using Docker.

  • Offers an OpenAI-compatible API for easy integration.

  • While backend-focused, community GUIs enhance user interaction.

  • This is a very versatile server supporting a wide range of models and hardware.

  • Since it ships as a Docker image, it can run on any platform or device that supports Docker.

  • This is a powerful tool with several community GUIs available.

  • There are detailed instructions on the website.

  • This tool requires a little technical expertise.

  • You can find this tool at https://localai.io/
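Because LocalAI speaks the OpenAI API, a running instance can be inspected with a few lines of standard-library Python. The Docker invocation in the comment and the default port 8080 are assumptions to verify against the site's current instructions.

```python
import json
import urllib.request

# LocalAI exposes an OpenAI-compatible API, by default on port 8080
# when started through Docker, e.g.:
#   docker run -p 8080:8080 localai/localai
# (check the website for current image tags and model setup).
BASE_URL = "http://localhost:8080/v1"

def extract_model_ids(listing: dict) -> list:
    """Pure helper: pull model ids out of an OpenAI-style /models reply."""
    return [m["id"] for m in listing.get("data", [])]

def list_models(base_url: str = BASE_URL) -> list:
    """Ask a running LocalAI server which models it currently serves."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return extract_model_ids(json.loads(resp.read()))

# Example (requires a running LocalAI container):
# print(list_models())
```

The same `/v1/chat/completions` calls shown for other OpenAI-compatible servers work here unchanged.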

Jan

Jan.ai looks prettier than I imagined!

  • Cross-platform AI client application with an intuitive GUI.

  • This tool has gained immense popularity of late.

  • Operates as a local server with API access for developers.

  • It markets itself as the future of personal AI computing.

  • User-friendly interface for interacting with various AI models.

  • Clean and easy-to-navigate application for local AI tasks.

  • You can access this tool at https://jan.ai/

text-generation-webui (oobabooga)

Well - it got the letters right!

  • Feature-rich web UI for local LLM interaction.

  • Offers a comprehensive API in addition to the web interface.

  • Supports various backends and extensive customization options.

  • Popular for its balance of features and user-friendliness.

  • This is a powerful tool and forks of this tool are very popular.

  • It can run several LLM models locally and is user friendly.

  • You can find the tool at https://github.com/oobabooga/text-generation-webui

PrivateGPT

VR/AR glasses for development? This is a new one on me!

  • Privacy-focused tool for querying local documents using LLMs.

  • Often includes a user-friendly web UI for document interaction.

  • Processes data locally without sending it to the cloud.

  • Ideal for secure and private document analysis.

  • This tool lives up to its name - your data never leaves your machine.

  • Document analysis with LLMs can now be performed locally.

  • This tool is available at https://github.com/imartinez/privateGPT

vLLM

The more displays, the better, taken to the ultimate level!

  • High-throughput inference server designed for performance.

  • Deployable locally on a single machine for development or local use.

  • Focuses on speed and efficiency in serving LLMs.

  • Suitable for applications requiring fast local inference.

  • vLLM was a breakthrough for LLM performance with the innovation of PagedAttention.

  • Be warned: this particular tool requires technical expertise.

  • You can find this tool at https://vllm.ai/
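vLLM's offline Python API makes batch inference very direct. This is a minimal sketch assuming `pip install vllm` and a CUDA-capable GPU; the model id is only an example, and the import is done lazily because the library is heavy.

```python
def batch_generate(prompts: list,
                   model: str = "Qwen/Qwen2.5-0.5B-Instruct") -> list:
    """Offline batch inference with vLLM's Python API.

    vLLM (pip install vllm) needs a CUDA-capable GPU, so it is
    imported lazily; the model id above is just an example.
    """
    from vllm import LLM, SamplingParams  # heavy import, done lazily
    llm = LLM(model=model)
    params = SamplingParams(temperature=0.8, max_tokens=128)
    outputs = llm.generate(prompts, params)
    # Each RequestOutput holds one or more completions; take the first.
    return [o.outputs[0].text for o in outputs]

# Example (requires a GPU and downloads the model on first run):
# print(batch_generate(["What is PagedAttention?"]))
```

PagedAttention is what lets vLLM batch many such prompts efficiently: KV-cache memory is managed in pages rather than contiguous blocks.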

MLC LLM

I hope those are coolers and not magnetic tapes!

  • Server for serving optimized, compiled LLMs for specific hardware.

  • This tool can run LLMs everywhere - CPUs, tablets, and even mobile phones.

  • Provides example web UI demos for interacting with compiled models.

  • Focuses on efficient execution through model compilation.

  • Offers performance benefits through hardware-specific optimizations.

  • This is a promising trend for the future and fully deserves its place on this list.

  • It has universal deployment with native code, and is a highly promising project.

  • You can find this tool at https://mlc.ai/mlc-llm/

llama.cpp

No llamas again, but at least it got the noise-cancelling headphones right!

  • This original project by Georgi Gerganov was the innovation that led to an explosion of local LLMs.

  • It is a highly optimized C++ server for efficient Llama model inference.

  • Widely used as a backend for other tools and standalone servers.

  • Offers robust server API and excellent performance.

  • A foundational tool for local LLM serving.

  • It changed LLM serving forever.

  • The remarkable repository is at https://github.com/ggerganov/llama.cpp
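llama.cpp also ships a small HTTP server with a native `/completion` endpoint, which the sketch below queries with the standard library. The binary name and flags in the comment have changed across versions (it was once `./server`), so treat them as assumptions and check the repository.

```python
import json
import urllib.request

# After building llama.cpp, start its built-in server with something
# like:
#   ./llama-server -m ./models/model.gguf --port 8080
# (binary and flag names vary across versions -- check the repo).
SERVER_URL = "http://localhost:8080/completion"

def build_request(prompt: str, n_predict: int = 128) -> dict:
    """Request body for llama.cpp's native /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict}

def complete(prompt: str) -> str:
    """Send a prompt to a running llama-server and return the text."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

# Example (requires a running llama-server):
# print(complete("The capital of France is"))
```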

ExLlamaV2

Does it get more local than this?

  • Server optimized for extremely fast inference of quantized models.

  • Can be deployed as a dedicated server for high-speed applications.

  • Known for its speed when running quantized LLMs.

  • Ideal for resource-constrained environments needing fast inference.

  • This is ideal for edge computing and mobile computing.

  • Embedded systems are also a good use case.

  • You can find this tool at https://github.com/turboderp/exllamav2

llamafile

This looks like a spaceship!

  • Packages everything to run and serve an LLM into a single executable.

  • Extremely simple deployment for local server scenarios.

  • This is a highly innovative tool which can be used in multiple environments.

  • Self-contained server for easy distribution and execution.

  • User-friendly for quick setup and running of LLMs.

  • And you can get this tool from https://github.com/Mozilla-Ocho/llamafile

WebLLM

Does not look like a web, but I'll take it!

  • Enables running large language models directly in the web browser using WebGPU.

  • This is a real innovation: models run entirely in the browser, with no native installation or dedicated GPU setup required.

  • Focuses on client-side, in-browser inference for privacy and offline capabilities.

  • Provides JavaScript APIs to load and execute LLMs within web applications.

  • Demonstrates impressive performance for running models directly in the browser environment.

  • However, you will need to download large models initially.

  • You can get this tool at https://webllm.mlc.ai/

Hugging Face Transformers

Does not look like a transformer, but I'll take it!

  • Core Python library for building and using NLP models, including LLMs.

  • While not a server itself, it's the foundation for creating custom LLM servers.

  • Extremely flexible and widely used by developers in the LLM space.

  • Enables building highly tailored local LLM serving solutions.

  • This library, together with the Hugging Face Hub, has democratized AI development for millions.

  • It is the backbone behind nearly every tool listed here.

  • To not include it would make this list incomplete.

  • And the Hugging Face Hub hosts nearly 1.5M models, downloadable for free.

  • The website is https://huggingface.co/docs/transformers/index
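A few lines of the library's `pipeline` API are enough to generate text locally. This sketch imports lazily because `transformers` and `torch` are large downloads; `gpt2` is used only because it is tiny, so swap in any model id from the Hub.

```python
def generate_text(prompt: str, model_id: str = "gpt2") -> str:
    """Minimal local text generation with Hugging Face Transformers.

    transformers (pip install transformers torch) is imported lazily;
    gpt2 is chosen only for its small size -- any Hub model id works.
    """
    from transformers import pipeline
    generator = pipeline("text-generation", model=model_id)
    result = generator(prompt, max_new_tokens=40, num_return_sequences=1)
    # The pipeline returns a list of dicts with a "generated_text" key.
    return result[0]["generated_text"]

# Example (downloads the model on first run):
# print(generate_text("Local LLMs are useful because"))
```

The same three-line pattern scales from GPT-2 up to the largest open models your hardware can hold.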

Hugging Face Spaces

It looks like an app market - but where are the people?

  • A platform for discovering and exploring AI apps, including LLM demos.

  • There are more than 400,000 apps on Hugging Face Spaces today.

  • Spaces showcase user interfaces for interacting with models.

  • Provides a marketplace to find inspiration and examples of LLM applications.

  • Can be used to test and prototype UI ideas for local LLM projects.

  • Offers monetization options so developers can earn from their AI work.

  • All of this allows entry to the AI market with minimal technical knowledge, democratizing AI.

  • You can access the app market at the following link: https://huggingface.co/spaces

There is no sector that LLMs will not disrupt.

With the correct guidance, Generative AI will reshape the world as we know it.

Often, you may find yourself in a situation where you do not want your data to leave your system.

This is especially true for enterprises and governments.

At such times, these tools will be invaluable.

Cutting-edge research is another area where you do not want your data to leave your organization.

You can deploy the tool of your choice to a centralized server hosted by the company.

The server must be air-gapped from the public internet and run one of the tools on this list.

I sincerely hope you change the world with your Generative AI research.

All the best for your journey!

The future of AI is world changing and magical - although this picture overdid it!

References

  1. Unite:

    https://www.unite.ai/best-llm-tools-to-run-models-locally/

    Unite.AI article reviewing top tools for running LLMs locally, updated for 2025.

  2. DataCamp:

    https://www.datacamp.com/tutorial/run-llms-locally-tutorial

    DataCamp tutorial on methods and tools for running LLMs locally, with practical guidance.

  3. Getstream:

    https://getstream.io/blog/best-local-llm-tools

    GetStream blog listing the best tools for local LLM execution, with detailed insights.

  4. H2O LLM Studio:

    https://h2o.ai/products/h2o-llm-studio/

    Official product page for H2O LLM Studio, a no-code GUI for LLM fine-tuning and deployment.

    https://github.com/h2oai/h2ogpt

    GitHub repository for H2OGPT, H2O.ai's open-source large language model.

  5. LM Studio:

    https://lmstudio.ai/

    Official website for LM Studio, a user-friendly desktop application for running LLMs locally.

  6. Ollama:

    https://ollama.ai/

    Official website for Ollama, designed for simple command-line and GUI-based local LLM serving.

  7. GPT4All:

    https://gpt4all.io/

    Official website for GPT4All, providing a free and open-source ecosystem for local LLMs.

  8. LocalAI:

    https://localai.io/

    Official website for LocalAI, a self-hosted, community-driven local AI server compatible with OpenAI API.

  9. text-generation-webui (oobabooga):

    https://github.com/oobabooga/text-generation-webui

    GitHub repository for text-generation-webui (oobabooga), a feature-rich web UI for local LLMs.

  10. Jan:

    https://jan.ai/

    Official website for Jan, a cross-platform AI client application with local LLM support.

  11. PrivateGPT:

    https://github.com/imartinez/privateGPT

    GitHub repository for PrivateGPT, a privacy-focused tool for local document Q&A using LLMs.

  12. FastChat:

    https://github.com/lm-sys/FastChat

    GitHub repository for FastChat, a research platform for training, serving, and evaluating LLMs.

  13. vLLM:

    https://vllm.ai/

    Official website for vLLM, a high-throughput and efficient LLM inference server.

  14. MLC LLM:

    https://mlc.ai/mlc-llm/

    Official website for MLC LLM, focusing on machine learning compilation for efficient LLM execution.

    https://github.com/mlc-ai/mlc-llm

    GitHub repository for MLC LLM, containing code and examples for local execution.

  15. llama.cpp:

    https://github.com/ggerganov/llama.cpp

    GitHub repository for llama.cpp, a project focused on efficient C++ inference of Llama models.

  16. ExLlamaV2:

    https://github.com/turboderp/exllamav2

    GitHub repository for ExLlamaV2, known for fast inference of quantized LLMs.

  17. WebLLM:

    https://webllm.mlc.ai/

    Official website for WebLLM, enabling in-browser LLM execution using WebGPU.

  18. llamafile:

    https://github.com/Mozilla-Ocho/llamafile

    GitHub repository for llamafile, packaging LLMs into single executable files for easy deployment.

  19. Hugging Face Transformers:

    https://huggingface.co/docs/transformers/index

    Documentation for Hugging Face Transformers library, a core Python library for NLP models.

  20. Hugging Face App Market (Spaces):

    https://huggingface.co/spaces

    Hugging Face Spaces, a platform for hosting and discovering AI application demos.

I still don't understand why AI is depicted as feminine!

First published on https://hackernoon.com/you-can-run-these-16-llms-locally-no-questions-asked

Google AI Studio was used in this article. It is available at this link: https://ai.google.dev/aistudio

All images created by the Flux AI Art Generation Models at Night Cafe Studio: https://creator.nightcafe.studio/explore

While I do not monetize my writing directly, your support helps me continue putting out articles like this one without a paywall or a paid subscription.

If you want articles like this one for your own publication, you can commission them!

Contact me at:

https://linkedin.com/in/thomascherickal

For your article! (Prices are negotiable and I offer country-wise parity pricing.)

If you want to support my writing, consider a contribution at Patreon on this link:

https://patreon.com/c/thomascherickal/membership

Alternatively, you could buy me a coffee on this link:

https://ko-fi.com/thomascherickal

Cheers!
