Build and Run llama.cpp Locally on macOS
- Install CMake:
  brew install cmake
- Clone the repo:
  git clone https://github.com/ggml-org/llama.cpp.git
- cd into llama.cpp:
  cd llama.cpp
- Generate the build files:
  cmake -B build
- Build the binaries:
  cmake --build build --config Release -j 8
- Optionally cd into the bin directory (otherwise, all later commands will need paths relative to the repo root):
  cd build/bin
- Run the llama server:
  ./llama-server -hf ggml-org/Qwen3-4B-GGUF:Q4_K_M --jinja -c 4000 --host 127.0.0.1 --port 8033
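The steps above can be collected into a single script. This is a sketch, not an official installer: it assumes Homebrew is already installed, runs from whatever directory you start in, and the final llama-server line will block the terminal. The script body is written to a temp file via a heredoc so it can be syntax-checked before you run it.

```shell
# One-shot build-and-run script for llama.cpp on macOS (sketch; assumes
# Homebrew is installed). Written to /tmp first so it can be reviewed and
# syntax-checked; run it with `bash /tmp/build-llama.sh` when ready.
cat > /tmp/build-llama.sh <<'EOF'
set -euo pipefail
brew install cmake
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build                             # generate build files
cmake --build build --config Release -j 8  # build the binaries
cd build/bin
# Downloads the model on first run, then serves it (blocks the terminal):
./llama-server -hf ggml-org/Qwen3-4B-GGUF:Q4_K_M --jinja -c 4000 \
  --host 127.0.0.1 --port 8033
EOF
bash -n /tmp/build-llama.sh && echo "script parses"
```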
Downloaded models are cached under ~/Library/Caches/llama.cpp/; for this example, the file is ~/Library/Caches/llama.cpp/ggml-org_Qwen3-4B-GGUF_Qwen3-4B-Q4_K_M.gguf.
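If you script around this cache, the filename appears to be derived from the -hf spec by joining the repo id (with / replaced by _) and the model file name. The derivation below is an assumption inferred from the example path above, not documented behavior:

```shell
# Hypothetical sketch: derive the cache filename from an -hf spec like
# ggml-org/Qwen3-4B-GGUF:Q4_K_M. Naming rule is inferred from the
# observed path, not from llama.cpp documentation.
hf_spec="ggml-org/Qwen3-4B-GGUF:Q4_K_M"
repo="${hf_spec%%:*}"            # ggml-org/Qwen3-4B-GGUF
quant="${hf_spec##*:}"           # Q4_K_M
model="${repo##*/}"              # Qwen3-4B-GGUF
base="${model%-GGUF}"            # Qwen3-4B
cached="${repo//\//_}_${base}-${quant}.gguf"
echo "$cached"                   # ggml-org_Qwen3-4B-GGUF_Qwen3-4B-Q4_K_M.gguf
```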
On startup, the terminal prints the full set of server settings and options, which are worth reviewing. Opening http://127.0.0.1:8033 in a browser displays a built-in UI for chatting with the loaded model.
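Besides the browser UI, llama-server also exposes an OpenAI-compatible HTTP API. A minimal sketch, assuming the server from the step above is listening on 127.0.0.1:8033; it probes the /health endpoint first so it degrades gracefully when the server is not running:

```shell
# Query the running llama-server via its OpenAI-compatible endpoint.
# Assumes host/port from the llama-server command above.
if curl -sf http://127.0.0.1:8033/health > /dev/null 2>&1; then
  reply=$(curl -s http://127.0.0.1:8033/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Say hi in one word."}]}')
  echo "$reply"
else
  reply="server not reachable"
  echo "llama-server is not reachable on 127.0.0.1:8033"
fi
```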