
Build and Run llama.cpp Locally on macOS

  1. Install CMake: brew install cmake
  2. Clone the repo: git clone https://github.com/ggml-org/llama.cpp.git
  3. cd into llama.cpp: cd llama.cpp
  4. Configure the build: cmake -B build
  5. Build the binaries: cmake --build build --config Release -j 8
  6. Optionally cd into the binary directory (otherwise, all later commands need paths relative to the repo root): cd build/bin
  7. Run the server: ./llama-server -hf ggml-org/Qwen3-4B-GGUF:Q4_K_M --jinja -c 4000 --host 127.0.0.1 --port 8033
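The steps above can be combined into a single sketch of the full build-and-run flow (assuming Homebrew is installed; the thread count and port come from the steps above):

```sh
brew install cmake                                   # step 1: build dependency
git clone https://github.com/ggml-org/llama.cpp.git  # step 2: fetch sources
cd llama.cpp                                         # step 3
cmake -B build                                       # step 4: configure
cmake --build build --config Release -j 8            # step 5: compile with 8 parallel jobs
cd build/bin                                         # step 6: optional
# step 7: download the model on first run, then serve it locally
./llama-server -hf ggml-org/Qwen3-4B-GGUF:Q4_K_M --jinja -c 4000 \
  --host 127.0.0.1 --port 8033
```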

Downloaded models are cached at ~/Library/Caches/llama.cpp/ggml-org_Qwen3-4B-GGUF_Qwen3-4B-Q4_K_M.gguf. On startup, the terminal prints many settings and options that are worth reviewing. Opening 127.0.0.1:8033 in a browser displays a built-in UI for chatting with the loaded model.
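Besides the browser UI, llama-server also exposes an OpenAI-compatible HTTP API at /v1/chat/completions. A minimal Python sketch that queries the server started in step 7 (the prompt and the max_tokens cap are illustrative choices, not llama.cpp defaults):

```python
import json
import urllib.request

def build_chat_request(prompt, model="ggml-org/Qwen3-4B-GGUF:Q4_K_M"):
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,  # illustrative cap on reply length
    }

def chat(prompt, base_url="http://127.0.0.1:8033"):
    """POST a prompt to the running llama-server and return the reply text."""
    data = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage, with the server from step 7 running in another terminal:
#   print(chat("Say hello in one sentence."))
```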