Build and Run llama.cpp Locally on macOS
- Install CMake:
  brew install cmake
- Clone the repo:
  git clone https://github.com/ggml-org/llama.cpp.git
- cd into llama.cpp:
  cd llama.cpp
- Generate the build files:
  cmake -B build
- Build the binaries:
  cmake --build build --config Release -j 8
- Optionally cd into the bin directory (otherwise, all later commands will need paths relative to the repo root):
  cd build/bin
- Run the llama server:
  ./llama-server -hf ggml-org/Qwen3-4B-GGUF:Q4_K_M --jinja -c 4000 --host 127.0.0.1 --port 8033
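The steps above can be collected into a single script. This is a sketch, not an official installer: it assumes Homebrew is already installed, runs from whatever directory you start in, and the final llama-server line will block the terminal. The script body is written to a temp file via a heredoc so it can be syntax-checked before you run it.

```shell
# One-shot build-and-run script for llama.cpp on macOS (sketch; assumes
# Homebrew is installed). Written to /tmp first so it can be reviewed and
# syntax-checked; run it with `bash /tmp/build-llama.sh` when ready.
cat > /tmp/build-llama.sh <<'EOF'
set -euo pipefail
brew install cmake
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build                             # generate build files
cmake --build build --config Release -j 8  # build the binaries
cd build/bin
# Downloads the model on first run, then serves it (blocks the terminal):
./llama-server -hf ggml-org/Qwen3-4B-GGUF:Q4_K_M --jinja -c 4000 \
  --host 127.0.0.1 --port 8033
EOF
bash -n /tmp/build-llama.sh && echo "script parses"
```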
Downloaded models are cached under ~/Library/Caches/llama.cpp/; for this example, the file is ~/Library/Caches/llama.cpp/ggml-org_Qwen3-4B-GGUF_Qwen3-4B-Q4_K_M.gguf.
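If you script around this cache, the filename appears to be derived from the -hf spec by joining the repo id (with / replaced by _) and the model file name. The derivation below is an assumption inferred from the example path above, not documented behavior:

```shell
# Hypothetical sketch: derive the cache filename from an -hf spec like
# ggml-org/Qwen3-4B-GGUF:Q4_K_M. Naming rule is inferred from the
# observed path, not from llama.cpp documentation.
hf_spec="ggml-org/Qwen3-4B-GGUF:Q4_K_M"
repo="${hf_spec%%:*}"            # ggml-org/Qwen3-4B-GGUF
quant="${hf_spec##*:}"           # Q4_K_M
model="${repo##*/}"              # Qwen3-4B-GGUF
base="${model%-GGUF}"            # Qwen3-4B
cached="${repo//\//_}_${base}-${quant}.gguf"
echo "$cached"                   # ggml-org_Qwen3-4B-GGUF_Qwen3-4B-Q4_K_M.gguf
```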
On startup, the terminal prints the full set of server settings and options, which are worth reviewing. Opening http://127.0.0.1:8033 in a browser displays a built-in UI for chatting with the loaded model.
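Besides the browser UI, llama-server also exposes an OpenAI-compatible HTTP API. A minimal sketch, assuming the server from the step above is listening on 127.0.0.1:8033; it probes the /health endpoint first so it degrades gracefully when the server is not running:

```shell
# Query the running llama-server via its OpenAI-compatible endpoint.
# Assumes host/port from the llama-server command above.
if curl -sf http://127.0.0.1:8033/health > /dev/null 2>&1; then
  reply=$(curl -s http://127.0.0.1:8033/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Say hi in one word."}]}')
  echo "$reply"
else
  reply="server not reachable"
  echo "llama-server is not reachable on 127.0.0.1:8033"
fi
```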