Civic-SLM


Civic-SLM is a domain-specialized fine-tune of Qwen2.5-7B-Instruct for U.S. local-government documents — city, county, and township agendas, staff reports, comprehensive plans, minutes, ordinances, and municipal codes. Designed to power civic transparency tools across all 50 states.

Trained on a single Apple Silicon Mac via MLX-LM. Served on whatever runtime you like — MLX, Ollama, LM Studio, llama.cpp, or any OpenAI-compatible endpoint. Released as both MLX-q4 and GGUF Q5_K_M. Documents are crawled with browser-use — one small recipe per jurisdiction.
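Because serving is runtime-agnostic, any OpenAI-compatible client can talk to the model. A minimal sketch in Python, assuming an Ollama-style endpoint on localhost and a hypothetical `civic-slm` model tag (match both to however you actually serve it):

```python
import json

# Assumptions: the base URL and model tag below are placeholders for whatever
# runtime (MLX server, Ollama, LM Studio, llama.cpp) is serving the model.
BASE_URL = "http://localhost:11434/v1"  # Ollama's default OpenAI-compatible base
MODEL = "civic-slm"                     # hypothetical local model tag

def build_request(question: str, context: str) -> dict:
    """Build a chat-completion payload that keeps answers grounded in the document."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "system",
                "content": "Answer only from the provided document. "
                           "If the document does not support an answer, say so.",
            },
            {"role": "user", "content": f"Document:\n{context}\n\nQuestion: {question}"},
        ],
        "temperature": 0.0,  # deterministic, citation-friendly output
    }

payload = build_request("What time is the public hearing?", "Agenda: ...")
# Send with any HTTP client, e.g.:
#   requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=60)
print(json.dumps(payload, indent=2)[:80])
```

Swapping runtimes then only means changing `BASE_URL` and `MODEL`, not the calling code.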

This project is open source under the MIT license, and the source code is available on GitHub.

Why

Local government is where most public decisions actually get made, and the documents that drive those decisions — agendas, staff reports, minutes, ordinances — are mostly PDFs buried on legacy CMSes. General-purpose LLMs can read them, but they hallucinate specifics, miss citations, and don’t know the genre. Civic-SLM is a small, open, auditable model trained specifically on this corpus so it can ground answers in the source text, extract structured data from staff reports, and refuse when the context doesn’t support an answer.

Pipeline

Eval-first

The training contract is no training without a baseline. Four benchmarks run against base Qwen2.5-7B before any fine-tuning starts; those numbers are what every subsequent stage has to beat.

| Bench | What it measures | Scoring |
| --- | --- | --- |
| civic_factuality | Q&A grounded in held-out docs | citation exact-match + word-overlap |
| refusal | refuses when context lacks the answer | refusal rate (regex + fallback judge) |
| structured_extraction | staff report → JSON | field-level F1 |
| side_by_side | open-ended municipal prompts vs base 7B and 72B | Claude or local-LLM judge with A/B position swap |
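To make the scoring column concrete, here is a minimal sketch of the factuality and refusal metrics, under simple assumptions (the repo's actual scorers may differ in detail):

```python
import re

def word_overlap(pred: str, gold: str) -> float:
    """Fraction of gold-answer words that appear in the prediction."""
    gold_words = set(re.findall(r"\w+", gold.lower()))
    pred_words = set(re.findall(r"\w+", pred.lower()))
    return len(gold_words & pred_words) / len(gold_words) if gold_words else 0.0

def citation_exact_match(pred: str, citation: str) -> bool:
    """The gold citation (e.g. an ordinance number) must appear verbatim."""
    return citation in pred

# Hypothetical refusal patterns; the real harness pairs a regex with a fallback judge.
REFUSAL_RE = re.compile(
    r"(not (?:stated?|mentioned|in the (?:document|context))|cannot answer|no information)",
    re.IGNORECASE,
)

def is_refusal(pred: str) -> bool:
    return bool(REFUSAL_RE.search(pred))
```

An answer like "That is not stated in the document." counts as a refusal; refusal rate is then the fraction of unanswerable prompts on which `is_refusal` fires.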

Baseline numbers (Qwen2.5-7B-Instruct 4-bit, MLX)

| Bench | n | Mean | Median | Latency |
| --- | --- | --- | --- | --- |
| factuality | 10 | 0.501 | 0.566 | 637 ms |
| refusal | 10 | 0.800 | 1.000 | 460 ms |
| extraction | 5 | 0.277 | 0.000 | 925 ms |
| side_by_side | — | (pending 72B comparator) | | |
These are the bars the fine-tune has to clear. Refusal is already strong; protect it. Extraction is the biggest training opportunity.
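Extraction is scored field by field. A minimal sketch of field-level F1 over flat JSON, with hypothetical staff-report fields (the repo's scorer may handle nesting and normalization differently):

```python
def field_f1(pred: dict, gold: dict) -> float:
    """Field-level F1: a predicted field is correct only if key and value both match."""
    correct = sum(1 for k, v in gold.items() if pred.get(k) == v)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Hypothetical staff-report extraction target:
gold = {"project": "Pier Renovation", "budget": "$2.1M", "vote": "5-0"}
pred = {"project": "Pier Renovation", "budget": "$2.1M", "vote": "4-1"}
print(round(field_f1(pred, gold), 3))  # 2 of 3 fields match
```

Under this metric a model that emits valid JSON but fumbles one field still gets partial credit, which is why the 0.277 baseline leaves so much headroom.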

Quickstart

```shell
uv sync --all-extras
uv run pytest                   # 42 tests across schema, ingest, scorers, synth, train, llm-backend
uv run civic-slm --help
```

The civic-slm umbrella CLI exposes every stage: doctor, crawl, eval run, eval side-by-side, and train cpt|sft|dpo. See the repo’s docs/USAGE.md for an end-to-end walkthrough and docs/RECIPES.md to add a new jurisdiction.

Status

Already in place:

- Scaffold, schemas, and ingestion (browser-use + San Clemente demo recipe + a template for any U.S. jurisdiction)
- 4-bench eval harness
- Synth pipeline (Anthropic or fully-local backend)
- MLX training scripts (CPT/SFT/DPO)
- Merge + quantize to MLX-q4 and GGUF Q5_K_M
- Runtime-agnostic serving
- Committed baselines for factuality, refusal, and extraction

Next up: synth corpus and the first training pass.

Follow along or contribute on GitHub →
