Budgie

On-Device AI Auto-Categorization

Two on-device models — Qwen3 1.7B for chat and a 768-dim Nomic embedding model — categorize transactions, suggest tags, and learn from your corrections. Your statements never leave the phone.

Why on-device AI is the only AI that protects your statements

Cloud “AI” budgeting apps send every merchant string to a remote LLM, which often means a third party such as OpenAI sees your supermarket habits. Budgie downloads both models once, then runs every inference locally, so no transaction data leaves the device.

The two-stage pipeline runs an embedding lookup first, which gives instant nearest-neighbor categorization from your own history, then falls back to Qwen3 1.7B for novel transactions that have no close match in the embedding index. Every accepted or edited suggestion updates the 768-dim index immediately, so accuracy improves the more you use it.

Two-stage categorization flow

Embedding lookup — 768-dim Nomic model finds the nearest historical transaction instantly via sqlite-vec SIMD search

LLM fallback — Qwen3 1.7B Q4 handles novel transactions the embedding index has not seen, proposing category and tags from context

Correction loop — every accepted or edited suggestion updates the embedding index immediately so the next similar transaction lands closer without re-training
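The lookup-then-fallback decision described above can be sketched in a few lines of Python. This is only an illustration, not Budgie's actual code: the function names, the cosine-similarity measure, and the 0.85 threshold are assumptions, the history is a plain list rather than a sqlite-vec table, and the LLM is represented by a callback.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(
    embedding: Vector,
    history: list[tuple[Vector, str]],
    llm_fallback: Callable[[], str],
    threshold: float = 0.85,  # hypothetical cut-off for a "strong" neighbor
) -> tuple[str, str]:
    """Return (category, source), where source is 'embedding' or 'llm'."""
    best_category: Optional[str] = None
    best_score = -1.0
    # Stage 1: brute-force nearest neighbor over previously labeled transactions.
    for vector, category in history:
        score = cosine_similarity(embedding, vector)
        if score > best_score:
            best_score, best_category = score, category
    if best_category is not None and best_score >= threshold:
        return best_category, "embedding"
    # Stage 2: no close match in the history, so ask the language model.
    return llm_fallback(), "llm"
```

In a real index the vectors would have 768 dimensions and the search would be delegated to the vector store; the linear scan here only keeps the sketch self-contained.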

What you get

Qwen3 1.7B Q4 model runs entirely on your phone after a one-time download

768-dim Nomic embedding model + sqlite-vec for SIMD-accelerated similarity search

Two complementary signals: vector lookup over your history plus a generative LLM suggestion

Every correction updates the embedding index instantly — accuracy improves as you use it

Statements never leave the device — no OpenAI, no remote inference, ever

How it works

On first run, Budgie downloads Qwen3 1.7B Q4 and a 768-dim Nomic embedding model from the Hugging Face Hub. Both are stored in your app sandbox. For each new transaction, the embedding model runs first: if a strong nearest neighbor exists in your history, the result is instant. If not, Qwen3 1.7B generates a suggestion. Your response (accept, edit, or reject) feeds back into the embedding index without any network call.
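As a rough illustration of the feedback step, the sketch below shows one way an accept, edit, or reject response could be folded back into a local index. All names here (`EmbeddingIndex`, `apply_feedback`) are hypothetical, the index is an in-memory list rather than the app's real store, and the handling of a rejection is an assumption: the text does not say how rejections are recorded, so this version simply stores nothing for them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EmbeddingIndex:
    """In-memory stand-in for the on-device vector index (hypothetical)."""
    entries: list[tuple[list[float], str]] = field(default_factory=list)

    def add(self, vector: list[float], category: str) -> None:
        self.entries.append((vector, category))


def apply_feedback(
    index: EmbeddingIndex,
    vector: list[float],
    suggested: str,
    response: str,
    edited: Optional[str] = None,
) -> Optional[str]:
    """Record the user's response locally; returns the stored category, if any."""
    if response == "accept":
        final = suggested
    elif response == "edit":
        if edited is None:
            raise ValueError("an edit needs the corrected category")
        final = edited
    elif response == "reject":
        # Assumption: a rejected suggestion adds no labeled example.
        return None
    else:
        raise ValueError(f"unknown response: {response!r}")
    # The new example is searchable right away; no model is re-trained.
    index.add(vector, final)
    return final
```

Because the update is just an insert into the index, the next lookup for a similar transaction can already find the corrected example.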

Frequently Asked Questions

Does the AI work offline?
Yes. Both models live on your device after the one-time download. Categorization runs whether you're online or not.

How big is the model download?
Roughly 1 GB combined: Qwen3 1.7B Q4 for the language model and a 768-dim Nomic embedding model. The download happens on first use of AI features and is fully optional; you can keep using Budgie without AI.

Can I correct the AI's suggestions?
Always. Every transaction lets you accept, edit, or reject the suggestion. Your corrections feed back into the 768-dim embedding index immediately so the next similar transaction lands closer to the right category.

Does Budgie use OpenAI or any cloud LLM?
No. Inference uses ONNX Runtime locally. There is no fallback to a cloud model and no telemetry about your transactions.

Ready to take Budgie for a spin?

Join the waitlist — be first to try the offline-first expense tracker.