| Local LLM Testing & Benchmarking for Apple Silicon | Community Leaderboard |
🚨 Benchmark analysis is live! More than 375 community-submitted runs analyzed - check out the results in the Benchmark Report.
Anubis is a native macOS app for benchmarking, comparing, and managing local large language models using any OpenAI-compatible endpoint - Ollama, MLX, oMLX, LM Studio Server, OpenWebUI, Docker Models, etc. Built with SwiftUI for Apple Silicon, it provides real-time hardware telemetry correlated with inference performance, with full history saved - something no CLI tool or chat wrapper offers. Export benchmarks directly without screenshotting, and export the raw data as Markdown or CSV from the history. You can even `ollama pull` models from within the app.
Tri-state control over Ollama's `think` request parameter, exposed in the Benchmark Performance disclosure when the Ollama backend is selected (a request sketch follows the list below):
- `think: true` to enable reasoning where supported
- `think: false` to disable reasoning on models that default it on (e.g. recent DeepSeek-R1 builds)
- The choice persists across launches.
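For illustration, the request shape looks roughly like this - a sketch assuming Ollama's `/api/chat` JSON body, with `OllamaChatRequest` as a hypothetical name:

```swift
import Foundation

// Hypothetical request type; `think` mirrors Ollama's /api/chat parameter:
// omit it (nil) for the model default, or set true/false explicitly.
struct OllamaChatRequest: Encodable {
    let model: String
    let messages: [[String: String]]
    var stream = true
    var think: Bool? = nil   // nil = model default - the third state

}

let request = OllamaChatRequest(
    model: "deepseek-r1:8b",
    messages: [["role": "user", "content": "Why is the sky blue?"]],
    think: false             // disable reasoning on a model that defaults it on
)
let body = try! JSONEncoder().encode(request)  // a nil `think` is simply omitted
```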
Output tokens/sec now measures visible throughput only for reasoning models. Previously, thinking time was charged against TTFT and thinking tokens were counted as output, inflating the numbers. Fixes #17 and #18.
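With made-up numbers, and assuming throughput is computed over the visible generation window, the fix changes the figure like this:

```swift
// Made-up numbers to show why the old accounting inflated throughput.
let thinkingTokens = 420.0, thinkingSeconds = 6.0   // hidden reasoning phase
let visibleTokens  = 180.0, visibleSeconds  = 3.0   // user-visible output

let inflated    = (thinkingTokens + visibleTokens) / (thinkingSeconds + visibleSeconds)
let visibleOnly = visibleTokens / visibleSeconds
print(inflated, visibleOnly)   // ~66.7 vs 60.0 tok/s
```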
Reasoning output is detected in any of the common shapes (`reasoning_content`, `reasoning`, or inline `<think>…</think>` tags) and surfaced wrapped in `<think>…</think>` markers in the response.

Anubis now benchmarks Apple's on-device Foundation Model alongside Ollama, MLX, and the rest - no server, no network, no setup. If your Mac supports Apple Intelligence (macOS 26+), it shows up in the backend menu automatically.
Select Apple Intelligence from the backend selector and run; Anubis talks directly to the on-device model via Apple's FoundationModels framework (a minimal sketch follows below).

Export the per-model performance summary directly from the Reports tab.
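Outside Anubis, talking to the same on-device model takes only a few lines. A minimal sketch, assuming the FoundationModels API Apple introduced at WWDC25 (`SystemLanguageModel`, `LanguageModelSession`):

```swift
import FoundationModels

// Minimal sketch - assumes macOS 26+ with Apple Intelligence enabled.
let model = SystemLanguageModel.default
if case .available = model.availability {
    let session = LanguageModelSession()
    do {
        let response = try await session.respond(to: "Summarize why the sky is blue.")
        print(response.content)
    } catch {
        print("Generation failed: \(error)")
    }
} else {
    print("On-device model unavailable on this Mac.")
}
```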
Push your Apple Silicon to its limits and observe power draw, thermal throttling, and frequency scaling under controlled load - all from within the Monitor.
- CPU stress: spawns `yes` processes per core. Choose All Cores, P-Cores only, E-Cores only, or Single Core.
- Memory bandwidth: uses `memcpy` to saturate the memory bus and reports measured bandwidth in GB/s, directly comparable to your chip's theoretical max (see the sketch after this list).
- Memory pressure: three pressure levels (Light 25% / Moderate 50% / Heavy 75% of free memory).
- Floating HUD: a compact, frameless, always-on-top overlay showing live system metrics - launchable from any tab via the sidebar or from the Monitor's Float button.
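For intuition, here is roughly what a `memcpy`-based bandwidth probe looks like - a self-contained sketch, not Anubis's actual implementation:

```swift
import Foundation

// Rough memcpy bandwidth probe: copy a large buffer repeatedly, report GB/s.
func measureMemcpyBandwidth(megabytes: Int = 512, iterations: Int = 20) -> Double {
    let bytes = megabytes * 1024 * 1024
    let src = UnsafeMutableRawPointer.allocate(byteCount: bytes, alignment: 1 << 14)
    let dst = UnsafeMutableRawPointer.allocate(byteCount: bytes, alignment: 1 << 14)
    defer { src.deallocate(); dst.deallocate() }
    memset(src, 1, bytes)   // fault pages in before timing

    let start = DispatchTime.now().uptimeNanoseconds
    for _ in 0..<iterations { memcpy(dst, src, bytes) }
    let seconds = Double(DispatchTime.now().uptimeNanoseconds - start) / 1e9

    // Counts bytes copied once per iteration; double it for read+write accounting.
    return Double(bytes) * Double(iterations) / seconds / 1e9
}

print(String(format: "%.1f GB/s", measureMemcpyBandwidth()))
```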
Five new built-in prompts covering causal reasoning, system design, dialogue writing, historical analysis, and constrained writing - bringing the total to 15 across five categories.
The local LLM ecosystem on macOS is fragmented.
Anubis fills that gap - all in a native macOS app.
Real-time performance dashboard for single-model testing.
Anubis strips a trailing `/v1` suffix from backend URLs to prevent double-pathing errors.

Side-by-side A/B model comparison with the same prompt.
Standalone real-time hardware monitoring dashboard - no benchmark required.
Upload your benchmark results to the community leaderboard and see how your Mac stacks up against other Apple Silicon machines.
Unified model management across all backends.
Scans `~/.lmstudio/models/` and `~/.cache/huggingface/hub/` for disk size, quantization, and path.

Anubis checks for updates automatically via Sparkle and notifies you when a new version is available.
GPU Core detail
Arena Mode
Settings (add connections with quick presets)
Vault - view model details, unload models, and pull models directly for Ollama
| Backend | Type | Default Port | Setup |
|---|---|---|---|
| Apple Intelligence | On-device (Foundation Models) | N/A | macOS 26+ with Apple Intelligence enabled. No setup; appears in the backend menu when supported. |
| Ollama | Native support | 11434 | Install from ollama.com - auto-detected on launch |
| LM Studio | OpenAI-compatible | 1234 | Enable local server in LM Studio settings |
| mlx-lm | OpenAI-compatible | 8080 | `pip install mlx-lm && mlx_lm.server --model <model>` |
| vLLM | OpenAI-compatible | 8000 | Add in Settings |
| LocalAI | OpenAI-compatible | 8080 | Add in Settings |
| Docker ModelRunner | OpenAI-compatible | User-selected | Add in Settings |
Any OpenAI-compatible server can be added through Settings > Add OpenAI-Compatible Server with a name, URL, and optional API key.
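Before adding a server, you can sanity-check that it speaks the protocol: `GET /v1/models` should return a JSON model list. A quick probe (URL and port are examples):

```swift
import Foundation

// Probe an OpenAI-compatible endpoint: /v1/models should return
// something like {"object":"list","data":[{"id": ...}, ...]}.
let url = URL(string: "http://localhost:1234/v1/models")!
do {
    let (data, response) = try await URLSession.shared.data(from: url)
    let status = (response as? HTTPURLResponse)?.statusCode ?? -1
    print("HTTP \(status): \(String(decoding: data, as: UTF8.self))")
} catch {
    print("Server not reachable: \(error)")
}
```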
Anubis captures Apple Silicon telemetry during inference via IOReport and system APIs:
| Metric | Source | Description |
|---|---|---|
| GPU Utilization | IOReport | GPU active residency percentage |
| CPU Utilization | `host_processor_info` | Usage across all cores |
| GPU Power | IOReport Energy Model | GPU power consumption in watts |
| CPU Power | IOReport Energy Model | CPU (E-cores + P-cores) power in watts |
| ANE Power | IOReport Energy Model | Neural Engine power consumption |
| DRAM Power | IOReport Energy Model | Memory subsystem power |
| GPU Frequency | IOReport GPU Stats | Weighted average from P-state residency |
| Process Memory | `proc_pid_rusage` | Backend process `phys_footprint` (includes Metal/GPU allocations) |
| Thermal State | `ProcessInfo.thermalState` | System thermal pressure level |
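Most of these sources require IOReport, a private framework, but the last row is plain public API. A minimal sketch of reading and observing thermal state:

```swift
import Foundation

// Read the current thermal pressure level (the table's last row).
switch ProcessInfo.processInfo.thermalState {
case .nominal:  print("nominal")
case .fair:     print("fair - mild pressure")
case .serious:  print("serious - throttling likely")
case .critical: print("critical")
@unknown default: print("unknown")
}

// Get notified when thermal pressure changes during a long run.
NotificationCenter.default.addObserver(
    forName: ProcessInfo.thermalStateDidChangeNotification,
    object: nil, queue: .main
) { _ in
    print("Thermal state changed: \(ProcessInfo.processInfo.thermalState)")
}
```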
Anubis automatically detects which process is serving your model:
- `lsof` finds the PID listening on the inference port (called once per benchmark start); see the sketch after this list
- Memory is read as `phys_footprint` (same as Activity Monitor), which includes Metal/GPU buffer allocations - critical for MLX and other GPU-accelerated backends

Metrics degrade gracefully - if IOReport access is unavailable (e.g., in a VM), Anubis still shows inference-derived metrics.
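A sketch of the port-to-PID step in the same spirit (assumed shape, not Anubis's literal code):

```swift
import Foundation

// Ask lsof for the PID listening on a TCP port (-t = PID-only output).
func pidListening(onPort port: Int) throws -> Int32? {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/sbin/lsof")
    process.arguments = ["-nP", "-t", "-iTCP:\(port)", "-sTCP:LISTEN"]
    let pipe = Pipe()
    process.standardOutput = pipe
    try process.run()
    process.waitUntilExit()
    let output = String(decoding: pipe.fileHandleForReading.readDataToEndOfFile(),
                        as: UTF8.self)
    return output.split(separator: "\n").first
        .flatMap { Int32($0.trimmingCharacters(in: .whitespaces)) }
}

// e.g. try pidListening(onPort: 11434) for Ollama
```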
```bash
# macOS - install Ollama
brew install ollama

# Start the server
ollama serve

# Pull a model
ollama pull llama3.2:3b
```
```bash
git clone https://github.com/uncSoft/anubis-oss.git
cd anubis-oss/anubis
open anubis.xcodeproj
```
In Xcode, build and run (Cmd+R). Anubis will auto-detect Ollama on launch. Other backends can be added in Settings.
After a benchmark completes, click the Upload button in the benchmark toolbar to submit your results to the community leaderboard. Enter a display name and your run will appear in the rankings - no account required. Only performance metrics and hardware info are submitted; response text is never uploaded.
```bash
# Clone
git clone https://github.com/uncSoft/anubis-oss.git
cd anubis-oss/anubis

# Build via command line
xcodebuild -scheme anubis-oss -configuration Debug build

# Run tests
xcodebuild -scheme anubis-oss -configuration Debug test

# Or just open in Xcode
open anubis.xcodeproj
```
Resolved automatically by Swift Package Manager on first build:
| Package | Purpose | License |
|---|---|---|
| GRDB.swift | SQLite database | MIT |
| Sparkle | Auto-update framework | MIT |
| Swift Charts | Data visualization | Apple |
Anubis follows MVVM with a layered service architecture:
```
┌────────────────────────────────────────────────────────────────────────────┐
│                          PRESENTATION LAYER                                │
│       BenchmarkView   ArenaView   MonitorView   VaultView   Settings       │
├────────────────────────────────────────────────────────────────────────────┤
│                             SERVICE LAYER                                  │
│         MetricsService   InferenceService   ModelService   Export          │
├────────────────────────────────────────────────────────────────────────────┤
│                           INTEGRATION LAYER                                │
│  OllamaClient   OpenAICompatibleClient   IOReportBridge   ProcessMonitor   │
├────────────────────────────────────────────────────────────────────────────┤
│                           PERSISTENCE LAYER                                │
│                      SQLite (GRDB)        File System                      │
└────────────────────────────────────────────────────────────────────────────┘
```
Views display data and delegate to ViewModels. ViewModels coordinate Services. Services are stateless and use async/await. Integrations are thin adapters wrapping external systems (Ollama API, IOReport, etc.).
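A hypothetical, pared-down version of that flow (all names here are stand-ins, not Anubis's actual types):

```swift
import SwiftUI

// ViewModel: observed by the View, delegates real work to a service.
@MainActor
final class BenchmarkViewModel: ObservableObject {
    @Published var tokensPerSecond: Double = 0

    func run() async {
        // Stateless async service call (stand-in for InferenceService).
        tokensPerSecond = await FakeInferenceService.measureThroughput()
    }
}

// Service: stateless, async/await, no UI knowledge.
enum FakeInferenceService {
    static func measureThroughput() async -> Double { 42.0 }
}

// View: displays data and delegates to the ViewModel.
struct BenchmarkView: View {
    @StateObject private var viewModel = BenchmarkViewModel()

    var body: some View {
        Text("\(viewModel.tokensPerSecond, specifier: "%.1f") tok/s")
            .task { await viewModel.run() }
    }
}
```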
```
anubis/
├── App/                # Entry point, app state, navigation
├── Features/
│   ├── Benchmark/      # Performance dashboard
│   ├── Arena/          # A/B model comparison
│   ├── Monitor/        # System monitor, stress tests, floating HUD
│   ├── Vault/          # Model management
│   └── Settings/       # Backend config, about, help, contact
├── Services/           # MetricsService, InferenceService, ExportService
├── Integrations/       # OllamaClient, OpenAICompatibleClient, IOReportBridge, ProcessMonitor
├── Models/             # Data models (BenchmarkSession, ModelInfo, etc.)
├── Database/           # GRDB setup & migrations
├── DesignSystem/       # Theme, colors, reusable components
├── Demo/               # Demo mode for App Store review
└── Utilities/          # Formatters, constants, logger, benchmark prompts
```
All inference backends implement a shared protocol, making it straightforward to add new ones:
```swift
protocol InferenceBackend {
    var id: String { get }               // stable identifier, e.g. "ollama"
    var displayName: String { get }      // name shown in the backend menu
    var isAvailable: Bool { get async }  // probed asynchronously
    func listModels() async throws -> [ModelInfo]
    func generate(prompt: String, parameters: GenerationParameters)
        -> AsyncThrowingStream<InferenceChunk, Error>  // streams tokens as they arrive
}
```
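To make the shape concrete, here is a toy conformance - a sketch with hypothetical stand-ins for `ModelInfo`, `GenerationParameters`, and `InferenceChunk` (the real definitions live in `Models/`):

```swift
import Foundation

// Hypothetical stand-ins for the real types in Models/ - illustration only.
struct ModelInfo { let id: String; let displayName: String }
struct GenerationParameters { var temperature: Double = 0.7; var maxTokens: Int = 256 }
struct InferenceChunk { let text: String; let isFinal: Bool }

// A toy backend that streams the prompt back word by word.
struct EchoBackend: InferenceBackend {
    let id = "echo"
    let displayName = "Echo (demo)"

    var isAvailable: Bool {
        get async { true }
    }

    func listModels() async throws -> [ModelInfo] {
        [ModelInfo(id: "echo-1", displayName: "Echo v1")]
    }

    func generate(prompt: String, parameters: GenerationParameters)
        -> AsyncThrowingStream<InferenceChunk, Error>
    {
        AsyncThrowingStream { continuation in
            for word in prompt.split(separator: " ") {
                continuation.yield(InferenceChunk(text: String(word) + " ", isFinal: false))
            }
            continuation.finish()
        }
    }
}
```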
All data is stored locally - nothing leaves your machine.
| Data | Location |
|---|---|
| Database | ~/Library/Application Support/Anubis/anubis.db |
| Exports | Generated on demand (CSV, Markdown) |
| Preferences | UserDefaults |
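Since the database is plain SQLite, you can inspect your own history outside the app using the same GRDB dependency. A read-only sketch (table names vary by schema version, so list them first):

```swift
import Foundation
import GRDB

// Open the app's database read-only and list its tables.
var config = Configuration()
config.readonly = true
let path = ("~/Library/Application Support/Anubis/anubis.db" as NSString)
    .expandingTildeInPath
let dbQueue = try DatabaseQueue(path: path, configuration: config)

try dbQueue.read { db in
    let tables = try String.fetchAll(
        db, sql: "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    print("Tables:", tables)
}
```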
```bash
# Make sure Ollama is running
ollama serve

# Verify it's accessible
curl http://localhost:11434/api/tags

# Pull the model if it isn't present locally
ollama pull <model-name>
```

Contributions are welcome. A few guidelines:
- Errors should surface `errorDescription` and `recoverySuggestion`
- New backends go in `Integrations/`, implementing `InferenceBackend`; register them in `InferenceService` and expose them in `Settings/`

If Anubis is useful to you, consider buying me a coffee on Ko-fi or sponsoring on GitHub. It helps fund continued development and new features.
A sandboxed, less feature-rich version is also available on the Mac App Store if you prefer a managed install.
GPL-3.0 License - see LICENSE for details.