
Anubis

macOS 15+ · Swift · License: GPL-3.0 · GitHub Release · Ko-fi

Local LLM Testing & Benchmarking for Apple Silicon · Community Leaderboard

🚨 Benchmark analysis is live! Over 375 community-submitted runs have been analyzed - check out the results in the Benchmark Report.


Anubis is a native macOS app for benchmarking, comparing, and managing local large language models through any OpenAI-compatible endpoint - Ollama, MLX, oMLX, LM Studio Server, OpenWebUI, Docker Models, and more. Built with SwiftUI for Apple Silicon, it correlates real-time hardware telemetry with inference performance and saves every run to history - something no CLI tool or chat wrapper offers. Export benchmark results directly instead of screenshotting, export the raw history data as Markdown (.md) or CSV, and even pull Ollama models (ollama pull) from inside the app.




What’s New

Ollama Thinking Toggle (New in 3.2)

Tri-state control over Ollama’s think request parameter, exposed in the Benchmark Performance disclosure when the Ollama backend is selected.

The choice persists across launches.
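
In practice the toggle maps to Ollama's top-level think field on the request body: set it explicitly, or omit it to leave the decision to the model's default. A minimal sketch, assuming a hypothetical OllamaGenerateRequest type (Anubis's actual OllamaClient models may differ):

struct OllamaGenerateRequest: Encodable {
    let model: String
    let prompt: String
    let stream: Bool
    // Tri-state: true / false / nil. When nil, JSONEncoder omits the field
    // entirely, leaving thinking behavior up to the model's default.
    let think: Bool?
}

let request = OllamaGenerateRequest(
    model: "qwen3:8b",              // example thinking-capable model
    prompt: "Why is the sky blue?",
    stream: true,
    think: nil                      // the "default" position of the toggle
)
let body = try? JSONEncoder().encode(request)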

Reasoning-Aware Metrics & Prefill Speed (New in 3.1)

Output tokens/sec now measures visible throughput only for reasoning models. Previously, thinking time was charged against TTFT and thinking tokens were counted as output, inflating the numbers. Fixes #17 and #18.
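
One way to picture the change (illustrative types, not Anubis's internal model): visible throughput is computed only over the tokens the user actually sees, measured from the first visible token onward.

import Foundation

struct RunTimings {
    let visibleTokens: Int          // thinking tokens are excluded from this count
    let firstVisibleTokenAt: Date   // any thinking phase happens before this point
    let completedAt: Date
}

extension RunTimings {
    // Visible throughput: user-facing tokens over the time spent emitting them,
    // so a long thinking phase no longer inflates tokens/sec.
    var visibleTokensPerSecond: Double {
        let generationTime = completedAt.timeIntervalSince(firstVisibleTokenAt)
        return generationTime > 0 ? Double(visibleTokens) / generationTime : 0
    }
}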

Apple Intelligence Backend (New in 3.0) 🍎

Anubis now benchmarks Apple’s on-device Foundation Model alongside Ollama, MLX, and the rest - no server, no network, no setup. If your Mac supports Apple Intelligence (macOS 26+), it shows up in the backend menu automatically.
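
Under the hood this backend talks to Apple's FoundationModels framework. The sketch below shows roughly what a single on-device generation looks like; runAppleIntelligencePrompt is a hypothetical helper, and Anubis's real wrapper streams tokens and records timing, so treat this as an illustration only.

import FoundationModels   // macOS 26+

func runAppleIntelligencePrompt(_ prompt: String) async throws -> String {
    // Bail out early when the on-device model is not usable on this Mac
    // (Apple Intelligence disabled, unsupported hardware, model not downloaded).
    guard case .available = SystemLanguageModel.default.availability else {
        throw CocoaError(.featureUnsupported)
    }
    // One-shot generation against the on-device Foundation Model.
    let session = LanguageModelSession()
    let response = try await session.respond(to: prompt)
    return response.content
}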

Reports Tab - Export (New in 3.0)

Export the per-model performance summary directly from the Reports tab.

Denser Benchmark Dashboard (New in 3.0)

Hardware Stress Testing (New in 2.9)

Push your Apple Silicon to its limits and observe power draw, thermal throttling, and frequency scaling under controlled load - all from within the Monitor.

Floating Monitor HUD (New in 2.9)

A compact, frameless, always-on-top overlay showing live system metrics - launchable from any tab via the sidebar or from the Monitor’s Float button.

15 Benchmark Prompts (New in 2.9)

Five new built-in prompts covering causal reasoning, system design, dialogue writing, historical analysis, and constrained writing - bringing the total to 15 across five categories.


Why Anubis?

The local LLM ecosystem on macOS is fragmented: benchmarking, monitoring, and model management are scattered across separate CLI tools and chat wrappers.

Anubis fills that gap - all in a native macOS app.


Leaderboard Submissions Now Available! Submit directly through the app

The dataset is robust and open source - check it out here, and please contribute!

Features

Benchmark

Real-time performance dashboard for single-model testing.

Arena

Side-by-side A/B model comparison with the same prompt.

System Monitor

Standalone real-time hardware monitoring dashboard - no benchmark required.

Leaderboard

Upload your benchmark results to the community leaderboard and see how your Mac stacks up against other Apple Silicon machines.

Vault

Unified model management across all backends.

Auto-Update

Anubis checks for updates automatically via Sparkle and notifies you when a new version is available.


Screenshots

GPU Core detail

Arena Mode

Settings (add connections with quick presets)

Vault - View model details, unload, and pull models directly for Ollama


Supported Backends

Backend Type Default Port Setup
Apple Intelligence On-device (Foundation Models) n/a macOS 26+ with Apple Intelligence enabled. No setup; appears in the backend menu when supported.
Ollama Native support 11434 Install from ollama.com - auto-detected on launch
LM Studio OpenAI-compatible 1234 Enable local server in LM Studio settings
mlx-lm OpenAI-compatible 8080 pip install mlx-lm && mlx_lm.server --model <model>
vLLM OpenAI-compatible 8000 Add in Settings
LocalAI OpenAI-compatible 8080 Add in Settings
Docker ModelRunner OpenAI-compatible user selected Add in Settings

Any OpenAI-compatible server can be added through Settings > Add OpenAI-Compatible Server with a name, URL, and optional API key.


Hardware Metrics

Anubis captures Apple Silicon telemetry during inference via IOReport and system APIs:

Metric Source Description
GPU Utilization IOReport GPU active residency percentage
CPU Utilization host_processor_info Usage across all cores
GPU Power IOReport Energy Model GPU power consumption in watts
CPU Power IOReport Energy Model CPU (E-cores + P-cores) power in watts
ANE Power IOReport Energy Model Neural Engine power consumption
DRAM Power IOReport Energy Model Memory subsystem power
GPU Frequency IOReport GPU Stats Weighted average from P-state residency
Process Memory proc_pid_rusage Backend process phys_footprint (includes Metal/GPU allocations)
Thermal State ProcessInfo.thermalState System thermal pressure level
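
Two of the simpler probes can be reproduced in a few lines. The sketch below reads the system thermal state and a process's physical footprint; physFootprintBytes is a hypothetical helper, and the real MetricsService/ProcessMonitor add error handling, sampling, and the IOReport work on top.

import Darwin
import Foundation

// System thermal pressure level: .nominal, .fair, .serious, or .critical.
let thermal = ProcessInfo.processInfo.thermalState

// Physical footprint (bytes) of a backend process, including Metal/GPU allocations.
func physFootprintBytes(of pid: pid_t) -> UInt64? {
    var info = rusage_info_v4()
    let status = withUnsafeMutablePointer(to: &info) { ptr in
        ptr.withMemoryRebound(to: rusage_info_t?.self, capacity: 1) {
            // 4 == RUSAGE_INFO_V4 in <sys/resource.h>; v4 carries ri_phys_footprint.
            proc_pid_rusage(pid, 4, $0)
        }
    }
    return status == 0 ? info.ri_phys_footprint : nil
}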

Process Monitoring

Anubis automatically detects which process is serving your model.

Metrics degrade gracefully - if IOReport access is unavailable (e.g., in a VM), Anubis still shows inference-derived metrics.


Requirements

  macOS 15 or later on an Apple Silicon Mac
  Xcode, to build and run from source
  At least one inference backend (see Supported Backends above)

Getting Started

1. Install Ollama (or another backend)

# macOS - install Ollama
brew install ollama

# Start the server
ollama serve

# Pull a model
ollama pull llama3.2:3b

2. Build & Run Anubis

git clone https://github.com/uncSoft/anubis-oss.git
cd anubis-oss/anubis
open anubis.xcodeproj

In Xcode:

  1. Set your development team in Signing & Capabilities
  2. Build and run (Cmd+R)

Anubis will auto-detect Ollama on launch. Other backends can be added in Settings.
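
Auto-detection amounts to probing Ollama's default endpoint. Roughly (a sketch, not Anubis's exact client code; ollamaIsRunning is a hypothetical helper):

import Foundation

// Ollama counts as "connected" if its tags endpoint answers on the default port.
func ollamaIsRunning() async -> Bool {
    guard let url = URL(string: "http://localhost:11434/api/tags") else { return false }
    do {
        let (_, response) = try await URLSession.shared.data(from: url)
        return (response as? HTTPURLResponse)?.statusCode == 200
    } catch {
        return false
    }
}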

3. Run Your First Benchmark

  1. Select a model from the dropdown
  2. Type a prompt or pick one from Presets
  3. Click Run
  4. Watch the metrics light up in real time

4. Submit to the Leaderboard

After a benchmark completes, click the Upload button in the benchmark toolbar to submit your results to the community leaderboard. Enter a display name and your run will appear in the rankings - no account required. Only performance metrics and hardware info are submitted; response text is never uploaded.


Building from Source

# Clone
git clone https://github.com/uncSoft/anubis-oss.git
cd anubis-oss/anubis

# Build via command line
xcodebuild -scheme anubis-oss -configuration Debug build

# Run tests
xcodebuild -scheme anubis-oss -configuration Debug test

# Or just open in Xcode
open anubis.xcodeproj

Dependencies

Resolved automatically by Swift Package Manager on first build:

Package Purpose License
GRDB.swift SQLite database MIT
Sparkle Auto-update framework MIT
Swift Charts Data visualization Apple

Architecture

Anubis follows MVVM with a layered service architecture:

┌────────────────────────────────────────────────────────────────────────┐
│                           PRESENTATION LAYER                            │
│  BenchmarkView  ArenaView  MonitorView  VaultView  Settings             │
├────────────────────────────────────────────────────────────────────────┤
│                             SERVICE LAYER                               │
│   MetricsService   InferenceService   ModelService   Export             │
├────────────────────────────────────────────────────────────────────────┤
│                           INTEGRATION LAYER                             │
│  OllamaClient  OpenAICompatibleClient  IOReportBridge  ProcessMonitor   │
├────────────────────────────────────────────────────────────────────────┤
│                           PERSISTENCE LAYER                             │
│   SQLite (GRDB)              File System                                │
└────────────────────────────────────────────────────────────────────────┘

Views display data and delegate to ViewModels. ViewModels coordinate Services. Services are stateless and use async/await. Integrations are thin adapters wrapping external systems (Ollama API, IOReport, etc.).
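
For illustration, the layering usually reduces to something like the sketch below; TokenStreaming, BenchmarkViewModel, and run are stand-in names, not Anubis's actual types.

import Observation

// Stand-in for a stateless service that streams tokens via async/await.
protocol TokenStreaming {
    func generate(model: String, prompt: String) -> AsyncThrowingStream<String, Error>
}

@MainActor
@Observable
final class BenchmarkViewModel {
    private let inference: any TokenStreaming   // injected service
    var output = ""
    var isRunning = false

    init(inference: any TokenStreaming) { self.inference = inference }

    // The ViewModel coordinates the service; the View only renders published state.
    func run(model: String, prompt: String) async {
        isRunning = true
        defer { isRunning = false }
        do {
            for try await token in inference.generate(model: model, prompt: prompt) {
                output += token
            }
        } catch {
            output = "Error: \(error.localizedDescription)"
        }
    }
}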

Project Structure

anubis/
├── App/                    # Entry point, app state, navigation
├── Features/
│   ├── Benchmark/          # Performance dashboard
│   ├── Arena/              # A/B model comparison
│   ├── Monitor/            # System monitor, stress tests, floating HUD
│   ├── Vault/              # Model management
│   └── Settings/           # Backend config, about, help, contact
├── Services/               # MetricsService, InferenceService, ExportService
├── Integrations/           # OllamaClient, OpenAICompatibleClient, IOReportBridge, ProcessMonitor
├── Models/                 # Data models (BenchmarkSession, ModelInfo, etc.)
├── Database/               # GRDB setup & migrations
├── DesignSystem/           # Theme, colors, reusable components
├── Demo/                   # Demo mode for App Store review
└── Utilities/              # Formatters, constants, logger, benchmark prompts

Backend Abstraction

All inference backends implement a shared protocol, making it straightforward to add new ones:

protocol InferenceBackend {
    var id: String { get }
    var displayName: String { get }
    var isAvailable: Bool { get async }

    func listModels() async throws -> [ModelInfo]
    func generate(prompt: String, parameters: GenerationParameters)
        -> AsyncThrowingStream<InferenceChunk, Error>
}
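
A minimal conforming type then looks like the sketch below. EchoBackend is a hypothetical demo backend; ModelInfo, GenerationParameters, and InferenceChunk are the app's existing types, so the InferenceChunk initializer here is assumed for illustration.

// Hypothetical demo backend that just echoes the prompt back.
struct EchoBackend: InferenceBackend {
    let id = "echo"
    let displayName = "Echo (demo)"

    var isAvailable: Bool {
        get async { true }
    }

    func listModels() async throws -> [ModelInfo] {
        []   // a real backend would query its server here
    }

    func generate(prompt: String, parameters: GenerationParameters)
        -> AsyncThrowingStream<InferenceChunk, Error> {
        AsyncThrowingStream { continuation in
            // Stream the prompt back one "token" at a time, then finish.
            for word in prompt.split(separator: " ") {
                continuation.yield(InferenceChunk(text: String(word) + " "))   // assumed initializer
            }
            continuation.finish()
        }
    }
}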

Data Storage

All data is stored locally - nothing leaves your machine.

Data Location
Database ~/Library/Application Support/Anubis/anubis.db
Exports Generated on demand (CSV, Markdown)
Preferences UserDefaults

Troubleshooting

Ollama shows β€œDisconnected”

# Make sure Ollama is running
ollama serve

# Verify it's accessible
curl http://localhost:11434/api/tags

No GPU metrics

High memory usage

Model not appearing


Contributing

Contributions are welcome. A few guidelines:

  1. Follow the existing patterns - MVVM, async/await, guard-let over force-unwrap
  2. Keep files under 300 lines - split if larger
  3. One feature per PR - small, focused changes are easier to review
  4. Test services and integrations - views are harder to unit test, but services should have coverage
  5. Handle errors gracefully - always provide errorDescription and recoverySuggestion (see the example below)
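
For point 5: errorDescription and recoverySuggestion come from Foundation's LocalizedError protocol. A minimal conforming error might look like this (BackendError is a hypothetical name, not necessarily what Anubis uses):

import Foundation

enum BackendError: LocalizedError {
    case serverUnreachable(name: String)

    // Shown as the alert's main message.
    var errorDescription: String? {
        switch self {
        case .serverUnreachable(let name):
            return "Could not reach \(name)."
        }
    }

    // Shown as the actionable follow-up.
    var recoverySuggestion: String? {
        "Check that the server is running and that its URL in Settings is correct."
    }
}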

Adding a New Backend

  1. Create a new file in Integrations/ implementing InferenceBackend
  2. Register it in InferenceService
  3. Add configuration UI in Settings/
  4. That’s it - the rest of the app works through the protocol

Support the Project

If Anubis is useful to you, consider buying me a coffee on Ko-fi or sponsoring on GitHub. It helps fund continued development and new features.

A sandboxed, less feature-rich version is also available on the Mac App Store if you prefer a managed install.


License

GPL-3.0 License - see LICENSE for details.

Other projects: DevPad · Nabu