Architecture

Learn Steno's two-layer design and core dictation pipeline.

Two-Layer Design

Steno follows a strict two-layer structure:

  • StenoKit/
    • pure Swift package
    • protocols, models, services, actors
    • no UI dependencies
  • Steno/
    • SwiftUI app target
    • DictationController, views, onboarding, settings UI, menu bar wiring

Because StenoKit has no UI dependencies, its business logic can be tested without launching the app and reused outside the app target.

Component hierarchy:

Steno (SwiftUI app)
  -> DictationController
    -> SessionCoordinator (actor)
      -> AudioCaptureService
      -> TranscriptionEngine
      -> CleanupEngine
      -> InsertionService
      -> HistoryStore
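
The hierarchy above can be illustrated with a minimal sketch. It assumes, hypothetically, that each dependency of SessionCoordinator is a protocol passed in through the initializer. The method names (transcribe(audioPath:), append(_:), finishSession(audioPath:)) and the stub types are invented for illustration and are not taken from StenoKit.

```swift
// Hypothetical sketch: type names echo the hierarchy above, but every
// method signature here is an assumption, not StenoKit's actual API.

// StenoKit layer: plain Swift, no UI framework imported.
protocol TranscriptionEngine: Sendable {
    func transcribe(audioPath: String) async throws -> String
}

protocol HistoryStore: Sendable {
    func append(_ text: String) async
}

actor SessionCoordinator {
    private let transcriber: any TranscriptionEngine
    private let history: any HistoryStore

    // Dependencies are injected, so tests can swap in stubs.
    init(transcriber: any TranscriptionEngine, history: any HistoryStore) {
        self.transcriber = transcriber
        self.history = history
    }

    // Transcribes the recorded audio, stores the result, and returns it.
    func finishSession(audioPath: String) async throws -> String {
        let text = try await transcriber.transcribe(audioPath: audioPath)
        await history.append(text)
        return text
    }
}

// Test doubles (hypothetical): usable because nothing above depends on UI.
struct StubTranscriber: TranscriptionEngine {
    func transcribe(audioPath: String) async throws -> String { "hello world" }
}

actor InMemoryHistory: HistoryStore {
    private(set) var entries: [String] = []
    func append(_ text: String) { entries.append(text) }
}
```

Since the sketch imports no UI framework, the coordinator can be driven entirely by stubs in an ordinary unit test, which is the kind of testability the two-layer split is meant to provide.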

Pipeline

High-level runtime flow:

  1. Hotkey event triggers start/stop transitions.
  2. Audio capture service records session audio.
  3. Whisper CLI engine transcribes to raw text.
  4. Snippets and style/lexicon context are applied.
  5. Cleanup engine runs (cloud or local fallback).
  6. Insertion service writes text into target app.
  7. Transcript entry is persisted for history.

Fallback behavior is explicit at multiple stages (cleanup and insertion), so Steno stays usable when a cloud cleanup call or an insertion attempt fails.
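
A minimal sketch of the fallback idea for the cleanup stage follows. It is one possible shape, not Steno's implementation: the cleanUp(_:) signature, the CleanupResult type, the two engine structs, and the choice to return the raw transcript as a last resort are all assumptions made for illustration.

```swift
import Foundation

// Hypothetical sketch of "try the primary engine, fall back on failure".
// All names and signatures below are assumptions for illustration.

struct CleanupUnavailable: Error {}

protocol CleanupEngine {
    func cleanUp(_ rawText: String) throws -> String
}

/// Stands in for a cloud engine that is currently unreachable.
struct OfflineCloudCleanup: CleanupEngine {
    func cleanUp(_ rawText: String) throws -> String {
        throw CleanupUnavailable()
    }
}

/// A trivial local engine: trims whitespace and capitalizes the first letter.
struct LocalCleanup: CleanupEngine {
    func cleanUp(_ rawText: String) throws -> String {
        let trimmed = rawText.trimmingCharacters(in: .whitespacesAndNewlines)
        return trimmed.prefix(1).uppercased() + String(trimmed.dropFirst())
    }
}

/// Records which path produced the text, so the fallback stays explicit.
struct CleanupResult: Equatable {
    let text: String
    let usedFallback: Bool
}

func cleanUpWithFallback(
    _ raw: String,
    primary: any CleanupEngine,
    fallback: any CleanupEngine
) -> CleanupResult {
    if let cleaned = try? primary.cleanUp(raw) {
        return CleanupResult(text: cleaned, usedFallback: false)
    }
    if let cleaned = try? fallback.cleanUp(raw) {
        return CleanupResult(text: cleaned, usedFallback: true)
    }
    // Assumed last resort: pass the raw transcript through unchanged
    // so the session still yields some text.
    return CleanupResult(text: raw, usedFallback: true)
}
```

Returning a result that carries usedFallback, rather than a bare string, keeps the degraded path visible to callers instead of hiding it.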

Concurrency

Steno follows Swift 6 strict concurrency principles.

  • DictationController is @MainActor for UI orchestration
  • SessionCoordinator is an actor for session isolation
  • domain models are value types and Sendable
  • no singleton-heavy global mutable state in core paths

This reduces race conditions around recording state, cleanup decisions, and history persistence.
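
The isolation rules above might look roughly like this in code. The sketch reuses the type names DictationController and SessionCoordinator from the list, but every member (start(), stop(withText:), statusText, and so on) and the TranscriptEntry model are hypothetical and chosen only to show the pattern.

```swift
// Hypothetical sketch; members and the TranscriptEntry model are assumptions.

/// Value-type domain model: safe to pass across actor boundaries.
struct TranscriptEntry: Sendable, Equatable {
    let text: String
}

/// Owns session state; the actor serializes all access to it.
actor SessionCoordinator {
    private var isRecording = false
    private var entries: [TranscriptEntry] = []

    /// Returns false if a session is already running.
    func start() -> Bool {
        guard !isRecording else { return false }
        isRecording = true
        return true
    }

    /// Ends the session and records an entry; returns nil if none was running.
    func stop(withText text: String) -> TranscriptEntry? {
        guard isRecording else { return nil }
        isRecording = false
        let entry = TranscriptEntry(text: text)
        entries.append(entry)
        return entry
    }

    func history() -> [TranscriptEntry] { entries }
}

/// UI-facing orchestration stays on the main actor.
@MainActor
final class DictationController {
    private let coordinator = SessionCoordinator()
    private(set) var statusText = "Idle"

    func startPressed() async {
        let started = await coordinator.start()
        statusText = started ? "Recording" : "Already recording"
    }

    func stopPressed(text: String) async {
        if let entry = await coordinator.stop(withText: text) {
            statusText = "Saved: \(entry.text)"
        }
    }
}
```

Because isRecording can only be read or written inside the actor, two overlapping start requests cannot both succeed, which is the kind of recording-state race the design aims to rule out.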

Extension Points

Common extension areas for contributors:

  • new cleanup engines conforming to CleanupEngine
  • new insertion transports conforming to InsertionTransport
  • alternative context providers or app-specific behaviors
  • benchmark and validation tooling in StenoBenchmarkCore

When adding behavior, define the protocol first in StenoKit/Protocols, then add the concrete implementation in StenoKit/Services.
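
As a rough illustration of the protocol-first approach, the sketch below adds a new cleanup engine. The cleanUp(_:) requirement is an assumed signature (the real CleanupEngine protocol may differ), and FillerWordCleanupEngine is a hypothetical example type, not part of StenoKit.

```swift
import Foundation

// Would live under StenoKit/Protocols: the interface comes first.
// Assumed signature; the real protocol may differ.
protocol CleanupEngine: Sendable {
    func cleanUp(_ rawText: String) throws -> String
}

// Would live under StenoKit/Services: a concrete implementation.
// Hypothetical engine that drops common filler words.
struct FillerWordCleanupEngine: CleanupEngine {
    let fillers: Set<String> = ["um", "uh", "erm"]

    func cleanUp(_ rawText: String) throws -> String {
        rawText
            .split(separator: " ")
            .filter { word in
                // Compare case-insensitively and ignore attached punctuation.
                let bare = word.lowercased()
                    .trimmingCharacters(in: .punctuationCharacters)
                return !fillers.contains(bare)
            }
            .joined(separator: " ")
    }
}
```

Code that depends only on the protocol can then accept the new engine without modification.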