A floating AI orb for macOS — voice-first, privacy-focused, built entirely in Swift + Python. Runs a local LLM on-device via Apple MLX, cascades to cloud APIs only when needed, plays chess, plays Arkanoid autonomously via reinforcement learning, and learns your preferences over time.
George uses a two-tier intelligence model: a local on-device LLM running on Apple Silicon GPU via mlx_lm for fast, private answers, and a cascade of cloud APIs for questions requiring real-time data.
| File | Role |
|---|---|
| `GeorgeController.swift` | Main Swift controller — mic, speech recognition, screen capture, TTS, all AI calls |
| `server.py` | Python HTTP server on `localhost:8765` — LLM routing, local inference, cloud API calls |
| `Agent.swift` | Reinforcement learning layer — learns preferences and improves routing decisions over time |
| `Memory.swift` | Persistent memory of facts, conversation history, and user profile |
Step 1 — Fast Pattern Matching — Before invoking any model, server.py runs the query through compiled regex patterns. Zero tokens, zero API calls.
```python
# server.py — fast_route()
def fast_route(user_input, has_cloud):
    if PERSONAL_LOCAL_PATTERNS.search(user_input): return 'LOCAL'
    if not has_cloud: return 'LOCAL'
    if FORCE_CLOUD_PATTERNS.search(user_input): return 'CLOUD'
    if CLOUD_PATTERNS.search(user_input): return 'CLOUD'
    if LOCAL_PATTERNS.search(user_input): return 'LOCAL'
    return None  # ambiguous — go to Step 2
```
| Pattern | Triggers |
|---|---|
| `PERSONAL_LOCAL` | Questions about you, your family, health, schedule — always local, never sent to cloud |
| `FORCE_CLOUD` | "use the internet", "search online", "google it" — bypasses the local model entirely |
| `CLOUD_PATTERNS` | News, scores, stock prices, current events, today's weather |
| `LOCAL_PATTERNS` | Definitions, jokes, stories, math, general knowledge — anything timeless |
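The pattern tables above can be sketched as compiled regexes. These are illustrative examples only — the real pattern lists in `server.py` are much longer:

```python
import re

# Illustrative subset of George's routing patterns (not the full production set).
PERSONAL_LOCAL_PATTERNS = re.compile(r"\bmy (wife|husband|kids?|doctor|health|schedule)\b", re.I)
FORCE_CLOUD_PATTERNS = re.compile(r"\b(use the internet|search online|google it)\b", re.I)
CLOUD_PATTERNS = re.compile(r"\b(news|scores?|stock price|current events|today'?s weather)\b", re.I)
LOCAL_PATTERNS = re.compile(r"\b(define|tell me a joke|story|what is \d)\b", re.I)
```

Compiling once at import time is what makes Step 1 effectively free — each route decision is just a handful of regex scans.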
Step 2 — LLM Router — If pattern matching returns `None`, George uses its own local LLM to classify the query: a 4-token inference call at `temperature=0` that returns only the word `LOCAL` or `CLOUD`.
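Since small models occasionally return more than the requested single word, the router's reply needs strict parsing. A minimal sketch of that parsing step — the prompt wording and the privacy-preserving default to `LOCAL` are assumptions, not necessarily `server.py`'s exact logic:

```python
# Hypothetical router prompt — the real wording in server.py may differ.
ROUTER_PROMPT = (
    "Classify this query. Reply with exactly one word, LOCAL or CLOUD.\n"
    "Query: {query}\nAnswer:"
)

def parse_route(raw: str) -> str:
    """Normalise the model's short reply; default to LOCAL when it's unclear."""
    words = raw.strip().upper().split()
    word = words[0] if words else ""
    return word if word in ("LOCAL", "CLOUD") else "LOCAL"
```

Defaulting ambiguous replies to `LOCAL` errs on the side of privacy: a misroute costs answer freshness, never data exposure.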
Step 3 — Cloud Cascade — When a query needs the internet, George tries providers in order:
| Priority | Provider | Notes |
|---|---|---|
| 0 | wttr.in | Weather queries bypass all LLMs — free, no API key |
| 1 | OpenRouter | Tries gpt-4o-mini, gemma-3, llama-3 in order |
| 2 | OpenAI | Direct fallback if OpenRouter fails |
| 3 | Gemini | Google Gemini 2.0 Flash as final fallback |
| 4 | Local fallback | Answers locally, with a caveat, if every cloud provider fails |
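The cascade reduces to trying an ordered list of provider callables until one succeeds. A sketch of that control flow — the provider functions here are stand-ins, and `answer_locally` is a stub for the on-device model:

```python
def answer_locally(query):
    # Stand-in for the on-device MLX model.
    return f"Best local guess for: {query}"

def cascade(query, providers):
    """Try each (name, fn) provider in priority order.

    `providers` is a list of (name, callable) pairs, e.g. OpenRouter, then
    OpenAI, then Gemini. Each callable returns an answer string or raises.
    Falls back to a caveated local answer if every provider fails.
    """
    for name, ask in providers:
        try:
            return ask(query), name
        except Exception:
            continue  # provider down, rate-limited, or misconfigured — try the next
    return answer_locally(query) + " (Note: I couldn't reach the internet.)", "local"
```

Catching broadly here is deliberate: any provider failure, for any reason, should degrade gracefully rather than surface an error to a voice user.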
{"sentence": "Here is the first sentence.", "source": "local"}
{"sentence": "And here is the second.", "source": "cloud"}
{"done": true, "full": "Complete text", "source": "cloud"}json
George can look at your screen on demand. Say "Hey George, what's on my screen?" and George takes a screenshot, encodes it, and sends it to a cloud vision model.
Trigger phrases: "what's on my screen" · "describe my screen" · "what do you see" · "look at my screen" · "read the screen" · "summarize this page"
```swift
// GeorgeController.swift — handleScreenVision()
let path = "/tmp/george_screen.png"
_ = await shell("screencapture -x " + path)
```

The captured PNG is base64-encoded into an OpenAI-style vision payload:

```json
{ "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
```
George integrates with macOS hardware through official Apple APIs and standard Unix tools — no kernel extensions, no custom drivers.
| Framework | Role |
|---|---|
| `AVAudioEngine` | Captures raw audio at 44.1 kHz and streams buffers to the speech recognizer |
| `SFSpeechRecognizer` | On-device speech recognition; a second instance listens for "stop" / "shut up" while George is speaking |
The local LLM runs on Apple Silicon GPU (M1/M2/M3/M4) using Apple's MLX framework via mlx_lm. Loaded into GPU memory at startup.
| Data | Command |
|---|---|
| CPU usage | `ps -A -o %cpu \| awk` |
| RAM total / free | `sysctl -n hw.memsize` / `vm_stat` |
| Battery | `pmset -g batt` |
| Disk usage | `df -h /` |
| Network interface | `route get default` |
| Fan / CPU temp | `istats fan speed` / `istats cpu temp` (optional) |
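Turning those command outputs into numbers is simple string work. For example, summing total CPU from `ps -A -o %cpu` (a sketch; the parsing function and sample output are illustrative, not George's exact code):

```python
def total_cpu(ps_output: str) -> float:
    """Sum the %CPU column from `ps -A -o %cpu` output (header on line 1)."""
    lines = ps_output.strip().splitlines()[1:]  # skip the "%CPU" header row
    return sum(float(v) for v in lines if v.strip())
```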
George's memory system is what makes it feel like a real companion rather than a stateless chatbot. Everything lives in ~/.george/memory.json — never sent to the cloud.
{ "persona": { "name": "George", "traits": ["curious","warm","witty","direct","honest"] },
"history": [ /* last 80 conversation turns */ ],
"facts": [ /* up to 300 plain-English facts about you */ ] }json
Every user message is scanned by a regex-based fact extractor before it reaches the AI. Categories detected:
Name · Spouse/Partner · Children · Extended family · Job · Location · Age · Pets · Hobbies (21 types) · Music genres (27 types)
If you say "correction", "remember I...", "actually I...", or "I don't listen to X", George removes the conflicting fact and stores the corrected version.
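Correction handling can be sketched as: detect a correction phrase, drop stored facts that mention the same subject, then store the corrected version. The regex and the substring-matching rule below are illustrative, not `server.py`'s exact logic:

```python
import re

# Illustrative correction triggers, mirroring the phrases listed above.
CORRECTION = re.compile(r"\b(correction|remember i|actually i|i don'?t listen to)\b", re.I)

def apply_correction(facts, user_msg, new_fact, subject):
    """If user_msg is a correction, remove facts mentioning `subject`, then add new_fact."""
    if not CORRECTION.search(user_msg):
        return facts
    kept = [f for f in facts if subject.lower() not in f.lower()]
    return kept + [new_fact]
```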
| Rule |
|---|
| Voice-only output — no markdown, no asterisks, no bullet points |
| Keep replies to 1–3 sentences unless asked for detail |
| Naturally weave in past facts when they enrich the answer |
| Never invent facts, news, weather, or current events without real data |
| Never mention Claude, Anthropic, LLaMA, or that you're an AI — you ARE George |
The floating orb is a real-time status display built in pure SwiftUI with 11 distinct visual styles.
| State | Visual Behavior |
|---|---|
| `.booting` | Scale pulls in (0.92×); three animated dots cycle below, showing loading progress |
| `.idle` | Slow breathing — scale oscillates 1.0 ↔ 1.07 over 3 seconds |
| `.listening` | Orb shrinks to 82%; a ripple ring expands and fades; live transcription displayed |
| `.thinking` | Subtle nudge to 1.02×; 6 particles orbit the perimeter |
| `.speaking` | Rapid pulse between 1.0 and 1.26 scale — fast heartbeat effect |
George includes a complete, self-contained chess implementation in Swift with no external libraries — full rules, minimax search, alpha-beta pruning, and natural-language move commentary.
Pieces encoded as raw integers 0–12, stored as a flat 64-element array. ChessBoard is a Swift struct (value type) — every speculative move creates an independent copy.
```swift
struct ChessBoard {
    var squares: [ChessPiece] = Array(repeating: .empty, count: 64)
    var whiteKingsideCastle = true
    var whiteQueensideCastle = true
    var blackKingsideCastle = true
    var blackQueensideCastle = true
    var enPassantSquare: Int = -1
    var whiteToMove = true
}
```
| Piece | Value | Reasoning |
|---|---|---|
| Pawn | 100 | Baseline unit |
| Knight | 320 | Slightly less than bishop |
| Bishop | 330 | Bishop pair advantage |
| Rook | 500 | Worth ~5 pawns |
| Queen | 900 | Worth ~9 pawns |
| King | 20,000 | Sentinel — game ends before capture |
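The table translates directly into material evaluation over the flat 64-square array. A Python sketch of the idea using a character-per-square board (the real engine is Swift, uses the integer encoding described above, and also scores position, not just material):

```python
# Centipawn values from the table above.
VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900, "K": 20_000}

def material(squares):
    """squares: 64 chars, uppercase = white, lowercase = black, '.' = empty.

    Returns the material balance in centipawns; positive favours white.
    """
    score = 0
    for sq in squares:
        piece = sq.upper()
        if piece in VALUES:
            score += VALUES[piece] if sq.isupper() else -VALUES[piece]
    return score
```

On the starting position the two sides cancel exactly, which is a handy sanity check for any evaluation function.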
Searches 4 moves deep. Without pruning: ~810,000 positions (30⁴). Alpha-beta reduces the effective branching factor from ~30 to ~7 — responses under one second on Apple Silicon.
```swift
// Alpha-beta search (abridged)
private static func alphabeta(_ board: ChessBoard, depth: Int,
                              alpha: Int, beta: Int, maximising: Bool) -> Int {
    if depth == 0 { return board.evaluate() }
    var alpha = alpha, beta = beta
    for move in board.legalMoves() {
        // ... recurse on a copied board, tightening alpha (white) or beta (black) ...
        if beta <= alpha { break } // beta cutoff — prune remaining branches
    }
    // ...
}
```
George can play Arkanoid autonomously using a Q-learning agent. It watches the game screen via screenshot, infers state, chooses actions, and sends real keyboard inputs to control the paddle.
```python
# Bellman update — GamePlayer.swift updateQ() (pseudocode)
best_next = max(qTable[nextState][a] for a in actions)
updated = current + α * (reward + γ * best_next - current)
qTable[state][action] = updated
```
| Hyperparameter | Value | Reasoning |
|---|---|---|
| Learning rate (α) | 0.25 | Conservative for noisy screen-based state |
| Discount factor (γ) | 0.95 | Longer horizon — brick hits are delayed by many frames |
| Epsilon (ε) | 0.3 → 0.1 | Explore freely at first, then mostly exploit learned strategy |
| Frame rate | 50ms / loop | 20 decisions per second |
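Action selection with those hyperparameters is standard ε-greedy with decay. A sketch of both pieces — the decay schedule is an assumption, since the table only gives the 0.3 → 0.1 endpoints:

```python
import random

def choose_action(q_row, epsilon, rng=random):
    """q_row: {action: q_value}. Explore with probability ε, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(list(q_row))      # explore: random action
    return max(q_row, key=q_row.get)        # exploit: best-known action

def decay_epsilon(epsilon, floor=0.1, rate=0.995):
    """Multiplicative decay per episode, clamped at the exploration floor."""
    return max(floor, epsilon * rate)
```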
| Reward | Trigger |
|---|---|
| +2.0 | Ball moving toward paddle AND paddle within 15% of ball |
| +1.5 | A brick was destroyed this frame |
| +1.0 | Ball velocity moving toward paddle (correct anticipation) |
| -0.5 | Ball moving toward paddle but paddle is far away |
| -2.0 | Ball reached the bottom edge (ball lost) |
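The reward table can be sketched as a shaping function over the inferred frame state. The field names are illustrative, and how the rows combine within a single frame (simple addition here) is an assumption:

```python
def reward(state):
    """Reward shaping per the table above.

    state: dict with ball_to_paddle, paddle_close (within 15%),
    brick_destroyed, ball_lost — all booleans inferred from the screenshot.
    """
    if state["ball_lost"]:
        return -2.0                     # ball reached the bottom edge
    r = 0.0
    if state["brick_destroyed"]:
        r += 1.5                        # a brick went down this frame
    if state["ball_to_paddle"]:
        r += 2.0 if state["paddle_close"] else -0.5   # positioned vs. far away
        r += 1.0                        # correct anticipation bonus
    return r
```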
The Q-table persists in `~/.george/qtable.json`, so George gets smarter across sessions. On startup: "Starting Arkanoid. I have 847 states learned."

`main.swift` is 46 lines. `AppDelegate.swift` is 42 lines. Small files, enormous consequences.
```swift
app.setActivationPolicy(.accessory)
// No Dock icon · Not in Cmd+Tab · No menu bar
// George is ambient — always present, never intrusive
```
```bash
./setup.sh
~/Desktop/George.command   # double-click launcher
```
8 steps: Verify prerequisites → Create ~/.george/ → Copy server.py → Enter API keys → Install mlx-lm → Install yt-dlp → swift build -c release → Download Llama 3.2 3B (~1.8 GB)
A self-contained single-file web app served by server.py at GET /. Open it on any phone, tablet, or computer — no install required. Dark space theme, CSS-only animated orb, streaming text, voice input via Web Speech API.