One methodology. Five tools. Each one a different expression of the same instinct — human-owned constraints, AI-powered synthesis, and verification at every boundary. The model proposes. Python disposes.
Every MoreSalamander project is built on the same underlying principle: a well-fenced model inside a deterministic scaffold becomes reliable as a system, because the unreliable component is wrapped in reliable ones that decide whether to trust each output — when to retry, when to skip, when to score, when to commit.
This isn't a preference picked up from a book. It was earned through two specific lineages: the pipeline-shape discipline (from the Build It Publisher — named stages, explicit data flow, atomic responsibility), and the verification discipline (from MyMaestro's failure — a study tool that hallucinated, proving that verification must be a system property, not a soft warning at the UI).
Write the constraints before any AI synthesis happens. Constraint documents, schemas, scoring rubrics, series style guides. These are the doctrine. The model works inside them — it never edits them.
Let the model do the high-volume work — drafting prose, generating images, writing scripts, composing narration. The model is one component in the system, not the whole system.
Wrap every model output in pure Python that decides whether to trust the result. The grader can never be an LLM — because the grader cannot be the thing it grades. Reject what fails. Score what passes. Persist only what survives.
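A minimal sketch of "reject what fails, persist only what survives", under stated assumptions: the names Verdict, grade_narration and dispose are hypothetical, and the word-budget rule stands in for whatever deterministic checks a real project uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    passed: bool
    score: float
    reason: str = ""


def grade_narration(text: str, max_words: int = 40) -> Verdict:
    """Hypothetical deterministic grader: non-empty and within a word budget."""
    words = text.split()
    if not words:
        return Verdict(False, 0.0, "empty output")
    if len(words) > max_words:
        return Verdict(False, 0.0, f"{len(words)} words exceeds {max_words}")
    return Verdict(True, 1.0)


def dispose(candidate: str, grade: Callable[[str], Verdict], store: List[str]) -> Verdict:
    """Grade a model output and persist it only if the grader passes it."""
    verdict = grade(candidate)
    if verdict.passed:
        store.append(candidate)  # only survivors reach storage
    return verdict
```

The grader is plain Python with no model call, so the same input always yields the same verdict.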
The universal writer for the whole suite. An LLM conducts a creative
interview — asking, proposing, reflecting back — until it has enough to
write a complete, detailed spec. One engine, two targets: a
ProductionSpec for my-AI-scene, or a SongSpec
for my-AI-beats. The LLM decides when the interview ends. Python scores
the result.
The interview is the constraint-gathering phase. The chosen target's schema defines every field that must be filled — the LLM's questions serve the schema. The scoring rubric is in the system prompt before the LLM writes a word, so it knows the bar it's aiming for.
Once the LLM is confident it can score well, it generates the full spec from the conversation — every beat and narration line for video, or every section, prompt and energy value for a song. Reference-quality detail, on demand.
A target-specific pure-Python judge scores out of 100 — video on narration/footage/grade, music on prompt specificity and the energy arc. Below 75, the breakdown feeds back and the LLM revises. The scorer is never an LLM.
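To make the idea of a pure-Python judge concrete, here is a sketch of a music scorer. The 75-point threshold comes from the text; the two 50-point components, the 8-word and 0.5-range targets, and the function name score_song_spec are illustrative assumptions, not the project's actual rubric.

```python
PASS_THRESHOLD = 75  # from the text: below 75, the LLM revises


def score_song_spec(sections: list) -> dict:
    """Score a list of {"prompt": str, "energy": float} sections out of 100."""
    if not sections:
        return {"total": 0, "prompt_specificity": 0, "energy_arc": 0, "passed": False}

    # Assumed rule: full marks when prompts average 8 or more words.
    avg_words = sum(len(s["prompt"].split()) for s in sections) / len(sections)
    specificity = min(avg_words / 8, 1.0) * 50

    # Assumed rule: full marks when energy spans at least 0.5 of the 0-1 range.
    energies = [s["energy"] for s in sections]
    arc = min((max(energies) - min(energies)) / 0.5, 1.0) * 50

    total = round(specificity + arc)
    return {
        "total": total,
        "prompt_specificity": round(specificity),
        "energy_arc": round(arc),
        "passed": total >= PASS_THRESHOLD,
    }
```

The returned breakdown is what would be fed back to the LLM on a failing attempt, so a vague prompt or a flat arc shows up as a specific low component rather than a bare rejection.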
Collaborative spec authoring where quality is enforced, not hoped for. The scoring incentive — knowing Python will judge the output before it generates — shapes what the LLM produces. It won't settle for vague prompts or a flat energy arc because it knows they score low. Proven live: a synthwave conversation produced a SongSpec that scored 98/100.
A local-first video production engine. It takes a
ProductionSpec — the machine-readable form of a
production guide — and renders it into a graded, scored, assembled
MP4. Narration via Kokoro TTS, visuals via SDXL-turbo, music via
MusicGen, final assembly via ffmpeg. Entirely local, entirely free.
The ProductionSpec is the constraint document.
Every beat declares its narration, footage prompt, music keywords,
and color grade before any model runs. The spec is the law.
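One way to make "the spec is the law" literal is an immutable data structure. The sketch below assumes field names based on the four per-beat declarations named above; the real ProductionSpec schema is not shown in the source and may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)  # frozen: downstream stages can read the spec, never edit it
class Beat:
    narration: str
    footage_prompt: str
    music_keywords: Tuple[str, ...]
    color_grade: str

    def __post_init__(self):
        # Every beat must declare its text fields before any model runs.
        for name in ("narration", "footage_prompt", "color_grade"):
            if not getattr(self, name).strip():
                raise ValueError(f"beat field {name!r} must be declared before rendering")


@dataclass(frozen=True)
class ProductionSpec:
    title: str
    beats: Tuple[Beat, ...]
```

Attempting to assign to a field of a frozen dataclass raises dataclasses.FrozenInstanceError, which is the behaviour wanted for a constraint document.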
Kokoro speaks. SDXL-turbo generates stills. MusicGen composes beds. ffmpeg applies a Ken Burns pan-and-zoom to turn the stills into motion. Each model fills one stage of the pipeline and no other.
Trust separation baked in: Whisper transcribes Kokoro's output
for narration_verify. CLIP scores SDXL's output for
footage_verify. ffprobe measures the final MP4 for
assembly_verify. The generator never self-grades.
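As an illustration of the narration check, the sketch below compares a transcript against the script after normalizing case and punctuation. It takes the transcript as a plain string, so no Whisper call is shown; the 0.9 threshold and the word-level similarity measure are assumptions, not the project's documented values.

```python
import re
from difflib import SequenceMatcher


def _words(text: str) -> list:
    """Lowercase, strip punctuation, split into words."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).split()


def narration_verify(script: str, transcript: str, threshold: float = 0.9) -> bool:
    """Pass when the transcribed audio matches the scripted narration closely enough."""
    ratio = SequenceMatcher(None, _words(script), _words(transcript)).ratio()
    return ratio >= threshold
```

Because the comparison runs on text produced by a second model, the speech generator never gets a say in whether its own output passes.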
Deterministic video production from a defined spec. Because every stage is verified independently of the model that produced it, the pipeline degrades gracefully — footage fallbacks replace failed stills, music drops without failing the episode, narration retries on transcript mismatch. The worst case is a slightly simpler video, never a corrupt one.
A local-first music generator. It takes a SongSpec — section
structure, key, tempo, and an energy value per section — and renders a
complete, mastered song. MusicGen synthesizes each section, CLAP scores
it, ffmpeg crossfades and loudness-normalizes the master. Entirely local,
entirely free.
The SongSpec is the constraint document. Every section
declares its prompt, bar count, and energy (0–1) before any
model runs. The energy arc — quiet intro, towering chorus, soft outro —
is the song's emotional shape, authored up front.
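The per-section declarations can be sketched as a small validated record. The field names, the 4/4 time assumption, and the helper expected_seconds are hypothetical; only the 0 to 1 energy range and the bar count come from the text.

```python
from dataclasses import dataclass

BEATS_PER_BAR = 4  # assumption: 4/4 time; the source does not state a meter


@dataclass
class Section:
    name: str
    prompt: str
    bars: int
    energy: float  # 0.0 (quiet) to 1.0 (towering)

    def __post_init__(self):
        if not 0.0 <= self.energy <= 1.0:
            raise ValueError(f"{self.name}: energy {self.energy} is outside 0-1")
        if self.bars <= 0:
            raise ValueError(f"{self.name}: bar count must be positive")


def expected_seconds(section: Section, tempo_bpm: float) -> float:
    """Duration implied by the bar count: bars x beats per bar x seconds per beat."""
    return section.bars * BEATS_PER_BAR * 60.0 / tempo_bpm
```

For example, an 8-bar chorus at 120 BPM implies 8 x 4 x 60 / 120 = 16 seconds, which gives a duration gate something concrete to check against.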
MusicGen renders each section. The hidden gem: every section after the first is conditioned on the last few seconds of the previous one — audio continuation. Sections don't stitch, they flow. Bark adds optional sung vocals on the side.
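The continuation idea can be sketched without any model: each section's renderer receives the tail of the previous section as context. The Renderer signature, the 3-second context length, and the use of plain float lists for audio are assumptions made for illustration; a real backend would pass tensors to its own continuation call.

```python
from typing import Callable, List, Optional

Audio = List[float]
# Hypothetical renderer: (prompt, optional context audio) -> rendered audio
Renderer = Callable[[str, Optional[Audio]], Audio]


def tail(samples: Audio, sample_rate: int, seconds: float) -> Audio:
    """Return the last `seconds` of audio (or all of it if shorter)."""
    n = int(sample_rate * seconds)
    return samples[-n:] if n > 0 else []


def render_song(prompts: List[str], render: Renderer,
                sample_rate: int, context_seconds: float = 3.0) -> List[Audio]:
    """Render sections in order, conditioning each on the previous section's tail."""
    sections: List[Audio] = []
    context: Optional[Audio] = None  # the first section has nothing to continue from
    for prompt in prompts:
        audio = render(prompt, context)
        sections.append(audio)
        context = tail(audio, sample_rate, context_seconds)
    return sections
```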
CLAP scores each section against its prompt (section_verify);
duration and final-stitch gates are blocking. Vocals are
non-blocking — a failed vocal is dropped, the instrumental
ships. CLAP is the judge; MusicGen never grades itself.
Complete instrumental songs with a real emotional arc, built section by section and flowing as one piece thanks to audio continuation. The same blocking/non-blocking doctrine as my-AI-stories applies: the instrumental is the premise, vocals are enhancement. Proven live: an 8-section indie folk track, 3:48, mastered to −14 LUFS.
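For the crossfade and loudness step, a command along these lines is one plausible shape. It uses ffmpeg's acrossfade and loudnorm filters; the −14 LUFS target is from the text, while the 2-second crossfade, the true-peak and loudness-range values, and the function name master_command are assumptions. The project's actual command is not shown in the source.

```python
from typing import List


def master_command(section_a: str, section_b: str, out_path: str,
                   crossfade_seconds: float = 2.0,
                   target_lufs: float = -14.0) -> List[str]:
    """Build an ffmpeg argument list that crossfades two sections and normalizes loudness."""
    filter_graph = (
        f"[0:a][1:a]acrossfade=d={crossfade_seconds}[mix];"
        # TP (true peak) and LRA (loudness range) values are illustrative assumptions.
        f"[mix]loudnorm=I={target_lufs}:TP=-1.5:LRA=11[out]"
    )
    return [
        "ffmpeg", "-y",
        "-i", section_a,
        "-i", section_b,
        "-filter_complex", filter_graph,
        "-map", "[out]",
        out_path,
    ]
```

The list can be passed to subprocess.run. A single loudnorm pass only approximates the target, so a final measurement gate on the rendered master is still worthwhile.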
A long-form, multi-episode serial fiction engine with multi-voice local TTS. You provide the series bible — characters, theme, plot direction. The system generates full audio episodes, verified beat by beat, with distinct voices per character and a sound design layer that enhances without ever failing the episode.
The series bible is the constraint document. Characters, world rules, plot arcs — written before any synthesis. The LLM fills episodes inside the bible. It never edits the bible.
Ollama generates episode drafts scene by scene. Piper TTS renders each line in the character's assigned voice. A non-blocking mixer places sound cues at narrative anchors.
Three blocking gates: continuity (characters consistent with
bible), structure (beats well-formed), speaker (every line assigned).
One non-blocking gate: cue_verify — unresolvable
sound cues are dropped, never fail the episode.
Episodic narrative audio that stays internally consistent across episodes. The continuity gate means characters don't contradict themselves. The non-blocking sound doctrine means "continuity is the premise, sound is enhancement" — a principle that keeps quality high without making the pipeline fragile.
A local-first, self-improving personal knowledge system. Paste a lesson from anywhere — a course, an article, a video transcript — and the system summarizes it, validates it, and stores it in a Source of Truth. Downstream agents (quizzer, advisor, classroom, general chat) reason over the SOT instead of re-summarizing on every query. Four local Ollama models, each assigned a specific role — the model that owns the SOT never handles ungrounded chat. The system gets better the more it is used.
Four Ollama models with deliberate role separation:
llama3:8b summarizes (owns the SOT),
llama3.1:8b advises (128K context, study guides),
llama3.2 quizzes, chats, and teaches,
mistral judges — never a model that generated
the content it is scoring.
The LLM extracts structured summaries, code snippets, and key concepts, working chunk by chunk on long lessons. The advisor generates full course-wide study plans from SOT context. The classroom builds interactive lesson sessions.
Grounding gates at every persistence boundary — SOT write, Notebook save, Classroom plan persist, raise-hand runtime answers. The audit loop's Judge is a deterministic Python formula, not an LLM. It runs continuously and rotates the SOT toward more-grounded versions over time.
my-AI-stro is the proof of concept for both lineages of the Deterministic Scaffold thesis — simultaneously, in a single working system.
Lineage one: the pipeline-shape discipline. The Build
It Publisher's n8n workflow — named stages, atomic responsibilities,
explicit data flow — was internalized once and then re-encoded in every
project that followed. my-AI-stro is where that discipline runs at full
scale: three coexisting named pipelines (ingestion, advisor, classroom),
all sharing the same NDJSON event vocabulary
(step_start / step_complete /
gate_pass / gate_fail / retry
/ done) that every tool in the suite now speaks.
Lineage two: the verification discipline. MyMaestro hallucinated. The lesson wasn't to write a better prompt — it was that verification must be a system property, not a soft warning at the UI. my-AI-stro was rebuilt from that failure: grounding gates at every persistence boundary, a deterministic Python judge in the audit loop (never an LLM), and a self-improving SOT that rotates toward more-grounded versions over time without human intervention.
The architectural blueprint for the suite. Every tool
that came after inherited its DNA from here. The blocking vs.
non-blocking gate distinction — first encoded in my-AI-stro's hard
validation gates — became my-AI-story's cue_verify doctrine
and my-AI-scene's music_cue_verify. The trust isolation
principle — the model that owns the SOT never handles ungrounded chat —
became the pattern for every verification boundary in the suite: Whisper
verifies Kokoro, CLIP scores SDXL, Python scores the LLM's spec. The
NDJSON vocabulary, the swappable backend protocol, the offline-provable
scaffold — all formalized here first, then carried forward.
my-AI-stro is not the most complex tool in the suite. It is the tool
the rest of the suite was built from.
Every tool in the suite runs the same underlying pipeline pattern: named stages with explicit data flow, a shared NDJSON event vocabulary, and a gate at every boundary that decides whether to trust the model's output. The domain changes. The shape does not.
Every stage in every pipeline emits the same events. Any tool's output can be observed, logged, and streamed to a UI with the same listener — because the vocabulary never changes across tools.
step_start
step_complete
gate_pass
gate_fail
retry
fallback
skip
token
done
error
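A shared vocabulary like this is cheap to emit and consume. The sketch below writes one JSON object per line and reads it back with a single listener; the field names event and stage, and the helper names emit and listen, are assumptions, since the source lists only the event names.

```python
import json
from typing import Iterator, TextIO

EVENTS = {
    "step_start", "step_complete", "gate_pass", "gate_fail", "retry",
    "fallback", "skip", "token", "done", "error",
}


def emit(stream: TextIO, event: str, stage: str, **fields) -> None:
    """Write one NDJSON line; reject events outside the shared vocabulary."""
    if event not in EVENTS:
        raise ValueError(f"unknown event: {event!r}")
    stream.write(json.dumps({"event": event, "stage": stage, **fields}) + "\n")


def listen(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed event per non-empty line, whichever tool produced it."""
    for line in stream:
        if line.strip():
            yield json.loads(line)
```

Because every line is a complete JSON document, a UI can stream events as they arrive without waiting for the pipeline to finish.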
Hard pass/fail. A blocking failure stops the pipeline at that stage — the model retries within a bounded limit, then falls back or halts. These protect continuity, structure, and correctness. The premise cannot proceed without them.
Soft failures. A non-blocking failure drops the enhancement and continues — a missing music bed, an unresolved sound cue, a low-scoring visual that falls back to neutral. Sound enhances. Continuity is the premise.
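The two gate classes can be sketched as one flag on a gate record. The names Gate, BlockingGateFailure and run_gates are hypothetical, and retry handling is left out to keep the distinction itself in focus.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Gate:
    name: str
    blocking: bool
    check: Callable[[dict], bool]


class BlockingGateFailure(Exception):
    """Raised when a gate that protects the premise fails."""


def run_gates(artifact: dict, gates: List[Gate]) -> List[str]:
    """Run gates in order. Blocking failures stop the pipeline; others are dropped."""
    dropped: List[str] = []
    for gate in gates:
        if gate.check(artifact):
            continue
        if gate.blocking:
            raise BlockingGateFailure(gate.name)
        dropped.append(gate.name)  # enhancement dropped, pipeline continues
    return dropped
```

The returned list names the enhancements that were dropped, so a degraded artifact is still an explained one.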
The LLM interviews until it can score well, then generates a spec for the
chosen target — a ProductionSpec (video) or a
SongSpec (music). Two blocking gates in sequence:
parse checks structural validity; a target-specific
score checks quality. Both must pass. Failure feeds back
as context for the next attempt.
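The two gates in sequence might look like the sketch below: parse first, score only what parses, and return feedback text on failure. The required keys and the function names are assumptions; the scorer is passed in so any target-specific rubric can be used.

```python
import json
from typing import Callable, Optional, Tuple

REQUIRED_KEYS = {"title", "sections"}  # hypothetical minimal schema
PASS_THRESHOLD = 75                    # from the text


def parse_gate(raw: str) -> Tuple[Optional[dict], str]:
    """Structural validity: well-formed JSON object with the required keys."""
    try:
        spec = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"parse failed: {exc.msg}"
    if not isinstance(spec, dict):
        return None, "parse failed: top level must be an object"
    missing = REQUIRED_KEYS - spec.keys()
    if missing:
        return None, f"parse failed: missing {sorted(missing)}"
    return spec, ""


def evaluate(raw: str, score: Callable[[dict], int]) -> Tuple[bool, str]:
    """Run parse then score. Returns (passed, feedback for the next attempt)."""
    spec, feedback = parse_gate(raw)
    if spec is None:
        return False, feedback
    total = score(spec)
    if total < PASS_THRESHOLD:
        return False, f"score {total}/100 is below {PASS_THRESHOLD}"
    return True, ""
```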
Per beat, per stage. Blocking ●: Whisper transcribes Kokoro's narration and compares it to the script; CLIP scores SDXL's still against the footage prompt; ffprobe verifies the final MP4. Non-blocking ○: a music bed that fails its gate is dropped — the episode continues without it. The worst case is a slightly simpler video, never a corrupt one.
Per section. The hidden gem ↝: each section is conditioned on the tail of the previous one, so the song flows. Blocking ●: CLAP scores the audio against the section prompt; duration must match the bar count; ffprobe verifies the final master. Non-blocking ○: a failed Bark vocal is dropped and the instrumental ships. A bounded retry, followed by a tone-pad fallback, keeps a bad section from corrupting the song.
Three blocking gates protect the story's integrity: continuity (characters
consistent with the series bible), structure (beats well-formed),
speaker (every line assigned to a character). One non-blocking gate:
cue_verify — unresolvable sound cues are dropped, never
fail the episode. Continuity is the premise; sound is enhancement.
Three named pipelines running in the same application, all sharing the same NDJSON event vocabulary. The ingestion pipeline's validation gate uses the deterministic Python judge — not the summarizer, not mistral — to decide whether a lesson's summary meets the grounding threshold before it is written to the SOT. The audit loop runs continuously in the background, independently re-scoring and rotating canonical entries toward more-grounded versions over time.
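The source does not give the judge's formula, so the sketch below substitutes a deliberately simple one: the fraction of summary words that also appear in the lesson text. It only illustrates the shape of a deterministic score and a rotation step that keeps whichever version is more grounded; grounding_score and rotate are hypothetical names.

```python
import re


def grounding_score(summary: str, source: str) -> float:
    """Stand-in formula: share of summary words that occur in the source text."""
    words = re.findall(r"[a-z0-9']+", summary.lower())
    if not words:
        return 0.0
    source_words = set(re.findall(r"[a-z0-9']+", source.lower()))
    return sum(w in source_words for w in words) / len(words)


def rotate(canonical: str, candidate: str, source: str) -> str:
    """Keep the canonical entry unless the candidate is strictly more grounded."""
    if grounding_score(candidate, source) > grounding_score(canonical, source):
        return candidate
    return canonical
```

Requiring a strict improvement means repeated audit passes can only hold or raise the canonical entry's score, never lower it.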
The model that produces content cannot evaluate its own output. Whisper verifies Kokoro. CLIP scores SDXL. Mistral judges llama3:8b. Python scores the spec the LLM just wrote. Trust separation at every verification boundary.
Every project starts with CONSTITUTION.md and ARCHITECTURE.md before a single line of code. The constraints are written down first. When the code needs to change, the doctrine changes first. The constraint document is the source of truth.
Not all failures are equal. Continuity failures are blocking — a story with inconsistent characters fails. Sound cue failures are non-blocking — a missing ambient bed degrades gracefully. Every gate is classified by whether its failure invalidates the artifact.
No unbounded loops. Every retry path has a maximum, and every maximum has a defined fallback — a neutral clip, a silence, a hard stop. Thrashing is a bug, not a strategy. The system fails predictably or not at all.
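A bounded retry with a defined fallback fits in a few lines. In this sketch the name run_bounded, the default of three attempts, and the returned used-fallback flag are assumptions.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_bounded(attempt: Callable[[int], T], gate: Callable[[T], bool],
                fallback: T, max_attempts: int = 3) -> Tuple[T, bool]:
    """Try up to max_attempts times; return (result, used_fallback)."""
    for n in range(1, max_attempts + 1):
        result = attempt(n)
        if gate(result):
            return result, False
    # The loop is finite by construction, so thrashing cannot happen.
    return fallback, True
```

Passing the attempt number into the callable lets a caller vary the prompt or seed per try without the loop knowing how.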
Every pipeline is proven with deterministic fakes — ScriptedLLM, ScriptedRenderer, scripted TTS — before any model weights download. The scaffold is the system. The models are one implementation of it. Tests run in seconds on zero dependencies.
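A deterministic fake of this kind can be sketched as follows. ScriptedLLM is named in the text, but its interface is not shown, so the complete method, the LLM protocol, and the draft_scene stage are all assumptions for illustration.

```python
from typing import List, Protocol


class LLM(Protocol):
    """Assumed backend interface; a real model and a fake both satisfy it."""
    def complete(self, prompt: str) -> str: ...


class ScriptedLLM:
    """Replays canned replies in order and records every prompt it receives."""

    def __init__(self, replies: List[str]):
        self._replies = list(replies)
        self.prompts: List[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        if not self._replies:
            raise RuntimeError("script exhausted")
        return self._replies.pop(0)


def draft_scene(llm: LLM, outline: str) -> str:
    """Example pipeline stage that depends only on the interface."""
    return llm.complete(f"Write a scene for: {outline}").strip()
```

A test can then assert on both the stage's output and the exact prompts it sent, with no weights on disk.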
Every tool emits the same NDJSON event vocabulary:
step_start / step_complete /
gate_pass / gate_fail / retry
/ done. Named stages, explicit data flow, observable
end-to-end — the Build It Publisher lineage in every project.
In my-AI-stro's audit loop, the summarizer produces better entries because the judge exists and the summarizer knows its criteria. In my-AI-script, the LLM produces more detailed specs because the rubric is in its system prompt. Knowing you will be scored changes what you produce.
All synthesis runs on local models — Ollama, Kokoro, SDXL-turbo, MusicGen, faster-whisper, CLIP — on your own hardware. No paid APIs in the render path. The methodology doesn't require a cloud budget. It requires a scaffold.