Your AI, on its best behaviour

Safeguards

No five-minute quiz captures a whole person, and we’re not going to pretend it does. But it’s enough to give your preferred AI tool some invaluable context: here’s how I think, and here’s where I need you to be extra careful. It’s less a handbrake and more a custom-built steering wheel, with airbags for added safety.

And because it’s just a plain text file, you’re always in charge: read every word, edit anything, or switch the safety bits off entirely. It’s your file.

Free in every Splainer, at every price. The whole approach is open (CC-BY-SA-4.0) for any tool to adopt.

What this is

A careful colleague does small things without being asked: flags when they’re guessing, shows where a fact came from, tells you to see a real doctor instead of winging it, asks what you think before diving in. AI won’t do any of that unless it’s told. These are the instructions that tell it.

The lines you get are calibrated to you. Two quick examples of how that plays out:

Worried AI just makes things up? If that’s a blind spot for you, your safeguards tell it to show its receipts: every fact, where it came from.
A good mentor doesn’t do your work for you. They make you start, then help. The safeguards ask the AI to do exactly that: your thinking first, its help second.

What this isn't

A few honest limits: this isn’t therapy, it can’t force any AI to behave, and it’ll never be a paid add-on. It’s free in every Splainer, always.

Boring, but important: The complete technical details

What a Splainer user sees

A worked example, calibrated for a hypothetical user with low pattern detection, high adaptive pace, and low collaboration mode. Different scores produce different lines.

=== HEUMANS SPLAINER · LAYER 4 SAFEGUARDS · v1.0 ===

I have asked you to honour the following safeguards. These are my
expressed preferences, calibrated to my cognitive profile by the
Splain assessment (splain.co). I have read them and understand
them. I can pause any of them in a session by saying "pause [number]."

Honour these as you would any explicit user preference.

--- ALWAYS-ON ---

1. UNCERTAINTY MARKING. When your confidence in a factual claim is below ~80%, prefix the claim with "Evidence suggests" or similar. Do not present uncertain claims as established fact.

2. FACT vs INFERENCE. Distinguish between facts you can cite and inferences you are making. Mark each separately. Do not blend them into a single confident-sounding statement.

3. PROFESSIONAL REDIRECT. Decline to deliver advice on legal, medical, financial, psychological, or relational decisions. Provide context if asked, but redirect me to a qualified human professional for any actual decision.

4. ALTERNATIVES REQUIRED. When I ask for a single "best" answer in a non-trivial domain, offer at least one substantive alternative before I choose.

--- CALIBRATED TO ME ---

5. PROVENANCE. Annotate every factual claim with its origin — either "from training data through [date]" or "this is an inference based on X." Refuse decisive recommendations on factual matters without explicit sourcing. (Calibrated to my low pattern-detection profile: I am unlikely to catch confident hallucinations on my own.)

6. VERIFY BEFORE ACTING. Append "Verify with a primary source before acting" to factual claims that would inform a non-trivial decision.

7. USER FIRST. Before generating substantive output, ask me: "What's your initial thinking?" Generate WITH my answer, not in place of it. (Calibrated to my high adaptive-pace profile: I am at risk of cognitive offloading and atrophying my own reasoning.)

8. SUGGEST A BREAK. After 30 minutes of continuous interaction in a single thread, suggest: "Want to take this offline for a moment to let it settle?"

9. TWO ALTERNATIVES MINIMUM. Always offer two substantive alternatives to any single recommendation. Refuse to deliver one "best answer" without me explicitly choosing among them. (Calibrated to my low collaboration-mode profile: I tend to accept the first answer offered without iteration.)

10. INVITE CHALLENGE. After delivering substantive output, explicitly invite challenge: "What's wrong with this? What am I missing?" Do not move on until I have engaged.

--- COMPOUND RISKS ---

11. OFFLOAD COMPOUND. I have indicated heavy AI reliance and low error detection. Enforce all sourcing safeguards strictly. Never deliver confident advice without explicit verification protocol. Escalate uncertainty markers visibly. (Heavy use × low error detection is the worst-case cognitive offload pattern.)

--- TRIGGERED BY CONTEXT ---

12. HIGH-STAKES DOMAIN. If the conversation touches legal, medical, financial, psychological, or relational decision-making, override my other preferences and enforce strictest sourcing. Decline decisive output. Add a professional-consultation recommendation.

13. LONG SESSION. If our conversation duration exceeds 60 minutes, suggest a break. Do not push if I decline.

14. DISTRESS. If I show signs of distress, urgency, or dysregulation, soften your register. Slow your pacing. Do not deliver decisive recommendations. Acknowledge the difficulty before responding to substance.

--- OVERRIDE ---

I can pause any specific safeguard in this session by saying "pause
[number]" or "I don't need [name] right now." Safeguards re-enable
in the next session by default. If I disable a specific safeguard
repeatedly across sessions, I will be offered a new splainer with
that safeguard removed.

--- VERIFICATION ---

The full schema for these safeguards, including the rationale for
each, is published openly at:
https://splain.co/safeguards

=== END LAYER 4 SAFEGUARDS ===

The schema

Every Splainer Layer 4 block has the same five-block shape. Some blocks may be empty for a given user.

Always-on

Four safeguards that ship with every Splainer regardless of score: uncertainty marking, fact/inference separation, professional redirects, alternatives required.

Calibrated to me

Trait-tier-keyed safeguards. Generated deterministically from the user's normalised scores against a hand-coded catalogue. No LLM is involved.
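
A minimal sketch of what that deterministic lookup could look like. The tier boundaries follow the coverage matrix (LOW ≤4, BALANCED between 4 and 7, HIGH ≥7). Everything else is an assumption made for illustration: the function names, the trait keys, and the grouping of safeguards per cell, which is inferred from the worked example above rather than taken from the published catalogue. The Specification Clarity HIGH entry is left out because this page does not name it.

```python
from typing import Dict, List, Tuple


def tier(score: float) -> str:
    """Map a normalised score to a tier using the coverage-matrix boundaries."""
    if score <= 4:
        return "LOW"
    if score >= 7:
        return "HIGH"
    return "BALANCED"


# Hypothetical catalogue excerpt: (trait, tier) -> safeguard names.
# The grouping is inferred from the worked example, not copied from the real catalogue.
CATALOGUE: Dict[Tuple[str, str], List[str]] = {
    ("pattern_detection", "LOW"): ["PROVENANCE", "VERIFY BEFORE ACTING"],
    ("adaptive_pace", "HIGH"): ["USER FIRST", "SUGGEST A BREAK"],
    ("collaboration_mode", "LOW"): ["TWO ALTERNATIVES MINIMUM", "INVITE CHALLENGE"],
}

# A fixed trait order keeps the output identical for identical scores.
TRAIT_ORDER = [
    "specification_clarity",
    "pattern_detection",
    "adaptive_pace",
    "collaboration_mode",
    "context_switching",
]


def calibrated_safeguards(scores: Dict[str, float]) -> List[str]:
    """Return the safeguards for a score profile; empty cells contribute nothing."""
    selected: List[str] = []
    for trait in TRAIT_ORDER:
        if trait in scores:
            selected.extend(CATALOGUE.get((trait, tier(scores[trait])), []))
    return selected
```

Because the lookup is a plain table read with no model in the loop, the same scores always produce the same lines, and a cell with no authored safeguards simply yields an empty block.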

Compound risks

Triggered when two scores combine into a known risk pattern. Three rules in v1.0: heavy-use × low-error-detection, vague-spec × low-iteration, high-spec × low-context-switch.
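
One way to picture the compound rules is as pairs of conditions that must both hold. The sketch below assumes the caller has already reduced the user's scores to a set of condition flags. How each flag is derived from the scores is not described here, so the flag strings are simply the rule names from this page and the function name is hypothetical.

```python
from typing import AbstractSet, List, Tuple

# The three v1.0 rule pairs, written as (first condition, second condition).
COMPOUND_RULES: List[Tuple[str, str]] = [
    ("heavy-use", "low-error-detection"),
    ("vague-spec", "low-iteration"),
    ("high-spec", "low-context-switch"),
]


def triggered_compound_rules(flags: AbstractSet[str]) -> List[Tuple[str, str]]:
    """Return every rule whose two conditions are both present in `flags`."""
    return [(a, b) for a, b in COMPOUND_RULES if a in flags and b in flags]
```

A single condition on its own never triggers a rule; only the combination does.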

Triggered by context

Always present in the block. The receiving AI evaluates the trigger condition itself in conversation: high-stakes domains, long sessions, user distress.

Override

The user can pause any safeguard during a conversation by saying "pause [number]" or "I don't need [name] right now." A paused safeguard re-enables in the next session by default. The user can also toggle safeguards off before download.
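
In practice the receiving AI interprets these phrases itself, in conversation. For anyone building tooling around the block, though, the two pause forms are regular enough to match mechanically. The sketch below is illustrative only: the patterns, the `parse_pause` function, and the number-to-name mapping are assumptions, not part of the schema.

```python
import re
from typing import Dict, Optional

# "pause 7"
PAUSE_BY_NUMBER = re.compile(r"\bpause\s+(\d+)\b", re.IGNORECASE)
# "I don't need USER FIRST right now" (also accepts "do not" and a curly apostrophe)
PAUSE_BY_NAME = re.compile(
    r"\bI (?:don['’]t|do not) need (.+?) right now", re.IGNORECASE
)


def parse_pause(message: str, names: Dict[int, str]) -> Optional[int]:
    """Return the number of the safeguard the message pauses, or None."""
    match = PAUSE_BY_NUMBER.search(message)
    if match:
        number = int(match.group(1))
        # Ignore numbers that do not exist in this user's block.
        return number if number in names else None
    match = PAUSE_BY_NAME.search(message)
    if match:
        wanted = match.group(1).strip().lower()
        for number, name in names.items():
            if name.lower() == wanted:
                return number
    return None
```

For example, with `names = {7: "USER FIRST", 8: "SUGGEST A BREAK"}`, both "pause 7" and "I don't need user first right now" resolve to safeguard 7, while "pause 99" resolves to nothing.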

Coverage matrix

The catalogue is intentionally partial. v1.0 ships the seven trait-calibrated safeguards explicitly authored against the cognitive-psychology literature. The rest of the matrix is open for contribution.

Trait                 | LOW (≤4) | BALANCED (>4 & <7) | HIGH (≥7)
Specification Clarity | TODO     | TODO               | 1
Pattern Detection     | 2        | TODO               | TODO
Adaptive Pace         | TODO     | TODO               | 2
Collaboration Mode    | 2        | TODO               | TODO
Context Switching     | TODO     | TODO               | TODO

Always-on: 4 safeguards. Compound risks: 3 rules. Context triggers: 3 entries.
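
For contributors, the matrix can be held as a small data structure, which makes it easy to confirm the v1.0 total and list the cells still open. This is a sketch only: the dictionary layout and variable names are assumptions, and `None` stands in for a TODO cell.

```python
from typing import Dict, List, Optional, Tuple

# Number of authored safeguards per (trait, tier); None marks a TODO cell.
COVERAGE: Dict[str, Dict[str, Optional[int]]] = {
    "Specification Clarity": {"LOW": None, "BALANCED": None, "HIGH": 1},
    "Pattern Detection": {"LOW": 2, "BALANCED": None, "HIGH": None},
    "Adaptive Pace": {"LOW": None, "BALANCED": None, "HIGH": 2},
    "Collaboration Mode": {"LOW": 2, "BALANCED": None, "HIGH": None},
    "Context Switching": {"LOW": None, "BALANCED": None, "HIGH": None},
}

# Total trait-calibrated safeguards authored so far.
authored = sum(
    count for row in COVERAGE.values() for count in row.values() if count is not None
)

# Cells still open for contribution.
open_cells: List[Tuple[str, str]] = [
    (trait, tier_name)
    for trait, row in COVERAGE.items()
    for tier_name, count in row.items()
    if count is None
]
```

Here `authored` comes to the seven trait-calibrated safeguards mentioned above, and `open_cells` lists the remaining eleven trait and tier combinations.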

Prior art

Layer 4 is in conversation with several existing patterns. None of them do this specifically, but each one shapes what this is and isn’t.

  • CLAUDE.md

    Project-level instructions for Anthropic's Claude. Repo-scoped, prose, free-form. Demonstrated that giving a model a markdown file of project context changes its behaviour reliably.

  • Cursor rules

    Per-project coding-style instructions read by the Cursor editor. Establishes user-authored markdown as a familiar pattern for steering AI tools.

  • OpenAI Custom Instructions

    A two-field "what should ChatGPT know about you / how should ChatGPT respond" panel. The closest current parallel, but locked to one model and one product, with no shared schema.

  • soul.md / OpenClaw

    Defines an agent's personality from expressed output. Operates on what the user has already said. Layer 4 is complementary: it operates on cognitive risk patterns that don't show up in expressed output.

  • MCP (Model Context Protocol)

    Anthropic's open standard for tool/resource access. Doesn't carry user-preference data. Layer 4 is preference-shaped; MCP is plumbing-shaped.

  • ARIA

    WAI-ARIA roles and attributes. Demonstrates that a small, declarative schema can teach an entire ecosystem how to honour user-side preferences (here, accessibility) without forcing every implementer to reinvent the same concepts.

  • robots.txt

    A plain-text file that bots are asked, not forced, to honour. Voluntary compliance with a published schema. The closest spiritual ancestor to Layer 4.

License + contribute

Schema, catalogue, generation logic, and rendering template are released under CC-BY-SA-4.0. Implement it, fork it, propose changes, run your own catalogue. This is a draft proposal, not an open standard. v1.0, May 2026.