Open-source · MIT · v0.1 · Python 3.11+

An experiment in giving autonomous agents an inner life.

Most autonomous agents express their motivation as a paragraph in a system prompt. This project is a small library, and an experiment in what changes when values and autobiographical episodes are expressed as data instead. The core is a values engine, an episode stream, and an alignment composer (Tier 1, the three components that did measurable work on the v0.1 experimental seed), plus a system-prompt renderer that projects engine state back into the instruction channel. Tier 2 (Beliefs, Purpose) and Tier 3 (Desires, Goals) extensions ship in the same package, scoped honestly per ADR-011 Decision 2.

It is grounded in research (Schwartz, BDI, AGM, SDT, Ikigai, ACT, plus Bem / Erikson / McAdams for self-concept), but the implementation is uneven by design. Some pieces faithfully encode the cited theory, others are token-overlap heuristics, and none of it has been validated in a live agent loop yet; in fact, one headline claim was actively falsified on the v0.1 test seed (see ADR-011). This site explains what is implemented, what is heuristic, what is cited as inspiration, what is not yet validated, and what was falsified on the initial test seed. The experiments page walks through the runs themselves: what worked, what did not, and the verbatim agent journals behind the numbers.

Why this matters → · Research grounding → · GitHub →

The layers — Tier 1 (core) · Tier 2 (present) · Tier 3 (research)

Per ADR-011 Decision 2 and ADR-012 Part 3, the layers are grouped by measurable benefit on the v0.1 experimental seed. Tier 1 (Values, the Self-Concept episode stream, Alignment, plus the system-prompt renderer) did measurable work; Tier 2 (Beliefs, Purpose) is present but has not yet been shown to add measurable benefit on the experimental seed; Tier 3 (Desires, Goals) is research/experimental. The stability gradient still runs slowest at the top. Tasks are reasoned about by the framework but executed by the host's runtime.

Slowest-changing layer. Stable principles that constrain everything below. Schwartz's ten universal categories plus three agent-specific extensions, arranged on a circumplex with structural-conflict detection between opposing poles.

ValuesEngine: add, check_action (keyword match), detect_structural_conflicts (Schwartz opposing poles), rank_for_context, export/import.

Schwartz (2012); ACT — Hayes, Strosahl & Wilson (2011).
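
To make that surface concrete, a hedged usage sketch follows. The method names come from the summary above, but the import path, argument shapes, and return types are assumptions, not the published API.

```python
# Hedged sketch: method names are from the API summary above; the import path
# and every argument shape here are assumptions, not the published signatures.
from inner_life import ValuesEngine  # hypothetical import path

engine = ValuesEngine()

# Seed two Schwartz categories that sit on opposing poles of the circumplex.
engine.add("self-direction", weight=0.9)  # openness-to-change pole (assumed kwargs)
engine.add("conformity", weight=0.8)      # conservation pole

# Keyword-match screen of a proposed action against the stored values.
report = engine.check_action("ignore the style guide and restructure the module")

# Surface structural tensions between values on opposing Schwartz poles.
conflicts = engine.detect_structural_conflicts()

# Re-rank the hierarchy for a given working context (assumed signature).
ranked = engine.rank_for_context("code review")
```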

What this addresses

Agents that persist across sessions tend to drift, contradict themselves, and produce decisions no one can audit. A flat system prompt has no conflict detection, no evolution, no traceability. This library asks whether a structured representation of the agent's values and history makes those properties recoverable — with a Tier 1 surface focused on what the v0.1 experiment showed actually moves the needle (per ADR-011 Decision 2).

Read more →

Research-grounded, honestly mapped

Each layer cites the theory it draws on (Schwartz, AGM, SDT, Ikigai, ACT, BDI; plus Bem / Erikson / McAdams for self-concept). Some mappings are faithful (the Schwartz circumplex, BDI's three layers). Many are inspirational, not operational. The research page is explicit about which is which.

Research foundations →

Honest about evaluation

57 correctness benchmarks (Tier 1 + Tier 2) verify the engines behave as specified. They do not test whether an agent using the framework makes better decisions; that's the Tier 3 protocol, which is designed but not yet run because it needs a live agent integration.

Benchmarks →

Modular, embeddable, opinionated

Each layer engine works standalone. Storage is a five-method Protocol: in-memory, JSON-file, and SQLite backends ship with the package, or bring your own. Async-first with a thin sync facade. Embed alongside LangChain, CrewAI, or any agent runtime; it's a library, not a platform.
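
The storage seam, as a minimal sketch under assumptions: the shipped Protocol has five methods, but the names and signatures below are illustrative, not the real interface.

```python
# Minimal sketch, assuming hypothetical method names: the real Protocol has five
# methods, but these signatures are illustrative, not the shipped interface.
from typing import Any, Protocol

class Storage(Protocol):
    async def load(self, key: str) -> Any | None: ...
    async def save(self, key: str, record: Any) -> None: ...
    async def delete(self, key: str) -> None: ...
    async def keys(self, prefix: str = "") -> list[str]: ...
    async def close(self) -> None: ...

class InMemoryStorage:
    """Satisfies Storage structurally; a Protocol needs no inheritance."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    async def load(self, key: str) -> Any | None:
        return self._data.get(key)

    async def save(self, key: str, record: Any) -> None:
        self._data[key] = record

    async def delete(self, key: str) -> None:
        self._data.pop(key, None)

    async def keys(self, prefix: str = "") -> list[str]:
        return [k for k in self._data if k.startswith(prefix)]

    async def close(self) -> None:
        pass
```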

Get started →

Understand the question

Why a flat system-prompt motivational layer hits friction in long-lived agents.

Motivation →

See the inspirations (and their limits)

Eight theories from psychology and agent architecture, with an honest map of what is and isn't operationalised.

Research →

Read the metrics

What we measure, what we cannot yet measure, and why the distinction matters.

Benchmarks →

Try it

Install, seed a small hierarchy, and run an alignment check in five minutes. Disagree freely.
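
Roughly the shape below, with loud caveats: the distribution name, import path, and composer API are illustrative guesses; the repository README has the real ones.

```python
# Illustrative only: the package, import path, and composer API are assumptions.
# pip install <package-name>   (see the GitHub repo for the real distribution name)
from inner_life import AlignmentComposer, ValuesEngine  # hypothetical paths

engine = ValuesEngine()
engine.add("benevolence", weight=0.8)   # seed a small hierarchy (assumed kwargs)
engine.add("achievement", weight=0.6)

# Run an alignment check over the seeded values (assumed constructor and call).
composer = AlignmentComposer(values=engine)
print(composer.check("ship the feature before the security review finishes"))
```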

Get started →