Research Foundations
Academic theories that ground each layer of the motivational hierarchy, how the framework compares to existing approaches, what is genuinely operationalized versus aspirational, and the open questions that remain.
Theoretical foundations
This project is opinionated about which research traditions inspired its layer structure. It is not a faithful operationalisation of any single one of them. Each layer cites the literature it draws on; the docstrings in the code do the same. "Cited" is not "operationalised": some mappings are concrete (the Schwartz circumplex, BDI's three layers, SMART goal fields), some are heuristic (AGM-flavoured belief updates, self-concept inference), and several are inspirational only (SDT's internalisation continuum, Ikigai's four-domain intersection, ACT's full hexaflex). The "What we operationalize vs. what remains aspirational" section below classifies each one explicitly.
Each theory below notes which layer(s) of the hierarchy it inspires. Treat the citation as a pointer to the literature, not as a claim that the code reproduces the theory faithfully — see the operationalization breakdown for the per-theory verdict.
- Values
- Beliefs
- Purpose
- Self-Concept (opt-in)
- Desires
- Goals
Self-Determination Theory (Deci & Ryan, 2000)
Self-Determination Theory (SDT) is a macro-theory of human motivation built on a single, powerful claim: the quality of motivation matters more than the quantity. Deci and Ryan (2000) distinguish between intrinsic motivation — behaviour pursued for its inherent satisfaction — and extrinsic motivation, which spans a continuum from external regulation (compliance under threat) through introjection, identification, and finally integrated regulation (where an externally originating value has been fully assimilated into the self). This continuum is formalised in Organismic Integration Theory (OIT), one of SDT's six mini-theories.
Three basic psychological needs underpin the continuum: autonomy (the experience of volition), competence (effective interaction with the environment), and relatedness (connection with others). When these needs are satisfied, motivation moves toward the intrinsic end; when they are thwarted, it regresses toward the controlled end. This is not a Western-centric claim: a meta-analysis by Chen et al. (2015) spanning 18 studies across four continents confirmed the three-need structure, and the theory has been validated with data from over 100 countries (Vansteenkiste et al., 2020).
The framework operationalizes SDT in two places. First, the
Values layer captures the distinction between
internally driven and externally imposed motivators. A value's
weight encodes the degree to which it has been
"integrated" in the OIT sense: higher weight means the agent treats it
as more fundamental, not merely imposed. Second, the
Purpose layer is directly modelled on integrated
regulation — the point at which extrinsic goals become
self-concordant. PurposeEngine.check_consistency tests
whether stated purposes are coherent with the agent's values, which
mirrors SDT's claim that well-being and persistence arise from
motivational coherence, not from motivational intensity.
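A toy sketch can make the coherence claim concrete. The field names, the tag-overlap scoring, and this check_consistency signature are assumptions for illustration, not the library's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Value:
    name: str
    weight: float               # degree of OIT-style "integration": higher = more fundamental
    tags: set[str] = field(default_factory=set)

@dataclass
class Purpose:
    statement: str
    tags: set[str] = field(default_factory=set)

def check_consistency(purpose: Purpose, values: list[Value]) -> float:
    """Toy coherence score: weight-summed tag overlap between a purpose
    and the agent's values, normalised by total value weight."""
    total = sum(v.weight for v in values) or 1.0
    supported = sum(v.weight for v in values if v.tags & purpose.tags)
    return supported / total

values = [
    Value("transparency", weight=0.9, tags={"honesty", "openness"}),
    Value("efficiency", weight=0.3, tags={"speed"}),
]
purpose = Purpose("be an honest assistant", tags={"honesty"})
score = check_consistency(purpose, values)  # 0.9 / 1.2 = 0.75
```

The point of the sketch is the shape of the claim: coherence is measured against the weighted value structure, not against motivational intensity.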
Schwartz Theory of Basic Values (2012)
Schwartz's theory posits ten motivational categories — Self-Direction, Stimulation, Hedonism, Achievement, Power, Security, Conformity, Tradition, Benevolence, and Universalism — that are recognisable as distinct values across cultures. These categories are not independent; they are arranged on a circular continuum (the "circumplex") where adjacent categories share motivational roots and opposing categories represent structural tensions. Pursuing Achievement inherently trades off against Benevolence; prioritising Security pulls against Self-Direction. The structure is not a theory of what people should value but of how values are organised — a structural claim validated by Schwartz (2012) across 82 countries and over 75,000 respondents using the Portrait Values Questionnaire (PVQ).
The framework lifts this structure directly. ValueCategory
ships all ten Schwartz categories plus three agent-specific extensions
(INTEGRITY, GROWTH,
RELIABILITY) that occupy positions on the circumplex
without breaking its structural properties.
ValuesEngine.detect_structural_conflicts uses the
opposing-poles map (mirrored verbatim from Schwartz 2012) to flag when
an agent holds high-weight values at opposing positions — not to
forbid this (healthy tension can be productive), but to make it
visible and inspectable. The rank_for_context method
allows situational re-weighting within the constraints of the
hierarchy, reflecting Schwartz's finding that all ten values coexist
in every person but their relative importance shifts by context.
The opposing-poles map itself is defined as SCHWARTZ_OPPOSING_POLES in src/values/engine.py.
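A minimal sketch of how an opposing-poles map supports flag-not-forbid conflict detection. This uses a hypothetical two-entry subset of the relation and an invented threshold parameter; the real SCHWARTZ_OPPOSING_POLES map covers all ten categories and the ValuesEngine.detect_structural_conflicts signature may differ:

```python
# Illustrative subset of the circumplex's opposing-poles relation,
# following the tensions named in Schwartz (2012).
OPPOSING_POLES: dict[str, set[str]] = {
    "ACHIEVEMENT": {"BENEVOLENCE", "UNIVERSALISM"},
    "SECURITY": {"SELF_DIRECTION", "STIMULATION"},
}

def detect_structural_conflicts(
    weights: dict[str, float], threshold: float = 0.7
) -> list[tuple[str, str]]:
    """Flag (never forbid) pairs of high-weight values at opposing poles."""
    conflicts = []
    for cat, opposites in OPPOSING_POLES.items():
        if weights.get(cat, 0.0) < threshold:
            continue
        for opp in sorted(opposites):
            if weights.get(opp, 0.0) >= threshold:
                conflicts.append((cat, opp))
    return conflicts

agent = {"ACHIEVEMENT": 0.9, "BENEVOLENCE": 0.8, "SECURITY": 0.2}
print(detect_structural_conflicts(agent))  # [('ACHIEVEMENT', 'BENEVOLENCE')]
```

The return value is a visibility signal for inspection, consistent with the view that healthy tension can be productive.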
AGM belief revision (Alchourrón, Gärdenfors & Makinson, 1985)
The AGM postulates define three operations for rational belief change: expansion (adding a new belief without removing any), revision (adding a new belief while maintaining consistency, which may require removing existing beliefs), and contraction (removing a belief without adding new ones). The postulates specify constraints on each operation — notably that revision should be minimal (change as little as possible to accommodate new evidence) and that the resulting belief set should remain logically consistent.
Full AGM is impractical for a computational agent with non-logical
beliefs ("I believe this codebase is well-tested" is not a propositional
sentence). The framework intentionally simplifies: beliefs carry a
numeric confidence field, and
BeliefsEngine.add_evidence performs confidence-weighted
updates rather than logical contraction. New evidence specifies a
strength and a directional supports flag.
The update formula moves confidence toward 0 or 1 proportionally to
the evidence strength, weighted by the current confidence. This
preserves the AGM spirit — updates are strength-proportional and
directional, not overwriting — without requiring a formal logic engine.
Contraction is implicit: sufficiently strong disconfirming evidence
drives confidence below the active_threshold, after which
the belief is no longer consulted by AlignmentEngine during
alignment checks. This is a pragmatic compromise: the belief is not
deleted (it can recover if new supporting evidence arrives), but it
ceases to influence decisions. The docstring on
BeliefsEngine.add_evidence includes the derivation and
cites the specific AGM postulate each design choice relates to.
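The prose above pins down the shape of the update without stating a formula. One formula consistent with that description, as a sketch rather than necessarily the exact expression shipped in BeliefsEngine.add_evidence, is:

```python
def add_evidence(confidence: float, strength: float, supports: bool) -> float:
    """Confidence-weighted update: move toward 1 on supporting evidence and
    toward 0 on disconfirming evidence, proportionally to strength. The step
    shrinks as confidence nears a boundary, so updates are directional and
    strength-proportional rather than overwriting."""
    if supports:
        return confidence + strength * (1.0 - confidence)
    return confidence - strength * confidence

# Implicit contraction: repeated disconfirmation drives confidence down
# without deleting the belief, so it can recover later.
c = 0.8
for _ in range(3):
    c = add_evidence(c, strength=0.5, supports=False)
# c: 0.8 -> 0.4 -> 0.2 -> 0.1; once below an active_threshold of, say,
# 0.2, the belief would no longer be consulted during alignment checks.
```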
Ikigai and purpose (Mogi 2017; Sone et al. 2008)
Ikigai is a Japanese concept often glossed as "a reason for being." Western self-help literature typically reduces it to a Venn diagram of four circles (what you love, what you are good at, what the world needs, what you can be paid for), but authentic ikigai is broader and less transactional. Mogi (2017) describes it as the sense that one's daily actions are connected to something meaningful — a feeling of purposeful engagement that need not involve passion or profit. Importantly, ikigai is found in small daily routines as much as in grand life missions; it is not the Western "find your purpose" but the more modest "know why you get up in the morning."
The empirical evidence for ikigai's effects is striking. Sone et al. (2008) followed 43,391 Japanese adults (ages 40–79) over seven years in the Ohsaki Cohort Study. Participants who reported a sense of ikigai had significantly lower all-cause mortality and cardiovascular disease mortality, even after controlling for age, sex, education, BMI, smoking, alcohol, exercise, employment, marital status, and self-rated health. The hazard ratio for all-cause mortality among those without ikigai was 1.5 (95% CI 1.3–1.7) compared to those with it. This is a large, prospective, population-based study — not a convenience sample — and the effect persisted across sensitivity analyses.
The framework operationalizes ikigai through the
Purpose layer. PurposeEngine supports
primary, secondary, and contextual purposes, reflecting the finding
that purpose is not a single grand statement but a layered sense of
engagement across roles and contexts. The
check_consistency method tests whether an agent's stated
purposes cohere with its values and beliefs — a computational analog
of the psychological observation that purpose-driven behaviour is most
resilient when it is authentically connected to one's deeper
motivations, not externally imposed.
Acceptance and Commitment Therapy (Hayes, Strosahl & Wilson, 2011)
ACT is a clinical psychology framework built on the premise that psychological suffering is amplified by experiential avoidance — the attempt to suppress unwanted thoughts and feelings — and that well-being improves through psychological flexibility: the ability to contact the present moment fully, hold difficult experiences without being dominated by them, and move toward valued directions. The "hexaflex" model identifies six interconnected processes: acceptance, cognitive defusion, present-moment awareness, self-as-context, values clarification, and committed action. Of these, values clarification and committed action are directly relevant to agent design.
ACT's treatment of values is distinctive and important: values are
directions, not destinations. A value like
"contributing to knowledge" is never achieved; it is a compass bearing
that guides ongoing action. Goals, by contrast, are achievable
milestones along a valued direction. This distinction maps precisely
onto the framework's separation of the Values layer
(directions that persist indefinitely) from the Goals
layer (measurable, time-bound objectives that can be
completed). The Value model has no
completed field; the Goal model does. This
is not an implementation accident — it is ACT's
direction/destination distinction made structural.
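The structural claim fits in a few lines. This dataclass sketch mirrors the distinction; the shipped models are Pydantic and carry more fields, so these shapes are illustrative only:

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class Value:
    # A direction, not a destination: deliberately no `completed` field.
    name: str
    weight: float

@dataclass
class Goal:
    # A milestone along a valued direction: time-bound and completable.
    title: str
    deadline: Optional[str] = None  # ISO date string for the sketch
    progress: float = 0.0
    completed: bool = False

value_fields = {f.name for f in fields(Value)}
goal_fields = {f.name for f in fields(Goal)}
assert "completed" not in value_fields and "completed" in goal_fields
```

Making the distinction structural means no code path can ever mark a value "done", which is exactly ACT's point.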
ACT also informs the Desires layer. In ACT, desires
are the motivational bridge between abstract values and concrete
committed action. DesiresEngine occupies exactly this
position in the hierarchy: desires carry
linked_value_ids (connecting them to the values they
serve) and generate_goal_candidates (translating
aspirational motivation into actionable objectives). The
intensity field on desires reflects ACT's observation
that motivation toward a valued direction fluctuates without
invalidating the underlying value.
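A sketch of the bridge position desires occupy. The Desire shape, the min_intensity cutoff, and this generate_goal_candidates signature are hypothetical, chosen to illustrate the values-to-goals translation:

```python
from dataclasses import dataclass, field

@dataclass
class Desire:
    description: str
    intensity: float                     # fluctuates without invalidating the value
    linked_value_ids: list[str] = field(default_factory=list)

def generate_goal_candidates(desires: list[Desire], min_intensity: float = 0.5):
    """Toy translation of aspirational motivation into actionable candidates:
    only sufficiently intense desires yield goal candidates, each tagged
    with the values it serves."""
    return [
        {"title": f"Goal: {d.description}", "serves": d.linked_value_ids}
        for d in desires
        if d.intensity >= min_intensity
    ]

desires = [
    Desire("share findings openly", intensity=0.8, linked_value_ids=["v-transparency"]),
    Desire("tidy old notes", intensity=0.2),
]
candidates = generate_goal_candidates(desires)  # only the first desire qualifies
```

Low-intensity desires are filtered out of goal generation but not deleted, matching the observation that fluctuating motivation does not invalidate the underlying value.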
BDI architecture (Bratman, 1987; Rao & Georgeff, 1995)
The Belief–Desire–Intention (BDI) model is the canonical deliberative agent architecture in both philosophy and computer science. Bratman (1987) introduced it as a theory of practical reasoning, arguing that intentions are irreducible to beliefs and desires: an intention is a commitment to a plan that constrains future deliberation (you don't reconsider from scratch at every step). Rao and Georgeff (1995) formalised BDI as a computational architecture with branching-time temporal logic, demonstrating that agents with explicit intentions can exhibit more stable and efficient behaviour than purely reactive or purely planning-based agents.
Three of the framework's six layers correspond directly to BDI:
- Beliefs map to BeliefsEngine — the agent's model of the world, updated by evidence.
- Desires map to DesiresEngine — motivational drivers that the agent has not yet committed to.
- Intentions map to GoalsEngine — committed, time-bound, decomposable objectives. In BDI terminology, a goal is a desire that has been "adopted" and given a plan.
The framework extends BDI in three directions. Values add a layer above beliefs that BDI externalised: classical BDI has no formal account of where desires come from or what makes some desires more admissible than others. Purpose adds a stable identity layer that classical BDI subsumed into the agent's initial state. Tasks add a layer below intentions that BDI typically delegated to the plan library. The result is a six-layer hierarchy that preserves BDI's core insight (intentions as commitments that constrain deliberation) while adding the motivational structure that classical BDI left implicit.
Competitive landscape
The Agent Values Framework does not exist in a vacuum. Four categories of existing work address overlapping parts of the problem: cognitive architectures, agent frameworks, value alignment techniques, and generative agent research. Understanding what each provides — and what it does not — clarifies the specific gap this framework occupies. The comparison table below provides a structural overview; the prose sections that follow offer a fair, detailed assessment.
| Approach | Layer structure | Storage | Event system | Coupling | License |
| --- | --- | --- | --- | --- | --- |
| Agent Values Framework (this project) | 6 core + 1 opt-in (values, beliefs, purpose, [self-concept], desires, goals, tasks) | BYO (Protocol) | Yes | Modules independent; alignment composes | MIT |
| BDI (Bratman 1987) | 3 (beliefs, desires, intentions) | Implementation-defined | Implementation-defined | Rigid — intentions tightly coupled | n/a (theoretical) |
| Soar | Procedural / semantic / episodic memory | Built-in working memory | No (cycle-based) | Monolithic kernel | BSD-style |
| ACT-R | Buffers + production system | Built-in declarative memory | No (cycle-based) | Module-based but tight coupling | Free for academic use |
The interactive version of this page also positions each approach on a scatter plot with two axes: theoretical depth and integration readiness.
Cognitive architectures (Soar, ACT-R, LIDA)
Soar (Laird, 2012) is a general cognitive architecture with a production-rule working memory, universal subgoaling, chunking for learning, and three long-term memory systems (procedural, semantic, episodic). It models a complete cognitive agent, including motivation-adjacent mechanisms: impasses in its state stack trigger subgoals automatically, and appraisal functions in Soar's emotional module (Marinier, Laird & Lewis, 2009) can influence operator selection. ACT-R (Anderson et al., 2004) takes a modular approach with declarative and procedural memory buffers, subsymbolic activation equations, and a production-matching cycle that governs all cognition. LIDA (Franklin et al., 2016) explicitly includes motivational codelets — small processes that detect conditions requiring attention — and a global workspace that broadcasts the most salient content to all modules.
These architectures are comprehensive but monolithic. Their value to the field is immense: decades of cognitive modelling research have produced deep insights into memory, attention, learning, and deliberation. However, they are designed as complete cognitive kernels, not composable libraries. You cannot extract Soar's motivation subsystem and embed it in a LangChain pipeline. Motivation in these systems is implicit — distributed across production-rule preferences, memory activations, and appraisal functions — not represented as an explicit, queryable, versionable structure. Their integration cost for production agent builders is prohibitive: adopting Soar means running your entire agent inside Soar's cognitive cycle.
The Agent Values Framework takes inspiration from these systems — particularly the insight that motivation must be structural, not a prompt — while delivering a fundamentally different artifact: a composable library with a four-method storage Protocol, not a cognitive kernel with a working-memory architecture.
Agent frameworks (LangChain, CrewAI, AutoGen)
Modern LLM-based agent frameworks excel at orchestration: tool calling, retrieval-augmented generation, multi-agent coordination, planning, and memory management. LangChain provides a rich ecosystem of chains, agents, and tools. CrewAI adds role-based multi-agent collaboration with delegation. AutoGen (Wu et al., 2023) enables conversational multi-agent patterns with human-in-the-loop capabilities. These frameworks answer the question of how an agent acts — the mechanics of perception, reasoning, and action.
What none of them provides is a motivational layer. "Personality" in CrewAI is a string field in an agent's YAML configuration — structurally identical to a system prompt. LangChain's agent memory stores conversation history and retrieved documents, not values, beliefs, or purposes. AutoGen's agents have system messages and can be configured with different LLMs, but there is no mechanism for an agent's principles to constrain its goals, for evidence to update its beliefs, or for an alignment check to run before a tool is invoked. The question of why an agent should act a certain way, and whether its actions are consistent with its principles, is left entirely to the integrator.
This is not a criticism of these frameworks — they solve different
problems, and solve them well. The Agent Values Framework is designed
to embed alongside your orchestration framework, not replace
it. Wire AlignmentEngine.check_alignment into your
agent's decision loop, and the motivational layer runs before each
action without requiring you to change your tool-calling or planning
infrastructure. See the integration
patterns for concrete examples.
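A sketch of that wiring, with a stub standing in for the real engine. The Report shape and the check_alignment signature used here are assumptions for illustration:

```python
from typing import Callable, Protocol

class Report:
    def __init__(self, ok: bool, reasons: list[str]):
        self.ok, self.reasons = ok, reasons

class AlignmentChecker(Protocol):
    # Stand-in for the alignment engine; exact signature is an assumption.
    def check_alignment(self, action: str) -> Report: ...

def guarded_step(act: Callable[[str], object], checker: AlignmentChecker, action: str):
    """Run the motivational layer before each action; the orchestration
    framework's tool-calling and planning code stays untouched."""
    report = checker.check_alignment(action)
    if not report.ok:
        return {"skipped": action, "reasons": report.reasons}
    return act(action)

class StubChecker:
    # Hypothetical rule: block anything that looks destructive.
    def check_alignment(self, action: str) -> Report:
        ok = "delete" not in action
        return Report(ok, [] if ok else ["conflicts with value: reliability"])

result = guarded_step(lambda a: f"ran {a}", StubChecker(), "delete prod table")
# result == {"skipped": "delete prod table", "reasons": [...]}
```

Because the checker is a Protocol, any agent loop that can call a function before acting can adopt this pattern without restructuring.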
Value alignment approaches (Constitutional AI, RLHF, CIRL)
The AI safety community has produced powerful techniques for embedding values into language models at training time. RLHF (Christiano et al., 2017; Ouyang et al., 2022) uses human preference data to fine-tune model behaviour. Constitutional AI (Bai et al., 2022) trains models to critique and revise their own outputs against a set of principles. Cooperative Inverse Reinforcement Learning (CIRL; Hadfield-Menell et al., 2016) frames alignment as a cooperative game where the agent infers human preferences through interaction.
These techniques operate at the model level: they shape the behaviour of every agent that uses the model. This is the right approach for establishing baseline safety guarantees, but it leaves three gaps that agent-level alignment must fill:
- Not per-agent configurable. An operator cannot say "this agent should prioritise transparency over efficiency" without fine-tuning or distilling a new model. RLHF values are baked into the weights, not parameterised per deployment.
- Not inspectable. The values are implicit in billions of parameters. You cannot query a model to ask "what do you value?" and receive a structured, auditable answer. You can only observe behaviour and infer. This makes compliance auditing and post-incident analysis difficult.
- Frozen at training time. The agent cannot update its values in response to new evidence, changing context, or operator policy changes. Constitutional AI's principles are fixed in the training constitution; they do not evolve at runtime.
The Agent Values Framework operates in the complementary space: explicit, per-agent, runtime-evolvable values that sit above the model layer. It does not replace RLHF or Constitutional AI; it adds a structured overlay that makes agent-specific values inspectable and mutable. The two approaches are defence-in-depth: model-level alignment provides the floor, agent-level alignment provides the ceiling.
Generative agents (Park et al., 2023)
Park et al. (2023) demonstrated that LLM-based agents given a natural language description of personality, background, and goals can produce remarkably human-like behaviour in a simulated town environment. Their agents maintain a memory stream of observations, perform periodic "reflection" to synthesise higher-level insights, and use retrieval to condition behaviour on relevant memories. The work is influential and demonstrates the power of LLM-based social simulation.
However, the motivational structure in generative agents is primitive and non-reusable. Identity is a natural-language paragraph, not a structured representation. Reflection produces prose summaries stored as additional memories, not typed data structures that can be queried, compared, or audited. There is no distinction between values (stable principles) and desires (transient motivations), no formal conflict detection, no alignment checking mechanism, and no event system for integration. The architecture is tightly coupled to the social simulation setting — extracting the "motivation" component and embedding it in a production agent is not feasible without reimplementation.
The Agent Values Framework addresses the reusability gap: typed models, structured storage, explicit conflict detection, and a clean Protocol boundary that lets any agent runtime integrate motivational reasoning without adopting a specific simulation architecture. The research question of whether structured motivation produces more coherent long-term behaviour than natural-language personality descriptions remains open — see Open research questions below.
What we operationalize vs. what remains aspirational
Translating psychological theory into production code requires difficult choices. Some constructs map cleanly onto data structures and algorithms; others resist operationalization entirely. This section provides an honest assessment of where the framework stands — organised by which constructs the v0.1 experiment showed doing measurable work, rather than by which layers ship in the package (per ADR-011 Decision 2). We do this not as an exercise in modesty but because users of the framework need to know what the code actually does versus what the theory suggests it could do.
Fully operationalized
These constructs have shipped code with comprehensive tests, and the code faithfully implements the cited theory. Constructs marked Tier 1 additionally showed measurable work on the v0.1 experimental seed (per ADR-011 Decision 2); constructs marked Tier 2 / Tier 3 are implemented faithfully but were not exercised by the v0.1 seed:
- Schwartz value categories and structural conflicts. (Tier 1.) All ten categories, three agent extensions, the opposing-poles map, and conflict detection are implemented and tested. The circumplex structure is faithfully preserved. (324 tests, including edge cases for every opposing pair.)
- Confidence-weighted belief revision. (Tier 2 — present, but not yet shown to add measurable benefit on the experimental seed; ADR-011 Decision 2.) The AGM-inspired update formula in BeliefsEngine.add_evidence handles supporting and disconfirming evidence, strength weighting, and implicit contraction through confidence decay below active_threshold. The derivation is documented in the docstring.
- BDI three-layer mapping. (Beliefs is Tier 2; Desires and Goals are Tier 3 — research/experimental, not exercised by the v0.1 experimental seed; ADR-011 Decision 2.) Beliefs, Desires, and Goals (as intentions) are implemented as independent engines that compose through AlignmentEngine. The commitment semantics of intentions — goals are not casually abandoned once adopted — are enforced through validated status transitions in GoalsEngine.set_status.
- ACT values-as-directions. The structural distinction between values (no completed field, indefinite persistence) and goals (time-bound, completable) is enforced at the model level by Pydantic validation.
- SMART goal criteria. Goal models carry measurability, deadline, and progress fields. check_deadlines and update_progress (with auto-completion at 1.0) operationalize the criteria directly.
- Downward constraint and upward evidence flow. (Tier 1 — AlignmentEngine is the composer that did measurable work on the experimental seed; ADR-011 Decision 2.) AlignmentEngine.check_alignment enforces downward constraint (values constrain goals), while full_audit detects drift and inconsistency across all layers. Event emission (on(event, handler)) enables upward evidence flow by letting task outcomes update beliefs.
Partially operationalized
These constructs have code, but the code captures only a subset of what the underlying theory describes:
- SDT's internalisation continuum. The Value weight field captures the degree of internalisation but not the process. OIT describes a progression from external regulation through introjection, identification, and integration. The framework has no mechanism for a value to move along this continuum over time in response to experience. An integrator can update weights manually, but the framework does not model the progression itself.
- Ikigai as multi-dimensional purpose. PurposeEngine supports primary, secondary, and contextual purposes, but the model does not capture the four dimensions of authentic ikigai (what you love, what you are good at, what the world needs, what you can be paid for). Purpose is a text description with a consistency check, not a structured intersection of capabilities, needs, and satisfactions.
- Belief time-decay. BeliefsEngine.decay_stale implements linear time-based confidence decay, but AGM and Bayesian epistemology suggest richer models (e.g., exponential decay, context-dependent forgetting, source credibility weighting). The current implementation is the cheapest defensible choice, not the theoretically optimal one.
- Alignment evaluation depth. The default rule-based evaluator is deliberately simple: it composes the per-layer signals (value keyword match, whole-token belief contradiction, purpose role/tag overlap, goal title overlap, optional self-concept axis) into a single report. It does not perform semantic reasoning about whether a goal's content actually advances a value's intent. The optional LLMEvaluator hook is BYO-client and can do richer evaluation; quality there depends on the model.
- Self-Concept layer (Bem, Erikson, Damasio, McAdams). (The append-only episode stream is Tier 1 — it drove the +0.11 Δ on McAdams Meaning-made on the experimental seed; the three integration loops remain heuristic per ADR-011 Decision 2.) The opt-in module implements an append-only autobiographical episode stream with three integration processes: infer_from_behaviour (cited as inspired by Bem self-perception), integrate_lessons (cited as inspired by SDT organismic integration), and check_identity_drift (cited as inspired by Erikson coherence). All three are token-overlap heuristics over claim text and episode summaries — diagnostic signals, not faithful psychological models. When wired into AlignmentEngine, the coherence score and drift signals appear in full_audit(); the same evaluator can also be wired to surface a self-concept consistency axis on check_alignment. Opt-in, additive, and intentionally conservative — identity drift surfaces as a flag, never as a block.
Not yet operationalized
These constructs are described in the research literature and referenced in this project's documentation but do not yet have full implementations:
- ACT's full hexaflex. Of ACT's six processes, only values clarification and committed action are operationalized. The remaining four (acceptance, cognitive defusion, present-moment awareness, self-as-context) describe internal psychological processes that resist direct computational analogy.
- Schwartz's refined 19-value model. Documented as a future option but not implemented. The ten-category model is sufficient for current use cases.
- Multi-agent value coordination. How should multiple AVF-equipped agents negotiate when their values conflict? Cooperative game theory and social choice theory provide frameworks, but no implementation exists. This is an open research question.
- Active inference integration. Active inference (Friston, 2010) offers a unifying framework where beliefs, desires, and actions are all explained as free-energy minimisation. Mapping AVF's layers onto active inference's generative model is theoretically promising but unexplored.
Open research questions
The library's design is grounded in the cited theories, and its correctness is verified by 371 pytest cases plus 57 correctness benchmarks (30 Tier 1 + 27 Tier 2). What remains undemonstrated is whether using this library actually produces measurably better agent behaviour. This section identifies six research questions that the project considers in scope but unresolved.
1. Empirical validation
Question: Do AVF-equipped agents exhibit more coherent, auditable, and value-consistent behaviour than agents without structured motivation, in real-world deployments?
The Tier-3 benchmark protocol defines how this evidence could be gathered: deploy two agent populations (with and without AVF) on identical tasks, measure coherence (do similar situations produce consistent decisions?), auditability (can a reviewer trace a decision to a specific value?), and adaptiveness (do agents update their behaviour appropriately in response to new evidence?). The protocol is documented; the harness requires a live agent integration to execute. Until that evidence exists, the framework's value proposition rests on architectural arguments (inspectability, composability, traceability) and on the strength of the underlying theories.
2. Cross-cultural value sets
Question: Should integrators be able to swap the Schwartz circumplex for an alternative value taxonomy without losing structural conflict detection?
Schwartz claims universality, and the 82-country validation provides
strong evidence. But the cross-cultural literature is more nuanced
than "ten values are universal." Inglehart and Welzel's (2005)
two-dimensional cultural map, Hofstede's (2001) cultural dimensions,
and indigenous psychological frameworks (e.g., Hwang, 2012, on
Confucian relational self) suggest that the mapping from values to
categories is culturally inflected even if the categories themselves
are recognisable. The framework currently hardcodes
ValueCategory and SCHWARTZ_OPPOSING_POLES.
Making these pluggable would allow integrators to adapt the value
taxonomy to their deployment context without forking the library.
The architectural cost is modest (parameterise the circumplex); the
research cost is high (validating alternative taxonomies).
3. Belief decay function
Question: What is the right decay function for belief confidence over time?
The current implementation uses linear time-decay: confidence decreases proportionally to the time since the last evidence update. This is the cheapest defensible choice but is almost certainly wrong for many domains. Ebbinghaus's forgetting curve (1885) is exponential. Bayesian models suggest that decay should depend on the precision of prior evidence, not just elapsed time. Domain-specific factors matter: a belief about "the production database is healthy" should decay faster than a belief about "Python is a good language for prototyping." The framework should expose the decay function as a pluggable strategy, allowing integrators to supply domain-specific models (exponential, Bayesian, stepped, or no-decay-at-all for axiomatic beliefs). The storage and engine architecture supports this without structural changes.
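The pluggable-strategy idea can be sketched as a registry of decay callables. The names and signatures below are hypothetical; the shipped behaviour lives in BeliefsEngine.decay_stale:

```python
from typing import Callable, Dict

# (confidence, days since last evidence) -> new confidence
DecayFn = Callable[[float, float], float]

def linear_decay(conf: float, days: float, rate: float = 0.01) -> float:
    return max(0.0, conf - rate * days)       # current default: proportional to elapsed time

def exponential_decay(conf: float, days: float, half_life: float = 30.0) -> float:
    return conf * 0.5 ** (days / half_life)   # Ebbinghaus-flavoured forgetting

def no_decay(conf: float, days: float) -> float:
    return conf                               # for axiomatic beliefs

STRATEGIES: Dict[str, DecayFn] = {
    "linear": linear_decay,
    "exponential": exponential_decay,
    "axiomatic": no_decay,
}

# An engine could accept a strategy per belief or per domain:
decayed = STRATEGIES["exponential"](0.8, 30.0)  # halves to 0.4 after one half-life
```

Because each strategy shares the same signature, swapping in a Bayesian or stepped model is a one-line configuration change rather than a structural one.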
4. Cold-start problem
Question: How should an agent's motivational hierarchy be initialised before it has any operational history?
The CLI provides scaffolding (python -m agent_values init
generates a starter hierarchy; seed populates it from a
template), but thoughtful seeding matters more than tooling. A
carelessly initialised hierarchy provides false confidence in
alignment checks. The research question has two dimensions: (1) Can
a seeding interview — a structured elicitation protocol administered
to the operator — reliably produce a value hierarchy that the
operator recognises as accurate? (2) Can an agent bootstrap its own
hierarchy through early interactions, starting with a minimal seed
and expanding as evidence accumulates? The second path is more
autonomous but riskier: value drift during the bootstrap period could
entrench poor initial conditions.
5. Multi-agent value coordination
Question: When multiple AVF-equipped agents interact, how should value conflicts between agents be detected and resolved?
Intra-agent alignment — checking consistency within one agent's
hierarchy — is well-specified by AlignmentEngine.
Inter-agent alignment is unexplored. If Agent A values transparency
and Agent B values confidentiality, and they must collaborate on a
task, what happens? Social choice theory (Arrow, 1951; Sen, 1970)
provides impossibility results about aggregating preferences;
mechanism design offers constructive solutions for specific settings.
The framework currently has no multi-agent primitives. Potential
approaches include shared value contracts (agents negotiate a common
subset before collaborating), value-aware delegation (an agent only
delegates to agents with compatible values), and meta-alignment (a
coordinator agent checks inter-agent value compatibility before
assembling a team). Each has tradeoffs in autonomy, efficiency, and
expressiveness.
6. Active inference as unifying framework
Question: Can the framework's layers be reformulated as components of an active inference generative model?
Active inference (Friston, 2010; Parr, Pezzulo & Friston, 2022) proposes that all adaptive behaviour can be understood as minimising variational free energy — the divergence between an agent's generative model of the world and its sensory observations. Under this account, beliefs are posterior expectations, desires are prior preferences, and actions are selected to resolve uncertainty. The framework's six layers could potentially be mapped onto different levels of a hierarchical generative model: values as deep priors (slow-changing, high-level), beliefs as posterior estimates, goals as expected free energy minima. This mapping is theoretically attractive because active inference provides principled answers to questions the framework currently handles heuristically (e.g., how much evidence should shift a belief? Active inference says: as much as the precision-weighted prediction error warrants). The practical challenge is that active inference implementations remain computationally demanding and require differentiable generative models, which most LLM-based agent runtimes do not provide.