Research Foundations
Academic theories that ground each layer of the motivational hierarchy, how the framework compares to existing approaches, what is genuinely operationalized versus aspirational, and the open questions that remain.
Theoretical foundations
This project is opinionated about which research traditions inspired its layer structure. It is not a faithful operationalisation of any single one of them. Each layer cites the literature it draws on; the docstrings in the code do the same. "Cited" is not "operationalised": some mappings are concrete (the Schwartz circumplex, BDI's three layers, SMART goal fields), some are heuristic (AGM-flavoured belief updates, self-concept inference), and several are inspirational only (SDT's internalisation continuum, Ikigai's four-domain intersection, ACT's full hexaflex). The "What we operationalize vs. what remains aspirational" section below classifies each one explicitly.
Each theory below notes which layer(s) of the hierarchy it inspires. Treat the citation as a pointer to the literature, not as a claim that the code reproduces the theory faithfully — see the operationalization breakdown for the per-theory verdict.
- Values
- Beliefs
- Purpose
- Self-Concept (opt-in)
- Desires
- Goals
Self-Determination Theory (Deci & Ryan, 2000)
Self-Determination Theory (SDT) is a macro-theory of human motivation built on a single, powerful claim: the quality of motivation matters more than the quantity. Deci and Ryan (2000) distinguish between intrinsic motivation — behaviour pursued for its inherent satisfaction — and extrinsic motivation, which spans a continuum from external regulation (compliance under threat) through introjection, identification, and finally integrated regulation (where an externally originating value has been fully assimilated into the self). This continuum is formalised in Organismic Integration Theory (OIT), one of SDT's six mini-theories.
Three basic psychological needs underpin the continuum: autonomy (the experience of volition), competence (effective interaction with the environment), and relatedness (connection with others). When these needs are satisfied, motivation moves toward the intrinsic end; when they are thwarted, it regresses toward the controlled end. This is not a Western-centric claim: a meta-analysis by Chen et al. (2015) spanning 18 studies across four continents confirmed the three-need structure, and the theory has been validated with data from over 100 countries (Vansteenkiste et al., 2020).
The framework operationalizes SDT in two places. First, the
Values layer captures the distinction between
internally driven and externally imposed motivators. A value's
weight encodes the degree to which it has been
"integrated" in the OIT sense: higher weight means the agent treats it
as more fundamental, not merely imposed. Second, the
Purpose layer is directly modelled on integrated
regulation — the point at which extrinsic goals become
self-concordant. PurposeEngine.check_consistency tests
whether stated purposes are coherent with the agent's values, which
mirrors SDT's claim that well-being and persistence arise from
motivational coherence, not from motivational intensity.
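A toy sketch can make the coherence claim concrete. The field names, the tag-overlap scoring, and this check_consistency signature are assumptions for illustration, not the library's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Value:
    name: str
    weight: float               # degree of OIT-style "integration": higher = more fundamental
    tags: set[str] = field(default_factory=set)

@dataclass
class Purpose:
    statement: str
    tags: set[str] = field(default_factory=set)

def check_consistency(purpose: Purpose, values: list[Value]) -> float:
    """Toy coherence score: weight-summed tag overlap between a purpose
    and the agent's values, normalised by total value weight."""
    total = sum(v.weight for v in values) or 1.0
    supported = sum(v.weight for v in values if v.tags & purpose.tags)
    return supported / total

values = [
    Value("transparency", weight=0.9, tags={"honesty", "openness"}),
    Value("efficiency", weight=0.3, tags={"speed"}),
]
purpose = Purpose("be an honest assistant", tags={"honesty"})
score = check_consistency(purpose, values)  # 0.9 / 1.2 = 0.75
```

The point of the sketch is the shape of the claim: coherence is measured against the weighted value structure, not against motivational intensity.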
Schwartz Theory of Basic Values (2012)
Schwartz's theory posits ten motivational categories — Self-Direction, Stimulation, Hedonism, Achievement, Power, Security, Conformity, Tradition, Benevolence, and Universalism — that are recognisable as distinct values across cultures. These categories are not independent; they are arranged on a circular continuum (the "circumplex") where adjacent categories share motivational roots and opposing categories represent structural tensions. Pursuing Achievement inherently trades off against Benevolence; prioritising Security pulls against Self-Direction. The structure is not a theory of what people should value but of how values are organised — a structural claim validated by Schwartz (2012) across 82 countries and over 75,000 respondents using the Portrait Values Questionnaire (PVQ).
The framework lifts this structure directly. ValueCategory
ships all ten Schwartz categories plus three agent-specific extensions
(INTEGRITY, GROWTH,
RELIABILITY) that occupy positions on the circumplex
without breaking its structural properties.
ValuesEngine.detect_structural_conflicts uses the
opposing-poles map (mirrored verbatim from Schwartz 2012) to flag when
an agent holds high-weight values at opposing positions — not to
forbid this (healthy tension can be productive), but to make it
visible and inspectable. The rank_for_context method
allows situational re-weighting within the constraints of the
hierarchy, reflecting Schwartz's finding that all ten values coexist
in every person but their relative importance shifts by context.
The opposing-poles map itself is defined as SCHWARTZ_OPPOSING_POLES in src/values/engine.py.
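A minimal sketch of how an opposing-poles map supports flag-not-forbid conflict detection. This uses a hypothetical two-entry subset of the relation and an invented threshold parameter; the real SCHWARTZ_OPPOSING_POLES map covers all ten categories and the ValuesEngine.detect_structural_conflicts signature may differ:

```python
# Illustrative subset of the circumplex's opposing-poles relation,
# following the tensions named in Schwartz (2012).
OPPOSING_POLES: dict[str, set[str]] = {
    "ACHIEVEMENT": {"BENEVOLENCE", "UNIVERSALISM"},
    "SECURITY": {"SELF_DIRECTION", "STIMULATION"},
}

def detect_structural_conflicts(
    weights: dict[str, float], threshold: float = 0.7
) -> list[tuple[str, str]]:
    """Flag (never forbid) pairs of high-weight values at opposing poles."""
    conflicts = []
    for cat, opposites in OPPOSING_POLES.items():
        if weights.get(cat, 0.0) < threshold:
            continue
        for opp in sorted(opposites):
            if weights.get(opp, 0.0) >= threshold:
                conflicts.append((cat, opp))
    return conflicts

agent = {"ACHIEVEMENT": 0.9, "BENEVOLENCE": 0.8, "SECURITY": 0.2}
print(detect_structural_conflicts(agent))  # [('ACHIEVEMENT', 'BENEVOLENCE')]
```

The return value is a visibility signal for inspection, consistent with the view that healthy tension can be productive.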
AGM belief revision (Alchourrón, Gärdenfors & Makinson, 1985)
The AGM postulates define three operations for rational belief change: expansion (adding a new belief without removing any), revision (adding a new belief while maintaining consistency, which may require removing existing beliefs), and contraction (removing a belief without adding new ones). The postulates specify constraints on each operation — notably that revision should be minimal (change as little as possible to accommodate new evidence) and that the resulting belief set should remain logically consistent.
Full AGM is impractical for a computational agent with non-logical
beliefs ("I believe this codebase is well-tested" is not a propositional
sentence). The framework intentionally simplifies: beliefs carry a
numeric confidence field, and
BeliefsEngine.add_evidence performs confidence-weighted
updates rather than logical contraction. New evidence specifies a
strength and a directional supports flag.
The update formula moves confidence toward 0 or 1 proportionally to
the evidence strength, weighted by the current confidence. This
preserves the AGM spirit — updates are strength-proportional and
directional, not overwriting — without requiring a formal logic engine.
Contraction is implicit: sufficiently strong disconfirming evidence
drives confidence below the active_threshold, after which
the belief is no longer consulted by AlignmentEngine during
alignment checks. This is a pragmatic compromise: the belief is not
deleted (it can recover if new supporting evidence arrives), but it
ceases to influence decisions. The docstring on
BeliefsEngine.add_evidence includes the derivation and
cites the specific AGM postulate each design choice relates to.
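The prose above pins down the shape of the update without stating a formula. One formula consistent with that description, as a sketch rather than necessarily the exact expression shipped in BeliefsEngine.add_evidence, is:

```python
def add_evidence(confidence: float, strength: float, supports: bool) -> float:
    """Confidence-weighted update: move toward 1 on supporting evidence and
    toward 0 on disconfirming evidence, proportionally to strength. The step
    shrinks as confidence nears a boundary, so updates are directional and
    strength-proportional rather than overwriting."""
    if supports:
        return confidence + strength * (1.0 - confidence)
    return confidence - strength * confidence

# Implicit contraction: repeated disconfirmation drives confidence down
# without deleting the belief, so it can recover later.
c = 0.8
for _ in range(3):
    c = add_evidence(c, strength=0.5, supports=False)
# c: 0.8 -> 0.4 -> 0.2 -> 0.1; once below an active_threshold of, say,
# 0.2, the belief would no longer be consulted during alignment checks.
```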
Ikigai and purpose (Mogi 2017; Sone et al. 2008)
Ikigai is a Japanese concept often glossed as "a reason for being." Western self-help literature typically reduces it to a Venn diagram of four circles (what you love, what you are good at, what the world needs, what you can be paid for), but authentic ikigai is broader and less transactional. Mogi (2017) describes it as the sense that one's daily actions are connected to something meaningful — a feeling of purposeful engagement that need not involve passion or profit. Importantly, ikigai is found in small daily routines as much as in grand life missions; it is not the Western "find your purpose" but the more modest "know why you get up in the morning."
The empirical evidence for ikigai's effects is striking. Sone et al. (2008) followed 43,391 Japanese adults (ages 40–79) over seven years in the Ohsaki Cohort Study. Participants who reported a sense of ikigai had significantly lower all-cause mortality and cardiovascular disease mortality, even after controlling for age, sex, education, BMI, smoking, alcohol, exercise, employment, marital status, and self-rated health. The hazard ratio for all-cause mortality among those without ikigai was 1.5 (95% CI 1.3–1.7) compared to those with it. This is a large, prospective, population-based study — not a convenience sample — and the effect persisted across sensitivity analyses.
The framework operationalizes ikigai through the
Purpose layer. PurposeEngine supports
primary, secondary, and contextual purposes, reflecting the finding
that purpose is not a single grand statement but a layered sense of
engagement across roles and contexts. The
check_consistency method tests whether an agent's stated
purposes cohere with its values and beliefs — a computational analog
of the psychological observation that purpose-driven behaviour is most
resilient when it is authentically connected to one's deeper
motivations, not externally imposed.
Acceptance and Commitment Therapy (Hayes, Strosahl & Wilson, 2011)
ACT is a clinical psychology framework built on the premise that psychological suffering is amplified by experiential avoidance — the attempt to suppress unwanted thoughts and feelings — and that well-being improves through psychological flexibility: the ability to contact the present moment fully, hold difficult experiences without being dominated by them, and move toward valued directions. The "hexaflex" model identifies six interconnected processes: acceptance, cognitive defusion, present-moment awareness, self-as-context, values clarification, and committed action. Of these, values clarification and committed action are directly relevant to agent design.
ACT's treatment of values is distinctive and important: values are
directions, not destinations. A value like
"contributing to knowledge" is never achieved; it is a compass bearing
that guides ongoing action. Goals, by contrast, are achievable
milestones along a valued direction. This distinction maps precisely
onto the framework's separation of the Values layer
(directions that persist indefinitely) from the Goals
layer (measurable, time-bound objectives that can be
completed). The Value model has no
completed field; the Goal model does. This
is not an implementation accident — it is ACT's
direction/destination distinction made structural.
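The structural claim fits in a few lines. This dataclass sketch mirrors the distinction; the shipped models are Pydantic and carry more fields, so these shapes are illustrative only:

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class Value:
    # A direction, not a destination: deliberately no `completed` field.
    name: str
    weight: float

@dataclass
class Goal:
    # A milestone along a valued direction: time-bound and completable.
    title: str
    deadline: Optional[str] = None  # ISO date string for the sketch
    progress: float = 0.0
    completed: bool = False

value_fields = {f.name for f in fields(Value)}
goal_fields = {f.name for f in fields(Goal)}
assert "completed" not in value_fields and "completed" in goal_fields
```

Making the distinction structural means no code path can ever mark a value "done", which is exactly ACT's point.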
ACT also informs the Desires layer. In ACT, desires
are the motivational bridge between abstract values and concrete
committed action. DesiresEngine occupies exactly this
position in the hierarchy: desires carry
linked_value_ids (connecting them to the values they
serve) and generate_goal_candidates (translating
aspirational motivation into actionable objectives). The
intensity field on desires reflects ACT's observation
that motivation toward a valued direction fluctuates without
invalidating the underlying value.
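A sketch of the bridge position desires occupy. The Desire shape, the min_intensity cutoff, and this generate_goal_candidates signature are hypothetical, chosen to illustrate the values-to-goals translation:

```python
from dataclasses import dataclass, field

@dataclass
class Desire:
    description: str
    intensity: float                     # fluctuates without invalidating the value
    linked_value_ids: list[str] = field(default_factory=list)

def generate_goal_candidates(desires: list[Desire], min_intensity: float = 0.5):
    """Toy translation of aspirational motivation into actionable candidates:
    only sufficiently intense desires yield goal candidates, each tagged
    with the values it serves."""
    return [
        {"title": f"Goal: {d.description}", "serves": d.linked_value_ids}
        for d in desires
        if d.intensity >= min_intensity
    ]

desires = [
    Desire("share findings openly", intensity=0.8, linked_value_ids=["v-transparency"]),
    Desire("tidy old notes", intensity=0.2),
]
candidates = generate_goal_candidates(desires)  # only the first desire qualifies
```

Low-intensity desires are filtered out of goal generation but not deleted, matching the observation that fluctuating motivation does not invalidate the underlying value.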
BDI architecture (Bratman, 1987; Rao & Georgeff, 1995)
The Belief–Desire–Intention (BDI) model is the canonical deliberative agent architecture in both philosophy and computer science. Bratman (1987) introduced it as a theory of practical reasoning, arguing that intentions are irreducible to beliefs and desires: an intention is a commitment to a plan that constrains future deliberation (you don't reconsider from scratch at every step). Rao and Georgeff (1995) formalised BDI as a computational architecture with branching-time temporal logic, demonstrating that agents with explicit intentions can exhibit more stable and efficient behaviour than purely reactive or purely planning-based agents.
Three of the framework's six layers correspond directly to BDI:
- Beliefs map to BeliefsEngine — the agent's model of the world, updated by evidence.
- Desires map to DesiresEngine — motivational drivers that the agent has not yet committed to.
- Intentions map to GoalsEngine — committed, time-bound, decomposable objectives. In BDI terminology, a goal is a desire that has been "adopted" and given a plan.
The framework extends BDI in three directions. Values add a layer above beliefs that BDI externalised: classical BDI has no formal account of where desires come from or what makes some desires more admissible than others. Purpose adds a stable identity layer that classical BDI subsumed into the agent's initial state. Tasks add a layer below intentions that BDI typically delegated to the plan library. The result is a six-layer hierarchy that preserves BDI's core insight (intentions as commitments that constrain deliberation) while adding the motivational structure that classical BDI left implicit.
Competitive landscape
The Agent Values Framework does not exist in a vacuum. Four categories of existing work address overlapping parts of the problem: cognitive architectures, agent frameworks, value alignment techniques, and generative agent research. Understanding what each provides — and what it does not — clarifies the specific gap this framework occupies. The comparison table below provides a structural overview; the prose sections that follow offer a fair, detailed assessment.
| Approach | Layer structure | Storage | Event system | Coupling | License |
| --- | --- | --- | --- | --- | --- |
| Agent Values Framework (this project) | 6 core + 1 opt-in (values, beliefs, purpose, [self-concept], desires, goals, tasks) | BYO (Protocol) | Yes | Modules independent; alignment composes | MIT |
| BDI (Bratman 1987) | 3 (beliefs, desires, intentions) | Implementation-defined | Implementation-defined | Rigid — intentions tightly coupled | n/a (theoretical) |
| Soar | Procedural / semantic / episodic memory | Built-in working memory | No (cycle-based) | Monolithic kernel | BSD-style |
| ACT-R | Buffers + production system | Built-in declarative memory | No (cycle-based) | Module-based but tight coupling | Free for academic use |
The interactive version of this page also positions each approach on a scatter plot with two axes: theoretical depth and integration readiness.
Cognitive architectures (Soar, ACT-R, LIDA)
Soar (Laird, 2012) is a general cognitive architecture with a production-rule working memory, universal subgoaling, chunking for learning, and three long-term memory systems (procedural, semantic, episodic). It models a complete cognitive agent, including motivation-adjacent mechanisms: impasses in its state stack trigger subgoals automatically, and appraisal functions in Soar's emotional module (Marinier, Laird & Lewis, 2009) can influence operator selection. ACT-R (Anderson et al., 2004) takes a modular approach with declarative and procedural memory buffers, subsymbolic activation equations, and a production-matching cycle that governs all cognition. LIDA (Franklin et al., 2016) explicitly includes motivational codelets — small processes that detect conditions requiring attention — and a global workspace that broadcasts the most salient content to all modules.
These architectures are comprehensive but monolithic. Their value to the field is immense: decades of cognitive modelling research have produced deep insights into memory, attention, learning, and deliberation. However, they are designed as complete cognitive kernels, not composable libraries. You cannot extract Soar's motivation subsystem and embed it in a LangChain pipeline. Motivation in these systems is implicit — distributed across production-rule preferences, memory activations, and appraisal functions — not represented as an explicit, queryable, versionable structure. Their integration cost for production agent builders is prohibitive: adopting Soar means running your entire agent inside Soar's cognitive cycle.
The Agent Values Framework takes inspiration from these systems — particularly the insight that motivation must be structural, not a prompt — while delivering a fundamentally different artifact: a composable library with a four-method storage Protocol, not a cognitive kernel with a working-memory architecture.
Agent frameworks (LangChain, CrewAI, AutoGen)
Modern LLM-based agent frameworks excel at orchestration: tool calling, retrieval-augmented generation, multi-agent coordination, planning, and memory management. LangChain provides a rich ecosystem of chains, agents, and tools. CrewAI adds role-based multi-agent collaboration with delegation. AutoGen (Wu et al., 2023) enables conversational multi-agent patterns with human-in-the-loop capabilities. These frameworks answer the question of how an agent acts — the mechanics of perception, reasoning, and action.
What none of them provides is a motivational layer. "Personality" in CrewAI is a string field in an agent's YAML configuration — structurally identical to a system prompt. LangChain's agent memory stores conversation history and retrieved documents, not values, beliefs, or purposes. AutoGen's agents have system messages and can be configured with different LLMs, but there is no mechanism for an agent's principles to constrain its goals, for evidence to update its beliefs, or for an alignment check to run before a tool is invoked. The question of why an agent should act a certain way, and whether its actions are consistent with its principles, is left entirely to the integrator.
This is not a criticism of these frameworks — they solve different
problems, and solve them well. The Agent Values Framework is designed
to embed alongside your orchestration framework, not replace
it. Wire AlignmentEngine.check_alignment into your
agent's decision loop, and the motivational layer runs before each
action without requiring you to change your tool-calling or planning
infrastructure. See the integration
patterns for concrete examples.
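A sketch of that wiring, with a stub standing in for the real engine. The Report shape and the check_alignment signature used here are assumptions for illustration:

```python
from typing import Callable, Protocol

class Report:
    def __init__(self, ok: bool, reasons: list[str]):
        self.ok, self.reasons = ok, reasons

class AlignmentChecker(Protocol):
    # Stand-in for the alignment engine; exact signature is an assumption.
    def check_alignment(self, action: str) -> Report: ...

def guarded_step(act: Callable[[str], object], checker: AlignmentChecker, action: str):
    """Run the motivational layer before each action; the orchestration
    framework's tool-calling and planning code stays untouched."""
    report = checker.check_alignment(action)
    if not report.ok:
        return {"skipped": action, "reasons": report.reasons}
    return act(action)

class StubChecker:
    # Hypothetical rule: block anything that looks destructive.
    def check_alignment(self, action: str) -> Report:
        ok = "delete" not in action
        return Report(ok, [] if ok else ["conflicts with value: reliability"])

result = guarded_step(lambda a: f"ran {a}", StubChecker(), "delete prod table")
# result == {"skipped": "delete prod table", "reasons": [...]}
```

Because the checker is a Protocol, any agent loop that can call a function before acting can adopt this pattern without restructuring.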
Value alignment approaches (Constitutional AI, RLHF, CIRL)
The AI safety community has produced powerful techniques for embedding values into language models at training time. RLHF (Christiano et al., 2017; Ouyang et al., 2022) uses human preference data to fine-tune model behaviour. Constitutional AI (Bai et al., 2022) trains models to critique and revise their own outputs against a set of principles. Cooperative Inverse Reinforcement Learning (CIRL; Hadfield-Menell et al., 2016) frames alignment as a cooperative game where the agent infers human preferences through interaction.
These techniques operate at the model level: they shape the behaviour of every agent that uses the model. This is the right approach for establishing baseline safety guarantees, but it leaves three gaps that agent-level alignment must fill:
- Not per-agent configurable. An operator cannot say "this agent should prioritise transparency over efficiency" without fine-tuning or distilling a new model. RLHF values are baked into the weights, not parameterised per deployment.
- Not inspectable. The values are implicit in billions of parameters. You cannot query a model to ask "what do you value?" and receive a structured, auditable answer. You can only observe behaviour and infer. This makes compliance auditing and post-incident analysis difficult.
- Frozen at training time. The agent cannot update its values in response to new evidence, changing context, or operator policy changes. Constitutional AI's principles are fixed in the training constitution; they do not evolve at runtime.
The Agent Values Framework operates in the complementary space: explicit, per-agent, runtime-evolvable values that sit above the model layer. It does not replace RLHF or Constitutional AI; it adds a structured overlay that makes agent-specific values inspectable and mutable. The two approaches are defence-in-depth: model-level alignment provides the floor, agent-level alignment provides the ceiling.
Generative agents (Park et al., 2023)
Park et al. (2023) demonstrated that LLM-based agents given a natural language description of personality, background, and goals can produce remarkably human-like behaviour in a simulated town environment. Their agents maintain a memory stream of observations, perform periodic "reflection" to synthesise higher-level insights, and use retrieval to condition behaviour on relevant memories. The work is influential and demonstrates the power of LLM-based social simulation.
However, the motivational structure in generative agents is primitive and non-reusable. Identity is a natural-language paragraph, not a structured representation. Reflection produces prose summaries stored as additional memories, not typed data structures that can be queried, compared, or audited. There is no distinction between values (stable principles) and desires (transient motivations), no formal conflict detection, no alignment checking mechanism, and no event system for integration. The architecture is tightly coupled to the social simulation setting — extracting the "motivation" component and embedding it in a production agent is not feasible without reimplementation.
The Agent Values Framework addresses the reusability gap: typed models, structured storage, explicit conflict detection, and a clean Protocol boundary that lets any agent runtime integrate motivational reasoning without adopting a specific simulation architecture. The research question of whether structured motivation produces more coherent long-term behaviour than natural-language personality descriptions remains open — see Open research questions below.
What we operationalize vs. what remains aspirational
Translating psychological theory into production code requires difficult choices. Some constructs map cleanly onto data structures and algorithms; others resist operationalization entirely. This section provides an honest assessment of where the framework stands — organised by which constructs the v0.1 experiment showed doing measurable work, rather than by which layers ship in the package (per ADR-011 Decision 2). We do this not as an exercise in modesty but because users of the framework need to know what the code actually does versus what the theory suggests it could do.
Fully operationalized
These constructs have shipped code with comprehensive tests, and the code faithfully implements the cited theory. Constructs marked Tier 1 additionally showed measurable work on the v0.1 experimental seed (per ADR-011 Decision 2); constructs marked Tier 2 / Tier 3 are implemented faithfully but were not exercised by the v0.1 seed:
- Schwartz value categories and structural conflicts. (Tier 1.) All ten categories, three agent extensions, the opposing-poles map, and conflict detection are implemented and tested. The circumplex structure is faithfully preserved. (324 tests, including edge cases for every opposing pair.)
- Confidence-weighted belief revision. (Tier 2 — present, but not yet shown to add measurable benefit on the experimental seed; ADR-011 Decision 2.) The AGM-inspired update formula in BeliefsEngine.add_evidence handles supporting and disconfirming evidence, strength weighting, and implicit contraction through confidence decay below active_threshold. The derivation is documented in the docstring.
- BDI three-layer mapping. (Beliefs is Tier 2; Desires and Goals are Tier 3 — research/experimental, not exercised by the v0.1 experimental seed; ADR-011 Decision 2.) Beliefs, Desires, and Goals (as intentions) are implemented as independent engines that compose through AlignmentEngine. The commitment semantics of intentions — goals are not casually abandoned once adopted — are enforced through validated status transitions in GoalsEngine.set_status.
- ACT values-as-directions. The structural distinction between values (no completed field, indefinite persistence) and goals (time-bound, completable) is enforced at the model level by Pydantic validation.
- SMART goal criteria. Goal models carry measurability, deadline, and progress fields. check_deadlines and update_progress (with auto-completion at 1.0) operationalize the criteria directly.
- Downward constraint and upward evidence flow. (Tier 1 — AlignmentEngine is the composer that did measurable work on the experimental seed; ADR-011 Decision 2.) AlignmentEngine.check_alignment enforces downward constraint (values constrain goals), while full_audit detects drift and inconsistency across all layers. Event emission (on(event, handler)) enables upward evidence flow by letting task outcomes update beliefs.
Partially operationalized
These constructs have code, but the code captures only a subset of what the underlying theory describes:
- SDT's internalisation continuum. The Value weight field captures the degree of internalisation but not the process. OIT describes a progression from external regulation through introjection, identification, and integration. The framework has no mechanism for a value to move along this continuum over time in response to experience. An integrator can update weights manually, but the framework does not model the progression itself.
- Ikigai as multi-dimensional purpose. PurposeEngine supports primary, secondary, and contextual purposes, but the model does not capture the four dimensions of authentic ikigai (what you love, what you are good at, what the world needs, what you can be paid for). Purpose is a text description with a consistency check, not a structured intersection of capabilities, needs, and satisfactions.
- Belief time-decay. BeliefsEngine.decay_stale implements linear time-based confidence decay, but AGM and Bayesian epistemology suggest richer models (e.g., exponential decay, context-dependent forgetting, source credibility weighting). The current implementation is the cheapest defensible choice, not the theoretically optimal one.
- Alignment evaluation depth. The default rule-based evaluator is deliberately simple: it composes the per-layer signals (value keyword match, whole-token belief contradiction, purpose role/tag overlap, goal title overlap, optional self-concept axis) into a single report. It does not perform semantic reasoning about whether a goal's content actually advances a value's intent. The optional LLMEvaluator hook is BYO-client and can do richer evaluation; quality there depends on the model.
- Self-Concept layer (Bem, Erikson, Damasio, McAdams). (The append-only episode stream is Tier 1 — it drove the +0.11 Δ on McAdams Meaning-made on the experimental seed; the three integration loops remain heuristic per ADR-011 Decision 2.) The opt-in module implements an append-only autobiographical episode stream with three integration processes: infer_from_behaviour (cited as inspired by Bem self-perception), integrate_lessons (cited as inspired by SDT organismic integration), and check_identity_drift (cited as inspired by Erikson coherence). All three are token-overlap heuristics over claim text and episode summaries — diagnostic signals, not faithful psychological models. When wired into AlignmentEngine, the coherence score and drift signals appear in full_audit(); the same evaluator can also be wired to surface a self-concept consistency axis on check_alignment. Opt-in, additive, and intentionally conservative — identity drift surfaces as a flag, never as a block.
Not yet operationalized
These constructs are described in the research literature and referenced in this project's documentation but do not yet have full implementations:
- ACT's full hexaflex. Of ACT's six processes, only values clarification and committed action are operationalized. The remaining four (acceptance, cognitive defusion, present-moment awareness, self-as-context) describe internal psychological processes that resist direct computational analogy.
- Schwartz's refined 19-value model. Documented as a future option but not implemented. The ten-category model is sufficient for current use cases.
- Multi-agent value coordination. How should multiple AVF-equipped agents negotiate when their values conflict? Cooperative game theory and social choice theory provide frameworks, but no implementation exists. This is an open research question.
- Active inference integration. Active inference (Friston, 2010) offers a unifying framework where beliefs, desires, and actions are all explained as free-energy minimisation. Mapping AVF's layers onto active inference's generative model is theoretically promising but unexplored.
Open research questions
The library's design is grounded in the cited theories, and its correctness is verified by 371 pytest cases plus 57 correctness benchmarks (30 Tier 1 + 27 Tier 2). What remains undemonstrated is whether using this library actually produces measurably better agent behaviour. This section identifies six research questions that the project considers in scope but unresolved.
1. Empirical validation
Question: Do AVF-equipped agents exhibit more coherent, auditable, and value-consistent behaviour than agents without structured motivation, in real-world deployments?
The Tier-3 benchmark protocol defines how this evidence could be gathered: deploy two agent populations (with and without AVF) on identical tasks, measure coherence (do similar situations produce consistent decisions?), auditability (can a reviewer trace a decision to a specific value?), and adaptiveness (do agents update their behaviour appropriately in response to new evidence?). The protocol is documented; the harness requires a live agent integration to execute. Until that evidence exists, the framework's value proposition rests on architectural arguments (inspectability, composability, traceability) and on the strength of the underlying theories.
2. Cross-cultural value sets
Question: Should integrators be able to swap the Schwartz circumplex for an alternative value taxonomy without losing structural conflict detection?
Schwartz claims universality, and the 82-country validation provides
strong evidence. But the cross-cultural literature is more nuanced
than "ten values are universal." Inglehart and Welzel's (2005)
two-dimensional cultural map, Hofstede's (2001) cultural dimensions,
and indigenous psychological frameworks (e.g., Hwang, 2012, on
Confucian relational self) suggest that the mapping from values to
categories is culturally inflected even if the categories themselves
are recognisable. The framework currently hardcodes
ValueCategory and SCHWARTZ_OPPOSING_POLES.
Making these pluggable would allow integrators to adapt the value
taxonomy to their deployment context without forking the library.
The architectural cost is modest (parameterise the circumplex); the
research cost is high (validating alternative taxonomies).
3. Belief decay function
Question: What is the right decay function for belief confidence over time?
The current implementation uses linear time-decay: confidence decreases proportionally to the time since the last evidence update. This is the cheapest defensible choice but is almost certainly wrong for many domains. Ebbinghaus's forgetting curve (1885) is exponential. Bayesian models suggest that decay should depend on the precision of prior evidence, not just elapsed time. Domain-specific factors matter: a belief about "the production database is healthy" should decay faster than a belief about "Python is a good language for prototyping." The framework should expose the decay function as a pluggable strategy, allowing integrators to supply domain-specific models (exponential, Bayesian, stepped, or no-decay-at-all for axiomatic beliefs). The storage and engine architecture supports this without structural changes.
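The pluggable-strategy idea can be sketched as a registry of decay callables. The names and signatures below are hypothetical; the shipped behaviour lives in BeliefsEngine.decay_stale:

```python
from typing import Callable, Dict

# (confidence, days since last evidence) -> new confidence
DecayFn = Callable[[float, float], float]

def linear_decay(conf: float, days: float, rate: float = 0.01) -> float:
    return max(0.0, conf - rate * days)       # current default: proportional to elapsed time

def exponential_decay(conf: float, days: float, half_life: float = 30.0) -> float:
    return conf * 0.5 ** (days / half_life)   # Ebbinghaus-flavoured forgetting

def no_decay(conf: float, days: float) -> float:
    return conf                               # for axiomatic beliefs

STRATEGIES: Dict[str, DecayFn] = {
    "linear": linear_decay,
    "exponential": exponential_decay,
    "axiomatic": no_decay,
}

# An engine could accept a strategy per belief or per domain:
decayed = STRATEGIES["exponential"](0.8, 30.0)  # halves to 0.4 after one half-life
```

Because each strategy shares the same signature, swapping in a Bayesian or stepped model is a one-line configuration change rather than a structural one.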
4. Cold-start problem
Question: How should an agent's motivational hierarchy be initialised before it has any operational history?
The CLI provides scaffolding (python -m agent_values init
generates a starter hierarchy; seed populates it from a
template), but thoughtful seeding matters more than tooling. A
carelessly initialised hierarchy provides false confidence in
alignment checks. The research question has two dimensions: (1) Can
a seeding interview — a structured elicitation protocol administered
to the operator — reliably produce a value hierarchy that the
operator recognises as accurate? (2) Can an agent bootstrap its own
hierarchy through early interactions, starting with a minimal seed
and expanding as evidence accumulates? The second path is more
autonomous but riskier: value drift during the bootstrap period could
entrench poor initial conditions.
5. Multi-agent value coordination
Question: When multiple AVF-equipped agents interact, how should value conflicts between agents be detected and resolved?
Intra-agent alignment — checking consistency within one agent's
hierarchy — is well-specified by AlignmentEngine.
Inter-agent alignment is unexplored. If Agent A values transparency
and Agent B values confidentiality, and they must collaborate on a
task, what happens? Social choice theory (Arrow, 1951; Sen, 1970)
provides impossibility results about aggregating preferences;
mechanism design offers constructive solutions for specific settings.
The framework currently has no multi-agent primitives. Potential
approaches include shared value contracts (agents negotiate a common
subset before collaborating), value-aware delegation (an agent only
delegates to agents with compatible values), and meta-alignment (a
coordinator agent checks inter-agent value compatibility before
assembling a team). Each has tradeoffs in autonomy, efficiency, and
expressiveness.
6. Active inference as unifying framework
Question: Can the framework's layers be reformulated as components of an active inference generative model?
Active inference (Friston, 2010; Parr, Pezzulo & Friston, 2022) proposes that all adaptive behaviour can be understood as minimising variational free energy — the divergence between an agent's generative model of the world and its sensory observations. Under this account, beliefs are posterior expectations, desires are prior preferences, and actions are selected to resolve uncertainty. The framework's six layers could potentially be mapped onto different levels of a hierarchical generative model: values as deep priors (slow-changing, high-level), beliefs as posterior estimates, goals as expected free energy minima. This mapping is theoretically attractive because active inference provides principled answers to questions the framework currently handles heuristically (e.g., how much evidence should shift a belief? Active inference says: as much as the precision-weighted prediction error warrants). The practical challenge is that active inference implementations remain computationally demanding and require differentiable generative models, which most LLM-based agent runtimes do not provide.