The authoritative changelog lives in CHANGELOG.md. This page mirrors it.
v0.3.0 — 2026-05-13
Breaking change from v0.2.x. The schema does not migrate from older versions; delete .mareforma/graph.db to start fresh. claims.toml
at the project root is a human-readable record of the prior state, but
the prev_hash chain and per-claim signatures cannot be reconstructed
from it, so it is a reference, not a backup.
What ships in v0.3.0:
- Ed25519 claim signing with optional Sigstore-Rekor transparency log
- Artifact-hash gate on REPLICATED — converging peers that both supply a SHA-256 must agree
- Identity-gated graph.validate() with a per-project validators table and signed enrollment chain
- DOI resolution against Crossref + DataCite with a persistent cache
- DB-layer state-machine triggers + append-only prev_hash chain: the storage layer rejects illegal transitions
- Cycle / self-loop detection on supports[] at INSERT and UPDATE
- ESTABLISHED-upstream requirement for REPLICATED + signed seed-claim bootstrap (Cochrane / GRADE evidence chains; no replication-of-noise)
- JSON-LD export in a mareforma-native vocabulary
- SCITT-style signed export bundle + mareforma verify CLI
- In-toto Statement v1 + DSSE v1 PAE envelope on every signed claim, GRADE 5-domain EvidenceVector inside every signed predicate, signed verdict-issuer protocol that any third party can integrate against (see below)
- RFC 8785-strict canonical JSON for every signed payload: cross-language verifiers in Go, Rust, or JS now read byte-identical bytes from a mareforma envelope. Adds rfc8785>=0.1 runtime dep.
- Operator surfaces: graph.health() single-call audit summary, graph.refresh_convergence() to retry promotions whose detection swallowed an error, graph.refresh_all_dois() to force-re-check DOIs for retraction drift, graph.find_dangling_supports() to audit UUID refs that point nowhere, graph.classify_supports() to inspect the substrate’s claim/doi/external classification.
- Validation envelope binds evidence_seen: pass graph.validate(claim_id, evidence_seen=[upstream_id, ...]) to record which claims the validator reviewed before signing. Bound into the signed payload alongside (claim_id, validator_keyid, validated_at). Empty list is a positive “I reviewed nothing” admission. Substrate verifies each cited claim exists and predates validation.
- Rekor saga atomicity via a new rekor_inclusions sidecar table. When a Rekor submission succeeds but the local row-UPDATE fails, the sidecar preserves the coords so refresh_unsigned() replays the UPDATE without re-submitting (no duplicate log entries). Append-only at the trigger level.
- Strict UUIDv4 in claim_id pattern. Non-v4 UUID-shapes in supports[] are now classified as external references rather than dangling claim_ids.
- RFC 6962 Merkle inclusion-proof verification (opt-in). Pass rekor_log_pubkey_pem (or rekor_log_pubkey_path) to mareforma.open() and every signed-claim submit + every refresh_unsigned() re-fetches the entry from Rekor and cryptographically verifies the Merkle audit path against the log’s signed checkpoint. Verification failure refuses to mark transparency_logged=1. Supports Ed25519 (private Rekor) + ECDSA secp256r1 (public Sigstore Rekor). The supplied PEM persists to .mareforma/rekor_log_pubkey.pem as a TOFU pin; silent rotation is refused on subsequent opens; the first-pin write is atomic (O_CREAT|O_EXCL). New RekorInclusionError exception with a stable .reason token taxonomy. Restore-time re-verification of stored proofs is on the deferred-features list (needs rekor_inclusions sidecar round-tripped through claims.toml).
- Defense-in-depth on db.validate_claim. Direct callers of the substrate function (not just EpistemicGraph.validate) get the full gate sequence: cryptographic envelope verification, LLM-type ceiling refusal, self-validation refusal, payload-field equality vs the row + kwargs. New InvalidValidationEnvelopeError for structural / cryptographic envelope failures, distinct from EvidenceCitationError for citation-list failures.
- All documented exceptions re-exported at the top level. from mareforma import RekorInclusionError works without remembering the submodule path. 19 exception classes total, alphabetical under MareformaError.
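The strict-UUIDv4 rule above can be illustrated with a small classifier. This is a minimal sketch, not mareforma's implementation: the function name `classify_support` and the DOI pattern are assumptions made for illustration, and only the claim/doi/external split and the v4-only rule come from the notes above.

```python
import re

# Strict UUIDv4: the version nibble is 4 and the variant nibble is 8, 9, a, or b.
UUID_V4 = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}"
)
# Assumption: a DOI is recognised by its "10.<registrant>/<suffix>" shape.
DOI = re.compile(r"10\.\d{4,9}/\S+")


def classify_support(ref: str) -> str:
    """Return "claim", "doi", or "external" for one supports[] entry (hypothetical helper)."""
    if UUID_V4.fullmatch(ref.lower()):
        return "claim"
    if DOI.fullmatch(ref):
        return "doi"
    # Anything else, including UUID-shaped strings that are not v4, is external.
    return "external"
```

Under this sketch a UUIDv1-shaped string falls through to "external" instead of being reported as a dangling claim_id.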
- In-toto Statement v1 + DSSE v1 PAE envelope. Every signed claim is now a DSSE envelope (payloadType=application/vnd.in-toto+json) wrapping an in-toto Statement v1 (predicateType=urn:mareforma:predicate:claim:v1). Standards-aligned; introspectable by cosign, GUAC, and any in-toto-aware tool without a mareforma-specific verifier. The signature covers the DSSE Pre-Authentication Encoding (PAE), not the payload bytes alone, so a signature on (typeA, payload) cannot be replayed as a signature on (typeB, payload).
- GRADE 5-domain EvidenceVector carried inside every signed claim’s predicate. Five downgrade domains (risk_of_bias, inconsistency, indirectness, imprecision, publication_bias) each in [-2, 0], three upgrade flags (large_effect, dose_response, opposing_confounding), rationale dict (required for any nonzero domain), and reporting_compliance list. Bound into the signature; denormalized into ev_* columns for queryable filters; restore re-derives the canonical bytes and refuses any TOML-tampered upgrade.
- Verdict-issuer protocol. Two new tables, replication_verdicts and contradiction_verdicts, accept signed verdicts from any enrolled validator. The OSS substrate ratifies what enrolled identities sign; the predicates that PRODUCE verdicts (semantic-cluster, cross-method, hash-match, shared-resolved-upstream, contradiction-detection) live outside the OSS and call Graph.record_replication_verdict() / Graph.record_contradiction_verdict(). New VerdictIssuerError exception covers the gates: signer must be enrolled (chain walk back to a self-signed root), referenced claim must exist, method must be in the allowed enum, contradiction member != other.
- t_invalid derived state. New nullable column on claims. The contradiction_invalidates_older AFTER INSERT trigger on contradiction_verdicts sets t_invalid on the older of the two referenced claims (lex-smaller claim_id as deterministic tie-break when timestamps collide; idempotent via WHERE t_invalid IS NULL). validate_claim refuses to promote a t_invalid claim: a signed contradiction is terminal evidence.
- include_invalidated kwarg on graph.query(), graph.search(), graph.replication_verdicts(), graph.contradiction_verdicts(). Defaults to False: invalidated claims and the verdicts that reference them are excluded from default reads. Pass True for audit / history queries.
- Append-only over the signed predicate. New claims_signed_fields_no_laundering BEFORE UPDATE trigger refuses direct-SQL mutation of any signed-predicate column on rows whose signature_bundle IS NOT NULL. Value-comparison fires only when something actually changed, so multi-column UPDATEs that re-emit unchanged values pass through. A tampered Python interpreter cannot relax this.
- Append-only verdicts. *_append_only + *_no_delete triggers refuse UPDATE on signed columns and any DELETE on both verdict tables. The envelope is the source of truth.
- PRAGMA foreign_keys = ON. Set on every open_db(). The verdict tables’ FK references to validators(keyid) and claims(claim_id) are now enforced: direct-SQL INSERTs with fabricated keyids fail at the SQL layer, not just in Python.
- Subject ↔ predicate consistency. claim_predicate_from_envelope() refuses envelopes where subject[0].name or subject[0].digest.sha256 disagree with the predicate’s claim_id or text. Caught at the envelope-decode layer.
- Restore extensions. claims.toml round-trip now covers both verdict tables (signatures base64-encoded). Each verdict’s signature is cryptographically verified against the enrolled issuer’s pubkey before INSERT. Verdicts are replayed in created_at order so the trigger’s WHERE t_invalid IS NULL guard preserves the truthful first-invalidation moment. transparency_logged=true in TOML is downgraded to 0 when the bundle has no rekor block, so hand-edited TOML cannot fake a Rekor inclusion. New adversarial tests for tampered EvidenceVector, swapped statement_cid, tampered verdict fields, and forged issuer_keyid.
- New modules: mareforma._canonical (NFC + sorted-keys + no-whitespace + allow_nan=False canonical JSON), mareforma._statement (in-toto Statement v1 builder + statement_cid computation), mareforma._evidence (stdlib-dataclass EvidenceVector with __post_init__ validator). No pydantic dependency added; mareforma stays at 5 runtime deps.
- mareforma.signing.dsse_pae() is public so external verifiers can independently re-derive the bytes the signature covers.
- canonical_statement(claim_fields, evidence) replaces the legacy canonical_payload for chain-hash + signature inputs; the old shim is removed because it silently desynced from production bytes.
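For readers writing an external verifier, the DSSE v1 Pre-Authentication Encoding can be re-derived in a few lines. The sketch below follows the public DSSE specification; the function name `pae` is hypothetical and the signature of mareforma.signing.dsse_pae() is not documented here, so treat this as an independent illustration, not mareforma's code.

```python
def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 PAE: "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body.

    LEN is the ASCII decimal byte length, with no leading zeros.
    """
    type_bytes = payload_type.encode("utf-8")
    return b" ".join(
        [
            b"DSSEv1",
            str(len(type_bytes)).encode("ascii"),
            type_bytes,
            str(len(payload)).encode("ascii"),
            payload,
        ]
    )


# The signature is computed over these bytes, so the payload type is bound in.
signed_bytes = pae("application/vnd.in-toto+json", b'{"_type":"..."}')
```

Because the payload type and both lengths are part of the signed bytes, the same payload under a different type yields different bytes, which is what blocks the (typeA, payload) to (typeB, payload) replay described above.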
Identity, signing, transparency
- Ed25519 claim signing. Run mareforma bootstrap once to generate a keypair at ~/.config/mareforma/key (XDG-compliant, mode 0600). Every assert_claim then signs before INSERT. The signed payload binds claim_id, text, classification, generated_by, supports, contradicts, source_name, artifact_hash, and created_at, so any tamper breaks verification.
- Append-only invariant. Signed claims refuse mutation of any signed-surface field. update_claim(text=...) / update_claim(supports=...) / update_claim(contradicts=...) on a signed row raise SignedClaimImmutableError. status and comparison_summary remain editable.
- Sigstore-Rekor transparency log. mareforma.open(rekor_url=mareforma.signing.PUBLIC_REKOR_URL) submits every signed claim at INSERT time. Submission failure persists the claim with transparency_logged=0 and blocks REPLICATED until graph.refresh_unsigned() succeeds.
- Identity-gated graph.validate(). The loaded signer must be enrolled in the project’s validators table. The first key opened against a fresh graph auto-enrolls as the root validator (silent self-signed enrollment with a UserWarning). The validation event itself is signed: a DSSE-style envelope binding (claim_id, validator_keyid, validated_at) is persisted to the row’s validation_signature column.
- New mareforma validator add / mareforma validator list subcommands. Each enrollment is signed by the parent validator and is_enrolled walks the chain back to a self-signed root before accepting a row, so direct sqlite INSERTs with a fabricated parent do not pass. Singleton-root invariant + 64-hop walk cap defend against DoS-by-planted-chain.
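The chain walk with a hop cap can be sketched as follows. This is a structural illustration only: the `parents` mapping and the function name `walks_to_root` are hypothetical, and the sketch deliberately omits the per-hop signature verification that the real is_enrolled check performs.

```python
MAX_HOPS = 64  # walk cap taken from the notes above


def walks_to_root(keyid: str, parents: dict[str, str]) -> bool:
    """Return True if keyid chains back to a self-signed root within MAX_HOPS.

    parents maps each enrolled keyid to the keyid that enrolled it;
    the root maps to itself (self-signed). Signature checks are omitted.
    """
    current = keyid
    for _ in range(MAX_HOPS):
        parent = parents.get(current)
        if parent is None:
            return False  # missing or fabricated parent
        if parent == current:
            return True  # reached the self-signed root
        current = parent
    return False  # cap exceeded: planted long chain or a cycle
```

The cap bounds the work an attacker can force with a planted chain: a loop between two rows or an arbitrarily long chain both end in a refusal after at most 64 lookups.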
Storage-layer state machine
- DB-layer state-machine triggers. Two BEFORE triggers enforce PRELIMINARY → REPLICATED → ESTABLISHED at the storage layer; direct PRELIMINARY → ESTABLISHED is rejected; ESTABLISHED rows require validation_signature. Illegal transitions surface as IllegalStateTransitionError with a parsed <from>-><to> string instead of an opaque CHECK CONSTRAINT FAILED.
- Append-only hash chain. New claims.prev_hash column carries sha256(prev_chain_link || canonical_payload). UNIQUE partial index + BEGIN IMMEDIATE together prevent branched chains from concurrent writers or manual SQL tamper. New ChainIntegrityError.
- Cycle / self-loop detection. A claim whose supports[] would create a cycle (directly or via a chain) raises CycleDetectedError at INSERT and at UPDATE. Forward-walk DFS, depth-capped at 1024 hops. DOI strings in supports[] are not graph nodes and are skipped.
- ESTABLISHED-upstream requirement for REPLICATED. REPLICATED promotion now requires at least one ESTABLISHED claim in the peer’s supports[]. Matches Cochrane / GRADE evidence-chain methodology and stops replication-of-noise. Strict by default.
- Seed-claim bootstrap. graph.assert_claim(text=..., seed=True) inserts a claim directly at ESTABLISHED with a signed seed envelope (payload type application/vnd.mareforma.seed+json, binds claim_id + validator_keyid + seeded_at). Only enrolled validators can produce seeds, which bootstraps the trust chain on a fresh graph without a back door.
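The prev_hash construction, sha256(prev_chain_link || canonical_payload), can be sketched with the standard library. The function names and the empty-bytes genesis value are assumptions for illustration; how mareforma seeds the first link is not stated above.

```python
import hashlib

GENESIS = b""  # assumption: the first link chains from an empty predecessor


def chain_link(prev_link: bytes, canonical_payload: bytes) -> bytes:
    """One link: SHA-256 over the previous link concatenated with the payload."""
    return hashlib.sha256(prev_link + canonical_payload).digest()


def build_chain(payloads: list[bytes]) -> list[bytes]:
    """Fold the payloads into an ordered list of chain links."""
    links = []
    prev = GENESIS
    for payload in payloads:
        prev = chain_link(prev, payload)
        links.append(prev)
    return links


def verify_chain(payloads: list[bytes], links: list[bytes]) -> bool:
    """Recompute every link; any edited, reordered, or dropped row changes the result."""
    return build_chain(payloads) == links
```

Since each link depends on all earlier payloads, editing one historical row invalidates that link and every link after it.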
Artifact-hash gate
- artifact_hash parameter on assert_claim (Python API) and --artifact-hash flag on mareforma claim add (CLI). Accepts a SHA-256 hex digest of the output bytes (figure, CSV, model) backing the claim. Normalised to lowercase, validated as 64-char hex, persisted to the new artifact_hash column, and bound into the signed payload.
- REPLICATED gate. When two converging peers BOTH supply a hash, the hashes must match for REPLICATED to fire. When either omits the hash, the gate is bypassed and identity-only REPLICATED applies: the signal is opt-in, not retroactive.
- Idempotency conflict. A replay that supplies a different artifact_hash than the original raises IdempotencyConflictError rather than silently dropping the new hash.
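A caller typically needs to compute the digest before passing it in. The sketch below shows one way to produce and normalise a SHA-256 hex digest and to model the gate logic described above; all three function names are hypothetical helpers, not part of the mareforma API.

```python
import hashlib
import re
from typing import Optional

HEX64 = re.compile(r"[0-9a-f]{64}")


def artifact_sha256(data: bytes) -> str:
    """SHA-256 hex digest of the artifact bytes (figure, CSV, model)."""
    return hashlib.sha256(data).hexdigest()


def normalise_artifact_hash(value: str) -> str:
    """Lowercase the digest and require exactly 64 hex characters."""
    candidate = value.lower()
    if not HEX64.fullmatch(candidate):
        raise ValueError("artifact_hash must be a 64-character SHA-256 hex digest")
    return candidate


def gate_allows_replicated(hash_a: Optional[str], hash_b: Optional[str]) -> bool:
    """Model of the gate: it only applies when BOTH peers supply a hash."""
    if hash_a is None or hash_b is None:
        return True  # gate bypassed; identity-only REPLICATED applies
    return normalise_artifact_hash(hash_a) == normalise_artifact_hash(hash_b)
```

For large artifacts, hashing the file in chunks with hashlib.sha256().update() avoids loading everything into memory.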
Prompt-safety substrate
- mareforma.prompt_safety module + graph.query_for_llm(). Sanitize-and-wrap helpers for feeding retrieved claim text into an LLM prompt. Strips zero-width / bidi-override / C0-C1 control characters, Goodside U+E0000 tag plane, variation selectors, interlinear annotation anchors, and the fullwidth < / > / / lookalikes. Caps oversized fields at 100k chars with a visible truncation marker. Free-text fields are wrapped in <untrusted_data>...</untrusted_data>; forged delimiter tags inside the content are replaced with [stripped].
- get_tools() routes through query_for_llm. The query_graph tool that ships to LangChain / LangGraph / CrewAI / AutoGen / LlamaIndex / PydanticAI / Smol Agents / OpenAI SDK / Anthropic SDK now returns sanitized + wrapped text. A stored prompt-injection planted by a prior agent is no longer delivered verbatim to the consuming LLM.
- Sanitize-on-write. assert_claim runs sanitize_for_llm(text) before signing and persisting. Defense in depth: any consumer that reads claim.text directly gets a clean string. Hard cap of 100,000 characters; claims that consist entirely of zero-width / control characters are rejected with ValueError.
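The sanitize-and-wrap idea can be approximated with a regular expression over the character classes listed above. This is a rough sketch, not the mareforma.prompt_safety implementation: the function names, the exact code-point ranges, the truncation marker text, and the choice to keep tab, LF, and CR are all assumptions.

```python
import re

MAX_CHARS = 100_000
TRUNCATION_MARKER = "[truncated]"  # hypothetical marker text

_INVISIBLE = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f"  # C0/C1 controls (tab, LF, CR kept)
    r"\u200b-\u200f\u2060-\u2064\ufeff"      # zero-width characters and marks
    r"\u202a-\u202e\u2066-\u2069"            # bidi embeddings, overrides, isolates
    r"\ufe00-\ufe0f\U000e0100-\U000e01ef"    # variation selectors
    r"\ufff9-\ufffb"                         # interlinear annotation anchors
    r"\U000e0000-\U000e007f"                 # Unicode tag plane
    r"\uff1c\uff1e\uff0f]"                   # fullwidth < > / lookalikes
)
_FORGED_TAG = re.compile(r"</?\s*untrusted_data[^>]*>", re.IGNORECASE)


def sanitize(text: str) -> str:
    """Strip invisible characters, neutralise forged delimiters, cap the length."""
    cleaned = _INVISIBLE.sub("", text)
    # Runs after stripping so a tag split by a zero-width character is still caught.
    cleaned = _FORGED_TAG.sub("[stripped]", cleaned)
    if len(cleaned) > MAX_CHARS:
        cleaned = cleaned[:MAX_CHARS] + TRUNCATION_MARKER
    return cleaned


def wrap_untrusted(text: str) -> str:
    """Wrap sanitized free text in the delimiter the prompt treats as data."""
    return f"<untrusted_data>{sanitize(text)}</untrusted_data>"
```

Stripping before tag replacement matters: otherwise an attacker could hide a closing delimiter from the filter by splitting it with an invisible character.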
Export
- JSON-LD export: mareforma-native vocabulary. Removed PROV-O references (prov:wasGeneratedBy, prov:used) from the JSON-LD @context; the previous export name-dropped the vocabulary without populating the full PROV-O graph. The export now declares @type='mare:Graph' and mare:mediaType='application/x-mareforma-graph+json'. The used key on source-bearing claims was renamed to usedSource (aliased to mare:usedSource). Every SIGNED_FIELDS member is always emitted on each claim node so downstream consumers (e.g. the bundle verifier below) can re-derive canonical_payload from a node alone.
- SCITT-style signed bundle. New mareforma export --bundle produces an in-toto Statement v1 wrapper around the JSON-LD export, with predicateType='urn:mareforma:predicate:epistemic-graph:v1' and a DSSE-style signature over the whole bundle. Subject names use the urn:mareforma:claim:<uuid> namespace; URN (not DNS) avoids a perpetual-ownership commitment on mareforma.dev. New mareforma verify <bundle.json> checks the DSSE signature AND every per-claim subject digest. New BundleVerificationError names the first failing check so callers can route between “corrupt” and “cross-version skew”.
DOI verification
- DOI resolution: every DOI in supports[] / contradicts[] is HEAD-checked against Crossref and DataCite at assert time. Unresolved DOIs mark the claim unresolved=True and block REPLICATED promotion. EpistemicGraph.refresh_unresolved() retries previously-failed resolutions.
- DOI resolver hardening: DOI suffix URL-encoded before interpolation (prevents host injection via # / @); follow_redirects=False (registry must answer directly); pooled httpx.Client with threading lock around lazy init (FD-leak-safe under concurrency); HTTP 429 from either registry skips the cache write; tight exception clause so programmer bugs surface in tracebacks.
- doi_cache table: 30-day TTL for resolved entries, 24-hour TTL for unresolved.
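Two of the hardening points above, suffix encoding and the split TTL, are easy to show in isolation. The sketch below is illustrative only: the helper names are hypothetical, the Crossref endpoint shown is an assumption about which URL the resolver targets, and percent-encoding the prefix as well is an extra precaution not stated in the notes.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote

RESOLVED_TTL = timedelta(days=30)
UNRESOLVED_TTL = timedelta(hours=24)


def registry_url(base: str, doi: str) -> str:
    """Build a registry lookup URL with the DOI percent-encoded."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix:
        raise ValueError("DOI must look like <prefix>/<suffix>")
    # safe="" also encodes "/", "#", and "@", so the suffix cannot alter the host or path.
    return f"{base}/{quote(prefix, safe='')}/{quote(suffix, safe='')}"


def cache_is_fresh(checked_at: datetime, resolved: bool, now: datetime) -> bool:
    """Resolved entries live 30 days; unresolved ones are retried after 24 hours."""
    ttl = RESOLVED_TTL if resolved else UNRESOLVED_TTL
    return now - checked_at < ttl


url = registry_url("https://api.crossref.org/works", "10.1000/abc#@evil.example")
now = datetime.now(timezone.utc)
fresh = cache_is_fresh(now - timedelta(days=2), resolved=False, now=now)
```

The shorter TTL for unresolved entries means a DOI that was briefly unavailable gets re-checked within a day, while a confirmed DOI is not re-fetched for a month.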
Supply chain
- PyPI Trusted Publishing. Releases are published via OIDC-based GitHub Actions, not long-lived API tokens. pypa/gh-action-pypi-publish is SHA-pinned. actions/checkout and actions/setup-python are pinned by commit SHA, which closes the tag-squat / maintainer-compromise vector against the Trusted Publishing OIDC token.
- New SECURITY.md documents the disclosure channel (GitHub Private Vulnerability Reporting), supported-versions policy (latest pre-1.0 only), PyPI Trusted Publishing setup, cryptographic trust boundaries, and out-of-scope categories.
- Typosquat reservations. maraforma, mareform, mareforma-cli, mareforma-py, and mareforma-agent are reserved on PyPI as defensive placeholders that raise ImportError and point users back to the canonical package. mare-forma / mare_forma / mare.forma are auto-blocked by PyPI’s confusable-name check.
- New .github/CODEOWNERS and .github/dependabot.yml.
Agent surface
- mareforma.open() returns an EpistemicGraph; no @transform required. New parameters: key_path, require_signed, rekor_url, require_rekor, trust_insecure_rekor.
- EpistemicGraph methods: assert_claim, query, search, query_for_llm, get_claim, validate, refresh_unresolved, refresh_unsigned, enroll_validator, list_validators, get_validator_reputation, get_tools, close.
- get_tools(generated_by="agent/...") returns [query_graph, assert_finding] as plain Python callables. One-line wrap for Anthropic SDK, OpenAI SDK, LangChain, LangGraph, CrewAI, AutoGen, LlamaIndex, PydanticAI, Smol Agents.
- mareforma.schema(): runtime introspection of valid values, defaults, state transitions, and schema version.
- mareforma.restore(project_root): rebuild a fresh graph.db from claims.toml for catastrophic-loss recovery. Fresh-only, fail-all-or-nothing on signature verification.
- CLI: mareforma bootstrap, mareforma validator add / validator list (with --type human|llm), mareforma claim add/list/show/update/validate, mareforma status, mareforma export [--bundle], mareforma verify <bundle>, mareforma restore [<toml-path>].
Validator type and reputation
- Validator type signal. validators.validator_type TEXT CHECK IN ('human','llm'), bound into the signed enrollment envelope. Default 'human'. The substrate refuses promotion past REPLICATED on an LLM-typed validator’s signature alone (LLMValidatorPromotionError); a human-typed co-signer is required. Self-validation (claim signer == validation signer) is also refused (SelfValidationError).
- Reputation-aware retrieval. query() and search() gain include_unverified: bool = False. PRELIMINARY claims whose signing key is not in the validators table are excluded by default. Result dicts carry derived validator_reputation (count of ESTABLISHED claims signed by the same validator) and generator_enrolled (bool). graph.get_validator_reputation() returns the bulk {keyid: count} map.
Full-text search
- FTS5 over claim text. New claims_fts virtual table (unicode61 tokenizer, diacritics folded) synced with claims via three INSERT/DELETE/UPDATE-of-text triggers. New graph.search() method exposes FTS5 ranked match. Phrase, prefix, boolean, and proximity operators are all supported. Pure-wildcard queries are refused.
claims.toml round-trip + restore
- claims.toml format extended. A [validators] section now travels alongside [claims], carrying signed enrollment envelopes so the restore path can verify the chain. Old files with no [validators] section continue to work as unsigned-mode.
- mareforma restore (CLI + Python API). Fresh-only rebuild from claims.toml. Refuses non-empty graph.db. Verifies every signature before any row is inserted. New RestoreError with .kind field naming the failure mode (graph_not_empty, toml_not_found, toml_malformed, enrollment_unverified, claim_unverified, mode_inconsistent, orphan_signer). Adversarial test class proves the round-trip catches tampered text, mutated signature bytes, missing signatures in signed-mode graphs, orphan signers, and validator-row tampering.
- _backup_claims_toml failure is now reported to stderr at ERROR level (was warnings.warn, which production loggers routinely suppress). graph.db remains authoritative.
Removed
- @transform decorator and BuildContext: pipeline layer removed.
- MareformaObserver, LangChainAdapter: execution tracing removed.
- Pipeline CLI commands: init, add-source, explain, build, log, diff, cross-diff, trace.
v0.2.1 — 2026-05-08
- ctx.params: runtime parameter injection from TOML
- query_claims(): read primitive for the epistemic graph
- delete_claims_by_generated_by(): delete claims by source agent
- Fixed LangChainAdapter import path
v0.2.0 — 2026-04-08
- mareforma.agent: framework-agnostic agent provenance module
- MareformaObserver: context manager recording agent events to graph.db
- LangChainAdapter: LangChain callback handler
v0.1.0 — 2026-03-25
Initial release. @transform decorator, ctx.claim(), mareforma build,
SQLite epistemic graph, claims.toml backup.