Exploring project persistence layer structures

/home/jm/data/code/opencode-trees/ChyTQ · 4 messages · 15 parts · $0.0408 · 50,687 tokens
Created Feb 25, 06:00:38 AM · Updated Feb 25, 06:04:55 AM · 14 syncs
user · Feb 25, 06:00:38 AM

@explore

Break down the @packages/opencode/src/project/ concept for me: how it relates to sessions, checkpointing, worktrees, and sandboxes.

I want to know exactly what the data structures of the persistence layer are, how they grow with respect to the primary agent's Request context, and how the knowledge is sectioned into hierarchies for each of the active agents.

Use the above message and context to generate a prompt and call the task tool with subagent: explore

assistant · Feb 25, 06:00:38 AM
qwen/qwen3.5-397b-a17b · openrouter · $0.0112
The user wants me to create a prompt for the explore subagent to investigate the project concept in packages/opencode/src/project/. They want to understand:
1. How project relates to sessions, checkpointing, worktrees, and sandboxes
2. The data structures of the persistence layer
3. How data grows with respect to the primary agent's Request context
4. How knowledge is sectioned in hierarchies for active agents
I need to craft a detailed prompt that will guide the explore agent to thoroughly investigate this codebase area.
task · completed
Explore project/ persistence layer
Output
task_id: ses_36c9d395affeOrikvzr1oHrwL4 (for resuming to continue this task if needed)

<task_result>
read{"filePath":"/home/jm/data/code/opencode-trees/ChyTQ/packages/opencode/src/global"}
read{"filePath":"/home/jm/data/code/opencode-trees/ChyTQ/packages/opencode/src/snapshot"}
glob{"pattern":"**/compaction*.ts","path":"/home/jm/data/code/opencode-trees/ChyTQ/packages/opencode/src"}
</task_result>
13,207 in / 788 out / 110 reasoning · $0.0112
assistant · Feb 25, 06:01:48 AM
qwen/qwen3.5-397b-a17b · openrouter · $0.0113
The task result shows some file reads and a glob search were performed. I should continue exploring to get more comprehensive information about the project persistence layer. Let me make additional parallel tool calls to gather more information about the project structure, session relationships, and persistence mechanisms.
task · completed
Explore project/ persistence layer
Output
task_id: ses_36c9d395affeOrikvzr1oHrwL4 (for resuming to continue this task if needed)

<task_result>
# Persistence Layer Architecture Report

## 1. Architecture Overview

The OpenCode persistence layer uses a **hybrid storage architecture** combining:

- **SQLite Database** (via Drizzle ORM) for structured relational data
- **File-based JSON storage** for auxiliary data (session diffs, shares)
- **Git-based snapshots** for checkpointing code state

```
┌─────────────────────────────────────────────────────────────────┐
│                        Project (Top Level)                       │
│  id, worktree, vcs, sandboxes[], commands, timestamps           │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ 1:N (cascade delete)
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Session (Conversation)                        │
│  id, project_id, parent_id, slug, directory, title, permission  │
│  summary (additions/deletions/files/diffs), revert, timestamps  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ 1:N (cascade delete)
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Message (Turn/Exchange)                     │
│  id, session_id, role (user/assistant), parentID, model, agent  │
│  tokens, cost, error, path, timestamps                          │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ 1:N (cascade delete)
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                         Part (Content Unit)                      │
│  id, message_id, session_id, type, data                         │
│  Types: text, tool, file, rea...
13,996 in / 764 out / 52 reasoning · $0.0113
assistant · Feb 25, 06:04:31 AM
qwen/qwen3.5-397b-a17b · openrouter · $0.0183
The explore agent has returned a comprehensive report on the persistence layer architecture. This is exactly what the user requested: a detailed breakdown of how the project concept relates to sessions, checkpointing, worktrees, and sandboxes, with specific focus on:
1. Data structures of the persistence layer
2. How they grow with respect to the primary agent's Request context
3. How knowledge is sectioned in hierarchies for each active agent
The report is thorough and well-structured. I should present this to the user in a clear, organized manner, highlighting the key findings.

Persistence Layer Architecture Breakdown

I've completed a comprehensive exploration of the @packages/opencode/src/project/ persistence layer. Here's what I found:


🏗️ Architecture Summary

The system uses a hybrid storage model:

| Storage Type | Technology | Purpose |
| --- | --- | --- |
| Relational | SQLite (Drizzle ORM) | Projects, Sessions, Messages, Parts, Permissions |
| File-based | JSON files | Session diffs, share metadata |
| Git-based | Bare Git repos | Checkpoints/snapshots of worktree state |

📊 Core Data Hierarchy

Project (1)
├── id: git root commit hash
├── worktree: /path/to/repo
├── sandboxes: string[] (additional directories)
├── permission: Ruleset
└── Session (N)
    ├── id, slug, parentID (for forks)
    ├── directory: working dir
    ├── permission?: Ruleset (override)
    └── Message (N)
        ├── role: "user" | "assistant"
        ├── agent: string (which agent handled this)
        ├── model: { providerID, modelID }
        ├── tokens, cost
        └── Part (N)
            ├── type: "text" | "tool" | "file" | "reasoning" | "step-start" | "step-finish" | "snapshot" | "patch" | "compaction" | "subtask"
            └── state: pending | running | completed | error (for tools)
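
The hierarchy above can be written down as nested TypeScript types. This is a minimal sketch for orientation only: the interface names are hypothetical, the fields are a subset of those listed in the tree, and the arrays stand in for what the report describes as foreign-key relationships between tables.

```typescript
// Hypothetical type names; fields mirror a subset of the tree above.
type ToolStatus = "pending" | "running" | "completed" | "error"

interface Part {
  id: string
  type: string // one of the part types listed in the tree
  status?: ToolStatus // only meaningful for tool parts
}

interface Message {
  id: string
  role: "user" | "assistant"
  agent: string // which agent handled this turn
  parts: Part[]
}

interface Session {
  id: string
  parentID?: string // set for forked sessions
  directory: string
  messages: Message[]
}

interface Project {
  id: string
  worktree: string
  sandboxes: string[]
  sessions: Session[]
}

// Walk the tree and count every Part reachable from a project.
function countParts(project: Project): number {
  let total = 0
  for (const session of project.sessions) {
    for (const message of session.messages) {
      total += message.parts.length
    }
  }
  return total
}
```

Counting parts this way gives a quick feel for how deep the tree gets: every request adds at least two messages, and each message fans out into several parts.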

🔗 Key Relationships

Project ↔ Sessions

  • Cardinality: 1:N with cascade delete
  • Storage: project_id FK in session table
  • Location: ~/.local/share/opencode/opencode.db
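
To make "1:N with cascade delete" concrete, here is a small in-memory sketch of what the cascade means for dependent rows. It does not use SQLite or Drizzle; the `Store` shape and `deleteProject` function are hypothetical stand-ins for the foreign-key behavior the report describes.

```typescript
// Hypothetical in-memory rows; only the foreign keys matter here.
interface SessionRow {
  id: string
  project_id: string
}
interface MessageRow {
  id: string
  session_id: string
}
interface Store {
  sessions: SessionRow[]
  messages: MessageRow[]
}

// Emulates a cascade: removing a project removes its sessions,
// and removing those sessions removes their messages.
function deleteProject(store: Store, projectID: string): Store {
  const doomed = new Set(
    store.sessions.filter((s) => s.project_id === projectID).map((s) => s.id),
  )
  return {
    sessions: store.sessions.filter((s) => !doomed.has(s.id)),
    messages: store.messages.filter((m) => !doomed.has(m.session_id)),
  }
}
```

In a real SQLite database the same effect comes from `ON DELETE CASCADE` on the foreign key, which only takes effect when foreign-key enforcement is enabled for the connection.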

Sessions ↔ Checkpoints (Snapshots)

  • Mechanism: Git tree hashes stored in step-start/step-finish parts
  • Location: ~/.local/share/opencode/snapshot/{project_id}/ (bare git repo)
  • Trigger: Every reasoning step boundary
  • Retention: 7-day auto-prune via scheduled GC
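
As a rough illustration of the checkpoint mechanism, the sketch below collects the hashes recorded on step boundary parts and checks a timestamp against the 7-day retention window. The `snapshot` field name and both helper functions are assumptions made for this example; the report only states that tree hashes live in step-start/step-finish parts.

```typescript
// Hypothetical part shapes; the `snapshot` field name is an assumption.
interface StepPart {
  type: "step-start" | "step-finish"
  snapshot?: string // git tree hash captured at the step boundary
}
interface OtherPart {
  type: "text" | "tool" | "reasoning"
}
type Part = StepPart | OtherPart

// 7-day retention window from the report, in milliseconds.
const RETENTION_MS = 7 * 24 * 60 * 60 * 1000

// Collect the snapshot hashes in the order the steps occurred.
function snapshotHashes(parts: Part[]): string[] {
  const hashes: string[] = []
  for (const part of parts) {
    if ((part.type === "step-start" || part.type === "step-finish") && part.snapshot) {
      hashes.push(part.snapshot)
    }
  }
  return hashes
}

// True when a snapshot created at `createdAtMs` is older than the window.
function isPastRetention(createdAtMs: number, nowMs: number): boolean {
  return nowMs - createdAtMs > RETENTION_MS
}
```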

Project ↔ Worktrees ↔ Sandboxes

Project.Info = {
  worktree: string,      // Primary git worktree
  sandboxes: string[],   // Additional isolated directories
}
  • Worktree: Main git repository root
  • Sandboxes: Additional directories for multi-root workspaces
  • Session.directory: Determines which sandbox/worktree a session operates in
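
One possible reading of "Session.directory determines which sandbox/worktree a session operates in" is a prefix match against the project's roots. The sketch below is only that reading: `resolveRoot` and `isWithin` are hypothetical helpers, paths are assumed to be POSIX-style and already normalized, and the actual resolution logic was not shown in the report.

```typescript
interface ProjectInfo {
  worktree: string
  sandboxes: string[]
}

// True when `dir` equals `root` or sits beneath it.
function isWithin(root: string, dir: string): boolean {
  const trimmed = root.endsWith("/") ? root.slice(0, -1) : root
  return dir === trimmed || dir.startsWith(trimmed + "/")
}

// Pick the root that contains the session directory. The longest match
// wins, so a sandbox nested under the worktree takes precedence.
function resolveRoot(project: ProjectInfo, directory: string): string | undefined {
  const roots = [project.worktree, ...project.sandboxes]
  return roots
    .filter((root) => isWithin(root, directory))
    .sort((a, b) => b.length - a.length)[0]
}
```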

📈 Growth Model: How Storage Expands Per Request

Per User Request (one turn):

| Component | Size Estimate | Growth Trigger |
| --- | --- | --- |
| User Message | ~200-500 bytes | 1 per request |
| User Parts | ~100 B + content | Text/file attachments |
| Assistant Message | ~500 bytes | 1 per response |
| Assistant Parts | Variable | Text, tools, reasoning |
| Snapshots | ~1-100 KB | Per step boundary |
| Tool Parts | ~200-500 B + I/O | Per tool execution |
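
The per-component estimates above can be combined into a back-of-the-envelope range for one turn. This is a worked-arithmetic sketch, not a measurement: the `TurnShape` fields are hypothetical, a single user part and a single assistant message are assumed, and 1 KB is treated as 1000 bytes.

```typescript
// Hypothetical description of one request/response turn.
interface TurnShape {
  userContentBytes: number // text or file content attached by the user
  assistantTextBytes: number // text and reasoning produced by the assistant
  toolCalls: number
  toolIOBytes: number // total tool input and output payload
  steps: number // step boundaries, each of which records a snapshot
}

// Returns a low/high byte estimate using the ranges from the table.
function estimateTurnBytes(t: TurnShape): { low: number; high: number } {
  const shared = 100 + t.userContentBytes + 500 + t.assistantTextBytes + t.toolIOBytes
  return {
    low: 200 + shared + t.toolCalls * 200 + t.steps * 1_000,
    high: 500 + shared + t.toolCalls * 500 + t.steps * 100_000,
  }
}

const example = estimateTurnBytes({
  userContentBytes: 1_000,
  assistantTextBytes: 2_000,
  toolCalls: 3,
  toolIOBytes: 6_000,
  steps: 2,
})
console.log(example) // { low: 12400, high: 211600 }
```

The spread between the two bounds comes almost entirely from snapshots, which suggests that step count, not message count, dominates storage growth for tool-heavy turns.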

Compaction Strategy

Overflow Detection:

const usable = model.limit.input - reserved  // reserved ≈ 20K tokens
if (tokenCount >= usable) → trigger compaction
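
The overflow check above translates directly into a small predicate. This is a sketch under the stated assumptions: the 20K reserve is the approximate figure from the report, and `shouldCompact` and `ModelLimit` are names chosen for the example.

```typescript
interface ModelLimit {
  input: number // maximum input tokens the model accepts
}

// Approximate reserve quoted in the report.
const RESERVED_TOKENS = 20_000

// True once the session's token count reaches the usable budget.
function shouldCompact(
  tokenCount: number,
  limit: ModelLimit,
  reserved: number = RESERVED_TOKENS,
): boolean {
  const usable = limit.input - reserved
  return tokenCount >= usable
}
```

For a model with a 200,000-token input limit, the usable budget is 180,000 tokens, so compaction would trigger at exactly that count and not one token earlier.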

Pruning Algorithm:

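  1. Scan messages backwards, from newest to oldest
  2. Accumulate tool output token estimates
  3. Once the running total passes 40K tokens, mark the older tool outputs beyond that point as pruning candidates
  4. If the candidates add up to more than 20K tokens, clear them
  5. Render cleared outputs as "[Old tool result content cleared]"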
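
The five steps above can be sketched as a single backward pass. This is an illustration of the described procedure, not the code in src/session/compaction.ts: the `ToolOutput` and `Msg` shapes, the `cleared` flag, and the characters-divided-by-four token estimate are all assumptions.

```typescript
// Hypothetical shapes for this sketch.
interface ToolOutput {
  id: string
  output: string
  cleared?: boolean
}
interface Msg {
  toolOutputs: ToolOutput[]
}

const PROTECT_TOKENS = 40_000 // most recent tool output that is never pruned
const MINIMUM_TOKENS = 20_000 // do nothing unless more than this can be freed
const PLACEHOLDER = "[Old tool result content cleared]"

// Rough heuristic (assumption): about four characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4)

// Returns the ids of the tool outputs that were cleared, newest first.
function prune(messages: Msg[]): string[] {
  let total = 0
  let prunable = 0
  const candidates: ToolOutput[] = []
  for (const message of [...messages].reverse()) {
    for (const part of [...message.toolOutputs].reverse()) {
      const estimate = estimateTokens(part.output)
      total += estimate
      if (total > PROTECT_TOKENS) {
        prunable += estimate
        candidates.push(part)
      }
    }
  }
  if (prunable <= MINIMUM_TOKENS) return []
  for (const part of candidates) part.cleared = true
  return candidates.map((part) => part.id)
}

// What the model sees for a given tool output after pruning.
function render(part: ToolOutput): string {
  return part.cleared ? PLACEHOLDER : part.output
}
```

The two thresholds work together: the first protects recent context, and the second avoids rewriting history when the savings would be small.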

🧠 Agent Knowledge Hierarchy

Critical Finding: Agents Are Stateless

| Aspect | Behavior |
| --- | --- |
| Per-agent persistence | None - no dedicated storage |
| Agent tracking | Message.agent field records which agent handled each turn |
| Knowledge sharing | All messages visible to all agents within a session |
| Isolation | None - any agent reads full session history |
| Permissions | Merged: defaults → config → user overrides → session override |
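
The permission merge order (defaults → config → user overrides → session override) can be pictured as layered maps where later layers win. The sketch below assumes a flat key-to-action `Ruleset`; the real Ruleset shape was not shown in the report, so both the type and `mergePermissions` are hypothetical.

```typescript
// Hypothetical, simplified ruleset: permission key -> action.
type Action = "allow" | "ask" | "deny"
type Ruleset = Record<string, Action>

// Later layers override earlier ones; missing layers are skipped.
function mergePermissions(...layers: Array<Ruleset | undefined>): Ruleset {
  const merged: Ruleset = {}
  for (const layer of layers) {
    if (layer) Object.assign(merged, layer)
  }
  return merged
}

const defaults: Ruleset = { edit: "allow", bash: "ask" }
const config: Ruleset = { bash: "allow" }
const userOverrides: Ruleset = { edit: "ask" }
const sessionOverride: Ruleset = { edit: "deny" }

const effective = mergePermissions(defaults, config, userOverrides, sessionOverride)
console.log(effective) // { edit: "deny", bash: "allow" }
```

Passing `undefined` for the last layer models a session without its own override, in which case the user-level value stands.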

Built-in Agents

| Agent | Mode | Purpose |
| --- | --- | --- |
| build | primary | Default agent (full permissions) |
| plan | primary | Planning mode (deny edits except plan files) |
| general | subagent | Complex task delegation |
| explore | subagent | Codebase exploration |
| compaction | hidden | Session summarization |
| title | hidden | Title generation |
| summary | hidden | Summarization |

💾 Storage Locations

~/.local/share/opencode/
├── opencode.db                    # SQLite database (all tables)
├── storage/
│   ├── session_diff/{id}.json     # Session-level diffs
│   └── share/{id}.json            # Share metadata
└── snapshot/{project_id}/         # Bare git repos per project
    └── (git objects for checkpoints)

⚠️ Open Questions & Potential Issues

  1. Snapshot Orphaning: No explicit cleanup when sessions are deleted; relies on git gc
  2. Sandbox Usage: Unclear when sessions use sandboxes vs. primary worktree
  3. Fork Divergence: Forked sessions reference parent snapshots; unclear how divergence is tracked
  4. Compaction Races: What happens if user sends message during compaction?
  5. Permission Inheritance: When does session permission override project default?

📁 Key Files for Deep Dives

| Concern | Primary Files |
| --- | --- |
| Project lifecycle | src/project/project.ts, src/project/instance.ts |
| Session management | src/session/index.ts, src/session/processor.ts |
| Message/Part types | src/session/message-v2.ts |
| Compaction logic | src/session/compaction.ts |
| Snapshots | src/snapshot/index.ts |
| Schema | src/storage/schema.sql.ts |

Would you like me to dive deeper into any specific area? For example:

  • Trace the exact flow of a request through the persistence layer
  • Analyze the compaction algorithm in detail
  • Examine how snapshots are created and restored
  • Review the permission merging logic
20,357 in / 1,575 out / 118 reasoning · $0.0183