History Notebooks Phase 10: Agent Integration

The Problem

History Notebooks capture narrative alongside computation — but right now, only humans write. The notebook is a passive document: the user opens the editor, types markdown, inserts dataset references via toolbox or drag-and-drop, and saves. The AI agent framework (PR #21434, merged Dec 2025) gives Galaxy a multi-agent system with structured output, tool calling, and multi-turn conversation — but it has no surface for writing or editing documents tied to histories. These two systems sit side-by-side with no connection.

The original vision (from THE_PROBLEM_AND_GOAL.md):

History Notebooks are living documents that grow alongside your analysis. As analysis progresses — whether driven by human clicks, agent actions, or conversation between the two — the narrative builds up iteratively.

This creates a "Claude Code for data analysis" experience. The agent doesn't just run tools — it builds up a polished document with rich visualizations, updates figures when parameters change, and refines the presentation in response to human feedback.

Phase 10 bridges the gap: the agent can read the notebook, propose changes, and write new revisions — all while the human stays in control.

What We're Building

A split-view experience where the user sees the notebook editor on one side and a chat panel on the other. The agent can:

  1. Read the current notebook content + history datasets
  2. Propose markdown changes (with diff preview)
  3. Write new revisions when the user approves (tracked as edit_source="agent")
  4. Discuss the analysis in multi-turn conversation, with full context of what's in the history and notebook

The user can accept, reject, or modify agent proposals before they become revisions. Every agent edit creates a new revision — never destructively overwrites.
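The approve-then-write flow above can be sketched as a small state model. This is a minimal illustration, not Galaxy's implementation: `Revision`, `Proposal`, `Notebook`, and `resolve_proposal` are hypothetical in-memory stand-ins, and only the `edit_source` values and the `save_new_revision` name come from the existing notebook code. Tagging a user-modified proposal as "agent" is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    content: str
    edit_source: str  # "user" | "agent" | "restore"


@dataclass
class Proposal:
    content: str
    status: str = "pending"  # "pending" | "accepted" | "rejected"


@dataclass
class Notebook:
    # Append-only revision log: older revisions are never overwritten.
    revisions: List[Revision] = field(default_factory=list)

    def save_new_revision(self, content: str, edit_source: str = "user") -> Revision:
        revision = Revision(content=content, edit_source=edit_source)
        self.revisions.append(revision)
        return revision


def resolve_proposal(
    notebook: Notebook,
    proposal: Proposal,
    accept: bool,
    edited_content: Optional[str] = None,
) -> Optional[Revision]:
    """Accept (optionally with user modifications) or reject a pending proposal."""
    if proposal.status != "pending":
        raise ValueError("proposal already resolved")
    if not accept:
        proposal.status = "rejected"
        return None  # rejecting never touches the revision log
    proposal.status = "accepted"
    content = edited_content if edited_content is not None else proposal.content
    return notebook.save_new_revision(content, edit_source="agent")
```

The point of the sketch is that nothing is written until the user resolves the proposal, and accepting only ever appends to the revision log.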

Infrastructure Already Built

History Notebooks (Phases 1-8, complete)

Backend:

  • HistoryNotebook + HistoryNotebookRevision SQLAlchemy models
  • Revision model with edit_source field: "user" | "agent" | "restore" (16-char column)
  • HistoryNotebookManager with full CRUD + save_new_revision(trans, notebook, payload, edit_source="user")
  • The edit_source parameter is already accepted by the manager but never called with "agent" yet
  • Content stored with raw HID references (hid=N), resolved at read time via resolve_history_markdown()
  • Two-step read pipeline: resolve_history_markdown() (hid→internal_id) then ready_galaxy_markdown_for_export() (internal_id→encoded_id)
  • 9 API endpoints under /api/histories/{history_id}/notebooks
  • prepare-for-page endpoint for Page export with full HID resolution
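The two-step read pipeline can be illustrated with a simplified sketch. The functions below are hypothetical stand-ins and do not reproduce `resolve_history_markdown()` or `ready_galaxy_markdown_for_export()`; they assume a dataset-only history, a `history_dataset_id=` argument name for resolved references, and a caller-supplied `encode` function.

```python
import re
from typing import Callable, Dict

HID_ARGUMENT = re.compile(r"\bhid=(\d+)")
INTERNAL_ID_ARGUMENT = re.compile(r"\bhistory_dataset_id=(\d+)")


def resolve_hids(markdown: str, hid_to_internal_id: Dict[int, int]) -> str:
    """Step 1: rewrite stored hid=N references to internal database ids."""

    def replace(match: "re.Match[str]") -> str:
        hid = int(match.group(1))
        if hid not in hid_to_internal_id:
            return match.group(0)  # leave unknown HIDs untouched
        return f"history_dataset_id={hid_to_internal_id[hid]}"

    return HID_ARGUMENT.sub(replace, markdown)


def encode_ids(markdown: str, encode: Callable[[int], str]) -> str:
    """Step 2: rewrite internal ids to encoded ids before leaving the server."""
    return INTERNAL_ID_ARGUMENT.sub(
        lambda m: f"history_dataset_id={encode(int(m.group(1)))}", markdown
    )
```

Keeping `hid=N` in storage means the document stays readable and stable for both humans and agents, while internal and encoded ids only exist transiently at read time.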

Frontend:

  • Pinia store (useHistoryNotebookStore) with dirty tracking, save/discard, revision management
  • HistoryNotebookView.vue orchestrator (list mode, editor mode, displayOnly mode, revision panel)
  • HistoryNotebookEditor.vue wrapping MarkdownEditor in history_notebook mode
  • NotebookRevisionList.vue with edit_source labels: "Manual" / "AI" / "Restored"
  • Drag-and-drop from history panel (dataset + collection, inserts hid=N directives)
  • Window Manager integration (rendered Markdown.vue in WinBox iframe)
  • Routes at /histories/:historyId/notebooks[/:notebookId]

Key notebook API endpoints:

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /api/histories/{id}/notebooks | List notebooks |
| POST | /api/histories/{id}/notebooks | Create notebook |
| GET | /api/histories/{id}/notebooks/{nbId} | Get notebook (content resolved+encoded) |
| PUT | /api/histories/{id}/notebooks/{nbId} | Update (creates revision) |
| GET | /api/histories/{id}/notebooks/{nbId}/revisions | List revisions |
| GET | /api/histories/{id}/notebooks/{nbId}/revisions/{revId} | Get revision content |
| POST | /api/histories/{id}/notebooks/{nbId}/revisions/{revId}/revert | Revert to old revision |

Agent Framework (PR #21434, merged)

Architecture:

  • BaseGalaxyAgent (ABC) built on pydantic-ai with GalaxyAgentDependencies dataclass injection
  • Dependencies carry: trans, user, config, optional job_manager, dataset_manager, workflow_manager, tool_cache, toolbox, get_agent (for inter-agent calls)
  • AgentRegistry maps agent_type strings → agent classes; singleton in lib/galaxy/agents/__init__.py
  • AgentService in lib/galaxy/managers/agents.py creates dependencies and executes agents
  • 5 registered agents: router, error_analysis, custom_tool, orchestrator, tool_recommendation

Key base class capabilities:

class BaseGalaxyAgent(ABC):
    agent_type: str                          # Class attribute
    deps: GalaxyAgentDependencies            # Injected
    agent: Agent[GalaxyAgentDependencies, T] # pydantic-ai agent

    async def process(query, context) -> AgentResponse
    def _create_agent() -> Agent            # Abstract: define pydantic-ai agent + tools
    def get_system_prompt() -> str          # Abstract: system prompt (typically from prompts/*.md)
    async def _call_agent_from_tool(...)    # Agent-to-agent delegation
    def _get_model() -> model               # Multi-provider: OpenAI, Anthropic, Google, local
    def _get_agent_config(key, default)     # 4-level config cascade

Tool calling pattern (pydantic-ai):

def _create_agent(self):
    agent = Agent(self._get_model(), deps_type=GalaxyAgentDependencies, ...)

    @agent.tool
    async def search_tools(ctx: RunContext[GalaxyAgentDependencies], query: str) -> str:
        toolbox = ctx.deps.toolbox
        # ... search and return results
        return formatted_results

    return agent

Response schema:

class AgentResponse(BaseModel):
    content: str                         # Main response (markdown)
    confidence: ConfidenceLevel          # low|medium|high
    agent_type: str                      # Which agent answered
    suggestions: list[ActionSuggestion]  # Actionable next steps
    metadata: dict[str, Any]             # Token usage, model, etc.
    reasoning: Optional[str]             # Agent's reasoning chain

class ActionSuggestion(BaseModel):
    action_type: ActionType              # tool_run|save_tool|contact_support|view_external|documentation
    description: str
    parameters: dict[str, Any]
    confidence: ConfidenceLevel
    priority: int                        # 1=high, 2=medium, 3=low

Chat API & conversation persistence:

  • POST /api/chat — main endpoint, supports exchange_id for multi-turn, agent_type selection
  • ChatExchange model stores conversations per user (with optional job_id)
  • ChatExchangeMessage stores individual messages as JSON (query + response + agent metadata)
  • GET /api/chat/history — list user's conversations
  • Frontend ChatGXY.vue — full-page chat with conversation sidebar, agent selector, action cards, feedback

Config system:

inference_services:
  default:
    model: "llama-4-scout"
    api_base_url: "http://localhost:4000/v1/"
  notebook_assistant:          # Per-agent config
    model: "anthropic:claude-sonnet-4-5"
    temperature: 0.3
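How a per-agent section might override the default can be sketched as a simple lookup. This is only a two-level fallback written for illustration; it is not the four-level cascade that `_get_agent_config` implements, whose levels are not described here. The `get_agent_config` function and the `INFERENCE_SERVICES` dict (a Python mirror of the YAML above) are hypothetical.

```python
from typing import Any, Dict

# Python mirror of the inference_services YAML above.
INFERENCE_SERVICES: Dict[str, Dict[str, Any]] = {
    "default": {
        "model": "llama-4-scout",
        "api_base_url": "http://localhost:4000/v1/",
    },
    "notebook_assistant": {
        "model": "anthropic:claude-sonnet-4-5",
        "temperature": 0.3,
    },
}


def get_agent_config(
    services: Dict[str, Dict[str, Any]],
    agent_type: str,
    key: str,
    default: Any = None,
) -> Any:
    """Look in the agent's own section first, then "default", then the caller's default."""
    for section in (agent_type, "default"):
        values = services.get(section, {})
        if key in values:
            return values[key]
    return default
```

With this shape, `notebook_assistant` gets its own model and temperature but still inherits `api_base_url` from `default` unless it sets one.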

Integration Seams Already Present

  1. edit_source="agent" — the revision model accepts it, the manager accepts the parameter, the frontend already displays "AI" badge in revision list. Just not called yet.

  2. GalaxyAgentDependencies — extensible dataclass. Can add history_notebook_manager: Optional[HistoryNotebookManager] alongside existing dataset_manager, workflow_manager.

  3. ActionType enum — currently has tool_run, save_tool, contact_support, view_external, documentation. Can add notebook-specific action types.

  4. Context dict: AgentService.route_and_execute() passes an arbitrary context: dict to agents. It can include history_id, notebook_id, notebook_content.

  5. Chat history: ChatManager.get_chat_history() returns the conversation for multi-turn use. Notebook context can be included in the system prompt.

  6. Markdown infrastructure — both agents and notebooks produce/consume Galaxy-flavored markdown. Agents already render markdown responses. Notebooks store and resolve Galaxy markdown directives.
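Seams 2 through 4 could look roughly like the following. This is a sketch under stated assumptions: the classes are simplified stand-ins rather than the real Galaxy definitions, `NOTEBOOK_PROPOSE_EDIT` is a hypothetical action type name, and `build_notebook_context` is a hypothetical helper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class HistoryNotebookManager:
    """Stand-in for the real manager in lib/galaxy/managers/history_notebooks.py."""


@dataclass
class GalaxyAgentDependencies:
    # Simplified subset of the real dependency fields.
    trans: Any
    user: Any
    config: Any
    dataset_manager: Optional[Any] = None
    workflow_manager: Optional[Any] = None
    # Proposed addition: optional, so existing agents are unaffected.
    history_notebook_manager: Optional[HistoryNotebookManager] = None


class ActionType(str, Enum):
    TOOL_RUN = "tool_run"
    SAVE_TOOL = "save_tool"
    CONTACT_SUPPORT = "contact_support"
    VIEW_EXTERNAL = "view_external"
    DOCUMENTATION = "documentation"
    # Hypothetical notebook-specific action type.
    NOTEBOOK_PROPOSE_EDIT = "notebook_propose_edit"


def build_notebook_context(
    history_id: str, notebook_id: str, notebook_content: str
) -> Dict[str, Any]:
    """Context dict that the chat endpoint could pass through to the agent."""
    return {
        "history_id": history_id,
        "notebook_id": notebook_id,
        "notebook_content": notebook_content,
    }
```

Because the new dependency defaults to `None` and the new enum member is additive, neither change should require touching the five existing agents.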

Key Design Questions

1. Chat Panel Architecture

Option A: Embedded chat panel alongside notebook editor
A split-view HistoryNotebookSplit.vue component: editor on left, chat on right. The chat panel is notebook-specific (scoped to one notebook + its history). This is what the original Phase 10 plan sketched.

Option B: Extend ChatGXY with notebook context
Add a "Notebook Assistant" mode to the existing ChatGXY page. When activated from a notebook, ChatGXY receives notebook_id + history_id in context and the agent operates on that notebook.

Option C: Window Manager split
User opens notebook in main view and ChatGXY in a WinBox window (or vice versa). Communication via shared state (Pinia store) or postMessage.

User Assessment:

  • I think we want to do A & B together in the MVP (if the answer to the question below looks like what I suspect it does).
  • When drafting this into a concrete plan, please be sure to include Option C as a step after the MVP. We've built this on top of the window manager and we should continue in that direction.

User Questions:

  • Please research how ChatGXY works and how it is structured now. Could we not put a ChatGXY window side-by-side with the HistoryMarkdown? I suspect we wouldn't want to rewrite that, but please create a list of pros and cons. If any of the cons are features lacking in ChatGXY, could we add those, or leverage the existing Vue components and APIs through composition or specialization?

2. Agent Scope

User reviewed this section and agrees.

What should the notebook agent be able to do?

  • Read current notebook content
  • Read history datasets (names, types, metadata, peek at content)
  • Read history dataset collection structure
  • Propose full notebook rewrites or targeted section edits
  • Insert Galaxy markdown directives with correct hid=N references
  • Suggest tool runs that would add new datasets to the history
  • Explain existing datasets and their relationships
  • Generate summaries, methods sections, figure legends
  • Insert visualizations into a History Markdown document

What should it NOT do (at least initially)?

  • Directly run Galaxy tools (that's the orchestrator agent's job)
  • Modify datasets or history metadata
  • Auto-save without user approval

User Assessment: Agreed!

3. Proposal/Apply Workflow

How should agent-proposed changes flow?

Option A: Full replacement proposals
Agent proposes complete new notebook content. User sees diff, clicks "Apply" or "Reject". Apply calls save_new_revision(edit_source="agent").

Option B: Section-level patches
Agent proposes changes to specific sections (identified by markdown headers or line ranges). Multiple proposals can be pending. User applies individually.

Option C: Streaming insertion
Agent streams markdown directly into the editor at cursor position (like GitHub Copilot inline suggestions). User accepts with Tab/Enter or dismisses.

User Assessment: My hunch is all of these are useful. Maybe we start with Option A and then move to Option B or C as user experience optimizations once we have it working.
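For Option A, the diff the user previews could be computed server-side with Python's standard `difflib`. This is a minimal sketch, assuming the preview is a plain unified diff between the current revision and the proposed content; `diff_preview` is a hypothetical helper, and whether the diff is computed on the backend or in the browser is an open choice.

```python
import difflib


def diff_preview(current: str, proposed: str) -> str:
    """Return a unified diff between the current notebook and an agent proposal."""
    lines = difflib.unified_diff(
        current.splitlines(),
        proposed.splitlines(),
        fromfile="current revision",
        tofile="agent proposal",
        lineterm="",
    )
    return "\n".join(lines)


if __name__ == "__main__":
    current = "# Results\n\nOld summary."
    proposed = "# Results\n\nNew summary."
    print(diff_preview(current, proposed))
```

An empty string means the proposal is identical to the current content, which is a cheap way to suppress no-op proposals before they reach the user.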

User Questions:

  • Are there any Vue libraries that would help us do this? Either the full diff, the section-level patches, or streaming with Tab/Enter?

  • Do we think current models would be able to decide which of the first two interactions is more appropriate based on a question? How would we prompt them to decide between the two? What would the API look like for this?
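One possible shape for the second question is a single structured output with a mode field, so the model picks between full replacement and a section patch per request. The sketch below is an assumption, not an answer from the existing code: `NotebookProposal`, its field names, and `apply_proposal` are hypothetical, and sections are matched naively by an exact heading line (a `#` line inside a code block would be misread as a heading).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NotebookProposal:
    mode: str  # "full_replacement" | "section_patch"
    content: str  # whole document, or the replacement section including its heading
    target_heading: Optional[str] = None  # e.g. "## Methods" for section patches


def heading_level(line: str) -> int:
    """Number of leading '#' characters (0 if the line is not a heading)."""
    return len(line) - len(line.lstrip("#"))


def apply_proposal(current: str, proposal: NotebookProposal) -> str:
    if proposal.mode == "full_replacement":
        return proposal.content
    if proposal.mode != "section_patch" or not proposal.target_heading:
        raise ValueError("unsupported or incomplete proposal")
    lines = current.splitlines()
    try:
        start = lines.index(proposal.target_heading)
    except ValueError:
        raise ValueError(f"heading not found: {proposal.target_heading}") from None
    level = heading_level(proposal.target_heading)
    end = len(lines)
    # The section ends at the next heading of the same or a higher level.
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("#") and heading_level(lines[i]) <= level:
            end = i
            break
    return "\n".join(lines[:start] + proposal.content.splitlines() + lines[end:])
```

Either mode produces a complete new document, so the diff preview and the `save_new_revision(edit_source="agent")` call would be the same for both.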

4. Conversation Scoping

Should notebook chat conversations be:

Option A: Stored as ChatExchange (existing model)
Reuse the chat infrastructure. Add notebook_id FK to ChatExchange. Conversations persist across sessions.

Option B: Stored as notebook revisions with chat metadata
Each agent interaction creates a revision. Chat history reconstructed from revision sequence. Simpler model but loses non-notebook-modifying conversation turns.

Option C: Separate model
New NotebookChatMessage table linked to notebook. Keeps notebook-specific conversations separate from general ChatGXY conversations.

User Assessment: Probably A works best - I don't see a lot of value in these other two when the A path will be the most heavily used in practice.

User Question: What are foreign-keys like on ChatExchange? Do they link to other artifacts? Would an association object between ChatExchange and HistoryNotebooks buy us anything useful?

5. Agent Registration

Option A: New dedicated agent
Register notebook_assistant in the agent registry. Has its own system prompt, tools, and structured output type. Router can delegate to it when notebook context is present.

Option B: Extend existing router
Add notebook tools to the router agent. No new agent type: the router handles notebook queries when notebook context is in the request.

Option C: Orchestrator-based
The orchestrator coordinates between a "notebook writer" agent and other specialists (error_analysis for failed datasets, tool_recommendation for next steps). More powerful but more complex.

User Assessment: When we scope out a plan for this, let's do Option A in the MVP and sketch an outline of Option C in subsequent steps of the implementation plan. After Option C is implemented, also scope out implementing a history-contents visualization agent. That agent should know about common visualizations and how to build and embed them in History Markdown documents.

User Question: What are the ramifications of Option A vs Option B? What are the practical design, runtime, and implementation implications?

6. History Context Injection

How does the agent learn about the history's contents?

Option A: Tool-based discovery
Agent has tools like list_history_datasets(history_id), get_dataset_peek(hid), get_dataset_metadata(hid). Agent calls tools as needed. Scales well: only fetches what's relevant.

Option B: Full context in system prompt
Inject a summary of all history items (HID, name, type, state, size) into the system prompt. Agent has immediate context but burns tokens on large histories.

Option C: Hybrid
System prompt includes a compact summary (HID + name + type for all items). Tools available for deep-dive (peek at content, metadata, collection structure).
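The tool surface named in Option A could be prototyped as plain functions before deciding how they are exposed (for example via `@agent.tool` or an MCP server). The sketch below is an in-memory illustration only: `DatasetRecord` and `HistoryTools` are hypothetical, the history is implied by the instance rather than passed as `history_id`, and the real tools would go through Galaxy's managers and access checks.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DatasetRecord:
    hid: int
    name: str
    extension: str
    state: str
    peek: str


class HistoryTools:
    """Read-only, HID-keyed lookups over a single history."""

    def __init__(self, datasets: List[DatasetRecord]) -> None:
        self._by_hid: Dict[int, DatasetRecord] = {d.hid: d for d in datasets}

    def list_history_datasets(self) -> List[Dict[str, object]]:
        """Compact listing: enough to choose which datasets to inspect further."""
        return [
            {"hid": d.hid, "name": d.name, "extension": d.extension, "state": d.state}
            for d in sorted(self._by_hid.values(), key=lambda d: d.hid)
        ]

    def get_dataset_peek(self, hid: int, max_chars: int = 500) -> str:
        """Return a truncated content preview, or a message the model can act on."""
        record = self._by_hid.get(hid)
        if record is None:
            return f"No dataset with hid={hid} in this history."
        return record.peek[:max_chars]
```

Returning a readable message for an unknown HID, rather than raising, gives the model a chance to correct itself within the same turn.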

User Assessment:

Yeah - I don't see a world where Option B or C seem viable at this time. We will need something else.

User Question:

I think we're waiting on an MCP branch here to do this right, but please review PR 21706 (summarized at "/Users/jxc755/projects/repositories/galaxy-brain/vault/research/PR 21706 - Data Analysis Agent Integration.md") and provide me an answer for what this should look like now.

Relevant Files

Notebook System

| File | What's There |
| --- | --- |
| lib/galaxy/model/__init__.py (lines 11374-11461) | HistoryNotebook + HistoryNotebookRevision models, edit_source field |
| lib/galaxy/managers/history_notebooks.py | Manager: CRUD, save_new_revision(edit_source=), resolve, prepare-for-page |
| lib/galaxy/webapps/galaxy/api/history_notebooks.py | 9 API endpoints |
| lib/galaxy/schema/schema.py (lines 4146-4219) | Notebook request/response schemas |
| lib/galaxy/managers/markdown_util.py (lines 1364-1391) | resolve_history_markdown(), HID resolution |
| lib/galaxy/managers/markdown_parse.py | VALID_ARGUMENTS with hid= support |
| client/src/stores/historyNotebookStore.ts | Pinia store (dirty tracking, revisions) |
| client/src/components/HistoryNotebook/HistoryNotebookView.vue | Main orchestrator component |
| client/src/components/HistoryNotebook/HistoryNotebookEditor.vue | Editor wrapper |
| client/src/components/HistoryNotebook/NotebookRevisionList.vue | Revision sidebar (shows edit_source labels) |
| client/src/api/historyNotebooks.ts | TypeScript API client |

Agent Framework

| File | What's There |
| --- | --- |
| lib/galaxy/agents/__init__.py | Agent registry, 5 registered agents |
| lib/galaxy/agents/base.py | BaseGalaxyAgent ABC, GalaxyAgentDependencies, tool calling, retry logic |
| lib/galaxy/agents/registry.py | AgentRegistry (register, get_agent, list_agents) |
| lib/galaxy/managers/agents.py | AgentService (create_dependencies, execute_agent, route_and_execute) |
| lib/galaxy/agents/router.py | QueryRouterAgent (handoff via output functions) |
| lib/galaxy/agents/tools.py | ToolRecommendationAgent (example of @agent.tool pattern with Galaxy toolbox) |
| lib/galaxy/agents/error_analysis.py | ErrorAnalysisAgent (structured output example) |
| lib/galaxy/schema/agents.py | AgentResponse, ActionSuggestion, ActionType, ConfidenceLevel |
| lib/galaxy/webapps/galaxy/api/chat.py | Chat API (multi-turn, exchange_id, agent routing) |
| lib/galaxy/managers/chat.py | ChatManager (ChatExchange persistence, message storage, history) |
| client/src/components/ChatGXY.vue | Full-page chat UI (conversation, agent selector, actions) |
| client/src/composables/agentActions.ts | Frontend action dispatch (tool_run, save_tool, etc.) |
| lib/galaxy/config/schemas/config_schema.yml | inference_services per-agent config |

Shared Infrastructure

| File | What's There |
| --- | --- |
| client/src/components/Markdown/MarkdownEditor.vue | Shared editor (modes: report, page, history_notebook) |
| client/src/components/Markdown/Markdown.vue | Rendered markdown view (used by Pages, notebook displayOnly) |
| client/src/components/Markdown/Editor/TextEditor.vue | CodeMirror textarea with drag-drop support |

Success Criteria

  1. User can chat with an agent that understands their history contents and current notebook
  2. Agent can propose notebook edits that the user previews before applying
  3. Applied agent edits create revisions with edit_source="agent", visible in revision sidebar
  4. Multi-turn conversation persists across page reloads
  5. Agent uses hid=N references (not encoded IDs) when writing notebook content
  6. Agent can reference specific datasets by HID in its explanations
  7. The experience feels like collaborative editing, not a separate chat that happens to modify a file
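Criterion 5 lends itself to an automated check on agent output before a proposal is shown to the user. The following is a rough sketch under assumptions: `check_agent_content` is a hypothetical helper, and the argument names `history_dataset_id` and `history_dataset_collection_id` are assumed to be the ID-based forms the agent should avoid.

```python
import re
from typing import List, Set

ID_ARGUMENT = re.compile(r"\b(history_dataset_id|history_dataset_collection_id)=\w+")
HID_ARGUMENT = re.compile(r"\bhid=(\d+)")


def check_agent_content(markdown: str, known_hids: Set[int]) -> List[str]:
    """Return human-readable problems; an empty list means the content passes."""
    problems: List[str] = []
    for match in ID_ARGUMENT.finditer(markdown):
        problems.append(f"uses {match.group(1)}= instead of hid=")
    for match in HID_ARGUMENT.finditer(markdown):
        hid = int(match.group(1))
        if hid not in known_hids:
            problems.append(f"hid={hid} is not in the history")
    return problems
```

The returned messages could be fed back to the agent as a retry prompt, so that malformed references are usually fixed before the user ever sees a proposal.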