@d-oit
Last active October 18, 2025 08:04
A2A Agent Architecture — Self-Learning, Decentralized System

A2A Agent Architecture — UPER-S + GOAP + Self-Learning with Hierarchical Reasoning

1. Overview

This document defines a decentralized agent system using:

  • A2A (Agent-to-Agent) Communication
  • UPER-S Methodology (Understand, Plan, Execute, Review, Secure)
  • GOAP (Goal-Oriented Action Planning) for decision making
  • Hierarchical Self-Learning Architecture with feedback loops
  • Candle ML Framework for neural network-based learning
  • Testdata-Builder Pattern for deterministic testing

Agents coordinate autonomously through a shared SQLite knowledge space and structured task synchronization, and they learn continuously from user interactions.


2. Core Methodologies

UPER-S Execution Cycle

  • Understand: Gather context from project space, agent state, current task queue, and learned patterns.
  • Plan: Determine optimal next action using GOAP with learned priorities and check for task conflicts.
  • Execute: Claim and run the selected task using appropriate ML models.
  • Review: Verify output, update knowledge, and adjust learned patterns.
  • Secure: Apply cleanup, enforce invariants, and ensure data integrity.
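
The five phases can be sketched as a single pass over a task queue. This is a minimal illustration, not the system's actual loop: `CycleContext`, `run_uper_s_cycle`, and the `execute` callback are hypothetical names, and the Plan step simply takes the first pending task where a real agent would consult its GOAP planner.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CycleContext:
    """State passed between the five UPER-S phases (hypothetical shape)."""
    task_queue: list
    completed: list = field(default_factory=list)
    log: list = field(default_factory=list)


def run_uper_s_cycle(ctx: CycleContext, execute: Callable[[str], bool]) -> Optional[str]:
    """Run one Understand -> Plan -> Execute -> Review -> Secure pass."""
    # Understand: gather what is currently pending.
    pending = [t for t in ctx.task_queue if t not in ctx.completed]
    ctx.log.append(f"understand: {len(pending)} pending")
    if not pending:
        return None

    # Plan: pick the next task (placeholder for a GOAP decision).
    task = pending[0]
    ctx.log.append(f"plan: {task}")

    try:
        # Execute: run the selected task.
        ok = execute(task)
        ctx.log.append(f"execute: {task} ok={ok}")

        # Review: record the outcome so later cycles can build on it.
        if ok:
            ctx.completed.append(task)
        ctx.log.append(f"review: completed={ctx.completed}")
    finally:
        # Secure: always runs, even if execution raised.
        ctx.log.append("secure: invariants checked")
    return task
```

Putting Secure in a `finally` block is one way to make cleanup unconditional, so a failed task cannot skip the integrity checks.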

GOAP Planning Loop with Learning

GOAP agents select and sequence actions based on goals, current state, effects, and learned preferences.

  • Agents have goals with dynamic priority weights learned from outcomes
  • Each goal has preconditions (what must be true) and effects (what it achieves)
  • Actions are selected using both static rules and learned patterns
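
To make the planning loop concrete, here is a small sketch of goal selection and action sequencing. It assumes world state is a set of string facts and uses plain breadth-first search; the names `Action`, `plan`, and `pick_goal`, and the example weights, are illustrative and not taken from the system itself.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset  # facts that must hold before the action runs
    effects: frozenset        # facts the action adds to the world state


def plan(state, goal, actions):
    """Breadth-first search for the shortest action sequence that reaches the goal."""
    queue = deque([(state, [])])
    seen = {state}
    while queue:
        current, path = queue.popleft()
        if goal <= current:
            return path
        for action in actions:
            if action.preconditions <= current:
                nxt = current | action.effects
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [action.name]))
    return None  # goal unreachable from this state


def pick_goal(goals, weights, state):
    """Choose the unsatisfied goal with the highest (learned) weight."""
    open_goals = [g for g, facts in goals.items() if not facts <= state]
    return max(open_goals, key=lambda g: weights.get(g, 0.0)) if open_goals else None


ACTIONS = [
    Action("build", frozenset({"deps_ready"}), frozenset({"built"})),
    Action("test", frozenset({"built"}), frozenset({"tested"})),
    Action("deploy", frozenset({"tested"}), frozenset({"deployed"})),
]
```

With `state = frozenset({"deps_ready"})` and `goal = frozenset({"deployed"})`, `plan` returns `["build", "test", "deploy"]`. Learned priorities enter only through the `weights` mapping, which keeps the search itself deterministic.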

3. Enhanced A2A Coordination Model

3.1 A2A Communication Principles

  • No centralized controller
  • All agents write and read from a shared SQLite knowledge base
  • Each agent must check task state and learned priorities before acting
  • Agents wait, reschedule, or preempt based on priority rules

| State | Action | Agent Behavior |
|---|---|---|
| IDLE | Check conflicts + priorities | Claim if optimal |
| WAITING | Monitor heartbeat + learn patterns | Resume when optimal |
| RUNNING | Maintain heartbeat, execute task | Exclusive execution |
| STALE | Cleanup takes over | Other agent can reclaim |
| COMPLETED | Release and log for learning | Update context and goals |
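
Without a central controller, the claim step has to be atomic at the database level. The sketch below shows one way to do that with Python's built-in `sqlite3` module. The `tasks` columns, the 30-second stale timeout, and the function names are assumptions made for illustration; the document does not define the `tasks` schema.

```python
import sqlite3

STALE_AFTER_S = 30.0  # assumed heartbeat timeout, not specified in this document


def init_db(conn):
    # Hypothetical minimal tasks table for the state machine above.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               priority INTEGER NOT NULL,
               status TEXT NOT NULL DEFAULT 'IDLE',
               owner TEXT,
               heartbeat_at REAL
           )"""
    )


def claim_task(conn, agent_id, now):
    """Claim the highest-priority IDLE task; return its id, or None if none was claimed."""
    with conn:
        row = conn.execute(
            "SELECT id FROM tasks WHERE status = 'IDLE' "
            "ORDER BY priority DESC, id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard makes the claim safe if another agent got there first.
        cur = conn.execute(
            "UPDATE tasks SET status = 'RUNNING', owner = ?, heartbeat_at = ? "
            "WHERE id = ? AND status = 'IDLE'",
            (agent_id, now, row[0]),
        )
        return row[0] if cur.rowcount == 1 else None


def heartbeat(conn, task_id, agent_id, now):
    """Refresh the heartbeat of a task this agent is running."""
    with conn:
        cur = conn.execute(
            "UPDATE tasks SET heartbeat_at = ? "
            "WHERE id = ? AND owner = ? AND status = 'RUNNING'",
            (now, task_id, agent_id),
        )
        return cur.rowcount == 1


def mark_stale(conn, now):
    """Flag RUNNING tasks whose heartbeat is too old; return how many were flagged."""
    with conn:
        cur = conn.execute(
            "UPDATE tasks SET status = 'STALE' "
            "WHERE status = 'RUNNING' AND heartbeat_at < ?",
            (now - STALE_AFTER_S,),
        )
        return cur.rowcount


def release_stale(conn):
    """Cleanup step: return STALE tasks to IDLE so another agent can reclaim them."""
    with conn:
        cur = conn.execute(
            "UPDATE tasks SET status = 'IDLE', owner = NULL, heartbeat_at = NULL "
            "WHERE status = 'STALE'"
        )
        return cur.rowcount
```

Timestamps are passed in as `now` rather than read from the clock, which keeps the state transitions deterministic under test.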

3.2 Enhanced Task Synchronization

# Priority-based Task Management
class TaskPriority:
    CRITICAL = 100  # Deployment failures, security issues
    HIGH = 75       # Test failures, build breaks
    MEDIUM = 50     # Regular builds, deployments  
    LOW = 25        # Analysis, optimizations
    BACKGROUND = 10 # Logging, metrics, cleanup

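class TaskClaim:
    def __init__(self, learned_preemption_threshold):
        # Minimum priority gap required to preempt; supplied by the learning layer
        self.learned_preemption_threshold = learned_preemption_threshold

    def can_preempt(self, current_task, new_task):
        # Preempt only when the new task outranks the running one by at least the learned gap
        priority_diff = new_task.priority - current_task.priority
        return priority_diff >= self.learned_preemption_threshold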
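
How the preemption threshold gets learned is not specified here. One simple possibility, shown purely as an assumption, is to nudge it after each preemption depending on whether the preemption turned out to help. `PreemptionPolicy`, the step size, and the bounds are hypothetical.

```python
from enum import IntEnum


class TaskPriority(IntEnum):
    CRITICAL = 100
    HIGH = 75
    MEDIUM = 50
    LOW = 25
    BACKGROUND = 10


class PreemptionPolicy:
    """Hypothetical threshold adaptation: additive steps, clamped to [lo, hi]."""

    def __init__(self, threshold=30.0, step=5.0, lo=5.0, hi=90.0):
        self.threshold = threshold
        self.step = step
        self.lo = lo
        self.hi = hi

    def can_preempt(self, current, new):
        # Same rule as TaskClaim.can_preempt: compare the gap to the threshold.
        return (new - current) >= self.threshold

    def record_outcome(self, helpful):
        # Lower the bar after helpful preemptions, raise it after harmful ones.
        delta = -self.step if helpful else self.step
        self.threshold = min(self.hi, max(self.lo, self.threshold + delta))
```

With a threshold of 30, a HIGH task (75) cannot preempt a MEDIUM one (50) because the gap is 25, while a CRITICAL task (100) can because the gap is 50. After one helpful preemption the threshold drops to 25 and the HIGH task would qualify.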

4. Hierarchical Self-Learning Architecture

4.1 Architecture Understanding Layer

graph TB
    subgraph "Architecture Understanding"
        ARCHDET[Architecture Detector<br/>MVC, Microservice, Layered]
        MODULAR[Module Analyzer<br/>group related files]
        LAYER[Layer Detector<br/>UI, Business, Data]
        BOUNDARY[Boundary Detector<br/>module interfaces]
    end
    
    subgraph "Knowledge Graph"
        MODULES[(Module Graph<br/>dependencies, relationships)]
        PATTERNS[(Pattern Library<br/>learned architectures)]
        CONTEXT[(Context Map<br/>entity associations)]
        USAGE[(Usage Patterns<br/>query→result success)]
    end
    
    ARCHDET --> PATTERNS
    MODULAR --> MODULES
    LAYER --> MODULES
    BOUNDARY --> MODULES

4.2 Self-Learning Engine Implementation

// Using Candle framework for neural learning
use candle_core::{Device, Tensor, DType, Result};
use candle_nn::{Module, Optimizer, VarBuilder};

struct SelfLearningEngine {
    // Neural network for pattern recognition
    pattern_net: candle_nn::Sequential,
    // Context association model
    context_net: candle_nn::Sequential,
    optimizer: candle_nn::AdamW,
    feedback_buffer: Vec<LearningSample>,
}

impl SelfLearningEngine {
    fn process_feedback(&mut self, sample: &LearningSample) -> Result<()> {
        // Convert feedback to tensor
        let features = self.encode_feedback(sample)?;
        let targets = self.compute_targets(sample)?;
        
        // Forward pass
        let predictions = self.pattern_net.forward(&features)?;
        
        // Compute loss and backward pass
        let loss = self.loss_fn(&predictions, &targets)?;
        self.optimizer.backward_step(&loss)?;
        
        // Update knowledge graph
        self.update_knowledge_graph(sample, &predictions)?;
        Ok(())
    }
    
    fn predict_relevance(&self, query: &Query, context: &Context) -> Result<f32> {
        let features = self.encode_query_context(query, context)?;
        let output = self.pattern_net.forward(&features)?;
        // Assumes the network emits a 1-D tensor; its first element is the relevance score
        Ok(output.get(0)?.to_scalar::<f32>()?)
    }
}

4.3 Enhanced SQLite Knowledge Space

-- Core Tables with Learning Extensions
CREATE TABLE architecture_modules (
    id INTEGER PRIMARY KEY,
    module_name TEXT,
    module_type TEXT, -- 'feature', 'layer', 'utility'
    layer_classification TEXT, -- 'presentation', 'business', 'persistence'
    dependencies JSON, -- Module relationships
    confidence_score REAL,
    last_updated TIMESTAMP
);

CREATE TABLE learned_patterns (
    id INTEGER PRIMARY KEY,
    pattern_type TEXT, -- 'architectural', 'usage', 'query'
    pattern_data JSON,
    confidence REAL,
    usage_count INTEGER,
    success_rate REAL,
    last_verified TIMESTAMP
);

CREATE TABLE feedback_loops (
    id INTEGER PRIMARY KEY,
    query_hash TEXT,
    result_clicked TEXT,
    dwell_time_ms INTEGER,
    success_score REAL, -- -1.0 to 1.0
    context_features JSON,
    learned_insight TEXT,
    created_at TIMESTAMP
);

CREATE TABLE model_performance (
    model_name TEXT,
    task_type TEXT,
    architecture_pattern TEXT,
    success_rate REAL,
    avg_latency_ms INTEGER,
    context_understanding_score REAL,
    last_updated TIMESTAMP
);
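
As an illustration of how an agent might read and update these tables, the sketch below works against `learned_patterns` using Python's built-in `sqlite3` module. The helper names and the running-mean update for `success_rate` are assumptions, not part of the schema above.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE learned_patterns (
    id INTEGER PRIMARY KEY,
    pattern_type TEXT,
    pattern_data JSON,
    confidence REAL,
    usage_count INTEGER,
    success_rate REAL,
    last_verified TIMESTAMP
);
"""


def add_pattern(conn, pattern_type, data, confidence):
    """Insert a pattern; the dict is serialized with json.dumps and stored as text."""
    with conn:
        cur = conn.execute(
            "INSERT INTO learned_patterns "
            "(pattern_type, pattern_data, confidence, usage_count, success_rate, last_verified) "
            "VALUES (?, ?, ?, 0, 0.0, CURRENT_TIMESTAMP)",
            (pattern_type, json.dumps(data), confidence),
        )
        return cur.lastrowid


def record_outcome(conn, pattern_id, success):
    """Fold one outcome into usage_count and the running mean success_rate."""
    with conn:
        # Right-hand sides see the pre-update values, so the mean uses the old count.
        conn.execute(
            """UPDATE learned_patterns
               SET success_rate = (success_rate * usage_count + ?) / (usage_count + 1),
                   usage_count = usage_count + 1,
                   last_verified = CURRENT_TIMESTAMP
               WHERE id = ?""",
            (1.0 if success else 0.0, pattern_id),
        )


def get_pattern(conn, pattern_id):
    """Return (pattern_data as dict, usage_count, success_rate)."""
    row = conn.execute(
        "SELECT pattern_data, usage_count, success_rate FROM learned_patterns WHERE id = ?",
        (pattern_id,),
    ).fetchone()
    return json.loads(row[0]), row[1], row[2]
```

Updating the mean inside a single `UPDATE` avoids a read-modify-write race between agents that share the database.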

5. Enhanced GOAP Agents with Hierarchical Reasoning

5.1 Agent Types with Learning Capabilities

| Agent Type | Goal | Preconditions | Effects | Learning Component |
|---|---|---|---|---|
| Builder Agent | Build | No active build, dependencies ready | Build artifacts created | Learns build patterns and optimizations |
| Tester Agent | Test | Build successful, test env ready | Test results recorded | Learns test priorities and flakiness patterns |
| Deployer Agent | Deploy | Tests passed, env available | Deployment executed | Learns deployment strategies and rollback triggers |
| Analyzer Agent | AnalyzeArchitecture | Code changes detected | Architecture graph updated | Hierarchical pattern detection |
| Learning Agent | ProcessFeedback | New user interactions | Knowledge graphs updated | Neural network training |
| Router Agent | RouteModel | LLM request received | Optimal model selected | Learns model performance per context |
| Cleaner Agent | Cleanup | Stale task detected | Resources reclaimed | Learns optimal retention policies |

5.2 Enhanced GOAP Loop with Hierarchical Reasoning

class LearningGOAPAgent(GOAPAgent):  # GOAPAgent: assumed base class that provides execute()
    def plan_with_learning(self, goal, world_state):
        # Phase 1: Architecture-aware planning
        architecture_context = self.analyzer.get_architecture_context(goal)
        relevant_modules = self.scope_to_modules(goal, architecture_context)
        
        # Phase 2: Learned priority adjustment
        learned_priorities = self.learning_engine.get_goal_priorities(
            goal, world_state, architecture_context
        )
        
        # Phase 3: Multi-strategy action planning
        candidates = self.generate_action_candidates(relevant_modules)
        ranked_actions = self.rank_with_learned_patterns(candidates, learned_priorities)
        
        # Phase 4: Model selection based on architecture
        optimal_model = self.router.select_model_for_context(
            goal, architecture_context
        )
        
        return ranked_actions, optimal_model

    def execute_with_feedback(self, action, model):
        result = super().execute(action, model)
        
        # Record execution for learning
        feedback = ExecutionFeedback(
            action=action,
            model_used=model,
            architecture_context=self.current_architecture_context,
            outcome=result.success,
            performance_metrics=result.metrics
        )
        self.learning_engine.record_feedback(feedback)
        
        return result
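
The ranking step is left abstract above. One plausible reading, offered only as a sketch, blends a static rule score with a learned success rate per action. The function name `rank_actions`, the 50/50 default blend, and the neutral prior of 0.5 for unseen actions are all assumptions.

```python
def rank_actions(candidates, learned_success, static_weight=0.5):
    """Sort (name, static_score) pairs by a blend of static and learned scores.

    candidates: list of (action_name, static_score) with scores in [0, 1].
    learned_success: mapping of action_name -> observed success rate in [0, 1].
    """
    def score(candidate):
        name, static_score = candidate
        learned = learned_success.get(name, 0.5)  # neutral prior for unseen actions
        return static_weight * static_score + (1.0 - static_weight) * learned

    return sorted(candidates, key=score, reverse=True)


candidates = [("incremental_build", 0.6), ("full_rebuild", 0.8)]
learned = {"incremental_build": 0.9, "full_rebuild": 0.3}
ranked = rank_actions(candidates, learned)
```

Here `incremental_build` scores 0.75 and `full_rebuild` scores 0.55, so the learned history overrides the static preference. Setting `static_weight=1.0` restores purely rule-based ordering, which can serve as a fallback during cold start.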

6. Multi-Model Orchestration with Architecture Awareness

6.1 Context-Aware Model Routing

# Enhanced Model Routing with Architecture Patterns
model_routing:
  planning:
    complex_architecture: "gpt-4"
    simple_monolith: "claude-3"
    microservices: "local-llm+graph-context"
  code_generation:
    frontend: "claude-3"
    backend: "gpt-4"
    infrastructure: "local-llm"
  analysis:
    architectural: "gpt-4+graph-analysis"
    code_quality: "claude-3"
    performance: "local-llm"
  fallback_strategy: "architecture_aware_fallback"
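
A lookup against this table could look like the sketch below. The behavior of `architecture_aware_fallback` is not defined in this document, so the example substitutes a single default model; `DEFAULT_MODEL` and `select_model` are hypothetical names.

```python
MODEL_ROUTING = {
    "planning": {
        "complex_architecture": "gpt-4",
        "simple_monolith": "claude-3",
        "microservices": "local-llm+graph-context",
    },
    "code_generation": {
        "frontend": "claude-3",
        "backend": "gpt-4",
        "infrastructure": "local-llm",
    },
    "analysis": {
        "architectural": "gpt-4+graph-analysis",
        "code_quality": "claude-3",
        "performance": "local-llm",
    },
}

DEFAULT_MODEL = "local-llm"  # assumption: stands in for the undefined fallback strategy


def select_model(task_type, context, routing=MODEL_ROUTING, default=DEFAULT_MODEL):
    """Return the routed model, or the default when the pair is not configured."""
    by_context = routing.get(task_type)
    if by_context is None:
        return default
    return by_context.get(context, default)
```

For example, `select_model("planning", "microservices")` returns `"local-llm+graph-context"`, while an unconfigured pair such as `("analysis", "security")` falls through to the default.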

6.2 Architecture Detection Agent

class ArchitectureDetectionAgent:
    def detect_architecture_patterns(self, codebase):
        # Level 1: Directory and file structure analysis
        module_structure = self.analyze_module_structure(codebase)
        
        # Level 2: Dependency graph analysis  
        dependency_graph = self.build_dependency_graph(codebase)
        
        # Level 3: Architectural pattern matching
        patterns = self.match_known_patterns(module_structure, dependency_graph)
        
        # Level 4: Boundary detection
        boundaries = self.detect_module_boundaries(patterns, dependency_graph)
        
        # Update knowledge graph
        self.update_architecture_knowledge(patterns, boundaries)
        
        return ArchitectureContext(
            patterns=patterns,
            boundaries=boundaries,
            modules=module_structure,
            confidence_scores=self.calculate_confidence(patterns)
        )
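
To give a feel for Level 1 and Level 3, here is a deliberately small sketch that matches directory names against pattern signatures and reports the fraction matched as a confidence score. The signatures in `PATTERN_SIGNATURES` and the 0.5 cutoff are illustrative assumptions; real detection would also use the dependency graph.

```python
from pathlib import PurePosixPath

# Hypothetical signatures: directory names that hint at each pattern.
PATTERN_SIGNATURES = {
    "mvc": {"models", "views", "controllers"},
    "layered": {"presentation", "business", "persistence"},
    "microservice": {"services", "gateway"},
}


def detect_patterns(file_paths, min_confidence=0.5):
    """Return {pattern: confidence} for patterns whose signature is at least partly present."""
    # Collect every directory component (everything except the file name).
    dirs = {part for p in file_paths for part in PurePosixPath(p).parts[:-1]}
    scores = {
        name: len(signature & dirs) / len(signature)
        for name, signature in PATTERN_SIGNATURES.items()
    }
    return {name: s for name, s in scores.items() if s >= min_confidence}


paths = [
    "app/models/user.py",
    "app/views/home.py",
    "app/controllers/auth.py",
    "services/billing/main.py",
]
detected = detect_patterns(paths)
```

For these paths the result is `{"mvc": 1.0, "microservice": 0.5}`: all three MVC directories are present, one of two microservice hints is present, and no layered directories appear.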

7. Implementation Roadmap

Phase 1: Foundation (Weeks 1-4)

  • Enhanced SQLite schema with architecture tables
  • Basic hierarchical reasoning agents
  • Architecture detection pipeline
  • Candle framework integration

Phase 2: Learning Core (Weeks 5-8)

  • Neural network models for pattern recognition
  • Feedback collection and processing system
  • Knowledge graph population and querying
  • Basic self-learning loops

Phase 3: Advanced Reasoning (Weeks 9-12)

  • Multi-level architecture understanding
  • Context-aware model routing
  • Advanced GOAP with learned priorities
  • Performance optimization and scaling

Phase 4: Production Refinement (Weeks 13-16)

  • Advanced monitoring with architecture metrics
  • Security hardening for ML components
  • Load testing and optimization
  • Documentation and deployment automation

8. Enhanced Architecture Diagram

graph TD
    subgraph "Enhanced SQLite Knowledge Space"
        AgentsTable[agents]
        TasksTable[tasks + learned_priority]
        ArchModules[architecture_modules]
        Patterns[learned_patterns]
        Feedback[feedback_loops]
        ModelPerf[model_performance]
    end

    subgraph "Core Agent Layer"
        Builder[Builder Agent]
        Tester[Tester Agent]
        Deployer[Deployer Agent]
        Analyzer[Analyzer Agent<br/>Architecture Detection]
        Learner[Learning Agent<br/>Candle ML Engine]
        Router[Router Agent]
        Cleaner[Cleaner Agent]
    end
    
    subgraph "Hierarchical Reasoning"
        ArchDetect[Architecture Detector]
        ModuleAnalyzer[Module Analyzer]
        BoundaryDetect[Boundary Detector]
        PatternMatcher[Pattern Matcher]
    end
    
    ArchDetect --> ArchModules
    ModuleAnalyzer --> ArchModules
    BoundaryDetect --> ArchModules
    PatternMatcher --> Patterns
    
    Analyzer --> ArchDetect
    Analyzer --> ModuleAnalyzer
    Analyzer --> BoundaryDetect
    Analyzer --> PatternMatcher
    
    Learner --> Feedback
    Learner --> Patterns
    Learner --> ModelPerf
    
    Router --> ModelPerf
    Router --> Patterns
    
    Builder --> TasksTable
    Tester --> TasksTable
    Deployer --> TasksTable
    Analyzer --> ArchModules
    
    classDef knowledge fill:#fff3e0,stroke:#e65100
    classDef agent fill:#e3f2fd,stroke:#1565c0
    classDef reasoning fill:#f3e5f5,stroke:#6a1b9a
    
    class AgentsTable,TasksTable,ArchModules,Patterns,Feedback,ModelPerf knowledge
    class Builder,Tester,Deployer,Analyzer,Learner,Router,Cleaner agent
    class ArchDetect,ModuleAnalyzer,BoundaryDetect,PatternMatcher reasoning

9. Self-Learning Feedback Loop Implementation

9.1 Continuous Learning Pipeline

// Candle-based learning implementation
impl LearningAgent {
    pub fn continuous_learning_loop(&mut self) -> Result<()> {
        loop {
            // Collect new feedback
            let new_samples = self.feedback_collector.collect_recent()?;
            
            if !new_samples.is_empty() {
                // Batch process for efficiency
                let batch = self.create_training_batch(new_samples)?;
                
                // Train pattern recognition network
                self.pattern_trainer.train_batch(&batch)?;
                
                // Update context association model
                self.context_trainer.update_associations(&batch)?;
                
                // Adjust architecture pattern confidence
                self.architecture_analyzer.refine_patterns(&batch)?;
                
                // Update model performance metrics
                self.router.update_model_performance(&batch)?;
            }
            
            // Sleep with exponential backoff based on feedback volume
            self.adaptive_sleep();
        }
    }
    
    fn create_training_batch(&self, samples: Vec<FeedbackSample>) -> Result<TrainingBatch> {
        let features: Vec<Tensor> = samples.iter()
            .map(|s| self.encode_feedback_features(s))
            .collect::<Result<_>>()?;
            
        let targets: Vec<Tensor> = samples.iter()
            .map(|s| self.encode_learning_targets(s))
            .collect::<Result<_>>()?;
            
        Ok(TrainingBatch { features, targets, samples })
    }
}
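
The `adaptive_sleep` call is not defined above. A common way to implement it, sketched here in Python as an assumption about the intended behavior, is to reset the delay whenever feedback arrives and double it while the queue stays empty. `AdaptiveBackoff` and its default bounds are hypothetical.

```python
class AdaptiveBackoff:
    """Exponential backoff that resets on activity (hypothetical helper)."""

    def __init__(self, base_s=1.0, max_s=60.0, factor=2.0):
        self.base_s = base_s
        self.max_s = max_s
        self.factor = factor
        self.delay_s = base_s

    def next_delay(self, sample_count):
        """Return how long to sleep after a poll that yielded sample_count samples."""
        if sample_count > 0:
            # Feedback is flowing: poll again soon.
            self.delay_s = self.base_s
        else:
            # Nothing new: back off, up to the cap.
            self.delay_s = min(self.delay_s * self.factor, self.max_s)
        return self.delay_s
```

A caller would pass the result to `time.sleep`. Keeping the delay computation separate from the sleep itself makes the policy easy to test without waiting.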

9.2 Accuracy Improvement Metrics

-- Learning progress monitoring
CREATE TABLE learning_metrics (
    week INTEGER,
    query_type TEXT,
    architecture_pattern TEXT,
    initial_accuracy REAL,
    current_accuracy REAL,
    learning_rate REAL,
    sample_count INTEGER,
    measured_at TIMESTAMP
);
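
Two questions this table can answer are how far accuracy has moved from its starting point and in which week a target was first reached. The sketch below uses Python's built-in `sqlite3` module; the helper names are hypothetical, and the default target of 0.75 mirrors the convergence goal stated under Performance Metrics.

```python
import sqlite3

SCHEMA = """
CREATE TABLE learning_metrics (
    week INTEGER,
    query_type TEXT,
    architecture_pattern TEXT,
    initial_accuracy REAL,
    current_accuracy REAL,
    learning_rate REAL,
    sample_count INTEGER,
    measured_at TIMESTAMP
);
"""


def weeks_to_target(conn, query_type, target=0.75):
    """First week in which current_accuracy reached the target, or None."""
    row = conn.execute(
        "SELECT MIN(week) FROM learning_metrics "
        "WHERE query_type = ? AND current_accuracy >= ?",
        (query_type, target),
    ).fetchone()
    return row[0]


def latest_gain(conn, query_type):
    """Accuracy gained since the start, taken from the most recent week, or None."""
    row = conn.execute(
        "SELECT current_accuracy - initial_accuracy FROM learning_metrics "
        "WHERE query_type = ? ORDER BY week DESC LIMIT 1",
        (query_type,),
    ).fetchone()
    return None if row is None else row[0]
```

`MIN` over an empty result yields a single `NULL` row, so `weeks_to_target` returns `None` for query types that never reached the target.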

10. Critical Success Factors

Technical Requirements

  1. Candle Framework Mastery - Efficient tensor operations and model management
  2. Architecture Pattern Library - Comprehensive known-pattern database
  3. Feedback Pipeline Reliability - Robust collection and processing
  4. Knowledge Graph Performance - Efficient querying of complex relationships

Performance Metrics

  • Architecture Detection Accuracy: Target >90% pattern recognition
  • Learning Convergence: <4 weeks to 75% accuracy from cold start
  • Query Response Time: <2s for architecture-aware searches
  • Model Routing Accuracy: >85% optimal model selection

Risk Mitigation

  • Cold Start Problem: Seed with common architecture patterns
  • Overfitting: Regular validation against diverse codebases
  • Performance Degradation: Continuous monitoring and rollback capability
  • Knowledge Graph Bloat: Automated pruning and importance scoring

11. Future Extensions

  • Cross-Project Learning - Transfer learned patterns between similar projects
  • Predictive Architecture - Suggest architecture improvements based on patterns
  • Real-time Collaboration - Multiple agents working on large-scale refactoring
  • Advanced Neural Architectures - Transformer-based code understanding
  • Federated Learning - Privacy-preserving learning across organizations

12. Summary

  • Hierarchical Reasoning - Multi-level architecture understanding
  • Continuous Self-Learning - Candle-based neural learning from feedback
  • Architecture-Aware Coordination - Context-sensitive task execution
  • Adaptive Model Routing - LLM selection based on architectural context
  • Production-Ready Foundation - Testing, monitoring, and security built-in
  • Scalable Knowledge Graph - Efficient relationship storage and querying

Taken together, these components move the design from a basic multi-agent system toward a self-improving one that models software architecture explicitly and refines its behavior from recorded feedback.
