Samuel DerOeko

@DerOeko
DerOeko / slingshot.md
Last active December 11, 2025 21:00
Scalable Online RL Training System (vLLM + RLAIF) - Architecture Overview

Scalable Online RL Training System (vLLM + RLAIF)

Note: This gist outlines architectural patterns and infrastructure used for research. Source code and proprietary datasets are excluded per lab data policy.

Technical Highlights

Online RL Implementation

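  • GRPO (Group Relative Policy Optimization): Implemented group-based baselines for multi-turn agents. Each completion's advantage is measured against the other completions sampled for the same prompt, which removes the need for a separate value network (critic).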
  • Loss Variants: Support for DAPO, BNPO, and DR-GRPO with token-level masking for long-context reasoning.
  • Curriculum Learning: A dynamic CurriculumDatasetCallback shifts the training distribution from simple to complex tasks during training to prevent collapse.
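The group-based baseline and token-level masking described above can be illustrated with a minimal, dependency-free sketch. This is not the excluded source code: the function names, the `scale_by_std` flag, and the use of the population standard deviation are assumptions made for illustration. It shows how a group mean replaces a critic, and how averaging over all unmasked tokens differs from averaging per sequence, the distinction the loss variants turn on.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, scale_by_std=True, eps=1e-6):
    """Advantages for completions sampled from the same prompt.

    The group mean acts as the baseline, so no learned critic is needed.
    """
    baseline = mean(rewards)
    centered = [r - baseline for r in rewards]
    if not scale_by_std:
        # Variant without std scaling (hypothetical flag for this sketch).
        return centered
    std = pstdev(rewards)  # population std: an assumption of this sketch
    return [c / (std + eps) for c in centered]


def token_level_mean(per_token_loss, mask):
    """Average over every unmasked token in the batch."""
    total = sum(
        loss * m
        for losses, ms in zip(per_token_loss, mask)
        for loss, m in zip(losses, ms)
    )
    n_tokens = sum(sum(ms) for ms in mask)
    return total / max(n_tokens, 1)


def sequence_level_mean(per_token_loss, mask):
    """Average within each sequence first, then across sequences."""
    per_seq = []
    for losses, ms in zip(per_token_loss, mask):
        n = max(sum(ms), 1)
        per_seq.append(sum(loss * m for loss, m in zip(losses, ms)) / n)
    return mean(per_seq)


if __name__ == "__main__":
    # Four completions for one prompt, scored by a reward model.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))

    # One long and one short completion; 0 in the mask marks ignored tokens.
    losses = [[1.0, 1.0, 1.0, 1.0], [4.0, 0.0, 0.0, 0.0]]
    mask = [[1, 1, 1, 1], [1, 0, 0, 0]]
    print(token_level_mean(losses, mask), sequence_level_mean(losses, mask))
```

With token-level averaging, long completions contribute in proportion to their length; with sequence-level averaging, every completion counts equally regardless of length. That trade-off matters most for long-context reasoning traces.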
DerOeko / robolearn.md
Last active December 11, 2025 21:00
Distributed Bayesian Optimization Framework for Robotics - System Design

Distributed Bayesian Optimization & Cognitive Modeling Framework

Note: This gist outlines architectural patterns and infrastructure used for research. Source code and proprietary datasets are excluded per lab data policy.

Technical Highlights

Reinforcement Learning Environment

  • Gymnasium Wrappers: Custom observation-based environments with flexible state transitions for cognitive task modeling.
  • Configurable Rewards: Decoupled reward structure from environment logic to allow rapid iteration on task definitions.
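One way to read the decoupled reward structure is as a reward function injected into the environment, so the task definition can change without touching the transition logic. The sketch below is an assumption-laden illustration, not the framework's code: `ChainTask`, `goal_reward`, and `step_penalty` are hypothetical names, and the class only mimics the Gymnasium `reset`/`step` return convention without importing the library, so that it runs on the standard library alone.

```python
from typing import Any, Callable, Dict, Tuple

# (state, action, next_state) -> reward; hypothetical signature for this sketch
RewardFn = Callable[[int, int, int], float]


def goal_reward(goal: int) -> RewardFn:
    """Sparse reward: 1.0 on reaching the goal state, else 0.0."""
    def fn(state: int, action: int, next_state: int) -> float:
        return 1.0 if next_state == goal else 0.0
    return fn


def step_penalty(goal: int, penalty: float = 0.1) -> RewardFn:
    """Same goal, but every non-goal step costs a small penalty."""
    def fn(state: int, action: int, next_state: int) -> float:
        return 1.0 if next_state == goal else -penalty
    return fn


class ChainTask:
    """Toy chain of states; action 1 moves right, any other action moves left."""

    def __init__(self, n_states: int, reward_fn: RewardFn, max_steps: int = 20):
        self.n_states = n_states
        self.reward_fn = reward_fn  # injected, so task logic stays reward-agnostic
        self.max_steps = max_steps
        self.state = 0
        self.steps = 0

    def reset(self, seed: Any = None) -> Tuple[int, Dict[str, Any]]:
        # seed is accepted for interface parity; this toy task is deterministic
        self.state = 0
        self.steps = 0
        return self.state, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        move = 1 if action == 1 else -1
        next_state = min(max(self.state + move, 0), self.n_states - 1)
        reward = self.reward_fn(self.state, action, next_state)
        self.state = next_state
        self.steps += 1
        terminated = next_state == self.n_states - 1
        truncated = (not terminated) and self.steps >= self.max_steps
        return next_state, reward, terminated, truncated, {}


if __name__ == "__main__":
    # Swapping the reward definition requires no change to ChainTask itself.
    for reward_fn in (goal_reward(3), step_penalty(3)):
        env = ChainTask(n_states=4, reward_fn=reward_fn)
        obs, info = env.reset()
        done = False
        while not done:
            obs, reward, terminated, truncated, info = env.step(1)
            done = terminated or truncated
            print(obs, reward)
```

Passing the reward as a plain callable keeps each task variant a one-line change at construction time, which is the kind of rapid iteration the bullet describes.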