Jason Lu (Nottlespike)
@Nottlespike
Nottlespike / setup-guides.sh
Created February 6, 2026 23:39
Set up agent guides (AGENTS.md, CLAUDE.md, GEMINI.md) in one script
#!/usr/bin/env bash
# setup-agent-guides.sh
# ---------------------
# Bootstraps agent guide files for a repository.
#
# This script is intended for "new repo setup" and repeat-safe updates:
# 1) Reads a source guide file (`.md` by default, or a provided path)
# 2) Copies it to `AGENTS.md` (canonical guide file)
# 3) Creates `CLAUDE.md` and `GEMINI.md` aliases to `AGENTS.md`
# 4) Falls back to plain file copies if symlinks are unavailable
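The four steps the header describes can be sketched in Python (this is an illustrative re-statement of the logic, not the script itself; the function name is hypothetical, only the file names come from the gist):

```python
import os
import shutil

def setup_agent_guides(source, repo_dir="."):
    """Copy a source guide to AGENTS.md, then alias CLAUDE.md and GEMINI.md
    to it, falling back to plain copies where symlinks are unavailable."""
    canonical = os.path.join(repo_dir, "AGENTS.md")
    shutil.copyfile(source, canonical)          # step 2: canonical guide file
    for alias in ("CLAUDE.md", "GEMINI.md"):    # step 3: per-agent aliases
        alias_path = os.path.join(repo_dir, alias)
        if os.path.lexists(alias_path):         # repeat-safe: replace old alias
            os.remove(alias_path)
        try:
            os.symlink("AGENTS.md", alias_path)  # relative link beside it
        except OSError:
            shutil.copyfile(canonical, alias_path)  # step 4: plain-copy fallback
```

The relative symlink target keeps the aliases valid if the repository directory is moved or cloned elsewhere.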
@Nottlespike
Nottlespike / update_agents.sh
Last active February 4, 2026 13:57
update_agents.sh: regenerate AGENTS.md with Claude (Opus 4.5 default)
#!/usr/bin/env bash
set -euo pipefail
# update_agents.sh
# Regenerates AGENTS.md by sending (the existing AGENTS.md + new info) to Claude with a fixed system prompt.
#
# Requirements:
# - bash, curl, jq
# - ANTHROPIC_API_KEY environment variable set
#
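The request body that curl would POST to the Anthropic Messages API can be sketched as follows. This is a minimal sketch: the model id, `max_tokens` value, and system-prompt wording are placeholders standing in for whatever the gist actually uses, and the helper name is hypothetical.

```python
import json

def build_payload(existing_md, new_info, model="claude-opus-4-5"):
    """Assemble a Messages API request body: fixed system prompt,
    one user turn containing the current file plus the new information."""
    return {
        "model": model,                 # placeholder id for the Opus 4.5 default
        "max_tokens": 8192,
        "system": "You maintain AGENTS.md. Return the full updated file.",
        "messages": [
            {
                "role": "user",
                "content": (
                    f"Current AGENTS.md:\n{existing_md}\n\n"
                    f"New info:\n{new_info}"
                ),
            }
        ],
    }

body = json.dumps(build_payload("# AGENTS\n", "Use uv for Python deps."))
```

In the shell script, the same assembly would typically be done with `jq -n --arg ...` so the Markdown content is JSON-escaped safely before curl sends it.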
// paged_attention.metal - Paged Attention for Apple Silicon
//
// Implements vLLM-style paged attention adapted for Metal simdgroup architecture.
// Paged attention decouples logical token positions from physical memory layout,
// enabling efficient batch serving with variable-length sequences.
//
// Key differences from flash_attention.metal:
// - KV cache is organized in fixed-size blocks (pages)
// - Block tables map logical block indices to physical block addresses
// - Each sequence can have a different context length
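The block-table indirection the header describes can be modeled in a few lines of Python. This is a toy illustration of the addressing scheme, not the Metal kernel; the block size and all names are illustrative:

```python
BLOCK_SIZE = 16  # tokens per KV block (page); actual size is a kernel parameter

def kv_location(block_table, token_pos):
    """Translate a logical token position into (physical_block, offset).

    The block table decouples where a token logically sits in the
    sequence from where its KV entries physically live in cache memory.
    """
    logical_block = token_pos // BLOCK_SIZE
    offset = token_pos % BLOCK_SIZE
    return block_table[logical_block], offset

# A sequence whose logical blocks 0, 1, 2 happen to occupy
# physical cache blocks 7, 3, and 9:
table = [7, 3, 9]
```

Because each sequence carries its own table, sequences of different lengths can share one physical block pool without padding to a common maximum.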
@Nottlespike
Nottlespike / nvfp4_dynamic_range_comparison.py
Last active December 31, 2025 09:49
MLX "NVFP4" vs NVFP4
"""
Accurate comparison of MLX nvfp4 vs NVIDIA NVFP4 implementation.
Key architectural difference:
NVIDIA NVFP4 uses a TWO-LEVEL SCALING strategy:
1. Global FP32 per-tensor scale: s_enc = (6 * 448) / tensor_amax
2. Local E4M3 per-block scale: one scale per 16 elements
MLX appears to use only single-level E4M3 block scaling without the FP32 tensor scale.
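The two-level scheme in the docstring can be sketched numerically. The global-scale formula `s_enc = (6 * 448) / tensor_amax` is taken from the gist (6 is the FP4 E2M1 max magnitude, 448 the E4M3 max); the per-block formula below is one plausible reading of "one scale per 16 elements", not a verified reproduction of NVIDIA's implementation:

```python
import numpy as np

def two_level_scales(tensor, block=16):
    """Compute NVIDIA-style two-level NVFP4 scales (sketch).

    Level 1: a global FP32 per-tensor scale s_enc = (6 * 448) / amax.
    Level 2: one local scale per 16-element block (to be stored as E4M3),
    chosen here so each scaled block fits FP4's +/-6 range.
    """
    amax = np.abs(tensor).max()
    s_enc = (6.0 * 448.0) / amax                      # global FP32 scale
    blocks = np.asarray(tensor, dtype=np.float64).reshape(-1, block)
    block_scales = np.abs(blocks).max(axis=1) * s_enc / 6.0
    return s_enc, block_scales
```

Dropping `s_enc` (as MLX appears to) leaves only the E4M3 block scales, which caps the representable per-block dynamic range at what E4M3 alone can express.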
# P100 Extended Implementation Tasks - 100+ tasks for full agent utilization
# Tesla P100 (GP100) - 56 SMs, 3584 CUDA cores, 16GB HBM2 @ 732 GB/s
tasks:
  # ============================================
  # CUDA Kernels (P0 - Critical) - 20 tasks
  # ============================================
  - name: kernel-vecadd-sm60
    prompt: |
Patchy 2.0 License with OpenAI and California State Exclusion
Version 2.0, May 2024
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.