Jason Lu (Nottlespike)
@Nottlespike
Nottlespike / setup-guides.sh
Created February 6, 2026 23:39
Set up agent guides (AGENTS.md, CLAUDE.md, GEMINI.md) in one script
#!/usr/bin/env bash
# setup-agent-guides.sh
# ---------------------
# Bootstraps agent guide files for a repository.
#
# This script is intended for "new repo setup" and repeat-safe updates:
# 1) Reads a source guide file (`.md` by default, or a provided path)
# 2) Copies it to `AGENTS.md` (canonical guide file)
# 3) Creates `CLAUDE.md` and `GEMINI.md` aliases to `AGENTS.md`
# 4) Falls back to plain file copies if symlinks are unavailable
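The four steps the header describes can be sketched in Python (this is an illustrative re-statement of the logic, not the script itself; the function name is hypothetical, only the file names come from the gist):

```python
import os
import shutil

def setup_agent_guides(source, repo_dir="."):
    """Copy a source guide to AGENTS.md, then alias CLAUDE.md and GEMINI.md
    to it, falling back to plain copies where symlinks are unavailable."""
    canonical = os.path.join(repo_dir, "AGENTS.md")
    shutil.copyfile(source, canonical)          # step 2: canonical guide file
    for alias in ("CLAUDE.md", "GEMINI.md"):    # step 3: per-agent aliases
        alias_path = os.path.join(repo_dir, alias)
        if os.path.lexists(alias_path):         # repeat-safe: replace old alias
            os.remove(alias_path)
        try:
            os.symlink("AGENTS.md", alias_path)  # relative link beside it
        except OSError:
            shutil.copyfile(canonical, alias_path)  # step 4: plain-copy fallback
```

The relative symlink target keeps the aliases valid if the repository directory is moved or cloned elsewhere.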
@Nottlespike
Nottlespike / update_agents.sh
Last active February 4, 2026 13:57
update_agents.sh: regenerate AGENTS.md with Claude (Opus 4.5 default)
#!/usr/bin/env bash
set -euo pipefail
# update_agents.sh
# Regenerates AGENTS.md by sending (the existing AGENTS.md + new info) to Claude with a fixed system prompt.
#
# Requirements:
# - bash, curl, jq
# - ANTHROPIC_API_KEY environment variable set
#
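The request body that curl would POST to the Anthropic Messages API can be sketched as follows. This is a minimal sketch: the model id, `max_tokens` value, and system-prompt wording are placeholders standing in for whatever the gist actually uses, and the helper name is hypothetical.

```python
import json

def build_payload(existing_md, new_info, model="claude-opus-4-5"):
    """Assemble a Messages API request body: fixed system prompt,
    one user turn containing the current file plus the new information."""
    return {
        "model": model,                 # placeholder id for the Opus 4.5 default
        "max_tokens": 8192,
        "system": "You maintain AGENTS.md. Return the full updated file.",
        "messages": [
            {
                "role": "user",
                "content": (
                    f"Current AGENTS.md:\n{existing_md}\n\n"
                    f"New info:\n{new_info}"
                ),
            }
        ],
    }

body = json.dumps(build_payload("# AGENTS\n", "Use uv for Python deps."))
```

In the shell script, the same assembly would typically be done with `jq -n --arg ...` so the Markdown content is JSON-escaped safely before curl sends it.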
// paged_attention.metal - Paged Attention for Apple Silicon
//
// Implements vLLM-style paged attention adapted for Metal simdgroup architecture.
// Paged attention decouples logical token positions from physical memory layout,
// enabling efficient batch serving with variable-length sequences.
//
// Key differences from flash_attention.metal:
// - KV cache is organized in fixed-size blocks (pages)
// - Block tables map logical block indices to physical block addresses
// - Each sequence can have a different context length
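The block-table indirection the header describes can be modeled in a few lines of Python. This is a toy illustration of the addressing scheme, not the Metal kernel; the block size and all names are illustrative:

```python
BLOCK_SIZE = 16  # tokens per KV block (page); actual size is a kernel parameter

def kv_location(block_table, token_pos):
    """Translate a logical token position into (physical_block, offset).

    The block table decouples where a token logically sits in the
    sequence from where its KV entries physically live in cache memory.
    """
    logical_block = token_pos // BLOCK_SIZE
    offset = token_pos % BLOCK_SIZE
    return block_table[logical_block], offset

# A sequence whose logical blocks 0, 1, 2 happen to occupy
# physical cache blocks 7, 3, and 9:
table = [7, 3, 9]
```

Because each sequence carries its own table, sequences of different lengths can share one physical block pool without padding to a common maximum.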
@Nottlespike
Nottlespike / nvfp4_dynamic_range_comparison.py
Last active December 31, 2025 09:49
MLX "NVFP4" vs NVFP4
"""
Accurate comparison of MLX nvfp4 vs NVIDIA NVFP4 implementation.
Key architectural difference:
NVIDIA NVFP4 uses a TWO-LEVEL SCALING strategy:
1. Global FP32 per-tensor scale: s_enc = (6 * 448) / tensor_amax
2. Local E4M3 per-block scale: one scale per 16 elements
MLX appears to use only single-level E4M3 block scaling without the FP32 tensor scale.
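The two-level scheme in the docstring can be sketched numerically. The global-scale formula `s_enc = (6 * 448) / tensor_amax` is taken from the gist (6 is the FP4 E2M1 max magnitude, 448 the E4M3 max); the per-block formula below is one plausible reading of "one scale per 16 elements", not a verified reproduction of NVIDIA's implementation:

```python
import numpy as np

def two_level_scales(tensor, block=16):
    """Compute NVIDIA-style two-level NVFP4 scales (sketch).

    Level 1: a global FP32 per-tensor scale s_enc = (6 * 448) / amax.
    Level 2: one local scale per 16-element block (to be stored as E4M3),
    chosen here so each scaled block fits FP4's +/-6 range.
    """
    amax = np.abs(tensor).max()
    s_enc = (6.0 * 448.0) / amax                      # global FP32 scale
    blocks = np.asarray(tensor, dtype=np.float64).reshape(-1, block)
    block_scales = np.abs(blocks).max(axis=1) * s_enc / 6.0
    return s_enc, block_scales
```

Dropping `s_enc` (as MLX appears to) leaves only the E4M3 block scales, which caps the representable per-block dynamic range at what E4M3 alone can express.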
# P100 Extended Implementation Tasks - 100+ tasks for full agent utilization
# Tesla P100 (GP100) - 56 SMs, 3584 CUDA cores, 16GB HBM2 @ 732 GB/s
tasks:
  # ============================================
  # CUDA Kernels (P0 - Critical) - 20 tasks
  # ============================================
  - name: kernel-vecadd-sm60
    prompt: |
Patchy 2.0 License with OpenAI and California State Exclusion
Version 2.0, May 2024
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.