Maharshi-Pandya / quant.py
Last active February 7, 2026 08:39
NVFP4 quantization in torch
import torch

# FP8 (E4M3) holds the per-block scales; 448.0 is its largest finite value.
FP8_AMAX = 448.0
FP8_DTYPE = torch.float8_e4m3fn

# FP4 (E2M1) holds the quantized values; 6.0 is its largest representable magnitude.
FP4_AMAX = 6.0
# The packed FP4 dtype only exists in recent PyTorch builds; fall back to uint8 storage.
FP4_DTYPE = getattr(torch, "float4_e2m1fn_x2", torch.uint8)

# midpoints and the corresponding bins
# representable positives = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
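The listing is truncated here, but the comment above implies rounding each value to the nearest representable E2M1 magnitude using the midpoints between adjacent positives as bin boundaries. Below is a minimal sketch of that step under those assumptions; the function name `round_to_fp4_values` is hypothetical and this is not the gist's full implementation (ties at a midpoint fall to the lower bin here, whereas hardware FP4 casting uses round-to-nearest-even):

```python
import torch

# E2M1 representable positive magnitudes, as listed in the comment above.
FP4_POSITIVES = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
# Midpoints between adjacent representable values act as rounding-bin boundaries:
# [0.25, 0.75, 1.25, 1.75, 2.5, 3.5, 5.0]
FP4_MIDPOINTS = (FP4_POSITIVES[:-1] + FP4_POSITIVES[1:]) / 2

def round_to_fp4_values(x: torch.Tensor) -> torch.Tensor:
    """Snap each element of x to the nearest E2M1-representable value."""
    sign = torch.sign(x)
    # Clamp magnitudes to the FP4 max (6.0) before binning.
    mag = x.abs().clamp(max=6.0)
    # bucketize returns, for each magnitude, the index of its rounding bin,
    # which is also the index of the nearest value in FP4_POSITIVES.
    idx = torch.bucketize(mag, FP4_MIDPOINTS)
    return sign * FP4_POSITIVES[idx]
```

In the full NVFP4 scheme this rounding would be applied after dividing each block by its scale, so inputs already lie within the representable range [-6, 6].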