Dhruvil Patel dhruvilp

πŸ’­
πŸ‘¨β€πŸ’» working on something really cool
View GitHub Profile
@dhruvilp
dhruvilp / microgpt.py
Created February 12, 2026 01:36 — forked from karpathy/microgpt.py
microgpt
"""
The most atomic way to train and inference a GPT in pure, dependency-free Python.
This file is the complete algorithm.
Everything else is just efficiency.
@karpathy
"""
import os # os.path.exists
import math # math.log, math.exp
@dhruvilp
dhruvilp / Dockerfile
Last active November 10, 2025 05:17
vllm docling granite model
# Use an AWS Deep Learning Container (DLC) or a vLLM-specific image as the base.
# Ensure the base image has the necessary CUDA drivers and PyTorch.
# Pin a specific tag that matches your CUDA version rather than :latest if needed.
FROM vllm/vllm-openai:latest
# Copy the pre-downloaded model weights into the container image
# (the source path must live inside the Docker build context).
COPY /mnt/models/granite-docling-258M /app/local_model
WORKDIR /app
# The entrypoint command will use the local directory path for the --model argument.
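A minimal sketch of how that serving command could be supplied, assuming the vllm/vllm-openai base image keeps its default OpenAI-compatible server entrypoint (the served model name below is an assumption):
# Assumption: the base image's entrypoint already launches the vLLM OpenAI server,
# so CMD only appends its arguments.
CMD ["--model", "/app/local_model", "--served-model-name", "granite-docling-258m"]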
@dhruvilp
dhruvilp / notes-2.md
Created November 5, 2025 03:24
gpt-oss-20b-fine-tuning-q3-max-part-1

Here's a complete, battle-tested, end-to-end script designed specifically for fine-tuning the MXFP4-quantized MoE GPT-oss-20B model on your 4×A10G setup (96 GB VRAM total). It leverages QLoRA for memory efficiency while handling MXFP4 quantization properly; a minimal setup sketch follows the script header below.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Fine-tune MXFP4-quantized MoE GPT-oss-20B with QLoRA
Hardware: 4Γ— NVIDIA A10G (24GB VRAM each)
Key Tech: bitsandbytes (MXFP4), PEFT (QLoRA), FlashAttention-2, DeepSpeed ZeRO-3
"""
@dhruvilp
dhruvilp / quant-gpt-oss.py
Last active October 15, 2025 20:44
quant gpt oss local
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import time
model_path = './gpt-oss-model-local'
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",              # assumed; the preview is truncated here
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
)
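Continuing the snippet, a hedged sketch of loading the quantized model and timing a generation; the prompt and generation settings are illustrative, not from the gist:

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quantization_config,
    device_map="auto",
)

prompt = "Explain quantization in one sentence."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.time()
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(f"generation took {time.time() - start:.1f}s")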
@dhruvilp
dhruvilp / webpageloader.py
Created October 6, 2025 21:06
crawl4ai web page loader
import asyncio
import json
import os
from base64 import b64decode
from typing import List, Dict, Optional, Any
from pydantic import BaseModel, Field
from crawl4ai import (
    AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode,
    JsonCssExtractionStrategy, LLMExtractionStrategy, LLMConfig,
)  # further imports in the full gist are cut off in this preview
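A minimal usage sketch with the pieces imported above; the URL and settings are illustrative, not from the gist:

async def fetch_markdown(url):
    # Launch a headless browser session and fetch one page as markdown.
    browser_cfg = BrowserConfig(headless=True)
    run_cfg = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
    async with AsyncWebCrawler(config=browser_cfg) as crawler:
        result = await crawler.arun(url=url, config=run_cfg)
        return result.markdown

if __name__ == "__main__":
    print(asyncio.run(fetch_markdown("https://example.com")))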
@dhruvilp
dhruvilp / t_to_sb.txt
Created March 25, 2025 03:49
Tomcat to Spring Boot
(Hugging Face Inference Providers playground, Fireworks provider, text generation, model: deepseek-ai/DeepSeek-V3-0324)

How can I convert an app running on a Tomcat 8 (Catalina) server to a Spring Boot app with JDK 17?
@dhruvilp
dhruvilp / thinking_tokens.py
Created February 18, 2025 16:01 — forked from zainhas/thinking_tokens.py
Extract ONLY thinking tokens from DeepSeek-R1
import os
from together import Together

# Read the key from the environment so the snippet runs as-is.
client = Together(api_key=os.environ["TOGETHER_API_KEY"])

question = "Which is larger 9.9 or 9.11?"

# Stop at the closing think tag so only the reasoning tokens are returned.
thought = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": question}],
    stop=["</think>"],
)
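Because generation stops at the closing </think> tag, the returned content holds only the model's reasoning. Reading it follows the standard OpenAI-style response shape:

# The completion stops at '</think>', so this is the chain-of-thought text only.
thinking = thought.choices[0].message.content
print(thinking)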
@dhruvilp
dhruvilp / aimlapi-starter.py
Created November 18, 2024 15:14
AIML API Code Snippet
# pip install openai
import os
from openai import OpenAI

# Prefer an environment variable (name assumed) over hard-coding the key.
aiml_api_key = os.getenv("AIML_API_KEY", "<YOUR_AIML_API_KEY>")

client = OpenAI(
    api_key=aiml_api_key,
    base_url="https://api.aimlapi.com",
)
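A short usage sketch against this client; the model id is an assumption, since the available names depend on the AIML API catalog:

# Hypothetical model id; substitute one exposed by your AIML API plan.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)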
"age";"job";"marital";"education";"default";"housing";"loan";"contact";"month";"day_of_week";"duration";"campaign";"pdays";"previous";"poutcome";"emp.var.rate";"cons.price.idx";"cons.conf.idx";"euribor3m";"nr.employed";"y"
56;"housemaid";"married";"basic.4y";"no";"no";"no";"telephone";"may";"mon";261;1;999;0;"nonexistent";1.1;93.994;-36.4;4.857;5191;"no"
57;"services";"married";"high.school";"unknown";"no";"no";"telephone";"may";"mon";149;1;999;0;"nonexistent";1.1;93.994;-36.4;4.857;5191;"no"
37;"services";"married";"high.school";"no";"yes";"no";"telephone";"may";"mon";226;1;999;0;"nonexistent";1.1;93.994;-36.4;4.857;5191;"no"
40;"admin.";"married";"basic.6y";"no";"no";"no";"telephone";"may";"mon";151;1;999;0;"nonexistent";1.1;93.994;-36.4;4.857;5191;"no"
56;"services";"married";"high.school";"no";"no";"yes";"telephone";"may";"mon";307;1;999;0;"nonexistent";1.1;93.994;-36.4;4.857;5191;"no"
45;"services";"married";"basic.9y";"unknown";"no";"no";"telephone";"may";"mon";198;1;999;0;"nonexistent";1.1;93.994;-36.4
@dhruvilp
dhruvilp / info.txt
Created August 15, 2023 20:45
SaaS Stack
If you're building a SaaS in 2023:
β—† framework: Next.js
β—† ui: @shadcn/ui + TailwindCSS
β—† redis/queues: Upstash
β—† time-series data & charts: Tinybird + Tremor
β—† ORM: Prisma
β—† auth: NextAuth.js
β—† database: PlanetScale
β—† emails: Resend