
@shaltielshmid
shaltielshmid / train_math_nemo.py
Last active February 6, 2026 04:11
NeMo-Framework training code from the paper "Learning to Reason: Training LLMs with GPT-OSS or DeepSeek R1 Reasoning Traces", for training Mistral-Nemo-Base-2407 or nvidia/NVIDIA-Nemotron-Nano-12B-v2-Base on reasoning data, generated either via gpt-oss-120b or DeepSeek-R1-0528
MODEL_TO_USE = "NanoV2" # or "MistralNemo"
IMPORT_MODEL_AND_DATA = False # set to True to import the model from HF hub, only needs to be run once
REASONING_STYLE = "gpt_oss_120b" # or "DeepSeek_R1_0528"
NUM_NODES = 2
GPUS_PER_NODE = 8
import nemo_run as run
from nemo.collections import llm
from nemo.collections.llm.gpt.model.mistral import MistralModel, MistralNeMoConfig12B
from nemo.collections.llm.gpt.model.ssm import MambaModel, NemotronNano12Bv2
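The `MODEL_TO_USE` flag above presumably selects between the two imported model/config pairs. A minimal sketch of that dispatch, assuming a simple registry keyed by the flag value (the class names come from the imports in the snippet; the registry itself and `resolve_model` are hypothetical, not part of the gist):

```python
# Hypothetical registry mapping the MODEL_TO_USE flag to a
# (model class name, config class name) pair. In the real script these
# would be the imported classes themselves, not strings.
MODEL_REGISTRY = {
    "NanoV2": ("MambaModel", "NemotronNano12Bv2"),
    "MistralNemo": ("MistralModel", "MistralNeMoConfig12B"),
}


def resolve_model(name: str) -> tuple[str, str]:
    """Look up the model/config pair for a MODEL_TO_USE value."""
    try:
        return MODEL_REGISTRY[name]
    except KeyError:
        raise ValueError(
            f"Unknown MODEL_TO_USE: {name!r}; expected one of {sorted(MODEL_REGISTRY)}"
        )


model_cls_name, config_cls_name = resolve_model("NanoV2")
```

Dict dispatch keeps the flag check in one place, so adding a third base model only means adding one registry entry.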
@shaltielshmid
shaltielshmid / llm-app.cs
Created August 6, 2025 15:17
Reproduction of LLM-over-DNS in C# 10
#:package ARSoft.Tools.Net@3.6.1
#:package LLMTornado@3.7.25
using ARSoft.Tools.Net;
using ARSoft.Tools.Net.Dns;
using LlmTornado.Code;
using LlmTornado;
using LlmTornado.Chat;
(string EnvKey, LLmProviders Provider)[] Providers = [
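The truncated `Providers` array pairs an environment-variable name with an LLM provider, suggesting the app picks whichever provider has an API key set. A sketch of that selection logic in Python (the pattern follows the C# tuple array above; the specific environment-variable names and provider identifiers here are illustrative assumptions, not taken from the gist):

```python
import os

# Hypothetical mirror of the C# Providers table:
# (environment-variable name, provider identifier).
PROVIDERS = [
    ("OPENAI_API_KEY", "openai"),
    ("ANTHROPIC_API_KEY", "anthropic"),
]


def pick_provider(env=os.environ):
    """Return (provider id, api key) for the first provider whose key is set."""
    for env_key, provider in PROVIDERS:
        if env.get(env_key):
            return provider, env[env_key]
    raise RuntimeError("no provider API key found in the environment")
```

Scanning the table in order gives a deterministic priority when several keys are present, which matches the array-of-tuples shape of the C# declaration.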