
@shaltielshmid
shaltielshmid / train_math_nemo.py
Last active February 6, 2026 04:11
NeMo-Framework training code from the paper "Learning to Reason: Training LLMs with GPT-OSS or DeepSeek R1 Reasoning Traces", for training Mistral-Nemo-Base-2407 or nvidia/NVIDIA-Nemotron-Nano-12B-v2-Base on reasoning data, generated either via gpt-oss-120b or DeepSeek-R1-0528
MODEL_TO_USE = "NanoV2" # or "MistralNemo"
IMPORT_MODEL_AND_DATA = False # set to True to import the model from HF hub, only needs to be run once
REASONING_STYLE = "gpt_oss_120b" # or "DeepSeek_R1_0528"
NUM_NODES = 2
GPUS_PER_NODE = 8
import nemo_run as run
from nemo.collections import llm
from nemo.collections.llm.gpt.model.mistral import MistralModel, MistralNeMoConfig12B
from nemo.collections.llm.gpt.model.ssm import MambaModel, NemotronNano12Bv2
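The `MODEL_TO_USE` flag above presumably selects between the two imported model/config pairs. A minimal sketch of that dispatch, assuming a simple registry keyed by the flag value (the class names come from the imports in the snippet; the registry itself and `resolve_model` are hypothetical, not part of the gist):

```python
# Hypothetical registry mapping the MODEL_TO_USE flag to a
# (model class name, config class name) pair. In the real script these
# would be the imported classes themselves, not strings.
MODEL_REGISTRY = {
    "NanoV2": ("MambaModel", "NemotronNano12Bv2"),
    "MistralNemo": ("MistralModel", "MistralNeMoConfig12B"),
}


def resolve_model(name: str) -> tuple[str, str]:
    """Look up the model/config pair for a MODEL_TO_USE value."""
    try:
        return MODEL_REGISTRY[name]
    except KeyError:
        raise ValueError(
            f"Unknown MODEL_TO_USE: {name!r}; expected one of {sorted(MODEL_REGISTRY)}"
        )


model_cls_name, config_cls_name = resolve_model("NanoV2")
```

Dict dispatch keeps the flag check in one place, so adding a third base model only means adding one registry entry.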
@shaltielshmid
shaltielshmid / llm-app.cs
Created August 6, 2025 15:17
Reproduction of LLM-over-DNS in C# 10
#:package ARSoft.Tools.Net@3.6.1
#:package LLMTornado@3.7.25
using ARSoft.Tools.Net;
using ARSoft.Tools.Net.Dns;
using LlmTornado.Code;
using LlmTornado;
using LlmTornado.Chat;
(string EnvKey, LLmProviders Provider)[] Providers = [
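The truncated `Providers` array pairs an environment-variable name with an LLM provider, suggesting the app picks whichever provider has an API key set. A sketch of that selection logic in Python (the pattern follows the C# tuple array above; the specific environment-variable names and provider identifiers here are illustrative assumptions, not taken from the gist):

```python
import os

# Hypothetical mirror of the C# Providers table:
# (environment-variable name, provider identifier).
PROVIDERS = [
    ("OPENAI_API_KEY", "openai"),
    ("ANTHROPIC_API_KEY", "anthropic"),
]


def pick_provider(env=os.environ):
    """Return (provider id, api key) for the first provider whose key is set."""
    for env_key, provider in PROVIDERS:
        if env.get(env_key):
            return provider, env[env_key]
    raise RuntimeError("no provider API key found in the environment")
```

Scanning the table in order gives a deterministic priority when several keys are present, which matches the array-of-tuples shape of the C# declaration.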