
naveenkumarmarri / docker.md
Created September 16, 2021 01:05
Nuke docker images

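Force-removes every container (running or stopped), then prunes unused volumes, user-defined networks, and images. Because no containers remain, `docker image prune -a` effectively deletes every image. The `|| true` keeps the chain going when there are no containers to remove. On Docker Engine 23.0 and later, `docker volume prune` only deletes anonymous volumes unless `--all` is passed.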

docker rm $(docker ps -a -q) --force || true \
	&& docker container prune --force \
	&& docker volume prune --force \
	&& docker network prune --force \
	&& docker image prune -a --force
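A minimal sketch of the same cleanup wrapped in a shell function, assuming bash or another POSIX-style shell with `local`. The function name `nuke_docker` and the `DRY_RUN` and `PRUNE_NAMED_VOLUMES` switches are hypothetical and not Docker features: `DRY_RUN=1` prints each command instead of running it, and `PRUNE_NAMED_VOLUMES=1` adds `--all` to the volume prune, which assumes Docker Engine 23.0 or later. It also skips `docker rm` when there are no containers instead of relying on `|| true`.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the function name and env switches are illustrative only.
#   DRY_RUN=1              print each docker command instead of running it
#   PRUNE_NAMED_VOLUMES=1  pass --all to "docker volume prune" (assumes Docker Engine 23.0+)
nuke_docker() {
  local run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"   # prefix every command with echo so nothing is executed
  fi

  local vol_flags="--force"
  if [ "${PRUNE_NAMED_VOLUMES:-0}" = "1" ]; then
    vol_flags="--all --force"
  fi

  # Query container IDs only for a real run; a dry run never contacts the daemon.
  local ids=""
  if [ -z "$run" ]; then
    ids=$(docker ps -a -q)
  fi

  # Skip "docker rm" when there is nothing to remove instead of masking errors with "|| true".
  if [ -n "$ids" ]; then
    # $ids is unquoted on purpose so each container ID becomes its own argument.
    $run docker rm --force $ids
  fi

  $run docker container prune --force
  # $vol_flags is unquoted on purpose so "--all --force" splits into two flags.
  $run docker volume prune $vol_flags
  $run docker network prune --force
  $run docker image prune -a --force
}
```

Preview with `DRY_RUN=1 nuke_docker`, then run `nuke_docker` once the output looks right. Because a dry run skips the container query, the `docker rm` step never appears in the preview.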