@rtldg
Last active December 2, 2025 19:22
Notes so I remember how calling conventions work

Resources

Random Notes

  • Right before a call, the stack must be aligned to 16 bytes.

    • (Right after a call, (rsp&8) == 8, because the call pushed the 8-byte return address)
  • In the called function: pushing the frame pointer (RBP) is often what realigns the stack to 16 bytes.

    • push rbp + mov rbp, rsp
  • ABI can return objects.

    • If the object is too big for registers and is returned in memory, then inRDI = out-object-address (a hidden first argument supplied by the caller) and outRAX = inRDI.
      • For thiscall functions: inRDI = out-object-address and inRSI = this and outRAX = inRDI.
    • Object-unpacking into registers is a MESS. You'll have to read the System V ABI document.
    • A struct { a: f32, b: f32, c: f32, d: f32 } will be returned like this:
      XMM0.lo64.lo32 = a;
      XMM0.lo64.hi32 = b;
      XMM1.lo64.lo32 = c;
      XMM1.lo64.hi32 = d;
      
    • A struct { a: f64, b: f64 } will be returned like this:
      XMM0.lo64 = a;
      XMM1.lo64 = b;
      
  • What is the "red zone"?

    • Signals arrive asynchronously on Linux (and Unix in general), so your function can be mid-execution when a signal arrives and control jumps to a signal handler. Doesn't sound great, does it? The signal handler reuses your stack, but it must not clobber your stack variables. A "leaf function" (one that calls nothing else) can skip allocating stack space entirely, which saves a little overhead: it is free to use the 128 bytes from rsp-1 down to rsp-128 without moving the stack pointer. If a signal arrives in the middle of a leaf function, the handler's frame has to go below rsp-128 and leave that area alone. So your leaf function can stay minimal and the signal handler won't clobber its variables.
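The two struct-return examples above can be reproduced with a deliberately tiny model of the System V rules. This is only a sketch under loud assumptions: it handles structs whose fields are all f32 or f64, it ignores integers, nesting, arrays, x87 and vector types, and the names `FIELD_SIZE` and `classify_return` are made up for this note. The real classification algorithm in the ABI document has many more cases.

```python
# Simplified System V x86_64 return classification.
# ASSUMPTION: every field is f32 or f64, so every eightbyte is class SSE.
FIELD_SIZE = {"f32": 4, "f64": 8}


def classify_return(fields):
    """fields is a list of (name, type) pairs in declaration order.

    Returns {"XMM0": [...], "XMM1": [...]} listing which fields land in the
    low 64 bits of each register, or "MEMORY" when the struct is larger than
    16 bytes and is returned through the hidden pointer in RDI.
    """
    offset = 0
    eightbytes = {}
    for name, ty in fields:
        size = FIELD_SIZE[ty]
        # Natural alignment: each field starts at a multiple of its size.
        offset = (offset + size - 1) // size * size
        eightbytes.setdefault(offset // 8, []).append(name)
        offset += size

    # Struct size is rounded up to the largest field alignment.
    max_align = max(FIELD_SIZE[ty] for _, ty in fields)
    total = (offset + max_align - 1) // max_align * max_align
    if total > 16:
        return "MEMORY"

    # Each SSE eightbyte takes the next free XMM register, in order.
    ordered = [eightbytes[index] for index in sorted(eightbytes)]
    return {f"XMM{i}": names for i, names in enumerate(ordered)}


four_floats = [("a", "f32"), ("b", "f32"), ("c", "f32"), ("d", "f32")]
two_doubles = [("a", "f64"), ("b", "f64")]
print(classify_return(four_floats))
print(classify_return(two_doubles))
```

Under these assumptions, a struct of three f64 fields is 24 bytes and falls into the "returned in memory" case described earlier.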

Notes so I remember how the x86_64 Windows calling convention "shadow space" is used

From Wikipedia's x86 calling conventions:

In the Microsoft x64 calling convention, it is the caller's responsibility to allocate 32 bytes of "shadow space" on the stack right before calling the function (regardless of the actual number of parameters used), and to pop the stack after the call.

...

Stack aligned on 16 bytes. 32 bytes shadow space on stack.

From Microsoft's x64 Software Conventions / Stack Allocation

Note that space is always allocated for the register parameters, even if the parameters themselves are never homed to the stack; a callee is guaranteed that space has been allocated for all its parameters.

...

The stack will always be maintained 16-byte aligned, except within the prolog (for example, after the return address is pushed), and except where indicated in Function Types for a certain class of frame functions.

Okay, so we need at least sub rsp, 32 for "shadow space" but since a call FunctionHere will also put the return address on the stack, we'll need to sub/push another 8 bytes to the stack to align it to 16 bytes.

TL;DR for every 'caller':

  • allocate 32 bytes for "shadow space"
  • optional: allocate X bytes (round up to a 16-byte multiple) for function parameters that overflow onto the stack
    • ((num_stack_params * 8) + 15) & ~15 (same as ((num_stack_params * 8) + 15) / 16 * 16 with integer division)
    • This could mean just pushing an extra/random register, or allocating the stack space at the beginning of the function when allocating "shadow space" and such.
  • because call FunctionHere will push the return address (8 bytes) to the stack, allocate an extra 8 bytes in the new function so the stack is aligned to 16 bytes.
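The recipe in this list boils down to a bit of arithmetic. Here is a minimal Python sketch of it, assuming a function that pushes no registers and needs no locals; `round_up` and `caller_allocation` are names invented for this note, not anything from a toolchain.

```python
SHADOW_SPACE = 32    # always reserved by the caller, even for 0 arguments
RETURN_ADDRESS = 8   # pushed by `call` before the callee's first instruction


def round_up(n, multiple):
    """Round n up to the next multiple of `multiple`."""
    return (n + multiple - 1) // multiple * multiple


def caller_allocation(num_args):
    """Bytes to `sub rsp` in a function that calls something taking num_args.

    ASSUMPTION: no pushes and no local variables in the calling function.
    """
    stack_params = max(0, num_args - 4)        # first 4 go in rcx, rdx, r8, r9
    overflow = round_up(stack_params * 8, 16)  # stack-argument area
    padding = 8                                # undo the return-address push
    return SHADOW_SPACE + overflow + padding


for n in (0, 4, 5, 6, 7):
    size = caller_allocation(n)
    # After the return address plus our allocation, rsp is 16-byte aligned.
    assert (RETURN_ADDRESS + size) % 16 == 0
    print(n, hex(size))
```

With 5 arguments this gives 0x38. Compilers are free to pick a different size as long as the shadow space and stack arguments fit and rsp is 16-byte aligned at the call.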

The options usually seen are:

function:
    ; push a callee-saved register which this function will clobber & the push will also realign stack to 16 bytes
    push rbx
    ; allocate "shadow space" since we're going to be calling something (that needs it / non-leaf func / etc)
    sub rsp, 32

    ;
    ; do stuff here like clobber the saved register & also calling something
    ;

    ; free the "shadow space"
    add rsp, 32
    ; welcome back callee-saved register
    pop rbx
    ; and return
    ret

or

function:
    ; allocate "shadow space" since we're going to be calling something
    ; & also allocate another 8 bytes so we realign the stack to 16 bytes
    sub rsp, 32+8 ; sometimes you see things like `sub rsp, 5*8`

    ;
    ; do stuff here like calling something
    ;

    ; free the "shadow space" and the extra 8 bytes we used to realign the stack
    add rsp, 32+8 ; sometimes you see things like `add rsp, 5*8`
    ; and return
    ret
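Both prologues above can be sanity-checked by tracking rsp by hand. A small Python sketch of that bookkeeping follows; the helper name `rsp_at_inner_call` and the starting value 0x1000 are arbitrary choices for illustration, and only `push` and `sub rsp, N` are modelled.

```python
def rsp_at_inner_call(prologue, rsp_before_call=0x1000):
    """Return rsp after `call` into the function plus its prologue steps.

    prologue is a list of ("push", reg_name) or ("sub", byte_count) tuples.
    rsp_before_call is the caller's rsp right before the `call` instruction,
    which the convention requires to be 16-byte aligned.
    """
    assert rsp_before_call % 16 == 0
    rsp = rsp_before_call - 8          # `call` pushes the return address
    for op, arg in prologue:
        if op == "push":
            rsp -= 8                   # every push moves rsp by 8 bytes
        elif op == "sub":
            rsp -= arg
        else:
            raise ValueError(f"unsupported step: {op}")
    return rsp


option_push = [("push", "rbx"), ("sub", 32)]   # first variant above
option_sub = [("sub", 32 + 8)]                 # second variant above
shadow_only = [("sub", 32)]                    # forgets the extra 8 bytes

for name, steps in [("push+sub", option_push),
                    ("sub 40", option_sub),
                    ("sub 32 only", shadow_only)]:
    rsp = rsp_at_inner_call(steps)
    print(name, hex(rsp), "aligned" if rsp % 16 == 0 else "MISALIGNED")
```

The first two variants leave rsp on a 16-byte boundary, while allocating only the 32 bytes of shadow space leaves it off by 8.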

Let's check out the beginning of kernel32!WriteFile:

WriteFile:
    ; cache some registers in the "shadow space" (and it's interesting that they skip `[rsp+8]`)
    mov qword ptr ss:[rsp+0x10],rbx
    mov qword ptr ss:[rsp+0x18],rsi
    mov qword ptr ss:[rsp+0x20],r9
    ; cache rdi since it will be clobbered -- THIS REALIGNS THE STACK TO 16 bytes
    push rdi
    ; allocate "shadow space" (0x20) for subsequent calls & allocate an extra 0x40 bytes for stack variables
    sub rsp,0x60
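To see which slots those three stores land in, it helps to write down the stack layout a function sees on entry in the Microsoft x64 convention: the return address at [rsp], then the four 8-byte home slots for rcx, rdx, r8, r9, then any stack arguments. A hedged Python sketch follows; the table and the function `describe_entry_offset` are made up for this note.

```python
# Offsets are relative to rsp at function entry (before any push/sub).
HOME_SLOTS = {0x08: "rcx", 0x10: "rdx", 0x18: "r8", 0x20: "r9"}


def describe_entry_offset(offset):
    """Name the thing that lives at [rsp+offset] on function entry."""
    if offset == 0:
        return "return address"
    if offset in HOME_SLOTS:
        return f"home slot for {HOME_SLOTS[offset]}"
    if offset >= 0x28 and offset % 8 == 0:
        return f"stack argument {5 + (offset - 0x28) // 8}"
    raise ValueError(f"not a slot boundary: {offset:#x}")


# The three stores at the top of WriteFile, as (offset, register stored).
for offset, reg in [(0x10, "rbx"), (0x18, "rsi"), (0x20, "r9")]:
    print(f"[rsp+{offset:#x}] <- {reg}: {describe_entry_offset(offset)}")

# Return address + `push rdi` + `sub rsp,0x60` keeps rsp 16-byte aligned.
assert (8 + 8 + 0x60) % 16 == 0
```

Read this way, rbx and rsi are parked in the home slots that belong to rdx and r8, r9 is written to its own home slot, and the skipped [rsp+8] is the home slot for rcx.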

Here's what kernel32!WriteConsoleA looks like:

WriteConsoleA:
    ; + 32 bytes to allocate "shadow space"
    ; + 16 bytes since we need to put an argument on the stack for
    ;   the inner-WriteConsoleA function & then round up to a 16-byte multiple
    ; + 8 bytes to realign the stack once the return address is on the stack after the `call`
    ; = 56 (0x38)
    sub rsp,0x38
    ; 5th parameter to the inner-WriteConsoleA function
    mov byte ptr ss:[rsp+0x20],0x0
    call kernelbase.7FFB6CDC6D78
    test eax,eax
    js kernelbase.7FFB6CDC6D59
    mov eax,0x1
    jmp kernelbase.7FFB6CDC6D69
    mov ecx,eax
    call qword ptr ds:[<RtlSetLastWin32ErrorAndNtStatusFromNtStatus>]
    nop dword ptr ds:[rax+rax],eax
    xor eax,eax
    ; free everything allocated above: "shadow space", the stack-argument area, and the 8 bytes used to realign the stack to 16 bytes
    add rsp,0x38
    ret

Assemblers (FASM, at least) often have their function invocation macros automatically handle the "shadow space", and can even allocate it once for several calls in a row.

Short example for FASM (from https://flatassembler.net/docs.php?article=win32#1.4):

    invoke glVertex3d,float 0.6,float -0.6,float 0.0
    invoke glVertex2f,float dword 0.1,float dword 0.2

; The stack space for parameters are allocated before each call and freed immediately after it.
; However it is possible to allocate this space just once for all the calls inside some
; given block of code, for this purpose there are frame and endf macros provided. They
; should be used to enclose a block, inside which the RSP register is not altered between
; the procedure calls and they prevent each call from allocating stack space for parameters,
; as it is reserved just once by the frame macro and then freed at the end by the endf macro.

    frame ; allocate stack space just once
        invoke TranslateMessage,msg
        invoke DispatchMessage,msg
    endf

FASM will also automatically prepend sub rsp, 8 if you're using the .code macro.
