The previous version (like the x86 one) used a combination of C and asm code, called from C code to switch the stack. This is problematic, since there is no guarantee what assumptions C code makes about the stack (i.e. it can place any kind of stack pointers into registers or on the stack itself.) The new algorithm returns back to the systemcall entry point in asm, which then calls KiConvertToGuiThread, which is also asm and calls KeSwitchKernelStack ...
To be 100% correct and not rely on assumptions, stack switching can only be done when all previous code - starting with the syscall entry point - is pure asm code, since we can't rely on the C compiler to not use stack addresses in a way that is not transparent. Therefore the new code uses the same mechanism as for normal system calls, returning the address of the asm function KiConvertToGuiThread, which is then called like an Nt* function would be called normally. KiConvertToGuiThread then allocated a new stack, switches to it (which is now fine, since all the code is asm), frees the old stack, calls PsConvertToGuiThread (which now will not try to allocate another stack, since we already have one) and then jumps into the middle of KiSystemCallEntry64, where the system call is handled again.
Also simplify KiSystemCallEntry64 a bit by copying the first parameters into the trap frame, avoiding to allocate additional stack space for the call to KiSystemCallHandler, which now overlaps with the space that is allocated for the Nt* function.
Finally fix the locations where r10 and r11 are stored, which is TrapFrame->Rcx and TrapFrame->EFlags, based on the situation in user mode.