the QLp structure used to occupy 24 bytes on amd64.
with some rearranging the fields we can get it to 16 bytes,
saving 8K in the data section for the 1024 preallocated
structs in the ql arena.
the rest of the changes are of cosmetic nature:
- getqlp() zeros the next pointer, so there is no need to set
it when queueing the entry.
- always explicitely compare pointers to nil.
- delete unused code from ape's qlock.c