flushing tlb once the index wraps arround is not enougth
as in use pte's can be speculatively loaded. so instead
use invlpg() and explicitely invalidate the tlb of the
page mapped.
this fixes wired mount cache corruption for reads approaching
2MB which is the size of the KMAP window.
invlpg() was broken, using wrong operand type.
remove myaddr() function and replace with myip() function
that receives binary ip address. and don't use string
comparsion for ip addresses... parse and then ipcmp().
for sanity reasons, normalize ip address strings and
reject unparsable ones. done by calling ipalookup()
with a binary ip address.
this implements the server part of mschapv2 with the new
authserver changes.
we also provide AuthInfo for the client now with the
MPPE secret and the authenticator.
this adds new rpc for mschapv2 authentication (21)
deliver the MPPE secret not after the ticket/authenticator
response as cheartext, but include it in the first 128 bit
of the ticket key. and the authenticator in the first 160 bit
of the authenticator random field.
the ::/0 route has the bad side effect of breaking v4 connections
when theres no default route due to v6 mapped v4 addresses. this
might be temporary measure.
windows 7 just drops the default router when it tries to
probe for router reachability but gets a neighbor avertisement
from the router with the router bit clear.
so set the R-flag when sendra is active, which implies that
we are a router.
use OCHAPREPLYLEN instead of sizeof(reply) (no padding).
exit after sending ticket response to force eof as factotum
unconditionally reads tailing secret hash (as of mschap).
the driver doesnt implement multicast filter, but just turns
on promiscuous mode when a multicast address is added. but this
breaks when one actually enables and then disables promiscuous
mode with say, running snoopy.
we have to keep promisc mode active as long as multicast table
is not empty.
broadcast traffic was received back on the wire causing
duplicate address detection to break with dmat proy as
the rewritten broadcasts where observable.
the fix is to just ignore packets from ourselfs received
from the air. devether already handles loopback.
when kernel memory is exhausted, rtl8169replenish() can fail
to plant more receive descriptors and rtl8169receive() would
run over the receive tail and crash on the nil ctlr->rb[x].
rtl8169receive() is called on "Receive Descriptor Unavailable"
and "Packet Underrun" so we will try to replenish descriptors
in the beginning first in case memory was exhausted and memory
is available again and make sure not to run over the tail.
the string encoding functions touch secret key material
in a bunch of places (devtls, devcap), so make sure we do
not leak information by cache timing side channels, making
the encoding and decoding routines constant time.
we also expose the alphabets through encXchr()/decXchr()
functions so caller can find the end of a encoded string
before calling decode function (for libmp).
the base32 encoding was broken in several ways. inputs
lengths of len%5 == [2,3,4] had output truncated and
it was using non-standard alphabet. documenting the alphabet
change in the manpage.
Instead of only using a hash over the whole certificate for
white/black-listing, now we can also use a hash over the
Subject Public Key Info (SPKI) field of the certificate which
contians the public key algorithm and the public key itself.
This allows certificates to be renewed independendtly of the
public key.
X509dump() now prints the public key thumbprint in addition
to the certificate thumbprint.
tlsclient will print the certificate when run with -D flag.
okCertificate() will print the public key thumbprint in its
error string when no match has been found.
getting rid of some functions that take Byte* and instead
pass uchar* and length.
keeping the signature and public key fields in CertX509
as Bits* allows ownership transfer by swapping pointers.
use common code to copy CN from subject field.
in case we continue to send traffic (like ping) with the ap gone,
the sending would keep updating bss->lastseen which prevents the
timeout to happen to switch bss.
yes, it peeks into IP packets to handle fragmentation when sending
onto tunnel ports and does mss clamping. but it can carry arbitrary
ethernet packets just fine (between ethernets).
busybox gunzip fails on empty (offset) huffman tables,
so force one entry.
gzip states in a comment:
The pkzip format requires that at least one distance code exists,
and that at least one bit should be sent even if there is only one
possible code.
dumping hybrid MBR/GPT disks is fine, which can sometimes be found
on USB sticks. but prohibit editing.
however, always barf on disks with dos partitions and missing
protecive MBR partition entry.
- remove arbitrary limits on screen size, just check with badrect()
- post resize when physgscreenr is changed (actualsize ctl command)
- preserve physgscreenr across softscreen flag toggle
- honor panning flag on resize
- fix nil dereference in panning ctl command when scr->gscreen == nil
- use clipr when drawing vga plan 9 console (vgascreenwin())
check for "needkey " error string from auth_userpasswd() in case no
key is pesent in factotum. this used to be a common trap with stand
alone machines that do not have an authentication server setup.
indicate authentication in progress by drawing a white border.
delete unneccesary cruft and simplify the code.
the role=login protocol is ment to replace proto=p9cr in
auth_userpasswd() from libauth to authenticate a user
given a username and a password. in contrast to p9cr, it
does not require an authentication server when user is the
hostowner and its key is present in factotum.
- unroll the loops
- rotate the taps on each step, avoiding copies
- simplify boolean formulas for Ch() and Maj()
this yields arround 40% throughput increase on 32/64bit
archs for sha2_256 and sha2_512 on amd64.
- get rid of the temporary copies and memmoves()
- when the data pointer is aligned, do xor and copying inline
speedup for auth/aescbc encryption depends on arch:
- zynq 7% (arm)
- t23 13% (386)
- x230 20% (amd64, aes-ni)
- apu2 25% (amd64, aes-ni)
to capture bios and bootloader messages, convert the contents
on the screen to kmesg.
on machines without legacy cga, the cga registers read out as
0xFF, resuting in out of bounds cgapos. so set cgapos to 0 in
that case.
from Ori_B:
There were a small number of changes needed from the tarball
on spinroot.org:
- The mkfile needed to be updated
- Memory.h needed to not be included
- It needed to invoke /bin/cpp instead of gcc -E
- It depended on `yychar`, which our yacc doesn't
provide.
I'm still figuring out how to use spin, but it seems to do
the right thing when testing a few of the examples:
% cd $home/src/Spin/Examples/
% spin -a peterson.pml
% pcc pan.c -D_POSIX_SOURCE
% ./6.out
(Spin Version 6.4.7 -- 19 August 2017)
+ Partial Order Reduction
Full statespace search for:
never claim - (none specified)
assertion violations +
acceptance cycles - (not selected)
invalid end states +
State-vector 32 byte, depth reached 24, errors: 0
40 states, stored
27 states, matched
67 transitions (= stored+matched)
0 atomic steps
hash conflicts: 0 (resolved)
Stats on memory usage (in Megabytes):
0.002 equivalent memory usage for states (stored*(State-vector + overhead))
0.292 actual memory usage for states
128.000 memory used for hash table (-w24)
0.534 memory used for DFS stack (-m10000)
128.730 total actual memory usage
unreached in proctype user
/tmp/Spin/Examples/peterson.pml:20, state 10, "-end-"
(1 of 10 states)
pan: elapsed time 1.25 seconds
pan: rate 32 states/second
doing 4 quarterround's in parallel using 128-bit
vector registers. for second round shuffle the columns and
then shuffle back.
code is rather obvious. only trick here is for the first
quaterround PSHUFLW/PSHUFHW is used to swap the halfwords
for the <<<16 rotation.
instead of hardcoding the tunnel interface MTU to 1280,
we calculate the tunnel MTU from the outside MTU, which
can now be specified with the -m mtu option. The deault
outside MTU is 1500 - 8 (PPPoE).
when a process does an exec, it calls procsetup() which
unconditionally sets the sets the TS flag and fpstate=FPinit
and fpurestore() should not revert the fpstate.
cannot just reenable the fpu in FPactive case as we might have
been procsaved() an rescheduled on another cpu. what was i thinking...
thanks qu7uux for reproducing the problem.
new approach to graphics memory management:
the kernel driver never really cared about the size of stolen memory
directly. that was only to figure out the maximum allocation
to place the hardware cursor image somewhere at the end of the
allocation done by bios.
qu7uux's gm965 bios however wont steal enougth memory for his
native resolution so we have todo it manually.
the userspace igfx driver will figure out how much the bios
allocated by looking at the gtt only. then extend the memory by
creating a "fixed" physical segment.
the kernel driver allocates the memory for the cursor image
from normal kernel memory, and just maps it into the gtt at the
end of the virtual kernel framebuffer aperture.
thanks to qu7uux for the patch.
Add assembler versions for aes_encrypt/aes_decrypt and the key
setup using AES-NI instruction set. This makes aes_encrypt and
aes_decrypt into function pointers which get initialized by
the first call to setupAESstate().
Note that the expanded round key words are *NOT* stored in big
endian order as with the portable implementation. For that reason
the AESstate.ekey and AESstate.dkey fields have been changed to
void* forcing an error when someone is accessing the roundkey
words. One offender was aesXCBmac, which doesnt appear to be
used and the code looks horrible so it has been deleted.
The AES-NI implementation is for amd64 only as it requires the
kernel to save/restore the FPU state across syscalls and
pagefaults.
The aim is to take advantage of SSE instructions such as AES-NI
in the kernel by lazily saving and restoring FPU state across
system calls and pagefaults. (everything can can do I/O)
This is accomplished by the functions fpusave() and fpurestore().
fpusave() remembers the current state and disables the FPU if it
was active by setting the TS flag. In case the FPU gets used,
the current state gets saved and a new PFPU.fpslot is allocated
by mathemu().
fpurestore() restores the previous FPU state, reenabling the FPU
if fpusave() disabled it.
In the most common case, when userspace is not using the FPU,
then fpusave()/fpurestore() just toggle the FPpush bit in
up->fpstate.
When the FPU was active, but we do not use the FPU, then nothing
needs to be saved or restored. We just switched the TS flag on
and off agaian.
Note, this is done for the amd64 kernel only.
introducing the PFPU structue which allows the machine specific
code some flexibility on how to handle the FPU process state.
for example, in the pc and pc64 kernel, the FPsave structure is
arround 512 bytes. with avx512, it could grow up to 2K. instead
of embedding that into the Proc strucutre, it is more effective
to allocate it on first use of the fpu, as most processes do not
use simd or floating point in the first place. also, the FPsave
structure has special 16 byte alignment constraint, which further
favours dynamic allocation.
this gets rid of the memmoves in pc/pc64 kernels for the aligment.
there is also devproc, which is now checking if the fpsave area
is actually valid before reading it, avoiding debuggers to see
garbage data.
the Notsave structure is gone now, as it was not used on any
machine.
adjust to new aes_xts routines.
allow optional offset in the 4th argument where the encrypted
sectors start instead of hardcoding the 64K header area for
cryptsetup.
avoid allocating temporary buffer for cryptio() reads, we can
just decrypt in place there.
use sdmalloc() to allocate the temporary buffer for cryptio()
writes so that devsd wont need to allocate and copy in case
it didnt like our alignment.
do not duplicate the error reporting code, just use io()
that is what it is for.
allow 2*256 bit keys in addition to 2*128 bit keys.
the previous implementation was not portable at all, assuming
little endian in gf_mulx() and that one can cast unaligned
pointers to ulong in xor128(). also the error code is likely
to be ignored, so better abort() when the length is not a
multiple of the AES block size.
we also pass in full AESstate structures now instead of
the expanded key longs, so that we do not need to hardcode
the number of rounds. this allows each indiviaul keys to
be bigger than 128 bit.
the QLp structure used to occupy 24 bytes on amd64.
with some rearranging the fields we can get it to 16 bytes,
saving 8K in the data section for the 1024 preallocated
structs in the ql arena.
the rest of the changes are of cosmetic nature:
- getqlp() zeros the next pointer, so there is no need to set
it when queueing the entry.
- always explicitely compare pointers to nil.
- delete unused code from ape's qlock.c
the initial issue was that wunlock() would wakeup readers while
holding the spinlock causing deadlock in libthread programs where
rendezvous() would do a thread switch within the same process
which then can acquire the RWLock again.
the first fix tried to prevent holding the spinlock, waking up
one reader at a time with releasing an re-acquiering the spinlock.
this violates the invariant that readers can only wakup writers
in runlock() when multiple readers where queued at the time of
wunlock(). at the first wakeup, q->head != nil so runlock() would
find a reader queued on runlock() when it expected a writer.
this (hopefully last) fix unlinks *all* the reader QLp's atomically
and in order while holding the spinlock and then traverses the
dequeued chain of QLp structures again to call rendezvous() so
the invariant described above holds.
when the alarm hits while the process is currently in syslog(), holding
the sl lock, then calling syslog again will deadlock:
/proc/1729193/text:386 plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/386
acid: lstk()
sleep()+0x7 /sys/src/libc/9syscall/sleep.s:5
lock(lk=0x394d8)+0xb7 /sys/src/libc/port/lock.c:25
i=0x3e8
syslog(logname=0x41c64,cons=0x0,fmt=0x41c6a)+0x2d /sys/src/libc/9sys/syslog.c:60
err=0x79732f27
d=0x0
ctim=0x0
buf=0x0
p=0x0
arg=0x0
n=0x0
catchalarm(msg=0xdfffc854)+0x7a /sys/src/cmd/upas/smtp/smtpd.c:71
notifier+0x30 /sys/src/libc/port/atnotify.c:15
this implements the mitigation suggested in section "6.5 Countermeasures" of
"Key Reinstallation Attacks: Forcing Nonce Resuse in WPA2" [1].
[1] https://papers.mathyvanhoef.com/ccs2017.pdf
copying files from the uefi shell works, reading plan9.ini works,
loading the kernel by calling Read to read in the DATA section of
the kernel *FAILS*. my guess is that uefi filesystem driver or
nvme driver tries to allocate a temporary buffer and hasnt got
the space. limiting the read size fixes it.
the cachetab just keeps track of recent messages that have not
been called cachefree() on. under some conditions, the fixed
table could overflow (all messages having refs > 0). with a
linked list, overflow becomes non fatal and the algorithm is
simpler to implement.
there isnt much of a point in keep maintaining separate
kernel configurations for terminal and cpu kernels as
the role can be switched with service=cpu boot parameter.
to make stuff cosistent, we will just have one "pc" kernel
and one "pc64" kernel configuration now.
we used to tweak arround in the ICH registers for all intel controllers,
which is wrong, as the 200 series has different magic registes. see
the datasheet:
https://www.intel.com/content/www/us/en/chipsets/200-series-chipset-pch-datasheet-vol-2.html
this caused the clocks to be disabled for the 6th port causing a full
machine lockup touching the 6th port registers.
the next problem was that aiju's bios disabled the unused ports somehow
but didnt clear ther PI bits, so that they would stay in Sbist status even
after a port reset. so the port would get stuck in the Dportreset state
forever. the fix for this was to use a one second timeout for the
port reset procedure.