Commit graph

693 commits

Author SHA1 Message Date
cinap_lenrek 32dfbc7c50 devcons: simplify putstrn0() 2016-11-08 00:34:59 +01:00
cinap_lenrek 48b49361d8 devbridge: various bugfixes and improvements from charles forsyth 2016-11-07 22:43:37 +01:00
cinap_lenrek a54d1cd95e kernel/qio: big cleanup of qio functions
remove bl2mem(), it is broken. a fault while copying to memory
yields a partially freed block list. it can be simply replaced
by readblist() and freeblist(), which we also use for qcopy()
now.

remove mem2bl(), and handle putting back remainer from a short
read internally (splitblock()) avoiding the releasing and re-
acquiering of the ilock.

always attempt to free blocks outside of the ilock.

have qaddlist() return the number of bytes enqueued, which
avoids walking the block list twice.
2016-11-07 22:20:10 +01:00
cinap_lenrek 23d217afb4 devloopback: simplify loopoput()
remove unneeded waserror() block, loopoput is alled from
loopbackbwrite only so we will always get called with a
*single* block, so the concatblock() is not needed.
2016-11-07 22:08:21 +01:00
cinap_lenrek c1fd7c210b kernel: fix missing ; in panic() call 2016-11-05 20:08:20 +01:00
cinap_lenrek 963497f06b kernel: avoid padblock copying for devtls/devssl/esp, cleanup debugging
to avoid copying in padblock() when adding cryptographics macs to a block
in devtls/devssl/esp we reserve 16 extra bytes to the allocation.

remove qio ixsummary() function and add acid function qiostats() to
/sys/lib/acid/kernel

simplify iallocb(), remove iallocsummary() statitics.
2016-11-05 20:05:40 +01:00
cinap_lenrek fa5bd71218 devmnt: avoid memory copies of I/O rpc buffer by using bwrite()
given that devmnt will almost always write into a pipe
or a network connection, which supports te bwrite routine,
we can avoid the memory copy that would have been done by
devbwrite(). this also means the i/o buffer for writes
will get freed sooner without having to wait for the 9p
rpc to get a response, saving memory.

theres one case where we have to keep the rpc arround and
that is when we write to a cached file, as we want to update
the cache with the data that was written, but the user buffer
cannot be trusted to stay the same during the rpc.
2016-11-05 18:26:12 +01:00
cinap_lenrek 5c1feb0ef0 libc: move calloc() into its own compilation unit
move calloc() in its own compilation unit to avoid
code duplication. also, calloc() is used rarely in
plan9 programs.
2016-11-05 18:00:10 +01:00
cinap_lenrek 234137bce3 fix bugs and cleanup cryptsetup code
devfs:

- fix memory leak in devfs leaking the aes key
- allocate aes-xts cipher state in secure memory
- actually check if the hexkey got fully parsed

cryptsetup:

- get rid of stupid "type YES" prompt
- use genrandom() to generate salts and keys
- rewrite cryptsetup to use common pbkdf2 and readcons routines
- fix alot of error handling and simplify the code
- move cryptsetup command to disk/cryptsetup
- update cryptsetup(8) manual page
2016-10-24 20:56:11 +02:00
cinap_lenrek c0a9c3b551 kernel: rekey chacha state on each randomread() invocation
we can encrypt the 256 bit chacha key on each invocation
making it hard to reconstruct previous outputs of the
generator given the current state (backtracking resiatance).
2016-09-11 19:07:17 +02:00
cinap_lenrek 36c9a2489d devcons: remove /dev/reboot "halt" command...
the "halt" command written to /dev/reboot just causes the
machine to crash... its also undocumented... removing it.

--
cinap
2016-09-11 14:12:39 +02:00
cinap_lenrek 95c9f5bf37 kernel: better nonce partitioning for chacha random number generator
leave the block counter to chacha_encrypt() and increment the 96 bit
iv instead.
2016-09-11 03:18:48 +02:00
cinap_lenrek 10275ad6dd kernel: xoroshiro128+ generator for rand()/nrand()
the kernels custom rand() and nrand() functions where not working
as specified in rand(2). now we just use libc's rand() and nrand()
functions but provide a custom lrand() impelmenting the xoroshiro128+
algorithm as proposed by aiju.
2016-09-11 02:10:25 +02:00
cinap_lenrek 7713145638 kernel: make randomread() fault reentrant
we now access the user buffer in randomread() outside of the lock,
only copying and advancing the chacha state under the lock. this
means we can use randomread() within the fault handling path now
without fearing deadlock. this also allows multiple readers to
generate random numbers in parallel.
2016-09-11 02:09:07 +02:00
cinap_lenrek a121806126 kernel: replace various custom random iv buffer filling functions with calls to prng() 2016-09-11 01:54:06 +02:00
cinap_lenrek ed38b5e9cb kernel: fix type for utime/stime in pexit(), fix debug format strings 2016-09-08 01:49:25 +02:00
cinap_lenrek 5d9deb77e9 kernel: make sure procalarm() remaining time doesnt become negative 2016-09-08 01:28:34 +02:00
cinap_lenrek 01b4c2a63d kernel: always do unsigned subtractions for m->ticks delta for updatecpu() and rebalance(), handle ticks wrap arround in hzsched() 2016-09-08 00:44:38 +02:00
cinap_lenrek bd3429304c kernel: use tk2ms() instead of TK2MS macro for process time conversion
this code isnt time critical and process TReal delta can become
very long, so use tk2ms() which is less prone to overflow.
2016-09-07 23:39:10 +02:00
cinap_lenrek 1848f4e946 kernel: tsemacquire() use MACHP(0)->ticks for time delta
we might wake up on a different cpu after the sleep so
delta from machX->ticks - machY->ticks can become negative
giving spurious timeouts. to avoid this always use the
same mach 0 tick counter for the delta.
2016-09-07 23:36:04 +02:00
cinap_lenrek bfd8098b8d devcap: timeout capabilities after a minute, fix memory leak, paranoia
the manpage states that capabilities time out after a minute,
so we add ticks field into the Caphash struct and record the
time when the capability was inserted. freeing old capabilities
is handled in trimcaps(), which makes room for one extra cap
and frees timed out ones.

we also limit the capuse write size to less than 1024 bytes to
prevent denial of service as we have to copy the user buffer.
(memory exhaustion).

we have to check the from user *before* attempting to remove
the capability! the wrong user shouldnt be able to change any
state. this fixes the memory leak of the caphash.

do the hash comparsion with tsmemcmp(), avoiding timing
side channels.

allocate the capabilities in secret memory pool to prevent
debugger access.
2016-09-07 21:14:23 +02:00
cinap_lenrek cf78fd37cb devproc: do unsigned subtraction to get MACHP(0)->ticks - up->times[TReal] delta 2016-09-06 22:27:26 +02:00
cinap_lenrek 0a5f81a442 kernel: switch to fast portable chacha based seed-once random number generator 2016-08-27 20:42:31 +02:00
cinap_lenrek 71ac88392f devsdp: keep cipher states in secret memory 2016-08-27 20:39:36 +02:00
cinap_lenrek 2967f942ea devtls: allocate cipher states in secret memory 2016-08-27 20:37:31 +02:00
cinap_lenrek 7250c438bb devssl: allocate cipher states in secret memory 2016-08-27 20:37:14 +02:00
cinap_lenrek 0f97eb3a60 kernel: add secalloc() and secfree() functions for secret memory allocation
The kernel needs to keep cryptographic keys and cipher states
confidential. secalloc() allocates memory from the secret pool
which is protected from debuggers reading the memory thru devproc.
secfree() releases the memory, overriding the data with garbage.
2016-08-27 20:33:03 +02:00
cinap_lenrek 713beb6d42 devmnt: fix mistake in mntrahread()
mntrahread() had the prefetch window condition wrong so
it would very agressively prefetch ignoring the prefetch
window.
2016-08-16 18:06:22 +02:00
cinap_lenrek 409babb990 devtls, devssl: make sure channel has ORDWR mode and is not a mount chan on fdtochan() 2016-07-24 03:24:42 +02:00
cinap_lenrek 8173223f43 swap: make sure swap chan has ORDWR mode on fdtochan() 2016-07-24 03:23:01 +02:00
cinap_lenrek 093eaec219 kernel: dont pprint() into 9p channels
when fd 2 (stderr) points to a mount channel, dont
cause protocol confusion by dumping error strings
into it.
2016-07-19 22:10:52 +02:00
cinap_lenrek a99cf56c7d kernel: more (arm) compiler friendly mul64fract()
the arm compiler can lift long->vlong casts on multiplcation
and convert 64x64->64 multiplication into a 32x32->64 one
with optional 64 bit accumulate.
2016-06-26 15:13:10 +02:00
cinap_lenrek b6005f3a45 avoid updating offset in pread; avoid diagnostic about vlong mask (charles forsyth) 2016-05-16 21:11:54 +02:00
cinap_lenrek 29c7ca80c9 correct check for segment overlap (rmiller) 2016-05-16 21:10:08 +02:00
cinap_lenrek cb4b187f10 devssl, devtls: fix permission checks 2016-05-11 02:10:05 +02:00
cinap_lenrek 66719fb3ea kernel: fix cb->f[0] nil dereferences due to short control request 2016-05-05 18:54:58 +02:00
cinap_lenrek 0237b58390 kernel: always clunk closed fids asynchronously, regardless of caching 2016-04-01 14:12:50 +02:00
cinap_lenrek df53b2d69b kernel: remove unused NSMAX, NSLOG, NSCACHE constants from portdat.h 2016-03-31 04:23:27 +02:00
cinap_lenrek 1057a859b8 devsegment: cleanups
- return distinct error message when attempting to create Globalseg with physseg name
- copy directory name to up->genbuf so it stays valid after we unlock(&glogalseglock)
- cleanup wstat() handling, allow changing uid
- make sure global segment size is below SEGMAXSIZE
- move isoverlap() check from globalsegattach() into segattach()
- remove Proc* argument from globalsegattach(), segattach() and isoverlap()
- make Physseg.attr and segattach attr parameter an int for consistency
2016-03-30 22:49:13 +02:00
cinap_lenrek e6b30b287c kernel: fix procflushmmu()
fix bug introduced in previous change for zynq, broke
procflushseg() function only flushing the first proc
matching the segment.
2016-03-29 02:09:49 +02:00
cinap_lenrek ce00c68059 kernel: print pid as %lud instead %lux (in tsleep() debug print) 2016-03-28 23:01:54 +02:00
cinap_lenrek 89f9966aed devtls: print the path of the underlying chan in status file
to figure out what network connection a particular tls
conversation refers to, we add the path of the underlying
we send the encrypted tls traffic over in the status file,
example:

term% grep -n '^Chan:' '#a'/tls/*/status
#a/tls/0/status:7: Chan: /net/tcp/6/data
#a/tls/1/status:7: Chan: /net/tcp/0/data
2016-03-28 20:12:54 +02:00
cinap_lenrek 04c3a6f66e zynq: introduce SG_FAULT to prevent access to AXI segment while PL is not ready
access to the axi segment hangs the machine when the fpga
is not programmed yet. to prevent access, we introduce a
new SG_FAULT flag, that when set on the Segment.type or
Physseg.attr, causes the fault handler to immidiately
return with an error (as if the segment would not be mapped).

during programming, we temporarily set the SG_FAULT flag
on the axi physseg, flush all processes tlb's that have
the segment mapped and when programming is done, we clear
the flag again.
2016-03-27 20:57:01 +02:00
cinap_lenrek 9aa6573359 kernel: fix tsleep()/twakeup()/tsemacquire() race
tsleep() used to cancel the timer with:

if(up->tt != nil)
	timerdel(up);

which still can result in twakeup() to fire after tsleep()
returns (because we set Timer.tt to nil *before* we call the tfn).
in most cases, this is not an issue as the Rendez*
usually is just &up->sleep, but when it is dynamically allocated
or on the stack like in tsemacquire(), twakeup() will call
wakeup() on a potentially garbage Rendez structure!

to fix the race, we execute the wakup() with the Timer lock
held, and set p->trend to nil only after we called wakeup().

that way, the timerdel(); which unconditionally locks the Timer;
can act as a proper barrier and use up->trend == nil as the
condition if the timer has already fired.
2016-03-26 02:37:42 +01:00
cinap_lenrek e7bc98b057 devtls: zero secret information before freeing, cleanup 2016-03-23 13:50:58 +01:00
cinap_lenrek aa6673fcfb add portable AES-GCM (Galois/Counter Mode) implementation to libsec and devtls 2016-03-23 02:45:35 +01:00
cinap_lenrek a2be120ea9 abandon streaming experiment
for queue like non-seekable files, it is impossible to implement an
exportfs because one has to run the kernels devtab read() and write()
in separate processes, and that makes it impossible to maintain 9p message
order as the scheduler can come in and randomly schedule one process before
another.

so as soon as we have a transition from 9p -> syscalls, we'r screwed.

i currently see just two possibilities:

- introduce special file type like QTSEQ with strictly ordered i/o semantics
- fix all fileservers and exportfs to only do one outstanding i/o to QTSEQ files
which means maintaining a queue per fid

this doesnt propagate. so exporting slow 9p mount again will be limited
again by latency of the inner mount.

other option:

- return offset in Rread, so client can bring responses back into order. this
requires changing all fileservers and drivers to maintain such an per fid offset
and change the protocol to include it in the response, and also pass it to userspace
(new syscalls or pass it in TOS)

this only works for read pipelining, write is still screwed.

both options suck.

--
cinap
2016-03-17 17:48:19 +01:00
cinap_lenrek 0276031c01 make kernel UTFmax and Runemax consistent with libc (21-bit runes) (thanks maurice) 2016-03-10 20:02:36 +01:00
cinap_lenrek 28bd8adce7 devcons: nil vs 0 2016-03-10 03:28:36 +01:00
cinap_lenrek 595501b005 kernel: make fversion()/mntversion() types consistent 2016-03-10 03:02:28 +01:00
cinap_lenrek 0aa5b01fab devtls: fix wrong iounit
devtls writes are only atomic up to MaxRecLen as this is the
maximum payload size we put in a record application message.
2016-03-09 19:54:33 +01:00
cinap_lenrek 5ebb1a29d8 devdraw: remove unused Edepth[] 2016-02-28 03:06:42 +01:00
cinap_lenrek b450cb7e32 devmnt: deal with partial response for Tversion request in mntversion() 2016-02-15 01:03:44 +01:00
cinap_lenrek ecebba779f provide /n and /mnt early in bootrc to allow consistent use in /lib/namespace
theres a bootstrap problem:

when /bin/init is run, it processes /lib/namespace where we might want to
mount or bind resources to /n or /mnt. but mntgen was run later in
cpurc/termrc so these mounts would be ignored.

we already have mntgen in bootfs, so we can provide these mountpoints early.

i keep the termrc/cpurc mntgens where they are, but ignore the error
prints. this way old kernels will continue to work.
2016-02-14 01:42:32 +01:00
cinap_lenrek 21b70c782a devssl: use tsmemcmp() to compare mac to close timing side channel 2016-01-13 21:48:09 +01:00
cinap_lenrek 5afa5f5c0b kernel: remove todfix overflow iprint() spam 2016-01-07 19:37:05 +01:00
cinap_lenrek 772afbe98c format pointer subtraction results with %zd instead of %ld (for long -> intptr on amd64) 2016-01-07 04:44:13 +01:00
cinap_lenrek 3e38194d72 introduce signed intptr and %z format modifier for formating uintptr and intptr 2016-01-07 04:39:09 +01:00
cinap_lenrek 41383ad012 kernel: change active.machs from bitmap to char array to support up to 64 cpus on pc64 2016-01-05 05:32:40 +01:00
cinap_lenrek 9b0de7f9d6 tls: implement chacha20/poly1305 aead cipher suits 2015-12-21 04:55:54 +01:00
cinap_lenrek d19144155e kernel: missing changes for ibrk() prototype 2015-12-21 04:49:29 +01:00
cinap_lenrek b6f04b77e3 devprov: remove unused extern int unfair 2015-12-16 21:07:24 +01:00
cinap_lenrek 7be7d0681f kernel: use uintptr for ibrk() return value (for base >2GB) and clarify segbrk(2) 2015-12-16 21:06:51 +01:00
cinap_lenrek 7f3659e78f kernel: cleanup exit()/shutdown()/reboot() code
introduce cpushutdown() function that does the common
operation of initiating shutdown, returning once all
cpu's got the message and are about to shutdown. this
avoids duplicated code which isnt really machine specific.

automatic reboot on panic only when *debug= is not set
and the machine is a cpu server or has no display,
otherwise just hang.
2015-11-30 14:56:00 +01:00
cinap_lenrek 98363cb272 devenv: fix ORCLOSE handling
when opening a /env file ORCLOSE, and the process exits, envgrp() would
return nil can crash in envremove() because procexit will have set up->egrp
to nil before calling closefgrp().

the solution is to capture the environment on open, keeping a reference in
Chan.aux, so it doesnt matter on what process the close happens and a
env chan will always refer to its original environment group.
2015-11-22 02:39:57 +01:00
cinap_lenrek 00572496ce kernel: use nicer check in okaddr(), wet floor signs in fixfault()
instead of checking addr+len >= addr, check len >= -addr so
that addr == 0 is never valid for len > 0 even if we decide
to have memory at the zero page so theres never any chance
user can pass in "nil" pointers.

put up some signs where we fall thru the switch cases in
fixfault()
2015-11-06 17:27:15 +01:00
cinap_lenrek b32300deb0 kernel: fix okaddr() check 2015-11-06 02:53:30 +01:00
cinap_lenrek cd3053a3cc devtls: reject SHA2_256 mac for SSL, but TLS is fine
sha256 is only defined for TLS1.2, however, technically, theres
no reason not to use it in TLS1.0/TLS1.1. the choice is up to
tlshand and pushtls, not the kernel.
2015-10-28 17:09:22 +01:00
mischief 08e2333cc1 port: fix typo in devmnt mntproc name 2015-10-07 21:45:03 -07:00
cinap_lenrek 12f7fc7a08 devsd: handle SYNCHRONIZE CACHE scsi commands as nops in sdfakescsi() 2015-09-20 14:54:49 +02:00
cinap_lenrek fa769a8f9d sdmmc: handle fakescsi emulation 2015-09-20 14:53:44 +02:00
cinap_lenrek c7c58ef8bb devsd: remove unused timeout field from SDreq 2015-09-20 14:27:41 +02:00
cinap_lenrek 6fb9ae8f43 usbehci: clean cache unconditionally before handing a buffer to the hardware
even in the read case, we need to clean the cache
so the cpu will not flush out old changes while
the hardware updates the buffer.
2015-09-05 10:14:19 +02:00
mischief 163a772124 devtls: add sha256 mac 2015-08-27 01:46:28 -07:00
glenda c4fdc6bfdb fix fuckup 2015-08-25 09:35:10 +00:00
mischief 6b402b83cf import E script from bell labs 2015-08-25 02:07:46 -07:00
cinap_lenrek 74d1f67b05 devtls: TLS1.1 explicit iv support
using nrand() to fill the explicit iv, which isnt great but better
than no iv.
2015-08-15 17:50:44 +02:00
cinap_lenrek 76f21ca715 kernel: try freebroken() *before* killbig() (thanks aiju) 2015-08-14 14:45:19 +02:00
cinap_lenrek 7ba3be82a7 kernel: move "setargs" field in Proc structure after "nargs" and "args" 2015-08-09 21:48:58 +02:00
cinap_lenrek b4f56f1f4e kernel: mount flag is int not ulong, reduce size of Mount struct by putting mflag field in what would be wasted as padding 2015-08-09 21:35:50 +02:00
cinap_lenrek 9f4eac5292 kernel: pgrpcpy(), simplify Mount structure
instead of ordering the source mount list, order the new destination
list which has the advantage that we do not need to wlock the source
namespace, so copying can be done in parallel and we do not need the
copy forward pointer in the Mount structure.

the Mhead back pointer in the Mount strcture was unused, removed.
2015-08-09 21:16:10 +02:00
cinap_lenrek 3af236b5e3 kernel: fix Mheadache
there was a race between cunmount() and walk() on Mhead.from as Mhead.from was
unconditionally freed when we cunmount(), but findmount might have already
returned the Mhead in walk(). we have to ensure that Mhead.from is not freed
before the Mhead itself (now done in putmhead() once the reference count of the
Mhead drops to zero).

the Mhead struct contained two unused locks, removing.

no need to hold Pgrp.ns lock in closegrp() as nobody can get to it (refcount
droped to zero).

avoid cclose() and freemount() while holding Mhead.lock or Pgrp.ns locks as
it might block on a hung up fileserver.

remove the debug prints...

cleanup: use nil for pointers, remove redundant nil checks before putmhead().
2015-08-09 18:19:47 +02:00
cinap_lenrek 8ce456bd19 kernel: remove unused MAXCRYPT constant from portdat.h 2015-08-06 13:35:03 +02:00
cinap_lenrek 87d7a3c875 kernel: have to validate argv[] again when copying to the new stack
we have to validaddr() and vmemchr() all argv[] elements a second
time when we copy to the new stack to deal with the fact that another
process can come in and modify the memory of the process doing the
exec. so the argv[] strings could have changed and increased in
length. we just make sure the data being copied will fit into the
new stack and error when we would overflow.

also make sure to free the ESEG in case the copy pass errors.
2015-08-06 13:20:41 +02:00
cinap_lenrek 281729551f kernel: limit argv[] strings to the USTKSIZE to avoid overflow
argv[] strings get copied to the new processes stack segment, which
has a maximum size of USTKSIZE, so limit the size of the strings to
that and check early for overflow.
2015-08-06 11:51:23 +02:00
cinap_lenrek b09cd67860 kernel: validnamedup() the name argument for segattach()
this moves the name validation out of segattach() to syssegattach()
to make sure the segment name cannot be changed by the user while
segattach looks at it.
2015-08-06 11:48:51 +02:00
cinap_lenrek d275add1a8 kernel: fix indention in validname0() 2015-08-06 11:43:22 +02:00
cinap_lenrek 9585e9b7f8 kernel: limit syscallfmt user strings to 64K (as in validname) 2015-08-06 11:42:05 +02:00
cinap_lenrek 86eb8ea6bb kernel: change vmemchr() length argument to ulong and simplify 2015-08-06 10:15:07 +02:00
cinap_lenrek 8d196aeec7 kernel: use Etoolong[] constant instead of string literal in validname0() 2015-08-06 10:01:45 +02:00
cinap_lenrek 9110ae6eae kernel: make shargs() function static in sysproc.c 2015-08-06 09:09:57 +02:00
cinap_lenrek 2acb02f29b kernel: reject empty argv (argv[0] == nil) in sysexec()
when executing a script, we did advance argp0 unconditionally
to replace argv[0] with the script name. this fails when
argv[] is empty, then we'd advance argp0 past the nil terminator.

the alternative would be to *not* advance if *argp0 == nil, but that
would require another validaddr() check for a case that is unlikely
to have been anticipated in most programs being invoked as
libc's ARGBEGIN macro assumes argv[0] being non-nil as it also
unconditionally advances the argv pointer.

to keep us sane, we now reject an empty argv[]. on entry, we
verify that argv[] is valid for at least two elements:
- the program name argv[0], has to be non-nil
- the first potential nil terminator in argv[1]

when argv[0] == nil, we throw Ebadarg "bad arg in system call"
2015-08-06 08:47:38 +02:00
cinap_lenrek 145624eec2 kernel: remove unused qstate() function 2015-08-04 13:52:29 +02:00
cinap_lenrek 1b7e120c09 kernel: dont rely on atoi() parsing hex for netif/devbridge 2015-08-03 16:24:14 +02:00
cinap_lenrek d5d6724805 devenv: simplify envremove(), cleanup 2015-08-03 22:08:10 +02:00
cinap_lenrek 37e4ce0ea7 devenv: avoid indirection, keep Evalue's allocated in an array
avoid the indirection for envlookup() by allocating Evalue structs
together in an array. remove unused link field in Evalue.
2015-08-02 21:39:33 +02:00
cinap_lenrek 27445c5768 kernel: cleanup qlock.c to use nil instead of 0 for pointers 2015-08-02 05:36:35 +02:00
cinap_lenrek ee86d3cb52 devmnt: fix mntcache()
make sure mntcache() wont cache data beyond what was read from
the block list.
2015-07-30 21:00:13 +02:00
cinap_lenrek 20da5094d9 kernel: remove obsolete comment from namec() 2015-07-28 10:01:05 +02:00
cinap_lenrek 4bd9ed80c3 kernel: export mntattach() from devmnt.c avoiding bogus struct passing and special case in namec()
we already export mntauth() and mntversion(), so why not stop
being sneaky and just export mntattach() so bindmount() and
devshr can just call it directly with proper arguments being
checked.

we can also avoid handling #M attach specially in namec()
by having the devmnt's attach function do error(Enoattach).
2015-07-28 09:52:21 +02:00
cinap_lenrek 652a641704 kernel: clunk the cache when removing cache flag on a channel, only call cread() chen CCACHE flag is set
to avoid double caching, attachimage() and setswapchan() clear
the CCACHE flag on the channel but this keeps the read ahread
state of the cache arround (until the chan gets closed), so also
call cclunk() to detach the mcp and free the read ahead state.

avoid the call to cread() when CCACHE flag is clear.
2015-07-27 06:42:41 +02:00
cinap_lenrek ff494b954f devmnt: use c->iounit instead of msize-IOHDRSZ to chunk reads and writes, reduce memory overhead for Mntrpc, mntalloc lock
use the actual iounit returned from Ropen/Rcreate to chunk reads and writes
instead of c->mux->msize-IOHDRSZ.

dont preallocate the rpc buffers to msize, most 9p requests are rather small
(except Twrite of course). so we allocate the buffer on demand in mountio()
with some rounding to avoid frequent reallocations.

avoid malloc()/free() while holding mntalloc lock.
2015-07-27 04:33:46 +02:00
cinap_lenrek 23f7840056 devmnt: dont reset readahead window when requested offset still has pending rpc 2015-07-26 13:55:51 +02:00
cinap_lenrek 6617c63a37 kernel: pipelined read ahead for the mount cache
this changes devmnt adding mntrahread() function and some helpers
for it to do pipelined sequential read ahead for the mount cache.

basically, cread() calls mntrahread() with Mntrah structure and it
figures out if we where reading sequentially and if thats the case
issues reads of c->iounit size in advance.

the read ahead state (Mntrah) is kept in the mount cache so we can
handle (read ahead) cache invalidation in the presence of writes.
2015-07-26 05:43:26 +02:00
cinap_lenrek 497daed116 kernel: make sure fd is in range in fdclose()
as the Fgrp can be shared with other processes, we have to
recheck the fd index after locking the Fgrp in fdclose()
to make sure not to read beyond the bounds of the fd array.
2015-07-23 22:56:49 +02:00
cinap_lenrek 323184d775 kernel: simplify syspipe() 2015-07-23 22:34:58 +02:00
cinap_lenrek ff03b72ed5 devaoe: more nil vs. 0 2015-07-23 22:05:46 +02:00
cinap_lenrek 0b3fd7c052 devaoe: fix off by one in aoeerror(), consistent use of nil for pointers, error handling 2015-07-22 21:56:11 +02:00
cinap_lenrek 769b3f1c2f kernel: consistent use of nil for pointer in sysfile.c 2015-07-22 21:54:07 +02:00
cinap_lenrek 1fcc84d072 kernel: cleanup chan.c to consistenly use nil instead of 0 for pointers 2015-07-22 19:17:10 +02:00
cinap_lenrek 8db5af02d8 kernel: make sure the swap device has a reasonable capacity in setswapchan() 2015-07-22 19:15:51 +02:00
cinap_lenrek 47bb311d39 devmnt: do not use user buffer to update the mount cache
using the user buffer has a race where the user can modify
the buffer from another process before it is copied into the cache.
this allows poisoning the cache for every file where the user
has read access.

instead, we update the cache from kernel memory.
2015-07-19 20:25:42 +02:00
cinap_lenrek 157b7751e7 devstream: fix mistake 2015-07-19 03:36:53 +02:00
cinap_lenrek 71cda09d1e devstream: fast sequential file access with 9p pipelining experiment 2015-07-19 03:31:17 +02:00
cinap_lenrek bae3ac29fc devproc: make sure statbufread offset wont turn negative 2015-07-15 17:09:05 +02:00
cinap_lenrek 2aa2f9f359 kernel: remove debugalloc.c 2015-07-14 06:51:02 +02:00
cinap_lenrek b5655b7247 wifi: adjust transmit rate on error (for etheriwl), small mkfile changes
Wnode gets two new counters: txcount and txerror
and actrate pointer that will be between minrate
and maxrate.

driver should use actrate instead of maxrate for
transmission when it can provide error feedback.

when a driver detects a transmission failed, it calls
wifitxfail() with the original packet. wifitxfail() then
reduces wn->actrate.

every 256th packet, we optimistically increase wn->actrate
before transmitting.
2015-07-10 09:04:05 +02:00
cinap_lenrek 4ec93f94c9 kernel: use HDR_MAGIC constant to handle Exec header extension, make rebootcmd() handle AOUT_MAGIC macro 2015-07-10 23:56:39 +02:00
cinap_lenrek 3ca9ac70c4 sysexec(): need () arround AOUT_MAGIC comparsion to handle #define hack on mips 2015-07-09 08:51:38 +02:00
cinap_lenrek e3217c6f6a sysexec(): make the mips compiler happy 2015-07-09 08:34:20 +02:00
cinap_lenrek 9ab096a707 kernel: reject bogus two byte "#!" shell scripts in sysexec()
- reject files smaller or equal to two bytes, they are bogus
- fix out of bounds access in shargs() when n <= 2
- only copy the bytes read into line buffer
- use nil for pointers instead of 0
2015-07-09 08:03:18 +02:00
cinap_lenrek 8ed25f24b7 kernel: various cleanups of imagereclaim(), pagereclaim(), freepages(), putimage()
imagereclaim(), pagereclaim():
- move imagereclaim() and pagereclaim() declarations to portfns.h
- consistently use ulong type for page counts
- name number of pages to free "pages" instead of "min"
- check for pages == 0 on entry

freepages():
- move pagechaindone() call to wakeup newpage() consumers inside
  palloc critical section.

putimage():
- use long type for refcount
2015-07-09 00:01:50 +02:00
cinap_lenrek 1bd4c243ad kernel: ignore last page at the top of virtual kernel address space for xalloc()
avoding kernel address -BY2PG because of end pointer wrapping to zero.
2015-06-19 02:45:58 +02:00
cinap_lenrek 0dab8869ad kernel: ignore memory pages with singular kernel addresses
addresses va's of 0 and -BY2PG cause trouble with some memmove()/memset()
implementations and possibly other code because of the nil pointer
and end pointers wrapping to zero.
2015-06-18 12:15:33 +02:00
cinap_lenrek fd8597ac31 zynq: fix barriers
unlock()/iunlock():

we need to place the coherence() *before* "l->key = 0", so that any
stores that where done while holding the lock become observable
*before* other processors see the lock released.

cas()/tas():

place memory barrier before successfull return to prevent reordering.
2015-06-18 04:35:46 +02:00
cinap_lenrek 58dc03cec0 kernel: do not inherit Proc.dot (current working directory) in kproc()
making sure to close the dot in every kproc appears repetitive,
so instead stop inheriting the dot in kproc() as this is usually
never what you wanted in the first place.
2015-06-18 03:13:50 +02:00
cinap_lenrek b48078c12c kernel: do not inherit current directory channel (dot) to pager
kproc() inherits dot and slash, pager needs to drop these
channels, otherwise it will keep the files open preventing
say, ramfs to exit.
2015-06-18 22:58:56 +02:00
cinap_lenrek 45b79036be devcons: add current pool allocations to #c/swap 2015-06-16 08:05:33 +02:00
cinap_lenrek 6c99d2f028 kernel: remove waserror() arround newpage() in mntcache
newpage() does not raise error().
2015-06-16 06:05:12 +02:00
cinap_lenrek 64ed3658d2 kernel: add pagechaindone() to wakeup processes waiting for memory
we keep the details about palloc in page.c, providing pagechaindone()
for mmu code to be called after a series of pagechainhead() calls.
2015-06-15 17:40:47 +02:00
cinap_lenrek 8a3b388ffe kernel: implement separate wait queues for page allocation
give kernel processes and local disk file servers (procs
having noswap flag set) a clear advantage for page allocation
under starved condition by giving them ther own wait queue so
they get readied as soon as pages become available.
2015-06-15 16:05:00 +02:00
cinap_lenrek d6eb7cc71c kernel: dont use smalloc() to allocate pte array in ibrk()
when we'r out of kernel memory, it is probably better to
let that alloc fail instead of hanging while holding the
segment qlock.
2015-06-13 17:50:26 +02:00
cinap_lenrek 34ae4649cc kernel: fix accounttime() for HZ >= 1000
"milli-CPU's" is too low resolution for the decaying load average
calculation when HZ >= 1000.
2015-06-12 14:28:31 +02:00
cinap_lenrek cda46731d8 devsegment: fix parsecmd() memory leak 2015-06-09 03:33:37 +02:00
cinap_lenrek c5b0edecc9 devfs: remove useless ~OTRUNC mask for openmode 2015-06-07 17:41:43 +02:00
cinap_lenrek 5c6357de8b devtls: ignore UnrecogniedName (112) alert message (for SNI) 2015-06-01 01:32:57 +02:00
cinap_lenrek 646062da1c kernel: state errstr.h dependency for proc.acid target (fixes acid kinit() on cleaned kernel source tree) 2015-05-11 05:09:31 +02:00
cinap_lenrek 82a797da70 kernel: leave shared, physical and fixed segments alone in killbig() 2015-04-16 16:30:14 +02:00
cinap_lenrek ef647a54c0 kernel: cannot interrupt segmentio commands
once we submit a command to segmentio process, we have to wait
for it to complete even if we got interrupted.
2015-04-16 16:07:36 +02:00
cinap_lenrek 39cf6b34e3 kernel: avoid posting note to kernel process in faulterror()
the intend of posting a note to the faulting process is to
interrupt the syscall to give the note handler a chance
to handle it. kernel processes however, have no note handlers
and all the postnote() does is set up->notepending which will
make the next attempt to sleep raise an Eintr[] error. this
is harmless, but usually not what we want.
2015-04-16 15:31:51 +02:00
cinap_lenrek bcf54c0bfb kernel: pass segio error string by pointer
there's no need to waste space for a error buffer in the Segio
structure, as the segmentio kproc will be waiting for the next
command after an error and will not overwite it until we issue
another command.
2015-04-16 01:20:30 +02:00
cinap_lenrek 46070c3122 kernel: add segio() function for reading/writing segments
devproc's procctlmemio() did not handle physical segment
types correctly, as it assumed it can just kmap() the page
in question and write to it. physical segments however
need to be mapped uncached but kmap() will always map
cached as it assumes normal memory. on some machines with
aliasing memory with different cache attributes
leads to undefined behaviour!

we borrow the code from devsegment and provide a generic
segio() function to read and write user segments which
handles all the cases without using kmap by just spawning
a kproc that attaches the segment that needs to be read
from or written to. fault() will setup the right mmu
attributes for us. it will also properly flush pages for
segments that maintain instruction cache when written.
however, tlb's have to be flushed separately.

segio() is used for devsegment and devproc now, which
also allows for simplification of fixfault() as there is no
special error handling case anymore as fixfault() is now
called from faulting process *only*.

reads from /proc/$pid/mem can now span multiple pages.
2015-04-16 00:45:25 +02:00
cinap_lenrek 35e1aa1bfa segment: don't store pointers in a long 2015-04-13 23:35:36 +02:00
cinap_lenrek 656dd953a8 segment: fix read/write g->dlen race, avoid copying kernel memory, qlock
code like "return g->dlen;" is wrong as we do not hold the
qlock of the global segment. another process could come in
and override g->dlen making us return the wrong byte count.

avoid copying when we already got a kernel address (kernel memory
is the same on processes) which is the case with bread()/bwrite().
this is the same optimization that devsd does.

also avoid allocating/freeing and copying while holding the qlock.
when we copy to/from user memory, we might fault preventing
others from accessing the segment while fault handling is in
progress.
2015-04-13 23:18:56 +02:00
cinap_lenrek a43321946e segment: speed up fixedseg() doing single pass over freelist
walking the freelist for every page is too slow. as we
are freeing a range, we can do a single pass unlinking all
pages in our range and at the end, check if all pages
where freed, if not put the pages that we did free back
and retry, otherwise we'r done.
2015-04-12 18:08:06 +02:00
cinap_lenrek 647a1da108 segment: fix print buffer overflow, map fixed segments uncached, add to zynq kernel 2015-04-12 16:05:05 +02:00
cinap_lenrek 461c2b68a1 kernel: fixed segment support (for fpga experiments)
fixed segments are continuous in physical memory but
allocated in user pages. unlike shared segments, they
are not allocated on demand but the pages are allocated
on creation time (devsegment). fixed segments are
never swapped out, segfreed or resized and can only be
destroyed as a whole.

the physical base address can be discovered by userspace
reading the ctl file in devsegment.
2015-04-12 22:30:30 +02:00
cinap_lenrek 49fe7b0dd0 kernel: move arrow cursor definition to port/devmouse.c 2015-04-07 22:05:48 +02:00
cinap_lenrek 8caec8564d vl, libmach, kernel: mips has 16K alignment for segments (for bigpages) 2015-03-22 17:49:28 +01:00
cinap_lenrek 972cd5e3fc kernel: get rid of auxpage() and preserve cache index bits in Page.va in mount cache
the mount cache uses Page.va to store cached range offset and
limit, but mips kernel uses cache index bits from Page.va to
maintain page coloring. Page.va was not initialized by auxpage().

this change removes auxpage() which was primarily used only
by the mount cache and use newpage() with cache file offset
page as va so we will get a page of the right color.

mount cache keeps the index bits intact by only using the top
and buttom PGSHIFT bits of Page.va for the range offset/limit.
2015-03-16 05:46:08 +01:00
cinap_lenrek d0b1db98bc kernel: avoid repeated calls to reclaim(), dont miss last page in Pte
when we are skipping a process because we could not acquire
its segment lock, dont call reclaim() again (which is pointless
as we didnt pageout any pages), instead try the next process.

the Pte.last pointer is inclusive, so don't miss the last page
in pageout().
2015-03-16 05:23:38 +01:00
cinap_lenrek 4d211fdd48 kernel: fix integer overflow in syssegflush(), segment code cleanup
mcountseg(), mfreeseg():
use Pte.first/last pointers when possible and avoid constructs
like s->map[i]->pages[j].

freepte():
do not zero entries in freepte(), the segment is going away and
here is no point in zeroing page pointers. hoist common code at
the top avoiding duplication.

segpage(), fixfault():
avoid load after store for Pte** pointer.

fixfault():
return -1 in default case to avoid the "used but not set" warning
for mmuphys and get rid of the useless initialization.

syssegflush():
due to len being unsigned, the pe = PGROUND(pe) can make "chunk"
bigger than len causing a overflow. rewrite the function and deal
with page alignment and errors at the beginning.

syssegflush(), segpage(), fixfault(), putseg(), relocateseg(),
mcountseg(), mfreeseg():
keep naming consistent.
2015-03-10 18:16:08 +01:00
cinap_lenrek fcc336b902 kernel: catch address overflow in syssegfree()
the "to" address can overflow in syssegfree() causing wrong
number of pages to be passed to mfreeseg(). with the current
implementation of mfreeseg() however, this doesnt cause any
data corruption but was just freeing an unexpected number of
pages.

this change checks for this condition in syssegfree() and
errors out instead. also mfreeseg() was changed to take
ulong argument for number of pages instead of int to keep
it consistent with other routines that work with page counts.
2015-03-07 18:59:06 +01:00
cinap_lenrek 374d4ec2c1 devsd: always page align sd buffers
sdbio() tests if it can pass the buffer pointer directly to
the driver when it is already in kernel memory. we also need
to check if the buffer is properly aligned but alignment
requirement is handled in system specific sdmalloc() and
was not known to devsd.

to solve this, we *always* page align sd buffers and get rid
of the system specific sdmalloc() macro (was only used in bcm
kernel).
2015-03-06 16:16:45 +01:00
cinap_lenrek eaf91d0f8e kernel: fix physical segment handling
ignore physical segments in mcountseg() and mfreeseg(). physical
segments are not backed by user pages, and doing putpage() on
physical segment pages in mfreeseg() is an error.

do now allow physical segemnts to be resized. the segment size
is only checked in segattach() to be within the physical segment!

ignore physical segments in portcountpagerefs() as pagenumber()
does not work on the malloced page structures of a physical segment.

get rid of Physseg.pgalloc() and Physseg.pgfree() indirection as
this was never used and if theres a need to do more efficient
allocation, it should be done in a portable way.
2015-03-03 13:08:29 +01:00
cinap_lenrek fc1ff7705b devmnt: remove unused mntstats fields from Mntrpc 2015-03-01 18:56:45 +01:00
cinap_lenrek 6f1787adcb devusb: check for nil hp->dump and hp->seprintep 2015-02-20 18:56:22 +01:00
cinap_lenrek 173bafd800 devusb: fix debug ctl nil crash 2015-02-20 18:42:24 +01:00
cinap_lenrek 995379e388 usbehci: initial support for usb on zynq, remove uncached.h
the following hooks have been added to the ehci Ctlr
structore to handle cache coherency (on arm):

	void*	(*tdalloc)(ulong,int,ulong);
	void*	(*dmaalloc)(ulong);
	void	(*dmafree)(void*);
	void	(*dmaflush)(int,void*,ulong);

tdalloc() is used to allocate descriptors and the periodic
frame schedule array. on arm, this needs to return uncached
memory. tdalloc()ed memory is never freed.

dmaalloc()/dmafree() is used for io buffers. this can return
cached memory when when hardware maintains cache coherency (pc)
or dmaflush() is provided to flush/invalidate the cache (zynq),
otherwise needs to return uncached memory.

dmaflush() is used to flush/invalidate the cache. the first
argument tells us if we need to flush (non zero) or
invalidate (zero).

uncached.h is gone now. this change makes the handling explicit.
2015-02-14 03:00:31 +01:00
cinap_lenrek e8760ba636 kernel: make pagereclaim() a bit less stupid
put recently used pages at the head of ther image hash
chains, and reclaim pages from the tail first.
2015-02-07 03:01:59 +01:00
cinap_lenrek b8cf3cb879 kernel: reduce Page structure size by changing Page.cachectl[]
there are no kernels currently that do page coloring,
so the only use of cachectl[] is flushing the icache
(on arm and ppc).

on pc64, cachectl consumes 32 bytes in each page resulting
in over 200 megabytes of overhead for 32gb of ram with 4K
pages.

this change removes cachectl[] and adds txtflush ulong
that is set to ~0 by pio() to instruct putmmu() to flush
the icache.
2015-02-07 02:52:23 +01:00
cinap_lenrek b76b5901ff kernel: increase size of palloc.mem[] user page bank array
we'r hitting the limit of user page banks on some asrock mainboard,
so doubling the size of the array twice to make running out unlikely.
2015-01-30 14:50:28 +01:00
cinap_lenrek e823ddb3b0 devmnt: handle rpc buffer exhaustion on mntflushalloc()
this bug happens when the kernel runs out of mount rpc
buffers when allocating a flush rpc. in this case, mntflushalloc()
will errorjump out of mountio() leaving the currently in
flight rpc in the mount. the caller of mountrpc()/mountio()
frees the rpc thats still queued in the mount leaving
to interesting results.

for the fix, we add a waserror() arround mntflushalloc() and
handle the error case like a mount rpc failure which will
properly dequeue the rpc's in flight.
2015-01-27 22:14:26 +01:00
cinap_lenrek 68b8351f8c devdraw: remove broken color palette blanking
the code did not work as drawactive() was called with
the drawlock held. instead of fixing, the code for
palette blanking has been removed.
2015-01-02 18:48:22 +01:00
cinap_lenrek cb35d1a132 kernel: avoid inconsistent reads in /proc/#/fd and /proc/#/ns
to allow bytewise access to /proc/#/fd, the contents of the file where
recreated on each call. if fd's had been closed or reassigned between
the reads, the offset would be inconsistent and a read could start off
in the middle of a line. this happens when you cat /proc/#/fd file of
a busy process that mutates its filedescriptor table.

to fix this, we now return one line record at a time. if the line
fits in the read size, then this means the next read will always start
at the beginning of the next line record. we remember the consumed
byte count in Chan.mrock and the current record in Chan.nrock. (these
fields are free to usefor non-directory files)

if a read comes in and the offset is the same as c->mrock, we do not
need to regenerate the file and just render the next c->nrock's record.

for reads smaller than the line count, we have to regenerate the content
up to the offset and the race is still possible, but this should not
be the common case.

the same algorithm is now used for /proc/#/ns file, allowing a simpler
reimplementation and getting rid of Mntwalk state strcture.
2014-12-21 04:46:22 +01:00
cinap_lenrek e3a77e594f sdloop: hardcode Enotup[] string to avoid devaoe dependency 2014-12-19 02:38:36 +01:00
cinap_lenrek 9df9a3625c sdaoe: allow aoedev= shorthand for id!lun -> id!#æ/aoe/lun
we cannot type æ character in the bootloader console, so allow
the shorthand syntax id!lun which gets translated to id!#æ/aoe/lun.
2014-12-19 02:37:40 +01:00
cinap_lenrek d9c4637a5f kernel: remove "checked xxx page table entries" print from checkpages()
the purpose of checkpages() is to verify consitency of the hardware mmu state,
not to notify on the console that a program faulted. a program could also
continue after handling the note. (this seems to be the case in go programs)
2014-12-18 23:53:32 +01:00
cinap_lenrek f52e85826f kernel: print addresses in hex and sizes in decimal in xallocsummary 2014-12-18 23:06:39 +01:00
cinap_lenrek 0e03a5f9fd kernel: replace ulong with uintptr in ucallocb() and fix unneeded parentheses 2014-12-16 09:41:05 +01:00
cinap_lenrek 5c29603f50 kernel: remove obsolete comment regarding Mntcache size in */main.c 2014-12-16 08:11:21 +01:00
cinap_lenrek 8309f15c36 kernel: new mount cache
this is a new more simple version of the mount cache
that does not require dynamic allocations for extends.

the Mntcache structure now contains a page bitmap
that is used for quick page invalidation. the size
of the bitmap is proportional to MAXCACHE.

instead of keeping track of cached range in the
Extend data structure, we keep all the information
in the Page itself. the offset from the page where
the cache range starts is in the low PGSHIT bits and
the end in the top bits of Page.va.

we choose Page.daddr to map 1:1 the Mountcache number
and page number (pn) in the Mountcache. to find a page,
we first check the bitmap if the page is there and then
do a pagelookup() with the daddr key.
2014-12-16 05:41:20 +01:00
cinap_lenrek 523c33bb6f kernel: minor changes to mount cache
change page cache ids (bid) to uintptr so we use the full
address space of Page.daddr.

make maxcache offset check consistent in cread().

use consistent types in cupdate() and simplify with goto.

make internal functions static.

use nil instead of 0 for pointers.
2014-12-15 06:28:27 +01:00
cinap_lenrek 8d6171f1ae kernel: remove *.acid files in nuke target instead of $CONF.clean target 2014-12-14 22:25:15 +01:00
cinap_lenrek 67bed722f2 kernel: get rid of /boot/boot parametrization
there is no use for "bootdisk" variable parametrization
of /boot/boot and no point for the boot section with its
boot methods in the kernel configuration anymore. so
mkboot and boot$CONF.out are gone.

move the rules for bootfs.paq creation in 9/boot/bootmkfile.
location of bootfs.proto is now in 9/boot/bootfs.proto.
our /boot/boot target is now just "boot".
2014-12-14 22:10:34 +01:00
cinap_lenrek 4afb56f570 kernel: evaluate dependencies of bootfs.proto files for bootfs.paq
expand the list of files specified in bootfs.proto and use them
as dependencies to bootfs.paq rule. this way, bootfs.paq is
regenerated when the to be included files have been modified.
2014-12-14 00:00:59 +01:00
cinap_lenrek feb7702c9e kernel: correct dependency for printstub.$O instead of print.$O 2014-12-13 21:44:51 +01:00
cinap_lenrek 6a3b9012d5 kernel: generate dummy bootscreeninit() function when building without vga device 2014-12-13 05:29:51 +01:00
cinap_lenrek ba6cd37412 bootfs: remove disk/kfs fileserver, nobody uses it 2014-12-10 03:22:59 +01:00
cinap_lenrek 23b3407663 bootrc: add ndb/dnsgetip resolver to bootfs so domain names can be used for fs=, auth= and secstore= (thanks mischief) 2014-12-10 03:22:14 +01:00
mischief 98645db9ab devsegment: fix segmentcreate function signature 2014-12-08 23:16:22 -08:00
cinap_lenrek 9840ce91cf kernel: make use of nil vs 0 consistent in qio.c (sorry) 2014-11-13 16:46:41 +01:00
cinap_lenrek b18a641397 kernel: remove implicit Proc* argument from procctl()
procctl() is always called with up and it would not
work correctly if passed a different process, so
remove the Proc* argument and use up directly.
2014-11-09 08:19:28 +01:00
cinap_lenrek 1ffcdbab88 dont flush screen when hiding software cursor
we can avoid some flickering when removing the software cursor
from the shadow framebuffer by avoiding the flushscreenimage()
call.

once the cursor is redrawn, we flush the combined rect of its
old and new position in one go.
2014-11-08 11:48:38 +01:00
cinap_lenrek a0e001a234 devproc: reset p->pdbg under p->debug qlock in procstopwait()
theres a race where procstopwait() is interrupted by a note,
setting p->pdbg to nil *before* acquiering the lock and
and pexit() and procctl() accessing it assuming it doesnt
change under them while they are holding the lock.
2014-11-07 05:21:42 +01:00
cinap_lenrek eb6a4fc1a4 devcons: avoid division by zero reading Qsysstat
alexchandel got the kernel to crash with divide error
on qemu 2.1.2/macosx at this location. probably
caused by perfticks()/tsc being wrong or accounttime()
not having been called yet from timer interrupt yet for
some reason.
2014-09-28 02:42:33 +02:00
cinap_lenrek 19a8f66eec pc64: syscallfmt for nsec syscall 2014-09-20 01:37:11 +02:00
cinap_lenrek acd15f13c4 pc64: put return value of nsec syscall in register on amd64
WHAT WHERE THEY *THINKING*??!?!

unlike seek, the (new) nsec syscall (not used in 9front libc)
returns the time value in register (from nix), so do the same
for compatibility.
2014-09-20 01:07:46 +02:00
cinap_lenrek 694597de3b devtls: fix typo in debug print 2014-09-15 08:19:51 +02:00
cinap_lenrek e9fddbaad8 kernel: fix segattach() rounding of va+len (thanks kenji arisawa)
from segattach(2):

          Va and len specify the position of the segment in the
          process's address space.  Va is rounded down to the nearest
          page boundary and va+len is rounded up.  The system does not
          permit segments to overlap.  If va is zero, the system will
          choose a suitable address.

just rounding up len isnt enougth. we have to round up va+len
instead of just len so that the span [va, va+len) is covered
even if va is not page aligned.

kenjis example:

	print("%p\n",ap);	// 206cb0
	ap = segattach(0, "shared", ap, 1024);
	print("%p\n",ap);	// 206000

term% cat /proc/612768/segment
Stack     defff000 dffff000    1
Text   R      1000     6000    1
Data          6000     7000    1
Bss           7000     7000    1
Shared      206000   207000    1
term%

note that 0x206cb0 + 0x400 > 0x20700.
2014-09-14 16:04:22 +02:00
cinap_lenrek 3b661a96ef kernel: make noswap flag exclude processes from killbig() if not eve, reset noswap flag on exec 2014-08-17 00:50:20 +02:00
cinap_lenrek 773b57b676 kernel: fix todfix() race
we have to recheck the condition under tod lock, otherwise
another process can come in and updated tod.last and
tod.off and once we have the lock, we would make time
jump backwards.
2014-08-16 21:04:41 +02:00
cinap_lenrek ce0b77e2b9 kernel: xinit() use ulong for page counts, cleanup 2014-08-16 17:26:12 +02:00
cinap_lenrek bedffdd8c3 devenv: prevent non-hostowner from creating or removing variables in '#ec', cleanup 2014-08-13 23:09:47 +02:00
cinap_lenrek daa15d1edb kernel: more nil vs 0 cleanup in chan.c 2014-08-08 17:02:10 +02:00
cinap_lenrek ee6409366e kernel: use nil for pointers instead of 0, zero channel umc and dirrock in newchan() 2014-08-08 16:44:41 +02:00
cinap_lenrek 45333cdc92 devmnt: fix potential race with mntflushfree(), remove mntstats, 0 vs nil cleanup
when mountmux() completes a request for another process, enforce odering
of the loads and stores to the request prior to writing q->done = 1
so mntflushfree() sees q->done != 0 only when the request has actually
completed. otherwise, the q->done = 1 store could have been reordered
before the load from q->z, reading from already freed request and causing
spurious wakeups.

removing unused mntstats callback.

use nil for pointers instead of 0.
2014-08-08 23:28:47 +02:00
cinap_lenrek 0a101736b8 pc, pc64: make pc kaddr() check reject -KZERO address (thanks aiju) 2014-08-07 21:11:11 +02:00
cinap_lenrek 4f3724e6e1 devproc: nil 2014-07-15 18:51:58 +02:00
cinap_lenrek 3d3a29cd84 devproc: fix syscalltrace error handling, conistent use of nil for pointers 2014-07-15 07:54:22 +02:00
cinap_lenrek e4db040bcf devproc: fix mistake 2014-07-14 06:45:23 +02:00
cinap_lenrek 655ec332a7 devproc: fix proccrlmemio bugs
dont kill the calling process when demand load fails if fixfault()
is called from devproc. this happens when you delete the binary
of a running process and try to debug the process accessing uncached
pages thru /proc/$pid/mem file.

fixes to procctlmemio():

- fix missed unlock as txt2data() can error
- make sure the segment isnt freed by taking a reference (under p->seglock)
- access the page with segment locked (see comment)
- get rid of the segment stealer lock

other stuff:

- move txt2data() and data2txt() to segment.c
- add procpagecount() function
- make return type mcounseg() to ulong
2014-07-14 06:02:21 +02:00
cinap_lenrek 03f68c49f6 kernel: only complain about no images when theres nothing more to reclaim
uncaching a thousand pages (arround 4MB) might not be
enougth. so keep on reclaiming pages and only complain
once theres nothing more to reclaim.
2014-07-11 03:57:21 +02:00
cinap_lenrek fa03455b50 kernel: more proc.c cleanup 2014-06-23 21:51:34 +02:00
cinap_lenrek 6a05751132 kernel: make use of nil and 0 consistent in proc.c
always explicitely compare with nil if pointer.
sorry for the noise. :(
2014-06-23 21:24:12 +02:00
cinap_lenrek 7cf6a35486 kernel: fix cooperative scheduling for wired processes 2014-06-23 20:29:10 +02:00
cinap_lenrek d4d86df2ab kernel: new pagecache, remove Lock from page, use cmpswap for Ref instead of Lock
make the Page stucture less than half its original size by getting rid of
the Lock and the lru.

The Lock was required to coordinate the unchaining of pages that where
both cached and on the lru freelist.

now pages have a single next pointer that is used for palloc.head
freelist xor for page cache hash chains in Image.pghash[].

cached pages are not on the freelist anymore, but will be reclaimed
from images by the pager when the freelist runs out of pages.

each Image has its own 512 hash chains for cached page lookup. That is
2MB worth of pages and there should be no collisions for most text images.

page reclaiming can be done without holding palloc.lock as the Image is
the owner of the page hash chains protected by the Image's lock.

reclaiming Image structures can be done quickly by only reclaiming pages from
inactive images, that is images which are not currently in use by segments.

the Ref structure has no Lock anymore. Only a single long that is atomically
incremented or decremnted using cmpswap().

there are various other changes as a consequence code. and lots of pikeshedding,
sorry.
2014-06-22 15:12:45 +02:00
cinap_lenrek 1b8fb4fec3 swap: make sure swap address sticks arround until page is written to swap
we have to make sure the *swap address* doesnt go away,
after putting the swap address in the segment pte.

after we unlock the segment, the process could be
killed or fault which would cause the swap address to
be freed *before* we write the page to disk when it
pulls the page from the cache and putswap() swap pte.

keeping a reference to the page is no good. we have
to hold on the swap address. this also has the advantage
that we can now test if the swap address is still
referenced and can avoid writing to disk.
2014-06-08 17:39:40 +02:00
cinap_lenrek 72ba3571a3 kernel: remove _xinc()/_xdec()
as with the Block refcount changes, _xinc() and _xdec() arent
used anymore, so remove them.

architecure can still define ainc()/adec() when it needs them.
2014-06-08 01:35:22 +02:00
cinap_lenrek be3a5a6dc3 kernel: remove Block refcounting (thanks erik) 2014-06-08 00:19:33 +02:00
cinap_lenrek 91614f582f kernel: dont use atomic increment for Proc.nlocks, maintain Lock.m for lock(), use uintptr intstead of long for pc values
change Proc.nlocks from Ref to int and just use normal increment and decrelemt
as done in erik quanstros 9atom.

It is not clear why we used atomic increment in the fist place as even if we
get preempted by interrupt and scheduled before we write back the incremented
value, it shouldnt be a problem and we'll just continue where we left off as
our process is the only one that can write to it.

Yoann Padioleau found that the Mach pointer Lock.m wasnt maintained
consistently for lock() vs canlock() and ilock(). Fixed.

Use uintptr instead of ulong for maxlockpc, maxilockpc and ilockpc debug variables.
2014-06-05 21:54:32 +02:00
cinap_lenrek 0aa3af0934 kernel: remove wrong and needles mapsize check in newseg() (thanks Yoann Padioleau) 2014-06-03 07:47:09 +02:00
cinap_lenrek c9f91d5015 pc64: allocate palloc.pages from upages
the palloc.pages array takes arround 5% of the upages which
gives us:

16GB = ~0.8GB
32GB = ~1.6GB
64GB = ~3.2GB

we only have 2GB of address space above KZERO so this will not
work for long.

instead, pageinit() was altered to accept a preallocated memory
in palloc.pages. and preallocpages() in pc64/main.c allocates the
in upages memory, mapping it in the VMAP area (which has 512GB).

the drawback is that we cannot poke at Page structures now from
/proc/n/mem as the VMAP area is not accessible from it.
2014-06-01 03:13:58 +02:00
cinap_lenrek 15fc6c1cc0 devproc: handle 64bit address writes to /proc/n/mem files
procwrite() did truncate the offset to 32bit ulong.
introduce off2addr() function that does the sign
extension hack and use it conststently for Qmem
reads and writes.
2014-05-26 00:27:06 +02:00
cinap_lenrek 9ebbfae28b kernel: simplify fdclose() 2014-05-26 22:47:34 +02:00
cinap_lenrek 89acedb9b8 devproc: fix close and closefiles procctl
for the CMclose procctl, the fd number was not
bounds checked before indexing in the Fgrp.fd
array.

for the CMclosefiles, we looped fd from 0..maxfd-1,
but need to loop from 0..maxfd as maxfd is inclusive.
2014-05-26 22:43:21 +02:00
cinap_lenrek 2185188f83 kernel: fix read size calculation in pio() demand load
on amd64, the text segment is aligned and padded to
2MB, but segment granularity is 4K which can give
us page faults that are beyond the highest file
offset. this is perfectly valid, but was not handled
correctly in pio().
2014-05-24 01:27:57 +02:00
cinap_lenrek 3207e8b6a4 add _nsec() syscall 53 for binary compatibility with labs distribution
the new syscall is added under the symbol _nsec() for
binary compatibility.

nsec() is still a library function reading /dev/bintime.
2014-05-20 05:06:31 +02:00
cinap_lenrek a2d96d47c9 kernel: always reset notepending in eqlock, handle forceclosefgrp in eqlocks 2014-04-29 21:17:07 +02:00
cinap_lenrek b7d8431036 kernel: stop queue bloat before allocating blocks 2014-04-29 21:15:09 +02:00
cinap_lenrek 40b6959788 devmnt: make abandoning fid on botched clunk handle flushes
make mntflushfree() return the original rpc and do the
botched clunk check on the original instead of the
current rpc.

so if we get a botched flush of a clunk, we abandon the
fid of the channel as well.
2014-04-28 06:55:06 +02:00
cinap_lenrek 2c2a71cd51 devmnt: abandon fid on botched Tclunk or Tremove
if theres an error transmitting a Tclunk or Tremove request,
we cannot assume the fid to be clunked. in case this was
a transient error, reusing the fid on further requests
will fail.

as a work arround, we zero the channels fid and allocate
a new fid before the chan is reused.

this is not correct as we essentially leak the fid
on the fileserver, but we will still be able to use
the mount.
2014-04-28 05:59:10 +02:00
cinap_lenrek 41908149de nusb: resolve endpoint id conflict with different input and output types
ftrvxmtrx repots devices that use the endpoint number for
input and output of different types like:

 nusb/ether:             parsedesc endpoint 5[7]  07 05 81 03 08 00 09	# ep1 in intr
 nusb/ether:             parsedesc endpoint 5[7]  07 05 82 02 00 02 00
 nusb/ether:             parsedesc endpoint 5[7]  07 05 01 02 00 02 00	# ep1 out bulk

the previous change tried to work arround this but had the
concequence that only the lastly defined endpoint was
usable.

this change addresses the issue by allowing up to 32 endpoints
per device (16 output + 16 input endpoints) in devusb. the
hci driver will ignore the 4th bit and will only use the
lower 4 bits as endpoint address when talking to the usb
device.

when we encounter a conflict, we map the input endpoint
to the upper id range 16..31 and the output endpoint
to id 0..15 so two distinct endpoints are created.
2014-04-23 20:03:01 +02:00
cinap_lenrek fc15a01d1d kernel: add secstore and wpa to bootfs 2014-04-18 20:44:40 +02:00
cinap_lenrek 66aa949039 kernel: fix printing wrong memory sizes in pageinit(), overflowed on amd64 (thanks aram) 2014-04-15 21:34:41 +02:00
cinap_lenrek 5d3d085492 devproc: change address format in segment file to %8p (thanks eekee)
the original format for addresses was %8lux which was changed
to %p for amd64. this broke linuxemu which assumes fixed format
in the segment file. as a compromize we change it to %8p and
amd64 port of linuxemu will hopefully use a more robust parser :)
2014-04-01 19:28:10 +02:00
cinap_lenrek 4a6939c2ce devfs: fix cclose() crash in devfs error handling 2014-03-21 18:12:06 +01:00
cinap_lenrek f2f46f4a33 pc64: amd64 kernel reboot support 2014-03-16 20:22:59 +01:00
cinap_lenrek 316d8ad76b pc64: fix segattach
the comment about Physseg.size being in pages is wrong,
change type to uintptr and correct the comment.

change the length parameter of segattach() and isoverlap()
to uintptr as well. segments can grow over 4GB in pc64 now
and globalsegattach() in devsegment calculates len argument
of isoverlap() by s->top - s->bot. note that the syscall
still takes 32bit ulong argument for the length!

check for integer overflow in segattach(), make sure segment
goes not beyond USTKTOP.

change PTEMAPMEM constant to uvlong as it is used to calculate
SEGMAXSIZE.
2014-03-04 22:37:15 +01:00
cinap_lenrek 9405f4c95f kernel: getting rid of duppage() (thanks charles)
simplifying paging code by getting rid of duppage(). instead,
fixfault() now always makes a copy of the shared/cached page
and leaves the cache alone. newpage() uncaches pages as
neccesary.

thanks charles forsyth for the suggestion.

from http://9fans.net/archive/2014/03/26:

> It isn't needed at all. When a cached page is written, it's trying hard to
> replace the page in the cache by a new copy,
> to return the previously cached page. Instead, I copy the cached page and
> return the copy, which is what it already
> does in another instance. ...
2014-03-02 20:55:26 +01:00
mischief 774ccb19e4 devtls: spelling 2014-02-25 16:57:22 -08:00
cinap_lenrek 521a34d33b kernel: keep cached pages continuous at the end of the page list on imagereclaim()
imagereclaim() sabotaged itself by breaking the invariant
that cached pages are kept at the end of the page list.

once we made a hole of uncached pages, we would stop
reclaiming cached pages before it as the loop breaks
once it hits a uncached page. (we iterate backwards from
the tail to the head of the pagelist until pages have been
reclaimed or we hit a uncached page).

the solution is to move pages to the head of the pagelist
after removing them from the image cache.
2014-02-24 22:42:22 +01:00
cinap_lenrek 6b146c70c2 pc64: handle negative file offsets when accessing kernel memory with devproc
file offset is 64 bit signed integer, negative offsets
are invalid and rejected by the kernel. to still access
kernel memory on amd64, we unconditionally clear the sign
bit of the 64 bit offset in libmach and devproc sign
extends the offset back to a 64 bit address.
2014-02-08 03:50:41 +01:00
cinap_lenrek 0fdb1578ef pc64: fix devcons format strings for memory sizes 2014-02-07 23:35:27 +01:00
cinap_lenrek c3917ec566 pc64: fix poolsummary() string format 2014-02-07 23:02:56 +01:00
cinap_lenrek 868a262bb8 pc64: dont 4 byte align stack pointer for amd64 in sysexec() 2014-02-05 19:48:36 +01:00
cinap_lenrek ccfb6168c8 kernel: dont double ptemap size in newseg()
this doubling affects all segment types, not just bss.
(tho text/data are usually small...)

and theres no telling if the segment will actually
grow in the future justifying the reduction of memmove
overhead in ibrk().

some ape programs are approaching the 16mb ssegmap size
so that code might trigger.

removing the smarts...
2014-02-03 20:04:43 +01:00
cinap_lenrek f556fd2437 devdraw: screenid is BGLONG, not BGSHORT 2014-02-03 03:52:27 +01:00
cinap_lenrek b7b3406657 malloctag: only store lower 32bit of malloc tag, fix getrealloctag
as erik quanstro suggests, theres not much of a point in
storing the full 64bit pc as one cannot get a code segment
bigger than 4G and amd64 makes it hard to use a pc that
isnt 64bit sign extension of 32bit.

instead, we only store ulong (as originally), but sign
extend back when returning in getmalloctag() and
getrealloctag().

getrealloctag() used to be broken. its now fixed.
2014-02-02 16:03:59 +01:00
cinap_lenrek 0cdb32cc18 kernel: fix bogus free in sysexec.
we free the wrong pointer in the waserror() block.
2014-02-02 15:11:19 +01:00
cinap_lenrek 29eea45931 kernel: do not pass user address of fd[2] array to newfd2()
access to user memory can pagefault and newfd2() holds
fgrp spinlock while writing to it. make temporary copy
on the stack in syspipe().
2014-02-02 10:41:51 +01:00
cinap_lenrek 0b95485db7 kernel: use uintptr when appropriate in syssegflush() 2014-02-02 09:59:54 +01:00
cinap_lenrek 56343cafcf add experimental pc64 kernel 2014-02-01 10:25:10 +01:00
cinap_lenrek 06bc19c28f kernel: usb fixes for amd64 2014-02-01 10:20:43 +01:00
cinap_lenrek dcea714680 kernel: fix pointer truncation in xspanalloc(), fix format prints 2014-02-01 10:17:53 +01:00
cinap_lenrek 7613608b23 kernel: handle amd64 40 byte headers in exec() 2014-02-01 10:16:55 +01:00
cinap_lenrek 520957e254 kernel: fix ulong abuse in xalloc 2014-01-21 22:12:25 +01:00
cinap_lenrek ebfb4fdf29 kernel: convert putmmu() to uintptr for va and pa 2014-01-20 03:17:55 +01:00
cinap_lenrek ad1eefb355 kernel: various cleanups 2014-01-20 02:16:42 +01:00
cinap_lenrek 6c2e983d32 kernel: apply uintptr for ulong when a pointer is stored
this change is in preparation for amd64. the systab calling
convention was also changed to return uintptr (as segattach
returns a pointer) and the arguments are now passed as
va_list which handles amd64 arguments properly (all arguments
are passed in 64bit quantities on the stack, tho the upper
part will not be initialized when the element is smaller
than 8 bytes).

this is partial. xalloc needs to be converted in the future.
2014-01-20 00:47:55 +01:00