plan9fox

Author	SHA1	Message	Date
cinap_lenrek	29f60cace1	kernel: avoid palloc lock during mmurelease(). Previously, mmurelease() was always called with the palloc spinlock held. This is unnecessary for some mmurelease() implementations, as they won't release pages to the palloc pool. This change removes pagechainhead() and pagechaindone() and replaces them with a single freepages() call, which acquires the palloc lock internally as needed. freepages() avoids holding the palloc lock while walking the linked list of pages, avoiding some lock contention.	2020-12-22 16:29:55 +01:00
cinap_lenrek	e4ce6aadac	kernel: handle tos and per process pcycle counters in port/. we might as well handle the per process cycle counter in the portable part instead of duplicating the code in every arch and having inconsistent implementations. we now have portable kenter() and kexit() functions, meant to be used in trap/syscall from user space, which update the counters. some kernels missed initializing Mach.cyclefreq.	2020-12-20 22:34:41 +01:00
cinap_lenrek	58e6750401	kernel: remove Proc* argument from procsetuser() function	2020-12-19 18:07:12 +01:00
cinap_lenrek	0b33b3b8ad	kernel: implement per file descriptor OCEXEC flag, reject ORCLOSE when opening /fd, /srv and /shr. The OCEXEC flag used to be maintained per channel, making it shared between all the file descriptors. This has unexpected side effects with regard to channel passing drivers such as devdup (/fd), devsrv (/srv) and devshr (/shr). For example, opening a /srv file with OCEXEC makes it impossible for exportfs to remount it, as exportfs internally does an exec() to mount and re-export it, and there is no way to reset the flag. This change makes the OCEXEC flag per file descriptor, so an open with the OCEXEC flag only affects the fd group of the calling process, and not the channel itself. On rfork(RFFDG), the per file descriptor flags get copied. On dup(), the per file descriptor flags are reset. The second modification is that /fd, /srv and /shr now reject the ORCLOSE flag, as the files that are returned have already been opened.	2020-12-13 16:04:09 +01:00
cinap_lenrek	0ba91ae22a	pc, pc64: allocate i/o port space for unassigned pci bars, move ioalloc() to port/iomap.c. With some newer UEFI firmware, not all pci bars get programmed and we have to assign them ourselves. This was already done for memory bars. This change adds the same for i/o port space by providing an ioreservewin() function, which can be used to allocate port space within the parent pci-pci bridge window. Also, the pci code now allocates the pci config space i/o ports 0xCF8/0xCFC, so userspace now needs to use devpnp to access pci config space (see latest realemu change). Also, this moves the ioalloc()/iofree() code out of devarch into port/iomap.c, as it can be shared with the ppc mtx kernel.	2020-11-03 20:46:09 +01:00
cinap_lenrek	61a062ee9f	kernel: improve page reclamation strategy and locking. when reclaiming pages from an image, always reclaim from all the hash chains equally. that way, we avoid being biased towards the chains at the start of the Image.pghash[] array. images can be in two states: active or inactive. inactive images are the ones which are not used by any program, while active ones are. when reclaiming pages, we should try to reclaim pages from inactive images first and only if that set becomes exhausted, attempt to release text pages and reclaim pages from active images. when we run out of Image structures, it only makes sense to reclaim pages from inactive images, as reclaiming pages from active ones will never free any Image structures. change putimage() to require an already locked image and make it unlock the image. this avoids many pointless unlock()/lock() sequences, as all callers of putimage() already had the image locked.	2020-04-26 19:54:46 +02:00
cinap_lenrek	1aa80c1d10	kernel: remove unused mem2bl() prototype	2020-04-12 16:11:41 +02:00
cinap_lenrek	8debb0736e	kernel: add portable memory map code (port/memmap.c). This is a generic memory map for physical addresses. Entries can be added with memmapadd() giving a range and a type. Ranges can be allocated and freed from the map. The code automatically resolves overlapping ranges by type priority.	2020-04-04 16:04:27 +02:00
cinap_lenrek	4a80d9d029	kernel: fix multiple devproc bugs and pid reuse issues. devproc assumes that when we hold the Proc.debug qlock, the process will be prevented from exiting. but there is another race where the process has already exited and the Proc* slot gets reused. to solve this, on process creation we also have to acquire the debug qlock while initializing the fields of the process. this also means newproc() should only initialize fields not protected by the debug qlock. always acquire the Proc.debug qlock when changing strings in the proc structure to avoid a double free on concurrent update. for changing the user string, we add a procsetuser() function that does this for auth.c and devcap. remove pgrpnote() from pgrp.c and replace it with a static postnotepg() in devproc. avoid the assumption that the Proc* entries returned by proctab() are contiguous. fixed devproc permission issues: - make sure only eve can access /proc/trace - none should only be allowed to read its own /proc/n/text - move Proc.kp checks into procopen(). pid reuse was not handled correctly, as we were only checking if a pid had a living process, but there could still be processes expecting a particular parentpid or noteid. this is now addressed with reference counted Pid structures which are organized in a hash table. read access to the hash table does not require locks, which will be useful for dtracy later.	2020-02-23 18:00:21 +01:00
cinap_lenrek	8d51e7fa1a	kernel: implement portable userinit() and simplify process creation. replace the machine specific userinit() with a portable implementation that uses kproc() to create the first process. the initcode text is mapped using kmap(), so there is no need for machine specific tmpmap() functions. initcode stack preparation should be done in init0(), where the stack is mapped and can be accessed directly. replacing the machine specific userinit() allows some big simplifications, as sysrfork() and kproc() are now the only callers of newproc() and we can avoid initializing fields that we know are being initialized by these callers. rename autogenerated init.h and reboot.h headers. the initcode[] and rebootcode[] blobs are now in *.i files and hex generation was moved to portmkfile. the machine specific mkfile only needs to specify how to build rebootcode.out and initcode.out.	2020-01-26 19:01:36 +01:00
cinap_lenrek	13785bbbef	pc: replace duplicated and broken mmu flush code in vunmap(). comparing m with MACHP() is wrong, as m is a constant on 386. add procflushothers(), which flushes all processes except up using the common procflushmmu() routine.	2019-12-07 02:19:14 +01:00
cinap_lenrek	24d1fbde27	kernel: simplify pgrpnote(); moving the note string copying to procwrite() keeps handling of devproc's note and notepg files similar and in the same place and reduces stack usage.	2019-09-19 02:07:46 +02:00
cinap_lenrek	2149600d12	kernel: catch execution read fault on SG_NOEXEC segment. fault() now has an additional pc argument that is used to detect a fault on a non-executable segment. that is, on a read fault we check if the segment has the SG_NOEXEC attribute and the program counter is within the faulting page.	2019-08-27 03:47:18 +02:00
cinap_lenrek	fe594760eb	kernel: get rid of checkpagerefs() debugging. it was only implemented by the pc kernel and does not account for pages used by the mount cache.	2019-05-01 12:40:27 +02:00
cinap_lenrek	b452f8857f	kernel: export freepages() function so it can be used in mmurelease()	2019-05-01 10:07:39 +02:00
cinap_lenrek	26d36c3ae2	devswap: simplify, don't panic when writing swapfile fails. always start the pager kproc in swapinit(), simplifying kickpager(). allow zero conf.nswap and conf.nswppo, and avoid allocating the reference map and iolist arrays in that case. use ulong for ioptr and iolist indices. don't panic when writing pages out to the swapfile fails; just requeue the page in the io transaction list so we will try again the next time executeio() is run, or just free the page when the swap reference was dropped. remove unused pagersummary() function.	2019-01-22 22:06:42 +01:00
cinap_lenrek	5da4f0fc0f	sdram: experimental ramdisk driver. this driver makes regions of physical memory accessible as a disk. to use it, ramdiskinit() has to be called before confinit(), so that conf.mem[] banks can be reserved. currently, only the pc and pc64 kernels use it, but otherwise the implementation is portable. ramdisks are not zeroed when allocated, so that the contents are preserved across warm reboots. to not waste memory, physical segments no longer allocate Page structures or populate the segment pte's. there's also a new SG_CACHED attribute.	2018-05-27 22:59:19 +02:00
cinap_lenrek	b437065950	stats: show amount of reclaimable pages (add -r flag). reclaimable pages are user pages that are used for caches like the image cache, mount cache and swap cache.	2018-01-05 00:52:14 +01:00
cinap_lenrek	f3f9392517	kernel: introduce devswap #¶ to serve /dev/swap and handle swapfile encryption	2017-10-29 23:09:54 +01:00
aiju	773be02aa1	kernel: add support for hardware watchpoints	2017-06-12 19:03:07 +00:00
cinap_lenrek	0c1110ace2	kernel: fix twakeup()/timerdel() race condition. timerdel() did not make sure that the timer function is not active (on another cpu). just acquiring the Timer lock in the timer function only blocks the caller of timerdel()/timeradd(), but not the other way around (on a multiprocessor). this changes the timer code to track activity of the timer function, having timerdel() wait until the timer has finished executing.	2017-03-29 00:30:53 +02:00
cinap_lenrek	47f07b2669	kernel: make the mntcache robust against fileservers like fossil that do not change the qid.vers on wstat. introduce a new ctrunc() function that invalidates any caches for the passed-in chan, invoked when handling wstat with a specified file length or on file creation/truncation. test program to reproduce the problem: #include <u.h> #include <libc.h> #include <libsec.h> void main(int argc, char *argv[]) { int fd; Dir *d, nd; fd = create("xxx", ORDWR, 0666); write(fd, "1234", 4); d = dirstat("xxx"); assert(d->length == 4); nulldir(&nd); nd.length = 0; dirwstat("xxx", &nd); d = dirstat("xxx"); assert(d->length == 0); fd = open("xxx", OREAD); assert(read(fd, (void*)&d, 4) == 0); }	2017-01-12 20:13:20 +01:00
cinap_lenrek	c86b5ddaa6	kernel/qio: make readblist() offset of type ulong like the rest	2016-11-12 17:41:58 +01:00
cinap_lenrek	a54d1cd95e	kernel/qio: big cleanup of qio functions. remove bl2mem(), as it is broken: a fault while copying to memory yields a partially freed block list. it can simply be replaced by readblist() and freeblist(), which we also use for qcopy() now. remove mem2bl(), and handle putting back the remainder from a short read internally (splitblock()), avoiding the releasing and re-acquiring of the ilock. always attempt to free blocks outside of the ilock. have qaddlist() return the number of bytes enqueued, which avoids walking the block list twice.	2016-11-07 22:20:10 +01:00
cinap_lenrek	963497f06b	kernel: avoid padblock copying for devtls/devssl/esp, cleanup debugging. to avoid copying in padblock() when adding cryptographic macs to a block in devtls/devssl/esp, we reserve 16 extra bytes in the allocation. remove the qio ixsummary() function and add the acid function qiostats() to /sys/lib/acid/kernel. simplify iallocb(), remove iallocsummary() statistics.	2016-11-05 20:05:40 +01:00
cinap_lenrek	10275ad6dd	kernel: xoroshiro128+ generator for rand()/nrand(). the kernel's custom rand() and nrand() functions were not working as specified in rand(2). now we just use libc's rand() and nrand() functions but provide a custom lrand() implementing the xoroshiro128+ algorithm as proposed by aiju.	2016-09-11 02:10:25 +02:00
cinap_lenrek	0a5f81a442	kernel: switch to fast portable chacha based seed-once random number generator	2016-08-27 20:42:31 +02:00
cinap_lenrek	0f97eb3a60	kernel: add secalloc() and secfree() functions for secret memory allocation. The kernel needs to keep cryptographic keys and cipher states confidential. secalloc() allocates memory from the secret pool, which is protected from debuggers reading the memory through devproc. secfree() releases the memory, overwriting the data with garbage.	2016-08-27 20:33:03 +02:00
cinap_lenrek	1057a859b8	devsegment: cleanups - return a distinct error message when attempting to create a Globalseg with a physseg name - copy the directory name to up->genbuf so it stays valid after we unlock(&globalseglock) - cleanup wstat() handling, allow changing uid - make sure global segment size is below SEGMAXSIZE - move isoverlap() check from globalsegattach() into segattach() - remove Proc* argument from globalsegattach(), segattach() and isoverlap() - make Physseg.attr and segattach attr parameter an int for consistency	2016-03-30 22:49:13 +02:00
cinap_lenrek	04c3a6f66e	zynq: introduce SG_FAULT to prevent access to AXI segment while PL is not ready. access to the axi segment hangs the machine when the fpga is not programmed yet. to prevent access, we introduce a new SG_FAULT flag that, when set on the Segment.type or Physseg.attr, causes the fault handler to immediately return with an error (as if the segment were not mapped). during programming, we temporarily set the SG_FAULT flag on the axi physseg and flush the tlbs of all processes that have the segment mapped; when programming is done, we clear the flag again.	2016-03-27 20:57:01 +02:00
cinap_lenrek	595501b005	kernel: make fversion()/mntversion() types consistent	2016-03-10 03:02:28 +01:00
cinap_lenrek	d19144155e	kernel: missing changes for ibrk() prototype	2015-12-21 04:49:29 +01:00
cinap_lenrek	7f3659e78f	kernel: cleanup exit()/shutdown()/reboot() code. introduce a cpushutdown() function that does the common operation of initiating shutdown, returning once all cpus got the message and are about to shut down. this avoids duplicated code which isn't really machine specific. automatic reboot on panic happens only when *debug= is not set and the machine is a cpu server or has no display; otherwise just hang.	2015-11-30 14:56:00 +01:00
cinap_lenrek	9f4eac5292	kernel: pgrpcpy(), simplify Mount structure. instead of ordering the source mount list, order the new destination list, which has the advantage that we do not need to wlock the source namespace, so copying can be done in parallel and we do not need the copy forward pointer in the Mount structure. the Mhead back pointer in the Mount structure was unused and has been removed.	2015-08-09 21:16:10 +02:00
cinap_lenrek	86eb8ea6bb	kernel: change vmemchr() length argument to ulong and simplify	2015-08-06 10:15:07 +02:00
cinap_lenrek	4bd9ed80c3	kernel: export mntattach() from devmnt.c, avoiding bogus struct passing and special case in namec(). we already export mntauth() and mntversion(), so why not stop being sneaky and just export mntattach(), so bindmount() and devshr can just call it directly with properly checked arguments. we can also avoid handling #M attach specially in namec() by having devmnt's attach function do error(Enoattach).	2015-07-28 09:52:21 +02:00
cinap_lenrek	6617c63a37	kernel: pipelined read ahead for the mount cache. this changes devmnt, adding a mntrahread() function and some helpers for it to do pipelined sequential read ahead for the mount cache. basically, cread() calls mntrahread() with a Mntrah structure, and it figures out if we were reading sequentially and, if that's the case, issues reads of c->iounit size in advance. the read ahead state (Mntrah) is kept in the mount cache so we can handle (read ahead) cache invalidation in the presence of writes.	2015-07-26 05:43:26 +02:00
cinap_lenrek	8ed25f24b7	kernel: various cleanups of imagereclaim(), pagereclaim(), freepages(), putimage(). imagereclaim(), pagereclaim(): - move imagereclaim() and pagereclaim() declarations to portfns.h - consistently use ulong type for page counts - name number of pages to free "pages" instead of "min" - check for pages == 0 on entry. freepages(): - move the pagechaindone() call that wakes up newpage() consumers inside the palloc critical section. putimage(): - use long type for refcount	2015-07-09 00:01:50 +02:00
cinap_lenrek	64ed3658d2	kernel: add pagechaindone() to wake up processes waiting for memory. we keep the details about palloc in page.c, providing pagechaindone() for mmu code to be called after a series of pagechainhead() calls.	2015-06-15 17:40:47 +02:00
cinap_lenrek	8a3b388ffe	kernel: implement separate wait queues for page allocation. give kernel processes and local disk file servers (procs having the noswap flag set) a clear advantage for page allocation under starved conditions by giving them their own wait queue, so they get readied as soon as pages become available.	2015-06-15 16:05:00 +02:00
cinap_lenrek	46070c3122	kernel: add segio() function for reading/writing segments. devproc's procctlmemio() did not handle physical segment types correctly, as it assumed it could just kmap() the page in question and write to it. physical segments, however, need to be mapped uncached, but kmap() will always map cached, as it assumes normal memory. on some machines, aliasing memory with different cache attributes leads to undefined behaviour! we borrow the code from devsegment and provide a generic segio() function to read and write user segments, which handles all the cases without using kmap by just spawning a kproc that attaches the segment that needs to be read from or written to. fault() will set up the right mmu attributes for us. it will also properly flush pages for segments that maintain instruction cache when written. however, tlbs have to be flushed separately. segio() is used for devsegment and devproc now, which also allows for simplification of fixfault(), as there is no special error handling case anymore: fixfault() is now called from the faulting process only. reads from /proc/$pid/mem can now span multiple pages.	2015-04-16 00:45:25 +02:00
cinap_lenrek	972cd5e3fc	kernel: get rid of auxpage() and preserve cache index bits in Page.va in mount cache. the mount cache uses Page.va to store the cached range offset and limit, but the mips kernel uses cache index bits from Page.va to maintain page coloring. Page.va was not initialized by auxpage(). this change removes auxpage(), which was primarily used only by the mount cache, and uses newpage() with the cache file offset page as va so we will get a page of the right color. the mount cache keeps the index bits intact by only using the top and bottom PGSHIFT bits of Page.va for the range offset/limit.	2015-03-16 05:46:08 +01:00
cinap_lenrek	fcc336b902	kernel: catch address overflow in syssegfree(). the "to" address can overflow in syssegfree(), causing the wrong number of pages to be passed to mfreeseg(). with the current implementation of mfreeseg(), however, this doesn't cause any data corruption, but just frees an unexpected number of pages. this change checks for this condition in syssegfree() and errors out instead. also, mfreeseg() was changed to take a ulong argument for the number of pages instead of int, to keep it consistent with other routines that work with page counts.	2015-03-07 18:59:06 +01:00
cinap_lenrek	cb35d1a132	kernel: avoid inconsistent reads in /proc/#/fd and /proc/#/ns. to allow bytewise access to /proc/#/fd, the contents of the file were recreated on each call. if fds had been closed or reassigned between the reads, the offset would be inconsistent and a read could start off in the middle of a line. this happens when you cat the /proc/#/fd file of a busy process that mutates its file descriptor table. to fix this, we now return one line record at a time. if the line fits in the read size, then the next read will always start at the beginning of the next line record. we remember the consumed byte count in Chan.mrock and the current record in Chan.nrock (these fields are free to use for non-directory files). if a read comes in and the offset is the same as c->mrock, we do not need to regenerate the file and just render the next c->nrock's record. for reads smaller than the line length, we have to regenerate the content up to the offset and the race is still possible, but this should not be the common case. the same algorithm is now used for the /proc/#/ns file, allowing a simpler reimplementation and getting rid of the Mntwalk state structure.	2014-12-21 04:46:22 +01:00
cinap_lenrek	b18a641397	kernel: remove implicit Proc* argument from procctl(). procctl() is always called with up, and it would not work correctly if passed a different process, so remove the Proc* argument and use up directly.	2014-11-09 08:19:28 +01:00
cinap_lenrek	3b661a96ef	kernel: make noswap flag exclude processes from killbig() if not eve, reset noswap flag on exec	2014-08-17 00:50:20 +02:00
cinap_lenrek	655ec332a7	devproc: fix procctlmemio bugs. don't kill the calling process when demand load fails if fixfault() is called from devproc. this happens when you delete the binary of a running process and try to debug the process accessing uncached pages through the /proc/$pid/mem file. fixes to procctlmemio(): - fix missed unlock, as txt2data() can error - make sure the segment isn't freed by taking a reference (under p->seglock) - access the page with the segment locked (see comment) - get rid of the segment stealer lock. other stuff: - move txt2data() and data2txt() to segment.c - add procpagecount() function - make the return type of mcountseg() ulong	2014-07-14 06:02:21 +02:00
cinap_lenrek	d4d86df2ab	kernel: new pagecache, remove Lock from page, use cmpswap for Ref instead of Lock. make the Page structure less than half its original size by getting rid of the Lock and the lru. The Lock was required to coordinate the unchaining of pages that were both cached and on the lru freelist. now pages have a single next pointer that is used for the palloc.head freelist xor for page cache hash chains in Image.pghash[]. cached pages are not on the freelist anymore, but will be reclaimed from images by the pager when the freelist runs out of pages. each Image has its own 512 hash chains for cached page lookup. That is 2MB worth of pages and there should be no collisions for most text images. page reclaiming can be done without holding palloc.lock, as the Image is the owner of the page hash chains protected by the Image's lock. reclaiming Image structures can be done quickly by only reclaiming pages from inactive images, that is, images which are not currently in use by segments. the Ref structure has no Lock anymore, only a single long that is atomically incremented or decremented using cmpswap(). there are various other changes to the code as a consequence, and lots of bikeshedding, sorry.	2014-06-22 15:12:45 +02:00
cinap_lenrek	72ba3571a3	kernel: remove _xinc()/_xdec(). with the Block refcount changes, _xinc() and _xdec() aren't used anymore, so remove them. architectures can still define ainc()/adec() when they need them.	2014-06-08 01:35:22 +02:00
cinap_lenrek	a2d96d47c9	kernel: always reset notepending in eqlock, handle forceclosefgrp in eqlocks	2014-04-29 21:17:07 +02:00
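
The 10275ad6dd entry names xoroshiro128+ as the algorithm behind the kernel's lrand(). As a standalone illustration (not the kernel's code), one generator step can be sketched in portable C as below. The Xoro type, xoroseed() and xoronext() are hypothetical names, and the rotate/shift constants 55, 14 and 36 are those of the originally published version of the algorithm, assumed here rather than taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* xoroshiro128+ state: two 64-bit words that must not both be zero. */
typedef struct Xoro Xoro;
struct Xoro {
	uint64_t s[2];
};

/* Rotate x left by k bits, 0 < k < 64. */
static uint64_t
rotl(uint64_t x, int k)
{
	return (x << k) | (x >> (64 - k));
}

/* Seed the generator; an all-zero state would only ever produce zeros. */
static void
xoroseed(Xoro *x, uint64_t a, uint64_t b)
{
	assert(a != 0 || b != 0);
	x->s[0] = a;
	x->s[1] = b;
}

/* One step: the output is s0 + s1, then the state is scrambled with the
   original shift/rotate constants a=55, b=14, c=36. */
static uint64_t
xoronext(Xoro *x)
{
	uint64_t s0 = x->s[0];
	uint64_t s1 = x->s[1];
	uint64_t result = s0 + s1;

	s1 ^= s0;
	x->s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14);
	x->s[1] = rotl(s1, 36);
	return result;
}
```

Seeded with (1, 2), the first output is simply 1 + 2 = 3, because xoroshiro128+ returns the sum of the two state words before scrambling them. The algorithm's authors later published revised constants (24, 16, 37); the log does not say which set the kernel uses.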

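Commit 0b33b3b8ad moves OCEXEC from the channel to the file descriptor. The sketch below models those rules with a toy descriptor table: the close-on-exec bit is stored per slot, rfork(RFFDG) copies it, dup() clears it, and exec closes only the flagged slots of the calling group. Chan, Fgrp, FDCEXEC, NFD and all the functions are simplified, hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes and flag bits, for illustration only. */
enum {
	NFD	= 16,
	FDCEXEC	= 1,	/* close this descriptor on exec */
};

typedef struct Chan Chan;
struct Chan {
	int	ref;	/* descriptors referring to this channel */
};

/* Descriptor group: the close-on-exec bit sits next to each slot,
   not in the shared Chan. */
typedef struct Fgrp Fgrp;
struct Fgrp {
	Chan		*fd[NFD];
	unsigned char	flag[NFD];
};

/* Drop whatever is in slot fd. */
static void
fdclear(Fgrp *f, int fd)
{
	if(f->fd[fd] != NULL){
		f->fd[fd]->ref--;
		f->fd[fd] = NULL;
	}
	f->flag[fd] = 0;
}

/* open(..., OCEXEC): only this descriptor is marked; c itself is untouched. */
static void
fdinstall(Fgrp *f, int fd, Chan *c, int ocexec)
{
	fdclear(f, fd);
	c->ref++;
	f->fd[fd] = c;
	f->flag[fd] = ocexec ? FDCEXEC : 0;
}

/* dup(oldfd, newfd): share the channel, but newfd starts with no flags. */
static void
fddup(Fgrp *f, int oldfd, int newfd)
{
	Chan *c = f->fd[oldfd];

	assert(c != NULL);
	if(oldfd == newfd)
		return;
	c->ref++;	/* take the new reference before clearing the slot */
	fdclear(f, newfd);
	f->fd[newfd] = c;
	f->flag[newfd] = 0;
}

/* rfork(RFFDG): the child gets a copy of the table, flags included. */
static void
fgrpcopy(Fgrp *dst, const Fgrp *src)
{
	int i;

	memcpy(dst, src, sizeof *dst);
	for(i = 0; i < NFD; i++)
		if(dst->fd[i] != NULL)
			dst->fd[i]->ref++;
}

/* exec: close only the descriptors flagged in this group. */
static void
fdcloseexec(Fgrp *f)
{
	int i;

	for(i = 0; i < NFD; i++)
		if(f->flag[i] & FDCEXEC)
			fdclear(f, i);
}
```

Because the flag never touches the shared Chan, marking one descriptor close-on-exec cannot leak into another descriptor or another process holding the same channel, which is the /srv and exportfs problem the commit describes.
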

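Commit d4d86df2ab replaces the Lock inside Ref with a single long that is updated with cmpswap(). A minimal sketch of that pattern in C11 follows. The cas() helper stands in for the kernel's cmpswap() primitive and is built on atomic_compare_exchange_strong() from <stdatomic.h>; the Ref layout and the incref()/decref() return values (the updated count) are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* A Ref without a Lock: just one long, updated atomically. */
typedef struct Ref Ref;
struct Ref {
	_Atomic long	ref;
};

/* Stand-in for a cmpswap(addr, old, new) primitive: store nv into *p
   only if *p still equals old; nonzero on success. */
static int
cas(_Atomic long *p, long old, long nv)
{
	return atomic_compare_exchange_strong(p, &old, nv);
}

/* Retry until no other cpu changed the count between load and swap. */
static long
incref(Ref *r)
{
	long old;

	do
		old = atomic_load(&r->ref);
	while(!cas(&r->ref, old, old + 1));
	return old + 1;
}

static long
decref(Ref *r)
{
	long old;

	do{
		old = atomic_load(&r->ref);
		assert(old > 0);	/* decref of an unreferenced object is a bug */
	}while(!cas(&r->ref, old, old - 1));
	return old - 1;
}
```

In portable C, atomic_fetch_add() would do the same job in one call; the explicit retry loop mirrors building the counter on a compare-and-swap primitive alone, as the commit describes.
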
71 commits