The PR #6649 which fixed an issue with orphaned KCBs leaking in memory which also pointed to unloaded registry hives, it also brought a problem.
In CmpEnumerateOpenSubKeys there is a risk of getting hit by a deadlock as we enumerate the cache table to remove empty cache entries.
Fundamentally CmpEnumerateOpenSubKeys locks down a KCB from cache for exclusive use in order to tear down its contents from memory but it doesn't address the fact a KCB might have already been locked in the same calling thread, leading to a recursion.
This leads to random hangs when unloading a hive during system startup (tipically on a clean install).
The solution here is to simply lock the whole registry when we unload a hive so that we don't have to worry the KCBs are getting tampered by anybody else. This also simplifies the code.
Although locking the entire registry while other apps are doing registry related operations to other hives can cause overhead. If this turns out to be bad then we have to rethink the locking mechanism here.
CORE-19539
A hive whose KCBs have a reference count of 0, meaning nobody is using these keys anymore, will not get removed from the cache table.
As a result during a normal hive unloading operation you will get orphaned KCBs which results in an unload failure.
This is wrong, because this is what a normal hive unloading is supposed to do. What it cannot do of course is that it cannot
scramble the references of opened keys by the users who use the Registry, as it is the job of force unloading mechanism to do that.
Also remove a misleading debug print. Force unloading works as intended by scrambling the references of keys and marking the KCB for deletion,
which is what how a force unload works. Namely Windows does exactly that.
CORE-10705
- Annotate the CmpEnumerateOpenSubKeys function with SAL2
- When removing an orphaned cached KCB, ensure that it is locked before clearing it from cache table entries
In particular remove some extra-parentheses around single code tokens,
and replace few "DPRINT1 + while (TRUE);" by UNIMPLEMENTED_DBGBREAK.
+ Improve some comments.
Sometimes repairing a broken hive with a hive log does not always guarantee the hive
in question has fully recovered. In worst cases it could happen the LOG itself is even
corrupt too and that would certainly lead to a total unbootable system. This is most likely
if the victim hive is the SYSTEM hive.
This can be anyhow solved by the help of a mirror hive, or also called an "alternate hive".
Alternate hives serve the purpose as backup hives for primary hives of which there is still
a risk that is not worth taking. For now only the SYSTEM hive is granted the right to have
a backup alternate hive.
=== NOTE ===
Currently the SYSTEM hive can only base upon the alternate SYSTEM.ALT hive, which means the
corresponding LOG file never gets updated. When time comes the existing code must be adapted
to allow the possibility to use .ALT and .LOG hives simultaneously.
In addition to that, in some functions like CmFlushKey, CmSaveKey and CmSaveMergedKeys we must validate the underlying hives as a matter of precaution that everything is alright and we don't fuck all the shit up.
Currently the failure code path doesn't do any kind of cleanup against
the hive that was being linked to master. The cleanup is pretty
straightforward as you just simply close the hive file handles and free
the registry kernel structures.
CORE-5772
CORE-17263
CORE-13559