It was previously made by the Violus guy, but he changed his site recently so it is no longer there. The app I mentioned he made is called PillowFort, and it is still available.
Anyway, this guy never destroyed the original procedure like you do. He just added a jmp at the top of the original function, and I think he executes the overwritten code at the end of his redirection function. When he returns, the original function just continues... I don't remember this very well, so maybe I am wrong about something.
Google for "violus.com Redirections" and you will see this guy even made a class for this. Too bad it isn't cached.
Your thread problem is indeed nasty. Thread2 doesn't enter the critical section, so it can call the function while it is half-patched and execute garbage. This can be fixed with a disassembler, as I explained to you: Thread2 will then simply call your function, because you no longer have to do the things you currently do inside the critical section.
Why do you care? Your dll may be loaded at any address. It then finds out the address of the RegisterClass function and patches it in the address space of that process. If you put the replacement function in the dll itself, its address will be valid for the host process only. But there's no way to ensure that my dll will get the same address in all processes.
Actually, for a particular function you don't need a disassembler at all, as you can check its starting bytes manually. You can assume that WinProc starts the same way in every TC version. Violus did this for the OpenSocket function: he detects the OS version and patches the function with 2 or 3 different codes depending on the version. The disassembler is only needed for a universal solution.
Maybe you can try this open-minded solution:
When you enter the critical section, set your thread's priority to Real Time so nothing else can get in until you finish your code. Then restore the thread priority to normal. Perhaps there is some other way to make a thread uninterruptible by the other threads of the process. Furthermore, perhaps there is a way to make threads uninterruptible once they enter a critical section, so that other threads, even those that don't know about the critical section, cannot interrupt it.