First of all, thanks for the constructive response! Mikhail, you are absolutely right that on multi-CPU systems, performing multiple reads/writes in parallel threads significantly improves performance. A detailed description of the emulated Proactor was outside the scope of this article. In the latest version of the J5Proactor (http://www.terabit.com.au/J5Proactor-1.0.zip) code, the completion dispatcher runs in the handler thread. A Proactor thread can be in one of three states: LEADER (checks/waits for readiness events), HANDLER (actually performs the I/O and dispatches completions), and FOLLOWER (waits to become the leader). When the leader thread moves to the HANDLER state, it passes leadership to one of the FOLLOWER threads, and only then performs the I/O and calls the user handler. J5Proactor works much faster than the old JavaProactor-1.4 on multi-CPU boxes.
The diagrams show results for the old Proactor (J5Proactor was still under development when the article was being prepared).
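To make the three states concrete, here is a minimal sketch of the LEADER/HANDLER/FOLLOWER loop. It is not the J5Proactor source: the class name LeaderFollowersSketch, the queue standing in for the readiness selector, the semaphore used as the leadership token, and the demo method are all assumptions made for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical Leader/Followers sketch; not the actual J5Proactor code. */
public class LeaderFollowersSketch {
    // Poison pill that tells the worker threads to exit.
    private static final Runnable STOP = () -> { };

    // Stand-in for the readiness selector: each Runnable is one "ready" event.
    private final BlockingQueue<Runnable> readyEvents = new LinkedBlockingQueue<>();
    // Leadership token: at most one thread holds it, so only one thread waits for events.
    private final Semaphore leadership = new Semaphore(1);
    private final Thread[] workers;

    public LeaderFollowersSketch(int threads) {
        workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(this::workerLoop, "proactor-" + i);
            workers[i].start();
        }
    }

    private void workerLoop() {
        try {
            while (true) {
                leadership.acquire();              // FOLLOWER: wait to become leader
                Runnable event;
                try {
                    event = readyEvents.take();    // LEADER: wait for a readiness event
                    if (event == STOP) {
                        readyEvents.put(STOP);     // leave the pill for the other threads
                    }
                } finally {
                    leadership.release();          // pass leadership to a follower
                }
                if (event == STOP) {
                    return;
                }
                event.run();                       // HANDLER: do the I/O, call the user handler
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public void submit(Runnable event) {
        readyEvents.add(event);
    }

    /** Lets queued events drain, then stops and joins all worker threads. */
    public void shutdown() {
        readyEvents.add(STOP);
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during shutdown", e);
        }
    }

    /** Handles `connections` simulated events on `threads` threads; returns how many ran. */
    public static int demo(int connections, int threads) {
        AtomicInteger handled = new AtomicInteger();
        LeaderFollowersSketch pool = new LeaderFollowersSketch(threads);
        for (int i = 0; i < connections; i++) {
            pool.submit(handled::incrementAndGet);
        }
        pool.shutdown();
        return handled.get();
    }

    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors();  // N = number of CPUs
        System.out.println("handled " + demo(1000, n) + " events on " + n + " threads");
    }
}
```

The point of the hand-off is that leadership is released before the handler runs, so another thread can already be waiting for the next event while this one is busy with I/O.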
I see nothing wrong with a design where the event demultiplexer and the thread pool are embedded in a single entity. What is the better alternative? Thread-per-connection is easy, but it does not scale. We still have the old goal of processing M connections in N threads, where M >> N. I think the optimal N is equal to the number of CPUs; more threads don't improve performance. So we still have to deal with a thread pool. Thanks again.