Summary
JGroups is a reliable group communication tool that forms the basis of several high-availability projects, including clustering in the JBoss application server. The new JGroups release adds partial state transfer, streaming state transfer, support for the virtual synchrony group communication model, and two new failure detection protocols.
Advertisement
By introducing a communication group as an abstraction for a set of processes, group communication allows a process to send messages to a group of processes as effectively as it would to a single other process. Group communication underlies many load balancing, replication, and general event broadcast tools.
Among the more popular Java open-source group communication tools is JBoss' JGroups. JGroups has been used as the basis for JBoss' clustering solution, and has been available as the basis of a distributed cache plugin, JBossCache, in other popular projects, such as Hibernate.
JBoss recently released version 2.4 of JGroups, with features that address both performance and scalability. Efficient state transfer among group members is a key performance consideration, and a new feature now enables partial state transfer as well. According JGroups project lead Bela Ban, partial state transfer,
Allows a programmer to transfer a subset of the entire state, identified by an ID. We use this ... in JBoss HTTP session replication/buddy replication, where only the state represented by a buddy is transferred.
Another welcome performance enhancement is the ability to transfer state via streaming, as opposed to using the older model of a byte array buffer. According to Ban, using a byte array buffer to transfer state,
Forced a user to serialize the entire state into a byte[] buffer at the state provider and unserialize the byte[] buffer into the application state at the state requester...
Streaming state transfer uses input and output streams, so users can stream their state to the output stream in chunks and don't need to have the entire state in memory as required by a byte[] buffer.
On the receiving side, an input stream is provided from which the user can read the entire state and set the application state to it.
A key group communication concept is the channel, the abstraction an application uses to interface the group communication framework. JGroups 2.4 adds the ability for multiple services to share the same communication channel. According to Ban,
JGroups channels are quite expensive in their use of resources—mainly threads—and sharing a channel amortizes a channel over multiple services...
The Multiplexer was mainly developed to accommodate multiple services running on top of the same channel. This is beneficial in JBoss where we had 5 clustered services in [JBoss] 4.0.x, each using its own channel. In JBoss 5, we switched to the Multiplexer, and all 5 services use the same shared channel... If multiple services sharing a channel require state transfer, we run into the problems [that] FLUSH [prevents].
Ban has this to say about Flush, another new JGroups 2.4 feature:
Whenever the group membership is to be changed, or a state to be transferred ... Flush tell[s] every node in the cluster to stop sending messages and then join/leave a node, or transfer the state.
The 2.4 release also adds a new group communication model to JGroups, Virtual Synchrony:
Virtual Synchrony is a model of group communication, developed by Ken Birman at Cornell, which has the following properties:
All non-faulty members see the same set of messages between views
A message M sent by P in view V must be received by P in V, unless P crashes
All non-faulty members receive the same sequence of views
Note that we have not yet fully implemented virtual synchrony as flushing only flushes out messages *sent* by member P, but not those received by P. Therefore, if member A sends a message M to {A,B,C}, and crashes immediately afterwards, and only B received M, then C will *not* receive M (violating rule #1 above). We will fully implement this in JGroups 2.5
Finally, JGroups 2.4 sports two new failure detection protocols:
FD_PING and FD_ICMP ... allow for scripts to be run in order to check the health of a node (FD_PING) and ICMP to ping a machine (FD_ICMP). These can of course be combined with other failure detection protocols, such as FD or FD_SOCK.
If you've used JGroups—or some of its derivatives, such as JBossCache—before, what was your experience with its scalability and performance?
many of your coverage articles like this are good for offline reading. Would you mind making it easy to print your articles -- infoq entered the scene recently -- I asked them for PRINT FRIENDLY pages and they quickly made all their pages print friendly.
> hi there, > > many of your coverage articles like this are good for > or offline reading. Would you mind making it easy to print > your articles. > We're planning on redesigning this whole page, and printer-friendliness is part of that. We have never actually created a page for news items, but instead just used an old page that was for forum posts. In the meantime if you click on "Threaded View" at the top of the page, and print that, you'll get a more printer-friendly output.
Could you tell me if the following problems are addressed/fixed in this version?
1. One thing that's a pain in the previous version ist that it the group coordinator node goes down, it cannot rejoin the group (at least with TCP), instead it writes numerous warning messages and finally opens it's new subnet. There's no way to interfear in that process. As a result, I have to stop all nodes and restart the controller again. That isn't exactly what one expects from a HA solution, is it?
2. It doesn't do a clean shutdown even if all the channels are closed correctly, so there's no redeployment possible in a web container. Instead, I have to shut down the web server in order to get all ports correctly closed.
3. Certificate beased (public key) encryption would be nice. Currently, the symmetric key has to be distributed manually among the nodes.