Conversational state in IBM WebSphere eXtreme Scale: PER_CONTAINER grids

Posted by: Billy Newport on

We added this type of grid starting with V6.1.0.3, prior to this we had 'normal' grids which are a fixed number of partitions and the key of a Map hashed to one of those partitions. We place the partitions on the set of online container JVMs and automatically scale them out or in if JVMs are added and removed. This grid works very well for key based grids, where the applications uses a key object to locate data in the grid.


PER_CONTAINER grids are different. You specify the grid uses PER_CONTAINER using the placementPolicy attribute in the deployment xml file. Here, you specify how many partitions you want per container JVM that you start. Remember, normally, you specify how many partitions total in the grid, not per container. If the number of partitions is 5 then when you start a container JVM, WXS will create 5 new anonymous partition primaries on that JVM and will create any replicas on the other container JVMs already running. Here is a sequence:

  • Start a JVM, it gets 5 primaries. (P0-P4)
    JVM#1 -> P0, P1, P2, P3, P4 
  • Start another JVM, it gets 5 primaries (P5-P9). Replicas for P0-4 are placed on it and replicas for P5-9 go on JVM#1.
    JVM#1 -> P0,P1,P2,P3,P4,R5,R6,R7,R8,R9
    JVM#2 -> P5,P6,P7,P8,P9,R0,R1,R2,R3,R4
     
     
  •  Start another JVM, it gets 5 primaries (p10-p14). The replicas for p10-p14 are placed on JVM#1 and JVM#2 and some of the replicas for P0-P9 are moved to JVM#3 to balance the load.

    JVM#1 -> P0,P1,P2,P3,P4,R7,R8,R9,R10,R11,R12
    JVM#2 -> P5,P6,P7,P8,P9,R2,R3,R4,R13,R14
    JVM#3 -> P10,P11,P12,P13,P14,R5,R6,R0,R1
     
     
This continues as more JVMs are started. The grid will always create 5 new partition primaries on each new JVM and will then place replicas for those on the existing JVMs and will then balance the replicas out so the new JVM gets it's share. But, WXS will NEVER move a primary in PER_CONTAINER mode. It only moves replicas for balancing.

The partition numbers are arbitrary and have nothing to do with keys. You cannot do key based routing with this type of grid. If a JVM was to be stopped then the partition ids which were created for that JVM are not longer in use. This means there is a gap in the partition ids. There would no longer be partitions 5 to 9 if JVM#2 was to die in the above example. There would only be 0-4 and 10-14. This is why key based hashing doesn't work here. 

So, what's it good for? It's good for things like HTTP Session replication or application session state. Here, a HTTP router assigns a HTTP Session to a servlet container. The servlet container needs to create a HTTP Session for it and will choose one of the five local partition primaries for the session. The 'id' of the partition chosen is then stored in the cookie. The servlet container now has local access to the session state which means zero latency access to the data for this request as long as session affinity is maintained. Any changes to the partition are replicated by WXS.

Why not just have one partition per JVM? 5 or more likely 10 is good because think about what happens if a JVM fails. We want the burden of the JVM failing to be spread across the cluster 'evenly'. If there was one partition then when the JVM fails then one JVM has to pick up the entire load, the one where the replica was. This is bad because the load there may double. If there is 5 partitions per JVM then 5 JVMs pick up the load of the JVM which lowers the impact on each JVM by 80%. By using multiple partitions per JVM, we are lowering the impact on the replica substantially. Another way to look at it is if a JVM suddenly spiked then the replication load of that JVM is spread over 5 JVMs, which is better than one.

The next issue I hope some have noticed is that this means every time a JVM starts, we may five more partition primaries and 5 more replicas. Over time, we just keep making partitions and they never go away. This isn't how it works. When a JVM starts, it gets five partition primaries. These are 'home' primaries. They exist on the container which created them. If the JVM fails then the replicas become primaries and WXS will create 5 more replicas to maintain high availability (unless auto repair is turned off!). The new primaries are in a different JVM than the one that creates the partitions. They are foreign primaries. The application should never place new state or sessions in a foreign primary. They should be allowed to 'drain', sessions get evicted etc. Eventually, the foreign primary has no entries and WXS automatically deletes it and its associated replicas. The foreign primaries purpose is to allow existing sessions to still be available but it's not for new sessions.

How does a client interact with such a 'key' less grid then? The client just begins a transaction and then stores data in the grid. The keys are meaning less. The client can ask the Session for a SessionHandle object. This is a serializable handle which allows the client to get back to the same partition later. WXS picks a partition for the client from the list of home partition primaries. It will never return a foreign primary partition. This SessionHandle could be serialized in a HTTP cookie or similar device and then later on receiving it again, it can be converted back in to a SessionHandle and provided to the WXS API to obtain a Session bound to the same partition again. You cannot use Agents to interact with a PER_CONTAINER grid for now.

This is different than a normal FIXED_PARTITION or hash grid because the client stores data in a place in the grid, gets a handle to it and uses the handle to access it again next time. There is no application supplied key here.

Obviously, WXS is not making a new partition for each 'session'. Therefore, the keys used to store data in the partition should be unique within that partition. Maybe the client generates a unique SessionID and then uses that as the key to find information in Maps in that partition. Multiple client sessions will be assigned to the same partition so the application needs to make sure different keys are used to store session data in that partition for each partition.

I used 5 partitions as an example here but the numberOfPartitions parameter in the objectgrid.xml can be used to specify this, it just means the number of partitions per container rather than per grid. The number of replicas is specified in the normal way.

This also works with multiple data centers. If possible WXS will return a SessionHandle to a partition whose primary is located within the same zoned/data center as that client. The client can specify the zone as a parameter to the JVM or using an API. The client zone id can be set using serverproperties or clientproperties

The PER_CONTAINER style of grid suits applications which store conversational type state rather than database oriented data. The key to access it is some kind of conversation id and is not related to a specific database record or something like that. It provides higher performance (because the partition primaries can be collocated with the servlets for example), easier configuration (no need to worry about how many partitions are needed for how many JVMs), works very well in multiple data center scenarios (zone based routing). It provides a very useful extra tool in the bag for effectively using WebSphere eXtreme Scale.

be the first to rate this blog

About Billy Newport

Billy Newport

Billy is a Distinguished Engineer at IBM. He's been at IBM since 2001. Billy was the lead on the WorkManager/ Scheduler APIs which were later standardized by IBM and BEA and are now the subject of JSR 236 and JSR 237. Billy lead the design of the WebSphere 6.0 non blocking IO framework (channel framework) and the WebSphere 6.0 high availability/clustering (HAManager). Billy currently works on WebSphere XD and ObjectGrid. He's also the lead persistence architect and runtime availability/scaling architect for the base application server.

Before IBM, Billy worked as an independant consultant at investment banks, telcos, publishing companies and travel reservation companies. He wrote video games in C and assembler on the ZX Spectrum, Atari ST and Commodore Amiga as a teenager. He started programming on an Apple IIe when he was eleven, his first programming language was 6502 assembler.

Billys current interests are lightweight non invasive middleware, complex event processing systems and grid based OLTP frameworks.

More About Billy »

NFJS, the Magazine

August Issue Now Available
  • Google Your Persistent Domain Model
    by John Griffin
  • Get Cooking in the Cloud with Chef, Part 2
    by Michael Nygard
  • Making Java Bearable with Guava
    by Daniel Hinojosa
  • HTML 5 Update
    by Brian Sletten
Learn More »