Many enterprise WebSphere systems use topologies that share Java Virtual Machine (JVM) clusters with both mission-critical core applications and client applications. Learn about three alternate topologies based on WebSphere Application Server Network Deployment (ND) that you can implement to prevent costly system instability and downtime.
A large WebSphere system can have hundreds of Java Virtual Machine (JVM) instances running on more than a dozen physical servers at multiple data centers. To simplify administration and management, many companies employ an infrastructure where numerous client applications share the same JVM cluster with a large, mission-critical enterprise application that they contact for a service or data. However, this architectural model can raise a pronounced risk for a company in terms of the system’s overall stability. Because both the client applications and the enterprise application run on a single clustered JVM, a glitch in one system can affect all of the systems. The system downtime that follows any such failure — and the time it takes to isolate the problem — can adversely affect a company’s bottom line and customer satisfaction.
To help WebSphere system and application professionals avoid such potential stability pitfalls, we discuss three architectural solutions that use topologies based on WebSphere Application Server Network Deployment (ND) and show how you can achieve high resiliency and high stability for your enterprise WebSphere systems. For each solution, we highlight the challenges you may encounter and the factors you should consider when planning the design and implementation of the infrastructure.
Before discussing the challenges you face in choosing the topology to best support and enhance the stability of your WebSphere systems, we take a moment to define our terms in this discussion. By “core application,” we mean a large mission-critical enterprise Java EE application or application suite that executes within the Java EE containers provided by a dedicated WebSphere Application Server system. A “client application” is a Java EE application that typically requests services from the core application. “WebSphere infrastructure” refers to a physical server, such as an IBM midrange server, and other related hardware and system software, such as the operating system software and WebSphere Application Server system software.
The challenge
To support composite applications in a service-oriented architecture (SOA), an IT infrastructure centered on WebSphere must expand the interconnectivity among its WebSphere systems, each of which comprises a Java EE application and its WebSphere infrastructure, and between those systems and non-WebSphere systems.
In terms of a system’s computing roles, this interconnectivity can be mutual, transitive, and/or reversible. For example, WebSphere system A may be the client of WebSphere system B (i.e., system A depends on mutual interconnection with system B for the fulfillment of a service). The service provider role is both mutual and transitive when system A depends on system B for a service, and system B depends on system C to provide the service that system A needs. This client and server relationship can be reversed when system A is providing a service to system B.
Figure 1 shows some of the dependencies of a typical SOA, where customer-facing Application A — executing on WebSphere server A — serves as a client to request account information from Application B (which is running on WebSphere server B). Likewise, Application A serves as client when requesting customer information from Application C residing on WebSphere server C. Conversely, the customer-facing Application A could provide update services to Application B on account information and to Application C on customer data.

Figure 1. A typical service-oriented architecture
When hundreds of such relationships and interconnections exist among your critical WebSphere applications, and the client applications both provide services to and obtain services from your core application, the overall WebSphere infrastructure can become fragile if its design does not address these complex relationships. The stability risk inherent in this type of WebSphere system manifests itself in four key ways:
- Production failure. A change to one system’s application code, or one system’s infrastructure configuration, could destabilize the JVM or the JVM cluster of the entire core application (which contains a large number of clients). This type of production failure arguably is the most damaging to a company because of its widespread consequences.
- Problem isolation. Because of the heavy traffic within a multiclient WebSphere infrastructure and the high number of potential problem sources, it is often difficult or impossible to detect a problem, isolate it, and resolve it. This challenge can drastically delay the resolution of the problem and the restoration of service.
- Associated complexity in the design, implementation, maintenance, and on-going support for both client applications and the core application. For example, an application fix that changes the session management characteristics of one client application may become a bug in another client application that shares the same JVM and JVM cluster within the core application. The fix could significantly increase the number of active sessions and thus prohibit the other application from creating an adequate number of sessions at peak time. This type of situation is often very hard to diagnose because different development teams and business groups may not be aware of the other’s changes.
- An inability to adequately stress test. The more systems there are, the more difficult it is to effectively simulate the richness of the traffic patterns involved in the application environment. The variety of the load mix is also difficult to reproduce, so a stress test may not expose an application defect or system anomaly.
It is critically important to design an infrastructure that can satisfy business-driven computing needs while being sufficiently mature and robust to gracefully recover from an instability that originates within any of the interconnected systems. The key is to prevent the instability of one system from causing a “chain-reaction of failures” within the other interconnected systems and thus affecting the availability of your multiclient WebSphere system.
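The chain reaction described here can be made concrete with a small sketch. It models the transitive example from earlier (system A depends on system B, which depends on system C) as a dependency map and walks it to find every system exposed when one provider fails. The map contents and the function name `affected_by` are illustrative assumptions, not part of any WebSphere tooling.

```python
from collections import deque

# Hypothetical dependency map: each key depends on the systems in its list.
# A -> B -> C mirrors the transitive example in the text; D stands alone.
DEPENDS_ON = {
    "A": ["B"],
    "B": ["C"],
    "C": [],
    "D": [],
}

def affected_by(failed, depends_on):
    """Return every system that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a provider to its clients.
    clients_of = {name: [] for name in depends_on}
    for client, providers in depends_on.items():
        for provider in providers:
            clients_of.setdefault(provider, []).append(client)

    affected, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for client in clients_of.get(current, []):
            if client not in affected:
                affected.add(client)
                queue.append(client)
    return affected

print(sorted(affected_by("C", DEPENDS_ON)))  # ['A', 'B']
```

Running the same walk over a real inventory of client-to-core relationships is one rough way to see how far a single failure could spread before choosing a topology.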
So where does all of this leave you? The following sections delineate three topology options based on ND. ND provides all of the features of WebSphere Application Server, plus advanced deployment services to help eliminate or minimize application downtime and its related costs. These services include clustering, failure bypass, and a high-availability manager. Edge components provide load balancing, caching, and centralized security for enhanced performance at the edge of the network. ND’s extended Web services management and advanced remote administration help with managing complex environments.
Each of the options that we outline is designed to handle the inherent risks of a large and complex integrated system.
Option 1 resembles many WebSphere systems in production today, which simply lump everything into the same JVM cluster.
Option 2 improves on the first option by separating clients and the core system such that they are on separate infrastructures, but the core system still does not have dedicated JVM instances for each client, which can hinder problem detection and isolation.
Option 3 is a state-of-the-art approach that separates the infrastructure for the core application from that of the client applications, and it gives each client application its own dedicated core application JVM or JVM cluster. Thus, this option enables the WebSphere team to prevent or isolate problems very quickly. We believe that Option 3 presents the best solution for promoting large-system stability while balancing costs and benefits.
To illustrate the topology options, we use as an example a topology that contains two data centers: data center A in New York, and data center B in Los Angeles.
Option 1: Integrating client applications with the core application system
Option 1 hosts the core application and client applications on the same WebSphere infrastructure, as shown in Figure 2. With this option, one core application and multiple client applications are deployed into each cluster. In this scenario, each data center contains two different WebSphere clusters in the same WebSphere cell and each cluster has its own applications supporting different business initiatives. The load-balancer appliance at each data center uses a round-robin algorithm to spray requests alternately to one cluster and then the next. The geographical load-balancer appliance ships the inbound traffic to either site based on predefined rules (e.g., clients east of the Mississippi go to data center A in New York, while clients west of the Mississippi go to data center B in Los Angeles).

Figure 2. Client applications sharing the same WebSphere infrastructure with the core application
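The two-level request distribution in Option 1 can be sketched as follows: a geographic rule picks the data center, then a round-robin balancer alternates between the two clusters at that site. This is a minimal model only. The cluster names and the short list of eastern states are assumptions made for illustration; a real load-balancer appliance is configured, not coded this way.

```python
from itertools import cycle

# Hypothetical names: the article does not name clusters or list states.
CLUSTERS = {
    "A": ["ny-cluster-1", "ny-cluster-2"],   # data center A, New York
    "B": ["la-cluster-1", "la-cluster-2"],   # data center B, Los Angeles
}
# Illustrative subset of "east of the Mississippi"; anything else goes west.
EAST_STATES = {"NY", "FL", "OH", "GA"}

class LocalBalancer:
    """Round-robin: alternate requests between the clusters at one site."""
    def __init__(self, clusters):
        self._next = cycle(clusters)

    def pick(self):
        return next(self._next)

BALANCERS = {site: LocalBalancer(names) for site, names in CLUSTERS.items()}

def route(client_state):
    """Apply the geographic rule first, then round-robin within the site."""
    site = "A" if client_state in EAST_STATES else "B"
    return site, BALANCERS[site].pick()

for state in ["NY", "CA", "FL", "WA"]:
    print(state, route(state))
```

Note that nothing in this routing distinguishes one application from another: every request can land on any cluster, which is why a fault in one application is free to affect the rest.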
The advantage of this design is that it uses fewer physical servers than the other two options that we discuss; all of the major applications are deployed to the same set of hardware, which is a cost-effective way to increase system utilization. However, a disadvantage of this solution is that it does not provide problem insulation and isolation. For example, if a runaway process in one application causes high CPU utilization, it could degrade the performance of all applications hosted on this infrastructure or cause their complete failure.
Option 2: Separating client applications from the core application system
In Option 2, the client applications are separated from the core application, with each application (core or client) deployed to a dedicated WebSphere application server. Any of the applications can then be deployed to clusters in different data centers. In the scenario shown in Figure 3, the core application is on two clusters, each with multiple JVMs (represented by gray circles in the figure), and the client applications are not clustered; they are merely clients of the core application. The key to this architecture is that the client applications are on different physical machines than the core application. Thus, the architecture confines a problem in a client application to its respective server. Any problems associated with the client application that affect system settings, such as CPU, native memory, or disk space, do not affect the core application.

Figure 3. Client applications and the core application on separate servers
The core application clusters can be configured to belong to the same cell or multiple cells, depending on your preference of administration ease or level of problem insulation. If your priority is with ease of administration, then you should configure the core application clusters to belong to the same cell. If you want to emphasize stability, you can choose multiple cells.
The same pros and cons are true for client applications. However, for problem insulation and stability purposes, the WebSphere servers for the client applications should normally not be part of the core application cells. The main disadvantage of this architecture is that it uses more physical hardware than the other options, and it thus increases the potential for server proliferation. For instance, a system of 40 client applications will have 40 different WebSphere Application Server environments, each of which requires more hardware and additional software license costs to support.
Another disadvantage of this architecture is that, although many of the problems that occur in client applications will not affect the core application or other client applications, there could still be issues in the core application system that cause instability and possibly lead to a system-wide failure.
For example, a client application may suddenly generate an immense amount of traffic for an unprepared core application, causing the Web container thread pool to saturate. With the core application unavailable to accept connections for other client applications, the result is costly system-wide instability.
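The saturation scenario can be illustrated with a toy model of a Web container thread pool. The sketch below assumes long-running requests that hold their threads, and the pool sizes (50 and 25) are arbitrary illustrative numbers. It contrasts a pool shared by all clients with the dedicated-JVM arrangement that Option 3 introduces.

```python
class WebContainer:
    """Toy model of a Web container with a fixed number of worker threads."""
    def __init__(self, max_threads):
        self.max_threads = max_threads
        self.busy = 0

    def accept(self):
        """Take a thread for a request; return False if the pool is saturated."""
        if self.busy >= self.max_threads:
            return False
        self.busy += 1
        return True

def burst(container, requests):
    """Send `requests` long-running requests; return how many were accepted."""
    return sum(container.accept() for _ in range(requests))

# Option 2: every client shares one core application thread pool.
shared = WebContainer(max_threads=50)      # 50 is an illustrative size
burst(shared, 500)                         # client X floods the core
print("shared pool, client Y accepted:", shared.accept())             # False

# Option 3: each client gets its own core JVM, hence its own pool.
dedicated = {"X": WebContainer(25), "Y": WebContainer(25)}
burst(dedicated["X"], 500)                 # the same flood from client X
print("dedicated pool, client Y accepted:", dedicated["Y"].accept())  # True
```

The model ignores queuing, timeouts, and thread release, so it shows only the isolation effect, not realistic throughput.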
Option 3: Dedicating separate JVM clusters for each client application
Not every complex problem requires a complex solution — which is good, because complex solutions tend to be difficult to implement, costly, and error prone. The WebSphere infrastructure design of Option 3 meets the IT challenge of providing a technical solution that is elegantly simple, but effective. In this topology, each client application accesses a dedicated JVM cluster containing the core application, and the dedicated JVM clusters share the same physical hosts, as shown in Figure 4. (Note that the figure only shows the infrastructure for the New York data center in the scenario we’ve been using; the design is the same for the Los Angeles data center.)

Figure 4. Client applications on separate servers and using dedicated JVM clusters
The Web servers balance the distribution of server requests across the JVMs in each cluster. To segment the traffic inbound from each client application, a Web server sits in front of the core application with its own plug-in configuration file (plugin-cfg.xml) that points only at certain JVMs. Both the top and bottom tier of client Web servers are configured to spray requests to the application servers based on a virtual IP (VIP) address with the appropriate context root in the plugin-cfg.xml file, and each client application requires its own VIP address. Thus, inbound traffic to the core application only hits a subset of the total JVM count. Having a Web server in front of the core enables requests from a client application to go to the appropriate core application cluster without the need for complex property files.
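The effect of this segmentation can be sketched as a routing table keyed by VIP, where each VIP reaches only its own subset of core JVMs. This is a simplified model, not the plug-in's actual behavior or file format: it ignores the context root, and the VIP addresses and JVM names are invented for illustration.

```python
from itertools import cycle

# Hypothetical VIPs and JVM names, for illustration only; they are not taken
# from the article or from a real plugin-cfg.xml file.
ROUTES = {
    "10.0.0.11": ["core_jvm_a1", "core_jvm_a2"],   # client application 1
    "10.0.0.12": ["core_jvm_b1", "core_jvm_b2"],   # client application 2
}

class CoreRouter:
    """Send each request to the dedicated core cluster behind its VIP."""
    def __init__(self, routes):
        # One independent round-robin iterator per VIP.
        self._members = {vip: cycle(jvms) for vip, jvms in routes.items()}

    def dispatch(self, vip):
        if vip not in self._members:
            raise KeyError(f"no core cluster configured for VIP {vip}")
        return next(self._members[vip])

router = CoreRouter(ROUTES)
print([router.dispatch("10.0.0.11") for _ in range(3)])
# ['core_jvm_a1', 'core_jvm_a2', 'core_jvm_a1']
print(router.dispatch("10.0.0.12"))  # core_jvm_b1
```

Traffic arriving on one VIP never reaches the JVMs listed under another, which is the property that keeps a misbehaving client confined to its own cluster.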
This approach simplifies many tasks, particularly identifying, isolating, and resolving technical issues, because the problems of one client application are less likely to affect client applications within a separate JVM cluster. For example, if a problem with one client application forces you to apply a patch or recycle the server, these operations will not have a cascading effect on the entire system.
The downside of this approach is that running additional JVM instances (additional in comparison with the previous options) increases system requirements, which could cause low utilization of the WebSphere system. Although it’s unlikely that these JVM instances and their physical host servers will experience significant workloads at the same time, you should allow for that possibility when planning your system capacity. Also, be aware that the operational costs incurred by the WebSphere engineering support team to maintain this number of JVM instances can be substantial if these processes are not managed correctly.
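As a rough planning aid, the worst case (every JVM on a host busy at once) can be estimated with simple arithmetic. The formula and all figures below are assumptions for illustration; real sizing should use measured heap and native-memory numbers from your own systems.

```python
def memory_needed_gb(jvm_count, max_heap_gb, overhead_gb, os_reserve_gb):
    """Worst-case memory for one host if every JVM fills its heap at once.

    overhead_gb is an assumed per-JVM allowance for native memory;
    os_reserve_gb is an assumed allowance for the operating system.
    """
    return jvm_count * (max_heap_gb + overhead_gb) + os_reserve_gb

# Illustrative inputs only.
shared_cluster = memory_needed_gb(jvm_count=2, max_heap_gb=2.0,
                                  overhead_gb=0.5, os_reserve_gb=4.0)
dedicated_jvms = memory_needed_gb(jvm_count=8, max_heap_gb=1.0,
                                  overhead_gb=0.5, os_reserve_gb=4.0)
print(shared_cluster, dedicated_jvms)  # 9.0 16.0
```

With these made-up inputs, eight smaller dedicated JVMs need more memory than two larger shared ones because the per-JVM overhead is paid eight times.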
Although there are more JVM instances in topologies that use Option 3, the additional JVMs do not necessarily require more system administration work if you employ innovative standardization and automation techniques and use the appropriate system administration tools, which we discuss next.
Factors affecting your topology choices and decisions
Before you settle on a WebSphere Application Server ND topology, you have several factors to consider that may impact the cost and effectiveness of the topology you choose.
Standardization and automation
Standardization and automation are key tools for any WebSphere engineer assigned to manage a large number of JVM instances (e.g., a few hundred). Standardization comes first: before you can automate system administration, you must standardize on the principles and guidelines that fit the business needs and focus of your company.
For example, if stability enhancement and problem insulation have a lower priority than the ease of system administration, you should have one file mount point for multiple applications. However, if the business driver demands maximum results in system stability and problem insulation, then you should consider a separate file mount for each application. Keep in mind, though, that if your WebSphere systems have different file system settings, it is more difficult to automate system administration jobs. Regardless of which WebSphere system standards you employ, the key to successful system administration automation is to rigorously follow the standards.
To enable system engineers to identify and resolve a problematic JVM without having to log into the administration console of each WebSphere system, you need a system administration automation tool (customized or off-the-shelf) with the following capabilities:
- A visual presentation of the JVM; a system console that allows easy visualization of the status of a JVM instance; fast system administration operations such as start, stop, and synchronization
- Several automation scripts that can be launched from the system console to perform system administration operations
- A display that indicates messages from system administration operations
A system automation tool with these capabilities enables WebSphere engineers to perform system administration operations at very high speed, regardless of the number of JVM instances they work with.
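The three capabilities listed above (a status view, launchable scripts, and a message display) can be sketched in a few lines. This is a hypothetical skeleton, not an existing tool: `JvmConsole`, `run_command`, and `fake_runner` are invented names, and the runner stands in for whatever administration scripts your site actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class JvmConsole:
    """Tracks JVM status and runs one operation across many JVMs."""
    run_command: Callable[[str, str], str]   # hypothetical hook into admin scripts
    status: dict = field(default_factory=dict)
    messages: list = field(default_factory=list)

    def apply(self, operation, jvm_names):
        """Run `operation` on each JVM, recording its status and a message."""
        for name in jvm_names:
            try:
                self.status[name] = self.run_command(operation, name)
                self.messages.append(f"{operation} {name}: ok")
            except Exception as exc:
                self.status[name] = "UNKNOWN"
                self.messages.append(f"{operation} {name}: failed ({exc})")

    def problems(self):
        """List JVMs whose last known status is not STARTED."""
        return sorted(n for n, s in self.status.items() if s != "STARTED")

def fake_runner(operation, jvm_name):
    """Stand-in for a real script; pretends one JVM refuses to start."""
    if operation == "start" and jvm_name == "core_jvm_b2":
        raise RuntimeError("port in use")
    return {"start": "STARTED", "stop": "STOPPED"}[operation]

console = JvmConsole(run_command=fake_runner)
console.apply("start", ["core_jvm_a1", "core_jvm_a2", "core_jvm_b2"])
print(console.problems())     # ['core_jvm_b2']
print(console.messages[-1])   # start core_jvm_b2: failed (port in use)
```

Because the runner is injected, the same console logic works unchanged whether it manages ten JVMs or several hundred, provided every JVM follows the same naming and file-system standards.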
The need for additional system resources
As the number of JVM instances increases, so does the need for more system resources, such as memory, which naturally increases the cost of the overall WebSphere infrastructure.
However, adding a limited amount of main memory to your WebSphere servers is a small price to pay when any unscheduled shutdown of the core enterprise WebSphere application, however short-lived, can cost millions of dollars.
Still, you should expect questions and even resistance from the project management team for adding even well-justified costs to an IT project, as this team’s concerns may be focused primarily on short-term costs.
For example, if the project’s success is defined as the project being delivered on time and within budget, then the stability of the system in production is probably not a key concern for the project management team. In order to secure adequate system resources and implement the appropriate JVM topology design, system engineers must overcome the project management team’s perception that long-term stability is a lesser priority. Fortunately, there are two ways to do so: architectural guidance and risk mitigation analysis.
Architectural guidance and risk mitigation analysis
A company’s business needs drive the design of enterprise applications. The applications’ architectural features must then be supported by, and implemented in, an IT infrastructure that conforms to corporate strategies.
For example, let’s say you’ve used your company’s documented architecture requirements for a given type of application to rank topology options for a multiclient core WebSphere application.
The design deliverables resulting from this exercise indicate that you can best meet requirements with vertical and horizontal clustering within one data center, hot-hot data center failover, and a dedicated JVM for each client application. Such architectural guidance is key to the consistent performance of IT systems.
Rigorously applied, it can powerfully assist you in presenting the fundamental architectural need for dedicated JVM instances as well as justifying the extra system resources required (for example, the increase of system memory for WebSphere servers).
If the project management team decides not to follow the company’s established architectural framework — or if such architectural guidance has not been developed — we suggest that the WebSphere management team create and distribute to its appropriate business partners in the company a formal risk analysis document that highlights a suitable WebSphere topology.
Note that a risk analysis document is not a design document; rather, it is a very high-level document that provides a concise overview of design choices and notes any possible risks (e.g., the failure to insulate one client system from the problem of another system, and the probable cascading system failure as a result).
The risk analysis document should also contain a risk mitigation proposal (with an accountability clause) that outlines the proposed actions to take to manage any risk exposure (e.g., acquiring the system resources required to have a dedicated JVM for each client application). The accountability clause should clearly state which group is accountable for any potential loss to the company stemming from design recommendations not followed due to immediate project cost concerns.
Licensing
Whether more JVM instances increase your overall WebSphere Application Server (WAS) license expense depends on your license agreement with IBM. For example, if your WAS license is calculated per CPU, the number of JVM instances on each physical server has no bearing on the license charge. However, you may incur additional expenses if the costs are calculated by the number of JVM instances.
What about dynamic application management?
WebSphere Application Server ND provides a viable option for managing WebSphere server instances. However, there is another architectural solution offered by WebSphere Virtual Enterprise (VE), which enables server consolidation, increased server resource utilization, and a simplified operations model through resource virtualization and the dynamic sharing and allocation of computing resources.
In a future article, we will highlight how WebSphere VE addresses the stability and problem isolation challenges of a large central business system. For a comprehensive overview of the product, we recommend reading the series of articles on WebSphere Extended Deployment (XD), which is the product’s former name.