    Troubleshooting WebSphere Part 2: Principles of Effective Troubleshooting

    Friday, January 6, 2012, 2:57 PM

    In our previous post we discussed how important the fundamentals are to troubleshooting within the IBM WebSphere brand.  We briefly touched on a few of the basic tools and techniques.  This week we will look more closely at a few of those tools and at an example of troubleshooting.

    Logs, Trace and Dumps
    Effective troubleshooting usually begins with a thorough examination of log files and any applicable product tracing.  Sometimes the standard error logging will indicate what the problem is; however, the root cause, and therefore the resolution, may or may not be obvious.

    Take for example an out of memory error in WebSphere Application Server.  This could be due to an application memory leak, inefficient algorithm(s), a JVM with too little heap allocated, a bottleneck at a connectivity point such as to a database, a product defect or inefficient garbage collection or any combination of these.

    To decide which action will resolve the issue, you must look more closely at the information available.  For WebSphere Application Server this may include any of the following, depending on the problem being evaluated.

    • Standard Out and Standard Error
              These logs contain a combination of WebSphere Application Server JEE container information and errors and application information and errors.  In addition to the symptoms, these logs will give further indications as to the problem area.
    • Monitoring reports - (PMI data)
              Any application, be it custom or commercial, that collects PMI statistics from the run-time JVM can prove useful in diagnosing problems, especially when the problems may be performance related.  This includes the native PMI interface in WebSphere Application Server version 5.x and above.

    When you need more than the information above provides, you must turn to tracing and dumps for analysis.

    • Tracing at a component level
              Each JEE container service area and associated JEE component area has a trace string associated with it.  Most of the time in debugging an issue you will need to gather this information to receive support from IBM.  However, you can conduct your own level of troubleshooting to determine if tuning and/or application refactoring will resolve the issue.  Refer to this article for more information: www-01.ibm.com/support/docview.wss?uid=s...
    • Thread Dumps
              A thread dump shows all live threads at the moment the thread dump is taken.  Therefore, it is imperative that you take the thread dump while reproducing the problem, or while the problem is occurring if possible.  Check the infocenter associated with your product for instructions.  An example is provided in the URL below.

     
    • Core Dumps
             Core dumps are thread dumps and more.  They provide information on locks, loaded classes, some memory information, etc.  Core dumps are taken either through the wsadmin scripting interface or through process signaling (kill -3).

    • Heap Dumps
             Heap dumps take a snapshot of the state of memory consumption.  This is essential to understand what objects are consuming the most memory.  Several snapshots taken over time can pinpoint a memory leak.   
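
    As a rough illustration of comparing snapshots over time, the sketch below flags classes whose memory footprint grows in every successive snapshot.  It assumes each heap dump has already been reduced, with whatever analysis tool you use, to a simple mapping of class name to bytes retained.  The function name, the class names and the numbers are all hypothetical and only illustrate the idea.

```python
from typing import Dict, List


def growing_classes(snapshots: List[Dict[str, int]],
                    min_growth_bytes: int = 0) -> Dict[str, int]:
    """Return classes whose retained bytes rise in every successive snapshot.

    Each snapshot maps a class name to the bytes it retains (hypothetical
    format; a real heap dump must first be summarized by an analysis tool).
    The result maps each suspect class to its total growth, largest first.
    """
    if len(snapshots) < 2:
        return {}  # a single snapshot cannot show a trend

    suspects = {}
    for cls in snapshots[-1]:
        sizes = [snap.get(cls, 0) for snap in snapshots]
        # A leak candidate keeps growing; normal churn goes up and down.
        strictly_rising = all(a < b for a, b in zip(sizes, sizes[1:]))
        growth = sizes[-1] - sizes[0]
        if strictly_rising and growth >= min_growth_bytes:
            suspects[cls] = growth
    return dict(sorted(suspects.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical summaries of three heap dumps taken some minutes apart.
snapshots = [
    {"com.example.OrderCache": 10_000_000, "java.lang.String": 40_000_000, "byte[]": 25_000_000},
    {"com.example.OrderCache": 55_000_000, "java.lang.String": 38_000_000, "byte[]": 27_000_000},
    {"com.example.OrderCache": 120_000_000, "java.lang.String": 41_000_000, "byte[]": 26_000_000},
]

suspects = growing_classes(snapshots, min_growth_bytes=1_000_000)
for cls, growth in suspects.items():
    print(f"{cls} grew by {growth} bytes")
```

    Only the class that rises in every snapshot is reported; classes that fluctuate are treated as normal allocation churn.  A steadily growing class is a place to start looking, not proof of a leak.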

    Below are some useful links with additional information:

    WAS 7 InfoCenter - Heap Dump
    tinyurl.com/8967nq5


    Summary

    Referring back to our original post, Part 1, we discussed the importance of having the fundamental knowledge required to troubleshoot effectively.  We also mentioned the ISA tool.  Both are essential, combined with the information in this post, to troubleshooting WebSphere Application Server.  Look for more posts on troubleshooting other WebSphere products from this author in the near future.


    Troubleshooting WebSphere Part 1: Principles of Effective Troubleshooting

    Wednesday, December 7, 2011, 7:56 AM
    Categories: Articles

    Ever heard of Pistol Pete Maravich?  "The Pistol," as he became affectionately known, was a master of the fundamentals of the game of basketball.  The irony of a player who still holds all-time scoring records is that he got his name from his unconventional, from-the-hip shot.  I thought he would be an appropriate legend to teach us some lessons about the importance of fundamentals: RTP in North Carolina is the home of the IBM WebSphere Level 2 Support organization, and Pistol and his family moved to North Carolina, where he attended high school and his father joined the coaching staff at N.C. State University.  A winning combination of legendary greats: WebSphere and Pistol Pete.

    So what can Pistol Pete's legendary and unconventional shot and career teach us about troubleshooting?

    Principle 1 - Fundamentals
    “Fundamentals, fundamentals, fundamentals. You’ve got to get the fundamentals down because otherwise the fancy stuff isn’t going to work.” 
    ― Randy Pausch

    The basics of sound troubleshooting are patience, persistence and understanding.  What does that have to do with troubleshooting WebSphere and fundamentals?  Well, there are simply no shortcuts to learning the fundamentals.  "The Pistol" was known for his early adoption of fundamentals taught by his father.  He was also known to have spent countless hours mastering drills.  A great teacher combined with great discipline, work ethic and persistence led to a legendary superstar.

    Today's integrated systems demand skills in Java/JEE, web services, networking, database connectivity, design and administration, replication, HTTP, middleware, etc.  Sending a graduate to a training class to receive a certification in a product is not mastery of the fundamentals of distributed computing.  Proficiency with an IDE and the ability to programmatically satisfy business requirements is not mastery of the fundamentals of design and programming.  If this sounds harsh, it is because there are too many inefficiencies, errors and costs associated with taking shortcuts in integration.  The best approach to troubleshooting and problem solving is to know and understand the underlying technologies such as:

    • Best practices for Message Flow Design (Design Patterns)
    • Best practices for JEE development
    • JVM tuning
    • Garbage Collection
    • JMS, MQ - pub/sub, clustering, etc.
    • SOAP/XML
    • Principles of parallel processing and distributed computing 

    Within each of these areas of technology there are patterns of problems that surface and patterns of root causes that surface.  The best troubleshooters will have experience with these patterns and knowledge of the fundamentals.

    It requires patience and persistence to both acquire these skills and apply them across an integration technology.  Problems are now a part of a much larger system of integrated technologies, and when one of those components or systems fails it has a systemic effect on the rest.

    It takes great patience to work across organizations within the enterprise to gain the application-level and system-level understanding that gives the best troubleshooters the edge needed to ensure these systems are running healthy.

    Principle 2 - Tools

    Having the proper tools in place to capture the level of information needed to identify the source of problems is essential.  Knowing where to look for answers to problems is essential.

    The IBM Support Assistant is a great tool that searches across several resources, as is the IBM Support site for your product.  The key is being familiar with these tools and having them set up and customized to your specific needs, so that when a problem occurs you can search the knowledge base.

    IBM Support Assistant

    IBM WebSphere Message Broker Support Site

    The keys to success here are the patience to learn the nuances of the tool, how to dig and where to find the most relevant information.  

    Other tools include IBM Tivoli for your product.  Again, the key here is the patience and persistence to properly size capacity to run the collection agents, to instrument, and to fine-tune the information you extract from the tools.

    The problem I see often with respect to tooling is the lack of patience to learn the tool as well as the product to which the tool is applied.  A wrench is a wrench, but have you ever seen the medical tools used in laparoscopic surgery?  Have you ever seen the instrumentation dashboard of an F-14?

    Flying IBM WebSphere technologies is not for a Cessna engineer.  You will need to call in the specialists.  They have the background, skills, patience and persistence required to solve complex problems on complex, integrated platform systems, and they are normally bored in a 9 to 5 flying sorties of cargo or surveillance duty (that is, watching the monitors and filling out paperwork).  Which leads us to the third principle.

    Principle 3 - Have the right People on the Problem

    Operations and development have to work more closely together for successful integration.  A great example is within the WebSphere Message Broker as an ESB solution product space.  The application-developed message flows are so tightly coupled with the product and OS that anything from a performance problem to a complete abend of the Broker is possible.

    From design to implementation and troubleshooting these two historically separated parties must come together.

    Principle 4 - The Hip Shot

    Shooting from the hip is appropriate at times and can certainly be mastered.  But all too often I see customers taking shortcuts, breaking the fundamental principles of design and architecture, and paying big penalties.  There are times to deviate from the norm and find a workaround, and in rare circumstances the hip shot works a whole lot better than the standard, run-of-the-mill, textbook-form jump shot.  The key here is rare.

    And, there was only one Pistol Pete Maravich....now Tim Tebow might just be the next legend of the sort....but these guys appear how often in a century?  

    So, if you want to take the hip shot approach for your shot...be sure you've got a Pistol or a Tebow on the team.

    That's all for now.  Four simple and seemingly obvious principles.  But four principles of which I find myself reminding myself and those whom I humbly serve and lead.




    Getting the tiny bit of bytes to Service Bliss: Designing High Availability Messaging for WebSphere Message Broker

    Monday, September 26, 2011, 9:48 AM

    Recently I have been working with a few different clients architecting and designing High Availability WMB solutions.  A few things have stood out to me that are worth sharing and considering.

     

    We need to define and agree upon what HA means to the stakeholders

    What does this thing do?

    What is more pertinent and relevant for messaging solution specialists is to know, understand and think in business terms and design the appropriate solution.  An engineer would not blueprint a bridge, a building or an airplane without a thorough, detailed and documented understanding of the purpose which is being served by the solution.  Tinkering with technology is the crazed favorite pastime of many a technology enthusiast.  It is more than worth the tradeoff to acquire the input of a business-driven technologist over a technology enthusiast.  This applies strongly in the case of designing HA messaging solutions, as most of these systems make up the core of business services delivery.

     

    What do you mean by that?

    Some of the best communication I recall is from a mentor who would often just simply say "What do you mean by that?".  Wow, how simple and yet how effective is that principle in communication.  Communication must be a strong suit for the person designing a messaging technology solution.  

    HA means something different to every one of my customers.  In some instances it simply means there is a way to recover from a failure and to apply maintenance with little to no downtime.  Then when we talk further they agree that the system can be down for eight hours and recover.  On the other end of the spectrum, with customers who have large, real-time messaging at the core of their business (which represents most of my clients), we start talking about 9's of availability.

    Most often this turns quickly into "Five 9's MUST be achieved."  Then I begin to explain that this really equates to Continuous Availability, further break down the definition in terms of downtime per year, per day and per week, and begin to discuss the architecture.

    Five 9's (99.999%) allows 5 minutes 15.36 seconds of downtime per year and 25.92 seconds per 30-day month.
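
    The downtime budget for any number of nines follows from multiplying the length of the period by the fraction of time the system is allowed to be unavailable.  The sketch below is only a worked version of that arithmetic; the function names are made up for illustration, and the 365-day year and 30-day month are assumptions.

```python
def allowed_downtime_seconds(availability_percent: float, period_days: float) -> float:
    """Seconds of downtime permitted in a period at a given availability."""
    period_seconds = period_days * 24 * 60 * 60
    unavailable_fraction = 1 - availability_percent / 100
    return period_seconds * unavailable_fraction


def format_duration(seconds: float) -> str:
    """Render a number of seconds as minutes and seconds."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)} min {secs:.2f} s"


# Assumed period lengths: 365-day year, 30-day month.
periods = {"year": 365, "month": 30, "week": 7, "day": 1}

for name, days in periods.items():
    budget = allowed_downtime_seconds(99.999, days)
    print(f"Five 9's per {name}: {format_duration(budget)}")
```

    Running the same function with 99.9 or 99.99 makes a useful comparison to put in front of stakeholders before the architecture discussion starts.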

    After this discussion, sometimes these requirements are reduced.  This is very important; no, it is critical to any discussion about availability.  This will drive the rest of the design.

     

    A chain is only as strong as its weakest...

    You know the trite but true.  Therefore a high availability messaging solution is only as highly available as its least available component!  The driving principle here is to think about every place across the wire, across the LAN, across the C-Sharp and the Worldwide Area Network that a message has to traverse in its short and simple life.  It has to stay alive long enough to be tested, translated and transformed, secured, packaged and delivered to its final destination: that place in messaging bliss where a message becomes a transaction that once lived but is now part of the bigger universe of business transactions that mean real money to real owners.  So if that little bit of bytes finds its way on to one component, one little NIC that went "nac" and no longer transmits a data pulse...guess what?  That little message just did not make it to heaven, no matter what kind of quad-engine turbo jet processor it was rocketed on.  The moral of the story?  Each component the message crosses must be available for the designated number of nines specified.  MQ clustering, WMB multi-instance, Red Hat Clustering, HACMP, Tivoli, Veritas Clustering, etc. provide the supporting infrastructure availability to meet the success criteria and satisfy the SLA, but not if there is one component which is not highly available and redundant.

    Stakeholders need to know and understand that messaging is one of many boxes and wires involved in delivering messages on to heavenly blissful passage where a message transforms into revenue.
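
    One way to put numbers on the weakest-link idea: if a message must cross every component in turn, and the components fail independently (an assumption that real systems only approximate), the end-to-end availability is the product of the individual availabilities.  It can therefore never be higher than that of the least available component.  The sketch below uses made-up component names and figures purely for illustration.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up.

    Assumes independent failures, so the probabilities multiply.
    """
    return prod(availabilities)


def redundant_availability(availability, copies):
    """Availability of several independent redundant copies of one component.

    The group is down only when every copy is down at the same time.
    """
    return 1 - (1 - availability) ** copies


# Hypothetical path of a message; none of these figures are product data.
chain = {
    "queue manager": 0.99999,
    "broker": 0.99999,
    "single network switch": 0.999,
}

end_to_end = serial_availability(chain.values())
print(f"End to end with one switch:  {end_to_end:.5%}")

# Making the weak component redundant lifts the whole chain.
with_redundant_switch = serial_availability(
    [0.99999, 0.99999, redundant_availability(0.999, 2)]
)
print(f"End to end with two switches: {with_redundant_switch:.5%}")
```

    With a single three-nines switch in the path, the chain as a whole falls below three nines, however available the messaging layer is.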

     

    Clear and Accurate Measurements

    Meters or Yards?

    Now when the stakes are high we simply do not calibrate on an ambiguous system of measurement!  No sir, this is why football is played with an agreed upon definition of success.  This is why the guys in stripes bring out the chains to measure the fractions of an inch that make the difference between success and failure on third and long.  So why would we not agree upon standards for measuring the availability of critical business services and of the underlying technologies which ensure the delivery of said services?  Does losing a transaction here and there REALLY matter?  If so, then how much?  How much does it cost to design a CA messaging solution versus mitigating those losses in some other manner?

    There is actually such a thing as availability math.  I am not sure if they are teaching this in C.S. classes, but I am sure they need to be.

     

    The rest of the Design Story

    Now that the required level of availability has been defined, we can begin to actually design a system that meets the criteria.  Once it is designed and costed, some customers will further relax their 9's and thereby reduce their costs.  They will also make intelligent and informed decisions about which transactions really need this level of availability versus what percentage can live on a more simplified architecture.

     

    Service is the name of the game

    At the end of the day we all need to remember it is about service.  Quality of service, assurance of service delivery and reliability of service.  These issues are the lifeblood of the companies that those of us who wear the messaging hat serve.  It's what getting that tiny bit of bytes across the wire to service bliss is all about.

    How to?

    Next we can look at the how-to with respect to the actual design and implementation of a WMB HA solution, including the available options.

     
