Webserver Plugin configuration
How should I configure my web server plug-in for the best results?
Some users are unsure how the ServerIOTimeout setting affects the way the WebSphere web server plug-in handles failed requests. This document is intended to clear up that confusion.
There are competing factors that should be considered when determining the appropriate WebSphere web server plug-in settings for your environment. There is no single configuration that is correct for all environments. Please consider the following facts when selecting the best values for your environment.
– Connections consume resources on both the machine where the plug-in is installed and the machine where the application server is installed.
– Connections that do not terminate do not free these resources and can exhaust the resource pool.
– Time-out values set too low cause the client to experience unnecessary failures and increase the traffic load if the request is retried.
– Heavy traffic and intermediate issues can cause unexpected delays in responses.
– When an application response is not received within the ServerIOTimeout specification, you can decide whether to continue to send requests to that server. If you continue to send requests to that server, you risk additional time-outs and failures if there is a problem on the machine. If you decide to halt requests to that server, you decrease your site's capacity.
– Requests that contain a message body and typically can change the application state, such as a POST request, should not be retried unless the application is designed to accept multiple instances of the same request.
– Requests, typically GET and HEAD requests, that do not contain a message body are automatically retried by the plug-in when failures occur and this functionality cannot be disabled. If you configure the plug-in to stop using a server if time out issues occur, you must adjust other settings to ensure a single request never disables your entire site.
The WebSphere web server plug-in has several settings that should be examined when deciding how to best meet the needs of your clients.
The PostBufferSize property controls whether a request containing data, such as a POST request, will be retried to another server when failures and time-outs occur. By default, this property is set to 64K, which is the maximum content that can be buffered. If a request larger than this setting arrives, it will not be retried. Increasing the PostBufferSize property will consume more memory. As an alternative, you can disable the retry functionality of requests with content by setting the PostBufferSize property equal to 0, or you can set an unlimited buffer size by setting the PostBufferSize property to -1.
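The PostBufferSize rules above can be sketched as a small decision function. This is only an illustration of the prose, not plug-in code: the function name `body_request_retryable` is hypothetical, and it assumes the setting is expressed in kilobytes (so the default of 64 means 64K) and that a body exactly equal to the buffer size still fits.

```python
def body_request_retryable(content_length_bytes: int, post_buffer_size_kb: int = 64) -> bool:
    """Return True if a request body could be buffered, and so retried.

    Hypothetical helper; assumes PostBufferSize is given in kilobytes.
    """
    if post_buffer_size_kb == 0:
        return False  # retry of requests with content is disabled
    if post_buffer_size_kb == -1:
        return True   # unlimited buffer size
    # Otherwise the body must fit into the configured buffer.
    return content_length_bytes <= post_buffer_size_kb * 1024


print(body_request_retryable(10_000))       # True: fits in the 64K default
print(body_request_retryable(100_000))      # False: larger than the buffer
print(body_request_retryable(100_000, -1))  # True: unlimited buffer
print(body_request_retryable(10, 0))        # False: retries disabled
```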
Requests that do not contain a body, such as GET requests, are automatically retried to another server whenever a failure or time-out occurs. This behavior cannot be disabled.
If a non-affinity request is retried, the plug-in will attempt to handle the request on each available server until a good response is received or each server has been tried. If an affinity request is retried and there is a positive ServerIOTimeout, affinity requests will be retried to the affinity appserver only. If there is a negative ServerIOTimeout, then the affinity server will be marked down and retries will be made to the remaining available servers.
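As a rough sketch of the retry rules in the preceding paragraph, the following function lists the servers a request might be retried on after a response time-out. The function name `retry_targets` and its parameters are hypothetical, and the model is deliberately simplified: it considers a single time-out on one server and ignores servers that are already marked down for other reasons.

```python
def retry_targets(servers, timed_out_server, is_affinity, server_io_timeout):
    """Return the servers a timed-out request may be retried on.

    Hypothetical model of the prose above, not the plug-in's own logic.
    """
    if server_io_timeout == 0:
        # No time limit: a response time-out never occurs, so no retry.
        return []
    others = [s for s in servers if s != timed_out_server]
    if not is_affinity:
        # Non-affinity: try each remaining server until one responds.
        return others
    if server_io_timeout > 0:
        # Affinity, positive value: server stays up, retry goes back to it.
        return [timed_out_server]
    # Affinity, negative value: server is marked down, retry elsewhere.
    return others


cluster = ["server1", "server2", "server3"]
print(retry_targets(cluster, "server1", False, 60))  # ['server2', 'server3']
print(retry_targets(cluster, "server1", True, 60))   # ['server1']
print(retry_targets(cluster, "server1", True, -60))  # ['server2', 'server3']
```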
The ServerIOTimeout property is designed to allow requests to time out instead of waiting indefinitely for a response from the server. In Versions 6.1 and earlier, the ServerIOTimeout property defaults to 0, which indicates that there is no time limit for requests. This value is not recommended because, if a server never responds, the resources involved in the request might never be freed and the resource pool is eventually exhausted. In some cases, the operating system or adapter resources might break the connection because of inactivity, but this behavior should not be relied on. You will need to determine the appropriate behavior for your environment if you plan to set the ServerIOTimeout property to 0.
If the ServerIOTimeout property is set to 0, do not expect requests to be retried. A request is not retried until the previous connection is broken and, as explained previously, the connection might not get broken. If you need to use the retry capabilities of the plug-in, either set the ServerIOTimeout property to a non-zero value, use the operating system settings, or use some other non-plug-in mechanism to ensure that you can guarantee that a connection is broken at some predetermined time.
With Version 7.0, the default value of the ServerIOTimeout property is 60 seconds. This might not be the ideal setting for applications running intensive queries or other nontrivial functions. The ServerIOTimeout value will need to be adjusted if you expect requests to take longer than 60 seconds to be served.
The ServerIOTimeout setting is based on how long the server takes to handle a request, not on a particular URI or application. Therefore, when specifying a value for this property, you should allow for the slowest, longest request time, and then add a little more time to handle peak operation situations.
When a request fails because the length of time it took to get a response exceeded the value specified for the ServerIOTimeout property, the server is NOT marked as down if the ServerIOTimeout property is set to a value >= 0. This is important to keep in mind because it means that other requests will continue to be sent to this server. If you have affinity defined and this server is selected based on the request’s previous session data, the same server will be selected if the request is retried because the server will still be available and matches the incoming affinity requirements. The actual number of times that the plug-in will attempt the request to the same server depends on the number of servers defined in the cluster. If the server is not healthy, sending requests back to the same server is not likely to result in a good response and might exacerbate a potential performance problem.
Starting with Version 184.108.40.206, 220.127.116.11 and 18.104.22.168, if you do NOT want the same server selected for affinity requests that fail because of response time-outs or you would like to eliminate traffic from this server because it is not responding within your predefined response criteria, you should specify a negative value for the ServerIOTimeout property. When the value for the ServerIOTimeout property is negative, the plug-in marks the server down when a response time-out occurs. When the plug-in marks the server down, requests will not be sent to that server until the interval specified for the plug-in RetryInterval property expires. If there is only one server in the cluster, it is never marked down regardless of the plug-in properties.
Setting the ServerIOTimeout property to a negative value might have bad consequences if you set the time-out value too low. If you have a long running request that takes longer than the length of the time specified for the ServerIOTimeout property and that request is retried to other servers, the plug-in might mark every server in the cluster down as the request is retried to each server. Therefore, if you specify a negative value for the ServerIOTimeout property, you should ensure that the value specified for the RetryInterval property is within the range such that:
The lowest value for the range is 1. However, this guarantees the server only one second to try to recover, so in practice the minimum should be a value that gives the server a reasonable amount of time to recover.
The highest value for the range is 1 less than the result of multiplying the absolute value of the setting for the ServerIOTimeout property by one less than the number of servers in the cluster: ((absolute value of ServerIOTimeout) * (number of servers in cluster - 1)) - 1
For example, if you set the ServerIOTimeout property to -5, and you have 3 servers in the cluster, the value specified for the RetryInterval property should be between 1 and 9. Specifying such a value guarantees that all of the servers are never marked down because of an unexpected intensive request.
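The range calculation above can be expressed as a short helper. This is a minimal sketch of the formula as stated in the text; the function name `retry_interval_range` is hypothetical, and the error handling for inputs outside the scenario described (a non-negative ServerIOTimeout, or a cluster too small to yield a valid range) is an assumption added for illustration.

```python
def retry_interval_range(server_io_timeout: int, num_servers: int) -> tuple[int, int]:
    """Return (lowest, highest) recommended RetryInterval in seconds.

    Hypothetical helper implementing the formula from the text:
    highest = abs(ServerIOTimeout) * (number of servers - 1) - 1
    """
    if server_io_timeout >= 0:
        raise ValueError("the range applies only to a negative ServerIOTimeout")
    highest = abs(server_io_timeout) * (num_servers - 1) - 1
    if highest < 1:
        raise ValueError("no RetryInterval satisfies the range for these settings")
    return 1, highest


print(retry_interval_range(-5, 3))  # (1, 9), matching the example above
```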
If your applications are designed such that a single request can have a significant impact on the application server, such as an extensive query that might cause data locking until the query is complete, you should use the ServerIOTimeoutRetry property to prevent retries. By default, this value is set to -1, which means that all the members of the cluster are retried. If this value is set to 0, no retries occur once data has been sent to the server.
It is recommended to set the ServerIOTimeoutRetry value such that, when multiplied by the ServerIOTimeout value, the product is the maximum time a client is expected to wait for a response. For example, if users are willing to wait up to 5 minutes for a response and your ServerIOTimeout value is 60 seconds, you would want to set your ServerIOTimeoutRetry value to 5 or less (assuming there are at least 5 members in the cluster).
The ServerIOTimeoutRetry property was introduced with APAR PM70559.
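The arithmetic behind this recommendation can be sketched as follows. The function name `max_server_io_timeout_retry` is hypothetical, and capping the result at the cluster size is an assumption drawn from the parenthetical remark about the number of cluster members.

```python
def max_server_io_timeout_retry(max_client_wait_s: int,
                                server_io_timeout_s: int,
                                cluster_size: int) -> int:
    """Return the largest ServerIOTimeoutRetry value that keeps
    ServerIOTimeoutRetry * ServerIOTimeout within the client's wait budget.

    Hypothetical helper; the cap at cluster_size is an assumption.
    """
    timeout = abs(server_io_timeout_s)
    if timeout == 0:
        raise ValueError("ServerIOTimeout of 0 means no time limit")
    by_time = max_client_wait_s // timeout
    return min(by_time, cluster_size)


# Users wait up to 5 minutes, ServerIOTimeout is 60 seconds, 5 cluster members.
print(max_server_io_timeout_retry(5 * 60, 60, 5))  # 5
```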
The following example illustrates a non-recommended configuration and the potential problem that could arise:
There are 3 servers in the cluster.
The RetryInterval property is set to the default value of 60 seconds.
The ServerIOTimeout property is set to -5 seconds.
A request is made that does not get a response until after 10 seconds elapse.
Assume t is the time the original request is received.
The plug-in sends the request to server 1 and waits 5 seconds for a response. No response is received so server1 is marked down at the time the request was received plus 5 seconds (t + 5).
The request is now sent to server 2. It also fails to receive a response within 5 seconds, so the plug-in marks server 2 down at the time the original request was received plus 10 seconds (t + 10).
The request is now sent to server 3 and it fails to get a response so it is marked down at the time the original request was received plus 15 seconds (t + 15).
Because all of the servers are now marked down, all requests will fail until server 1 is retried. Server 1 will be retried at the time the original request was received plus 5 seconds (ServerIOTimeout) plus 60 seconds, which is the RetryInterval value (t + 5 + 60). Server 2 will be marked as up 5 seconds later (t + 10 + 60), and server 3 will be marked as up 10 seconds after server 1 is marked as up (t + 15 + 60). For 50 seconds, all of the servers are marked as down and all requests fail: (t + 5 + 60) - (t + 15) = 50, which is the time server 1 is marked up minus the time server 3 is marked down.
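The timeline in this example can be reproduced with a short calculation. This is a simplified sketch, not plug-in behavior: the function name `mark_down_timeline` is hypothetical, times are seconds relative to t, and it assumes a single long-running request that times out on every server in turn.

```python
def mark_down_timeline(num_servers: int, server_io_timeout: int, retry_interval: int):
    """Return (down_times, up_times, all_down_seconds) relative to t.

    Hypothetical model: one request times out on each server in sequence.
    """
    timeout = abs(server_io_timeout)
    # Server i is marked down after i consecutive time-outs.
    down_times = [timeout * (i + 1) for i in range(num_servers)]
    # Each server is marked up again RetryInterval seconds after going down.
    up_times = [d + retry_interval for d in down_times]
    # All servers are down from the last mark-down until server 1 comes back.
    all_down_seconds = max(0, up_times[0] - down_times[-1])
    return down_times, up_times, all_down_seconds


down, up, outage = mark_down_timeline(3, -5, 60)
print(down)    # [5, 10, 15]
print(up)      # [65, 70, 75]
print(outage)  # 50
```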
The following example illustrates the recommended configuration for the preceding scenario:
There are 3 servers in the cluster.
The RetryInterval property should be set within the range of 1 to 9.
((number of servers - 1) * (absolute value of ServerIOTimeout)) - 1 = ((3 - 1) * 5) - 1 = 10 - 1 = 9
The ServerIOTimeout property is set to -5 seconds. NOTE: This value is only being used as part of this example. It is not meant to imply that you should specify a timeout value of -5 seconds in all situations. For example, this value is not appropriate for the ServerIOTimeout property if you know that some responses will take 10 seconds.
A request is made that does not get a response before the ServerIOTimeout expires on server 1.
Assume t is the time the original request is received.
The plug-in sends the request to server 1 and waits 5 seconds for a response. No response is received so server 1 is marked down at the time the request was received plus 5 seconds (t + 5).
The request is now sent to server 2. If the RetryInterval property is set to the minimum recommended value of 1, server 1 will be marked up 1 second after server 2 has received the request (t + ServerIOTimeout + RetryInterval = t + 5 + 1 = t + 6). Server 2 does not get a response and is marked down at the time the original request was received plus 10 seconds (t + ServerIOTimeout to server 1 + ServerIOTimeout to server 2 = t + 10).
The request is sent to server 3. If the RetryInterval property is set to the maximum recommended value of 9, server 1 will be marked up (t + ServerIOTimeout + RetryInterval = t + 5 + 9 = t + 14) 1 second before server 3 is marked down because it fails to receive a response within the ServerIOTimeout property value (t + ServerIOTimeout to server 1 + ServerIOTimeout to server 2 + ServerIOTimeout to server 3 = t + 5 + 5 + 5 = t + 15).
There will never be a time where all servers are marked down because of an unexpected long running request.
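The reasoning behind this conclusion can be checked numerically: server 1 must be marked up strictly before the last server is marked down. The sketch below tests that condition for each RetryInterval value in the recommended range; the function name `server1_up_before_last_down` is hypothetical, and times are seconds relative to t.

```python
def server1_up_before_last_down(num_servers: int, server_io_timeout: int,
                                retry_interval: int) -> bool:
    """Return True if server 1 is marked up before the last server goes down.

    Hypothetical check for a single request that times out on every server.
    """
    timeout = abs(server_io_timeout)
    server1_up = timeout + retry_interval    # t + 5 + RetryInterval
    last_marked_down = num_servers * timeout  # t + 15 for 3 servers
    return server1_up < last_marked_down


# Every value in the recommended range 1 to 9 keeps at least one server up.
print(all(server1_up_before_last_down(3, -5, r) for r in range(1, 10)))  # True
# One second beyond the range, server 1 comes back only as server 3 goes down.
print(server1_up_before_last_down(3, -5, 10))  # False
```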