As much as we’d love to believe that websites can manage themselves through periods of high traffic, a sudden surge in visitors can compromise a site’s stability.
Here are three tips for keeping websites running smoothly through those testing times, so you can head off glitches and potential issues before users notice them.
#1: Test the impact of changes to your website from the end user’s perspective before the big event
• Make sure you have alerts enabled on all related devices first, then tune them down as you learn which events typically precede major problems. Establish a baseline of normal performance so you can tell when a change to your website has affected performance.
• Monitor your app from multiple locations. If it’s not up everywhere, it’s not really “up.”
• Monitor the individual steps of key transactions to pinpoint where problems occur (for example, DNS lookup time versus page load time).
Your web server may be sitting at 10 percent CPU during 30-second page loads because it’s waiting on the database to complete a login or shopping cart step. Without step-level detail, you might wrongly assume the web server is the problem.
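As a rough illustration of the baseline and step-level advice above, the Python sketch below times each step of a synthetic transaction and flags any step that runs well above its own historical baseline. The step names, the sample history, and the mean-plus-three-standard-deviations threshold are assumptions chosen for illustration, not settings from any particular monitoring product.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple

Baseline = Dict[str, Tuple[float, float]]  # step name -> (mean seconds, stdev seconds)


def time_steps(steps: Dict[str, Callable[[], None]]) -> Dict[str, float]:
    """Run each transaction step in order and record how long it took, in seconds."""
    timings: Dict[str, float] = {}
    for name, step in steps.items():
        start = time.perf_counter()
        step()
        timings[name] = time.perf_counter() - start
    return timings


def build_baseline(history: List[Dict[str, float]]) -> Baseline:
    """Summarize past runs under normal load as a per-step (mean, stdev) pair.

    Needs at least two runs, because a sample standard deviation is undefined for one.
    """
    baseline: Baseline = {}
    for name in history[0]:
        samples = [run[name] for run in history]
        baseline[name] = (statistics.mean(samples), statistics.stdev(samples))
    return baseline


def slow_steps(timings: Dict[str, float], baseline: Baseline, k: float = 3.0) -> List[str]:
    """Return the steps whose latest duration exceeds mean + k * stdev for that step."""
    flagged = []
    for name, seconds in timings.items():
        mean, stdev = baseline[name]
        if seconds > mean + k * stdev:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical history: per-step seconds from earlier runs under normal traffic.
    history = [
        {"dns": 0.020, "login": 0.15, "cart": 0.010},
        {"dns": 0.025, "login": 0.16, "cart": 0.012},
        {"dns": 0.022, "login": 0.14, "cart": 0.011},
    ]
    # Hypothetical stand-in steps; a real check would resolve DNS, log in, load the cart.
    steps = {
        "dns": lambda: None,
        "login": lambda: None,
        "cart": lambda: time.sleep(0.2),  # simulate the database stalling on the cart step
    }
    print(slow_steps(time_steps(steps), build_baseline(history)))
```

Comparing each step against its own history, rather than using one threshold for the whole transaction, is what lets a slow database step stand out even when the web tier looks idle.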
#2: Ensure adequate capacity for an influx of user requests
• Virtualize everything you can. Adding physical servers to a colocation (colo) rack is not a realistic option during an unexpected spike; cloning VMs, however, is relatively easy.
• Use your monitoring system’s historical data capture and visualization features to plan capacity around 95th percentile load, taking daily, weekly and seasonal trends into account. As mentioned in tip #1, do this for each tier of the application (web server, application server, database server, etc.).
• Don’t believe everything the teams that drive traffic to your site forecast, but do go talk to them. Does anyone really like talking to the marketing team? No. Can they tip you off to an extraordinary promotion that’s likely to overwhelm your servers? Yes.
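To make 95th percentile planning concrete, here is a minimal Python sketch that takes historical load samples for one tier, computes the 95th percentile with the nearest-rank method, and estimates how many servers (or cloned VMs) that tier would need. The sample values, per-server capacities, 20 percent headroom, and function names are hypothetical; the calculation is meant to be run separately for each tier, since the database can saturate long before the web servers do.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be greater than 0 and at most 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


def servers_needed(samples: Sequence[float], capacity_per_server: float,
                   headroom: float = 0.2, pct: float = 95.0) -> int:
    """Estimate servers (or VM clones) for one tier: pct-th percentile load plus headroom."""
    planned_load = percentile_nearest_rank(samples, pct) * (1 + headroom)
    return max(1, math.ceil(planned_load / capacity_per_server))


if __name__ == "__main__":
    # Hypothetical hourly peak requests/second for two tiers over part of a busy week.
    web_tier = [420, 510, 480, 900, 610, 575, 530, 495, 880, 640]
    db_tier = [95, 120, 110, 260, 150, 140, 130, 115, 240, 155]
    # Hypothetical per-server capacities, as you might measure them in your own load tests.
    print("web servers:", servers_needed(web_tier, capacity_per_server=250))
    print("db servers:", servers_needed(db_tier, capacity_per_server=100))
```

With only ten samples the 95th percentile is simply the maximum; real history covering daily, weekly and seasonal cycles would contain many more points, which is where the percentile starts to filter out rare outliers.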
#3: Monitor availability and performance of the supporting infrastructure
• Monitor everything you can, even adjacent components on the periphery of your application. Your app often shares resources with other applications, and you shouldn’t assume those applications play well with others. Monitor shared storage, virtualization infrastructure, databases, rack, core and firewall networking components, and the WAN links to the outside world.
• Be prepared to push your IT or hosting provider for the details that force real troubleshooting (“{component x} is maxed out” is not an acceptable answer). For bandwidth: what’s the traffic mix? Is my app really filling the pipe, or is nonessential traffic in the way? For hardware resources: what are the details of service restarts? Where can I examine detailed event logs? If something isn’t tracked, why not?
• Ensure you have a communication plan for maintenance windows, and understand how the components of your application depend on one another. How many times has a seemingly minor, unrelated system update unexpectedly affected your production application? Group logically connected components in your monitoring solution’s maps and reports.
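Mapping those interdependencies can start as something as simple as a dependency graph. The Python sketch below records which components each component relies on and walks the graph in reverse to list everything a planned change could affect, which is the question a maintenance window notice needs to answer. The component names and the `affected_by` helper are invented for illustration; in practice the map would come from your monitoring tool’s inventory or your own documentation.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical map: each component -> the components it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "storefront-app": ["web-tier", "app-tier"],
    "web-tier": ["vm-cluster", "core-switch"],
    "app-tier": ["vm-cluster", "orders-db"],
    "orders-db": ["shared-san"],
    "reporting-app": ["reporting-db"],
    "reporting-db": ["shared-san"],
    "vm-cluster": ["shared-san", "core-switch"],
}


def affected_by(component: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every component that directly or indirectly depends on `component`."""
    # Invert the map so we can walk from a dependency up to its dependents.
    dependents: Dict[str, List[str]] = {}
    for comp, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(comp)

    # Breadth-first search over the inverted map, visiting each dependent once.
    impacted: Set[str] = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    # Don't report the component as affecting itself if the map contains a cycle.
    impacted.discard(component)
    return impacted


if __name__ == "__main__":
    # A "minor" storage update for the reporting system also touches the storefront.
    print(sorted(affected_by("shared-san", DEPENDS_ON)))
```

Keeping the map in one place means the same data can drive both the maintenance notice recipient list and the component groupings in your monitoring dashboards.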
This article was contributed by Jennifer Kuvlesky, Systems Management Product Marketing Manager for SolarWinds