Getting a Handle on IT Complexity: Succeeding in a Performance-Obsessed Culture


Google, Facebook and Apple have collectively created an insatiable consumer desire for exceptionally high-performing (fast, reliable) digital experiences on websites, mobile sites, and mobile apps. This past February, Google launched its Accelerated Mobile Pages (AMP) project, designed to make news publishers’ mobile sites load up to 85 percent faster than standard mobile web pages (Facebook Instant Articles and Apple News were similar initiatives). Google AMP has quickly gained steam, with non-publishing companies following suit and adopting the standard. WordPress reports that it has AMP’d up tens of millions of websites, while eBay notes that 15 million product category pages are currently AMP’d.


These tech giants have made web downloads virtually instantaneous and consumer expectations have adjusted accordingly. Negative attitudes towards poor online performance were reflected in a recent Harris poll of frequent online shoppers; more than 75 percent of respondents noted that if they encounter a poorly performing website or mobile site, they will not attempt to shop there again. For the first time, it also became clear that the impact of poor online performance can be felt far beyond the digital realm, with more than half of respondents saying a poor online experience would dissuade them from going to that same retailer’s brick-and-mortar store.


Companies outside of the Facebook/Apple/Google echelon are enlisting help and building elaborate ecosystems to achieve the seemingly impossible – offering feature-rich websites combined with exemplary speed and reliability. The challenge is that every external third party they enlist in this effort (e.g. feature tags, cloud service providers, and content delivery networks) introduces more complexity and another potential point of failure, and makes it much harder to identify and fix the source of performance problems when they occur. This can lead to costly delays that hurt customer satisfaction and revenues. For example:


External Third-Party Feature Tags

Third-party tags are a great way to add needed functionality and nice-to-have features to a site, quickly and cost-effectively. Companies use them for various purposes, from marketing analytics to social media plug-ins to video streaming. However, these services are often the source of performance problems that drag down an entire page – especially during peak traffic periods, when the sites they serve are under heavy load.

During the recent Black Friday/Cyber Monday holiday ecommerce period, we saw several examples of misbehaving third-party services causing problems for sites incorporating them. On Black Friday, a Pinterest problem caused issues for several leading retail ecommerce sites, while Williams-Sonoma had ongoing performance issues as a result of problems with Photorank, a customer photo display service that integrates within a retailer’s website and is designed to help drive conversions. Both the retailer’s desktop and mobile sites had load time spikes around 25 seconds – slow by any standards.

Walmart experienced several issues throughout the long weekend, including spotty desktop web performance starting in Phoenix then extending to New York City and Denver, all resulting directly from an issue with a Rubicon Project tag. The desktop sites of Lenovo, QVC and Newegg also experienced issues directly attributable to third-party tags on Cyber Monday, with Lenovo’s webpage load time spiking to more than 18 seconds in the middle of the day.

When it comes to third-party tags, companies must remember that less is often more. We’ve been trained to think that more functionality is better, but this is not the case, especially when a third-party service is not essential to your site. What good is it to have the latest social media plug-in if it is going to slow your site way down and ultimately drive users away? This is especially true in mobile. Companies erroneously believe that mobile users want the same feature-richness that characterizes desktop web experiences. They don’t – more often than not they just want to get in and out of a mobile site as quickly and conveniently as possible.


Cloud Service Providers

The cloud can be an invaluable resource, providing instant scalability to businesses. But no cloud service provider, however reputable, is immune to completely unexpected events or capricious acts of nature. Consider the Dyn DDoS attack from October – while this brought down many websites, it also heavily impacted businesses relying on SaaS-based applications.


Even in the absence of a major outage or event, one drawback of the cloud is that it tends to give the businesses relying on it limited visibility into the overall health of the provider’s infrastructure operations. Contrary to popular belief, the cloud is not an infinite resource, and if one or more of your neighbors in the cloud experiences a traffic spike during a peak period, there’s no guarantee that won’t impact the speed and availability of your services to end users.


Any organization that uses the cloud – either for infrastructure support, or SaaS-based applications – must monitor the performance of their cloud service providers themselves, from the closest possible vantage points to the cloud datacenters delivering the services. Even if these measurements show that a cloud service provider is delivering fast, reliable performance, this is not a guarantee that customers are having strong experiences, because numerous other performance-impacting factors (e.g. CDNs, ISPs, and browsers) can influence performance. But at a bare minimum, measurements taken at the point of delivery (the cloud infrastructure) can provide a clean, pure view of cloud service provider performance, and help determine whether or not the cloud is the source of end user performance problems.
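As a rough illustration of that reasoning, the sketch below compares response times measured next to the cloud datacenter with response times measured from end-user locations. The function name, the sample data and both thresholds are assumptions made up for this example; real budgets would come from your own baselines.

```python
from statistics import median


def locate_slowdown(origin_ms, end_user_ms,
                    origin_budget_ms=500.0, end_user_budget_ms=3000.0):
    """Guess where a slowdown most likely originates.

    origin_ms:   response times (ms) measured close to the cloud datacenter.
    end_user_ms: response times (ms) measured from end-user locations.
    Both budgets are hypothetical thresholds, not recommended values.
    """
    origin_median = median(origin_ms)
    end_user_median = median(end_user_ms)

    # If the cloud is slow at the point of delivery, it is the prime suspect.
    if origin_median > origin_budget_ms:
        return "cloud provider"
    # The cloud looks clean, yet users are suffering: look further downstream.
    if end_user_median > end_user_budget_ms:
        return "downstream (CDN, ISP, or browser)"
    return "healthy"


if __name__ == "__main__":
    print(locate_slowdown([120, 130, 110], [4000, 4200, 3900]))
```

A clean reading at the datacenter does not prove that users are happy, which is why the second check still looks at end-user measurements before reporting a healthy state.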


If the Dyn outage showed us anything, it is the importance of having contingency plans in place whenever you are using a multi-tenant service provider for any mission-critical service. Organizations that rely on only one cloud service provider are putting themselves at significant risk.


Content Delivery Networks (CDNs)

CDNs are an excellent option for companies that want to deliver strong end-user performance in more remote regions of the world but lack datacenters there. End-user performance tends to degrade the farther customers are from the datacenter. By getting content closer to geographically distributed end users, CDNs can overcome this distance and dramatically speed content delivery.


However, like the cloud, even CDNs with a history of excellent performance can present a performance risk. Major CDN outages are infrequent, but micro-outages – where sites become unavailable or slow way down within certain geographies for a shorter period of time – are more common. Peering issues between CDNs and ISPs are a frequent cause of micro-outages. In addition, within a specific region or country like the United States, one CDN can be much better on a specific ISP than another.


Because end-user experiences through CDNs can vary so much based on intermediary ISPs, measuring CDN performance requires a two-pronged strategy. Companies using these services should measure CDN performance from a vantage point close to the CDN’s datacenters (as is recommended for cloud datacenters), to gain a true view of performance levels minus any ISP noise. But companies also need to keep close tabs on performance levels for end users in the various geographies served by a CDN. This gives organizations a chance to switch CDNs quickly if a particular CDN is not working well in a high-priority end-user geography.
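The end-user half of that strategy could be sketched roughly as follows. The geography names, CDN names, data layout and the 20 percent improvement threshold are all hypothetical, chosen only to illustrate the idea of flagging geographies where another CDN is clearly faster.

```python
from statistics import median


def cdns_to_switch(samples, current, min_gain=0.2):
    """Suggest geographies where switching CDNs looks worthwhile.

    samples:  {geography: {cdn_name: [page load times in ms]}}
    current:  {geography: cdn_name currently serving that geography}
    min_gain: assumed minimum fractional improvement that justifies a switch.
    Returns {geography: faster_cdn_name}.
    """
    switches = {}
    for geo, by_cdn in samples.items():
        # Median load time per CDN, skipping CDNs with no measurements.
        medians = {cdn: median(times) for cdn, times in by_cdn.items() if times}
        active = current.get(geo)
        if active not in medians:
            continue  # no data for the CDN in use, so nothing to compare
        best = min(medians, key=medians.get)
        if best != active and medians[best] <= medians[active] * (1 - min_gain):
            switches[geo] = best
    return switches


if __name__ == "__main__":
    samples = {
        "us-east": {"cdn_a": [800, 900, 850], "cdn_b": [400, 450, 420]},
        "eu-west": {"cdn_a": [300, 310, 320], "cdn_b": [305, 300, 330]},
    }
    current = {"us-east": "cdn_a", "eu-west": "cdn_a"}
    print(cdns_to_switch(samples, current))
```

Requiring a minimum gain keeps small, noisy differences between CDNs from triggering a switch; only a clear and sustained gap in a given geography should.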



External third-party tags, the cloud and CDNs can be very helpful to organizations looking to add features and enhance performance. But no matter how reputable a third-party resource may be, using it does not absolve you of the responsibility to provide a stellar digital experience for your customers.


This means organizations must thoroughly vet these service options before implementation, and carefully monitor them on an ongoing basis, especially during peak traffic periods. At any point in time you must be able to pinpoint how these external services are performing and understand how they affect the overall customer experience. Contingency plans need to be in place, and enacted when necessary.


This “layered” approach to performance management is critical for companies that want to leverage every opportunity to keep up with the “big guys,” while minimizing unnecessary risk exposure. Sure, the unexpected can always happen. But more often than not, performance-impacting factors are within our control, and customers’ willingness to hear and tolerate excuses is quickly evaporating.



By Mehdi Daoudi, CEO, Catchpoint Systems

This post is part of our contributor series. It is written and published independently of TNW.
