The Overlooked Key Skill For DevOps and Site Reliability Engineers

Team Teridion

I read an excellent article in CIO magazine about the role of a DevOps engineer versus that of a site reliability engineer. The opinion piece, SRE vs DevOps: What’s the difference?, compared and contrasted the functions of the two roles and how they each benefit an organization. What struck me more than anything is that these roles are very application- and infrastructure-centric. It’s all about code, processes, and operational efficiency and agility. Stated another way, they are often portrayed as software-centric roles.

And why shouldn’t they be?  In the world of cloud and SaaS, software is king, right?  Maybe. Maybe not. A better metaphor may be that software is a kingdom.  In order for the Software Kingdom to survive and prosper, it must maintain good relations with its neighbor, the Networking Kingdom.

The advent of Cloud and SaaS has had a massive impact on businesses of all sizes. Just five years ago you would have expected to see all business-critical applications running inside the typical enterprise’s data center. Today, though, most business-critical applications are running in the cloud. You won’t find a single server within the walls of my company’s corporate office, and we aren’t unusual.

 

The cloud is the new data center, so the Internet is the new LAN

The Internet is the new LAN. Think about that for a moment. For decades, enterprises have invested heavily in their networks to ensure reliable and fast access to applications and data. They have poured millions of dollars into building robust and highly manageable LAN, WAN, and data center infrastructure, all to make sure their users could get their jobs done efficiently. By moving applications to the cloud, they are throwing the hot potato of responsibility for performance, visibility, and control at a variety of SaaS providers, Internet providers, and the Internet as a whole.

But are they prepared to catch it? SaaS providers, of course, have a vested interest in assuring the good performance of their applications for their customers. They employ CDNs to accelerate static content from their websites. They employ minification and code optimization to make their web apps as efficient as possible. They shard their applications across multiple points of presence to try to improve performance by moving applications closer to their users worldwide. They compete for the best DevOps and site reliability engineer talent.

This isn’t enough, though. None of it addresses the core performance issues presented by the Internet. For the same reason enterprises have always updated and enhanced their LANs and WANs, they should be just as concerned today about Internet performance. If the enterprise is concerned about Internet performance, the SaaS vendor has to be as well. At that point, because they are responsible for delivering a turnkey service, the SaaS provider is the one that winds up with the performance hot potato scalding their hands. We all take for granted that the Internet will just work. The reality is that it’s a miracle it works at all.

The Internet is designed for resiliency, not for performance. The routing protocol of the Internet, BGP, does not factor in the performance of links or autonomous systems when it selects a path. Subtle yet impactful design flaws in BGP allow for routing loops that lead to packet loss. Least-cost routing and low-cost interconnects provide no QoS guarantees. Worse yet, these issues all occur in the middle mile, the heart of the Internet, inside the networks of carriers, ISPs, and cloud providers, away from last-mile connections and links. This means neither SaaS vendors nor enterprises have a toll-free number to call and complain to.
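To make that point concrete, here is a minimal sketch in Python. It is not how BGP is actually implemented: real BGP evaluates local preference and several other attributes before and after AS path length. The route names, AS numbers (taken from the range reserved for documentation), and latency and loss figures are all made up for illustration. The only point is that the selection rule never looks at measured performance.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    """A hypothetical candidate path to the same destination."""
    name: str
    as_path: Tuple[int, ...]  # autonomous systems the path crosses (made-up numbers)
    latency_ms: float         # measured latency; BGP never sees this
    loss_pct: float           # measured packet loss; BGP never sees this either


def bgp_like_choice(routes: List[Route]) -> Route:
    """Greatly simplified BGP-style selection: the shortest AS path wins.

    Performance fields are deliberately ignored, mirroring the fact that
    BGP has no notion of latency or loss.
    """
    return min(routes, key=lambda r: len(r.as_path))


def performance_choice(routes: List[Route]) -> Route:
    """What a performance-aware selector might prefer: lowest loss, then lowest latency."""
    return min(routes, key=lambda r: (r.loss_pct, r.latency_ms))


# Illustrative data only; none of these numbers come from a real network.
ROUTES = [
    Route("short-but-congested", (64496, 64497), latency_ms=180.0, loss_pct=2.0),
    Route("long-but-clean", (64498, 64499, 64500), latency_ms=70.0, loss_pct=0.0),
]

if __name__ == "__main__":
    print("BGP-like pick:         ", bgp_like_choice(ROUTES).name)
    print("Performance-aware pick:", performance_choice(ROUTES).name)
```

In this toy data set the two selectors disagree: the shorter path is picked by the BGP-like rule even though the longer one is faster and loses no packets.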

And so the hot potato lands squarely in the lap of the SaaS provider, and more specifically a DevOps team or site reliability engineer at that SaaS provider. These teams must be networking literate because there are networking-oriented solutions to address the challenge of Internet performance.

A great start is diving into the excellent book High Performance Browser Networking by Ilya Grigorik.  It’s a great primer on TCP, UDP, and HTTPS, and how Internet performance is adversely affected by these protocols.  This book provides insights into how networking impacts web applications, and while it doesn’t directly address how to deal with Internet performance issues, it builds the foundational knowledge required to dig deeper into Internet and SaaS performance optimization.
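As a small taste of the kind of reasoning this foundational knowledge enables, here is a sketch of one well-known relationship: a single TCP connection cannot move more than one window of data per round trip, so round-trip time alone caps throughput. The function name and the example window and RTT values are my own choices for illustration, and the calculation ignores packet loss, slow start, and window scaling.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-connection TCP throughput in megabits per second.

    At most one window of data can be in flight per round trip, so
    throughput <= window / RTT. Loss and slow start only lower this.
    """
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_round_trip = window_bytes * 8
    round_trips_per_second = 1000.0 / rtt_ms
    return bits_per_round_trip * round_trips_per_second / 1_000_000


if __name__ == "__main__":
    window = 65_535  # classic maximum window without window scaling, in bytes
    for rtt in (20.0, 100.0, 250.0):  # illustrative round-trip times in milliseconds
        print(f"RTT {rtt:>5.0f} ms -> at most {max_tcp_throughput_mbps(window, rtt):.2f} Mbps")
```

The takeaway is that the same application, on the same servers, gets a lower ceiling the farther away the user is, no matter how much bandwidth either end has purchased.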

The Perils of Ignorance

In parallel to widespread deployment of SaaS applications, the enterprise is revamping its WAN.  Both transitions are hallmarks of most enterprise digital transformation initiatives. SD-WAN is the hot new trend in the enterprise WAN, and some vendors are making bold claims about how SD-WAN can improve Internet performance for SaaS applications.  Cloud platform vendors offer direct connection solutions, and transit providers will claim bold performance numbers.

A site reliability engineer or DevOps exec should have a solid networking skill set to dig into the vendor claims and discern what is real from what is marketecture.  Different SaaS applications have different networking footprints and varying degrees of Internet performance requirements.

Hire Well

Networking is a skill set that needs to be included in any well-rounded DevOps or site reliability engineering team. As SaaS applications become business critical for enterprise customers, enterprises will look for a throat to choke when performance issues start impacting their user productivity. Internal IT teams will guarantee performance within the four walls of the enterprise, but the unfortunate reality is that SaaS vendors will bear the responsibility for Internet performance as it relates to their applications.

What we do at Teridion is bring performance, reliability, visibility and control to the Internet backbone connection between the SaaS provider and the enterprise. We cool off the hot potato that’s landed in the lap of the site reliability engineer by deploying thousands of sensors across the public cloud. Then we use machine learning to take the real-time information about the Internet that we receive from those sensors to predictively, dynamically route traffic from provider to user through the highest-performing path, no matter where in the world the user is. We’re a turnkey service, so we’re DevOps friendly, and in fact can often reduce or eliminate the need to shard the application altogether, which is even friendlier. If you’re a SaaS provider, or an enterprise that needs better worldwide performance from your SaaS apps, you should take a closer look at us. A good place to start is our in-depth whitepaper on the performance problems of the Internet Backbone, or dive into how SaaS performance can impact your ability to retain customers. If you’re ready to try us out, it’s really simple to sign up for a free trial.
