Building the SD-WAN Business Case (Nemertes Research)
SD-WAN’s transformational potential is not enough. IT teams have to build a compelling business case for making the transition. Savings on MPLS are not the only avenue by which SD-WAN can drive ROI. By providing cheaper, more transparent, and automatic failover when WAN links fail, SD-WAN can reduce branch WAN outages and troubleshooting costs by 90%.
By working with multiple vendors and enterprises, Nemertes has developed and validated an SD-WAN cost model that enables enterprise users to build that business case based on multiple key factors.
Join Principal Research Analyst and CIO John Burke of Nemertes Research as he presents his latest research findings. These include:
- The cost-model applied to three real SD-WAN use cases
- The process for identifying locations that could benefit from higher bandwidth, lower rates, increased reliability, or all three
- The cost components of connectivity, capex, troubleshooting, and more
- The comparison between Overlay SD-WAN and In-Net WAN
As research analyst and consultant, John draws on his past experience as a practitioner and director of IT to better understand the needs of IT executives and the challenges facing vendors trying to sell to them. His experience has given him broad and deep knowledge in computing, communications, and IT management.
Analyst Keynote- Building the SD-WAN Business Case (Nemertes Research)
Lloyd: Hello, everybody. Thank you for joining today’s FutureWAN session on Building the Better SD-WAN Business Case. So today we have John Burke, the CIO and Principal Research Analyst from Nemertes Research, who’s going to essentially walk through exponential analysis, ROI analysis, and build the business case of SD-WAN, and also essentially get into two or three use cases of SD-WAN. Now, before we start off, before I turn it over to John, there are two or three things I’d like to mention. One is that all the attachments from this session – the presentation, research paper – are available for download, and we will point you to those resources as we go through. But they’re also available as attachments and links in this session. Next, if you have any questions, feel free to keep asking them in the Q&A chat box that you have there, and we’ll prioritize and ask those questions towards the end.
And lastly we are live tweeting this session, so anybody that has a comment or wishes to share a comment or a screen grab from the session, please do so with #futurewan. And at the end of the day, we are giving an Amazon Echo to either the most active tweeter or the most interesting tweets. And with that I want to turn it over to John. John?
John: Thanks very much, Lloyd. Pleasure to be here and to get a chance to talk about the SD-WAN Business Case. It’s a great one. It’s one of those nice opportunities that comes up in technology where you get to talk faster and better and cheaper, all at the same time. I’ll start by introducing who Nemertes is, who I am, and where the data that we’re going to talk about comes from that’s threaded through the presentation. We’ll talk about the current dominant paradigm for Wide Area Networking and how it falls short for too many people. We’ll talk about how SD-WAN steps in to offer ways to change the dominant paradigm and do WAN in a different way and how SD-WAN is being adopted in the enterprise currently.
We’ll follow that up with both aspects of the business case for SD-WAN, talking about things that go beyond hard dollar savings, and also talking about the potential hard dollar savings and their order of magnitude, if you will. We’ll wrap that up with some recommendations. So Nemertes Research is a research and strategic consulting firm. And we are focused on analyzing the business impact that businesses experience when they deploy emerging technologies. So we’re focused on things that are relatively new in the marketplace, and we’re focused on seeing how they make it possible for the business to do things it hasn’t done before or become more efficient or more effective in doing the things it is already doing.
Our primary mode of research is to speak to IT practitioners who are actually deploying these technologies in their data centers on their networks, in their environments, and ask them a lot of really nosy questions about how they have deployed a technology, what road bumps they’ve hit along the way, and how they’ve gotten around those obstacles to realize value and to try and quantify that value. Myself, I started out as one of the guys on the other end of the phone line that Nemertes would ask the nosy questions of, and jumped ship and joined Nemertes around 2005. And since then I’ve been really focused on data center and Wide Area Networking issues.
And one of the things that we began to cover actually even before I came on board in 2005 was the enterprise WAN and its evolution, which in 2005 was well on the way to what it has become today which is a three-tiered architecture that uses MPLS links as the primary, the backbone connectivity that links data centers and branches to each other and branches directly to each other, as well. And what we have seen over the course of the ensuing 10 years or so is that in most cases a branch will have a single MPLS link going into it and some backup connectivity in the form of an Internet VPN, an IP VPN that is only activated when the MPLS link goes down. For smaller branches, for lower risk branches, or for branches that are in locations where it’s ineffective from a cost perspective to get MPLS connectivity in, they’ll only have that IP VPN in place, and typically without any kind of redundancy behind it.
So if the Internet link goes down, they have no link back to the enterprise WAN. So with most branches in this situation, when there’s a connectivity issue and the MPLS link goes bad – either completely off or into some kind of brownout condition where performance is significantly degraded without ever actually stopping passing packets – you’re in the position where you need to fail connectivity over to that redundant link, that dark Internet link. And in most cases, people tell us when we talk to them on the phone that failover is not fast enough to maintain any traffic flows that are in progress. All the sessions that are going on across the WAN at that point die. VDI sessions, enterprise application sessions – really any kind of communications that are going on, including especially recently voice calls and video conferencing or web conferencing traffic – it all dies and has to be reinitiated.
So even when that failover connectivity is there, there’s a significant interruption of work whenever there’s a problem with connectivity. In some cases, that can last for several minutes if the failover is not completely automated. At the same time, this whole architecture is predicated on the idea that all of the network traffic in the branch wants to go back to the data center. It’s harkening back to a much older vision of how we provide services. And in the meantime, we’ve embraced SaaS to the tune of 99 percent of companies now using it for at least one application. We’ve embraced Platform as a Service offerings, things like Amazon Web Services and Force.com. We’ve embraced Infrastructure as a Service, Azure, Amazon, Google, you name it, we’re running servers out there and directing our users out to things that are happening there.
We’ve introduced IoT devices to the environment. We’ve continued to ramp up our support for end-users’ mobile devices, increasing the amount of traffic that’s going outward to the Internet, but we’re still pushing all of that traffic, even the stuff that’s bound out to the larger Internet back through our data centers. And so we’re ramping up the demands on our WAN links and we’re consuming that precious and typically very expensive MPLS bandwidth for pretty much everything that goes on. So that costliest form of connectivity forming the backbone of our systems, it’s also the connectivity that has the longest lead times. If you want to deploy services to a new location, you might have to ask 30 days in advance, 60 days in advance, 90, 120 days in advance in order to get services, MPLS services, to that new location and to begin taking advantage of them.
We’re still in that situation too where any kind of problem on the primary link equals an outage for our end-users, an interruption to their work, a decrease in the value that they’re able to provide to the clients or the company. And actually when service comes back on that primary link, you have to fail back. And that’s another interruption. That’s another event that kills all the sessions in progress and people have to restart what they’re doing. Or you as the IT staff have to do this after hours, which you know kind of sucks for you. And in the category of things that aren’t good about this whole scenario is the idea that you’ve got that failover bandwidth in place, but you’re only using it when there’s an emergency. You’re paying for it and not using it. That’s not ideal for anybody.
Endemic to these environments is the idea that every router in the network is being managed individually. And so we say, you know, routers are being treated like roses. They’re being pampered as individuals when the enterprise really needs them to be more like corn – something you have a crop of and you treat all the same. Because when every router is unique and has to be treated individually, it’s challenging to make changes quickly to the service profile, to the set of services that you want to provide, to the performance characteristics that you want to aim at specific services and to the security that you’ve got wrapped around any specific services. If you need to do things like optimization and prioritization of traffic, you’re adding more appliances in the office, the remote office typically.
And as noted, all of your traffic, even if it’s ultimately bound out to the larger Internet, your own infrastructure in Amazon or your trusted strategic partners like Office 365, it’s going back through your data center, chewing up bandwidth, coming and going and adding latency coming and going to the transactions that your users are relying on to get things done. So SD-WAN arose from this set of problems that people were experiencing with their WANs. It started with the idea that most of these problems could be solved as a group by taking a different approach to Wide Area Networking, starting with if the link is available in the branch, it should be used – used continuously, not just in the case of emergencies. So make use of all the links that you have available to carry all the traffic that you’ve got and provide for transparent and mutual failover. So if either link starts to misbehave, traffic will gracefully flow onto the well-behaved link.
Prioritization and stuff will make sure that things that need the performance priority get the performance priority. End-users won’t notice that there’s been perhaps a complete failure of one link or the other. They may notice a slowdown, but services will continue, sessions will stay up. And in many cases what we’re told is they don’t even notice that something has happened. A nice side effect here and one that becomes the linchpin of any business case is that you can bolster MPLS as the core of your network with Internet links, or even replace MPLS in some locations with Internet links. And instead of having an MPLS link and an Internet link in a given branch, have two Internet links from different providers to get the same kind of resilience.
So that’s the beginning. Use all the links that you have available and use them intelligently. Use them flexibly. The next stage is to not just use those links, but to pay attention to how performance is across each of those links and route traffic so that it gets the best performance or at least the level of performance that it needs. Take advantage of the fact that you’ve got multiple links to load balance across them, to shift low demand kinds of traffic like file transfers off of the MPLS link, reserving that for live phone calls or things that have a need for the higher degree of consistency and packet delivery, for example, that you get over the MPLS link. With that in mind, now break the link between architecture and connectivity. That is, take that whole pool of connectivity that’s available and overlay on that the WANs that you want and need. As many as you need, dedicated to whatever purposes you need.
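The performance-aware steering described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's algorithm: the `Link` and `TrafficClass` types, the `pick_link` function, and every threshold and measurement below are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Link:
    name: str
    latency_ms: float   # measured round-trip latency (hypothetical value)
    loss_pct: float     # measured packet loss in percent (hypothetical value)
    up: bool = True


@dataclass
class TrafficClass:
    name: str
    max_latency_ms: float       # worst latency this class tolerates
    max_loss_pct: float         # worst packet loss this class tolerates
    preferred: Sequence[str]    # link names, most preferred first


def pick_link(traffic: TrafficClass, links: Sequence[Link]) -> Optional[str]:
    """Return the name of the link this traffic class should use right now."""
    by_name = {link.name: link for link in links}
    # First choice: the most preferred link that is up and within thresholds.
    for name in traffic.preferred:
        link = by_name.get(name)
        if (link is not None and link.up
                and link.latency_ms <= traffic.max_latency_ms
                and link.loss_pct <= traffic.max_loss_pct):
            return link.name
    # No link meets the thresholds: degrade gracefully to the best live link.
    live = [link for link in links if link.up]
    if not live:
        return None
    return min(live, key=lambda l: (l.loss_pct, l.latency_ms)).name


voice = TrafficClass("voice", 150.0, 1.0, ("mpls", "internet"))
bulk = TrafficClass("file-transfer", 500.0, 2.0, ("internet", "mpls"))
links = [Link("mpls", 20.0, 0.0), Link("internet", 45.0, 0.5)]

print(pick_link(voice, links))   # voice stays on MPLS while it is healthy
print(pick_link(bulk, links))    # file transfers are kept off MPLS
links[0].loss_pct = 5.0          # simulate a brownout on the MPLS link
print(pick_link(voice, links))   # voice moves to the well-behaved link
```

Both links carry traffic all the time in this sketch, and a brownout simply changes which link a class lands on at the next decision. Real products also keep sessions alive during the move, which this example does not model.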
Overlay virtual WANs on the pool of bandwidth that’s available. And then mix and match the topologies of each of the virtual WANs to meet the needs of the services that you’re providing. If it’s a service that is going directly – or you want, anyway, to have go directly from the branch office out to a trusted SaaS provider – Office 365, Salesforce, whoever – you can set up a policy that says that traffic gets routed directly out to the Internet. If it’s traffic that is going back and forth amongst your employees and requires for compliance purposes that it goes through some kind of archiving appliances which you have in your data centers, you can define the virtual WAN for that traffic class to carry all the traffic back to the data center for archiving before conveying it out to the other end of the conversation.
You can mix and match the topologies based on performance requirements, security or compliance requirements, or any other factor that you need to take into account as you’re designing your ideal WAN. The point is though that you’re doing it all in one place. You’re defining policies at the center of the network that are then applied to the WAN as a whole. You get out of that prize rose model of managing routers in each location and instead manage the WAN. And those centralized policies, as I say, can be pegged to any of the characteristics that you need to control, whether it’s performance – which things get prioritized over which other things, or which kinds of traffic get additional performance enhancements or protections applied to them.
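As a rough illustration of defining policy once at the center and applying it to the WAN as a whole, the sketch below renders the same rule set for every site. It is a hedged example only: the `Policy` type, the traffic class names, the topology labels, and the site names are all hypothetical, and a real SD-WAN controller exposes its own policy model.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Policy:
    traffic_class: str
    topology: str    # "direct-internet", "via-datacenter", or "branch-to-branch"
    priority: int    # 1 is the highest priority


# One central definition (hypothetical classes and values).
CENTRAL_POLICIES = [
    Policy("trusted-saas", "direct-internet", 2),        # e.g. Office 365
    Policy("archived-messaging", "via-datacenter", 2),   # compliance archiving
    Policy("voice", "branch-to-branch", 1),
]


def render_site_config(site: str, policies: Sequence[Policy]) -> Dict:
    """Build the rule set one site receives from the central policy list."""
    ordered = sorted(policies, key=lambda p: p.priority)
    rules: Dict[str, Tuple[str, int]] = {
        p.traffic_class: (p.topology, p.priority) for p in ordered
    }
    return {"site": site, "rules": rules}


def push_to_all(sites: Sequence[str], policies: Sequence[Policy]) -> List[Dict]:
    """Every site gets the same rules; no router is configured by hand."""
    return [render_site_config(site, policies) for site in sites]


configs = push_to_all(["branch-001", "branch-002", "branch-003"], CENTRAL_POLICIES)
for config in configs:
    print(config["site"], config["rules"])
```

Adding a new application in this model means appending one `Policy` entry and pushing again, rather than touching each branch device separately.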
Do they get passed through some kind of WAN optimization appliance, if you still have one in place? Do they get forward error correction applied to them so that there’s never a need for a packet to be retransmitted? Do they get some kind of compression applied to them, as well as prioritization? All of these become possible in a centralized and policy-driven environment much more flexibly and easily. If you want to add a new application which has its own specific performance profile requirements, it’s easy to do that without disturbing what’s going on for the rest of the existing services already and without having to worry that individual devices out there will be misconfigured as you roll out this new service. Likewise on the security front – you get past the worry, continual worry for a lot of organizations, that one or more of the routers out in their environments is misconfigured, and this may leave them open to attacks that they, you know, really need to be protected against.
It also gets you out of the position that I would say the majority of the WANs that we have heard about in our research are in, which is that routers out in remote locations are not up to date on their operating systems, they’re not fully patched for security or function, and tend to be anywhere from a year to six years out of date on the operating systems that they’re running because, as prize roses, people don’t want to mess with them continually. They want to get them working and then nurture them in place and protect them from disruption. When you get into a centralized environment like this, in an SD-WAN environment, those kinds of updates become trivial, they become non-interrupting events – you don’t even have to turn the darn things off, in most cases; you can update in function and continue to offer services while taking advantage of the latest versions of operating systems to get the highest level of security patching and decrease your risk envelope as a result.
With all of these potential benefits in mind, it’s no surprise then that we’re seeing fairly rapid uptake on SD-WAN in the enterprise. Although the technology as a class is only a few years old, we’re already seeing almost 20 percent of the organizations that we benchmarked deploying it in their environments, in production. If they all followed through on their plans – we collected this data in the spring. If they all followed through on their plans, then another nine percent were planning on deploying yet during 2016, so the actual number should be north of 25 percent now. If you look just at the folks in the research who are among the most successful companies, they had the most successful level of maturity with their cloud and data center operations, adoption there was about twice as high as among the overall population of companies participating. And they saw some very significant reductions in trouble tickets, very significant reductions in outages in the locations in which they had deployed the solution, and we’ll talk about those a little more in a moment.
Because they go into building the business case just as much as cost savings do. And certainly any business case you’re going to build around deploying this technology is going to have to lead with some kind of cost-based argument. It might be very short-term, it might be, you know, over three or five years, but the lower costs for connectivity and really the lower costs for management and troubleshooting have to be, you know, the front foot in the argument. They’re not the only thing, though. You’re going to lean on other values, as well. You’re going to lean especially on improved agility – the ability of the business to change what it’s doing with its Wide Area Network more quickly and more easily. If the network staff is reluctant to add new services to the WAN because, you know, you’re jostling the buzz saw while it’s cutting wood, this gets you past that.
It makes it easier to introduce new services non-disruptively, and that has value. Moreover, because you can take essentially any kind of connectivity that’s available in a location and fold it into the SD-WAN, you can bring new sites up onto your company’s WAN with much shorter lead times. You may only need a day or two to plug in a 4G modem and get it up and running, or to plug in a cable modem or DSL modem and get a site up and online before the MPLS link has ever been pulled to it, if you’re intent on staying with MPLS as your network core. You can also make arguments – and in some companies hard dollars for this are fairly easy to accrue – on reducing the number of outages that affect end-users in the organization. If you’re a company that does any kind of online retail sales, if your call center experiences performance outages or performance degradations even, you can usually put a hard dollar figure on sales lost as a result of that.
If you are in a company that processes things for other companies – medical transcriptions, whatever – again, if there’s a reduction in performance for the users in your branches, you can put a hard dollar number on the effect of eliminating those outages or reducing the length of any kind of performance degradation. And actually that idea of putting dollar figures on things plays back into the agility argument, as well. Speed has dollar value in organizations whose locations generate revenue directly. If it’s a retail outlet, if it is a professional services location for something like tax advisory services, you can put a dollar number on how much more revenue the company gets if that location is up and running on a day-by-day, sometimes even an hour-by-hour basis.
So the business case is going to be multifaceted, but it’s going to start with cost. And so one of the things that we’ve done at Nemertes based on the research that we did last spring on cloud and data center usages and based on some other projects that we’ve done researching actual WAN costs and some projects that we’ve done helping people deploy SD-WANs – we built a cost-modeling tool that allows people who are using it to help decide how they want to change the profile of connectivity that they’re going to use in their branch offices in the future, and based on the number of nodes in their network and how many of them they’re going to convert to SD-WAN, and whether they’re going to have, you know, direct Internet access at those branches, and a variety of other factors to model the cost benefits of switching from a traditional MPLS backbone environment to something that’s built around SD-WAN and simultaneous use of other connectivity, as well.
So in the test case that we have here, on the slide, you can see that we’re modeling the situation in which they’re maintaining their current connectivity profile for the most part, with the exception of the 30 percent of sites that have the lowest bandwidth links currently. They’ve got 5 Mbps links on MPLS currently, and a 10 Mbps backup link on business Internet. We’re modeling the situation in which they shift their backup to commodity, consumer grade Internet, and their primary link becomes the business Internet. But of course, because it’s an SD-WAN situation, primary and backup become kind of moot because they’re both in use at once. But even in this situation, just looking at that minor shift in how they’re using connectivity for their lowest-demand branches and just 30 percent of their sites, that shift alone resulted in significant savings on an annual basis while at the same time doubling the available capacity of the network in their locations because they’re making active use of what were previously just backup links in all their locations.
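The arithmetic behind that test case can be sketched as follows. This is not the Nemertes cost model, only the connectivity line item under loudly stated assumptions: the site count and all three monthly prices below are placeholders invented for illustration, since the talk does not give them. Only the 30 percent share and the before and after link mix come from the scenario described above.

```python
def annual_connectivity_cost(sites: int, monthly_cost_per_site: float) -> float:
    """Yearly connectivity spend for a group of identically equipped sites."""
    return sites * monthly_cost_per_site * 12


def annual_savings(total_sites: int, converted_fraction: float,
                   before_monthly: float, after_monthly: float) -> float:
    """Yearly savings from changing the link mix at a fraction of the sites."""
    converted = round(total_sites * converted_fraction)
    before = annual_connectivity_cost(converted, before_monthly)
    after = annual_connectivity_cost(converted, after_monthly)
    return before - after


# Hypothetical inputs: none of these figures appear in the talk.
TOTAL_SITES = 100
MPLS_5M_MONTHLY = 600.0            # assumed price of a 5 Mbps MPLS link
BUSINESS_INET_10M_MONTHLY = 250.0  # assumed price of 10 Mbps business Internet
CONSUMER_INET_MONTHLY = 80.0       # assumed price of consumer grade Internet

# Before: MPLS primary plus business Internet backup.
before_monthly = MPLS_5M_MONTHLY + BUSINESS_INET_10M_MONTHLY
# After: business Internet plus consumer Internet, both active under SD-WAN.
after_monthly = BUSINESS_INET_10M_MONTHLY + CONSUMER_INET_MONTHLY

savings = annual_savings(TOTAL_SITES, 0.30, before_monthly, after_monthly)
print(f"Sites converted: {round(TOTAL_SITES * 0.30)}")
print(f"Annual connectivity savings: ${savings:,.0f}")
```

A fuller model would also add SD-WAN capex or subscription costs and subtract the troubleshooting and outage costs discussed elsewhere in the talk. Substituting real quotes for the placeholder prices is the first step.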
They’re also getting resilience in a lot of locations that they didn’t have before in terms of transparent failover from one link to another whenever there’s a problem, and fail back whenever the link comes back up and starts performing properly again. So this is a fairly typical scenario in that it is very conservative. They’re not dropping MPLS in their network as a whole. It’s still the backbone for most of their locations, and they’ve only augmented it with Internet capacity. Looking at a client example – so we actually worked through a selection process with one of our clients, and their network was not much larger than the one we were just looking at. It’s 200 sites. They, like most of the people we talk to, really 78 percent, said they were not going to drop MPLS; they want to keep it as the core of their network, but they want to save money. They’re looking at prohibitively expensive growth curves as they posit sticking with MPLS, doing things the way they’re doing them now, and also allocating enormously more bandwidth to all of their branch offices to cope with a plethora of new, real-time communications and collaboration tools and web-based applications.
So shifting their growth, their bandwidth growth off to the Internet allowed them to save by the year 2019, $4.5, almost $5 million a year just by bending the cost curve, by applying Internet in conjunction with MPLS. They didn’t even drop it in any of their locations. They truly did just shift all of their growth off of it and onto the Internet. So for them, you know, doing this math justified the entire project, and every other benefit that they expected to get, including eliminating WAN outages at a substantial number of their branches, became icing on the cake. This was the core of their case.
So when we talk with folks in any context, basically, we say it’s time to explore the options that are available to them in SD-WAN, both in looking for hard dollar savings that they can bring, if not in the first year – although many people can realize savings in the first year – then certainly over a fairly short timeframe – two, three, five years. We’ve never seen anybody whose break-even exceeded three years, actually, for any plan of shifting connectivity that they were willing to even consider. So everybody should be exploring this. It may not be the right answer for everybody, but everybody now needs to be exploring this because the potential benefits are so great that it is irresponsible not to. There are many options available, and it is important to look at more than one and to plan beyond evaluation to actually do proof of concept deployments with more than one.
And then, as you look at what you’re going to do – assuming that you’re going to deploy – shape your strategy based on what the business intends to do with its Wide Area Network, what it intends to bring up in terms of business services or drive in terms of business practices. Keep in mind things like unified communications and cloud, both from a SaaS and an IaaS perspective – how much traffic are you sending out to the Internet directly and how much could you? Build in the value, the benefit that you can derive by being able to add branches and lose branches and move branches and modify the service envelope quickly and easily and safely. Get out of the way of, again, business strategies that revolve around maybe getting closer to customers, maybe staying closer to customers as your customer base shifts. Keep in mind always the options that are available to you for connectivity in your actual business locations and don’t ignore the cost that comes with adding connectivity providers.
Certainly, adding one or two or three is typically something any business can manage, but if you’re starting to talk about adding two or three dozen, you need to pay close attention to how much that relationship adds to your institutional load and whether or not some kind of aggregator would serve you better. Do benchmarking to see how much you’re currently expending in terms of troubleshooting costs and WAN management costs. How much of your staff time is actually going into these activities and how much benefit are you deriving from that in terms of business flexibility and agility? And lastly, quantify whenever possible both the topline benefits you might generate by being faster out the door with a new branch or service and the actual cost to the business of downtime or poor application performance. And again, wherever possible, quantify it in terms of real, hard dollar numbers – talk to your business lines; they can help you with that kind of thing.
Before we proceed to the question and answer session, I just wanted to point out that you can download a copy of the research paper that this presentation is based on by going to the links that you see here. And also, if you go and look at the attachments and links portion of the BrightTALK interface that you’re currently working in here, there is an attachment that you can download there for a copy of the paper, as well as a copy of the presentation itself. And there are links to other things, as well, including a survey – a peer WAN data survey that you might want to take a few minutes to complete. And with that, I’ll throw it open to questions.
Lloyd: Wonderful. John, the first question coming in is around DIY versus carrier. Question is – are you seeing enterprises trying to adopt this themselves, or are they typically using a carrier, especially since many service providers have made announcements recently?
John: Many of those announcements post-dated when we did the research, so the data we got in the research was nearly all about do-it-yourself solutions. But that said, a couple of the folks that were in that group were using a managed solution not from carriers though – from other managed service providers.
Lloyd: Okay. The next question is around maturity. The question is – what is your opinion on the maturity of the solution? Have you either directly spoken to customers implementing SD-WAN or have heard of customers implementing SD-WAN?
John: We have spoken – all the people that you saw percentage-wise as currently deploying, those are folks that we spoke with directly. So yeah, it’s the case that we’re talking with folks who are actually deploying these things and their level of satisfaction with the solutions is pretty high. They’re saying it delivers the benefits it says it delivers. A few people said it wasn’t quite as easy as they expected, but it was still a lot easier than what they had been doing previously. And yeah, their satisfaction with the reduction in troubleshooting and the reduction in downtime was – well, I’ve used the word “glee” many times. They were gleeful at the improvements they were experiencing.
Lloyd: Wonderful. So the next question, John, is – 78 percent of clients are keeping MPLS as part of their SD-WAN network. What segment of clients are you referencing when you make that statement? Are these Fortune 1000, or are they representative of all companies?
John: Sure. That is across the board for people participating in our research. And the research sample when we interview folks – it leans large slightly. So more than half of the companies that we’re talking to are Fortune – well, probably Fortune 2000 size, and the rest are below that. So it is certainly leaning large just as the research pool does, but it was irrespective of size. We didn’t see a specific correlation.
Lloyd: Okay. Next question is around security. The question is – how secure is SD-WAN? What additional security measures are recommended in the deployment?
John: You need to reference very specific solutions before you ask how secure it is because some solutions were essentially designed from the drawing board with the idea of being secure enough for banks or hospitals to use them as their core WAN technologies. So security was one of the initial design criteria and an ongoing target of all evaluation of updates and management tools. Others started life in different places. You know, they started life primarily as a link aggregation tool, or they started life primarily as a WAN optimization tool. They may also have extremely tight and wonderful security. You just have to be doing an evaluation on a product by product basis. Are there any further questions?
Lloyd: Oh yes, sorry. I was on mute there. The question is – how difficult is it to interoperate SD-WAN solutions with legacy networks?
John: Let’s see. We have a little data on that. The folks we spoke with were deploying SD-WAN primarily to their, you know, most troubled locations first. And basically it was segmenting their WAN into two large pieces – the SD-WAN part and the rest of it. Communications between the two went through the data center, essentially, and it did not seem to present them with any insuperable challenges for engineering and didn’t hurt performance in any way they remarked upon when we were asking about how well that was working for them. So a little bit of traffic challenge, but all of the providers in the space that we’ve discussed and whose clients we’ve spoken with expect this. They don’t expect any kind of flash migration from the old solution to the new solution, so they are expecting deployment in a situation where they need to coexist with the system as it used to be and interoperate with it cleanly.
So they’re all pretty much engineered to make that straightforward. It may not be simple, but it won’t be something you’ll have to revisit constantly, either. You’ll get it set up, and then it will be in place.
Lloyd: Wonderful. The next question is around deployment state. From your slide on SD-WAN state of deployment, it seems like most of the people are either planning or evaluating SD-WAN. Are these the respondents who are in an active WAN refresh cycle or are people considering SD-WAN outside their refresh cycle?
John: That is an interesting question. It does not seem to be tied completely to refresh cycles. We’re hearing people making evaluations out of cycle because they have new locations to bring up and they are thinking how they want to do that, or because they have enough trouble in enough locations that they’re looking for a solution there. So they may not be planning on a complete WAN refresh in the short term. That may wait for the regular refresh cycle to run its course, but you know the 20 percent of worst performing sites may be scheduled for a transition to this new solution well in advance of that. And they might be committing to a greenfield as SD-WAN plan, as well. We spoke with at least one company that was committed to doing all of its growth on the SD-WAN side and waiting for its actually multiple refresh cycles for other locations to catch up to it.
Lloyd: Wonderful. Next question is – in your analysis, are the enterprises using multiple providers for their solutions or are they going with a single provider?
John: On the SD-WAN side, one provider. On the connectivity side, multiple providers, typically trying to keep the pool size limited. You know, they want at least a regional provider for those redundant Internet links, for example, instead of having a different provider in every municipality. You know, they want to have one for the United States. They want to have one for the EU, if they can manage it, one for Brazil – whatever places they locate in, they’d like to have it. They’d like to have the pool size as small as they can manage.
Lloyd: Okay. And that was the last question, John.
John: All right, Lloyd.
Lloyd: Does anyone –
John: Thank you. Thank you very much. I just wanted to take a moment and suggest that folks avail themselves of other FutureWAN sessions. You can see some listed on the slide, here, ranging from enabling cloud migrations to dealing with layered security using a service like Zscaler – all very interesting stuff and well worth some attention.
Lloyd: Thank you.