Agilent’s Detailed Results on SD-WAN Production Pilot (ONUG 2016)
This session presents Agilent’s phased deployment of its global SD-WAN transformation. The speaker covers the vendor evaluation criteria and PoC strategy, and shares findings on cost, reliability, performance and security from testing the Viptela SD-WAN solution.
For more information, visit the ONUG website at opennetworkingusergroup.com
Moderator: Hi, everyone, welcome back. We normally do a few of these kinds of sessions and they’re called the Luncheon Partnership Sessions and what they’re designed to do is kind of show how IT supplier and buyer work together to solve a problem. Okay, hey, I’ve been struggling to kind of do this thing I need to do and I need some help as we all do from time to time. And so how does that relationship work for each other? And that’s what these sessions are all about. So we’re only doing one today. We’ll do a few more tomorrow and I am really just so honored and pleased to be able to introduce James Winebrenner and James is with Viptela. We’re really lucky at ONUG. We’ve been able to help see companies grow that have become part of the community and get plugged into the community. Viptela is one of those companies.
So we’re really honored and pleased to have them be part of the ONUG community and also to kind of show you what I mean by that by this quick kind of presentation. So let me give you just a really quick background on James and then I’ll hand it on over to him. So James is the Vice President of Business Development with Viptela. He plays a major role in the solutions architecture of wide area networking for both enterprise and service providers. I’ve worked with Jim for at least four years or so, four or five years. And he is so plugged into like this transformation that’s happening in the wide area network and we’re just happy to have him here. And also Pascal as well. So, Jim.
Jim: Thank you, [Nick]. We’re glad to be back. This is actually our two-year anniversary. We came out of stealth at ONUG, May, New York, 2014. So two years ago. Nick was gracious enough, even though we’re only doing one of these sessions, we’re going to go the full sixty minutes this time. Right? No. So thanks for having us. I want to introduce Pascal Heger. Pascal is the chief architect for Agilent. Pascal will explain a little bit more about what they do but Agilent’s been a great partner of ours over about the last six months as they’ve embarked on their SD-WAN transformation and our goal today is really to actually kind of talk through the transformation that they have gone through as well as really what kind of drove that, the amount of rigor and testing that went into their analysis and really kind of where they came out on the other end. So Pascal, I’ll let you kind of talk maybe a little bit more about Agilent, what your footprint looks like, what you guys do for those in the room that may not know.
Pascal: Sure, absolutely. So I’m a global network architect at Agilent Technologies. A little bit before I talk to you about our journey into the SD-WAN space, Agilent is a leading test and measurement manufacturer that serves industries in the life sciences, the diagnostics, and the chemical analysis markets. Last year we generated about four billion dollars in revenue and we have about 120 sites in over 30 countries. From an IT perspective all of our sites are currently interconnected through a single global MPLS provider. If you look at the MPLS footprint we tend to break out our internet traffic today in the MPLS cloud within region and we run all of our traffic through a full suite of security tools there. If we look at our LAN side we’re highly standardized there. We’re fully reliant on voice over IP for our internal as well as our external communication.
We run WAN acceleration at approximately 50 percent of our sites and we have 35 sites where we are fully telepresence enabled, ranging from sort of the large three-screen units down to some of the smaller footprints. Historically, I think Agilent’s always been a company that’s been sort of risk-averse and when we look at new technology we tend to adopt what’s already out there in the industry, what’s considered to be industry proven technology. As such we try to avoid downtime at all costs and we have full quality of service enabled in our network to prioritize our voice, our video, and our mission critical traffic.
Jim: Great. And I think that sets a little bit of the tone for the depth and the scope of the evaluation that you guys went through. I know that when we first started working together there were really kind of three areas that you brought as main concerns. One was simply cost. The WAN cost as a proportion of your overall IT budget and how they were growing kind of disproportionately to everything else.
Some of the performance concerns you had on some specific applications, I know you mentioned telepresence as one of those, and then I guess more existentially this transition that was already starting to happen as Agilent looked to move applications out into the cloud both for internal consumption by Agilent employees but also as you guys looked at developing some new applications that would be used by your customers and sort of the transformation that was going to happen there. As you guys looked across those three areas, what really kind of came together and gelled behind Agilent deciding to embark on this SD-WAN journey?
Pascal: Yeah. It was really a combination of different drivers, so cost obviously being one of them. Today we spend approximately 720,000 a month just on our MPLS footprint. Fifty thousand of that goes to the provider just to manage our CPE infrastructure. We provide dual links, fully diverse circuits in 12 of our larger sites today where only one of those links is active at any point in time. So we really wanted to make better use of these idle connections and augment or supplement our current MPLS footprint with DIA connections and potentially reduce our MPLS footprint in about 50 percent of our sites. From a reliability perspective, as I said, we can only afford today to provide fully diverse circuits in 12 of our sites and through SD-WAN we’re wanting to increase that to about 75 percent of our sites and we’re actually talking about doing all of them.
We’re currently running a total cost of ownership exercise that’ll likely drive that decision. From a performance perspective, we’re seeing more and more applications shift towards the cloud. R&D is doing tons of development work there. We’re in the middle of an Office 365 migration. We’re migrating about 17,000 mailboxes and it comes with an unlimited OneDrive storage facility. We really needed to sort of break away from the old isolated MPLS paradigm and look at better means of connectivity for these use cases. Also for our business partner connections and site moves and acquisitions we often had to wait four to five months for a new MPLS circuit to be provisioned. How do we leverage these business partner integrations better? Once these relationships get established it’s really difficult to have to wait that long. The business wants to start reaping the benefits immediately.
Jim: So, Pascal, you guys as a company have testing really in your DNA and I know you embarked on a very, very rigorous process as you started to look across not only the use case internally and some of those challenges but then when you started to go out to the market and evaluate the space. I guess if you could kind of sum up, how did you guys narrow it down to the key kind of table stakes criteria to go from looking at everything to really what you were willing to go and invest the time and the resources to take into the lab?
Pascal: Yeah, this was pretty interesting. As I said, we’re a risk-averse company and generally we tend to do our vendor selection based on who’s in the top right hand corner of Gartner’s Magic Quadrant. With SD-WAN only being two years old according to Gartner, no such thing existed. So we had the task of sifting through this vendor landscape, a couple of well established names in the network industry but a lot of these companies that we really never heard of before. The problem with it is that they essentially show you all of the same material. This SD-WAN controller that pushes down some [magic sauce] to the edge nodes and secure connections get established in a full mesh. Right? How do you deal with that? And once you’ve got through all of the marketing [fluff] what you’ll actually find is that a lot of these companies have different interpretations of what an SD-WAN really means and how it’s supposed to be architected and implemented. We ended up defining some very basic key criteria and I’m sure you’ve read through them as you’re looking at the slide here that we sort of set to go into a deeper discussion with some of these vendors. I’m kind of wondering if you’ve noticed that these are all criteria that any traditional modern day router would easily support. Well, to our surprise a lot of these companies were excluded for not meeting them. Looking further into their detailed architectures we were able to sift down and exclude some other vendors because we felt they wouldn’t scale, weren’t a good fit in our environment. There were a couple that I thought were just fundamentally flawed. In the end, going through the detailed discussions with some of these companies, we ended up picking three vendors and took them into a pretty extensive lab exercise.
Jim: Great. And I know you shared at the beginning a little bit about kind of the global scope of Agilent’s network. The fact that you guys have obviously a number of different data center locations but also different site profiles in different activities that are going on, so how did you go about distilling that global environment into something then you could build out in a lab topology and feel that you were effectively able to test and replicate kind of what you needed to see in the lab before taking it out to production?
Pascal: Yeah. So we really ended up with a back of napkin design. If you look on the left hand side this is what we intended to build out. It’s essentially comprised of a single large site represented at the top, two medium sites and a small site connected to just the internet. From this we ended up sort of getting a detailed network diagram and detailed architecture that referenced our own standards and our own quality of service models. Putting that all together, we essentially gave that to the three vendors and we told them here’s a scaled down version of what our infrastructure and architecture looks like today. Go ahead and change what you need to and implement your SD-WAN solution in our lab. It’s interesting that through this exercise we were really able to get to a better determination and better understanding of the differences in the SD-WAN landscape and how the different vendors operate.
So each vendor was given pretty much a full week of lab time and in the nine week total engagement our teams developed a 220 page document detailing our architecture, our infrastructure, routing models, quality of service models, and all of the different test cases that we subjected the different architectures to. In the week of lab time, we sort of subjected the vendors to a wide array of tests, all the way from starting out with standard basic connectivity testing up to complex service chaining and application routing policies. We assessed the vendors for their ability to interoperate and integrate with our network monitoring tools, our network management tools. We looked at their ability to perform TACACS functions, syslog functions. I should say one of the other criteria really was for us to look at the ease of operation and ease of management. It’s really something I think cannot be underestimated when you try to operationalize the solution.
Based on the outcome of the lab exercise, Viptela ended up passing with flying colors. While they were given a full week in the lab, they ended up completing all of our test scenarios in just under three days. So we were pretty stunned by the level of ease to build out an environment like this. Based on that we ended up picking them to build out a small production pilot in our production environment.
Jim: Great. And I know when we started talking about going into the production environment the kind of first thing you said was do no harm. We had to be able to go in and make sure we didn’t impact any of the existing users, the existing applications. What was kind of the goal? Maybe you can share with the audience, what were the things you guys were looking to be able to test in the production pilot you didn’t feel like you could effectively simulate in the lab and then what were some of the things that were kind of most important as you rolled out that production pilot to make sure that it was not impactful to the user base?
Pascal: You can only do so much testing in a lab. Right? In a lab environment, you can probably hang an elephant from a daisy but in a real world scenario that doesn’t really happen. Right? We ended up testing the Viptela solution six ways to Sunday but ended up trying to define what our production environment would look like, our limited production pilot. So we ended up picking a hub site which is our corporate data center in Colorado Springs, started integrating the Viptela solution there where the function of the hub site would be to act as a hub between sort of the overlay and the underlay network. Once a site is migrated into the overlay its only way to talk back to the underlay would be through this hub location.
After we went live in our hub site I actually took one of these routers and took it home and plugged it into my 2d [unintelligible 00:16:36] connections and started developing the different templates that I thought would be appropriate for us. Once all of that was done we shortly after went live at our Englewood and Boulder facilities. One of the risk mitigation strategies when we initially went live, because it’s pretty scary to put all of that traffic on an internet connection when you’re used to riding sort of an end-to-end QoS-enabled high quality MPLS connection, was that we used app route policies to steer our mission critical traffic of voice and video onto the MPLS network initially.
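The steering idea Pascal describes, pinning voice and video to MPLS while letting other traffic use the internet circuit, can be illustrated with a minimal sketch. This is not Viptela policy syntax. The traffic classes, transport names, and function names below are hypothetical, and the fallback to the other transport is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical traffic classes treated as mission critical in this sketch.
MISSION_CRITICAL = {"voice", "video"}


@dataclass(frozen=True)
class Flow:
    app_class: str  # e.g. "voice", "video", "web"


def preferred_transports(flow: Flow) -> List[str]:
    """Return transports in preference order for a flow."""
    if flow.app_class in MISSION_CRITICAL:
        # Real-time traffic prefers MPLS; internet is only a fallback.
        return ["mpls", "internet"]
    # Everything else prefers the internet circuit.
    return ["internet", "mpls"]


def pick_transport(flow: Flow, transports_up: Set[str]) -> Optional[str]:
    """Pick the first preferred transport that is currently up, if any."""
    for transport in preferred_transports(flow):
        if transport in transports_up:
            return transport
    return None


if __name__ == "__main__":
    print(pick_transport(Flow("voice"), {"mpls", "internet"}))
    print(pick_transport(Flow("web"), {"mpls", "internet"}))
```

Expressing the policy as an ordered preference list keeps the "pin to MPLS" decision reversible: moving voice onto the internet circuit later is a change to one list rather than to each site.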
Jim: And I know as part of the production pilot you also did a very comprehensive evaluation of what types of other underlay networks were available. I know often times in this sort of SD-WAN conversation we talk about MPLS and internet but there’s different obviously kind of grades of each. So what did you guys find? What did you end up kind of picking to move forward with and I know you were a little surprised by some of what you found in terms of both the price/performance metrics?
Pascal: Absolutely. We were totally expecting to get or needing to get MPLS like performance out of the solution. We felt like we couldn’t just throw some random consumer grade or business grade broadband at these sites. So we ended up picking a tier one internet service provider. When we implemented the tier one ISP connection we actually ended up running some numbers. A big part of the driver was cost savings. I think the numbers here, they speak for themselves. But if you compare our MPLS with our direct internet circuit pricing you can clearly see that we’re able to provision so much more bandwidth and still reduce our overall monthly charges. In fact, if you compare the cost on the per megabit basis, we’re paying only ten percent of what we used to pay in the traditional MPLS network. To me that really shows what the value proposition can be of an SD-WAN solution.
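The per-megabit comparison is simple arithmetic, sketched below. The slide figures are not in the transcript, so the circuit prices and bandwidths here are placeholders chosen only to reproduce a ten percent ratio; they are not Agilent’s actual numbers.

```python
def cost_per_mbps(monthly_cost: float, bandwidth_mbps: float) -> float:
    """Monthly cost divided by provisioned bandwidth."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    return monthly_cost / bandwidth_mbps


# Placeholder figures for illustration only (not from the session slides).
mpls_per_mbps = cost_per_mbps(monthly_cost=4000.0, bandwidth_mbps=10.0)
dia_per_mbps = cost_per_mbps(monthly_cost=3200.0, bandwidth_mbps=80.0)

# Ratio of internet cost to MPLS cost on a per-megabit basis.
ratio = dia_per_mbps / mpls_per_mbps

if __name__ == "__main__":
    print(f"MPLS: {mpls_per_mbps:.2f} per Mbps")
    print(f"DIA:  {dia_per_mbps:.2f} per Mbps")
    print(f"DIA costs {ratio:.0%} of MPLS per Mbps")
```

With these placeholder inputs the internet circuit carries eight times the bandwidth at a lower monthly charge, which is the shape of the result Pascal describes.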
Jim: Kind of these new SD-WAN economics, does that really change what you guys are looking at in terms of the broader rollout for things like redundancy and BGP and other types of more economic calculations around what you can afford to put at a given site?
Pascal: Absolutely. So those were additional drivers for us. Actually, what happened was that two days after we went live at the Boulder site our MPLS CPE rebooted. Usually that would have generated a bunch of support tickets and management escalations coming from the site, but we were now able to leverage the resilient nature of the SD-WAN architecture to keep the site up and running. Not a single voice call was interrupted in the process. We don’t even get that level of resiliency at our traditional MPLS sites due to sort of the slow nature of BGP reconverging. The failover conditions that the Viptela solution gives us are so fast that even if a router fails, users can still continue to make phone calls. The other example that I’ve listed here on this slide which is a big one for us is we had to perform some planned maintenance on the ISP circuit. So what we ended up doing was using traffic steering capabilities to shift all of our traffic onto the MPLS network. We performed the planned maintenance and put both circuits back into production which is incredible since for us this is a 24 x 7 manufacturing facility. We have 400 to 500 people in it. Traditionally it’s been really, really hard to get downtime negotiated here.
Jim: I know through this production pilot process you guys gained a lot of visibility really into what was going on on some of these links, both from an application kind of breakdown as well as an overall utilization perspective. What was most kind of interesting or surprising that you found when you had the visibility that Viptela lent in the overlay?
Pascal: One of our measures of success was obviously getting our users closer to the cloud facing applications. And you know what we ended up doing was we actually took this chart from Viptela’s vManage platform. It does some monitoring and through sort of the DPI inspection capabilities in a router we were able to derive what are our top 25 applications and how much traffic is really flowing from these sites. So each one of these bars really represents an application in our environment. From some more detailed analysis, I was able to determine how much of this traffic was really Agilent traffic versus internet destined. So I was pretty amazed to see…sure, we knew some of it was internet but I was pretty amazed to see that really 60 percent of this traffic is internet destined already. Why would we end up having that internet destined traffic ride a super expensive MPLS circuit only to have it break out somewhere else to the internet?
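The internal-versus-internet split can be approximated from flow records, as in the sketch below. The transcript does not say how the analysis was actually done. Here it is assumed that each record carries a destination address and a byte count, and that internal destinations fall within a known set of prefixes; the private address ranges used are a stand-in for a real corporate address plan.

```python
import ipaddress
from typing import Iterable, Tuple

# Assumed internal prefixes (RFC 1918 ranges used as a stand-in).
INTERNAL_PREFIXES = [
    ipaddress.ip_network(prefix)
    for prefix in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_internal(dst_ip: str) -> bool:
    """True if the destination falls inside any internal prefix."""
    addr = ipaddress.ip_address(dst_ip)
    return any(addr in network for network in INTERNAL_PREFIXES)


def internet_share(flows: Iterable[Tuple[str, int]]) -> float:
    """Fraction of bytes destined outside the internal prefixes.

    flows: (destination IP, byte count) pairs.
    """
    total = 0
    internet = 0
    for dst_ip, nbytes in flows:
        total += nbytes
        if not is_internal(dst_ip):
            internet += nbytes
    return internet / total if total else 0.0


if __name__ == "__main__":
    sample = [("10.1.2.3", 400), ("203.0.113.10", 600)]
    print(f"{internet_share(sample):.0%} internet destined")
```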
Jim: The other thing you guys looked at through the production pilot and you already called out the fact that integration with some of the network monitoring tools and things of that nature were kind of critical success criteria. What do you see as some of the changes that are going to be enabled from an operation standpoint even including things like traditional MACD from a site move/add/change perspective?
Pascal: Maintaining standards is always a big struggle in a distributed environment. But I think that through the SD-WAN solution and the fact that you actually use templates to configure your network, we are now able to enforce strict standards on how we configure our routers. So instead of having all the 150 individual devices we can now just audit a handful of templates. One of the other capabilities that this brings to us is traffic segmentation that we’re excited to start leveraging when we bring in new business partners. We are now able to drop them into an isolated network segment and have those sites up and running so much faster than having to provision MPLS circuits.
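Template-driven configuration can be sketched in a few lines. The template below is not real router or Viptela syntax: the keywords, variable names, and helper functions are hypothetical and only show why auditing a handful of templates can stand in for auditing every device.

```python
from string import Template
from typing import Dict

# Hypothetical branch-site template; keywords are illustrative, not real router syntax.
BRANCH_TEMPLATE = Template(
    "hostname $hostname\n"
    "site-id $site_id\n"
    "wan-address $wan_ip\n"
    "qos-profile standard-voice-video\n"
)


def render_site(variables: Dict[str, str]) -> str:
    """Render the branch template for one site.

    substitute() raises KeyError when a required variable is missing,
    so an incomplete site definition fails loudly.
    """
    return BRANCH_TEMPLATE.substitute(variables)


def conforms(device_config: str, variables: Dict[str, str]) -> bool:
    """True if a device's config matches what the template would produce."""
    return device_config == render_site(variables)


if __name__ == "__main__":
    site = {"hostname": "boulder-edge1", "site_id": "20", "wan_ip": "192.0.2.10"}
    print(render_site(site))
```

Because every device is the template plus a small set of per-site variables, any drift shows up as a failed `conforms` check rather than requiring a line-by-line review of each router.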
Jim: I know in the intro you talked a little bit about the legacy security architecture where Agilent made kind of strong use of regional internet breakouts, had a traditional sort of full stack security solution. What has changed in terms of the security architecture? Do you see SD-WAN as actually benefiting Agilent’s overall security posture as part of this evolution as well?
Pascal: Yeah. I think so. We all kind of make assumptions that MPLS is really secure. Right? At the end of the day it’s only as secure as the level of accuracy with which your vendor does label switching. Right? So moving forward we’re going to be encrypting traffic over all transports. It’s so easy to be able to enable full mesh IPsec connectivity that we really didn’t see too much downside to it. From a business partner perspective other than just dropping them into an isolated segment we can now use service chaining to steer that traffic to an IPS, IDS, or firewall to do additional inspection and really be more granular about what traffic comes into our network.
Jim: Great. So I think for the audience one of the benefits of Viptela’s SD-WAN architecture is that, as part of just the automatic bring-up, we actually do performance monitoring across all of the underlays in real time. So things like packet loss, latency, jitter, path MTU, we just start collecting all these statistics, analyzing these things, and we can make real time decisions around application routing based upon this data. But I know, Pascal, you were pretty surprised with some of what you saw and I know you wanted to share that back with the ONUG audience.
Pascal: Yeah. We were absolutely surprised by what we saw. The Viptela routers, they essentially measure latency, loss, and jitter over every IPsec tunnel by sending BFD packets. This table here really represents sort of a seven day average of the performance that we saw with the different transports between the sites in our production pilot. Obviously, we were really surprised to see that we were seeing better performance on a 90 percent lower cost circuit, our internet circuit, than our MPLS circuit when MPLS was supposed to be a guaranteed transport here. It was also interesting that there was a substantial difference from my home consumer broadband connection in terms of latency, loss, and jitter. While I’m sure there are some really valid use cases to provision lower cost broadband type IPsec connections into sites, such as internet offloading or just bandwidth augmentation, it was also pretty clear based on these results that we cannot use that as a primary means of connectivity and expect to get the same performance as MPLS.
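The kind of per-tunnel statistics described here can be illustrated with a small sketch that turns probe results into loss, latency, and jitter, then checks them against thresholds. This is not how the Viptela routers compute their figures. The jitter definition (mean absolute difference between consecutive round-trip times), the function names, and the thresholds are all assumptions.

```python
from statistics import mean
from typing import Dict, List, Optional


def summarize_probes(rtts_ms: List[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize one tunnel's probes.

    rtts_ms holds one entry per probe sent: a round-trip time in
    milliseconds, or None when the probe was lost.
    """
    if not rtts_ms:
        raise ValueError("at least one probe is required")
    received = [rtt for rtt in rtts_ms if rtt is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    latency_ms = mean(received) if received else None
    # Assumed jitter definition: mean absolute change between consecutive replies.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(deltas) if deltas else 0.0
    return {"loss_pct": loss_pct, "latency_ms": latency_ms, "jitter_ms": jitter_ms}


def meets_sla(stats: Dict[str, Optional[float]], max_loss_pct: float,
              max_latency_ms: float, max_jitter_ms: float) -> bool:
    """True if a tunnel's measured statistics are within the given thresholds."""
    if stats["latency_ms"] is None:
        return False  # every probe was lost
    return (stats["loss_pct"] <= max_loss_pct
            and stats["latency_ms"] <= max_latency_ms
            and stats["jitter_ms"] <= max_jitter_ms)


if __name__ == "__main__":
    stats = summarize_probes([10.0, 12.0, None, 11.0])
    print(stats)
    print(meets_sla(stats, max_loss_pct=1.0, max_latency_ms=150.0, max_jitter_ms=30.0))
```

A path-selection policy could then prefer, for each application class, any tunnel whose statistics pass `meets_sla`, which is the real-time decision Jim refers to.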
Jim: Great. The numbers are one thing. I think user experience is really sort of the final judge and this is a question we get all the time from prospects that are looking at us to understand the real impact of leveraging a mix of underlay transports on especially real time applications, things like voice, telepresence, other types of video. So I know you had some specific kind of quality metrics that you wanted to share on voice as an example.
Pascal: Yeah, absolutely. Voice quality for us is a really big thing. We rely on it for all of our internal and external communication. So we worked really extensively with our voice services team throughout the pilot and they ended up capturing sort of voice quality metrics from NetScout probes that we’ve placed in strategically important locations in our network. This really shows a summary of what it is that we measured. So really the blue bar is baseline traffic, meaning pre-cut on the MPLS network, and what we were seeing after we cut over and actually pinned our voice traffic onto the ISP circuit is a slight increase in the overall MOS score, which is fantastic, and a substantial reduction in jitter.
Jim: Great. I’m sure that there’s some questions from the audience but before we open it up if you want to just kind of talk about what the evolution is going to be now that the production pilot’s done and you guys have done quite a bit of planning around those results. What does that mean for your broader deployment and I think you’ve leveraged some best practices in terms of kind of categorizing sites and things like that.
Pascal: Yeah, absolutely. At the moment we are running through sort of a total cost of ownership exercise, running a [unintelligible 00:28:56] looking at the cost to operationalize the solution, looking at Smart Hands. How do you deploy this at larger sites throughout the world where you don’t have local IT hands? All of that comes into the equation. Really our phase two intent is to sort of test this out on a global scale, add more latency to the mix, go to South America, Asia-Pac, Europe and again perform the similar latency measures and voice quality measures. Obviously once that second phase is successful for us our intent is to go global with this.
Jim: All right. Well, thank you very much, Pascal, for the partnership and for sharing this back. Do we have time for any questions? Or we can have people come out into the lobby if there’s questions.
Moderator: I would do one question and then we have to go.
Jim: One question.
Male Voice: [Question]
Pascal: In my opinion in a complex corporate world there’s no such thing as Zero Touch Provisioning. How many DIA circuits really have DHCP enabled on them? We use a model we call Minor Touch Provisioning where we do some very basic configuration to allow the routers to talk to our controllers and push templates from there.
Jim: Well, thank you all. We’ll be available in the lobby afterward if there are any other questions. Thanks very much, Nick, appreciate it.