Tech Field Day: MPLS to Hybrid Migration

David Klebanov, Director of Technical Marketing, demonstrates the security capabilities of the vManage solution, via MPLS to hybrid migration.

Recorded at Tech Field Day in Silicon Valley.

Presenter

David Klebanov, Director of Technical Marketing, Viptela

Senior infrastructure technologies professional with over 15 years of extensive experience in designing and deploying complex multidisciplinary networking environments.

Transcript

Tom: Hello. I’m Tom Hollingsworth and you are watching Networking Field Day 13. We are here in San Jose, California with Viptela. We have invited a group of networking bloggers, speakers, podcasters and luminaries of the community to take part in this discussion, offer their opinions, ask questions and add their voice to the conversation about software-defined wide area networking.

If you would like to learn more about Tech Field Day, including how to become a presenter or a delegate, please join us at our website, Techfieldday.com. If you would like to see more videos about this and other exciting technologies, please check out our YouTube channel at YouTube.com/techfieldday.

David: All right. So let’s look further. Right. So I’m trying to bring up a new site. So the device had been provisioned. Right. So one thing to understand is that I do want to bring just one point in here is that what we do at Zero Touch Provisioning Service, every device … Archish mentioned about trust or zero trust, every device comes with pre-built or pre-loaded certificate, signed certificate. It gives us a unique identifier for that device. Every device that ships out of the manufacturing floor. Right.

The certificate is pre-built inside. When a device comes up, it goes through a bi-directional certificate authentication between itself and a management system, between itself and controllers. Everything is predicated on a bi-directional trust between the systems. Right. So we assume that this process is done. Right. The device has been brought up because it’s really zero touch. There is no point for you to see that because what will happen is that you will just stare at this and all of a sudden the device will come up, right. So there’s zero value to show you that.

Now, the question is what happens after this device comes up, right. What if it was shipped to the wrong location? What if it ended up with your competition or your neighbor? What do you do? So as part of the process of setting up the device, you can decide on the trust level for the device. You can say it’s an invalid device which means from the remote site perspective, it was zero touch. There is nothing I need to do. I don’t need to click anything, I don’t need to browse anywhere. It’s completely zero touch. Yet I have no connectivity.

It’s an action that an administrator would have to take to go into the management system and enable the device. Right. There is also staging, where I can mark the device as staged, which means it will get some connectivity from the control plane but no data plane. What our customers are using this for is to stage devices before moving them into production. So I can have full connectivity into the device. I can do whatever I want configuration- and policy-wise on the device. When I’m ready to flip the switch, then I promote this into the valid mode. That will open up the data connections and the device becomes fully operational, and now not only the control connections are up but the data plane connections, which are the IPsec tunnels, are also up. Right.
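The three trust levels described above (invalid, staging, valid) can be pictured as a small lookup from state to permitted connection types. This is a minimal sketch of that idea only, not the product's implementation; the names `TrustState` and `allowed_connections` are hypothetical.

```python
from enum import Enum


class TrustState(Enum):
    INVALID = "invalid"   # zero-touch provisioning finished, but no connectivity
    STAGING = "staging"   # control connections only, no data plane
    VALID = "valid"       # control connections plus IPsec data plane tunnels


def allowed_connections(state: TrustState) -> set[str]:
    """Return which connection types a device in this state may bring up."""
    if state is TrustState.INVALID:
        return set()
    if state is TrustState.STAGING:
        return {"control"}
    return {"control", "data"}
```

Promoting a device from staging to valid is then just a state change: the set of permitted connections grows from `{"control"}` to `{"control", "data"}`.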

Tom: So if vManage is running in your server environment, is it like a cloud-brokered connection for the vEdge defined …

David: Yes.

Tom: [Unintelligible 0:03:11].

David: There is a ZTP service that we offer as a service. If you want, you can also host this internally; some of the larger customers that want to have an internal ZTP service can do that. So it is sort of like a broker that basically gets your vEdge into your organization. It is very lightweight. It’s sort of stateless in the sense that it has no customer data. It just phones home: hello, I’m customer A. Go talk to your infrastructure. Disconnect.

Tom: Right. Okay.

David: All right. So the reason that I didn’t find it is I was looking in the wrong screen. So here’s the device. You can see that it’s been sort of ZTPed and it shows invalid. Right. Which means it’s not operational; it’s a standby device, so to speak. So what I’m going to do now … also, if I look at the geography map in here, this is a map that shows all of the devices. That device has actually been pre-configured with GPS coordinates so it will appear somewhere in the New York area. All right.

So let me go back into this. So where we are now is the device has been shipped. It’s at the customer and I’ve decided that I don’t want to activate it immediately. I want to go through the process of validation first. So I’m going to put this into a staging mode first. So once I’ve flipped it into a staging mode, I basically have to send it to the controllers. The reason is that vManage, as I mentioned, is a separate system.

There’s a vManage system, which is a management system, and there are the controllers. So for the device to be able to establish the control plane connections … I made the change in the vManage database, right, but the controllers need to know about this. That’s what I did. I pushed it, and our controllers are aware that the device is allowed to connect. So you see it’s been pushed. So now if I go to network, I can see this device is here. It shows in staging mode.

So if I go inside, I can see the information about this device. I can see what interfaces are on it, I can see things about it, but if I go here and I say show me what IPsec connections you are building: nothing. I’m not building any IPsec connections because there is no data plane yet. It is completely a device that has been staged now and is ready to be activated.

Now, the next step is I can promote it into a fully valid device. So this of course requires a maintenance window, because that activates the tunnels, so to speak. So all the devices now need to start building tunnels to the device. It does go through a process where it loses some of the connectivity, so it’s a maintenance window for activating the device after it’s been staged. Right. So I’m basically repeating the same thing. I’m pushing this through the controllers. Now the device has received sort of an instruction to be activated.

So we need to wait a few seconds for the network to come up. Now, in the meantime, I can go to the map and I can see this device is in here already. Actually it was here even when I was doing staging. I didn’t show it in the map, but it was there as well, though it had no data connections. So it says unreachable. It’s going to clear in about maybe 10 seconds or so. And what I expect to see is the IPsec connections coming up. And when the IPsec connections come up, then it’s actually going to communicate with the rest of the devices, and I’m also going to show you all of the reachability information that is now received through the controllers about all the other devices that it can communicate with. So let’s see. Okay. I think it’s up. All right.

So now if I go into here, I can say what are the IPsec connections that you are building? Okay. All right. It built the IPsec connections out. Right. Now, what are you learning through those IPsec connections? What is it that you know about the network now? So I can go in here and say show me your routing table. It knows about stuff. You can see here some of them are marked as OMP. OMP is our control plane protocol. It’s called the Overlay Management Protocol. Think about that as an extensible control plane protocol that communicates between all of the devices, all of the vEdges and the vSmart controllers, and it passes information around. It is extremely extensible.

It is extremely scalable. What it allows you to do is move massive amounts of information from the sites to the other sites, through the controllers. So imagine a site wants to advertise its own prefixes. This is what I know. It’s going to send it to the controllers. Controllers are going to send it to the other sites. Kinda like a reflection mechanism. Now, is it just reflecting? Absolutely not. There’s a whole lot of policies that I can apply on that reflection.

I can send the information yet I can say the controllers can influence the information that gets reflected. I can change next hops, I can say don’t advertise that route anywhere else so it’s not just reflection. It is reflection with a whole lot of control. So when we say policies, there’s lots of dimensions to policies that we have. Some of our policies are acting on the devices themselves.

Some of our policies are acting on the controllers. That gives us lots of flexibility on what happens with that information when it gets distributed. Just because a device learned about something and sent it to the controller to be reflected, it doesn’t mean it’s just gonna go through the controller untouched. Controller may actually introduce quite a lot of things to that object.
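The "reflection with control" idea can be sketched as a controller that passes every advertised route through a policy function before fanning it out to the other sites. This is an illustrative model only, not OMP itself; `Route`, `reflect`, `example_policy` and the prefixes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str      # site that traffic for this prefix is sent to
    origin_site: str   # site that advertised the prefix


# A policy returns the route (possibly modified), or None to suppress it.
Policy = Callable[[Route], Optional[Route]]


def reflect(routes: list[Route], sites: list[str], policy: Policy) -> dict[str, list[Route]]:
    """Reflect each route to every site except its originator, applying the policy first."""
    out: dict[str, list[Route]] = {site: [] for site in sites}
    for route in routes:
        result = policy(route)
        if result is None:
            continue  # the controller decided not to advertise this route at all
        for site in sites:
            if site != route.origin_site:
                out[site].append(result)
    return out


def example_policy(route: Route) -> Optional[Route]:
    """Suppress one (hypothetical) prefix and pass everything else through untouched."""
    if route.prefix == "10.99.0.0/16":
        return None
    return route
```

With `example_policy`, a site that advertises both `10.1.0.0/16` and `10.99.0.0/16` has only the first prefix reflected to its peers.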

Tom: Because I can have a full meshed topology here. I think that’s what was on the screen but it could be hub and spoke but in the policies that I’ve built.

David: Yes.

Tom: But in the policies that I’ve built.

David: One use case that you can think about, and that’s a really good point. One thing you can think about is that I have two sites. I’m going to advertise reachability to them, right? So if they’re going to learn about directly connected subnets, they’re gonna try to communicate directly between them. Let’s say I have a tunnel, and I just learned about that other subnet I can talk to. However, as it passes through the controller, the controller can actually set a different next hop for that prefix that gets advertised.

That’s going to force the traffic, even though they have a direct connection, to go … maybe I can set a next hop to the data center, and I can do this on a per-VPN basis. So you’re right. I can use the manipulations done on the vSmart controllers, on those updates, to construct a hub-and-spoke topology. For anybody who builds networks, that should make a lot of sense.
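A hub-and-spoke topology built by next-hop rewriting might look like the sketch below: the controller replaces the advertised next hop with the data center for selected VPNs, so branch-to-branch traffic is forced through the hub even though a direct tunnel exists. The `vpn` field assumes the rewrite is scoped per VPN; `Advert`, `force_via_hub` and the site names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Advert:
    vpn: int
    prefix: str
    next_hop: str


def force_via_hub(advert: Advert, hub: str, vpns: set[int]) -> Advert:
    """Rewrite the next hop to the hub for the selected VPNs; leave others untouched."""
    if advert.vpn in vpns and advert.next_hop != hub:
        return replace(advert, next_hop=hub)
    return advert
```

For example, `force_via_hub(Advert(10, "10.2.0.0/24", "branch2"), "dc1", {10})` yields an advertisement whose next hop is `dc1`, while the same prefix in a VPN outside the set keeps `branch2`.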

Tom: Now, you’re talking about a lot of detail about setting next hops and these kinds of things, and I’m thinking about BGP policies and that sort of stuff. Now, if I’m writing a policy, is it a high-level abstraction of all that, or am I actually getting down into the guts and keying in IP addresses and that sort of thing?

David: So the way we approach a policy, it’s a policy language. The reason that we have a policy language is that the flexibility that you get in the system to determine how the traffic is going to flow requires you to have a level of expertise to build this policy, because there’s so much flexibility in it. So what we have is a policy language that describes your intent, and I’m going to show you something.

Tom: So a simple hub-and-spoke topology can be: these are my hubs, these are my spokes, and the network figures out how to manipulate around it.

Tom: The policy language sounds suspiciously like command line interface but I don’t think it is that exactly.

David: It’s a policy language. It depends what you call command line interface. It’s a policy that is enacted … it’s defined centrally on the vManage and it’s acted upon by either the vEdges or the vSmarts; it depends on how the policy is built and what it says in the policy. Right now, the policy is a policy language.

Tom: That’s a configuration on the controllers which is then interpreted and pushed out to the [unintelligible 0:11:40].

David: In fact, it’s a configuration on the management system, which pushes it to either the controllers or the vEdge devices themselves based on the intelligence the system has about where that policy should go. Some policies go to the controllers, for example to build those hub-and-spoke topologies instead of full-mesh topologies. Some of the policies go straight into the vEdge. If I wanted to do some manipulation on the vEdge itself, it does not concern the controller. For example, maybe set the next hop straight on the vEdge to determine whether to take MPLS or take broadband.

So there are some decisions that are taken on the controller. There are some decisions that are taken on the vEdge. There is lots of flexibility that is built into the policy. I will go as far as making a bold statement: there is pretty much nothing that you cannot do with the policy. Any crazy traffic patterns that you can envision, you can build. Is it sophisticated? There is a small learning curve, yes. We guide the customers through the journey of what it means to construct the policies.

Once you build the policy, of course you can reuse it. As customers deploy our solution, they get more comfortable with the policy, they get to implement it. The policies, not everything is complicated. Some policies are very simple to build. Other policies are sophisticated and require really understanding of what you are building but the flexibility is ultimate.

David: And also these policies are network-based policies rather than [unintelligible 0:13:06] policies. So when you define these policies, I’ll give you an example. Say you have a policy where you’re saying that in my network, I want my real-time traffic, voice traffic or something, to take MPLS. So now, coming back to your idea of the next hops and things, you don’t have to go and define it per device, saying that from this device, here’s voice traffic, this is my next hop, because every vEdge is gonna have a different next hop.

So you define a network-level policy where you just say that, oh, I want this traffic to take MPLS. So now all these devices in the network are aware of the transports they are connected to, they’re aware of all the services they are connected to, and, going back to what David was saying, if you have a firewall and you want a certain type of traffic, based on some policy, to go through a firewall, all you need to do is say I need this traffic matching these criteria to go to a firewall. You don’t need to know the IP address of the firewall. You don’t need to know the IP address of the next hop you’re gonna take to get to the firewall. So the network is aware of all the services as well as the transports it’s connected to.

David: Yes.

David: So we have abstracted all this lower-level information from the policies, and the policies now become network-level policies rather than device-level policies.
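The distinction between a network-level policy and device-level configuration can be illustrated with a small resolver: the policy names only a traffic class and an intent ("take MPLS", "go to a firewall"), and each device fills in its own addresses. This is a sketch under stated assumptions; the device inventory, traffic classes, addresses and the `resolve` helper are all hypothetical and are not the vendor's policy language.

```python
# Each device knows its own transports and how it reaches services (values are made up).
DEVICES = {
    "branch1": {"transports": {"mpls": "172.16.1.1", "broadband": "198.51.100.1"},
                "services": {"firewall": "10.0.0.5"}},
    "branch2": {"transports": {"mpls": "172.16.2.1"},
                "services": {"firewall": "10.0.0.5"}},
}

# Network-level policy: intent only, no IP addresses.
POLICY = [
    {"match": "voice", "action": ("transport", "mpls")},
    {"match": "guest-web", "action": ("service", "firewall")},
]


def resolve(device: str, traffic_class: str):
    """Turn the abstract policy into a concrete next hop for one device."""
    info = DEVICES[device]
    for rule in POLICY:
        if rule["match"] == traffic_class:
            kind, name = rule["action"]
            table = info["transports"] if kind == "transport" else info["services"]
            return table.get(name)  # None if this device lacks that transport or service
    return None  # no rule matched this traffic class
```

The same rule for `voice` resolves to a different next hop on `branch1` and on `branch2`, which is the point being made: the policy author never writes either address.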

David: Yes, so just to complete the thought. Just because it’s a policy language, don’t think about that as, oh my God, this is now a device-by-device configuration. It is completely abstracted, it’s completely transport independent. It allows you to express your intent; however, because enterprise customers have so much intent and they’re trying to build so many things, we’re doing the policy in such a way that it gives you that utmost flexibility. Yes, sorry, you had a question.

Tom: Sure. So that sounds great.

David: Yes.

Tom: From an operations standpoint. So ideally, we’re pushing this out to many remote sites. That’s one of the compelling reasons to use this type of technology. I have experience with one particular vendor, and every time we want a new policy or a new feature or new function, even if it’s just security vulnerability remediation, that gets updated centrally somewhere and then we can’t push that out until everything is on the same version. So again, that sounds great, but from an operations standpoint, is that required here, where I have a literal global outage while we’re always updating something? Is it the maturity of the product …

David: Yeah, so the policy has a policy scope. If you define a policy, you can define its scope as being enacted on the entire network, on a subset of the network or on just individual sites. Right. So when you make policy changes, because it’s really not vulnerability the way it can be described. This is if I stay on the policy side of things. Right. It’s really the scope of the policy that is going to decide how widely that policy is applied, right. So you should know what you’re doing when you’re operating the product.
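Policy scope, as described here, amounts to expanding "entire network", "a subset" or "individual sites" into a concrete set of sites before anything is pushed. A minimal sketch, assuming sites are identified by simple IDs; `sites_in_scope` is a hypothetical helper, not a product API.

```python
def sites_in_scope(scope, all_sites):
    """Expand a policy scope into the concrete set of sites it applies to.

    scope is the string "all", or an iterable of site IDs (a subset or a single site).
    """
    all_sites = set(all_sites)
    if isinstance(scope, str):
        if scope == "all":
            return all_sites
        raise ValueError(f"unknown scope keyword: {scope!r}")
    requested = set(scope)
    unknown = requested - all_sites
    if unknown:
        # Refuse to apply a policy to sites that do not exist.
        raise ValueError(f"unknown sites in scope: {sorted(unknown)}")
    return requested
```

Changing a policy scoped to `[2]` then touches only site 2, which is why a policy change does not have to be a network-wide event.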

David: And to add, the management plane, the control plane and the data plane can all be on different versions. We can go really far, really forward, we can be on completely different ones and use two [unintelligible 0:16:04] to make it work.

Tom: Yeah, and that’s sort of what I was hoping.

David: Then I apologize. Yeah, from that standpoint, absolutely. Yes, there is no need for you to be on the same version. There’s no requirement of anything like that. Yes. All right. I’ll move on.

So we see the device came up. It learned a bunch of routes. Now what I’m going to do now is … one more thing that I wanted to show is in here if I do … so if I look at the interface, there is one interface that was brought down in here. You see that it actually shows in a down state. So this is how I take my … basically I take a site that is now an MPLS site that was brought up because this is actually an MPLS connection, I want to add another connection to it. Right.

So if broadband was delivered, I just want to unshut the interface and let it go. So if I look at what exists right now, if I go again in here and I say, for example, show me BFD information, it’s going to tell me that all of the connections are going through something which is called MPLS. All right. Now if I want to enable an additional transport and turn this site into a hybrid site, all I need to do is unshut the interface and see that it establishes the connections to the other underlying network. So I can do that through templates; everything is templatized.

So if I go into the template, of course it’s a very small system so there’s not many templates in here. If I go into here and I can see this template, it’s a device, it’s attached to site two. So I can go inside and I can actually look inside the template. There’s different interfaces that exist. I know that on the transport side in here, transport side for us is basically the WAN site. Right. I can see one interface that is provisioned which is this guy. So what I can do is I can basically say I want to add another interface.

I can select an interface that I want to add and I can say update, and what it’s doing is basically just generating an update to the vEdge router and adding another interface that that vEdge router now knows about. Right. So the moment it finishes, which should be momentarily … it’s pushing that update. So we’re going to see that additional connections come up on that device, and those connections will be over broadband.

So adding a transport to the vEdge router is as easy as just plugging a cable into it and making sure that the interface goes up. The device goes through like an immediate discovery process that says, oh, what can I do with this new interface? Who can I reach over that interface? And it’s going to try to establish communication over the interface to all the other vEdge routers that are connected over that interface somewhere.

So if I enabled broadband, anybody who was on the broadband on the other side, if I don’t restrict anything, it’s just going to try and connect to them. So if I go back to the network and I choose this device and I go into real time and say, oops, I now learn a whole other type of connectivity that I can get to. So the point that I was tryin’ to make is that as you are migrating from one type of connectivity and adding an additional transport, it’s as easy as just enabling an interface on the device and letting the device go, because the device has all the intelligence through the control plane protocols; it has the intelligence about all the other devices that exist in that domain.

So when it comes up, when the interface is added, it’s going to start to basically sort of crawl, so to speak, through that new transport to try to find the destinations that it can get to through that, because it tries to give you the utmost connectivity if it’s unrestricted. Right. Of course you can do a lot of manipulation to say do or don’t, but if you’re not restricting that, then just like a web search engine would crawl the sites and try to get you the content, we crawl the connectivity types and try to get you to all the remote destinations that exist, so that you’re able to establish as many connectivity options from site to site and start leveraging one transport and then another transport.
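The "crawl" behaviour after a transport is enabled can be sketched as a filter over what the control plane already knows: take every remote device attached to the newly added transport, drop any that policy excludes, and attempt a tunnel to the rest. This is an illustrative model only; `new_tunnels`, the site names and the shape of the `peers` table are assumptions.

```python
def new_tunnels(local: str, added_transport: str, peers: dict[str, set[str]],
                blocked: frozenset[str] = frozenset()) -> list[tuple[str, str, str]]:
    """List tunnels (local, peer, transport) to attempt after a transport is enabled.

    peers maps each remote device to the transports it is attached to, as
    learned through the control plane. blocked holds peers a policy excludes.
    """
    return sorted(
        (local, peer, added_transport)
        for peer, transports in peers.items()
        if peer != local and added_transport in transports and peer not in blocked
    )
```

With peers `site1` and `dc1` on broadband and `site3` on MPLS only, enabling broadband on `site2` yields tunnel attempts to `dc1` and `site1` and nothing toward `site3`.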

All right. Let me go into enabling connectivity into the legacy site. Right. So when you’re thinking about how you connect to the traditional environment, you have your hybrid network, right. So you built your hybrid network. It doesn’t matter how many sites it has; the principle is the same. You have some remote sites, you have some data centers, you may have some cloud locations in it. It doesn’t matter. At the end of the day, it’s a hybrid network. Right.

So now you’re trying to connect into a traditional world. So what is in a traditional world? It’s either the site that has not yet migrated or it’s some services that are provided by the service provider in the underlay space. Many of the service providers provide a multitude of services on the MPLS network, be that security services, voice services, whatnot. How do I connect to them? Because for me, it’s sort of an underlay service. I live in an overlay. I send traffic from one site to another over the tunnel.

Now I need to go into the service that is not reachable over the tunnel. It goes over the underlay. So what do I do? I have two options. I can either send this traffic to the data center and let it go out of the data center and that’s where I can have a more meaningful peer relationship with my service provider, BGP or whatnot to learn the routes that are available on the MPLS network from the data center.

Now, there’s another option, and that’s what I’m going to show: to be able to establish that BGP connection, or that BGP peering relationship, directly from the remote site over the same interface that carries my overlay traffic, the interface that establishes the connection to my MPLS service provider. So I’m going to learn routes from the overlay and I’m going to also learn routes from the underlay, and all of that over the single interface, because I’m able to establish a BGP connection with the underlay world as well.

So what we have is very simple. We have one site which is a non-SD-WAN site and one site which is an SD-WAN site, and I enable the BGP process on here. Actually the process is enabled; the enabling is just an unshut. And once we enable BGP, it’s going to learn routes through BGP and it’s going to allow me to interoperate between the overlay sites and the underlay sites, because now that vEdge device has visibility into both.

David: So during the migration, all you have to do is enable BGP on the new site that you have brought up, your existing sites that are on your traditional network do not need to be touched.

Tom: And do you select sort of a specific gateway that will be that bridge between the overlay and the underlay or does that happen at multiple points?

David: Yes, so that’s the option number one that he was talking about where you have a gateway. Option number two is there’s no gateway. It’s actually a site that connects to all the other … there’s different sites over the overlay. It talks to other non-SD WAN sites statically through the underlay.

Tom: Yes. You can do that on a site by site basis?

David: Yes, site by site basis. You can do this on a site by site basis, and every site that wants to get the shortest-path reachability into the underlay services can do that. So it depends on the flavor. Some customers prefer to actually send it to a central location like the data center and let it access the MPLS in the data center because they feel it’s more controlled. But some others that want to kind of minimize the backhaul that goes into the data center want to send it directly onto the underlay at every site. And again, I’ll stress it again, it’s over the same interface, so it’s not like a back-end LAN connection that goes back into your MPLS network. It’s the same interface.

Tom: That’s important because very often, if you’ve engineered your WAN for a certain bandwidth and then you go to make a migration like that, you may not have been prepared for the amount of aggregate bandwidth you’re now gonna take if you’re bridging everything at the data center, because that’s not how the MPLS worked in the first place.

David: Another interesting point is that when you deploy on an MPLS service, we do not require you to keep an MPLS CE router in place. Because we are a fully featured router in this sense, and we support routing protocols on both LAN- and WAN-side interfaces, when you say I want to transition this site from being an MPLS-only site to a hybrid site, I can take the connection from the MPLS service provider and plug it directly into the vEdge. There’s no need to keep it. The CE is gone.

Tom: You don’t have serial interfaces though do you?

David: We don’t have serial interfaces.

Tom: Okay. Just checking.

Female: You do have IPv6 support? All this is IPv6 supported?

David: Yes. Yes, IPv6 and IPv4.

Tom: And you support partial mesh topologies so if you had a group of point to point links between certain sites, can that be integrated?

David: Absolutely.

Tom: To be more clear on the IPv6 support, is that for management only, transport or both?

David: Both.

Tom: Overlay and underlay?

David: Yes. So the point that you were making, and I didn’t finish my thought, is that when the MPLS CE is gone, all I have at that site … because we are trying to make the deployment a very slim deployment, right, we don’t want to … because what’s the point if I have to keep legacy routers in place? Right?

Tom: Mm-hmm. Right.

David: So this device gives you an option to connect all the transports that you want, MPLS, broadband or built-in 4G LTE, straight into the device without relying on any other devices being present on that site. So an option for you to go with a LAN-side interface, it’s not even an option. The CE is gone. I don’t have a back-end connection anymore. So this is a very powerful tool that our customers have adopted, in that it really creates the cost savings that people are really after. It’s like, what’s the point for me, because I’m in a hybrid mode for many years. So what’s the point for me to keep the MPLS CE router in place?

So what I’ll do here is I will go …

Tom: I think we’re winding down on the …

David: Yes. Go a little bit faster. So I’ll go to the legacy site. This is the Cisco CSR, so it has a BGP connection. Let me just make it a little bit. I can do show ip route bgp and see that I’m learning a bunch of routes. This is my relationship with the service provider. This is a traditional site. It has not migrated to MPLS, sorry, to SD-WAN; it’s a traditional MPLS site. Right. So what I’m looking for, for the two sites that want to communicate, is if I look at what’s on the legacy site and I look at what’s defined under BGP, I’m seeing that my local network is 192.168.30.

This is the network that exists on the MPLS site, that legacy site. You see it’s been advertised into MPLS, into the service provider, but of course I’m not picking it up on the other site because I haven’t established that relationship yet. So if I go here and I go back into, for example, the device that I’m trying to communicate from, which is the site one device, and I say show me your routing table, what I’m looking for is 192.168.30. That’s the legacy site.

Not here, because I haven’t enabled BGP yet, so I’m not learning about it yet. So what I’m going to do is I’m going to go into the template again. I’ll speed it up a little bit. I’ll just search for it. I’m going to go into the template that is applied to an SD-WAN site. I’m going to edit this template. I’m going to go into the neighbor statement, and there is a neighbor that’s been created in here. I will expand it and I will say I want to unshut that neighbor.

So now an SD-WAN site is going to establish a BGP connection with the underlay. So we’re going to be learning a bunch of stuff over the overlay, so it knows how to effectively communicate with the rest of the overlay, active over any transport. What it’s trying to get to is a site which is purely an MPLS site. It has no SD-WAN on it. So now that I’ve enabled the BGP process, it’s going to kick in, and we need to push the template. When the template is pushed, that’s going to establish the BGP connection from an SD-WAN site through the underlay into the service provider.

That is going to allow me to learn about the remote legacy site that has not yet migrated … allow it to be learned by the SD-WAN environment. Okay. It’s done. So I’ll go back to where I was before. I’ll choose site one and I’ll just repeat the information about … actually I can just show the BGP routes. There’s the guy: 192.168.30. This is the guy that I’ve now learned from my service provider, and if I go back to here and I just repeat this command, there’s the guy. I learned about site one.

What I’ve done here is basically I’ve taken a site which was an MPLS only site. My traditional site. I haven’t done anything to it but I need to get connectivity to it. So what do I do? I already embarked on an SD WAN. I’m deploying sites but I cannot live in SD WAN world only. Right. I need backward compatibility, I need backward connectivity. This is one of the ways of doing that. Establish BGP with your service provider, learn about the underlay routes. Now I know how to get from the overlay into the underlay and reach those legacy sites.

Tom: But can you filter which prefixes you do that for?

David: Absolutely.

Tom: Okay. Because otherwise you’d be in a case where you might be advertising overlay prefixes as equal cost into the underlay BGP and now you end up with really suboptimal routing behavior.

David: Yes. There’s lots of controls that you can have over this BGP relationship because it’s a fully mature BGP implementation.

Tom: Shocking.

David: Yeah, shocking. It is shocking.

David: And even if that happens, there is enough juice that we have built in to arbitrate across … to run something through MPLS BGP, overlay/underlay, we go through a comprehensive best-path calculation to figure out the most optimal route and use that. So it’s built on top of very sound routing principles.

Tom: Okay.

David: David.

Tom: And because of your policy language, once the traffic enters the edge, you no longer actually have to make a routing decision, right, you can make a policy decision of how to deploy the traffic.

David: It depends on where it’s going. If it’s going to the legacy site, it is a traditional routing decision made by that vEdge device to go underlay: no tunnels involved, no IPsec, no SD-WAN control plane, just go straight into the service provider underlay and reach the destination, and return traffic will come from the underlay and reach that site. If I’m going overlay, I have my OMP protocol that advertises all the reachability to other sites. I’ll go on the overlay. So effectively, as I keep moving my sites into the SD-WAN, they never lose connectivity to my traditional MPLS world.
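The overlay-versus-underlay decision described in this answer can be sketched as an ordinary longest-prefix lookup where the source of the winning route decides how the packet leaves: routes learned via OMP go into an IPsec tunnel, routes learned via BGP from the service provider go out natively. This is a simplified illustration, not the vEdge forwarding logic; the routing table, prefixes, next hops and the `forward` helper are hypothetical.

```python
import ipaddress

# Hypothetical routing table: (prefix, protocol, next hop)
ROUTES = [
    ("10.1.0.0/16", "omp", "site1"),         # learned from the controllers (overlay)
    ("192.0.2.0/24", "bgp", "172.16.0.1"),   # learned from the service provider (underlay)
]


def forward(dst: str):
    """Longest-prefix match, then choose tunnel or plain routing by the route's source."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, proto, next_hop in ROUTES:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, proto, next_hop)
    if best is None:
        return ("drop", None)  # no route to the destination
    _, proto, next_hop = best
    return ("ipsec-tunnel", next_hop) if proto == "omp" else ("underlay", next_hop)
```

A destination inside the BGP-learned prefix is forwarded with no tunnel involved, while a destination inside the OMP-learned prefix is sent over the overlay, so a migrated site keeps reaching both worlds from one table.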
