ESXi, vCenter, vCloud Director, Zerto, AD….. all nested inside vCloud.

So this is basically another piece of craziness born out of necessity.

I needed to do some testing with the latest release of the Zerto virtual replication suite. I didn't want to do it in Prod (obviously!), and our existing physical lab environment is a bit too secure to be useful and has no ability to demo what I build in it to clients…. So, what's a cloud architect to do….. run it in the cloud of course! (OK, that's a bit wank, but sarcasm doesn't translate well in text…).

Background

For those of you that don’t know, Zerto is a particularly nice replication suite that’s heavily integrated with VMware vSphere, ESXi and vCloud Director.

In a nutshell: it works by intercepting the SCSI writes at the hypervisor level, duplicating those writes to a per-host virtual appliance (VRA), then sending them off to the remote site, where they are replayed to disk. It's completely hardware agnostic, very simple to use, integrated with vCloud, has good client/provider separation and is the only replication product I know of that's really targeted at cloud providers to deliver a Recovery/Replication-as-a-Service platform.
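Just to make that mechanism concrete, here's a tiny, purely illustrative Python toy of the write-splitting idea. To be clear, this is not Zerto's code or design, just the concept: every write lands on the local disk and is also journalled for replay at the remote site.

```python
# Purely illustrative toy of hypervisor-level write splitting (NOT Zerto's actual code).
# Each write is applied to the local "disk" and also queued for replay at the remote site.
from collections import deque

class WriteSplitter:
    def __init__(self):
        self.local_disk = {}    # offset -> data, stands in for the protected VM's disk
        self.journal = deque()  # writes waiting to be shipped to the recovery site

    def write(self, offset, data):
        self.local_disk[offset] = data       # primary write proceeds as normal
        self.journal.append((offset, data))  # duplicate handed to the per-host appliance

    def replay_at_recovery_site(self, remote_disk):
        while self.journal:                  # "ship" and replay in order at the remote end
            offset, data = self.journal.popleft()
            remote_disk[offset] = data

splitter = WriteSplitter()
splitter.write(0, b"hello world")
recovery_disk = {}
splitter.replay_at_recovery_site(recovery_disk)
print(recovery_disk)  # {0: b'hello world'}
```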

So it's a great product that I'm putting through its paces, ready for a RaaS product offering. A high-level view of what we're trying to achieve is shown to the right.

I've been working with Zerto for over a year now, and they've been very helpful and responsive in implementing suggested improvements; there have been a few "see that button, that's a Doug feature" discussions with Zerto product management/support, and I'm massively proud to be a part of it. With the 3.0 release, I'm convinced it's ready for production RaaS deployment…. and we have a few customers eager to beta the service… so time to test it in the lab.

What do you want (to test)?

Well…. I wanted to test Zerto replication and recovery from a Dummy client site (Cnidus.net pty ltd) to a dev hosting provider vCloud instance (ZettaDemo.com.au).

Specifically:

  • How well the Client / Provider resource masking works.
  • Deploying the cloud connector (demarcation point between provider and client). See how that works, any pitfalls etc.
  • Start mapping out what policies/procedures need to be built / automated for a per-customer RaaS deployment.
  • Test out L2 tunnelling options with Vyattas acting as bridges (GRE, L2TPv3).
  • …..and simply whether I could do it all in vCloud without too many compromises…. if we can test something like this inside vCloud, why not use it for other projects? Hyper-V lab in vCloud…. why the hell not!?

When do you want it?

NOW!…. actually, yesterday would be nice. 😛

Show me already!

Fine…. picture, meet 1000 words :)

[Figure: RaaS Testlab design]

Believe it or not…. this is the slightly simplified version…. no L2TP/GRE bridges etc yet, just a single network spanned between client and provider. This is to simulate the virtual L2 circuit, which is itself meant to mimic a VLAN segment, which is in turn meant to mimic a 'real' physical network….. but I digress.

So, running through it from bottom to top, this is what we have:

Base layer (Recursion L1)

  • An Organization + Org VDC running on a vCloud-Powered cloud provider (in this case, Zettagrid).
  • Three Org Networks (in vCloud 1.5.x terminology), or VDC Networks in the new money… (all enabled for promiscuous mode)
  • Two groups of VMs, representing a customer (Cnidus.net pty ltd) and the Provider (ZettaDemo.com.au).
  • 2x separate domains (the cnidus.net domain is actually distributed across two VDCs, roughly 4000km apart)
  • 2x vCenter instances, 2x nested vESXi hosts
  • 2x Zerto Virtual Managers
  • 1x vShield Manager
  • 1x vCloud Director cell
  • 3x Test VMs to verify connectivity etc

Also, to make my life a bit easier, I setup a small jumpbox/TerminalServer with interfaces on both provider and client. Obviously that’s kind of cheating…

Just to keep things relatively neat, I deployed Client and Provider into a vApp each (Cloud-in-a-box, inside a cloud) :)

[Figure: vApps for RaaS TestLab]

[Figure: Customer environment vApp]

[Figure: Provider environment vApp]

All of that is pretty cut-and-dried hosted infrastructure. It's all just-a-bunch-of-VMs (JBOV? :P), running 'in the cloud'… simple.

ESXi Hosts / vCenter (Recursion L2)

[Figure: Client Nested ESXi VM config]

Now the slightly tricky part: the nested ESXi hosts and vCenter config. Let's start with the client side…

Client Side

The ESXi Node is deployed from a standard template I built for a previous project. Pretty standard nested ESXi host.

I haven't yet done any customization to the underlying hosts, so this environment can currently only run nested 32-bit guests. That's a relatively simple enhancement I'll be submitting through change control though. If successful, it will be global on the ZettaGrid platform.
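For reference (and this is my assumption about what that change will involve), the usual tweak is enabling virtualised hardware virtualisation: vhv.allow = "TRUE" in /etc/vmware/config on ESXi 5.0 hosts, or the per-VM nestedHVEnabled flag on 5.1+. A rough pyvmomi sketch of the per-VM flavour, assuming the VM object has already been looked up:

```python
# Rough sketch: expose hardware-assisted virtualisation to a vESXi guest (vSphere 5.1+ API).
# On ESXi 5.0 the equivalent is vhv.allow = "TRUE" in /etc/vmware/config on each physical host.
# 'vm' is assumed to be a vim.VirtualMachine already fetched from the vCenter inventory.
from pyVmomi import vim

def enable_nested_hv(vm):
    spec = vim.vm.ConfigSpec(nestedHVEnabled=True)  # the "Expose hardware assisted virtualisation" tick-box
    return vm.ReconfigVM_Task(spec)                 # returns a task; the VM needs to be powered off
```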

In terms of networking, I've got all 3 vNICs on the customer's VLAN segment, and I've pulled an IP out of the pool for the VMkernel NIC. The underlying portgroups have been modified to allow promiscuous mode, which lets the nested hosts work properly.
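For what it's worth, here's a minimal pyvmomi sketch of that promiscuous-mode change, assuming direct vCenter access to the underlying host and a standard vSwitch (the host name, credentials and portgroup name are just examples for this lab):

```python
# Minimal sketch: allow promiscuous mode on a standard vSwitch portgroup via pyvmomi.
# Host name, credentials and portgroup name are lab assumptions, not the real environment.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.lab.example", user="administrator@vsphere.local",
                  pwd="secret", sslContext=ctx)
try:
    host = si.content.searchIndex.FindByDnsName(dnsName="esx01.lab.example", vmSearch=False)
    netsys = host.configManager.networkSystem
    for pg in host.config.network.portgroup:
        if pg.spec.name == "VLAN0_CniNet_Hosted_L2_11403":   # the customer-facing portgroup
            spec = pg.spec
            spec.policy.security = vim.host.NetworkPolicy.SecurityPolicy(
                allowPromiscuous=True, macChanges=True, forgedTransmits=True)
            netsys.UpdatePortGroup(pgName=pg.spec.name, portgrp=spec)
finally:
    Disconnect(si)
```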

 

 

[Figure: Client Nested ESXi vSwitch Config]

Currently, I'm only making use of one NIC on the customer side, so networking is relatively simple. I've simply passed the network through to a portgroup, without any VLAN tagging at the nested layer. No Q-in-Q or anything too complex required.
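If you wanted to script that nested-side config rather than click through it, it's just an untagged portgroup on the nested host's vSwitch. A minimal pyvmomi sketch (portgroup and vSwitch names are assumptions for this lab):

```python
# Minimal sketch: add an untagged (VLAN 0) portgroup to the nested ESXi host's vSwitch0.
# 'host' is assumed to be the vim.HostSystem for the nested ESXi VM; names are examples only.
from pyVmomi import vim

def add_untagged_portgroup(host, name="CniNet_Client", vswitch="vSwitch0"):
    spec = vim.host.PortGroup.Specification(
        name=name,
        vlanId=0,                        # no tagging at the nested layer
        vswitchName=vswitch,
        policy=vim.host.NetworkPolicy())
    host.configManager.networkSystem.AddPortGroup(portgrp=spec)
```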


You will also notice a couple of VMs running at this level. One is the Test VM that’s being replicated by Zerto. The other is the Zerto Virtual Replication Appliance (VRA) that’s deployed on the vESXi host.

Provider Side

Now here’s where it gets a bit more interesting….

[Figure: Provider Nested ESXi VM config]

As with the client side, to avoid any double-tagging and generally complicating things more than necessary, I've got nested portgroups mapped directly to specific vNICs on the vESXi host.

Since I wanted to use vCloud Director on the provider side, I've opted for a dvSwitch. I used explicit failover orders to direct each portgroup's traffic out the appropriate uplink exclusively.
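If you'd rather script those failover orders than click through the dvSwitch UI three times, a rough pyvmomi sketch would look something like this (the dvPortgroup object is assumed to be already looked up, and the uplink names are examples only):

```python
# Rough sketch: pin a dvPortgroup to a single active uplink with an explicit failover order.
# 'dvpg' is assumed to be a vim.dvs.DistributedVirtualPortgroup from the inventory;
# uplink names are examples only.
from pyVmomi import vim

def pin_portgroup_to_uplink(dvpg, active=("Uplink 1",), standby=("Uplink 2", "Uplink 3")):
    order = vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortOrderPolicy(
        inherited=False,
        activeUplinkPort=list(active),
        standbyUplinkPort=list(standby))
    teaming = vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortTeamingPolicy(
        inherited=False,
        uplinkPortOrder=order)
    spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
        configVersion=dvpg.config.configVersion,     # vCenter rejects the change without this
        defaultPortConfig=vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
            uplinkTeamingPolicy=teaming))
    return dvpg.ReconfigureDVPortgroup_Task(spec)
```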

This time, I've got two separate network zones terminated on the vESXi host: two provider networks and one customer VLAN. As I mentioned earlier, I've used the customer's underlying Org Network / VLAN segment on both the provider and customer sides…. kind of cheating, but I wanted to get it all up and running first, then add network complexity in stages.

[Figure: Provider dvSwitch Failover order 1/3]

[Figure: Provider dvSwitch Failover order 2/3]

[Figure: Provider dvSwitch Failover order 3/3]

So, so far so good. You’ll notice from the diagram at the top, I also put a test VM at this level to test basic connectivity.

That’s it for Recursion L2… onto vCloud.


Provider vCloud environment (Recursion L3)

[Figure: ZettaDemo Provider network config (inside vCloud)]

The vCloud environment is made up of a VCD cell, a vShield Manager appliance and an SQL box. All three boxes are deployed in the L1 vCloud environment (i.e. NOT nested), so they run at normal speed, same as the vCenter server instance. They are, however, managing the nested ESXi / vSphere environment and add another abstraction layer on top.

Networking-wise, I'm basically using pass-through at this level too, so again, no MAC-in-MAC, VXLAN etc… although that might be fun to play with. That's accomplished by using a Direct-type Org Network.

The OrgNet (CniNet_Hosted_L3_11403) is directly passed through to the Provider Network, which is also directly mapped to the PortGroup (VLAN0_CniNet_Hosted_L2_11403).

Doing it this way means that the network path is kept fairly simple; vCenter, ESXi and vCloud are effectively just abstracting the underlying VLAN segment.
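As a sanity check that the pass-through is wired where you think it is, you can list the Org VDC networks via the vCloud query service. A rough sketch, assuming vCD 5.1-style API versioning (the cell address and credentials below are made up; on 1.5 the query type is orgNetwork rather than orgVdcNetwork):

```python
# Rough sketch: list Org VDC networks via the vCloud Director query service.
# Cell address and credentials are lab assumptions; adjust the API version/type to your release.
import requests
import xml.etree.ElementTree as ET

VCD = "https://vcloud.zettademo.example"   # hypothetical cell address
HEADERS = {"Accept": "application/*+xml;version=5.1"}

# Log in: vCD returns the session token in the x-vcloud-authorization header
login = requests.post(VCD + "/api/sessions", headers=HEADERS,
                      auth=("admin@System", "password"), verify=False)
HEADERS["x-vcloud-authorization"] = login.headers["x-vcloud-authorization"]

# Query the Org VDC networks and dump each record's attributes (name, gateway, fence mode etc)
resp = requests.get(VCD + "/api/query?type=orgVdcNetwork", headers=HEADERS, verify=False)
for record in ET.fromstring(resp.content):
    if record.tag.endswith("Record"):
        print(record.attrib)
```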

[Figure: Hosted Organization Network config]

The VDC is an out-of-the-box Organization VDC, deployed from a completely standard Provider VDC.
So that's really it for vCloud. The end result is an Org VDC + Org Network, running on a nested ESXi host…. ready for Zerto to use as a recovery destination…… so onto the Zerto (aka fun) part!!

 

[Figure: Hosted Organization vDC config]

[Figure: ZettaDemo Provider vDC config]


So… Zerto? Where does that fit in?

[Figure: Zerto RaaS Service overview]

In short…. on top of the nested environment :) …..

The slightly longer explanation: Zerto has 3 main components in a multi-tenant deployment:

  1. ZVM: Zerto Virtual Manager (Customer and Provider side, one each.)
  2. VRA: Virtual Replication Appliances (Customer and Provider Side, one per source and destination ESXi node)
  3. ZCC: Zerto Cloud Connector (Provider side, one per tenant)

The diagram to the right shows a high-level view of how the service works. For this part, the best way I can think of to show it is to get our hands dirty and run through the install/config procedure, again starting with the client site:

Client Site (Cnidus.net)

The RaaS customer (Cnidus.net pty ltd in this case) first installs a ZVM server (hosted on the base vCloud), which also adds a plugin to their vCenter install; that plugin is the primary way of managing the Zerto environment. Once that's done, the remainder of the config is done from within the vSphere Client.

The license key is then entered, along with some basic site info (name, location, support contact etc), then it's time to install the VRAs. Again, some basic details are all that's required… which host, the host password (to install a small vStorage-API plugin) and IP details for the VRA itself.

[Figure: Cnidus.net (Client side) VRA config]

Then, it's time to pair to the recovery site. At this point, the provider would need to be set up…. so let's jump across to that, then come back to joining it all together.

Provider Site (ZettaDemo.com.au)

This side is a bit more complex, as we need to interface with vCloud Director and also do some masking of resources etc (you don't want your customers seeing your ESX config details, do you?). Honestly, this part is really what gets me excited with Zerto; they've done a really good job of listening to service providers during beta and given us the tools to do what we need to…. but let's get cracking.

As with the Client, the first step is to deploy a ZVM, then the license, site info and VRA(s)….. we won’t bother with that again.

That's where the similarity ends though. The next step is to pair to vCloud, which needs a message queue set up. Most providers will already have that in place, but in the lab… we'll need to set it up. Fortunately, Zerto provide a handy installer for that, which downloads RabbitMQ, sets it up and joins it to vCloud Director. I won't bother covering that… it's dead simple (good thinking by Zerto automating that part though)…
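If you're setting the broker up by hand instead, a quick way to sanity-check that RabbitMQ is reachable (before pointing vCloud and Zerto at it) is a small pika connection test; the host and credentials below are lab assumptions:

```python
# Quick sanity check that the RabbitMQ broker is up and accepting AMQP connections
# before wiring it into vCloud Director / Zerto. Host and credentials are lab assumptions.
import pika

params = pika.ConnectionParameters(
    host="zvm.zettademo.example",                      # wherever RabbitMQ was installed
    port=5672,
    credentials=pika.PlainCredentials("guest", "guest"))

connection = pika.BlockingConnection(params)           # raises if the broker is unreachable
print("AMQP broker is reachable")
connection.close()
```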

So, at this point, we're ready to add the vCloud details into Zerto; the image to the right shows that config…. pretty self-explanatory (cell address/IP, username, password and AMQP details).

[Figure: Provider Zerto vCloud Setup]

Then it's time to add which Provider VDCs to make available for replication / recovery, and what each datastore can be used for. Pretty clever I think, good work Zerto. (Right).

One other key setting for the provider… "masking for new paired sites"; you can see it in the background (right). By default, when pairing, you can see all provider networks, destination VDCs etc… obviously not what a provider would want. This setting overrides that behaviour.

Next up, the ZCC needs to be deployed. This will be the demarcation point between customer and provider.

The process is simple: enter some IP details for the customer side (provided by the customer), IP details for the provider side, the Org name etc. One cool feature is the static route groups, which allow for a non-flat network on the provider side. This is actually a feature I requested during beta…. You're welcome 😛

In this case, the ZCC bridges between the 192.168.60.0 (cnidus.net) and 192.168.80.0 (ZettaDemo) networks.

[Figure: Zerto Cloud Connector (ZCC) configuration]

That’s it on the provider side… now time to pair the sites.

Pairing the sites

Again, this is nice and simple. From the client side, we initiate the pairing request, then resources are unmasked from the provider side. From an automation perspective, it would be nice if the provider could pre-provision all of that, but that's not currently the case.

So, from the main Zerto console on the client site, the big "Pair" button is pressed, then it's just a matter of entering the customer-facing IP of the Cloud Connector, in this case 192.168.60.253.
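Before hitting Pair, it's worth a quick check that the ZCC's customer-facing IP actually answers from the client network. A trivial sketch; the port number is a placeholder, so substitute whatever pairing port your Zerto release documents:

```python
# Trivial reachability check from the client network to the ZCC's customer-facing IP.
# The port is a placeholder; check the Zerto documentation for the pairing port your version uses.
import socket

ZCC_IP = "192.168.60.253"
PAIRING_PORT = 9081   # placeholder only, NOT confirmed for this Zerto release

sock = socket.create_connection((ZCC_IP, PAIRING_PORT), timeout=5)
print("ZCC reachable on", sock.getpeername())
sock.close()
```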

[Figure: Pairing to the recovery site]

All going well, both provider and client should now see each other.

Then it's the provider's turn to deliver some resources to the customer. Since we enabled masking for new sites previously, the customer is presented with nothing by default until we specifically configure a recovery site.

[Figure: Masking of resources for the RaaS customer]

In the example above, you can see I've configured the Zerto Organization "Cnidus.net" to be linked to the vCloud vDC "DC_Cnidus.net" and to have access to the CniNet_Hosted_L3_11403 Org/vDC network, which is a pseudowire back to the client site. I've also specified a pre-seed folder, which means we can add some VMs to that folder and Zerto will check it first during the setup of a new replication task.

You can also see some limits on the number of protected VMs and the amount of protected storage. These are the resources you'll be paying the provider for.

So that's it for the Zerto setup. Sites are installed, configured and paired, and resources are assigned to the customer…. all that's left is setting up a replication job, a Virtual Protection Group (VPG) in Zerto terminology. So let's do that, then call it a day.

My First VPG! (not really…. but yours, maybe)

So this really is what the whole lab is built to support. It took far longer to write about it than to actually do it, by the way…

This is a client-managed procedure, it’s completely self-service from this point on…. cool, huh?!

In short, the process is below in bullet points…. it's simple enough to not need a wordy explanation, and I've captured the same settings as a data sketch after the list. Easy, huh!?

[Figure: VPG Sample Configuration]

  1. We start by clicking the “New VPG” button, visible on most config panels.
  2. Select the target Recovery site (Zerto can replicate to multiple sites….)
  3. Enter shared details for this VPG
    • VPG Name
    • Priority (in relation to other VPGs)
    • Target RPO
    • Journal History (enabling roll-back to point-in-time)
    • Test Frequency (non-disruptive, self-service testing is available…. how often should it be enforced?)
    • Target vDC (maybe you have a couple, for resource separation / different SLAs etc)
    • Live Failover network (where VMs should go when you want them to actually service clients)
    • Test Failover network (where VMs should go when you just want to test that replication is working)
  4. Add the protected VMs
  5. Save.
  6. Watch all the creation tasks complete.
  7. DONE!
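And since the longer-term goal is automating per-customer deployments, here's roughly how I'd capture those same settings as data for a provisioning workflow. To be clear, the field names and values are entirely hypothetical; this is a planning sketch, not a Zerto API payload:

```python
# Hypothetical record of the VPG settings above, for feeding a future provisioning workflow.
# Field names and values are my own invention; this is NOT the Zerto API.
vpg_request = {
    "name": "Cnidus-Test-VPG",                       # example values only
    "priority": "Medium",
    "target_rpo_seconds": 300,
    "journal_history_hours": 24,
    "test_frequency_months": 6,
    "recovery_site": "ZettaDemo.com.au",
    "target_vdc": "DC_Cnidus.net",
    "live_failover_network": "CniNet_Hosted_L3_11403",
    "test_failover_network": "CniNet_Hosted_L3_11403",
    "protected_vms": ["Cnidus-Test-VM"],             # hypothetical VM name
}
```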

So that’s it. Replication starts immediately after configuration. If there was a seed VM configured, Zerto will use that… otherwise, the whole VM will replicate over the WAN. Once sync is complete, only changes will be sent.

[Figure: Site Overview during initial VPG Sync]

[Figure: VPG Details during initial Sync]

[Figure: Fruits of all the labour…. 6sec RPO! :)]

Summary

So there you have it: one nested vCloud environment, with Zerto, replicating and ready to test out all sorts of scenarios. For the most part, it was a fairly simple and painless process to deploy. The biggest thing was getting the network right through the different layers…. mapping it out in a living Visio document helped keep that all clear, in my head at least.

From here, it's a matter of testing the various mechanisms of L2 WAN bridging, testing the failover and failback functionality and the upgrade paths from 2.x to 3.x, and developing the deployment workflows, most of which we've already got done.

RaaS here we come!

…. Now time for a shameless plug: I really should thank my employer, ZettaGrid, for allowing me the time to build this environment and post about it in detail. If anyone is interested in being part of the beta of this RaaS service (based on real hardware of course, not in this lab), or has any other cloud projects they need help with…. give us a call, or get in touch with me or one of my colleagues. @Cnidus

Cheers!

-Doug

2 Responses to “ESXi, vCenter, vCloud Director, Zerto, AD….. all nested inside vCloud.”

  • David:

    Did you get the L2TP tunnel solution to work w/ Zerto? I’m trying it over MPLS and have had issues with packet size on the replication traffic. Control traffic seems okay.

  • Cnidus:

    I actually scrapped the L2TP portion of my lab tests, but did have a customer using MPLS segments for DRaaS and got replication traffic to flow without any incident.
