Friday, 13 May 2016

A Lightweight multi-host cloud using LXD


In this post I'm going to explain how to create a multi-host cloud using LXD. If you don't know about LXD, you can get more information from Ubuntu's LXD page.
It's lightweight in a few ways:
  • It uses LXD containers rather than a hypervisor, so it removes the virtualisation overhead
  • It doesn't provide all the features of a normal cloud, such as shared storage, messaging, telemetry, identity etc., although it is more than capable of running most workloads
  • It can run on bare metal, virtual machines or cloud instances

Motivation

I've used multi-host LXD setups for a while, but it was always a pain to get connectivity between hosts because each host had its own dnsmasq instance. We had to define a route to each LXD subnet via its host in the office network (and the VPN), which was a pain.
Also, because each host had its own dnsmasq for the hostnames, we had to configure forwarders in our main office DNS to a unique sub-domain per host.
If, for example, we wanted to load balance between 2 servers on 2 hosts, they would have to have different domain names, e.g. service1.leg1.lxd and service2.leg2.lxd. Migrating a container between hosts would change its IP address and hostname, meaning we'd have to re-configure the load balancer.
Multicast was a real pain, especially on AWS EC2 instances.
So to address these issues, and starting from this Flockport article, I tried to work out how to run a layer 2 network on top of the layer 3 network.
As it turned out, it wasn't that difficult.

Advantages

  • A shared address range across all hosts (an alternative is the Ubuntu Fan network)
  • Single configuration for dns resolution
  • Simplified routing for external connections
  • The ability to have a much bigger IP address range
  • A whole lot easier to set up than openstack
  • Live migration of instances between hosts (although, according to Stéphane Graber's website, this is not ready for production)
    • When I tested it, it seemed to work and the IP address and hostname in DNS stayed the same

Use Cases

  • Replicating a real cluster of servers for development
  • Increasing the density of usage on AWS/EC2, therefore reducing costs
    • Most of our software is idle outside of peak usage, so reducing the number of EC2 instances can be a significant cost saving
  • Getting multicast to work on EC2

What does this example give us?

The final cloud will have a layer 2 network overlay between the hosts and a single dnsmasq providing DHCP and DNS resolution.

Let's do it

For this example I've used Vagrant and VirtualBox on my Mac. I create 3 machines with Ubuntu Xenial on them and no extra network, BUT I've port forwarded port 7000 on each VM to a local port on the Mac.
  • The host has an IP address of 192.168.99.1
  • lxd1 has port 7000 forwarded to 192.168.99.1:7001
  • lxd2 has port 7000 forwarded to 192.168.99.1:7002
  • lxd3 has port 7000 forwarded to 192.168.99.1:7003
This was done to ensure that there was no direct connectivity between the VMs. In AWS, for example, each machine may have an IP address but there's no multicast between them (most cloud environments don't support multicast).
(Our software is much simpler and more scalable if multicast is available.)
The project for this post is lxd-cloud, hosted on GitHub.

Step 1 Install the software.

On each box we're going to install:
Software          Reason
peervpn           Provides the layer 2 network between the machines
bridge-utils      Used to add the VPN tap device to the LXD bridge
libssl-dev        Used to compile peervpn
build-essential   Used to compile peervpn
zfsutils-linux    Lets LXD use ZFS for storage
lxd               Well, it wouldn't work without it
sudo apt-get install bridge-utils libssl-dev build-essential zfsutils-linux lxd
# set VERSION to the peervpn release you want to build, e.g. VERSION=0-044
wget http://www.peervpn.net/files/peervpn-$VERSION.tar.gz
tar -xvf peervpn-$VERSION.tar.gz
cd peervpn-$VERSION
make
sudo make install

Step 2 Initialise LXD

On each host we will initialise LXD; below we use a ZFS loopback device.
lxd init --storage-backend=zfs --storage-create-loop=100 --storage-pool=lxd --auto  
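If you want to confirm that the 100GB loopback pool was created, an optional check (the pool is named lxd because of --storage-pool=lxd):
sudo zpool list lxd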

Step 3 Configure the LXD bridge

Modify the /etc/default/lxd-bridge file on each host to set up the layer 2 network.
I used:
Machine   Static IP    CIDR mask
lxd1      172.16.0.1   16
lxd2      172.16.0.2   16
lxd3      172.16.0.3   16
Setting               Value
USE_LXD_BRIDGE        true
UPDATE_PROFILE        true
LXD_DOMAIN            Your choice of domain
LXD_IPV4_ADDR         {STATICIP}
LXD_IPV4_NETMASK      The netmask calculated from the CIDR mask, e.g. 255.255.0.0
LXD_IPV4_NETWORK      {STATICIP}/{CIDR_MASK}
LXD_IPV4_DHCP_RANGE   The dnsmasq DHCP range, e.g. "172.16.0.10,172.16.255.254" (starting at .10 leaves room for up to 9 hosts with static IPs)
LXD_IPV4_DHCP_MAX     The number of addresses in the range, e.g. 65525 for the range above
LXD_IPV4_NAT          true
The IPv6 settings are not used in this example.
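For example, on lxd1 with LXD_DOMAIN set to "lxd", the relevant settings in /etc/default/lxd-bridge would look something like this (lxd2 and lxd3 differ only in LXD_IPV4_ADDR and LXD_IPV4_NETWORK):
USE_LXD_BRIDGE="true"
UPDATE_PROFILE="true"
LXD_DOMAIN="lxd"
LXD_IPV4_ADDR="172.16.0.1"
LXD_IPV4_NETMASK="255.255.0.0"
LXD_IPV4_NETWORK="172.16.0.1/16"
LXD_IPV4_DHCP_RANGE="172.16.0.10,172.16.255.254"
LXD_IPV4_DHCP_MAX="65525"
LXD_IPV4_NAT="true"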

Step 4 Fix the dnsmasq settings

By default the lxd-bridge script starts a dnsmasq instance whenever LXD_IPV4_ADDR or LXD_IPV6_ADDR is set (I think they should have keyed it off the DHCP settings myself), BUT we only want one dnsmasq running on the layer 2 network.
There are a number of ways to fix this:
  1. Change the lxd-bridge script to check for the LXD_IPV4_DHCP_RANGE setting and only define that on one host
  2. Override dnsmasq in /etc/default/lxd-bridge, which is what I did
So on 2 of the 3 hosts, add this line to the /etc/default/lxd-bridge file:
alias dnsmasq="/bin/echo 'Not starting dnsmasq'"
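For example, on lxd2 and lxd3, as root (a minimal sketch; lxd1 keeps the real dnsmasq):
cat >> /etc/default/lxd-bridge <<'!'
alias dnsmasq="/bin/echo 'Not starting dnsmasq'"
!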

Step 5 Restart lxd-bridge

On all hosts
service lxd-bridge restart
Now if you check the lxdbr0 device using ifconfig, it should have the {STATICIP} assigned.

Step 6 Configure peervpn

We're using peervpn to create the layer 2 network (you can use other methods, see the Flockport article).
Each host needs an IP address for the VPN and a list of peers for the other hosts. Peers are of the format <ip> <port>; in my case we use the port-forwarded addresses.
Machine   VPN IP      Peer 1              Peer 2
lxd1      10.99.0.1   192.168.99.1 7002   192.168.99.1 7003
lxd2      10.99.0.2   192.168.99.1 7001   192.168.99.1 7003
lxd3      10.99.0.3   192.168.99.1 7001   192.168.99.1 7002
On each machine, as root:
cd /etc
mkdir peervpn
cd peervpn

IP=<VPN IP for this node>
PEER1="<ip> <port> of another node"
PEER2="<ip> <port> of another node"

cat > peervpn.conf.l2 <<!
networkname PEERVPN
psk password
enabletunneling yes
interface peervpn0
ifconfig4 $IP/24
port 7000
initpeers $PEER1 $PEER2
upcmd brctl addif lxdbr0 peervpn0
!
Notice: the upcmd adds the VPN tap device to the LXD bridge.
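As a concrete example, the resulting /etc/peervpn/peervpn.conf.l2 on lxd1 would be (values taken from the tables above; use a proper pre-shared key rather than "password"):
networkname PEERVPN
psk password
enabletunneling yes
interface peervpn0
ifconfig4 10.99.0.1/24
port 7000
initpeers 192.168.99.1 7002 192.168.99.1 7003
upcmd brctl addif lxdbr0 peervpn0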
You can now test the peervpn by running
/usr/local/sbin/peervpn /etc/peervpn/peervpn.conf.l2
You should see the number of peers increasing after a few seconds

Step 7 Make peervpn a service

To make peervpn a service, as root run:
cd /lib/systemd/system
cat > peervpn.service <<!
[Unit]
Description=Start the VPN
Wants=lxd.service
After=lxd.service

[Service]
Type=simple
ExecStart=/usr/local/sbin/peervpn /etc/peervpn/peervpn.conf.l2

[Install]
WantedBy=multi-user.target
!
systemctl daemon-reload
service peervpn start
systemctl enable peervpn

Step 8 Check stuff

You should be able to ping the 172.16.0.[1,2,3] addresses from all machines. If you can't, then:
  • use brctl show and make sure peervpn0 is listed as an interface on lxdbr0 (if it isn't, see the fix below the example routes)
  • sudo service peervpn status and make sure that there are connected peers
  • ifconfig and make sure that the peervpn0 and lxdbr0 devices have IP addresses
  • ip route show and make sure that there is a route for the peervpn network associated with the VPN IP
  • ip route show and make sure that there is a route for the LXD network associated with the LXD bridge IP
Example routes
ip route show
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0  proto kernel  scope link  src 10.0.2.15
10.99.0.0/24 dev peervpn0  proto kernel  scope link  src 10.99.0.1
172.16.0.0/16 dev lxdbr0  proto kernel  scope link  src 172.16.0.1
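If peervpn0 is missing from the bridge (for example because peervpn started before lxdbr0 existed), you can add it back by hand; a one-liner to try:
sudo brctl addif lxdbr0 peervpn0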
Sometimes things fail because the services were not started in the correct order; try rebooting the host if that's the case.

Let's do some LXD tests

Get an image from the image store on each host, e.g.
lxc image copy ubuntu:xenial local: --copy-aliases --auto-update
If your internet connection is too slow to do that on every host, you can set core.https_address and core.trust_password on one host, add it as a remote on the others, and copy the image between hosts, as sketched below.
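A minimal sketch of that approach (the trust password and the 172.16.0.1 address are just examples; lxc remote add will prompt you to accept the certificate and for the password):
# on lxd1, which already has the image
lxc config set core.https_address "[::]:8443"
lxc config set core.trust_password secret
# on lxd2 and lxd3
lxc remote add lxd1 172.16.0.1
lxc image copy lxd1:xenial local: --copy-aliases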
Go to each host and create a container; make sure each container has a unique name,
e.g. lxc launch xenial c1
Using lxc list, the container should get an IP in the LXD network range (this may take a couple of seconds).
You should be able to ping between containers using the IP address or the DNS name (e.g. c1.lxd if your LXD_DOMAIN was set to lxd).
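For example, assuming a container called c2 was launched on one of the other hosts:
lxc exec c1 -- ping -c 3 c2.lxd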
This gist has an example of how to test multicast, which should work between containers.

Issues and improvements

  • There is a single point of failure in dnsmasq
  • It is possible to create a container with the same name on 2 hosts; they get unique IP addresses but only the last one is resolvable through DNS
  • A GUI for administering the cluster would be nice.
  • Peervpn is not one of the standard Ubuntu packages and it forces encryption, which adds overhead. Alternatives are mentioned in the Flockport article but I haven't experimented with them