This morning I woke up to the sound of notifications from my monitoring system.

We have a pretty simple setup:

  • web servers connect to the vpn server
  • vpn server connects to the internal servers
  • web and vpn are hosted for our customer by a cloud provider
  • internal servers are hosted at our customer's datacenter

Around 7:30 AM, a group of technicians decided to make a major change to the network configuration of all machines linked to our customer's subscription at the cloud provider.

This change was supposed to be unrelated to our innocent boxes, but it turns out that after this re-configuration, we were in a funny situation:

  • web could no longer connect to vpn
  • vpn could no longer connect to web either
  • but web and vpn could each connect to, and be reached from, anywhere else on the Internet.

If you can't take the direct road ...

As expected, our customer was not happy about the situation, and we were given until the end of the day to make it work again, no matter what.

Recap of the situation:

  • we have about 20 web servers that are on a 10.10.0.0/16 network
  • the vpn machine is physically wired to both the Internet and the internal customer network, so we have to use it.
  • the openvpn network running on vpn has the 10.30.0.0/16 range
  • the internal network has a lot of subnets of various sizes (most are /16)

We decided to spin up another server, newvpn: traffic goes from the web network to newvpn, then from newvpn to vpn, and from there we're back in business.

It's quite a stretch, but we had an almost-working situation that just needed a little nudge.

To connect newvpn to vpn, and since there was only going to be one peer in the network, we chose WireGuard instead of using openvpn again. Its website states that it is not yet production ready, but it is currently being reviewed for integration directly into the Linux kernel, and past experience had proved it resilient enough for our usage.

WireGuard under Centos7 caveats

WireGuard is packaged for an abundance of distributions, which was a treat since vpn runs Ubuntu while the rest of our infrastructure runs Centos7.

[[email protected]]# curl -Lo /etc/yum.repos.d/wireguard.repo https://copr.fedorainfracloud.org/coprs/jdoss/wireguard/repo/epel-7/jdoss-wireguard-epel-7.repo
[[email protected]]# yum install epel-release
[[email protected]]# yum install kernel kernel-headers dkms
[[email protected]]# yum install wireguard-dkms wireguard-tools

I ran into an annoying issue right after installing it:

[[email protected]]# ip link add dev wg0 type wireguard
RTNETLINK answers: Operation not supported

What did I miss? Was there some Centos-specific incantation that I had overlooked? No, I had followed all the steps as described: I had installed the kernel, the kernel headers, dkms, etc.

But still, the wireguard kernel module was not being found, as modprobe would confirm:

[[email protected]]# modprobe wireguard
modprobe: FATAL: Module wireguard not found.

Then some old proverb struck my mind!

[[email protected]]# reboot now
...
[[email protected]]$ modprobe wireguard
modprobe: ERROR: could not insert 'wireguard': Operation not permitted

# oops! but encouraging!

[[email protected]]# modprobe wireguard
[[email protected]]#

Et voilà!
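
A likely explanation for the reboot helping (an assumption, not something verified here) is that yum install kernel pulled in a newer kernel and DKMS built the module for that one, while the machine was still running the old one. A minimal sketch to check this before rebooting, assuming the module ends up somewhere under /lib/modules/<release>/; module_built_for is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a wireguard module file exists for the
# given kernel release. Assumes modules live under /lib/modules/<release>/.
# The optional second argument overrides that root directory.
module_built_for() {
    release="$1"
    root="${2:-/lib/modules}"
    find "$root/$release" -name 'wireguard.ko*' 2>/dev/null | grep -q .
}

# Example: compare against the kernel that is actually running.
if module_built_for "$(uname -r)"; then
    echo "wireguard module found for running kernel $(uname -r)"
else
    echo "no wireguard module for running kernel $(uname -r): reboot into the new kernel first"
fi
```

If the module is only present for a release other than the one printed by uname -r, modprobe cannot find it until the machine boots into that kernel.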

Setting up WireGuard

The rest of the setup went pretty much like what is described in the quickstart, so I'll just post edited versions of my configurations here:

On vpn server

The first step was to stop the openvpn service that was running on vpn, to avoid further confusion, and to install WireGuard.

[[email protected]]# service openvpn stop

In /etc/wireguard/wg0.conf

[Interface]
Address = 192.168.1.1/32
ListenPort = 3000
PrivateKey = ABCDEFG


[Peer]
PublicKey = KLMNOP
Endpoint = x.x.x.x:7000
AllowedIPs = 192.168.1.2/32

This is pretty clear: we declare a new interface called wg0 (named after the file) that listens for WireGuard traffic on UDP port 3000.
This end of the tunnel gets the IP 192.168.1.1/32, and the [Peer] section describes the other end, newvpn.

iptables rules

iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -j MASQUERADE

This means "whatever comes from WireGuard and its weird-looking IP range, send it where it wants to go, rewriting the source address to this machine's own so that replies find their way back".

in /etc/sysctl.conf

net.ipv4.conf.all.proxy_arp=1
net.ipv4.ip_forward=1
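
Editing /etc/sysctl.conf does not change the running kernel by itself; the settings are loaded at boot or with sysctl -p. A small sketch to confirm that forwarding is really on. The helper name forwarding_enabled is hypothetical, and its optional argument only exists so that it can be tried against a test file:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given file (by default the kernel's
# IPv4 forwarding switch) contains 1.
forwarding_enabled() {
    file="${1:-/proc/sys/net/ipv4/ip_forward}"
    [ "$(cat "$file" 2>/dev/null)" = "1" ]
}

# To reload /etc/sysctl.conf first (needs root): sysctl -p
if forwarding_enabled; then
    echo "IPv4 forwarding is enabled"
else
    echo "IPv4 forwarding is disabled: run 'sysctl -p' as root"
fi
```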

On newvpn server

I am not going to describe how to set up openvpn here, since there is already an abundance of documentation and blog posts on the subject. I will just share the bits of configuration that are relevant for this example.

in /etc/wireguard/wg0.conf

[Interface]
Address = 192.168.1.2/24
ListenPort = 7000
PrivateKey = WXYZ

[Peer]
PublicKey = QRSTUV
Endpoint = y.y.y.y:3000
AllowedIPs = 192.168.1.1/32, 10.11.0.0/16, 10.45.0.0/16, 171.13.0.0/16, 10.32.0.0/16, 20.115.18.214/32, 20.70.0.0/16, 10.66.68.0/24

This is the important part: WireGuard can take care of setting up all your routes for you, as long as you declare which routes are reachable through which peer in the interface configuration file.
You can list any range that can be routed from your exit node (here vpn), and it will happily set up your clients to send the traffic to the right place, even if those IPs are completely alien to the range used for the private WireGuard network (I used 192.168.1.x here, while the internal network only uses 10.x.x.x, 20.x.x.x or 171.x.x.x addresses).

The utility that is managing all of this for you is called wg-quick and you can just invoke it this way once you have written your configuration file: wg-quick up wg0.
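
To get a feeling for what wg-quick will do with a configuration, the sketch below prints (without running them) route commands comparable to the ones it adds for each AllowedIPs entry. It is only an approximation: the real script also deals with default routes, routing tables and IPv6. The helper names list_allowed_ips and preview_routes are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one CIDR per line for every AllowedIPs entry
# found in the given WireGuard configuration file.
list_allowed_ips() {
    sed -n 's/^[[:space:]]*AllowedIPs[[:space:]]*=[[:space:]]*//p' "$1" |
        tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -v '^$'
}

# Hypothetical helper: print (not run) one "ip route add" line per range.
# The interface name is derived from the file name, as wg-quick does.
preview_routes() {
    conf="$1"
    iface=$(basename "$conf" .conf)
    list_allowed_ips "$conf" | while read -r cidr; do
        printf 'ip route add %s dev %s\n' "$cidr" "$iface"
    done
}

# Example: preview the routes for wg0 when the file is readable.
if [ -r /etc/wireguard/wg0.conf ]; then
    preview_routes /etc/wireguard/wg0.conf
fi
```

Once the interface is up, wg show wg0 and ip route let you compare this preview with what was actually set up.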

in /etc/openvpn/servers.conf

...
server  10.30.0.0 255.255.0.0
...
# Service 1
push "route 10.11.0.0  255.255.0.0"
# Service 2
push "route 10.45.0.0  255.255.0.0"
# Service 3
push "route 171.13.0.0 255.255.0.0"
push "route 10.32.0.0 255.255.0.0"
# Service 4
push "route 20.115.18.214 255.255.255.255"
# Service 5
push "route 20.70.0.0 255.255.0.0"
# Service 6
push "route 10.66.68.0 255.255.255.0"
...

This network topology is going to be pushed to each VPN client saying "if you are looking for this range of IPs, then ask me". "me" is in this case newvpn.
Since WireGuard is routing the exact same ranges through its own interface, newvpn is just transparently passing packets from web to vpn.
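
Since the WireGuard AllowedIPs list and the openvpn push directives have to describe the same ranges, one option is to generate the latter from the former so they cannot drift apart. A minimal sketch in POSIX shell, assuming IPv4 CIDR notation only; prefix_to_netmask and allowed_ips_to_push_routes are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helper: convert a prefix length (0-32) to a dotted netmask.
prefix_to_netmask() {
    prefix="$1"
    mask=""
    for i in 1 2 3 4; do
        if [ "$prefix" -ge 8 ]; then
            octet=255
            prefix=$((prefix - 8))
        else
            octet=$((256 - (1 << (8 - prefix))))
            prefix=0
        fi
        mask="${mask:+$mask.}$octet"
    done
    printf '%s\n' "$mask"
}

# Hypothetical helper: turn a comma-separated AllowedIPs value into
# openvpn push "route ..." lines. A bare IP is treated as a /32.
allowed_ips_to_push_routes() {
    printf '%s\n' "$1" | tr ',' '\n' | while read -r cidr; do
        [ -n "$cidr" ] || continue
        case "$cidr" in
            */*) ;;
            *) cidr="$cidr/32" ;;
        esac
        net=${cidr%/*}
        bits=${cidr#*/}
        printf 'push "route %s %s"\n' "$net" "$(prefix_to_netmask "$bits")"
    done
}

# Example with the ranges from this post. The tunnel peer 192.168.1.1/32 is
# left out on purpose, since it is not pushed to the openvpn clients.
allowed_ips_to_push_routes "10.11.0.0/16, 10.45.0.0/16, 171.13.0.0/16, 10.32.0.0/16, 20.115.18.214/32, 20.70.0.0/16, 10.66.68.0/24"
```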

in /etc/openvpn/jail/ccd/web-5.conf

ifconfig-push  10.30.0.5  255.255.0.0

This assigns a static IP in the openvpn network to the matching web server (here web-5); each web server gets its own file.
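
With about 20 web servers, writing these files by hand gets tedious. Here is a sketch that generates them, assuming (hypothetically) that the clients are named web-N and that web-N gets 10.30.0.N, as in the web-5 example; write_ccd_files is a made-up helper name. With the server directive, openvpn itself takes 10.30.0.1, so the numbering here starts at 2.

```shell
#!/bin/sh
# Hypothetical helper: write one ccd file per web server, from web-<first>
# to web-<last>, each pinning the client to 10.30.0.<N>.
write_ccd_files() {
    ccd_dir="$1"
    first="$2"
    last="$3"
    mkdir -p "$ccd_dir" || return 1
    i="$first"
    while [ "$i" -le "$last" ]; do
        printf 'ifconfig-push  10.30.0.%s  255.255.0.0\n' "$i" > "$ccd_dir/web-$i.conf"
        i=$((i + 1))
    done
}

# Example: generate into a scratch directory and review the result before
# copying the files to /etc/openvpn/jail/ccd/.
scratch=$(mktemp -d)
write_ccd_files "$scratch" 2 21
ls "$scratch"
```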

in /etc/sysctl.conf

net.ipv4.conf.all.proxy_arp=1
net.ipv4.ip_forward=1

iptables rules

iptables -A FORWARD -s 10.10.0.0/16 -j ACCEPT
iptables -A FORWARD -d 10.10.0.0/16 -j ACCEPT
iptables -t nat -A POSTROUTING -o tun0 -j MASQUERADE

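The FORWARD rules accept traffic coming from or going to the web network, and the MASQUERADE rule rewrites the source address of everything leaving through tun0 (the openvpn interface), so that traffic gets forwarded where it wants to go (we could probably have used -s 10.10.0.0/16 instead of -o tun0).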

Don't ask me why we need to specify those FORWARD rules here and not on the WireGuard server side; I don't know. But I do know that openvpn routing does not work without them.

One last challenge awaits

After running wg-quick on each server and starting openvpn on newvpn, I could ping internal services from my web boxes!

[email protected]:~$ ping 10.45.13.62
PING 10.45.13.62 (10.45.13.62) 56(84) bytes of data.
64 bytes from 10.45.13.62: icmp_seq=1 ttl=104 time=57.9 ms
64 bytes from 10.45.13.62: icmp_seq=2 ttl=104 time=56.0 ms
^C
--- 10.45.13.62 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 56.081/56.996/57.911/0.915 ms

This goes through tun0, exits in newvpn, gets forwarded by WireGuard to vpn, which is physically connected to the 10.45.0.0/16 network, and back!

But just as I was ready to call it a day, a co-worker told me that he could not reach the service in question, even though ping was indeed doing its job.

[email protected]:~$ curl -v http://10.45.13.62
* Rebuilt URL to: http://10.45.13.62/
*   Trying 10.45.13.62...
* connect to 10.45.13.62 port 80 failed: No route to host
* Failed to connect to 10.45.13.62 port 80: No route to host
* Closing connection 0
curl: (7) Failed to connect to 10.45.13.62 port 80: No route to host

So you're telling me that ICMP traffic can reach the host, but TCP traffic cannot?
This is definitely not a routing issue; it sounds an awful lot like a firewall issue.
Let's check the iptables rules just one more time ...

[[email protected]]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere  anywhere  ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
INPUT_direct  all  --  anywhere             anywhere
INPUT_ZONES_SOURCE  all  --  anywhere         anywhere
INPUT_ZONES  all  --  anywhere     anywhere
DROP       all  --  anywhere    anywhere             ctstate INVALID
REJECT     all  --  anywhere     anywhere    reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere      anywhere   ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
FORWARD_direct  all  --  anywhere             anywhere
FORWARD_IN_ZONES_SOURCE  all  --  anywhere             anywhere
FORWARD_IN_ZONES  all  --  anywhere             anywhere
FORWARD_OUT_ZONES_SOURCE  all  --  anywhere             anywhere
FORWARD_OUT_ZONES  all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere             ctstate INVALID
REJECT     all  --  anywhere     anywhere   reject-with icmp-host-prohibited
ACCEPT     all  --  anywhere             10.10.0.0/16
ACCEPT     all  --  10.10.0.0/16         anywhere
....

I can see the rules that I added for forwarding traffic, and I don't know what I'm doing wrong here.

Except maybe ...

See those lines:

REJECT     all  --  anywhere     anywhere   reject-with icmp-host-prohibited
ACCEPT     all  --  anywhere             10.10.0.0/16
ACCEPT     all  --  10.10.0.0/16         anywhere

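What it means is: "reject every packet that reaches this rule, answering with an ICMP host-prohibited error, then, for the packets that haven't been rejected, forward them". Rules are evaluated in order, so my two ACCEPT rules were never reached. That ICMP error is what curl reported as "No route to host"; ping presumably got through because an earlier chain already accepted ICMP.

This is basically killing the TCP traffic before doing the relay, while it would make more sense to accept the traffic to be relayed first, then reject the remainder, while keeping ICMP working for debugging purposes.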

Turns out it's pretty annoying to edit iptables by hand using the command line, so I just ran a quick iptables-save > /etc/sysconfig/iptables.
Then I swapped the order of the rules so it reads:

ACCEPT     all  --  anywhere             10.10.0.0/16
ACCEPT     all  --  10.10.0.0/16         anywhere
REJECT     all  --  anywhere     anywhere   reject-with icmp-host-prohibited

Then finally ran iptables-restore < /etc/sysconfig/iptables.
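
The ordering problem can also be spotted, and fixed, without the save and restore round trip. The sketch below reads the output of iptables -S FORWARD and reports whether a REJECT rule comes before the first ACCEPT rule for a given subnet. It is deliberately simplistic (any REJECT rule counts, whatever its match conditions), and reject_shadows is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `iptables -S FORWARD` output on stdin and
# succeeds when a REJECT rule appears before the first ACCEPT rule that
# mentions the subnet given as $1.
reject_shadows() {
    awk -v subnet="$1" '
        /^-A FORWARD / {
            n++
            if (!reject && $0 ~ /-j REJECT/) reject = n
            if (!accept && $0 ~ /-j ACCEPT/ && index($0, subnet)) accept = n
        }
        END { exit !(reject && accept && reject < accept) }
    '
}

# Example with the situation from this post (sample text, not live rules):
printf '%s\n' \
    '-A FORWARD -j REJECT --reject-with icmp-host-prohibited' \
    '-A FORWARD -d 10.10.0.0/16 -j ACCEPT' \
    '-A FORWARD -s 10.10.0.0/16 -j ACCEPT' |
    reject_shadows 10.10.0.0/16 && echo "ACCEPT rules are shadowed by REJECT"

# On the live system (as root), inserting at the top of the chain instead of
# appending avoids the problem in the first place:
#   iptables -I FORWARD 1 -s 10.10.0.0/16 -j ACCEPT
#   iptables -I FORWARD 1 -d 10.10.0.0/16 -j ACCEPT
```

On the real machine the check would be fed with iptables -S FORWARD | reject_shadows 10.10.0.0/16.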

All services reconnected properly, and traffic was reaching the internal network again.

Not too bad for an afternoon of work. I ran into an impressive number of quirks (setting aside the initial cataclysm that triggered this whole operation), and was surprised that they were not more thoroughly documented.

I hope this helps you if you are also dealing with routing issues on Centos7 while using WireGuard and OpenVPN in conjunction. You can do it!