This morning I woke up to the sound of my monitoring system's notifications.
We have a pretty simple setup:
web servers connect to the
vpn server connects to the
vpn are hosted for our customer by a cloud provider
internal servers are hosted at our customer's datacenter
Around 7:30 AM, a group of technicians decided to make a major change in the network configuration of all machines linked to our customer's subscription at the cloud provider.
This change was supposed to be unrelated to our innocent boxes, but it turns out that after this re-configuration, we were in a funny situation:
web could no longer connect to
vpn could no longer connect to
vpn could connect / be connected from anywhere else on the Internet.
If you can't take the direct road ...
As expected, our customer was not happy about the situation, and we were given until the end of the day to make it work again, no matter what.
Recap of the situation:
- we have about 20 web servers that are on a
vpn machine is physically wired to both the Internet and the internal customer network, so we have to use it.
- the openvpn network running on
internal network has a lot of subnets of various sizes (most are /16)
We decided to spin up another server, to go from the web network to the newvpn network, then from vpn and from there we're back in business.
It's quite a stretch, but we had an almost-working situation that just needed a little nudge.
vpn instead of using openvpn again, and since there was only going to be one peer in the network, we chose to use WireGuard instead. Its website claims it is not yet production ready, but it is currently being reviewed for integration directly into the Linux kernel, and past experience has proved it resilient enough for our usage.
WireGuard under Centos7 caveats
WireGuard comes with an abundance of packages to install from, and this was a treat since vpn runs Ubuntu while the rest of our infrastructure runs Centos7.
# curl -Lo /etc/yum.repos.d/wireguard.repo https://copr.fedorainfracloud.org/coprs/jdoss/wireguard/repo/epel-7/jdoss-wireguard-epel-7.repo
# yum install epel-release
# yum install kernel kernel-headers dkms
# yum install wireguard-dkms wireguard-tools
Speaking of which, I encountered an annoying issue right after installing it:
# ip link add dev wg0 type wireguard
RTNETLINK answers: Operation not supported
What did I miss? Was there some Centos-specific incantation that I overlooked? No, I did all the steps as described, I had installed the kernel, the kernel headers,
But still, the wireguard kernel module was not being found, as modprobe would confirm:
# modprobe wireguard
modprobe: FATAL: Module wireguard not found.
Then an old proverb came to mind!
# reboot now
...
$ modprobe wireguard
modprobe: ERROR: could not insert 'wireguard': Operation not permitted # oops! but encouraging!
# modprobe wireguard
#
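A likely explanation, though nothing above confirms it, is that yum install kernel pulled in a newer kernel and the DKMS module was built for that one, while the machine kept running the old kernel until the reboot. Here is a minimal sketch of a check that would have pointed in that direction. module_built_for is a hypothetical helper name, and it relies on modinfo's -k option to look a module up for a specific kernel release:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a wireguard module is installed for the
# given kernel release (modinfo -k looks under /lib/modules/<release>).
module_built_for() {
    modinfo -k "$1" wireguard >/dev/null 2>&1
}

# Kernel release that is currently running
running="$(uname -r)"

if module_built_for "$running"; then
    echo "wireguard module found for running kernel $running"
else
    echo "no wireguard module for running kernel $running"
    echo "installed kernel trees:"
    # Listing the trees shows whether a newer kernel is waiting for a reboot
    ls /lib/modules 2>/dev/null || true
fi
```

If the listing shows a kernel release newer than the running one, rebooting into it is the cheap thing to try first.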
Setting up WireGuard
The rest of the setup went pretty much like what is described in the quickstart, so I'll just post edited versions of my configurations here:
First step was to disable openvpn that was running from vpn to avoid further confusion, and install WireGuard.
# service openvpn stop
[Interface]
Address = 192.168.1.1/32
ListenPort = 3000
PrivateKey = ABCDEFG

[Peer]
PublicKey = KLMNOP
Endpoint = x.x.x.x:7000
AllowedIPs = 192.168.1.2/32
This is pretty clear: we declare a new interface called wg0 (thanks to the name of the file), that will be serving WireGuard service over port
This end of the connection will be associated with the IP
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -j MASQUERADE
This means "whatever comes out of WireGuard and its weird-looking IP range, just forward it to where it wants to go".
I am not going to describe how to set up openvpn here since there is already an abundance of available documentation and blog posts on the subject. I will just share the bits of configuration that are relevant for this example.
[Interface]
Address = 192.168.1.2/24
ListenPort = 7000
PrivateKey = WXYZ

[Peer]
PublicKey = QRSTUV
Endpoint = y.y.y.y:3000
AllowedIPs = 192.168.1.1/32, 10.11.0.0/16, 10.45.0.0/16, 184.108.40.206/16, 10.32.0.0/16, 220.127.116.11/32, 18.104.22.168/16, 10.66.68.0/24
This is the important part: WireGuard can take care of setting up all your routes for you, as long as you declare what routes are accessible through what peer in the interface configuration file.
You can mention any route that can be routed from your exit node (here vpn) and it will happily set up your clients to send the traffic to the right place, even if the IPs you mention are completely alien to the IP range you are using for your private WireGuard network (I used 192.168.1.x here and on the internal network I'm only using
The utility that is managing all of this for you is called wg-quick and you can just invoke it this way once you have written your configuration file: wg-quick up wg0.
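To make the routing behaviour concrete, here is a small sketch that prints roughly the routes wg-quick would derive from the AllowedIPs lines of a configuration file. allowed_routes is a hypothetical helper, not part of WireGuard, and the real wg-quick does more (it skips routes that already exist, for instance). The commands are only printed, nothing is applied:

```shell
#!/bin/sh
# Hypothetical helper: read a wg-quick style config on stdin and print one
# "ip route add" line per AllowedIPs entry for the given interface name.
allowed_routes() {
    dev="$1"
    # Keep only the value part of each AllowedIPs line
    sed -n 's/^[[:space:]]*AllowedIPs[[:space:]]*=[[:space:]]*//p' |
        # One CIDR per line instead of a comma-separated list
        tr ',' '\n' |
        while read -r cidr; do
            # read already trims the surrounding blanks; skip empty entries
            if [ -n "$cidr" ]; then
                echo "ip route add $cidr dev $dev"
            fi
        done
}
```

Assuming the configuration lives in /etc/wireguard/wg0.conf, running allowed_routes wg0 < /etc/wireguard/wg0.conf before wg-quick up wg0 gives a quick preview of where traffic is about to be sent.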
...
server 10.30.0.0 255.255.0.0
...
# Service 1
push "route 10.11.0.0 255.255.0.0"
# Service 2
push "route 10.45.0.0 255.255.0.0"
# Service 3
push "route 22.214.171.124 255.255.0.0"
push "route 10.32.0.0 255.255.0.0"
# Service 4
push "route 126.96.36.199 255.255.255.255"
# Service 5
push "route 188.8.131.52 255.255.0.0"
# Service 6
push "route 10.66.68.0 255.255.255.0"
...
This network topology is going to be pushed to each VPN client saying "if you are looking for this range of IPs, then ask me". "me" is in this case
Since WireGuard is routing the exact same ranges through its own interface, newvpn is just transparently passing packets from
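Since both sides have to agree on the ranges, a small helper can derive the CIDR list for AllowedIPs from the push "route ..." lines of the openvpn server configuration. This is only a sketch: mask_to_prefix and push_routes_to_cidr are hypothetical names, and the mask conversion assumes ordinary contiguous netmasks.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted netmask (255.255.0.0) to a prefix
# length (16). Assumes a contiguous mask; it does not validate octet order.
mask_to_prefix() {
    prefix=0
    old_ifs="$IFS"
    IFS=.
    set -- $1          # split the mask into its four octets
    IFS="$old_ifs"
    for octet in "$@"; do
        case "$octet" in
            255) prefix=$((prefix + 8)) ;;
            254) prefix=$((prefix + 7)) ;;
            252) prefix=$((prefix + 6)) ;;
            248) prefix=$((prefix + 5)) ;;
            240) prefix=$((prefix + 4)) ;;
            224) prefix=$((prefix + 3)) ;;
            192) prefix=$((prefix + 2)) ;;
            128) prefix=$((prefix + 1)) ;;
            0) ;;
            *) echo "invalid netmask octet: $octet" >&2; return 1 ;;
        esac
    done
    echo "$prefix"
}

# Hypothetical helper: read an openvpn server config on stdin and print each
# pushed route as network/prefix, one per line.
push_routes_to_cidr() {
    sed -n 's/^[[:space:]]*push "route \([0-9.]*\) \([0-9.]*\)".*/\1 \2/p' |
        while read -r net mask; do
            bits="$(mask_to_prefix "$mask")" || continue
            echo "$net/$bits"
        done
}
```

Feeding the server configuration through push_routes_to_cidr and comparing the result with the AllowedIPs line is a cheap way to catch a range that was added on one side only.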
ifconfig-push 10.30.0.5 255.255.0.0
This pushes the static IP of each web server in the openvpn network.
iptables -A FORWARD -s 10.10.0.0/16 -j ACCEPT
iptables -A FORWARD -d 10.10.0.0/16 -j ACCEPT
iptables -t nat -A POSTROUTING -o tun0 -j MASQUERADE
This will accept all traffic incoming from tun0 (the openvpn exit point), and forward it where it wants to go (we could probably have used -s 10.10.0.0/24 instead of
Don't ask me why we need to specify those FORWARD rules here and not on the WireGuard server side; I don't know. But I know openvpn routing does not work without them.
One last challenge awaits
After running wg-quick on each server and starting openvpn on newvpn, I could ping internal services from my web boxes!
$ ping 10.45.13.62
PING 10.45.13.62 (10.45.13.62) 56(84) bytes of data.
64 bytes from 10.45.13.62: icmp_seq=1 ttl=104 time=57.9 ms
64 bytes from 10.45.13.62: icmp_seq=2 ttl=104 time=56.0 ms
^C
--- 10.45.13.62 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 56.081/56.996/57.911/0.915 ms
This goes through tun0, exits in newvpn, gets forwarded by WireGuard to vpn, which is physically connected to the 10.45.0.0/16 network, and back!
But as I am ready to call it a day, a co-worker tells me that he cannot reach the service in question, even though ping is indeed doing its job.
$ curl -v http://10.45.13.62
* Rebuilt URL to: http://10.45.13.62/
* Trying 10.45.13.62...
* connect to 10.45.13.62 port 80 failed: No route to host
* Failed to connect to 10.45.13.62 port 80: No route to host
* Closing connection 0
curl: (7) Failed to connect to 10.45.13.62 port 80: No route to host
So you're telling me that ICMP traffic can reach the host, but TCP traffic cannot?
This is definitely not a routing issue, but sounds an awful lot like a firewall issue.
Let's check the iptables rules just one more time ...
# iptables -L
Chain INPUT (policy ACCEPT)
target                    prot opt source        destination
ACCEPT                    all  --  anywhere      anywhere      ctstate RELATED,ESTABLISHED
ACCEPT                    all  --  anywhere      anywhere
INPUT_direct              all  --  anywhere      anywhere
INPUT_ZONES_SOURCE        all  --  anywhere      anywhere
INPUT_ZONES               all  --  anywhere      anywhere
DROP                      all  --  anywhere      anywhere      ctstate INVALID
REJECT                    all  --  anywhere      anywhere      reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target                    prot opt source        destination
ACCEPT                    all  --  anywhere      anywhere      ctstate RELATED,ESTABLISHED
ACCEPT                    all  --  anywhere      anywhere
FORWARD_direct            all  --  anywhere      anywhere
FORWARD_IN_ZONES_SOURCE   all  --  anywhere      anywhere
FORWARD_IN_ZONES          all  --  anywhere      anywhere
FORWARD_OUT_ZONES_SOURCE  all  --  anywhere      anywhere
FORWARD_OUT_ZONES         all  --  anywhere      anywhere
DROP                      all  --  anywhere      anywhere      ctstate INVALID
REJECT                    all  --  anywhere      anywhere      reject-with icmp-host-prohibited
ACCEPT                    all  --  anywhere      10.10.0.0/16
ACCEPT                    all  --  10.10.0.0/16  anywhere
....
I can see the rules that I added for forwarding traffic, and I don't know what I'm doing wrong here.
Except maybe ...
See those lines:
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
ACCEPT all -- anywhere 10.10.0.0/16
ACCEPT all -- 10.10.0.0/16 anywhere
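What it means is: "reject every packet that reaches this point, answering with an ICMP host-prohibited error, then for the packets that haven't been rejected, forward them". Since iptables stops at the first rule that matches, my two ACCEPT rules sitting after the REJECT are never reached. That ICMP error is also what curl reports as "No route to host".
This is basically killing all traffic before doing the relay. ping presumably survives only because ICMP is accepted earlier, in one of the firewalld chains listed above. It would make more sense to accept the traffic to be relayed, then reject the remainder while keeping ICMP for debugging purposes.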
Turns out it's pretty annoying to edit iptables by hand using the command line, so I just ran a quick iptables-save > /etc/sysconfig/iptables.
Then I swapped the order of the rules so it reads:
ACCEPT all -- anywhere 10.10.0.0/16
ACCEPT all -- 10.10.0.0/16 anywhere
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
Then finally ran iptables-restore < /etc/sysconfig/iptables.
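The manual edit can also be scripted. Here is a minimal sketch, assuming the dump is in iptables-save format and that the catch-all rule appears as a line starting with -A FORWARD -j REJECT. move_reject_last is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read an iptables-save dump on stdin and move the
# catch-all FORWARD REJECT rule after every other rule of its table, so that
# ACCEPT rules appended later are evaluated first.
move_reject_last() {
    awk '
        # Hold the catch-all back instead of printing it in place
        /^-A FORWARD -j REJECT/ { held = held $0 "\n"; next }
        # Emit the held rule just before the end of the current table
        /^COMMIT$/ { printf "%s", held; held = "" }
        # Everything else passes through unchanged
        { print }
    '
}
```

Something like iptables-save | move_reject_last > /etc/sysconfig/iptables would then replace the hand edit. The result is still worth reading before feeding it to iptables-restore.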
All services re-connected properly, traffic was reaching the internal network again.
Not too bad for an afternoon of work. I ran into an impressive number of quirks (setting aside the initial cataclysm that triggered this whole operation), and was surprised to see they were not more thoroughly documented.
I hope this helps if you are also dealing with routing issues on Centos7 and using WireGuard and OpenVPN in conjunction. You can do it!