Recently we wanted to deploy a bunch of Celery workers and connect them using a Redis backend for task scheduling and result storage.

Since the workers and Redis would be internet-facing, we wanted to make sure they were not leaking any data, so we ran a few tests.

Securing the communication with Redis

SSL

At the time of writing, Redis does not support SSL, and has no plans to support it. Period.

The Redis documentation has a clear and informative list of ways to make your Redis instance more secure, but its only suggestion for encrypting traffic is to use spiped to set up secure channels between our instances and Redis.

Since we are also building a high-availability system, and the number of workers can be high, this was quickly going to turn into the three utilities problem.

So, no chance for SSL, but we still noted some good tips on securing a Redis instance, so not all was lost.

VPNs

For some time, we thought about deploying VPNs, and we considered OpenVPN and WireGuard.

A high-availability OpenVPN setup is rather complicated, and we would not get seamless failover (if route X from A to B goes down, your client needs to know that Y is also a route from A to B, except that B is then called "D").

Not something our small team would feel comfortable doing and maintaining for ages.

WireGuard is basically spiped on steroids, and is not yet "production ready" by its authors' own admission. So not something we would recommend to our customers.

Giving up on networking

Since we couldn't agree on a cost-effective way to encrypt the network traffic, we decided to move on to encrypting the data before it went on the wire.

Basically if anyone could read our traffic, we would just need to encrypt what was sent so it would be unreadable without the proper key, which is second best to having SSL on Redis.

Kombu Fernet Serializers

While searching for a way to encrypt the content of a Celery request, we landed on this blog post from the Heroku blog, where they explain how to use a library with a funny name to do exactly what we wanted to do.

What luck! So we gave it a spin, and after some adaptations to our code, we started a dummy task... and this is the traffic we saw on the network:

23:19:03.799507 IP localhost.58354 > localhost.6379: Flags [P.], seq
194:1234, ack 66, win 12757, options [nop,nop,TS val 1301471707 ecr
1301471706], length 1040: RESP "LPUSH" "test" "{"body": "Z0FBQUFBQmJUb
DFIOXBfN295eUd4RnFpUVRGc1FBbDhUcHhtcm16OGQ1dXV6X2hPa3dPN3I5NEFyVkkzRHZ
QUVpHeDJRR1dCNy1NM1dUVkJyNFZRR09ZMkV2bm5qbjFreVhnU1ZKZ1VOLU9CS1V6Yy1Va
m9SNkZuUm1rUGZOU1RwQnF3eExWWFVPWjRsYjB3MXMtbVR0U1RBTllHVHVpZXM5cUpfNkF
0V2p5S2dfWGp6SDB1UXlHc0RRUExwVkY4VE5ReDVkYVRLQ0ZR", "content-
encoding": "utf-8", "content-type": "application/x-fernet-json",
"headers": {"lang": "py", "task": "bluesnake.tasks.simple.add", "id":
"37c8d7ce-0e62-4306-b89b-866f787c498a", "eta": null, "expires": null,
"group": null, "retries": 0, "timelimit": [null, null], "root_id":
"37c8d7ce-0e62-4306-b89b-866f787c498a", "parent_id": null, "argsrepr":
"(2, 3)", "kwargsrepr": "{}", "origin": "gen66855@guybrush.local"},
"properties": {"correlation_id": "37c8d7ce-0e62-4306-b89b-
866f787c498a", "reply_to": "2254248b-f276-3c6e-825d-b7c83d0e2e89",
"delivery_mode": 2, "delivery_info": {"exchange": "", "routing_key":
"test"}, "priority": 0, "body_encoding": "base64", "delivery_tag":
"5b382193-a938-4639-90e3-1ed5854fa1e1"}}"

If you compare this with what normally goes over the network, you might wonder what exactly is being encrypted...

23:17:06.175108 IP localhost.58345 > localhost.6379: Flags [P.], seq
194:1060, ack 66, win 12757, options [nop,nop,TS val 1301354432 ecr
1301354432], length 866: RESP "LPUSH" "test" "{"body": "W1syLCAzXSwge3
0sIHsiY2FsbGJhY2tzIjogbnVsbCwgImVycmJhY2tzIjogbnVsbCwgImNoYWluIjogbnVs
bCwgImNob3JkIjogbnVsbH1d", "content-encoding": "utf-8", "content-
type": "application/json", "headers": {"lang": "py", "task":
"bluesnake.tasks.simple.add", "id": "0be78bb8-d7dc-4dde-
a2b7-7579187ab952", "eta": null, "expires": null, "group": null,
"retries": 0, "timelimit": [null, null], "root_id": "0be78bb8-d7dc-
4dde-a2b7-7579187ab952", "parent_id": null, "argsrepr": "(2, 3)",
"kwargsrepr": "{}", "origin": "gen66794@guybrush.local"},
"properties": {"correlation_id": "0be78bb8-d7dc-4dde-
a2b7-7579187ab952", "reply_to": "95eaf326-0e9b-3324-a11e-
1855ba095e05", "delivery_mode": 2, "delivery_info": {"exchange": "",
"routing_key": "test"}, "priority": 0, "body_encoding": "base64",
"delivery_tag": "8f80d7e6-3a4e-417e-9243-940307ca7e4d"}}"

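Turns out it's only encrypting the body field of the message, which holds the arguments of the call (positional arguments, keyword arguments, and a few options such as callbacks). But since the arguments are also repeated (in clear) in the argsrepr field, and the task name sits in clear in the headers, you're basically encrypting nothing that isn't readable elsewhere in the message.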
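
To check what the body field holds, you can decode it yourself. The sketch below takes the base64 body copied from the unencrypted capture above and decodes it with the standard library only. The variable names are ours, and the three-element layout is simply what this particular message decodes to.

```python
import base64
import json

# Body field copied from the unencrypted capture above
# (content-type "application/json", body_encoding "base64").
captured_body = (
    "W1syLCAzXSwge3"
    "0sIHsiY2FsbGJhY2tzIjogbnVsbCwgImVycmJhY2tzIjogbnVsbCwgImNoYWluIjogbnVs"
    "bCwgImNob3JkIjogbnVsbH1d"
)

# The body is base64-encoded JSON: decode, then parse.
decoded = json.loads(base64.b64decode(captured_body))

# For this message the list has three elements.
args, kwargs, options = decoded
print("args:", args)
print("kwargs:", kwargs)
print("options:", options)
```

The same (2, 3) shows up in clear in the argsrepr header, which is why encrypting only the body buys so little.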

And then you are leaking

  • the origin field that contains the name of the node that emitted the request
  • the task name that gives out the code path that could be sensitive
  • the argsrepr and kwargsrepr that could contain sensitive data (if you run calculations by passing sensitive data to your Celery tasks)

Kombu Fernet (Redneck) Engineering

Thanks to that blog post, we learned about Fernet, and using it to encrypt data did seem like a good idea. Deploying keys to our workers and application servers would not be an issue at all, it definitely makes life harder for people listening to our network, and we don't need to secure Redis. If only it was not leaking so much data...

But wait. Why not just leave the default encoder, and encrypt the whole thing before it gets emitted to Redis?

Then you just decrypt the traffic as it comes back from Redis and before Celery can read it, and you're good, no?

The quest for the mighty serializer

You only have two ways to figure out how a library is communicating with a server: you look at where it (supposedly) connects to the server and instrument each function call until you find the one that actually sends the data, or you read the network capture, isolate the command (here LPUSH) that is being used to actually push the data, and grep for that command in the library codebase.

Obviously the second option has a better chance of success. Unfortunately I did not think this through and started with the first one.

Turns out that Celery is quite a complex piece of code, and that it implements a result backend for each supported store (AMQP, Redis, MongoDB, ...). Yes, you read that right: backend, as in the code that stores the results, while I was looking to encrypt the messages. Needless to say, that quest was a failure.

Then a coworker pointed me at this weird LPUSH command, casually mentioned it was not used often in the codebase, and two greps later I was editing the Kombu library, which is from the team behind Celery, and handles Celery messaging.
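
If you prefer to stay in Python, a rough equivalent of those greps could look like the sketch below. The function name find_command is made up for this post, and you have to point it at wherever the library sources live on your machine.

```python
from pathlib import Path


def find_command(root, needle="lpush"):
    """Return (path, line number, line) for every .py line mentioning needle."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        # Ignore undecodable bytes so a stray binary-ish file does not stop the search.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            # Case-insensitive match, like grep -i.
            if needle.lower() in line.lower():
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling find_command on the directory that contains the installed packages and reading the short list of hits is all there is to it.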

I then very quickly came across the kombu.utils.json module, which contains two lovely functions, dumps and loads, that Celery/Kombu calls to serialize the whole payload sent to Redis from the application server, and to read it back on the worker.

Literally 5 minutes later I managed to use Fernet to encrypt and decrypt the traffic going to Redis, and now it looks like this:

23:15:47.759061 IP localhost.58336 > localhost.6379: Flags [P.], seq
194:1648, ack 66, win 12757, options [nop,nop,TS val 1301276260 ecr
1301276260], length 1454: RESP "LPUSH" "test"
"gAAAAABbTlyD5JuayrIQMJWWIbbb-9s7jcajOzQojrgStM5r1Ti9gPUpIs-Pv6595YdO-
8nLZqnQqmUWRH8HxgS3zcCccjf0BsWmEJ0B517yQaBeooxY69nA4pZRMmmjj7zRwdPxTr6
JtbBzgunYYTHaon0BtMhBwe29ztsXXfieP36C0SlDO54sH0EWLAqXkP3Khyif0fDRSjqFf
oGrdGMgKdpV-W-nxUmA4n4W04-VtRSnQ8QXKnxhQ3JsWfaQ0JxorXgrplxrzzY1_TfuUjZ
HUmavPsKQHzxEmT_GWCZ7mgv9xWUqqqUaxTO84recgLDGGZcrJEMsWQbc6WvGECZPZ9TZe
XIAjqYbtkB9uNWBOnnN8Mj-xKf0EutB-
kVR9Urlpa9LAecaGUfTzzv0GflzVnQzPn3EhUtBszSoRwFUBuc_TO0sP4jNcyj-cvmCnXq
pWpmsueuwHZUtFi59qcOKygboG9feXt5MbLgIOkDXiYOZVDYyZeAa5jdsquu6WRHJwH3Ja
wPvwvYijQNvuNEgTSEA4oJoG9qjGCxb568D_gsOVFF51n_CLfuQ_992mw_O0_bR8kTmGbq
_9m3s2Cwr8cgLOeeq0RtjaaYSczS6ECkYOplS45UMkPnC-itsG4a27F7UzSjpqUhZYl7iw
h4L7QWU_ClsAzmjBPgvaR2lDaTljaO9EfHS3YQ8GlpTOZpoMC2L3FLEvFXdWJpA2gOIfSu
51cy3HdthTXw2METhawO_PsjJ7seGIVUpIB6NKQf6909dtfmqE7hv-
a2LX4eTM6wbrKlb8j2I6V1QH6V13KkndbBPrXshh3dISGV7T2n-ZJLHHr45fo7l6OJ48Cc
ghVyeVy6eOR9oneAFmK6YZbkBSMWSXxbW2b6wXE5ekdE64YsueG2qQ94jB0Fh8_7tWBKBp
1Y34Boj1ToQ9iT90UtUNyKe52ZgaoCfBjA56R-hnLKObtz-n1gTZhQimeIwuMlWr0Wcknn
t7c2UuHQr_82jsEJ4UYdKs6hisvu4jRjehfbJdI4kjyk055TzATNzKMQTGtB4VSv5-NN33
xRadx84BUgyvMQdZfopMtl3rZD5TtyXIkY0rpjPisq619qaSOKlBbvfv6xXRHbRe2IKNyz
aJRr_-7pEXtP8gQgj48MTU2TnuEqSgd8djp5p_Perth6Fa8LLz3b4P5Y9ZUq5XqwT6T6ec
EgWUISZvgzkpZTI7bCT4U5aLz3NgUofg0-5UYrIMktWxzPL-
Y5ewNpZ5qm9TxUAQVC3CeQwke6wWdcvUpwU07fmMd3H6Hw006vytERVv3E8JUvdg4O-
MhOcYUfzNejAG9Un_iT9Lyud0affsVxwMhPzjIeIfTpXyyaqDq-muntYcyZhjTt6T_Qym-
Tq9Q9JmCRGMzIJAZzsUMz6sVcA4b98"

This big blob is the previous payload, Fernet-encrypted, and readable only by one of our workers or application servers.
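
The patch itself is not reproduced here, so take the following as a minimal sketch of the idea rather than the actual change. It wraps a dumps/loads pair with Fernet from the cryptography package. The names encrypted_dumps and encrypted_loads are hypothetical, the standard json module stands in for kombu.utils.json, and the key is generated on the spot only to keep the example self-contained (in real life the same key has to be deployed to every worker and application server).

```python
import json

from cryptography.fernet import Fernet

# Assumption for the example: a throwaway key. A real setup would load one
# shared key from configuration on every worker and application server.
KEY = Fernet.generate_key()
fernet = Fernet(KEY)


def encrypted_dumps(obj, inner_dumps=json.dumps):
    """Serialize obj as usual, then encrypt the whole string into one token."""
    plaintext = inner_dumps(obj)
    # Fernet works on bytes and returns a URL-safe base64 token (bytes).
    return fernet.encrypt(plaintext.encode("utf-8")).decode("ascii")


def encrypted_loads(token, inner_loads=json.loads):
    """Decrypt the token, then hand the plaintext to the usual deserializer."""
    if isinstance(token, str):
        token = token.encode("ascii")
    # decrypt() raises cryptography.fernet.InvalidToken on a wrong key
    # or a tampered token.
    return inner_loads(fernet.decrypt(token).decode("utf-8"))


# Stand-in for a Celery message: headers and body all go through together.
message = {
    "body": "W1syLCAzXSwge30=",
    "headers": {"task": "bluesnake.tasks.simple.add", "argsrepr": "(2, 3)"},
}
token = encrypted_dumps(message)
restored = encrypted_loads(token)
```

Because the whole serialized message goes through Fernet, the task name, argsrepr and origin fields end up inside the token instead of next to it, which is what the capture above shows.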

Time spent on the issue: 4h, which is actually pretty fast compared to the time it would take to set up and operate an HA VPN network.

I still haven't checked that this is sufficient to encrypt the results of the tasks, but if it is, I'll probably release a piece of code to patch Kombu so you can store data in Redis in a secure manner and, like me, enjoy using Celery over the internet safely.