Nginx shim without the shim

Hey-o. Me again. This time I have a good old trick for checking on traffic between Nginx and some proxied application. Generally this is for helping to debug applications that are proxied over a local Unix socket. Why? Because if it were proxied over a local TCP connection, you could just point tcpdump at it and there would be no issue, would there?

Why is this hard?

Well, this is hard because a Unix socket only has two ends; you can't just "attach" to it and read it like a file or something. Once you read a message, it's gone. So if you were to actually "attach" and read the requests... well, your app isn't going to get them. Not so great for invisibly inspecting requests (not for spying, just for debugging weird issues. Weird shit happens a lot).

So, if we can't read the socket, how do we read the socket?

I'm glad you asked that, because there is a way. And, it's not even painful.

When I first tried to do this, I found a lot of bad suggestions on the gargler to write some weird sort of "shim" script-thing. To me, that sounded like a pretty damn terrible idea, because I was trying to debug an issue in a live prod environment. Any shim I could write would not only have to leave the data untouched, it would also have to add no delay and no throughput bottleneck. That's a fairly tall order for what was essentially an issue with HTTP headers. So, I thought about it for a bit.

Ok, so, what do I do?

What you do, and what I did, was realize that Nginx makes system calls to read and write to the Unix socket. And system calls mean strace.

Now, I'm sure that if you leave strace attached to a process for a long time it'll cause some performance issues, but on a busy server I'm generally only debugging for maybe 5-10 seconds at a time. If you redirect the output to a file, or just grep over it, you can usually find what you're looking for pretty quickly.

The command to do this is: strace -p $(pgrep -f "nginx: worker") -s 1024
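One wrinkle worth knowing: pgrep will usually return several worker PIDs, and passing them to strace unquoted can confuse it. A sketch of how I'd wrap the same command for a short capture (the trace file path and the 10-second timeout are my own arbitrary choices):

```shell
# Join the worker PIDs into a comma-separated list; newer strace versions
# accept such a list after -p.
PIDS=$(pgrep -f "nginx: worker" | paste -sd, -)

# Attach for ~10 seconds, keep up to 1024 bytes of each buffer, and only
# record the syscalls we care about. -o writes the trace to a file.
timeout 10 strace -p "$PIDS" -s 1024 -e trace=recvfrom,writev,connect -o /tmp/nginx.trace

# Then grep for whatever you're chasing, e.g. a header:
grep Authorization /tmp/nginx.trace
```

Run it as root (or at least as a user allowed to ptrace the workers), or strace will just tell you "Operation not permitted".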

Quite magnificent.

The syscalls you're looking for in the output are writev and recvfrom. If you want to be sure you're looking at the right messages, you can compare the FD numbers using lsof. Generally, though, you can get a good idea of what's going on just from reading the headers.
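Mapping an FD number like 14 back to the actual socket is quick. A sketch, assuming the same pgrep pattern as before (lsof's -U flag selects Unix sockets, and -a ANDs it with the -p filter):

```shell
# Grab one worker PID and list its open Unix sockets; the NAME column
# shows the socket path next to the FD number.
WORKER=$(pgrep -f "nginx: worker" | head -n1)
lsof -p "$WORKER" -a -U

# No lsof handy? /proc exposes the same FDs as symlinks:
ls -l "/proc/$WORKER/fd"
```

Either way, you can confirm that FD 14 really is the socket your app is listening on before you trust the trace.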

The guts of it all.

Here's a look in-motion of a request to a gunicorn worker (I'll break it into eatable pieces):

Here is the request that comes into Nginx over the network. You can also tell that it came from HAProxy, because of the unique header I enabled there (for reasons unrelated to this). I'm also obfuscating the headers and such, because I'm mean like that.

recvfrom(4, "GET /widget HTTP/1.1\r\nHost: an.example.com\r\nUser-Agent: Go-http-client/1.1\r\nContent-Length: 316\r\nAccept: application/json\r\nAuthorization: Token tokens-for-days\r\nAccept-Encoding: gzip\r\nX-Unique-Haproxy-ID: this-is-a-unique-headerE\r\nX-Forwarded-For: 255.255.0.0\r\nConnection: close\r\n\r\nchannel_id=channel&channel_name=channel&command=%2Fwidget&response_url=http%3A%2F%2Fan.example.com%2Fthing&team_domain=example&team_id=such_an_id&text=nope&token=such-a-token&user_id=such-a-user&user_name=waff", 1024, 0, NULL, NULL) = 638

Here is what the request looks like after Nginx has processed it by its own rules and written it out to the local Unix socket (it also connected to it in this case):

connect(14, {sa_family=AF_FILE, path="/tmp/an.example.com_wsgi_application.sock"}, 110) = 0
writev(14, [{"GET /widget HTTP/1.0\r\nX-Real-IP: 127.0.0.1\r\nX-Forwarded-For: 255.255.0.0, 127.0.0.1\r\nHost: an.example.com\r\nX-NginX-Proxy: true\r\nConnection: close\r\nUser-Agent: Go-http-client/1.1\r\nContent-Length: 316\r\nAccept: application/json\r\nAuthorization: Token tokens-for-days\r\nAccept-Encoding: gzip\r\nX-Unique-Haproxy-ID: this-is-a-unique-header\r\n\r\n", 376}, {"channel_id=channel&channel_name=channel&command=%2Fwidget&response_url=http%3A%2F%2Fan.example.com%2Fthing&team_domain=example&team_id=such_an_id&text=nope&token=such-a-token&user_id=such-a-user&user_name=waff", 316}], 2) = 692

This is where your gunicorn worker would sit and ponder life, and eventually answer you. The next line is the output of gunicorn back onto the unix socket:

recvfrom(14, "HTTP/1.0 200 OK\r\nServer: gunicorn/18.0\r\nDate: Wed, 14 Mar 2018 21:07:21 GMT\r\nConnection: close\r\nContent-Type: application/json\r\nContent-Length: 138\r\n\r\n{\"text\": \"https://78.media.tumblr.com/3477b2b546f9478d3f25440141084a58/tumblr_o291y6EwSl1u67qr6o1_250.gif\", \"response_type\": \"in_channel\"}", 4096, 0, NULL, NULL) = 289

If you care, that's a pretty sweet gif, I'm just too lazy to embed or link to it directly. After that, you also see Nginx forwarding the response back out, after it monkeys with it:

writev(4, [{"HTTP/1.1 200 OK\r\nServer: nginx/1.0.15\r\nDate: Wed, 14 Mar 2018 21:07:21 GMT\r\nContent-Type: application/json\r\nConnection: close\r\nContent-Length: 138\r\n\r\n", 150}, {"{\"text\": \"https://78.media.tumblr.com/3477b2b546f9478d3f25440141084a58/tumblr_o291y6EwSl1u67qr6o1_250.gif\", \"response_type\": \"in_channel\"}", 138}], 2) = 288

And then it goes back out, merrily on its way to the internets.

And that's about how that works.