Wednesday, March 18, 2015

Debugging an HTTP 504 (Gateway Timeout) issue from Amazon ELB logs

Sometimes, when you are testing an application that is hosted on AWS and fronted by an ELB, you may see a "504 Gateway Timeout" error in the browser.


The sequence of events that leads up to the client-side 504 error is as follows:

1) The client connects to the ELB and submits an HTTP request.
2) The ELB picks a backend and forwards the request to it.
3) The backend receives the request and starts processing it.
4) While the backend is still processing the request, the client gives up waiting and closes the connection (proxies and other HTTP libraries sometimes report this as a 504).
5) The backend finishes processing the request and replies to the ELB.
6) Because the client has closed the connection, the ELB cannot pass the response on to the client and instead logs a 408 error.
7) A network packet capture on the client machine should show the client socket timing out.
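
Steps 3 to 5 can be reproduced locally with a minimal Python sketch. It assumes no ELB at all: a hypothetical slow_backend function stands in for the backend, and a hypothetical request_with_timeout function plays the impatient client. The 1.0 second delay and 0.2 second timeout are arbitrary illustration values, not ELB defaults.

```python
import socket
import threading
import time

def slow_backend(server_sock, delay):
    """Accept one connection and reply only after `delay` seconds."""
    conn, _ = server_sock.accept()
    with conn:
        conn.recv(1024)        # read the request
        time.sleep(delay)      # simulate slow request processing
        try:
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        except OSError:
            pass               # the client may already have gone away

def request_with_timeout(port, timeout):
    """Return True if the client gave up before a response arrived."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as s:
        s.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
        try:
            s.recv(1024)
            return False
        except socket.timeout:
            return True

# Listen on an ephemeral local port so the sketch does not clash with anything.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

backend = threading.Thread(target=slow_backend, args=(server, 1.0), daemon=True)
backend.start()

timed_out = request_with_timeout(port, 0.2)
backend.join()
server.close()
```

Here the client stops waiting well before the backend replies, which is the condition that a packet capture on the client machine would show as a socket timeout.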

For details on the various ELB error codes, refer to:

http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/ts-elb-error-message.html

In the ELB logs, you will see an entry like this:

*************
2015-03-18T14:55:24.507386Z <ELB NAME> <Client IP>:51305 - -1 -1 -1 408 0 0 0 "GET <URL> HTTP/1.1"
*************

Note the "-1" values reported by the ELB: they indicate that the client socket was closed. The response fields that follow the 408 status, including the content length, are reported incorrectly as "0". The AWS ELB team is looking into addressing this in the future. For comparison, a log entry for an HTTP 404 looks like this:

************
2015-03-18T14:00:15.618471Z <ELB NAME> <Client IP>:50855 <Instance in ELB pool IP>:<Listen Port> 0.000071 0.010108 0.000021 404 404 0 2096 "GET <URL> HTTP/1.1"

************
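
Entries like the first one can be picked out of a log file automatically. The following is a minimal Python sketch, assuming the twelve space-separated fields of the log lines shown above; the field names in FIELDS, the helper names, and the sample ELB name, IP addresses, and URLs are hypothetical values filled in for illustration.

```python
import shlex

# Field names are assumptions chosen to match the order of the log lines above.
FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code",
    "backend_status_code", "received_bytes", "sent_bytes", "request",
]

def parse_elb_line(line):
    """Split one log line into a dict, keeping the quoted request as one field."""
    return dict(zip(FIELDS, shlex.split(line)))

def client_closed_early(entry):
    """True for a 408 entry whose three timing fields are all -1."""
    timings = (
        entry["request_processing_time"],
        entry["backend_processing_time"],
        entry["response_processing_time"],
    )
    return entry["elb_status_code"] == "408" and all(t == "-1" for t in timings)

# Sample lines modelled on the entries above, with made-up names and addresses.
sample_408 = ('2015-03-18T14:55:24.507386Z my-elb 203.0.113.10:51305 - '
              '-1 -1 -1 408 0 0 0 "GET http://example.com:80/slow HTTP/1.1"')
sample_404 = ('2015-03-18T14:00:15.618471Z my-elb 203.0.113.10:50855 10.0.0.5:80 '
              '0.000071 0.010108 0.000021 404 404 0 2096 '
              '"GET http://example.com:80/missing HTTP/1.1"')

for line in (sample_408, sample_404):
    entry = parse_elb_line(line)
    print(entry["elb_status_code"], client_closed_early(entry))
```

Using shlex.split rather than a plain split keeps the quoted request string together, so the field positions stay aligned even though the request itself contains spaces.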

1 comment:

  1. When you wrote "AWS ELB team is looking into addressing this in future." - is there an AWS documentation/bug URL that you can reference where you found this information?

    The ELB logs still appear to have this problem - it would be handy to know where the bug is being handled.