Friday, May 23, 2014

Apache mod_proxy DNS caching issues with Route53 DNS

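If your architecture uses an Amazon Route53 DNS A record that points to an ELB, and you use ProxyPass in Apache httpd to proxy some requests to it, you may frequently see timeout errors that persist until you restart the httpd process. The cause is that mod_proxy keeps using the IP address it resolved earlier, even after the ELB's public IP has changed.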

The httpd error_log will show errors like:

**************
[Fri May 23 12:54:07.751116 2014] [proxy:error] [pid 14385:tid 140341946558208] (70007)The timeout specified has expired: AH00957: HTTP: attempt to connect to <public ip of ELB>:<port> (<DNS name>) failed
[Fri May 23 12:54:07.751168 2014] [proxy:error] [pid 14385:tid 140341946558208] AH00959: ap_proxy_connect_backend disabling worker for (<DNS name>) for 60s
[Fri May 23 12:54:07.751180 2014] [proxy_http:error] [pid 14385:tid 140341946558208] [client <private ip>:11314] AH01114: HTTP: failed to make connection to backend: <DNS name>, referer: http://<DNS name>/index.php
**************
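
If you want to see which IP address the failed connections were going to, a small script can pull that out of error_log. The following is only a sketch: it assumes each log entry sits on a single line, that the address is IPv4, and the sample lines use a hypothetical backend name (backend.example.com) and a documentation-range IP in place of the placeholders above.

```python
import re
from collections import Counter

# Matches the AH00957 connect-failure message shown above and captures
# the IP, port and DNS name that mod_proxy tried to reach (IPv4 only).
AH00957 = re.compile(
    r"AH00957: \w+: attempt to connect to "
    r"(?P<ip>[^\s:]+):(?P<port>\d+) \((?P<host>[^)]+)\) failed"
)


def count_failed_targets(log_lines):
    """Count AH00957 failures per (DNS name, IP) pair."""
    counts = Counter()
    for line in log_lines:
        match = AH00957.search(line)
        if match:
            counts[(match.group("host"), match.group("ip"))] += 1
    return counts


# Hypothetical sample lines; host name and IP are illustrative only.
sample = [
    "[Fri May 23 12:54:07.751116 2014] [proxy:error] [pid 14385:tid 140341946558208] "
    "(70007)The timeout specified has expired: AH00957: HTTP: attempt to connect to "
    "203.0.113.10:80 (backend.example.com) failed",
    "[Fri May 23 12:54:07.751168 2014] [proxy:error] [pid 14385:tid 140341946558208] "
    "AH00959: ap_proxy_connect_backend disabling worker for (backend.example.com) for 60s",
]

for (host, ip), n in count_failed_targets(sample).items():
    print(f"{host} -> {ip}: {n} failed connection attempt(s)")
```

If the IP reported here differs from what the DNS name currently resolves to, httpd is most likely still holding on to an old address.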

To confirm that the ELB's public IP has changed, run the "dig" command against the DNS name and compare the answer with the IP address shown in the error log:

**************
; <<>> DiG 9.7.1 <<>> <DNS Name>
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20971
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;<DNS Name>. IN   A

;; ANSWER SECTION:
<DNS Name>. 60 IN A       <public ip addr of ELB>

;; Query time: 63 msec
;; SERVER: 
;; WHEN: Fri May 23 09:56:36 2014
;; MSG SIZE  rcvd: 66
**************
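
Running dig by hand only shows the current answer. To catch the moment the address changes, the lookup can be repeated on a timer. Below is a minimal sketch in Python; the function names are my own, and it uses the system resolver through socket.getaddrinfo, which may itself cache answers, so treat it as a rough check and not a replacement for dig.

```python
import socket
import time


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def diff_addresses(previous, current):
    """Return (added, removed) address sets between two lookups."""
    return current - previous, previous - current


def watch(hostname, interval=60, rounds=10, resolve=resolve_ipv4, sleep=time.sleep):
    """Poll the resolver `rounds` times and return the list of changes seen."""
    changes = []
    previous = resolve(hostname)
    for _ in range(rounds):
        sleep(interval)
        current = resolve(hostname)
        added, removed = diff_addresses(previous, current)
        if added or removed:
            print(f"{hostname}: added {sorted(added)}, removed {sorted(removed)}")
            changes.append((added, removed))
        previous = current
    return changes
```

For example, watch("backend.example.com", interval=60, rounds=10) would check once a minute for ten minutes. The 60 second interval is chosen to line up with the TTL of 60 in the dig answer above; the host name is a placeholder.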

According to Amazon support, the public IP of an ELB can change under the following conditions:

  1. Scale up/down. When the load changes considerably, the ELB scales up or down to best serve the traffic.
  2. ELB replacement. When the ELB has to be replaced or upgraded due to an issue.

To circumvent the caching issues, you can try using the EC2 instance name instead of the DNS name, or proxy to "localhost". Additionally, you can try setting a TTL value for mod_proxy, as in the following example.


Example

ProxyPass /example http://backend.example.com max=20 ttl=60 disablereuse=On

"ttl=60" sets the time to live, in seconds, for inactive connections and their associated connection pool entries. Once a connection reaches this limit, it will not be used again.

"disablereuse=On" makes mod_proxy close the connection to the backend immediately after it has been used. This is useful when the backends themselves may be under round-robin DNS, which is true for the ELB.

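To make the effect of the two parameters concrete, here is a toy model in Python of the reuse decision they describe. It is not mod_proxy's actual code, and the class and function names are made up for illustration; it only encodes the two rules stated above.

```python
from dataclasses import dataclass


@dataclass
class IdleConnection:
    """A pooled backend connection and the time (in seconds) it was last used."""
    address: str
    last_used: float


def can_reuse(conn, ttl, now, disable_reuse=False):
    """Decide whether a pooled connection may be reused for the next request."""
    if disable_reuse:
        # disablereuse=On: every request gets a fresh connection.
        return False
    # ttl: a connection idle for ttl seconds or more is not used again.
    return (now - conn.last_used) < ttl


# Illustrative values: a connection to a documentation-range IP, last used at t=0.
conn = IdleConnection(address="203.0.113.10", last_used=0.0)
print(can_reuse(conn, ttl=60, now=30.0))
print(can_reuse(conn, ttl=60, now=61.0))
print(can_reuse(conn, ttl=60, now=30.0, disable_reuse=True))
```

With ttl alone, a connection to a stale ELB address can still be reused while it stays busy; adding disablereuse=On removes reuse altogether, at the cost of opening a new connection for every request.
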
12 comments:

  1. Thanks BalaV for sharing useful information.

  2. This comment has been removed by a blog administrator.

  3. Hi, thank you for this post. I'm having the same issue.

    Can you explain this part: "To circumvent the caching issues, you can try using ec2 instance name instead of DNS name or proxy to "localhost"."

    What do you mean by the EC2 instance name or localhost? Today I use the name of the ELB A record.

    When you use:
    ProxyPass /example http://backend.example.com max=20 ttl=60 disablereuse=On

    You suggest to change "backend.example.com" to what?

    I'm using the name of my ELB. Do you mean changing the name of my ELB in the ProxyPass like this:

    ProxyPass /example http://any-elb-name-XXXXXX.us-east-1.elb.amazonaws.com max=20 ttl=60 disablereuse=On

    Thanks

    Replies
    1. @Jose, thanks for reading the post. If you have a reverse proxy sending back to an ELB, is this an internal ELB (balancing traffic for VPC private subnet instances)? Generally speaking, you will want to avoid proxying to the public IP or private IP of an EC2 instance because they will change. Using a hostname or an entry in the /etc/hosts file should prevent DNS name resolution issues. Also, you want to avoid sending traffic from the reverse proxy back to the ELB if possible, unless the ELB is an internal one.

  4. Ran into this over the last few months, investigated with dig, saw the IP change, and found your post on the errors we were getting. I will give it a try. Thank you for your post, hope it works, otherwise we would be chasing it for a while! The DNS caching proxy issue is what we also experienced!

    Replies
    1. @Ben, if you are expecting a heavy load, you could also consider asking AWS to pre-warm your ELB, so that the public IPs of your ELB instances are not changing in the middle of a scale-out under load.

    2. @BalaV, in this case our partner's server is using an ELB and its IP changes too often. Our Apache proxy works with our Tomcat app talking to them, and we see the disconnection symptom. I have implemented the ProxyPass settings from your post.

  5. This comment has been removed by the author.

  6. Does the Nginx proxy have this caching problem, or does it also need to be tuned for AWS ELB IP changes?

    Replies
    1. Most http servers should have a cache, this might help - https://www.nginx.com/blog/nginx-caching-guide/

  7. I am seeing occasional slowdowns after implementing this. Any ideas?
