Saturday, February 17, 2007

Path MTU discovery and MTU troubleshooting

Recently, while debugging some performance issues on a client's site, I came across some very interesting behavior. Some users reported that the site performed very well for a short period of time, but after a while performance became poor enough to render the site unusable. Checking the Apache logfiles for the IP addresses of those clients showed that the requests themselves were not taking an unusual amount of time; instead, the requests were arriving at the webserver at a snail's pace.

Checking at the network level, I saw some strange things happening:

prod-lb01:~# tethereal -R "http.request and ip.addr == (client)"
125.362898 (client) -> (server) HTTP GET /search/stuff HTTP/1.1
125.362922 (server) -> (client) ICMP Destination unreachable (Fragmentation needed)
126.612994 (client) -> (server) HTTP GET /search/stuff HTTP/1.1
126.613018 (server) -> (client) ICMP Destination unreachable (Fragmentation needed)
129.615113 (client) -> (server) HTTP GET /search/stuff HTTP/1.1
129.615135 (server) -> (client) ICMP Destination unreachable (Fragmentation needed)
135.616047 (client) -> (server) HTTP GET /search/stuff HTTP/1.1
135.616066 (server) -> (client) ICMP Destination unreachable (Fragmentation needed)
Fragmentation Needed? (ICMP Type 3/Code 4) Why would we need to fragment incoming packets? This should only happen if the packet is bigger than the Maximum Transmission Unit (MTU), and since everything here is connected with Ethernet, at a constant 1500 MTU, it is odd to see this.

Then I remembered this site is using Linux Virtual Server (LVS) for load balancing incoming requests. LVS can be configured in several ways, but this site is using IP-IP aka LVS-Tun load balancing, which encapsulates the incoming IP packet inside another packet and sends that to the destination server. Since this uses IP encapsulation, each request that hits the load balancer will have additional headers tacked on, to address the packet to the appropriate realserver. It happens to add 20 bytes to the header.
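
The effect of that extra header on packet size can be sketched in a few lines of Python. This is only an illustration: the function names and constants are my own, and it assumes the outer IPv4 header carries no options (20 bytes).

```python
# Illustrative only: names and constants are assumptions, not part of LVS.
ETHERNET_MTU = 1500   # MTU of the Ethernet interfaces at this site
IPIP_OVERHEAD = 20    # outer IPv4 header added by IP-IP encapsulation (no options)


def tunnel_mtu(link_mtu, overhead=IPIP_OVERHEAD):
    """Largest inner packet that still fits in one frame after encapsulation."""
    return link_mtu - overhead


def needs_fragmentation(packet_len, link_mtu=ETHERNET_MTU):
    """True if the packet no longer fits once the outer header is added."""
    return packet_len > tunnel_mtu(link_mtu)


print(tunnel_mtu(ETHERNET_MTU))    # 1480
print(needs_fragmentation(1500))   # True: a full-size Ethernet packet is too big
print(needs_fragmentation(1480))   # False
```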

Okay, so the actual MTU of requests that go to the load balancer is 1480 due to the encapsulation overhead. Snooping for these ICMP messages at the router, I notice that we're sending out a LOT of them:

(router):~# tcpdump -n -i eth7 "icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth7, link-type EN10MB (Ethernet), capture size 96 bytes
17:07:00.608444 IP (server) > (client): icmp 556: (server) unreachable - need to frag (mtu 1480)
17:07:01.288197 IP (server) > (client): icmp 556: (server) unreachable - need to frag (mtu 1480)
17:07:01.910215 IP (server) > (client): icmp 556: (server) unreachable - need to frag (mtu 1480)
17:07:01.927728 IP (server) > (client): icmp 556: (server) unreachable - need to frag (mtu 1480)
17:07:02.391218 IP (server) > (client): icmp 556: (server) unreachable - need to frag (mtu 1480)
17:07:02.693094 IP (server) > (client): icmp 556: (server) unreachable - need to frag (mtu 1480)
17:07:02.912513 IP (server) > (client): icmp 556: (server) unreachable - need to frag (mtu 1480)
17:07:03.019852 IP (server) > (client): icmp 556: (server) unreachable - need to frag (mtu 1480)
17:07:03.398335 IP (server) > (client): icmp 556: (server) unreachable - need to frag (mtu 1480)
These ICMP messages are not bad, per se; they are part of the Path MTU Discovery process. However, many firewalls indiscriminately block ICMP packets of all kinds. Most of the documentation I found while researching this problem was written from the end-user's perspective, i.e., users who had PPPoE or other types of encapsulated/tunneled connections and had trouble getting to certain websites. Now, with the proliferation of personal firewall hardware and software, some of which may be overzealously configured to block all ICMP (even "good" ICMP like PMTU discovery), this is something that server admins have to worry about too, especially if they run a load balancing solution which encapsulates packets.
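
To make the failure mode concrete, here is a toy model of Path MTU Discovery written in Python. It is a sketch under simplifying assumptions (a single bottleneck, the Don't Fragment bit always set, a fixed number of retries), and the function name and parameters are invented for illustration. It shows why a sender that never receives the ICMP message keeps resending the same oversized packet, much like the repeated GET in the first capture.

```python
def send_with_pmtud(packet_len, path_mtu, icmp_delivered, max_tries=4):
    """Toy model of a sender with DF set.

    Returns (sizes_tried, delivered). All names here are hypothetical.
    """
    sizes_tried = []
    current = packet_len
    for _ in range(max_tries):
        sizes_tried.append(current)
        if current <= path_mtu:
            return sizes_tried, True   # packet fits, delivered
        # Too big and DF is set: the bottleneck drops the packet and sends
        # ICMP type 3 / code 4 carrying its MTU back towards the sender.
        if icmp_delivered:
            current = path_mtu         # sender learns the smaller MTU
        # If the ICMP message is filtered, the sender learns nothing and
        # simply retransmits the same oversized packet.
    return sizes_tried, False


print(send_with_pmtud(1500, 1480, icmp_delivered=True))
print(send_with_pmtud(1500, 1480, icmp_delivered=False))
```

With the ICMP message delivered, the second attempt shrinks to 1480 bytes and gets through; with it filtered, every attempt is 1500 bytes and none arrive.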

The research I did on the problem pointed me to the following iptables rule to be added on the router:
iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -m tcpmss --mss 1400:1536 -j TCPMSS --clamp-mss-to-pmtu
This is intended to force the advertised Maximum Segment Size (MSS) to be 40 bytes less than the smallest MTU that the router knows about. However, this didn't work for us (this tcpdump line looks for any TCP handshakes plus any ICMP unreachable errors):

(router):~# tcpdump -vv -n -i eth7 "(host (client) ) and \
(tcp[tcpflags] & tcp-syn != 0 or icmp[icmptype] == icmp-unreach)"
tcpdump: listening on eth7, link-type EN10MB (Ethernet), capture size 96 bytes
18:00:17.479661 IP (tos 0x0, ttl 53, id 47601, offset 0, flags [DF], length: 52)
(client).1199 > (server).80: S [tcp sum ok] 2541494183:2541494183(0) win 65535
<mss 1460,nop,wscale 2,nop,nop,sackOK>

18:00:17.479861 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], length: 52)
(server).80 > (client).1199: S [tcp sum ok] 2875112671:2875112671(0) ack 2541494184 win 5840
<mss 1460,nop,nop,sackOK,nop,wscale 7>

18:00:17.771080 IP (tos 0xc0, ttl 63, id 10080, offset 0, flags [none], length: 576)
(server) > (client): icmp 556: (server) unreachable - need to frag (mtu 1480)
for IP (tos 0x0, ttl 52, id 47613, offset 0, flags [DF], length: 1500)
(client).1199 > (server).80: . 546:2006(1460) ack 1 win 64240
It was still negotiating a 1460 byte MSS during the handshake. In hindsight, this makes sense, because the router doesn't really know that the MTU between the load balancer and the realservers is actually smaller than 1500 - the router communicates with these machines over their Ethernet interfaces, which are all still set to a 1500 byte MTU. Digging further into the problem (including the LVS-Tun HOWTO linked above), I found quite a few things mentioned, but no real definitive answers.
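
The relationship between MTU and MSS is simple arithmetic, sketched here in Python. The helper name is my own, and the sketch assumes IPv4 and TCP headers without options (20 bytes each), which is where the "40 less than the MTU" figure comes from.

```python
IPV4_HEADER = 20   # bytes, assuming no IP options
TCP_HEADER = 20    # bytes, assuming no TCP options


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in a single packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


print(mss_for_mtu(1500))   # 1460: what the router derives from its Ethernet interfaces
print(mss_for_mtu(1480))   # 1440: what actually fits through the IP-IP tunnel
```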

I chose to fix this problem by hardcoding the MSS to 1440 at the router, rather than using the "clamp-mss-to-pmtu" setting:
iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -m tcpmss --mss 1440:1536 -j TCPMSS --set-mss 1440
1440 is the normal MSS value of 1460, minus the 20 byte overhead for the encapsulated packet. This seems to have fixed the problem entirely:
(router):~# tcpdump -vv -n -i eth7 "(host (client) ) and \
(tcp[tcpflags] & tcp-syn != 0 or icmp[icmptype] == icmp-unreach)"
tcpdump: listening on eth7, link-type EN10MB (Ethernet), capture size 96 bytes
18:02:19.466678 IP (tos 0x0, ttl 53, id 55012, offset 0, flags [DF], length: 52)
(client).1298 > (server).80: S [tcp sum ok] 2863214365:2863214365(0) win 65535
<mss 1460,nop,wscale 2,nop,nop,sackOK>

18:02:19.466886 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], length: 52)
(server).80 > (client).1298: S [tcp sum ok] 2996826059:2996826059(0) ack 2863214366 win 5840
<mss 1440,nop,nop,sackOK,nop,wscale 7>

.... silence!
PS - The reason that I was seeing this very odd behavior - very fast at first, followed by an unusable site?
  • The client website had recently added a search history, which was stored in a browser cookie. Things would go great until enough data was in the cookie to push the request over 1440 bytes.
  • I had configured my home DSL router to discard ICMP many years back and had forgotten about it - my firewall was throwing away the ICMP Fragmentation Needed packets, so my PC never "got the memo" that it needed to send smaller packets!
This actually worked out for the better, though - this site had had reports of odd slowness in the recent past, and hopefully this was the root cause!
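
The "fast at first, then unusable" pattern can be reproduced with a small Python sketch. Everything specific here is hypothetical: the host name, the cookie layout, and the search terms are invented, and a real browser sends more headers than this. The point is only that the request grows with each search until its first TCP segment no longer fits through the tunnel.

```python
# Hypothetical illustration: the host name and cookie layout are invented.
NEGOTIATED_MSS = 1460   # what the handshake agreed on before the fix
TUNNEL_MSS = 1440       # largest TCP payload that survives IP-IP encapsulation


def build_request(history):
    """Build a GET request whose cookie carries the search history."""
    cookie = "history=" + "|".join(history)
    lines = [
        "GET /search/stuff HTTP/1.1",
        "Host: www.example.com",
        "Cookie: " + cookie,
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def triggers_frag_needed(request, negotiated_mss=NEGOTIATED_MSS,
                         tunnel_mss=TUNNEL_MSS):
    """True if the first TCP segment of the request is too big for the tunnel."""
    first_segment = min(len(request), negotiated_mss)
    return first_segment > tunnel_mss


# Keep "searching" until the request stops fitting.
history = []
while not triggers_frag_needed(build_request(history)):
    history.append("pumpkin pie")
print(len(history), "searches before the request stops fitting")
```

Once the handshake negotiates 1440 instead of 1460, the same oversized request is simply split into segments that fit, so the check never fires.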

EDIT: The original post missed an important option in the iptables config: the "-m tcpmss --mss 1440:1536" match. Without it, iptables will force the MSS of ALL traffic to 1440, including clients which request a size smaller than that. This obviously presents a problem for those clients.
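
The difference the match makes can be modeled in Python. This is a sketch of the behavior as described in this post, not of the kernel's actual implementation, and the function names are invented for illustration.

```python
def rewrite_mss(advertised, low=1440, high=1536, target=1440):
    """Model of '-m tcpmss --mss low:high -j TCPMSS --set-mss target' on a SYN.

    Only MSS values inside the matched range are rewritten.
    """
    if low <= advertised <= high:
        return target
    return advertised


def rewrite_mss_unconditional(advertised, target=1440):
    """Model of the rule without the tcpmss match, as described in the post."""
    return target


print(rewrite_mss(1460))               # 1440: clamped down to fit the tunnel
print(rewrite_mss(536))                # 536: a smaller MSS is left alone
print(rewrite_mss_unconditional(536))  # 1440: raised above what the peer asked for
```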

Thursday, February 08, 2007

Search Engine Optimization with Apache and mod_rewrite

I've recently been using mod_rewrite to modify the URLs on a client's website. mod_rewrite is a powerful tool that lets you turn "ugly" URLs like

into cleaner URLs like

This is useful for a couple of reasons: not only is it cleaner to look at, but it can also help with search engine indexing. In this case, because "pumpkin_pie" is part of the URL as opposed to part of the query string, the keyword ranks higher in many search engines.

Let's say we have an application that will return search results for various categories, and we want the URLs to have the format of " term)". We also want to have a landing page if the URL is simply "". We want to make this as generic as possible so that the httpd.conf does not need to be edited every time a category is added.

This can be configured in a number of ways, but the way I have it installed here is with Apache running on port 80 and the application, a Java servlet container, running on a different port, say port 8000. Apache handles most of the requests for static, on-disk content itself, and uses the proxy mechanism to send dynamic requests to the servlet container. Let's break down the relevant sections of the Apache configuration file:

First, it can be useful to funnel all traffic for your site through a single hostname, as opposed to having links to both "" and "". This rule will force a redirect back to "" with an HTTP 301 redirect:

RewriteCond %{HTTP_HOST} ^$ [NC]
RewriteRule ^/(.*)$1 [L,R=301]

Now let's map the static page elements and HTML to the local filesystem, so that they don't get remapped to a search query, and are served by Apache instead of being proxied through another layer. Note that we need to map favicon.ico to the local filesystem, or else you can end up sending searches to your application when the browser requests /pie/pumpkin_pie/favicon.ico! The [L] in the rewrite modifier tells the rewrite engine to stop processing at this point and serve the file directly.

RewriteRule ^/js/(.*) /opt/static/js/$1 [L]
RewriteRule ^/pictures/(.*) /opt/static/pictures/$1 [L]
RewriteRule ^/images/(.*) /opt/static/images/$1 [L]
RewriteRule ^/css/(.*) /opt/static/css/$1 [L]
RewriteRule /favicon\.ico$ /opt/static/html/favicon.ico [L]
RewriteRule ^/robots\.txt /opt/static/html/robots.txt [L]

Another useful trick is to re-map underscores to %20 in the search parameters, so we can use terms like "pumpkin_pie" that get remapped to "pumpkin%20pie" when sent to the backend application. This rule will match any URL that has an underscore in it, and then rewrite one underscore to a %20 and then send the processing back to the first rewrite rule. (So it will keep remapping them one at a time until they're all gone). This is necessary because we don't know how many underscores there might be in the URL, and there is no "replace all" modifier like "/g" for normal unix search and replace. Note the "QSA" in the rule modifiers; this means "Query String Append" and will leave any query string intact through the processing:

RewriteCond %{REQUEST_URI} ^/.*_
RewriteRule ^/(.*)_(.*) /$1\%20$2 [N,QSA]
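
The one-underscore-per-pass behavior can be imitated with Python's re module. This is only a sketch: it reuses the same pattern as the rule above, but the loop stands in for the [N] flag and the function name is invented. Because the first group is greedy, each pass replaces the last remaining underscore.

```python
import re

# Same pattern as the RewriteRule above.
RULE = re.compile(r"^/(.*)_(.*)")


def replace_underscores(path):
    """Apply the rule repeatedly, as the [N] flag would, and count the passes."""
    passes = 0
    while True:
        match = RULE.match(path)
        if match is None:
            return path, passes
        # Equivalent of the substitution /$1\%20$2
        path = "/" + match.group(1) + "%20" + match.group(2)
        passes += 1


print(replace_underscores("/pie/pumpkin_pie"))
print(replace_underscores("/pie/sweet_potato_pie"))
```

The second call needs two passes: the first yields /pie/sweet_potato%20pie and the second /pie/sweet%20potato%20pie.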

Now let's say there are a couple of URL paths we want to treat differently, for example the "buy" section of the site. Because of the way we map the general search cases later in this file, anything that needs special treatment has to be mapped in a way that bypasses the generic match:

RewriteRule ^/buy/(.*) /purchase.jsp?cat=$1 [QSA]

Now for the "/(category)" landing page. We have to limit categories to alphabetic characters only - this is so that things like "purchase.jsp" are not treated as categories! We also prevent any request that contains a query string from being treated as a category, so that servlets, etc., continue to work:

RewriteCond %{QUERY_STRING} ^$
RewriteRule ^/([a-z]*)$ /landingPage.jsp?category=$1 [NC]

Now for the generic /(category)/(searchterm) mapping.

RewriteRule ^/([a-z]*)/(.*) /search.jsp?category=$1&search=$2 [NC,QSA]
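
To see how the "buy", landing page, and generic search rules interact, here is a rough Python model of the three of them. It is a simplification with invented helper names: it applies the rules in order without [L], treats [QSA] as "append the old query string", and ignores everything else mod_rewrite does.

```python
import re


def _qsa(new_query, old_query):
    """Rough stand-in for the QSA flag: keep the original query string."""
    return new_query + "&" + old_query if old_query else new_query


def rewrite(path, query=""):
    """Apply the buy, landing page, and generic search rules in order."""
    match = re.match(r"^/buy/(.*)", path)
    if match:
        path, query = "/purchase.jsp", _qsa("cat=" + match.group(1), query)

    # Landing page: letters only, and only when there is no query string.
    match = re.match(r"^/([a-z]*)$", path, re.IGNORECASE)
    if match and query == "":
        path, query = "/landingPage.jsp", "category=" + match.group(1)

    # Generic /(category)/(searchterm) mapping.
    match = re.match(r"^/([a-z]*)/(.*)", path, re.IGNORECASE)
    if match:
        new_query = "category=%s&search=%s" % (match.group(1), match.group(2))
        path, query = "/search.jsp", _qsa(new_query, query)

    return path, query


print(rewrite("/buy/pie"))
print(rewrite("/pie"))
print(rewrite("/pie/pumpkin%20pie"))
print(rewrite("/purchase.jsp"))
```

Note that "/purchase.jsp" passes through untouched: the dot keeps it from matching the letters-only category pattern, and it has no second slash for the generic rule to match.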

We are now at the end of the line, so we proxy the resulting modified URL back to our application:

RewriteRule ^/(.*)$1 [P]

And if you run into any trouble, you can turn logging on with the following directives:

RewriteLog /opt/app/logs/rewrite.log
RewriteLogLevel 9

Now of course, these remappings only map INCOMING URLs to our application. Our application is still responsible for sending this URL format back to the user, so that anyone who links to your site uses the optimized URL format. Another way to get these URLs sent to search engines is with a sitemaps file, see for details.
