Author Michael Tougeron on June 11, 2009
When you think about SEO, do you remember to think about how your website handles HTTP status codes? The odds are that you don’t. Your “page not found” page probably returns a 404 (everyone knows about that one), but what about programmatic error pages or login pages? Search engines index pretty much anything that returns a 200 status code. If you don’t pay attention, you can end up with pages indexed and findable that shouldn’t be.
For example, if you are logged out, the Washington Post’s newsletter page 302 redirects you to a login page that then serves a 200. As a result, the newsletter URL gets indexed with the text from the login page, and that is what a Google search for washingtonpost.com newsletters turns up. If the pages that require a login served a 401 “Unauthorized” instead, the login text would not be indexed. Use caution with that scenario, though: a 401 keeps the real content out of the index too, so it might be better to handle search engines differently than normal users so the content can be indexed.
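To make the 401 idea concrete, here is a minimal sketch as a plain Python WSGI app. Everything in it is an assumption for illustration: the `/newsletters` path, the `example.logged_in` key that some earlier layer would set, and the `FormLogin` challenge scheme, which is only a placeholder because a 401 is expected to carry a `WWW-Authenticate` header. It is not how the Washington Post's site works.

```python
from http import HTTPStatus

# Hypothetical paths that require a login.
PROTECTED_PATHS = {"/newsletters"}


def newsletter_app(environ, start_response):
    """Serve the login prompt with a 401 instead of redirecting to a 200 page."""
    path = environ.get("PATH_INFO", "/")
    # Assumption: an earlier layer sets this key for authenticated sessions.
    logged_in = environ.get("example.logged_in", False)

    if path in PROTECTED_PATHS and not logged_in:
        status = HTTPStatus.UNAUTHORIZED
        body = b"<h1>Log in to manage your newsletters</h1>"
        # "FormLogin" is a placeholder scheme name, not a registered one.
        extra = [("WWW-Authenticate", 'FormLogin realm="newsletters"')]
    else:
        status = HTTPStatus.OK
        body = b"<h1>Page content</h1>"
        extra = []

    headers = [("Content-Type", "text/html; charset=utf-8")] + extra
    start_response(f"{status.value} {status.phrase}", headers)
    return [body]
```

The visitor still sees the login prompt at the URL they asked for, but a crawler sees a 401 rather than a 200 it would happily index.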
What if you need to take a feature offline (or even, godz forbid, the entire site) for maintenance or upgrades? I assume you show users some sort of message. If that page serves a 200 status and Google happens to crawl during that window, you get screwed: every one of that feature’s URLs gets indexed with the same maintenance message. It is better to have the offline message page return a 503 status so the search engine knows the service is temporarily unavailable and should try again later.
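One way to do this is to wrap the application so that anything under the offline feature answers with a 503. The sketch below uses plain WSGI; the `/forums` prefix and the one-hour `Retry-After` value are made-up examples, and `Retry-After` is an optional extra that tells clients when it is worth coming back.

```python
# Hypothetical URL prefixes that are currently offline.
MAINTENANCE_PREFIXES = ("/forums",)
# Assumed maintenance window, in seconds.
RETRY_AFTER_SECONDS = 3600


def maintenance_middleware(app):
    """Wrap a WSGI app so offline sections return 503 instead of 200."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        if path.startswith(MAINTENANCE_PREFIXES):
            headers = [
                ("Content-Type", "text/html; charset=utf-8"),
                ("Retry-After", str(RETRY_AFTER_SECONDS)),
            ]
            start_response("503 Service Unavailable", headers)
            return [b"<h1>Down for maintenance, back soon</h1>"]
        # Everything else is handled by the real application.
        return app(environ, start_response)

    return wrapped
```

Users still get the friendly message; only the status line and one header change.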
It is becoming more common for sites to capture a 404 and run a site search, or similar processing, to look for a possible match. Frequently this is handled within the app code rather than by having the web server deliver its default 404 page. At GameSpot.com, the programmer who first set this up forgot to serve the 404 status code and was returning a 200. This meant that when a duplicate game was deleted from the database, the links to it were still showing up on Google. Once we made sure the code returned a status of 404, they went away the next time Google indexed the site.
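Here is a rough sketch of a "smart" 404 page that still sends the right status. The `GAMES` table and the word-matching in `find_suggestions` are invented stand-ins for a real database and site search; this is not GameSpot's code. The part that matters is the status line at the end.

```python
# Hypothetical stand-in for the game database.
GAMES = {
    "/games/halo-3": "Halo 3",
    "/games/halo-wars": "Halo Wars",
}


def find_suggestions(path):
    """Very rough 'site search': match words from the last URL segment."""
    slug = path.rstrip("/").rsplit("/", 1)[-1]
    words = [word for word in slug.split("-") if word]
    return sorted(url for url in GAMES if any(word in url for word in words))


def games_app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]

    if path in GAMES:
        start_response("200 OK", headers)
        return [f"<h1>{GAMES[path]}</h1>".encode("utf-8")]

    links = "".join(
        f'<li><a href="{url}">{GAMES[url]}</a></li>'
        for url in find_suggestions(path)
    )
    body = f"<h1>Page not found</h1><ul>{links}</ul>"
    # The helpful page is fine, but the status must still say 404.
    start_response("404 Not Found", headers)
    return [body.encode("utf-8")]
```

A request for a deleted page such as `/games/halo-2` gets a page of suggestions, and the crawler gets a 404 telling it to drop the URL.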
Finally, don’t forget about the 400 status code. If you try to load phpBB’s forums with an invalid forum id, it returns a 200 status code even though the request is not valid. This is a good situation to serve a 400 Bad Request. Keep in mind that you probably wouldn’t want to return a 400 if the forum id is well formed but just doesn’t exist (e.g., http://www.phpbb.com/community/viewforum.php?f=123456), because it might exist at some point. In that case, at the very least, return a 404.
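The 400 versus 404 decision can be sketched as a small function. This is Python rather than phpBB's own PHP, and the `EXISTING_FORUMS` set and the `forum_status` name are hypothetical; only the `f` query parameter comes from the URL above.

```python
from urllib.parse import parse_qs

# Hypothetical set of forum ids that currently exist.
EXISTING_FORUMS = {1, 2, 46}


def forum_status(query_string):
    """Pick a status line for a viewforum-style request."""
    params = parse_qs(query_string)
    raw_id = params.get("f", [""])[0]

    # Missing or non-numeric id: the request itself is malformed.
    if not (raw_id.isascii() and raw_id.isdigit()):
        return "400 Bad Request"

    # Well-formed id that is not there (yet): a plain 404.
    if int(raw_id) not in EXISTING_FORUMS:
        return "404 Not Found"

    return "200 OK"
```

With this split, `f=abc` is a bad request, `f=123456` is simply not found, and only a forum that exists gets a 200.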
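As a side note, HTTP/1.1 added a 307 status code that is very similar to how you’re used to using 302s. The difference is that a 307 requires the client to repeat the request with the same method, whereas many clients turn a redirected POST into a GET on a 302. Browsers already speak HTTP/1.1 by default, so whether a 307 is truly usable depends on how consistently the clients and crawlers you care about handle it.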