12 Point Design  

(209) 565-12PD

Dashes vs. Underscores

What is the best word separator in URL naming?

One of the more common discussions on website development forums revolves around whether to use dashes or underscores (or something else) as word separators in file names. Most developers advocate one or the other based on their own experiences with search engines and what they've read and seen others promote. It's my turn.

To get in the right frame of mind, let's begin with a simple linguistic exercise.

  1. Please list two (2) words in the English language that use underscores ("_") within them.
  2. Now list fifty (50) words that use dashes/hyphens ("-").

You can probably fulfil the second request off the top of your head. The first one is impossible. Natural words do not include underscores. Programmatic functions, variables and constants sometimes do, but we're not talking about just programming terminology. We're talking about word separators in URLs based on the English language.

An insider's look

Much of the current "professional" SEO advice on this matter is based on a blog entry written by Matt Cutts, a Google software engineer since early 2000. Matt explains that, from his perspective, dashes are preferred because when he first needed a search engine to find programming keywords and variables back in 1999, Google returned better results. Underscores were (and still are) frequently used in those keywords and variables.

The problem with his argument is that it is a gross non-sequitur. He claims that dashes are better in URLs because Google returns better results for terms including an underscore. First of all, maybe to his own dismay, Google is not the only search engine. Second, the claim rests on an experience from six years ago. Third, it has nothing to do with the actual URLs - only that Google returned better results when the search term used underscores. Hey Matt, if you're reading this: SearchTerm !== URL.

That said, six months ago I would have recommended doing the same thing. My own experience demonstrated that URLs were more readable and placed well if dashes were used in the file names - way back in 1999 and even as far up as mid 2005. But that's not the case anymore. Things change. This is a perfect example.

Try googling for a string of several (quoted or unquoted) words - like the title of a book or press release that should appear on many different sites. Perform several searches in order to get a less keyword-biased perspective. Look at the overall results in the top, say, 20 places. Now, this is the important part: open pages from each result set (some with dashes and some with underscores) and look at which pages have the most contextual AdSense ads (contextual => valuable).

With underscores, not only will you place better, but your ads will be higher value since they are more contextual.

Do the same searches in other engines and you'll get the same results. Sometimes dashes are better, but usually they're below their underscored counterparts. And, in my experience, the world has preferred dashes to underscores for years, so the number of potential results competing with an underscored URL is likely to be significantly smaller than for its dashed counterparts.

Mistakes happen

Late this summer my wife (who provides primary content management support for our design and hosting business) "accidentally" used underscores for a file name on one of our sites and saw a huge and immediate increase in contextual value and search result placement for that page. Later that evening, as a test, we switched several file names for long-standing and well-placed dashed URLs to use underscores instead. We saw an immediate increase in contextual advertisements (a couple of the pages never had any contextual advertisements). In the following days, search engine placement increased and they generated more revenue than ever before. Consistently. That was when I changed my position on whether dashes or underscores are better.

We switched all file names on our most prominent sites and saw a jump, exponentially in some cases, both in traffic and revenue generated from those pages. It worked, and across the board, too. But don't just take my word for it. Try it both ways yourself and see if the difference is worthwhile to you. And I only require a 10% commission. ;)

Big deal, right? This is all just my own anecdotal evidence loaded with my typical contrarian perspective. Or is it?

The geek speak

Let's get on to the technical side of it. What is Matt actually demonstrating in his post? He's demonstrating that searching for "_MAXINT" or "FTP_BINARY" returned more results on Google (in 1999 - 6 years ago) than on other search engines. First, these are very loaded terms, since they are very vertical-market-oriented (only developers would search for them). How many non-technical users had ever heard of "_MAXINT" before his post? And more importantly, how many searches actually require underscores to be interpreted literally? Today, on Google, "_MAXINT" returns 142 results. NONE OF THEM (dead serious) actually have "_MAXINT" in the URL! NONE OF THEM! Now search for "MAXINT". There might be 203,000 results, but the first one actually includes "_MAXINT" in the URL. And that page was not indexed by Google for the term "_MAXINT" (including the underscore). So what does this actually demonstrate? That Google, today, does not treat characters in the URL the same way Matt implies in his post. Instead, Google treats underscores as white space or word/term separators. What he has actually proven by using "_MAXINT" as a sample is that Google does not preserve underscores as literal text within URLs.

Okay, big deal - his example wasn't perfect. At least not for non-technical sites. What other sites use underscores in their common words? Um...thinking...thinking... Nope. Got me. I can't think of any. At all. Can you? Do tell.

The important thing to understand here isn't that searching for "_MAXINT" or any other string with dashes or underscores in it places better or worse than something else. The question at hand is actually "what is the best word separator for use in file and URL naming?" Any search using cryptic characters fails this test, since it is ignoring the question. It's not about searching for the characters - it's about searching for the words. Using the example above, we can safely infer that Google treats underscores in URLs as white space. Heck, if it can't "see" the underscore in a URL that matches the query text exactly, then obviously it is treating underscores as white space and not as literal characters.
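To make that inference concrete, here is a minimal Python sketch of a toy indexer that treats underscores in a URL as white space. It is only a model of the behaviour described above, not how Google or any other engine actually works; the function names `url_words` and `matches` and the example.com address are made up for illustration.

```python
import re
from urllib.parse import urlparse

def url_words(url, separators="_"):
    """Split a URL path into lowercase words.

    Every character in `separators` (and the "/" between path segments)
    is treated as white space. This is an assumption that models the
    behaviour inferred in the text, not a real search engine's tokenizer.
    """
    path = urlparse(url).path
    stem = path.rsplit(".", 1)[0]  # drop the file extension, if any
    pattern = "[/" + re.escape(separators) + "]+"
    return [word.lower() for word in re.split(pattern, stem) if word]

def matches(query, url, separators="_"):
    """True if every term of the query is one of the words in the URL."""
    words = url_words(url, separators)
    return all(term.lower() in words for term in query.split())

page = "http://example.com/docs/_MAXINT.htm"  # hypothetical URL
print(url_words(page))           # the underscore is gone from the word list
print(matches("MAXINT", page))   # the bare word matches
print(matches("_MAXINT", page))  # the literal underscore does not
```

In this toy model the page is found for "MAXINT" but not for "_MAXINT", which is the same pattern as the search results described above.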

If that's not enough...

Let's look at other good reasons not to use dashes as word separators in URLs.

A dash is also a hyphen. Hyphens are logical wrap points for text bodies. While underscores have no valid use in English text, dashes, in the form of hyphens, have an existing function. That function includes the natural ability within text to wrap at that location. This applies to widely used hyphenated text, like compound-word phrases, certain prefixes and what are called suspensive hyphens. A suspensive hyphen is one added to the end of a term to indicate multiple descriptors for a given value, like when you have single- or multi-word terms in a query. These hyphens are break points that will wrap, if appropriate in the application: email programs, text editors and the like. Hyphens within URLs are no exception! In other words, if you choose to use URLs with hyphenation within the address, that URL can, and probably will, wrap if the address is sent by email to others. Underscores will not.
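The wrapping behaviour is easy to see with Python's standard `textwrap` module, which (like many text wrappers) treats hyphens between letters as legal break points. This is only a sketch: real email clients each have their own wrapping rules, and the helper name `wrap_url`, the 40-character width and the example.com addresses are assumptions chosen for illustration.

```python
import textwrap

def wrap_url(url, width=40):
    """Wrap a single URL the way a hyphen-aware text wrapper would.

    break_long_words=False means a word is never split just because it
    is too long, so any break that does happen comes from the hyphen
    rule (break_on_hyphens=True) alone.
    """
    return textwrap.wrap(
        url,
        width=width,
        break_long_words=False,
        break_on_hyphens=True,
    )

# Two hypothetical addresses that differ only in the word separator.
dashed = "http://example.com/advice/dashes-vs-underscores.htm"
underscored = "http://example.com/advice/dashes_vs_underscores.htm"

print(wrap_url(dashed))       # split across lines at a hyphen
print(wrap_url(underscored))  # kept whole on one (over-long) line
```

The dashed address is broken after one of its hyphens, so a reader who clicks or copies the first line gets a dead link. The underscored address stays in one piece even though it is longer than the requested width.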

Dashes are also used in numerical notation. Subtraction. Numeric ranges. Arabic numerals and common mathematics are used in every language, so you've got one other competitor for dashes when searches include them.

Dashes are also used as field separators within dates. Though you can also use slashes or exclude separators altogether, dashes are preferred in some localities and by old-timers accustomed to 10-key input.

And lastly, dashes are used in the context of search engines as negation operators. This means that searches including a dash prefix ("-term") will actually prevent that term from being listed in the results. And the benefit of including URLs inciting people to exclude results like your own URLs is... what, exactly?
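As a rough illustration of the negation operator, here is a toy query parser in Python. It is a sketch of the general "-term" convention only, not the parser of any particular search engine, and `parse_query` is a name invented for this example.

```python
def parse_query(query):
    """Split a search query into terms to include and terms to exclude.

    A token with a leading dash ("-term") is treated as a negation, which
    is the convention described in the text. Real engines have more rules
    (quoting, operators, etc.) that this sketch ignores.
    """
    include, exclude = [], []
    for token in query.split():
        if token.startswith("-") and len(token) > 1:
            exclude.append(token[1:].lower())
        else:
            include.append(token.lower())
    return include, exclude

include, exclude = parse_query("url naming -underscores")
print(include)  # terms that must appear
print(exclude)  # terms that must NOT appear
```

Any word that ends up with a dash in front of it, for whatever reason, lands in the exclude list rather than the include list.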

A fine line

My recommendation is simple: Use underscores as word separators/white space in URLs. Each of the major search engines treats URLs using underscores in the same fashion, so this is not just a "Google-friendly" fix, it's good for all the popular engines and spiders.

While I'm on the topic of URL naming conventions, other things you ought to do include:

  • make each URL shorter than 70 characters including the protocol (http://) and extension (.htm).
    I prefer 64 characters because it provides the ability to "quote" the URL in plain text emails several times before it wraps from the insertion of the "> " prefix most email clients will include;
  • use all lowercase file names or only use capital letters at the start of words (consistency here is important);
  • spell check your file names, too. Many people spell check their content, but forget to spell check the actual file name;
  • do not orphan URLs. Ever. Content should always be available at the original URL, or at least by some form of redirect to the new location. Impermanence is the greatest flaw of the internet.
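The mechanical parts of the list above can be checked automatically. The following Python sketch covers the length limits, the capitalization rule and the underscore recommendation; spell checking and orphaned URLs are left out because they need a dictionary and a live site. The function name `check_url`, the constant names and the warning wording are all assumptions made for this example.

```python
from urllib.parse import urlparse

MAX_LENGTH = 70        # URLs should be shorter than this
PREFERRED_LENGTH = 64  # the preferred limit, leaving room for "> " quoting

def check_url(url):
    """Return a list of warnings for `url`; an empty list means it passes."""
    warnings = []

    # Length is measured on the whole URL, protocol and extension included.
    if len(url) >= MAX_LENGTH:
        warnings.append(f"too long: {len(url)} characters, limit is under {MAX_LENGTH}")
    elif len(url) > PREFERRED_LENGTH:
        warnings.append(f"longer than the preferred {PREFERRED_LENGTH} characters")

    # Look only at the file name: last path segment, without its extension.
    name = urlparse(url).path.rsplit("/", 1)[-1]
    stem = name.rsplit(".", 1)[0]

    if "-" in stem:
        warnings.append("file name uses dashes; underscores are recommended")

    # Capital letters are only allowed at the start of a word.
    words = [word for word in stem.split("_") if word]
    if any(word[1:] != word[1:].lower() for word in words):
        warnings.append("capital letter inside a word")

    return warnings

print(check_url("http://example.com/dashes_vs_underscores.htm"))
print(check_url("http://example.com/dashes-vs-underscores.htm"))
```

Run something like this over a list of file names before publishing and it will flag the easy mistakes. Consistency across a whole site (all lowercase versus capitalized words) still has to be checked by comparing files against each other.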

Regards,

Shawn K. Hall
http://12PointDesign.com/

 

PO Box 1306
Twain Harte, CA
95383
