Archive for the ‘Web’ Category

Dashes vs. Underscores in URLs

Wednesday, September 7th, 2011

The debate over whether to use dashes or underscores to represent spaces in URLs is rather heated in the web development community, though not quite as heated as the one over whether to indent code with tabs or spaces. If you know many human beings, you won’t be surprised to hear that the majority of people get both of these things completely wrong. This is partly because most people haven’t really thought it through yet, but mostly because they don’t care what’s right and simply defer to the status quo, wrong or not. I plan to write about tabs vs. spaces in another post, but here I will present irrefutable arguments to answer the question once and for all: which is better to substitute for spaces in URLs, dashes or underscores?

Dash-underscore face is relevant to this discussion.

The simple answer is that, never mind what Google says, underscores are the right way to go. Why, you ask?

1) Hyphens Already Mean Something

Hyphens and dashes are actually slightly different, but in practice everybody uses the same character, ASCII 45, the hyphen-minus, so let’s just pretend they’re the same. The strongest argument against dashes is that they already mean something in English! “Mother-in-law”, “X-ray”, and “twenty-one” are all single words, and inserting a hyphen in the middle of a sentence can completely change its meaning. You can’t just ignore those rules, any more than you would write without capital letters or proper punctuation. If you use dashes in your URLs when you don’t mean them, you a) lose information about what the content of the URL actually is, b) confuse people, and c) will have the English police at your door by morning. I mean that last one; this shit is serious.

For example, I have a file called man-eating-shark.jpg. Now, can you tell me if it’s a picture of a man eating shark meat, or a picture of an actual man-eating shark? No, you can’t. This example is from Wikipedia, and there are some more great ones on that page. Sure, you can just open the file and see if the shark in question is dead and delicious or alive and ravenous. But when a search engine indexes the file, it has no idea. This is very, very bad.

One more that nicely illustrates the importance of preserving hyphens: a document called scientists-discover-three-hundred-year-old-trees.html. Are they three-hundred-year-old trees, three hundred-year-old trees, or three hundred year-old trees? We know neither how many trees there are nor how old each one is, and Google doesn’t know either! If you’re looking for trees that are one hundred years old rather than three hundred, you’re out of luck, because Google can’t tell the difference. How dumb is that?

On the other hand, the following URLs are perfectly clear:

  • man-eating_shark.jpg
  • scientists_discover_three_hundred-year-old_trees.html
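To make the convention concrete, here’s a minimal JavaScript sketch of a slug helper that follows it. Whitespace becomes underscores, and hyphens that belong to the words survive. The names toSlug and toDashSlug are hypothetical and not from any library. The sketch assumes plain ASCII titles, so accented letters are simply dropped.

```javascript
// Hypothetical helper: build a URL slug the "underscores for spaces" way.
// Assumes ASCII input; any character outside [a-z0-9_.-] is discarded.
function toSlug(title) {
    return title
        .trim()
        .toLowerCase()
        .replace(/\s+/g, "_")            // each run of whitespace -> one underscore
        .replace(/[^a-z0-9_.\-]/g, "");  // keep letters, digits, _, . and real hyphens
}

// For comparison: the common "dashes for spaces" convention.
function toDashSlug(title) {
    return toSlug(title).replace(/_/g, "-");
}

console.log(toSlug("Man-eating shark.jpg"));
// man-eating_shark.jpg
console.log(toSlug("Scientists discover three hundred-year-old trees"));
// scientists_discover_three_hundred-year-old_trees

// Two different meanings collapse into the same dashed URL:
console.log(toDashSlug("Man-eating shark.jpg") === toDashSlug("Man eating shark.jpg"));
// true
```

Note how the dash-only version maps two different titles to the same URL. That is exactly the information loss described above.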

This isn’t some rare exception; I actually see this kind of confusion more often than you might think. Sadly, dashes are still more common than underscores, firstly because all people care about is their website’s PageRank, and secondly because Google’s programmers never learned about hyphenated words at MIT or wherever. Frankly, taking SEO this far is a bit childish. We should create high-quality, original content and make our sites accessible and standards-compliant, but we shouldn’t have to worry about what Google thinks of our URLs, especially when it gets them wrong.

In summary, you can’t just ignore centuries of English writing. Don’t use dashes in URLs when you don’t mean them. Doing so is the worst kind of wrong: grammatically incorrect.

2) Aesthetics and Readability

I’ve appealed to those who care about language, but if you happen not to, that’s okay. Underscores are still better, because they look better. Dashes are all up in your space (haha), right next to the letters you actually want to be looking at. I would honestly rather read something in Comic Sans than have all sorts of garbage between the words, firing all the wrong photons into my eyes and making them sore.

Underscores? Not as good as spaces themselves, but certainly a huge improvement over dashes. They’re a bit wider than spaces (if the typeface is not monospaced), but at least they rest comfortably at the bottom of the letters, so they should be no harder to read than underlined text (in fact, underscores are underlines; more on that later). In order of decreasing readability:

  1. The five boxing wizards and the quick brown fox jumped over the lazy dog.
  2. The_five_boxing_wizards_and_the_quick_brown_fox_jumped_over_the_lazy_dog.
  3. The-five-boxing-wizards-and-the-quick-brown-fox-jumped-over-the-lazy-dog.

The last one looks terrible. Depending on the person, and the typeface, this may not always be the case, but in general the underscore is far superior from a readability standpoint.

3) The Semantics of the Underscore

So you’re a web developer. Hopefully you care about language or readability, but perhaps not. But you definitely do care about semantics. In fact, if you don’t care about semantics on the web, you might not be in the right profession.

It’s not common knowledge, but the underscore isn’t really a character like the rest of the keys on our keyboard. It’s only there because of a little piece of tech history known as the typewriter. To underline text, you had to type it out normally, then move the carriage back and go over it again with underscores. That means an underscore all on its own is basically an underlined space, which is about as close to an actual space as you can get on the web.

The Better Answer

It should be obvious, but in a sane computer ecosystem we wouldn’t have to use either! We shouldn’t have to compromise on any of these three points; the underscore is only the next-best option. A space should be a space, no matter where it is. Our filesystems themselves work fine with spaces in filenames, so why replace them at all? Unfortunately, a while ago a whole bunch of geeks with no appreciation for language decided it was a good idea to make the space (of all characters) the delimiter for filenames in things like command-line arguments and URLs. There’s a prettygoodreasonweusespacesinwriting, but apparently they didn’t need l4ng4g3 bk 7h3n, so whatever, right? They probably also didn’t anticipate the masses ever using the software they were writing, so they assumed everything would keep being named with the good ol’ cryptic bastardizations of English words like “tmp”, “lib”, or “srv”.

Of course, quotes or some other non-space delimiter should have been required from the beginning. The “use spaces, or else fall back on quotes” system is just silly. You have to use some delimiter, of course, but not spaces, because we use them more than any letter in the alphabet. Imagine if we said that you had to use an “a” to delimit file names and URLs. It’s the same logic, and it would work fine; it’s just less readable. The question would then become: should we be using @ or 4 to substitute for “a” in URLs? Ridiculous. In any case, that’s the reason you keep seeing “No such file or directory” on the command line when you have something called “my file.txt” and forget the quotes. It’s also the reason we can’t use spaces in URLs.
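Here’s a toy JavaScript model of why that happens. Splitting a command line on whitespace turns cat my file.txt into three words, so the program goes looking for files named “my” and “file.txt”. A quote-aware splitter keeps “my file.txt” together. Both functions are hypothetical illustrations. Real shells have many more rules (single quotes, backslash escapes, variable expansion) that this ignores.

```javascript
// Toy model only: real shells also handle single quotes, backslashes, etc.
// Naive splitting: every run of whitespace separates two arguments.
function naiveSplit(line) {
    return line.split(/\s+/).filter(function (word) {
        return word.length > 0;
    });
}

// Quote-aware splitting: whitespace inside "double quotes" is kept.
function quoteAwareSplit(line) {
    var args = [];
    var current = "";
    var inQuotes = false;
    var started = false; // true once the current argument has begun
    for (var i = 0; i < line.length; i++) {
        var ch = line[i];
        if (ch === '"') {
            inQuotes = !inQuotes; // the quote marks themselves are dropped
            started = true;       // so "" still yields an empty argument
        } else if (/\s/.test(ch) && !inQuotes) {
            if (started) {
                args.push(current);
                current = "";
                started = false;
            }
        } else {
            current += ch;
            started = true;
        }
    }
    if (started) {
        args.push(current);
    }
    return args;
}

console.log(JSON.stringify(naiveSplit("cat my file.txt")));
// ["cat","my","file.txt"]
console.log(JSON.stringify(quoteAwareSplit('cat "my file.txt"')));
// ["cat","my file.txt"]
```

Under the naive rule, cat is asked to open a file called “my” and another called “file.txt”, hence the error.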

Conclusion

As it is, the crazy idea that a space should represent a space and not some other character is pretty impractical. Because of our digital ancestors’ great mistake (space delimiters), you would have to rewrite every web browser on earth to make spaces work properly in URLs. If you do try to use them, the browser encodes them as “%20”, which%20makes%20your%20URLs%20look%20like%20this. Technically they’re still spaces, just encoded, but it’s hideous. You can barely read it. Oh well, maybe one day this will change.
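For the curious, JavaScript’s built-in encodeURIComponent() performs the same percent-encoding, and decodeURIComponent() turns it back into spaces. As an aside, query strings encoded the way HTML forms do it (application/x-www-form-urlencoded, which URLSearchParams follows) use a plus sign for a space instead of %20.

```javascript
// Percent-encoding turns each space into "%20".
var phrase = "which makes your URLs look like this";
var encoded = encodeURIComponent(phrase);
console.log(encoded);
// which%20makes%20your%20URLs%20look%20like%20this

// Decoding restores the original spaces.
console.log(decodeURIComponent(encoded) === phrase);
// true

// Form-style query strings use "+" for a space instead.
console.log(new URLSearchParams({ q: "my file.txt" }).toString());
// q=my+file.txt
```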

So, underscores are not quite as good as spaces. They’re a compromise of language, readability, and semantics, but they’re the best we’ve got. Better than dashes, CamelCase, plus+signs, or anything else. So use them. Don’t do something you know is wrong because of the minute difference it might make to your Google PageRank. You should be caring more about your users anyways, and if you do it right, Google will change along with everyone else.

I realize that, ironically, all my other blog posts use dashes instead of underscores. It’s the WordPress default, and I haven’t had time to fix it yet.

Rounding to a given number of decimals in JavaScript

Tuesday, June 14th, 2011

When it comes to math, JavaScript isn’t great. It lacks the very basics, like an exponentiation operator, so you’re forced to use the long and cumbersome Math.pow() instead. Things get pretty ugly when you try to round numbers, as the Math.round() function only rounds to the nearest integer. To round to a certain number of decimals, you can multiply your input by a power of ten and then divide the output of Math.round() by that same power of ten. This is easily wrapped in a function:

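function round_to(value, decimals) {
     // Shift the digits we want to keep to the left of the decimal point,
     // round to the nearest integer, then shift them back.
     var factor = Math.pow(10, decimals);
     return Math.round(value * factor) / factor;
}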

Now, round_to(value, n) returns value rounded to at most n decimal places. For example, round_to(2, 5) = 2, round_to(1.123456, 5) = 1.12346, and round_to(22.3, 0) = 22.
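One caveat worth knowing: the multiplication happens on binary floating-point numbers, so round_to can round the “wrong” way on ties. For example, 1.005 * 100 evaluates to 100.49999999999999, so round_to(1.005, 2) returns 1 rather than 1.01. One commonly suggested workaround is to shift the decimal point with exponent notation inside a string instead of multiplying. The sketch below uses the hypothetical name round_to_exp and assumes decimals is a non-negative integer and the values are of moderate size. Like Math.round, it still rounds exact halves of negative numbers toward positive infinity.

```javascript
// Hypothetical variant of round_to that avoids the float multiplication.
// Assumes `decimals` is a non-negative integer and |value| is of moderate size.
function round_to_exp(value, decimals) {
    var text = String(value);
    if (text.indexOf("e") !== -1) {
        // Tiny or huge numbers already print in exponent form ("1e-7"), and
        // "1e-7e2" would not parse, so fall back to plain scaling for them.
        var factor = Math.pow(10, decimals);
        return Math.round(value * factor) / factor;
    }
    // Number("1.005e2") parses the decimal text directly: exactly 100.5.
    var shifted = Math.round(Number(text + "e" + decimals));
    // Shift back the same way: Number("101e-2") is 1.01.
    return Number(shifted + "e-" + decimals);
}

console.log(round_to_exp(1.005, 2));     // 1.01
console.log(round_to_exp(1.123456, 5));  // 1.12346
console.log(round_to_exp(22.3, 0));      // 22
```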

Now in Helvetica Neue Light

Tuesday, October 6th, 2009

Articles and links in the sidebar have been upgraded to my official favourite font, Helvetica Neue Light. It’s pretty easy to do, but hard to figure out on your own. After searching for a while and experimenting, I came up with the following CSS.

font-family: "HelveticaNeue-Light", "Helvetica Neue Light", "Helvetica Neue", Helvetica, Arial, sans-serif;
font-weight: 300;

Some browsers (IE and other older ones) look fonts up by family name plus style, that is, “Helvetica Neue Light”. Others use the PostScript name defined by Adobe’s spec, e.g. “HelveticaNeue-Light”. Others still, such as the latest versions of Safari, Firefox, and Chrome, conform to the W3C specification when it comes to fonts and their styles, and take only the family name. The W3C says you must use other CSS properties, such as font-weight, to get “light” or “bold” fonts. In this case, “Helvetica Neue” together with “font-weight: 300;” will display Helvetica Neue Light to the user.

Arial is included for the sake of the poor Windows users out there, who are stuck with the font Microsoft put on their computers because Microsoft was too cheap to pay for Helvetica, despite it being one of the most widely used typefaces in the world. As any typography nerd would know, Arial was a total rip-off of Helvetica and should never be used.

There you have it. Mac users get to look at Helvetica Neue Light, while Windows users still get crappy old Arial.

How to get your very own two-letter domain

Monday, October 5th, 2009

I’m sure most of you internet people have heard of the various URL shortening sites out there in the wild. They’re used to turn a long and scary URL (which stands for Uniform Resource Locator) into a much more timid and digestible one. Blah blah blah. The point is, you may have noticed that all the good ones (bit.ly is an exception; tinyurl.com is ugly, commercial, and gross) have a domain that is only two letters long, like tr.im. Now how do they do it?

Most big hosting companies and domain registrars will not only tell you that any domain shorter than three letters is invalid, but probably won’t support whatever obscure ccTLD (Country Code Top Level Domain, e.g. .ca for Canada) you feel like owning. There’s no real reason for this, other than that most companies are stupid and don’t know what the hell they’re doing, especially when it comes to technical stuff like this; the CEO of GoDaddy probably doesn’t have more than a vague idea of how the internet really works. Oh well, more power to the people.

And by “the people” I mean the people at iwantmyname.com. Not only is their service great, with a nice clean interface and a whole six pages of TLDs to choose from, but they accept ANY valid domain, including those with only two characters at the second level.

It’s a great service, and I’ve used them to buy all two of my two-letter domains. I even started a little URL shortening service of my own, called zi.gs, just for fun. I took it down a while ago, after providing support for it became boring and tiresome.

Anyways, there you have it. Easy two-character domains, supporting lots and lots of TLDs.

The public beta of Wikipedia’s new interface announced

Thursday, September 17th, 2009

Ahh, that’s why. The reason Wikipedia has been acting like 4chan on a bad day is that they were upgrading their software, and now we know exactly what part was upgraded.

Pretty!

They added a little link at the top of the page entitled “Try Beta”. If you click on it, and log in, you’ll be able to switch your current interface for a brand spanking new look n’ feel. It looks nice, but I’m not sure whether I’m happy about sacrificing the speed and simplicity of the current design just for a few whistles and shiny menu bars.

Edit: Apparently the wonkiness was actually related to something else. The new look is still pretty cool though.

Wikipedia’s error page, if you ever wondered

Thursday, September 17th, 2009

Has anyone ever seen this before? Today when I visited, there was a message in the header saying they were updating their software and might experience some downtime. But in the middle of the day? I guess it all depends on your time zone (EST here), and the point of Wikipedia is to be as neutral as possible. In any case, it was interesting to see.

Technically it's Wikimedia, but still.

ACLr8 site redesign

Sunday, September 13th, 2009

It’s not much, but the ACLr8 site has been changed a bit. I added MacUpdate and stuff. Also, and this is quite an accomplishment for me, I’ve finally managed to give a page a minimum height but no maximum. That is, if your screen is smaller than the content, you’ll get a scroll bar, but if it is larger, the page continues to the bottom of your screen. Go to the ACLr8 page and press Cmd/Ctrl and Minus (-) to see what I mean. Really it’s just two divs, absolutely positioned on top of each other. One contains the content (haha) and has no set height, and the other has a height of 100% but is just an empty background. Simple, really. I just never had the idea until today.

I also got a nice email from Softpedia telling me they’ve added ACLr8 to their database, and that they’ve deemed it “100% clean” and all that. It was nice to hear. ‘Cause I was pretty sure I put a virus in there somewhere.

ZI.GS is no more

Saturday, August 29th, 2009

The not-so-legendary URL shortener has been a fun development project, and I’ve learned a lot from its creation. However, after reading about the downsides of URL shorteners in approximately seven different places, I have decided that they aren’t good for the internet as a whole, and that I no longer support their use. They really have only one legitimate use, and that is Twitter. And for Twitter, there are a hundred shorteners out there already with way better interfaces, reliability, and support.

ZI.GS, requiescat in pace.

How to avoid looping in mod_rewrite redirection

Tuesday, June 30th, 2009

When redirecting things to a new location that fits the original rewrite pattern, you end up with a recursive feedback loop and all you get is an error. If you specified a URL scheme and a domain in your rewrite (meaning that the redirection is visible to the browser) you’ll see the error in your browser saying something along the lines of “the page tried to redirect too many times”. Otherwise, the redirection will be internal, and you’ll get your server’s 500 Internal Server Error page.

So, how do you avoid it? Well, say you wanted to redirect some regex (regular expression) pattern to your index page, say index.php, but it turns out that “index.php” itself matches your pattern. Then index.php redirects to index.php, which redirects to index.php, and so on, and so forth.

To prevent this, simply add a RewriteCond that checks the REQUEST_URI to make sure it doesn’t match the string that you redirect to. The following is an example.

# Skip the rule when the request is already for index.php
RewriteCond %{REQUEST_URI} !^/?index\.php$
RewriteRule ^(regex|goes|here)$ /index.php

Just make sure the regex in the RewriteCond matches only the page you redirect to, and not any of the pages you actually want to redirect. That’s it! You’re done.
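If it helps to see the loop mechanically, here’s a toy JavaScript model of the external-redirect case. A “server” redirects anything matching a pattern to /index.php, and a “browser” follows redirects until it gives up. This is not Apache’s rewrite engine. The pattern, the redirect limit, and the function names are all made up for illustration, and the guard plays the role of the RewriteCond above.

```javascript
// Toy model of an external redirect loop (not Apache's engine).
// Hypothetical rule: URIs matching `pattern` are redirected to `target`,
// unless `guard` (standing in for the RewriteCond) matches first.
function makeServer(pattern, target, guard) {
    return function handle(uri) {
        if (guard && guard.test(uri)) {
            return { status: 200, uri: uri };          // condition fails: rule skipped
        }
        if (pattern.test(uri)) {
            return { status: 302, location: target };  // rule fires: redirect
        }
        return { status: 200, uri: uri };
    };
}

// Follow redirects like a browser would, giving up after maxRedirects.
function follow(handle, uri, maxRedirects) {
    for (var hops = 0; hops <= maxRedirects; hops++) {
        var response = handle(uri);
        if (response.status !== 302) {
            return { uri: response.uri, hops: hops };
        }
        uri = response.location;
    }
    return { error: "too many redirects" };
}

var pattern = /^\/?[a-z]+(\.php)?$/;  // hypothetical pattern that also matches index.php
var guard = /^\/?index\.php$/;        // mirrors the RewriteCond on REQUEST_URI

var loopy = makeServer(pattern, "/index.php", null);
var safe = makeServer(pattern, "/index.php", guard);

console.log(JSON.stringify(follow(loopy, "/about", 20)));
// {"error":"too many redirects"}
console.log(JSON.stringify(follow(safe, "/about", 20)));
// {"uri":"/index.php","hops":1}
```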

Best of luck on getting this to work the way you want it to, and happy web developing!

A subdomain for each directory with .htaccess and some mod_rewrite wizardry

Wednesday, June 10th, 2009

I don’t know how useful this will be to anyone, but I was playing around with my web server a while back, and because subdomains are pretty cool, I thought I would figure out for myself how to give each directory its own subdomain.

You may use either httpd.conf or an .htaccess at your document root.

Make sure the mod_rewrite module is loaded, then turn on the rewrite engine before you begin.
RewriteEngine On

First, use a RewriteCond like this to exclude things like your favicon.ico, which need to be available on every subdomain, from the redirection. Change this based on your setup.
RewriteCond %{REQUEST_URI} !^/(favicon\.ico|images/.+|javascript/.*)$

Then, capture the subdomain in %1.
RewriteCond %{HTTP_HOST} ^(.+)\..+\..+$
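This next line prevents looping by making sure the REQUEST_URI doesn’t already start with what’s in %1. It joins %1 and the REQUEST_URI with a comma. The pattern captures everything before the comma as \1, then checks whether the part after the comma begins with “/” followed by that same text. If it does, the request has already been rewritten, and the ! makes the condition fail so the rule is skipped.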
RewriteCond %1,%{REQUEST_URI} !(^[^,]+),/\1.*

This checks to see if what we have in %1 is in fact a directory under the document root.
RewriteCond /your/document/root/%1 -d

And finally, redirect ‘/anything’ to ‘/subdomain/anything’.
RewriteRule ^(.*)$ /%1/$1 [L]

And there we go, that should do it. All together, that’s:
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/(favicon\.ico|images/.+|javascript/.*)$
RewriteCond %{HTTP_HOST} ^(.+)\..+\..+$
RewriteCond %1,%{REQUEST_URI} !(^[^,]+),/\1.*
RewriteCond /your/document/root/%1 -d
RewriteRule ^(.*)$ /%1/$1 [L]
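To check the logic of the whole chain without touching a server, here’s a toy JavaScript model with one check per RewriteCond. It’s only a sketch under stated assumptions. A Set of directory names stands in for the -d test, the function name is hypothetical, and it mimics .htaccess behaviour, where the path handed to RewriteRule has no leading slash. Because .htaccess rules run again on the rewritten URL, the second call shows the loop guard letting an already-rewritten path through unchanged.

```javascript
// Toy model of the rule set above (not Apache itself). `dirs` stands in
// for the -d test against /your/document/root; names are hypothetical.
function rewriteForSubdomain(host, uri, dirs) {
    // RewriteCond %{REQUEST_URI} !^/(favicon\.ico|images/.+|javascript/.*)$
    if (/^\/(favicon\.ico|images\/.+|javascript\/.*)$/.test(uri)) return uri;

    // RewriteCond %{HTTP_HOST} ^(.+)\..+\..+$  (subdomain captured as %1)
    var hostMatch = /^(.+)\..+\..+$/.exec(host);
    if (!hostMatch) return uri;
    var sub = hostMatch[1];

    // RewriteCond %1,%{REQUEST_URI} !(^[^,]+),/\1.*
    // \1 refers back to the subdomain, so this matches when the URI already
    // starts with "/<subdomain>", i.e. it has already been rewritten.
    if (/(^[^,]+),\/\1.*/.test(sub + "," + uri)) return uri;

    // RewriteCond /your/document/root/%1 -d
    if (!dirs.has(sub)) return uri;

    // RewriteRule ^(.*)$ /%1/$1 [L]  (in .htaccess, $1 has no leading slash)
    return "/" + sub + "/" + uri.replace(/^\//, "");
}

var dirs = new Set(["blog", "photos"]);
console.log(rewriteForSubdomain("blog.example.com", "/post/1", dirs));      // /blog/post/1
console.log(rewriteForSubdomain("blog.example.com", "/blog/post/1", dirs)); // /blog/post/1 (already rewritten)
console.log(rewriteForSubdomain("shop.example.com", "/post/1", dirs));      // /post/1 (no such directory)
console.log(rewriteForSubdomain("blog.example.com", "/favicon.ico", dirs)); // /favicon.ico (excluded)
```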

Questions, comments, criticism, and other feedback are very welcome. If you have a better way of doing it, let me know!