Nomulous Blog
Dashes vs. Underscores in URLs
September 7th, 2011
The debate over whether to use dashes or underscores to represent spaces in URLs is rather heated in the web development community, though not quite as heated as the debate over whether to use tabs or spaces when indenting code. If you know many human beings, you won’t be surprised to hear that the majority of people get both of these things completely wrong. This is partially because most people haven’t really thought it through yet, but mostly because they just don’t care what’s right and simply defer to the status quo, wrong or not. I plan to write about tabs vs. spaces in another post, but here I will present irrefutable arguments to answer the question once and for all: which is the better substitute for spaces in URLs, dashes or underscores?

Dash-underscore face is relevant to this discussion.
The simple answer is that, never mind what Google says, underscores are the right way to go. Why, you ask?
1) Hyphens Already Mean Something
Hyphens and dashes are actually slightly different, but in practice everybody just uses the same character, ASCII number 45, the hyphen-minus. So let’s just pretend they’re the same. The strongest argument against dashes is that they already mean something in English! “Mother-in-law”, “X-ray”, and “twenty-one” are all single words! Inserting a hyphen in the middle of a sentence can completely change its meaning. You can’t just ignore those rules, any more than you would write without capital letters or proper punctuation. If you use dashes in your URLs when you don’t mean them, you a) lose information about what the content of the URL actually is, b) confuse people, and c) will have the English police at your door by the morning. I mean that last one, this shit is serious.
For example, I have a file called man-eating-shark.jpg. Now, can you tell me if it’s a picture of a man eating shark meat, or a picture of an actual man-eating shark? No, you can’t. This example is from Wikipedia, and there are some more great ones on that page. Sure, you can just open the file and see if the shark in question is dead and delicious or alive and ravenous. But when a search engine indexes the file, it has no idea. This is very, very bad.
One more that nicely illustrates the importance of preserving dashes: a document called scientists-discover-three-hundred-year-old-trees.html. Are they three-hundred-year-old trees, three hundred-year-old trees, or three hundred year-old trees? We know neither how many trees there are, nor how old each is. And Google doesn’t know either! If you’re looking for things that are one hundred years old and not three, you’re out of luck because Google can’t tell the difference. How dumb is that?
On the other hand, the following URLs are perfectly clear:
- man-eating_shark.jpg
- scientists_discover_three_hundred-year-old_trees.html
This isn’t just some rare exception scenario; I actually see this kind of confusion more frequently than you might think. Sadly, dashes are still more common than underscores, firstly because all people care about is their website’s PageRank, and secondly because Google programmers never learned about hyphenated words at MIT or wherever. Frankly, taking SEO this far is a bit childish. We should create high-quality, original content and make our sites accessible and standards-compliant, but we shouldn’t have to worry about what Google thinks of our URLs, especially when they get it wrong.
In summary, you can’t just ignore centuries of English writing. Don’t use dashes in URLs when you don’t mean them. Doing so is the worst kind of wrong: grammatically incorrect.
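The naming rule argued for above can be sketched in a few lines of Python. This is only an illustration: the function name underscore_name is made up for this example, and it assumes the input is a plain title or filename in which hyphens are meant literally and only whitespace needs replacing.

```python
import re


def underscore_name(text):
    """Replace each run of whitespace with one underscore, leaving hyphens untouched."""
    # strip() drops leading/trailing whitespace so the result never starts or ends with "_".
    return re.sub(r"\s+", "_", text.strip())


print(underscore_name("man-eating shark.jpg"))
# man-eating_shark.jpg
print(underscore_name("scientists discover three hundred-year-old trees.html"))
# scientists_discover_three_hundred-year-old_trees.html
```

Because the hyphens are never touched, whatever hyphenation the author wrote survives into the URL, which is the whole point.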
2) Aesthetics and Readability
I’ve appealed to those who care about language, but if you happen not to, that’s okay. Underscores are still better, because they look better. Dashes are all up in your space (haha), next to the letters that you actually want to be looking at. I would honestly rather read something in Comic Sans than have all sorts of garbage between each word, firing all the wrong photons into my eyes and making them sore.
Underscores? Not as good as spaces themselves, but certainly a huge improvement over dashes. They’re a bit larger than spaces (if the typeface is not monospaced), but at least they rest comfortably at the bottom of the letters, and the result should be no harder to read than something underlined (in fact, underscores are underlines; more on that later). In order of decreasing readability:
- The five boxing wizards and the quick brown fox jumped over the lazy dog.
- The_five_boxing_wizards_and_the_quick_brown_fox_jumped_over_the_lazy_dog.
- The-five-boxing-wizards-and-the-quick-brown-fox-jumped-over-the-lazy-dog.
The last one looks terrible. Depending on the person, and the typeface, this may not always be the case, but in general the underscore is far superior from a readability standpoint.
3) The Semantics of the Underscore
So you’re a web developer. Hopefully you care about language or readability, but perhaps not. But you definitely do care about semantics. In fact, if you don’t care about semantics on the web, you might not be in the right profession.
It’s not common knowledge, but the underscore isn’t really a character like the rest on our keyboards. It’s only there because of a little piece of tech history known as the typewriter. In order to underline text, you had to write it out normally, then move the typewriter carriage back and go over it again with underscores. That means that an underscore character all on its own is basically an underlined space, which is pretty much as close to an actual space as you can get on the web.
The Better Answer
It should be obvious, but in a sane computer ecosystem we wouldn’t have to use either! We should not have to compromise on any of these three points, where the underscore is only the next-best option. A space should be a space, no matter where it is. Our filesystems themselves work fine with spaces in filenames, so why replace them at all? Unfortunately, a while ago a whole bunch of geeks with no appreciation for language decided it was a good idea to make the space (of all characters) the delimiter for filenames in things like command-line arguments and URLs, etc. There’s a prettygoodreasonweusespacesinwriting, but apparently they didn’t need l4ng4g3 bk 7h3n so whatever, right? Likely, they also didn’t anticipate the masses ever using the software they were writing, so they thought everything would continue to be named the ol’ cryptic bastardization of English words like “tmp”, “lib”, or “srv”.
Of course, quotes or some other non-space delimiter should have been required from the beginning. The “use spaces, or else fall back on quotes” system is just silly. You have to use some delimiter, of course, but not spaces, because we use them more than any letter in the alphabet. Imagine if we just said that you had to use an “a” to delimit file names and URLs. That’s the same logic, works fine, it’s only less readable. The question then becomes, should we be using @ or 4 to substitute for “a” in URLs? Ridiculous. In any case, that’s the reason that when you have something called “my file.txt” you keep seeing “No such file or directory” on the command line, and also the reason we can’t use spaces in URLs.
Conclusion
As it is, the crazy idea that a space should represent a space and not some other character is pretty impractical. Because of the great mistake of our digital ancestors, space delimiters, you would have to rewrite every web browser on earth to make spaces work properly in URLs. If you do try to use them, a browser encodes them as “%20”, which%20makes%20your%20URLs%20look%20like%20this. Technically, they’re encoded spaces, but it’s hideous. You can barely read it. Oh well, maybe one day this will change.
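To see the encoding for yourself, here is a small Python sketch using the standard library’s urllib.parse. It only demonstrates percent-encoding of a path-like string; a real browser applies its own rules per URL component, so treat this as an approximation of what happens in the address bar.

```python
from urllib.parse import quote, unquote

sentence = "which makes your URLs look like this"

# quote() percent-encodes characters that are not allowed as-is, including spaces.
encoded = quote(sentence)
print(encoded)               # which%20makes%20your%20URLs%20look%20like%20this

# unquote() reverses the encoding, so no information is lost, only readability.
print(unquote(encoded))      # which makes your URLs look like this

print(quote("my file.txt"))  # my%20file.txt
```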
So, underscores are not quite as good as spaces. They’re a compromise of language, readability, and semantics, but they’re the best we’ve got. Better than dashes, CamelCase, plus+signs, or anything else. So use them. Don’t do something you know is wrong because of the minute difference it might make to your Google PageRank. You should be caring more about your users anyways, and if you do it right, Google will change along with everyone else.
I realize that, ironically, all my other blog posts use dashes instead of underscores. It’s the WordPress default, and I haven’t had time to fix it yet.
Jewelry for idiots — The Sportii Unisex Black Bracelet
August 23rd, 2011
Sometimes I come across something so incredibly dumb that it makes me sad for at least a week. The Sportii Unisex Black Bracelet made by Enerjii in the airplane catalogue on my flight home from Rome is a prime example of this. What’s written here is utter nonsense: it’s just a regular fucking bracelet. Phrases like “space-age silicone tubing” and “far infrared ceramic balls” are completely devoid of meaning or significance. HUMANS, Y U NO THINK CRITICALLY?
Rounding to a given number of decimals in Javascript
June 14th, 2011
When it comes to math, JavaScript isn’t great. It lacks the very basics, like an exponentiation operator, so you’re forced to use the long and cumbersome Math.pow() instead. Things get pretty ugly when you try to round numbers, as the Math.round() function only rounds to the nearest integer. In order to round to a certain number of decimals, you can multiply your input by a power of ten and then divide the output of Math.round() by that same power of ten. This is easily done by a function:
function round_to(value, decimals) {
    // Scale up, round to the nearest integer, then scale back down.
    var factor = Math.pow(10, decimals);
    return Math.round(value * factor) / factor;
}
Now, round_to(value, n) returns that value to at most n decimal places. For example, round_to(2, 5) = 2, round_to(1.123456, 5) = 1.12346, and round_to(22.3, 0) = 22.
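For comparison, the same multiply, round, divide trick can be sketched in Python. This is not a port anyone needs in practice, since Python’s built-in round(value, ndigits) already takes a number of decimals; it is only here to show the technique outside of JavaScript. It uses math.floor(x + 0.5) as a stand-in for Math.round(), which is an assumption made for this example because Python’s own round() rounds exact halves to the nearest even number instead.

```python
import math


def round_to(value, decimals):
    """Round value to at most `decimals` decimal places by scaling, rounding, and unscaling."""
    factor = 10 ** decimals
    # floor(x + 0.5) rounds halves upward, mirroring JavaScript's Math.round().
    return math.floor(value * factor + 0.5) / factor


print(round_to(2, 5))         # 2.0
print(round_to(1.123456, 5))  # 1.12346
print(round_to(22.3, 0))      # 22.0
```

Note that the results come back as floats, so 2 prints as 2.0, but they compare equal to the values listed above.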
Easily adopt Django’s new {% url ‘…’ %} template syntax
March 24th, 2011
If you want to adopt the new recommended syntax but don’t want to add {% load url from future %} to every single template you have, you can simply add the following to your settings.py file:
import django.template
django.template.add_to_builtins('django.templatetags.future')
Now if you just change your {% url %} tags to the new syntax (see here), everything will work fine and will be compatible with Django 1.5 when it is released.
Notification email names (‘From:’ header) in phpBB3
October 10th, 2010
If you use phpBB3 and it ever sends you emails, you’ll notice that the from address usually appears as something like “<admin@example.com> <admin@example.com>” or, even worse, the name of the server you’re using. I’ve even seen “something@box###.bluehost.com” (Bluehost is terrible by the way, stay away from them at all costs). Of course, this is just because there is From: <admin@example.com> in the headers of the email that phpBB sends out. For most apps (e.g. Django), you’d just have to replace your email setting with something like “Super Forums <admin@example.com>” and most mail clients would just display the name “Super Forums”, but unfortunately phpBB provides no official way of doing this.
I’m sure there’s a much better/more official way of adding the feature, with some kind of plugin for example, but without wanting to spend too much time on it I just hacked this solution together.
- Go into your forum root and find the file /includes/functions_messenger.php. At around line 450 you’ll see where the From: and Reply-To: headers for board notifications are defined. Remove the '<' and '>' concatenated around both occurrences of $config['board_contact'] in the relevant code so that it looks like the following.
if (empty($this->replyto))
{
    $this->replyto = $config['board_contact'];
}
if (empty($this->from))
{
    $this->from = $config['board_contact'];
}
- Normally now you’d just have to change the “Contact e-mail address:” setting in the admin interface to “Super Forums <admin@example.com>”, but since phpBB was never designed for this it would actually escape the < and > and it would just become “Super Forums &lt;admin@example.com&gt;”. Instead you actually have to go into your database (I would recommend Sequel Pro if you’re on a Mac) and change ‘board_contact’ in the ‘config’ table to the unescaped value.
That’s it. I had to delete everything in the ‘cache’ folder (except .htaccess) before it would actually work though. Maybe I will make an official plugin of this one day.
Norman Bethune was this famous plumber…
October 10th, 2010
The guy probably saved millions of lives by inventing the mobile blood-transfusion service. Concordia pranksters probably did this; I saw this near Guy-Concordia Metro and it made me laugh. Apparently it’s usually a flower.
Some wtf Python float arithmetic
October 8th, 2010
Prof. Dominic Lemelin is about to try to prove that 7^n + 2 is a multiple of three for all n in math class at the moment, and before I tried proving it myself I decided to do a few tests in Python to make sure I wasn’t wasting my time. It was doing pretty well until 7^365 + 2 caused it to overflow, but there was some weird stuff going on in the float arithmetic. I don’t know why I was using float arithmetic, maybe it’s still too early to be programming.
>>> x = 11398895185373145
>>> x/3
3799631728457715
>>> float(x)/3
3799631728457714.5
>>> wtf()
Update: By the way, x/3 actually gives the right answer, 3799631728457715. This is a key piece of information in recognizing that I’m not just an idiot who forgot that integer division rounds in Python.
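For what it’s worth, here is a hedged sketch of what seems to be going on, written for Python 3 (where // is integer division and / always returns a float). The explanation in the comments is based on how 64-bit doubles work in general, not on anything specific to the session above.

```python
x = 11398895185373145  # the value from the session above; it equals 7**19 + 2

# Integer arithmetic is exact, so the divisibility check is reliable.
print(x % 3)           # 0
print(x // 3)          # 3799631728457715

# Between 2**53 and 2**54 a double can only hold even integers,
# so converting the odd number x to a float loses its last bit.
print(int(float(x)))   # 11398895185373144
print(float(x) / 3)    # 3799631728457714.5

# Checking the claim with integers only, well past the point where floats give up.
print(all((7**n + 2) % 3 == 0 for n in range(1000)))  # True

# 7**365 is larger than the biggest finite double, hence the overflow.
try:
    float(7**365)
except OverflowError as error:
    print("overflow:", error)
```

So the float result is off because the input was already rounded before the division happened, not because the division itself is broken.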
Hidden fonts on Mac OS X
July 17th, 2010
Over the course of a wide variety of design projects (websites, logos, school assignments, slide shows, posters, etc.) I have slowly and proudly expanded my font collection beyond what is included by default in Mac OS X. For example, while working at an Authorized Apple Service Provider, I-Technique, I acquired one of my most prized possessions, the Myriad Pro set. It’s Apple’s corporate font, used for most of their logos and website headings, etc., and it’s really friggin’ nice. Of course, I thought it would just be a cool font I could design with once in a while, or no more than a typography nerd’s piece of elitist paraphernalia. I never thought it would make a difference, say, browsing the web, because designers know it isn’t installed by default on any major operating system.
But, as it turns out, I was wrong. The folks at ZURB created an awesome looking sliding vinyl demo using CSS 3, which I happened upon the other day while idly reading my RSS feeds in Socialite. Lo and behold, there it was. In the <h1>, Myriad Pro in all of its glory, with a CSS gradient mask and text-shadow to boot. What a nice surprise! They must have known 99.99% of their visitors would not have seen the font, rendering in Helvetica instead. Maybe they just liked the way it looked in their own browsers? In any case, it made me smile.
The moral of the story is: having extra fonts on your computer makes almost no difference at all, and if you aren’t the type of person who would be made happy by a pretty font (i.e., if you aren’t a typography nerd, i.e., if you can’t tell the difference between Arial and Helvetica), then I would advise against it, as having too many fonts slows down your computer. Moving on.
As it turns out, there are a whole host of fonts that, for the above reason, are available to certain applications on Mac OS X, but not actually installed into the system-wide font library. I discovered this while looking for Palatino, which I knew was installed somewhere, but couldn’t find in my system fonts. I opened up Terminal.app, ran a simple locate -i palatino, and found exactly what I was looking for. This, however, opened up a whole new dimension to my search: hidden fonts in Mac OS X.
These little treats are literally littered all around the OS. To install them (as with all font files), you just double click on the icons, and Font Book will open up, with a dialogue asking you to confirm the installation.
If you have iWork installed, then in /Library/Application Support/Apple/Fonts/iWork, you will find lots of cool fonts that aren’t normally available to the rest of the OS. These include:
- Academy Engraved LET Fonts
- Bank Gothic
- Blackmoor LET Fonts
- BlairMdITC TT-Medium
- Bodoni Ornaments ITC TT
- Bodoni SvtyTwo ITC TT
- Bodoni SvtyTwo OS ITC TT
- Bodoni SvtyTwo SC ITC TT
- Bordeaux Roman Bold LET Fonts
- Bradley Hand ITC TT-Bold
- Capitals
- Jazz LET Fonts
- Mona Lisa Solid ITC TT
- Palatino
- Party LET Fonts
- PortagoITC TT
- Princetown LET Fonts
- Santa Fe LET Fonts
- Savoye LET Fonts
- SchoolHouse Cursive B
- SchoolHouse Printed A
- Snell Roundhand
- Stone Sans ITC TT
- Synchro LET Fonts
- Type Embellishments One LET
Similarly, if you have iLife installed, there are a bunch of fonts that come with iDVD, not available to the rest of the system. These are in /Applications/iDVD.app/Contents/Resources/Fonts (Ctrl-Click on iDVD.app and select “Show Package Contents” to get there).
- Academy Engraved LET Fonts
- Algerian Condensed LET Fonts
- Bank Gothic
- BlairMdITC TT-Medium
- Bodoni SvtyTwo SC ITC TT
- Bradley Hand ITC TT-Bold
- Cracked
- Gadget
- Handwriting – Dakota
- Humana Serif ITC TT
- Machine ITC TT
- Palatino
- PortagoITC TT
- Santa Fe LET Fonts
- Savoye LET Fonts
- Snell Roundhand
- Stone Sans ITC TT
- Textile
- Wanted LET Fonts
Eight different versions of Lucida come with your standard installation of Java, but can only be found by going to /System/Library/Frameworks/JavaVM.framework/Home/lib/fonts/ (again, with the “Show Package Contents” trick).
- LucidaBrightDemiBold
- LucidaBrightDemiItalic
- LucidaBrightItalic
- LucidaBrightRegular
- LucidaSansDemiBold
- LucidaSansRegular
- LucidaTypewriterBold
- LucidaTypewriterRegular
A font called Matrix Ticker is available in the ESPN widget, which is installed by default. It’s at /Library/Widgets/ESPN.wdgt/ESPNTicker.dfont.
Another two widget fonts, found inside the Unit Converter widget, are at /Library/Widgets/Unit Converter.wdgt/DB LCD Temp-Black.ttf and /Library/Widgets/Unit Converter.wdgt/UC-LCD.ttf. These are pretty cool, mimicking the look of a seven-segment display.
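If you would rather not dig through these folders by hand, the search can be sketched as a short Python script. The directory list is taken from the locations mentioned in this post and may not exist on your machine or OS version; the list of file extensions and the function name find_fonts are assumptions made for this example, not anything Mac OS X defines.

```python
import os

# Directories named in this post; they may not exist on every machine or OS version.
HIDDEN_FONT_DIRS = [
    "/Library/Application Support/Apple/Fonts/iWork",
    "/Applications/iDVD.app/Contents/Resources/Fonts",
    "/System/Library/Frameworks/JavaVM.framework/Home/lib/fonts",
    "/Library/Widgets",
]

# Extensions treated as font files here (an assumption, not an exhaustive list).
FONT_EXTENSIONS = (".ttf", ".otf", ".dfont", ".ttc")


def find_fonts(directories, extensions=FONT_EXTENSIONS):
    """Return a sorted list of paths to font files found under the given directories."""
    found = []
    for directory in directories:
        # os.walk yields nothing for a missing directory, so absent paths are skipped.
        for root, _dirs, files in os.walk(directory):
            for name in files:
                if name.lower().endswith(extensions):
                    found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == "__main__":
    for path in find_fonts(HIDDEN_FONT_DIRS):
        print(path)
```

The script only lists what it finds; installing anything is still a matter of double clicking the file and letting Font Book do its thing.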

That’s all I have for now. Let me know in the comments if there are others you have discovered. The more fonts, the better! (Hey, if I’m going to have a mindless consumer attitude towards something, it might as well be something that takes up no physical space and uses no natural resources, right?)








