File encoding – extra white spaces added


It’s been a very long time since I have written. Work has kept me very busy over the last 2 weeks or so, but things are finally letting up as I am reaching the home stretch.

I ran into an issue that nearly stopped my heart last week. I had been converting my HTML files into PHP templates and everything was going well. I started this process at home, using my PC. The next day, I worked on the site at work, on a Mac, and when I brought my files home and copied them over, that’s when things started breaking on me.

When I opened the files in IE8, the site was 100% broken. You can imagine how frightening that can be for a front-end developer. I re-ran my code through the validator to see if I had missed some closing tags, but that was not the issue – everything on the validating front was ok. I scanned through my markup to see if there was possibly anything that would not be IE compatible. I started putting borders around my divs and checking to ensure that my CSS rules complied (hasLayout), though my understanding has been that hasLayout is not really an issue with IE8.

I then noticed something else that was a bit strange in the other browsers. Although the layout of the pages was almost perfect, there was a small gap before the header bar. I checked the source code for the page and didn’t spot anything out of the ordinary, but when I inspected the elements using Firebug, I noticed some blank lines.

I opened the files in my editor (Notepad++) and did not see anything out of the ordinary. What the heck was happening here? I did some research online and when I opened the file in Vim (another text editor), I saw it – some gibberish at the start of the first line.

Looking into the issue further, it turns out that it is related to file encoding, which I have never had an issue with before. Back in Notepad++, I found that the files I brought back from work were encoded as UTF-8. If I saved the file as ANSI, it would eliminate the strange characters (ï»¿); however, any special characters in the file would be replaced by weird symbols. And then I noticed an option to encode as UTF-8 without BOM. I’ve since learned that ï»¿ is how the UTF-8 Byte Order Mark (BOM), the three bytes EF BB BF, shows up when it is read as a single-byte encoding. If I encoded my file as UTF-8 without BOM, it also eliminated the characters at the start of the line and kept any other special characters on the page as is.
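For anyone who would rather fix this from a script than from an editor menu, here is a minimal sketch in Python of what "UTF-8 without BOM" amounts to: dropping the three leading bytes if they are there. The function names `strip_utf8_bom` and `rewrite_without_bom` are my own for illustration; this is not what Notepad++ does internally, just the same idea.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def rewrite_without_bom(path: Path) -> bool:
    """Rewrite a file in place without its BOM. Returns True if it changed."""
    original = path.read_bytes()
    cleaned = strip_utf8_bom(original)
    if cleaned != original:
        path.write_bytes(cleaned)
        return True
    return False
```

Working on raw bytes means the rest of the file is left untouched, so any special characters stay exactly as they were saved.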

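Upon further reading, the BOM is an invisible character (U+FEFF) that some editors place at the beginning of a file to tell programs what the encoding is. In UTF-16 and UTF-32 it also indicates the endianness (byte order) of the text; UTF-8 has no byte-order ambiguity, so there it only acts as a signature. PHP sends anything outside the <?php ... ?> tags straight to the browser, which is how a BOM at the top of a template ends up in the page output. I found the following sites to provide some useful information: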
http://htmlpurifier.org/docs/enduser-utf8.html
http://documentation.basis.com/BASISHelp/WebHelp/inst/character_encoding.htm
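
To see why the BOM looks like gibberish in one program and is invisible in another, here is a small Python sketch. It only uses the standard `codecs` module; the sample markup is made up for the demonstration.

```python
import codecs

bom = codecs.BOM_UTF8                      # the three bytes EF BB BF
print(bom.hex())                           # efbbbf

# Read as Windows-1252 ("ANSI" on many Windows setups), the same three
# bytes become three visible characters.
as_cp1252 = bom.decode("cp1252")
print(as_cp1252)                           # ï»¿

# A made-up file body saved as UTF-8 with a BOM in front.
sample = bom + "<div>café</div>".encode("utf-8")

plain = sample.decode("utf-8")             # keeps the BOM as U+FEFF
signed = sample.decode("utf-8-sig")        # drops a leading BOM if present

print(repr(plain[0]))                      # '\ufeff'
print(signed)                              # <div>café</div>
```

The `utf-8-sig` codec is handy when reading files that may or may not carry a BOM, since it decodes both cases to the same text.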

Once I changed the encoding, the pages rendered correctly in all browsers. What a huge relief that was.

I am now using Espresso, and I have set my preferences to save in UNIX format. I’m not sure if that’s related or not, but I have not run into the same problem since. Now when I am moving files from a Mac to PC, I have gotten into the paranoid habit of quickly checking a couple of the files for encoding.
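
Checking a couple of files by hand works, but the check is easy to automate. The following is a rough sketch, assuming the site lives in one folder and that the extensions listed are the ones that matter; `files_with_bom` and its default `suffixes` are names I made up for this example.

```python
import codecs
from pathlib import Path


def files_with_bom(root, suffixes=(".php", ".html", ".css", ".js")):
    """Return the files under root that start with a UTF-8 BOM."""
    found = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            with path.open("rb") as fh:
                # Only the first three bytes are needed for the check.
                if fh.read(3) == codecs.BOM_UTF8:
                    found.append(path)
    return found


if __name__ == "__main__":
    for hit in files_with_bom("."):
        print(hit)
```

Run from the site folder after copying files between machines, it lists any template that picked up a BOM along the way.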
