MindFlailing: 2018

The war of tabs vs spaces has been raging for many moons.

My question: What other options are there? Is there a better way?

Tabs

Tabs are interesting. The arguments in their favor say that tabs let folks view the code at whatever indent width they like. Tab fans also say that code is not ASCII art and that things shouldn't be lined up. Tabs save space too, as they are just a single character for each indentation level.

Spaces

Spaces folks argue that you don't need to view the code with different indent levels. There is a perfect indent level (usually argued to be 2 or 4) and we should stick to that. With spaces, code looks one way and you can get used to it. You can also line stuff up to make it easier to read without that alignment being thrown off by the dynamic-ness of tabs.

My Knife in the Gunfight

I am a bit of a ridiculous person with some of my ideas. I know that, but that's the only way I know how to make friends... Ahem... anyway, tabs have a bit of interesting history that not many seem to know. The tab character is really the "Horizontal Tabulation" character. It was created for printing and typewriting back in the day to help people make tables. On the printer or typewriter you would place a tab stop bar that marked where the next table column started. In this way you could type in column 1 and then hit tab to advance to the next lovely table column. In fact, there is a "Vertical Tab" character as well. This would allow you to move to the next row of your table. This character still exists and is basically unused. All that said, the tab key was not originally meant to be used for indentation.

Early word processors treated a page as if there was a tab stop every 8 or so characters. Hitting tab would advance to the next spot. This led to it being used as an easy indent tool. This small limit of 8 characters, however, broke the original table-building purpose of tab. If you want to build a table, you sometimes have to put two tabs after a word and sometimes have to put one, like so:

<HT> = horizontal tab

short<HT><HT>next col

reallylong<HT>next col

This gets annoying, as you have to manage the table yourself, which defeats the purpose of the <HT> character.
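To see the fixed-tab-stop problem in action, here is a small Python sketch. It leans on the built-in str.expandtabs, which pads each tab out to the next multiple of the tab size. The helper name column_of is made up for this example.

```python
# Fixed tab stops every 8 columns, as described above.
rows = ["short\t\tnext col", "reallylong\tnext col"]

for row in rows:
    # Both rows only line up because the tabs were counted out by hand.
    print(row.expandtabs(8))


def column_of(row: str, text: str, tabsize: int = 8) -> int:
    """Return the display column where `text` starts once tabs are expanded."""
    return row.expandtabs(tabsize).index(text)
```

With one tab after "short" instead of two, the second column would start at column 8 instead of 16, which is exactly the manual bookkeeping being complained about.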

So, here we get to the heart of my proposal: What if we gave tab its original purpose of making tables? Essentially, you could have it so that when a <HT> character exists on a line, it attempts to line itself up with the <HT>s on the previous and next line. In other words, the code would look like this on the disk:

foo(<HT>paramA,<HT>paramB,

<HT>reallylongparamC,<HT>paramD )

And it would be displayed like this:

foo( paramA,           paramB,
     reallylongparamC, paramD )

Yay! Horizontal tab has its purpose again.
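Here is a rough Python sketch of how an editor might render those table-style tabs. It is a simplification of the idea above: instead of only looking at the previous and next line, it treats a whole run of lines as one table and pads every column to its widest cell. The function name align_tab_columns and the one-space gap between columns are my own assumptions.

```python
from typing import List


def align_tab_columns(lines: List[str], gap: int = 1) -> List[str]:
    """Render <HT>-separated cells so each column lines up across lines."""
    if not lines:
        return []
    rows = [line.split("\t") for line in lines]
    widths = [0] * max(len(cells) for cells in rows)
    for cells in rows:
        # The last cell on a line is never padded, so it sets no width.
        for i, cell in enumerate(cells[:-1]):
            widths[i] = max(widths[i], len(cell))
    rendered = []
    for cells in rows:
        padded = [cell.ljust(widths[i] + gap) for i, cell in enumerate(cells[:-1])]
        rendered.append("".join(padded) + cells[-1])
    return rendered


on_disk = ["foo(\tparamA,\tparamB,", "\treallylongparamC,\tparamD )"]
for line in align_tab_columns(on_disk):
    print(line)
```

Lines without any tab pass through untouched, so ordinary code is unaffected.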

Ok, so now that tab has regained its original glory, let's take things further with a look at line endings.

As you probably already know, Windows uses the line ending <CR><LF> (carriage return + line feed). Linux uses just <LF> (line feed). This, again, goes back to printers and typewriters. The carriage return refers to the print head returning to the start of the line, and the line feed refers to advancing the paper by one line. Windows folks were staying true to the original purpose by saying that just a line feed would technically put you somewhere in the middle of the next line, so both are needed. Linux folks said, "To hell with that. No one will ever need that!" On the console it makes sense to use a carriage return without a line feed, as that lets you overwrite the current line, but a line feed on its own is kinda pointless. Thus, they decided that <LF> has an implied carriage return, which would let them save time, space, and some complexity... while making every programmer from then on subtly hate that the two parties never did agree.

This leads to our first possible solution. It would make a lot of folks angry and really mess with line endings and indentation. Basically we treat a <CR> as setting up the next indentation information. When a <CR> is seen, we look for any <HT> characters afterward to mark our new indentation level. This indentation is finished with a <LF> character. If just a <LF> character is seen, a new line is made at the previous <CR> indentation setting. It may be easier to see an example:

indentation is handled by the following:
<CR> = carriage return
<LF> = line feed
<HT> = horizontal tab

if (newCodeLayout == cool) {<CR><HT><LF>

a = someFunc()<LF>

b = someOtherFunc()<LF>
if (anotherIndent == true) {<CR><HT><HT><LF>
c = lastFunc()<CR><HT><LF>
}<CR><LF>

}<LF>

As you can see, the <CR> marks the start of a new indentation block. <HT>s are repeated after the <CR> to set the indent level. <LF> separates the lines in that indentation block.
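For the curious, here is a minimal Python sketch of how a reader for this layout might work. The function name decode_cr_indent and the four-space display indent are assumptions of mine, and real-world details (like what to do with a stray <CR> that has no closing <LF>) are glossed over.

```python
def decode_cr_indent(data: str, indent_unit: str = "    ") -> str:
    """Turn the <CR><HT>...<LF> layout into conventionally indented text."""
    lines = []
    buf = []      # characters of the line currently being read
    indent = 0    # indent level that applies to the current line
    i = 0

    def flush() -> None:
        text = "".join(buf)
        lines.append(indent_unit * indent + text if text else "")
        buf.clear()

    while i < len(data):
        ch = data[i]
        if ch == "\r":
            # Count the <HT>s after the <CR>: that is the new indent level.
            j = i + 1
            while j < len(data) and data[j] == "\t":
                j += 1
            new_indent = j - (i + 1)
            if j < len(data) and data[j] == "\n":
                j += 1  # the <LF> that finishes the indent marker
            flush()     # the line before the marker keeps the old indent
            indent = new_indent
            i = j
        elif ch == "\n":
            flush()     # plain <LF>: new line, same indent
            i += 1
        else:
            buf.append(ch)
            i += 1
    if buf:
        flush()
    return "\n".join(lines)


sample = (
    "if (newCodeLayout == cool) {\r\t\n"
    "a = someFunc()\n"
    "b = someOtherFunc()\n"
    "if (anotherIndent == true) {\r\t\t\n"
    "c = lastFunc()\r\t\n"
    "}\r\n"
    "}\n"
)
print(decode_cr_indent(sample))
```

The sample string is the example above with the placeholders swapped for the real control characters.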

If you hate the idea of using <CR> and <LF> in yet another standard, there are other options as well. ASCII has a bunch of "dead"-ish characters. They were used here and there by terminals and printers, but now serve no real purpose. There are some like "shift out", which let you swap color ribbons on an old two-color ribbon printer. This "shift out" has since been stolen by some languages to "shift out" to a different character set or to switch to emoji. In Russian standards, for example, you can use <SO> and <SI> for Roman and Cyrillic characters. There are a few other characters that are hardly used at all. They are the "group separator", "record separator", and "unit separator". Funny enough, the "TEC 7901 R5" spec says to use "unit separator" to mark items to be displayed in table columns... Sigh, what a mess. Thank goodness that spec is just for news outlet stuff. I say we use the unit separator to mark "units" of code, a.k.a. indent blocks.

The new file layout comes together like this:

indentation is handled by the following:
<US> = unit separator
<LF> = line feed
<HT> = horizontal tab

if (newCodeLayout == cool) {<LF><HT><US>

a = someFunc()<LF>

b = someOtherFunc()<LF><US>

}

So, basically, a unit separator would mark the start of an indent block. The number of tab chars before that unit separator says how far to indent that unit of code. Line feeds would simply start a new line in the indent block without changing the indent level.
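A reader for the unit separator version can be sketched in a few lines of Python. The name decode_us_indent and the four-space display indent are my own assumptions. The sketch only treats a run of tabs as an indent marker when it sits at the start of a line and is followed directly by <US> (character code 0x1F), so tabs elsewhere are left alone.

```python
import re

# Zero or more <HT>s at the start of a line, followed by one <US> (0x1F).
_BLOCK_MARKER = re.compile(r"^(\t*)\x1f")


def decode_us_indent(data: str, indent_unit: str = "    ") -> str:
    """Turn the <HT>...<US> layout into conventionally indented text."""
    indent = 0
    out = []
    for segment in data.split("\n"):
        marker = _BLOCK_MARKER.match(segment)
        if marker:
            # A new unit of code starts here: the tab count is its indent.
            indent = len(marker.group(1))
            segment = segment[marker.end():]
        out.append(indent_unit * indent + segment if segment else "")
    return "\n".join(out)


sample = (
    "if (newCodeLayout == cool) {\n\t\x1f"
    "a = someFunc()\n"
    "b = someOtherFunc()\n\x1f"
    "}"
)
print(decode_us_indent(sample))
```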

All of these characters would be handled auto-magically by the IDE.

File sizes would, on average, be smaller than both the tabs and spaces methods, as indent doesn't have to be specified on every line. All we would have to do... is... uh... re-write all document programs to support it! That's it!
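If you want to sanity check the file size claim, here is a back-of-the-envelope Python sketch. It counts only the bytes spent on indentation, assumes one byte per character, and assumes each indent change costs one marker character plus one tab per level (which fits both the <CR> and the <US> layouts). The function name indent_overhead and the (level, text) input shape are made up for this example.

```python
from typing import Dict, List, Tuple

Line = Tuple[int, str]  # (indent level, text without leading whitespace)


def indent_overhead(lines: List[Line], spaces_per_level: int = 4) -> Dict[str, int]:
    """Count the bytes spent purely on indentation under each scheme."""
    spaces = sum(level * spaces_per_level for level, _ in lines)
    tabs = sum(level for level, _ in lines)
    markers = 0
    current = 0
    for level, _ in lines:
        if level != current:
            markers += level + 1  # `level` tabs plus one marker character
            current = level
    return {"spaces": spaces, "tabs": tabs, "block_markers": markers}


nested = [
    (0, "if (newCodeLayout == cool) {"),
    (1, "a = someFunc()"),
    (1, "b = someOtherFunc()"),
    (1, "if (anotherIndent == true) {"),
    (2, "c = lastFunc()"),
    (1, "}"),
    (0, "}"),
]
long_block = [(0, "def f():")] + [(1, "x += 1")] * 10

print(indent_overhead(nested))
print(indent_overhead(long_block))
```

For the tiny nested example the markers cost 8 bytes against 6 for tabs and 24 for four-space indents, so the saving over tabs only shows up once blocks run longer: the ten-line block costs 2 bytes against 10 for tabs and 40 for spaces.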

Ok, you can't get too peeved at me, seeing as how this section is titled what it is. I did give you a heads up. I don't expect any of this to go anywhere or ever get implemented, but I hope you at least found my strange ideas interesting. If it was otherwise a complete waste of your time, then feel free to berate me in the comments! You gotta pay the troll toll, though, if you want to get in.

MindFlailing

Friday, November 16, 2018

Tabs vs Spaces... vs Other Options?
