Is there a standard/preferred list order for non-alphanumeric characters?

submitted by 58008

Alphanumerical lists are sortable by alphabet and number, obviously, but if you have a list where each entry begins with a different punctuation mark (or any other kind of non-alphanumeric character), is there a similar standardised ordering method for them?

I imagine, for example, that a comma will come before whatever this is: ¦

I just tested an A-Z sort in Google Sheets where each cell was a different punctuation mark, and it seemed to rearrange what I'd entered into *some sort* of order, but is this order shared universally? Is there a global Unicode-compliant ordering method everyone uses?

Cheers!


7 Comments

CountVon , edited

There is a Unicode Technical Standard for this, called the Unicode Collation Algorithm. Whether everyone uses it, I can't say. As it says on the linked page:

Conformance to the Unicode Standard does not imply conformance to any UTS.

So in other words it's possible to conform to the Unicode Standard without adhering to the Unicode Collation Algorithm.

whatever this is: ¦

That is the pipe symbol, or vertical bar. When it has a gap in the middle it may be known as the broken pipe symbol or broken bar. It's considered the same symbol with or without the gap. Early terminals displayed it with a gap to make it distinguishable from lower-case L characters.

elmicha

The vertical bar (pipe) and broken bar are not the same symbol. Wikipedia has a whole section about it ("Solid vertical bar versus broken bar"). Only the pipe character can be used for pipes in Linux/Windows/Mac terminals.
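For anyone who wants to check this themselves, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is made up for illustration; the code point and name come from the Unicode Character Database bundled with Python.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    # `describe` is a hypothetical helper, not part of any library.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The solid bar used for shell pipes, and the broken bar from the question.
for ch in ("|", "¦"):
    print(describe(ch))
```

The two characters report different code points and different names, which is consistent with them being separate symbols rather than two renderings of one.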

RegalPotoo

This is the technically correct answer, and like lots of things is waaaaay more complicated than you'd expect.

Shadow

ASCII numbers?

fubo

If your input is limited to ASCII, sure.

But ASCII is only a 7-bit standard, and only supports those characters needed by American English computer users in the 1960s. Lots of characters you might see in "plain text" are not part of ASCII, including all accented characters, all non-Latin alphabets, and many common symbols and punctuation marks such as these: £€¢©™°

(Yes, you could get accented characters in the pre-Unicode days using 8-bit "extended ASCII", e.g. IBM/Windows code pages. However, those are not really ASCII and they will break if the text is interpreted as the wrong code page.)

Unicode collation is the Right Thing today.
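A rough illustration of why raw character numbers are not enough, assuming Python 3.7+ (for `str.isascii`). Python's built-in `sorted` compares strings by code point, which is not the Unicode Collation Algorithm; the sample list `marks` is invented for this sketch.

```python
# Hypothetical sample: a mix of ASCII and non-ASCII characters.
marks = ["£", "a", "!", "é", "Z", "~"]

# Python compares strings code point by code point, so this is a
# plain "sort by character number", not a linguistic collation.
by_code_point = sorted(marks)

# Only these entries are representable in 7-bit ASCII.
ascii_only = [m for m in marks if m.isascii()]

print(by_code_point)  # ['!', 'Z', 'a', '~', '£', 'é']
print(ascii_only)     # ['a', '!', 'Z', '~']
```

Note that "Z" lands before "a" and "é" lands after everything else, which is probably not what a human reader would expect. That gap is what a proper collation is meant to close.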

Pronell

That's the best standard I can think of.

🇰 🔵 🇱 🇦 🇳 🇦 🇰 ℹ️ , edited

You mean like symbols and pipes? Punctuation and such?

Usually those go before numbers and letters (punctuation, then numbers, then letters is generally how lists containing all three are sorted).

Within punctuation, they are probably ordered by their position in whatever character set the software uses, which comes down to how it was coded. I'm pretty sure the numbering of Unicode characters is universal; you can see each character's number in the Character Map tool in Windows. That's assuming you're using software to do the sorting, anyway. For a handwritten list I don't think there is a standard at all. Would a period come before or after a comma? 🤔
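If "the order they are stored in" means the Unicode code point number, the question can be checked with a short sketch like the one below. This only shows code point order; it makes no claim about what Google Sheets or the Unicode Collation Algorithm would do, and the list `marks` is just an example.

```python
# Hypothetical sample of punctuation marks, deliberately out of order.
marks = [".", "¦", ",", "|", "!"]

# ord() gives each character's Unicode code point as an integer.
ordered = sorted(marks, key=ord)

for m in ordered:
    print(f"{ord(m):>4}  {m}")
#   33  !
#   44  ,
#   46  .
#  124  |
#  166  ¦
```

By this measure a comma (44) comes before a period (46), and both come before the broken bar (166).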