
> The author wants the "worse" sort, one based on ASCII/Unicode codepoints, without any intelligence for numbers that 99% of GUI users want.

I want the author's opinion on how capital and lowercase letters should be sorted. Do they follow strict ASCII/Unicode codepoints, or do they normalize into actual alphabetical order and sort upper/lower within each letter?
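
To make the two options concrete, here is a minimal sketch using `sort`. The word list and the function names are made up for illustration; `LC_ALL=C` forces bytewise comparison, and `-f` folds lowercase to uppercase before comparing.

```shell
# Sample names; the list is invented for this example.
names() { printf '%s\n' banana Apple cherry apple Banana; }

# Strict codepoint order: every capital sorts before every lowercase letter.
codepoint_sort() { LC_ALL=C sort; }

# Case-folded order: upper and lower case of the same letter sit together.
# Lines that compare equal after folding are ordered by a final bytewise
# comparison (GNU sort behaviour), so "Apple" precedes "apple".
folded_sort() { LC_ALL=C sort -f; }

# names | codepoint_sort   ->  Apple Banana apple banana cherry (one per line)
# names | folded_sort      ->  Apple apple Banana banana cherry (one per line)
```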



And where do you sort the letter ä? (After a is correct in German, but I think Swedish does it differently.)


This feels like the right moment to mention "ch", which is considered a single letter in Czech orthography and is sorted between "h" and "i". The problem is that you can't reliably distinguish "ch"-the-letter from a plain "c" followed by "h", which occurs in loan words and also in some native Czech compound words.

So if you're doing it "properly", sorting strings in Czech involves understanding the etymology of every word.
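
A naive version that ignores etymology can be sketched in a few lines. It assumes lowercase, ASCII-only words and treats every "ch" as the single letter, so it gets exactly the compound and loan-word cases wrong; the function name is made up, and words with diacritics would need a real Czech collation locale instead.

```shell
# Sort lowercase ASCII words with "ch" treated as one letter between h and i.
czech_ch_sort() {
  # Decorate each line with a sort key in which "ch" becomes "h{".
  # "{" (0x7B) follows "z" in ASCII, so the key lands after every
  # "h..." word and before every "i..." word.
  awk '{ key = $0; gsub(/ch/, "h{", key); print key "\t" $0 }' |
    LC_ALL=C sort |   # bytewise sort on the decorated line
    cut -f2-          # drop the key, keep the original word
}

# printf '%s\n' ihla chata hrad cihla dum | czech_ch_sort
#   ->  cihla dum hrad chata ihla (one per line)
```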


What a headache! I'm glad that the relevant standard ČSN 97 6030 does not demand analysis of compounds or knowledge of etymology.


That's why we have all this LC_* stuff in Linux, which you can configure to your needs:

  export LC_MEASUREMENT="de_DE"
  export LC_MONETARY="de_DE"
  export LC_PAPER="de_DE"
  export LC_CTYPE="de_DE.UTF-8"
  export LC_MESSAGES="en_US.UTF-8"
  export LC_RESPONSE="en_US.UTF-8"
  export LC_TIME="en_US.UTF-8"

Mix in your Swedish or Swahili, maybe even the Vatican State:

   e.g. de_DE, sw_TZ, it_VA (not guaranteed ;-).


> export LC_TIME=en_US.UTF-8

Why would you do this to yourself?


Why? For example, to avoid diacritics in month names. Treat these as examples: you can easily add them to a shell script to make it work the way you want.


But you get

* 12h time

* Sunday start of week

* Silly pyramid mm/dd/yyyy


How does this work if you're a multi-lingual person and you have files with names in different languages?


I'm multi-lingual, but I try to keep business stuff (multi-lingual), for example, separate from private stuff (mostly one language), so clashes between languages rarely happen.

But if it gets complicated, I'll usually resort to Perl scripts to take care of the pesky details. Sorting an associative array where the key is a string in unified form and the value is the multi-lingual target is rather easy in a scripting language you are fluent in.
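
The same idea can be sketched in shell rather than Perl. This is only an illustration under assumptions: the input is "key<TAB>display name" lines, the "unified form" is taken to be a plain ASCII key chosen by hand, and the function name is made up.

```shell
# Sort "key<TAB>display name" lines by the ASCII key only, then drop the key.
# The tab-separated layout and the keys are assumptions for this example.
sort_by_key() {
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 | cut -f2-
}

# printf 'zurich\tZürich\nangstrom\tÅngström\nmunchen\tMünchen\n' | sort_by_key
#   ->  Ångström München Zürich (one per line)
```

Because only the key is compared, the display names can be in any language or script without affecting the order.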


The sorting order is only defined between strings of the same locale, not between strings of different locales.

You can specify the sorting order per command like

  LC_COLLATE="tr_TR.utf8" ls

if it differs from your system or user locale.

An alternative is to first transliterate the strings to ASCII and then sort them (but this does not preserve the sorting order of non-Latin scripts).


You could alias cd to a shell script that sets the env based on the location.
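
A rough sketch of that idea follows. One caveat: a separate script runs in a child process and cannot change the directory or environment of the calling shell, so this uses a shell function instead. The directory layout, the locale choices, and the names `locale_for_dir` and `cdl` are all hypothetical, and the locales must actually be installed (`locale -a` lists them).

```shell
# Hypothetical mapping from directory to collation locale; adjust to taste.
locale_for_dir() {
  case "$1" in
    "$HOME"/docs/de | "$HOME"/docs/de/*) echo "de_DE.UTF-8" ;;
    "$HOME"/docs/sv | "$HOME"/docs/sv/*) echo "sv_SE.UTF-8" ;;
    *) echo "C" ;;
  esac
}

# cd wrapper, named cdl here to leave the builtin alone.
cdl() {
  cd "$@" || return
  # Pick the collation locale for the directory we just entered.
  LC_COLLATE=$(locale_for_dir "$PWD")
  export LC_COLLATE
}
```

Note that if `LC_ALL` is set, it overrides `LC_COLLATE`, so this only has an effect when `LC_ALL` is unset.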


> I want the author's opinion on how capital and lowercase letters should be sorted. Do they follow strict ASCII/Unicode codepoints, or do they normalize into actual alphabetical order and sort upper/lower within each letter?

I prefer the strict ASCII / Unicode sorting (all capitals first, then all lowercase).


Asciibetical sorting



