jtvjan · 2026-01-04T13:45:23 1767534323

Maktone is so good! I remember hearing one of their songs in a GBA intro[1] and it still gets stuck in my head sometimes...

[1]: https://youtu.be/CGaqlSIUSEo

jtvjan · 2025-12-29T14:11:49 1767017509

I mean, to an extent... like it would still give passengers an earlier opportunity to correct course.

If they got off at the next stop after Troisdorf they could take the local bus back to Troisdorf (ten minute wait worst-case).

At later stations they could get on the train in the opposite direction (30 min wait worst-case).

jtvjan · 2025-11-24T08:59:25 1763974765

i'm a little bit sad the kernel diagram background is gone

jtvjan · 2025-11-05T15:05:05 1762355105

That's upsetting. Being able to do templating without using JavaScript was a really cool party trick.

I've used it in an unfinished website where all data was stored in a single XML file and all markup was stored in a single XSLT file. A CGI one-liner then made path info available to XSLT, and routing (multiple pages) was achieved by doing string tests inside of the XSLT template.

jtvjan · 2025-05-02T12:22:03 1746188523

I think that convention depends on the language, not the currency.

For example, in German it's usually written postfix, but in Dutch it's usually prefix.

jtvjan · 2025-04-15T13:55:21 1744725321

this might be conspiratorial thinking, but i don't think it's an accident that the site came out like this. yes, there's moderation, but the moderators are explicitly told to go easy on moderating racism[1]. it feels like once that kind of stuff isn't punished, it starts to snowball a change in the attitudes of the site as a whole.

that's not to say stringent moderation doesn't make a site less welcoming, though. it's about choosing what's the lesser evil to you, i guess.

[1]: https://www.vice.com/en/article/the-man-who-helped-turn-4cha...

NoMoreNicksLeft · 2025-04-16T14:04:23 1744812263

>but the moderators are explicitly told to go easy on moderating racism

What would be gained if they didn't "go easy on racism"? Would we all start singing kumbaya and love each other, hippy-style? Or would people be just as racist in even more remote corners of the internet/world, and then slightly-left-of-center-minded individuals could pretend that all the world's problems were solved and it could continue for another 100 years?

ToucanLoucan · 2025-04-19T14:02:48 1745071368

Letting people with abhorrent beliefs assemble with one another and commiserate on the awful things they believe... I mean I don't think I'd go so far as to say it's responsible for our current historical moment, but it certainly isn't helping it. The primary disadvantage of believing terrible, anti-social things is you would be ejected from social groups, be they communal or familial. That's not to say that racism didn't exist before the internet of course, it absolutely did; but racism and sexism were both on a societal scale improving over time, because those beliefs would cost you: they would cost you spouses, they would cost you children, they would cost you friends, in extreme cases they would cost you jobs and potentially even open you up to legal trouble.

And it still does, but it's less effective, because various flavors of cretin now have online spaces where they can meet like-minded people and nurture those beliefs, and worse still, all of those spaces reward extremism as any social media site does: subtle, balanced views are not incentivized at all, and you get the most social attention for saying the most outrageous thing in the space. We all know this, like maybe you've never thought about it before, but I'd wager almost everyone on this board has had this experience over one thing or another, even benign nothing issues.

And all of that is before we even get to the subject of things like influencers peddling YouTube videos, TikToks, or whatever to amplify those beliefs for their own profit. Whether they "really believe" these things is irrelevant frankly; in either case, people who believe these things see people being paid to represent their (wrong) ideas which lends them legitimacy.

And now we just have little bespoke engines of radicalization humming away all over the internet in the little shadowy corners, whipping people up into a lather about whatever dumbass thing they googled way back about how they can't get a girlfriend or whatever, and there seem to be a lot of spree shootings now for some reason, totally disconnected I'm sure.

Like the problem with this Libertarian "as long as you're not hurting anyone" is that it leaves a wide open loophole in there about hurting yourself, and while in many cases hurting yourself doesn't lead to anyone being harmed apart from yourself, as I keep saying: No one is an island, if you harm yourself in certain ways, you are absolutely a risk to other people.

sickofparadox · 2025-04-19T14:09:59 1745071799

The totalizing idea that your beliefs and values get to be the ones guiding the moderation of every single conversation happening anywhere on the internet (and therefore, the world) is probably more authoritarian than 80% of the ideas informing people who post on /pol/.

johnnyjeans · 2025-04-15T14:50:27 1744728627

> it feels like once that kind of stuff isn't punished, it starts to snowball a change in the attitudes of the site as a whole.

Considering the site has been around for over 20 years and people still call out and flame racism, I think this is an uncharitable and unfounded cynicism. I'm not sure declarative claims of 3rd order effects in a system so chaotic are capable of being accurate.

KennyBlanken · 2025-04-15T23:53:27 1744761207

Multiple white supremacist mass shooters have been 4chan users.

4chan cheered on the Buffalo shooter who was live updating a 4chan thread during his murder spree: https://www.thetrace.org/newsletter/4chan-moderation-buffalo...

The christchurch shooter was a 4chan regular https://theconversation.com/christchurch-terrorist-discussed...

The whole "boogaloo" white nationalist/supremacist movement started on 4chan:

https://www.splcenter.org/resources/reports/mcinnes-molyneux...

Stop whitewashing 4chan's history.

HideousKojima · 2025-04-16T02:38:49 1744771129

And the Zizian murder cult sprang out of the bay area rationalist community and trans rights advocacy, what's your point?

ToucanLoucan · 2025-04-16T03:04:45 1744772685

You say this like the rationalist community and 4chan edgelords aren't two circles with an incredible amount of overlap.

filoleg · 2025-04-16T21:30:44 1744839044

> You say this like the rationalist community and 4chan edgelords aren't two circles with an incredible amount of overlap.

They are not.

Rationalists are the crowd that would attract typical Bay Area tech yuppies. Which is something that 4chan seems to despise with a passion and makes merciless fun of.

Just go on /g/ (the technology board) and see any mentions of bay area, rationalists, or tech companies/startups. If you believe there is a significant overlap, then they surely are hiding it really well there by mercilessly mocking everything related to any of those topics.

potato3732842 · 2025-04-15T21:38:32 1744753112

I think people, whether they know it or not, rightly realize that race is too simplistic of a way to mark people as good/bad or whatever so even in communities that would be fine with racism it's gonna catch a lot of shit for simply not being a good way to accomplish its goal.

jtvjan · on Dec 16, 2024

yes, definitely. if you don't check the schedule multiple times per day, you're bound to end up in an empty classroom or missing class. at least once a week, in my high school experience, a class gets moved in time and/or place, or gets cancelled entirely.

it's like train timetables, you know. yes, they're meant to be the same every day, but you'd be a fool not to check the updates before you go. that's just how it is in a large chaotic system.

jtvjan · on Nov 24, 2024

A coworker once implemented a name validation regex that would reject his own name. It still mystifies me how much convincing it took to get him to make it less strict.

throw310822 · on Nov 24, 2024

I know multiple developers who would just say "well it's their fault, they have to change name then".

MrJohz · on Nov 24, 2024

I worked with an office of Germans who insisted that ASCII was sufficient. The German language uses letters that cannot be represented in ASCII.

In fairness, they mostly wanted stuff to be in English, and when necessary, to transliterate German characters into their English counterparts (in German there is a standardised way of doing this), so I can understand why they didn't see it was necessary. I just never understood why I, as the non-German, was forever the one trying to convince them that Germans would probably prefer to use their software in German...

bee_rider · on Nov 24, 2024

I’ve run into a similar-ish situation working with East-Asian students and East-Asian faculty. Me, an American who wants to be clear and make policies easy for everybody to understand: worried about name ordering a bit (Do we want to ask for their last name or their family name in this field, what’s the stupid learning management system want, etc etc). Chinese co-worker: we can just ask them for their last names, everybody knows what Americans mean when they ask for that, and all the students are used to dealing with this.

Hah, fair enough. I think it was an abstract question to me, so I was looking for the technically correct answer. Practical question for him, so he gave the practical answer.

sandreas · on Nov 24, 2024

You should have asked how they would encode the german currency sign (€ for euro) in ASCII or its german counterpart latin1/iso-8859-1...

It's not possible. However I bet they would argue for using iso-8859-15 (latin9 / latin0) with the international currency sign (¤) instead or insist that char 128 of latin1 is almost always meant as €, so just ignore the standard in these cases and use a new font.

This would only fail in older printers and who is still printing stuff these days? Nobody right?

Using real utf-8 is just too complex... All these emojis are nuts
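For reference, here is a small Python sketch of where the euro sign does and does not fit, using only codecs that ship with Python. The helper name `try_encode` is made up for illustration.

```python
def try_encode(text: str, codec: str):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


# Print how each codec handles the euro sign (None means "not representable").
for codec in ("ascii", "latin-1", "iso-8859-15", "cp1252", "utf-8"):
    print(f"{codec:12} {try_encode('€', codec)!r}")

# The byte 0xA4 is one of the places where the two ISO variants differ.
print(b"\xa4".decode("latin-1"))      # generic currency sign
print(b"\xa4".decode("iso-8859-15"))  # euro sign
```

ASCII and ISO-8859-1 cannot encode the euro sign at all, ISO-8859-15 puts it at 0xA4 in place of the generic currency sign, and byte 0x80 (char 128) is where Windows-1252 puts it.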

richardwhiuk · on Nov 24, 2024

EUR is the common answer.

asddubs · on Nov 24, 2024

or just double all the numbers and use DM

Y_Y · on Nov 24, 2024

Weirdly the old Deutsche Mark doesn't seem to have its own code point in the block starting at U+20A0, whereas the Spanish equivalent (Peseta, ₧, not just Pt) does.

account42 · on Nov 26, 2024

It's not a Unicode issue, there just isn't a dedicated symbol for it, everyone just used the letters DM. Unicode (at least back then) was mostly a superset of existing character sets and then distinct glyphs.

Y_Y · on Nov 26, 2024

That would be a fine answer, but for the fact that other currencies like the rupee (₨) that are "just letters" do have their own codepoint. Being made up of two symbols doesn't necessarily make something not a symbol, in semiotics or in Unicode.

In fact this is one of the root problems, there are plenty of Unicode symbols you can make out of others, either juxtaposing or overstriking or using a combining character, but this isn't consistently done.

tugu77 · on Nov 25, 2024

TIL

https://www.compart.com/en/unicode/block/U+20A0

Even Bitcoin is there. And "German Penny Sign"?

throw0101a · on Nov 25, 2024

> international currency sign (¤)

TIL:

* https://en.wikipedia.org/wiki/Currency_sign_(generic)

account42 · on Nov 26, 2024

UTF-8 is simple, it's Unicode that is complex.

sandreas · on Nov 28, 2024

Besides the fact that UTF-8 is not that simple, it was still irony :-)

hooby · on Nov 25, 2024

There are some valid reasons to use software in English as a German speaker. Main among those is probably translations.

If you can speak English, you might be better off using the software in English, as having to deal with the English language can often be less of a hassle than having to deal with inconsistent, weird, or outright wrong translations.

Even high quality translations might run into issues, where the same thing is translated once as "A" and then as "B" in another context. Or run into issues where there is an English technical term being used, that has no perfect equivalent in German (i.e. a translation does exist, but is not a well-known, clearly defined technical term). More often than not though, translations are anything but high quality. Even in expensive products from big international companies.

MrJohz · on Nov 26, 2024

This is definitely a problem that can occur, but for the one I was thinking of originally when writing the comment, we had pretty much all the resources available: the company sold internationally, so already had plenty of access to high-quality translators, and the application we were building was in-house, so we could go and ask the teams themselves if the translations made sense. More importantly, the need was also clearly there - many of the users of the application were seasonal workers, often older and less well-educated, in countries where neither English nor German were particularly relevant languages. Giving buttons labels in our users' languages meant they could figure out what they needed to do much more quickly, rather than having to memorise button colours and positions.

You're right that sometimes translation for technical terms is difficult, but the case I experienced far more often was Germans creating their own English words, or guessing at phrases they thought ought to exist because their English was not as good as they believed.

I agree that high quality translations are hard, and particularly difficult to retrofit into an existing application. But unless you have a very specialised audience, they're usually worth it!

Muromec · on Nov 25, 2024

UX translations are broken most of the time for most of the software and not just in German. People just pretend it's working and okay, when it's not.

And then developers just do N > 1 ? "things" : "thing" without thinking twice, don't use pgettext and all the other things.
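A minimal Python sketch of the difference, using the standard `gettext` module. `NullTranslations` stands in for a real message catalog here (an assumption made to keep the example self-contained), and the function names are hypothetical.

```python
import gettext

# NullTranslations returns the source strings unchanged and applies the
# English-style rule: singular only when n == 1. A real catalog would apply
# the plural rule of the target language instead.
catalog = gettext.NullTranslations()


def naive_label(n: int) -> str:
    # The shortcut: gives "0 thing" in English, and cannot express
    # languages that have more than two plural forms.
    return f"{n} " + ("things" if n > 1 else "thing")


def catalog_label(n: int) -> str:
    # ngettext lets the catalog pick the plural form for n.
    return catalog.ngettext("{n} thing", "{n} things", n).format(n=n)


def menu_label() -> str:
    # pgettext attaches a context string so translators can tell apart
    # identical source strings that need different translations.
    return catalog.pgettext("menu", "Open")


for n in (0, 1, 2):
    print(naive_label(n), "|", catalog_label(n))
```

With a real catalog loaded via `gettext.translation(...)`, the same `ngettext` call would follow whatever plural rule the catalog declares, which the ternary cannot do.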

account42 · on Nov 26, 2024

Compiler errors or low level error messages in general are a good example. Translating them reduces the ability of someone who doesn't share your language to help you.

throw0101a · on Nov 25, 2024

> I just never understood why I, as the non-German, was forever the one trying to convince them that Germans would probably prefer to use their software in German...

I've heard that German is often one of the first localizations of (desktop) software because there were often super-long words in the translations of various concepts, so if you wanted to test typeface rendering and menu breakage it was a good language to run through your QA for that.

int_19h · on Nov 25, 2024

Or you use pseudo-localization, which does simple programmatic substitution to make all English strings longer by e.g. doubling letters or inserting random non-alphabetic characters, adding diacritics etc while still retaining readability to English speakers.

Windows actually ships with a locale like that.
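A rough sketch of what such a substitution can look like in Python. This is not how Windows implements its pseudo-locales; the mapping table, the padding character and the 40% expansion factor are all arbitrary choices for illustration, and format placeholders are not handled.

```python
# Swap plain vowels for accented look-alikes so untranslated strings stand out
# while remaining readable to English speakers.
ACCENTED = str.maketrans({
    "a": "á", "e": "é", "i": "í", "o": "ö", "u": "ü",
    "A": "Á", "E": "É", "I": "Í", "O": "Ö", "U": "Ü",
})


def pseudo_localize(text: str, expansion: float = 0.4) -> str:
    """Accent the vowels, pad the string, and bracket it to expose clipping."""
    accented = text.translate(ACCENTED)
    padding = "~" * max(1, round(len(text) * expansion))
    return f"[{accented} {padding}]"


print(pseudo_localize("Save file"))
```

The brackets make truncated labels easy to spot, and the padding simulates languages whose translations run longer than the English source.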

ordu · on Nov 25, 2024

> I just never understood why I, as the non-German, was forever the one trying to convince them that Germans would probably prefer to use their software in German...

I cannot know, but they could be ideological. For example, they had found it wonderful to use plain ASCII, no need for special keyboard layouts or something like that, and they decided that German would be much better without its non-ASCII characters. They could believe something like this, and they wouldn't say it aloud in the discussion with you because it is irrelevant for the discussion: you weren't trying to change German.

account42 · on Nov 26, 2024

Perhaps you shouldn't be speaking for Germans then? Personally, I'd rather not have localization forced on me. Looking at you, Google.

MrJohz · on Nov 26, 2024

I don't think localisation should be forced on anyone, but we had enough people using our software who couldn't speak English that getting it right would have made a lot of people's lives easier. At one place I worked, they even added Cantonese text to a help page to let Cantonese users know how to get support - but all the text on the buttons and links to get to that point was in English!

As developers, we need to build software for our users, and not for ourselves. That means proper localisation, and it means giving users the option of choosing their own language and settings.

guappa · on Nov 25, 2024

I know someone who changed name just to remove the dots and have an "easier time when travelling"

guappa · on Nov 25, 2024

Our own software that we sell was crashing if you had a locale set to anything other than American English.

The coworker who made that happen said I'm a weirdo for setting my machine in my own language. According to him I should have set it to english.

This of course happened in a non english speaking country.

croes · on Nov 24, 2024

Is name validation even possible?

perching_aix · on Nov 24, 2024

In certain cultures yes. Where I live, you can only select from a central, though frequently updated, list of names when naming your child. So theoretically only (given) names that are on that list can occur.

Family names are not part of this, but maybe that exists too elsewhere. I don't know how people whose names were given to them before this list was established are handled, however.

An alternative method, which is again culture dependent, is to use virtual governmental IDs for this purpose. Whether this is viable in practice I don't know, never implemented such a thing. But just on the surface, should be.

Muromec · on Nov 24, 2024

>So theoretically only (given) names that are on that list can occur.

Unless of course immigration is allowed and doesn't involve changing a name.

taneliv · on Nov 25, 2024

Not the OP, but immigration often involves changing your name in the way digital systems store and display it. For example, from محمد to Muhammad or from 陳 to Chen. The pronunciation ideally should stay the same, but obviously there's often slight differences. But if the differences are annoying or confusing, someone might choose an entirely different name as well.

_ugfj · on Nov 25, 2024

Yes but GP said

> Where I live, you can only select from a central, though frequently updated, list of names when naming your child

I was born in such a country too and still have frequent connections there and I can confirm the laws only apply to citizens of said country so indeed immigration creates exceptions to this rule even if they transliterate their name.

bjackman · on Nov 24, 2024

I still don't see how any system in the real world can safely assume its users only have names from that list.

Even if you try to imagine a system for a hospital to register newly born babies... What happens if a pregnant tourist is visiting?

Y_Y · on Nov 24, 2024

For example in Iceland you don't have to name the baby immediately, and the registration times are different for foreign parents. https://www.skra.is/english/people/registration-of-children/...

Of course then you may fall foul of classic falsehood 40: People have names.

rrr_oh_man · on Nov 25, 2024

For today's lucky 10,000: Falsehoods programmers believe about names (https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...)

Skeime · on Nov 25, 2024

For today's lucky 10,000: Ten Thousand (https://xkcd.com/1053/)

perching_aix · on Nov 24, 2024

With plenty of attitude of course :)

I've only ever interacted with freeform textfields when inputting my name, so most regular systems clearly don't dare to attempt this.

But if somebody was dead set on only serving local customers or having only local personnel, I can definitely imagine someone being brave(?) enough.

onionisafruit · on Nov 25, 2024

The name a system knows you as doesn’t need to correspond to your legal name or what you are called by others.

tomtomtom777 · on Nov 24, 2024

This assumes every resident is born and registered in said country which is a silly assumption. Surely, any service catered only to "naturally born citizen" is discriminatory and illegal?

lmm · on Nov 25, 2024

> Surely, any service catered only to "naturally born citizen" is discriminatory and illegal?

No, that's also a question that is culturally dependent. In some contexts it's normal and expected.

marcus_holmes · on Nov 25, 2024

I read that Iceland asks people to change their names if they naturalise there (because of the -sson or -dottir surname suffix).

But your point stands - not everyone in the system will follow this pattern.

perching_aix · on Nov 25, 2024

Obviously, foreigners just living or visiting here will not have our strictly local names (thinking otherwise is what would be "silly"). Locals (people with my nationality, so either natural or naturalized citizens) will (*).

(*) I read up on it though, and it seems like exceptions can be requested and allowed, if it's "well supported". Kinda sours the whole thing unfortunately.

> is discriminatory and illegal?

Checked this too (well, using Copilot), it does appear to be illegal in most contexts, although not all.

But then, why would you want to perform name verification specific to my culture? One example I can think of is limiting abuse on social media sites for example. I vaguely recall Facebook being required to do such a thing like a decade ago (although they definitely did not go about it this way clearly).

armada651 · on Nov 24, 2024

Yes, it is essential when you want to avoid doing business with customers who have invalid names.

ryandrake · on Nov 24, 2024

You joke, but when a customer wants to give your company their money, it is our duty as developers to make sure their names are valid. That is so business critical!

Muromec · on Nov 24, 2024

It's not just business necessary, it's also mandatory to do right under GDPR

xtiansimon · on Nov 24, 2024

In legitimate retail, take the money, has always been the motto.

That said, recently I learned about monetary policy in North Korea and sanctions on the import of luxury goods.

Why Nations Fail (2012) by Daron Acemoglu and James Robinson

https://en.wikipedia.org/wiki/United_Nations_Security_Counci...

Diti · on Nov 24, 2024

What are “invalid names” in this context? Because, depending on the country the person was born in, a name can be literally anything, so I’m not sure what an invalid name looks like (unless you allow an `eval` of sorts).

Muromec · on Nov 24, 2024

The non-joke answer for Europe is extended Latin, dashes, spaces and apostrophe sign, separated into two (or three) distinct ordered fields. Just because it's written in a different script originally, doesn't mean it will be printed only with that on your id in the country of residence or travel document issued at home. My name isn't written in Latin characters and it's fine. I know you can't even try to pronounce them, so I have it spelled out in above mentioned Latin script.

throw_a_grenade · on Nov 25, 2024

Non-joke answer for Europe is at least Latin, Greek or Cyrillic (български is already one of the official EU languages!). No reason to treat them differently, just don't allow for mixing them so you won't get homoglyphs. EURid (.eu-NIC) gets it mostly right I believe.
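A hedged sketch of the "no mixing" rule in Python. It infers the script from the official Unicode character name, which is a simplification (real registries use proper script properties and confusable tables), and `scripts_used` / `is_single_script` are made-up names.

```python
import unicodedata

SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")


def scripts_used(name: str) -> set:
    """Return which scripts the letters of `name` belong to."""
    found = set()
    for ch in name:
        if not ch.isalpha():
            continue  # spaces, hyphens and apostrophes carry no script
        char_name = unicodedata.name(ch, "")
        found.add(next((s for s in SCRIPTS if char_name.startswith(s)), "OTHER"))
    return found


def is_single_script(name: str) -> bool:
    # Accept a name only if all of its letters come from one script.
    return len(scripts_used(name)) <= 1


print(scripts_used("Anne-Marie O'Neil"))
print(scripts_used("Иван"))
print(is_single_script("Iv\u0430n"))  # Latin letters with a Cyrillic "а" in the middle
```

The last example looks like "Ivan" on screen but mixes two scripts, which is exactly the homoglyph case the rule is meant to catch.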

account42 · on Nov 26, 2024

The non-theoretical answer for Europe is just Latin because the names need to eventually be read by people who don't know Greek or Cyrillic.

dgoldstein0 · on Nov 24, 2024

Obligatory xkcd https://xkcd.com/327/

jandrese · on Nov 24, 2024

What if your customer is the artist formerly known as Prince or even X Æ A-12 Musk?

rsynnott · on Nov 25, 2024

Prince is still mostly screwed, even without spurious validation; Unicode doesn't allow personal symbols. Some discussion here: https://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UM...

chungy · on Nov 24, 2024

Prince: "Get over yourself and just use your given name." (Shockingly, his given name actually is Prince; I first thought it was only a stage name)

Musk: Tell Elon to get over his narcissism enough to not use his children as his own vanity projects. This isn't just an Elon problem, many people treat children as vanity projects to fuel their own narcissism. That's not what children are for. Give him a proper name. (and then proceed to enter "X Æ A-12" into your database, it's just text...)

jandrese · on Nov 25, 2024

Sure it is just text, but the context is someone who wrote a isValidHumanName() function.

ValentinA23 · on Nov 24, 2024

Don't validate names, use transliteration to make them safe for postal services (or whatever). In SQL this is COLLATE, in the command line you can use uconv:

>echo "'Lódź'" | uconv -f "UTF-8" -t "UTF-8" -x "Latin-ASCII"

>'Lodz'
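A rough standard-library approximation of the same idea in Python. It is not ICU's Latin-ASCII transform: it only strips combining marks after Unicode decomposition, and the small mapping table for letters without a decomposition is illustrative and far from complete. The name `strip_diacritics` is made up.

```python
import unicodedata

# Letters such as Ł or Ø have no Unicode decomposition, so normalization
# alone leaves them untouched; they need an explicit mapping.
NO_DECOMPOSITION = str.maketrans({
    "Ł": "L", "ł": "l", "Đ": "D", "đ": "d", "Ø": "O", "ø": "o",
})


def strip_diacritics(text: str) -> str:
    """Best effort: map a few special letters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text.translate(NO_DECOMPOSITION))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for word in ("Lódź", "Łódź", "Čelje", "Štefan"):
    print(word, "->", strip_diacritics(word))
```

Like any transliteration, this is lossy: distinct names can collapse into the same output, so the original spelling should be stored alongside the converted one.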

poincaredisk · on Nov 24, 2024

If I ever make my own customer facing product with registration, I'm rejecting names with 'v', 'x' and 'q'. After all, these characters don't exist in my language, and foreign people can always transliterate them to 'w', 'ks' or 'ku' if they have names with weird characters.

notanote · on Nov 24, 2024

The name of the city has the L with stroke (pronounced as a W), so it’s Łódź.

poincaredisk · on Nov 24, 2024

And the transliteration in this case is so far from the original that it's barely recognisable for me (three out of four characters are different and as a native I perceive Ł as a fully separate character, not as a funny variation of L)

Muromec · on Nov 24, 2024

The fact that it's pronounced as Вуч and not Лодж still triggers me.

pavel_lishin · on Nov 24, 2024

I just looked up the Russian wikipedia entry for it, and it's spelled "Лодзь", but it sounds like it's pronounced "Вуджь", and this fact irritates the hell out of me.

Why would it be transliterated with an Л? And an О? And a з? None of this makes sense.

cyberax · on Nov 25, 2024

> Why would it be transliterated with an Л?

Because it _used_ to be pronounced this way in Polish! "Ł" pronounced as "L" sounds "theatrical" these days, but it was more common in the past.

Muromec · on Nov 24, 2024

It's a general pattern of what russia does to names of places and people, which is aggressively imposing their own cultural paradigm (which follows the more general pattern). You can look up your civil code provisions around names and ask a question or two of what historical problem they attempt to solve.

aguaviva · on Nov 25, 2024

It's not a Russian-specific thing by any stretch.

This happens all the time when names and loanwords get dragged across linguistic boundaries. Sometimes it results from an attempt to "simplify" the respective spelling and/or sounds (by mapping them into tokens more familiar in the local environment); sometimes there's a more complex process behind it; and other times it just happens for various obscure historical reasons.

And the mangling/degradation definitely happens in both directions: hence Москва → Moscow, Paris → Париж.

In this particular case, it may have been an attempt to transliterate from the original Polish name (Łódź), more "canonically" into Russian. Based on the idea that the Polish Ł (which sounds much closer to an English "w" than to a Russian "в") is logically closer to the Russian "Л" (as this actually makes sense in terms of how the two sounds are formed). And accordingly for the other weird-seeming mappings. Then again it could have just ended up that way for obscure etymological reasons.

Either way, how one can be "irritated as hell" over any of this (other than in some jocular or metaphorical sense) is another matter altogether, which I admit is a bit past me.

aguaviva · on Nov 25, 2024

Correction - it's nothing obscure at all, but apparently a matter of the shift that occurred broadly with the L sound in Polish a few centuries ago (whereby it became "dark" and velarized), affecting a great many other words and names (like słowo, mały, etc). While in parts east and south the "clear" L sound was preserved.

https://en.wikipedia.org/wiki/Ł

int_19h · on Nov 25, 2024

Velarized L is a common phoneme in Slavic languages, inherited from their common ancestor. What makes Polish somewhat unusual is that the pronunciation of velarized L eventually shifted to /w/ pretty much everywhere (a similar process happened in Ukrainian and Belarusian, but only in some contexts).

int_19h · on Nov 25, 2024

Adapting foreign names to phonotactics and/or spelling practices of one's native language is a common practice throughout the world. The city's name is spelled Lodz in Spanish, for example.

cyberax · on Nov 25, 2024

Wait until you hear what Chinese or Japanese languages do with loanwords...

notanote · on Nov 24, 2024

L with stroke is the english name for it according to wikipedia by the way, not my choice of naming. The transliterated version is not great, considering how far removed from the proper pronunciation it is, but I’m sort of used to it. The almost correct one above was jarring enough that I wanted to point it out.

ajsnigrutin · on Nov 24, 2024

Yeah, that'll work great..

https://en.wikipedia.org/wiki/%C4%8Celje

echo "Čelje" | uconv -f "UTF-8" -t "UTF-8" -x "Latin-ASCII"

> "Celje"

https://en.wikipedia.org/wiki/Celje

(i mean... we do have postal numbers just for problems like this, but both Štefan and Stefan are not-so-uncommon male names over here, so are Jozef and Jožef, etc.)

jeroenhd · on Nov 24, 2024

If you're dealing with a bad API that only takes ASCII, "Celje" is usually better than "ÄŒelje" or "蒌elje".

If you have control over the encoding on the input side and on the output side, you should just use UTF-8 or something comparable. If you don't, you have to try to get something useful on the output side.
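To make the failure modes concrete, here is a short Python sketch of how the first garbled form arises, along with the question-mark form that an ASCII-only sink produces. Only standard codecs are used; the variable names are arbitrary.

```python
# UTF-8 encodes "Č" (U+010C) as the two bytes C4 8C.
raw = "Čelje".encode("utf-8")

# Decoding those bytes with the wrong single-byte codec yields mojibake:
# 0xC4 becomes "Ä" and 0x8C becomes "Œ" in Windows-1252.
mojibake = raw.decode("cp1252")

# Forcing the text into ASCII with replacement drops the letter entirely.
replaced = "Čelje".encode("ascii", errors="replace").decode("ascii")

print(raw)       # b'\xc4\x8celje'
print(mojibake)  # ÄŒelje
print(replaced)  # ?elje
```

Both outputs signal that something was lost, whereas a silent transliteration to "Celje" does not.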

ajsnigrutin · on Nov 25, 2024

This depends.

Everyone over here would know that "ÄŒelje" (?elje) is either čelje, šelje or želje. Maybe even đelje or ćelje if it's a name or something else. So, special attention would be taken to 'decypher' what was meant here.

But if you see "Celje", you assume it's actually Celje (a much larger city than Čelje) and not one of those variants above. And no one will bother with figuring out if part of a letter is missing, it'll just get sent to Celje.

Muromec · on Nov 24, 2024

Most places where telling Štefan from Stefan is a problem use postal numbers for people too, or/and ask for your DOB.

ajsnigrutin · on Nov 24, 2024

I don't have a problem differentiating Štefan from Stefan, 's' and 'š' sound pretty different to everyone around here. But if someone runs that script above and transliterates "š" to "s" it can cause confusion.

And no, we don't use "postal numbers for humans".

Muromec · on Nov 24, 2024

>And no, we don't use "postal numbers for humans".

An email, a phone number, a tax or social security number, demographic identifier, billing/contract number or combination of them.

All of those will help you tell Stefan from Štefan in the most practical situations.

>But if someone runs that script above and transliterates "š" to "s" it can cause confusion.

It's not nice, it will certainly make Štefan unhappy, but it's not like you will debit the money from the wrong account or deliver to a different address or contact the wrong customer because of that.

account42 · on Nov 26, 2024

So? Names are not unique to begin with.

poizan42 · on Nov 24, 2024

Yes, it's easy

    bool ValidateName(string name) => true;

(With the caveat that a name might not be representable in Unicode, in which case I dunno. Use an image format?)

arsome · on Nov 24, 2024

name.Length > 0

is probably pretty safe.

pridkett · on Nov 24, 2024

That only works if you’re concatenating the first and last name fields. Some people have no last name and thus would fail this validation if the system had fields for first and last name.

Macha · on Nov 24, 2024

Honestly I wish we could just abolish first and last name fields and replace them with a single free text name field since there's so many edge cases where first and last is an oversimplification that leads to errors. Unfortunately we have to interact with external systems that themselves insist on first and last name fields, and pushing it to the user to decide which is part of what name is wrong less often than string.split, so we're forced to become part of the problem.

caseyohara · on Nov 24, 2024

I did this in the product where I work. We operate globally so having separate first and last name fields was making less sense. So I merged them into a singular full name field.

The first and only people to complain about that change were our product marketing team, because now they couldn’t “personalize” emails like `Hi <firstname>,`. I had the hardest time convincing them that while the concept of first and last names is common in the west, it is not a universal concept.

So as a compromise, we added a “Preferred Name” field where users can enter their first name or whatever name they prefer to be called. Still better than separate first and last name fields.

cudder · on Nov 25, 2024

I tried this too, and a customer angrily asked why they can't sort their report alphabetically by last name. Sigh.

caseyohara · on Nov 25, 2024

Just split the full name on the space char and take the last value as the last name. Oh wait, some people have multiple last names.

Split on the space and take everything after the first space as the last name. Oh wait, some people have multiple first names.

Merging names is a one-way door, you can't break them apart programmatically. Knowing this, I put a lot of thought into whether it was worth it to merge them.
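
A minimal sketch of the two split rules described above, to show where each one goes wrong. The function names are hypothetical and the sample names are only illustrations:

```python
def split_last_token(full_name: str) -> tuple[str, str]:
    """Rule 1: treat the final space-separated token as the last name."""
    given, _, family = full_name.rpartition(" ")
    return given, family


def split_first_token(full_name: str) -> tuple[str, str]:
    """Rule 2: treat everything after the first space as the last name."""
    given, _, family = full_name.partition(" ")
    return given, family


# Two family names: rule 1 moves "García" into the given name.
print(split_last_token("Gabriel García Márquez"))

# Two given names: rule 2 moves "Ann" into the family name.
print(split_first_token("Mary Ann Smith"))

# A single name: rule 1 leaves the given name empty.
print(split_last_token("Sukarno"))
```

Neither rule can tell which tokens belong together, because that information is not in the merged string.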

arkh · on Nov 25, 2024

One field?

Like people have only one name... I like the Human Name from the FHIR standard: https://hl7.org/fhir/datatypes.html#HumanName

People can have many names (depending on usage and of "when", think about marriage) and even if each of those human names can handle multiple parts the "text" field is what you should use to represent the name in UIs.

I encourage people to go check the examples the standard gives, especially the Japanese and Scandinavian ones.

JimDabell · on Nov 25, 2024

It’s not just external systems. In many (most?) places, when sorting by name, you use the family names first, then the given names. So you can’t correctly sort by name unless you split the fields. Having a single field, in this case, is “an oversimplification that leads to errors”.

roywiggins · on Nov 25, 2024

Right, but then you have to know which name is the family name, which really could be any of them.

JimDabell · on Nov 26, 2024

I’m not sure what you’re trying to get at. The field containing the family name is the one labelled “family name”. You don’t have two fields both labelled “name”; there’s no ambiguity.

cluckindan · on Nov 24, 2024

some people have no name at all

exitb · on Nov 24, 2024

Any notable examples apart from young children and Michael Scott that one time?

ndsipa_pomu · on Nov 24, 2024

I've been compiling a list of them:

dvfjsdhgfv · on Nov 24, 2024

You seem to have forgotten quite a few, like

poizan42 · on Nov 24, 2024

See point 40 and 32-36 on Falsehoods programmers believe about names[1]

[1] https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

from-nibly · on Nov 24, 2024

I know that this is trying to be helpful but the snark in this list detracts from the problem.

i80and · on Nov 24, 2024

Whether it's healthy or not, programmers tend to love snark, and that snark has kept this list circulating and hopefully educating for a long time to this very day

tomxor · on Nov 24, 2024

What if my name is

chuckadams · on Nov 24, 2024

Slim Shady?

zarzavat · on Nov 24, 2024

Presumably there aren't any people with control characters in their name, for example.

cobbzilla · on Nov 24, 2024

Watch as someone names themselves the bell character, “^G” (ASCII code 7) [1]

When they meet people, they tell them their name is unpronounceable, it’s the sound of a PC speaker from the late 20th century, but you can call them by their preferred nickname “beep”.

In paper and online forms they are probably forced to go by the name “BEL”.

[1] https://en.wikipedia.org/wiki/Bell_character

emmelaich · on Nov 24, 2024

Or Derek <wood dropping on desk>

https://www.youtube.com/watch?v=hNoS2BU6bbQ

Polizeiposaune · on Nov 24, 2024

The interaction brings to mind Grzegorz Brzęczyszczykiewicz:

https://www.youtube.com/watch?v=AfKZclMWS1U

(from the Polish comedy film "How I Unleashed World War II")

pavel_lishin · on Nov 24, 2024

I thought this was going to be a link to the Key & Peele sketch: https://youtu.be/gODZzSOelss?t=180

Izkata · on Nov 25, 2024

It's not exactly a bell, but there are clicks: https://en.wikipedia.org/wiki/Click_consonant

https://www.reddit.com/r/Damnthatsinteresting/comments/1614k...

RobotToaster · on Nov 25, 2024

I can finally change my name to something that represents my personality: ^G^C

https://en.wikipedia.org/wiki/End-of-Text_character

ValentinA23 · on Nov 24, 2024

คุณ สมชาย

This name, "คุณสมชาย" (Khun Somchai, a common Thai name), appears normal but has a Zero Width Space (U+200B) between "คุณ" (Khun, a title like Mr./Ms.) and "สมชาย" (Somchai, a given name).

In scripts like Thai, Chinese, and Arabic, where words are written without spaces, invisible characters can be inserted to signal word boundaries or provide a hint to text processing systems.
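
A small sketch of why this trips up naive handling, assuming the zero width space is written explicitly as `\u200b` (the variable names are made up for the example):

```python
import unicodedata

title = "คุณ"
given = "สมชาย"

plain = title + given               # no separator at all
hinted = title + "\u200b" + given   # zero width space as a word-boundary hint

print(plain == hinted)                   # False, although both render the same
print(len(hinted) - len(plain))          # 1 extra code point
print(unicodedata.category("\u200b"))    # Cf (format character)
```

Both strings look identical on screen, so an exact-match lookup or a length check can fail with nothing visible to explain it.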

Saigonautica · on Nov 25, 2024

This reminds me of a few Thai colleagues who ended up with a legal first name of "Mr." (period included), probably as a result of this.

Buying them plane tickets to attend meetings and so on proved fairly difficult.

pwdisswordfishz · on Nov 24, 2024

But C0 and C1 control codes are out, probably.

lmm · on Nov 25, 2024

> Presumably there aren't any people with control characters in their name, for example.

Of course there are. If you commit to supporting everything anyone wants to do, people will naturally test the boundaries.

The biggest fallacy programmers believe about names is that getting name support 100% right matters. Real engineers build something that works well enough for enough of the population and ship it, and if that's not US-ASCII only then it's usually pretty close to it.

pwdisswordfishz · on Nov 24, 2024

Or unpaired surrogates. Or unassigned code points. Or fullwidth characters. Or "mathematical bold" characters. Though the latter two should be probably solved with NFKC normalization instead.

chrismorgan · on Nov 25, 2024

> Or unpaired surrogates.

That’s just an invalid Unicode string, then. Unicode strings are sequences of Unicode scalar values, not code points.

> unassigned code points

Ah, the tyranny of Unicode version support. I was going to suggest that it could be reasonable to check all code points are assigned at data ingress time, but then you urgently need to make sure that your ingress system always supports the latest version of Unicode. As soon as some part of the system goes depending on old Unicode tables, some data processing may go wrong!

How about Private Use Area? You could surely reasonably forbid that!

> fullwidth characters

I’m not so comfortable with halfwidth/fullwidth distinctions, but couldn’t fullwidth characters be completely legitimate?

(Yes, I’m happy to call mathematical bold, fraktur, &c. illegitimate for such purposes.)

> solved with NFKC normalization

I’d be very leery of doing this on storage; compatibility normalisations are fine for equivalence testing, things like search and such, but they are lossy, and I’m not confident that the lossiness won’t affect legitimate names. I don’t have anything specific in mind, just a general apprehension.
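
A rough sketch of that split, using Python's standard `unicodedata` module: keep the name exactly as entered and derive a normalised key only for comparison. `match_key` is a hypothetical helper name, not an established API:

```python
import unicodedata


def match_key(name: str) -> str:
    """Lossy key for equivalence tests and search; not meant to be stored as the name."""
    return unicodedata.normalize("NFKC", name)


stored = "Ｓｍｉｔｈ"   # fullwidth letters, kept exactly as the user typed them
query = "Smith"

print(stored == query)                        # False: the raw strings differ
print(match_key(stored) == match_key(query))  # True: compatibility-equivalent
```

The stored value is never overwritten, so whatever NFKC discards is still available if it turns out to matter for a legitimate name.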

account42 · on Nov 26, 2024

> > Or unpaired surrogates.

> That’s just an invalid Unicode string, then. Unicode strings are sequences of Unicode scalar values, not code points.

Because surrogates were retrofitted onto UCS-2 to make it into UTF-16, they are both code units and (reserved) code points.

samatman · on Nov 24, 2024

It's safe to reject Cc, Cn, and Cs. You should probably reject Co as well, even though elves can't input their names if you do that.

Don't reject Cf. That's asking for trouble.

chrismorgan · on Nov 25, 2024

Explanation for those not accustomed, based on <https://www.unicode.org/reports/tr44/#GC_Values_Table> (with my own commentary):

Cc: Control, a C0 or C1 control code. (Definitely safe to reject.)

Cn: Unassigned, a reserved unassigned code point or a noncharacter. (Safe to reject if you keep up to date with Unicode versions; but if you don’t stay up to date, you risk blocking legitimate characters defined more recently, for better or for worse. The fixed set of 66 noncharacters are definitely safe to reject.)

Cs: Surrogate, a surrogate code point. (I’d put it stronger: you must reject these, it’s wrong not to.)

Co: Private_Use, a private-use character. (About elf names, I’m guessing samatman is referring to Tolkien’s Tengwar writing system, as assigned in the ConScript Unicode Registry to U+E000–U+E07F. There has long been a concrete proposal for inclusion in Unicode’s Supplementary Multilingual Plane <https://www.unicode.org/roadmaps/smp/>, from time to time it gets bumped along, and since fairly recently the linked spec document is actually on unicode.org, not sure if that means something.)

Cf: Format, a format control character. (See the list at <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=[...>. You could reject a large number of these, but some are required by some scripts, such as ZERO-WIDTH NON-JOINER in Indic scripts.)
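
As a sketch only, here is how the category-based filter discussed above might look with Python's standard `unicodedata` module. The `REJECTED` set and the function name are assumptions for the example, and the results depend on the Unicode tables bundled with the interpreter (`unicodedata.unidata_version`), which is exactly the version caveat for Cn:

```python
import unicodedata

# Categories suggested for rejection: control, unassigned, surrogate, private use.
# Cf (format) is deliberately left out.
REJECTED = {"Cc", "Cn", "Cs", "Co"}


def rejected_chars(name: str) -> list[str]:
    """Describe every character of `name` that falls in a rejected category."""
    return [
        f"U+{ord(ch):04X} ({unicodedata.category(ch)})"
        for ch in name
        if unicodedata.category(ch) in REJECTED
    ]


print(rejected_chars("Bob\x07"))    # the bell character is Cc
print(rejected_chars("a\u200cb"))   # ZERO WIDTH NON-JOINER is Cf, so nothing is reported
```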

kijin · on Nov 24, 2024

Challenge accepted, I'll try to put a backspace and a null byte in my firstborn's name. Hope I don't get swatted for crashing the government servers.

eyelidlessness · on Nov 24, 2024

That sounds like a reasonable assumption, but probably not strictly correct.

baruchel · on Nov 24, 2024

Mandatory reference: https://xkcd.com/327/

michaelt · on Nov 25, 2024

There are of course some people who'll point you to a blog post saying no validation is possible.

However, for every 1 user you get whose full legal name is bob@example.com you'll get 100 users who put their e-mail into the name field by accident

And for every 1 user who wants to be called e.e. cummings you'll get 100 who just didn't reach for the shift key and who actually prefer E.E. Cummings. But you'll also get 100 McCarthys and O'Connors and al-Rahmans who don't need their "wrong" capitalisation "fixed" thank you very much.

Certainly, I think you can quite reasonably say a name should be comprised of between 2 and 75 characters, with no newlines, nulls, emojis, leading or trailing spaces, invalid unicode code points, or angle brackets.
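
One possible reading of that rule set, as a hedged sketch. The function name is hypothetical, "characters" are counted as code points rather than what a user would perceive as characters, and the standard library has no emoji property, so the `So` (Symbol, other) category is used as a rough stand-in that will miss some emoji and catch some non-emoji symbols:

```python
import unicodedata


def name_problems(name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if not 2 <= len(name) <= 75:
        problems.append("length must be between 2 and 75 code points")
    if name != name.strip():
        problems.append("leading or trailing whitespace")
    if "<" in name or ">" in name:
        problems.append("angle brackets")
    for ch in name:
        category = unicodedata.category(ch)
        if category in {"Cc", "Cs", "Cn"}:
            # Covers newlines, NUL, lone surrogates and unassigned code points.
            problems.append(f"disallowed code point U+{ord(ch):04X}")
        elif category == "So":
            # Assumption: approximates "emoji" with the Symbol-other category.
            problems.append(f"symbol U+{ord(ch):04X}")
    return problems


print(name_problems("O'Connor"))   # passes, and the capitalisation is left alone
print(name_problems("Bob <b>"))    # reports the angle brackets
```

Note that it only reports problems and never rewrites the input.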

crazygringo · on Nov 24, 2024

If you just use the {Alphabetic} Unicode character class (100K code points), together with a space, hyphen, and maybe comma, that might get you close. It includes diacritics.

I'm curious if anyone can think of any other non-alphabetic characters used in legal names around the world, in other scripts?

I wondered about numbers, but the most famous example of that has been overturned:

"Originally named X Æ A-12, the child (whom they call X) had to have his name officially changed to X Æ A-Xii in order to align with California laws regarding birth certificates."

(Of course I'm not saying you should do this. It is fun to wonder though.)
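
For anyone who wants to play with the idea: Python's standard library does not expose the Alphabetic property directly, so this sketch approximates it with general categories (letters, letter numbers, and combining marks). That is an assumption and not the exact Unicode definition, and the helper names are made up:

```python
import unicodedata

EXTRA_ALLOWED = {" ", "-", ","}


def looks_alphabetic(ch: str) -> bool:
    """Approximate Alphabetic: letters (L*), letter numbers (Nl), combining marks (M*)."""
    category = unicodedata.category(ch)
    return category[0] in {"L", "M"} or category == "Nl"


def passes_filter(name: str) -> bool:
    """True if the name is non-empty and every character is alphabetic or explicitly allowed."""
    return bool(name) and all(
        looks_alphabetic(ch) or ch in EXTRA_ALLOWED for ch in name
    )


for candidate in ["Štefan", "X Æ A-Xii", "X Æ A-12", "O’Brien"]:
    print(candidate, passes_filter(candidate))
```

Under this approximation the digits in "X Æ A-12" and the apostrophe in "O’Brien" are rejected, which gives a feel for how much the extra-characters list would have to grow.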

Seb-C · on Nov 24, 2024

> I'm curious if anyone can think of any other non-alphabetic characters used in legal names around the world, in other scripts?

Latin characters are NOT allowed in official names for Japanese citizens. It must be written in Japanese characters only.

For foreigners living in Japan it's quite frequent to end up in a situation where their official name in Latin does not pass the validation rules of many forms online. The issues include forbidden characters, or the name being too long, since Japanese names (family name + first name) are typically only 4 characters long.

Also, when you get a visa to Japan, you have to bend and deform the pronunciation of your name to make it fit into the (limited) Japanese syllabary.

Funnily, they even had to register a whole new unicode range at some point, because old administrative documents sometimes contain characters that have been deprecated more than a century ago.

https://ccjktype.fonts.adobe.com/2016/11/hentaigana.html

crazygringo · on Nov 24, 2024

Very interesting about Japan!

To be clear, I wasn't thinking about within a specific country though.

More like, what is the set of all characters that are allowed in legal names across the world?

You know, to eliminate things like emoji, mathematical symbols, and so forth.

Seb-C · on Nov 24, 2024

Ah, I see.

I don't know, but I would bet that the sum of all corner cases and exceptions in the world would make it pretty hard to confidently eliminate any "obvious" characters.

From a technical standpoint, unicode emojis are probably safe to exclude, but on the other hand, some scripts like Chinese characters are fundamentally pictograms, which is semantically not so different than an emoji.

Maybe after centuries of evolution we will end up with a legit universal language based on emojis, and people named with it.

crazygringo · on Nov 24, 2024

Chinese characters are nothing like emoji. They are more akin to syllables. There is no semantic similarity to emoji at all, even if they were originally derived from pictorial representations.

And they belong to the {Alphabetic} Unicode class.

I'm mostly curious if Unicode character classes have already done all the hard work.

account42 · on Nov 26, 2024

I imagine at least Sealand has relatively lax (or at least informal) restrictions.

poizan42 · on Nov 24, 2024

You forgot apostrophe as is common in Irish names like O’Brien.

bloak · on Nov 24, 2024

Yes, though O’Brien is Ó Briain in Irish, according to Wikipedia. I think the apostrophe in Irish names was added by English speakers, perhaps by analogy with "o'clock", perhaps to avoid writing something that would look like an initial.

There are also English names of Norman origin that contain an apostrophe, though the only example I can think of immediately is the fictional d'Urberville.

lmm · on Nov 25, 2024

> I'm curious if anyone can think of any other non-alphabetic characters used in legal names around the world, in other scripts?

Some Japanese names are written with Japanese characters that do not have Unicode codepoints.

(The Unicode consortium claims that these characters are somehow "really" Chinese characters just written in a different font; holders of those names tend to disagree, but somehow the programmer community that would riot if someone suggested that people with ø in their name shouldn't care when it's written as o accepts that kind of thing when it comes to Japanese).

crazygringo · on Nov 25, 2024

Ha, well I don't think we need to worry about validating characters if they can't be typed in a text box in the first place. ;)

But very interesting thanks!

lmm · on Nov 29, 2024

> Ha, well I don't think we need to worry about validating characters if they can't be typed in a text box in the first place. ;)

They are frequently typed in text boxes, any software seriously targeting Japan supports them, you just have to use Shift-JIS (or EUC-JP). So your codebase needs to actually support text encodings rather than just blindly assuming everything is UTF-8.

nicoburns · on Nov 24, 2024

Apostrophe is common in surnames in parts of the world.

golergka · on Nov 24, 2024

דויד Smith (concatenated) will have an LTR control character in the middle

crazygringo · on Nov 24, 2024

Oh that's interesting.

Is that a thing? I've never known of anyone whose legal name used two alphabets that didn't have any overlap in letters at all -- two completely different scripts.

Would a birth certificate allow that? Wouldn't you be expected to transliterate one of them?

golergka · on Nov 25, 2024

I haven't known anyone like that either, but I can imagine how the same person would have a name in Hebrew in some Israeli IT system and a name in English somewhere else, and then a third system would unexpectedly combine them in some weird way.

shash · on Nov 24, 2024

There’s this individual’s name which involves a clock sound: Nǃxau ǂToma[1]

[1] https://en.m.wikipedia.org/wiki/N%25C7%2583xau_%C7%82Toma

crazygringo · on Nov 24, 2024

Click characters are part of {Alphabetic}!

https://en.wikipedia.org/wiki/Click_consonant

https://www.compart.com/en/unicode/category/Lo

https://stackoverflow.com/a/4843363

kens · on Nov 24, 2024

> There’s this individual’s name which involves a clock sound: Nǃxau ǂToma

I was extremely puzzled until I realized you meant a click sound, not a clock sound. Adding to my confusion, the vintage IBM 1401 computer uses ǂ as a record mark character.

GolDDranks · on Nov 24, 2024

What if one's name is not in alphabetic script? Let's say, "鈴木涼太".

crazygringo · on Nov 24, 2024

That's part of {Alphabetic} in Unicode. It validates.

Mordisquitos · on Nov 25, 2024

> I'm curious if anyone can think of any other non-alphabetic characters used in legal names around the world, in other scripts?

The Catalan name Gal·la is growing in popularity, with currently 1515 women in the census having it as a first name in Spain with an average age of 10.4 years old: https://ine.es/widgets/nombApell/nombApell.shtml

enriquto · on Nov 25, 2024

beautiful map of the Catalan Countries when you search for that name here

jlhwung · on Nov 25, 2024

https://en.wikipedia.org/wiki/Perri_6

gus_massa · on Nov 24, 2024

Comma or apostrophe, like in d'Alembert ?

(And I have 3 in my keyboard, I'm not sure everyone is using the same one.)

ahazred8ta · on Nov 24, 2024

Mrs. Keihanaikukauakahihuliheekahaunaele only had a string length problem, but there are people with a Hawaiian ʻokina in their names. U+02BB

gmuslera · on Nov 24, 2024

You may not want Bobby Tables in your system.

malfist · on Nov 24, 2024

If you're prohibiting valid letters to protect your database because you didn't parametrize your queries, you're solving the problem from the wrong end
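
A minimal sketch of what parametrizing looks like, using Python's built-in `sqlite3` with an in-memory database. The table and column names are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

name = "Robert'); DROP TABLE students;--"

# The ? placeholder passes the name as data; it is never spliced into the SQL text.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

stored = conn.execute("SELECT name FROM students").fetchone()[0]
print(stored == name)  # True: the name round-trips unchanged
```

The table survives and the name is stored as typed, so no letters had to be banned to get there.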

account42 · on Nov 26, 2024

This is all well and good until the company loses real money because some other system you are interfacing with got compromised because of your attitude and fingers start being pointed. Defense in depth is a thing.

gmuslera · on Nov 25, 2024

There might be more than just 2 ends. And some of them may not be fixable by you.

nkrisc · on Nov 24, 2024

It is if you first provide a complete specification of a “name”. Then you can validate if a name is compliant with your specification.

Muromec · on Nov 24, 2024

It's super easy actually. A name consists of three parts -- Family Name, Given Name and Patronymic, spelled using Ukrainian Cyrillic. You can have a dash in the Family name and an apostrophe is part of Cyrillic for these purposes, but no spaces in any of the three. If you are unfortunate enough to not use Cyrillic (of our variety) or Patronymics in the country of your origin (why didn't you stay there, anyway), we will fix it for you, mister Нкріск. If you belong to certain ethnic groups who by their custom insist on not using Patronymics, you can have a free pass, but life will be difficult, as not everybody got the memo really. No, you can not use a Matronymic instead of a Patronymic, but give us another 30 years of not having a nuclear war with a country name starting with "R" and ending in "full of putin slaves si iiia" and we might see to that.

Unless of course the name is not used for official purposes, in which case you can get away with First-Last combination.

It's really a non issue and the answer is jurisdiction bound. In most of Europe an extended Latin set is used in place of Cyrillic (because they don't know better), so my name is transliterated for the purposes of being in the uncivilized realms by my own government. No, I can't just use Л and Я as part of my name anywhere here.

GrantMoyer · on Nov 24, 2024

Valid names are those which terminate when run as Python programs.

barryrandall · on Nov 26, 2024

Anything is possible with enough qualifiers and caveats.

majkinetor · on Nov 24, 2024

Sure it is. Context matters. For example, in clone wars.

rsynnott · on Nov 24, 2024

No, but it doesn’t stop people trying.

jtvjan · on Sept 3, 2024

Perhaps some kind of rolling key system could've been used? If the key was rewritten on each successful login, either the attacker would have to use their cloned key immediately (alerting the user), or have their cloned key become useless the moment the user logs in again. This would only work with discoverable credentials, and would increase wear on the device's flash storage.
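
A toy simulation of that idea, purely to illustrate the clone-detection property. This is not how any existing security key protocol works; the `Server` class and its method are hypothetical, and a real design would need to handle failed writes, multiple devices, and much more:

```python
import secrets
from typing import Optional


class Server:
    """Holds the one key that is currently valid for an account."""

    def __init__(self) -> None:
        self.current = secrets.token_hex(16)

    def login(self, presented: str) -> Optional[str]:
        """Accept only the current key; on success, rotate it and return the new key."""
        if not secrets.compare_digest(presented, self.current):
            return None
        self.current = secrets.token_hex(16)
        return self.current


server = Server()
user_key = server.current   # provisioned onto the user's device
clone_key = user_key        # attacker copies the device at this point

user_key = server.login(user_key)   # user logs in, the key is rewritten
print(server.login(clone_key))      # None: the cloned key is now stale
```

If the attacker had logged in first, the user's next login would have failed instead, which is the alert mentioned above.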

jtvjan · on June 12, 2024

Same. I suspect they're all blocked in the EU to avoid having to comply with GDPR. It's archived here, though: https://archive.today/D0fQZ

The picture from Ohio's website too: https://web.archive.org/web/20240612173428/https://www.trans...