Hacker News | laserduck's comments

I wonder why they grouped languages from the Middle East and South Asia together. Arabic and Hebrew are Semitic languages; no language from that family is native to the subcontinent. It would make sense if northern languages like Hindi, Urdu, Bengali, Nepali, etc. were grouped with Persian, French, Russian, etc., since those are all from the Indo-European family. South Indian languages like Telugu and Tamil are from a completely different family (Dravidian).

Why not either train the model exclusively on Semitic languages for further performance for those languages or on a wider set of languages for better multilingual performance overall? I don't understand the logic here.


There are far more speakers of Arabic in Germany (1.4M) [1] than in Afghanistan (420K) [2].

So properly speaking, they should be advertising the target region as Europe, Middle East and Africa. [3]

[1] https://en.wikipedia.org/wiki/Languages_of_Germany
[2] https://en.wikipedia.org/wiki/Languages_of_Afghanistan
[3] https://en.wikipedia.org/wiki/List_of_countries_and_territor...


There are a lot of Indian laborers in the Middle East, so it's not that Tamil and Arabic are related, but a model used for that region should be fluent in both.


Not sure if you've been to the Middle East, but there's no way the labourers will have access to the internet besides their phones. And those phones can only be used to communicate with their loved ones back home, using WhatsApp.

They don't (or rather CAN'T) care about anything else in the world.

They have a lot more problems than "this model doesn't convert urdu to arabic well".


WhatsApp is a gateway to many things. Meta's AI is on WhatsApp. OpenAI is on WhatsApp https://www.hindustantimes.com/technology/whatsapp-users-can... and I would really expect this model to have a WhatsApp gateway as well.

> They have a lot more problems than "this model doesn't convert urdu to arabic well".

I get what you mean, but I'm not sure what point you're trying to make. That they're a lost cause with too many problems and we shouldn't care about that use case? Why wouldn't we want to create models to provide more capabilities / information regardless?


They still need to communicate with the local administration via a "citizen app" (whatever it is called) to access any service, pay their fines, etc. (and be tracked). I'm guessing the stakeholder in this project is the government of Qatar.


I'm surprised Malayalam from Kerala state was not first if cultural exchange was the metric. ME natives often ask if somebody is from Kerala or (the rest of) India. Malabar has traded with the Middle East for millennia (now cash crops, trades and skilled laborers, medical tourism), Malayalam has borrowed many words from Arabic, and there is an Arabic script for Malayalam.


There are going to be diminishing returns in splitting the languages, where you get less information related to the region / concept just because you're avoiding mixing languages. The language was not the only aspect: "cultural background, and in-depth regional knowledge". There's going to be lots of information shared between the southern and northern languages simply because they are geographically close (relatively, anyway).

I mean you wouldn't want to split a model into 3 separate ones, where one contains Austrian, another Slovakian, and another Hungarian, since there's going to be lots of cultural overlap.


I agree that it makes sense to group the Indic languages together due to cultural proximity, but why would you group the Indic languages with Middle Eastern ones? Might as well group them with European or African or Sinitic languages at that point.


> I wonder why they grouped languages from the Middle East and South Asia together

Geography


Thank you so much! Do you know if they would be willing to work with people who are still early in the prototyping process? Will they only help with PCB design, or can they also help with casing, etc.?

Really appreciate your response!


I love this channel! The videos are well made narratively while still preserving the facts and citing sources. Almost the ideal combo of being academic and entertaining.

