June 3, 2026

"After testing six AI models, the researchers found consistent favoritism for words coming from Latin and French over those with Germanic etymologies..."

"... even more than you would typically encounter in the English language. This bias appears rooted in the preference-learning stage, when the models are trained to align with human expectations about language. This process poses an inescapable problem: that you need real people to make sure the machine is aligned, but the human workers are ironically biased as well. As annotators click through sample texts, for example, they are probably subconsciously disposed to approve those that sound confident and incisive. This new finding could help explain why large language models like ChatGPT and Claude seem to have a distinctive writing style. Previous research has found that AI chatbots tend to overuse words like 'meticulous' and 'commendable,' creating a kind of linguistic uncanny valley that sounds similar to how you speak, but ever so slightly off. Perhaps the ghosts of Latin and French haunted these words during preference learning, leading human workers to reward more prestigious-sounding sentences. Of course, the Germanic versus Romance distinction is a simplification of a messier etymological reality. The notoriously overrepresented word 'delve' is actually Old English in origin...."

Writes Adam Aleksic, author of "Algospeak: How Social Media Is Transforming the Future of Language," in "Do these words make you sound smarter? The bias is spreading. English speakers love the Romance vocabulary. AI noticed" (WaPo).

The "Germanic versus Romance distinction" = "We’ll use more Latin terms when we want to speak formally or authoritatively; we’ll use Germanic words to sound crass or casual." That's how Aleksic puts it.

That made me think about what Jorge Luis Borges said about English: English is a "far finer language than Spanish," and one reason is that "English is both a Germanic and Latin language."

"For any idea, you have 2 words. Those words will not mean exactly the same. For example, if I say 'regal,' that is not exactly the same thing as saying 'kingly.' Or if I say 'fraternal,' that's not the same thing as saying 'brotherly.' Or 'dark' and 'obscure' — those words are different. It will make all the difference speaking, for example, of a Holy Spirit. It will make all the difference in the world in a poem if I wrote about the Holy Spirit or the Holy Ghost, since 'ghost' is a fine dark Saxon word, while 'spirit' is light —it's the Latin word...."

30 comments:

Enigma said...

Alternate explanation: The AI firms pay low wages to human raters in Latin-based countries who prefer the Latin words of their first language over Germanic words. Not subconscious, rather, they just hire cheap overseas labor.

Garbage in, garbage out.

RideSpaceMountain said...

"We’ll use more Latin terms when we want to speak formally or authoritatively; we’ll use Germanic words to sound crass or casual."

Nothing new. Been that way at least since Harold took one of William's arrows to the eye at Hastings, so definitely not an AI hallucination.

Earnest Prole said...

Shorter version: The words of elites face off against the words of ordinary people. Two enter, one leaves.

Aggie said...

That's a wonderful clip.

Mary Beth said...

In that viral video where students had trouble reading some words, I believe the ones they had trouble with were all of Romance origin. (My impression from my memory of watching it. I didn't check the etymology and can't even remember what most of the words are other than silhouette.)

Romance words may make you sound more sophisticated, but effective communication, or straight talking, needs Germanic ones too. (I almost say "requires".)

Ampersand said...

English is a language of a people many times conquered. In 50 years it will have incorporated elements of Arabic, Farsi, and Turkish.

Jaq said...

These Turing tests are getting harder and harder.

Jaq said...

I am constantly amazed at the steady drumbeat of Muslim hatred that appears in almost every thread here. I guess that they must be inferior and undeserving because they are sitting on so much oil that we want. I bet if we stopped bombing them and allowed them to prosper in their own countries, they might not show up here so angry.

Mary Beth said...

"English is a language of a people many times conquered. In 50 years it will have incorporated elements of Arabic, Farsi, and Turkish."

It already has a lot of Arabic-origin words. Alcohol, coffee, and candy come from Arabic. Pajamas and khaki come from Farsi (Persian, really). I can't think of any Turkish right now.

It's less a matter of being conquered or conquering; English is a language that sees a word it likes and says, "that's mine now". It's like an obsessive collector.

Where's Professor Drout? He needs to comment on this thread.

Eva Marie said...

Jaq said “I am constantly amazed at the steady drumbeat of Muslim hatred that appears in almost every thread here. I guess that they must be inferior and undeserving because they are sitting on so much oil that we want. I bet if we stopped bombing them and allowed them to prosper in their own countries, they might not show up here so angry.”
Where is the anti-Muslim hatred in this thread?
To each his own, however - I can’t stand virtue signaling.

mikee said...

Language differences? Give me roast beef or steak, leg of mutton or lamb chops, pork belly or ham. It is all good. Where to go for lunch today?

And Jaq, if you are amazed at the Muslim hatred here, you should take a look at the hatred by Muslims - even those living far, far away from any oil - the hatred by Muslims for all those not submitting to their theocratic governance. Makes the Muslim "hatred" here look downright friendly. Must be something in the water where they live, or else they're subject to an oppressive authoritarian religious and political belief. On second thought, probably not the water.

Smilin' Jack said...

“Previous research has found that AI chatbots tend to overuse words like 'meticulous' and 'commendable,' creating a kind of linguistic uncanny valley that sounds similar to how you speak, but ever so slightly off.”

Or maybe it’s just better. AI will teach us to talk good.

Narr said...

From birth, Muzzies are taught hatred and contempt for all things non-Islamic. It always surprises me how few educated Westerners really grasp this.

As for the English tongue, Bryson put it well (paraphrased from memory)--French prides itself on 'purity,' but English will make it with anyone, anytime, in dark alleys.



MadTownGuy said...

". It will make all the difference speaking, for example, of a Holy Spirit. It will make all the difference in the world in a poem if I wrote about the Holy Spirit or the Holy Ghost, since 'ghost' is a fine dark Saxon word, while 'spirit' is light —it's the Latin word...."

It's not just the etymology. It's the history of usage. "Ghost" in literature and ordinary speech (excluding, for now, ecclesiology) carries connotations that make it different from "spirit."

Aggie said...

"...I am constantly amazed at the steady drumbeat of Muslim hatred that appears in almost every thread here. ..."

Says the man that has never dwelled in a Muslim country. Am I right or wrong? If I'm wrong - are you a Muslim? I'd estimate that 98% of the negative things I've read here, have been in reaction, not because of unsupported prejudice.

I've found that Muslims are a lot like Democrats when it comes to presenting outward solidarity, no matter what the accusation is. There are no outwardly moderate Muslims. That problem - genuine moderation - is handled from within, until conformity is reestablished. Like Democrats do.

Smilin' Jack said...

“"For any idea, you have 2 words. Those words will not mean exactly the same. For example, if I say 'regal,' that is not exactly the same thing as saying 'kingly.'”

It’s the same with French. They use the same word, “ciel”, for both “heaven” and “sky”.

I don’t understand why all those foreigners don’t just speak English. If it was good enough for Jesus Christ it should be good enough for anyone.

Smilin' Jack said...

". It will make all the difference speaking, for example, of a Holy Spirit. It will make all the difference in the world in a poem if I wrote about the Holy Spirit or the Holy Ghost, since 'ghost' is a fine dark Saxon word, while 'spirit' is light —it's the Latin word...."

I remember when Catholics prayed to the Father, Son, and Holy Ghost. Now it’s Holy Spirit. Gives it a lighter touch—sort of like school spirit.

Jupiter said...

"Or 'dark' and 'obscure' — those words are different."

Interesting. To a native English-speaker, those words are so different they mean completely different things. But in Spanish, the dark form of Bohemia beer is called "Bohemia Oscura".

narciso said...

Have you read MEMRI? Right from the horse's mouth.

Jupiter said...

"The "Germanic versus Romance distinction" = "We’ll use more Latin terms when we want to speak formally or authoritatively; we’ll use Germanic words to sound crass or casual."

We'll use latinate terms when we want to sound like the guys who won in 1066.

Christopher B said...

My understanding, and what I've heard in general terms from several linguists, is that English has a lot of near-synonyms left over from the Norman French conquest of England. Thus we have a high-class Latin-based word like 'beef' for what the nobles ate and the Old English (nee Germanic) peasant word 'cow' for what the lower class cared for. I think we still react in the same way to the Latin/Germanic sound of words and it's no surprise that AI trained on our language picks up on the same distinction. AI's 'uncanny valley' of consistently favoring Latin-derived words would seem to me to be related to the 'second mention' rule of occasionally switching between the two forms in order to maintain reader interest.

Lazarus said...

Borges's lectures on English literature are a good read. He approached the subject as someone who'd been reading on his own for about 60 years and wasn't bound by academic fads and fashions.

Moloch! Solitude! Filth! Ugliness! Ashcans and unobtainable dollars!

Happy 100th birthday, Allen Ginsberg.

Lazarus said...

Borges called attention to Shakespeare's mixing Germanic and Latinate words:

"Will all great Neptune's ocean wash this blood
Clean from my hand? No, this my hand will rather
The multitudinous seas incarnadine,
Making the green one red."

Bob said...

AI lies and exaggerates because it was "taught" by reading humans - who lie and exaggerate.

X. P. Callahan said...

This whole discussion about AI is quietly devastating.

mccullough said...

schadenfreude and doppelgänger and sitzpinkler trump any Romance language favorite words.

RideSpaceMountain said...

With this whole Henry Nowak situation in the UK going on I commented elsewhere yesterday that 'at least Brits should be thankful they're not speaking German'. It's clearly devolved far beyond what harsh Saxon curses can resolve, and they'd be arrested for anyway.

Sad really, but hey...AT LEAST THEY'RE NOT SPEAKING GERMAN!

Eva Marie said...

I like AI speak and I get a kick out of using AI words when I speak and write: delve, underscore, amplify, great question, and I hope this helps. Other people cosplay, I try to costalk. O yeah also tapestry and landscape.

Amexpat said...

Educated speakers favor words with a Romance language origin over a Germanic origin and AI favors educated speech.

I had a long conversation with Grok about this in response to its using "they" as a singular pronoun when the majority of English users don't do so. It "admitted" that it had a bias towards academic/educated speech over language the general population uses from the sources it gave.

loudogblog said...

A friend of mine, who had a strong Spanish accent, once asked me how he could sound more "American" when he talked because people had trouble understanding him. I told him that he was trying to make English sound too pretty. English is a very harsh sounding language compared to Spanish. It's a guttural language.

I actually took Latin in high school, but the problem with Latin is that it is not a spoken language anymore. Scholars have estimated what it might have sounded like based on estimating the evolution of romance languages going back in time, but they still don't know what it really sounded like during the height of the Roman Empire.
