Twitter Gets Help from SLU Prof on How to Deal With Indigenous Tweeters


If you're one of the five remaining speakers of "Yuchi" -- a near-extinct Native American language in Oklahoma -- your tweets will look insane, even to those within your linguistic group.

That's because the "@" character is part of your alphabet, so whenever you type it in, Twitter will wrongly think you're using Twitterese to refer to a different user, such as @Joe_Smith. 
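For the programmers in the audience, here is a minimal sketch of how that misfire could happen. It assumes a simplified mention rule ("@" followed by letters, digits or underscores); the article does not describe Twitter's actual parsing rules, and the string "s@la" is an invented placeholder, not a real Yuchi word.

```python
import re

# Hypothetical, simplified mention rule: "@" followed by word characters.
# This is an illustration only, not Twitter's real pattern.
MENTION = re.compile(r"@(\w+)")

def find_mentions(text):
    """Return every username the simplified rule thinks it sees."""
    return MENTION.findall(text)

# An ordinary mention is picked up as intended.
print(find_mentions("thanks @Joe_Smith"))

# A placeholder word that uses "@" as a letter is split apart and
# its tail is mistaken for a username.
print(find_mentions("s@la"))
```

A rule this naive has no way to tell a letter from a mention marker, which is why the fix would have to come from changing how user names are written rather than from the tweeter.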

This is the kind of programming problem that Twitter is coming across more and more as it tries to make inroads where minority languages hold sway. And it's exactly the kind of problem that a computational linguist such as Professor Kevin Scannell of Saint Louis University is equipped to solve.

Since October, Scannell -- on sabbatical from SLU's Department of Math and Computer Science -- has been flying out to Twitter's headquarters in San Francisco one week per month to consult with their international team on stuff like this. Or how about this one:

When folks label their tweets with "hashtags," they type "#" then add text that flows to the right, as in #OccupyWallStreet. But what about Arabic, which flows in the opposite direction? Or what if, in the middle of a tweet in Arabic, the user wants to write "Hillary Clinton"?
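To see why that is tricky for software, consider a rough sketch that checks whether a piece of text mixes the two directions. It relies on the direction class the Unicode standard assigns to each character, exposed in Python's standard `unicodedata` module; the function name `has_mixed_direction` is made up for illustration, and nothing here reflects how Twitter's own code works.

```python
import unicodedata

# Unicode "strong" direction classes: L is left-to-right,
# R and AL (Arabic letter) are right-to-left.
LTR_CLASSES = {"L"}
RTL_CLASSES = {"R", "AL"}

def has_mixed_direction(text):
    """True if text contains both left-to-right and right-to-left letters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return bool(classes & LTR_CLASSES) and bool(classes & RTL_CLASSES)

arabic_word = "\u0645\u0635\u0631"  # three Arabic letters (the word for Egypt)

print(has_mixed_direction("#OccupyWallStreet"))                 # Latin only
print(has_mixed_direction("#" + arabic_word))                   # Arabic only
print(has_mixed_direction(arabic_word + " Hillary Clinton"))    # both
```

The "#" sign itself has no strong direction of its own, so which side of the word it is drawn on depends on the letters around it.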

Scannell was a member of the Twitter team that rewrote the code to handle such linguistic miscegenation.

These people tweet in Haitian Creole
The California web company discovered Scannell last summer when they became aware of his pet project, a website called "Indigenous Tweets." The site uses automated processes to trawl the vast ocean of Twitter for obscure tongues. It then groups those users together and tracks their usage.

About a week ago, Scannell was surprised to see his site mentioned by The Economist.

Daily RFT called him and wanted to know: Why did he start the site in the first place?

"There was a personal aspect to the work," says Scannell, who in addition to his native English also speaks Gaelic, used by only 20,000 people or so in western Ireland. "One of the things we've been encouraging in Irish is the use of social media, but on Twitter, we were having trouble finding other speakers. So this was me personally trying to find other people who spoke my language. Then, that approach to Irish we took to other languages."

Scannell's site is now tracking 129 indigenous languages on Twitter. The five most common, by number of users, are:

1) Haitian Creole (14,259 users)
2) Basque (7,063 users)
3) Welsh (4,808 users)
4) Irish Gaelic (2,712 users)
5) Frisian (2,034 users)

Of course, at the other end of the list are 28 languages with only one lonely tweeter, such as Gamilaraay (in southeastern Australia) and Wayuunaikai (in northeastern Colombia).

But Scannell says there are plenty of indigenous languages on Twitter he hasn't even tracked yet, including Yuchi, the language we mentioned first that uses the "@" in its alphabet. (Yuchi does boast at least one tweeter.)

"I mentioned [Yuchi] to the people at Twitter yesterday," says Scannell, who has just returned from a trip out west. "I jokingly said they should change the way they do user names just to accommodate the Yuchi community."

He concludes: "I don't think they're gonna do it."
