CBC Edition

How AI can help Indigenous language revitalization, and why data sovereignty is important

- Candace Maracle

Indigenous language experts working in computer science say artificial intelligence is a useful tool in language revitalization, but communities must prioritize the ownership of their data.

"It's just going to be like a pencil. It's useful but it's not going to save our language," said Michael Running Wolf, a former engineer for Amazon's Alexa and co-founder of Lakota AI Code Camp, a summer program for high school students where they gain experience developing mobile apps that incorporate Indigenous knowledge and methods.

Running Wolf is Lakota, Cheyenne and Blackfeet and grew up on the Northern Cheyenne reservation in Montana. Although he was raised in a low-tech home, often without running water or electricity, his mother, who engineered microchips for Hewlett-Packard, taught him math and physics by kerosene lamp.

"It was their [parents] perspective that technology was not incompatible with Indigenous ways of knowing," he said.

This, along with being surrounded by speakers of his traditional language while growing up, encouraged his current work using AI to help support Indigenous language revitalization.

There are limitations, Running Wolf said, like sparse data and the polysynthetic nature of many Indigenous languages.

An efficient AI, for example, can take 50,000 hours of English to create automatic speech recognition. Most Indigenous languages have so few speakers that there is insufficient data to train AI, he said. AI cannot recognize or understand things it has never seen before, and it requires information to replicate.

Also, languages such as Cheyenne and Blackfeet are polysynthetic and fusional, meaning prefixes and suffixes blend into words so the roots are not apparent.

He said he intends to overcome these limitations by working with communities to develop a manageable data set that will train AI.

"We generate 500 phrases in Makah and Kwak'wala, defined by the community and also the rules of the language, obviously, and we trained the AI to recognize those 500 phrases and those 500 phrases are used in curricula," he said.

"So the goal here is that, when they go to a classroom, they get their exercise in person and then they can go home and practise using the AI."

Running Wolf emphasized the importance of the community's agency in their language revitalization, particularly when it comes to AI.

"We have to have our own engineers. We need to have our own computer scientists using the software … We need to have sovereignty over our own data, set the terms and that's the only way to build this AI," he said.

He pointed to a recent dispute between the Standing Rock Sioux and a corporation that copyrighted their language materials. He said he's mindful of the need to give back to the communities he's working with, and said they're working with lawyers to create contracts that ensure any data collected remains with the community.

Running Wolf said AI requires a lot of data to upgrade, and there are companies and academics who want a community's data because there's potential revenue in selling it to companies like Google, Microsoft and Meta.

'Sweat equity'

Robbie Jimerson, who has a PhD in computing and information sciences and speaks Seneca, developed a Seneca and Oneida speech recognition system as part of his dissertation. He agrees that people need to be guardians of their own data and said he is grateful for his time spent listening to the first language speakers in his community.

"To me, there's nothing better than, you know, having a conversation in Seneca with somebody," Jimerson said.

"There was a lot of sweat equity that went into it to train these models. You need a data set, right? So who's going to create those data sets…. For me, being a speaker, I was able to do both of those things."
