Why Teaching AI New Languages Begins With Data


This visit to Samsung Research in Indonesia is part of a series about the people and innovations behind the democratization of mobile AI.



As Samsung continues to pioneer premium mobile AI experiences, we visit Samsung Research centers around the world to learn how Galaxy AI is enabling more users to maximize their potential. Galaxy AI now supports 16 languages, so more people can expand their language capabilities, even when offline, thanks to on-device translation in features such as Live Translate, Interpreter and Note Assist. But what does AI language development involve? This series examines the challenges of working with mobile AI and how we overcame them. First up, we head to Indonesia to learn where one begins teaching AI to speak a new language.



The first step is establishing targets, according to the team at Samsung R&D Institute Indonesia (SRIN). “Great AI begins with good-quality, relevant data. Each language demands a different way to process this, so we dive deep to understand the linguistic needs and the unique conditions of our country,” says Junaidillah Fadlil, head of AI at SRIN, whose team recently added Bahasa Indonesia (Indonesian language) support to Galaxy AI. “Local language development has to be led by insight and science, so every process for adding languages to Galaxy AI starts with us planning what information we need and can legally and ethically obtain.”

Galaxy AI features such as Live Translate perform three core processes: automatic speech recognition (ASR), neural machine translation (NMT) and text-to-speech (TTS). Each process needs a distinct set of information.
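To make the division of labor concrete, here is a minimal Python sketch of how the three processes chain together in a feature like Live Translate. The stage functions are hypothetical stand-ins (simple lambdas), not Samsung's models or APIs, and the toy "NMT" stage is a one-entry dictionary lookup. The point is only the data flow: audio becomes text, text becomes translated text, and translated text becomes audio again.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures. Each real stage is a trained model with its own
# training data: ASR learns from (audio, transcript) pairs, NMT from parallel
# texts, and TTS from voice recordings paired with text.
Asr = Callable[[bytes], str]   # audio -> source-language text
Nmt = Callable[[str], str]     # source text -> target-language text
Tts = Callable[[str], bytes]   # target text -> synthesized audio


@dataclass
class TranslationPipeline:
    """Chains the three processes: ASR, then NMT, then TTS."""
    asr: Asr
    nmt: Nmt
    tts: Tts

    def run(self, audio: bytes) -> tuple[str, str, bytes]:
        transcript = self.asr(audio)        # 1. automatic speech recognition
        translation = self.nmt(transcript)  # 2. neural machine translation
        speech = self.tts(translation)      # 3. text-to-speech
        return transcript, translation, speech


# Toy stubs: "audio" is just UTF-8 bytes so the example runs without any models.
toy_lexicon = {"terima kasih": "thank you"}
pipeline = TranslationPipeline(
    asr=lambda audio: audio.decode("utf-8"),
    nmt=lambda text: toy_lexicon.get(text, text),
    tts=lambda text: text.encode("utf-8"),
)
print(pipeline.run(b"terima kasih"))
# ('terima kasih', 'thank you', b'thank you')
```

Keeping each stage behind its own interface mirrors why each one needs a distinct dataset: any stage can be retrained or swapped without touching the other two.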



ASR, for instance, needs extensive recordings of speech in numerous environments, each paired with an accurate text transcription. Varying background noise levels help account for different environments. “It’s not enough just to add noises to recordings,” explains Muchlisin Adi Saputra, the team’s ASR lead. “In addition to the language data we obtained from authorized third-party partners, we must go out into coffee shops and workplaces to record our own voices. This allows us to authentically capture unique sounds from real life, like people calling out or the clattering of keyboards.”
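Saputra's point is that synthetic noise alone is not enough, yet mixing background noise into clean speech at a controlled signal-to-noise ratio (SNR) remains a common way to vary noise levels in ASR training data. The sketch below illustrates that generic technique and is not SRIN's pipeline. It assumes signals are plain Python lists of float samples at the same sample rate, and the `cafe_noise` clip is random Gaussian noise standing in for a real field recording.

```python
import math
import random


def power(samples: list[float]) -> float:
    """Mean power of a signal: the average of its squared samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech: list[float], noise: list[float], snr_db: float) -> list[float]:
    """Add noise to speech, scaled so the mixture has the requested SNR in dB.

    SNR_dB = 10 * log10(P_speech / P_scaled_noise), so the noise gain is
    sqrt(P_speech / (P_noise * 10 ** (SNR_dB / 10))).
    """
    if len(noise) < len(speech):
        raise ValueError("noise clip must be at least as long as the speech")
    noise = noise[: len(speech)]  # trim noise to the speech length
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise clip is silent")
    gain = math.sqrt(power(speech) / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech: list[float], noisy: list[float]) -> float:
    """Recover the SNR by treating (noisy - speech) as the added noise."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


random.seed(0)
rate = 16000  # assumed sample rate in Hz
speech = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]  # 1 s test tone
cafe_noise = [random.gauss(0.0, 0.3) for _ in range(rate)]  # stand-in for a real recording

# Produce progressively noisier copies of the same utterance.
for target in (20, 10, 5):
    noisy = mix_at_snr(speech, cafe_noise, target)
    print(target, round(measured_snr_db(speech, noisy), 2))
```

Each printed pair should show the measured SNR matching its target, which is a quick sanity check that the gain formula is right. Real field recordings like the ones the team describes add what this cannot: speech-like babble, clatter, and reverberation with realistic timing.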



The ever-changing nature of languages must also be considered. Saputra adds: “We need to keep up to date with the latest slang and how it is used, and mostly we find it on social media!”

Next, NMT requires translation training data. “Translating Bahasa Indonesia is challenging,” says Muhamad Faisal, the team’s NMT lead. “Its extensive use of contextual and implicit meanings relies on social and situational cues, so we need numerous translated texts that the AI can reference for new words, foreign words, proper nouns and idioms: any information that helps AI understand the context and rules of communication.”



TTS then requires recordings that cover a range of voices and tones, with additional context on how parts of words sound in different circumstances. “Good voice recordings could do half the job and cover all the required phonemes (units of sound in speech) for the AI model,” adds Harits Abdurrohman, TTS lead. “If a voice actor did a great job in the earlier phase, the focus shifts to refining the AI model to clearly pronounce specific words.”
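Checking that a recording script covers every required phoneme can be automated once each prompt has a phonemic transcription. The sketch below assumes such transcriptions already exist. `TOY_PHONEMES` is a hypothetical, deliberately incomplete subset chosen for illustration, not the Indonesian phoneme inventory a TTS team would actually use, and the `min_count` threshold is likewise an assumed parameter.

```python
from collections import Counter

# Hypothetical toy subset of phonemes, for illustration only. A real project
# would use a complete, linguist-approved inventory for Bahasa Indonesia.
TOY_PHONEMES = {"a", "i", "u", "e", "o", "m", "n", "ŋ", "s", "t", "k", "r"}


def coverage_report(
    recordings: dict[str, list[str]], required: set[str], min_count: int = 1
) -> tuple[float, list[str]]:
    """Return (fraction of required phonemes covered, sorted list of gaps).

    A phoneme counts as covered once it appears at least `min_count` times
    across all transcribed recordings.
    """
    counts = Counter(p for phonemes in recordings.values() for p in phonemes)
    missing = sorted(p for p in required if counts[p] < min_count)
    covered = len(required) - len(missing)
    return covered / len(required), missing


# Phoneme sequences per recorded prompt (hand-written, illustrative).
recordings = {
    "prompt_001": ["s", "a", "t", "u"],       # "satu" (one)
    "prompt_002": ["m", "a", "k", "a", "n"],  # "makan" (eat)
}
ratio, missing = coverage_report(recordings, TOY_PHONEMES)
print(f"{ratio:.0%} covered, missing: {missing}")
# 58% covered, missing: ['e', 'i', 'o', 'r', 'ŋ']
```

The list of gaps tells the team which extra prompts to write for the voice actor. Raising `min_count` asks for several examples of each sound, since one occurrence rarely captures how a phoneme varies with its neighbors.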



Stronger Together
Planning for this much data takes vast resources, so SRIN worked closely with linguistics experts. “This challenge requires creativity, resourcefulness and expertise in both Bahasa Indonesia and machine learning,” Fadlil reflects. “Samsung’s philosophy of open collaboration played a big part in getting the job done, as did our scale of operations and history of AI development.”

Working with other Samsung Research centers around the world, the SRIN team was able to quickly adopt best practices and overcome the complexities of establishing data targets. The collaboration also advanced not only technology but also cultural understanding. When the SRIN team joined their counterparts in Bangalore, India, they observed the local fasting customs, creating deeper connections and expanding their understanding of different cultures.



For the team, Galaxy AI’s language expansion project took on a new significance. “We are particularly proud of our achievements here as this was our first AI project, and it won’t be our last as we continue to refine our models and improve the quality of output,” Fadlil concludes. “This expansion not only reflects our values of openness but also respects and incorporates our cultural identities through language.”



In the next episode of The Learning Curve, we will head to Samsung R&D Institute Jordan to speak to the team who led Galaxy AI’s Arabic language project. Tune in to learn about the complexities of building and training an AI model for a language with diverse dialects.

