Speech Technology · NLP · Low-Resource Languages

Natt Korat

Lecturer-Researcher at CADT developing AI systems for Khmer speech recognition, multilingual NLP, and low-resource language technologies.

Background

About Me

Natt Korat
Lecturer-Researcher · Speech Technology Unit · CADT, Cambodia

I am a Lecturer-Researcher at the Cambodia Academy of Digital Technology (CADT), where I work on speech and language technologies for low-resource languages. My research centres on Khmer automatic speech recognition, speech data processing, and multilingual natural language processing.

Education
M.S. in Computer Science
Completed January 2026
B.S. in Computer Science
AGA Institute
2023
B.A. in Psychology
Royal University of Phnom Penh
2022
5 Publications · 4 Projects · 3 Degrees · 🇰🇭 Focus Language
Research Interests
🎙️ Speech Recognition (ASR) · 🔬 Self-Supervised Learning · ⚡ Parameter-Efficient Fine-Tuning · 🌏 Low-Resource Languages · 🇰🇭 Khmer NLP · 📰 Information Extraction · 🏥 AI for Public Health · 🔉 Speech Data Processing · 🤗 Transformer Models · 🌐 Multilingual NLP

Research Output

Publications

Journal 2025
Khmer News Classification in Low-Resource Settings: A Comparative Analysis of Embedding Methods
Korat, N., Heang, S., & Lay, V.
Journal on Information Technologies & Communications
DOI
Conference 2025
Epidemic Event Extraction from News Media Using Large Language Models
Korat, N., Kak, S., Lay, V., Uraiwan, B., & Waranrach, V.
International Conference on Digital Economy and Fintech Innovation (DEFI 2025)
DOI
Conference 2025
Building a Khmer NER Benchmark from Health News Data Towards Event Extraction
Chiep, C., Korat, N., Lay, V., & Ly, R.
ASEAN Conference on Emerging Technology (ACET 2025)
HAL
Poster 2025
Toward Multilingual Epidemic Event Extraction for Low-Resource Languages
Korat, N., Waranrach, V., & Kak, S.
UEC International Seminar Poster Session
Workshop 2024
Robust and Efficient Recognition of Khmer License Plates Using YOLOv5 and Parseq OCR
Kong, P., Korat, N., & Veng, P.
CV4DC Workshop, Asian Conference on Computer Vision (ACCV 2024)

Work in Progress

Featured Projects

🎙️
Khmer Speech Technology
End-to-end Khmer ASR systems built on XLS-R 300M with CTC decoding, speaker diarization (pyannote), and INT8 quantization via ONNX Runtime for low-latency clinical transcription.
Wav2Vec2 / XLS-R · CTC · Diarization · ONNX
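
For readers unfamiliar with CTC decoding, the following is a minimal sketch of greedy (best-path) decoding: take the most likely label per audio frame, collapse consecutive repeats, then drop the blank symbol. It illustrates the general technique only and is not the project's actual code; the function name `ctc_greedy_decode`, the toy vocabulary, and the choice of `blank_id=0` are assumptions made for this example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Best-path CTC decoding: collapse repeated labels, then remove blanks."""
    # Merge runs of identical consecutive labels into a single label.
    collapsed = [label for label, _ in groupby(frame_ids)]
    # Drop the blank symbol and map the remaining ids to tokens.
    return "".join(id_to_token[i] for i in collapsed if i != blank_id)


# Hypothetical toy vocabulary; a real model defines its own.
vocab = {0: "<blank>", 1: "ក", 2: "ា", 3: " "}

# Per-frame argmax labels, as a model might emit them.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 1, 0]
decoded = ctc_greedy_decode(frames, vocab)
print(decoded)
```

A blank between two identical labels keeps them as two separate characters, which is how CTC represents genuinely repeated characters.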
🔊
Speech Dataset Pipeline
Automated pipeline for collecting and processing Khmer speech data from broadcast media and YouTube, with text normalization, forced alignment, and beam-search decoding using pyctcdecode with a KenLM language model.
Data Collection · KenLM · Forced Alignment
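
As an illustration of what a text normalization step for Khmer transcripts can look like, here is a minimal sketch. The specific rules are assumptions chosen for the example (Unicode NFC normalization, removal of zero-width spaces, mapping Khmer digits to ASCII digits, whitespace collapsing); the actual pipeline may apply different or additional rules, and the function name `normalize_khmer` is hypothetical.

```python
import re
import unicodedata

# Khmer digits occupy U+17E0..U+17E9; map each one to its ASCII counterpart.
KHMER_DIGITS = str.maketrans(
    "".join(chr(0x17E0 + i) for i in range(10)), "0123456789"
)

# Zero-width space (often used as an invisible word break in Khmer text)
# and the byte-order mark.
INVISIBLE = re.compile("[\u200b\ufeff]")


def normalize_khmer(text):
    """Apply a few illustrative normalization rules to a transcript line."""
    text = unicodedata.normalize("NFC", text)   # canonical Unicode form
    text = INVISIBLE.sub("", text)              # strip invisible characters
    text = text.translate(KHMER_DIGITS)         # Khmer digits -> ASCII digits
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace


sample = "\u1780\u200b \u17e1\u17e2  \t\u17e3 "
print(normalize_khmer(sample))
```

For ASR training data, digits are often spelled out as words rather than mapped to ASCII; the digit mapping here only shows where such a rule would sit.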
🦠
Event Extraction System
AI pipeline that extracts epidemic-related events from multilingual news and social media via transformer models and LLMs, generating structured public health intelligence from unstructured text.
NER · LLM · Multilingual · Event Extraction

Get in Touch

Contact

I'm open to research collaborations and academic discussions on speech technology and low-resource NLP. Feel free to reach out through any of the channels below.
