30–31 May 2025
Sibiu, Romania
Europe/Bucharest timezone

User-Centered Evaluation of Turkish Language Accessibility in Digital Keyboard Systems

30 May 2025, 11:50
20m
COMPA Room (Mercure Sibiu Arsenal)

https://meet.google.com/qbp-itbr-fic
On-site Digital Economy and Management Session 1A

Speakers

Esra Kahya Ozyirmidokuz (Erciyes University), Mr Cagri Demirci (Erciyes University)

Description

In the context of the digital economy, language accessibility has become a vital element of equitable participation in communication, digital engagement, and the preservation of linguistic diversity. As digital platforms become major channels of interaction, the availability (or absence) of well-designed virtual keyboards strongly affects how usable a language is online. While globally dominant languages are well supported, many others, from minority and regional languages to widely spoken ones such as Turkish, continue to face accessibility challenges.

This study explores Turkish language accessibility in digital keyboard systems. User-generated content was collected from platforms such as Ekşi Sözlük, Reddit, YouTube, and online forums. The initial dataset contained 64 unique entries of unstructured textual feedback with timestamps. These comments addressed Turkish character support, keyboard layouts (the Turkish F and Q layouts), mobile typing usability, and multilingual input.

A hybrid methodology combining expert labeling and computational analysis was used. Experts first reviewed the dataset and labeled it thematically. Preprocessing consisted of lowercasing; removing URLs, punctuation, and numeric symbols; normalizing whitespace; and preserving Turkish-specific characters (ç, ğ, ı, ö, ş, ü). To increase the variety and robustness of the dataset, three data augmentation techniques were applied: random word shuffling, synonym replacement, and word deletion. This expanded the dataset from 64 to 141 entries.
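The preprocessing and augmentation steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the Turkish-aware handling of I/İ before lowercasing, the regular expressions, the single-operation-per-variant design of `augment`, and the tiny synonym table `SYNONYMS` are all hypothetical choices, since the abstract does not say how each step was implemented or which synonym source was used.

```python
import random
import re

# Turkish-aware casing (assumption): Python's str.lower() maps "I" to "i"
# instead of dotless "ı", and "İ" to "i" plus a combining dot (U+0307).
_TR_UPPER = str.maketrans({"I": "ı", "İ": "i"})
_URL = re.compile(r"(?:https?://|www\.)\S+")
_NON_LETTER = re.compile(r"[\W\d_]+")  # punctuation, digits, underscores


def preprocess(text: str) -> str:
    """Lowercase, drop URLs, punctuation and digits, normalize whitespace.
    Turkish letters (ç, ğ, ı, ö, ş, ü) are Unicode word characters, so they survive."""
    text = text.translate(_TR_UPPER).lower()
    text = _URL.sub(" ", text)  # remove URLs before stripping punctuation
    text = _NON_LETTER.sub(" ", text)
    return " ".join(text.split())


# Hypothetical toy synonym table; the study's synonym source is not stated.
SYNONYMS = {"klavye": ["tuş takımı"], "sorun": ["problem", "hata"], "zor": ["güç"]}


def shuffle_words(tokens, rng):
    out = list(tokens)
    rng.shuffle(out)
    return out


def replace_synonyms(tokens, rng, n=1):
    """Replace up to n tokens that have an entry in SYNONYMS."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in SYNONYMS]
    for i in rng.sample(candidates, min(n, len(candidates))):
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def delete_words(tokens, rng, p=0.1):
    """Drop each token with probability p."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept or [rng.choice(tokens)]  # never return an empty comment


def augment(text, rng):
    """Create one new variant by applying one randomly chosen operation."""
    tokens = text.split()
    if not tokens:
        return text
    op = rng.choice((shuffle_words, replace_synonyms, delete_words))
    return " ".join(op(tokens, rng))
```

For example, `preprocess("Klavyede ĞÜŞİÖÇ ve I harfleri yok!! https://example.com 2024")` yields `"klavyede ğüşiöç ve ı harfleri yok"`. The explicit I/İ mapping matters because default lowercasing would silently turn Turkish "I" into "i", merging words that Turkish keeps distinct.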

TF-IDF vectorization and Non-negative Matrix Factorization (NMF) were applied to extract topics from the corpus. Five topics were identified: (1) Turkish Character Input Issues, (2) Unclear/Off-topic or Social Commentary, (3) Multilingual Keyboard Concerns, (4) Typing Experience and Personal Sentiment, and (5) General Complaints/Usability Frustrations. Each comment was then assigned to its best-matching topic, producing a labeled dataset.

To evaluate topic prediction, three machine learning models (Logistic Regression, Random Forest, and a Linear Support Vector Machine, SVM) were trained using an 80/20 stratified train/test split. Linear SVM and Logistic Regression both achieved 93.1% accuracy, while Random Forest reached 79.3%. These results demonstrate the feasibility of using lightweight machine learning models to classify digital language accessibility feedback by topic automatically.
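A minimal scikit-learn sketch of this evaluation follows. It assumes TF-IDF features feed each classifier (the abstract does not state the feature representation used for classification) and uses illustrative hyperparameters (`max_iter`, `n_estimators`, `random_state`) that are not taken from the study; the function name `compare_classifiers` is hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def compare_classifiers(texts, labels, seed=42):
    """80/20 stratified split; return test accuracy for each model."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "Linear SVM": LinearSVC(),
    }
    scores = {}
    for name, clf in models.items():
        # The vectorizer sits inside the pipeline, so IDF weights are
        # learned from the training split only.
        pipe = make_pipeline(TfidfVectorizer(lowercase=False), clf)
        pipe.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, pipe.predict(X_test))
    return scores
```

For scale: with 141 entries, 20% is 28.2, and scikit-learn's `train_test_split` rounds the test share up to 29 entries. If the study's split behaved the same way, the reported figures correspond to 27/29 ≈ 93.1% and 23/29 ≈ 79.3% correct predictions, so each test entry is worth about 3.4 percentage points. This is an inference from the reported numbers, not a figure stated in the abstract.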

This hybrid approach supports scalable and adaptive analysis of linguistic accessibility, and may be applied to other structurally complex or low-resource languages.

Primary authors

Esra Kahya Ozyirmidokuz (Erciyes University), Dr Eduard Stoica (Lucian Blaga University of Sibiu), Mr Cagri Demirci (Erciyes University)
