30–31 May 2025
Sibiu, Romania
Europe/Bucharest timezone

Algorithmic Approaches to Code-Switching: Detecting English Lexical Borrowings in Turkish Social Media Texts

30 May 2025, 12:10
20m
COMPA Room (Mercure Sibiu Arsenal)

https://meet.google.com/qbp-itbr-fic
On-site Digital Economy and Management Session 1A

Speakers

Mr Cagri Demirci (Erciyes University), Esra Kahya Ozyirmidokuz (Erciyes University)

Description

Research on digital economy and management explores how digital technologies, online platforms, and data-driven practices influence economic behavior and management strategies, and how digital communication, social structures, and cultural norms intersect with economic actions, particularly in a globalized environment. Within this context, the study analyzes the use of English-derived words in Turkish social media texts as a form of code-mixing, focusing on YouTube and Twitter. It investigates how code-mixing functions in digital communication and what role it plays in shaping socio-economic interactions online. Natural Language Processing (NLP) techniques and modern language models are used to explore the frequency, context, and sociocultural implications of this phenomenon.

Data were collected from two primary sources: YouTube and Twitter. For YouTube, the YouTube API was used to retrieve video transcripts by querying specific keywords and hashtags (e.g., #codeMixing, #Türkçeİngilizce). For Twitter, pre-existing datasets were filtered by specific keywords and hashtags to identify posts featuring Turkish-English code-mixing. Real-time collection through the Twitter API was also attempted, but platform limitations restricted the volume of data that could be gathered.
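As an illustration of the hashtag filtering step, the following is a minimal Python sketch, not the authors' actual pipeline. The two hashtags come from the examples above; the function names (normalize, has_target_hashtag, filter_posts) and the sample posts are hypothetical. One assumption worth noting: the Turkish capital İ is mapped to a plain i before lowercasing, because Python's str.lower() otherwise produces an i followed by a combining dot, which would break matching.

```python
import re

# Hashtags taken from the examples in the abstract.
HASHTAGS = {"#codeMixing", "#Türkçeİngilizce"}

# In Python 3, \w is Unicode-aware, so Turkish letters are matched.
HASHTAG_RE = re.compile(r"#\w+")


def normalize(text: str) -> str:
    """Lowercase for matching. Map dotted capital İ to plain i first,
    since str.lower() would turn it into 'i' plus a combining dot."""
    return text.replace("İ", "i").lower()


def has_target_hashtag(post: str, hashtags=HASHTAGS) -> bool:
    """Return True if the post contains any target hashtag (case-insensitive)."""
    wanted = {normalize(h) for h in hashtags}
    found = {normalize(h) for h in HASHTAG_RE.findall(post)}
    return not wanted.isdisjoint(found)


def filter_posts(posts, hashtags=HASHTAGS):
    """Keep only the posts that carry at least one target hashtag."""
    return [p for p in posts if has_target_hashtag(p, hashtags)]


if __name__ == "__main__":
    sample = [
        "Bugün çok busy'yim #codemixing",
        "Sadece Türkçe bir cümle",
        "Meeting'e geç kaldım #TÜRKÇEİNGİLİZCE",
    ]
    for post in filter_posts(sample):
        print(post)
```

Filtering on hashtags alone only finds posts whose authors tagged them as code-mixed, so a real collection step would likely combine it with keyword or language-based filters.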

The raw data underwent several preprocessing steps: cleaning to remove unnecessary characters, stop-word removal, tokenization, and lowercasing to ensure consistency. Several NLP techniques were then applied to the preprocessed data, including frequency analysis, n-gram analysis, and language identification algorithms to detect code-mixed instances.
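The preprocessing and analysis steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the study's implementation: the stop-word list and the English lexicon are tiny hypothetical placeholders, and the lexicon lookup stands in for whatever language identification algorithm was actually used. In this sketch, lowercasing and tokenization run before stop-word removal so that the stop-word comparison is made on normalized tokens.

```python
import re
from collections import Counter

# Tiny illustrative lists (assumptions); a real study would use full resources.
TURKISH_STOP_WORDS = {"ve", "bir", "bu", "çok", "da", "de", "için"}
ENGLISH_LEXICON = {"meeting", "deadline", "like", "cool", "story"}

NOISE_RE = re.compile(r"https?://\S+|[@#]\w+")  # URLs, mentions, hashtags
TOKEN_RE = re.compile(r"[^\W\d_]+")             # runs of Unicode letters


def preprocess(text):
    """Clean, lowercase, tokenize, and drop stop words."""
    text = NOISE_RE.sub(" ", text)
    # Map İ to i before lower() to avoid an 'i' plus combining dot.
    text = text.replace("İ", "i").lower()
    # Splitting on non-letters also separates Turkish suffixes attached
    # with an apostrophe, e.g. "meeting'e" -> "meeting", "e".
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t not in TURKISH_STOP_WORDS]


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token list."""
    return list(zip(*(tokens[i:] for i in range(n))))


def english_candidates(tokens):
    """Flag tokens found in the (hypothetical) English lexicon."""
    return [t for t in tokens if t in ENGLISH_LEXICON]


if __name__ == "__main__":
    post = "Bugün meeting'e geç kaldım ve deadline çok yakın! https://t.co/x #codeMixing"
    tokens = preprocess(post)
    print(Counter(tokens).most_common(3))      # frequency analysis
    print(Counter(ngrams(tokens, 2)).most_common(3))  # bigram analysis
    print(english_candidates(tokens))          # lexicon-based detection
```

A lexicon lookup of this kind misses inflected or respelled borrowings and wrongly flags words that exist in both languages, which is one reason a contextual model is a natural next step.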

BERT (Bidirectional Encoder Representations from Transformers), a modern language model, was employed to detect code-mixed instances and analyze the contextual meaning of English-Turkish interactions. Because BERT represents each token in light of both its left and right context, it was well suited to analyzing how English items are used inside Turkish sentences.
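The abstract does not state how BERT's outputs were used. One common setup, assumed here purely for illustration, is token classification, where each token receives a language label. BERT models split words into WordPiece subwords (continuation pieces are prefixed with "##"), so piece-level predictions have to be mapped back to whole words. The Python sketch below shows that alignment step only; the labels TR and EN, the function name align_to_words, and the example pieces are hypothetical, and no model is loaded.

```python
def align_to_words(pieces, piece_labels):
    """Collapse WordPiece-level labels to word level.

    Each word takes the label predicted for its first piece;
    continuation pieces (prefixed with '##') only extend the word.
    """
    words, labels = [], []
    for piece, label in zip(pieces, piece_labels):
        if piece.startswith("##") and words:
            words[-1] += piece[2:]  # continuation: append to current word
        else:
            words.append(piece)     # first piece starts a new word
            labels.append(label)
    return list(zip(words, labels))


if __name__ == "__main__":
    # Hypothetical subword split and hypothetical per-piece predictions.
    pieces = ["bugün", "meet", "##ing", "var"]
    piece_labels = ["TR", "EN", "EN", "TR"]
    for word, label in align_to_words(pieces, piece_labels):
        print(word, label)
```

Taking the first piece's label is only one convention; majority voting over a word's pieces is an equally reasonable alternative.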

Findings revealed that English-derived words were frequently used in Turkish social media texts, serving communicative and identity-expressive functions. Code-mixing reflected social and cultural motivations, such as signaling cultural alignment. However, it also posed challenges for NLP systems, which struggled to handle mixed-language data. Suggestions for overcoming these challenges include adapting multilingual models.

This study provides valuable insights into the role of code-mixing in digital communication, examining both linguistic patterns and social implications. The integration of advanced language models like BERT enhances understanding of code-mixing and highlights its socio-economic impact on digital communication. The research contributes to both theoretical frameworks on language contact and practical advancements in NLP technologies for multilingual contexts.

Primary authors

Mr Cagri Demirci (Erciyes University), Esra Kahya Ozyirmidokuz (Erciyes University)
