MuRIL_WOR
Model Description
MuRIL_WOR is a Telugu sentiment classification model built on MuRIL (Multilingual Representations for Indian Languages), a BERT-based Transformer pretrained specifically for Indian languages, including Telugu and English.
MuRIL is pretrained on a large and diverse corpus of Indian language text, including web data, religious scriptures, and news articles. Unlike general multilingual models such as mBERT or XLM-R, MuRIL is tailored to capture Indian language morphology and syntax more effectively.
The suffix WOR denotes Without Rationale supervision. This model is fine-tuned using only sentiment labels and serves as a label-only baseline without incorporating human-annotated rationales.
Pretraining Details
- Pretraining corpus: Indian language text from web sources, religious texts, and news data
- Training objectives:
  - Masked Language Modeling (MLM)
  - Translation Language Modeling (TLM)
- Language coverage: 17+ Indian languages, including Telugu and English
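The MLM objective above can be sketched as follows. This is a minimal illustration, not MuRIL's actual pretraining code: the token ids and mask id are toy values, and BERT's full 80/10/10 masking recipe is simplified here to pure [MASK] replacement.

```python
# Sketch of the Masked Language Modeling (MLM) pretraining objective.
# Token ids and MASK_ID are illustrative, not from the real MuRIL vocabulary.
import torch

torch.manual_seed(0)

input_ids = torch.tensor([101, 7592, 2088, 2003, 2307, 102])  # toy token ids
MASK_ID = 103

# Sample ~15% of positions to mask, as in BERT-style pretraining.
mask = torch.bernoulli(torch.full(input_ids.shape, 0.15)).bool()

labels = input_ids.clone()
labels[~mask] = -100                # unmasked positions are ignored by the loss

masked_inputs = input_ids.clone()
masked_inputs[mask] = MASK_ID       # the model must recover the original tokens
```

TLM works the same way, except that the masked sequence concatenates parallel text in two languages (e.g. Telugu and its transliteration or translation), so the model can attend across languages when filling in masks.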
Training Data
- Fine-tuning dataset: Telugu-Dataset
- Task: Sentiment classification
- Supervision type: Label-only (no rationale supervision)
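What "label-only" supervision means in practice can be sketched as a plain cross-entropy loss over sentiment labels, with no auxiliary rationale (token-importance) term added. Tensor shapes and the class count below are illustrative.

```python
# Label-only (WOR) fine-tuning step, sketched: the loss uses sentiment labels
# alone. A rationale-supervised (WR) variant would add an extra loss term here.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3, requires_grad=True)  # batch of 4, 3 sentiment classes
labels = torch.tensor([0, 2, 1, 2])             # gold sentiment labels only

loss = F.cross_entropy(logits, labels)          # no rationale loss term (WOR)
loss.backward()
```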
Intended Use
This model is intended for:
- Telugu sentiment classification
- Benchmarking Indian-language-focused models
- Baseline comparisons in explainability and rationale-supervision studies
- Analysis of informal, social media, or conversational Telugu text
Owing to its Indian-language-centric pretraining, MuRIL_WOR is typically more effective for Telugu sentiment analysis than general multilingual models.
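A minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub as DSL-13-SRMAP/MuRIL_WOR with a standard sequence-classification head; the example sentence and the model's label mapping are illustrative.

```python
# Hedged usage sketch: load the fine-tuned classifier and score one Telugu sentence.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "DSL-13-SRMAP/MuRIL_WOR"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "ఈ సినిమా చాలా బాగుంది"  # "This movie is very good"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # predicted sentiment label
```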
Performance Characteristics
MuRIL generally outperforms broad multilingual models such as mBERT and XLM-R on Telugu sentiment classification tasks, especially for informal and conversational text, due to its targeted pretraining.
Strengths
- Strong understanding of Telugu morphology and syntax
- Better performance on informal and web-based Telugu text
- Reliable baseline for Indian-language NLP tasks
Limitations
- Pretraining data favors informal text, which may reduce effectiveness on formal or classical Telugu
- Limited coverage beyond Indian languages
- Does not incorporate rationale supervision
Use as a Baseline
MuRIL_WOR serves as a strong Indian-language-focused baseline for:
- Comparing general multilingual vs. Indian-language-specific models
- Evaluating the impact of rationale supervision (WOR vs. WR)
- Telugu sentiment analysis in low-resource and informal text settings
Base Model
- DSL-13-SRMAP/MuRIL_WOR is fine-tuned from google/muril-base-cased