This model detects End-of-Utterance (EOU) in Arabic conversations and is optimized for Saudi dialects. Given a text transcript, it predicts the probability that the speaker has finished their conversational turn.
Use Case: Real-time conversational AI agents (voice assistants, chatbots, customer service)
| Metric | Score |
|---|---|
| Test Accuracy | 99.6% |
| Precision | 100% |
| Recall | 99.45% |
| F1 Score | 99.73% |
| AUC-ROC | 99.96% |
| Inference Time | ~15-20 ms |
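As a quick sanity check on the table, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall values reported above. The function name `f1_score` is illustrative and not part of the model repository, and because the inputs are already rounded, the result may differ from the reported figure in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both expressed in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values copied from the metrics table above.
f1 = f1_score(precision=1.0, recall=0.9945)
print(f"F1 from rounded table values: {f1:.2%}")
```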
pip install transformers torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model
model = AutoModelForSequenceClassification.from_pretrained("HossamEL-Dein/arabic-eou-model")
tokenizer = AutoTokenizer.from_pretrained("HossamEL-Dein/arabic-eou-model")
model.eval()

# Predict EOU
text = "مرحبا كيف حالك اليوم"  # "Hello, how are you today"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    eou_probability = probs[0][1].item()

print(f"EOU Probability: {eou_probability:.2%}")
# Output: EOU Probability: 98.56%
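The last step of the snippet above turns the model's two output logits into probabilities with a softmax and reads the second entry as the EOU probability. Below is a minimal pure-Python sketch of that arithmetic. The logits are made up for illustration rather than taken from the real model, and the sketch assumes, as the code above does, that class index 1 means "end of utterance".

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for [not-EOU, EOU]; not real model output.
example_logits = [-2.0, 2.0]
probs = softmax(example_logits)
eou_probability = probs[1]  # assumed: index 1 is the EOU class
print(f"EOU Probability: {eou_probability:.2%}")
```

With only two classes, this is equivalent to applying a sigmoid to the difference between the two logits.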
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch


class EOUDetector:
    def __init__(self, threshold=0.7):
        self.model = AutoModelForSequenceClassification.from_pretrained("HossamEL-Dein/arabic-eou-model")
        self.tokenizer = AutoTokenizer.from_pretrained("HossamEL-Dein/arabic-eou-model")
        self.model.eval()
        self.threshold = threshold

    def check_eou(self, transcript_text):
        inputs = self.tokenizer(transcript_text, return_tensors="pt")
        with torch.no_grad():
            outputs = self.model(**inputs)
            probs = torch.softmax(outputs.logits, dim=-1)
            eou_prob = probs[0][1].item()
        return {
            'probability': eou_prob,
            'is_eou': eou_prob > self.threshold
        }


# Use in LiveKit agent
detector = EOUDetector()
result = detector.check_eou("مرحبا كيف حالك")  # "Hello, how are you"
if result['is_eou']:
    print("User finished speaking!")
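In a real-time agent, the EOU decision is often combined with a silence timer rather than used alone: a confident EOU lets the agent reply after a short pause, while a low probability makes it wait longer before taking the turn. The sketch below shows one way this could look. The function `endpointing_delay` and the `min_delay` and `max_delay` values are illustrative assumptions; they are not part of this model or of the LiveKit API.

```python
def endpointing_delay(eou_prob: float, threshold: float = 0.7,
                      min_delay: float = 0.3, max_delay: float = 2.0) -> float:
    """Seconds of silence to wait before the agent replies (hypothetical policy)."""
    if not 0.0 <= eou_prob <= 1.0:
        raise ValueError("eou_prob must be in [0, 1]")
    # Confident end of utterance: respond after the short pause.
    if eou_prob > threshold:
        return min_delay
    # Otherwise give the user more time to continue speaking.
    return max_delay


# Example with made-up probabilities.
for p in (0.95, 0.40):
    print(f"p={p:.2f} -> wait {endpointing_delay(p):.1f}s of silence")
```

The comparison is strict (`>`), matching the `eou_prob > self.threshold` check in the detector class, so a probability exactly at the threshold is treated as "not finished".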
Training dataset available at: HossamEL-Dein/arabic-eou-dataset
@misc{arabic-eou-2024,
author = {HossamEL-Dein},
title = {Arabic End-of-Utterance Detection Model},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/HossamEL-Dein/arabic-eou-model}
}
License: Apache 2.0
For questions or issues, please open an issue on the model repository.
Base model: aubmindlab/bert-base-arabertv02