brainbug

Model Description

This is a fine-tuned CodeT5 model for automatic bug detection and code repair. The model has been trained to identify and fix various types of programming errors in Python code.

Supported Error Types

WVAV: Wrong Variable Used in Variable Assignment
MLAC: Missing Line After Call
WPFV: Wrong Parameter in Function/Method Call
Other error types are supported but not listed here.
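
The error code is passed to the model as part of the prompt, in the form `Fix <CODE>: <source>` shown in the Usage section. A minimal sketch of a prompt builder follows. The names `KNOWN_ERROR_TYPES` and `build_prompt` are hypothetical helpers, not part of the model or of `transformers`, and the dictionary holds only the three codes listed above, so it is a partial list.

```python
# Error codes listed on this card. The card notes that more exist,
# so this mapping is partial (ASSUMPTION: no other codes are known here).
KNOWN_ERROR_TYPES = {
    "WVAV": "Wrong Variable Used in Variable Assignment",
    "MLAC": "Missing Line After Call",
    "WPFV": "Wrong Parameter in Function/Method Call",
}


def build_prompt(error_type: str, faulty_code: str) -> str:
    """Return a 'Fix <CODE>: <source>' prompt in the format of the Usage example."""
    code = error_type.strip().upper()
    if code not in KNOWN_ERROR_TYPES:
        # Hypothetical guard: an unlisted code may still be valid for the model.
        raise ValueError(f"Unlisted error type: {error_type!r}")
    return f"Fix {code}: {faulty_code}"


print(build_prompt("wvav", "x = y\n"))
```

The guard is a design choice for this sketch only. Relax it if you are working with error codes that the model was trained on but that are not listed here.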

Model Details

Base Model: Salesforce/codet5-base
Fine-tuned on: Custom bug-fix dataset
Task: Code-to-Code generation (bug fixing)
Language: Python
Model Size: 220M parameters

Usage

from transformers import T5ForConditionalGeneration, RobertaTokenizer

# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("Sagar123x/brainbug")
tokenizer = RobertaTokenizer.from_pretrained("Sagar123x/brainbug")

# Example: Fix buggy code
faulty_code = """
def check_for_file(self, file_path):
    files = self.connection.glob(file_path)
    return len(files) == 1
"""

# Prepare input
input_text = f"Fix WVAV: {faulty_code}"
inputs = tokenizer(input_text, return_tensors="pt", max_length=256, truncation=True)

# Generate fix
outputs = model.generate(**inputs, max_length=256, num_beams=5)
fixed_code = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(fixed_code)
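
The generated text is not guaranteed to be valid Python, so it can be worth checking that it parses before using it. The sketch below uses only the standard library. `is_valid_python` is a hypothetical helper name, and a successful parse says nothing about whether the bug was actually fixed.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python source code, False otherwise."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # SyntaxError covers malformed code; ValueError covers inputs such as
        # null bytes on some Python versions.
        return False
    return True


candidate = (
    "def check_for_file(self, file_path):\n"
    "    files = self.connection.glob(file_path)\n"
    "    return len(files) == 1\n"
)
print(is_valid_python(candidate))
```

In practice you would pass `fixed_code` from the example above instead of the hard-coded `candidate` string.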

Training Details

Training Epochs: 10
Batch Size: 1 (with gradient accumulation)
Learning Rate: 3e-5
Optimizer: AdamW
Hardware: NVIDIA RTX 4050 (6GB)
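
With a batch size of 1, gradient accumulation determines how many examples contribute to each optimizer step. The sketch below collects the hyperparameters listed above into keyword arguments named after Hugging Face `TrainingArguments` fields. Two things are assumptions: the card does not say whether the `Trainer` API was used, and it does not state the number of accumulation steps, so the value `16` is a placeholder only.

```python
# Hyperparameters as listed on this card.
EPOCHS = 10
BATCH_SIZE = 1
LEARNING_RATE = 3e-5

# ASSUMPTION: the card does not state the accumulation steps; 16 is a placeholder.
GRAD_ACCUM_STEPS = 16


def trainer_kwargs(grad_accum_steps: int = GRAD_ACCUM_STEPS) -> dict:
    """Build keyword arguments named after `TrainingArguments` fields."""
    return {
        "num_train_epochs": EPOCHS,
        "per_device_train_batch_size": BATCH_SIZE,
        "gradient_accumulation_steps": grad_accum_steps,
        "learning_rate": LEARNING_RATE,
        "optim": "adamw_torch",  # AdamW, as listed above
    }


def effective_batch_size(kwargs: dict) -> int:
    """Examples per optimizer step on a single device."""
    return kwargs["per_device_train_batch_size"] * kwargs["gradient_accumulation_steps"]


kwargs = trainer_kwargs()
print(effective_batch_size(kwargs))
```

Accumulating gradients over several single-example batches is a common way to fit training on a 6GB GPU while keeping a larger effective batch.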

Performance Metrics

Exact Match Accuracy: 2.60%
Token-Level Accuracy: 28.52%
Average Similarity: 76.75%
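
The card does not define how these metrics were computed. As a rough illustration only, the sketch below shows one plausible way to compute metrics of this kind with the standard library. It assumes whitespace tokenization for token-level accuracy and `difflib.SequenceMatcher` for similarity; the actual evaluation may have used different definitions.

```python
from difflib import SequenceMatcher


def exact_match(pred: str, ref: str) -> bool:
    """True if prediction and reference are identical, ignoring outer whitespace."""
    return pred.strip() == ref.strip()


def token_accuracy(pred: str, ref: str) -> float:
    """Fraction of positions where whitespace-separated tokens agree."""
    p, r = pred.split(), ref.split()
    if not p and not r:
        return 1.0
    matches = sum(a == b for a, b in zip(p, r))
    return matches / max(len(p), len(r))


def similarity(pred: str, ref: str) -> float:
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, pred, ref).ratio()


pred = "return len(files) == 1"
ref = "return len(files) >= 1"
print(exact_match(pred, ref), token_accuracy(pred, ref), similarity(pred, ref))
```

Under definitions like these, a prediction that differs from the reference by a single token fails exact match while still scoring high on similarity, which is one way a low exact-match figure and a high similarity figure can coexist.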

Limitations

Trained primarily on Python code
Best performance on error types seen during training
May not handle code snippets longer than 256 tokens; the usage example truncates input at that length
Requires error type specification for optimal results
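
Because input beyond the token limit is silently truncated, it can help to count tokens before calling the model. The sketch below works with any Hugging Face-style tokenizer that returns `input_ids` when called. `count_tokens`, `fits_in_window`, and `WhitespaceTokenizer` are hypothetical names; the stand-in tokenizer exists only so the sketch runs without downloading the model, and its counts will differ from those of the real tokenizer.

```python
def count_tokens(tokenizer, text: str) -> int:
    """Count tokens without truncation."""
    return len(tokenizer(text, truncation=False)["input_ids"])


def fits_in_window(tokenizer, text: str, max_length: int = 256) -> bool:
    """True if `text` fits within `max_length` tokens."""
    return count_tokens(tokenizer, text) <= max_length


class WhitespaceTokenizer:
    """Hypothetical stand-in: splits on whitespace instead of real subword tokens."""

    def __call__(self, text, truncation=False):
        return {"input_ids": text.split()}


prompt = "Fix WVAV: def f(x):\n    return x\n"
print(fits_in_window(WhitespaceTokenizer(), prompt))
```

With the real tokenizer loaded as in the Usage section, pass it in place of `WhitespaceTokenizer()` and check the full prompt, including the `Fix <CODE>:` prefix.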

Citation

@misc{brainbug-codet5,
  author = {Your Name},
  title = {BrainBug: CodeT5 for Automatic Bug Repair},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Sagar123x/brainbug}}
}

License

MIT License

Contact

For questions or issues, please open an issue on the model repository.

Safetensors

Model size: 0.2B params
Tensor type: F32