brainbug

Model Description

brainbug is a fine-tuned CodeT5 model for automatic bug detection and code repair in Python. Given a faulty snippet, ideally prefixed with its error type, it generates a corrected version of the code.

Supported Error Types

  • WVAV: Wrong Value Assigned to Variable
  • MLAC: Missing AND Expression in Branch Condition
  • WPFV: Wrong Variable Used in Parameter of Function Call
  • And more...

Model Details

  • Base Model: Salesforce/codet5-base
  • Fine-tuned on: Custom bug-fix dataset
  • Task: Code-to-Code generation (bug fixing)
  • Language: Python
  • Model Size: 220M parameters

Usage

from transformers import T5ForConditionalGeneration, RobertaTokenizer

# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("Sagar123x/brainbug")
tokenizer = RobertaTokenizer.from_pretrained("Sagar123x/brainbug")

# Example: Fix buggy code
faulty_code = """
def check_for_file(self, file_path):
    files = self.connection.glob(file_path)
    return len(files) == 1
"""

# Prepare input
input_text = f"Fix WVAV: {faulty_code}"
inputs = tokenizer(input_text, return_tensors="pt", max_length=256, truncation=True)

# Generate fix
outputs = model.generate(**inputs, max_length=256, num_beams=5)
fixed_code = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(fixed_code)
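
Because the exact-match accuracy reported below is low, it can help to inspect several beam candidates instead of only the top one. The following is a minimal sketch, not part of the released code: build_prompt and repair are hypothetical helper names, and the prompt format is assumed to be exactly the "Fix <TYPE>: <code>" pattern from the example above. The model and tokenizer are passed in, so nothing is downloaded by the helpers themselves.

```python
def build_prompt(error_type, code):
    """Format the input like the usage example: 'Fix <TYPE>: <code>'."""
    return f"Fix {error_type}: {code}"


def repair(model, tokenizer, error_type, code,
           max_length=256, num_beams=5, num_candidates=3):
    """Return several decoded beam hypotheses, best-scoring first.

    `model` and `tokenizer` are expected to be the objects loaded in the
    usage example. `num_candidates` is a hypothetical parameter name that
    is forwarded to `generate` as `num_return_sequences`.
    """
    if num_candidates > num_beams:
        raise ValueError("num_candidates must not exceed num_beams")
    inputs = tokenizer(
        build_prompt(error_type, code),
        return_tensors="pt",
        max_length=max_length,
        truncation=True,
    )
    outputs = model.generate(
        **inputs,
        max_length=max_length,
        num_beams=num_beams,
        num_return_sequences=num_candidates,
    )
    return [tokenizer.decode(seq, skip_special_tokens=True) for seq in outputs]
```

With the objects from the example above, repair(model, tokenizer, "WVAV", faulty_code) would return a list of candidate fixes to review by hand or to run against a test suite.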

Training Details

  • Training Epochs: 10
  • Batch Size: 1 (with gradient accumulation)
  • Learning Rate: 3e-5
  • Optimizer: AdamW
  • Hardware: NVIDIA RTX 4050 (6GB)
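
A batch size of 1 with gradient accumulation keeps memory use within a 6GB GPU while still updating the weights on an averaged gradient. The number of accumulation steps is not stated here, so the sketch below treats it as an assumption (the length of the batch). It uses a toy scalar model and a plain SGD update in place of CodeT5 and AdamW, purely to illustrate that accumulating scaled micro-batch gradients matches a single full-batch step.

```python
def grad(w, x, y):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2.0 * (w * x - y) * x


def accumulated_step(w, batch, lr):
    """One weight update built from len(batch) micro-batches of size 1."""
    accum_steps = len(batch)  # assumed value; the card does not state it
    total = 0.0
    for x, y in batch:
        # One forward/backward pass per example; scale so the sum is a mean.
        total += grad(w, x, y) / accum_steps
    # Plain SGD update for illustration; an optimizer such as AdamW would
    # consume `total` here instead.
    return w - lr * total


def full_batch_step(w, batch, lr):
    """Reference: one update from the mean gradient over the whole batch."""
    mean_grad = sum(grad(w, x, y) for x, y in batch) / len(batch)
    return w - lr * mean_grad


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_accum = accumulated_step(0.0, batch, lr=3e-5)
w_full = full_batch_step(0.0, batch, lr=3e-5)
print(abs(w_accum - w_full) < 1e-12)
```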

Performance Metrics

  • Exact Match Accuracy: 2.60%
  • Token-Level Accuracy: 28.52%
  • Average Similarity: 76.75%
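
The evaluation script is not included here, so the exact definitions behind these numbers are unknown. As a hedged illustration, the sketch below uses common definitions: exact match on stripped strings, position-wise accuracy over whitespace-separated tokens, and character-level similarity from difflib.SequenceMatcher. All three function names and definitions are assumptions. The example shows how a prediction can score zero on exact match while still scoring high on similarity, which is consistent with the gap between the first and last figures above.

```python
from difflib import SequenceMatcher


def exact_match(pred, ref):
    """True only if prediction and reference are identical after stripping."""
    return pred.strip() == ref.strip()


def token_accuracy(pred, ref):
    """Fraction of positions where whitespace-separated tokens agree."""
    p, r = pred.split(), ref.split()
    if not p and not r:
        return 1.0
    matches = sum(a == b for a, b in zip(p, r))
    # Dividing by the longer length penalizes missing or extra tokens.
    return matches / max(len(p), len(r))


def similarity(pred, ref):
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, pred, ref).ratio()


reference = "return len(files) == 1"
prediction = "return len(files) >= 1"
print(exact_match(prediction, reference))
print(token_accuracy(prediction, reference))
print(similarity(prediction, reference))
```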

Limitations

  • Trained primarily on Python code
  • Best performance on error types seen during training
  • May not handle code snippets longer than 256 tokens; the usage example truncates inputs at that length
  • Works best when the prompt names the error type (for example, "Fix WVAV: ...")
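
One possible way to work within the 256-token limit is to repair one function at a time instead of a whole file. The sketch below is an assumption about how that could be done, not part of the model: split_into_functions and fits_budget are hypothetical helpers built on the standard ast module, and the default token counter is a rough whitespace proxy rather than the model's tokenizer.

```python
import ast
from typing import Callable, List


def split_into_functions(source: str) -> List[str]:
    """Return the source text of each top-level function in a module.

    Decorators and methods nested inside classes are not included.
    """
    tree = ast.parse(source)
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def fits_budget(
    snippet: str,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
    budget: int = 256,
) -> bool:
    """Check a snippet against the token budget.

    The default counter splits on whitespace and usually undercounts
    compared with a subword tokenizer, so treat it as a rough check only.
    """
    return count_tokens(snippet) <= budget


module = "import os\n\ndef a():\n    return 1\n\ndef b(x):\n    return x + 1\n"
for snippet in split_into_functions(module):
    print(fits_budget(snippet), repr(snippet))
```

For a count closer to what the model sees, a counter such as lambda s: len(tokenizer(s)["input_ids"]) could be passed as count_tokens, using the tokenizer loaded in the Usage section. The error-type prefix also consumes part of the budget.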

Citation

@misc{brainbug-codet5,
  author = {Your Name},
  title = {BrainBug: CodeT5 for Automatic Bug Repair},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Sagar123x/brainbug}}
}

License

MIT License

Contact

For questions or issues, please open an issue on the model repository.
