Update README.md (#4)
- Update README.md (02b03a51809ad841d8287a433f01329f94b69d66)
Co-authored-by: Blake S <[email protected]>
README.md CHANGED

@@ -229,3 +229,6 @@ We evaluate the model with three of the most popular math benchmarks where the s
 - Math-500: This benchmark consists of 500 challenging math problems designed to test the model's ability to perform complex mathematical reasoning and problem-solving.
 - AIME 2024: The American Invitational Mathematics Examination (AIME) is a highly regarded math competition that features a series of difficult problems aimed at assessing advanced mathematical skills and logical reasoning.
 - GPQA Diamond: The Graduate-Level Google-Proof Q&A (GPQA) Diamond benchmark focuses on evaluating the model's ability to understand and answer a wide range of graduate-level science questions, including both straightforward factual queries and more intricate problem-solving tasks.
+
+## Data Summary
+https://huggingface.co/microsoft/Phi-4-mini-reasoning/blob/main/data_summary_card.md