# gpt2moe_het_1000mb
This model is a fine-tuned version of (base model not specified) on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 3.6910
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-06, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 36996
- training_steps: 369967
- mixed_precision_training: Native AMP
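Two of the values above are derived rather than set directly, and they can be sanity-checked with a little arithmetic: the total train batch size is the per-device batch size times the gradient-accumulation steps, and the warmup covers roughly the first 10% of training. A minimal check:

```python
# Hyperparameters as listed above.
train_batch_size = 8
gradient_accumulation_steps = 8
warmup_steps = 36_996
training_steps = 369_967

# Effective (total) train batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 64, matching the value reported above

# Fraction of training spent in linear warmup.
print(round(warmup_steps / training_steps, 3))  # 0.1
```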
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | 11.0422 |
| 8.5391 | 0.0541 | 2000 | 7.8813 |
| 7.4674 | 0.1081 | 4000 | 6.8562 |
| 6.924 | 0.1622 | 6000 | 6.3113 |
| 6.5184 | 0.2162 | 8000 | 5.9048 |
| 6.2161 | 0.2703 | 10000 | 5.5947 |
| 5.9768 | 0.3244 | 12000 | 5.3563 |
| 5.8003 | 0.3784 | 14000 | 5.1644 |
| 5.6375 | 0.4325 | 16000 | 5.0059 |
| 5.506 | 0.4865 | 18000 | 4.8735 |
| 5.397 | 0.5406 | 20000 | 4.7669 |
| 5.3148 | 0.5946 | 22000 | 4.6759 |
| 5.2398 | 0.6487 | 24000 | 4.6121 |
| 5.181 | 0.7028 | 26000 | 4.5544 |
| 5.1326 | 0.7568 | 28000 | 4.5058 |
| 5.0889 | 0.8109 | 30000 | 4.4615 |
| 5.0476 | 0.8649 | 32000 | 4.4204 |
| 5.0093 | 0.9190 | 34000 | 4.3845 |
| 4.9781 | 0.9731 | 36000 | 4.3574 |
| 4.9358 | 1.0271 | 38000 | 4.3235 |
| 4.9002 | 1.0812 | 40000 | 4.2934 |
| 4.8755 | 1.1352 | 42000 | 4.2641 |
| 4.8536 | 1.1893 | 44000 | 4.2388 |
| 4.8312 | 1.2433 | 46000 | 4.2155 |
| 4.8083 | 1.2974 | 48000 | 4.1960 |
| 4.7885 | 1.3515 | 50000 | 4.1745 |
| 4.7715 | 1.4055 | 52000 | 4.1567 |
| 4.7567 | 1.4596 | 54000 | 4.1401 |
| 4.7436 | 1.5136 | 56000 | 4.1255 |
| 4.7208 | 1.5677 | 58000 | 4.1110 |
| 4.7126 | 1.6218 | 60000 | 4.0958 |
| 4.697 | 1.6758 | 62000 | 4.0853 |
| 4.6887 | 1.7299 | 64000 | 4.0723 |
| 4.6678 | 1.7839 | 66000 | 4.0603 |
| 4.6615 | 1.8380 | 68000 | 4.0498 |
| 4.6594 | 1.8920 | 70000 | 4.0389 |
| 4.6426 | 1.9461 | 72000 | 4.0294 |
| 4.6403 | 2.0002 | 74000 | 4.0230 |
| 4.5992 | 2.0542 | 76000 | 4.0132 |
| 4.5991 | 2.1083 | 78000 | 4.0042 |
| 4.5941 | 2.1623 | 80000 | 3.9972 |
| 4.5829 | 2.2164 | 82000 | 3.9910 |
| 4.5789 | 2.2705 | 84000 | 3.9831 |
| 4.5776 | 2.3245 | 86000 | 3.9770 |
| 4.5665 | 2.3786 | 88000 | 3.9703 |
| 4.5657 | 2.4326 | 90000 | 3.9634 |
| 4.5585 | 2.4867 | 92000 | 3.9580 |
| 4.5602 | 2.5407 | 94000 | 3.9504 |
| 4.5518 | 2.5948 | 96000 | 3.9458 |
| 4.547 | 2.6489 | 98000 | 3.9395 |
| 4.5369 | 2.7029 | 100000 | 3.9349 |
| 4.5336 | 2.7570 | 102000 | 3.9294 |
| 4.5324 | 2.8110 | 104000 | 3.9232 |
| 4.5321 | 2.8651 | 106000 | 3.9173 |
| 4.5235 | 2.9192 | 108000 | 3.9129 |
| 4.5152 | 2.9732 | 110000 | 3.9089 |
| 4.4836 | 3.0273 | 112000 | 3.9066 |
| 4.4824 | 3.0813 | 114000 | 3.9024 |
| 4.486 | 3.1354 | 116000 | 3.8977 |
| 4.4815 | 3.1894 | 118000 | 3.8947 |
| 4.484 | 3.2435 | 120000 | 3.8909 |
| 4.4782 | 3.2976 | 122000 | 3.8869 |
| 4.4774 | 3.3516 | 124000 | 3.8842 |
| 4.4769 | 3.4057 | 126000 | 3.8785 |
| 4.4749 | 3.4597 | 128000 | 3.8753 |
| 4.4689 | 3.5138 | 130000 | 3.8706 |
| 4.4659 | 3.5679 | 132000 | 3.8673 |
| 4.4651 | 3.6219 | 134000 | 3.8637 |
| 4.459 | 3.6760 | 136000 | 3.8609 |
| 4.4613 | 3.7300 | 138000 | 3.8576 |
| 4.4543 | 3.7841 | 140000 | 3.8550 |
| 4.4509 | 3.8382 | 142000 | 3.8512 |
| 4.4536 | 3.8922 | 144000 | 3.8488 |
| 4.4418 | 3.9463 | 146000 | 3.8454 |
| 4.4479 | 4.0 | 147988 | 3.8421 |
| 4.3787 | 4.0003 | 148000 | 3.8435 |
| 4.4127 | 4.0544 | 150000 | 3.8421 |
| 4.4093 | 4.1084 | 152000 | 3.8389 |
| 4.419 | 4.1625 | 154000 | 3.8360 |
| 4.4162 | 4.2166 | 156000 | 3.8340 |
| 4.4156 | 4.2706 | 158000 | 3.8318 |
| 4.4191 | 4.3247 | 160000 | 3.8286 |
| 4.4104 | 4.3787 | 162000 | 3.8260 |
| 4.4118 | 4.4328 | 164000 | 3.8236 |
| 4.4117 | 4.4869 | 166000 | 3.8198 |
| 4.4054 | 4.5409 | 168000 | 3.8180 |
| 4.4034 | 4.5950 | 170000 | 3.8168 |
| 4.3982 | 4.6490 | 172000 | 3.8137 |
| 4.4081 | 4.7031 | 174000 | 3.8115 |
| 4.4022 | 4.7571 | 176000 | 3.8077 |
| 4.4069 | 4.8112 | 178000 | 3.8059 |
| 4.4013 | 4.8653 | 180000 | 3.8039 |
| 4.4034 | 4.9193 | 182000 | 3.8017 |
| 4.3962 | 4.9734 | 184000 | 3.8006 |
| 4.3683 | 5.0274 | 186000 | 3.7990 |
| 4.3714 | 5.0815 | 188000 | 3.7978 |
| 4.366 | 5.1356 | 190000 | 3.7950 |
| 4.3646 | 5.1896 | 192000 | 3.7946 |
| 4.3645 | 5.2437 | 194000 | 3.7917 |
| 4.3651 | 5.2977 | 196000 | 3.7898 |
| 4.3714 | 5.3518 | 198000 | 3.7872 |
| 4.3648 | 5.4058 | 200000 | 3.7867 |
| 4.3679 | 5.4599 | 202000 | 3.7845 |
| 4.3621 | 5.5140 | 204000 | 3.7815 |
| 4.3646 | 5.5680 | 206000 | 3.7802 |
| 4.3611 | 5.6221 | 208000 | 3.7775 |
| 4.3621 | 5.6761 | 210000 | 3.7763 |
| 4.366 | 5.7302 | 212000 | 3.7735 |
| 4.3614 | 5.7843 | 214000 | 3.7728 |
| 4.3585 | 5.8383 | 216000 | 3.7705 |
| 4.3622 | 5.8924 | 218000 | 3.7685 |
| 4.3609 | 5.9464 | 220000 | 3.7674 |
| 4.3581 | 6.0005 | 222000 | 3.7672 |
| 4.3221 | 6.0545 | 224000 | 3.7657 |
| 4.3246 | 6.1086 | 226000 | 3.7643 |
| 4.3246 | 6.1627 | 228000 | 3.7628 |
| 4.3287 | 6.2167 | 230000 | 3.7621 |
| 4.3352 | 6.2708 | 232000 | 3.7602 |
| 4.3335 | 6.3248 | 234000 | 3.7585 |
| 4.3316 | 6.3789 | 236000 | 3.7575 |
| 4.3287 | 6.4330 | 238000 | 3.7559 |
| 4.3271 | 6.4870 | 240000 | 3.7536 |
| 4.326 | 6.5411 | 242000 | 3.7533 |
| 4.3287 | 6.5951 | 244000 | 3.7502 |
| 4.3296 | 6.6492 | 246000 | 3.7490 |
| 4.3283 | 6.7032 | 248000 | 3.7468 |
| 4.3332 | 6.7573 | 250000 | 3.7458 |
| 4.3247 | 6.8114 | 252000 | 3.7443 |
| 4.325 | 6.8654 | 254000 | 3.7425 |
| 4.3273 | 6.9195 | 256000 | 3.7411 |
| 4.3252 | 6.9735 | 258000 | 3.7401 |
| 4.2946 | 7.0276 | 260000 | 3.7396 |
| 4.3001 | 7.0817 | 262000 | 3.7390 |
| 4.299 | 7.1357 | 264000 | 3.7383 |
| 4.3005 | 7.1898 | 266000 | 3.7368 |
| 4.298 | 7.2438 | 268000 | 3.7353 |
| 4.3013 | 7.2979 | 270000 | 3.7352 |
| 4.3028 | 7.3519 | 272000 | 3.7335 |
| 4.3002 | 7.4060 | 274000 | 3.7319 |
| 4.2987 | 7.4601 | 276000 | 3.7304 |
| 4.3063 | 7.5141 | 278000 | 3.7298 |
| 4.298 | 7.5682 | 280000 | 3.7287 |
| 4.3013 | 7.6222 | 282000 | 3.7265 |
| 4.3045 | 7.6763 | 284000 | 3.7260 |
| 4.2957 | 7.7304 | 286000 | 3.7243 |
| 4.2968 | 7.7844 | 288000 | 3.7230 |
| 4.2947 | 7.8385 | 290000 | 3.7218 |
| 4.2958 | 7.8925 | 292000 | 3.7205 |
| 4.2976 | 7.9466 | 294000 | 3.7191 |
| 4.2905 | 8.0006 | 296000 | 3.7189 |
| 4.2785 | 8.0547 | 298000 | 3.7186 |
| 4.2761 | 8.1088 | 300000 | 3.7181 |
| 4.2742 | 8.1628 | 302000 | 3.7162 |
| 4.279 | 8.2169 | 304000 | 3.7160 |
| 4.2741 | 8.2709 | 306000 | 3.7154 |
| 4.2749 | 8.3250 | 308000 | 3.7142 |
| 4.2709 | 8.3791 | 310000 | 3.7126 |
| 4.2688 | 8.4331 | 312000 | 3.7117 |
| 4.273 | 8.4872 | 314000 | 3.7105 |
| 4.272 | 8.5412 | 316000 | 3.7093 |
| 4.2731 | 8.5953 | 318000 | 3.7082 |
| 4.2728 | 8.6494 | 320000 | 3.7072 |
| 4.2749 | 8.7034 | 322000 | 3.7058 |
| 4.2682 | 8.7575 | 324000 | 3.7052 |
| 4.2786 | 8.8115 | 326000 | 3.7042 |
| 4.27 | 8.8656 | 328000 | 3.7032 |
| 4.2665 | 8.9196 | 330000 | 3.7019 |
| 4.2728 | 8.9737 | 332000 | 3.7012 |
| 4.2627 | 9.0 | 332973 | 3.7004 |
| 4.2534 | 9.0278 | 334000 | 3.7015 |
| 4.2509 | 9.0818 | 336000 | 3.7010 |
| 4.2517 | 9.1359 | 338000 | 3.7006 |
| 4.2512 | 9.1899 | 340000 | 3.6998 |
| 4.2488 | 9.2440 | 342000 | 3.6986 |
| 4.2482 | 9.2981 | 344000 | 3.6985 |
| 4.255 | 9.3521 | 346000 | 3.6977 |
| 4.2505 | 9.4062 | 348000 | 3.6966 |
| 4.2544 | 9.4602 | 350000 | 3.6961 |
| 4.2515 | 9.5143 | 352000 | 3.6954 |
| 4.2441 | 9.5683 | 354000 | 3.6944 |
| 4.2498 | 9.6224 | 356000 | 3.6940 |
| 4.2485 | 9.6765 | 358000 | 3.6934 |
| 4.2528 | 9.7305 | 360000 | 3.6928 |
| 4.2499 | 9.7846 | 362000 | 3.6921 |
| 4.2521 | 9.8386 | 364000 | 3.6917 |
| 4.2467 | 9.8927 | 366000 | 3.6915 |
| 4.2495 | 9.9468 | 368000 | 3.6911 |
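Assuming the validation loss is the usual mean per-token cross-entropy (in nats), it can be converted to perplexity for easier comparison with other language models; a quick sketch using the final evaluation loss:

```python
import math

# Final validation cross-entropy loss from the table above.
final_eval_loss = 3.6910

# Perplexity is exp(cross-entropy) when the loss is measured in nats.
perplexity = math.exp(final_eval_loss)
print(round(perplexity, 2))  # ~40.08
```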
### Framework versions
- Transformers 4.57.1
- Pytorch 2.9.1+cu128
- Datasets 3.6.0
- Tokenizers 0.22.1