
gpt2moe_het_1000mb

This model is a fine-tuned version of on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 3.6910
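Since the reported loss is a token-level cross-entropy, it can be converted to perplexity by exponentiating it. A minimal sketch (the loss value is taken from the evaluation result above; the interpretation as mean cross-entropy in nats is the standard Trainer convention and is assumed here):

```python
import math

eval_loss = 3.6910  # final validation loss from the table above

# Perplexity is exp of the mean cross-entropy loss (in nats).
perplexity = math.exp(eval_loss)
print(f"perplexity ~= {perplexity:.2f}")  # roughly 40
```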

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 36996
  • training_steps: 369967
  • mixed_precision_training: Native AMP
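Two of the values above are derived: the total train batch size is the per-device batch size times the gradient accumulation steps (8 × 8 = 64), and the linear scheduler warms up over the first 36,996 steps and then decays to zero at step 369,967. A minimal stdlib-only sketch of that schedule (the formula mirrors the standard linear warmup/decay used by Transformers' linear scheduler; the function name is illustrative, not from the training code):

```python
def lr_at_step(step, base_lr=1e-4, warmup_steps=36996, total_steps=369967):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Effective batch size: per-device batch * gradient accumulation steps.
effective_batch = 8 * 8  # = 64, matching total_train_batch_size above
```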

Training results

Training Loss Epoch Step Validation Loss
No log 0 0 11.0422
8.5391 0.0541 2000 7.8813
7.4674 0.1081 4000 6.8562
6.924 0.1622 6000 6.3113
6.5184 0.2162 8000 5.9048
6.2161 0.2703 10000 5.5947
5.9768 0.3244 12000 5.3563
5.8003 0.3784 14000 5.1644
5.6375 0.4325 16000 5.0059
5.506 0.4865 18000 4.8735
5.397 0.5406 20000 4.7669
5.3148 0.5946 22000 4.6759
5.2398 0.6487 24000 4.6121
5.181 0.7028 26000 4.5544
5.1326 0.7568 28000 4.5058
5.0889 0.8109 30000 4.4615
5.0476 0.8649 32000 4.4204
5.0093 0.9190 34000 4.3845
4.9781 0.9731 36000 4.3574
4.9358 1.0271 38000 4.3235
4.9002 1.0812 40000 4.2934
4.8755 1.1352 42000 4.2641
4.8536 1.1893 44000 4.2388
4.8312 1.2433 46000 4.2155
4.8083 1.2974 48000 4.1960
4.7885 1.3515 50000 4.1745
4.7715 1.4055 52000 4.1567
4.7567 1.4596 54000 4.1401
4.7436 1.5136 56000 4.1255
4.7208 1.5677 58000 4.1110
4.7126 1.6218 60000 4.0958
4.697 1.6758 62000 4.0853
4.6887 1.7299 64000 4.0723
4.6678 1.7839 66000 4.0603
4.6615 1.8380 68000 4.0498
4.6594 1.8920 70000 4.0389
4.6426 1.9461 72000 4.0294
4.6403 2.0002 74000 4.0230
4.5992 2.0542 76000 4.0132
4.5991 2.1083 78000 4.0042
4.5941 2.1623 80000 3.9972
4.5829 2.2164 82000 3.9910
4.5789 2.2705 84000 3.9831
4.5776 2.3245 86000 3.9770
4.5665 2.3786 88000 3.9703
4.5657 2.4326 90000 3.9634
4.5585 2.4867 92000 3.9580
4.5602 2.5407 94000 3.9504
4.5518 2.5948 96000 3.9458
4.547 2.6489 98000 3.9395
4.5369 2.7029 100000 3.9349
4.5336 2.7570 102000 3.9294
4.5324 2.8110 104000 3.9232
4.5321 2.8651 106000 3.9173
4.5235 2.9192 108000 3.9129
4.5152 2.9732 110000 3.9089
4.4836 3.0273 112000 3.9066
4.4824 3.0813 114000 3.9024
4.486 3.1354 116000 3.8977
4.4815 3.1894 118000 3.8947
4.484 3.2435 120000 3.8909
4.4782 3.2976 122000 3.8869
4.4774 3.3516 124000 3.8842
4.4769 3.4057 126000 3.8785
4.4749 3.4597 128000 3.8753
4.4689 3.5138 130000 3.8706
4.4659 3.5679 132000 3.8673
4.4651 3.6219 134000 3.8637
4.459 3.6760 136000 3.8609
4.4613 3.7300 138000 3.8576
4.4543 3.7841 140000 3.8550
4.4509 3.8382 142000 3.8512
4.4536 3.8922 144000 3.8488
4.4418 3.9463 146000 3.8454
4.4479 4.0 147988 3.8421
4.3787 4.0003 148000 3.8435
4.4127 4.0544 150000 3.8421
4.4093 4.1084 152000 3.8389
4.419 4.1625 154000 3.8360
4.4162 4.2166 156000 3.8340
4.4156 4.2706 158000 3.8318
4.4191 4.3247 160000 3.8286
4.4104 4.3787 162000 3.8260
4.4118 4.4328 164000 3.8236
4.4117 4.4869 166000 3.8198
4.4054 4.5409 168000 3.8180
4.4034 4.5950 170000 3.8168
4.3982 4.6490 172000 3.8137
4.4081 4.7031 174000 3.8115
4.4022 4.7571 176000 3.8077
4.4069 4.8112 178000 3.8059
4.4013 4.8653 180000 3.8039
4.4034 4.9193 182000 3.8017
4.3962 4.9734 184000 3.8006
4.3683 5.0274 186000 3.7990
4.3714 5.0815 188000 3.7978
4.366 5.1356 190000 3.7950
4.3646 5.1896 192000 3.7946
4.3645 5.2437 194000 3.7917
4.3651 5.2977 196000 3.7898
4.3714 5.3518 198000 3.7872
4.3648 5.4058 200000 3.7867
4.3679 5.4599 202000 3.7845
4.3621 5.5140 204000 3.7815
4.3646 5.5680 206000 3.7802
4.3611 5.6221 208000 3.7775
4.3621 5.6761 210000 3.7763
4.366 5.7302 212000 3.7735
4.3614 5.7843 214000 3.7728
4.3585 5.8383 216000 3.7705
4.3622 5.8924 218000 3.7685
4.3609 5.9464 220000 3.7674
4.3581 6.0005 222000 3.7672
4.3221 6.0545 224000 3.7657
4.3246 6.1086 226000 3.7643
4.3246 6.1627 228000 3.7628
4.3287 6.2167 230000 3.7621
4.3352 6.2708 232000 3.7602
4.3335 6.3248 234000 3.7585
4.3316 6.3789 236000 3.7575
4.3287 6.4330 238000 3.7559
4.3271 6.4870 240000 3.7536
4.326 6.5411 242000 3.7533
4.3287 6.5951 244000 3.7502
4.3296 6.6492 246000 3.7490
4.3283 6.7032 248000 3.7468
4.3332 6.7573 250000 3.7458
4.3247 6.8114 252000 3.7443
4.325 6.8654 254000 3.7425
4.3273 6.9195 256000 3.7411
4.3252 6.9735 258000 3.7401
4.2946 7.0276 260000 3.7396
4.3001 7.0817 262000 3.7390
4.299 7.1357 264000 3.7383
4.3005 7.1898 266000 3.7368
4.298 7.2438 268000 3.7353
4.3013 7.2979 270000 3.7352
4.3028 7.3519 272000 3.7335
4.3002 7.4060 274000 3.7319
4.2987 7.4601 276000 3.7304
4.3063 7.5141 278000 3.7298
4.298 7.5682 280000 3.7287
4.3013 7.6222 282000 3.7265
4.3045 7.6763 284000 3.7260
4.2957 7.7304 286000 3.7243
4.2968 7.7844 288000 3.7230
4.2947 7.8385 290000 3.7218
4.2958 7.8925 292000 3.7205
4.2976 7.9466 294000 3.7191
4.2905 8.0006 296000 3.7189
4.2785 8.0547 298000 3.7186
4.2761 8.1088 300000 3.7181
4.2742 8.1628 302000 3.7162
4.279 8.2169 304000 3.7160
4.2741 8.2709 306000 3.7154
4.2749 8.3250 308000 3.7142
4.2709 8.3791 310000 3.7126
4.2688 8.4331 312000 3.7117
4.273 8.4872 314000 3.7105
4.272 8.5412 316000 3.7093
4.2731 8.5953 318000 3.7082
4.2728 8.6494 320000 3.7072
4.2749 8.7034 322000 3.7058
4.2682 8.7575 324000 3.7052
4.2786 8.8115 326000 3.7042
4.27 8.8656 328000 3.7032
4.2665 8.9196 330000 3.7019
4.2728 8.9737 332000 3.7012
4.2627 9.0 332973 3.7004
4.2534 9.0278 334000 3.7015
4.2509 9.0818 336000 3.7010
4.2517 9.1359 338000 3.7006
4.2512 9.1899 340000 3.6998
4.2488 9.2440 342000 3.6986
4.2482 9.2981 344000 3.6985
4.255 9.3521 346000 3.6977
4.2505 9.4062 348000 3.6966
4.2544 9.4602 350000 3.6961
4.2515 9.5143 352000 3.6954
4.2441 9.5683 354000 3.6944
4.2498 9.6224 356000 3.6940
4.2485 9.6765 358000 3.6934
4.2528 9.7305 360000 3.6928
4.2499 9.7846 362000 3.6921
4.2521 9.8386 364000 3.6917
4.2467 9.8927 366000 3.6915
4.2495 9.9468 368000 3.6911

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.1+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.1
  • Model size: 0.2B params
  • Tensor type: F32
  • Format: Safetensors