Phil
phil111
Doesn't stop thinking.
1
9
#3 opened 3 months ago by phil111
Impressive Broad Knowledge
5
8
#12 opened 5 months ago by phil111
This just trades general performance for domain specific gains.
16
11
#3 opened 5 months ago by phil111
Please stop blindly trusting and reporting Alibaba's scores.
7
2
#1 opened 5 months ago by phil111
Weird responses
12
#10 opened 5 months ago by vparth7
Gemma A3B
6
13
#3 opened 5 months ago by Maria99934
gpt-oss is actually good. even on less common benchmark
7
2
#109 opened 5 months ago by groupfairnessllm
model quality issues
5
#92 opened 5 months ago by TheBigBlockPC
Terrible instruction following
1
4
#3 opened 5 months ago by denisalpino
4b model with an 84.2 MMLU-Redux score?
3
1
#2 opened 5 months ago by phil111
This model is unbelievably ignorant.
41
15
#14 opened 5 months ago by phil111
Knowledge limitations
2
5
#25 opened 5 months ago by hexess
An Improvement, But Q3 30b Still Has Very Little General Knowledge
3
11
#2 opened 5 months ago by phil111
Test Scores Can Be Misleading
1
8
#8 opened 5 months ago by phil111
More Knowledge, But Hard To Extract
1
#29 opened 5 months ago by phil111
The SimpleQA score of the model is WAY off.
4
3
#2 opened 6 months ago by phil111
Qwen3 is great, but could be better.
9
25
#18 opened 8 months ago by phil111
SimpleQA jumped from 12.2 to 54.3?
22
25
#4 opened 6 months ago by phil111
SimpleQA score?
#6 opened 6 months ago by phil111
That SimpleQA score looks too good to be true.
12
19
#1 opened 6 months ago by phil111