Spaces-explorers

AI & ML interests

Contributors who are invited to beta-test our next big feature! Contact us if you want to join this team :-)

Recent Activity

sizzle53 authored a paper about 1 month ago

Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench

sizzle53 authored a paper about 1 month ago

AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications

victor submitted a paper about 2 months ago

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

alceballosa

authored a paper 2 months ago

Robust automatic brain vessel segmentation in 3D CTA scans using dynamic 4D-CTA data

Paper • 2602.00391 • Published Jan 30

giux78

posted an update 3 months ago

Together with @mferraretto and @efederici we released #Nesso-4B, a new model specialized for agentic workflows.

mii-llm/nesso-4B

#Nesso-4B is a fine-tuned version of Qwen-4B, trained on a highly curated and balanced dataset designed specifically for multilingual agentic workflows and conversational use cases.

As shown in the video below, we simulate the new “cowork” from #Anthropic without any data sharing, all running on a consumer device. The model can be used to build agentic behavior in #privateAI environments.

Not every problem requires super intelligence: in many cases, intelligence at the edge is more than enough.

#Nesso4B #AgenticAI #PrivateAI #EdgeAI #OnDeviceAI

SivilTaram

authored a paper 5 months ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 132

wissamantoun

authored 2 papers 6 months ago

From Text to Source: Results in Detecting Large Language Model-Generated Content

Paper • 2309.13322 • Published Sep 23, 2023 • 1

Gaperon: A Peppered English-French Generative Language Model Suite

Paper • 2510.25771 • Published Oct 29, 2025 • 16

SivilTaram

authored a paper 8 months ago

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2, 2025 • 84

SivilTaram

authored 2 papers 9 months ago

SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

Paper • 2507.12415 • Published Jul 16, 2025 • 43

First Return, Entropy-Eliciting Explore

Paper • 2507.07017 • Published Jul 9, 2025 • 24

SivilTaram

authored a paper 10 months ago

ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention

Paper • 2507.01004 • Published Jul 1, 2025 • 10

SivilTaram

authored a paper 11 months ago

General-Reasoner: Advancing LLM Reasoning Across All Domains

Paper • 2505.14652 • Published May 20, 2025 • 24

wissamantoun

authored a paper about 1 year ago

ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance

Paper • 2504.08716 • Published Apr 11, 2025 • 9

giux78

posted an update about 1 year ago

The LLAMA4 release highlights the importance of political and social bias. According to Meta's own evaluation, described in the release blog post:
- Refusals on contentious prompts dropped from 7% (#LLAMA 3.3) to under 2%
- Unequal response refusals are now under 1%
- Political lean bias is said to be halved compared to #LLaMA 3.3 and comparable to Grok

However, a few weeks ago @efederici, @mferraretto, @FinancialSupport and I released an independent open source benchmark called Propaganda to measure political bias in LLMs: https://github.com/mii-llm/propaganda

In the chart below, we evaluated multiple leading models on the basis of ratings across a range of prompts designed to expose ideological leanings.

Despite Meta’s stated neutrality goals, LLAMA4 ranks at the very top in terms of total ratings aligned with a clear ideological bias. The models were tested on their ability to respond even-handedly to politically sensitive prompts. LLaMA 4 scored even higher than models known for strong alignment policies like GPT-4o.

LLMs may be refusing less, but they still show bias through content framing. This suggests that refusal rates alone are not a sufficient measure of ideological bias. Relying solely on internal evaluations from AI labs also raises concerns about transparency and objectivity.

SivilTaram

authored 5 papers about 1 year ago

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

Paper • 2411.07763 • Published Nov 12, 2024 • 2

posted an update about 1 year ago

This is truly an inspirational story. Please help us spread the word, @clem, @thomwolf, and everyone who supports open source AI.

A few weeks ago, @mmuffo94 and @cittiberto from indigo_ai launched the Chatbot Arena for the Italian language: https://indigo.ai/it/chatbot-arena-italia/.

To our surprise, among the top-ranked models is mii-llm/maestrale-chat-v0.4-beta, a carefully fine-tuned version of mistralai/Mistral-7B-v0.1, developed by @efederici and @mferraretto from mii-llm, and released nearly a year ago.

At this very moment, as shown in the screenshot, mii-llm/maestrale-chat-v0.4-beta is ranked 8th, right between ChatGPT-4.5 and ChatGPT-4o.

It's likely that, for several months, the best Italian-speaking LLM has been an open source 7B model created by open source contributors, and hardly anyone knew it.

SivilTaram

authored 2 papers about 1 year ago

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Paper • 2503.18892 • Published Mar 24, 2025 • 31

SkyLadder: Better and Faster Pretraining via Context Window Scheduling

Paper • 2503.15450 • Published Mar 19, 2025 • 12

Team members 1,350