qwen-tool
This model is a fine-tuned version of Qwen/Qwen3-4B, optimized for complex function calling and multi-step tool use via the Model Context Protocol (MCP).
The model was aligned using Proximal Policy Optimization (PPO) in a closed-loop agentic environment. It uses execution-based feedback from an MCP server to reduce tool hallucinations, adhere to strict JSON formatting, and self-correct based on execution error states.
Model Details
- Developed by: Igriscodes
- Base Model: `Qwen/Qwen3-4B`
- License: Mozilla Public License 2.0 (MPL 2.0)
- Training Framework: Hugging Face `trl` & `peft` (LoRA)
- Alignment Method: PPO (Proximal Policy Optimization) with Execution-Based Reward Guidance
Intended Uses & Limitations
Intended Use Cases
- Structured Tool Calling: Interfacing natively with Model Context Protocol (MCP) servers.
- Multi-step Agentic Tasks: Iterative problem-solving across math, web searching, database queries, and data processing.
- Error-Resilient Agents: Handling tool-execution errors gracefully by rewriting payload schemas based on environment exceptions.
Training Architecture & Alignment Loop
The model was trained as the Policy (Actor) within a custom Gymnasium environment (`MCPGymEnv`). The environment runs an execution loop between the model's textual outputs and a backend mock MCP server.
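The loop described above can be sketched roughly as follows. This is an illustration only, not the actual `MCPGymEnv`: the class names `MockMCPServer` and `MCPGymEnvSketch`, the `{"name": ..., "arguments": ...}` call schema, the `TOOL_RESULT`/`TOOL_ERROR` feedback strings, and the `max_turns` limit are all assumptions. The sketch mirrors the Gymnasium `reset`/`step` return signatures without importing the library, and it only emits the terminal success reward; the penalty cases are omitted here for brevity.

```python
import json


class MockMCPServer:
    """Hypothetical stand-in for the mock MCP backend: tool name -> callable."""

    def __init__(self, tools):
        self.tools = tools

    def call(self, name, arguments):
        return self.tools[name](**arguments)


class MCPGymEnvSketch:
    """Mirrors Gymnasium's reset/step signatures (assumed design, not the real env)."""

    def __init__(self, server, task_prompt, is_done, max_turns=4):
        self.server = server
        self.task_prompt = task_prompt
        self.is_done = is_done        # callable deciding whether a tool result solves the task
        self.max_turns = max_turns    # assumed episode length limit

    def reset(self):
        self.turns = 0
        self.transcript = [self.task_prompt]
        return "\n".join(self.transcript), {}

    def step(self, action):
        """`action` is the raw text emitted by the policy for this turn."""
        self.turns += 1
        terminated = False
        try:
            call = json.loads(action)
            result = self.server.call(call["name"], call["arguments"])
            feedback = f"TOOL_RESULT: {result!r}"
            terminated = self.is_done(result)
        except Exception as exc:
            # The error text is fed back so the policy can repair its next call.
            feedback = f"TOOL_ERROR: {type(exc).__name__}: {exc}"
        self.transcript += [action, feedback]
        truncated = not terminated and self.turns >= self.max_turns
        reward = 10.0 if terminated else 0.0  # penalties omitted in this sketch
        return "\n".join(self.transcript), reward, terminated, truncated, {}
```

Because a failed call is appended to the observation rather than ending the episode, the policy sees the exception text on its next turn, which is what makes self-correction learnable in a setup like this.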
Reward Specification Matrix
The PPO agent was optimized against a dense, feedback-driven execution reward model:
| Trigger Status | Reward | Evaluation Logic |
|---|---|---|
| Success | +10.0 | Tool executed cleanly; returned data matches the expected task state. |
| Tool Execution | 0.0 | Tool ran successfully, but the overarching objective is incomplete. |
| Tool Error | -0.5 | Target tool was hit, but threw a runtime exception (e.g., bad arguments). |
| Invalid JSON | -0.8 | Failed to output a syntactically valid JSON tool-call schema. |
| Structural Fail | -1.0 | Severe divergence from agentic system instructions or tool hallucination. |
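As a minimal sketch, the matrix could be applied to a single model output like this. The reward values come from the table; everything else is assumed for illustration: the function names `classify_step` and `score_step`, the `{"name": ..., "arguments": ...}` call schema, the order in which the checks run, and the `is_goal_state` callback standing in for the "expected task state" comparison.

```python
import json

# Reward values copied from the table above.
REWARDS = {
    "success": 10.0,
    "tool_execution": 0.0,
    "tool_error": -0.5,
    "invalid_json": -0.8,
    "structural_fail": -1.0,
}


def classify_step(raw_output, tools, is_goal_state):
    """Return the trigger status for one model output (assumed check order)."""
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError:
        return "invalid_json"

    # Valid JSON that does not name a registered tool is treated as a
    # structural failure; this covers hallucinated tool names.
    name = call.get("name") if isinstance(call, dict) else None
    if not isinstance(name, str) or name not in tools:
        return "structural_fail"

    try:
        result = tools[name](**call.get("arguments", {}))
    except Exception:
        # The tool was reached but raised, e.g. because of bad arguments.
        return "tool_error"

    return "success" if is_goal_state(result) else "tool_execution"


def score_step(raw_output, tools, is_goal_state):
    """Map one model output to its scalar reward."""
    return REWARDS[classify_step(raw_output, tools, is_goal_state)]
```

For example, with `tools = {"add": lambda a, b: a + b}`, a call that omits `b` reaches the tool and raises, so it falls under Tool Error, while a call to an unregistered tool name falls under Structural Fail.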
Hyperparameters & Efficiency Stack
- Quantization: 4-bit NormalFloat (NF4) via `bitsandbytes` (for base model loading).
- PEFT Adaptation: LoRA targeted all linear layers (`q_proj`, `v_proj`, `k_proj`, `o_proj`, etc.).
- Memory Optimization: 8-bit Paged AdamW optimizer, gradient checkpointing, and parallel rollout sampling to balance the Actor-Critic-Reference model triplet footprint.
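The stack above might map onto library settings roughly as follows. This is a sketch, not the author's training script. It assumes PEFT's `"all-linear"` shorthand matches "all linear layers", and that the optimizer settings are passed to a `TrainingArguments`-style config. Values this card does not state (LoRA rank, alpha, dropout, learning rate, compute dtype) are deliberately left at library defaults, and `build_configs` is a hypothetical helper.

```python
# Keyword arguments mirroring the bullets above; only settings named in
# this card are filled in.
QUANT_KWARGS = {
    "load_in_4bit": True,           # load the base model in 4-bit
    "bnb_4bit_quant_type": "nf4",   # NormalFloat4 quantization
}

LORA_KWARGS = {
    "target_modules": "all-linear",  # PEFT shorthand (assumed equivalent)
    "task_type": "CAUSAL_LM",
}

# Intended for transformers.TrainingArguments or a config derived from it.
OPTIM_KWARGS = {
    "optim": "paged_adamw_8bit",     # 8-bit paged AdamW
    "gradient_checkpointing": True,
}


def build_configs():
    """Instantiate the config objects; requires transformers, peft, bitsandbytes."""
    # Imported lazily so the dictionaries above can be inspected without
    # the heavy dependencies installed.
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(**QUANT_KWARGS), LoraConfig(**LORA_KWARGS)
```

The quantization config would typically be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained`, with the LoRA config applied to the loaded model afterwards.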
Acknowledgements
We express our gratitude to the following organizations, communities, and tools that made this project possible:
- Qwen (Alibaba Cloud) - For providing the foundational Qwen3 model weights and architecture.
- Hugging Face - For the incredible ecosystem and libraries used to load, manage, and train the model.
- PyTorch - For the robust deep learning framework that powered the underlying tensor computations and GPU acceleration during fine-tuning.
- Google Gemini 3 - For providing assistance in optimizing and debugging the fine-tuning scripts.
License
Mozilla Public License Version 2.0 - Feel free to use and modify