HN Today

Qwen3.5 Fine-Tuning Guide – Unsloth Documentation

Unsloth has released a detailed guide for fine-tuning the Qwen3.5 family of large language models, covering both text and vision tasks. Developers can fine-tune these models 1.5x faster and with 50% less VRAM than traditional methods, making high-performance LLM customization more accessible. This technical deep-dive is popular on Hacker News because it addresses a key pain point for AI practitioners: efficiently leveraging powerful models without exorbitant hardware requirements.

Score: 13 · Comments: 2 · Highest Rank: #5 · Time on Front Page: 9h · First Seen: Mar 4, 2:00 PM · Last Seen: Mar 4, 10:00 PM

The Lowdown

This guide from Unsloth details how to efficiently fine-tune the Qwen3.5 series of Large Language Models (LLMs), including their multimodal capabilities. Unsloth positions itself as a tool that dramatically reduces the computational overhead for fine-tuning, making these powerful models more accessible to a wider range of users and hardware configurations. The documentation covers various aspects from basic setup to advanced considerations for different model sizes and deployment targets.

  • Performance Boosts: Unsloth accelerates Qwen3.5 training by 1.5x and cuts VRAM usage by 50% compared to standard FlashAttention 2 (FA2) setups.
  • Model Support: It supports the entire Qwen3.5 family (0.8B to 122B-A10B), including vision and text fine-tuning, with specific VRAM requirements listed (e.g., 27B requires 56GB for BF16 LoRA).
  • Fine-tuning Specifics: Recommendations include using transformers v5, preserving reasoning ability by mixing example types, and noting that Full Fine-Tuning (FFT) uses 4x more VRAM. QLoRA (4-bit) is explicitly not recommended due to quantization differences.
  • MoE Models: For Mixture of Experts (MoE) models like Qwen3.5-35B-A3B and 122B-A10B, BF16 setups (LoRA or FFT) are preferred, and Unsloth's MoE kernels are enabled by default.
  • Multimodal Capabilities: Qwen3.5 is a Causal Language Model with a Vision Encoder, and Unsloth supports vision fine-tuning, allowing users to fine-tune specific layers (vision, language, attention, MLP).
  • Troubleshooting: Tips for Out-Of-Memory (OOM) errors involve reducing batch size or sequence length.
  • Export and Deployment: Fine-tuned models can be exported to various formats like GGUF (for llama.cpp, Ollama, LM Studio), vLLM, or uploaded directly to Hugging Face, with warnings about matching chat templates during inference.

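The quoted VRAM figures (e.g., 56GB for a 27B model with BF16 LoRA) can be sanity-checked with a simple back-of-the-envelope calculation. The sketch below is an illustrative rule of thumb, not Unsloth's exact accounting: BF16 weights cost 2 bytes per parameter, plus a small margin for LoRA adapters, gradients, and activations (the 4% overhead figure is an assumption).

```python
def bf16_lora_vram_gb(params_billion: float, overhead: float = 0.04) -> int:
    """Rough VRAM estimate for BF16 LoRA fine-tuning (rule of thumb only).

    Base weights in BF16 take 2 bytes per parameter; the overhead margin
    (an assumed 4%) stands in for LoRA adapters, gradients, and activations.
    """
    weights_gb = params_billion * 2  # BF16 = 2 bytes per parameter
    return round(weights_gb * (1 + overhead))

# A 27B model: 27 * 2 = 54 GB of weights, ~56 GB with margin,
# which lines up with the guide's quoted requirement.
```

This also explains the bullet on Full Fine-Tuning: FFT keeps full-precision optimizer state and gradients for every parameter, which is where the roughly 4x VRAM multiplier comes from.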
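The OOM advice (reduce batch size or sequence length) can be automated with a simple backoff loop. A minimal sketch with a generic `train_fn` callable (a hypothetical stand-in for a real training launch); `MemoryError` stands in for `torch.cuda.OutOfMemoryError`, and a real run would also consider shortening sequences or raising gradient accumulation to keep the effective batch size.

```python
def train_with_oom_backoff(train_fn, batch_size=8, min_batch=1):
    """Retry training with a halved batch size whenever memory runs out.

    train_fn is any callable taking a batch size; MemoryError is used here
    as a portable stand-in for the CUDA out-of-memory exception.
    """
    while batch_size >= min_batch:
        try:
            return train_fn(batch_size)
        except MemoryError:
            batch_size //= 2  # the guide's first remedy: shrink the batch
    raise RuntimeError("Out of memory even at the minimum batch size")
```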
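The warning about matching chat templates during inference is easy to illustrate. The sketch below uses plain format strings rather than the real Jinja chat templates from `transformers`, but the point is the same: the identical messages render to different prompt text under different templates, so a model fine-tuned on one template sees out-of-distribution input under another. The ChatML-style markers shown are the style Qwen models use; the alternative template is hypothetical.

```python
def render(messages, template):
    """Render a message list to prompt text with a simple format-string
    template (a toy stand-in for tokenizer.apply_chat_template)."""
    return "".join(
        template.format(role=m["role"], content=m["content"]) for m in messages
    )

CHATML_STYLE = "<|im_start|>{role}\n{content}<|im_end|>\n"  # Qwen-style ChatML
OTHER_STYLE = "[{role}]: {content}\n"                        # hypothetical mismatch
```

If the serving stack (llama.cpp, Ollama, vLLM, etc.) applies `OTHER_STYLE` to a model trained on `CHATML_STYLE`, every prompt differs from what the model saw in training, which typically degrades output quality.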
In essence, Unsloth provides a streamlined and highly optimized pathway for developers to customize Qwen3.5 models, addressing the critical challenges of speed and memory consumption in the rapidly evolving field of large language models.
