PANews reported on March 30 that Alibaba's Qianwen has launched Qwen3.5-Omni, its omni-modal large model. The Qwen3.5-Omni series includes Instruct versions in three sizes (Plus, Flash, and Light), supports a 256k context length, and accepts more than 10 hours of audio input and more than 400 seconds of 720p (1 FPS) audio-video input. The model was natively pre-trained on large volumes of text and visual data and more than 100 million hours of audio-video data, and is reported to show strong omni-modal perception and generation capabilities. Compared with Qwen3-Omni, Qwen3.5-Omni significantly improves multilingual support, covering speech recognition for 113 languages and dialects and speech generation for 36 languages and dialects.


