The post Qwen 3.5 Omni: Alibaba’s AI Model Can Now Hear, Watch, and Clone Your Voice appeared on BitcoinEthereumNews.com.

Qwen 3.5 Omni: Alibaba’s AI Model Can Now Hear, Watch, and Clone Your Voice

2026/03/31 04:07
Reading time: 5 min
For feedback or concerns about this content, please contact us at crypto.news@mexc.com.

In brief

  • Alibaba’s Qwen 3.5 Omni brings true real-time omnimodal AI to the frontier race.
  • Native audio-visual processing beats stitched multimodal pipelines in speed and coherence.
  • Voice cloning, semantic interruption, and vibe coding signal a shift toward fully interactive AI agents.

Alibaba just dropped its most ambitious AI upgrade yet.

The company’s Qwen team released Qwen 3.5 Omni on Sunday, a new version of its “omnimodal” AI that processes text, images, audio, and video simultaneously and talks back in real time across 36 languages, putting the model on the same battlefield as today’s state-of-the-art foundation models.

“Omni” isn’t just a marketing buzzword here. Most AI models you interact with are primarily text-in, text-out systems. Some handle images, some handle voice. Qwen 3.5 Omni handles all of them natively, at the same time, without the need to convert everything to text through third-party tools.

The new model comes in three sizes—Plus, Flash, and Light—all supporting a 256,000-token context window, small by today’s standards. It was trained on over 100 million hours of audio-visual data—a scale that puts it in a different weight class from most competitors.

Qwen 3.5 Omni is an evolution of Qwen 3 Omni Flash, Alibaba’s previous omnimodal model released in December 2025. That version already impressed with its ability to process video and audio simultaneously—it could handle image editing instructions combining multiple visual inputs in ways competitors couldn’t—and streamed voice responses with latency as low as 234 milliseconds.

It was also the first model to offer an alternative to Google’s NotebookLM. It managed a workable result, but the quality wasn’t on par with Google’s offering.

Qwen 3.5 Omni takes all of that and adds a longer context window, better reasoning, a much wider language library, and a set of real-time interaction features the previous generation didn’t have.

The headline upgrade is what happens when you actually talk to it. Qwen 3.5 Omni now supports semantic interruption: it can tell the difference between you saying “uh-huh” mid-sentence and actually wanting to cut in, so it won’t stop mid-thought every time someone coughs in the background, making spoken interaction more seamless.
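Under the hood, a system like this has to decide, for each user utterance, whether it is a backchannel (keep talking) or a genuine interruption (stop and yield the floor). Alibaba hasn’t published how Qwen does this; purely as an illustration of the decision involved, here is a minimal keyword-based sketch—every name and heuristic in it is hypothetical, and a real system would classify over audio and text features:

```python
# Toy sketch of semantic interruption detection: classify a user
# utterance as a backchannel ("keep talking") or a real interruption
# ("stop and yield the floor"). This is NOT Alibaba's method -- just a
# minimal heuristic to illustrate the decision a voice agent must make.

BACKCHANNELS = {"uh-huh", "mm-hmm", "yeah", "right", "ok", "okay", "i see"}

def should_yield_floor(utterance: str) -> bool:
    """Return True if the assistant should stop speaking."""
    text = utterance.lower().strip().rstrip(".!?")
    # Short acknowledgements are treated as backchannels, not interruptions;
    # anything else counts as the user genuinely wanting to cut in.
    return text not in BACKCHANNELS
```

A keyword list obviously breaks on anything ambiguous; the point is only that “semantic interruption” means making this call from the content of the utterance rather than from the mere presence of sound.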

A new technique called ARIA, short for Adaptive Rate Interleave Alignment, also fixes a subtle but persistent annoyance: AI systems that garble numbers or unusual words when reading aloud. ARIA dynamically syncs text and speech to keep output natural and accurate.
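The annoyance ARIA targets—digits and unusual tokens getting garbled when read aloud—is a classic text-normalization problem in speech synthesis. The sketch below is not ARIA (which, per the article, dynamically syncs the text and speech streams); it only shows the simplest form of the text/speech mismatch at stake, expanding raw digits into spoken words before synthesis:

```python
# Illustration of the problem class ARIA addresses: before speech
# synthesis, raw digits must be expanded into words, or the audio side
# can garble them. This naive digit-by-digit normalizer is NOT ARIA --
# it just makes the text/speech alignment problem concrete.
import re

UNITS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine"]

def spell_digits(text: str) -> str:
    """Replace each digit with its spoken word, digit by digit."""
    return re.sub(r"\d", lambda m: UNITS[int(m.group())] + " ", text).strip()
```

Even this tiny example shows why static normalization falls short—“42” should usually be “forty-two,” not “four two”—which is why a dynamic alignment between the text and audio streams matters.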

Then there’s voice cloning. Users can upload a voice sample and have the model adopt that voice in its responses, a feature that puts Qwen in direct competition with ElevenLabs and other dedicated voice tools. We weren’t able to test it, though: for now, the feature is only available via API.

On multilingual voice stability benchmarks, Qwen 3.5 Omni Plus beat ElevenLabs, GPT-Audio, and Minimax across 20 languages. The model also now supports real-time web search, meaning it can answer questions about breaking news or live market data without pretending it already knows.

The team is also highlighting what it calls “Audio-Visual Vibe Coding”: the model can watch a screen recording or video of a coding task and write functional code based purely on what it sees and hears, no text prompt required. It’s a small preview of how AI assistants might eventually operate inside your workflow rather than alongside it.

To understand what “omnimodal” actually means in practice, we ran a quick test: We fed both Qwen 3.5 Omni and ChatGPT 5.4 in “thinking” mode the same YouTube Short—a clip of Dastan’s president (Dastan is Decrypt’s parent company) and commentator Farokh discussing breaking news. Qwen 3.5 Omni processed the video natively and returned a full analysis in about one minute: who was speaking, what they were discussing, and a substantive comment on the topic based on its own knowledge of the subject area.

ChatGPT 5.4, which is not omnimodal, had to make do with what it had. It extracted frames from the video, ran them through a vision model, used Whisper to transcribe the audio, and applied an OCR tool to read embedded subtitles—three separate processes stitched together to approximate what Qwen 3.5 Omni does in a single pass. The result took nine minutes, and that’s under ideal conditions: a well-lit video with clean audio and burned-in subtitles. Real-world content rarely offers all three.
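The structural difference between the two approaches can be sketched in a few lines. Everything below is a stand-in stub—none of these functions are real APIs—but it captures the contrast the test exposed: three sequential tool calls whose outputs must be stitched together, versus one native pass over audio and frames jointly:

```python
# Hypothetical contrast between a stitched multimodal pipeline and a
# single omnimodal pass. Every function here is a stand-in stub; the
# point is the shape of the work (three stages vs. one), not any real API.

def stitched_pipeline(video: dict) -> dict:
    """Approximate video understanding with three separate tools."""
    frames_desc = f"vision({video['frames']})"   # stage 1: vision model on frames
    transcript = f"asr({video['audio']})"        # stage 2: speech-to-text
    subtitles = f"ocr({video['frames']})"        # stage 3: OCR on burned-in text
    return {"stages": 3, "inputs": [frames_desc, transcript, subtitles]}

def omnimodal_pass(video: dict) -> dict:
    """One native pass over frames and audio together."""
    return {"stages": 1,
            "inputs": [f"omni({video['frames']}, {video['audio']})"]}
```

Each extra stage is a place where timing, context, and cross-modal cues (who is speaking while which slide is on screen) can be lost, which is why the stitched result degrades on anything less than ideal footage.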

In our quick tests across multiple inputs, the model also handled prompts in Spanish, Portuguese, and English without issue—switching languages mid-conversation without losing context.

On standard benchmarks, Qwen 3.5 Omni Plus outperformed Gemini 3.1 Pro on general audio understanding, reasoning, and translation tasks, and matched it on audio-visual comprehension. Speech recognition now covers 113 languages and dialects—up from 19 in the previous generation.

This is Alibaba’s second major AI release in six weeks. In February, it launched Qwen 3.5, a text-and-vision model that matched or beat frontier models on reasoning and coding benchmarks—part of a streak that has also included Qwen Deep Research and a lineup of tools rivaling OpenAI and Google. Qwen 3.5 Omni extends that momentum into full multimodal territory, at a time when every major AI lab is racing to build systems that handle the full spectrum of human communication—not just words on a screen.

The model is available now via Alibaba Cloud’s API and can be tested directly at Qwen Chat or through Hugging Face’s online demo.


Source: https://decrypt.co/362742/alibaba-qwen-omni-major-upgrade-review

