
    Tech firms double down on multimodal LLMs

    By FAN FEIFEI | China Daily | Updated: 2025-04-23 07:32
    A visitor interacts with ByteDance's AI model Doubao during a high-tech expo in December in Shanghai. [CHINA DAILY]

    Chinese technology companies are doubling down on artificial intelligence-powered multimodal large language models, as part of a broader push to bolster the use of cutting-edge technology in a wider range of fields.

    Multimodal LLMs boast the ability to process and generate various types of content, covering text, images, audio and video.

    Experts said multimodal LLMs will lead the way in the further development of the generative AI industry, with significant potential for application in diverse industries such as finance, retail, healthcare and intelligent manufacturing.

    Chinese internet heavyweight ByteDance recently released its latest AI model, Doubao 1.5, which is equipped with deep thinking and vision comprehension capabilities, while updating its text-to-image model to enhance users' image and visual content experience.

    The newly launched model has received a major upgrade in areas such as mathematics, programming, scientific reasoning and creative writing, and comes with significantly lower training and inference costs, the company said.

    With its visual reasoning ability, the model can reason about what it sees. For instance, it can analyze landforms based on uploaded photos, help travelers choose restaurants, and assist enterprises with project management and flowchart generation, improving work efficiency and decision-making quality.

    Lu Yanxia, research director at market research company International Data Corp China, said Chinese tech companies' technological advancements in multimodal LLMs will further promote the popularization of AI models, and bring fresh business opportunities for domestic AI servers, cloud computing and chip companies.

    Such LLMs require more data and knowledge in professional fields, as well as more talent able to fine-tune specialized models to meet specific industrial demands, she said.

    Chinese video-sharing platform Kuaishou Technology recently launched its newest Kling AI 2.0 video generation model. Since its launch in June last year, the Kling AI model has undergone over 20 iterations, with the number of global users surpassing 22 million.

    The text-to-video AI model outperformed rivals such as OpenAI's Sora in dimensions including semantic responsiveness, visual quality and motion quality, Kuaishou claimed.

    Gai Kun, senior vice-president and head of the community science department of Kuaishou, said AI holds immense potential for assisting creative expression, but some challenges persist in terms of the stability of AI-generated content, or AIGC, and the precise expression of users' complex creative ideas.

    Gai said it is necessary to comprehensively enhance AI models' capabilities and improve human-machine interaction levels to "tell good stories with AI", adding that the rapid development of AIGC is reshaping many industries, such as advertising, film, television, entertainment and creativity.

    Moreover, multimodal editing capabilities are currently available on the Kling AI platform, where users can input their ideas through images and other formats to generate creative videos that align with their concepts, according to Kuaishou.

    Over 15,000 developers worldwide have applied the Kling API, or application programming interface, in various industrial scenarios, generating about 12 million images and over 40 million videos. Videos generated from images account for about 85 percent of Kling AI's video creation.

    Wang Peng, an associate researcher at the Beijing Academy of Social Sciences, said the multimodal capabilities enable AI models to understand and process complex information more comprehensively, with wide application prospects in fields such as finance, intelligent customer service and healthcare.

    Pan Helin, a member of the Expert Committee for Information and Communication Economy, which operates under the Ministry of Industry and Information Technology, said, "The training of multimodal AI models places higher requirements on computing capacity, algorithms and high-quality data, and more efforts are required to bolster the efficient circulation of data elements and expand application scenarios."

    Pan emphasized that Chinese tech companies should improve their independent innovation abilities in computing power chips and programming software, and invest more in basic scientific research, including mathematics, statistics and computer science, to catch up with leading foreign counterparts amid intensifying international competition.
