Beijing-based Shengshu Technology announced a significant upgrade to its AI-powered video generation tool, Vidu, on Wednesday. The new feature lets Vidu generate a video from multiple images: users can merge three distinct pictures, such as a shirt, a person, and a moped, into a single video of the person wearing the shirt and riding the moped.
Launched in April, Vidu initially gained attention for its ability to create 8-second video clips from text prompts. Shengshu claims the new feature achieves greater visual consistency by integrating different images into fluid, AI-generated videos. This development positions Vidu as a potential competitor to OpenAI’s Sora, which was revealed in February as an AI model capable of generating one-minute videos from text. However, OpenAI has yet to release Sora publicly.
Fan Bao, Shengshu’s Chief Technology Officer, emphasized that addressing visual consistency was a key challenge for the company. “Very early on we pinpointed [visual consistency] as the problem, and wanted to solve it well,” Bao said.
Vidu has already drawn attention: its ability to turn two profile photos into lifelike videos of people hugging went viral on TikTok. The tool has also started generating revenue, with monthly usage rates ranging from 100,000 yuan to 1 million yuan ($13,871 to $138,711), primarily from advertisers, animators, and other businesses.
To mitigate potential copyright concerns, Shengshu is exploring partnerships with artists to allow their styles to be mimicked in AI-generated ads. The company also says that Vidu adheres to global data protection standards, removing users’ personal data in compliance with privacy regulations.
Founded in 2023, Shengshu has attracted investment from Baidu Ventures, Ant Group, and other prominent backers. Vidu’s AI technology runs on rented cloud servers both domestically and internationally.