Chinese technology company ShengShu-AI and Tsinghua University on Saturday unveiled text-to-video artificial intelligence (AI) model Vidu, which is considered the first in China on par with Sora, in another manifestation of China’s rapid development in the emerging critical field of AI.

Launched at the ongoing Zhongguancun Forum in Beijing, Vidu can generate a 16-second 1080P video clip with one click. It is built on a self-developed visual model architecture called Universal Vision Transformer (U-ViT), which integrates two AI model approaches, Diffusion and Transformer, the developers said.

The AI text-to-video model emerged just about two months after Sora, developed by US developer OpenAI, was released to great fanfare around the world.

“After Sora was launched, we found that it was closely aligned with our technical roadmap, which further motivated us to advance our research with determination,” Zhu Jun, vice dean of the Institute of Artificial Intelligence at Tsinghua University and chief scientist of ShengShu-AI, said at the forum.

The core technology of U-ViT was first proposed by the Vidu research team in September 2022, before the Diffusion Transformer (DiT) architecture used by Sora. U-ViT is the world’s first visual model architecture combining the advantages of Diffusion and Transformer, according to media reports.

During a live demonstration on Saturday, Vidu showed that it can simulate the real physical world and generate scenes with complex details in line with real physical laws, such as reasonable light and shadow effects and delicate facial expressions. It can also generate complex dynamic shots rather than fixed ones.

Furthermore, having been developed in China, Vidu has a strong understanding of Chinese elements and can generate images of distinctively Chinese subjects such as the panda and the loong, according to media reports.

Via Global Times.

