
Zhihu CTO Li Dahai: Multimodal Exploration of Intelligent Communities under the Trend of Video

The 2021 World Artificial Intelligence Conference (WAIC) was recently held in Shanghai. At the WAIC·AI Developer Forum on July 10th, Li Dahai, partner and CTO of Zhihu, delivered a keynote speech on Zhihu's exploration and practical application of multimodal technology, as an intelligent community, under the trend toward video.


As a Q&A community, Zhihu has developed over ten years, and its business has grown through four stages, from initial closed operation to openness, continuously expanding its user scenarios and user scale. Li Dahai said that AI technology is already widely used in every core link of Zhihu to build an intelligent community and improve community efficiency. As more and more users share their knowledge, experience, and insights through video on Zhihu, the company has come to see that video and text-and-image content each have their own advantages, disadvantages, and applicable scenarios, and that the community needs a media upgrade so that video becomes as important as text and images. Zhihu has therefore set a video intelligence technology strategy with multimodality at its core.

According to Li Dahai, Zhihu has built an image-text multimodal pre-training model using a two-stream contrastive learning framework. The model is already widely used in scenarios such as Zhihu's video production, search and distribution, and topic matching and ranking.
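A two-stream contrastive set-up pairs a text encoder with an image encoder and trains them so that matched text-image pairs score higher than mismatched ones. The sketch below illustrates the core objective with a CLIP-style symmetric InfoNCE loss; it is an assumption about the general technique, not Zhihu's actual model, and the embeddings stand in for real encoder outputs.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (text, image) pairs.

    Row i of `text_emb` is assumed to describe row i of `image_emb`;
    every other row in the batch serves as a negative example.
    """
    t = l2_normalize(np.asarray(text_emb, dtype=float))
    v = l2_normalize(np.asarray(image_emb, dtype=float))
    logits = t @ v.T / temperature          # pairwise cosine similarities
    n = logits.shape[0]

    def cross_entropy(lg):
        # numerically stable log-softmax; the target for row i is column i
        lg = lg - lg.max(axis=1, keepdims=True)
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # average the text->image and image->text directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

Because the two streams are separate encoders, either side can be embedded offline and reused for retrieval, which is what makes this design practical for search, distribution, and matching scenarios.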


In October last year, Zhihu released a one-click tool for turning text posts into videos, known internally as the "PPT video creation tool", which lets text creators quickly generate a video from an answer or article. The conversion works by splitting the article into paragraphs or sentences, then using the pre-training model to score the relevance between each passage and the pictures, GIFs, and short videos in a material library, selecting the best-matching asset for each. Creators can also enter keywords directly to retrieve the images that best match those keywords from the library and assemble their own video stream.
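The matching step described above can be sketched as a nearest-neighbor lookup in the shared embedding space. The function and array names below are hypothetical; `paragraph_embs` and `asset_embs` stand in for the outputs of the text and image encoders.

```python
import numpy as np

def best_asset_per_paragraph(paragraph_embs, asset_embs):
    """Return the index of the highest-scoring library asset for each paragraph."""
    p = paragraph_embs / np.linalg.norm(paragraph_embs, axis=1, keepdims=True)
    a = asset_embs / np.linalg.norm(asset_embs, axis=1, keepdims=True)
    scores = p @ a.T              # cosine similarity, (paragraphs x assets)
    return scores.argmax(axis=1)  # one picture/GIF/clip per paragraph
```

The keyword-driven mode fits the same sketch: embedding a creator's keyword with the text encoder and passing it in place of a paragraph yields the best-matching images for that keyword.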

Li Dahai said that integrating video into the community helps Zhihu better fulfill its mission: "Let people better share their knowledge, experience, and insights, and find their own answers." Going forward, building on its accumulated text, image, and video data, Zhihu will work to build a large-scale pre-training model that unifies text, images, video, audio, and other media, and will open the results to more developers in academia and industry.
