
Hu Xuan, Senior Researcher, Tencent Research Institute
Wang Huanchao, Researcher, Tencent Research Institute
At the end of 2021, Microsoft Research Asia published a paper presenting VirtualCube, an immersive 3D video communication system. The paper won the Best Journal Paper Award at IEEE VR 2022 (& IEEE TVCG), a leading venue for virtual reality and graphics.
From a technical point of view, VirtualCube builds a real-time 3D image of each user. The hardware sits in a cubicle: six cameras capture the user's 3D model, and three 65-inch 4K screens form a surround display. The software system and purpose-built algorithms then restore the visual scene faithfully, including the participants' body shape, their relative positions, and mutual gaze, while keeping the frame rate of the video stream stable. The R&D team is also exploring further dimensions, such as spatial audio support and complex gesture processing.
In simple terms, VirtualCube turns the "paper figure" of a traditional video conference (a video made up of 24 flat pictures per second) back into a real, living person: even if the other party stays still, you can turn your head and see the side of their face.
In traditional video conferencing, the classic conundrum is the lack of eye contact among attendees: if speakers look at the camera, they feel they are talking to a device rather than a person; if they look at a face on the screen, their gaze misses the other party's and they appear absent-minded. VirtualCube solves this problem creatively: neither side has to look at the camera, yet natural eye contact still occurs.
According to the official introduction, the core goal of the VirtualCube system is to create the sense that "participants are in the same room", so that participants in different times and places can be more immersed, more relaxed, and more focused on the communication itself. Before it, Google's Starline and Facebook's Horizon Workrooms were pioneers in addressing this need.
In the context of the COVID-19 pandemic, telecommuting has become the choice of many companies, and video conferencing, a tool that balances experience and communication efficiency, has become standard equipment for it. However, compared with real offline communication, video conferencing is still not natural enough, and issues such as the lack of eye contact described above keep participants from becoming immersed. From this perspective, it is easy to understand the significance of technical efforts like VirtualCube.
For telecommuters, VirtualCube brings colleagues and bosses in front of you more vividly than video conferencing does, producing a sense of presence comparable to meeting offline.
But many also argue that the so-called "sense of presence" may be a pseudo-demand: for remote work, connection is the primary goal, video or even voice calls are enough, and the pursuit of presence is simply unnecessary. The question is, is the "sense of presence" really redundant for remote work and for broader work models? What is the point of pursuing a "sense of presence" through technology?
Next, we will discuss this issue.
Compared to offline, what's wrong with telecommuting?
Thanks to various digital technologies and tools, telecommuting is no longer a rarity. Through voice calls and instant messaging software, we can keep in touch with colleagues; through online meeting tools, we can take part in meetings and seminars large and small; through online collaboration tools, we can synchronize work progress with the team and complete long-chain tasks in a process-oriented way.
So, in the wake of the COVID-19 outbreak, telecommuting quickly became the working mode of choice for many companies. According to data from iiMedia Research (Ai Media Consulting), during the resumption of work in 2020, more than 18 million enterprises in China adopted online remote work, and more than 300 million users used remote office applications. A survey by Tencent Research Institute's T-ask also found that nearly 70% (69.8%) of respondents had experienced telecommuting. Not long ago, Ctrip officially announced a hybrid-work policy, and companies such as Microsoft, Google, and Meta have launched similar measures around the world.
It can be said that telecommuting is becoming a norm accepted by most people. However, as it becomes routine, some of its drawbacks are gradually becoming prominent.
Let us pull the timeline back to the beginning of the century, when a wave of telecommuting swept the tech world. IBM, for example, claimed in a 2009 report that 40 percent of its 386,000 employees worldwide worked from home, and that over a decade the company had gained $1.9 billion from selling the office space it saved. But the wave quickly receded: these pioneers found that remote work caused many problems in employee communication, work efficiency, and the formation of corporate culture, so it still could not replace working in the office.
Yahoo is one of the most representative cases. In 2013, Yahoo issued a rule explicitly prohibiting remote work: employees had to work from the nearest office or be fired. Jackie Reses, Yahoo's global human resources director at the time, wrote in an internal memo: "As a Yahoo person, it is not only necessary to do a good job in daily work, but more importantly, to interact and experience. Interaction and experience can only be done in the office." In 2017, IBM also called its employees back to the office.
Yes, interaction and experience are the key clues when contrasting remote work with on-site work. Telecommuting does enable "interaction" and "experience" to some degree, but not completely.
In the office, workers can communicate face to face. In this mode of communication, beyond the spoken words, both sides can see each other's eyes, expressions, gestures, movements, and other non-verbal elements. The environment in which the communication takes place matters as well: smell, light, the layout of the room, where each party sits, even each other's pores and flying spittle. Their presence gives each exchange a profound uniqueness.
Together, these factors constitute the "context" of communication, which is very important for conveying meaning and feeling, and which lets both sides build better understanding and cooperation. Summed up in a single word, these things are "presence": everything you see, hear, think, and feel in the context of communication, plus the combined sensation formed by all these observations and feelings. Remote work, by contrast, can use various tools to transmit voice and video and to simulate reality to some extent, but what it lacks is precisely this "sense of presence".
Walter Benjamin once proposed the concept of "aura" to describe the unique, here-and-now quality of an original artwork or a live theater performance. When the era of mechanical reproduction arrived, large-scale reproduction (Benjamin mainly had photography in mind) spread artworks and performances widely, but the aura disappeared. What remote work lacks is the aura of the offline office.
The disappearance of the aura is where the problems start. The first level is communication efficiency. Compared with face-to-face work on site, telecommuting undoubtedly raises the cost of communication: things that can be explained in a few words offline take many rounds of inefficient online exchange. Face-to-face interaction also means that people work on the same thing at the same time and in the same place, which facilitates faster decision-making. Telecommuting does not, and you cannot know what the colleague on the other end of the line is doing: petting the cat, minding the kids, or slacking off. Even if you are in the same video conference, there is no guarantee that your colleagues do not have other windows open.
The second level is the working relationship. In the workplace, employees come to understand the company's systems, norms, and culture by observing how their colleagues perform and behave, which is a learning process. Real interaction between employees also helps establish good working relationships. These not only strengthen team cohesion and company culture; a close and harmonious working relationship is itself part of work efficiency and creativity.
Such interactions are especially important for new employees. Someone who enters the remote work model directly, without them, is likely to have difficulty integrating. A Wall Street Journal report described a young man who had been working for a year and a half before he had the chance to meet his colleagues: "In a meeting, people turned off their cameras, and I didn't even know what they looked like." Obviously, this does not help establish normal relationships between workers, and it goes on to affect the work itself.
In Work, Consumerism and the New Poor, Bauman argues that the workplace carries the primary function of social integration. In modern society, adults spend most of their time at work, which means that relationships at work are not purely professional; they are also how we connect with others and take part in socialization.
This is perhaps the more far-reaching significance of the work system, and the most serious aftereffect of the lack of presence in remote work: long, task-oriented working hours without interpersonal interaction not only foster negative emotions such as anxiety and depression, but also seriously affect the normal socialization of adults.
These technologies are striving for "digital presence"
It is precisely because of this series of negative effects that many supporters of remote work have switched sides. In February 2021, Google noted in its annual report to regulators that working from home affected the company's productivity, competitiveness, and corporate culture, and said that more employees would return to working in the office.
Returning to the traditional model is certainly one option. But telecommuting may well remain one of the most important modes of work in the future, and even looking only at the near-term recurrence of the epidemic it is a pragmatic response. Making up for remote work's shortfall in "presence" is therefore a path worth pursuing.
Imagine being able to project a 3D image of your colleagues to your side through VirtualCube: even if you work from home, your colleagues sit next to you or face you across the table, and you can talk or meet at any time (with spatial audio faithfully restored). That would largely solve the problem of an insufficient "sense of presence".
In fact, before VirtualCube, Google's video-calling project Starline, released in May 2021, had already achieved a stunning "teleportation" effect: the images of family members thousands of miles away are so clear and three-dimensional that they seem within reach. It demonstrated how much 3D holographic imagery adds to "presence", and it took a first step toward overcoming the biggest challenge: restoring 3D content on 2D "flat panel" display devices such as mobile phones, TVs, and VR.
The lady opposite is a live image, neither a real person nor an ordinary video
VirtualCube and Starline work similarly, but follow different technology paths. Let's look at how holographic imaging has evolved step by step, from the perspective of visual principles and the technical background, and at how far we are from the "true holograms" of science fiction movies.
Holographic projection is one of Iron Man's many futuristic technologies
3D vision is a key survival tool for humans, and multiple cues help the brain form a sense of space. When hunting, our ancestors relied on their eyes to judge the distance, size, and shape of their prey; to this day, vision remains the most important source of information among the senses, and it stays highly sensitive to space, light and shadow, and movement.
The first type is planar cues, including perspective (near objects look large, far objects look small), occlusion, and light, shadow, and texture. These are also the basis of the sense of depth in painting and photography. Drawing on life experience, we treat the image as the projection of a three-dimensional object onto a two-dimensional plane and mentally fill in its original appearance.
Left: We see a cube, not three adjacent parallelograms
Right: An example of shadows, occlusion, and perspective applied in painting
When the picture moves, the stereoscopic effect doubles. A well-known example is the trick of "adding two white bars to make a video 3D". The white bars cover part of the original image and divide the video into three layers: foreground (the Quidditch ball), middle ground (the white bars), and background (Harry Potter), and the blur effect further widens the perceived distance between the three layers.
Watch out! The Quidditch ball is about to fly right at the tip of your nose
The second category is depth cues, which are the key to a breakthrough in the "sense of presence". These carry information beyond the XY plane, along the Z axis, and include binocular parallax, motion parallax, and focus blur. Typical applications of binocular parallax are 3D films and VR glasses, where the left and right eyes receive slightly different pictures that the brain reprocesses into a three-dimensional image.
Motion parallax is even more important. In reality, as the old poem says, a mountain "viewed from the front is a ridge, from the side a peak", but a picture on a mobile phone cannot produce this effect. After all, the pixel arrangement of each frame on the display is fixed: whether you move your head or refocus your eyes, the image does not change at all, so the sense of depth remains incomplete.
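The two depth cues above, binocular parallax and the parallax produced when the viewer moves, can be illustrated with a simple pinhole-eye model. The sketch below is only an illustration under assumed values: the function names, the 65 mm eye separation, and the unit focal length are hypothetical choices, not taken from any of the systems discussed.

```python
def project_x(point_x, point_z, eye_x, focal_length=1.0):
    """Horizontal image coordinate of a 3D point for a pinhole eye.

    The eye sits at (eye_x, 0, 0) and looks along +Z; point_z must be > 0.
    """
    if point_z <= 0:
        raise ValueError("point must be in front of the eye (point_z > 0)")
    return focal_length * (point_x - eye_x) / point_z


def binocular_disparity(point_x, point_z, eye_separation=0.065, focal_length=1.0):
    """Difference between the left-eye and right-eye image coordinates."""
    half = eye_separation / 2.0  # assumed eye separation, in metres
    left = project_x(point_x, point_z, -half, focal_length)
    right = project_x(point_x, point_z, +half, focal_length)
    return left - right  # equals focal_length * eye_separation / point_z


def motion_parallax_shift(point_x, point_z, head_move_x, focal_length=1.0):
    """How far a point's image shifts when the head moves sideways."""
    before = project_x(point_x, point_z, 0.0, focal_length)
    after = project_x(point_x, point_z, head_move_x, focal_length)
    return after - before  # equals -focal_length * head_move_x / point_z


if __name__ == "__main__":
    for depth in (0.5, 5.0):  # metres: a near object and a far object
        print(depth,
              binocular_disparity(0.0, depth),
              motion_parallax_shift(0.0, depth, 0.10))
```

In both functions the result is inversely proportional to depth, so near objects differ more between the two eyes and slide faster as the head moves. A fixed 2D frame gives every point the same (zero) shift, which is why its sense of depth stays incomplete.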
To achieve motion parallax, the viewer must see different content from different positions and angles. There are roughly two technical paths: the first is to work on the display itself, and the second is to track the viewer and deliver the correct picture.
Effect schematic, borrowed from LookingGlass
Typical representatives of route one are the new light field displays produced by LookingGlass, BOE, and others; this kind of device is also used in Google Starline. The principle is to place a layer of lenticular lenses over the display layer, so that the light reaching the eye differs with the viewing angle. In appearance it is not much different from an ordinary monitor, only very thick, like a large block of glass. The principle itself is nothing futuristic; you almost certainly saw it as a child, for example in picture cards whose image changes as you tilt them.
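The idea of placing lenticular lenses over the display layer can be sketched as column interleaving: the panel shows pixel columns taken in turn from several pre-rendered views, and each lens sends neighbouring columns in slightly different directions. This is a deliberately simplified illustration; the function name and the one-column-per-view layout are assumptions, and commercial displays may use more elaborate mappings (for example slanted lenses with sub-pixel assignment).

```python
def interleave_views(views):
    """Build one panel image from several same-sized views.

    `views` is a list of images; each image is a list of rows, and each row
    is a list of pixel values. Panel column c takes its pixels from view
    c % N, so under each lens (N columns wide) every view contributes one
    column.
    """
    if not views:
        raise ValueError("need at least one view")
    n_views = len(views)
    height = len(views[0])
    width = len(views[0][0])
    for image in views:
        if len(image) != height or any(len(row) != width for row in image):
            raise ValueError("all views must have the same size")
    return [
        [views[col % n_views][row][col] for col in range(width)]
        for row in range(height)
    ]


if __name__ == "__main__":
    # Three tiny 2x6 "views", each filled with its own label.
    demo_views = [[[label] * 6 for _ in range(2)] for label in ("A", "B", "C")]
    for panel_row in interleave_views(demo_views):
        print("".join(panel_row))
```

Because the panel's columns are shared among the views, adding viewing angles either lowers the resolution of each view or requires a denser panel.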
Around 2010 there was a wave of glasses-free 3D TVs. Philips took the lead at the 2010 Berlin electronics show, and Toshiba and Sony also made moves. The biggest bottleneck was chip computing power: each additional micro viewing angle means one more picture to render at the same time. The experience was poor, and the technology was soon forgotten.
After Philips' patents expired, a number of manufacturers picked the technology up again. With strong graphics support, LookingGlass launched its first device in 2018 and in early 2021 released a smaller consumer-grade product, Portrait. It can render 45 viewing angles at the same time, offers a larger viewing range, and can be combined with a variety of external devices, including VR controllers, sensors, and haptic feedback systems, for advanced interaction with holographic content.
Illustration of LookingGlass working together with sensors
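The computing-power bottleneck follows from simple arithmetic: if every viewing angle is rendered as a separate picture, the pixel throughput grows in proportion to the number of angles. The sketch below uses the 45 viewing angles mentioned in the text; the per-view resolution and the frame rate are hypothetical values chosen only for illustration, and the result is a naive estimate rather than a measurement of any real device.

```python
def pixels_per_second(width, height, frames_per_second, n_views=1):
    """Naive rendering load: every view is drawn in full for every frame."""
    if min(width, height, frames_per_second, n_views) <= 0:
        raise ValueError("all arguments must be positive")
    return width * height * frames_per_second * n_views


if __name__ == "__main__":
    # Assumed values (not from the article): 1920x1080 per view at 60 fps.
    single = pixels_per_second(1920, 1080, 60)
    multi = pixels_per_second(1920, 1080, 60, n_views=45)
    print(f"1 view:   {single:,} pixels/s")
    print(f"45 views: {multi:,} pixels/s ({multi // single}x)")
```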
A typical example of route two is the device built in the VirtualCube project. The principle is to keep displaying the correct image and viewing angle according to the position of the user's eyes, which is more economical than LookingGlass and similar displays; the disadvantages are that it is somewhat bulkier and supports only one viewer.
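Rendering the correct picture for a tracked eye position is commonly done with an off-axis (asymmetric) viewing frustum, sometimes called head-coupled perspective. The sketch below shows that general technique, not VirtualCube's actual algorithm, which this article does not detail; the coordinate convention (screen centred at the origin in the plane z = 0, eye at positive z) and all names are assumptions made for illustration.

```python
def off_axis_frustum(eye, screen_width, screen_height, near):
    """Frustum bounds (left, right, bottom, top) at the near plane.

    `eye` is (x, y, z) in metres relative to the screen centre, with the
    screen lying in the plane z = 0 and the viewer at z > 0.
    """
    ex, ey, ez = eye
    if ez <= 0 or near <= 0:
        raise ValueError("eye must be in front of the screen and near > 0")
    scale = near / ez  # similar triangles: screen edge -> near plane
    left = (-screen_width / 2.0 - ex) * scale
    right = (screen_width / 2.0 - ex) * scale
    bottom = (-screen_height / 2.0 - ey) * scale
    top = (screen_height / 2.0 - ey) * scale
    return left, right, bottom, top


if __name__ == "__main__":
    # Viewer centred, then shifted 0.5 m to the right of a 2 m x 1 m screen.
    print(off_axis_frustum((0.0, 0.0, 1.0), 2.0, 1.0, near=0.1))
    print(off_axis_frustum((0.5, 0.0, 1.0), 2.0, 1.0, near=0.1))
```

The four bounds can be fed into a standard perspective-frustum matrix (the same parameters as the classic `glFrustum` call), together with a view translation by the eye position. As the eye moves, the frustum skews so that the screen behaves like a window onto the scene, which is also why a single screen can serve only one tracked viewer at a time.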
There are also products that combine the two methods, such as Sony's ELFD 3D display, which attracted a lot of attention after a 15-inch prototype was exhibited at CES in 2020. ELFD uses three of Sony's patented technologies: high-speed, high-precision real-time sensing; real-time light field rendering; and high-precision 3D display. It tracks the user's eyes accurately with minimal delay and combines this with micro-optical lenses for a more three-dimensional result.
Don't assume it is "necessary" just because it is familiar
This article has discussed the "sense of presence" mainly from the perspective of remote work, but presence and the related technical efforts are obviously not limited to this one scenario. Digital technologies are driving paradigm shifts in fields large and small, which naturally invites comparison between the characteristics of traditional and emerging paradigms. Seen along this line, the concept of "presence" is probably not so simple.
Our understanding of presence is based on the traditional model of face-to-face communication. We consider presence "good" and worth pursuing largely because it is a fixed attribute of the old paradigm; that gives it legitimacy, and it naturally becomes our standard for evaluating the new paradigm. But as the digital age extends, the definition of the "sense of presence" will change too, and perhaps the era of "online connection" will develop an aura of its own, just as photography has become a recognized form of serious art.
So we don't have to agonize too much. Each generation has its own media environment and historical background, and forms different values and perceptions accordingly. Nothing stays unchanged: the times evolve, tools and technologies advance, and our perceptions of and attitudes toward things change as a result. The definition of "work" is drifting as well. What we in the 21st century understand by "work" is almost completely different from what textile factory workers in the 16th century understood by it. In the same way, people who have always worked in an online office mode will obviously perceive work very differently from our generation.
This means that our pursuit of "presence", an attribute of the traditional paradigm, may well lack a rational basis. For the new generation of workers, or the one after it, work may simply mean "people in scattered locations communicating and collaborating through online tools and completing tasks in a modular, process-oriented way". If remote connections can provide these conditions, why have a physical workspace? Why pursue any "sense of presence"?
So when we discuss the sense of presence, we should stay vigilant: it may not be necessary in future work models, and we should avoid thinking it necessary merely because it is familiar. Our ingrained understanding of traditional things should not limit our imagination of new ones. The same attitude should apply to new things like the metaverse: everyone is talking about the metaverse, but what it will look like need not be based on the current form of technology. Instead of conceiving, defining, and framing it according to old thinking, it is better to leave room for imagination, set no limits, and wait for it to develop and extend itself.
Note: The Microsoft VirtualCube project mentioned in the article can be found at:
https://www.microsoft.com/en-us/research/project/virtualcube/