USTC Open Source | A Text-to-3D Scene Generation Method Based on 3D Gaussians and Formation Pattern Sampling

Author: 3D Vision Workshop

Author: Haoran Li | Editor: 3DCV

1. Effect display

DreamScene uses 3D Gaussians to generate high-quality, consistent, and editable 3D scenes.

This is largely because DreamScene's Formation Pattern Sampling (FPS) method can generate high-quality 3D objects.

2. Paper information

Title: DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling

Authors: Haoran Li, et al.

Institutions: University of Science and Technology of China, HKUST, The Hong Kong Polytechnic University

Paper: https://arxiv.org/abs/2404.03575

Code: https://github.com/DreamScene-Project/DreamScene

Homepage: https://dreamscene-project.github.io/

3. Abstract

Text-to-3D scene generation holds great potential in games, film, and architecture, but existing methods still struggle to maintain high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a novel 3D-Gaussian-based text-to-3D scene generation framework that addresses these three challenges through two strategies. First, DreamScene employs Formation Pattern Sampling (FPS), a multi-timestep sampling strategy guided by the formation patterns of 3D objects, to quickly form semantically rich, high-quality representations. FPS uses 3D Gaussian filtering for optimization stability and reconstruction techniques to generate plausible textures. Second, DreamScene employs a progressive three-stage camera sampling strategy, designed for both indoor and outdoor scenes, that effectively ensures the integration of objects with the environment and scene-wide 3D consistency. Finally, DreamScene enhances the flexibility of scene editing by integrating objects and environments, making targeted adjustments possible.

4. Algorithm analysis

DreamScene consists of two main parts: the FPS method and the camera sampling strategy. FPS itself comprises multi-timestep sampling, 3D Gaussian filtering, and reconstruction-based generation.

The algorithm proceeds as follows. First, the prompt is decomposed into the semantics of the objects and the environment in the scene. For each object, an initial point cloud is obtained with Point-E; camera poses are then sampled randomly for rendering, and the multi-timestep sampling strategy guides the optimization of the 3D content, which both enforces shape constraints during optimization and enriches the semantic information. However, too many 3D Gaussians can hinder optimization, so 3D Gaussian filtering removes redundant Gaussians as optimization proceeds. In the later stages of optimization, since the generated 3D content is already highly consistent, DreamScene switches to 3D reconstruction to accelerate the generation of plausible surface textures.
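
Below is a minimal, runnable sketch of this per-object loop. Every concrete component is a toy stand-in: the Point-E initialization, renderer, and diffusion guidance are mocked out, and the filtering criterion, timestep schedule, and constants are our assumptions; only the control flow (multi-timestep guidance, periodic Gaussian filtering, a late reconstruction phase) mirrors the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "3D Gaussians": xyz center + opacity per row, standing in for a
# Point-E-initialized point cloud (the real representation is far richer).
gaussians = np.hstack([rng.normal(size=(4096, 3)), rng.uniform(size=(4096, 1))])

def sample_timesteps(n=3, t_max=1000, t_min=20):
    # Multi-timestep sampling: draw several diffusion timesteps per update,
    # mixing coarse guidance (large t, overall shape) with fine guidance
    # (small t, detail). The exact schedule here is an assumption.
    return np.sort(rng.integers(t_min, t_max, size=n))[::-1]

def filter_redundant(g, min_opacity=0.05):
    # 3D Gaussian filtering: drop near-transparent Gaussians that add
    # optimization cost but contribute little (criterion is an assumption).
    return g[g[:, 3] > min_opacity]

for it in range(1500):                      # object iteration budget
    ts = sample_timesteps()                 # guide one update at several t
    # ... render from a random camera pose, apply guidance at each t ...
    if (it + 1) % 500 == 0:
        gaussians = filter_redundant(gaussians)

# Late phase (not shown): consistent multi-view renders serve as pseudo
# ground truth for fast, reconstruction-style texture refinement.
print(gaussians.shape)
```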

For the scene's environment, DreamScene uses a progressive three-stage camera sampling strategy. The environment is first initialized (as a box-shaped point cloud for indoor scenes and a hemispherical point cloud for outdoor scenes), and the already-optimized objects are combined with it. In the first stage, camera poses are sampled within a certain range of the scene center to generate a rough representation of the surroundings (indoor walls, or the distant outdoor environment). In the second stage, camera poses over specific regions are sampled to generate a rough ground, keeping the transitions between the ground and the surroundings as coherent as possible. In the third stage, all camera poses from the first two stages are used to jointly optimize all environmental elements, after which 3D reconstruction is applied for more plausible textures and details.
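
A runnable toy of the three sampling stages; the ranges, heights, and angles below are illustrative assumptions rather than the paper's actual values.

```python
import numpy as np

rng = np.random.default_rng(0)

def stage1_pose():
    # Stage 1: sample near the scene center, looking outward, to form a
    # rough surrounding environment (indoor walls / distant background).
    position = rng.uniform(-0.5, 0.5, size=3)
    yaw = rng.uniform(0.0, 2 * np.pi)
    return position, ("yaw", yaw)

def stage2_pose():
    # Stage 2: sample above specific ground regions, looking downward,
    # so the rough ground meets the surroundings coherently.
    position = np.array([rng.uniform(-3, 3), rng.uniform(-3, 3), 1.5])
    pitch = rng.uniform(-np.pi / 2, -np.pi / 6)
    return position, ("pitch", pitch)

def stage3_poses(n=8):
    # Stage 3: reuse poses from both earlier stages to refine all
    # environment elements jointly before the reconstruction pass.
    return ([stage1_pose() for _ in range(n // 2)]
            + [stage2_pose() for _ in range(n - n // 2)])

for pos, angle in stage3_poses(4):
    print(np.round(pos, 2), angle)
```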

5. Experiments

DreamScene uses GPT-4 as the LLM for scene prompt decomposition, Point-E to generate sparse point clouds as the initial representation of objects, and Stable Diffusion 2.1 as the 2D text-to-image model. The maximum numbers of iterations for objects and the environment are set to 1,500 and 2,000 rounds, respectively. The interval value m is initialized to 4 and decreases as optimization proceeds.
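
For reference, this setup can be gathered into a hypothetical configuration dict; the field names are ours, the values come from the text above.

```python
# Hypothetical config collecting the experimental setup described above;
# field names are illustrative, values are taken from the text.
config = {
    "llm": "GPT-4",                       # scene prompt decomposition
    "point_cloud_init": "Point-E",        # sparse point cloud per object
    "t2i_model": "Stable Diffusion 2.1",  # 2D text-to-image guidance
    "max_iters_object": 1500,
    "max_iters_environment": 2000,
    "interval_m_init": 4,                 # decays during optimization
}
print(config)
```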

Quality

Comparing DreamScene with existing SOTA methods in both indoor and outdoor scenes shows that Text2Room and Text2NeRF produce satisfactory results only from camera poses close to those used during generation. Compared with text-to-3D methods that generate individual objects, DreamScene's FPS can likewise generate high-quality 3D representations from text prompts in a short time.

Consistency

DreamScene's generation results ensure good 3D consistency while maintaining high generation quality.

Scene Editing

DreamScene can add or remove objects, or reposition them within the scene, by adjusting the object's affine components. After such an edit, camera poses around the object's original and new positions are resampled to re-optimize the ground and the surrounding environment. In addition, DreamScene can restyle the environment or individual objects by changing the text prompts.
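
A runnable toy of repositioning an object through an affine component: a rotation, scale, and translation applied to the object's Gaussian centers. The paper's actual parameterization is not detailed here, so the composition below is an assumption.

```python
import numpy as np

def apply_affine(centers, yaw=0.0, scale=1.0, translation=(0.0, 0.0, 0.0)):
    # Rotate about the vertical axis, then scale and translate: one
    # plausible form of a per-object affine component (an assumption).
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return scale * centers @ R.T + np.asarray(translation)

chair = np.random.default_rng(0).normal(size=(2048, 3))  # toy object Gaussians
moved = apply_affine(chair, yaw=np.pi / 2, translation=(2.0, 0.0, 0.0))
# After the move, camera poses at the old and new locations would be
# resampled to re-optimize the ground and nearby surroundings.
print(moved.mean(axis=0).round(2))
```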

Ablation

Results optimized for 30 minutes with the prompt "A DSLR photo of Iron Man". As shown in the figure, multi-timestep sampling (MTS) yields better geometry and texture than the Score Distillation Sampling (SDS) used in DreamFusion and DreamTime. FPS (Formation Pattern Sampling) builds on MTS and uses a reconstruction-based approach to create smoother, more believable textures, demonstrating the superiority of FPS.

The figure below compares reconstruction and generation results before and after compression with the 3D Gaussian filtering algorithm. In the reconstruction task, DreamScene achieves a compression rate of 73.9%, but the images become slightly blurry and some details are lost. In the generation task, the compression rate is 66.1% with no significant loss of quality.
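
As a small illustration of how such a compression rate can be computed: filter the Gaussians by a threshold and report the fraction removed (the opacity criterion and the threshold are illustrative assumptions, not the paper's).

```python
import numpy as np

rng = np.random.default_rng(0)
opacity = rng.uniform(size=100_000)        # stand-in per-Gaussian opacities
kept = opacity > 0.66                      # illustrative filtering threshold
compression_rate = 1.0 - kept.mean()       # fraction of Gaussians removed
print(f"compression rate: {compression_rate:.1%}")
```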

Quantitative Results

Generation time is measured for the environment generation stage. The left side of the table shows that DreamScene, while offering editing capability, has the shortest generation time; the right side shows a user study (scored out of 5, higher is better), in which DreamScene is far ahead of other SOTA methods in consistency and plausibility, with high generation quality as well.

6. Summary

This article introduced DreamScene, a new text-to-3D scene generation strategy. By employing FPS and the progressive camera sampling strategy, and by integrating objects and environments, DreamScene addresses the inefficiency, inconsistency, and limited editability of current text-to-3D scene generation methods. Extensive experiments demonstrate DreamScene's potential for wide use in many fields.
