
Monocular Speckle Structured Light Calibration in One Article

Author: memmolo | Editor: Computer Vision Workshop (3D Vision Workshop)

https://blog.csdn.net/memmolo/article/details/136760798

The author has authorized this repost; please do not reprint without permission.

Introduction

Monocular speckle structured light is a non-contact optical 3D measurement technique based on triangulation. It offers a simple system configuration, low cost, high temporal resolution, and high accuracy at close range. Many 3D cameras based on this technology have been brought to market, such as the Microsoft Kinect V1, Intel RealSense R200, and Orbbec Astra series, as well as many customized modules, and they are used in scenarios such as robot obstacle avoidance, face unlocking, face payment, and logistics volume measurement.

Due to factors such as assembly errors and lens distortion, a manufactured monocular speckle structured light camera inevitably deviates from its ideal design. To guarantee the accuracy of depth measurements, the systematic errors caused by these factors must be calibrated. Although many blogs and papers introduce the ranging principle and stereo matching of monocular speckle structured light cameras, few discuss their calibration. There are two reasons: first, the calibration scheme is one of the core trade secrets of each manufacturer; second, calibration of monocular speckle structured light has not been fully solved academically, and there is no method as simple, accurate, flexible, and easy to use as Zhang Zhengyou's camera calibration method. This article introduces a calibration method proposed by the author whose model is simple, flexible, and easy to use. The code and validation data have been uploaded to https://github.com/zxcn/MSSLSJointCalibration.git. The method is patented, and the author's permission is required for commercial use. Now, without further ado, let's get to the point.

Calibration model

A monocular speckle structured light camera consists of a speckle projector, a camera, and a compute chip. The projector projects a speckle pattern into the measurement scene; the pattern is unique within the global or disparity-matching range, which facilitates subsequent matching. The camera captures the speckle image of the scene. Depending on the depth of objects in the scene, the speckle shifts by different amounts; the shift can be obtained by stereo matching the scene speckle image against a reference speckle image recorded on a plane at a certain distance, and then converted to depth from the system parameters. The compute chip runs the stereo matching and filtering algorithms and outputs measurement results such as disparity, depth, and point clouds.

The model used in this article to represent the monocular speckle structured light system is very simple. The camera is described by the "pinhole + distortion" model commonly used in computer vision: the focal length and principal point describe the pinhole projection, and the Brown model describes the lens distortion with radial and tangential distortion coefficients. These parameters are collectively referred to as the camera parameters. A 3D point in the camera coordinate system becomes a 2D point in pixel coordinates after perspective projection, distortion mapping, and pixel sampling, as follows.
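The projection equation in the original post is embedded as an image; a standard form of the pinhole-plus-Brown model described above (the notation here is mine, not necessarily the author's) is

$$
\begin{aligned}
x &= X_c / Z_c, \qquad y = Y_c / Z_c, \qquad r^2 = x^2 + y^2, \\
x_d &= x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2x^2), \\
y_d &= y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2y^2) + 2 p_2 x y, \\
u &= f_x\, x_d + c_x, \qquad v = f_y\, y_d + c_y,
\end{aligned}
$$

where $(X_c, Y_c, Z_c)$ is the 3D point in camera coordinates, $(k_1, k_2, k_3)$ and $(p_1, p_2)$ are the radial and tangential distortion coefficients, $(f_x, f_y)$ the focal lengths, and $(c_x, c_y)$ the principal point.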

This equation describes that process. Unlike the inverse camera model commonly used to describe DLP projectors in fringe structured light, this article treats the speckle projector as a single point that projects rays from its center, described by its coordinates in the camera coordinate system. The reference plane is described by a plane equation in the camera coordinate system. With the plane equation introduced, the reference speckle image can be captured using a reference plane of unknown distance and tilt, which eliminates the need to carefully align the plane perpendicular to the optical axis and greatly reduces the difficulty of calibration.
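A common parameterization consistent with this description (symbols are my own choice) writes the projector center as a point $\mathbf{T} = (T_x, T_y, T_z)^\top$ in camera coordinates and the reference plane as

$$
\mathbf{n}^\top \mathbf{X} = \rho, \qquad \|\mathbf{n}\| = 1,
$$

where $\mathbf{n}$ is the unit normal and $\rho$ the distance of the plane from the camera origin.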

[Figure: (a) ideal monocular speckle structured light system; (b) actual system with assembly errors, lens distortion, and reference plane tilt; (c) system restored to the ideal state after correction]

An ideal monocular speckle structured light system is shown in Figure (a): the projector lies on the camera's x-axis, offset from the camera center along the baseline, and the reference plane is perpendicular to the camera's optical axis. The system records the reference speckle image of this plane in advance. When an actual scene is measured, the speckle positions change; the change in image position, i.e., the disparity, is found by stereo matching, and the object depth can then be computed by the following formula.
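The formula in the original post is an image; the standard triangulation relation for this geometry, with baseline $b$, focal length $f$ (in pixels), reference plane distance $Z_0$, and disparity $d$ measured relative to the reference speckle image (the sign of $d$ depends on the chosen convention), is

$$
d = f\,b\left(\frac{1}{Z} - \frac{1}{Z_0}\right)
\quad\Longrightarrow\quad
Z = \frac{f\,b\,Z_0}{f\,b + d\,Z_0}.
$$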

In practice, these ideal assumptions no longer hold because of assembly errors, lens distortion, reference plane tilt, and inaccurate reference distance. As shown in Figure (b), a monocular speckle structured light system usually deviates from the ideal state. Although many calibration methods have been proposed, they have the following shortcomings:

  1. The reference plane must be perpendicular to the optical axis, which requires fine adjustment.
  2. Marker points must be added to the reference plane to help compute its pose, which destroys the integrity of the reference speckle image.
  3. The simulated (design) speckle pattern of the projector must be known, but the simulated pattern inevitably deviates from the real projected pattern, which introduces additional matching errors; moreover, the simulated speckle pattern is difficult for individual developers to obtain.

The following sections describe how the proposed method performs this calibration flexibly and corrects these errors, restoring the ideal state shown in Figure (c).

Data acquisition and processing

The equipment required for calibration mainly consists of a circle-grid (dot) calibration board and a flat object. The dots on the board need a higher reflectivity than the background, which helps high-precision speckle matching. Since each dot must contain enough speckle information for matching, the dots, and hence the board, should be large; the whole calibration board used in this article measures 90 cm x 70 cm. A low-cost setup is to find a flat white wall for capturing the reference speckle image, print the dot pattern on A0 paper at a print shop (about 20 yuan), and paste it on the wall as the calibration board. If your speckle structured light camera works in the infrared band, you will also need a halogen flood light (about 100 yuan including a tripod) to provide uniform illumination in that band. The data collection procedure is as follows:

  1. Place the dot calibration board in several different poses; for each pose, capture an image of the board under uniform (flood) illumination and an image under speckle illumination with the projector turned on.
  2. Place a flat object in front of the camera and capture the reference speckle image.

This completes the acquisition of the calibration data. Figure (a) below shows the flood-illuminated calibration board image for one pose, Figure (b) the speckle-illuminated board image for the same pose, and Figure (c) the reference speckle image.

[Figure: (a) flood-illuminated calibration board image; (b) speckle-illuminated calibration board image; (c) reference speckle image]

The acquired images are processed as follows:

  1. As shown in Figure (a) above, a dot extraction algorithm detects the dots in the flood-illuminated board image and returns the dot center coordinates.
  2. As shown in Figure (b) above, speckle patches around each dot center are extracted from the speckle-illuminated board image of the same pose.
  3. As shown in Figure (c) above, each patch is matched in the reference speckle image with a digital image correlation (DIC) algorithm to obtain the corresponding (homologous) points.

To keep the DIC matching robust against perspective and lens distortion, a second-order DIC shape function is recommended. For the implementation of this data processing, see run_circlegrid_detect_and_dic_match.m in the GitHub repository.
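As a rough illustration of steps 2 and 3 (this is not the repository implementation, which uses sub-pixel DIC with a second-order shape function; all variable names here are my own), the following MATLAB sketch extracts a patch around each detected dot center and finds its integer-pixel match in the reference speckle image by normalized cross-correlation:

```matlab
% Assumed inputs: speckleBoardImg, refSpeckleImg - grayscale double images
%                 dotCenters                     - N x 2 [u v] dot centers from step 1
half = 30;                          % patch half-size in pixels (assumption)
matches = nan(size(dotCenters));    % matched positions in the reference image
for i = 1:size(dotCenters, 1)
    c = round(dotCenters(i, :));
    patch = speckleBoardImg(c(2)-half:c(2)+half, c(1)-half:c(1)+half);
    % normalized cross-correlation of the patch over the whole reference image
    xc = normxcorr2(patch, refSpeckleImg);
    [~, idx] = max(xc(:));
    [pkRow, pkCol] = ind2sub(size(xc), idx);
    % convert the correlation-peak position to the patch-center position
    matches(i, :) = [pkCol - half, pkRow - half];
end
% A real DIC implementation would refine these integer matches to sub-pixel
% accuracy with a first- or second-order shape function, as recommended above.
```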

Camera calibration

The camera is calibrated using the flood-illuminated calibration board images. The dot coordinates are known in the board coordinate system, and their coordinates in the camera coordinate system are related to them by the pose (rotation and translation) of the board.

The rotation is parameterized as an axis-angle vector and converted to a rotation matrix with the Rodrigues formula. The dots are then mapped through the camera model described above to give predicted pixel coordinates, while the dot extraction algorithm applied to the flood-illuminated board images gives the observed pixel coordinates of the dots. By minimizing the following objective function, the camera principal point, focal length, distortion coefficients, and the board poses are obtained.

Here the outer sum runs over the board poses and the inner sum over the dots, and the optimization parameters comprise the camera parameters and the board poses. The initial values can be obtained with the analytical method proposed by Zhang Zhengyou.
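The objective function in the original post is an image; a standard reprojection-error form consistent with the description (notation mine) is

$$
\mathbf{X}_{ij} = R(\boldsymbol{\theta}_i)\,\mathbf{p}_j + \mathbf{t}_i, \qquad
\hat{\mathbf{u}}_{ij} = \pi\!\left(\mathbf{X}_{ij};\, f_x, f_y, c_x, c_y, k_1, k_2, k_3, p_1, p_2\right),
$$

$$
\min_{\substack{f_x, f_y, c_x, c_y,\, k_1, k_2, k_3, p_1, p_2,\\ \{\boldsymbol{\theta}_i,\, \mathbf{t}_i\}}}
\; \sum_i \sum_j \left\| \mathbf{u}_{ij} - \hat{\mathbf{u}}_{ij} \right\|^2,
$$

where $\mathbf{p}_j$ are the dot coordinates on the board, $R(\boldsymbol{\theta}_i)$ is the Rodrigues rotation and $\mathbf{t}_i$ the translation of pose $i$, $\pi(\cdot)$ is the pinhole-plus-Brown projection given earlier, and $\mathbf{u}_{ij}$ are the detected dot centers.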

Projector and reference plane calibration

[Figure: geometry of projector and reference plane calibration, where the projector ray through a board dot intersects the reference plane and is imaged by the camera]

The projector and the reference plane are calibrated using the speckle-illuminated board images and the reference speckle image. As shown in the figure, a ray from the projector center through a dot on the board intersects the reference plane, and the coordinates of that intersection can be expressed in the camera coordinate system.

After being imaged by the camera, this intersection has predicted pixel coordinates. On the observation side, the dot detection algorithm gives the dot coordinates, and the DIC algorithm matches the speckle patch at each dot against the reference speckle image to obtain the coordinates of the homologous speckle points in the reference image. Substituting the camera parameters calibrated above, the reference plane equation and the projector center coordinates are obtained by minimizing the following objective function.

Here the optimization parameters are the projector center coordinates and the reference plane parameters. The theoretical or roughly measured baseline length and reference plane distance can be used as their respective initial values, and the initial values of the remaining parameters are set to zero.
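The objective in the original post is again an image; a form consistent with the description (my notation, reusing $\mathbf{X}_{ij}$ from the camera stage for the dot positions in camera coordinates and writing $\mathbf{q}_{ij}$ for the DIC-matched points in the reference image) extends the projector ray through each dot to the reference plane and compares its projection with the matched point:

$$
\mathbf{X}^{\mathrm{ref}}_{ij} = \mathbf{T} + \frac{\rho - \mathbf{n}^\top \mathbf{T}}{\mathbf{n}^\top (\mathbf{X}_{ij} - \mathbf{T})}\,(\mathbf{X}_{ij} - \mathbf{T}), \qquad
\min_{\mathbf{T},\, \mathbf{n},\, \rho} \; \sum_i \sum_j \left\| \mathbf{q}_{ij} - \pi\!\left(\mathbf{X}^{\mathrm{ref}}_{ij}\right) \right\|^2.
$$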

Joint calibration

Finally, the camera, projector, and reference plane parameters are jointly optimized to obtain the optimal calibration parameters. The objective function is as follows.

The optimization parameters contain all the parameters of the two previous stages. The author finds that this objective converges well even without exact initial values; in practice, the separate optimization of the projector and reference plane parameters can be skipped and joint optimization can be run directly from the same initial values. For the implementation of joint calibration, see run_joint_optimization.m in the GitHub repository.
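The joint objective in the original post is an image; a form consistent with the two stages above (again my notation) simply sums both residuals and optimizes over all parameters at once:

$$
\min_{\substack{f_x, f_y, c_x, c_y,\, k_1, k_2, k_3, p_1, p_2,\\ \{\boldsymbol{\theta}_i,\, \mathbf{t}_i\},\, \mathbf{T},\, \mathbf{n},\, \rho}}
\; \sum_i \sum_j \left( \left\| \mathbf{u}_{ij} - \hat{\mathbf{u}}_{ij} \right\|^2
 + \left\| \mathbf{q}_{ij} - \pi\!\left(\mathbf{X}^{\mathrm{ref}}_{ij}\right) \right\|^2 \right).
$$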

Speckle image correction

After the calibration parameters are obtained, the reference speckle image and the scene speckle image need to be corrected so that the monocular speckle structured light camera behaves like the ideal system. In that state, speckle stereo matching only needs to search along the row direction, and the disparity can be computed directly with standard block matching or SGM algorithms. The correction proceeds as follows. First, a virtual epipolar-rectified camera coordinate system is established whose origin coincides with the origin of the original camera coordinate system; its rotation relative to the original camera coordinate system is obtained with the Fusiello epipolar rectification method and is expressed as follows.
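The rotation matrix in the original post is an image; in the Fusiello-style construction, with the baseline here running from the camera origin to the projector center $\mathbf{T}$ and $\mathbf{k}$ the original optical-axis direction (my notation), its rows can be written as

$$
\mathbf{r}_1 = \frac{\mathbf{T}}{\|\mathbf{T}\|}, \qquad
\mathbf{r}_2 = \frac{\mathbf{k} \times \mathbf{r}_1}{\|\mathbf{k} \times \mathbf{r}_1\|}, \qquad
\mathbf{r}_3 = \mathbf{r}_1 \times \mathbf{r}_2, \qquad
R_{\mathrm{rect}} = \begin{bmatrix} \mathbf{r}_1^\top \\ \mathbf{r}_2^\top \\ \mathbf{r}_3^\top \end{bmatrix}.
$$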

Given the focal length and principal point of the virtual camera, the scene speckle image can be corrected with a standard undistortion and epipolar rectification algorithm. For reference speckle image correction, a virtual reference plane perpendicular to the virtual camera's optical axis is placed at a given distance. For each pixel of the virtual camera, the point it observes on the virtual reference plane can be expressed in the original camera coordinate system.

The ray from the projector center through this point is intersected with the actual reference plane, and the intersection is projected into the original camera to obtain pixel coordinates. The acquired reference speckle image is then resampled at these coordinates by interpolation to obtain the virtual reference speckle image. Note that the virtual reference plane distance is best chosen close to the actual reference plane distance to preserve more of the overlapping FOV, and when the tilt of the actual reference plane is large, the interpolation should also consider an illumination correction. For the implementation of scene and reference speckle image correction, see CorrectSceneImage.m and CorrectReferenceImage.m in the GitHub repository.
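As a rough illustration of the reference image correction (this is not the repository's CorrectReferenceImage.m; variable names and the 1-based pixel convention are my assumptions), the resampling can be sketched in MATLAB as follows:

```matlab
% Assumed inputs: refImg (acquired reference speckle image), original camera
% intrinsics fx,fy,cx,cy and distortion k1,k2,k3,p1,p2, projector center T (3x1),
% reference plane unit normal n (3x1) and distance rho, rectifying rotation Rrect,
% virtual intrinsics fv,cxv,cyv, virtual reference distance Z0v, image size H x W.
[uG, vG] = meshgrid(1:W, 1:H);

% 1) back-project each virtual pixel onto the virtual reference plane (virtual coords)
Xv = Z0v * (uG - cxv) / fv;
Yv = Z0v * (vG - cyv) / fv;
Zv = Z0v * ones(H, W);
P  = Rrect' * [Xv(:)'; Yv(:)'; Zv(:)'];   % same points in original camera coords

% 2) intersect the projector ray through each point with the actual reference plane
D = P - T;                                 % ray directions from the projector center
s = (rho - n' * T) ./ (n' * D);            % ray parameter at the plane
Q = T + D .* s;                            % intersection points (3 x H*W)

% 3) project into the original camera with Brown distortion
x = Q(1, :) ./ Q(3, :);  y = Q(2, :) ./ Q(3, :);
r2 = x.^2 + y.^2;
rad = 1 + k1*r2 + k2*r2.^2 + k3*r2.^3;
xd = x .* rad + 2*p1*x.*y + p2*(r2 + 2*x.^2);
yd = y .* rad + p1*(r2 + 2*y.^2) + 2*p2*x.*y;
uSrc = reshape(fx*xd + cx, H, W);
vSrc = reshape(fy*yd + cy, H, W);

% 4) resample the acquired reference image to obtain the virtual reference image
virtualRefImg = interp2(double(refImg), uSrc, vSrc, 'linear', 0);
```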

Experimental validation

Both simulation and real experiments are used to verify the proposed calibration method. The code and data have been uploaded to https://github.com/zxcn/MSSLSJointCalibration.git.

Simulation experiments

The simulation data is located in the images/sim folder of the GitHub repository. The dot detection and DIC speckle matching results on the simulation data are as follows:

[Figures: dot detection and DIC matching results on the simulation data]

Simulation values:

[Image: simulated ground-truth parameter values]

Calibration Results:

[Image: calibrated parameter values]

Real experiments

The experiment was carried out with a consumer-grade monocular speckle structured light camera; the real experimental data is in the images/real folder of the GitHub repository. First run run_circlegrid_detect_and_dic_match.m for dot detection and DIC matching, then run run_joint_optimization.m for calibration.

The comparison between the design parameters and the calibration results is shown in the figure below. The calibration results are close to the design values and effectively characterize the systematic deviations.

[Figure: comparison of design parameters and calibration results]

Finally, run_depth_and_pointcloud_calc.m is run to correct the reference speckle image and the scene speckle image, compute the disparity, and convert it to depth and a point cloud. The figure below shows the epipolar rectification result and the disparity computed for a plaster statue scene. Zooming into the marked region shows that the epipolar rectification is highly accurate, and the resulting disparity map is also very complete.
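As an illustration of the depth and point cloud conversion in the rectified state (assumed variable names, using the ideal triangulation relation given earlier; this is not the repository's run_depth_and_pointcloud_calc.m, and the disparity sign convention may differ), a MATLAB sketch might look like this:

```matlab
% Assumed inputs: dispMap (disparity map in pixels, relative to the virtual
% reference image), fv, cxv, cyv (virtual camera intrinsics), b (baseline),
% Z0v (virtual reference plane distance).
[H, W] = size(dispMap);
[uG, vG] = meshgrid(1:W, 1:H);

% depth from the ideal triangulation relation d = f*b*(1/Z - 1/Z0)
Z = (fv * b * Z0v) ./ (fv * b + dispMap * Z0v);
Z(fv * b + dispMap * Z0v <= 0) = NaN;     % discard invalid disparities (assumption)

% back-project each pixel to a 3D point in the virtual camera frame
X = (uG - cxv) .* Z / fv;
Y = (vG - cyv) .* Z / fv;
pointCloud = cat(3, X, Y, Z);             % H x W x 3 organized point cloud
```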

[Figure: epipolar rectification result and disparity map of the plaster statue scene]

The figure below compares the point cloud output by the camera (left) with the point cloud computed from the above calibration results (right). Although the different stereo matching algorithms lead to different levels of point cloud detail, the overall results are similar, which demonstrates the effectiveness and correctness of the calibration method.

[Figure: point cloud output by the camera (left) vs. point cloud computed with the proposed calibration (right)]

Outlook

  1. Extending the calibration to reference planes at multiple distances would further improve accuracy, and fusing multi-distance reference speckle images would yield a larger depth FOV.
  2. The method can also be applied to monocular fringe structured light calibration; the advantage is that the projected fringe phase does not need to be known in advance, which suits fringe projection devices without a pixelated structure.

This article is for academic sharing only. If there is any infringement, please contact us to delete it.
