Vision robots lack flexibility: what are the solutions for container reclaim systems?

Author: Qingyue Night Talk
Text/Qingyue Night Talk

Editor/Qingyue Night Talk

Vision and robotic systems have become commonplace in industry in recent years, but despite many achievements, numerous industrial tasks remain unsolved because vision systems lack the flexibility to cope with highly adaptable manufacturing environments.

In a wide range of modern flexible manufacturing environments, an important task is supplying parts from containers to automated machinery. To grip and manipulate parts safely and efficiently, the system must know the identity, location, and spatial orientation of the unstructured objects lying in the container.

Historically, mechanical vibratory feeders were used to solve the container reclaim problem, but they offer no visual feedback. This solution is problematic when parts jam and, more importantly, because such feeders are highly specialized.

In this regard, if the manufacturing process changes, changeovers may require extensive retooling and across-the-board revision of the system's control strategies. Because of these disadvantages, modern container reclaim systems use visual feedback to perform gripping and manipulation operations.

Vision-based robotic container reclaim has been a subject of research since automated vision-controlled processes were introduced in industry, and a review of existing systems shows that none of the proposed solutions solves this classic vision problem in full generality.

One of the main challenges facing container reclaim systems is handling overlapping objects. Object recognition in cluttered scenes is the primary goal of these systems. Earlier methods attempted to perform container reclaim operations on similar objects mixed together in an unstructured heap, but had no understanding of the pose or geometry of the parts.

While these assumptions may be acceptable for a limited number of applications, in most practical situations a flexible system must handle multiple types of objects with a wide range of shapes.

A flexible container reclaim system must solve three difficult problems: scene interpretation, object recognition, and pose estimation. Initial approaches to these tasks were based on modeling parts with 2D surface representations. Typical 2D representations include invariant shape descriptors, algebraic curves, conic sections, and appearance-based models.

These systems are generally better suited to recognizing flat objects and cannot handle severe viewpoint distortion or objects with complex shapes and textures. In addition, the spatial orientation of free-form objects cannot be estimated robustly. To overcome this limitation, most container reclaim systems use three-dimensional information to identify scene objects and estimate their spatial orientation.
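To make the idea of an invariant 2D shape descriptor concrete, here is a minimal sketch (an illustration of the descriptor family, not code from the system described here) that computes Hu moments for a part silhouette with OpenCV; the file name is hypothetical:

```python
import cv2
import numpy as np

# Load a binary silhouette of a part (hypothetical file name).
mask = cv2.imread("part_silhouette.png", cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

# Hu moments are invariant to translation, rotation, and uniform scale,
# which is why descriptors of this family suit flat, well-segmented parts.
hu = cv2.HuMoments(cv2.moments(mask)).flatten()

# Log-scale the values, since raw Hu moments span many orders of magnitude.
hu_log = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
print(hu_log)
```

Descriptors of this kind work well on isolated flat parts, but, as noted above, they degrade quickly under perspective distortion and occlusion.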

Notable methods include 3D local descriptors, polyhedra, generalized cylinders, hyperellipsoids, and visual learning methods. Mittrapiyanuruk et al. note that the most difficult problem faced by 3D container reclaim systems based on structural object descriptions is the complex process required to match scene features to model features.

This process is often based on sophisticated graph-search techniques and becomes increasingly difficult under object occlusion, when the structural description of scene objects is incomplete. Visual learning methods based on feature-space analysis have been proposed as an alternative for recognizing and estimating the pose of objects with complex appearance.

In this regard, Johnson and Hebert developed an object recognition scheme capable of identifying multiple 3D objects in scenes affected by clutter and occlusion. They propose a feature-space matching method applied to surface points using the spin image representation.

The main advantage of this approach is the use of spin images as local surface descriptors; as a result, objects can be identified even in real scenes containing clutter and occlusion. The method returns accurate recognition results but cannot infer the full pose, because the spin image is a local descriptor and cannot robustly capture the orientation of the object.
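As a rough illustration of the spin image idea (a sketch of the general technique, not Johnson and Hebert's implementation), each oriented surface point maps its neighbours into a 2D histogram of radial and axial distances:

```python
import numpy as np

def spin_image(points, p, n, bins=16, size=0.1):
    """Accumulate a spin image for the oriented point (p, n).

    points: (N, 3) surface points; p: (3,) basis point; n: (3,) unit normal.
    beta is the signed height along the normal, alpha the radial distance
    from the normal axis; bins and size control histogram resolution.
    """
    d = points - p
    beta = d @ n                                   # height along the normal
    alpha = np.sqrt(np.maximum(np.einsum("ij,ij->i", d, d) - beta**2, 0.0))
    hist, _, _ = np.histogram2d(alpha, beta,
                                bins=bins,
                                range=[[0, size], [-size, size]])
    return hist

# Toy usage: a random patch around the origin, normal along +z.
pts = np.random.randn(500, 3) * 0.02
img = spin_image(pts, p=np.zeros(3), n=np.array([0.0, 0.0, 1.0]))
```

Because alpha and beta are measured relative to the point's own normal, the histogram is invariant to rotations about that normal; this is exactly what makes the descriptor robust to clutter and, at the same time, unable to encode the object's global orientation.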

Normally, pose sampling is a difficult problem for visual learning methods, because the number of viewpoints required to sample all six degrees of freedom of an object's pose is prohibitive. In Edwards' paper, feature-space analysis was applied to single-object scenes, and his method could estimate the pose only when the tilt angle was limited to 30 degrees relative to the sensor's optical axis.

System overview

The range sensor determines the depth structure of the scene by capturing two images with different focus settings. An image segmentation process then breaks the input image into meaningful, non-intersecting regions. The scene regions produced by segmentation are orthogonally projected so that they lie perpendicular to the optical axis of the sensor.

This operation fixes two degrees of freedom for each object. The recognition framework matches geometric primitives derived from the segmented regions against primitives in the model database. The object that best satisfies the matching criteria is then submitted to a pose estimation algorithm that recovers the remaining rotation about the optical axis of the range sensor using principal component analysis. Once the pose of the object is estimated, the gripping coordinates of the identified object are passed to the container reclaim robot.
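Putting the stages together, one processing cycle might be outlined as follows; this is a schematic sketch, and every function name in it is a hypothetical placeholder for a stage described in the sections below, not the authors' code:

```python
# Hypothetical stage functions; each stage is detailed in later sections.
def estimate_depth_from_defocus(near, far): ...
def segment_scene(image, depth): ...
def orthogonal_project(region, depth): ...
def match_to_database(region, db): ...
def estimate_rotation_pca(region, db): ...

def bin_picking_cycle(sensor, db, robot):
    near, far = sensor.capture_focus_pair()      # two focus settings
    depth = estimate_depth_from_defocus(near, far)
    regions = segment_scene(near, depth)         # disjoint surface regions
    aligned = [orthogonal_project(r, depth) for r in regions]  # fixes 2 DOF
    scored = [(match_to_database(r, db), r) for r in aligned]
    score, best = min(scored, key=lambda t: t[0])
    angle = estimate_rotation_pca(best, db)      # rotation about optical axis
    robot.grip(best, angle)                      # hand off grip coordinates
```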

Range sensor

The range sensor used in this application is based on active depth-from-defocus (DFD) measurement. This ranging technique was originally developed by Pentland as a passive strategy. The principle of DFD sensing is that the sharpness with which a scene point is imaged depends on its position in space.

Thus, an object placed on the focal plane is imaged sharply on the camera's sensing element, while a point offset from the focal plane is spread by the lens into a patch whose size is directly related to its distance from the focal plane.

The diameter of the defocus patch depends on the object distance u, the lens aperture D, the sensor distance s, and the focal length f. A single image cannot resolve whether a scene point lies in front of or behind the focal plane, but depth can be estimated uniquely by measuring the blur difference between two images captured at different focus settings; the second, defocused image is captured by changing the sensor distance s.
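Under the standard thin-lens model (the textbook relation behind Pentland's derivation, written out here as a sketch rather than taken from this system), the patch diameter follows directly from those four quantities:

```python
def blur_diameter(u, D, s, f):
    """Diameter of the defocus patch under the thin-lens model.

    u: object distance, D: lens aperture, s: sensor distance,
    f: focal length (all in the same units, e.g. millimetres).
    From 1/f = 1/u + 1/v, a point at distance u focuses at
    v = 1 / (1/f - 1/u); a sensor at distance s intercepts a light cone
    of diameter d = D * s * |1/f - 1/u - 1/s|, which vanishes when s = v.
    """
    return D * s * abs(1.0 / f - 1.0 / u - 1.0 / s)

# Example: 25 mm lens, f/2 aperture (D = 12.5 mm), sensor 26 mm behind
# the lens, object at 600 mm (illustrative numbers only).
print(blur_diameter(u=600.0, D=12.5, s=26.0, f=25.0))
```

The absolute value in the formula is precisely the front/behind ambiguity mentioned above: one image cannot tell which side of the focal plane the point lies on, which is why two focus settings are needed.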

Since blur can be regarded as convolution with a low-pass filter, estimating the degree of blur in an image requires convolving it with a focus operator that extracts the high-frequency content contributed by scene objects. However, this method returns accurate results only when the scene objects are highly textured.

It returns inaccurate depth estimates when dealing with weakly textured and untextured scene objects. One solution to this problem is to project structured light onto the scene, forcing an artificial texture onto all visible surfaces. Because the artificial texture has a known pattern, the focus operator can be designed to respond strongly to the dominant image frequencies associated with the illumination pattern.
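A minimal sketch of the focus-operator idea (a generic Laplacian operator, assuming SciPy; not the operator used by the authors): convolve each image with a Laplacian and compare the local high-frequency energy at the two focus settings:

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def focus_measure(image, window=9):
    """Local high-frequency energy: squared Laplacian response averaged
    over a small window. Blur acts as a low-pass filter, so stronger
    defocus yields a weaker response."""
    response = laplace(image.astype(np.float64))
    return uniform_filter(response**2, size=window)

def blur_ratio(near_img, far_img, window=9, eps=1e-9):
    """Per-pixel ratio of focus measures between the two focus settings;
    a DFD scheme maps this ratio to depth. The measure is reliable only
    where the scene (or a projected pattern) provides texture."""
    m1 = focus_measure(near_img, window)
    m2 = focus_measure(far_img, window)
    return m1 / (m2 + eps)
```

On a textureless surface both responses drop toward zero and the ratio becomes meaningless, which is exactly the failure mode the structured-light projection is meant to remove.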

The scene segmentation process

When developing robotic systems, an important decision is which kind of perceptual information best suits a particular application. Henderson suggests basing scene segmentation on information about the objects that define the scene. In this regard, if the scene objects are highly textured and show significant depth discontinuities, the best results are obtained by analyzing range data.

Conversely, if the scene is defined by small, untextured objects, better results may be obtained by applying the segmentation process to the intensity image. Since our application involves recognizing a set of untextured polyhedral objects, the researchers developed an edge-based segmentation scheme to identify the visible surfaces of scene objects.

Edges correspond to sharp transitions in the pixel intensity distribution and are extracted by computing partial derivatives of the input data. Edge detection is one of the most studied topics in computer vision, yet to date no edge detector fully overcomes the problems caused by image noise and by low contrast between meaningful regions of the input data.

As a result, the edge structure returned by an edge detector is either incomplete, with gaps where the input data varies too little, or contains spurious edges caused by image noise, shadows, and so on. Therefore, additional post-processing is applied after edge detection to eliminate spurious edge responses and fill the gaps in the edge structure.

Methods used to fill gaps in edge structures include morphological methods, Hough transforms, probabilistic relaxation techniques, multiscale edge detection methods, and the use of additional information such as color.

Of these techniques, the most common are morphological and multiscale edge-linking strategies. In general, morphological edge-linking techniques use local information around edge terminals, while multiscale methods attempt to fill gaps in the edge structure by aggregating information contained in image stacks at different spatial resolutions.
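As a rough example of the morphological style of gap filling (a generic closing operation with OpenCV, simpler than the orientation-aware scheme described below; the input file name is hypothetical):

```python
import cv2

# edges: binary edge map produced by any detector (hypothetical input).
edges = cv2.imread("edges.png", cv2.IMREAD_GRAYSCALE)

# A morphological closing (dilation followed by erosion) bridges gaps
# narrower than the structuring element.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
```

The drawback is that a blind closing can also fuse nearby edges that are genuinely separate, which is why the scheme described next reasons about the orientation of edge terminals before linking them.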

The main disadvantage of the multiscale approach is the high computational cost of building the image stack. In this implementation, the researchers developed a morphological edge-linking scheme that evaluates the orientation of edge terminals to identify the best linking decision.

Edge linking

To extract the surfaces of imaged scene objects, we developed a multi-step edge-linking scheme for use with an edge detector that extracts the partial derivatives using the ISEF (infinite symmetric exponential filter) function.

An ISEF-based detector was chosen because its performance in detecting true edges is comparable to that of the more common Canny edge detector, while its computational cost is lower.
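To show why the computational cost is low, here is a sketch of ISEF smoothing (one common recursive formulation of the symmetric exponential filter; the exact parameterization used by the authors is not specified, so treat the details as assumptions):

```python
import numpy as np

def isef_1d(x, b):
    """Symmetric exponential smoothing, h[n] proportional to b**|n|
    (0 < b < 1), realised as one causal and one anti-causal first-order
    recursion; the (1-b)/(1+b) factor normalises the filter to unit gain."""
    f = np.copy(x).astype(np.float64)
    g = np.copy(x).astype(np.float64)
    for i in range(1, len(x)):                 # causal pass
        f[i] += b * f[i - 1]
    for i in range(len(x) - 2, -1, -1):        # anti-causal pass
        g[i] += b * g[i + 1]
    return (f + g - x) * (1.0 - b) / (1.0 + b)

def isef_2d(img, b=0.45):
    """Separable 2D ISEF smoothing; b plays the role of the scale
    parameter mentioned in the text."""
    sm = np.apply_along_axis(isef_1d, 0, img.astype(np.float64), b)
    return np.apply_along_axis(isef_1d, 1, sm, b)

# Shen-Castan localises edges at zero crossings of (smoothed - original),
# a band-limited Laplacian; hysteresis thresholding then prunes weak ones.
```

Each recursive pass touches every pixel once regardless of the filter's effective width, whereas a direct Gaussian convolution, as used in a straightforward Canny implementation, grows more expensive as the kernel widens.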

In the implementation, the researchers set the scale parameter to 0.45 and selected the thresholds required for hysteresis thresholding using a scheme that minimizes the occurrence of the small edge segments usually generated by image noise. As mentioned earlier, the edge structure returned by the ISEF detector is further post-processed using a multi-step morphological edge-linking strategy.

The first step of the edge-linking algorithm extracts the edge terminals. This requires only a simple morphological analysis in which the edge structure is convolved with a set of 3×3 masks. The second step determines the direction of each edge terminal by evaluating the linked edge points that lead into it.
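Extracting terminals with 3×3 masks amounts to neighbour counting on the binary edge map; a compact equivalent sketch (assuming SciPy, not the authors' exact mask set):

```python
import numpy as np
from scipy.ndimage import convolve

def edge_terminals(edges):
    """An edge terminal is an edge pixel with exactly one edge pixel
    among its 8 neighbours; counting neighbours with one 3x3 convolution
    is equivalent to testing the full set of 3x3 endpoint masks."""
    e = (edges > 0).astype(np.uint8)
    kernel = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]], dtype=np.uint8)
    neighbours = convolve(e, kernel, mode="constant", cval=0)
    return (e == 1) & (neighbours == 1)
```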

Data formatting

The researchers' application implements a vision sensor that provides the gripping robot with the information it needs to manipulate objects. Since the objects of interest are polyhedra, a convenient representation describes them in terms of their surfaces, which are identified by the scene segmentation algorithm detailed in the previous section.

The object recognition task can therefore be expressed as matching the visible surfaces of an object against the surfaces stored in the model database. Although conceptually simple, this approach is quite difficult in practice, because the geometric properties of an object's surface depend on the viewpoint.

To solve this problem, all visible surfaces produced by the scene segmentation process are aligned with a plane perpendicular to the optical axis of the range sensor. This constrains two degrees of freedom using the 3D information returned by the sensor.

The first operation of the data formatting process is to calculate the normal of each surface obtained from scene segmentation. Since the object surfaces are planar, the normal vector can be calculated from the functional dependence of the z coordinate on the x and y coordinates.
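A minimal sketch of both steps under those assumptions (a plain least-squares plane fit and a Rodrigues rotation; the actual implementation is not specified in the text):

```python
import numpy as np

def plane_normal(points):
    """Least-squares fit of z = a*x + b*y + c to an (N, 3) patch of range
    points; the (unnormalised) normal of that plane is (-a, -b, 1)."""
    A = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    (a, b, _), *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    n = np.array([-a, -b, 1.0])
    return n / np.linalg.norm(n)

def align_to_axis(points, n, z=np.array([0.0, 0.0, 1.0])):
    """Rodrigues rotation taking the surface normal n onto the optical
    axis z; applying it to the patch removes two rotational DOF."""
    v = np.cross(n, z)
    s, c = np.linalg.norm(v), np.dot(n, z)
    if s < 1e-12:                      # already aligned
        return points
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    R = np.eye(3) + K + K @ K * ((1 - c) / s**2)
    return points @ R.T
```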

Object recognition

As mentioned earlier, the recognition of scene objects is expressed as the recognition of their visible surfaces after the scene segmentation process has been applied, using features computed from the geometric properties of the sampled object surfaces.

Because the geometric properties of an object's surface depend on its position in space, the segmented surfaces are first submitted to the 3D data formatting process, which aligns them with a plane perpendicular to the optical axis and thereby eliminates viewpoint distortion. The next step of the algorithm extracts the geometric primitives used to perform the scene-to-model recognition process.

Methods used for this purpose include extracting local features such as intersections, lines, and partial contours, as well as macroscopic features such as area, perimeter, and statistical descriptors. Local features may be more appropriate than macroscopic features when dealing with scenes affected by clutter and occlusion.

However, it is worth noting that local-feature approaches rely on detailed structural descriptions of the objects of interest; these generate a large number of hypotheses when processing complex scenes, which in turn requires the development of complex scene-to-model matching procedures.

Since our goal is to identify a set of polyhedral objects, macroscopic features are the better option, because the segmented surfaces are flat and can easily be indexed to describe the structure of the object. To this end, the researchers used features such as area, perimeter, form factor, and the maximum and minimum radii from the center of gravity of the surface to its boundary.

The object recognition algorithm consists of two main stages. The training stage builds a database entry for each object surface by extracting the features listed above. Since these features have different ranges, a normalization step maps each feature to zero mean and unit variance. The matching stage computes the Euclidean distance between the normalized features of a scene surface and those of the object surfaces in the model database.
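Both stages reduce to a few lines of linear algebra; a minimal sketch of the stated scheme (the variable names are illustrative, not the authors' code):

```python
import numpy as np

def build_database(feature_rows):
    """Training: stack per-surface feature vectors (area, perimeter,
    form factor, ...) and normalise each feature to zero mean and unit
    variance; the statistics are kept for use at matching time."""
    F = np.asarray(feature_rows, dtype=np.float64)
    mu, sigma = F.mean(axis=0), F.std(axis=0) + 1e-12
    return (F - mu) / sigma, mu, sigma

def match_surface(features, db, mu, sigma):
    """Matching: normalise the scene surface with the training
    statistics, then return the index and distance of the database
    entry at minimum Euclidean distance."""
    q = (np.asarray(features, dtype=np.float64) - mu) / sigma
    dists = np.linalg.norm(db - q, axis=1)
    return int(np.argmin(dists)), float(dists.min())
```

Normalising with the training statistics matters: without it, a large-range feature such as area would dominate the Euclidean distance and effectively mask the others.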

Pose estimation

The orthogonal transformation constrains only two degrees of freedom: the rotations around the x and y axes. After this transformation the surface lies perpendicular to the optical axis of the range sensor, and its remaining rotation about the z axis can be estimated using principal component analysis. The process involves computing a feature-space representation from a set of training images generated by rotating the object surface in small increments.

To estimate the rotation about the z axis, each identified scene surface is projected onto the feature space and its projection is compared with those stored in the model database. The minimum distance between the projection of the input surface and a projection contained in the model database gives the best match.
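A schematic sketch of this eigenspace procedure (assuming scikit-learn's PCA; the component count and array names are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# train_imgs: (M, H, W) images of a surface rotated in small increments;
# train_angles: (M,) rotation about the optical axis for each image.
def fit_eigenspace(train_imgs, n_components=8):
    X = train_imgs.reshape(len(train_imgs), -1).astype(np.float64)
    pca = PCA(n_components=n_components).fit(X)
    return pca, pca.transform(X)          # keep the training projections

def estimate_rotation(pca, train_proj, train_angles, surface_img):
    """Project the identified scene surface into the eigenspace and
    return the angle of the nearest training projection."""
    q = pca.transform(surface_img.reshape(1, -1).astype(np.float64))
    k = np.argmin(np.linalg.norm(train_proj - q, axis=1))
    return train_angles[k]
```

The angular resolution of the estimate is bounded by the rotation increment used to generate the training images, which is the sampling trade-off of visual learning methods noted earlier.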
