Research on efficient matching method of coal gangue recognition image and sorting image

By Paul E. | October 26, 2024

This section introduces related work on the application of image matching in sorting scenarios.

With the widespread application of machine vision in various industrial scenarios, scholars have also made progress in research on image matching methods for sorting systems. Yang et al. proposed an improved edge-feature template matching algorithm for the rapid sorting of multi-objective workpieces, which achieves high real-time performance when combined with a pyramid layering strategy8. Yao et al. proposed a fast part matching method based on Hu invariant moment features and improved Harris corner points to address the long matching time and low precision of part image matching; this method improves matching speed, but the Harris corner detection operator performs well only at a fixed image scale9. Wang et al. proposed an improved ORB image matching algorithm for the fast recognition and sorting of workpieces with complex surface textures10. It replaces the BRIEF descriptor with the SURF descriptor, enhancing robustness to lighting and image scale changes and improving the matching accuracy of workpiece images under scale changes.

The above traditional image matching methods based on SIFT11,12, SURF13,14,15, BRIEF16,17, and ORB18,19,20 all rely on manually designed descriptors. Although they are effective to a degree, they still fall short in the on-site environment of an intelligent coal gangue sorting robot. For example, for coal gangue images under complex lighting and scale changes, these methods cannot extract stable feature points or vectors. Moreover, template-based matching is relatively inefficient, whereas deep learning can extract high-level semantic information from images.

To address the above problems, DeTone et al. proposed the SuperPoint network in 2018, an end-to-end learning approach that takes images as input and outputs feature points and their descriptors21. SuperPoint is not only accurate but also simple enough in structure to extract key points in real time. In the feature-matching stage, however, traditional methods often rely on minimizing or maximizing specific hand-crafted metrics, such as squared differences and correlations. The typical Fast Library for Approximate Nearest Neighbors (FLANN) matching method has poor robustness when matching image pairs that lack texture. SuperGlue, a graph neural network feature matching method, provided a new approach to the matching problem22,23,24,25. SuperGlue is a mid-level matching network that must be combined with a feature extraction network to fully leverage its performance.
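For reference, the classical baseline that the above discussion contrasts with can be written in a few lines with OpenCV. This is only an illustrative sketch (the file names are placeholders), using SIFT features and a FLANN matcher with Lowe's ratio test:

```python
import cv2

img1 = cv2.imread("recognition.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
img2 = cv2.imread("sorting.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# KD-tree index (algorithm=1) suits SIFT's float descriptors.
flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
good = []
for pair in flann.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
        good.append(pair[0])  # Lowe's ratio test
print(f"{len(good)} matches passed the ratio test")
```

On low-texture coal and gangue surfaces, few matches survive the ratio test, which is exactly the weakness noted above.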

The SuperPoint network extracts highly robust image feature points, while the SuperGlue network is robust to lighting changes, blurring, and scenes lacking texture. We combine these two networks to achieve a significant improvement in the matching accuracy of complex images, and we apply this combination to robotic coal gangue sorting in complex scenarios.

Proposed method

Motivation

We conducted research on coal gangue image matching under complex conditions by combining the SuperPoint and SuperGlue networks. During this research, and based on the actual working conditions of coal gangue sorting, we found that further optimization is possible in the following two aspects.

(1) Due to differences in object distance, scale, and rotation angle among sorting images acquired in different regions, it is difficult to detect enough key points for matching. To improve feature point detection, the sorting images are enhanced using the improved MSRCR algorithm.

(2) The coal gangue sorting process is complex. Coal slag, water stains, and other contaminants on the belt can seriously interfere with the matching results, reducing the number of matching points and producing incorrect matching pairs. Optimizing the matching results with PROSAC therefore improves matching accuracy.

Therefore, to solve the problem of image matching of coal gangue under complex conditions, we propose a more robust image matching network called IMSSP-Net.

Model construction

The coal gangue sorting robot obtains the recognition result of the target coal gangue from the recognition system as the recognition image. After the coal gangue enters the sorting area, a camera captures the current scene as the sorting image. Figure 1 illustrates the overall structure of our network. First, the improved MSRCR algorithm enhances the two images separately. Then, the SuperPoint algorithm extracts feature points and their descriptors from the recognition image and the sorting image. Next, these are fed into the SuperGlue algorithm to perform feature matching between the images. Finally, the obtained matching points are filtered with PROSAC to obtain the best matching point pairs, and the precise location of the target coal gangue is framed by the minimum bounding rectangle of the matching results.
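The pipeline just described can be summarized in a short sketch. The stage callables here (enhance, detect, match, filter_matches) are hypothetical placeholders for the improved MSRCR, SuperPoint, SuperGlue, and PROSAC components detailed in the following subsections:

```python
import cv2
import numpy as np

def match_gangue(recognition_img, sorting_img, enhance, detect, match, filter_matches):
    # 1. Enhance both images with the improved MSRCR algorithm (Eqs. (1)-(4)).
    rec, sort_ = enhance(recognition_img), enhance(sorting_img)
    # 2. Extract keypoints and descriptors with SuperPoint.
    kp1, des1 = detect(rec)
    kp2, des2 = detect(sort_)
    # 3. Match descriptors with SuperGlue (attention GNN + Sinkhorn).
    matches = match(kp1, des1, kp2, des2)      # list of (i, j) index pairs
    # 4. Filter mismatched pairs with PROSAC.
    inliers = filter_matches(matches, kp1, kp2)
    # 5. Frame the target with the minimum bounding rectangle of the
    #    matched points in the sorting image.
    pts = np.float32([kp2[j] for _, j in inliers])
    return cv2.minAreaRect(pts)                # ((cx, cy), (w, h), angle)
```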

Fig. 1

Overview of the IMSSP-Net method. Images 1 and 2 represent the original recognition image and the original sorting image, respectively. Iretinex is the image enhanced by MSRCR. Equation (1) normalizes Iretinex and outputs Iout1. Equation (2) applies the power-law transformation to Iretinex and outputs Iretinexγ. Normalizing Iretinexγ with Eq. (1) gives Iout2. Equation (4) performs the brightness adjustment.

Improved MSRCR algorithm and feature extraction

To improve the network's ability to detect feature points in coal gangue images, we use the improved MSRCR for image enhancement and then apply the SuperPoint algorithm to detect image features.

Improved MSRCR algorithm

MSRCR is a commonly used image enhancement technique that enhances the details and contrast of images while preserving edge information. Its principle is to decompose the image into different scales, apply local contrast adjustment and enhancement, and finally reconstruct the enhanced image. A normalization formula is employed for intensity mapping: it normalizes the Retinex components to scale the intensity range of the image, achieving overall adjustment of image contrast and brightness. The relationship is expressed as follows:

$$I_{\text{out1}} = \frac{I_{\text{retinex}} - \min (I_{\text{retinex}})}{\max (I_{\text{retinex}}) - \min (I_{\text{retinex}})} \times (H_{\text{clip}} - L_{\text{clip}}) + L_{\text{clip}}$$

(1)

In the equation, Iout1 and Iretinex represent the pixel values of the output image and of the image enhanced by the MSRCR algorithm, respectively. max(Iretinex) and min(Iretinex) are the maximum and minimum pixel values of the MSRCR-enhanced image, and Lclip and Hclip indicate the lower and upper limits of the grayscale levels.
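To make Eq. (1) concrete, a minimal NumPy version might look as follows; the small epsilon guarding against a constant image is our addition:

```python
import numpy as np

def normalize_retinex(I_retinex, L_clip=0.0, H_clip=255.0):
    """Eq. (1): min-max normalize the MSRCR output into [L_clip, H_clip]."""
    I = np.asarray(I_retinex, dtype=np.float64)
    lo, hi = I.min(), I.max()
    return (I - lo) / (hi - lo + 1e-12) * (H_clip - L_clip) + L_clip
```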

Power-law transformation is a non-linear image enhancement method that alters the pixel value distribution in order to highlight details and features in the image26. It is based on a simple mathematical principle: original pixel values are mapped through an exponential function to a new pixel value range, thereby changing the shape of the original pixel value distribution. The relationship is expressed as follows:

$$I_{\text{retinex}}^{\gamma} = \left( I_{\text{retinex}} \right)^{\gamma}$$

(2)

In the equation, Iretinex represents the image enhanced by MSRCR, and Iretinexγ is the result of its power-law transformation. γ is the exponent of the power-law transformation. When γ > 1, the pixel values of the output image become more concentrated, emphasizing details. When 0 < γ < 1, the pixel values become more dispersed, resulting in smoother details. When γ = 1, the power-law transformation has no effect.

Integrating the power-law transformation of Eq. (2) into the normalization formula of MSRCR, Eq. (1), the improved MSRCR algorithm is as follows:

$$I_{\text{out2}} = \frac{I_{\text{retinex}}^{\gamma} - \min (I_{\text{retinex}}^{\gamma})}{\max (I_{\text{retinex}}^{\gamma}) - \min (I_{\text{retinex}}^{\gamma})} \times (H_{\text{clip}} - L_{\text{clip}}) + L_{\text{clip}}$$

(3)

Equation (4) is then used to adjust the brightness level range of the output image.

$$I_{\text{out}} = \left( I_{\text{out2}} \right)^{\beta}, \quad \beta = \frac{I_{\text{out1}}}{\max (I_{\text{out1}})}$$

(4)

In the equation, β represents the enhancement factor, used to adjust the brightness level range of the output image.

The IMSRCR (improved MSRCR) and MSRCR methods were used to enhance the recognition and sorting images respectively, and the results are shown in Fig. 2. Hclip and Lclip in Eq. (1) are written in general form; in the experiments, Hclip and Lclip are set to 255 and 0, respectively.
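Putting Eqs. (1)–(4) together, the improved MSRCR post-processing can be sketched as below, reusing the normalize_retinex helper from Eq. (1). The value of γ and the clipping of negative Retinex outputs are our assumptions, since the paper does not list them at this point:

```python
import numpy as np

def improved_msrcr_postprocess(I_retinex, gamma=1.5, L_clip=0.0, H_clip=255.0):
    # Guard (our addition): a power-law of negative values is undefined.
    I = np.clip(np.asarray(I_retinex, dtype=np.float64), 0.0, None)
    I_out1 = normalize_retinex(I, L_clip, H_clip)            # Eq. (1)
    I_out2 = normalize_retinex(I ** gamma, L_clip, H_clip)   # Eqs. (2)-(3)
    beta = I_out1 / (I_out1.max() + 1e-12)                   # Eq. (4): enhancement factor
    I_out = I_out2 ** beta                                   # Eq. (4): brightness adjustment
    return np.clip(I_out, 0, 255).astype(np.uint8)
```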

Fig. 2

The visualization of image enhancement. (a) The original recognition image. (b) The MSRCR-enhanced recognition image. (c) The IMSRCR-enhanced recognition image. (d) The original sorting image. (e) The MSRCR-enhanced sorting image. (f) The IMSRCR-enhanced sorting image.

To evaluate the quality of the enhanced images, this article uses two widely used image quality evaluation metrics: Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM). The results are shown in Table 1.

Table 1 The comparison of image enhancement.

Compared with the MSRCR method, the improved method yields significantly higher PSNR and SSIM. The IMSRCR image has a more balanced global grayscale, clearer contrast, and richer details.
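Both metrics are available in scikit-image, so the comparison in Table 1 can be reproduced along these lines (inputs are same-shape grayscale uint8 arrays):

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_enhancement(reference, enhanced):
    """Returns (PSNR in dB, SSIM) for two same-shape uint8 images."""
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced, data_range=255)
    return psnr, ssim
```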

Feature extraction

The SuperPoint algorithm differs from traditional feature detection algorithms in that it simultaneously detects image key points and extracts descriptors27. It uses a self-supervised convolutional neural network model implemented with an encoder-decoder architecture. In the SuperPoint algorithm, the input image first passes through a VGG-style encoder to obtain a feature map. The output feature map is then decoded by two branches, one for key point detection and the other for descriptor extraction. The convolutional layers in the SuperPoint network are shown in Table 2.

Table 2 The convolutional layer in SuperPoint network.

The key point detection branch adopts a sliding-window approach to compute, pixel by pixel on the feature map, the probability of being a key point. Each window is typically 8 × 8 pixels; every pixel is evaluated as a key point candidate, and a probability map of the key points is generated. This probability map is a 3D tensor of size H/8 × W/8 × 65, where H and W are the height and width of the image, and each cell holds the key point probabilities of an 8 × 8 area plus one "no key point" channel. The final key point positions are determined through threshold processing and connectivity analysis. Its structure is shown in Fig. 3.

Fig. 3
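A sketch of that decoding step in PyTorch, assuming `semi` is the raw head output of shape (1, 65, H/8, W/8): the tensor is softmaxed, the "no key point" channel is dropped, and the remaining 64 channels are unpacked into a full-resolution probability map before thresholding (the threshold value here is an assumption):

```python
import torch
import torch.nn.functional as F

def keypoint_heatmap(semi, conf_thresh=0.015):
    prob = F.softmax(semi, dim=1)[:, :-1]           # drop the "no key point" channel -> (1, 64, H/8, W/8)
    heat = F.pixel_shuffle(prob, upscale_factor=8)  # unpack each 8x8 cell -> (1, 1, H, W)
    heat = heat[0, 0]
    keypoints = torch.nonzero(heat > conf_thresh)   # (num_kp, 2) row/col positions
    return heat, keypoints
```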

The descriptor extraction branch is responsible for extracting a descriptor for each key point from the feature map. This branch converts the feature map into a three-dimensional tensor of size H/8 × W/8 × D, where D is the dimension of the descriptor, extracting local features through a series of convolution and pooling operations. To obtain the final descriptors, the coarse map is interpolated and normalized, transforming it into a three-dimensional tensor of size H × W × D that holds the descriptor of each pixel. Its structure is shown in Fig. 4.

Fig. 4
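The interpolation and normalization mentioned above might be sketched as follows, with `coarse_desc` assumed to have shape (1, D, H/8, W/8):

```python
import torch.nn.functional as F

def dense_descriptors(coarse_desc, H, W):
    desc = F.interpolate(coarse_desc, size=(H, W),
                         mode="bicubic", align_corners=False)
    return F.normalize(desc, p=2, dim=1)  # per-pixel unit-length descriptors, (1, D, H, W)
```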

The SuperPoint network was used to extract feature points from the original images and from the MSRCR- and improved-MSRCR-enhanced images, as shown in Fig. 5. The colored dots in Fig. 5 represent the detected feature points. By comparison, the improved MSRCR method further enriches the color information of the image and extracts significantly more feature points than the other two methods, which benefits subsequent matching.

Fig. 5

The visualization of feature point detection. (a) Feature detection in the original recognition image. (b) Feature detection in the MSRCR-enhanced recognition image. (c) Feature detection in the IMSRCR-enhanced recognition image. (d) Feature detection in the original sorting image. (e) Feature detection in the MSRCR-enhanced sorting image. (f) Feature detection in the IMSRCR-enhanced sorting image.

Feature matching and optimization

After completing the feature point extraction of coal gangue recognition images and sorting images, we need to match the feature points and optimize the matching results.

SuperGlue feature matching

After using SuperPoint to complete the feature detection of coal gangue recognition images and sorting images, it is necessary to match the feature points of the two images through feature matching methods to obtain the target coal gangue position. This article adopts the SuperGlue algorithm, which comprehensively considers the position information of feature points and visual appearance information to improve the accuracy and robustness of matching.

SuperGlue is a feature matching algorithm based on a graph neural network (GNN): it constructs a graph over the feature points and utilizes a graph convolutional network (GCN) for feature aggregation and matching. Figure 6 illustrates its network structure, which consists of two parts. The first half combines self-attention and cross-attention mechanisms to extract better matching descriptors; it significantly improves the accuracy and robustness of feature matching by iteratively strengthening, over L layers, the feature vector f used for matching (Eq. (7)). The optimal matching layer in the second half uses the inner product to obtain the score matrix and iteratively optimizes it with the Sinkhorn algorithm to achieve a preliminary optimization of the assignment.

Fig. 6

The architecture of SuperGlue.

The first part of the attention GNN is the key point encoder, which lifts the feature points to a higher dimension. Each key point combines its position and score information, and a multi-layer perceptron (MLP) embeds the position information of the key points into a high-dimensional vector, as shown in Eq. (5). Specifically, the input is the key point p and its corresponding feature descriptor d extracted by the feature extraction network, which includes the coordinates and confidence of the feature points. The feature points are converted into 256-dimensional data through 5 multi-layer perceptron (fully connected) layers, with channel numbers of 3, 32, 64, 128, and 256, respectively, as shown in Fig. 7.

$$x_{i}^{(0)} = d_{i} + \mathrm{MLP}(p_{i})$$

(5)

Among them, di and pi are the i-th descriptor and feature point, respectively.

Fig. 7
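Eq. (5) and the channel sizes above suggest a keypoint encoder along these lines; the use of plain linear layers is a simplification of whatever normalization the original network may employ:

```python
import torch
import torch.nn as nn

class KeypointEncoder(nn.Module):
    """x_i^(0) = d_i + MLP(p_i), with MLP channels 3 -> 32 -> 64 -> 128 -> 256."""
    def __init__(self, dims=(3, 32, 64, 128, 256)):
        super().__init__()
        layers = []
        for a, b in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(a, b), nn.ReLU()]
        self.mlp = nn.Sequential(*layers[:-1])  # no activation after the last layer

    def forward(self, d, p):
        # d: (N, 256) descriptors; p: (N, 3) = (x, y, confidence)
        return d + self.mlp(p)
```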

The second part of the attention GNN is the graph attention mechanism, which uses two types of undirected edges: self edges εself and cross edges εcross. The former connect feature points inside the same image, while the latter connect feature points across the two images (denoted A and B). Let txiA denote the intermediate representation of the i-th element in image A at layer t. The message mε→i aggregates information from all feature points {j : (i, j) ∈ ε}, where ε ∈ {εself, εcross}. From this, every feature i in image A is updated with residual information:

$$^{(t + 1)} x_{i}^{A} = {}^{(t)} x_{i}^{A} + \mathrm{MLP}\left( \left[ {}^{(t)} x_{i}^{A} \, \middle\| \, m_{\varepsilon \to i} \right] \right)$$

(6)

Here, (·∥·) denotes the concatenation operation and MLP a multi-layer perceptron. When the layer index is odd, ε = εcross; when it is even, ε = εself. Alternating between the self and cross mechanisms through mε simulates the process of a human looking back and forth between two images until differences or similarities are found. The final feature description vector used for matching is shown in Eq. (7).
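A single layer of this alternating scheme can be sketched as below; single-head scaled dot-product attention stands in for SuperGlue's multi-head attention. For a self layer, `source` is the image's own features; for a cross layer, it is the other image's features:

```python
import torch
import torch.nn as nn

class AttentionalGNNLayer(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, x, source):
        # Attention weights over the edge set (self or cross edges).
        attn = torch.softmax(self.q(x) @ self.k(source).T / x.shape[1] ** 0.5, dim=-1)
        m = attn @ self.v(source)                       # aggregated message m_{eps->i}
        return x + self.mlp(torch.cat([x, m], dim=-1))  # residual update, Eq. (6)
```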

$$f_{i}^{A} = W \cdot {}^{(L)} x_{i}^{A} + b$$

(7)

In the above equation, W represents the weight matrix and b the bias.

The main function of the optimal matching layer is to generate a matching score matrix and output the final matching pairs. After the feature description vectors fiA and fiB are computed, their inner products are calculated to obtain the score matrix S ∈ RM×N, where M and N represent the numbers of feature points in the original image and the image to be matched, respectively. The inner product is calculated as follows:

$$S_{i,j} = \left\langle f_{i}^{A} , f_{j}^{B} \right\rangle , \quad \forall \left( i,j \right) \in A \times B$$

(8)

Among them, <·,·> is the inner product. The score matrix is expanded by one row and one column to obtain S̄, as shown in Eq. (9), and unmatched points are assigned directly to the new row/column. The score of the newly added row and column is a fixed value, which can also be learned through training.

$$\overline{S}_{i,N + 1} = \overline{S}_{M + 1,j} = \overline{S}_{M + 1,N + 1} = z \in \mathbb{R}$$

(9)

Here S ∈ RM×N is the score matrix and S̄ ∈ R(M+1)×(N+1) is the augmented matching matrix, in which each row represents the matching probabilities between one point and all candidate points. Since each point can have at most one matching point, the values in each row of S̄ sum to 1. Finally, the Sinkhorn algorithm is used to solve the assignment problem on S̄, yielding the optimal allocation result.
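The optimal matching layer can therefore be condensed into the following sketch: build S by inner products (Eq. (8)), append the extra row and column with score z (Eq. (9)), and alternate row/column normalization in log space. This is a simplified Sinkhorn without the marginal weighting the full method may use:

```python
import torch

def match_probabilities(fA, fB, z=1.0, iters=100):
    # fA: (M, 256), fB: (N, 256); z is the scalar score of the unmatched row/column.
    M, N = fA.shape[0], fB.shape[0]
    S_bar = torch.full((M + 1, N + 1), float(z))
    S_bar[:M, :N] = fA @ fB.T                       # Eq. (8): S in R^{M x N}
    log_P = S_bar
    for _ in range(iters):                          # Sinkhorn iterations
        log_P = log_P - torch.logsumexp(log_P, dim=1, keepdim=True)
        log_P = log_P - torch.logsumexp(log_P, dim=0, keepdim=True)
    return log_P.exp()                              # rows/columns approximately sum to 1
```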

PROSAC optimization

Currently, traditional feature optimization methods often adopt the Random Sample Consensus (RANSAC) algorithm. However, since RANSAC selects data randomly without considering the difference in data quality, it may lead to difficulties in convergence under certain circumstances. To overcome this limitation, this paper introduces the PROSAC algorithm. The PROSAC algorithm adopts a semi-random approach. It evaluates the quality of matching point pairs, sorts them in descending order based on their quality, and gives priority to selecting high-quality point pairs. PROSAC can effectively reduce the computation time of the algorithm. Moreover, in cases where RANSAC fails to converge, its semi-random selection method can still ensure the convergence of the algorithm, thereby obtaining more accurate feature point-matching results. Figure 8 shows the results without optimization and the results further optimized with PROSAC. The matching relationship between the recognition image feature points and the sorting image feature points in the figure is connected by colored lines.

Fig. 8

From Fig. 8, we can see mismatched feature point pairs in Fig. 8(c), which cause a deviation in the matching result indicated by the blue box in Fig. 8(e). After optimization with the PROSAC algorithm, the mismatched pairs are filtered out, as shown in Fig. 8(d), and the blue box in Fig. 8(e) then frames the correct result.
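The PROSAC idea described above can be condensed into the following sketch. Quality scores (e.g., SuperGlue confidences) order the matches, and hypothesis samples are drawn from a progressively growing top-ranked pool; the linear pool-growth schedule here is a simplification of PROSAC's growth function, and model fitting reuses OpenCV's homography estimation:

```python
import cv2
import numpy as np

def prosac_homography(src_pts, dst_pts, quality, thresh=3.0, max_iters=500):
    src = np.asarray(src_pts, np.float32)
    dst = np.asarray(dst_pts, np.float32)
    if len(src) < 4:
        return None, 0
    order = np.argsort(-np.asarray(quality))   # best-quality matches first
    src, dst = src[order], dst[order]
    best_H, best_inliers = None, 0
    for it in range(max_iters):
        # Sampling pool grows from the 4 best matches toward the full set.
        n = min(len(src), 4 + it * len(src) // max_iters)
        idx = np.random.choice(n, 4, replace=False) if n > 4 else np.arange(4)
        H, _ = cv2.findHomography(src[idx], dst[idx], 0)
        if H is None:
            continue
        proj = cv2.perspectiveTransform(src.reshape(-1, 1, 2), H).reshape(-1, 2)
        inliers = int(np.sum(np.linalg.norm(proj - dst, axis=1) < thresh))
        if inliers > best_inliers:
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```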


