Abstract

Aiming at the problem that a robot is difficult to locate inside an oil-immersed transformer, a visual positioning method for the internal inspection robot is proposed. First, to solve the problems of blur, distortion, and low contrast in the images obtained by the camera in deteriorated and discolored transformer oil, an image enhancement algorithm based on multiscale fusion is developed to provide a reliable data source for robot localization. Then, FAST key points are extracted and BRIEF descriptors are calculated from the enhanced images, and the pose transformation of the robot between image frames is computed by using the epipolar constraint and the EPnP method. A pose optimization model of the robot is designed to improve the positioning accuracy. Finally, to verify the effectiveness of the proposed methods, function tests are carried out on real continuous image sequences acquired by the robot in a Mitsubishi transformer. The experimental results show that the trajectory of the robot in the transformer can be accurately drawn, the position data of the robot can be efficiently obtained, and autonomous positioning of the robot in the transformer can be well achieved.

1. Introduction

Oil-immersed transformers are key equipment of the national power grid and have the advantages of excellent performance and low price. Because transformer oil has low viscosity, good insulation, and good heat transfer performance, it protects the transformer cores and windings well. However, as the transformer service time increases, failures occur more frequently [1]. To solve the problems of the low efficiency and high cost of manually inspecting internal faults of transformers, scholars have designed vision-based internal fault detection robots for oil-immersed transformers [2–4]. These robots use their vision systems to obtain internal images of the transformer for judging the fault type. However, the internal structure of a transformer is complicated, and the location of a fault cannot be accurately obtained from the image alone. In addition, these robots do not have autonomous localization ability and cannot operate autonomously. Therefore, transformer fault detection can only be achieved by manual manipulation, which is inefficient.

The oil-immersed transformer has a closed shell, compact interior structure, and full transformer oil. The complex transformer structure brings difficulties to robot positioning. The research on submersible transformer inspection robots is still limited, and there are few research results on the positioning method of robots under transformer oil. Considering that both transformer oil and water belong to the fluid medium and that the motion characteristics of robots in fluid are similar, this paper mainly refers to the positioning methods of underwater vehicles [5, 6].

According to the different measuring principles of the sensors, underwater vehicle positioning technology can be divided into underwater acoustic positioning methods, dead reckoning methods, and vision-based positioning methods [7–12]. In underwater acoustic localization, the position of the underwater vehicle is calculated from the transmission time and phase difference of the acoustic wave between the underwater robot and acoustic beacons; based on the baseline length, such systems are divided into long-baseline, short-baseline, and ultrashort-baseline systems. In the dead reckoning method, the position of the underwater vehicle is obtained by integrating over time the pose and speed measured by inertial navigation and a Doppler velocity log (DVL), but accumulated errors exist in this method and must be eliminated by an underwater acoustic positioning method. An underwater location method based on a particle filter was proposed by Martínez-Barberá et al. [13]. This method processes and fuses the data of several underwater sensors, which improves the positioning accuracy of underwater robots. An ultrashort-baseline positioning method based on a Kalman filter was proposed by Luo et al. [14], which effectively improved the positioning accuracy.

The underwater acoustic positioning method needs to arrange the acoustic beacon array in advance. However, the structure of the oil-immersed transformer is closed, which does not have the conditions to arrange the array. There are many obstacles in the transformer, and acoustic communication cannot be realized, so underwater acoustic positioning cannot be applied in oil-immersed transformers. In addition, the dead reckoning method requires high-precision inertial navigation and DVL. Considering the cost and volume of the transformer internal inspection robot, the sensors mentioned above are not suitable for a transformer internal inspection robot.

In underwater localization methods based on vision, underwater vehicles can be located with image information [15–18] obtained by inexpensive vision sensors. An underwater vision positioning technology based on refraction correction was provided by Suresh et al. [19], which uses refraction correction and triangulation to locate the underwater vehicle by observing aerial landmarks from the water. An underwater localization method based on binocular vision was proposed by Du et al. [20]. The method combines feature extraction with back-end global optimization and verifies the feasibility of binocular vision positioning in underwater environments. An improved front-end method of underwater visual SLAM based on binocular cameras was given by Wang et al. [21]. The histogram equalization algorithm was used to improve image sharpness for feature point extraction and tracking, and the Rtab-map algorithm was used to optimize the tracking and positioning process. The feasibility of achieving accurate positioning by a visual sensor in a small underwater area has been verified by the above methods. However, most underwater positioning methods based on vision use binocular vision sensors. Considering the size of the robot and the cost of fault detection, binocular cameras are not suitable for submersible transformer inspection robots. In addition, the environment in transformer oil is different from the underwater environment, and transformer oil deteriorates and discolors when used in high-pressure and high-temperature environments for a long time. Therefore, the internal images of the transformer obtained by the robot suffer from low contrast and color distortion, which brings new challenges to the visual positioning of the submersible transformer inspection robot.

In recent years, scholars have conducted relevant research on the positioning of inspection robots inside oil-immersed transformers. A high-precision detection model for transformer components based on the Fast R-CNN structure was provided by Liu et al. [22]. This model employs the dual feature mapping method, enabling automatic detection of the category and location of various transformer components in the detection image. However, this position information is only relative between components and cannot provide global positioning within the transformer. In addition, the R-CNN model-based approach demands significant computational power for transformer inspection robots, making it unsuitable for real-time detection and localization. Zhang et al. [23] implemented robot autonomous navigation through path planning based on the open-source Cartographer algorithm. Oil levels were acquired using a depth camera, and YOLOv4 was employed to train and learn oil level states under various weather conditions. This method is suitable for single inspection tasks such as oil level detection but is not applicable to fault detection and localization tasks that require more visual information. Moreover, significant measurement errors are introduced in the measurement of infrared light emitted by the depth camera in transformer oil. In the face of various complex environmental stresses, Pan et al. [24] conducted an analysis of the performance failure mechanisms of internal inspection robots. They completed a study on improving the reliability of internal inspection robots and their control systems, effectively enhancing the reliability of the internal inspection robotic arm. However, no analysis was performed on the positioning of internal inspection robots within transformers. A positioning method for an oil-immersed transformer internal inspection robot was proposed by Feng et al. [25]. In this method, the robot’s position within the transformer is measured and analyzed using a detection instrument and laser radar carried by the inspection robot. However, high-precision requirements are imposed on the accuracy of laser radar measurements in this method, and there is a lack of abundant visual information.

In summary, the vision-based positioning method for the transformer internal inspection robot proposed in this paper does not require expensive high-precision radar and, after image enhancement, can provide abundant visual information. Because the same images serve both visual positioning and fault detection, the robot's computational resources are not wasted, and real-time positioning of the internal inspection robot is achieved.

The specific implementation process of the algorithm in this paper is as follows: first, the image enhancement method based on multiscale fusion is used to enhance the images acquired by a monocular camera to improve their brightness and contrast in the transformer oil. Second, the enhanced image is converted to grayscale to improve the efficiency of extracting features from accelerated segment test (FAST) key points and to ensure that each image can be tracked quickly. Then, the position and pose transformations between different camera positions are solved by the epipolar geometry constraint and the efficient perspective-n-point (EPnP) algorithm. Finally, a nonlinear optimization method is applied to optimize the position and attitude of the robot to obtain a more accurate trajectory. The experimental results show that autonomous positioning of the submersible transformer inspection robot can be realized and that the problems of robot position initialization failure and tracking loss caused by the lack of image information can be solved.

2. Framework of the Visual Localization for the Oil-Immersed Transformer

2.1. Oil-Immersed Transformer Structure

The structure of the oil-immersed transformer is shown in Figure 1. The structure of the transformer (510 cm × 230 cm × 350 cm) is mainly composed of an oil tank, core, winding, bushing, cooler, manhole, and conservator. The core and winding are located in the center of the transformer. There are a large number of locking nuts, screws, and cables in the transformer, which may interfere with robot motion. The tank is filled with 25# transformer oil, which is mainly used for insulation, heat dissipation, and arc extinction. The internal structure of the transformer is compact, and the space is narrow, so it is difficult to realize robot positioning by arranging external sensors.

According to the structural characteristics of the transformer, our research group has designed a submersible transformer inspection robot, as shown in Figure 2. The diameter of the robot is only 19 cm, and the propulsion system, wireless communication system, control system, sensing and detection system, and power supply system are integrated inside. The robot is propelled by the propulsion system, which has six oil-jet thrusters. The sensing and detection system mainly includes a single-line laser radar, a monocular camera, an electronic compass, and other sensors. The laser radar can only detect the distance between the robot and an obstacle, and it is influenced by the transformer oil and the lidar housing, so its measurement error is large. The electronic compass is affected by the oil-jet thrusters and the closed structure of the transformer, so its measurements are not accurate. Therefore, the robot can only rely on the images obtained by monocular vision to realize autonomous positioning.

2.2. The Proposed Framework

According to the above analysis, constrained by the internal structure of the transformer and the size of the robot, this paper uses a monocular camera to achieve robot positioning. Compared with the depth camera, which can obtain the distance information directly, the monocular camera can only estimate the depth by feature matching from the adjacent images obtained by the camera position change. For this reason, image feature extraction is very important in the positioning process. However, with increasing transformer working time, the transformer oil color gradually changes from pure transparent and light yellow to light brown. The image obtained by the camera in the degraded and discolored oil exhibits distortion, discoloration and low contrast, which increases the difficulty of image feature point extraction and registration. Therefore, a visual localization method of the submersible transformer inspection robot based on image enhancement is proposed. The flowchart of this method is shown in Figure 3 and is elaborated as follows:

(1) The image enhancement method based on multiscale fusion is applied to the original images obtained by the robot to improve image brightness and contrast and to increase the number and speed of extracted feature points.
(2) To improve the extraction efficiency of FAST key points, the enhanced image is converted to a grayscale image.
(3) The binary robust independent elementary features (BRIEF) descriptors of the feature points are calculated, and the similarity of the features in two frames is compared for feature matching.
(4) The epipolar geometry constraint method is used to initialize the motion of the robot, and the EPnP algorithm is applied to solve the pose transformation of the robot between image frames.
(5) After the pose between image frames is solved, the tracking process of the robot is optimized by a nonlinear optimization method to realize more accurate tracking and positioning.

3. The Proposed Algorithms

3.1. Image Enhancement in Transformer Oil

Because the transmittance of artificial light differs in transformer oil of different service ages, the image quality under transformer oil cannot be improved by restoring an optical imaging model of the transformer oil. Consequently, an image enhancement method based on a multiscale fusion strategy is adopted.

To solve the problem of color distortion in the transformer oil image [26], a color correction method based on the perfect reflection algorithm is adopted to correct the image color channels and balance the image color. The correction can be expressed as follows:

$$R' = \frac{R_{\max}}{\bar{R}_T}\,R,\qquad G' = \frac{G_{\max}}{\bar{G}_T}\,G,\qquad B' = \frac{B_{\max}}{\bar{B}_T}\,B,$$

where $R$, $G$, and $B$ indicate the values of each channel of an original image pixel, and $R_{\max}$, $G_{\max}$, and $B_{\max}$ are the maximum values of the three channels in the image, respectively. The histogram of the channel sum $R+G+B$ is traversed from the brightest value downward, and the pixels whose sums fall in the top 10% are taken as white points, which determines the white-point threshold $T$. $\bar{R}_T$, $\bar{G}_T$, and $\bar{B}_T$ denote the average channel values over the pixels whose channel sums are greater than $T$. $R'$, $G'$, and $B'$ are the corrected values of each channel of the pixel.
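As a concrete illustration, this correction can be sketched in a few lines of Python with NumPy and OpenCV. The sketch assumes an 8-bit BGR input; the function name and numerical safeguards are illustrative, while the 10% white-point fraction and per-channel scaling follow the description above.

```python
import numpy as np

def perfect_reflection_white_balance(img_bgr, top_ratio=0.10):
    """Perfect-reflection color correction: pixels whose channel sum falls in
    the brightest `top_ratio` fraction are treated as the white reference."""
    img = img_bgr.astype(np.float64)
    channel_sum = img.sum(axis=2)

    # Threshold T: channel-sum value separating the brightest 10% of pixels.
    T = np.percentile(channel_sum, 100 * (1.0 - top_ratio))
    white_mask = channel_sum >= T

    out = np.empty_like(img)
    for c in range(3):                      # B, G, R channels
        ch = img[:, :, c]
        ch_avg = ch[white_mask].mean()      # average over the white-point pixels
        ch_max = ch.max()                   # maximum of this channel in the image
        out[:, :, c] = ch * (ch_max / max(ch_avg, 1e-6))
    return np.clip(out, 0, 255).astype(np.uint8)
```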

To solve the problem of low image contrast caused by the different attenuation rates of light of different colors in transformer oil, an adaptive gamma correction method is used to improve the image contrast. The gamma correction can be expressed as follows:

$$I_{\text{out}} = I^{\gamma},$$

where $\gamma$ indicates the adjustment parameter of the gamma correction and $I$ is the input image intensity. To avoid applying the same intensity change to every image after correction, a weighted modified probability density and its cumulative distribution are constructed from the intensity statistics. The adaptive gamma correction is then obtained as follows:

$$T(l) = l_{\max}\left(\frac{l}{l_{\max}}\right)^{\gamma},\qquad \gamma = 1 - \mathrm{cdf}_w(l),$$

where $T(l)$ is the constructed adaptive gamma calibration, $l$ indicates the image intensity, $l_{\max}$ is the maximum intensity, and $\mathrm{cdf}_w$ is the weighted cumulative distribution of intensities.
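The adaptive gamma correction can be sketched as follows. Because the exact weighting of the modified probability density is not spelled out above, the sketch assumes a common weighted-distribution variant in which the exponent of each intensity level is 1 − cdf_w(l); the weighting parameter alpha and the use of the HSV value channel are illustrative assumptions.

```python
import cv2
import numpy as np

def adaptive_gamma_correction(img_bgr, alpha=0.5):
    """Adaptive gamma correction on the V channel: the per-level gamma exponent
    is driven by a weighted cumulative distribution of intensities (assumed
    AGCWD-style formulation; `alpha` controls the weighting strength)."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    l_max = 255.0

    # Weighted (modified) probability density of the 256 intensity levels.
    hist = cv2.calcHist([v], [0], None, [256], [0, 256]).ravel()
    pdf = hist / hist.sum()
    pdf_w = pdf.max() * (pdf / pdf.max()) ** alpha      # weighted pdf
    cdf_w = np.cumsum(pdf_w) / pdf_w.sum()              # weighted cdf in [0, 1]

    # T(l) = l_max * (l / l_max) ** (1 - cdf_w(l)), applied as a lookup table.
    levels = np.arange(256, dtype=np.float64)
    mapped = l_max * (levels / l_max) ** (1.0 - cdf_w)
    lut = np.clip(mapped, 0, 255).astype(np.uint8)

    v_corr = cv2.LUT(v, lut)
    return cv2.cvtColor(cv2.merge([h, s, v_corr]), cv2.COLOR_HSV2BGR)
```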

To further improve the image quality, weight maps combining saliency weighting and brightness weighting are computed for the color-corrected image and the contrast-improved image and then normalized. The process can be expressed as follows:

$$W_{S}(x,y) = \left\lVert I_{\mu} - I_{\omega hc}(x,y) \right\rVert,\qquad
W_{L}(x,y) = \exp\!\left(-\frac{\bigl(I_{g}(x,y) - 0.5\bigr)^{2}}{2\sigma^{2}}\right),\qquad
\bar{W}_{k}(x,y) = \frac{W_{k}(x,y)}{\sum_{k=1}^{K} W_{k}(x,y)},$$

where $I_{\mu}$ indicates the mean feature vector of the image, $I_{\omega hc}$ is the corresponding pixel vector of the original image after Gaussian blur processing, $I_{g}$ is the grayscale version of the input image, and $\sigma$ is the standard deviation of the brightness weighting. $\bar{W}_{k}$ is the required normalized weight of the $k$th input image, and $W_{k} = W_{S} + W_{L}$ is the sum of the saliency weighting and the brightness weighting.

Finally, the images after perfect-reflection white balance correction and adaptive gamma correction are fused by the multiscale fusion method. The fusion formula is represented as follows:

$$F_{l}(x,y) = \sum_{k=1}^{K} G_{l}\!\left\{\bar{W}_{k}(x,y)\right\}\, L_{l}\!\left\{I_{k}(x,y)\right\},$$

where $F_{l}$ indicates the $l$th level of the final output image pyramid, $l$ is the pyramid layer index, $K$ is the number of images to be fused, $L_{l}\{I_{k}\}$ indicates the Laplacian pyramid decomposition of the input image $I_{k}$, and $G_{l}\{\bar{W}_{k}\}$ indicates the Gaussian pyramid decomposition of the normalized weight image $\bar{W}_{k}$. The fused pyramid is upsampled and collapsed to obtain the enhanced transformer oil image.
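A sketch of the weight computation and pyramid fusion is given below. The saliency weight is approximated by the distance of each Gaussian-blurred pixel from the mean image color and the brightness weight by closeness to mid-gray; these approximations, the number of fusion pyramid levels, and the parameter sigma are assumptions where the text does not fix the details.

```python
import cv2
import numpy as np

def fusion_weight(img_bgr, sigma=0.25):
    """Per-pixel weight = saliency weight + brightness weight (see text)."""
    img = img_bgr.astype(np.float64) / 255.0
    blurred = cv2.GaussianBlur(img, (5, 5), 0)
    mean_color = img.reshape(-1, 3).mean(axis=0)
    w_saliency = np.linalg.norm(blurred - mean_color, axis=2)     # Achanta-style saliency
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float64) / 255.0
    w_bright = np.exp(-((gray - 0.5) ** 2) / (2 * sigma ** 2))    # favor well-exposed pixels
    return w_saliency + w_bright

def multiscale_fusion(inputs, levels=5):
    """Fuse the color-corrected and gamma-corrected images with
    Laplacian (image) / Gaussian (weight) pyramids."""
    weights = np.stack([fusion_weight(im) for im in inputs])
    weights = weights / (weights.sum(axis=0) + 1e-12)             # normalize across inputs

    fused_pyr = None
    for im, w in zip(inputs, weights):
        # Gaussian pyramid of the weight map.
        gp_w = [w]
        for _ in range(levels - 1):
            gp_w.append(cv2.pyrDown(gp_w[-1]))
        # Gaussian, then Laplacian pyramid of the image.
        gp_i = [im.astype(np.float64)]
        for _ in range(levels - 1):
            gp_i.append(cv2.pyrDown(gp_i[-1]))
        lp_i = [gp_i[-1]]
        for k in range(levels - 1, 0, -1):
            up = cv2.pyrUp(gp_i[k], dstsize=(gp_i[k - 1].shape[1], gp_i[k - 1].shape[0]))
            lp_i.insert(0, gp_i[k - 1] - up)
        # Weighted sum at every level: F_l = sum_k G_l{W_k} * L_l{I_k}.
        contrib = [lp * gw[..., None] for lp, gw in zip(lp_i, gp_w)]
        fused_pyr = contrib if fused_pyr is None else [f + c for f, c in zip(fused_pyr, contrib)]

    # Collapse the fused pyramid back to a single image.
    out = fused_pyr[-1]
    for k in range(levels - 1, 0, -1):
        out = cv2.pyrUp(out, dstsize=(fused_pyr[k - 1].shape[1], fused_pyr[k - 1].shape[0])) + fused_pyr[k - 1]
    return np.clip(out, 0, 255).astype(np.uint8)
```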

3.2. Feature Extraction and Matching

The submersible transformer inspection robot has a small size and limited computing power. To reduce the computation time and improve the efficiency of key point detection, FAST key points and BRIEF descriptors are selected as the feature points for the transformer oil images [27]. The FAST key points are extracted as follows: a pixel $p$ is selected in the image, its brightness is denoted $I_p$, a threshold $T$ is set, and the 16 pixels on a circle centered at $p$ with a radius of three pixels are examined. If 12 consecutive pixels among these 16 have brightness outside the range $[I_p - T,\ I_p + T]$ (that is, all brighter than $I_p + T$ or all darker than $I_p - T$), then $p$ is a key point. Only brightness differences between pixels are compared in this test, which makes it efficient.
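For illustration, the FAST-12 segment test described above can be written directly in NumPy; in practice, an optimized detector (for example, the one inside OpenCV's ORB) would be used instead of this reference implementation.

```python
import numpy as np

# Offsets (dx, dy) of the 16 pixels on a Bresenham circle of radius 3.
CIRCLE16 = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
            (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast12_corner(gray, x, y, t):
    """FAST-12 segment test: (x, y) is a key point if 12 contiguous pixels on
    the radius-3 circle are all brighter than I_p + t or all darker than
    I_p - t. Assumes (x, y) is at least 3 pixels away from the image border."""
    ip = float(gray[y, x])
    ring = np.array([float(gray[y + dy, x + dx]) for dx, dy in CIRCLE16])
    brighter = ring > ip + t
    darker = ring < ip - t
    # Look for a run of 12 contiguous flags on the circular ring (with wrap-around).
    for flags in (brighter, darker):
        run = 0
        for f in np.concatenate([flags, flags]):
            run = run + 1 if f else 0
            if run >= 12:
                return True
    return False
```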

The moving direction of the submersible transformer inspection robot is variable, and the FAST key points do not have direction information, so the gray centroid method is used to describe the rotation direction of the features.

The moments of an image block $B$ are defined as follows:

$$m_{pq} = \sum_{x,y \in B} x^{p} y^{q} I(x,y),\qquad p,q \in \{0,1\},$$

where $I(x,y)$ indicates the gray value at pixel $(x,y)$. The centroid of the image block is then represented as follows:

$$C = \left(\frac{m_{10}}{m_{00}},\ \frac{m_{01}}{m_{00}}\right).$$

The principal direction of the key point can be expressed by the direction vector $\overrightarrow{OC}$ from the geometric center $O$ of the circular image block to its centroid $C$. The direction of the feature point is then defined as follows:

$$\theta = \arctan\frac{m_{01}}{m_{10}}.$$

The intensity centroid method makes the FAST key points have rotation information, and the accuracy of feature matching is improved when the robot rotates in the transformer.
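A minimal sketch of the gray centroid computation for a single key point is given below; the radius argument defaults to the three-pixel circle used by FAST, and the function assumes the key point lies far enough from the image border.

```python
import numpy as np

def keypoint_orientation(gray, cx, cy, radius=3):
    """Gray-centroid orientation of a FAST key point at (cx, cy):
    theta = atan2(m01, m10) over a circular patch of the given radius."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    mask = xs ** 2 + ys ** 2 <= radius ** 2                     # circular patch mask
    patch = gray[cy - radius:cy + radius + 1,
                 cx - radius:cx + radius + 1].astype(np.float64)

    m10 = np.sum(xs * patch * mask)                              # sum of x * I(x, y)
    m01 = np.sum(ys * patch * mask)                              # sum of y * I(x, y)
    return np.arctan2(m01, m10)                                  # principal direction (rad)
```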

In addition, the position of the robot changes every moment in the running process, which causes the size of the internal components of the transformer photographed by the monocular camera to be different in the picture. The circle with a radius of 3 is selected for the FAST key point, so there is a scale problem. Therefore, the paper constructs the image pyramid and detects the key points at each layer of the pyramid to solve the above problems [28]. In this work, the number of layers of the image pyramid is 8, and the scaling factor is 1.2.

Key points are the positions of feature points in the image, and descriptors describe the information of the pixels around the key points. Feature matching between images is achieved by comparing the similarity of the key point descriptors. Considering the computing power and real-time positioning requirements of the submersible transformer inspection robot, the binary BRIEF descriptor, which is compact to store and fast to compute, is selected. Its description vector is composed of 0s and 1s, and only the number of differing bits in the binary strings needs to be compared during feature point matching, which is suitable for real-time matching of feature points in transformer oil images. A 256-bit binary descriptor is used in this paper. When matching feature points between different images, the similarity between two BRIEF descriptors is assessed by calculating their Hamming distance: a smaller Hamming distance indicates a higher degree of similarity and thus a better match. To decide whether two descriptors match, a threshold is set as follows: first, the minimum Hamming distance among all candidate matches is calculated; second, 30 is used as a lower bound on the threshold; third, a pair of feature points is considered a match if the distance between their descriptors is less than twice the minimum distance or less than 30, whichever is larger.
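In practice, the oriented FAST key points, 256-bit BRIEF descriptors, 8-level pyramid with scale factor 1.2, and the Hamming-distance matching rule above can all be obtained through OpenCV's ORB implementation; the sketch below follows that route, with the feature budget (nfeatures) chosen arbitrarily.

```python
import cv2

def extract_and_match(gray1, gray2, nfeatures=500):
    """Oriented FAST + 256-bit BRIEF (ORB) extraction and Hamming matching.
    Pyramid: 8 levels, scale factor 1.2, as described in the text."""
    orb = cv2.ORB_create(nfeatures=nfeatures, scaleFactor=1.2, nlevels=8)
    kp1, des1 = orb.detectAndCompute(gray1, None)
    kp2, des2 = orb.detectAndCompute(gray2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    # Keep matches whose Hamming distance is below max(2 * min_dist, 30).
    min_dist = min(m.distance for m in matches)
    good = [m for m in matches if m.distance <= max(2 * min_dist, 30)]
    return kp1, kp2, good
```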

3.3. Robot Pose Estimation

The images captured by the robot’s monocular camera contain no depth information, so the pose and depth of the robot must be estimated at the beginning of its movement. In this paper, the fundamental matrix and the homography matrix are estimated simultaneously by the epipolar geometry constraint method, and the model with the smaller error is selected as the result of motion estimation [29]. After some 3D spatial points and their projections on 2D images have been obtained, the relative motion of the camera can be estimated from the coordinates of $n$ known 3D points and the pixel coordinates of their 2D projections. Various solution methods can be employed for such problems, including the direct linear transform (DLT), perspective-three-point (P3P), efficient perspective-n-point (EPnP) [30], and others. The issues to be considered include the complexity of the images captured inside oil-immersed transformers, significant noise levels, and the limited computational power of the internal inspection robot. The DLT method requires at least six pairs of points and is highly sensitive to noise and errors. The P3P method requires four pairs of points, three used for solving and one for validation; however, P3P cannot exploit additional matching point pairs and becomes ineffective in the presence of noise or mismatches. The EPnP algorithm transforms the problem through control points, performs better with noisy feature points, and yields a closed-form solution that requires neither iteration nor an initial estimate, resulting in very high computational efficiency. Therefore, the EPnP algorithm is chosen in this paper for camera pose estimation. The overall procedure consists of the following steps (a condensed code sketch is given after step (4)):

(1) The motion model between two images acquired by the monocular camera is shown in Figure 4, where $O_1$ and $O_2$ are the camera optical centers and $l_1$ and $l_2$ are the epipolar lines. If the feature point $p_1$ and the feature point $p_2$ are correctly matched, the position of the spatial point $P$ can be calculated. According to the pinhole camera model and the epipolar geometry, the following equation holds:

$$p_2^{T} K^{-T} t^{\wedge} R K^{-1} p_1 = 0, \tag{10}$$

where $K$ indicates the intrinsic matrix of the camera, $R$ is the rotation matrix between the two camera coordinate systems, and $t$ is the translation vector of the camera. Let the fundamental matrix from pixel $p_1$ to $p_2$ be $F = K^{-T} t^{\wedge} R K^{-1}$. Equation (10) can then be written as follows:

$$p_2^{T} F p_1 = 0. \tag{11}$$

Equation (11) can be written in matrix form as follows:

$$\begin{pmatrix} u_2 u_1 & u_2 v_1 & u_2 & v_2 u_1 & v_2 v_1 & v_2 & u_1 & v_1 & 1 \end{pmatrix} \mathbf{f} = 0, \tag{12}$$

where the pixel points $p_1 = (u_1, v_1, 1)^{T}$ and $p_2 = (u_2, v_2, 1)^{T}$ are represented by homogeneous coordinates and $\mathbf{f}$ is the vector formed by the nine elements of $F$. Considering the scale equivalence of $F$, Equation (12) shows that the matrix $F$ can be solved by constructing eight constraint equations from eight pairs of matching points.

(2) If the matched feature points lie on the same plane (or the camera undergoes pure rotation), the motion can be described by the homography matrix $H$. The matched pixels $p_1$ and $p_2$ on the two images are related by the following equation:

$$p_2 = H p_1. \tag{13}$$

Equation (13) can be written in matrix form as a homogeneous linear equation in the elements of $H$:

$$A\,\mathbf{h} = 0, \tag{14}$$

where $\mathbf{h}$ is the vector formed by the elements of $H$ (with the scale fixed) and each pair of matched points contributes two rows to the coefficient matrix $A$.

Equation (14) shows that the matrix $H$ can be obtained by constructing eight constraint equations from four pairs of matching points. The rotation matrix $R$ and the translation vector $t$ can then be obtained by singular value decomposition of the homography matrix [31].

(3) The depth information of the spatial point $P$ is estimated by the triangulation method, and the depth estimation model is shown in Figure 5. The geometric relation of the positions of the feature points in three-dimensional space can be expressed as

$$s_1 x_1 = s_2 R x_2 + t, \tag{15}$$

where $x_1$ and $x_2$ indicate the normalized coordinates of the successfully matched feature points in the two images and $s_1$ and $s_2$ are the depths of the two feature points.

(4)To reduce the computation load and improve the pose accuracy of the robot, the EPnP algorithm is used to estimate the relative motion of the robot. The EPnP algorithm only needs 4 pairs of matching points to solve the camera pose, as described below.

In the world coordinate system, four control points are selected to describe the distribution of all 3D points in the space. Each 3D point can be expressed as

$$P_i^{w} = \sum_{j=1}^{4} \alpha_{ij} c_j^{w},\qquad \sum_{j=1}^{4} \alpha_{ij} = 1, \tag{16}$$

where $c_j^{w}$ indicates the $j$th control point and $\alpha_{ij}$ represents the weights of the four control points for the $i$th 3D point. Points in the world coordinate system are transformed into the camera coordinate system by rotation and translation, and the conversion can be expressed as

$$P_i^{c} = \sum_{j=1}^{4} \alpha_{ij} c_j^{c}, \tag{17}$$

where $c_j^{c}$ indicates the coordinates of the control point in the camera coordinate system. Because the weights $\alpha_{ij}$ are invariant under this rigid transformation, the weights computed in the world coordinate system remain valid in the camera coordinate system, and all points in the world coordinate system can obtain their coordinates in the camera coordinate system once the control points are known there.

Assuming that the coordinates of a 3D point in the camera coordinate system are $P_i^{c}$ and the coordinates in the world coordinate system are $P_i^{w}$, there exist a rotation matrix $R$ and a translation vector $t$ that satisfy the following equation:

$$P_i^{c} = R P_i^{w} + t. \tag{18}$$

To solve for the matrix $R$ and the vector $t$, the error function in Equation (19) is defined as follows:

$$\min_{R,\,t}\ \frac{1}{2} \sum_{i=1}^{n} \left\lVert P_i^{c} - \left(R P_i^{w} + t\right) \right\rVert^{2}. \tag{19}$$

The centroids of the 3D points in the world coordinate system and in the camera coordinate system are expressed as follows:

$$\bar{P}^{w} = \frac{1}{n} \sum_{i=1}^{n} P_i^{w}, \tag{20}$$

$$\bar{P}^{c} = \frac{1}{n} \sum_{i=1}^{n} P_i^{c}. \tag{21}$$

With the decentered coordinates $q_i^{w} = P_i^{w} - \bar{P}^{w}$ and $q_i^{c} = P_i^{c} - \bar{P}^{c}$, the error equation can be expressed as follows:

$$\min_{R}\ \frac{1}{2} \sum_{i=1}^{n} \left\lVert q_i^{c} - R q_i^{w} \right\rVert^{2},\qquad t = \bar{P}^{c} - R \bar{P}^{w}. \tag{22}$$

The rotation matrix $R$ and the translation vector $t$ of the robot can then be obtained by the least-squares method.
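A condensed sketch of steps (1)–(4) using OpenCV is given below: epipolar initialization (via the essential matrix, which is equivalent to the fundamental-matrix formulation when $K$ is known), triangulation of the matched points, and EPnP pose estimation for subsequent frames. The input point arrays and the intrinsic matrix K are assumed to come from the feature-matching stage; the RANSAC threshold is illustrative.

```python
import cv2
import numpy as np

def initialize_and_track(pts1, pts2, pts3d, pts2d, K):
    """pts1, pts2: Nx2 matched pixel coordinates in the two initialization frames.
    pts3d, pts2d: Mx3 map points and their Mx2 observations in a later frame."""
    pts1 = np.asarray(pts1, dtype=np.float64)
    pts2 = np.asarray(pts2, dtype=np.float64)

    # Steps (1)-(2): recover R, t between the first two frames from the epipolar constraint.
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    # For (near-)planar scenes, a homography could be used instead:
    #   H, _ = cv2.findHomography(pts1, pts2, cv2.RANSAC)
    #   _, Rs, ts, _ = cv2.decomposeHomographyMat(H, K)

    # Step (3): triangulate the matches (depth known only up to the monocular scale).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)        # 4xN homogeneous
    X = (X_h[:3] / X_h[3]).T                                   # Nx3 Euclidean points

    # Step (4): EPnP pose estimation for later frames from 3D-2D correspondences.
    ok, rvec, tvec = cv2.solvePnP(np.asarray(pts3d, dtype=np.float64),
                                  np.asarray(pts2d, dtype=np.float64),
                                  K, None, flags=cv2.SOLVEPNP_EPNP)
    R_cur, _ = cv2.Rodrigues(rvec)
    return R, t, X, R_cur, tvec
```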

3.4. Robot Pose Optimization

The relative camera poses of two consecutive pictures can be calculated by the above steps. Due to the existence of noise, accumulated errors will inevitably occur with increasing running time. In addition, the motion of the robot is nonlinear. To obtain a relatively accurate pose of the robot, it is necessary to optimize the pose of the robot.

Considering the computing power of the robot and balancing computational performance against optimization accuracy, a bundle adjustment (BA) optimization method with a small computing scale is used in this paper [32]. In the absence of error, the pixel position predicted by projecting a map point with the calculated camera pose should coincide with the pixel position observed by feature matching, which can be written with the Lie group transformation as

$$s_i u_i = K T P_i, \tag{23}$$

where $T$ indicates the calculated pose (an element of SE(3)), $u_i$ is the pixel position of the $i$th feature obtained by sparse matching, $P_i$ is the corresponding 3D point, $K$ is the camera intrinsic matrix, and $s_i$ is the depth. To optimize the pose of the robot, an optimization objective function can be constructed from Equation (23). The optimization objective can be written as follows:

$$T^{*} = \arg\min_{T}\ \frac{1}{2} \sum_{i=1}^{n} \left\lVert u_i - \frac{1}{s_i} K T P_i \right\rVert^{2}. \tag{24}$$

The optimization function represents the error between the robot motion model and the observation model. In this paper, the Levenberg–Marquardt strategy is used to optimize the objective function, and convergence is reached after approximately 10 iterations. At this point, the pose of the robot has been optimized.
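A minimal sketch of this refinement step is shown below: the reprojection error of Equation (24) is minimized with SciPy's Levenberg–Marquardt solver, with the rotation parameterized by a Rodrigues vector for simplicity rather than the full Lie-algebra machinery used in the derivation.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def refine_pose(rvec0, tvec0, pts3d, pts2d, K):
    """Refine a camera pose (R, t) by minimizing the reprojection error
    of Equation (24) with Levenberg-Marquardt."""
    pts3d = np.asarray(pts3d, dtype=np.float64)
    pts2d = np.asarray(pts2d, dtype=np.float64)

    def residual(x):
        rvec, tvec = x[:3], x[3:]
        proj, _ = cv2.projectPoints(pts3d, rvec, tvec, K, np.zeros(5))  # Nx1x2
        return (proj.reshape(-1, 2) - pts2d).ravel()   # per-point reprojection error

    x0 = np.hstack([np.ravel(rvec0), np.ravel(tvec0)])
    sol = least_squares(residual, x0, method='lm', max_nfev=200)  # usually ~10 iterations
    R, _ = cv2.Rodrigues(sol.x[:3])
    return R, sol.x[3:].reshape(3, 1)
```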

4. Experimental Results and Discussion

The robot was tested in the Mitsubishi transformer of the Shuibei Power Supply Station in Shenzhen. To evaluate the positioning performance of the algorithm, experiments were conducted on the image frame sequences obtained during the test. The internal space of the transformer is small and compact, the transformer has been working in a high-temperature and high-pressure environment for more than 20 years, and the discoloration of the transformer oil is serious.

4.1. Comparative Experiment on Feature Point Extraction

To verify the improvement of the image enhancement algorithm on feature point extraction, a comparative experiment of feature point extraction was carried out between the original image obtained by the robot and the enhanced image. The comparison before and after image enhancement obtained from Section 3.1 is shown in Figure 6. The results of the comparative experiment of feature extraction are shown in Figures 7 and 8. As shown in Figure 7, the original image is distorted and blurred by the metamorphic and discolored transformer oil. In the original image and grayscale image, only a few feature points can be extracted in the upper right corner of the image, so the pose of the robot cannot be solved by using the original image directly.

As shown in Figure 8, more feature points can be extracted where the pixel brightness contrast is strong in the enhanced picture. The experimental results of feature extraction using an 8-layer pyramid are presented in Table 1 and illustrated in Figure 9.

Using an image pyramid for feature point extraction improves the algorithm's robustness and stability and allows effective feature detection and matching across multiple scales.

To verify the robustness of the image enhancement algorithm, we selected 10 images for comparative experiments, and the results are shown in Table 2.

As shown in Table 2, the average number of feature points extracted from 10 images without enhancement is 33.5. After the image is enhanced, the average number of feature points is approximately 333. The results show that the number of extracted feature points was greatly increased.

4.2. Comparison of Feature Matching between Images

To verify the performance of the image enhancement algorithm on interframe matching results, a comparative experiment of interframe feature matching was carried out. First, the feature points of two consecutive original images are extracted for feature matching, and mismatching points are deleted. Then, the same is done for two consecutive gray images. The matching results are shown in Figure 10, where only 14 pairs of points are successfully matched.

The feature matching experiments are carried out on the above two consecutive images after image enhancement, and the mismatching points are deleted. The matching results are shown in Figure 11. The number of matching pairs is greatly increased, reaching 72 pairs, which is beneficial for solving and optimizing the robot’s pose.

To realize the tracking and positioning of the robot, it is necessary to match successive multiframe images. Therefore, the matching experiment was carried out on the 10 consecutive enhanced images, and the feature matching results before and after image enhancement are shown in Table 3.

4.3. Algorithm Comparison Simulation Location Experiment

The real trajectory of the robot cannot be obtained inside the transformer. To verify the error between the trajectory obtained by the robot positioning method proposed in this paper and the real trajectory of the robot, the ROS and Gazebo environments [33] are used to build the simulation environment of the robot movement in transformer oil. The environment is shown in Figure 12, including the robot and the observation target.

The robot is equipped with a monocular camera. Gaussian noise is added to the robot monocular camera in the simulation process, and the images obtained before and after increasing the noise are shown in Figure 13.

As shown in Figure 13, the blurred image obtained by the robot after adding noise increases the difficulty of robot positioning. Image enhancement and feature matching experiments are performed on the data with added noise. The matching results are shown in Figure 14. After matching and screening, there are 46 pairs of matching points in the image with noise, and the number of matched feature points in the enhanced image reaches 93 pairs. The experimental results show that the image enhancement algorithm can effectively improve the matching number of image feature points.

To verify the positioning accuracy of the algorithm, robot motion observation and control experiments are performed, and comparative experiments are conducted with the open-source monocular visual odometry algorithm direct sparse odometry (DSO). DSO is a monocular vision-based localization algorithm released in 2016 by Dr. Jakob Engel from the Computer Vision Laboratory at the Technical University of Munich (TUM) and falls under the category of sparse direct methods. The 3D trajectory of the robot in the simulation environment is shown in Figure 15, in which the blue trajectory represents the trajectory obtained by the proposed localization algorithm, green represents the real trajectory of the robot, and pink indicates the trajectory obtained by the DSO monocular visual odometry algorithm. As shown in Figure 15, the motion trajectory obtained by the algorithm in this paper is basically consistent with the real trajectory, with a certain deviation, while the trajectory obtained by the DSO algorithm is quite different from the real trajectory.

It is not easy to evaluate the positioning effect using a 3D trajectory, so the positioning errors of the trajectory on the X axis, Y axis, and Z axis are plotted. The results are shown in Figure 16, in which the pink curve represents the positioning error of the DSO algorithm, and the blue curve represents the positioning error of the algorithm in this paper. As shown in Figure 16, the positioning error of the proposed algorithm is smaller than that of the DSO algorithm in the X-axis, Y-axis, and Z-axis directions. The maximum positioning error of the proposed algorithm is only 1 m, while the maximum positioning error of the DSO algorithm is close to 3 m. In addition, the root-mean-square errors of the robot position on the X-axis, Y-axis, and Z-axis are calculated. As shown in Table 4, the root-mean-square errors of this algorithm are obviously smaller than those of the DSO algorithm.
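The per-axis root-mean-square errors reported in Table 4 can be reproduced from the estimated and ground-truth trajectories with a few lines of NumPy; the array names below are placeholders.

```python
import numpy as np

def per_axis_rmse(est_xyz, gt_xyz):
    """est_xyz, gt_xyz: Nx3 arrays of estimated and ground-truth positions.
    Returns the RMSE along the X, Y, and Z axes."""
    err = np.asarray(est_xyz, dtype=float) - np.asarray(gt_xyz, dtype=float)
    return np.sqrt(np.mean(err ** 2, axis=0))   # [rmse_x, rmse_y, rmse_z]
```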

The simulation results show that the initial trajectory of the DSO location algorithm is far from the real trajectory, and the tracking fails. After image enhancement, the algorithm in this paper effectively realizes feature extraction and accurately restores the running trajectory of the robot under oil. From the error comparison experimental data, it can be seen that the positioning accuracy of the proposed method is better than that of the DSO positioning algorithm, thus verifying the effectiveness of the proposed method in the transformer oil environment.

4.4. Positioning Experiment of the Robot in Transformer Oil

To verify the effectiveness of the positioning method of the submersible transformer inspection robot, the positioning method is tested and analyzed by the data collected by the robot in the Mitsubishi transformer. The time length of the experimental data is 57 s, and the frame rate is 25 fps. The video data are extracted into a picture sequence, and the positioning performance test is carried out.
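For reference, the 25 fps video can be converted into the image sequence used here with OpenCV; the file naming below is illustrative.

```python
import cv2

def video_to_frames(video_path, out_dir):
    """Extract every frame of the inspection video (25 fps) as a numbered PNG."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(f"{out_dir}/frame_{idx:05d}.png", frame)
        idx += 1
    cap.release()
    return idx
```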

The robot tracking and positioning experiment is carried out by the original data, as shown in Figure 17, which mainly includes a blue trajectory, a purple trajectory and a red starting position. After the robot is successfully initialized in the red position, the blue track indicates the robot trajectory drawn from the original image. At the end of the blue track, the robot failed to track because the image obtained by the robot was not clear. Then, the robot localization algorithm is reinitialized. After reinitialization, according to the image currently acquired by the robot, the motion trajectory is drawn, which is shown in purple. From the results, we can see that the robot trajectory can be drawn according to the acquired visual information. However, if the image quality obtained by the robot is poor, it will lead to the failure of robot tracking, the repeated reinitialization of the positioning algorithm, and the discontinuous trajectory of the robot.

To verify the effectiveness of the proposed algorithm in the transformer, the image sequence is processed by multiscale fusion enhancement, and then the robot tracking and positioning experiment is carried out. The results are shown in Figure 18. It includes a red box, some blue boxes, and a green box. The red box indicates the initial position of the robot, the blue boxes indicate the historical trajectory of the robot, and the green box indicates the current position of the robot.

From the result in Figure 18, the robot trajectory can be displayed intuitively and clearly by the positioning method in this paper, and relatively accurate position information can be obtained. In addition, tracking failure and the phenomenon of reinitialization do not appear. The pose information of each frame in the image sequence can be provided by the positioning method. Part of the data are shown in Table 5.

On the basis of the coordinate information of each image obtained by the robot, the 3D motion trajectory of the robot inside the transformer is drawn with MATLAB, as shown in Figure 19. The red trajectory represents the robot trajectory obtained from the original images, and the blue trajectory represents the robot trajectory obtained after image enhancement. The results show maximum errors of 0.03 m along the X-axis, 0.008 m along the Y-axis, and 0.005 m along the Z-axis within the actual transformer space, with root-mean-square errors of 0.0158 m along the X-axis, 0.0026 m along the Y-axis, and 0.0028 m along the Z-axis. The positioning errors of the robot along each axis inside the transformer are within permissible limits, and the robot's trajectory in the transformer can be drawn by monocular vision with this algorithm, which resolves the difficulty of locating the robot inside the transformer.

5. Conclusion

To realize the autonomous positioning of a robot in deteriorated and discolored transformer oil, a visual positioning method for the submersible transformer inspection robot is proposed in this paper. To address transformer oil deterioration and discoloration, an image enhancement algorithm based on a multiscale fusion strategy is proposed to improve image quality, brightness, and contrast. To accurately determine the robot pose in the transformer, rotation and scale information is added to the feature extraction based on FAST key points and BRIEF descriptors. The interframe pose is solved by the epipolar constraint and EPnP, and a robot pose optimization model is designed to further improve the positioning accuracy of the robot. The performance of the proposed algorithm is evaluated using monocular images obtained by the robot inside the transformer. The experimental results show that the enhanced images satisfy the requirements of feature point extraction and that the trajectory of the robot in the transformer can be plotted by the algorithm.

Our future work involves two aspects: first, investigating how to test the positioning accuracy of the algorithm within the transformer; second, considering ways to enhance positioning accuracy in larger and more complex scenarios. Building upon the existing positioning methods, incorporating multiple sensors such as inertial navigation or radar is contemplated to improve the robot’s positioning accuracy. Additionally, the addition of loop detection to the system to mitigate the accumulation of errors over time is seen as an important direction for our next research steps.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was supported by the Project of the Liaoning Education Department (No. LN20221025), the Liaoning Provincial Education Department Project (No. LJKMZ20220614), and the Joint Open Foundation of Key Scientific and Technological Innovation Bases of Liaoning Province (No. 2021-KF-12-05).