Corrosion is the deterioration that results from reactions between the surface and internal microstructure of metallic materials and corrosive environments [1-2]. With long-term exposure to the external environment, corrosion has become a common defect of power equipment. If not addressed promptly, it can substantially shorten the equipment's life span and cause economic losses. Hence, it is important to detect and eliminate corrosion regularly and in a timely manner.
Currently, non-destructive methods are usually applied to detect corrosion, including X-rays [3], local wavenumber estimation [4], infrared thermography [5-6], magneto-optic imaging [7], and cameras. However, all of these methods impose strict requirements on the testing environment, the testing equipment, and the expertise of the testing personnel. At present, convolutional neural networks (CNNs) [8] are increasingly employed to perform corrosion detection on collected RGB images.
Since AlexNet [9] first appeared in the ImageNet competition, methods that use CNNs for feature extraction and image classification have developed rapidly and are now widely used in image recognition and object detection across various fields. Among current CNN-based object detection models, prominent architectures include Faster R-CNN [10] and YOLO [11-12].
However, owing to the irregular shape and detachable nature of corrosion, these detection models cannot directly achieve satisfactory results [13]. The main reason is that traditional annotation methods introduce ambiguity and uncertainty into the labeling process, making it difficult to unify. The resulting inconsistent annotations prevent detection models from effectively learning the inherent features of metal corrosion during training. At present, three main data annotation approaches are used in object detection tasks: two-dimensional (2D) bounding boxes, three-dimensional (3D) cuboids, and polygonal segmentation.
While the 2D bounding-box approach is relatively simple and widely used, the rectangular area usually contains non-target objects. Particularly relevant here, metal corrosion is markedly irregular, so a labeled box often contains a large non-corroded area. Ref. [14] used a sliding window to extract small regions so as to reduce the non-corroded area contained in each window, and a small CNN then judged whether each window contained corrosion. However, this method divides an image into many tiny windows, so the model focuses only on the local pixels in each window and ignores global context; the features it learns are therefore partial and one-sided. Moreover, the small window size forces the CNN itself to be small, which limits its learning ability. 3D cuboids can represent the depth of a target, but depth is rarely needed for metal corrosion detection. Polygonal segmentation can fit the target shape well and resolves the shortcomings of the bounding-box method; however, it is highly time-consuming and costly, and is therefore rarely used on a large scale in practice.
This paper proposes a novel hierarchical annotation method (HAM). Firstly, large boxes are used to label a large area covering the range of corrosion, provided that the area is visually continuous and adjacent to corrosion that cannot be clearly divided. Secondly, within each box established in the first step, regions with distinct corrosion and relative independence are labeled to create a second layer of nested boxes. This method is simple and readily produces unified, unambiguous annotation results. In addition, it highlights corrosion features and increases the number of ground-truth boxes, which provides a degree of data augmentation and makes it easier for detection models to learn the inherent features of corrosion during training.
In summary, this work makes three main contributions. Firstly, a novel method named HAM is proposed to accurately detect metal corrosion in power equipment. Secondly, a detection-box merging algorithm is applied to merge intersecting boxes. Finally, novel definitions of precision and recall are proposed in view of corrosion features.
Owing to the diversity of equipment and the ways in which corrosion spreads, the shape and size of corroded areas tend to be highly irregular. Meanwhile, corrosion is often detachable and can thus be regarded either as one whole area or as several small independent areas. In view of these characteristics, two methods are generally used when applying the 2D bounding-box approach to label corrosion areas:
1) Fine-grained annotation method (FAM): using the smallest possible box to label the corrosion area.
2) Coarse-grained annotation method (CAM): using the largest possible box to label the corrosion area.
FAM minimizes the non-target area contained in a rectangular box. However, the tendency of metal corrosion to spread causes significant ambiguity in the labeling process, yielding inconsistent annotation results. One simple solution is to ignore corrosion-spreading areas; Fig. 1(a) shows some unlabeled spreading areas near the labeled corrosion. However, spreading is such an important feature of metal corrosion that ignoring it significantly reduces the annotation quality of the dataset.
CAM labels the dataset with large boxes, each large enough to encompass all areas of spreading corrosion, so the problem encountered by FAM is avoided during labeling. The annotation results for this method are shown in Fig. 1(b). However, CAM also has shortcomings. Firstly, the number of non-target objects contained in a labeling box greatly increases. Moreover, different objects may receive the same annotation, whereas similar objects may receive different annotations. For example, CAM causes both the normal nut in the lower-right corner and the corroded nut in the top-right corner of Fig. 1(b) to be labeled as part of the same box. Meanwhile, the corroded nut in Fig. 1(b) is labeled as part of a larger corrosion area, but a similar corroded nut in Fig. 1(c) is labeled as an independent corrosion area. Hence, CAM destabilizes the training of detection models.
Fig.1 Performance of the labeling process using different methods. (a) The annotation result by FAM; (b) the annotation result by CAM; (c) the detachability of metal corrosion; (d) the annotation result by HAM
The HAM proposed in this paper solves the above problems. Firstly, a sufficiently large box labels a large area covering the range of corrosion, as long as the area is visually continuous and adjacent to corrosion that cannot be clearly divided; an example is shown as GT1 in Fig. 1(d). Then, within that labeling box, areas with distinct features and relative independence are labeled again to form a second layer of nested boxes, as shown by GT2 and GT3 in Fig. 1(d).
The first operation of HAM therefore resolves the labeling-omission problem of FAM by annotating metal corrosion with a large box, and the second operation reannotates the independent corrosion inside the large box to avoid the labeling-ambiguity problem of CAM. At the same time, HAM increases the number of ground-truth boxes per sample. Data augmentation is thus achieved to a certain extent, which counteracts the negative impact of rectangular boxes containing non-target objects.
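To make the two-step labeling concrete, the sketch below shows one possible representation of a HAM-labeled sample; the file name, field names, and coordinates are hypothetical and serve only to illustrate how an outer box (GT1) and its nested boxes (GT2, GT3) flatten into the list of ground truths consumed by a detector.

```python
# Hypothetical representation of one HAM-labeled image: an outer box (GT1) covering
# the whole spreading area, plus nested boxes (GT2, GT3) for distinct, relatively
# independent corrosion. Boxes are (x_min, y_min, x_max, y_max) in pixels.
ham_annotation = {
    "image": "substation_001.jpg",       # hypothetical file name
    "outer_boxes": [
        {
            "box": (120, 80, 560, 420),  # GT1
            "nested_boxes": [
                (150, 110, 260, 230),    # GT2
                (380, 250, 520, 400),    # GT3
            ],
        }
    ],
}

def flatten_ground_truths(annotation):
    """Flatten the hierarchy into one list of boxes, since detectors such as
    Faster R-CNN and YOLOv5 consume a flat list of ground truths per image."""
    boxes = []
    for outer in annotation["outer_boxes"]:
        boxes.append(outer["box"])
        boxes.extend(outer["nested_boxes"])
    return boxes

print(len(flatten_ground_truths(ham_annotation)))  # 3 ground truths for this sample
```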
Although HAM effectively solves labeling omission and labeling ambiguity, nested boxes necessarily lie inside the outer labeling boxes. Therefore, models trained on a hierarchically labeled dataset will also output nested or intersecting detection boxes, as shown by detection boxes A and B in Fig. 2. Although this does not affect the actual detection results, it interferes with the visualization of results and with the calculation of the detection models' precision and recall. Accordingly, the detection results were transformed in this study: intersecting and nested detection boxes were merged to form the final result. That is, in Fig. 2, the orange box C replaces A and B as the final result.
Fig.2 Diagram of merging boxes. (a) Detection results before merging; (b) detection results after merging
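The paper does not spell out the merging algorithm in detail, so the following is only a plausible sketch under the assumption that boxes are axis-aligned (x_min, y_min, x_max, y_max): any pair of intersecting or nested boxes is repeatedly replaced by its smallest enclosing box until all remaining boxes are disjoint.

```python
def boxes_overlap(a, b):
    """True if two axis-aligned boxes (x_min, y_min, x_max, y_max) intersect or nest."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def enclosing_box(a, b):
    """Smallest box that encloses both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def merge_detections(boxes):
    """Merge intersecting or nested detection boxes until all remaining boxes are disjoint."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_overlap(boxes[i], boxes[j]):
                    boxes[i] = enclosing_box(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes

# Boxes A and B intersect and collapse into a single box C; the isolated box is kept.
print(merge_detections([(10, 10, 60, 60), (40, 40, 100, 100), (200, 200, 240, 240)]))
# -> [(10, 10, 100, 100), (200, 200, 240, 240)]
```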
Based on the above discussion, one corrosion sample may receive different annotation results when different annotation methods are used. Consequently, test sets labeled by different methods also differ, and the performance of models trained with the three annotation methods cannot be compared meaningfully. To solve this problem, the test set in this study was labeled uniformly using polygonal segmentation to create a unified test set. Some of the labeled samples are shown in Fig. 3.
Fig.3 Two annotation results using polygons. (a) One corrosion area labeled; (b) two corrosion areas labeled
In the experiment, precision and recall were adopted as the evaluation criteria. According to whether the predicted and actual results are positive or negative, classification outcomes fall into four categories, as shown in Tab. 1.
Tab.1 Confusion matrix for binary classification
Actual result | Predicted positive | Predicted negative
Positive | TP | FN
Negative | FP | TN
For object detection, the intersection over union (IoU) is typically used to evaluate whether a detection result is correct. The IoU is the ratio of the intersection to the union of the detection box and the ground truth. In practice, the IoU threshold is usually set to 0.5: when the IoU exceeds 0.5, the target is considered detected. In this study, however, the test set was labeled via polygonal segmentation, so the IoU between a rectangular detection box and the polygonal ground truth is often very low. As shown in Fig. 4, the IoU falls below 0.5; thus, the IoU cannot be used directly to judge whether a predicted box is correct.
Fig.4 Large non-corroded area
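For reference, the standard box IoU discussed above can be computed as in the short sketch below; the coordinate convention and example values are illustrative only.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A box that fully covers a thin, elongated corrosion region still scores well
# below the usual 0.5 threshold, which is why IoU alone is unsuitable here.
print(box_iou((0, 0, 100, 100), (0, 0, 100, 30)))  # 0.3
```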
Accordingly, the precision and recall of the models were evaluated using the following expressions.
As shown in Fig. 4, even if the model effectively detects the corrosion area, several non-corroded regions are still present in the result. Therefore, if precision were defined as the ratio of the total area of correct detections to the total area of all detections, the calculated precision would often be very low, and the models' performance could not be evaluated correctly. Hence, in this study, precision was calculated with reference to the number of detection areas:

$$P = \frac{N_{\mathrm{ca}}}{N_{\mathrm{da}}}$$

where P is the precision of the model, N_ca is the number of correct detection areas, and N_da is the total number of detection areas.
To reduce the influence of image size, all image sizes were normalized before calculation. The recall is calculated as

$$R = \frac{S_{\mathrm{ca}}}{S_{\mathrm{ta}}}$$

where R is the recall of the model, S_ca is the total area of the correctly detected corrosion, and S_ta is the total area of the true corrosion areas.
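A minimal sketch of the two measures is given below, using the shapely library for polygon geometry. The rule used to decide whether a detection is "correct" (here, any overlap with true corrosion) and all helper names are assumptions made for illustration rather than the exact criterion of this study.

```python
from shapely.geometry import Polygon, box   # pip install shapely
from shapely.ops import unary_union

def precision_recall(det_boxes, gt_polygons, img_w, img_h):
    """Precision P = N_ca / N_da, counted over detection boxes;
    recall R = S_ca / S_ta, computed from areas normalized by image size."""
    dets = [box(*b) for b in det_boxes]
    gt_union = unary_union([Polygon(p) for p in gt_polygons])

    n_ca = sum(1 for d in dets if d.intersects(gt_union))   # correct detection areas (assumed rule)
    precision = n_ca / len(dets) if dets else 0.0

    norm = float(img_w * img_h)                              # reduce the influence of image size
    s_ca = unary_union([d.intersection(gt_union) for d in dets]).area / norm
    s_ta = gt_union.area / norm
    recall = s_ca / s_ta if s_ta > 0 else 0.0
    return precision, recall

# Illustrative values: one detection box partially covering one corrosion polygon.
p, r = precision_recall(
    det_boxes=[(10, 10, 60, 60)],
    gt_polygons=[[(20, 20), (80, 20), (80, 50), (20, 50)]],
    img_w=640, img_h=480,
)
print(round(p, 2), round(r, 2))  # 1.0 0.67
```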
Therefore, the unified test set, together with the two expressions above, ensures that the three annotation methods can be compared correctly and that the performance of different models can be evaluated realistically.
The experimental software and hardware environments are presented in Tab.2.
Tab.2 Experimental software and hardware environments
Configuration | Description
CPU | i7-9800X, 8 cores @ 4.4 GHz
RAM | 32 GB
GPU | Nvidia RTX 2080Ti, 11 GB
OS | CentOS 7
Deep learning framework | PyTorch
Object detection models | Faster R-CNN, YOLOv5 [15]
Dataset | Collected from actual substations by Nari Corp; 1 180 training images, 199 test images
Evaluation indicators | Precision, recall
Firstly, this study used polygonal segmentation to label 199 images as the test set. Secondly, 1 180 images were labeled with each of the three annotation methods for training the Faster R-CNN and YOLOv5 models. Subsequently, Faster R-CNN was trained with VGG16 or Res101 [16] as the backbone network and SGD [17] as the optimizer, whereas YOLOv5 was trained with DarkNet53 [11] as the backbone network and SGD as the optimizer. Finally, after the detection-box merging algorithm was used to merge the intersecting boxes in each image, the precision and recall of the detection models were calculated. The experimental results are shown in Tab. 3.
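As a rough sketch of the training step, the snippet below fine-tunes torchvision's built-in Faster R-CNN with SGD. The ResNet-50 FPN backbone, hyperparameters, and batch-handling code are stand-ins for illustration and do not reproduce the exact VGG16/Res101 configurations used in the experiments.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Stand-in backbone: torchvision ships a ResNet-50 FPN Faster R-CNN out of the box.
# (Older torchvision versions use pretrained=True instead of weights="DEFAULT".)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 2  # background + corrosion
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device).train()

# SGD optimizer, as in the experiments; the hyperparameter values are illustrative.
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=5e-4)

def train_one_batch(images, targets):
    """images: list of CHW float tensors; targets: list of dicts with 'boxes'
    (the flattened HAM ground truths, including nested boxes) and 'labels'."""
    images = [img.to(device) for img in images]
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    loss_dict = model(images, targets)   # in train mode, Faster R-CNN returns a dict of losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```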
Tab.3 Experimental results %
Model | Annotation method | Precision | Recall
YOLOv5 | FAM | 89.58 | 51.20
YOLOv5 | CAM | 93.84 | 50.79
YOLOv5 | HAM | 91.14 | 59.41
Faster R-CNN + VGG16 | FAM | 84.72 | 66.50
Faster R-CNN + VGG16 | CAM | 84.69 | 70.51
Faster R-CNN + VGG16 | HAM | 82.95 | 78.94
Faster R-CNN + Res101 | FAM | 80.49 | 78.32
Faster R-CNN + Res101 | CAM | 82.31 | 82.69
Faster R-CNN + Res101 | HAM | 81.48 | 84.61
In the experiment, Faster R-CNN + Res101 trained with HAM achieved the best detection result, with the highest recall, as shown in Fig. 5. In addition, Faster R-CNN achieved a higher recall than YOLOv5, although its precision was lower. For Faster R-CNN, using Res101 as the backbone network yielded a higher recall than VGG16, with a slightly reduced precision. With HAM, YOLOv5 and Faster R-CNN + VGG16 showed a large improvement in recall with only slight fluctuations in precision. Furthermore, the recall of Faster R-CNN + Res101 reached the maximum across all experiments.
Fig.5 Some outputs of Faster R-CNN+Res101 using HAM. (a) One detection result by merging three boxes; (b) one detection result by merging two boxes; (c) one detection result without merging; (d) two detection results by merging four boxes
According to the actual requirements of corrosion detection and processing procedures, further analysis of the experimental results shows the following findings:
1) Although YOLOv5 has the highest precision, it also has the lowest recall. In practical applications, the detection results of a model usually need to be reviewed manually. Therefore, a higher recall is preferable to a higher precision for corrosion detection models, so that the omission of corrosion is avoided to the greatest extent possible.
2) For Faster R-CNN, Res101 yields a higher recall than VGG16 at a slightly lower precision. Res101 has a more complicated structure and more convolution layers than VGG16 and can therefore learn more features. It also uses batch normalization [18] and the rectified linear unit activation function [19] to ensure that gradients back-propagate well and convergence is fast, and its residual modules allow the network to be fully trained without degradation (see the sketch after Tab. 4). Therefore, its recall is higher.
3) After HAM is applied, the recall of each model clearly increases compared with the other two methods, as shown in Tab. 4. The models used in the experiment are all anchor-based object detectors; that is, they detect objects of different sizes using predefined anchors of different sizes. Compared with FAM, HAM introduces additional large ground-truth boxes. Although this does not affect the detection ability of the small anchors, it improves that of the large ones, thereby strengthening the models' overall detection ability. The analysis with respect to CAM yields the same conclusion. Therefore, HAM improves the detection performance of the models.
Tab.4 Recall of models with different methods %
Model | FAM | CAM | HAM
Res101 | 78.32 | 82.69 | 84.61
VGG16 | 66.50 | 70.51 | 78.94
YOLOv5 | 51.20 | 50.79 | 59.41
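For context on the discussion in item 2) above, the following is a minimal residual block in PyTorch combining convolution, batch normalization [18], and the ReLU activation [19] with an identity shortcut; it is a generic sketch, not the exact block used in Res101.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Minimal residual block: conv -> BN -> ReLU -> conv -> BN, plus an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # the shortcut keeps gradients flowing through deep stacks

print(ResidualBlock(64)(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```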
1) The training effect of object detection models depends heavily on sufficient, high-quality datasets. Traditional annotation methods encounter ambiguity and uncertainty because of the irregularity and detachability of metal corrosion. Therefore, when Faster R-CNN and YOLOv5 are applied directly to detect corrosion in power equipment, it is difficult to standardize the labeling of training samples and keep the annotation results consistent.
2) In this paper, a novel HAM that exploits the characteristics of corrosion is proposed. Meanwhile, a detection-box merging algorithm is applied to solve the problem of nested boxes.
3) According to the experimental findings, the corrosion detection ability of the YOLOv5 and Faster R-CNN models is greatly improved after adopting HAM. The models also obtain better generalization ability and can be popularized and widely applied in practice.
[1] Roberge P R. Handbook of corrosion engineering [M]. New York, USA: McGraw-Hill, 2019: 25-61.
[2] Cicek V. Corrosion engineering [M]. Beverly, USA: Scrivener Publishing LLC, 2014: 1-19.
[3] Dunn W L, Yacout A M. Corrosion detection in aircraft by X-ray backscatter methods [J]. Applied Radiation and Isotopes, 2000, 53(4): 625-632. DOI: 10.1016/S0969-8043(00)00240-2.
[4] Gao T, Sun H, Hong Y. Hidden corrosion detection using laser ultrasonic guided waves with multi-frequency local wavenumber estimation [J]. Ultrasonics, 2020, 108: 106182. DOI: 10.1016/j.ultras.2020.106182.
[5] Doshvarpassand S, Wu C, Wang X. An overview of corrosion defect characterization using active infrared thermography [J]. Infrared Physics & Technology, 2019, 96: 366-389. DOI: 10.1016/j.infrared.2018.12.006.
[6] Wicker M, Alduse B P, Jung S. Detection of hidden corrosion in metal roofing shingles utilizing infrared thermography [J]. Journal of Building Engineering, 2018, 20: 201-207. DOI: 10.1016/j.jobe.2018.07.018.
[7] Dudziak M J, Chervonenkis A Y, Chinarov V. Nondestructive evaluation for crack, corrosion, and stress detection for metal assemblies and structures [C]//Nondestructive Evaluation of Aging Aircraft, Airports, and Aerospace Hardware III. International Society for Optics and Photonics, Newport Beach, CA, USA, 1999, 3586: 20-31. DOI: 10.1117/12.339888.
[8] LeCun Y, Boser B, Denker J S. Backpropagation applied to handwritten zip code recognition [J]. Neural Computation, 1989, 1(4): 541-551. DOI: 10.1162/neco.1989.1.4.541.
[9] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks [J]. Communications of the ACM, 2017, 60: 84-90. DOI: 10.1145/3065386.
[10] Ren S, He K, Girshick R. Faster R-CNN: Towards real-time object detection with region proposal networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(6): 1137-1149. DOI: 10.1109/TPAMI.2016.2577031.
[11] Redmon J, Divvala S, Girshick R. You only look once: Unified, real-time object detection [C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA, 2016: 779-788. DOI: 10.1109/CVPR.2016.91.
[12] Redmon J, Farhadi A. YOLO9000: Better, faster, stronger [C]//IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA, 2017: 6517-6525. DOI: 10.1109/CVPR.2017.690.
[13] Fan H B, Hu X X, Liu Y M. Application of deep learning in corrosion detection of power equipment [J]. Guangdong Electric Power, 2020, 33(9): 154-165. DOI: 10.3969/j.issn.1007-290X.2020.009.020. (in Chinese)
[14] Yao Y, Yang Y, Wang Y. Artificial intelligence-based hull structural plate corrosion damage detection and recognition using convolutional neural network [J]. Applied Ocean Research, 2019, 90: 101823. DOI: 10.1016/j.apor.2019.05.008.
[15] Glenn J, Alex S, Jirka B. Ultralytics/yolov5: v3.1 - bug fixes and performance improvements [EB/OL]. (2020-10-29) [2021-02-01]. https://zenodo.org/record/4154370#.YT4F4nbJ3Us. DOI: 10.5281/zenodo.4154370.
[16] He K, Zhang X, Ren S. Deep residual learning for image recognition [C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA, 2016: 770-778.
[17] Sutskever I, Martens J, Dahl G. On the importance of initialization and momentum in deep learning [C]//International Conference on Machine Learning. Atlanta, USA, 2013: 1139-1147.
[18] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift [C]//International Conference on Machine Learning. Stroudsburg, PA, USA, 2015, 37: 448-456. DOI: 10.5555/3045118.3045167.
[19] Nair V, Hinton G E. Rectified linear units improve restricted Boltzmann machines [C]//International Conference on Machine Learning. Haifa, Israel, 2010: 807-814.