Abstract

The operations performed by operators are important for nuclear safety, but conventional operating experience feedback and common data-driven methods make it difficult to explicitly extract the valuable information hidden in operational sequences that could be used to advise operators at the operational level. During nuclear power plant (NPP) operation, a large amount of historical operating data is accumulated, recording both the operational sequences of the operators and the state parameters of the equipment. Therefore, this paper proposes using association rule techniques to mine NPP operating data, discover the operational characteristics of operators, and reveal their possible impact on NPP operation. This work helps to improve the operational performance of operators and to prevent human-factor events. To this end, the concept of state switching values is proposed to describe the operating states of NPPs, so that the method can be adapted to different practical application scenarios. A sequence segmentation method is proposed to transform historical NPP operating data into a sequence dataset for association rule mining. Furthermore, an ensemble algorithm based on sequential pattern mining and sequential rule mining, together with a postprocessing method, is designed. An empirical study was carried out using 20 batches of historical cold start-up operating data. A total of 164 original association rules were generated with the proposed method and analyzed by experts, and recommendations that would improve the operational performance of the operators were made for four cases.

1. Introduction

With the continuous increase in operating experience feedback (OEF) from nuclear power plants (NPPs), the analysis of human factor events (HFEs) has become an important part of OEF and is important for improving personnel performance and preventing human errors [1, 2]. Many studies of HFEs have been conducted in recent years [3–6]. However, human reliability analysis (HRA) tends to focus on the impact of objective conditions on the operator and lacks analysis at the operator's operational level [7, 8]. Moreover, HRA data are mainly obtained from full-scope simulators, whereas operational-level problems are more likely to occur under real operating conditions [9]. Among HFEs, the operations of the operator have a direct impact on nuclear safety and account for a large proportion of human errors [10]. In an emergency, operators may misjudge the state of the equipment and perform operations accordingly [11]. More often, because operators differ in cognitive ability, experience, and behaviour, their operational sequences may display differing characteristics, such as the routine performance of certain operations or the habitual disregard of procedural steps [12], even when the NPP starts from the same initial state. Revealing the characteristics hidden in these operational sequences can help prevent human error and improve the safety of NPPs.

Currently, there are two approaches to analysing the operational sequences of NPP operators. The first obtains OEF after anomalies or accidents have occurred. However, such cases are very rare, and for operations that do not lead to anomalies or accidents, it is difficult to determine their consequences for the NPP accurately. In this situation, if operators remain unaware of the characteristics hidden in their operational sequences, the probability of HFEs is likely to increase. The second approach uses currently popular data-driven techniques such as neural networks, support vector machines, and clustering [13], which can extract operational characteristics of operators from data. Data-driven techniques have been used extensively in condition monitoring, fault diagnosis, environmental radiation detection, and other NPP domains [14]. The problem with this approach is that the features a model extracts from data are often implicit and difficult to interpret, so they do little to help operators improve their performance in a particular case. Furthermore, the generalization performance of such models is limited by the nature of the data, which makes it difficult to obtain dependable operation recommendations when a data-driven model is deployed in complex real-life scenarios.

Association rule mining is one of the main data mining techniques; it finds frequent patterns, associations, and correlations among variables or items in a database [15]. Unlike the data-driven approaches mentioned above, it obtains frequent items from the data explicitly and does not involve model learning. Its results take the form of rules that are intuitive and easy to understand. In addition, it is applicable to a wide range of datasets and does not require extensive parameter tuning. These characteristics allow association rules to be used directly to mine the operational patterns of operators from historical data, from which effective conclusions and suggestions can be drawn through expert analysis. Some research has applied association rule mining to HFEs in NPPs. Jiang et al. used an association rule-based approach to assess the support and confidence of HFEs, taking a steam generator tube rupture accident as an example, to help reduce HFEs [16]. Zou et al. used association rule techniques to identify associations and causality among HFEs [1]. These studies inspired us to use this technique to analyze the association between operational sequences and operating phenomena, helping operators improve their operational performance and prevent human errors. In addition, with the increasing digitization of NPPs, a large amount of historical operating data is now available [17]. These data contain information on the condition of the equipment and the operations of the operators and thus provide a data basis for such research.

This paper proposes a method based on association rule mining to discover the operational sequence characteristics of operators through historical operating data of NPPs and reveal the impact of these operational sequences on NPPs. The proposed method conducts an association analysis on whether these operational sequences are conducive to the normal operation of NPPs.

2. Methodology

The outline of the proposed method is shown in Figure 1. It consists of three modules: data preprocessing, association rule mining, and rule postprocessing. Data preprocessing generates state switching values that represent operation events and segments the raw data into sequence datasets accordingly. Association rule mining is the central module; its aim is to discover original rules between operational sequences and the operation events represented by the state switching values. Rule postprocessing produces rules with accurate confidence values for expert analysis, which then reveal the impact of operational sequences on NPP operation. In Section 2.1, state switching values are proposed to adapt the association rule technique to different task types, and a sequence segmentation method that accounts for the specific features of NPP operating data is presented. In Section 2.2, the technical principles of association rules are introduced and our approach is described. In Section 2.3, we outline the process of revising the association rule mining results into a form suitable for expert analysis.

2.1. Data Preprocessing

An NPP has many different types of systems and equipment, and its data acquisition devices acquire and record thousands of major parameters [18]. These variables can be divided into analog quantities and switching values. Analog quantities are physical quantities that vary continuously over time, usually represented by numerical values with units; the state of an NPP system, such as pressure and flow, can be monitored by observing how the analog quantities change during operation. A switching value is a physical quantity with only two states, usually represented by 0 and 1. Switching values can be further divided into control switching values and alarm switching values. A control switching value reflects an intervention by the operator; e.g., stopping a pump can be represented by 1 ⟶ 0, and a control rod action by 0 ⟶ 1. An alarm switching value reflects whether a key operating parameter exceeds its safety limit. The original dataset for the proposed method is time series data containing the above variable types; after preprocessing, it is transformed into a segmented sequence dataset containing control switching values and state switching values. For ease of understanding, Figure 2 shows how the proposed method changes the data format during processing.

2.1.1. State Switching Value Generation

Because the alarm switching value describes an abnormal or accident state of the NPP in terms of a definite safety limit, the information in the periods before and after the alarm switching value is triggered is lost. Moreover, the scenarios that trigger alarm switching values in actual operation are very limited. Both factors hinder data analysis and mining. Based on the prior knowledge of experts, this paper proposes the concept of the state switching value, which is used to find the sample points that may be tending toward, or are already in, an abnormal or accident state. A state switching value describes, from the perspective of the data distribution, whether a continuous variable such as temperature or pressure is in a state conducive to the normal operation of the NPP, and it is represented by 0 and 1: 0 represents the normal operating state, 1 represents a deviation from it, and 0 ⟶ 1 and 1 ⟶ 0 indicate state changes. State switching values are defined as follows.

Threshold definition method. Arrange the values of the continuous variable x in ascending or descending order, and then, according to the physical meaning of the variable, mark the first n% of the variable x as state 1, and the rest as state 0.
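As a minimal sketch of the threshold definition method, the Python function below sorts the samples of one variable and labels the first n% as state 1 while keeping the result in chronological order. The function name, the rounding of n% down to a whole number of samples, and the example readings are illustrative assumptions, not details from the study.

```python
from typing import List, Sequence


def mark_top_percent(values: Sequence[float], n_percent: float, descending: bool = True) -> List[int]:
    """Sort values, mark the first n% as state 1 and the rest as state 0.

    States are returned in the original (chronological) order of `values`.
    descending=True marks the largest values; use False to mark the smallest.
    """
    k = int(len(values) * n_percent / 100)  # number of samples labelled 1 (rounded down)
    # Stable sort of indices by value; equal values keep their chronological order
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    states = [0] * len(values)
    for i in order[:k]:
        states[i] = 1
    return states


# Hypothetical readings: label the 40% highest values as state 1
pressure = [15.1, 15.6, 15.3, 15.9, 15.2]
print(mark_top_percent(pressure, 40))  # [0, 1, 0, 1, 0]
```

The `descending` flag stands in for "according to the physical meaning of the variable": it selects whether high or low values count as deviating from normal operation.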

Differential definition method. Arrange the values of the continuous variable x in chronological order, let $x_t$ be the value of x at the current moment and $x_{t-1}$ its value at the previous moment, and perform the following calculation for each moment:

$$\Delta x_t = x_t - x_{t-1}$$

The resulting values $\Delta x_t$ are then arranged in ascending or descending order, and, depending on the physical meaning of the variable, the first n% of moments are marked as state 1 and the rest as state 0.
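A minimal Python sketch of the differential definition method follows. It assumes that the first moment, which has no predecessor, is assigned state 0 and that n% is rounded down to a whole number of moments; the function name and example readings are hypothetical.

```python
from typing import List, Sequence


def differential_states(x: Sequence[float], n_percent: float, descending: bool = True) -> List[int]:
    """Rank dx_t = x_t - x_{t-1} and mark the first n% of moments as state 1.

    The first moment has no predecessor and is assigned state 0 (an assumption).
    """
    if not x:
        return []
    diffs = [x[t] - x[t - 1] for t in range(1, len(x))]
    k = int(len(diffs) * n_percent / 100)  # number of moments labelled 1
    order = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=descending)
    states = [0] * len(diffs)
    for i in order[:k]:
        states[i] = 1
    return [0] + states  # re-align with the original time axis


temperature = [10.0, 11.0, 15.0, 14.0, 20.0]  # hypothetical readings
print(differential_states(temperature, 50))  # [0, 0, 1, 0, 1]: rapid rises at t = 2 and t = 4
```

With `descending=True` the sketch flags unusually rapid increases; passing `False` would flag the most rapid decreases instead.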

Moving average definition method. For the continuous variable x in time order, let $x_t$ be the value at the current moment and $x_{t-1}, \ldots, x_{t-k}$ the values at the previous k moments. For each moment, the following calculation is performed:

$$\bar{x}_t = \frac{1}{k+1}\sum_{i=0}^{k} x_{t-i}$$

Then $\bar{x}_t$ is sorted in ascending or descending order according to the physical meaning of the variable, and the first n% of moments are marked as state 1 and the rest as state 0.
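The moving average variant can be sketched in the same way. This assumes the average is taken over $x_t$ and its k predecessors, so it is undefined for the first k moments, which are labelled 0; the window form, the function name, and the data are illustrative assumptions.

```python
from typing import List, Sequence


def moving_average_states(x: Sequence[float], k: int, n_percent: float, descending: bool = True) -> List[int]:
    """Rank the moving average over x_t and its k predecessors; mark the first n% as state 1.

    Moments t < k lack a full window and get state 0 (an assumption).
    """
    if len(x) <= k:
        return [0] * len(x)
    # Average of the window x[t-k], ..., x[t] for every moment with a full window
    averages = [sum(x[t - k:t + 1]) / (k + 1) for t in range(k, len(x))]
    top = int(len(averages) * n_percent / 100)
    order = sorted(range(len(averages)), key=lambda i: averages[i], reverse=descending)
    states = [0] * len(averages)
    for i in order[:top]:
        states[i] = 1
    return [0] * k + states


counts = [1, 2, 3, 10, 2, 1]  # hypothetical noisy channel readings
print(moving_average_states(counts, k=2, n_percent=50))  # [0, 0, 0, 1, 1, 0]
```

Smoothing first means that a single spike raises the state of the whole window around it, which matches the stated use for parameters prone to drastic fluctuations.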

The threshold definition method finds, within the statistical distribution, the states that are closer to or farther from the safety limit, by constraining values between the upper and lower limits of the alarm switching threshold. The differential definition method captures states in which a variable changes unusually rapidly or unusually slowly at an instant. The moving average definition method is suitable for parameters prone to drastic fluctuations and for identifying variables with strong nonlinear changes over a period of time. In practice, irrelevant and redundant variables can be removed in advance with a filter feature selection algorithm based on the defined state switching values, which improves mining efficiency and reduces postprocessing difficulty [19].

2.1.2. Event Sequence Segmentation

Before mining association rules, the event sequence must be segmented into sequence datasets. Commonly used static time series segmentation methods include piecewise aggregate approximation (PAA) [20], the discrete Fourier transform (DFT) [21, 22], symbolic aggregate approximation (SAX) [23], and special points-based methods [24, 25]. However, segmenting the historical operating data of NPPs based on state switching values poses the following difficulties.
(i) Segmentation by time intervals alone would cause different events to appear in the same sequence at the same time and affect each other. Moreover, selecting different events as research objects would yield different segmentation results.
(ii) If the time interval is too long, irrelevant events may appear in the same sequence and produce pseudo-rules with low support and confidence. Conversely, if the interval is too short, events that are actually associated may be split into different sequences, resulting in rules with low confidence.
(iii) The event sequence may contain interference sequences, such as state switching values that alternate between 0 ⟶ 1 and 1 ⟶ 0 within a short time.
(iv) If the segmented dataset contains extra-long sequences, the efficiency of the sequence segmentation algorithm can be seriously affected.
(v) If the whole sequence dataset is used as input to the association rule mining algorithm after segmentation, the computation becomes too intensive for the algorithm to complete.

To address these difficulties, this paper designs a segmentation method for event sequences. The motivation is that the issues above are engineering rather than academic problems: existing methods struggle to efficiently generate, from raw time series data, segmented sequences with different time window sizes and event sequences of different lengths. The idea of event sequence segmentation is therefore to consider only the moments at which the state switching values change and to output the resulting sequences into two different sets according to whether or not they contain state switching value changes. Only the set containing state switching value changes is used for association rule mining.

Figure 3 shows the sequence segmentation process, using a single state switching value change as an example. The input dataset D is time series data that includes the timestamp, the control switching values at each time, and the changes in the state switching value (0 ⟶ 1 or 1 ⟶ 0). The process scans the database only once and records the moments at which switching values change together with the corresponding operational sequences. The two parameters λ and μ limit the sequence length and the event interval, respectively, which also facilitates expert analysis during rule postprocessing. For each state switching value, the method generates two sets of sequences, S and S′: S is used as input to the association rule mining algorithm, and S′ is used to update the confidence during rule postprocessing. When the input contains changes in multiple state switching values, the same process is simply run in parallel for each of them.
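The exact procedure is given in Figure 3; the following is only a hypothetical single-pass sketch of the idea for one state switching value. Operations are accumulated until a gap longer than μ, a run of λ operations, or the state switching value change closes the open sequence; sequences closed by the state change go to S and all others to S′. The function `segment`, the item labels, and these closing rules are assumptions; the defaults of 30 operations and 100 s mirror the settings reported in the empirical study.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, item label)


def segment(events: List[Event], state_item: str, lam: int = 30,
            mu: float = 100.0) -> Tuple[List[List[Event]], List[List[Event]]]:
    """Hypothetical single-pass segmentation for ONE state switching value change.

    events: chronologically sorted operations plus occurrences of `state_item`.
    lam: maximum number of operations kept in one sequence.
    mu: maximum time gap (s) between consecutive events of one sequence.
    Returns (S, S_prime): S holds sequences ending with the state change,
    S_prime holds operational sequences without it.
    """
    S: List[List[Event]] = []
    S_prime: List[List[Event]] = []
    current: List[Event] = []
    last_t: Optional[float] = None
    for t, item in events:
        # A gap longer than mu closes the open sequence without a state change
        if current and last_t is not None and t - last_t > mu:
            S_prime.append(current)
            current = []
        if item == state_item:
            if current:  # the state change ends the operational sequence before it
                S.append(current + [(t, item)])
            current = []
        else:
            current.append((t, item))
            if len(current) == lam:  # length limit reached
                S_prime.append(current)
                current = []
        last_t = t
    if current:
        S_prime.append(current)
    return S, S_prime


events = [(0, "R1"), (10, "R2"), (20, "SR2_01"), (200, "R3"),
          (210, "R4"), (400, "R5"), (405, "SR2_01")]
S, S_prime = segment(events, "SR2_01")
print(S)        # [[(0, 'R1'), (10, 'R2'), (20, 'SR2_01')], [(400, 'R5'), (405, 'SR2_01')]]
print(S_prime)  # [[(200, 'R3'), (210, 'R4')]]
```

Running the function once per state switching value corresponds to the parallel processing mentioned above.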

2.2. Association Rule Mining

Because the segmented data are timestamped sequences that reflect the operator's operational sequence, the association rule mining module uses sequential pattern mining and sequential rule mining techniques to process the segmented sequence dataset.

Sequential pattern mining was proposed by Agrawal et al. to mine frequently occurring ordered events or subsequences as patterns [26]. The problem of sequential pattern mining can be briefly stated as follows [27].

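Let $I = \{i_1, i_2, \ldots, i_k\}$ be the set of k items. A sequence $s = \langle e_1 e_2 \cdots e_l \rangle$ ($e_j \subseteq I$) is an ordered list of itemsets, in which each itemset represents events that occur at the same timestamp. For two sequences $\alpha = \langle a_1 a_2 \cdots a_n \rangle$ and $\beta = \langle b_1 b_2 \cdots b_m \rangle$, α is called a subsequence of β if and only if there exist integers $1 \le j_1 < j_2 < \cdots < j_n \le m$ such that $a_1 \subseteq b_{j_1}, a_2 \subseteq b_{j_2}, \ldots, a_n \subseteq b_{j_n}$, denoted as $\alpha \sqsubseteq \beta$. For a given sequence dataset $S = \{s_1, s_2, \ldots, s_N\}$, the support of a sequence α is the number of sequences in S that contain α. If the support of α satisfies the minimum support threshold, α is a sequential pattern.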
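The subsequence relation and support count defined above can be checked directly. In this sketch, itemsets are Python frozensets, and matching is greedy: each element of α is matched to the earliest admissible element of β, which is sufficient to decide containment. The names `is_subsequence`, `support`, and `items` are illustrative.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def is_subsequence(alpha: Sequence[Itemset], beta: Sequence[Itemset]) -> bool:
    """True if each a_i is a subset of some b_{j_i} with strictly increasing j_i."""
    j = 0
    for a in alpha:
        # Advance to the earliest element of beta that contains a
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1  # the next element of alpha must match strictly later
    return True


def support(alpha: Sequence[Itemset], database: List[Sequence[Itemset]]) -> int:
    """Number of sequences in the database that contain alpha."""
    return sum(is_subsequence(alpha, s) for s in database)


def items(*labels: str) -> Itemset:
    return frozenset(labels)


db = [[items("a"), items("b", "c"), items("d")],
      [items("a", "b"), items("d")],
      [items("c"), items("a")]]
print(support([items("a"), items("d")], db))  # 2
```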

The key step in association rule mining is to find all sequential patterns efficiently. In this paper, the prefix-projected pattern growth idea proposed by Pei et al. is used to traverse the search space and enumerate all frequent sequences [28]. The main ideas are as follows: (i) the first scan of the database yields the set of sequential patterns of length 1; (ii) each sequential pattern is regarded as a prefix, and the complete set of sequential patterns is divided into subsets according to the prefix; (iii) to mine each subset of sequential patterns, the corresponding projected database is constructed and mined recursively. Given the minimum support γ, this process yields the frequent sequences, as shown in Figure 4 (lines 1–13).
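To illustrate prefix-projected pattern growth, the sketch below is a simplified PrefixSpan that assumes every sequence element contains a single item (one event per timestamp), which keeps the projection step short; it is not the Figure 4 algorithm itself. `min_support` plays the role of γ and counts sequences, in line with the definition of support.

```python
from typing import Dict, List, Tuple


def prefixspan(db: List[List[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Simplified PrefixSpan: each sequence element holds a single item."""
    patterns: Dict[Tuple[str, ...], int] = {}

    def mine(prefix: Tuple[str, ...], projected: List[List[str]]) -> None:
        # Support of an item = number of projected suffixes that contain it
        counts: Dict[str, int] = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, count in counts.items():
            if count < min_support:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count
            # Projected database: what follows the first occurrence of `item`
            new_db = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(new_prefix, new_db)

    mine((), db)  # the first call scans the whole database for length-1 patterns
    return patterns


print(prefixspan([["a", "b", "c"], ["a", "c"], ["b", "c"]], min_support=2))
```

For this toy database the frequent patterns are ('a',), ('b',), ('c',), ('a', 'c'), and ('b', 'c'); ('a', 'b') is pruned because it occurs in only one sequence.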

Once the frequent sequences are known, they can be used to obtain rules describing the relationships between different sequence items [29]. In this paper, we use the association rule representation in [30], i.e., A ⇒ B, where A is a subsequence of B. The confidence of A ⇒ B is expressed as fr(B)/fr(A), where fr(·) denotes the frequency of occurrence of a sequence; because A is a subsequence of B, fr(B) counts the occurrences of the whole rule. Given the minimum confidence, the algorithm shown in Figure 4 (lines 14–18) generates the rules of the form A ⇒ B that satisfy this condition.
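Rule generation can be sketched as a pairwise scan over the frequent patterns: every pattern A that is a proper subsequence of another pattern B yields a candidate A ⇒ B, whose confidence is computed as fr(B)/fr(A) (the frequency of the whole rule divided by that of its antecedent) and kept if it reaches η. Patterns are assumed to be tuples of single items mapped to their support; the function names are hypothetical, and the quadratic scan is chosen for clarity rather than efficiency.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(a: Pattern, b: Pattern) -> bool:
    it = iter(b)
    return all(item in it for item in a)  # `in` consumes `it`, so order is enforced


def generate_rules(patterns: Dict[Pattern, int], eta: float) -> List[Tuple[Pattern, Pattern, int, float]]:
    """Rules A => B with A a proper subsequence of B and fr(B)/fr(A) >= eta."""
    rules = []
    for b, fr_b in patterns.items():
        for a, fr_a in patterns.items():
            if len(a) < len(b) and is_subsequence(a, b):
                confidence = fr_b / fr_a
                if confidence >= eta:
                    rules.append((a, b, fr_b, confidence))  # (antecedent, consequence, support, confidence)
    return rules


patterns = {("a",): 2, ("b",): 2, ("c",): 3, ("a", "c"): 2, ("b", "c"): 2}
for rule in generate_rules(patterns, eta=1.0):
    print(rule)
```

With η = 1, as used in the paper's mining step, only the rules (a) ⇒ (a, c) and (b) ⇒ (b, c) survive; (c) ⇒ (a, c) has confidence 2/3.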

Figure 4 shows the ensemble algorithm for association rule mining. Its input is a set of sequences segmented according to the state switching values; sets segmented for different state switching values can be combined into a single input. Lines 1–13 search for sequential patterns and produce a set F of sequential patterns. Lines 14–18 generate sequence rules from F, producing a set AR of sequence rules with their support and confidence, as shown in Figure 2 (orange table). Because of the limitations of the output data format, the minimum confidence η of the algorithm must be set to 1 to find the sequences of operations associated with the state switching values; the true confidence is updated during rule postprocessing.

2.3. Rule Postprocessing

After association rule mining, each event corresponding to the change of state switching value (0 ⟶ 1 or 1 ⟶ 0) generates a rule set to be updated. Association rules for expert analysis are finally obtained through association rule filtering and confidence updating. For expert analysis, each action in the operational sequence carries a timestamp, as shown in Figure 5.

2.3.1. Association Rules Filtering

For a rule A ⇒ B, A is called the antecedent sequence and B the consequence sequence. The purpose of association rule filtering is to ensure that the antecedent sequence is an operational sequence of the operator and that the consequence sequence contains that sequence of operations followed by a change in a state switching value. However, many of the rules generated by association rule mining may not satisfy this purpose. Therefore, in the filtering phase, the rules in each state transition set are filtered according to these data format requirements. A judged rule is filtered out when any of the following occurs: (1)(2)(3)(4)(5)
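As a hypothetical illustration of the stated purpose of filtering (not a transcription of the formal conditions), the check below keeps a rule only when its antecedent consists solely of operations and its consequence ends with exactly one state switching value change that follows those operations. The function `keep_rule` and the item labels are assumptions.

```python
from typing import Sequence, Set


def keep_rule(antecedent: Sequence[str], consequence: Sequence[str], state_items: Set[str]) -> bool:
    """Hypothetical filter: antecedent = operations only, consequence = operations then one state change."""
    if not antecedent or any(item in state_items for item in antecedent):
        return False  # the antecedent must be a non-empty, purely operational sequence
    if not consequence or consequence[-1] not in state_items:
        return False  # the consequence must end with a state switching value change
    if any(item in state_items for item in consequence[:-1]):
        return False  # exactly one state change, placed after the operations
    return True


state_items = {"SR2_01", "SR2_10"}  # hypothetical labels for 0 -> 1 and 1 -> 0 changes
print(keep_rule(("R1", "R2"), ("R1", "R2", "SR2_01"), state_items))  # True
print(keep_rule(("SR2_10",), ("SR2_10", "SR2_01"), state_items))     # False
```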

2.3.2. Confidence Updating

Support and confidence are two important metrics of association rules. Very low support means that the rule is highly contingent, and very low confidence means that the antecedent and consequence of the rule are only weakly associated. Confidence updating obtains the true confidence of each filtered rule by using the sequence set S′, generated during event sequence segmentation, which contains no state switching value changes. For convenience in subsequent expert analysis, the part of the consequence sequence that is unrelated to the state switching value change event is removed; i.e., rules are represented as operational sequence ⇒ state switching value changes.

The confidence updating process is shown in Figure 6. Executing this process on the sequence set S′ without state switching value changes and on the association rules AR generated for each state switching value change yields association rules with updated confidence. In this process, the parameter ε serves as the minimum true confidence, and only the association rules that meet it are retained.
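Figure 6 specifies the actual update. As a hedged sketch, one plausible reading is that the true confidence of a rule A ⇒ B is the number of sequences in S that contain B divided by the number of sequences in S and S′ that contain A, since S′ holds the occurrences of the operational sequence that were not followed by the state change. Rules below ε are discarded. This formula, the function names, and the toy data are assumptions.

```python
from typing import List, Sequence, Tuple

Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (antecedent A, consequence B)


def is_subsequence(a: Sequence[str], b: Sequence[str]) -> bool:
    it = iter(b)
    return all(item in it for item in a)  # order-preserving containment


def update_confidence(rules: List[Rule], S: List[List[str]], S_prime: List[List[str]],
                      eps: float) -> List[Tuple[Tuple[str, ...], Tuple[str, ...], int, float]]:
    """Assumed update: conf = #S sequences containing B / #(S and S') sequences containing A."""
    kept = []
    for a, b in rules:
        hits = sum(is_subsequence(b, s) for s in S)  # A followed by the state change
        occurrences = (sum(is_subsequence(a, s) for s in S)
                       + sum(is_subsequence(a, s) for s in S_prime))
        if occurrences == 0:
            continue
        confidence = hits / occurrences
        if confidence >= eps:
            kept.append((a, b, hits, confidence))
    return kept


S = [["R1", "R2", "X"], ["R1", "X"], ["R3", "X"]]  # "X" is a hypothetical state change
S_prime = [["R1", "R4"], ["R2"]]
rules = [(("R1",), ("R1", "X")), (("R2",), ("R2", "X")), (("R3",), ("R3", "X"))]
print(update_confidence(rules, S, S_prime, eps=0.6))
```

Here (R2) ⇒ (R2, X) drops to confidence 0.5 and is discarded with ε = 0.6, the value used in the empirical study, whereas (R1) ⇒ (R1, X) keeps a confidence of 2/3.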

3. Empirical Study

To verify the effectiveness of the proposed method, an empirical study was carried out on historical cold start-up data to analyze the operational sequences of operators and to propose guidance that can help improve operator performance and avoid HFEs [31]. The cold start-up of an NPP is the process of bringing a nuclear reactor from a cold state to stable power. It can be roughly divided into three stages [32]: (1) system preparation, (2) transition from the subcritical to the critical state, and (3) heat-up. This process involves extensive operation of pumps, valves, electrical heaters, and control rods and requires appropriate coordination of the operating states of various subsystems [31–33]. At present, reactor start-up control is mainly performed manually by operators; it takes a long time and places a heavy burden on operators, making it prone to HFEs.

3.1. Description of the Data

To ensure the credibility of the conclusions obtained from association rule mining, we extracted 20 batches of cold start-up data from the historical operating data of a commercial NPP. No operation errors or alarms are known to have occurred in these batches. The start time of each batch was set to the time of the first control rod action (control rods are lifted in the order I ⟶ II ⟶ III), and the end time to the time at which the reactor reaches stable power operation. The extracted data contained 357,595 samples in total.

3.2. Result of Data Preprocessing

During the three stages of a cold start-up, the operator is primarily concerned with the response of the neutron period of the source range channel in Stage 1, changes in pressure in Stage 2, and changes in temperature, pressure, and flow in Stage 3. Depending on the distribution characteristics of the parameters, we used the threshold, differential, and moving average definition methods to define state switching values related to the source range channel, and the differential definition method to define state switching values related to the pressurizer pressure and the primary loop temperature and flow. The defined state switching values are described in detail in Table 1.

Based on the generated state switching values, we reduced the data dimensionality to 272 using the feature selection algorithm proposed in [34]. The sequence segmentation method in Section 2.1 was then executed with the maximum time interval set to 100 s and the maximum sequence length set to 30.

3.3. Results and Analysis of the Association Rule

In this study, the minimum support threshold γ was set to 3 to remove rare association rules. Although such a low threshold may reduce the efficiency of the algorithm, it helps reveal the influence of operational sequences on NPP operation. For example, rules with long sequences often have low support but may have high confidence, and they can be analyzed in comparison with their high-support subsequences. The minimum true confidence ε was set to 0.6 to remove weak association rules. In total, 164 original association rules were generated for further analysis, some of which are shown in Table 2.

All association rules were grouped according to the state switching value changes in the consequence sequence and presented in the form operational sequence ⇒ state switching value changes. After careful examination of all the rules, experts discussed four typical cases. Note that the results and recommendations from this discussion apply only to the NPP from which the data were collected.

Case 1. Close the emergency shut-down signal on the source range channel.
According to the operating instructions, the operator should close the emergency shut-down signal of the source range channel when the corresponding threshold is exceeded [35]. Rules 1–6 show that closing this signal is associated, with high support and confidence, with the 0 ⟶ 1 change of the state switching value for the source range channel (point 2) defined by the moving average definition method. The maximum support of the relevant rules is 18, meaning that the phenomenon was absent in only 2 of the 20 batches of cold start-up data. Furthermore, the rules show that the more frequently the operator withdraws the control rods before closing this signal, the lower the support and the higher the confidence of the rule leading to the 0 ⟶ 1 change of this moving average state switching value. In addition, although some rules related to this operation involve a 1 ⟶ 0 state change, in those rules the operation is the last item in the antecedent sequence.
These findings suggest that closing the emergency shut-down signal of the source range channel causes a local peak in the source range channel. The peak is influenced by the rate at which the control rods are withdrawn before the closing operation: the higher the rate, the more likely the phenomenon is to occur. After a short period, the source range channel value gradually returns to normal, and this recovery is unaffected by subsequent control rod withdrawal. Based on these rules, we make the following recommendations.
(i) The operator should carefully monitor the source range channel value for a while before and after closing the emergency shut-down signal of the source range channel, until the value stabilizes.
(ii) Before closing the emergency shut-down signal of the source range channel, the operator should reduce the control rod withdrawal rate, by withdrawing less frequently or shortening each withdrawal, to avoid larger fluctuations caused by this operation.

Case 2. Withdraw control rods from the lower limit position.
Before the cold start-up, every control rod is at its lower limit position. To achieve a cold start-up, multiple sets of control rods must be withdrawn to specified positions. Rules 8–11 show that withdrawing control rod sets II and III from the lower limit position is correlated with the 0 ⟶ 1 change of the source range channel state switching values defined by the threshold and differential definition methods; the maximum support of the corresponding rules is 6.
According to prior knowledge, control rod sets I and II must be withdrawn before sets II and III, respectively, are withdrawn from the lower limit position. The findings above indicate that, when one set of control rods is at the lower limit position, continuously withdrawing the preceding set of control rods increases the rate of change of the source range channel. This phenomenon is not conducive to the safe operation of the NPP. We therefore suggest that, when a set of control rods at the lower limit position needs to be withdrawn, the operator should avoid continuous withdrawal of the preceding set, thereby reducing the average withdrawal rate of the multiple sets and preventing a large rate of change in the source range channel.

Case 3. The source range channels at points 1 and 2 are not uniformly affected by the same operation.
Building on the results of Case 1, it can further be found that few rules contain state switching values defined on the source range channel of point 1, and those rules have low support and confidence. This shows that, in the mined rules, the nonsmooth variation in the source range channel caused by operator actions is mainly reflected at point 2. Therefore, operators should pay closer attention to the source range channel of point 2 during cold start-up.

Case 4. No rules related to the state switching values defined by the primary loop temperature and flow rate are generated.
Although the differential definition method was used to define state switching values related to the pressurizer pressure and the primary loop temperature and flow, no valuable rules related to changes in these state switching values were produced. Based on the mining results for these historical operating data, if a large instantaneous rate of change occurs in the pressurizer pressure or the primary loop temperature or flow during a cold start-up performed in accordance with the operating instructions, the operator can preliminarily conclude that it was not caused by operator actions. In that case, the system may be abnormal or faulty, and the cause needs further investigation.

4. Conclusions

NPPs have now accumulated large amounts of historical operating data, from which it is valuable to reveal the impact of operational sequences on plant operation. In this paper, a method based on association rule mining is proposed to analyze the characteristics of operators' operational sequences and their impact on NPP operation. We verified the effectiveness of the method using 20 batches of historical cold start-up operating data. The results show that the raw data can be converted into segmented sequence datasets according to the defined state switching values. A total of 164 original association rules were obtained using the ensemble mining algorithm and its postprocessing method. These rules reveal some valuable operational issues, such as the effect of control rod actions on the neutron period under specific conditions.

The advantage of the proposed method is that, through manually defined state switching values, it can flexibly mine the associations between operators' operational sequences and the operating phenomena under study. In particular, it can mine operating phenomena that do not trigger an alarm but may harm the safe operation of the NPP. The mined rules can also guide operators to avoid the recurrence of these phenomena. Note that the results of association rule mining do not establish a causal relationship between the operations and the phenomenon to which a rule corresponds, so further analysis of the rules is necessary. Nevertheless, the approach can be used to mine rule patterns whenever historical operating data for several batches of the same scenario are available. The method could be extended in future work. First, it could be adapted for online data mining. Second, it could be applied to operator training data to evaluate the operations of operators. Third, it could be used to study the correlations between different operating phenomena of NPPs and to explore the implied nonlinear relationships between different parameter variables.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.