Research Article

Multiscale Time-Frequency Sparse Transformer Based on Partly Interpretable Method for Bearing Fault Diagnosis

Algorithm 1

Training of MTFST.
Input: Three multiscale TFRs , , where , , , and , which denote the fault types.
(1)Set training batch , training epoch max_epoch, token embedding dimension , self-attention weight matrix size , number of heads , position-wise feed-forward network weight matrix size , block number of encoder , block number of decoder , and number of fault types .
(2)Initialize trainable parameters of MTFST
(3)for epoch in 1, 2, …, max_epoch do
(4)for step in 1, 2, …, max_step do
(5)  //Tokenizer
(6)  for each in , in and in do
(7)   Reshape , , to  = ,  =  and  =  then slice into patches sequence [], [], [];
(8)   Add position encoding, obtain , , ;
(9)  end
     Stack batches, obtain sequences , , .
(10)  //Encoders
(11)  for in 1, 2, …, do
(12)   ,
(13)   ;
(14)   ,
(15)   ;
(16)   ,
(17)   .
(18)  end
(19)  //Decoder
(20)  for in 0, 1, 2, …, do
(21)   if (block == 0)
(22)    ,
(23)    ;
(24)   else
(25)    ,
(26)    ;
(27)  end
(28)  //Classifier
(29)   Obtain feature matrix ;
(30)  ;
(31)  ;
(32)  Batch loss ;
(33)  Calculate gradients , ;
(34)  Update parameters , ;
(35) end
(36)end
Output: Weights and biases
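
The tokenizer stage of Algorithm 1 (steps 5 to 9: reshape each TFR, slice it into a patch sequence, add position encoding, then stack the batch) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The patch size, the use of flattened patches as tokens without a learned projection, the fixed sinusoidal position encoding, and the function names `patchify`, `sinusoidal_positions`, and `tokenize_batch` are all assumptions made for illustration; the source does not specify them here.

```python
import numpy as np

def patchify(tfr, patch):
    """Slice a 2-D TFR of shape (H, W) into non-overlapping patch x patch
    tiles (row-major order) and flatten each tile into one token."""
    h, w = tfr.shape
    if h % patch or w % patch:
        raise ValueError("TFR sides must be divisible by the patch size")
    tiles = tfr.reshape(h // patch, patch, w // patch, patch).transpose(0, 2, 1, 3)
    return tiles.reshape(-1, patch * patch)  # (num_tokens, patch * patch)

def sinusoidal_positions(n_tokens, dim):
    """Fixed sinusoidal position encoding (assumed; the paper may use a
    different or learned encoding)."""
    pos = np.arange(n_tokens)[:, None]
    i = np.arange(dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def tokenize_batch(tfrs, patch):
    """Tokenize every TFR in a batch, stack the results, add positions."""
    tokens = np.stack([patchify(t, patch) for t in tfrs])  # (B, N, D)
    return tokens + sinusoidal_positions(tokens.shape[1], tokens.shape[2])

# Hypothetical batch of four 32 x 32 TFRs at a single scale.
rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 32, 32))
seq = tokenize_batch(batch, patch=8)
print(seq.shape)  # (4, 16, 64)
```

Under these assumptions, the same routine would be applied separately to each of the three multiscale TFR inputs, giving one token sequence per scale before the encoder blocks.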