Abstract

Recently, the Reconfigurable FSM has drawn the attention of researchers for multistage signal processing applications. The optimal synthesis of the Reconfigurable finite state machine with input multiplexing (Reconfigurable FSMIM) architecture is done by the iterative greedy heuristic based Hungarian algorithm (IGHA). The major problem with IGHA is that no state encoding technique is integrated into it. This paper proposes the integration of IGHA with a state assignment technique that uses a logarithmic barrier function based gradient descent approach to reduce the hardware consumption of the Reconfigurable FSMIM. Experiments performed on MCNC FSM benchmarks illustrate a significant area and speed improvement over other architectures during field programmable gate array (FPGA) implementation.

1. Introduction

Digital signal processing (DSP) [1–3], pattern matching [4], and circuit testing [5] are primary applications of most digital systems. These applications require a hardware-oriented as well as high-speed control unit. A finite state machine (FSM) is an integral part of any complex digital system. Its inputs are multiplexed to make it hardware oriented, which is known as the finite state machine with input multiplexing (FSMIM). It serves as a control unit, and its operating speed determines the processing speed of the system. The aforementioned applications can be viewed as cascaded stages (i.e., multistage) of operations [2], where each stage requires a specific FSM. Hence, a Reconfigurable FSM is investigated in the literature for optimal performance in such applications [6, 7]. A Reconfigurable FSM is defined as a single FSM that acts as one of the FSMs from a set (i.e., the set of FSMs for a specific application) when particular mode bits are applied. Its implementation is performed on field programmable gate array (FPGA) platforms [6].

The Reconfigurable FSMIM architecture is created by joining (A) the Conventional FSMIM architecture [8] and (B) a multiplexer bank (which defines the mode based reconfiguration). The optimal synthesis of both constituent elements is done by the iterative greedy heuristic based Hungarian algorithm (IGHA) [6]. An efficient state encoding technique for an FSM serves as a vital tool to optimize the hardware utilization when implementing on an FPGA platform [9, 10]. In the case of the Reconfigurable FSMIM, the state encodings of the constituent FSMs jointly affect its look-up table (LUT) requirement [6].

The major problem with IGHA is that no state encoding technique is integrated into it. It uses binary state encoding as the default state assignment technique. The state assignment for the Reconfigurable FSMIM architecture leads to an optimization problem [6]. To the best of the authors’ knowledge, all the state assignment techniques proposed in the literature provide state codes only for a single FSM. Therefore, the objective of this work is the integration of IGHA with an optimal state encoding technique to reduce the hardware consumption of the Reconfigurable FSMIM on an FPGA platform.

In the literature, another direction in the implementation of an FSM is RAM-based architectures. The following three types of RAM-based FSM architectures are studied [11]: (a) basic RAM-based FSM architecture, (b) RAM-based FSM architecture with transition-controlled multiplexers, and (c) RAM-based FSM architecture with state-controlled multiplexers. In the basic RAM-based FSM architecture, bits are stored in the form of words. For each transition (i.e., present state combined with the external inputs), the outputs and the state assignment bits for the next state are stored in the RAM-word memory [12, 13]. The RAM size required for a basic RAM-based FSM implementation is enormous. Hence, to reduce the RAM depth, the RAM-based FSM architecture with transition-controlled multiplexers is used. It consists of an input selector bank, which provides active inputs from the external inputs for selecting a particular state [11]. The RAM-based FSM architecture with state-controlled multiplexers is used to reduce the RAM size further. It consists of two separate RAM blocks, of which the smaller RAM block is assigned to operate the input selector bank [11]. Thus, designing such an architecture is very complicated.
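The growth of the RAM size in the basic architecture can be made concrete with a short calculation. The sketch below assumes the addressing scheme implied above (the address is the present-state code concatenated with all external inputs, and each word holds the next-state code plus the outputs). The function name and the example figures are illustrative and are not taken from [11].

```python
from math import ceil, log2

def basic_ram_fsm_bits(num_states: int, num_inputs: int, num_outputs: int) -> int:
    """Total RAM bits of a basic RAM-based FSM under the assumed addressing scheme."""
    state_bits = max(1, ceil(log2(num_states)))  # minimum binary code length
    depth = 2 ** (state_bits + num_inputs)       # one word per (state, input) address
    width = state_bits + num_outputs             # next-state code plus outputs
    return depth * width

# Hypothetical FSM with 16 states, 7 inputs, and 4 outputs.
print(basic_ram_fsm_bits(16, 7, 4))
# Each additional external input doubles the RAM depth.
print(basic_ram_fsm_bits(16, 8, 4))
```

Because the depth is exponential in the number of inputs, selecting only the active inputs per state (as the multiplexer-based variants do) directly shortens the address.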

In this paper, the Improved Reconfigurable FSMIM architecture is proposed, which surmounts the issue of high LUT consumption during FPGA implementation. The proposed architecture is formed using the improved iterative greedy heuristic based Hungarian algorithm (Improved-IGHA). The Improved-IGHA is the integration of IGHA with the state assignment using logarithmic barrier function based gradient descent approach.

To validate the proposed approach, experiments have been performed using MCNC FSM benchmarks [14]. Experimental results for the proposed architecture illustrate a significant area reduction by an average of 20.38% and speed improvement by an average of 32.73% over VRMUX [11] during FPGA implementation. It also demonstrates an adequate area reduction by an average of 16.05% and speed improvement by an average of 1.77% over Reconfigurable FSMIM-S architecture [6] during FPGA implementation. When these results are compared with CRMUX [11], a speed improvement by an average of 11.06% is obtained. The proposed architecture requires an average of 58.38% more LUTs as compared with CRMUX [11] during FPGA implementation. It is the only trade-off for the proposed design.

The remainder of this article is organized as follows. The research problem is formulated in Section 2. Section 3 presents the state assignment using the logarithmic barrier function based gradient descent approach together with an illustrative example. The experimental setup and a comparative analysis of this work with the literature are given in Section 4. Finally, concluding remarks are drawn in Section 5.

2. Problem Formulation

Recently, the Reconfigurable FSM has drawn the attention of the researchers for multistage signal processing applications. A novel framework for the creation of Reconfigurable FSMIM is given in [6].

A Mealy FSM is represented in vector form as a six-tuple comprising the set of states, the set of input variables, the set of output variables, the transition function, the output function, and the initial state.

Moreover, the following quantities are defined to illustrate the complete functionality of an FSM: an instantaneous state; the binary state code for that state; the set of numbers of transitions per state; the number of transitions for an individual state; and the minimum length of a binary state code, which is the base-2 logarithm of the number of states rounded up to the nearest integer.

The Reconfigurable FSMIM is defined as a single FSM that acts as any one of the FSMs from a set (i.e., the set of FSMs for a specific application) when particular mode bits are applied. A set of FSMs for a specific application is chosen and divided into the largest FSM (i.e., the FSM with the highest total number of transitions, states, and inputs) and the rest of the FSMs in the set. A designated mode serves as the default mode of operation for the Reconfigurable FSMIM [6].
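The mode-based behaviour described above can be illustrated with a small behavioural model. The sketch below is a software analogy only: it assumes two hypothetical toy Mealy FSMs selected by a single mode value, and the class name, state names, and transition tables are illustrative. It does not reproduce the hardware architecture or any benchmark from [6].

```python
# Each table maps (present state, input bit) to (next state, output bit).
FSM_TABLES = {
    0: {("s0", 0): ("s0", 0), ("s0", 1): ("s1", 0),   # mode 0: flags two consecutive 1s
        ("s1", 0): ("s0", 0), ("s1", 1): ("s1", 1)},
    1: {("s0", 0): ("s1", 0), ("s0", 1): ("s0", 0),   # mode 1: flags two consecutive 0s
        ("s1", 0): ("s1", 1), ("s1", 1): ("s0", 0)},
}

class ReconfigurableFSM:
    """A single machine that behaves as one FSM of the set, chosen by the mode."""

    def __init__(self, tables, mode=0, initial="s0"):
        self.tables = tables
        self.initial = initial
        self.set_mode(mode)

    def set_mode(self, mode):
        self.mode = mode
        self.state = self.initial      # restart from the initial state on a mode change

    def step(self, x):
        self.state, y = self.tables[self.mode][(self.state, x)]
        return y

    def run(self, inputs):
        return [self.step(x) for x in inputs]

fsm = ReconfigurableFSM(FSM_TABLES, mode=0)
print(fsm.run([1, 1, 0, 1, 1]))
fsm.set_mode(1)
print(fsm.run([0, 0, 1, 0, 0]))
```

In hardware, the dictionary lookup keyed by the mode corresponds to the role played by the multiplexer bank, while the shared state register corresponds to the single underlying FSM.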

The Reconfigurable FSMIM architecture is created by joining the following two parts: (A) the Conventional FSMIM architecture [8] and (B) the Multiplexer bank (which defines the mode based reconfiguration). The optimal synthesis of the Multiplexer bank is done by the iterative greedy heuristic based Hungarian algorithm (IGHA) [6]. The state transitions of each constituent FSM of the Reconfigurable FSMIM architecture at the last phase of IGHA are presented in Figure 1. As the figure indicates, the state encodings of the constituent FSMs jointly affect the LUT requirement of the Reconfigurable FSMIM architecture. At the end of IGHA, a modified description of a single FSM is obtained, which is used to create the Conventional FSMIM part [6].

In FSM implementation on an FPGA platform, the state encoding technique acts as a tool for minimizing the hardware consumption [9, 10]. For example, an MCNC FSM benchmark requires 82 LUTs when implemented on a Xilinx xc6vlx75t-3 device (Virtex-6) using the Gray encoding technique, but it needs only 41 LUTs (half as many) on the same platform using the binary encoding technique.
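For reference, the two encodings compared above differ only in which code word each state receives. The following minimal sketch generates both assignments for a hypothetical four-state FSM; the function names are illustrative. The LUT counts quoted above depend on the synthesis tool and cannot be derived from the codes alone.

```python
def binary_codes(num_states: int, width: int) -> list[str]:
    """State i receives the plain binary representation of i."""
    return [format(i, f"0{width}b") for i in range(num_states)]

def gray_codes(num_states: int, width: int) -> list[str]:
    """State i receives the reflected Gray code of i (adjacent codes differ in one bit)."""
    return [format(i ^ (i >> 1), f"0{width}b") for i in range(num_states)]

print(binary_codes(4, 2))
print(gray_codes(4, 2))
```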

The major problem with IGHA is that no state encoding technique is integrated into it. It uses binary state encoding as the default state assignment technique [6]. The state assignment for the Reconfigurable FSMIM architecture leads to an optimization problem, as evident from Figure 1. To the best of the authors’ knowledge, all the state assignment techniques proposed in the literature provide state codes only for a single FSM.

Therefore, the objective of this work is the integration of IGHA with an optimal state encoding technique to reduce the hardware consumption of Reconfigurable FSMIM on an FPGA platform.

3. Methodology

This work is an extension of work presented in [6]. Hence, all the variables from [6] are used in the same context throughout the article. An improved version of IGHA (Improved-IGHA) is proposed. It addresses the issue of optimal state encoding.

A recent body of literature has investigated the performance of three fundamental types of state encoding techniques on an FPGA platform [9]. The studied methods are as follows: (a) structural approaches, (b) heuristic approaches, and (c) pragmatic approaches. Of these three, the structural state encoding technique performs best on an FPGA platform [9, 10]. It uses the knowledge of the internal structure (i.e., state transitions) of the FSM to generate optimal state codes. Therefore, the structural information of the FSMs is considered to develop the proposed state encoding technique for the Reconfigurable FSMIM.

The structural information of the Reconfigurable FSMIM (i.e., state transition) is obtained from Figure 1. Hence, a unified weight matrix is defined by adding the weight of all component FSMs for the same corresponding states. It is given in (1).

The mathematical formulation of the cost function for an FSM is given in [15]. It uses the structural information (i.e., state transitions) of the particular FSM. The weights are the elements of the weight matrix, and the distance between two particular state codes is their Hamming distance. The Hamming distance is obtained by counting the number of 1’s after an exclusive-OR operation between the binary state codes, as shown in Figure 2. Therefore, from the literature [15], the cost associated with a particular set of state codes is defined by (2).
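A minimal sketch of how such a cost can be evaluated is given below. It rests on two assumptions that are not confirmed by the text above: the weight between two states is taken as the number of transitions between them (in either direction), and the cost is taken as the sum, over all state pairs, of the weight multiplied by the Hamming distance between the two codes. The function names and the two toy FSMs are hypothetical; the exact forms of (1) and (2) are those of [6] and [15].

```python
def transition_weights(transitions, num_states):
    """Assumed weight matrix: w[i][j] counts the transitions between states i and j."""
    w = [[0] * num_states for _ in range(num_states)]
    for src, dst in transitions:
        if src != dst:                  # a self-loop never changes the state code
            w[src][dst] += 1
            w[dst][src] += 1
    return w

def unified_weights(matrices):
    """Element-wise sum of the weight matrices of all component FSMs."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]

def hamming(a: int, b: int) -> int:
    """Number of 1s after an exclusive-OR of the two codes."""
    return bin(a ^ b).count("1")

def encoding_cost(w, codes):
    """Assumed cost: sum over state pairs of weight times Hamming distance."""
    n = len(codes)
    return sum(w[i][j] * hamming(codes[i], codes[j])
               for i in range(n) for j in range(i + 1, n))

# Two hypothetical FSMs that share four (already matched) states.
fsm_a = [(0, 1), (1, 2), (2, 3), (3, 0)]
fsm_b = [(0, 2), (2, 0), (1, 3)]
w = unified_weights([transition_weights(fsm_a, 4), transition_weights(fsm_b, 4)])

print(encoding_cost(w, [0b00, 0b01, 0b10, 0b11]))   # binary assignment
print(encoding_cost(w, [0b00, 0b01, 0b11, 0b10]))   # an alternative assignment
```

Different assignments of the same code words give different costs, which is what makes the state assignment an optimization problem.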

3.1. State Assignment Using Logarithmic Barrier Function Based Gradient Descent Approach for the Reconfigurable FSM

Let the cost described by (2) be represented as a weighted graph, in which the elements of the weight matrix indicate the edge weights between the nodes and the columns of the weight matrix represent the set of nodes. Each node corresponds to a particular binary state code, because a node can take only a binary label. The total number of nodes in the graph equals the number of columns of the weight matrix.

Let a hypercube be characterized by its dimension, its set of edges, and its set of vertices [16]. The cardinalities of the two sets are given in (3) and (4).
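As a quick check of these cardinalities, the sketch below enumerates a hypercube explicitly. It relies only on the standard facts that a hypercube of dimension n has 2^n vertices and n·2^(n−1) edges, and that two vertices are adjacent when their binary labels differ in exactly one bit; it makes no claim about which of (3) and (4) states which count. The function name is illustrative.

```python
from itertools import combinations

def hypercube(n: int):
    """Return the vertices (as integers) and edges of the n-dimensional hypercube."""
    vertices = list(range(2 ** n))
    edges = [(u, v) for u, v in combinations(vertices, 2)
             if bin(u ^ v).count("1") == 1]      # labels differ in exactly one bit
    return vertices, edges

for n in range(1, 5):
    vertices, edges = hypercube(n)
    print(n, len(vertices), len(edges))
```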

Now, the concept of hypercube embedding is used to reduce (2). An embedding is performed from the graph onto a hypercube, as described earlier [16, 17]. It is defined as a one-to-one mapping function from the nodes of the graph to the vertices of the hypercube. Consequently, a set of binary vectors, one per node, is defined as in (5). Thus, if a node of the graph is expressed by a binary state code, the corresponding vertex of the hypercube is represented by the same binary state code.

In a hypercube, the distance between two vertices is the Hamming distance between their binary codes, as shown in (6). Therefore, the cost function is reduced to (7) using hypercube embedding.

The objective is thus confined to minimizing the cost function given in (7). Evidently, it is a discrete optimization problem, where each state can take only one particular binary state code.

The convergence of Improved-IGHA depends on the convergence of its constituent algorithms, i.e., IGHA and the applied state assignment technique. Therefore, an algorithm with a high convergence speed is preferred to construct the state assignment technique for Improved-IGHA.

Evolutionary techniques, such as the genetic algorithm (GA), present a significant shortcoming, as their convergence speed slows down near the global optimum [18, 19]. Similarly, particle swarm optimization (PSO) and differential evolution (DE) operate with a high convergence rate but suffer from premature convergence, which is a critical drawback [20, 21]. In the literature, penalty-based approaches, such as the Lagrangian technique and the logarithmic barrier function (LBF) method, have proven their potential to obtain the optimum solution with a high convergence speed [22, 23]. These methods are advantageous in solving discrete or combinatorial optimization problems [24, 25].

Therefore, the LBF-based gradient descent approach is adopted to construct the state assignment technique for Improved-IGHA. It is an interior point method that ensures a feasible solution. The mathematical formulation of the cost minimization function is performed by LBF. Then, it is reduced iteratively by the gradient-projection approach. The flow chart for the Improved-IGHA is presented in Figure 3.

In the LBF technique, the search operation is performed in a continuous space domain to deduce the optimal points. Then, these points are discretized to obtain the optimal solution [26, 27].

In the LBF method, an objective function subject to inequality constraints is given in (8).

The logarithmic barrier function to minimize the cost function (as in (7)) is given in (9). In the LBF search, the second term serves as a barrier against any move that violates the constraints [28], as shown in (9).

At a given iteration, (9) takes the form shown in (10).

Initially, LBF selects a feasible starting point and an initial barrier parameter. Then, at each subsequent iteration, it chooses a smaller barrier parameter. This iterative process goes on until the barrier parameter reaches an adequately small value.
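The mechanics of the barrier iteration can be illustrated on a one-dimensional toy problem. The sketch below is not the formulation of (8) to (10): it assumes a generic barrier of the form f(x) − μ·log g(x) for a single constraint g(x) > 0, a geometric reduction of the barrier parameter μ by a factor β, and a simple backtracking gradient descent as the inner solver. All names and parameter values are illustrative.

```python
import math

def barrier_minimize(f, grad_f, g, grad_g, x, mu=1.0, beta=0.1,
                     mu_min=1e-6, inner_iters=200):
    """Minimize f(x) subject to g(x) > 0 (scalar x) using a logarithmic barrier."""
    def barrier(x, mu):
        return f(x) - mu * math.log(g(x))

    while mu > mu_min:                           # outer loop over the barrier parameter
        for _ in range(inner_iters):
            grad = grad_f(x) - mu * grad_g(x) / g(x)
            step = 1.0
            # Backtrack until the trial point is feasible and lowers the barrier value.
            while step > 1e-12:
                trial = x - step * grad
                if g(trial) > 0 and barrier(trial, mu) < barrier(x, mu):
                    x = trial
                    break
                step *= 0.5
            else:
                break                            # no improving step: inner loop converged
        mu *= beta                               # shrink the barrier parameter
    return x

# Toy problem: minimize (x - 3)^2 subject to x < 2, starting from the feasible point 0.
x_opt = barrier_minimize(
    f=lambda x: (x - 3.0) ** 2,
    grad_f=lambda x: 2.0 * (x - 3.0),
    g=lambda x: 2.0 - x,
    grad_g=lambda x: -1.0,
    x=0.0,
)
print(x_opt)   # approaches the constraint boundary x = 2 from the feasible side
```

Every iterate stays strictly inside the feasible region, which is the interior point property referred to above; the unconstrained minimizer x = 3 is never reached because the barrier term grows without bound at the boundary.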

A full-fledged method is required to minimize (10) with respect to the optimization variable. A first-order gradient-projection approach [29] is well-suited for iteratively minimizing (10). In this approach, the model parameters (a.k.a. weight vectors) are evaluated to minimize the objective function when an analytical calculation is not possible [30, 31]. The underlying representation of the objective function of the problem is given in (11).

An iteration of this projection method is defined by (12), in which the step size is chosen to be a small positive real number [29].

Thus, small steps are taken in the negative gradient direction of the objective function, as illustrated in (12). Then, (13) is used to place the updated value on the constraint surface at the next iteration.

The convergence criterion for this iterative process is defined by (14).
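A minimal sketch of one such gradient-projection loop is shown below. It assumes that the constraint surface is the unit hypersphere, so that the projection is a normalization, and it uses the change between successive iterates as a stand-in stopping rule; the actual update, projection, and criterion are those of (12) to (14). The linear toy objective and all names are illustrative.

```python
import math

def normalize(v):
    """Project a nonzero vector onto the unit hypersphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def projected_descent(grad, x, step=0.05, tol=1e-10, max_iters=10_000):
    """Gradient step followed by projection, repeated until the iterate settles."""
    x = normalize(x)
    for _ in range(max_iters):
        moved = [xi - step * gi for xi, gi in zip(x, grad(x))]   # negative gradient step
        x_next = normalize(moved)                                # back onto the surface
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_next, x)))
        x = x_next
        if change < tol:                                         # assumed stopping rule
            break
    return x

# Toy objective f(x) = c . x, whose minimizer on the unit sphere is -c / ||c||.
c = [1.0, 2.0, 2.0]
x_min = projected_descent(lambda x: c, [1.0, 0.0, 0.0])
print(x_min)
```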

In this way, the embedding problem is reduced to the determination of the binary vectors (as shown in (15)) that optimize the cost function (i.e., (7)).

Hence, the cost function (from (7)) is defined in terms of the Hamming distance as shown in (16).

The constraint (i.e., boundary condition) for this problem is that no two vertices on the hypercube may carry the same binary state code. The mathematical representation of the constraint is presented in (17).

By applying (16) and (17) to (9), the objective function for LBF is reduced to (18).

Therefore, the entity required in (13) is defined by (19).

The evaluation of the derivative term is required to move in the gradient descent direction as shown in (12). The needed derivative term is obtained by putting (20), (21), (22), and (23) into (18). Hence, it is defined by (24).

By applying (19) to (13), the normalized vector is defined as shown in (25).

If (14) is satisfied, a solution vector is obtained at the end of the iteration. Therefore, the required set of state codes is deduced by discretizing the solution vector using (26).

The pseudocode for the proposed state assignment approach is presented in Algorithm 1.

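Input: the objective function defined by Equation (7)
Output: the final state code vector
begin
    Initialization: select a feasible starting point;
    select the initial barrier parameter;
    while (the barrier parameter is not adequately small) do
        repeat
            for each vector
                ;
                gradient descent step, as in (12)
                derivative term, as in (24)
            end
            return
            ;
            evaluate
                the discretized state codes, as in (26)
                the cost, as in (7);
            if ( ) then
                ;
            else if ( ) then
                ;
            end
        until the convergence criterion is satisfied;
    end
    return the final state code vector;
end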
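To make the overall flow concrete, the following sketch combines a logarithmic barrier, a gradient step with normalization, and a final discretization in one routine. It is an illustration under stated assumptions, not the formulation of (16) to (26): each state is relaxed to a real vector with one coordinate per code bit, the Hamming distance is relaxed to one quarter of the squared Euclidean distance (which equals the Hamming distance for ±1 coordinates), the barrier penalizes pairs of states whose relaxed distance approaches zero, codes are recovered by thresholding at zero, and the best collision-free encoding seen so far is retained. All names, parameter values, and the ring-shaped weight matrix are hypothetical.

```python
import math

def hamming(a, b):
    return sum(p != q for p, q in zip(a, b))

def cost(w, codes):
    n = len(codes)
    return sum(w[i][j] * hamming(codes[i], codes[j])
               for i in range(n) for j in range(i + 1, n))

def discretize(x):
    """Threshold each coordinate at zero to obtain the code bits."""
    return [tuple(int(c > 0) for c in row) for row in x]

def assign_states(w, bits, mu=1.0, beta=0.5, mu_min=1e-3, step=0.01, inner=50):
    n, radius = len(w), math.sqrt(bits)
    # Initial solution: plain binary codes mapped to -1/+1 coordinates.
    x = [[1.0 if (i >> (bits - 1 - k)) & 1 else -1.0 for k in range(bits)]
         for i in range(n)]
    best = discretize(x)
    best_cost = cost(w, best)
    while mu > mu_min:                                   # barrier parameter schedule
        for _ in range(inner):
            new_x = []
            for i in range(n):
                grad = [0.0] * bits
                for j in range(n):
                    if i == j:
                        continue
                    diff = [a - b for a, b in zip(x[i], x[j])]
                    d = max(sum(c * c for c in diff) / 4.0, 1e-9)   # relaxed distance
                    coeff = (w[i][j] - mu / d) / 2.0     # cost term minus barrier term
                    for k in range(bits):
                        grad[k] += coeff * diff[k]
                moved = [a - step * g for a, g in zip(x[i], grad)]
                norm = math.sqrt(sum(c * c for c in moved)) or 1.0
                new_x.append([radius * c / norm for c in moved])    # back onto the sphere
            x = new_x
            codes = discretize(x)
            if len(set(codes)) == n:                     # accept only distinct codes
                c = cost(w, codes)
                if c < best_cost:
                    best, best_cost = codes, c
        mu *= beta
    return best, best_cost

# Hypothetical four-state ring with equal transition weights.
ring = [[0, 3, 0, 3], [3, 0, 3, 0], [0, 3, 0, 3], [3, 0, 3, 0]]
codes, total = assign_states(ring, bits=2)
print(codes, total)
```

Because the routine starts from the binary codes and only replaces them by a strictly cheaper collision-free encoding, the returned cost can never exceed the cost of the initial binary assignment.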
3.2. An Illustrative Example for the Improved Reconfigurable FSMIM Architecture

The following two MCNC FSM benchmarks [14] are considered to demonstrate the steps involved in the creation of the Improved Reconfigurable FSMIM architecture: (1) the first benchmark (description is provided in Table 1) and (2) the second benchmark (description is provided in Table 2).

The Improved Reconfigurable FSMIM architecture is created by joining (A) the Conventional FSMIM architecture and (B) the Multiplexer bank (which defines the mode based reconfiguration). The optimal synthesis of the Multiplexer bank is done by the proposed Improved-IGHA. At the end of the proposed algorithm, a modified description of a single FSM is obtained, which is used to create the Conventional FSMIM part [6]. The Improved-IGHA consists of the following steps:
(i) Initialization (define the largest FSM and the FSM to be added): The more complex of the two benchmarks, as observed from their descriptions, is selected as the largest FSM. Consequently, the other benchmark acts as the FSM to be added.
(ii) Input and State Matching using Hungarian Algorithm: Input and state matchings are performed together using Algorithms 1, 2, 5, and 6 from [6]. Combinations of the input lines are generated, and for each combination the states of the two FSMs are matched one-to-one. Both the first and the second combination yield zero for the two quantities evaluated during matching. Therefore, the first combination is finalized for the matching.
(iii) Dummy State and Position Replacement: The replacements of the dummy states and positions in the two FSMs are performed using Algorithm 3 from [6]. The replaced dummy states (highlighted in “bold italic font”) and dummy positions (highlighted in “bold font”) are presented in Tables 3 and 4.
(iv) Output Matching using Bitwise-XOR Operations: Output matching is not required in this case, as there is a single output line in each of the two FSMs.
(v) Update the descriptions of FSMs: The updated descriptions of the two FSMs are presented in Tables 3 and 4, respectively.
(vi) State assignment using logarithmic barrier function based gradient descent approach for the Reconfigurable FSM: The pictorial representation of the state transitions for the two FSMs (from Tables 3 and 4) is given in Figure 4. Therefore, the weight matrix is formed using (1). It is given in (27). The proposed state assignment algorithm starts by considering the binary state codes as an initial solution, which offers a cost of 62 (from (2)). At a given iteration, the instantaneous value from the previous iteration is obtained as defined by (28). The derivative (from (24)) is evaluated as defined by (29). So, the current value is obtained from (12); it is given in (30) for a very small step size. Then, it is directed towards the unit-radius hypersphere, as given in (31). The required set of state codes is deduced by discretizing the current value using (26). Hence, the cost is reduced to 48 (from (2)).

In the end, a Bitwise-XOR operation is performed between the updated descriptions of the two FSMs. It provides the Multiplexer bank (i.e., part-B). The updated description of the single FSM obtained at the end of the algorithm is used to construct the Conventional FSMIM part (i.e., part-A).

4. Numerical Results and Discussions

To validate the proposed approach, experiments have been performed using MCNC FSM benchmarks [14]. The MATLAB (2016b) environment is used to implement the proposed Improved-IGHA. It produces the optimized description for the constituting parts of the Improved Reconfigurable FSMIM architecture. The obtained description is then converted into Verilog HDL code using the MATLAB HDL Coder toolbox. The implementation of the Improved Reconfigurable FSMIM architecture is performed on the Virtex-6 device with speed grade -3, as in [6, 11]. The configuration of the workstation used to execute the computations is as follows: Intel(R) Core i7 (6th Gen) CPU at 3.5 GHz with 16 GB RAM.

In Improved-IGHA, combinations of input lines, states, and output lines are generated using permutation to perform input, state, and output matching, respectively. The number of input and output lines used for matching is restricted to 7, which bounds the number of generated combinations and utilizes the resources efficiently. Hence, the information content of an input/output line becomes the criterion for selection. An input/output line with high information content is preferred.

A set of MCNC FSM benchmarks [14] is selected to illustrate the implementation of the Improved Reconfigurable FSMIM architecture and to present its comparative analysis with the existing literature.

The most complex benchmark in the set (i.e., the one with the highest total number of transitions) is chosen as the largest FSM, i.e., the circuit added at the initial iteration of Improved-IGHA. The other FSMs in the set are added iteratively to the design in their respective order.

In an FSM, a specific state is chosen only if a particular set of input bits (i.e., 1’s or 0’s) is present. Hence, the percentage of 1’s and 0’s together in an input line acts as its information content, as shown in Table 5 (the selected input lines are highlighted). Similarly, the output is always defined by “1.” Hence, the percentage of 1’s in an output line serves as its information content, as shown in Tables 6 and 7 (the selected output lines are highlighted).
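The selection rule can be sketched as follows, assuming that each transition is available as a pair of strings in the usual benchmark style, with “-” denoting a don't-care input. The toy transition list and the function names are hypothetical and do not correspond to any entry of Tables 5 to 7.

```python
def input_information(transitions, line):
    """Percentage of transitions in which the given input line is specified (0 or 1)."""
    specified = sum(1 for pattern, _ in transitions if pattern[line] in "01")
    return 100.0 * specified / len(transitions)

def output_information(transitions, line):
    """Percentage of transitions in which the given output line is 1."""
    ones = sum(1 for _, out in transitions if out[line] == "1")
    return 100.0 * ones / len(transitions)

def select_lines(scores, limit=7):
    """Indices of the lines with the highest information content (at most `limit`)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:limit])

# Hypothetical transitions: (input pattern, output pattern).
toy = [("0-1", "10"), ("1--", "00"), ("-01", "11"), ("11-", "01")]
input_scores = [input_information(toy, k) for k in range(3)]
print(input_scores, select_lines(input_scores, limit=2))
print([output_information(toy, k) for k in range(2)])
```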

In the first phase of Improved-IGHA, input and state matching are performed together, and the optimal assignments are made, as presented in Table 5. All the states are mapped in their respective order. Output matching is performed iteratively by Bitwise-XOR operations, as presented in Tables 6 and 7. Then, after the descriptions of the constituting FSMs are updated, the state assignment using the logarithmic barrier function based gradient descent approach is performed.

To present a comparative analysis of the total computation time required by IGHA [6] and Improved-IGHA, an inbuilt feature in MATLAB named “stopwatch timer” is used. It evaluates the elapsed time (i.e., the execution time between the starting and stopping of a function). As evident from the literature [6], IGHA solves linear assignment problems (LAPs) several times to perform matchings among all generated combinations each time an FSM is added. The convergence period of IGHA for a single LAP is in the millisecond range. Hence, the total elapsed time taken by IGHA is given in (32). The convergence time of the state assignment using the LBF-based gradient descent approach for each iteratively added FSM is given in Table 8. Therefore, the total elapsed time taken by Improved-IGHA is the sum of these two times (from Figure 3). It is presented in Table 8.
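The measurement procedure can be mimicked as in the sketch below, where Python's `time.perf_counter` plays the role of the stopwatch timer. The LAP is solved here by exhaustive search over permutations purely for illustration; this is not the Hungarian algorithm used by IGHA, and the 3×3 cost matrix is hypothetical.

```python
import time
from itertools import permutations

def solve_lap_bruteforce(cost):
    """Return the row-to-column assignment with the minimum total cost."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best), sum(cost[i][best[i]] for i in range(n))

cost_matrix = [[4, 1, 3],
               [2, 0, 5],
               [3, 2, 2]]

start = time.perf_counter()                      # start of the stopwatch
assignment, total = solve_lap_bruteforce(cost_matrix)
elapsed_ms = (time.perf_counter() - start) * 1e3 # elapsed time in milliseconds

print(assignment, total)
print(f"elapsed: {elapsed_ms:.3f} ms")
```

Exhaustive search grows factorially with the matrix size, which is why a polynomial-time method such as the Hungarian algorithm is needed when many LAPs must be solved per added FSM.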

The experimental results presented in Table 8 illustrate that the total computation time required by IGHA is far higher than the convergence time of the proposed state assignment technique. Therefore, the total computation time required by Improved-IGHA is approximately equal to the total computation time needed by IGHA.

The convergence plot for the state assignment using the logarithmic barrier function based gradient descent approach, after adding the last constituting FSM to the proposed architecture, is presented in Figure 5. The approach starts by taking the binary state codes as the initial codes. The cost offered to the proposed architecture is calculated by (2). The approach converges in 200 iterations, and the cost is reduced from 1028 to 923 (a reduction of about 10.2%), as shown in Figure 5. Consequently, the final state codes are obtained at the last iteration.

At the last phase of Improved-IGHA, a mutual Bitwise-XOR operation is conducted between the updated descriptions of the FSMs. In this way, the constituting parts of the proposed architecture are created. The individual share of each constituent FSM in the Improved Reconfigurable FSMIM architecture is determined by the difference between the LUTs occupied in the current iteration and those occupied in the previous one. After adding all the constituting FSMs to the proposed design (i.e., at the last iteration), the total LUT consumption and the operating frequency are obtained. They are presented in Table 9.

Experimental results for the proposed architecture illustrate a significant area reduction by an average of 20.38% and a speed improvement by an average of 32.73% over VRMUX [11] during FPGA implementation. They also demonstrate an adequate area reduction by an average of 16.05% and a speed improvement by an average of 1.77% over the Reconfigurable FSMIM-S architecture [6] during FPGA implementation. When these results are compared with CRMUX [11], a speed improvement by an average of 11.06% is obtained. The proposed architecture requires an average of 58.38% more LUTs as compared with CRMUX [11] during FPGA implementation. It is the only trade-off for the proposed design. A comparative analysis of the hardware consumption and the maximum operating frequency on FPGA implementation is presented in Figures 6 and 7, respectively.

5. Concluding Remarks

This article furnishes the framework for the Improved-Reconfigurable FSMIM architecture. The Improved-Reconfigurable FSMIM architecture is created by joining the following two parts: (A) Conventional FSMIM architecture and (B) Multiplexer bank (which defines the mode based reconfiguration). An improved version of iterative greedy heuristic based Hungarian algorithm (Improved-IGHA) is proposed to establish the constituting parts as mentioned earlier. Improved-IGHA is an integration of IGHA [6] and a state assignment using logarithmic barrier function based gradient descent approach. It reduces the hardware consumption of the proposed architecture by performing an optimal state encoding. An illustrative example using MCNC FSM benchmarks is also given to demonstrate the steps involved in the creation of the proposed architecture.

The proposed architecture illustrates a significant area reduction by an average of 20.38% and speed improvement by an average of 32.73% over VRMUX [11] during FPGA implementation. It also demonstrates an adequate area reduction by an average of 16.05% and speed improvement by an average of 1.77% over Reconfigurable FSMIM-S architecture [6] during FPGA implementation. When these results are compared with CRMUX [11], a speed improvement by an average of 11.06% is obtained. The proposed architecture requires an average of 58.38% more LUTs as compared with CRMUX [11] during FPGA implementation. It is the only trade-off for the proposed design.

Further, the proposed architecture will be investigated to develop an efficient architecture for multistage signal processing [1, 2] and circuit testing [5] based applications.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The datasets generated during and/or analyzed during the current study are available in [6] repository [DOI: 10.1155/2018/6831901]. This work is conducted in the Department of ECE, SRM Institute of Science and Technology, Kattankulathur-603203, Chennai, India.