Abstract

The thermodynamic properties of pure compounds are relevant data for process systems engineering. Different first-order group contribution models have been reported in the literature to calculate these properties, and they are widely employed in commercial process simulators. However, they may have some limitations and, consequently, a reliable comparison of these models is required to analyze their performance and to determine the best alternative for the calculation of pure compound properties. This paper reports the implementation and evaluation of several first-order group contribution models to calculate the normal boiling point and critical properties (temperature, pressure, and volume) of pure compounds. The performance of these models was characterized and compared for several compound families using a standardized approach to determine their group contributions and parameters. An artificial neural network model was also applied and assessed to improve the estimations obtained with the best group contribution models. Results showed that the calculation of critical temperature was challenging for several compound families, where average absolute relative deviation (AARD) values ranged from 0.05 to 56.28%, while the group contribution models were more accurate in estimating the critical volume, with AARD values ranging from 0.48 to 35.99%. This study identifies the limitations and gaps of this type of thermodynamic model with the objective of improving its performance for the calculation of pure compound thermodynamic properties. The findings can help to enhance the capabilities of thermodynamic models for the calculation of the normal boiling point and critical properties of pure compounds, which are relevant for the process systems engineering of new operations and products.

1. Introduction

Process design, simulation, control, and optimization require knowledge of the thermodynamic properties of pure compounds and their mixtures. For example, the critical properties are employed as input parameters to predict the volumetric and phase equilibria behavior of pure compounds and mixtures using cubic equations of state (EoS) in process simulators [1]. These properties establish limits to the operating conditions of equipment and processes. The characterization of the critical properties for the synthesis and application of new substances is a relevant issue from the perspective of process systems engineering. However, several authors have indicated that the experimental quantification of critical properties is time-consuming, potentially expensive for new compounds, and may show other limitations (e.g., some compounds with high molecular weight can degrade before reaching the critical point) [2–5]. Therefore, the development of thermodynamic models to calculate the pure compound critical properties is fundamental for process design [6–10].

The calculation of critical points with group contribution models (GCMs) is usually straightforward and offers additional advantages [1, 5, 7–9, 11–21]. These models are versatile, easy to implement, and do not require substantial computational effort. GCMs estimate the property of a compound from the individual contributions of the functional groups that make up its molecular structure [16]. They assume that the contribution of each functional group is the same in any molecule, which extends their application to the prediction of almost any type of chemical substance [22].

Different GCMs have been proposed to calculate the normal boiling point and critical properties of pure compounds [1, 9, 11, 12, 14, 16, 23–27]. An extensive review and description of these models was reported by Su et al. [9]. One of the first GCMs for the estimation of critical properties of pure compounds was proposed by Lydersen [23]. Then, Joback and Reid [11] proposed different first-order GCMs to calculate several properties of pure compounds including critical properties and normal boiling point. Another set of GCMs, which considered two levels of analysis for the estimation of group contribution values, was proposed by Constantinou and Gani [12]. In these models, the basic level used first-order group contributions constituted by simple functional groups and the higher level included second-order groups. This approach allows for distinction between isomers; however, its implementation is more difficult than that of first-order models such as those proposed by Joback and Reid [11]. Another GCM that considered group-interaction contributions (i.e., the contributions of interactions between bonding groups in the molecule) for the calculation of critical properties and normal boiling point of pure organic compounds was reported by Marrero-Morejón and Pardillo-Fontdevila [14]. The experimental database used to develop this model included 507 organic compounds where 39 groups were considered. Marrero and Gani [1] proposed another GCM that determined the contributions at three levels. Specifically, the first level utilized contributions from simple groups, and the second and third levels included polyfunctional and structural groups that supply more information about the molecular fragments.

Retzekas et al. [28] proposed a GCM to calculate the normal boiling point, critical temperature, and pressure of hydrocarbons. This model involved compound density as an additional parameter. A second-order GCM for the prediction of the critical temperature of pure compounds was introduced by Dalmazzone et al. [15]. They used an experimental database consisting of 381 organic compounds including hydrocarbons and their derivatives containing oxygen, chlorine, nitrogen, bromine, and sulfur. Wang et al. [25], Wang et al. [26], and Jia et al. [27] published a set of GCMs for the estimation of critical properties (temperature, pressure, and volume) of organic compounds. However, these models were only applicable to compounds with relatively small molecular weights and to compounds with carbon chains from C2 to C18 that contain oxygen, nitrogen, sulfur, chlorine, and bromine groups. These authors used an experimental database with 467, 232, and 219 values of critical temperature, pressure, and volume, respectively, of organic compounds to determine the group contributions. Recently, Ghasemitabar and Movagharnejad [29] developed a second-order GCM for the normal boiling point of pure organic compounds including hydrocarbons with nitrogen, oxygen, sulfur, fluorine, chlorine, bromine, and iodine atoms. This model applied the first-order groups reported by Joback and Reid [11] but included additional second-order groups that made it possible to distinguish between isomers. A similar approach to obtain the critical properties of pure organic compounds was proposed by Tahami et al. [5]. They used an experimental database with 969, 715, and 539 data points of critical temperature, pressure, and volume, respectively.

On the other hand, some artificial intelligence tools such as artificial neural networks (ANNs) have been used for the modeling of difficult nonlinear systems, thus proving their numerical capabilities and reliable performance [30–33]. For the case of thermodynamic property modeling, ANNs have also been applied with GCMs to obtain better estimations of the normal boiling point and critical properties of pure compounds [6, 8, 34].

Although significant advances have been achieved in the prediction of these relevant thermodynamic properties, the main shortcoming of these models remains their high prediction errors for different molecules (especially those with limited experimental data). Note that the lack of numerical values of some group contributions affects the property calculation via GCMs for some molecules, especially those with complex chemical structures [35]. In this direction, it is important to highlight that a direct and reliable comparison of the performance of available GCMs is usually not feasible because these models have been developed using different databases and numerical frameworks to establish their group contributions for the estimation of the thermodynamic property at hand. To date, few studies have compared the performance of GCMs for the calculation of pure compound critical properties, and they did so without recalculating the group contributions on a common baseline.

In this research, several first-order GCMs have been assessed and compared in the calculation of the normal boiling point and critical properties of pure compounds. This study has focused on the evaluation of first-order GCMs because they are a straightforward option to estimate these thermodynamic properties and are already available in commercial process simulators. In contrast to higher-order GCMs, they do not require additional information on molecules and are easy to implement by the users. Therefore, the novelty of this study relies on a reliable comparison of GCMs based on the application of the same analytical framework. First, this comparison was carried out via a standardized approach to establish the group contributions with the same thermodynamic data (correlation and validation sets) and an identical numerical procedure to solve the corresponding parameter estimation problems. A set of statistical metrics was applied to characterize the limitations and capabilities of tested models including a detailed performance analysis for different compound families where the best model was identified. Then, an ANN-based approach was assessed to improve the estimations obtained with the GCMs. Finally, a comparative analysis of these models was carried out using the GCMs implemented in the Aspen Plus® simulator. Therefore, this study highlights some gaps to be resolved for GCMs with the aim of developing reliable thermodynamic models for the calculation of pure compound properties.

2. Approach to Analyze and Compare the Performance of First-Order Group Contribution Models to Calculate the Critical Properties and Normal Boiling Point of Pure Compounds

2.1. Experimental Database

A database with experimental information on critical volume (Vc), critical pressure (Pc), critical temperature (Tc), and normal boiling point (Tb) of different compound families was compiled for this study. This information was extracted from the National Institute of Standards and Technology (NIST) and a compilation of articles from the American Chemical Society [36–46]; other sources of physicochemical properties were also consulted [47–49]. A preliminary analysis of the available experimental information on these properties was performed to identify inconsistencies in the values reported in different sources for the same compound. The experimental values that displayed inconsistencies between two or more sources were discarded to reduce the uncertainty in the determination of the contributions of functional groups for the calculation of the corresponding thermodynamic properties. Table 1 shows the number of experimental values that were employed in the database of this study. In short, this database included the following numbers of experimental values for pure compounds: 397 for Tb, 514 for Tc, 445 for Pc, and 312 for Vc. This experimental database included organic compounds with carbon numbers from C1 to C60 and molecular weights ranging from 26.04 to 843.6 g/mol, as well as organic compounds containing metals (e.g., iron and titanium), silicon, and boron groups. These groups are usually not included in other GCMs reported in the literature such as the models proposed by Marrero and Gani [1], Marrero-Morejón and Pardillo-Fontdevila [14], Constantinou and Gani [12], Joback and Reid [11], Wen and Qiang [50], Dalmazzone et al. [15], Valderrama and Alvarez [51], Wang et al. [25], Wang et al. [26], Jia et al. [27], Deng et al. [34], Ghasemitabar and Movagharnejad [29], Mondejar et al. [8], and Tahami et al. [5]. Therefore, this database was used to define the numerical values of group contributions for the set of first-order GCMs reported in this study.

2.2. Definition of Functional Groups for GCMs

Table 2 shows the functional groups of first-order GCMs to estimate each thermodynamic property. The criteria applied to determine these functional groups included:
(a) The molecular groups were defined to represent a wide range of compounds including multifunctional oxygen, nitrogen, sulfur, halogenated, organometallic, silicon, and boron substances.
(b) The molecular groups were defined as simple as possible but representative of all the compounds included in the database.
(c) The molecular groups contained in chemical structures with and without rings (e.g., benzene and phenol) were differentiated. This aspect was already evaluated by Marrero and Gani [1] showing that the performance of GCMs was better when there was a differentiation of the groups inside and outside the cyclic molecular structures.

2.3. Description of GCMs Used and Compared for the Calculation of Vc, Pc, Tc, and Tb of Pure Compounds

A set of first-order GCMs that do not consider group interactions were applied and compared in the calculation of the thermodynamic properties. Different equations (i.e., functionalities) reported for other GCMs were used in this study (see Table S1 of Supporting Information). A comparison and evaluation of these functionalities to estimate Vc, Tc, Pc, and Tb of pure compounds was performed. The group contributions and parameters of these functionalities were calculated using the same numerical framework (i.e., database, correlation and validation sets, objective function, and optimization method). These GCMs are described below for each thermodynamic property.

2.3.1. Normal Boiling Point

Three GCMs (i.e., MTB1, MTB2, and MTB3) were used and assessed to estimate Tb of pure compounds. In particular, the MTB1 model was based on the equation proposed by Joback and Reid [11] for the calculation of the normal boiling point of pure compounds (equation (1)). This equation involves the number of repetitions of each functional group in the molecule of the tested compound, the contribution of each functional group to the normal boiling point, and a constant value for all compounds.
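As a minimal sketch of how a first-order model of this kind is evaluated, the snippet below assumes the additive Joback and Reid structure, Tb = a + Σ n_i·tb_i, with the constant a and the contributions tb_i treated as fitted parameters. The symbols, the function name, the group labels, and the numerical values are hypothetical placeholders and are not taken from Tables S2 and S3.

```python
def tb_additive(group_counts, contributions, a):
    """Assumed MTB1-type form: Tb = a + sum_i n_i * tb_i (in K)."""
    return a + sum(n * contributions[g] for g, n in group_counts.items())


# Hypothetical contributions (K per group), for illustration only.
contributions = {"CH3": 20.0, "CH2": 25.0}

# n-Butane described as 2 x CH3 + 2 x CH2 (first-order groups only).
butane = {"CH3": 2, "CH2": 2}

tb = tb_additive(butane, contributions, a=200.0)
print(tb)
```

Because only group counts enter the expression, isomers with the same first-order groups receive the same estimate, which is the known limitation of first-order models mentioned in the Introduction.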

The MTB2 model corresponded to the equation proposed by Nannoolal et al. [24] (equation (2)). In addition to the group contributions, this expression involves the number of atoms in the molecule (except hydrogen) and three constant values for all the compounds.
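The nonlinear structure of a Nannoolal-type expression can be sketched as follows, assuming the form Tb = Σ n_i·tb_i / (n_A^a + b) + c, where n_A is the number of non-hydrogen atoms and a, b, and c are the three constants. This form is an assumption based on the published Nannoolal et al. model; all names and values below are hypothetical.

```python
def tb_nannoolal_form(group_counts, contributions, n_heavy_atoms, a, b, c):
    """Assumed MTB2-type form: Tb = sum_i(n_i * tb_i) / (n_A**a + b) + c (in K)."""
    total = sum(n * contributions[g] for g, n in group_counts.items())
    return total / (n_heavy_atoms ** a + b) + c


# Hypothetical contributions and constants, for illustration only.
contributions = {"CH3": 150.0, "CH2": 225.0}
butane = {"CH3": 2, "CH2": 2}  # four non-hydrogen atoms

tb = tb_nannoolal_form(butane, contributions, n_heavy_atoms=4, a=0.5, b=1.0, c=50.0)
print(tb)
```

Dividing the group sum by a function of the atom count damps the growth of the estimate for large molecules, which a purely additive expression cannot do.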

The MTB3 model was a modification of the model proposed by Joback and Reid [11] (equation (3)) that incorporated a relationship between the molecular weights (g/mol) of the functional group and the pure compound, together with one constant value for all the compounds.

2.3.2. Critical Temperature

Four GCMs were implemented to calculate the critical temperature. The MTC1 model was based on the equation proposed by Lydersen [23] (equation (4)), which involves the contribution of each functional group to the critical temperature and one constant value.

The MTC2 model was an expression proposed by Joback and Reid [11] and corresponded to a modified version of equation (4) with two constant parameters (equation (5)).
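For orientation, the two literature forms can be sketched as below, assuming that MTC1 and MTC2 keep the published structures Tc = Tb / (a + S − S²) (Lydersen) and Tc = Tb / (a + b·S − S²) (Joback and Reid), where S = Σ n_i·tc_i. The default constants are the published literature values and serve only as placeholders; in this study the constants were refitted, and the function names are hypothetical.

```python
def tc_lydersen_form(tb, s, a=0.567):
    """Assumed MTC1-type form: Tc = Tb / (a + S - S**2), with S = sum_i n_i * tc_i."""
    return tb / (a + s - s ** 2)


def tc_joback_form(tb, s, a=0.584, b=0.965):
    """Assumed MTC2-type form: Tc = Tb / (a + b*S - S**2)."""
    return tb / (a + b * s - s ** 2)


# Hypothetical inputs: Tb = 300 K and a group sum S = 0.1.
tc1 = tc_lydersen_form(300.0, 0.1)
tc2 = tc_joback_form(300.0, 0.1)
print(tc1, tc2)
```

Both forms need Tb as an input, so any error in the boiling point (experimental or estimated) propagates directly into the Tc estimate.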

The third model (MTC3) was a modified version of the Joback and Reid [11] model (equation (6)). This model included a relationship between the molecular weights of the functional group and the pure compound, together with two constant parameters.

The MTC4 model was also a modified expression of Joback and Reid [11] (equation (7)) that included the number of atoms (except hydrogen) and the number of rings in the molecule under analysis, together with four constant parameters.

2.3.3. Critical Pressure

Four GCMs were utilized to estimate the critical pressure. The MPC1 model was proposed by Lydersen [23] to estimate Pc (equation (8)) from the contribution of each functional group to Pc and one constant parameter.

The second model (MPC2) was based on the equation proposed by Joback and Reid [11] (equation (9)), which has two constant parameters.
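The literature structures behind these two models can be sketched as follows, assuming MPC1 keeps the Lydersen form Pc = M / (a + S)² and MPC2 keeps the Joback and Reid form Pc = (a + b·n_A − S)⁻², where M is the molecular weight, n_A an atom count, and S = Σ n_i·pc_i. The default constants are the published literature values, used only as placeholders; the atom-count convention, units, and function names are assumptions.

```python
def pc_lydersen_form(molar_mass, s, a=0.34):
    """Assumed MPC1-type form: Pc = M / (a + S)**2, with S = sum_i n_i * pc_i."""
    return molar_mass / (a + s) ** 2


def pc_joback_form(n_atoms, s, a=0.113, b=0.0032):
    """Assumed MPC2-type form: Pc = (a + b*n_A - S)**(-2)."""
    return (a + b * n_atoms - s) ** -2


# Hypothetical inputs chosen so that the arithmetic is easy to follow.
pc1 = pc_lydersen_form(molar_mass=50.0, s=0.16)        # 50 / 0.5**2
pc2 = pc_joback_form(n_atoms=10, s=0.0, a=0.1, b=0.01)  # 0.2**(-2)
print(pc1, pc2)
```

In both expressions the group sum sits inside a squared denominator, so small errors in the contributions can produce large relative errors in Pc.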

The MPC3 model was a modification of equation (9) with two constant parameters (equation (10)).

The fourth model (MPC4) was another modified equation that included the number of rings of the molecule and three constant parameters (equation (11)).

2.3.4. Critical Volume

The MVC1, MVC2, MVC3, and MVC4 models were compared in the calculation of the critical volume of pure compounds. MVC1 was proposed by Lydersen [23] (equation (12)) and is defined by the contribution of each functional group to the critical volume and one constant parameter.

The MVC2 model was a modification of equation (12) that included the molecular weights of the functional groups and the pure compound, with one constant parameter (equation (13)).

MVC3 and MVC4 were obtained from the modification of equation (12), where the number of atoms in the molecule (except hydrogen) was included (equations (14) and (15)). Together, they involve five constant parameters for all the pure compounds.

2.4. Determination of the Contributions of Functional Groups of GCMs for the Thermodynamic Property Calculation

Group contributions and constant parameters of GCMs (see equations (1)–(15)) were determined via a nonlinear regression of the experimental database. In this stage, 337, 404, 378, and 265 experimental data of Tb, Tc, Pc, and Vc, respectively, were used (i.e., 79–85% of the thermodynamic database). The frequency of each functional group in the molecules employed in the correlation stage can be found in an Excel file (Groups_correlation.xlsx) that is provided in the Supporting Information. The objective function utilized for the correlation of the experimental data (equation (16)) was defined in terms of the number of experimental data and the experimental and calculated values of the tested thermodynamic properties (i.e., Vc, Pc, Tc, and Tb). This objective function was minimized with a combination of a local optimization method and an evolutionary algorithm. This optimization tool was used to determine the parameters and group contributions of the set of GCMs. Because parameter identification was treated as a global optimization problem, several initial values were used to solve it, and the best solution found was reported in this study.
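The two-stage parameter estimation described above can be illustrated with the following sketch. It is not the optimization tool used in the study: the objective (a sum of squared relative deviations), the simple mutate-and-select evolutionary stage, the coordinate pattern search used as the local stage, the additive model, and the toy data are all assumptions made for illustration.

```python
import random


def model(params, counts):
    # Assumed additive form: a + sum_i n_i * c_i, with params = [a, c_1, ..., c_k].
    return params[0] + sum(n * c for n, c in zip(counts, params[1:]))


def objective(params, data):
    # Assumed objective: sum of squared relative deviations over the data set.
    return sum(((y - model(params, x)) / y) ** 2 for x, y in data)


def evolutionary_stage(data, bounds, rng, pop_size=30, generations=100):
    # Keep the fittest half and refill the population with mutated copies.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: objective(p, data))
        parents = pop[: pop_size // 2]
        children = [
            [min(max(v + rng.gauss(0.0, 0.05 * (hi - lo)), lo), hi)
             for v, (lo, hi) in zip(p, bounds)]
            for p in parents
        ]
        pop = parents + children
    return min(pop, key=lambda p: objective(p, data))


def local_stage(params, data, step=1.0, tol=1e-4):
    # Unbounded coordinate pattern search: halve the step when no move improves.
    best, f_best = list(params), objective(params, data)
    while step > tol:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                f_trial = objective(trial, data)
                if f_trial < f_best:
                    best, f_best, improved = trial, f_trial, True
        if not improved:
            step /= 2.0
    return best, f_best


# Toy data generated from hypothetical parameters a = 200 and c = (20, 25).
data = [((2, 0), 240.0), ((2, 1), 265.0), ((2, 2), 290.0),
        ((2, 3), 315.0), ((3, 1), 285.0)]
bounds = [(0.0, 500.0), (0.0, 100.0), (0.0, 100.0)]

results = []
for seed in range(3):  # several starting populations; the best solution is kept
    start = evolutionary_stage(data, bounds, random.Random(seed))
    f_start = objective(start, data)
    refined, f_refined = local_stage(start, data)
    results.append((f_refined, f_start, refined))

f_best, f_before_local, best_params = min(results)
print(best_params, f_best)
```

Using the same objective, data split, and optimizer for every functional form is what makes the resulting error statistics comparable across models.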

The predictive capabilities of tested GCMs were evaluated using pure compounds (i.e., molecules) not considered in the data correlation stage. These compounds were included in the validation set and represented approximately 15–21% of the experimental thermodynamic property database (i.e., 60, 110, 67, and 47 experimental data of Tb, Tc, Pc, and Vc, respectively). For illustration, Figure 1 shows the flowchart of the procedure utilized to determine GCMs parameters and to perform their evaluation. To reduce the deviations obtained in the determination of the functional group contributions and the model parameters, the following criteria were applied to establish the correlation and validation sets:
(a) The validation set included compounds with experimental data reported by different sources where their experimental deviations were higher than 1% for the normal boiling point and 5% for the critical properties.
(b) The validation set included compounds whose experimental properties could not be confirmed by at least two literature sources.
(c) It was established that the validation set did not include molecules with functional groups that were not present in the correlation set.
(d) Functional groups with repetitions ≤5 were considered only in the correlation set.
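Criteria (c) and (d) are structural rules that can be checked automatically. The sketch below shows one possible way to enforce them, assuming that "repetitions" means the number of database molecules containing a group; the function name, data layout, and toy molecules are hypothetical.

```python
from collections import Counter


def assign_sets(molecules, candidates, max_rare=5):
    """molecules: {name: {group: count}}; candidates: names proposed for validation."""
    # Assumption: "repetitions" = number of database molecules containing the group.
    frequency = Counter(g for groups in molecules.values() for g in groups)
    rare = {g for g, n in frequency.items() if n <= max_rare}
    # Criterion (d): molecules with rarely repeated groups stay in the correlation set.
    validation = {m for m in candidates if not (set(molecules[m]) & rare)}
    correlation = set(molecules) - validation
    # Criterion (c): a validation molecule may only use groups seen in correlation.
    for name in sorted(validation):
        covered = {g for m in correlation for g in molecules[m]}
        if not set(molecules[name]) <= covered:
            validation.remove(name)
            correlation.add(name)
    return correlation, validation


# Toy database; max_rare=1 is used only because the example is tiny.
molecules = {
    "A": {"CH3": 2},
    "B": {"CH3": 1, "OH": 1},
    "C": {"CH3": 1, "OH": 1},
    "D": {"CH3": 1, "F": 1},
}
correlation, validation = assign_sets(molecules, {"C", "D"}, max_rare=1)
print(sorted(correlation), sorted(validation))
```

In this toy case, molecule D is moved to the correlation set because its fluorine group appears only once, while C remains in validation because its groups are covered by A and B.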

2.5. Application of ANN to Improve the Predictions of GCMs

An ANN was applied as a corrector to improve the estimations obtained with GCMs. ANN input variables were the thermodynamic property calculated with GCMs and the molecular weight (MM) of the pure compound, while the output variable was the improved (corrected) thermodynamic property value (see Figures S1 and S2). The ANN toolbox of MATLAB® was used in these calculations. A feed-forward backpropagation ANN was employed. The input data set was randomly divided into three subgroups: training set (70% of the database), validation set (15% of the database), and test set (15% of the database). ANN training was used to determine the best values of weights and biases, where the Levenberg–Marquardt training method and a logarithmic sigmoid transfer function were employed. The performance function was the mean square error (MSE) between the target and calculated values.
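The structure of such a corrector can be sketched in a few lines. The code below is not the MATLAB toolbox implementation and does not perform Levenberg–Marquardt training; it only illustrates, under stated assumptions, a forward pass with two inputs (GCM estimate and molecular weight), one logistic-sigmoid hidden layer, a linear output neuron (an assumption), the MSE performance function, and a random 70/15/15 split. All names and weights are hypothetical.

```python
import math
import random


def logsig(x):
    # Logarithmic sigmoid transfer function.
    return 1.0 / (1.0 + math.exp(-x))


def ann_correct(gcm_value, molar_mass, w_hidden, b_hidden, w_out, b_out):
    # One hidden layer with logsig neurons, followed by a linear output neuron.
    hidden = [logsig(w[0] * gcm_value + w[1] * molar_mass + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def mse(targets, outputs):
    # Mean square error: (1/N) * sum_j (target_j - output_j)**2.
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def split_70_15_15(items, seed=0):
    # Random division into training, validation, and test subsets.
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(0.70 * len(items))
    n_val = round(0.15 * len(items))
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


train, val, test = split_70_15_15(range(100))
# Hypothetical single-neuron network with zero input weights.
corrected = ann_correct(350.0, 72.15, w_hidden=[[0.0, 0.0]], b_hidden=[0.0],
                        w_out=[2.0], b_out=1.0)
print(len(train), len(val), len(test), corrected)
```

In practice the inputs would normally be scaled before entering the sigmoid layer, and the weights would come from training rather than being set by hand.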

Different ANN configurations (i.e., 1 to 10 hidden neurons and 1 to 2 hidden layers) were evaluated to identify the simplest one with the best performance (without overfitting) and to establish their impact on the calculation of thermodynamic properties with the GCM-ANN approach.

2.6. Statistical Analysis to Compare GCMs Performance

Statistical metrics were utilized to assess the performance of all GCMs for the thermodynamic property calculation. These metrics were the absolute relative deviation (ARD), the average absolute relative deviation (AARD), and the standard deviation (σ).
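A minimal sketch of these metrics is given below, assuming the usual definitions ARD = 100·|X_exp − X_calc| / X_exp and AARD = (1/N)·Σ ARD. Taking σ as the sample standard deviation of the ARD values is an assumption, since the exact definition is not reproduced here; the function names and data are hypothetical.

```python
import statistics


def ard(exp, calc):
    # Absolute relative deviation of one data point, in percent.
    return 100.0 * abs(exp - calc) / abs(exp)


def aard(exp_values, calc_values):
    # Average absolute relative deviation over a data set, in percent.
    return statistics.fmean(ard(e, c) for e, c in zip(exp_values, calc_values))


def ard_std(exp_values, calc_values):
    # Assumption: sigma is the sample standard deviation of the ARD values.
    return statistics.stdev(ard(e, c) for e, c in zip(exp_values, calc_values))


# Hypothetical experimental and calculated values (e.g., Tb in K).
exp_values = [100.0, 200.0]
calc_values = [110.0, 190.0]
print(aard(exp_values, calc_values), ard_std(exp_values, calc_values))
```

Reporting AARD per compound family, as done in the Results, amounts to calling `aard` on the subset of data belonging to each family.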

Additionally, a comparative analysis of these GCMs was also carried out using the GCMs implemented in Aspen Plus®. These GCMs were proposed by Riedel [52], Lydersen [23], Ambrose [53,54], Fedors [55,56], Joback and Reid [11], and Constantinou and Gani [12]. In particular, the thermodynamic properties of some molecules not included in the training set were calculated with Aspen Plus® and these results were compared with those estimations obtained with tested first-order GCMs.

3. Results and Discussion

3.1. Normal Boiling Point

Figure 1 shows the comparison of the experimental and calculated Tb values using different GCMs. The determination coefficient (R2) was included for the correlation and validation stages of each model. Tables S2 and S3 of Supporting Information provide the group contributions and parameters of MTB1, MTB2, and MTB3 models for Tb estimation. Overall, these GCMs showed R2 values of 0.83–0.97 where the highest ARD was obtained for compounds with Tb higher than 500 K. The performance of these GCMs showed the next trend: MTB2 > MTB1 > MTB3. In particular, the MTB2 model was the best to calculate the Tb of tested compounds. Note that this model included the number of atoms in the molecule as an additional parameter. Also, it should be highlighted that the nonlinear functionality of the MTB2 model outperformed those of the MTB1 and MTB3 models. Table 3 summarizes the statistical metrics used to evaluate the performances of these models. ARD values ranged from 8.34E-08 to 60.48% in Tb calculation.

For illustration, Figure 2 shows AARD obtained in Tb estimation for different compound families. This analysis allowed us to identify some compound families with the highest deviations between experimental and predicted values. GCMs were able to calculate the Tb of alkynes, phenols, multifunctional oxygen, sulfur, and silicon compounds with acceptable accuracy. AARD in these families ranged from 0.28 to 6.37%, while this statistical parameter increased for alkanes, carboxylic acids, nitrogen, and halogenated compounds. In general, these GCMs showed the highest AARD for alkanes, alkenes, cyclic hydrocarbons, nitrogen, chlorine, and fluorine compounds. These results can be associated with the relatively high uncertainty of the experimental data reported for different compounds with a complex chemical structure (e.g., some nitrogen, and halogenated compounds). Figure S3 reports the ARD values of the MTB2 model as a function of MM and Tb for different compound families where the magnitude of this statistical metric is indicated by the size of the symbol “O”. Although the MTB2 model was the best model, it presented the highest ARD for those compounds with low and high molecular weights within a homologous series. For example, high deviations were obtained for alkanes and alkenes with low molecular weight. This behavior was identified and highlighted by Nannoolal et al. [16]. Low molecular weight compounds usually do not follow the general trend of the homologous series of a given chemical family. However, the experimental data of Tb for these compounds are commonly available in the literature and there is no need to estimate them using GCMs. For some halogenated compounds with molecular weights <100 g/mol, high deviations were obtained. Ethanedinitrile and 1,1,1,3,5,5,5-heptamethyltrisiloxane were the compounds with the maximum (29.31%) and minimum (8.34E-08%) ARD to calculate their Tb using MTB2. 
It was identified that high deviations (i.e., ARD >20%) were obtained for some molecules containing functional groups with halogen and nitrogen atoms (e.g., -F, -CN, and -NH2).

3.2. Critical Temperature

Figure 3 and Table 3 show the results of Tc calculation with GCMs and the corresponding model parameters are reported in Tables S4 and S5 of Supporting Information. It was identified that the highest ARD was obtained for pure compounds with Tc >600 K. It should be highlighted that for these compounds, Tc experimental data reported in the literature showed high deviations. MTC4 model displayed the best performance for the calculation of Tc. This model considered the number of atoms and rings in the molecule as parameters. Overall, ARD ranged from 6.36E-08 to 72.39% for MTC1, from 1.22E-07 to 65.52% for MTC2, from 4.95E-04 to 107.69% for MTC3 and from 3.43E-04 to 70.23% for MTC4, respectively. AARD values for Tc estimation of different compound families are reported in Figure 4. All GCMs showed high AARD for alkanes, alcohols, carboxylic acids, ethers, esters, nitrogen, halogenated, and silicon compounds. On the other hand, the calculation performance of these GCMs was acceptable (i.e., AARD <2.5%) for families of alkenes, cyclic and aromatic hydrocarbons, ketones, and multifunctional oxygen compounds. All GCMs presented the highest ARD for organometallic compounds. The highest deviations in the calculation of Tc could be associated with the uncertainty of the experimental values used to determine the group contributions, especially for complex molecules with high molecular weight. MTC4 outperformed other models to calculate this thermodynamic property where MTC3 was the worst GCM.

Figure S4 reports the ARD values of the MTC4 model. This model showed the highest deviations in the calculation of Tc for compounds with high molecular weight within a homologous series. This trend was observed for alkanes with molecular weights >500 g/mol, alkenes with molecular weights >200 g/mol, esters with molecular weights >300 g/mol, and silicon compounds with molecular weights >400 g/mol. Also, the modeling errors were higher for high molecular weight aromatic hydrocarbon and alcohol compounds. (2R,4R,6R)-2,4,6-Trimethyl-2,4,6-triphenylcyclotrisiloxane was the compound with the maximum ARD (i.e., 70.23%) in the calculation of Tc using MTC4. This is a silicon compound with a high molecular weight (408.67 g/mol). It could be anticipated that GCMs would show high modeling errors for the calculation of critical properties of high molecular weight compounds. As indicated, the procedures utilized in the experimental determination of critical properties may have some limitations, mainly for high molecular weight compounds, because this type of chemical can degrade before reaching the critical point. These deviations in the experimental determination of the critical point introduced uncertainties in the determination of the group contribution values of tested GCMs. Higher modeling errors (ARD >30%) were obtained in the Tc prediction of some molecules containing silicon, nitrogen, and halogenated functional groups (e.g., >Si<, -F, -CN, -NO2).

3.3. Critical Pressure

Results of the calculation of Pc with tested GCMs are reported in Figure 5 and Table 3, while the estimated group contributions and additional model parameters are given in Tables S6 and S7 of Supporting Information. Overall, ARD values ranged from 3.43E-07 to 46.17%, and the deviations of the GCMs in the estimation of Pc decreased in the following order: MPC2 > MPC1 > MPC3 > MPC4. In particular, MPC2 showed high ARD (i.e., >10%) when estimating Pc values above 60 bar. On the other hand, the MPC4 model showed the best performance for the prediction of this thermodynamic property. This model also included the number of atoms and rings in the molecule as additional parameters. Table 3 summarizes the statistical metrics used in the comparison of these GCMs.

Figure 6 shows the values of AARD in Pc estimation for different compound families. These results indicated that the Pc prediction from these GCMs showed the highest ARD for nitrogen, multifunctional oxygen, halogenated, and silicon compounds. All models showed high modeling errors to calculate Pc of iron compounds. For illustration, ARD values of the MPC4 model for different families of chemicals are reported in Figure S5. In general, there were no clear trends in model performance with respect to the molecular weight and compound families. However, ARD >10% was obtained for alkanes with molecular weights higher than 400 g/mol. For some halogenated compounds (e.g., fluorine compounds), the performance of this GCM was better for substances with intermediate molecular weights within the homologous series. Trifluoroethanoic acid was the compound with the maximum ARD (46.17%) to calculate Pc using MPC4. The low precision in the experimental information reported for the halogenated compounds could be the source of these modeling errors. It was also identified that the highest deviations (i.e., ARD >20%) were obtained for molecules constituted by functional groups containing iron, nitrogen, silicon, and halogen atoms (e.g., >Si<, -F, >N-, -Fe, and -NO2).

3.4. Critical Volume

Results of the calculation of Vc using MVC1, MVC2, MVC3, and MVC4 models are reported in Figure 7, while the corresponding statistical metrics for model comparison are given in Table 3. It was clear that MVC1, MVC3, and MVC4 displayed a similar performance with ARD ranging from 4.41E-04 to 29.98%, while MVC2 was the worst option to calculate this critical property. Tables S8 and S9 of Supporting Information contain the group contributions, and parameters of these GCMs. In general, these GCMs had the highest AARD to predict Vc of nitrogen, multifunctional oxygen, fluorine, and silicon compounds, see Figure 8. Statistical metrics indicated that the MVC4 model was the best to estimate critical volumes of pure compounds. This model also included the number of atoms and the molecular weight as additional parameters. Figure S6 illustrates the MVC4 model performance for the Vc calculation of different compound families. ARD increased with respect to the molecular weight for alkanes, carboxylic acids, ketones, and fluorine compounds. These results also confirmed that the calculation of critical properties of high molecular weight compounds is challenging via GCMs. This GCM showed the highest ARD values for alcohols, ketones, nitrogen, and silicon compounds with low molecular weight. 1,1,1,5,5,5-hexafluoro-2,4-pentanedione was the compound with the maximum ARD (29.97%) for Vc calculation using MVC4. Different molecules containing some functional groups with halogen and nitrogen (e.g., -F, >NH, -NH2) showed the highest deviations (i.e., ARD >10%).

Finally, the Supporting Information contains an Excel file (Properties_calculation.xlsx) that can be utilized to calculate Tb and critical properties of pure compounds using the best GCMs reported in this paper.

3.5. Correction of the Thermodynamic Property Estimations of GCMs with ANN

AARD values for the calculation of Vc, Pc, Tc, and Tb using ANN to correct the GCM estimations are shown in Figure 9. These results corresponded to different ANN configurations in terms of the number of hidden layers and their neurons. The application of ANN improved the estimations of thermodynamic properties obtained with GCMs. For the case of Tb, the AARD value of MTB1 was 5.82% and decreased to 4.48% with the implementation of ANN with 1 hidden layer and 5 hidden neurons. MTB2 plus ANN with 2 hidden layers and 3 hidden neurons achieved an AARD of 3.90%, in contrast to the 4.25% AARD of the original MTB2. AARD values of MTB3 without and with ANN (1 hidden layer and 10 hidden neurons) were 8.90 and 5.74%, respectively.

For Tc prediction, the AARD value of MTC1 decreased from 14.56 to 9.76% using an ANN model with 2 hidden layers and 5 hidden neurons. The application of ANN with 1 hidden layer and 7 hidden neurons decreased the AARD of MTC2 from 13.66 to 9.79%. Similar results were obtained for MTC3 and MTC4 after applying the ANN as a model corrector where AARD decreased from 19.16 to 10.93% (using ANN with 1 hidden layer and 3 hidden neurons) and from 13.68 to 8.15% (using ANN with 1 hidden layer and 7 hidden neurons), respectively.

The ANN application also improved Pc estimations obtained with all GCMs. AARD decreased from 8.48 (MPC1) to 7.75% (MPC1 plus ANN with 1 hidden layer and 3 hidden neurons), from 9.05 (MPC2) to 7.59% (MPC2 plus ANN with 1 hidden layer and 5 hidden neurons), from 8.48 (MPC3) to 7.36% (MPC3 plus ANN with 1 hidden layer and 5 hidden neurons), and from 8.48 (MPC4) to 7.28% (MPC4 plus ANN with 2 hidden layers and 5 hidden neurons). These results showed that ANN with several hidden layers and neurons was required to obtain an accurate estimation of this thermodynamic property. However, the best ANN structure should be identified carefully to avoid model overfitting.

Finally, the GCM + ANN approach also improved the prediction of Vc, although the final result was highly dependent on the model and ANN structure. For example, the AARD value of MVC2 decreased from 18.24 to 11.13% using ANN with 2 hidden layers and 3 hidden neurons. In contrast, Vc calculations performed with the other GCMs (MVC1, MVC3, MVC4) plus ANN did not show a significant reduction of AARD for different configurations of hidden layers and neurons (see Figure 9).

In summary, these results were consistent with the study of Mondejar et al. [8] where GCMs and ANN models were also applied. However, these authors only analyzed a specific group of chlorine and fluorine compounds with carbon numbers from C2 to C20 in contrast to the present study where a broad spectrum of chemical families was analyzed.
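The effect of the ANN corrector can also be expressed as a relative reduction of AARD. The short sketch below applies this arithmetic to the Tc values reported in this section; the function name and the dictionary layout are illustrative assumptions, and the relative reduction itself is a derived quantity that is not reported in the paper.

```python
def relative_reduction(aard_gcm, aard_gcm_ann):
    """Relative AARD reduction (%) achieved by adding the ANN corrector."""
    return 100.0 * (aard_gcm - aard_gcm_ann) / aard_gcm


# AARD (%) for Tc without and with ANN, as reported in this section
tc_aard = {
    "MTC1": (14.56, 9.76),
    "MTC2": (13.66, 9.79),
    "MTC3": (19.16, 10.93),
    "MTC4": (13.68, 8.15),
}

reductions = {
    model: relative_reduction(before, after)
    for model, (before, after) in tc_aard.items()
}

for model, value in reductions.items():
    print(f"{model}: {value:.1f}% relative reduction")

# Model that benefits most from the correction vs. model with the lowest final AARD
largest_gain = max(reductions, key=reductions.get)
lowest_final = min(tc_aard, key=lambda m: tc_aard[m][1])
print(largest_gain, lowest_final)
```

The two criteria need not coincide: the model with the largest relative gain from the ANN correction is not necessarily the one with the lowest final AARD, so both should be checked when selecting a GCM + ANN combination.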

3.6. Comparative Analysis of GCMs Reported in the Literature

A comparison of the performances of different GCMs was carried out and the results are given in Table 4. Figure 10 shows the distribution of AARD for the calculation of tested thermodynamic properties of pure compounds using the first-order GCMs and ANN. As stated, the Aspen Plus® simulator was also employed to calculate these thermodynamic properties. This process simulator offers the following GCMs: R–Riedel [52], LS–Lydersen [23], A–Ambrose [53,54], F–Fedors [55,56], JR–Joback and Reid [11], and CG–Constantinou and Gani [12]. Therefore, this comparison was carried out using a specific group of molecules where all the models were applicable, and the results obtained with Aspen Plus® were compared with the best first-order GCMs identified in this study.
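To illustrate the structure shared by these first-order GCMs, the sketch below evaluates the Joback and Reid [11] correlation for the normal boiling point, which adds the contributions of the functional groups to a constant. It is not one of the MTB models fitted in this study. The constant and the group values are quoted from the published Joback and Reid table and should be checked against the original source before use; the function name and the group labels are illustrative assumptions.

```python
# Joback and Reid correlation: Tb [K] = 198.2 + sum(n_i * dTb_i)
# Values quoted from the published table; verify against the original source.
JOBACK_TB_BASE = 198.2  # K
JOBACK_TB_GROUPS = {
    "-CH3": 23.58,
    "-CH2-": 22.88,
    ">C=O (non-ring)": 76.75,
    "-OH (alcohol)": 92.88,
}


def joback_tb(group_counts):
    """Estimate the normal boiling point (K) from first-order group counts."""
    total = JOBACK_TB_BASE
    for group, count in group_counts.items():
        if group not in JOBACK_TB_GROUPS:
            # Missing contributions are a known limitation of GCMs
            raise KeyError(f"No contribution available for group {group!r}")
        total += count * JOBACK_TB_GROUPS[group]
    return total


# Acetone: two -CH3 groups and one non-ring carbonyl group
acetone = {"-CH3": 2, ">C=O (non-ring)": 1}
print(round(joback_tb(acetone), 2))  # 322.11
```

The explicit error for an unknown group mirrors a practical limitation of first-order GCMs: a compound cannot be evaluated when the contribution of one of its functional groups has not been reported.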

Results reported in Table 4 and Figure 10 showed that the estimations of normal boiling point with MTB2 and MTB2 + ANN with 2 hidden layers and 3 hidden neurons were better than those obtained with the models of Constantinou and Gani [12] and Joback and Reid [11]. For Tc calculation, MTC4 and MTC4 + ANN with 1 hidden layer and 7 hidden neurons showed higher modeling errors than those of the models proposed by Lydersen [23], Ambrose [53], Fedors [56], Joback and Reid [11], and Constantinou and Gani [12]. MPC4 (with and without ANN) outperformed the Lydersen [23], Ambrose [54], and Constantinou and Gani [12] models. Also, MVC4 (with and without ANN) provided better predictions of Vc than those obtained with the models of Riedel [52], Lydersen [23], Fedors [55], Constantinou and Gani [12], and Joback and Reid [11].

Finally, the performance of MTC4, MPC4, and MVC4 was compared with the ANN model proposed by Gharagheizi et al. [6]. This ANN-based model has a complex network configuration with a substantial number of weight and bias parameters (i.e., more than 2000 ANN parameters) and, as expected, its accuracy for the prediction of these thermodynamic properties is high. For this comparison, the critical properties of 200 additional molecules reported by Gharagheizi et al. [6] were estimated using the three GCMs. Figure 11 shows the percentage deviation of the predictions generated by these models, taking as a reference the results obtained with the model of Gharagheizi et al. [6]. In general, most of the estimated properties differed by less than 10% from those calculated with the model of Gharagheizi et al. [6]. The first-order GCMs (i.e., MTC4, MPC4, and MVC4) can therefore predict the critical properties of pure compounds with acceptable accuracy compared to this complex ANN model. These results also highlighted the importance of utilizing a reliable approach to develop and implement GCMs for the calculation and prediction of pure compound thermodynamic properties.

3.7. Perspectives and Recommendations for Future Applications of GCMs for the Calculation of Critical Properties and Normal Boiling Point of Pure Compounds

The estimation of critical properties of pure compounds using predictive models is challenging. First-order GCMs are a suitable and straightforward alternative to perform these calculations. However, these models can show high errors when calculating the thermodynamic properties of compounds with high molecular weight, and the contribution values for some functional groups are not reported. These drawbacks are strongly related to the lack of experimental information on the critical properties reported in the literature and to the uncertainties in the databases utilized to obtain the group contribution values.

Further efforts are still needed to improve the performance of first-order GCMs and to obtain reliable estimations of critical properties of pure compounds. These efforts could be focused on the following directions:
(a) It is important to establish proper experimental strategies to quantify the critical properties of novel compounds and those with limited experimental data to create a reliable experimental thermodynamic database. This aspect is essential for the development of reliable GCMs with a wide applicability spectrum to calculate the thermophysical properties of pure compounds. In this direction, it is recommended to recalculate the group contributions of available GCMs after incorporating new experimental information.
(b) It is also suggested to improve the strategies based on hybrid GCM + ANN models to estimate critical properties of pure compounds. Other artificial intelligence tools could also be tested for these calculations.

4. Summary and Conclusions

This study has covered the following topics:
A comparative analysis of different first-order GCMs for the calculation of normal boiling point and critical properties of pure compounds was performed.
The incorporation of parameters associated with molecular characteristics in these thermodynamic models (e.g., molecular weight, number of atoms, and rings) contributed to improving their performance.
The critical temperature was a challenging property to calculate, while GCMs were more accurate in estimating the critical volume.
These GCMs showed higher modeling errors in the thermodynamic property calculation of nitrogen compounds, alkanes, halogenated compounds, and silicon compounds. The highest deviations were also obtained in the estimation of critical properties and normal boiling points of molecules with low and high molecular weights within a homologous series, which usually did not follow the property trend of the chemical family under analysis.
The incorporation of an ANN as a corrector to improve the estimations obtained with GCMs is an interesting approach for the simulation of thermodynamic properties. This artificial intelligence algorithm can be used to enhance the capabilities of models proposed in the literature, including those that are available and implemented in commercial simulation software.
Results also showed that the predictive models incorporated in the commercial Aspen Plus® software can provide inaccurate estimations of the critical thermodynamic properties for some compounds.
This study contributes straightforward first-order GCMs that can be employed to calculate the critical volume and pressure, as well as the normal boiling point, of pure compounds with better accuracy than that obtained with other models reported in the literature, even those incorporated in commercial process simulators.

Nomenclature

AARD:Average absolute relative deviation
, , , :Constant parameters of the models
ARD:Absolute relative deviation
:Number of experimental data
:Molecular weight of the pure compound
:Molecular weight of the functional group
:Number of repetitions of the functional group in the molecule of tested compound
:Number of rings in the molecule
:Number of atoms in the molecule (except hydrogen)
Pc:Critical pressure
:Contribution of the functional group to estimate the critical pressure
R2:Determination coefficient
:Standard deviation
Tb:Normal boiling point
:Contribution of the functional group to estimate the normal boiling point
Tc:Critical temperature
:Contribution of the functional group to estimate the critical temperature
Vc:Critical volume
:Contribution of the functional group to estimate the critical volume
:Experimental values of tested thermodynamic properties (i.e., Vc, Pc, Tc, Tb)
:Calculated values of tested thermodynamic properties (i.e., Vc, Pc, Tc, Tb).

Data Availability

The data of this paper are available on request to the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This article was prepared with funding provided by the Instituto Tecnologico de Aguascalientes, Mexico.

Supplementary Materials

Supplementary information of the calculations reported in this paper is provided in the supporting information file. It includes results and data of the implementation and analysis of tested group contribution models.