Abstract

Search engines play an important role in providing the information we rely on in daily life. Research on Internet search behavior is increasingly popular because search behavior has been shown to affect daily decisions in purchasing, traveling, and even defining beauty. However, the relation between the emotional meaning of search behavior and the decisions it leads to is still not fully understood. Therefore, this study analyzed the emotional meanings of 13,915 English words obtained from Google Trends and the profits gained from the US housing market through automatic transactions. Using unsupervised machine learning methods, it found that the emotional meanings of the search contents could modulate financial decisions.

1. Introduction

In investigating how search behavior affects people’s daily lives, the pioneers in this domain focused on confirming the correlation between search volumes and contemporaneous events in various fields. For example, the frequency with which specific cancers were searched on the Internet during 2001–2003 was found to be closely correlated with their actual incidence [1]. The counts of the top 300 search contents during 2001–2003 were reported to be highly correlated with the unemployment figures published by the US Bureau of Labor Statistics. Ettredge et al. [2] and Choi and Varian [3] analyzed how search volume correlated with economic activities such as auto and home sales, international visitor statistics, and the US unemployment rate. A linear relationship was found between search behavior and events that had already happened. Later, the focus in this domain shifted from confirming the correlation to “predicting the present,” that is, predicting events days or weeks in advance of their actual occurrence by analyzing search behavior on the Internet. Moat et al. quantified Wikipedia usage patterns that preceded weekly stock market moves [4]. Recent studies have shown that search behavior could even change how people define outdoor beauty on a daily basis [5].

Yet the use of search behavior on the Internet as an information source to predict reality has been heavily debated, since such predictions can fail from time to time. For example, in 2008, researchers from Google claimed that they could “nowcast” the flu based on people’s search volume on Google. It was a success at that time. However, their predictions after 2008 failed spectacularly, missing the peak of the 2013 flu season by 140 percent. Thereafter, many researchers and practitioners turned to exploring how search behavior on the Internet could be used effectively under certain restrictions or with specific techniques, and many successes in using search behavior data have been reported [6, 7]. For instance, during financially unstable years in particular, search behavior on the Internet has been regarded as a more reliable information source than traditional ones [8]. Goel et al. even held that sales prediction based on consumers’ search behavior before release dates was valid only in industries such as computer games and movies [9].

In prediction with search behavior, Google Trends has played an important role in significantly raising accuracy. With Google Trends, Tobias and Moat successfully forecast influenza outbreaks in 2014 [10]. They also quantified the advantage of looking forward and found that the countries which in 2012 searched for “2013” more often than “2012” in Google Trends had a much higher GDP level [11]. In an emerging market, Curme et al. employed Google search records to predict subsequent stock market moves [12]. The real estate market has not been neglected either. Hohenstatt et al. found that online search behavior could predict aggregate changes in house prices in 20 major cities across the U.S. [13]. Beracha and Wintoki claimed that cross-sectional differences in search behavior could predict cross-sectional differences in house price changes across more than 200 cities in the U.S. [14]. Wu and Brynjolfsson showed that Google Trends had good predictive ability in the real estate market [15]. It is worth noting that Dietzel et al. tried to explore the role of human sentiment or personal emotion in predicting changes in house prices [16, 17]. Dietzel et al. took the Internet search volume provided by Google Trends as a sentiment indicator and showed that this indicator could improve commercial real estate forecasting models for transactions and price indices [16]. Tsolacos revealed that the economic sentiment indicator (ESI) could generate advance signals for forecasting the turning points of house prices in the real estate market [17].

Although the correlation between search behavior and the actual real estate market has been confirmed and prediction accuracy has improved, and although research indicators have ranged widely over search volume, frequency, headline topics, and the size of search contents, the role of sentiment as a functional indicator has not received the weight it deserves. Worse still, although some researchers such as Dietzel et al. examined the general role of sentiment in predicting changes in house prices, they did not define or detail the emotional meaning of the sentiment indicators themselves [16, 17]. Detailed emotional meaning is a pervasive aspect of how we interact with the world around us [18], but it has never been considered in research on search behavior.

Therefore, this study takes the emotional meanings of search contents on the Internet as a case to explore the role of lexical meanings in predicting actual events. Specifically, the purpose of this study is twofold. One is to confirm, with a large set of search contents, the correlation between search volume and house price and to verify the good predictive ability in the real estate market claimed by Hohenstatt et al. [13, 15]. The other is to explore the role of emotional meanings in predicting changes in house prices.

2. Materials and Methods

Three data sources were employed in this study: Google Trends, the Warriner norms, and Zillow. Google Trends data were obtained from https://trends.google.com/trends/?geo=US, and the words were taken from the Warriner linguistic scale (2014) [19], 13,915 words altogether. Google Trends data for all 13,915 words were downloaded as monthly data from March 2008 to January 2019. The emotional meaning of each of the 13,915 English words was taken from the norms of Warriner along three dimensions, that is, valence (the pleasantness of the words), arousal (the intensity of emotion provoked by the words), and dominance (the degree of control exerted by the words). House prices were obtained from Zillow because investment in real estate is more common among ordinary citizens than investment in stock markets or other assets. The Zillow data (https://www.zillow.com/research/data/) were the monthly median sold price per square foot in the U.S., covering 131 months (from March 2008 to January 2019).

Step 1. The Pearson correlation test was carried out to test the linearity between the standardized search volume of each word from the norms of Warriner et al. [19] and the Zillow house price over all 131 months. The p value threshold was preset at 0.001. In total, 10,341 words passed the Pearson correlation test, which means that the standardized search volume of most words listed in the norms of Warriner et al. [19] is highly correlated with the US house price.
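
The filter in Step 1 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the toy series are hypothetical, and the p value is approximated with the Fisher z-transform (a normal approximation) rather than the exact t distribution, so that only the Python standard library is needed.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p value from the Fisher z-transform (an approximation)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def passes_filter(volume, price, alpha=0.001):
    """True if the word's search volume is correlated with price at level alpha."""
    r = pearson_r(volume, price)
    return approx_p_value(r, len(volume)) < alpha


# Hypothetical toy data: 131 monthly observations, as in the study period.
price = [100 + 0.5 * t for t in range(131)]
volume_related = [50 - 0.2 * t + (3 if t % 2 else -3) for t in range(131)]
volume_flat = [50 + (3 if t % 2 else -3) for t in range(131)]

print(passes_filter(volume_related, price))
print(passes_filter(volume_flat, price))
```

Because the Pearson coefficient is unchanged by linear rescaling, the sketch gives the same result whether the search volume is standardized or not.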

Step 2. To calculate the profits based on the 10,341 words, an automatic transaction method called the “Google Trends strategy” was implemented with a portfolio. Profits can be made in a trading strategy only if at least some future changes in the house price are correctly anticipated, in particular around large fluctuations of the house price. The results were compared with the “Buy & Hold” strategy, whose profits come from the rise of the house price.
The thresholding period was set at six months, so the Google Trends strategy started from the seventh month. The standardized search volume of each word for each month was compared with its average search volume of the first six months. There were two positions, that is, a short position and a long position. In the short position, if the search volume of a word for a specific month went up as compared with its average search volume of the last six months, the house would be sold at the price of that month offered on Zillow. In the long position, when the search volume for a specific month went down, the house would be bought. In this way, the cumulative profits of a strategy’s portfolio for a word could be obtained on the basis of the buying and selling actions. Thus, the Google Trends strategy gave the profits of each word for every month after the seventh month. Finally, the profits for the 10,341 words were calculated.
Of course, in applying this approach to analyzing the relationship between standardized search volume and fluctuation of house price, the transaction fees have been neglected in this hypothetical investment strategy.
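
The trading rule described in Step 2 might be sketched as below. Several details are assumptions made only for illustration: the baseline is taken as the mean of the six preceding months (one reading of the description above), a position opened in month t is closed at the price of month t + 1, and returns are compounded multiplicatively. The function names and the toy series are hypothetical, and transaction fees are ignored as in the text.

```python
def trends_strategy(volume, price, window=6):
    """Cumulative value of one unit invested with the search-volume rule.

    Assumption: each position is held for exactly one month.
    """
    value = 1.0
    for t in range(window, len(price) - 1):
        baseline = sum(volume[t - window:t]) / window
        change = price[t + 1] / price[t]
        if volume[t] > baseline:
            # Search volume rose: short position (sell now, buy back next month).
            value *= 1.0 / change
        elif volume[t] < baseline:
            # Search volume fell: long position (buy now, sell next month).
            value *= change
    return value


def buy_and_hold(price):
    """Cumulative value of one unit bought in the first month and held."""
    return price[-1] / price[0]


# Hypothetical toy data: six baseline months, then three trading months.
price = [100, 100, 100, 100, 100, 100, 100, 90, 99]
volume = [10, 10, 10, 10, 10, 10, 12, 8, 10]

print(f"Trends strategy: {trends_strategy(volume, price):.2%}")
print(f"Buy & Hold:      {buy_and_hold(price):.2%}")
```

In this toy series the rule shorts before the price drop and goes long before the rebound, which is the situation in which such a strategy can outperform holding.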

Step 3. The Pearson correlation test was carried out to test the linearity between the emotional meanings of each of the 10,341 words along three dimensions and their corresponding profits obtained in Step 2.

Step 4. Considering that, within a space of three emotional dimensions, emotional words tend to gather in clusters [20], a machine learning method called “hierarchical clustering” was then applied to all 13,915 words processed by the Google Trends strategy in terms of their profits and emotional meanings along three dimensions. The Pearson correlation test was then carried out within each cluster and between clusters.
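
A minimal sketch of agglomerative (hierarchical) clustering is given below. The linkage criterion and distance measure are not stated in the text, so average linkage with Euclidean distance is assumed here purely for illustration; the function name and the four toy rows (valence, arousal, dominance, cumulative profit) are hypothetical.

```python
import math


def average_linkage(points, n_clusters):
    """Naive agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Mean Euclidean distance over all cross-cluster pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair; j > i, so deleting j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical rows: (valence, arousal, dominance, cumulative profit).
points = [
    (7.5, 5.8, 6.4, 1.67),
    (7.3, 5.6, 6.2, 1.65),
    (2.1, 6.9, 3.0, 1.20),
    (2.3, 7.1, 3.2, 1.22),
]

clusters = average_linkage(points, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

This naive version recomputes all pairwise distances at every merge, so it only suits small examples; for thousands of words a library implementation would be the practical choice, and the features would probably need to be put on a common scale first.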

3. Results

3.1. Predicting House Price

Figure 1 takes the word “bankruptcy” as an example of a search indicator to show the advantage of the Google Trends strategy over the Buy & Hold strategy. Plotted as a function of time over the 131 months, the green line, which stands for the cumulative profits gained through the Google Trends strategy, rose above the red line for the Buy & Hold strategy. The cumulative profit made through the Google Trends strategy was as high as 152.42%, while the profit through the Buy & Hold strategy, that is, the profit from the rise in house price over the 131 months, was only 112.39%. In the same way, cumulative profits were calculated for the 10,341 of the 13,915 English words that passed the Pearson correlation test.

3.2. Emotional Meaning in Prediction

The cumulative profits and the emotional scores along three dimensions of the top words are presented in Table 1.

As shown in Table 1, the word “success” was the highest (167.04%) in cumulative profits, with its valence, arousal, and dominance scores at 7.49, 5.8, and 6.38, respectively. “Transaction” was the lowest (145.6%), with the scores along three dimensions at 5.26, 3.73, and 5.58.

Based on the 10,341 words, the Pearson correlation test was carried out between their cumulative profits and their emotional scores along three dimensions, but no strong correlation was found (p values of 0.84, 0.77, and 0.31 for the three dimensions).

To further explore the complex relationship between the emotional meaning of the words and their cumulative profits, the words were grouped according to their standard errors by an unsupervised machine learning method called “hierarchical clustering.” After 15 of the 13,915 words were deleted for showing no significant profits through the Google Trends strategy, the remaining 13,900 words were grouped into 26 clusters with 500 words in each cluster.

3.2.1. Prediction within a Cluster

Within each cluster, multiple regression analysis was carried out with cumulative profits as the independent variable and emotional scores along three dimensions as the three dependent variables. This time, a strong correlation within each cluster was found for all 26 clusters along three dimensions, that is, valence (c = 0.143, df = 498), arousal (c = 0.183, df = 498), and dominance (c = 0.305, df = 498). Taking the arousal scores of the 26th cluster as a case, Figure 2 plots the high correlations between the emotional scores of the 500 words and their cumulative profits.

3.2.2. Prediction between Clusters

Another Pearson correlation test was adopted to analyze the relationship between the 26 clusters based on the mean emotional scores of the 500 words within each cluster. As indicated in Figure 3, the cumulative profits of each cluster (plotted as the size of the red balls in Figure 3) were significantly related to their mean scores along two dimensions, that is, arousal (c = −0.94, df = 24) and dominance (c = −0.941, df = 24), but not along valence (c = −0.47, df = 24). As Figure 3 shows, the mean valence scores of the clusters varied only within a narrow range (between 3.93 and 4.91). The results indicate the following: (1) For the valence dimension, the mean scores of the clusters were stable, and the variation between clusters did not affect the cumulative profits. In other words, if the words are analyzed in terms of emotional meaning, their valence cannot be used for predicting the profits. (2) For the arousal dimension, the mean scores of the clusters were negatively correlated with the cumulative profits; that is, a cluster with a high mean arousal score tends to have a low cumulative profit. (3) The dominance scores showed a similar pattern of correlation with cumulative profits, and a cluster with a low mean dominance score would most probably indicate high cumulative profits.
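
The between-cluster test can be illustrated with the sketch below, which also shows where df = 24 comes from: with 26 cluster means, a Pearson test has n − 2 degrees of freedom. The function name and the toy cluster means are hypothetical and do not reproduce the study's values.

```python
import math
from statistics import mean


def pearson_with_t(x, y):
    """Return (r, df, t) for a Pearson correlation test on paired data."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    # t statistic for H0: no linear correlation (requires |r| < 1).
    t = r * math.sqrt(df / (1.0 - r * r))
    return r, df, t


# Hypothetical data: one mean arousal score and one profit per cluster.
arousal_means = [3.0 + 0.1 * k for k in range(26)]
cluster_profits = [
    1.60 - 0.01 * k + (0.005 if k % 2 else -0.005) for k in range(26)
]

r, df, t = pearson_with_t(arousal_means, cluster_profits)
print(f"r = {r:.3f}, df = {df}, t = {t:.2f}")
```

A strongly negative r on cluster means, as in this toy case, corresponds to the pattern reported for arousal and dominance: clusters with higher mean scores have lower cumulative profits.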

4. Discussion

In this study, as many as 13,915 English words were involved as search contents, and 10,341 of them were proved to be highly correlated with the cumulative profits through the Google Trends strategy. Therefore, our findings accord with previous studies on both the confirming [2, 3] and the predicting roles of search behavior [6–10, 13–15, 21], but those studies used much smaller sets of search contents, which makes them less convincing than a sufficiently large one.

As for the role of the emotional meaning of the search contents in prediction, the findings in this study were as follows. In general, the emotional meanings of the search contents were not significantly correlated with the cumulative profits along the three emotional dimensions. This seems to rule out using the emotional meaning of the search contents to predict the house price. However, if the search contents were clustered based on the interaction between the emotional meanings along three dimensions and the cumulative profits, the emotional meanings of the search contents could be used to predict the profits. Within a cluster, the emotional meanings of the search contents were positively correlated with cumulative profits along all three emotional dimensions. Between clusters, they were negatively correlated along the arousal and dominance dimensions but not along the valence dimension. These complicated findings might be ascribed to the distributive features of emotional meanings within a space of three dimensions.

First, along a single dimension, the measurements of the three emotional dimensions were distributed in different ways. Valence was distributed along a bipolar dimension: a negative pole and a positive pole were located at the two ends of the scale, while a neutral pole was located at the scale’s median value [22]. Arousal and dominance, by contrast, were distributed along a unipolar dimension.

Second, if the emotional meanings were observed at the interface between any two of the three emotional dimensions, their measurements may interact. For example, the measurements along any two of the three dimensions can be described by a quadratic boomerang-like function, which reflects that words with the lowest and the highest valence scores are perceived to be the most arousing [23]. Moreover, interaction might also occur between the interfaces along two different dimensions; for example, valence may emerge as the result of arousal and dominance, and the level of valence might also be influenced by the interaction between arousal and dominance [24].
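
As a hedged illustration of the boomerang-like relationship described above, arousal could be written as a quadratic function of valence. The symbols A (arousal), V (valence), and the coefficients are notation introduced here for illustration only; they are not taken from the cited study [23].

```latex
\[
  A(V) = \beta_0 + \beta_1 V + \beta_2 V^2, \qquad \beta_2 > 0,
\]
\[
  V^{*} = -\frac{\beta_1}{2\beta_2} \quad \text{(valence at which arousal is lowest)}.
\]
```

With a positive quadratic coefficient, arousal is lowest at the intermediate valence value and rises toward both the negative and the positive ends of the valence scale, which matches the pattern that very unpleasant and very pleasant words are the most arousing.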

The differences in the distribution of the emotional meanings along a single dimension and the interaction along two dimensions might be the reasons why the arousal and dominance dimensions rather than the valence dimension could be used to predict the cumulative profits based on the analysis of emotional meanings between clusters.

Third, within a space of three emotional dimensions, emotional words tended to gather in clusters [20]. The emotional meanings within each cluster shared identical features in emotional measurements. For example, words that refer to anger were generally low in valence and high in arousal and dominance. Similarly, the emotional meanings of neighboring clusters had emotional measurements in common: Montefinese et al. found that Italian words referring to fear, sadness, and despair were generally located in a similar region of the emotional space, with low valence and dominance and high arousal [22].

The identical features within each cluster and similarity between neighboring clusters in emotional measurements might be the reasons why the emotional meanings were found to be correlated with the profits among all the three dimensions within a cluster and between neighboring clusters.

With regard to the role of emotional meanings in predicting the changes of house price, the stimulus-organism-response (S-O-R) model developed by Mehrabian and Russell could be used as the theoretical framework for justifying the correlation between emotional meanings of the search contents and the cumulative profits in this study [25]. The S-O-R model derived from environmental psychology may offer another perspective from which to view search behavior on the Internet, with emotions in particular [26]. As posited by Mehrabian and Russell, Stimuli in the environment affected internal emotions of Organism, which in turn evoked behavioral Responses. The emotions initially identified by Mehrabian and Russell in the S-O-R model were pleasure, arousal, and dominance [27], which are exactly the three dimensions examined in this study. Accordingly, Stimulus is the house price in this study; Response is the search contents used by the people (Organism) who care about and are interested in house prices on the Internet. Within the S-O-R Model, emotional meanings belong to the distinctive features directly binding with the search contents. However, although Dietzel et al. once examined the role of emotions (they used the term “sentiment”) in predicting the changes of house price, the indicators they used were not as direct as those in our study [16, 17]. Tsolacos employed the economic sentiment indicator (ESI), which was to show how people feel about the market or economy and to quantify how current beliefs and positions affect future behavior in a graphical or numerical index [17]. In a more indirect way, Dietzel et al. regarded the Internet search volume as an appropriate indicator for emotion [16].

5. Conclusion

This study confirmed that the search volume was highly correlated with the house price and could be employed for predicting the house price. When taking emotional meanings of search contents into consideration for predicting the house price, although the profits based on prediction were not correlated with their emotional meanings as a whole along three dimensions, the profits were positively correlated with emotional meaning along valence, arousal, and dominance dimensions within each cluster and were negatively correlated between clusters along both arousal and dominance dimensions.

To the best of our knowledge, this study is the first to confirm the correlation between search volume and house price and to verify the good predictive ability with a large set of search contents. This study is also the first to explore the role of emotional meanings in predicting changes in house prices. However, several limitations are evident. First, although the large set of search contents is one of the original contributions compared with previous studies, 13,915 words are not enough to represent the 100,000 often-used words in English. Second, the search contents were analyzed only in terms of words rather than phrases or sentences. Findings based on words might not fully represent the role of the emotional meanings of search contents on the Internet in predicting changes in house prices. Thus, in further studies in this domain, the set of search contents should be enlarged and should include phrases and sentences besides words.

Data Availability

The data used to support the findings of this study are available upon request.

Disclosure

This paper has been posted in SSRN Electronic Journal (not a publication journal) to solicit further suggestions (Ni et al., 2019).

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

Du Ni, Xingzhi Li, and Zhi Xiao designed the study. Du Ni collected and analyzed the data. Du Ni and Zhi Xiao interpreted the result and wrote the manuscript. Ke Gong helped with the revision. All authors gave final approval for publication.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (Grant nos. 71671019, 72071021, and 71871034).