Document Type: Original Article
Authors
- Babak Sohrabi 1
- Saeed Rouhani 2
- Hamid Reza Yazdani 3
- Ahmad Khalili Jafarabad 4
- Mahsima Kazemi Movahed 5
1 Prof., Department of Information Technology Management, Faculty of Management, University of Tehran, Tehran, Iran.
2 Associate Prof., Department of Information Technology Management, College of Management, University of Tehran, Tehran, Iran.
3 Associate Prof., Department of Business Management, Farabi College, University of Tehran, Tehran, Iran.
4 Ph.D., Department of Information Technology Management, Faculty of Management, University of Tehran, Tehran, Iran.
5 Ph.D. Candidate, Department of Information Technology Management, Faculty of Management, University of Tehran, Tehran, Iran.
Abstract
Two predominant methods for analyzing financial markets have been technical and fundamental analysis. However, the emergence of the Internet has altered the trading landscape. Access to the Internet and social media helps reduce information asymmetry, enabling investors to make better-informed decisions. Social media has become a source of information for investors: through diverse communication channels, investors articulate their views on whether to buy or sell a stock. According to Surowiecki, the collective opinions gathered through social media frequently offer better predictions than individual opinions, a phenomenon referred to as the wisdom of the crowd. The wisdom of the crowd is an essential feature of social networks, with the potential to reduce errors and lower information-gathering costs. In this study, we evaluated the potential of the wisdom of the crowd to improve the accuracy of stock price prediction. To this end, we developed a Long Short-Term Memory (LSTM) prediction model based on the wisdom of the crowd. Users' opinions in Persian about Tehran Stock Exchange (TSE) stocks were collected from SAHMETO over eight months, and a Support Vector Machine (SVM) classified them into buy, sell, and neutral classes. During the research period, users mentioned 823 stocks, and the 52 stocks with at least 100 signals were chosen. The results show that the proposed model achieved an acceptable level of accuracy, with correlations between actual and predicted values exceeding 90%; however, its accuracy metrics did not improve on those of the base model.
Keywords
Introduction
Prediction is a prominent topic in finance and business (Adebiyi et al., 2012) and has acquired remarkable importance among expert analysts and financiers. Stock prices play an essential part in the dynamics of economic activity and can signal or affect social mood (Soni, 2011). Price is critical information for investors (Nikou et al., 2019), yet predicting it is complicated by the stock market's dynamic, nonlinear, and turbulent nature (Geng et al., 2022). Stock prices are affected by various factors, such as politics, economics, and investor sentiment, and there is no fundamental rule for estimating and predicting prices in the stock market (Shahverdiani & Khajehzadeh, 2018). Investors need accurate and fast information to make effective decisions (Wang et al., 2018), and the Internet has changed the way trading takes place (Geng et al., 2022). The Internet provides instant access to data sources, articles, and statistics related to companies (Coyne et al., 2017). Moreover, it has created a unique system for collecting the independent opinions of people who interact online in chat rooms (Hamada et al., 2020). Habermas explained that the exchange of information and ideas in public spaces gives rise to public opinion. Social media offers a platform for voicing opinions and has become a popular hub for discussing financial markets (Rajabi & Khaloozadeh, 2020).
Social media has become a new data source for investors, helping them obtain more information for their decisions (Ding & Hou, 2015). Sharing opinions and votes on social networks manifests the wisdom of the crowd, which can be a key to profitable investment (Hill & Ready-Campbell, 2011). Under the wisdom of the crowd assumption, a large crowd can outperform smaller groups or a few individuals (Surowiecki, 2005) (Kumar et al., 2020). The most important features of social media are wisdom and collective awareness (Rahjerdi et al., 2022). In behavioral economics, price is a perceived value (Cristescu et al., 2022); it is therefore reasonable to explore the effect of crowd opinions on stock prices. According to behavioral economics, investors' decisions are not based entirely on objective information but are also influenced by their emotions. In fact, social interactions influence financial decisions (Gui, 2019). Collective opinions often predict better than individuals (Geng et al., 2022), and the wisdom of the crowd can reduce error and cost while increasing the amount of available data (Raie et al., 2016).
(Woolley et al., 2010) (Nofer & Hinz, 2014) (Velic et al., 2013) (Pan et al., 2012) (Zhang et al., 2011) (Hill & Ready-Campbell, 2011) (Saumya et al., 2016) (Eickhoff & Muntermann, 2016) (Al-Hasan, 2018) (Xu et al., 2017) (Sun et al., 2017) (Arnes & Copenhagen, 2014) (Reed, 2016) (Bari et al., 2019) (Wu et al., 2019) (T. Li et al., 2018) (Garcia-Lopez et al., 2018) (Hatefi et al., 2020) (Wu et al., 2020) (Breitmayer et al., 2019) (Chao et al., 2019) (Geng et al., 2022) studied the role of the wisdom of the crowd in the stock market and in forecasting performance. Their results showed that the wisdom of the crowd has the potential to increase the accuracy of stock market prediction models and improve their performance. In contrast, (Lorenz et al., 2011) (Antweiler & Frank, 2004) (Tumarkin & Whitelaw, 2001) (Das & Chen, 2007) (Raie et al., 2016) (Dewally, 2000) found no relationship between the wisdom of the crowd and stock price prediction. What is clear is that, in addition to the history of prices, the stock market is also affected by the feelings of society and investors (Hatefi et al., 2020). However, using the wisdom of the crowd in social media to predict stock prices remains an open and challenging problem, and research on improving the effectiveness of prediction models continues. Technical and fundamental analysis have always been the two main approaches of stock market analysts (D. Shah et al., 2019). Technical analysis is based on historical financial data, whereas fundamental analysis draws on unstructured data such as news articles, financial reports, and analysts' reports (Vargas et al., 2017) and examines macroeconomic data and the financial health of corporations. Statistical and econometric methods are also common (Ji et al., 2021).
Forecasting models widely used in stock market prediction include the autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), generalized autoregressive conditional heteroskedasticity (GARCH), and smooth transition autoregressive (STAR) models (Rajabi & Khaloozadeh, 2020). However, these methods are not dynamic enough to deal with such time series (Ji et al., 2021). Many machine learning approaches have evolved into essential analytical tools in financial markets (Nikou et al., 2019). They can handle noisy, nonlinear, and uncertain data (Rajabi & Khaloozadeh, 2020) and are therefore often more efficient and achieve better results than statistical methods (Nikou et al., 2019). The main problem is that machine learning performance depends on the representation of the data the models are given. As the feature space grows, training time increases and model outputs become harder to interpret, while dimensionality reduction can itself harm performance (Masalegou et al., 2022). In recent studies, deep learning models have been described as a powerful alternative to machine learning approaches. Deep learning algorithms are considered the best choice for nonlinear learning and feature extraction from big data (Rajabi & Khaloozadeh, 2020). High accuracy, generalization power, and the ability to identify new data are the most critical features of deep learning (Nikou et al., 2019). Sentiment analysis is also one of the vital ways to predict the stock market, and social media plays an essential role in sentiment analysis (Mukherjee et al., 2021). The potential of social media content and text mining techniques still deserves more attention and effort. This research explored whether the wisdom of the crowd on SAHMETO can increase the accuracy of a prediction model. SAHMETO is a social platform on which people share their opinions in Persian about the Tehran Stock Exchange (TSE).
The structure of the present article is as follows. First, we briefly review the wisdom of the crowd and its use in stock market prediction. Next, we introduce the methods and models used. Then, the results of the research are presented. Finally, we discuss the results and conclude.
Literature Review
Internet availability is a phenomenon of our time that is changing the world, and social media is a common platform on the Internet. In the last few years, stock trading has been the subject of heated debate as investors search for tools and techniques to increase profit and reduce risk. Deriving profits from stock trading is a vital motivation for predicting the stock market (Rouf et al., 2021). Previous research found that social media can explain market trends (Eickhoff & Muntermann, 2016). Therefore, social media content has become an important data source for measuring the sentiment of society and investors (Hatefi et al., 2020). In an efficient market, prices reflect all available information; therefore, more information increases prediction accuracy (Koen, 2014), and an investor can predict the variation of an asset more accurately.
The efficient market theory rests on the concept of the wisdom of the crowd: pooling individuals' opinions and knowledge is more effective than relying on a single expert (Kristjanpoller et al., 2021). The wisdom of the crowd is one of the essential characteristics of social networks (Mukherjee et al., 2021). In 2004, James Surowiecki wrote a well-known book, "The Wisdom of Crowds," which explores how large groups of people can exhibit superior intelligence compared to an individual (Koen, 2014). He concludes that "large groups of untrained people are better at decision-making than a few elites in that domain." The wisdom of the crowd is about how to judge correctly. It is not just an analytical decision, a compromise, a vote, or a win-win situation; it promotes dialogue and a shared understanding of right action and purpose (Eickhoff & Muntermann, 2016).
Wisdom is not about a few wise people but about the ability of human communities to make intelligent choices. Diversity, independence, and decentralization are essential factors that enable the wisdom of the crowd to emerge. The core idea of the wisdom of the crowd is group diversity: although experts possess greater knowledge, a blend of informed and uninformed people can outperform them. Surowiecki argues that independence acts as a counterbalance to herding behavior (Eickhoff & Muntermann, 2016), although in reality preserving independent opinions is challenging (Mavrodiev & Schweitzer, 2021). Coordinating decisions is easier in a centralized group (Eickhoff & Muntermann, 2016).
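The role of diversity can be made concrete with a standard algebraic identity (often called the diversity prediction theorem): the squared error of the crowd's average estimate equals the average squared error of the individuals minus the variance (diversity) of their estimates. The NumPy sketch below checks this on a simulated crowd; the true value, noise level, and crowd size are hypothetical and are not taken from SAHMETO data.

```python
import numpy as np


def crowd_decomposition(estimates, truth):
    """Split the crowd's squared error into average individual error minus diversity."""
    x = np.asarray(estimates, dtype=float)
    crowd_estimate = x.mean()                          # aggregate by simple averaging
    crowd_error = (crowd_estimate - truth) ** 2        # squared error of the crowd average
    avg_individual_error = np.mean((x - truth) ** 2)   # mean squared error of the individuals
    diversity = np.mean((x - crowd_estimate) ** 2)     # spread of opinions around the crowd mean
    return crowd_error, avg_individual_error, diversity


rng = np.random.default_rng(seed=0)
true_price = 100.0
# Hypothetical crowd: 50 independent guesses with individual noise (std 10)
guesses = true_price + rng.normal(loc=0.0, scale=10.0, size=50)

crowd_err, indiv_err, div = crowd_decomposition(guesses, true_price)
print(f"crowd error:              {crowd_err:8.2f}")
print(f"average individual error: {indiv_err:8.2f}")
print(f"diversity:                {div:8.2f}")
```

Because diversity is never negative, the averaged crowd is never worse than its average member, and its advantage is exactly the diversity term; when opinions herd together, diversity shrinks and so does this advantage.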
Studies have indicated that the wisdom of the crowd enhances prediction accuracy. (Hill & Ready-Campbell, 2011) found that the online crowd outperforms the S&P 500 on average and that weighting specialists' judgments more heavily resulted in better accuracy and higher returns. (Pan et al., 2012) demonstrated that the wisdom of the crowd outperforms individual trading decisions and noted that social influence plays a significant role in trading, chiefly in decisions made under uncertainty. (Nofer & Hinz, 2014) reported that the wisdom of the crowd phenomenon can be observed on the Internet; predictions based on the wisdom of the crowd yielded returns 0.59 percent higher than those of analysts. They acknowledged that user-generated content boosts market efficiency and overall welfare, and concluded that the crowd's performance improves with greater independence.
(Eickhoff & Muntermann, 2016) argued that the wisdom of the crowd surpasses experts when information is easily accessible and discovered that online financial forums could serve as a potential data source for forecasting stock market changes. Likewise, (Reed, 2016) concluded that the volume of conversation about economic matters moves daily stock market prices and has a substantial adverse effect. He also found that individuals discuss negative market events more than positive ones; thus, growth in conversation about economic indicators leads individuals to sell off their stocks. (Sun et al., 2017) demonstrated a strong correlation and Granger causality between chat-room sentiment and stock price movements in China, and showed that sentiment analysis can enhance the prediction of stock returns. (Xu et al., 2017) reported that fine-grained sentiments contain more information about the stock market than polarity sentiments. (X. Li et al., 2019) proposed a new text-based crude oil price forecasting method using deep learning, sentiment analysis, and topic extraction; the approach achieved the desired accuracy, and combining news text with financial market information significantly enhanced forecasting accuracy. (Wu et al., 2019), (Bari et al., 2019), and (Chau et al., 2020) also showed that the wisdom of the crowd and sentiment analysis are effective in predicting stock prices and returns. The wisdom of the crowd has both positive (wisdom) and negative (irrational) effects on the search process; (Geng et al., 2022) found that the crowd's opinions increase the positive impact of internet searches on returns.
(Khanigodarzi et al., 2019) conducted survey research by distributing a questionnaire among 151 investors. According to the results, negative words in the written media influence investors' emotions; just as these emotions affect the market index, the market index in turn affects investors' behavior. (Majd, 2019) provided a model based on text mining and machine learning to evaluate the impact of social media analysts' ideas on predicting stock price trends in Iran's capital market from April 2017 to May 2019. He investigated four symbols, and the hybrid model had better accuracy for one-day predictions but not for long-term ones. (Hatefi Ghahfarrokhi & Shamsfard, 2020) gathered users' comments on three stocks from SAHAMYAB over three months and demonstrated that the volume and sentiment of the content could help predict daily returns. (Yaghoobi, 2020) studied the impact of social media on people's financial decisions, investigating twenty symbols over three months; the results show that adding social media data to the predictive model decreases error rates. Similarly, (Roast et al., 2020) concluded that considering sentiment analysis increases the accuracy of the results. (Mansour et al., 2021) used the data of Nguyen et al. (2015) and improved stock prediction accuracy by 19.8% using the sentiment information of people on social media. (Ebrahimian et al., 2021) devised a model that forecasts stock movement with 72.08 percent accuracy. They predicted the price trends of 14 stocks by analyzing the sentiment of users' ideas and combining it with 20 technical indicators, using decision tree, Naïve Bayes, and support vector machine classifiers. The findings indicate a significant correlation between the next day's trading volume and the number of opinions.
(Memarzadeh et al., 2022) collected data on Cisco and Intel from Twitter and Yahoo Finance over three months and found that emotion indices effectively predicted the trend of stock market value with the least error.
Most studies of the wisdom of the crowd in social media and its effect on the stock market have been conducted on content in English or translated into English, and information may be lost when messages are translated from other languages into English. In this article, we used SAHMETO content in Persian. Previous studies mainly used sample stocks chosen by the researchers, usually the most-watched stocks or stocks on the US market; this research selected Tehran Stock Exchange stocks based on the number of aggregated comments. In most previous research, prediction is framed as classification, with stocks assigned to buy and sell classes; in this research, regression is used to predict the stock price as a numerical value. Previous research mostly used Twitter as the social media platform, whereas Twitter is not prominent among Iranian investors. In this article, we used SAHMETO, a social platform on which Iranians share their opinions in Persian about the Tehran Stock Exchange. Survey methods and statistical analyses have been the conventional methods of previous domestic research, and studies that used more advanced methods such as machine learning took only one or two stocks as samples rather than a broad set of Tehran Stock Exchange stocks. Therefore, in this research, we explored the effect of the wisdom of the crowd on SAHMETO, as a social media platform, on the accuracy of a prediction model.
Research Methodology
This research is applied in terms of purpose. Because of hardware limitations on the SAHMETO servers, the maximum available period was eight months. To extract the wisdom of the crowd on SAHMETO, users' opinions were therefore gathered from 10 January 2022 to 15 August 2022, yielding 177,444 records. After duplicate and null records were removed, 151,274 records remained. Preprocessing is a key factor in improving classification accuracy; it includes tokenizing, stemming, and removing stop words. Tokenization splits sentences into words (Khedr et al., 2017), stemming transforms a word into its root form, and stop words are common words, such as conjunctions, that are unnecessary for distinguishing between two documents (Aqlan et al., 2019). The next stage is extracting the feature sets for training the classifier. For feature extraction, we used the sentiment score in Equation (1). Each text item, in our case a SAHMETO message, was categorized into one of three classes: negative, positive, or neutral. The sentiment polarity of each message was determined using the following measure (Geng et al., 2022)(Gupta et al., 2019). Because no sentiment dictionary or dedicated library exists for analyzing financial market sentiment in Persian, we built a sentiment dictionary from the data with expert help to achieve higher classification precision.
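$$S_{i,t} = \frac{N^{buy}_{i,t} - N^{sell}_{i,t}}{N^{buy}_{i,t} + N^{sell}_{i,t}} \qquad (1)$$

where $N^{buy}_{i,t}$ is the total number of words related to buying stock $i$ on day $t$ and $N^{sell}_{i,t}$ is the number of words in the messages related to selling stock $i$ on day $t$. The sentiment class is represented by a separate categorical variable, $C$, which can take three distinct values depending on the threshold $\theta$ (Gupta et al., 2019):

$$C_{i,t} = \begin{cases} 1 \ (\text{buy}), & S_{i,t} > \theta \\ 0 \ (\text{neutral}), & -\theta \le S_{i,t} \le \theta \\ -1 \ (\text{sell}), & S_{i,t} < -\theta \end{cases} \qquad (2)$$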
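The preprocessing steps and the scoring rule can be combined into a small pipeline. This is a minimal plain-Python sketch under several assumptions: the stop-word, suffix, and buy/sell word lists are tiny hypothetical stand-ins for the authors' expert-built Persian dictionary, the suffix stripper is a crude substitute for a real Persian stemmer, and the score is assumed to be the normalized difference between buy-word and sell-word counts with a symmetric threshold `theta` (the value 0.2 is arbitrary). It scores a single message; summing the counts over all messages about stock i on day t would give a stock-day score.

```python
import re

# --- Hypothetical resources (NOT the authors' expert-built dictionary) ---
STOP_WORDS = {"و", "را", "از", "به", "که", "این", "در"}
SUFFIXES = ("ترین", "ها", "تر")          # checked longest first
BUY_WORDS = {"بخرید", "خرید", "صعودی"}
SELL_WORDS = {"بفروشید", "فروش", "نزولی"}


def tokenize(text):
    """Split a message on runs of non-word characters (Unicode-aware)."""
    return [tok for tok in re.split(r"\W+", text) if tok]


def stem(token):
    """Crude suffix stripping, standing in for a real Persian stemmer."""
    for suffix in SUFFIXES:
        # Keep at least two characters so short words are not destroyed
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [stem(tok) for tok in tokenize(text) if tok not in STOP_WORDS]


def sentiment_score(tokens):
    """Normalized difference of buy- and sell-word counts, in [-1, 1]."""
    n_buy = sum(tok in BUY_WORDS for tok in tokens)
    n_sell = sum(tok in SELL_WORDS for tok in tokens)
    if n_buy + n_sell == 0:
        return 0.0                        # no sentiment-bearing words
    return (n_buy - n_sell) / (n_buy + n_sell)


def sentiment_class(score, theta=0.2):
    """Map a score to buy (1), neutral (0), or sell (-1) with a symmetric threshold."""
    if score > theta:
        return 1
    if score < -theta:
        return -1
    return 0


message = "سهمها را بخرید و نگه دارید"   # "buy the shares and hold them"
tokens = preprocess(message)
score = sentiment_score(tokens)
print(tokens, score, sentiment_class(score))
```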
In our work, we used a supervised learning algorithm, the Support Vector Machine (SVM), which is widely used as a machine learning classifier (Oyland, 2015). We first trained the classifier using the sentiment score and then used the trained classifier to predict the test data. In addition, experts tagged messages in order to check the SVM results. According to (Seif et al., 2021), SVM reduces risk and has strong generalization capabilities. SVM locates a hyperplane in an N-dimensional space, where N is the number of features, that clearly separates the data points. Many hyperplanes could separate two classes of data points; the objective of SVM is to find the one with the maximum margin. A larger margin provides additional reinforcement, so that future data points can be classified with more confidence (Pisner & Schnyer, 2019). Hyperplanes serve as decision boundaries: data points falling on either side of the hyperplane are assigned to different classes, and the dimension of the hyperplane depends on the number of features. Let the training points be $(x_i, y_i)$, $i = 1, \dots, J$, where $x_i \in \mathbb{R}^N$ is the input vector and $y_i \in \{-1, 1\}$ is the class label. If the data are linearly separable, the two classes can be separated by an optimal hyperplane, and the decision rule is given by Equation (3).
$$y = \operatorname{sign}\left(\sum_{i=1}^{J} y_i \alpha_i \, (x \cdot x_i) + b\right) \qquad (3)$$
where $y$ is the output, $y_i$ is the class value of training sample $i$, and $\alpha_i$ is the internal coefficient (Lagrange multiplier). The vector $x$ is the input data, and the training vectors $x_i$ with nonzero $\alpha_i$ are the support vectors. In Equation (3), the parameters $b$ and $\alpha_i$ determine the separating hyperplane. If the data cannot be separated linearly, Equation (3) becomes Equation (4).
$$y = \operatorname{sign}\left(\sum_{i=1}^{J} y_i \alpha_i \, K(x, x_i) + b\right) \qquad (4)$$
The function $K(x, x_i)$ is a kernel function that computes inner products in a feature space, producing a machine with nonlinear decision boundaries in the data space. Support vector machine models, including the regression variant, employ a variety of kernels, including linear, polynomial, Gaussian radial basis function (RBF), and sigmoid kernels (Salehi & Aminifard, 2013). Multiclass problems are transformed into several binary classification problems, with the goal of mapping the data points into a high-dimensional space in which each pair of classes is linearly separable. The one-vs-one strategy reduces the multiclass problem to a set of binary classification problems, with one binary classifier for each pair of classes. Another technique, one-vs-rest, trains one binary classifier for each class, separating it from all remaining classes (Anzid et al., 2019). The Python library scikit-learn, which is extensively employed for implementing machine learning algorithms, provides an SVM implementation. Overall, 823 TSE stocks were mentioned with buy, sell, or neutral signals during the study period. Here, a signal means a user's suggestion to buy or sell a stock, or a neutral opinion about it. In total, there were 36,824 messages: 10,107 positive (buy), 807 negative (sell), and 25,910 neutral; as Figure 1 shows, neutral signals far outnumber buy and sell signals.
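A minimal scikit-learn sketch of the three-class SVM follows. The two features per message (buy-word and sell-word counts) and the twelve training points are hypothetical; the paper does not report its exact feature vector or kernel. `SVC` trains one binary classifier per pair of classes (the one-vs-one strategy) and combines them by voting.

```python
from sklearn.svm import SVC

# Hypothetical training data: [buy-word count, sell-word count] per message
X_train = [
    [5, 0], [4, 1], [6, 0], [3, 0],   # buy
    [0, 5], [1, 4], [0, 6], [0, 3],   # sell
    [0, 0], [1, 1], [2, 2], [3, 3],   # neutral
]
y_train = ["buy"] * 4 + ["sell"] * 4 + ["neutral"] * 4

# A linear kernel corresponds to the linearly separable decision rule;
# SVC internally fits one binary SVM for each pair of the three classes.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)

X_new = [[7, 0], [0, 7], [1, 1]]
print(clf.predict(X_new))
```

Replacing `kernel="linear"` with `"rbf"`, `"poly"`, or `"sigmoid"` switches to the kernelized, nonlinear form of the decision rule.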
Figure 1. Signals distribution
The research sample was selected purposefully as the stocks with at least 100 signals. Therefore, the 52 TSE stocks with at least 100 signals in the research period were selected as the research sample; they are shown in Table 1.
Table 1. Research sample TSE stocks
Row | Ticker | Buy | Sell | Neutral | Total
1 | KHODRO | 183 | 9 | 1153 | 1345
2 | KHSAPA | 198 | 4 | 997 | 1199
3 | KHGOSTAR | 120 | 9 | 664 | 793
4 | SHAPNA | 97 | 5 | 531 | 633
5 | SHATRAN | 64 | 10 | 378 | 452
6 | VATEJARAJ | 108 | 5 | 329 | 442
7 | KHZAMYA | 98 | 11 | 326 | 435
8 | VABEMELAT | 77 | 6 | 343 | 426
9 | VAJAMI | 132 | 1 | 281 | 441
10 | VAKHARAZM | 148 | 4 | 198 | 350
11 | FARABOURCE | 111 | 1 | 215 | 327
12 | VARNA | 47 | 5 | 274 | 326
13 | VASAPA | 44 | 3 | 265 | 312
14 | ENERGY3 | 85 | 2 | 149 | 236
15 | PALAYESH | 52 | 1 | 156 | 209
16 | LEPARS | 81 | 6 | 118 | 205
17 | VBESADER | 49 | 2 | 141 | 192
18 | HEKESHTI | 36 | 3 | 152 | 191
19 | THFARS | 52 | 8 | 120 | 180
20 | SHABRIZ | 19 | 3 | 140 | 162
21 | FMORAD | 27 | 7 | 126 | 160
22 | DARAYEKOM | 40 | 2 | 106 | 148
23 | KHKAVEH | 23 | 2 | 117 | 142
24 | KHCHARKHESH | 21 | 1 | 118 | 140
25 | FOREVER | 33 | 0 | 107 | 140
26 | GHNOOSH | 44 | 8 | 84 | 136
27 | TOPKISH | 44 | 1 | 89 | 134
28 | KMARJAN | 41 | 0 | 91 | 132
29 | SMEGA | 61 | 5 | 64 | 130
30 | BARAKAT | 38 | 2 | 90 | 130
31 | FAZAR | 24 | 1 | 100 | 125
32 | KIAMA | 43 | 4 | 77 | 124
33 | FKHOZ | 41 | 1 | 82 | 124
34 | SIMORGH | 51 | 2 | 71 | 124
35 | KERMAN | 42 | 5 | 75 | 122
36 | KHTERAK | 42 | 2 | 88 | 122
37 | FOLEY | 43 | 1 | 78 | 122
38 | KHTOGHA | 51 | 2 | 65 | 118
39 | KHMOTOR | 30 | 0 | 88 | 118
40 | NOORI | 25 | 10 | 83 | 118
41 | BOURCE | 45 | 3 | 69 | 117
42 | KAMA | 37 | 4 | 75 | 116
43 | FLOULE | 22 | 2 | 89 | 113
44 | THAKHT | 45 | 0 | 67 | 112
45 | ZDASHT | 32 | 2 | 75 | 108
46 | GHTHABET | 32 | 3 | 70 | 105
47 | THABAD | 41 | 2 | 61 | 104
48 | SIDEBAR | 37 | 4 | 62 | 103
49 | MODIRIYAT | 17 | 4 | 80 | 101
50 | VAZAR | 18 | 7 | 75 | 100
51 | VBARGH | 32 | 4 | 64 | 100
52 | SNIR | 32 | 1 | 67 | 100
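The per-class counts in Table 1 and the sampling rule (keep stocks with at least 100 signals; the smallest totals in Table 1 are exactly 100) can be derived from a message-level table with pandas. This is a sketch only: the column names `ticker` and `label` and the toy rows are hypothetical, since the paper does not describe its data schema.

```python
import pandas as pd


def signal_table(signals, min_signals=100):
    """Count buy/sell/neutral signals per ticker and keep tickers with enough signals."""
    table = pd.crosstab(signals["ticker"], signals["label"])
    # Guarantee all three classes appear as columns, even if one never occurs
    table = table.reindex(columns=["buy", "sell", "neutral"], fill_value=0)
    table["total"] = table.sum(axis=1)
    selected = table[table["total"] >= min_signals]
    return selected.sort_values("total", ascending=False)


# Hypothetical message-level data: one row per classified message
demo = pd.DataFrame({
    "ticker": ["A", "A", "A", "B", "B", "C"],
    "label": ["buy", "neutral", "sell", "buy", "buy", "neutral"],
})
print(signal_table(demo, min_signals=2))
```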
We drew on people's suggestions about TSE stocks to design a prediction model and to test whether using people's opinions as an indicator improves accuracy. As previous research has pointed out, deep learning models are a powerful and efficient approach to stock market prediction that can effectively process time-series and multi-period data (Rajabi & Khaloozadeh, 2020). Deep learning is data-driven, and increasing the depth of the network can improve its capability (Dastgerdi & Brojeni, 2019). Deep learning can capture richer semantic information (Ji et al., 2021), extract features, and minimize human intervention in feature selection (Yang et al., 2020). (Nabipour et al., 2020) (A. W. Li & Bastos, 2020) (Bhandari et al., 2022) (A. Shah et al., 2022) all reported that deep learning is the best approach for building prediction models of stock price patterns. Our paper therefore studies a stock price prediction method based on deep learning. Since the 1950s, machine learning, a subfield of artificial intelligence, has revolutionized multiple fields. Neural networks belong to machine learning, and deep learning emerged from this subfield (Alom et al., 2018). Deep learning uses a hierarchical structure with multiple hidden layers (Y. Li et al., 2019). Deep learning approaches are classified into three categories: supervised, semi-supervised, and unsupervised. Supervised learning employs labeled data, where the algorithm receives inputs and the corresponding outputs; Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), are supervised learning approaches. Semi-supervised learning is based on partially labeled datasets; examples include Deep Reinforcement Learning (DRL) and Generative Adversarial Networks (GAN). Unsupervised learning seeks hidden relationships and structures without labels; examples include Auto-Encoders (AE), Restricted Boltzmann Machines (RBM), and the recently developed GAN. Moreover, RNNs such as LSTM, as well as RL, are also used for unsupervised learning (Alom et al., 2018).
Research comparing deep learning algorithms showed that LSTM produced more accurate results than other algorithms (K. Chen et al., 2015) (Nelson et al., 2017)(Roondiwala et al., 2017)(Y. Chen et al., 2018) (Arora & M, 2019)(Eapen et al., 2019)(Nikou et al., 2019)(Nabipour et al., 2020)(Yadav et al., 2020)(Shen & Shafiq, 2020)(Hargreaves & Chen, 2020). Also, based on a systematic review, (A. et al., 2020) pointed out that 73.5 percent of recent studies used LSTM as their deep learning prediction algorithm. We therefore adopt LSTM in this paper to achieve more accurate price prediction. LSTM can be employed for time series forecasting because it can store memory and mitigate the vanishing gradient problem (Moghar & Hamiche, 2020). The LSTM network is a deep, recurrent neural network model. In recurrent networks, connections do not run in only one direction; neurons can also pass data to a previous layer or to the same layer (Nelson et al., 2017). LSTM can recall prior states and incorporate them into its predictions. The LSTM model has three gates, namely the input gate, forget gate, and output gate, which process information from the previous state and the current input to derive the next state (A. Shah et al., 2022). The forget gate removes information from the cell state, while the input and output gates determine, respectively, which information is added to the cell state and which is used as output (Hargreaves & Chen, 2020). Figure 2 portrays the flow of information at time t. Each gate applies an activation function to a weighted sum of its inputs.
Figure 2. The Architecture of the LSTM Network
The equations and calculations are presented below, where W denotes the weight matrix, b is the bias vector, σ represents the sigmoid function, and Tanh denotes the hyperbolic tangent function.
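$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \qquad (5)$$

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \qquad (6)$$

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \qquad (7)$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad (8)$$

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \qquad (9)$$

$$h_t = o_t \odot \tanh(C_t) \qquad (10)$$

Here $x_t$ is the input at time $t$, $h_{t-1}$ is the previous hidden state, $C_t$ is the cell state, $\tilde{C}_t$ is the candidate cell state, $f_t$, $i_t$, and $o_t$ are the forget, input, and output gate activations, and $\odot$ denotes element-wise multiplication.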
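To make the gate computations concrete, here is a NumPy sketch of a single step of the standard LSTM cell, with forget, input, and output gates acting on the concatenation of the previous hidden state and the current input. The weight layout, dictionary keys, and sizes are illustrative choices, not the authors' code; in practice a library implementation would be used.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W[g] has shape (hidden, hidden + inputs) and b[g] has shape (hidden,),
    for g in "f" (forget), "i" (input), "c" (candidate), "o" (output).
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate: what to drop from the cell state
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate: how much new content to write
    c_hat = np.tanh(W["c"] @ z + b["c"])      # candidate cell content
    c_t = f_t * c_prev + i_t * c_hat          # updated cell state
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate: what to expose
    h_t = o_t * np.tanh(c_t)                  # new hidden state
    return h_t, c_t


# Hypothetical sizes: 4 inputs per day (close price + buy/sell/neutral counts), 3 hidden units
rng = np.random.default_rng(0)
n_in, n_hidden = 4, 3
W = {g: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in)) for g in "fico"}
b = {g: np.zeros(n_hidden) for g in "fico"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):        # a 5-day input sequence
    h, c = lstm_step(x_t, h, c, W, b)
print(h)
```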
Sigmoid, Tanh, and ReLU are activation functions used in LSTM networks (Vargas et al., 2017), and common optimizers include SGD, Adagrad, AdaDelta, RMSprop, and Adam. The optimization algorithm updates the network's weights through iterative processing of the training data (Alom et al., 2018). According to (Jiang, 2021), Adam is the most widely used optimizer for stock market prediction. The best LSTM architecture depends on the number of neurons and the activation functions: too few neurons lead to higher error and may prevent the network from converging, whereas too many neurons lead to overfitting and high error (Alom et al., 2018). Determining suitable parameters for a prediction model is a major challenge, and most parameters are selected empirically. We tested various architectures to identify the model with the lowest loss value. Although ReLU is widely used in research studies, it increased the loss value in our experiments, so we selected the hyperbolic tangent (Tanh) activation function, which gave lower and acceptable loss values. The LSTM model had an input layer, two hidden layers, and a dense (output) layer, with 32 neurons in each hidden layer. The architecture of the LSTM model is shown in Figure 3.
Figure 3. The Architecture of the proposed LSTM model
A description of the LSTM model is given in Table 2, in which the number of epochs defines how many times the learning algorithm works through the entire dataset, and the batch size defines the number of samples propagated through the network before the weights are updated.
Table 2. LSTM model parameters
Parameter | Value
Cells of each layer | 32
Batch size | 1
Hidden layers | 2
Activation function | Tanh
Optimizer function | Adam
Dense layer | 1
Epochs | 100
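The configuration in Table 2 can be expressed as a Keras model, sketched below under assumptions: the paper does not name its deep learning framework, the look-back window length (`n_timesteps=5` here) is not reported, and "Dense layer: 1" is read as a single output unit predicting the normalized close price.

```python
import tensorflow as tf


def build_lstm(n_timesteps, n_features, units=32):
    """Two stacked tanh LSTM layers and a one-unit dense output, following Table 2."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_timesteps, n_features)),
        # The first LSTM layer returns the full sequence so the second can consume it
        tf.keras.layers.LSTM(units, activation="tanh", return_sequences=True),
        tf.keras.layers.LSTM(units, activation="tanh"),
        tf.keras.layers.Dense(1),          # next-day (normalized) close price
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


# M0 uses only the close price; M1 adds buy, sell, and neutral signal counts
m0 = build_lstm(n_timesteps=5, n_features=1)
m1 = build_lstm(n_timesteps=5, n_features=4)
```

Training would then follow the remaining Table 2 settings, for example `m1.fit(X_train, y_train, batch_size=1, epochs=100, validation_data=(X_val, y_val))`, where `X_train` has shape `(samples, n_timesteps, n_features)`; these array names are placeholders.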
The loss function was the Mean Squared Error (MSE) (Singh & Srivastava, 2017), Equation (11), which measures the difference between the algorithm's current output and the expected output.
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \qquad (11)$$
We fed the LSTM model with the buy, sell, and neutral signals of the 52 selected TSE stocks and their close prices obtained from www.tsetmc.com. To train the model, we divided the dataset into three parts: training data (70%), test data (15%), and validation data (15%); the details are shown in Table 3.
Table 3. Distribution of datasets
Dataset | Period | Trading days
Train data | 2022-01-10 to 2022-06-12 | 133
Test data | 2022-06-14 to 2022-07-15 | 33
Validation data | 2022-07-16 to 2022-08-15 | 30
Total | 2022-01-10 to 2022-08-15 | 196
Normalization was performed to eliminate the adverse effects of the different scales of each stock. In this research, min-max normalization, Equation (12), was used to rescale the data so that all values lie between zero and one (Hiransha et al., 2018).
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (12)$$
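The normalization, the chronological 70/15/15 split, and the conversion of daily data into LSTM input windows can be sketched with NumPy. Two choices here are assumptions rather than reported details: the minimum and maximum are computed on the training part only, so that no information from later days leaks into training (which means test and validation values can fall slightly outside [0, 1]), and the look-back window length `n_timesteps` is arbitrary.

```python
import numpy as np


def chronological_split(series, train_frac=0.70, test_frac=0.15):
    """Split time-ordered data into train, test, and validation parts (no shuffling)."""
    n = len(series)
    n_train = round(n * train_frac)
    n_test = round(n * test_frac)
    return series[:n_train], series[n_train:n_train + n_test], series[n_train + n_test:]


def min_max_fit(train):
    """Per-feature minimum and maximum, taken from the training part only."""
    return train.min(axis=0), train.max(axis=0)


def min_max_transform(x, x_min, x_max):
    """Min-max scaling; assumes every feature varies (x_max > x_min) in the training part."""
    return (x - x_min) / (x_max - x_min)


def make_windows(data, n_timesteps, target_col=0):
    """Turn a (days, features) array into LSTM inputs and next-day targets."""
    X = np.stack([data[t:t + n_timesteps] for t in range(len(data) - n_timesteps)])
    y = data[n_timesteps:, target_col]        # value of target_col on the following day
    return X, y


# Hypothetical 20 trading days: column 0 = close price, column 1 = daily buy-signal count
days = np.column_stack([np.linspace(100.0, 200.0, 20), np.arange(20) % 3])
train, test, val = chronological_split(days)
x_min, x_max = min_max_fit(train)
train_scaled = min_max_transform(train, x_min, x_max)
X, y = make_windows(train_scaled, n_timesteps=3)
print(X.shape, y.shape)   # (samples, n_timesteps, features) and (samples,)
```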
The model was evaluated on the test dataset using the Normalized Root Mean Square Error (nRMSE), Equation (13); the Mean Absolute Percentage Error (MAPE), Equation (14) (Nikou et al., 2019); the coefficient of determination (R2), Equation (15) (Vanani, 2013); and the correlation between actual and predicted values.
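$$\mathrm{nRMSE} = \frac{1}{y_{\max} - y_{\min}}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{13}$$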
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{14}$$
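$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{15}$$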
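The evaluation metrics can be sketched in plain Python as follows, with `y` holding the actual values and `y_hat` the predicted values. Because the normalization used for nRMSE is not stated in the text, the sketch assumes normalization by the range of the actual values (maximum minus minimum); MAPE is returned as a fraction rather than a percentage.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def nrmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    # Assumption: RMSE divided by the range of the actual values.
    return rmse(y, y_hat) / (max(y) - min(y))


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    # Returned as a fraction; multiply by 100 for a percentage. Actual values must be nonzero.
    return sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def pearson(y: Sequence[float], y_hat: Sequence[float]) -> float:
    mean_y, mean_p = sum(y) / len(y), sum(y_hat) / len(y_hat)
    cov = sum((a - mean_y) * (b - mean_p) for a, b in zip(y, y_hat))
    sd_y = math.sqrt(sum((a - mean_y) ** 2 for a in y))
    sd_p = math.sqrt(sum((b - mean_p) ** 2 for b in y_hat))
    return cov / (sd_y * sd_p)


y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.0, 2.0, 3.0, 5.0]
print(nrmse(y, y_hat), mape(y, y_hat), r2(y, y_hat), pearson(y, y_hat))

# A constant offset leaves the correlation at 1 but lowers R^2.
shifted = [v + 1.0 for v in y]
print(pearson(y, shifted), r2(y, shifted))
```

The last two lines show why a high correlation between actual and predicted values does not by itself imply accurate price levels: correlation ignores a constant bias, whereas R², nRMSE, and MAPE penalize it.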
In these formulas, $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of observations. The first LSTM model, M0 (the base model), is fed only the close prices of the 52 TSE stocks. The second LSTM model, M1, is fed the close price and the numbers of buy, sell, and neutral signals for each of the 52 stocks. The models are described in Table 4.
Table 4. Model description
Model | Features
M0 | close price
M1 | close price, number of buy signals, number of sell signals, number of neutral signals
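The difference between the two feature sets can be sketched as follows: classified posts are counted per trading day and appended to the close price for M1, while M0 uses the close price alone. The input format (a dict of close prices keyed by ISO date strings and a list of (day, label) pairs produced by the classifier) is hypothetical, and the sketch ignores signals posted on days without a close price, since the text does not say how such posts were handled.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

LABELS = ("buy", "sell", "neutral")


def build_features(close_prices: Dict[str, float],
                   signals: Iterable[Tuple[str, str]],
                   with_crowd: bool) -> List[List[float]]:
    """Return one feature row per trading day, in date order.

    with_crowd=False gives the M0 features [close];
    with_crowd=True gives the M1 features [close, n_buy, n_sell, n_neutral].
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for day, label in signals:
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[day][label] += 1

    rows: List[List[float]] = []
    for day in sorted(close_prices):           # ISO date strings sort chronologically
        row = [close_prices[day]]
        if with_crowd:
            row += [float(counts[day][label]) for label in LABELS]
        rows.append(row)
    return rows


# Hypothetical prices and classified signals for one stock.
prices = {"2022-01-11": 101.0, "2022-01-10": 100.0}
signals = [("2022-01-10", "buy"), ("2022-01-10", "buy"), ("2022-01-10", "sell"),
           ("2022-01-11", "neutral"), ("2022-01-12", "buy")]  # last day has no close price
m0_rows = build_features(prices, signals, with_crowd=False)
m1_rows = build_features(prices, signals, with_crowd=True)
print(m0_rows)  # [[100.0], [101.0]]
print(m1_rows)  # [[100.0, 2.0, 1.0, 0.0], [101.0, 0.0, 0.0, 1.0]]
```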
Results
Table 5 shows the results of the two LSTM models, M0 and M1, on the validation dataset. The base model, M0, relies solely on the stocks' close prices, whereas the proposed model, M1, also incorporates the collective insights of SAHMETO's crowd through buy, sell, and neutral signals. According to the results, the accuracy metrics of M1 were similar to, and for most stocks slightly worse than, those of the base model. Incorporating crowd suggestions as wisdom-of-the-crowd indicators did not improve model accuracy or performance.
Table 5. LSTM model results
Symbol | MAPE (M0) | MAPE (M1) | nRMSE (M0) | nRMSE (M1) | R² (M0) | R² (M1) | Corr (M0) | Corr (M1)
KHODRO | 0.006 | 0.011 | 0.045 | 0.014 | 0.980 | 0.895 | 0.999049 | 0.973763
KHSAPA | 0.002 | 0.008 | 0.013 | 0.046 | 0.998 | 0.969 | 0.999714 | 0.993592
KHGOSTAR | 0.003 | 0.011 | 0.024 | 0.085 | 0.995 | 0.937 | 0.999975 | 0.988921
SHAPNA | 0.004 | 0.005 | 0.034 | 0.054 | 0.991 | 0.975 | 0.999404 | 0.992977
SHATRAN | 0.001 | 0.002 | 0.016 | 0.040 | 0.998 | 0.987 | 0.999876 | 0.996787
VTEJARAT | 0.007 | 0.014 | 0.052 | 0.019 | 0.970 | 0.867 | 0.998569 | 0.974524
KHZAMYA | 0.005 | 0.012 | 0.043 | 0.104 | 0.975 | 0.855 | 0.990915 | 0.945431
VBMELAT | 0.007 | 0.008 | 0.038 | 0.054 | 0.980 | 0.960 | 0.999876 | 0.979907
VJAMI | 0.007 | 0.006 | 0.04 | 0.037 | 0.975 | 0.979 | 0.999812 | 0.996878
VKHARAZM | 0.003 | 0.004 | 0.035 | 0.045 | 0.985 | 0.975 | 0.999558 | 0.993231
FARABOURCE | 0.001 | 0.005 | 0.009 | 0.041 | 0.999 | 0.979 | 0.999627 | 0.993231
VRENA | 0.008 | 0.004 | 0.035 | 0.020 | 0.991 | 0.997 | 0.999365 | 0.998723
VSAPA | 0.003 | 0.010 | 0.016 | 0.047 | 0.998 | 0.979 | 0.999859 | 0.992852
ENERGY3 | 0.008 | 0.012 | 0.033 | 0.058 | 0.991 | 0.972 | 0.999905 | 0.992125
PALAYESH | 0.001 | 0.002 | 0.017 | 0.024 | 0.996 | 0.992 | 0.999707 | 0.998208
LPARS | 0.002 | 0.003 | 0.022 | 0.043 | 0.993 | 0.975 | 0.999762 | 0.990889
VBSADER | 0.003 | 0.007 | 0.033 | 0.067 | 0.985 | 0.939 | 0.999200 | 0.975405
HKESHTI | 0.001 | 0.005 | 0.013 | 0.050 | 0.998 | 0.974 | 0.999844 | 0.995888
THFARS | 0.002 | 0.014 | 0.018 | 0.069 | 0.994 | 0.918 | 0.999214 | 0.974986
SHABRIZ | 0.006 | 0.009 | 0.016 | 0.035 | 0.998 | 0.993 | 0.999935 | 0.996578
FMORAD | 0.006 | 0.005 | 0.033 | 0.031 | 0.986 | 0.987 | 0.996704 | 0.994000
DARAYEKOM | 0.008 | 0.010 | 0.084 | 0.091 | 0.931 | 0.919 | 0.997544 | 0.982543
KHKAVE | 0.002 | 0.003 | 0.013 | 0.022 | 0.998 | 0.995 | 0.999474 | 0.998870
KHCHARKHESH | 0.002 | 0.006 | 0.013 | 0.034 | 0.998 | 0.988 | 0.999474 | 0.994973
FARAVAR | 0.001 | 0.004 | 0.014 | 0.059 | 0.998 | 0.963 | 0.999683 | 0.986542
GHNOOSH | 0.002 | 0.002 | 0.014 | 0.020 | 0.998 | 0.996 | 0.999143 | 0.998618
TAPKISH | 0.003 | 0.005 | 0.028 | 0.053 | 0.989 | 0.960 | 0.999595 | 0.983138
KMARJAN | 0.001 | 0.002 | 0.013 | 0.018 | 0.998 | 0.995 | 0.999419 | 0.998809
SMEGA | 0.008 | 0.008 | 0.048 | 0.052 | 0.988 | 0.982 | 0.999202 | 0.994838
BAREKAT | 0.003 | 0.005 | 0.016 | 0.032 | 0.997 | 0.989 | 0.999228 | 0.994817
FAZAR | 0.002 | 0.005 | 0.024 | 0.051 | 0.988 | 0.944 | 0.998852 | 0.991333
KDAMA | 0.002 | 0.002 | 0.018 | 0.021 | 0.996 | 0.995 | 0.999297 | 0.997712
FKHOUZ | 0.006 | 0.011 | 0.032 | 0.060 | 0.992 | 0.972 | 0.999754 | 0.989908
SEAMORGH | 0.001 | 0.004 | 0.018 | 0.063 | 0.996 | 0.954 | 0.99201 | 0.985371
KERMAN | 0.002 | 0.003 | 0.022 | 0.037 | 0.993 | 0.982 | 0.999235 | 0.994960
KHTERAC | 0.003 | 0.007 | 0.027 | 0.044 | 0.992 | 0.978 | 0.999452 | 0.996722
FOLAY | 0.013 | 0.006 | 0.066 | 0.047 | 0.920 | 0.959 | 0.998602 | 0.983439
KHTOGHA | 0.002 | 0.016 | 0.015 | 0.079 | 0.998 | 0.942 | 0.999684 | 0.990104
KHMOTOR | 0.008 | 0.010 | 0.039 | 0.057 | 0.985 | 0.967 | 0.999676 | 0.992905
NOORI | 0.002 | 0.005 | 0.016 | 0.028 | 0.995 | 0.985 | 0.998386 | 0.993326
BOURCE | 0.007 | 0.008 | 0.039 | 0.047 | 0.984 | 0.977 | 0.999587 | 0.993400
KAMA | 0.004 | 0.008 | 0.033 | 0.060 | 0.985 | 0.950 | 0.998534 | 0.980997
FLOULE | 0.001 | 0.004 | 0.011 | 0.023 | 0.999 | 0.994 | 0.999561 | 0.997310
THAKHT | 0.004 | 0.004 | 0.041 | 0.044 | 0.984 | 0.981 | 0.998571 | 0.993215
ZDASHT | 0.001 | 0.003 | 0.012 | 0.033 | 0.999 | 0.990 | 0.999644 | 0.995163
GHSABET | 0.003 | 0.005 | 0.014 | 0.037 | 0.998 | 0.986 | 0.999653 | 0.994231
THABAD | 0.008 | 0.016 | 0.037 | 0.076 | 0.990 | 0.958 | 0.999772 | 0.988824
SDABIR | 0.009 | 0.006 | 0.027 | 0.030 | 0.992 | 0.990 | 0.998178 | 0.995180
MODIRIAT | 0.003 | 0.004 | 0.024 | 0.027 | 0.990 | 0.987 | 0.999121 | 0.995552
VAZAR | 0.001 | 0.005 | 0.01 | 0.055 | 0.999 | 0.956 | 0.999978 | 0.994738
VBARGH | 0.005 | 0.006 | 0.044 | 0.041 | 0.978 | 0.981 | 0.99758 | 0.994415
SNEIR | 0.002 | 0.007 | 0.016 | 0.070 | 0.998 | 0.964 | 0.999625 | 0.984544
To facilitate interpretation and comparison of the results, Figures 4 and 5 present the MAPE and nRMSE of both models, respectively.
Figure 4. MAPE results
Figure 5. nRMSE results
Moreover, as depicted in Figures 4 and 5, the proposed model (M1) does not exhibit better accuracy metrics than the base model (M0).
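The comparison can also be tallied directly from Table 5. The sketch below takes the MAPE values transcribed from Table 5 and counts, per stock, which model has the lower reported (three-decimal) MAPE, treating equal rounded values as ties. It is only a descriptive tally of the published numbers, not a significance test.

```python
from typing import List, Tuple

# Symbol, MAPE of M0, MAPE of M1 (transcribed from Table 5).
TABLE5_MAPE = """
KHODRO 0.006 0.011
KHSAPA 0.002 0.008
KHGOSTAR 0.003 0.011
SHAPNA 0.004 0.005
SHATRAN 0.001 0.002
VTEJARAT 0.007 0.014
KHZAMYA 0.005 0.012
VBMELAT 0.007 0.008
VJAMI 0.007 0.006
VKHARAZM 0.003 0.004
FARABOURCE 0.001 0.005
VRENA 0.008 0.004
VSAPA 0.003 0.010
ENERGY3 0.008 0.012
PALAYESH 0.001 0.002
LPARS 0.002 0.003
VBSADER 0.003 0.007
HKESHTI 0.001 0.005
THFARS 0.002 0.014
SHABRIZ 0.006 0.009
FMORAD 0.006 0.005
DARAYEKOM 0.008 0.010
KHKAVE 0.002 0.003
KHCHARKHESH 0.002 0.006
FARAVAR 0.001 0.004
GHNOOSH 0.002 0.002
TAPKISH 0.003 0.005
KMARJAN 0.001 0.002
SMEGA 0.008 0.008
BAREKAT 0.003 0.005
FAZAR 0.002 0.005
KDAMA 0.002 0.002
FKHOUZ 0.006 0.011
SEAMORGH 0.001 0.004
KERMAN 0.002 0.003
KHTERAC 0.003 0.007
FOLAY 0.013 0.006
KHTOGHA 0.002 0.016
KHMOTOR 0.008 0.010
NOORI 0.002 0.005
BOURCE 0.007 0.008
KAMA 0.004 0.008
FLOULE 0.001 0.004
THAKHT 0.004 0.004
ZDASHT 0.001 0.003
GHSABET 0.003 0.005
THABAD 0.008 0.016
SDABIR 0.009 0.006
MODIRIAT 0.003 0.004
VAZAR 0.001 0.005
VBARGH 0.005 0.006
SNEIR 0.002 0.007
"""


def tally_mape(table: str) -> Tuple[int, List[str], int]:
    """Return (#stocks where M0 is lower, symbols where M1 is lower, #ties)."""
    m0_better, m1_better, ties = 0, [], 0
    for line in table.strip().splitlines():
        symbol, m0, m1 = line.split()
        if float(m1) < float(m0):
            m1_better.append(symbol)
        elif float(m0) < float(m1):
            m0_better += 1
        else:
            ties += 1
    return m0_better, m1_better, ties


m0_wins, m1_wins, ties = tally_mape(TABLE5_MAPE)
print(m0_wins, m1_wins, ties)
```

On the transcribed values this gives 43 stocks where M0 has the lower MAPE, 5 where M1 does (VJAMI, VRENA, FMORAD, FOLAY, and SDABIR), and 4 ties.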
Discussion and Conclusion
In this research, we aimed to determine whether incorporating the wisdom of the crowd improves the accuracy of a stock price prediction model. SAHMETO is a social platform on which Persian-speaking users share their views on the Tehran Stock Exchange. To extract the wisdom of the crowd, we used a support vector machine to classify users' opinions into buy, sell, and neutral classes; the classifier was trained on a set of opinions labeled with the assistance of experts. During the eight-month research period, users expressed opinions on 823 TSE stocks in total. To allow a thorough examination, we restricted the analysis to the 52 stocks that received more than 100 signals during this period; these stocks belong to various industries. We then built a prediction model based on long short-term memory to measure the influence of the crowd's wisdom on the precision of stock price predictions. As shown in Table 5 and Figures 4 and 5, although the proposed model M1 achieved acceptable accuracy, with correlations between actual and predicted values exceeding 90%, its accuracy metrics did not improve compared to the base model M0. These results are consistent with the findings of Raie et al. (2016), Antweiler and Frank (2004), and Tumarkin and Whitelaw (2001): the wisdom arising from crowd contributions did not enhance the predictive accuracy of the model.
According to Nofer and Hinz (2014), Oyland (2015), Lorenz et al. (2011), and Almaatouq et al. (2020), social influence affects the wisdom of the crowd and can lead to negative outcomes, because it can make people change their opinions and estimates. According to Surowiecki, a wise crowd requires individuals' judgments to be independent of one another, yet social networks, by their nature, expose users to others' opinions. In addition, Derakhshan Beigy (2019) suggested that sentiment analysis of English-language data is more effective and efficient than that of Persian-language data for predicting the stock market. The simplest way for individual behaviors to combine is for the crowd to imitate the majority at every step (Thomas et al., 2021). It is important to acknowledge that, although investor behavior is grounded in logical reasoning, investors cannot, for various reasons, assimilate all the information available in the stock market (Noroozi et al., 2023): they may lack time to decide and deliberate, have limited access to data and information for decision-making, or find that the benefit of further analysis does not justify its cost. Because stock prices are affected by multiple factors and groups of variables, investors typically rely on the information available to them rather than analyzing all pertinent information. This research investigated the feasibility of using social media, with its vast amounts of data and its many users sharing diverse perspectives and approaches, as a means of evaluating the state of the stock market. Nonetheless, the fundamental concept of combining knowledge from diverse individuals to reach a favorable collective outcome remains broadly applicable.
We aimed to explore the applicability of the wisdom of the crowd in complex scenarios such as stock price prediction, which involves historical data and various market indicators. Our results highlight the application of the wisdom of the crowd to stock market prediction, a problem often approached only from non-psychological, statistical perspectives. A primary limitation of this study was the amount of data: hardware limitations prevented data collection beyond eight months. In addition, trading in some TSE stocks was suspended during the study because of capital increases, so the dispersion of their prices was minimal. The research was also limited by the lack of a dedicated sentiment lexicon for the Persian language and of specialized software for analyzing opinions about financial markets. In future research, a larger amount of reliable data may change the results. Other prediction methods or combined deep learning models could also be used for comparison. Moreover, the wisdom of the crowd could be explored for prediction in new financial areas such as digital currencies. This paper contributes to the growing literature on social trading platforms. Financial companies and their users can use its results to evaluate the profitability of online users' recommendations.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.