A Stock-Price-Concerned Messages Analysis System on the Data Mining Technique

ABSTRACT - Planning and programming databases is essential to the computerization of industrial companies. Business databases hold a wide variety of information, such as customers’ transaction data. To gather and organize data in different formats efficiently and so improve database management, a growing body of research in recent years has focused on data mining techniques. With data mining, previously unknown but potentially useful information can be extracted from large databases. This paper applies the K line concept and the association rule technique to market databases in order to find patterns in stock price fluctuations and offer them to investors as a reference. Our experimental results show that data mining can be of considerable help in predicting the prices of some individual stocks.



Citation:

June-Horng Shiesh, Tung-Shou Chen, Yi-Chen Liao, and Chi-Te Huang (2002), "A Stock-Price-Concerned Messages Analysis System on the Data Mining Technique", in AP - Asia Pacific Advances in Consumer Research Volume 5, eds. Rami Zwick and Tu Ping, Valdosta, GA: Association for Consumer Research, Pages: 200-205.


June-Horng Shiesh, National Taichung Institute of Technology, Taiwan

Tung-Shou Chen, National Taichung Institute of Technology, Taiwan

Yi-Chen Liao, National Taichung Institute of Technology, Taiwan

Chi-Te Huang, Providence University, Taiwan


INTRODUCTION

Industrial company databases, which store huge quantities of business transaction data, can yield useful information for policymakers when data mining is properly applied. For example, business policymakers can use data mining to uncover customer behavior patterns hidden in transaction databases and then improve their services and product quality accordingly to increase profits [3].

Another example is the United States Internal Revenue Service, which uses data mining to detect errors made by taxpayers and to estimate the amount of tax evaded, so as to increase national tax revenue.

In this paper, we apply data mining to the analysis of the price fluctuation patterns of individual stocks. The system is built on two concepts, the association rule and the K line. Stock price patterns are defined according to the K line [1], and the price fluctuation patterns found in the data are provided for investors’ reference.

The association rule technique [2][3] is mainly used to find relationships between two data items. Such a relationship is written A → B. The purpose of the technique is to explore which data items lead to changes in other data items. An association rule requires two thresholds, the support and the confidence, both of which are set by the user.

The rest of this paper is organized as follows. Section 2 describes the construction of the system, including the method and the steps of the experiment. Section 3 presents the experimental results, and Section 4 gives the conclusions.
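As a rough illustration of the two thresholds, the sketch below computes the support and confidence of a rule from A to B using their standard definitions: support is the fraction of all records that contain both A and B, and confidence is the fraction of the records containing A that also contain B. The function names and the sample records are hypothetical and are not taken from the system described in this paper.

```python
from typing import Iterable, Set


def support(records: Iterable[Set[str]], a: str, b: str) -> float:
    """Fraction of all records that contain both item a and item b."""
    records = list(records)
    if not records:
        return 0.0
    both = sum(1 for r in records if a in r and b in r)
    return both / len(records)


def confidence(records: Iterable[Set[str]], a: str, b: str) -> float:
    """Fraction of the records containing a that also contain b."""
    with_a = [r for r in records if a in r]
    if not with_a:
        return 0.0
    return sum(1 for r in with_a if b in r) / len(with_a)


# Hypothetical records, for illustration only.
sample = [{"A", "B"}, {"A"}, {"A", "B", "C"}, {"B"}]
rule_support = support(sample, "A", "B")        # 2 of the 4 records
rule_confidence = confidence(sample, "A", "B")  # 2 of the 3 records with A
```

A rule would be kept only if both values reach the thresholds chosen by the user.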

SYSTEM CONSTRUCTION

The purpose of this paper is to build a stock-price-concerned messages analysis system based on data mining. The system gathers and organizes potentially useful data from market databases, finds the underlying rules that govern the price fluctuation patterns of stocks, and uses these patterns to analyze and predict future stock prices.

First, K line patterns are classified according to the K line principle. In other words, the system transforms each day’s stock price, extracted from the system database, into its K line pattern before any comparison or classification is done. Then, the system finds and analyzes the path the stock price has followed from the past to the present. Finally, the result of the analysis can be consulted when a future decision is to be made.

The system is constructed in the following steps:

Step 1: Classification of the K line pattern

The K line, a tool of technical analysis, consists in this research of the opening price, closing price, highest price, and lowest price. Considering all possible conditions, there are eleven different K line patterns in total, as shown in Table 1.
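To make the four prices concrete, the following sketch represents one day's K line as a small record and derives the body and shadow lengths by which candlestick charts are usually read. It does not reproduce the eleven classes of Table 1. The class name `KLine`, its field names (including `high` and `low` for the day's two extreme prices), and the derived properties are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KLine:
    """One trading day's prices (hypothetical field names)."""
    open: float
    close: float
    high: float
    low: float

    @property
    def body(self) -> float:
        # Positive when the close is above the open, negative when below.
        return self.close - self.open

    @property
    def upper_shadow(self) -> float:
        # Distance from the top of the body to the day's highest price.
        return self.high - max(self.open, self.close)

    @property
    def lower_shadow(self) -> float:
        # Distance from the day's lowest price to the bottom of the body.
        return min(self.open, self.close) - self.low


# Made-up prices, for illustration only.
day = KLine(open=20.0, close=21.0, high=21.5, low=19.5)
```

A classifier along the lines of Table 1 would map combinations of these quantities to class letters.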

Step 2: Data clustering

In this step, we classify the historical prices of stocks using the eleven K line patterns from Step 1. For example, we encode the K line patterns of 1101 TCC as class letters and show the statistics in Figure 1.

As Figure 1 shows, in the price history of this stock, class A of the K line occurs 17 times and class B occurs 19 times. The most frequent class is D, which occurs 62 times.
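The tally behind Figure 1 amounts to counting how often each class letter appears in a stock's pattern history. A minimal sketch, assuming the history is stored as a string of class letters; the function name is hypothetical, and the sample string is only the opening fragment of the 1101 TCC sequence listed in the experiment section, so its counts differ from those in Figure 1.

```python
from collections import Counter


def class_frequencies(history: str) -> Counter:
    """Count the occurrences of each K line class letter in a history."""
    return Counter(history)


# Opening fragment of the 1101 TCC sequence, not the full 310-day history.
fragment = "GGGGHHBBFFBBAADDIIKKBBFFHHGGKKCCCCGG"
counts = class_frequencies(fragment)
most_common_class, occurrences = counts.most_common(1)[0]
```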

Step 3: Association Analysis and Statistics

In this step, the system determines which K line pattern today’s stock price belongs to by matching it against the K lines defined in Step 1. The system then searches the market database for every day with the same K line pattern as today and records the K line pattern of the day that follows. Next, the system scans and analyzes the database according to these pattern records. Finally, the system converts the result of the analysis into numerical data and provides it to investors as a reference for future investment (see Figure 2).

In Figure 2, the system identifies today’s pattern as class D after considering all the K line patterns. As mentioned earlier, class D occurred 62 times in the history. The system records the pattern of the day following each class D day, and the analysis reveals the most probable pattern for tomorrow, as Figure 2 shows. The results, given as class: count: percentage, are A: 3: 5%, B: 2: 3%, C: 8: 13%, D: 17: 28%, E: 5: 8%, and F: 5: 8%. The most frequent next-day class is D (17 times, 28%), which means that when today is class D, the most probable class for tomorrow is also D. The percentage is the count divided by the number of next-day records. The history ends on a class D day that has no following day, so the 62 class D days yield 61 next-day records, and 17 divided by 61 gives 28% (see Figure 3).
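The next-day statistics described above can be sketched as follows: every day whose class matches today's is located, the class of the following day is recorded, and the counts are turned into percentages. A day with no successor in the record (the last day) contributes nothing. The function name and the return format are assumptions, and the sample is a short fragment of the 1101 TCC sequence rather than the full history.

```python
from collections import Counter
from typing import Dict, Tuple


def next_day_distribution(history: str, today: str) -> Dict[str, Tuple[int, int]]:
    """Map each next-day class to (count, rounded percentage)."""
    followers = Counter(
        nxt for cur, nxt in zip(history, history[1:]) if cur == today
    )
    total = sum(followers.values())
    return {
        cls: (n, round(100 * n / total))
        for cls, n in followers.items()
    }


# Short fragment of the history, for illustration only.
fragment = "GGGGDDKKEEDAAHFHCAHFDABCDKGG"
dist = next_day_distribution(fragment, "D")
```

In this fragment the five class D days are followed by D once and by K and A twice each.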

TABLE 1

ACCORDING TO THE FOUR STOCK TYPES, THE K LINE PATTERN CAN BE CLASSIFIED INTO 11 TYPES

If the user wants to know more about the prediction for tomorrow, he or she can click the highlighted line that the arrowhead points at in Figure 3, and the system will show the prediction for the day after. The system repeats the procedure just described, except that it now searches the market database for the two-day pattern DD. The result is shown in the box titled "After the next day" in Figure 3. For example, the expression D: 4: 24% means that, of the days on which the DD pattern occurred, 4 (24%) were followed by a third class D day; that is, given class D today and tomorrow, the probability of class D on the day after is 24%.
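Looking one more day ahead uses the same search with a longer pattern. A minimal sketch, assuming the history is a string of class letters; `followers_of` is a hypothetical helper, overlapping matches are counted (so DDD contains the pattern DD twice), and the sample string is invented.

```python
from collections import Counter


def followers_of(history: str, pattern: str) -> Counter:
    """Count the class that follows each occurrence of pattern."""
    k = len(pattern)
    counts = Counter()
    # Stop early enough that a following day always exists.
    for i in range(len(history) - k):
        if history[i:i + k] == pattern:
            counts[history[i + k]] += 1
    return counts


sample = "ADDDGDDC"  # invented sequence, for illustration only
after_dd = followers_of(sample, "DD")
```

With a one-letter pattern the same helper yields the next-day counts, so a single routine can serve both levels of the prediction.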

Step 4: Estimate Suggestion

After the above steps, the system provides a suggestion for investors. It includes the possible conditions for the next day and for the day after that, together with an interpretation of today’s market graph, as Figure 4 shows.

FIGURE 1

THE CLASSIFICATION RESULT OF 1101 TCC ACCORDING TO THE CLUSTERING RULE

FIGURE 2

WE GOT TODAY’S MARKET CONDITION, CLASS D, ACCORDING TO THE SYSTEM AND THE PREDICTED MARKET CONDITION FOR TOMORROW

EXPERIMENT RESULTS

Take 1101 TCC [4] as an example. The stock prices of 1101 TCC over the past 310 days are converted into K line patterns.

The K line pattern of 1101 TCC stock prices in the 310-day history is:

{GGGGHHBBFFBBAADDIIKKBBFFHHGGKKCCCCGG

GGGGDDKKEEDAAHFHCAHFDABCDKGGAFACAEDCC

DGCDDCHEAECKFGDEEFFCAHEFADHGGECBDDBEH

BEHCGGFGDCECDFHCGDDDFGGFDDDGICDGDGGCBHF

CEDFCCGEGECACCCEDGDGGCEGHGDDBDAHGEIDGDF

GKKHDGDDCFHCCGDDDCCCCBFDGCIDCECDDGKE

HDFGGDDEEEBCDGDDCDEGAECDGCEKBCFGDHFDDD

KKBGGFEDEDGICFHEDGJCDECCBBEDHKBBKCKCEDCIHG

GACCAD}

Suppose today’s stock price belongs to class D. The system extracts from the history of 1101 TCC every class D day together with the day after it, as follows:

{DD,DI,DD,DK,DA,DA,DK,DC,DG,DD,DC,DE,DH,DD,DB,DC,DF,DD,DD,

DF,DD,DD,DG,DG,DG,DF,DG,DG,DD,DB,DA,DG,DF,DG,DD,

DC,DD,DD,DC,DG,DC,DD,DG,DF,DD,DE,DG,DD,DC,DE,DG,DH,DD,

DD,DK,DE,DG,DG,DE,DH,DC}

Take the second record, DI, as an example. DI means that the condition for the first day is D and that for the second day is I. Removing the initial D from every record, we get:

{DIDKAAKCGDCEHDBCFDDFDDGGGFGGDBAGFGDCDD

CGCDGFDEGDCEGHDDKEGGEHC}

Then we count the occurrences of each K line class. Class D, which appears 17 times, is the most frequent, so the most likely condition for tomorrow is class D if today is class D. Next, the system searches the whole 310-day K line pattern of 1101 TCC for the days that follow each DD pattern.
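The counting step can be checked directly against the records listed above. The sketch below drops the leading D from each two-day record and tallies what remains; the variable names are arbitrary, and the records are copied from the list in this section.

```python
from collections import Counter

# Two-day records for class D, as listed in the text.
records = (
    "DD,DI,DD,DK,DA,DA,DK,DC,DG,DD,DC,DE,DH,DD,DB,DC,DF,DD,DD,"
    "DF,DD,DD,DG,DG,DG,DF,DG,DG,DD,DB,DA,DG,DF,DG,DD,"
    "DC,DD,DD,DC,DG,DC,DD,DG,DF,DD,DE,DG,DD,DC,DE,DG,DH,DD,"
    "DD,DK,DE,DG,DG,DE,DH,DC"
).split(",")

# Drop the first day (always D) and keep the class of the following day.
next_days = Counter(pair[1] for pair in records)
total = sum(next_days.values())

top_class, top_count = next_days.most_common(1)[0]
top_share = round(100 * top_count / total)
```

The list holds 61 records, of which 17 have D as the following day, which rounds to the 28% reported for class D.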

FIGURE 3

CLICK ON ONE OF THE POSSIBLE CONDITIONS FOR THE NEXT DAY, AND THEN THE SYSTEM WILL ANALYZE THE POSSIBLE CONDITIONS FOR THE DAY AFTER IT

FIGURE 4

THE CONDITIONS FOR BOTH THE NEXT DAY AND THE DAY AFTER THAT WITH THE MARKET GRAPH INTERPRETATION PROVIDED BY THE SYSTEM

TABLE 2

THE RESULT OF PREDICTION AND ANALYSIS

Table 2 shows the analysis results for four stocks: 1101 TCC, 2337 MXIC, 2330 TSMC, and 1605 WALSIN.

In Table 2, a correct prediction for tomorrow is boldfaced and netted, such as C. A correct prediction for the day after tomorrow is boldfaced, italic, and framed, such as C. Therefore, if the prediction for today made from both yesterday and the day before turns out to be correct, it is marked like C. Finally, a correct prediction gets an "[O]" in the "Hit" column, and a wrong prediction gets an "[X]."

For example, the prediction result for 1605 WALSIN on December 12 is as follows. As the condition of that day is class D, the possible condition for the next day is C or D, and for the day after that D or C. On December 13, the condition turns out to be class C, which means our prediction of class C is correct. Thus, it is marked C, and the column of "Hit rate" is filled in with an [O]. The rest of the results are marked in the same way.

In this example, class C or D is the most probable prediction of all.
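The marking rule can be sketched as a simple membership test: a day counts as a hit when the class that actually occurs is among the predicted classes. The function names are hypothetical, and the only data used is the 1605 WALSIN example above (predicted C or D, actual C).

```python
from typing import Iterable, List, Set, Tuple


def mark(predicted: Set[str], actual: str) -> str:
    """Return "[O]" for a correct prediction and "[X]" otherwise."""
    return "[O]" if actual in predicted else "[X]"


def hit_rate(days: Iterable[Tuple[Set[str], str]]) -> float:
    """Fraction of days whose actual class was among the predictions."""
    marks: List[str] = [mark(pred, actual) for pred, actual in days]
    return marks.count("[O]") / len(marks) if marks else 0.0


# 1605 WALSIN: on December 12 the next day is predicted to be C or D,
# and December 13 turns out to be class C.
walsin_mark = mark({"C", "D"}, "C")
```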

CONCLUSIONS

Data mining can be applied to the gathering and organizing of data from various databases. As long as the content and format of the data are well arranged and well designed, association rules can be found in the data and used to provide suggestions for policymakers.

The main idea of this paper is to use data mining to explore data in databases. The experimental results show that the hit rates are quite high for some stocks. Since the stock price records currently available online go back only two years, the accuracy of our predictions is limited by the data available. If longer records of past stock prices become available in the future, our system should be able to give more accurate predictions.

The approach presented in this paper can be applied not only to market databases but also to businesses, government institutions, the World Wide Web (WWW), and other environments. As long as the database is well arranged and properly built, data mining can help users benefit from efficient information and services.

REFERENCES

[1] Robert D. Edwards and John Magee, Technical analysis of stock trends, John Magee Inc., 1971.

[2] I-Yuan Lin, Xin-Mao Huang, and Ming-Syan Chen, "Capturing User Access Patterns in the Web for Data Mining," IEEE International Conference, pp. 345-348, 1999.

[3] Ming-Syan Chen, Jiawei Han, and Philip S. Yu, "Data Mining: An Overview from a Database Perspective," IEEE Trans. Knowledge and Data Eng., pp. 866-883, 1996.

[4] Taiwan Stock Exchange website, http://www.tse.com.tw

[5] Smart Net website, http://www.smartnet.com.tw

[6] M. S. Chen, J. S. Park, and P. S. Yu, "Efficient Data Mining for Path Traversal," IEEE Transactions on Knowledge and Data Engineering, pp. 209-221, 1998.

[7] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases," Proc. ACM SIGMOD, pp. 207-216, 1993.
