American Journal of Database Theory and Application

p-ISSN: 2326-0831    e-ISSN: 2326-0858

2012;  1(3): 26-38

doi: 10.5923/j.database.20120103.01

Using Data Mining Technique to Predict Cause of Accident and Accident Prone Locations on Highways

Dipo T. Akomolafe¹, Akinbola Olutayo²

¹Dept. of Mathematical Sciences, Ondo State University of Science and Technology, Okitipupa, Nigeria

²Dept. of Computer Science, Joseph Ayo Babalola University, Ikeji Arakeji, Osun State

Correspondence to: Dipo T. Akomolafe, Dept. of Mathematical Sciences, Ondo State University of Science and Technology, Okitipupa, Nigeria.

Copyright © 2012 Scientific & Academic Publishing. All Rights Reserved.

Abstract

Road accidents are a special case of trauma and a major cause of disability, untimely death and the loss of loved ones and family breadwinners. Predicting the likelihood of road accidents on highways, with particular emphasis on the Lagos–Ibadan express road, Nigeria, is therefore important for accident prevention. Various attempts have been made to identify the causes of accidents on highways using different techniques and systems, and to reduce accidents on the roads, but the accident rate keeps increasing. In this study, the techniques used to analyse the causes and effects of accidents along this route were examined. A data mining technique was proposed to predict the likely occurrence of accidents on highways, their likely causes and accident-prone locations, using the Lagos–Ibadan highway as a case study. WEKA software was used to analyse accident data gathered along this road. The results showed that the causes of accidents, the specific times and conditions that could trigger accidents, and accident-prone areas could be effectively identified.

Keywords: Data Mining, Decision Tree, Accident, WEKA, Data Modelling, Id3 Algorithm, Id3 Tree, Functional Tree Algorithm

Cite this paper: Dipo T. Akomolafe, Akinbola Olutayo, "Using Data Mining Technique to Predict Cause of Accident and Accident Prone Locations on Highways", American Journal of Database Theory and Application, Vol. 1 No. 3, 2012, pp. 26-38. doi: 10.5923/j.database.20120103.01.

1. Introduction

Road accidents are a special case of trauma and a major cause of disability and untimely death. It has been estimated that over 300,000 people die and 10 to 15 million people are injured in road accidents throughout the world every year. Statistics also show that mortality in road accidents is very high among young adults, who make up the major part of the workforce. Indeed, accidents kill faster than AIDS and give their victims no time to prepare. To combat this problem, various road safety strategies have been proposed and used. These mainly involve conscious planning, design and operation of roads. One important feature of these strategies is the identification and treatment of accident-prone locations, commonly called black spots, although black spots are not the only cause of accidents on the highway. In addition, organizations such as the Police Highway Patrol, the Vehicle Inspection Officer (VIO) and the Federal Road Safety Commission (FRSC), among others, are charged with maintaining safety and thereby reducing road accidents. However, the lack of good forecasting techniques has been a major hindrance to these organizations in achieving their objectives.
It is against this background that a decision tree is proposed to model data from a road accident database, in order to determine the causes of accidents and accident-prone locations, using historical data collected on the Ibadan–Lagos express road as a reference point.

2. Objective

The primary objective of this research is to use a data mining technique, the decision tree, to predict the causes of accidents and accident-prone locations on highways, using data collected on the Lagos–Ibadan express way.

3. Methods

3.1. Data Mining

Data mining is an interactive process of discovering valid, novel, useful and understandable patterns or models in large databases (Hand, Mannila and Smyth, 2001). According to Hand, Mannila and Smyth (2001), data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions. Data mining draws on advances in Artificial Intelligence (AI) and on statistical techniques. Of these techniques, the decision tree was chosen for this research.

3.2. Decision Trees

Decision trees have emerged as a powerful technique for modelling general input/output relationships. They are tree-shaped structures that represent a series of rules leading to a set of decisions. They generate rules for classifying a dataset, together with a logical model in the form of a tree (often a binary, two-way split tree, although ID3 splits a nominal attribute into one branch per value) that shows how the value of a target variable can be predicted from the values of a set of predictor variables. Decision trees used for regression problems are called regression trees. The decision tree thus represents a logical model of the regularities of the phenomenon under study.
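Id3, the algorithm that gave the best tree in this study, chooses the attribute to split on at each node by information gain: the reduction in class entropy that the split achieves. The same measure is used for the attribute ranking described in Section 4. The following is a minimal Python sketch of that calculation; the four records are a made-up toy example, not the FRSC data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target="LOCATION"):
    """Entropy reduction from splitting `rows` on a nominal `attribute`.

    Like ID3, the split makes one branch per attribute value; the attribute
    with the highest gain would be chosen for the current node.
    """
    n = len(rows)
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        branch = [r[target] for r in rows if r[attribute] == value]
        after += len(branch) / n * entropy(branch)  # weighted branch entropy
    return before - after


# Made-up toy records, not taken from the FRSC data.
toy = [
    {"SEASON": "WET", "TIME": "MORNING", "LOCATION": "LOCATION2"},
    {"SEASON": "WET", "TIME": "EVENING", "LOCATION": "LOCATION2"},
    {"SEASON": "DRY", "TIME": "MORNING", "LOCATION": "LOCATION3"},
    {"SEASON": "DRY", "TIME": "EVENING", "LOCATION": "LOCATION3"},
]
print(information_gain(toy, "SEASON"))  # 1.0: SEASON separates the classes
print(information_gain(toy, "TIME"))    # 0.0: TIME tells us nothing here
```

On real data, ID3 would compute this gain for every remaining attribute, split on the best one, and repeat the process within each branch.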

3.3. Accidents along Lagos - Ibadan Express Way

The Lagos–Ibadan express road is one of the busiest roads in Africa. This is because Lagos was the capital of Nigeria until the seat of government moved to the Federal Capital Territory, Abuja, and is also home to the headquarters of many national institutions, while Ibadan is said to be the largest city in black Africa. Traffic along this route is very heavy because it is the gateway for traffic travelling from the Northern, Eastern and most of the Western states. Fig. 3.1 shows the frequency of accidents between 1 km and 40 km from Ibadan towards Lagos, from January 2002 to December 2003. These statistics show that a means of predicting the likely location of an accident from a set of input values is essential for advising road users about dangerous locations.
Figure 3.1. Graph of Frequency of Accidents against Month
Several researchers have worked on road accident analysis and forecasting, using decision trees and artificial neural networks among other methods. Martin, Crandall and Pilkey (2000) analysed the relationship between road infrastructure and safety using a cross-sectional time-series database collected for all 50 U.S. states over 14 years; the results suggested that fatalities fall as highway facilities are upgraded. Gelfand (1991) studied the effect of new pavement on traffic safety in Sweden and found that traffic accidents increased by 12% in the year after resurfacing, on all types of roads. Akomolafe (2004) employed an artificial neural network (a multilayer perceptron) to predict the likelihood of an accident happening at a particular location within the first 40 kilometres of the Lagos–Ibadan express road, and found that location 2 recorded the highest number of road accidents and that tyre burst was the major cause of accidents along the route. Ossenbruggen et al. (2001) used a logistic regression model to identify statistically significant factors that predict the probabilities of crashes and injury crashes, with the aim of using these models to perform a risk assessment of a given region; their study showed that village sites are less hazardous than residential and shopping sites. Abdalla et al. (1997) studied the relationship between casualty frequencies and the distance of accidents from the casualties' zones of residence. As might be expected, casualty frequencies were higher nearer the zones of residence, possibly because of higher exposure. Akomolafe et al. (2009) used geospatial technology to identify various positions along major roads in Nigeria. The study revealed that casualty rates among residents of areas classified as relatively deprived were significantly higher than among those from relatively affluent areas.
Table 3.1. Record of accidents along the Lagos–Ibadan express way, 2002 and 2003
S/NO  Month            No. of accidents
1     January 2002     6
2     February 2002    11
3     March 2002       10
4     April 2002       18
5     May 2002         14
6     June 2002        4
7     July 2002        6
8     August 2002      1
9     September 2002   9
10    October 2002     6
11    November 2002    4
12    December 2002    5
13    January 2003     5
14    February 2003    5
15    March 2003       4
16    April 2003       7
17    May 2003         2
18    June 2003        1
19    July 2003        4
20    August 2003      5
21    September 2003   8
22    October 2003     5
23    November 2003    5
24    December 2003    6
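As a quick check on the trend in Table 3.1, the yearly totals, monthly means and peak months can be computed in a few lines of Python. The counts are transcribed from the table, with row 13 read as January 2003 because it falls between December 2002 and February 2003.

```python
# Monthly accident counts transcribed from Table 3.1 (January to December).
counts = {
    2002: [6, 11, 10, 18, 14, 4, 6, 1, 9, 6, 4, 5],
    2003: [5, 5, 4, 7, 2, 1, 4, 5, 8, 5, 5, 6],
}
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

for year, monthly in counts.items():
    total = sum(monthly)
    peak = max(range(12), key=lambda i: monthly[i])  # index of busiest month
    print(f"{year}: total={total}, mean/month={total / 12:.2f}, "
          f"peak={MONTHS[peak]} ({monthly[peak]})")
```

On these figures, 94 accidents were recorded in 2002 (peak: April, 18) and 57 in 2003 (peak: September, 8).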

3.4. Process of Data Mining

The data mining process consists of three steps:
3.4.1. Data Preparation
This includes data collection, data cleaning and data transformation.
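The Id3 tree and Prism rules in Section 4 treat each cause as a separate TRUE/FALSE attribute (TYREBURST, OVERSPEEDING and so on), which suggests that the single cause field of the raw records was expanded during data transformation. The sketch below shows one way to do that. It assumes the cleaned record stores cause names in a list rather than the numeric codes of Table 3.3, whose mapping is not given here; the field names and the example record are hypothetical.

```python
# The 13 cause attributes that appear in the Id3 tree and Prism rules.
CAUSES = [
    "WRONG-OVERTAKING", "CARELESSDRIVING", "LOSS-OF-CONTROL", "TYREBURST",
    "OVERSPEEDING", "TREE-OBSTRUCTION", "PUSHED-BY-A-CAR", "BROKEN-SHAFT",
    "BROKEN-SPRING", "BRAKE-FAILURE", "ROAD-PROBLEM", "UNKNOWN-CAUSES",
    "ROBBERY-ATTACK",
]


def expand_causes(record):
    """Replace the record's list of cause names with one TRUE/FALSE
    attribute per cause; all other fields are copied unchanged."""
    causes = set(record["CAUSES"])
    unrecognised = causes - set(CAUSES)
    if unrecognised:
        raise ValueError(f"unrecognised cause(s): {sorted(unrecognised)}")
    out = {key: value for key, value in record.items() if key != "CAUSES"}
    for cause in CAUSES:
        out[cause] = "TRUE" if cause in causes else "FALSE"
    return out


# Hypothetical cleaned record; field values are illustrative only.
row = {"TYPE": "SMALL CAR", "TIME": "MORNING", "SEASON": "DRY",
       "CAUSES": ["TYREBURST"], "LOCATION": "LOCATION3"}
flat = expand_causes(row)
print(flat["TYREBURST"], flat["OVERSPEEDING"])  # TRUE FALSE
```

Raising an error on an unrecognised cause, rather than silently dropping it, keeps typos in the raw data from disappearing during cleaning.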
3.4.2. Data Modeling
This research considers accident records for the first 40 km from Ibadan towards Lagos. The data were organized into a relational database.
The unknown causes in Table 3.2 may include other factors such as problems with law enforcement agents, the attitude of other road users, inadequate traffic signs, traffic congestion and general vehicle condition.
The sample data covered a period of 24 months, from January 2002 to December 2003, as indicated in Fig. 3.1.
The output variable is the location. The route is divided into three distinct regions, tagged A, B and C, so the model has three outputs:
Region A (Location 1): 1 km to 10 km; Region B (Location 2): above 10 km up to 20 km; Region C (Location 3): above 20 km.
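The mapping from an accident's distance to the three output classes can be written as a small function. This sketch assumes the distance is recorded in kilometres from the Ibadan end of the studied stretch and treats each upper boundary as inclusive, following the "above 10 km up to 20 km" wording; the function name is hypothetical.

```python
def location_class(km):
    """Map distance along the studied stretch (km) to the output class:
    up to 10 km -> LOCATION1 (Region A), above 10 and up to 20 km ->
    LOCATION2 (Region B), above 20 km -> LOCATION3 (Region C)."""
    if not 0 < km <= 40:
        raise ValueError("the study covers only the first 40 km")
    if km <= 10:
        return "LOCATION1"
    if km <= 20:
        return "LOCATION2"
    return "LOCATION3"


for km in (5, 10, 12.5, 20, 27):
    print(km, location_class(km))
```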
The data were collected by Akomolafe (2004), and a sample is presented in Table 3.3.
3.4.3. Deployment
In this stage, new data sets are applied to the model selected in the previous stage to generate predictions or estimates of the expected outcome.
Table 3.2. Variables, with their continuous and categorical values
Table 3.3. Sample data collected from the FRSC (Akomolafe, 2004)
S/NO  DATE  TYPE  TIME  SEASON  CAUSE  LOCATION  REG. NO
16.1.2002221231XG 506 LND
27.1.2002211114XC 720 ACD
311.1.2002111114AM 713 LND
412.1.2002211227XE 905 JJJ
519.1.2002121327AA 559 LAF
630.01.02331212AA 156 NWD
703.02.02221235XF 635 JJJ
805.02.02211210XE 141 AKD
905.02.02231214XE 124 AKD
1006.02.02231231XE 124 AKD
1111.02.0211135AG 276 LAR
1214.02.02111214
1318.02.02121218
1421.02.02211219XD 249 SMK
1521.02.02321219XC 361 KTU
1624.02.02211218XE 716 SMK
1727.02.02231235 XC 307 SGM
1803.03.02211216XE 807 NSR
1905.03.02121210XC 348 AKP
2007.03.0221122OY 2270 JB
2107.03.02111213AP 820 LSD
2207.03.02321218XE 322 APP
2319.03.02221219XC 993 AGL
2419.03.02 3122LA 1804 RF
2530.03.02141214AM 343 FST
2631.03.02121214KC 461 ABA
2731.03.02121214BS 142 KJA
2801.04.02212222AA 807 EGB
2901.04.02112222BX 527 GGE
3001.04.02222218AG 787 GNN
3102.04.0211217AU 725 MAP
3202.04.02222227XG 358 APP
3304.04.02112215CY 65 EKY
3404.04.02122217AJ 21 AGG
3505.04.0212216AW 45 FST
3606.04.02212230XB 855 AKD
3707.04.02122113AL 567 YAB
3809.04.02222212.5XA 787 WWP
3913.04.0221211XB 791 GNN
4013.04.02 212111XA 127 AFN
4113.04.021,212111AH 202 AKN
4222.04.02122115RA 01 KRD
4322.04.021,322111BB 731 KJA
4427.04.02222227AU 739 JJJ
4528.04.02122114AE 316 FST
4603.04.02 1322,112AZ 824 AAA
475.8.2002112220AA 654 GBY
485.8.2002112230XF 65 JJJ
495.10.20022&112135DM 207 AAA
2 BL 86 AAA
505.10.20021121&235BR 608 LSR
515.11.2002312226XB 606 APP
525.13.200221212XA 616 YLW
535.13.2002112126.5BM 566 GGE
545.14.2002232215XC 348 AKD
555.15.2002122219OY 2077 JB
565.15.2002122214AJ 101 NND
575.20.20021 2226AU 682 ABC
585.21.20022 2224XG 719 FST
595.25.2002112212AV 70 LSR
606.2.2002 32112AZ 191 MUS
616.3.2002222216AQ 742 YYY
626.15.2002212212XA 682 YRE
636.16.2002112221AL 885 AKN
646.16.2002212221XE 751 SMK
657.15.2002212312XH 649 GGE
667.20.2002222210XB 286 KNR
678.8.2002322212XE 232 SGM
689.19.2002132222XA 940 KNH
699.20.200221224AX 94 JJJ
709.20.200232227XC 768 BDJ
719.21.2002112129BL 254 SMK
729.21.2002212116AP 647 AKR
739.21.20022 2218XC 253 GGE
749.22.2002 212210LA 979 BG
759.22.2002232216XU 510 GGE
769.27.2002 22212
7710.1.200212216AA 05 MHA
7810.14.2002212213XE 869 MUS
7910.16.2002222215XB 888 AKR
8010.29.2002 2227
8110.29.2002222217XD 168 BDJ
8210.29.200231226AA 342 LES
8311.4.200221115BX 877 KJA
8411.10.2002211212XC 637 RKJ
8511.10.2002221211XC 937 SGM
8611.12.20021 1 12AA 466 KNR
872.12.2004211214XG 182 JJJ
8812.7.200232121XA 425 CRC
8912.10.2002231313XD 695 EKY
9012.11.2002221216XA 350 EDY
9112.12.2002 11214XG 955 KSF
9223.01.2002131116XA 411 EJG
9318.01.03131118AE 015 GBN
9427.01.0322128XD 125 LSR
9529.01.03341212XC 616 KTU
9629.01.032 1214XF 797 AKD
9702.02.03211218CW 293 AAA
9812.02.03121118AV 3 GGE
9912.02.03221218XB 6 WWD
10012.02.03131112HB 40 KJA
10117.02.03231211XB 446 MNY
10205.03.0312126AE 753 KRE
10319.03.03211212XH 382 ABC
10428.03.033 1112AG 145 NRK
10531.03.03231213AA 499 GBY
10605.04.03222311.5XD 432 KSF
10706.04.031112312CE 188 JJJ
10806.04.03212212FA 01 JJ
10914.04.03112 28FV 43 AAA
11024.04.0312227OY 01 SE
11124.04.0332229XB 328 MAG
11230.04.03332116XD 644 NRK
11310.05.03 12 40AA 399 KTU
11416.05.03132220XH 327 ADC
11502.06.0311218XB 144 YRE
11620.07.032122275K 324 LND
11726.07.0312229DG 329 LSR
11828.07.03222213XJ 179 LND
11928.07.03222118XF 114 EPE
12002.08.03112213CB 434 MUS
12102.08.0311218XG 954 FST
12209.08.03112119AG 802 SGB
12316.08.0322222XF 450 SMK
12431.08.03112114OY 1281 TD
12501.09.0332218XA 362 KJA
12608.09.031 2 18XH 723 JJJ
12714.09.03 2 19
12816.09.0312226AA 112 YRE
12921.09.03212231XB 766 AGG
13024.09.03222118XC 115 EDE
13128.09.03212214XN 739 AAA
13228.09.03232213XD 642 NRK
13306.10.03122211DG 548 LND
13414.10.03222212XA 730 FUF
13518.10.03232228XA 286 GBH
13619.10.03 12222AA 188 AAA
13720.10.03222227LG 016 KNE
13801.11.0331119XA 847 KEH
13902.11.03221218XC 575 GGE
14025.11.03111324BO 984 APP
14127.11.03111218AJ 06 SGB
14227.11.03221213XB 369 EKY
14306.12.03211213AP 938 KJA
14409.12.03331113BM 130 MAP
14513.12.0321117XA 610 ARP
14622.12.03111111BL 500 GGE
14724.12.03 11312JB 356 KJA
14824.12.03221213 XG 562 AKD

4. Results

4.1. Analysis

The main analysis in this research was carried out with WEKA. WEKA is a collection of machine learning algorithms and data processing tools, covering data pre-processing, classification, regression, clustering, association rules and visualization. Many learning algorithms are implemented in WEKA, including Bayesian classifiers, trees, rules, functions, lazy classifiers and miscellaneous classifiers, and they can be applied directly to a data set. WEKA is written in Java and has a GUI Chooser from which any one of its four main applications can be selected; the Explorer application was used in this study.
The WEKA Explorer window has six tabs. The first, Preprocess, loads the formatted data into the WEKA environment. Once the data have been loaded, the Preprocess panel shows a variety of information, as shown in Figure 4.3.
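WEKA's Explorer reads data in its ARFF format: a header of `@relation` and `@attribute` declarations, followed by `@data` rows. The paper does not show its input file, so the following is only a sketch of how prepared records might be written out for the Preprocess tab. The relation name, the attribute list and the example rows are assumptions, and the nominal value spellings ("HAEVY VEHICLE", "MOTOCYCLE") are copied from the Id3 tree output.

```python
def arff_value(value):
    """Quote a nominal value if it contains a space (ARFF requires this)."""
    return f"'{value}'" if " " in value else value


def to_arff(relation, attributes, rows):
    """Render records as an ARFF string for WEKA.

    attributes: list of (name, [allowed nominal values]) in column order.
    rows: list of dicts keyed by attribute name; absent keys become '?',
          ARFF's marker for a missing value.
    """
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        allowed = ",".join(arff_value(v) for v in values)
        lines.append(f"@attribute {name} {{{allowed}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(arff_value(row.get(name, "?"))
                              for name, _ in attributes))
    return "\n".join(lines) + "\n"


# Hypothetical three-attribute example; spellings follow the Id3 tree output.
ATTRIBUTES = [
    ("TYPE", ["HAEVY VEHICLE", "SMALL CAR", "MOTOCYCLE"]),
    ("SEASON", ["WET", "DRY"]),
    ("LOCATION", ["LOCATION1", "LOCATION2", "LOCATION3"]),
]
records = [
    {"TYPE": "SMALL CAR", "SEASON": "DRY", "LOCATION": "LOCATION3"},
    {"TYPE": "HAEVY VEHICLE", "LOCATION": "LOCATION2"},  # season not recorded
]
print(to_arff("lagos_ibadan_accidents", ATTRIBUTES, records))
```

Saved to a file with an `.arff` extension, output of this shape should open from the Preprocess tab; writing missing values as `?` lets WEKA handle incomplete records, such as the rows of Table 3.3 with blank fields, instead of rejecting the file.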
Figure 4.1. WEKA GUI chooser
Figure 4.2. WEKA Explorer

4.2. WEKA Classifiers

Several classifiers are available in WEKA; for the decision tree, the Functional Tree (FT) and Id3 algorithms were used in this study. A rule set was also generated with WEKA's Prism rule learner. Attribute importance analysis was carried out to rank the attributes by significance using information gain. Finally, the correlation-based feature subset selection (CFS) and consistency subset selection (COE) filter algorithms were used to rank and select the most useful attributes. The F-measure and the AUC, which are well-known measures for evaluating probability trees, were used as evaluation metrics for the models generated by the WEKA classifiers.
Several setups of the decision tree algorithms were tried, and the best result obtained is reported here. Each class was trained with the entropy of fit measure; the prior class probabilities parameter was set to equal; the stopping option for pruning was misclassification error; the minimum n per node was set to 5; the fraction of objects was 0.05; the maximum number of nodes was 100; the number of surrogates was 5; and 10-fold cross-validation was used, which generated comprehensive results.
The best decision tree result was obtained with Id3, with 115 correctly classified instances and 33 incorrectly classified instances, representing 77.70% and 22.30% respectively.
The mean absolute error was 0.1835 and the root mean squared error was 0.3029.
The tree generated with the Id3 algorithm, followed by the Prism rules, is given below:

4.3. Id3 Tree

TYREBURST = TRUE
| SEASON = WET
| | TYPE = HAEVY VEHICLE
| | | TIME = EVENING: LOCATION2
| | | TIME = AFTERNOON: LOCATION2
| | | TIME = MORNING: LOCATION2
| | | TIME = NIGHT: null
| | TYPE = SMALL CAR: LOCATION2
| | TYPE = MOTOCYCLE: null
| SEASON = DRY
| | TIME = EVENING
| | | TYPE = HAEVY VEHICLE: LOCATION2
| | | TYPE = SMALL CAR: LOCATION3
| | | TYPE = MOTOCYCLE: null
| | TIME = AFTERNOON
| | | TYPE = HAEVY VEHICLE: LOCATION2
| | | TYPE = SMALL CAR: LOCATION2
| | | TYPE = MOTOCYCLE: null
| | TIME = MORNING
| | | TYPE = HAEVY VEHICLE: LOCATION3
| | | TYPE = SMALL CAR: LOCATION3
| | | TYPE = MOTOCYCLE: null
| | TIME = NIGHT: null
TYREBURST = FALSE
| TIME = EVENING
| | OVERSPEEDING = FALSE: LOCATION2
| | OVERSPEEDING = TRUE
| | | TYPE = HAEVY VEHICLE: LOCATION2
| | | TYPE = SMALL CAR: LOCATION2
| | | TYPE = MOTOCYCLE: null
| TIME = AFTERNOON
| | LOSS-OF-CONTROL = FALSE
| | | OVERSPEEDING = FALSE
| | | | BRAKE-FAILURE = FALSE
| | | | | TYPE = HAEVY VEHICLE
| | | | | | WRONG-OVERTAKING = FALSE
| | | | | | | BROKEN-SHAFT = FALSE: LOCATION1
| | | | | | | BROKEN-SHAFT = TRUE: LOCATION3
| | | | | | WRONG-OVERTAKING = TRUE: LOCATION2
| | | | | TYPE = SMALL CAR
| | | | | | SEASON = WET: LOCATION3
| | | | | | SEASON = DRY
| | | | | | | CARELESSDRIVING = FALSE: LOCATION3
| | | | | | | CARELESSDRIVING = TRUE: LOCATION2
| | | | | TYPE = MOTOCYCLE: LOCATION3
| | | | BRAKE-FAILURE = TRUE
| | | | | TYPE = HAEVY VEHICLE: LOCATION1
| | | | | TYPE = SMALL CAR: LOCATION1
| | | | | TYPE = MOTOCYCLE: LOCATION2
| | | OVERSPEEDING = TRUE
| | | | TYPE = HAEVY VEHICLE: LOCATION2
| | | | TYPE = SMALL CAR
| | | | | SEASON = WET: LOCATION2
| | | | | SEASON = DRY: LOCATION2
| | | | TYPE = MOTOCYCLE: null
| | LOSS-OF-CONTROL = TRUE
| | | TYPE = HAEVY VEHICLE: LOCATION2
| | | TYPE = SMALL CAR
| | | | SEASON = WET: LOCATION2
| | | | SEASON = DRY: LOCATION1
| | | TYPE = MOTOCYCLE: LOCATION1
| TIME = MORNING
| | SEASON = WET
| | | OVERSPEEDING = FALSE
| | | | TYPE = HAEVY VEHICLE
| | | | | WRONG-OVERTAKING = FALSE
| | | | | | CARELESSDRIVING = FALSE: LOCATION1
| | | | | | CARELESSDRIVING = TRUE: LOCATION2
| | | | | WRONG-OVERTAKING = TRUE: LOCATION1
| | | | TYPE = SMALL CAR
| | | | | CARELESSDRIVING = FALSE
| | | | | | LOSS-OF-CONTROL = FALSE: LOCATION3
| | | | | | LOSS-OF-CONTROL = TRUE: LOCATION2
| | | | | CARELESSDRIVING = TRUE: LOCATION1
| | | | TYPE = MOTOCYCLE: LOCATION2
| | | OVERSPEEDING = TRUE: LOCATION2
| | SEASON = DRY
| | | BROKEN-SHAFT = FALSE
| | | | TYPE = HAEVY VEHICLE
| | | | | CARELESSDRIVING = FALSE
| | | | | | LOSS-OF-CONTROL = FALSE
| | | | | | | BROKEN-SPRING = FALSE
| | | | | | | | OVERSPEEDING = FALSE: LOCATION2
| | | | | | | | OVERSPEEDING = TRUE: LOCATION2
| | | | | | | BROKEN-SPRING = TRUE: LOCATION2
| | | | | | LOSS-OF-CONTROL = TRUE: LOCATION2
| | | | | CARELESSDRIVING = TRUE: LOCATION3
| | | | TYPE = SMALL CAR
| | | | | CARELESSDRIVING = FALSE
| | | | | | OVERSPEEDING = FALSE
| | | | | | | UNKNOWN-CAUSES = FALSE
| | | | | | | | ROBBERY-ATTACK = FALSE
| | | | | | | | | WRONG-OVERTAKING = FALSE
| | | | | | | | | | LOSS-OF-CONTROL = FALSE
| | | | | | | | | | | TREE-OBSTRUCTION = FALSE
| | | | | | | | | | | | BRAKE-FAILURE = FALSE: LOCATION3
| | | | | | | | | | | | BRAKE-FAILURE = TRUE: LOCATION2
| | | | | | | | | | | TREE-OBSTRUCTION = TRUE: LOCATION2
| | | | | | | | | | LOSS-OF-CONTROL = TRUE: LOCATION2
| | | | | | | | | WRONG-OVERTAKING = TRUE: LOCATION2
| | | | | | | | ROBBERY-ATTACK = TRUE: LOCATION3
| | | | | | | UNKNOWN-CAUSES = TRUE: LOCATION3
| | | | | | OVERSPEEDING = TRUE: LOCATION3
| | | | | CARELESSDRIVING = TRUE: LOCATION1
| | | | TYPE = MOTOCYCLE: null
| | | BROKEN-SHAFT = TRUE: LOCATION3
| TIME = NIGHT: LOCATION2

Prism rules
----------
Rule 1 If BROKEN-SHAFT = TRUE then LOCATION3
Rule 2 If ROBBERY-ATTACK = TRUE
and TYPE = SMALL CAR then LOCATION3
Rule 3 If TREE-OBSTRUCTION = TRUE
and TIME = EVENING then LOCATION3
Rule 4 If TYREBURST = TRUE
and TIME = MORNING
and TYPE = SMALL CAR
and SEASON = DRY
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION3
Rule 5 If TYPE = MOTOCYCLE
and CARELESSDRIVING = TRUE then LOCATION3
Rule 6 If ROAD-PROBLEM = TRUE
and TYPE = SMALL CAR
and TIME = AFTERNOON
and SEASON = DRY
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and TYREBURST = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION3
Rule 7 If TYREBURST = TRUE
and SEASON = DRY
and TIME = MORNING
and TYPE = HAEVY VEHICLE
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION3
Rule 8 If UNKNOWN-CAUSES = TRUE
and TYPE = SMALL CAR
and TIME = MORNING
and SEASON = DRY then LOCATION3
Rule 9 If TYREBURST = TRUE
and TYPE = HAEVY VEHICLE
and TIME = AFTERNOON
and SEASON = DRY
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION3
Rule 10 If TIME = MORNING
and OVERSPEEDING = TRUE
and TYPE = SMALL CAR
and SEASON = DRY
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and TYREBURST = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION3
Rule 11 If TYREBURST = TRUE
and TIME = EVENING
and TYPE = SMALL CAR then LOCATION3
Rule 12 If TYREBURST = TRUE
and TYPE = HAEVY VEHICLE
and TIME = AFTERNOON
and SEASON = WET
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION3
Rule 13 If TIME = MORNING
and LOSS-OF-CONTROL = TRUE
and TYPE = HAEVY VEHICLE
and SEASON = DRY
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and TYREBURST = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION3
Rule 14 If UNKNOWN-CAUSES = TRUE
and TYPE = SMALL CAR
and TIME = MORNING
and SEASON = WET
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and TYREBURST = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION3
Rule 15 If TYREBURST = TRUE
and TYPE = HAEVY VEHICLE
and SEASON = WET
and TIME = EVENING
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION3
Rule 16 If TIME = MORNING
and TYREBURST = TRUE
and TYPE = HAEVY VEHICLE
and SEASON = WET
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION3
Rule 17 If CARELESSDRIVING = TRUE
and TYPE = HAEVY VEHICLE
and SEASON = DRY then LOCATION3
Rule 18 If TIME = MORNING
and TYPE = SMALL CAR
and SEASON = DRY
and CARELESSDRIVING = FALSE
and WRONG-OVERTAKING = FALSE
and LOSS-OF-CONTROL = FALSE
and TREE-OBSTRUCTION = FALSE
and BRAKE-FAILURE = FALSE then LOCATION3
Rule 19 If TIME = NIGHT then LOCATION2
Rule 20 If WRONG-OVERTAKING = TRUE
and TYPE = SMALL CAR then LOCATION2
Rule 21 If TIME = EVENING
and CARELESSDRIVING = TRUE then LOCATION2
Rule 22 If TIME = EVENING
and UNKNOWN-CAUSES = TRUE then LOCATION2
Rule 23 If TIME = EVENING
and LOSS-OF-CONTROL = TRUE then LOCATION2
Rule 24 If TIME = EVENING
and ROBBERY-ATTACK = TRUE then LOCATION2
Rule 25 If TIME = EVENING
and TYPE = HAEVY VEHICLE
and SEASON = DRY then LOCATION2
Rule 26 If SEASON = WET
and TYPE = MOTOCYCLE then LOCATION2
Rule 27 If SEASON = WET
and OVERSPEEDING = TRUE
and TIME = MORNING then LOCATION2
Rule 28 If TYREBURST = TRUE
and SEASON = WET
and TYPE = SMALL CAR then LOCATION2
Rule 29 If TYREBURST = TRUE
and SEASON = WET
and TIME = MORNING
and TYPE = HAEVY VEHICLE
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION2
Rule 30 If TYPE = HAEVY VEHICLE
and ROBBERY-ATTACK = TRUE then LOCATION2
Rule 31 If TYPE = HAEVY VEHICLE
and OVERSPEEDING = TRUE
and TIME = AFTERNOON then LOCATION2
Rule 32 If TYREBURST = TRUE
and SEASON = WET
and TIME = EVENING
and TYPE = HAEVY VEHICLE
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION2
Rule 33 If TYREBURST = TRUE
and SEASON = WET
and TYPE = HAEVY VEHICLE
and TIME = AFTERNOON
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION2
Rule 34 If TYPE = HAEVY VEHICLE
and TIME = EVENING then LOCATION2
Rule 35 If TYPE = HAEVY VEHICLE
and OVERSPEEDING = TRUE
and TIME = MORNING
and SEASON = DRY
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and TYREBURST = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION2
Rule 36 If TYREBURST = TRUE
and TIME = AFTERNOON
and TYPE = SMALL CAR
and SEASON = DRY
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION2
Rule 37 If BRAKE-FAILURE = TRUE
and TYPE = MOTOCYCLE then LOCATION2
Rule 38 If WRONG-OVERTAKING = TRUE
and TIME = AFTERNOON then LOCATION2
Rule 39 If TREE-OBSTRUCTION = TRUE
and TIME = MORNING then LOCATION2
Rule 40 If BROKEN-SPRING = TRUE
and TYPE = HAEVY VEHICLE
and TIME = MORNING
and SEASON = DRY
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and TYREBURST = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION2
Rule 41 If TYPE = HAEVY VEHICLE
and TYREBURST = TRUE
and TIME = AFTERNOON
and SEASON = DRY
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION2
Rule 42 If LOSS-OF-CONTROL = TRUE
and TIME = MORNING
and TYPE = SMALL CAR then LOCATION2
Rule 43 If UNKNOWN-CAUSES = TRUE
and TYPE = HAEVY VEHICLE
and SEASON = DRY then LOCATION2
Rule 44 If OVERSPEEDING = TRUE
and TIME = AFTERNOON
and SEASON = WET then LOCATION2
Rule 45 If TYPE = HAEVY VEHICLE
and LOSS-OF-CONTROL = TRUE
and TIME = MORNING
and SEASON = DRY
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and TYREBURST = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION2
Rule 46 If SEASON = WET
and LOSS-OF-CONTROL = TRUE
and TIME = AFTERNOON
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and TYREBURST = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE
and TYPE = HAEVY VEHICLE then LOCATION2
Rule 47 If CARELESSDRIVING = TRUE
and TIME = AFTERNOON
and TYPE = SMALL CAR then LOCATION2
Rule 48 If OVERSPEEDING = TRUE
and TIME = AFTERNOON
and TYPE = SMALL CAR
and SEASON = DRY
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and TYREBURST = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION2
Rule 49 If SEASON = WET
and TIME = EVENING
and TYPE = SMALL CAR
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and LOSS-OF-CONTROL = FALSE
and TYREBURST = FALSE
and OVERSPEEDING = TRUE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION2
Rule 50 If TYPE = HEAVY VEHICLE
and LOSS-OF-CONTROL = TRUE
and TIME = AFTERNOON
and SEASON = DRY
and WRONG-OVERTAKING = FALSE
and CARELESSDRIVING = FALSE
and TYREBURST = FALSE
and OVERSPEEDING = FALSE
and TREE-OBSTRUCTION = FALSE
and PUSHED-BY-A-CAR = FALSE
and BROKEN-SHAFT = FALSE
and BROKEN-SPRING = FALSE
and BRAKE-FAILURE = FALSE
and ROAD-PROBLEM = FALSE
and UNKNOWN-CAUSES = FALSE
and ROBBERY-ATTACK = FALSE then LOCATION2
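To apply the Prism rules to a new accident record (the deployment stage of Section 3.4.3), the rules can be treated as an ordered decision list in which the first rule whose conditions all hold gives the predicted location. That first-match ordering is an assumption about how the list is used, and only four rules are transcribed in the Python sketch below. One consequence is worth noticing: Rule 9 and Rule 41 have identical conditions but predict LOCATION3 and LOCATION2 respectively, so under first-match ordering a heavy vehicle with a tyre burst on a dry afternoon is classified by Rule 9.

```python
# Cause attributes that appear as "= FALSE" conditions in the long rules.
CAUSES = ["WRONG-OVERTAKING", "CARELESSDRIVING", "LOSS-OF-CONTROL", "TYREBURST",
          "OVERSPEEDING", "TREE-OBSTRUCTION", "PUSHED-BY-A-CAR", "BROKEN-SHAFT",
          "BROKEN-SPRING", "BRAKE-FAILURE", "ROAD-PROBLEM", "UNKNOWN-CAUSES",
          "ROBBERY-ATTACK"]


def only(cause):
    """Conditions: the given cause is TRUE and every other cause is FALSE."""
    return {c: ("TRUE" if c == cause else "FALSE") for c in CAUSES}


# (rule number, conditions, predicted class), in the printed order.
# Value spellings such as "HAEVY VEHICLE" are copied from the WEKA output.
RULES = [
    (1, {"BROKEN-SHAFT": "TRUE"}, "LOCATION3"),
    (9, {"TYPE": "HAEVY VEHICLE", "TIME": "AFTERNOON", "SEASON": "DRY",
         **only("TYREBURST")}, "LOCATION3"),
    (19, {"TIME": "NIGHT"}, "LOCATION2"),
    (41, {"TYPE": "HAEVY VEHICLE", "TIME": "AFTERNOON", "SEASON": "DRY",
          **only("TYREBURST")}, "LOCATION2"),
]


def predict(instance, rules=RULES):
    """First-match decision list: return (rule number, class) of the first
    rule whose conditions all hold, or (None, None) if none applies."""
    for number, conditions, label in rules:
        if all(instance.get(a) == v for a, v in conditions.items()):
            return number, label
    return None, None


truck = {"TYPE": "HAEVY VEHICLE", "TIME": "AFTERNOON", "SEASON": "DRY",
         **only("TYREBURST")}
print(predict(truck))                       # (9, 'LOCATION3')
print(predict({**truck, "TIME": "NIGHT"}))  # (19, 'LOCATION2')
```

Returning `(None, None)` when no rule fires mirrors the `null` leaves of the Id3 tree: the model simply has no prediction for combinations it never saw.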

5. Discussion

The Prism rule learner generated 50 rules. Rules 1–18 indicate the occurrence of accidents in Location 3, and Rules 19–50 indicate the occurrence of accidents in Location 2. This indicates that Location 2 has the highest number of road accidents, particularly involving heavy vehicles in the afternoon during the dry season.
Rule 41 is the best rule for prediction: it states that tyre burst is the cause of road accidents involving heavy vehicles within Location 2 in the afternoon during the dry season.
Decision Tree Performance Analysis on Id3
Table 5.1. Detailed Accuracy by Class
Class          TP rate  FP rate  Precision  Recall  F-measure  ROC area
Location (3)   0.688    0.069    0.733      0.688   0.71       0.942
Location (2)   0.897    0.361    0.78       0.897   0.834      0.888
Location (1)   0.517    0.025    0.833      0.517   0.638      0.95
Weighted avg.  0.777    0.232    0.78       0.777   0.769      0.912
Table 5.2. Confusion matrix
                  Predicted category
Actual category   Location (3)  Location (2)  Location (1)
Location (3)      22            10            0
Location (2)      6             78            3
Location (1)      2             12            15
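The per-class figures in Table 5.1 follow directly from the confusion matrix in Table 5.2. The Python sketch below recomputes them, treating rows as actual classes and columns as predicted classes, as in the table. The F-measure uses the form 2TP/(2TP+FP+FN), which equals the harmonic mean of precision and recall.

```python
# Table 5.2 (Id3): rows are actual classes, columns predicted classes,
# both ordered Location (3), Location (2), Location (1).
CLASSES = ["Location (3)", "Location (2)", "Location (1)"]
ID3 = [[22, 10, 0],
       [6, 78, 3],
       [2, 12, 15]]


def per_class_metrics(cm):
    """Return (metrics, n), where metrics holds one tuple per class:
    (TP rate/recall, FP rate, precision, F-measure, class size)."""
    n = sum(map(sum, cm))
    k = len(cm)
    metrics = []
    for i in range(k):
        tp = cm[i][i]
        actual = sum(cm[i])                           # row total
        predicted = sum(cm[r][i] for r in range(k))   # column total
        fp, fn = predicted - tp, actual - tp
        recall = tp / actual
        precision = tp / predicted
        fp_rate = fp / (n - actual)
        f_measure = 2 * tp / (2 * tp + fp + fn)       # equals 2PR / (P + R)
        metrics.append((recall, fp_rate, precision, f_measure, actual))
    return metrics, n


metrics, n = per_class_metrics(ID3)
for name, (rec, fpr, prec, f, _) in zip(CLASSES, metrics):
    print(f"{name}: TP rate={rec:.3f} FP rate={fpr:.3f} "
          f"precision={prec:.3f} F={f:.3f}")
accuracy = sum(ID3[i][i] for i in range(len(ID3))) / n
weighted_f = sum(f * size for (_, _, _, f, size) in metrics) / n
print(f"accuracy={accuracy:.3f} weighted F={weighted_f:.3f}")
```

Run on Table 5.2, the printed rates, precisions and F-measures agree with Table 5.1 to within rounding, and the accuracy is 115/148 ≈ 0.777. Passing the Table 5.4 matrix instead gives 104 correct classifications out of 148 (≈ 0.703) for FT. The ROC areas cannot be recovered this way, because they depend on the classifiers' probability scores rather than on the final predictions alone.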
Decision Tree Performance Analysis on Functional Tree (FT)
Table 5.3. Detailed Accuracy by Class
Class          TP rate  FP rate  Precision  Recall  F-measure  ROC area
Location (3)   0.625    0.086    0.667      0.625   0.645      0.869
Location (2)   0.77     0.361    0.753      0.77    0.761      0.736
Location (1)   0.586    0.101    0.586      0.586   0.586      0.832
Weighted avg.  0.703    0.25     0.702      0.703   0.702      0.783
Table 5.4. Confusion matrix
                  Predicted category
Actual category   Location (3)  Location (2)  Location (1)
Location (3)      20            12            0
Location (2)      8             67            12
Location (1)      2             10            17

6. Conclusions

Analysis of the accident data collected on the Lagos–Ibadan road with WEKA showed that a decision tree can accurately predict the cause(s) of accidents and accident-prone locations along this road, and along other roads if relevant data are gathered and analysed in the same way.
In the decision tree performance analysis, the dataset was tested with two algorithms, Id3 and FT (Functional Tree). For the Id3 algorithm, there were 115 correctly classified instances and 33 incorrectly classified instances, representing 77.70% and 22.30% respectively. The mean absolute error was 0.1835 and the root mean squared error was 0.3029.
For the Functional Tree algorithm (FT), the tree size was 5, with 104 correctly classified instances (70.27%) and 44 incorrectly classified instances (29.73%).
From the detailed accuracy by class and the confusion matrices, Id3 attained an accuracy of 0.777 and FT an accuracy of 0.703.

References

[1]  Akomolafe et al. (2009). "Enhancing road monitoring and safety through the use of geo spatial technology". International Journal of Physical Sciences, Vol. 4(5), pp. 343-348.
[2]  Akomolafe, O.P. (2004). Predicting possibilities of road accidents occurring, using neural network. M.Sc. Thesis, Department of Computer Science, University of Ibadan.
[3]  Abdalla, I.M., Raeside, R., Barker, D. and McGuigan, D.R.D. (1997). An investigation into the relationships between area social characteristics and road accident casualties. Accident Analysis and Prevention, Vol. 29(5), pp. 583-593.
[4]  Gelfand, S.G., Ravishanker, C.S. and Delp, E.J. (1991). An iterative growing and pruning algorithm for classification tree design. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13(2), pp. 163-174.
[5]  Han, J. and Kamber, M. (2001). Data Mining: Concepts and Techniques. Morgan Kaufmann, Academic Press.
[6]  Hand, D., Mannila, H. and Smyth, P. (2001). Principles of Data Mining. The MIT Press.
[7]  Kim, K., Nitz, L., Richardson, J. and Li, L. (1995). Personal and behavioral predictors of automobile crash and injury severity. Accident Analysis and Prevention, Vol. 27(4), pp. 469-481.
[8]  Martin, P.G., Crandall, J.R. and Pilkey, W.D. (2000). Injury trends of passenger car drivers in the USA. Accident Analysis and Prevention, Vol. 32, pp. 541-557.
[9]  Ossenbruggen, P.J., Pendharkar, J. and Ivan, J. (2001). Roadway safety in rural and small urbanized areas. Accident Analysis and Prevention, Vol. 33(4), pp. 485-498.