
Society of Information Technology Students Journal

This study aimed to develop an accurate forecasting model to predict the academic performance and probability of Bachelor of Science in Computer Science (BSCS) students graduating within four years using data mining techniques. Descriptive research methodology was used to analyze student data variables such as grades in mathematics, science, major subjects and general education subjects. Correlation and multiple linear regression analysis found the model to have high accuracy, explaining 79% of the variability in graduation probability. The proposed model could help identify students needing additional academic support to improve performance and graduation rates.


PROPOSED FORECASTING MODEL FOR THE STUDENTS ACADEMIC PERFORMANCE OF BSCS STUDENTS IN NEW ERA UNIVERSITY
Teddy Eddie Q. Dispo Jr.
Libis Dike 1, Brgy. Balite St., Montalban, Rodriguez Rizal
[email protected]

Re#iel Kenneth P. Tugan
#4 Manansala St., Krus na Ligas, Diliman, Quezon City
[email protected]

ABSTRACT

Data mining is accepted as a decision-making tool that facilitates better resource utilization with respect to students' performance. It is essential for decision-makers to obtain early feedback on academic performance and on the effectiveness of different learning strategies. In this paper, data from Computer Science students were taken and various data mining methods were applied, with the aim of improving students' academic performance and reversing the decline in the population of Computer Science students from first year to fourth year. The descriptive method was used to analyze the data and to forecast which Bachelor of Science in Computer Science students will finish the course within four years and graduate on time. To ensure impartiality of the data, the researchers used all elements of the population as the sample, making it more inclusive and representative so that the study has sufficient and adequate data for greater statistical efficiency. The aim of this study is to apply different data mining techniques and determine the model that best fits the forecasting of students' academic performance. Two decision tree methods were used because they produce rules that are easy to interpret; of these, the ID3 algorithm gave 98.48% accurate results.

Keywords: Data Mining, Classification, Forecasting, Decision tree, Regression, Performance

INTRODUCTION

The ability to predict a student's academic performance is very important in educational environments. Effective prediction of student performance calls for models that include personal, social, psychological and other environmental variables.
Predicting student performance with high accuracy helps identify which students need special attention in their studies. The identified students should then be given more assistance by the teacher so that their performance improves in the future [1]. Data mining extracts interesting, non-trivial, implicit, previously unknown and potentially useful information or patterns from data. It can be applied to a number of different tasks, such as data summarization, learning classification rules, finding associations, analyzing changes and detecting anomalies [2]. Data mining is a data analysis methodology used to identify hidden knowledge in large databases, and it has been used successfully in different areas, including the educational environment. It is used to study students' performance and provides many tasks that can be used in predicting and forecasting academic performance. The reasons for good or bad student performance should be one of the main interests of teachers, who can plan and customize their teaching program based on feedback from the students [3]. Data mining is one of the powerful analytical approaches that can provide effective assistance in revealing the complex relationships behind students' grades and performance [4].



METHODOLOGY

2.1 DESCRIPTIVE RESEARCH

This study described the phenomena and quantitatively analyzed the main features of a collection of information. A descriptive study is one in which information is collected without changing the environment, and it can involve a one-time interaction with the groups. Correlational research determines the relationship between two or more variables: data are collected on the variables and correlational statistical techniques are then applied [5]. The researchers used all elements of the population as the sample, making it more inclusive and representative so that the study has sufficient and adequate data for greater statistical efficiency. The researchers also used different statistical tools to evaluate the forecasting model, namely Percentage, Mean, Standard Deviation, Percentage Error, T-test, MAPE (mean absolute percentage error) and Multiple Linear Regression. Statistical software packages such as RapidMiner, SPSS and WEKA were used to process the data for faster and more reliable results.

RESEARCH FRAMEWORK

Figure 1: Framework for Academic Performance

The data from the student or applicant are stored in a database. The system gets the data from the database and from flat files and combines them to determine which indicators or predictors will be used. The large data set is filtered through cleansing and transformation so that the predictors yield the input values for the forecasting model, which predicts the probability that a student will finish the Bachelor of Science in Computer Science course within four years, and identifies which students will not. The decision variables serve as the independent variables in this study, and the probability of graduating is the dependent variable. Pattern recognition provides a reasonable answer for all possible inputs, and the decision makers are involved in the visualization and validation of the results for the probability of graduating. As a whole, the decision makers can influence decisions and can iterate the process of the proposed study to make the model more efficient and accurate.
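The cleanse-and-transform step of the framework can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the record layout (student, subject code, grade), the `CATEGORY` mapping, and the sample subject codes are hypothetical, and the grouping into three subject categories follows the simplification the study describes later. It is not the researchers' actual pipeline, which was run in RapidMiner, SPSS and WEKA.

```python
from statistics import mean

# Hypothetical mapping of subject codes to three subject categories.
CATEGORY = {
    "MAT_171": "mat_sci",
    "PHY_1": "mat_sci",
    "CS_231": "major",
    "CS_332": "major",
    "ENGL_1": "gen_ed",
    "FIL_1": "gen_ed",
}


def cleanse(records):
    """Keep only rows with a known subject code and a numeric grade."""
    clean = []
    for student, code, grade in records:
        if code not in CATEGORY:
            continue  # unknown subject code
        try:
            clean.append((student, code, float(grade)))
        except (TypeError, ValueError):
            continue  # missing or non-numeric grade, e.g. "INC"
    return clean


def transform(clean):
    """Average each student's grades per category (the model inputs)."""
    buckets = {}
    for student, code, grade in clean:
        buckets.setdefault(student, {}).setdefault(CATEGORY[code], []).append(grade)
    return {
        student: {cat: mean(grades) for cat, grades in cats.items()}
        for student, cats in buckets.items()
    }


# Hypothetical raw rows, as they might arrive from a database or flat file.
sample_records = [
    ("s1", "MAT_171", "2.0"),
    ("s1", "PHY_1", "3.0"),
    ("s1", "CS_231", "INC"),
    ("s2", "ENGL_1", 1.5),
]
predictors = transform(cleanse(sample_records))
```

Separating cleansing from transformation keeps each step easy to re-run when the decision makers iterate on the model, which is the loop the framework describes.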



EXPECTED OUTPUT

This section covers the analysis, interpretation and implications of the findings from the data gathered by the researchers, and looks ahead to the probable outcomes that can trigger and modify a process. It also discusses the types of testing performed on the forecasting model in this study. The data used in the study were tabulated and placed into a data file using statistical software packages.

4.1 PREDICTORS IN FORECASTING STUDENTS ACADEMIC PERFORMANCE

The variables used in this study were divided into two types: independent and dependent. An independent variable, also known as a predictor variable, represents the inputs or causes being tested, while a dependent variable represents the output or effect. The researchers had an internal variable, the profile of the respondents, which includes student name, student number, subjects/subject codes and grades. These variables were considered the predictors, or independent variables, for BSCS students who can finish the course within four years, while graduation was the dependent variable, or output, used in this study.

The predictors considered were the subject codes from mathematics and science subjects, major subjects, and general education subjects, covering the Computer Science curriculum from first year to fourth year: CS_TECH, NSTP1, ENGL_1, CS_231 GWA, MAT_172, CS_442, CS_132 GWA, CS_242, PE_3, NSTP2, ENGL_8, ENGL_2, MAT_171, CS_344, PHILO_1, CS_335 GWA, PE_2, CS_433, CS_434, MAT_341, CS_332, CS_242 GWA, PE_4, FIL_2A, POL_SCI_2, CS_142 GWA, PHY_2 GWA, CS_341 GWA, CS_141 GWA, ENGL_4, VALUES, CS_341 GWA, CS_241 GWA, PHY_1 GWA, CS_233 GWA, FIL_1, CS_331 GWA, LIT_1, CS_333, CS_432, CS_232 GWA and CS_342 GWA. These variables can be considered to have an influence on the performance of students [6].

4.2 CORRELATIONS OF THE PREDICTORS TO THE ACADEMIC PERFORMANCE OF BSCS STUDENTS

Correlation describes the degree of correspondence between two or more variables. The bivariate correlation test requires that both variables have a scale level of measurement, so that the values can be ordered and the distance between values can be determined [7]. The researchers simplified the predictor variables into three



categories: Mathematics and Science (Mat & Sci.) subjects, Major subjects, and General Education (GenEd) subjects.

Pearson's correlation between variables is a measure of how well they are related. The most common measure of correlation in statistics is the Pearson correlation (technically, the Pearson Product Moment Correlation, or PPMC). It shows the linear relationship between two sets of data. There is a strong relationship between the variables if the Pearson coefficient is close to 1 in absolute value, meaning that changes in one variable are strongly correlated with changes in the second variable. The Sig. (2-tailed) value tells whether there is a statistically significant correlation between the variables: if it is less than .01, the correlation is concluded to be significant. In this case, the Pearson coefficient is .650 for Major subjects, .449 for Mat & Sci. subjects and .559 for GenEd subjects, which means that the Major and GenEd subjects show a moderate association while the Mat & Sci. subjects are weakly correlated. The Sig. (2-tailed) value for Major, Mat & Sci. and GenEd subjects is .000, which means that the correlations for all three categories are significant.

4.3 DATA MINING TECHNIQUES AND ALGORITHMS

4.3.1 REGRESSION

Regression analysis is a statistical technique for studying linear relationships among variables and for predicting a continuous dependent variable from a number of independent variables. The researchers used regression analysis to help understand how the typical value of the dependent variable changes when any one of the independent variables is varied, and to model the relationship between scalar variables.

Figure 2: Model Summary in Multiple Linear Regression

R is the multiple correlation coefficient of the Multiple Linear Regression model, and R square measures how close the data are to the fitted regression line. From the historical data of the respondents, the value of R is .780 and R square is .609, which means that the model explains about 61% of the variability of the response data around its mean. The model summary reports R = .780, R square = .609 and adjusted R square = .419; in general, the higher the R square, the better the model fits the data. An R square of 0% would mean that the model explains none of the variability of the response data around its mean. The standard error of the estimate is closely related to the standard deviation. Here the standard error of the estimate is .063, which the researchers read as a deviation of almost 6% from a 100% accurate model, that is, an accuracy of about 94%. They consider 6% acceptable because the true value of the standard deviation is usually unknown; in such cases it is important to be clear about what has been done and to take proper account of the fact that the standard error is only an estimate. The researchers tested the accuracy of the Multiple Linear Regression using MAPE (mean absolute percentage error).

The normal probability plot compares the distribution of the residuals to a normal distribution in order to assess whether a data set is approximately normally distributed. The data are plotted against a theoretical normal distribution in such a way that the points should form an approximate straight line. The diagonal line represents the normal distribution. The closer the observed cumulative probabilities of the residuals are to this line, the closer the distribution of the residuals is to the normal distribution.

Figure 3: Normal Probability Plot Using Multiple Linear Regression

The plot of expected cumulative probability shows an S shape that matches the pattern of a set of paired data. The researchers believe that this indicates a long-tailed departure from the normal distribution, because the curve starts below the normal line, bends to follow it and ends above. This means more variance than would be expected in a normal distribution, and the researchers agree that the normal distribution can be improved upon as a model for testing.

4.3.2 DECISION TREE

A decision tree creates a tree-based classification model. It classifies cases into groups or predicts values of a dependent variable based on values of independent variables. The procedure provides validation tools for exploratory and confirmatory classification analysis [8]. The researchers used two decision tree methods, the CHAID and ID3 algorithms. CHAID can be used for prediction as well as classification, and for detecting interaction between variables, while ID3 uses the information gain measure to choose the splitting attribute. CHAID (Chi-squared Automatic Interaction Detection) chooses the independent predictor variable that has the strongest interaction with the dependent variable, while ID3 constructs the decision tree by a top-down, greedy search through the given sets, testing each attribute at every tree node.
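The information gain criterion that ID3 uses to pick a splitting attribute can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation the researchers ran: the attribute names (`major`, `gen_ed`), the `on_time` target and the four sample rows are hypothetical, and grades are assumed to have been discretized into "high" and "low" beforehand.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on one attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_split(rows, attributes, target):
    """Greedy ID3 choice: the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical discretized records: did the student graduate on time?
rows = [
    {"major": "high", "gen_ed": "high", "on_time": "yes"},
    {"major": "high", "gen_ed": "low", "on_time": "yes"},
    {"major": "low", "gen_ed": "high", "on_time": "no"},
    {"major": "low", "gen_ed": "low", "on_time": "no"},
]
root_attribute = best_split(rows, ["major", "gen_ed"], "on_time")
```

In this toy data the `major` attribute separates the classes perfectly, so it is chosen for the root node; a full ID3 tree would repeat the same choice recursively on each resulting subset.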

Figure 4: Model Summary Using CHAID Method

The researchers used the CHAID method to test, for each predictor, whether its categories differ significantly with respect to the dependent variable. Figure 4 indicates that only one of the selected independent variables, OJT_441, made a significant enough contribution to be included in the model.
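The chi-squared test behind CHAID's choice of predictor can be sketched as follows. This is only an illustration of the statistic, assuming hypothetical attribute names and sample rows; it computes the Pearson chi-squared statistic for a contingency table and does not reproduce the category merging or p-value adjustment that a full CHAID procedure, such as the one in SPSS, performs.

```python
from collections import Counter


def chi_square(rows, attribute, target):
    """Pearson chi-squared statistic for the attribute-by-target table."""
    n = len(rows)
    observed = Counter((row[attribute], row[target]) for row in rows)
    attr_totals = Counter(row[attribute] for row in rows)
    target_totals = Counter(row[target] for row in rows)
    stat = 0.0
    for a, a_count in attr_totals.items():
        for t, t_count in target_totals.items():
            # Expected cell count if attribute and target were independent.
            expected = a_count * t_count / n
            stat += (observed[(a, t)] - expected) ** 2 / expected
    return stat


# Hypothetical discretized records: did the student graduate on time?
rows = [
    {"major": "high", "gen_ed": "high", "on_time": "yes"},
    {"major": "high", "gen_ed": "low", "on_time": "yes"},
    {"major": "low", "gen_ed": "high", "on_time": "no"},
    {"major": "low", "gen_ed": "low", "on_time": "no"},
]
scores = {a: chi_square(rows, a, "on_time") for a in ("major", "gen_ed")}
```

When the candidate predictors have the same number of categories, a larger statistic corresponds to a smaller p-value, so the predictor with the largest score is the one most strongly associated with the outcome; predictors whose association is not significant are left out of the tree, which is how a model can end up with a single variable.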


Figure 5: Model Summary Produced by J48 Decision Tree

J48, WEKA's implementation of the C4.5 decision tree, shows the error level when the classifier is applied to the training data. The most important figures in the model summary are the numbers of correctly and incorrectly classified instances. Using the J48 classifier, 62% of instances were correctly classified and 38% were incorrectly classified. The mean absolute error, which measures how close the forecasts or predictions are to the eventual outcomes, is 0.0152. The results using the CHAID method are higher than those of the J48 classifier.
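The summary figures discussed here, together with the MAPE named in the methodology, follow from simple definitions. The sketch below uses their generic textbook forms on hypothetical sample values; it is not a reproduction of how WEKA or SPSS computed the numbers reported in the study.

```python
def classification_summary(actual, predicted):
    """Percentage of correctly and incorrectly classified instances."""
    total = len(actual)
    correct = sum(a == p for a, p in zip(actual, predicted))
    return {
        "correct_pct": 100 * correct / total,
        "incorrect_pct": 100 * (total - correct) / total,
    }


def mean_absolute_error(actual, forecast):
    """Average absolute difference between forecasts and outcomes."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error; actual values must be non-zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical class labels and numeric forecasts, for illustration only.
summary = classification_summary(
    ["yes", "yes", "no", "no", "yes"],
    ["yes", "no", "no", "no", "yes"],
)
mae = mean_absolute_error([1.0, 0.0, 1.0, 0.0], [0.75, 0.25, 0.5, 0.0])
error_pct = mape([100.0, 200.0], [90.0, 220.0])
```

Read the way the study reads its error figures, an accuracy level is 100 minus the percentage error, so a MAPE of 10% in this toy example would be reported as 90% accuracy.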

Table 3 shows the accuracy of the CHAID, ID3 and multi-layer feed-forward algorithms for classification applied to the data. The CHAID technique has the highest accuracy, 72.6%, compared to the other methods. The ID3 algorithm also showed an acceptable level of accuracy, while multi-layer feed-forward has the lowest accuracy, 50.8% [9].

Table 4 shows the accuracy and efficiency of the model. The ID3 technique has the lowest percentage error, 0.0152 or 1.52%, which indicates that the accuracy level of the given model is 98.48% [10]. The CHAID method also showed an acceptable level of accuracy. For Multiple Linear Regression, the researchers used the Mean Absolute Percentage Error (MAPE) to calculate the efficiency of the model, which resulted in a percentage error of 2.93%. This means that the accuracy level using regression analysis is 97.07%. The multi-layer feed-forward algorithm showed the highest percentage error.

CONCLUSION

This study could be a great help to Computer Science students and teachers in improving students' academic performance, reducing the failure rate, better understanding students' behavior, and improving teaching. It can help build confidence in data mining techniques so that present education systems may adopt them as a strategic management tool. Grade point average (GPA) is used in higher learning institutions to discover knowledge from education data, and students' performance plays an important role in producing the best quality graduates. Academic achievement and grades are the main factors that can secure a stable job in life, and all students must give their greatest effort. When the variables are simplified into three categories, Mathematics & Science, Major, and General Education subjects, there is a significant relationship between them. The results of this study indicate that data mining techniques provide effective tools for improving students' academic performance. They show how useful data mining can be in higher learning institutions, especially Decision tree and Regression, which predict a number and estimate the value of the target as a function of the predictors for each case in the build data. SPSS also supports the entire analytical process, from planning to data collection, analysis, reporting and deployment of the results.

RECOMMENDATION

Based on the summary of findings and conclusions of the study, the researchers recommend applying this forecasting model to external variables that can influence students' grades, such as location, social factors, behavior, and family support. They also recommend applying other data mining techniques to an expanded data set with more distinctive attributes to get more accurate and efficient results. Data mining techniques can be applied in the educational field to develop performance monitoring and evaluation tools.

REFERENCES
[1] Bhardwaj, B.K. and Pal, S. 2011. Data Mining: A prediction for performance improvement using classification. International Journal of Computer Science and Information Security.
[2] Hegland, M., Roberts, S., and Williams, G. A Data Mining Tutorial.
[3] Singh, R., Tiwari, M. and Vimal, N. 2013. An Empirical Study of Application of Data Mining Techniques for Predicting Student Performance in Higher Education. International Journal of Computer Science and Mobile Computing.
[4] Tiwari, M., and Vimal, N. Evaluation of Student Performance by an Application of Data Mining Techniques.
[5] http://www.ask.com/question/definition-of-descriptive-correlational-research
[6] Ahmad, W.F., Azrai, A., Nayan, Y., Nordin, S., and Yahya, N. 2012. A Conceptual Framework in Examining the Contributing Factors to Low Academic Achievement: Self-Efficacy, Cognitive Ability, Support System and Socio-Economic. International Conference on Management, Social Science and Humanities 2012.
[7] http://cook.library.towson.edu/helpguides/guides/correlationspss.pdf
[8] Chuchra, R. 2012. Use of Data Mining Techniques for the Evaluation of Student Performance: A Case Study.
[9] Bharadwaj, B., Pal, S., and Yadav, S.K. 2011. Data Mining Applications: A Comparative Study for Predicting Students' Performance. International Journal of Innovative Technology and Creative Engineering.
[10] Tiwari, M., and Vimal, N. Evaluation of Student Performance by an Application of Data Mining Techniques.
