Volume 8 Issue 9
Abstract: With the development of communication technology, requirements on communication quality keep increasing. Traditional array antennas require complex feed networks and phase shifters, while conventional reflector antennas are bulky and difficult to manufacture. It is therefore necessary to analyze the characteristics of reflectarray antennas to meet future needs. The microstrip antenna has the advantages of small size, simple structure, and low profile. This paper therefore introduces the working principle and design process of the reflective microstrip array antenna in detail. A dual-loop unit antenna operating at f = 4.5 GHz was designed, which simplifies the shape of the antenna and achieves a beam pointing of 30°. Compared with similar designs in the literature, the new unit antenna has a simple structure, can realize beam orientation without a phase shifter, can work in the low-frequency range of 5G, and has high engineering value.
Keywords: Beam orientation; Simple structure; Phase shifter; Microstrip antenna; 5G communication
1. INTRODUCTION
With the rapid development of modern microwave and satellite communication, modern society's requirements for flexible communication systems keep rising, while the shape of the reflector increases the difficulty of production. The microstrip reflectarray combines the advantages of a reflector antenna and an array antenna. Therefore, this paper designs a microstrip unit with a simple double-ring structure, which can serve communication systems in the 5G low-frequency band and has high engineering practical value.
www.ijsea.com 340
International Journal of Computer Applications Technology and Research
Volume 8–Issue 09, 340-347, 2019, ISSN:-2319–8656
The reflectarray antenna achieves beam pointing by assigning an appropriate feed phase to each radiating element on the reflective surface [8-10]. The required compensation phase of the i-th element is

φ_i = k0 (R_i − r_i · r0) + 2Nπ

where k0 is the free-space wavenumber, R_i is the distance from the phase center of the feed to the position of the i-th patch, r_i represents the position vector from the center of the array to the i-th patch, r0 represents the unit vector along the outgoing main beam, and N is an integer.

2.2.1 Unit antenna model
An isolated unit model, which does not consider the mutual coupling effects of surrounding elements, directly uses plane waves to excite individual isolated elements, and obtains the phase delay generated by electromagnetic waves in the unit according to the phase contrast between the reflected waves and the incident waves. F. Venneri et al. extracted a square
variable-size unit antenna using an isolated unit model. This model is fast to compute, but its drawback is that the analysis only applies to large cell spacings, so that mutual coupling can be neglected.

(2) Master-slave boundary method
The infinite periodic array is simulated by two pairs of master-slave boundaries with a Floquet port.
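The phase relation given in Section 2 can be sketched numerically. The following is a hypothetical helper, not the authors' code: positions are in metres, relative to the array centre, and the returned phase is wrapped to [0, 2π), which absorbs the 2Nπ term.

```python
import numpy as np

def required_phase(feed_pos, elem_pos, beam_dir, freq_hz):
    """Compensation phase phi_i = k0*(R_i - r_i . r0), wrapped to [0, 2*pi).

    feed_pos: feed phase centre; elem_pos: i-th patch position r_i;
    beam_dir: unit vector r0 along the outgoing main beam.
    """
    c = 3.0e8                                   # free-space speed of light
    k0 = 2.0 * np.pi * freq_hz / c              # free-space wavenumber
    R_i = np.linalg.norm(np.asarray(feed_pos) - np.asarray(elem_pos))
    phase = k0 * (R_i - np.dot(elem_pos, beam_dir))
    return phase % (2.0 * np.pi)
```

For a patch at the array centre with the feed 0.1 m above it, only the k0·R_i term contributes, reduced modulo 2π.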
the literature [11], where the phase shift range exceeds 450°. The literature [12] proposed a windmill-type unit with a phase shift range of 700°. However, these two units are complicated in structure and difficult to manufacture. Therefore, this paper designs a new type of unit with a double-loop structure, as follows.

The unit consists of two annular patches: the inner ring is a square ring, the outer ring is a circular ring, and there is a gap between the upper and lower sides. The substrate is Rogers RT/duroid 5880 with thickness t = 2 mm and dielectric constant 2.2. (The relevant dimensions are marked in the figure.)

3.2 Influence of unit structure parameters on phase shift characteristics
(1) Effect of thickness t on phase shifting performance.

(2) Effect of parameter d: As can be seen from Fig. 11, as parameter d increases, the outer ring width of the unit antenna increases, the curve becomes steeper, and the resonance points move closer together. Finally, d = 1.9 mm is chosen.

(3) Effect of parameter g on phase shift performance
Table 1: Unit parameters (variables: a, t, hs, h, l, d, g)

The optimized curve is shown below: the phase shift range covers 0~360°, and the linearity meets the requirements.

3.3 Microstrip reflection array design
In this paper, the phase shift performance of the new reflection unit is analyzed. The array adopts a 4×4 layout, the cell spacing is a = 12 mm, and the dielectric constant is 2.2. The beam is directed to θ = 30°, so the phase of the cells in each row and column needs to be compensated:
Table 2: Compensation phase of the cells
Column:  1      2      3      4
Row 1:   19     21.75  22.25  23

A reflection array is made based on the dimensional data.

Figure 14: Microstrip reflective array antenna

Table 3: Parameters of the cells (columns 1-4, by row)
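The row/column compensation can be sketched for the stated design values (f = 4.5 GHz, a = 12 mm, θ = 30°) using the standard progressive-phase formula; this is an illustrative sketch and is not claimed to reproduce the table values above.

```python
import numpy as np

# Assumed design values from the text: 4x4 array, cell spacing a = 12 mm,
# beam tilt theta = 30 degrees, operating frequency f = 4.5 GHz.
f, a, theta = 4.5e9, 12e-3, np.deg2rad(30.0)
k0 = 2.0 * np.pi * f / 3.0e8                  # free-space wavenumber

# Standard progressive phase each column must add along the tilt direction,
# phi(n) = -k0 * n * a * sin(theta), wrapped to [0, 360) degrees.
n = np.arange(4)
phi_deg = np.rad2deg(-k0 * n * a * np.sin(theta)) % 360.0
```

With these numbers the phase step between adjacent cells is 32.4°.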
It can be seen from the figure that the main beam of the pattern is in the direction of 30°, which is consistent with the original design, thus verifying that the double-ring unit can achieve beam orientation.

4. CONCLUSION
This paper proposes a microstrip reflectarray antenna that can be used in the 5G FR1 band, which compensates the phase shift by changing the structure of the antenna unit. The beam pointing angle is set, and the size of each reflective array unit is calculated. Finally, the main beam of the antenna is accurately oriented to the preset 30°, achieving beam directivity. Compared with similar designs in the literature, the unit antenna has a simple structure and is easy to process and design. It can be used in 5G FR1 mobile communication systems and other wireless communication systems, and has high engineering practical value.

... Letters (2016): 1-1.

... "Network." International Journal of Computer Science & Mobile Computing 2.8 (2013).

[4] Huang, J. "Analysis of a microstrip reflectarray antenna for microspacecraft application." Tda

[6] Dahri, M. Hashim, et al. "A Review of Wideband Reflectarray Antennas for 5G Communication Systems." IEEE Access PP.99 (2017): 1-1.

[7] Qin, Pei Yuan, Y. J. Guo, and A. R. Weily. "Broadband Reflectarray Antenna Using Sub-wavelength Elements Based on Double Square Meander-Line Rings." IEEE Transactions on Antennas and Propagation 64.1 (2015): 1-1.

[8] Chaharmir, M. R., and J. Shaker. "Design of a broadband, dual-band, large reflectarray using multi open loop elements." Antennas & Propagation Society International Symposium, IEEE, 2010.

[9] Venneri, F., S. Costanzo, and M. G. Di. "Bandwidth Behavior of Closely Spaced

[12] Encinar, J. A., and J. A. Zornoza. "Broadband design of three-layer printed reflectarrays." IEEE Transactions on Antennas and Propagation 51.7 (2003): 1662-1664.
International Journal of Computer Applications Technology and Research
Volume 8–Issue 09, 348-352, 2019, ISSN:-2319–8656
Abstract: The motorcycle repair shop business continues to grow, along with the number of motorcycle riders in Indonesia. However, the majority of riders do not know where repair shops are, especially in remote locations or in areas they have never visited before. This problem can keep such businesses from lasting long. The Motorcycle Repair Shop Information System application is useful for answering problems related to motorcycle repair shops. "Call for Service" and "Promotion" are the two main features of the application that implement E-CRM. The "Call for Service" feature is used to make emergency calls to the nearest repair shop if there is an unexpected situation on the road. The "Promotion" feature is used as a medium to attract as many customers as possible and to increase customer loyalty by offering attractive promotions to application users. The implementation uses computers with React Native, SQLyog, XAMPP, Visual Studio Code, and Android smartphones. Black-box testing of the application shows that users can use the "Call for Service" and "Promotion" features. The data-growth analysis shows that the application only requires a storage space of 73,746.133 kilobytes (73.746 megabytes) within a year, if there are 25 new data records every day.
Keywords: E-CRM; mobile application; emergency call; promotion; customer loyalty.
www.ijcat.com 348
Table 2 shows an analysis of data growth for the Master Table and Transaction Table groups, under the assumption that there are 25 new data records per day. The results reveal that the Master Table requires a storage space of 55 kilobytes for a day; 1,652.25 kilobytes for 30 days; and 20,102.39 kilobytes for 365 days. On the other hand, the Transaction Table requires 221.225 kilobytes for a day; 6,636.75 kilobytes for 30 days; and 73,746.133 kilobytes for 365 days.
6. CONCLUSION
The Motorcycle Repair Shop Information System application is an Android-based marketplace application that aims to improve the economic level of the repair shop business and to help riders anywhere, at any time, by implementing E-CRM in the "Call for Service" feature. Black-box testing of the application shows that users can use the "Call for Service" feature, and that the application has successfully implemented E-CRM in that feature for making emergency calls. The data-growth analysis shows that the application only requires a storage space of 73,746.133 kilobytes (73.746 megabytes) for 365 days, assuming 25 new data records per day. In the future, the application can still be developed in terms of both display and new features, such as a "live chat" feature that opens when customers use "Call for Service", to make communication between the two parties easier.
International Journal of Computer Applications Technology and Research
Volume 8–Issue 09, 353-357, 2019, ISSN:-2319–8656
Javad Rahmani
Islamic Azad University
Iran
Abstract: In this paper, the particle swarm optimization (PSO) algorithm is proposed to solve the lift gas optimization problem in the crude oil production industry. Two evolutionary algorithms, the genetic algorithm (GA) and PSO, are applied to optimize the gas distribution for the oil lifting problem at a 6-well and a 56-well site. The performance plots of the gas intakes are estimated with an artificial neural network (ANN) in MATLAB. Comparing the simulation results of the evolutionary optimization algorithms with those of the classical methods shows the better performance and faster convergence of the evolutionary methods. Moreover, the convergence rate of PSO is 13 times faster than GA's for this problem.
Keywords: particle swarm optimization; crude oil lifting; lift gas allocation; optimization; artificial neural network; genetic algorithm.
2. EVOLUTIONARY ALGORITHMS
In this section, the genetic algorithm (GA) and the particle swarm optimization (PSO) algorithm are explained in detail. In standard PSO, each particle updates its velocity and position as

v_i(t+1) = w·v_i(t) + c1·r1·(y_i(t) − x_i(t)) + c2·r2·(ŷ(t) − x_i(t))
x_i(t+1) = x_i(t) + v_i(t+1)

where v_i(t) and x_i(t) denote the velocity and position of particle i at time t; y_i and ŷ represent the personal best solution of the particle and the global best solution, respectively; r1 and r2 are random vectors with uniform distribution in the [0,1] interval; and w, c1, and c2 are the inertia coefficient, the personal learning coefficient, and the collective learning coefficient, respectively. Besides the velocity and position updates, the personal best and global best parameters should also be updated in each iteration of a standard PSO algorithm.

4. PROPOSED STRATEGY
In order to define the optimization problem, we first need to estimate the performance diagrams of the wells at different levels of gas injection. The artificial neural network (ANN) is used in this step to obtain the gas-lift performance (GLP) diagrams. The trained model is then used as the fitness function in the optimization process; once the convergence criteria are met, the algorithm stops. The PSO algorithm is simulated in the MATLAB environment. The advantages of coding in MATLAB include:
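Although the paper works in MATLAB, the update rules above can be sketched in Python. This is a minimal, hypothetical sketch: the parameter values (w = 0.7, c1 = c2 = 1.5, population size, bounds) are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def pso(fitness, dim, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, lo=0.0, hi=1.0, seed=0):
    """Minimal standard PSO (maximisation), following the updates in the text:
    v <- w*v + c1*r1*(y - x) + c2*r2*(yhat - x);  x <- x + v."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))   # particle positions
    v = np.zeros_like(x)                          # particle velocities
    pbest = x.copy()                              # personal bests y_i
    pval = np.array([fitness(p) for p in x])
    gbest = pbest[pval.argmax()].copy()           # global best yhat
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))       # uniform random vectors
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)                # keep particles in bounds
        val = np.array([fitness(p) for p in x])
        better = val > pval                       # update personal bests
        pbest[better], pval[better] = x[better], val[better]
        gbest = pbest[pval.argmax()].copy()       # update global best
    return gbest, pval.max()
```

For example, maximising -(p - 0.3)² over [0, 1]² drives the swarm to p ≈ (0.3, 0.3).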
5. SIMULATION RESULTS
In this section, the results of gas allocation optimization using the PSO algorithm are presented and discussed. Two different scenarios, a low-dimensional problem with six wells and a high-dimensional problem with 56 wells, are considered in our simulations. A constraint on the amount of available lift gas is imposed (only a limited amount of lift gas is available). The optimization is carried out on the datasets from the research of Buitrago et al. As mentioned, the ANN approach is employed to estimate the performance diagrams of the lift gas. The objective in the constrained optimization problem is to maximize oil production; the upper limit on gas consumption is considered only as a constraint, and gas consumption is not a term in the objective function. The objective function and the constraint are given in (5).
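The constrained objective described here (maximize total oil production subject to a cap on total injected gas) can be sketched with a penalty term suitable for PSO. Equation (5) itself is not reproduced in this extract; the well curves and the penalty weight below are invented stand-ins for the ANN-estimated performance diagrams, not the paper's data.

```python
import numpy as np

# Invented stand-ins for the ANN-estimated gas-lift performance curves:
# oil rate q_i(g_i) of each well versus injected gas g_i (units arbitrary).
curves = [lambda g, a=a: a * g / (1.0 + g) for a in (50.0, 80.0, 65.0)]
G_MAX = 4.0                                   # total lift gas available

def fitness(g):
    """Total oil production, penalised when sum(g) exceeds G_MAX."""
    total = sum(q(gi) for q, gi in zip(curves, g))
    excess = max(0.0, float(np.sum(g)) - G_MAX)
    return total - 1.0e3 * excess             # assumed penalty weight
```

A feasible allocation is scored by production alone; an allocation over the gas budget is heavily penalised, steering the swarm back into the feasible region.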
PSO enhances the convergence pace of the algorithm. The main drawback of GA in this regard is that it does not update its parameters and does not include any tunable parameter in its process.

Figure 2: The number of iterations in PSO and GA for solving the 56-well problem

6. CONCLUSIONS
The gas distribution optimization problem is studied in this paper. The particle swarm optimization (PSO) approach is used for the first time for this problem. The performance plots are obtained through artificial neural network (ANN) learning. The proposed strategy is implemented on a high-dimensional (56-well) and a low-dimensional (6-well) problem. The better performance of the evolutionary optimization methods (GA and PSO) over the classical approaches is more recognizable when the problem has higher dimension (like the 56-well problem). PSO and GA showed similar performance; however, PSO converged much faster (13 times faster) and required fewer iterations than GA.
International Journal of Computer Applications Technology and Research
Volume 8–Issue 09, 358-362, 2019, ISSN:-2319–8656
Abstract -- The main aim of every academic is placement in a reputed MNC, and even an institute's reputation and yearly admissions depend on the placements it provides to its students. So any system that predicts students' placements will have a positive impact on an institute, increase its strength, and decrease some of the workload of the institute's training and placement office (TPO). With the help of machine learning techniques, knowledge can be extracted from previously placed students, and the placement of upcoming students can be predicted. The data used for training is taken from the same institute for which the placement prediction is done. Suitable data pre-processing methods are applied, along with feature selection. Some domain expertise is used for pre-processing as well as for the outliers present in the dataset. We used various machine learning algorithms such as Logistic Regression, SVM, KNN, Decision Tree, and Random Forest, and advanced techniques such as Bagging, Boosting, and Voting classifiers, achieving 78% accuracy with XGBoost and 77% with AdaBoost.
Keywords: Pre-processing, Feature Selection, Domain expertise, Outliers, Bagging, Boosting, SVM, KNN, Logistics
1. INTRODUCTION
Nowadays placement plays an important role in this world full of unemployment. Even the ranking and rating of institutes depend on the average package and the number of placements they provide. So the main objective of this model is to predict whether a student might get placed or not. Different kinds of classifiers were applied, i.e., Logistic Regression, SVM, Decision Tree, Random Forest, KNN, AdaBoost, Gradient Boosting, and XGBoost. For this, the overall academics of students are taken into consideration. As placement activity takes place in the last year of the programme, the last-year semesters are not taken into consideration.

An accuracy of 71.66% with tested real-life data indicates that the system is reliable for carrying out its major objectives, which are to help teachers and the placement cell [2].

Ajay Kumar Pal and Saurabh Pal (2013) predicted the placement of students after the MCA using three selected classification algorithms in Weka. The best algorithm on the placement data is Naive Bayes classification, with an accuracy of 86.15% and a model build time of 0 seconds. The Naive Bayes classifier has the lowest average error, at 0.28, compared to the others [3].
different statuses, i.e., Dream company, Core company, Mass recruiters, Not eligible, and Not interested [5].

3. DATASET DESCRIPTION AND SYSTEM FLOW
The approach follows the flow shown in Figure 3: Data Gathering → Pre-processing → Feature Selection → Training Different Models → Model Selection → Prediction.

We merged the 12th-grade and diploma marks into a single column for both.
Tuples from an M.Tech background were dropped, and in the "current_aggregate" column we dropped the NA values because the whole row was NA.
All NA values in the columns "Current_Back_Papers" and "Current_Pending_Back_Papers", and in the semester-wise "Sem_Back_Papers" and "Sem_Pending_Back_Papers" columns, were replaced with 0, because these fields were null only if the student had no backlogs.
Using LabelEncoder from the preprocessing API in sklearn, we encoded the labels of the columns "Degree_Specializations", "Campus", "Gender", "year_down", and "educational_gap".

3.2 Feature Selection
Using feature selection methods such as Ridge, Lasso, RFE, plot importance, F1 score, and feature importance, we obtained various outputs.

Figure: "Feature importance" with a Decision Tree
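The pre-processing steps listed above can be sketched as follows. The toy DataFrame values are invented; only the column names come from the text.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame using the kinds of columns named in the text (values invented).
df = pd.DataFrame({
    "Campus": ["A", "B", "A"],
    "Gender": ["M", "F", "F"],
    "Sem_Back_Papers": [None, 2, None],
    "current_aggregate": [72.5, None, 65.0],
})

df = df.dropna(subset=["current_aggregate"])             # whole row was NA
df["Sem_Back_Papers"] = df["Sem_Back_Papers"].fillna(0)  # null = no backlogs

for col in ["Campus", "Gender"]:                         # encode categoricals
    df[col] = LabelEncoder().fit_transform(df[col])
```

After these steps the frame is fully numeric and free of missing values, ready for the classifiers discussed later.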
Figure 3.2.3: Feature selection using Ridge

F1 scores of the features:
Sem4_Aggregate_Marks: 312.063809
Current_Aggregate_Marks: 286.086537
Sem2_Aggregate_Marks: 164.183078
12th_/_Diploma_Aggre_marks: 142.208129
Sem1_Aggregate_Marks: 139.183936
Sem6_Aggregat_Marks: 136.333959
Sem5_Aggregate_Marks: 131.988165
10th_Aggregate_Marks: 128.526784
Sem6_Back_Papers: 128.526784
live_atkt: 47.908927
Sem5_Back_Papers: 45.382049
Sem4_Back_Papers: 43.547352
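A "feature importance with a decision tree" ranking, as used here, can be sketched as follows; the synthetic data and feature names (f0-f2) are illustrative assumptions, not the paper's dataset.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: three features, label depends only on f0.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = (X[:, 0] > 0.5).astype(int)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
ranking = sorted(zip(["f0", "f1", "f2"], tree.feature_importances_),
                 key=lambda kv: -kv[1])       # most important feature first
```

Because the label here is a pure threshold on f0, the tree attributes essentially all of its impurity reduction to that feature.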
Figure: Feature selection using Lasso
[2] Senthil Kumar Thangavel, Divya Bharathi P., and Abijith Sankar, "Student Placement Analyzer: A Recommendation System Using Machine Learning," 2017 International Conference on Advanced Computing and Communication Systems (ICACCS 2017), Coimbatore, India, Jan. 6-7, 2017.

Figure 5.1: Layering of classifiers

We used a Decision Tree as the base classifier; over that we used an AdaBoost classifier, and over that a Bagging classifier, in order to tune the accuracy of the model.
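The layering just described (a decision tree boosted by AdaBoost, then bagged) can be sketched with scikit-learn. The dataset is synthetic and the hyperparameters are illustrative assumptions, not the authors' settings; note that AdaBoost's default base learner is already a shallow decision tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the placement data.
X, y = make_classification(n_samples=500, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

# AdaBoost boosts shallow decision trees (its default base learner);
# Bagging then averages several boosted models, mirroring the layering above.
model = BaggingClassifier(AdaBoostClassifier(n_estimators=50, random_state=0),
                          n_estimators=10, random_state=0)
model.fit(Xtr, ytr)
acc = model.score(Xte, yte)
```

Bagging the boosted models reduces variance, which is the stated motivation for tuning the accuracy this way.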
[3] Ajay Kumar Pal and Saurabh Pal, "Classification Model of Prediction for Placement of Students," I.J. Modern Education and Computer Science, 2013, 11, 49-56, published online 11 November 2013.

6. RESULT AND CONCLUSION
The best accuracies obtained were:
AdaBoost (DT): 77%
XGBoost: 78%
International Journal of Computer Applications Technology and Research
Volume 8–Issue 09, 363-366, 2019, ISSN:-2319–8656
Abstract: When talking about any company's growth within a market, customers play an essential role; having correct insights into customer behaviour and requirements is the current need in this customer-driven market. Preserving customers' interest by providing new services and products helps in maintaining business relations. Customer churn is a great problem faced by companies nowadays, due to lagging in understanding customer behaviour and finding solutions for it. In this project we found the causes of churn for a telecom company by taking its past records into consideration, and then recommending new services to retain customers and avoid churn in the future. We used pie charts to check the churn percentage; analysed whether there are any outliers (using box plots); dropped some features of less importance; converted all categorical data into numerical form (Label Encoding for multi-category data and the map function for two-category data); plotted the ROC curve to learn the true positive and false positive rates, getting a line at 0.8; and then split the data using a train-test split. We used the Decision Tree and Random Forest algorithms for feature selection, from which we obtained feature importances, then used logistic regression and found the feature with the highest assigned weight, leading to the cause of churn. In order to retain customers, we can then recommend new services to them.

Keywords: Customer churn analysis telecom; customer churn prediction & prevention; naive bayes; logistic regression; decision tree; random forest
In this customer churn prediction & retention project, we analyse the past behaviour of customers, find the real causes of churn, and then predict whether customers will churn in the future. Details such as monthly charges, subscribed services, tenure, and contract contribute to the end result, i.e., the prediction. Our aim is to use machine learning concepts not only to predict churn and retain customers, but also to avoid further churn, which would be beneficial to the industry.

In "Understanding the Predictive Accuracy of Customer Churn Models" [2], the authors worked on measuring and increasing the accuracy of churn prediction using logistic and tree approaches. We also went through the paper "Customer churn prediction in telecom using machine learning in big data platform" by Abdelrahim Kasem Ahmad, Assef Jafar, and Kadan Aljoumaa [3]; they used decision tree, random forest, and XGBoost algorithms for classification in predicting customer churn, obtaining better accuracy.
L Miguel APM, in "Measuring the impact of data mining on churn management" [6], proposed an analysis framework which prefigures the impact of data mining on churn management. Adnan Amin, Babar Shah, and Awais Adnan, in "Customer churn prediction in telecommunication industry using data certainty" [7], group the dataset into different zones based on a distance factor; these are then divided into two categories, data with high certainty and data with low certainty, for predicting customers exhibiting churn and non-churn behaviour.

3. PROCESS FLOW
The data we got was mostly balanced and categorical; we began with data cleaning, pre-processing, removing unwanted columns, feature selection, and label encoding.

3.2 DATA PRE-PROCESSING
Data pre-processing is an important task in machine learning: it converts raw data into clean data. We applied the following techniques to the data:

Missing values – We had missing values in the TotalCharges feature, which we adjusted with mean values. These missing row values, if not handled, would later lead to errors when converting the data type, since the column takes string values for empty spaces.

Label Encoder – This is a suitable method to convert categorical variables into numeric values, best used with multiple categories. We converted our various categorical values into numeric form for further use in the algorithms.

Drop columns – From the insights we took from the data, we learned that some features were of less importance, so we dropped them to reduce the number of features.
Train-Test Split
TRAIN-TEST SPLIT
FEATURE SELECTION
TechSupport
Model Applied
MODELS OnlineSecurity
tenure
Contract
Model Tuning
MODEL TUNING 0 0.05 0.1 0.15 0.2 0.25 0.3
3.1 DATASET
We took this telecom dataset from online website source took all the
insights regarding the data .
Attributes of the dataset : :
Customerid, gender, SeniorCitizen, Partner, Dependents,tenure,
PhoneService, MultipleLines, InternetService, OnlineSecurity,
OnlineBackup, DeviceProtection, TechSupport, StreamingTV,
Contract, PaperlessBilling, PaymentMethod, MonthlyCharges,
TotalCharges, Churn.
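The three pre-processing steps described above (mean-filling TotalCharges, label encoding, dropping columns) can be sketched with pandas. This is a minimal illustration on a toy frame, not the paper's actual code; only the column names taken from the attribute list above are from the source.

```python
import pandas as pd

# Toy frame mimicking the telecom data: TotalCharges arrives as strings,
# with blanks where values are missing.
df = pd.DataFrame({
    "customerID": ["a1", "a2", "a3", "a4"],
    "Contract": ["Month-to-month", "Two year", "Month-to-month", "One year"],
    "TotalCharges": ["29.85", " ", "108.15", "1840.75"],
    "Churn": ["Yes", "No", "Yes", "No"],
})

# Missing values: blank strings -> NaN, cast to float, fill with the mean.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].mean())

# Label encoding: map each categorical column to integer codes.
for col in ["Contract", "Churn"]:
    df[col] = df[col].astype("category").cat.codes

# Drop a column judged uninformative (customerID carries no signal).
df = df.drop(columns=["customerID"])

print(df.dtypes.to_dict())
```

The same steps scale unchanged to the full dataset; only the list of columns to encode and drop grows.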
www.ijcat.com 364
International Journal of Computer Applications Technology and Research
Volume 8–Issue 09, 363-366, 2019, ISSN:-2319–8656
Confusion Matrix:
[[3816  295]
 [ 924  590]]
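The aggregate metrics implied by this matrix can be recomputed directly. This is a quick stdlib check; the source does not state which model the matrix belongs to, and the convention of rows = actual with [[TN, FP], [FN, TP]] (churn as the positive class) is an assumption.

```python
# Assumed layout: rows = actual, columns = predicted, [[TN, FP], [FN, TP]].
tn, fp = 3816, 295
fn, tp = 924, 590

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision = tp / (tp + fp)   # of predicted churners, how many actually churned
recall = tp / (tp + fn)      # of actual churners, how many were caught

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```

Under this assumed layout the model catches well under half of the actual churners despite a reasonable overall accuracy, which is typical for imbalanced churn data.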
After all the cleaning up and pre-processing of the data, we separate the data before applying the algorithms, using:
1. Train-Test Split
2. Modelling
3. Model Tuning
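Step 1, the train-test split, can be illustrated without any libraries. This is a minimal sketch; the 80/20 ratio and the fixed seed are assumptions, since the paper does not state its split.

```python
import random

def train_test_split(rows, test_size=0.2, seed=42):
    """Shuffle a copy of the rows and cut off the last test_size fraction."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_size))
    return rows[:cut], rows[cut:]

data = list(range(100))          # stand-in for 100 labelled customer records
train, test = train_test_split(data, test_size=0.2)
print(len(train), len(test))     # 80 20
```

Shuffling before cutting matters: without it, any ordering in the raw file (e.g. by signup date) leaks into the split.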
Here we can see that customers who took fibre optics on a month-to-month contract, whether male or female, resulted in churn.

5.2 Modelling
The following are the models we applied to check which model gives better accuracy:

• Support Vector Classifier (SVC):
This algorithm is used for classification problems. The main objective of SVC is to fit to the data you provide, returning a "best fit" hyperplane that divides, or categorizes, your data. From there, after obtaining the hyperplane, you can feed features to your classifier to examine what the "predicted" class is.
• Decision Tree:
A decision tree is a non-parametric supervised learning method. It is used for both classification and regression problems. It is a flowchart-like structure in which each internal node represents a "test" on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label. The path between root and leaf represents classification rules. It creates a comprehensive analysis along each branch and identifies decision nodes that need further analysis.

We also visualised all the features within the dataset, came to know their distributions, and obtained the ROC curve.

• Random Forest:
Random Forest is a meta estimator that uses a number of decision trees fitted to various sub-samples drawn from the original dataset. We can also draw the data with replacement as per the requirements.
Logistic Regression:
Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. It is a machine learning algorithm used for classification problems; it is a predictive analysis algorithm based on the concept of probability.

• XGBoost:
XGBoost stands for eXtreme Gradient Boosting. XGBoost is an implementation of gradient boosted decision trees designed for speed and performance [3].

Models used and their accuracy:

Decision Tree         77.81%
Random Forest         80.02%
Naïve Bayes           74.91%
SVM                   80.1%
K-Nearest Neighbour   76.61%
XGBoost               80%

7. REFERENCES
[1] Praveen Ashtana, "A comparison of machine learning techniques for customer churn prediction", International Journal of Pure and Applied Mathematics, vol. 119, no. 10, 2018, pp. 1149-1169, ISSN: 1311-8080.
[3] Abdelrahim Kasem Ahmad, Assef Jafar and Kadan Aljoumaa, "Customer churn prediction in telecom using machine learning in big data platform", Journal of Big Data, vol. 6, article 28, published 20 March 2019.
[4] S-Y. Hung, D. C. Yen, and H.-Y. Wang, "Applying data mining to telecom churn management", Expert Systems with Applications, vol. 31, no. 3, pp. 515–524, 2006.
[5] K. Coussement and D. Van den Poel, "Improving customer attrition prediction by integrating emotions from client/company interaction emails and evaluating multiple classifiers", Expert Systems with Applications, vol. 36, no. 3, pp. 6127–6134, 2009.
-----------------------------------------------------------------------------------------------------------------------------
Abstract: Examining and protecting air quality has become one of the most essential activities for the government in many industrial and urban areas today. Meteorological and traffic factors, the burning of fossil fuels, and industrial parameters play significant roles in air pollution. With this increasing air pollution, we need to implement models which will record information about concentrations of air pollutants (SO2, NO2, etc.). The deposition of these harmful gases in the air is affecting the quality of people's lives, especially in urban areas. Lately, many researchers have begun to use the Big Data Analytics approach, as environmental sensing networks and sensor data have become available. In this paper, machine learning techniques are used to predict the concentration of SO2 in the environment. Sulphur dioxide irritates the skin and the mucous membranes of the eyes, nose, throat, and lungs. Time series models are employed to predict SO2 readings in the coming years or months.
Keywords: Machine Learning, Time Series, Prediction, Air Quality, SO2
(ANN), Genetic Algorithm ANN Model, Random Forest, Decision Tree and Deep Belief Network are the algorithms which were used, and various pros and cons of the models were presented. [5]

3. DATASET
3.1 Dataset/Source: Kaggle
Structured/Unstructured data: Structured data in CSV format.

Dataset Description:
The dataset consists of around 450,000 records covering all the states of India. We worked only on the data for Maharashtra, so we had 60,383 records. This dataset consists of 13 attributes, listed below:
1) stn_code
2) sampling_date
3) state
4) location
5) agency
6) type
7) so2
8) no2
9) rspm
10) spm
11) location_monitoring_station
12) pm2_5
13) date

Splitting for Testing: Data splitting was done as 80% for training and 20% for testing.

Preprocessing and Feature Selection:
We only studied and applied algorithms on the data of Maharashtra State. Hence, the number of rows was reduced to 60,383 and the state column is automatically of no more use. All the values in pm2_5 were null, so we dropped the column. The agency's name has nothing to do with how polluted the state is; similarly, stn_code is also not useful. The date is a cleaner representation of the sampling_date attribute, so we eliminate the redundancy by removing the latter. The location_monitoring_station attribute is again unnecessary, as it contains the location of the monitoring station, which we do not need to consider for the analysis.

So, to summarize, we have deleted the following features from our dataset: state, pm2_5, agency, stn_code, sampling_date and location_monitoring_station. We have simplified the type attribute to contain only one of three categories: industrial, residential, other. For SO2 and NO2, we replaced NaN values with the mean. For date, we dropped NaN values, as there were only 3 null values. After pre-processing, our dataset contains 60,380 rows and 7 columns.

4. EXPLORATORY DATA ANALYSIS
The graph below shows the concentration of SO2 over the years. It was highest in 1997 and 2001 and lowest in 1988 and 2003. However, it is stable for the latest years.

This graph shows that the amount of SO2 is highest in the industrial areas.

From this graph we can conclude that Nagpur has the deadliest amount of SO2 compared to other cities, whereas Akole and Amravati are sparsely polluted, followed by Jalna and Kolhapur.
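The cleaning decisions described above can be sketched in pandas on a toy frame. The exact raw strings that collapse into industrial/residential/other are an assumption (keyword matching), since the source lists only the three target categories.

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["Maharashtra"] * 4,
    "type": ["Industrial Area", "Residential and others",
             "Industrial Areas", "Sensitive Area"],
    "so2": [4.8, None, 17.0, 7.5],
    "no2": [17.4, 19.0, None, 7.0],
    "pm2_5": [None, None, None, None],
    "date": ["1990-02-01", "1990-02-01", None, "1990-03-01"],
})

# Drop columns identified as uninformative (pm2_5 is entirely null here).
df = df.drop(columns=["state", "pm2_5"])

# Collapse `type` into three categories based on keywords (assumed mapping).
def simplify(t):
    t = t.lower()
    if "industrial" in t:
        return "industrial"
    if "residential" in t:
        return "residential"
    return "other"

df["type"] = df["type"].map(simplify)

# Mean-fill the pollutant readings, then drop rows with a missing date.
for col in ["so2", "no2"]:
    df[col] = df[col].fillna(df[col].mean())
df = df.dropna(subset=["date"])

print(df["type"].tolist())
```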
X(t+1) = b0 + b1*X(t-1) + b2*X(t-2)

Because the regression model uses data from the same input variable at previous time steps, it is referred to as an autoregression (regression of self). [6]

This model is not able to show the expected output, as the data is not in sequence as per the date column. The same is the problem for cities. If we predict for the entire state, it won't be helpful. So we will now calculate AQI and use classification models further.
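An autoregression of this form can be fitted by ordinary least squares on lagged copies of the series. The following is a minimal NumPy sketch, not the paper's actual code; the synthetic series exists only to sanity-check that the fit recovers known coefficients.

```python
import numpy as np

def fit_ar2(x):
    """Fit x[t] = b0 + b1*x[t-1] + b2*x[t-2] by least squares."""
    y = x[2:]
    X = np.column_stack([np.ones(len(y)), x[1:-1], x[:-2]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [b0, b1, b2]

# Synthetic series generated from known coefficients (b0=1.0, b1=0.6, b2=0.3).
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 1.0 + 0.6 * x[t - 1] + 0.3 * x[t - 2] + rng.normal(scale=0.1)

b0, b1, b2 = fit_ar2(x)
one_step = b0 + b1 * x[-1] + b2 * x[-2]   # forecast for the next time step
print(b0, b1, b2)
```

This is also why the complaint above matters: least squares on lagged values is only meaningful if the rows are sorted by date, so the series must be re-indexed chronologically first.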
--------------------------------------------------------------------****************------------------------------------------------------------------
Abstract: Electric energy consumption is the actual energy demand made on the existing electricity supply. However, mismanagement of its utilisation can lead to a fall in the supply of electricity. It is therefore imperative that everybody should be concerned about the efficient use of energy in order to reduce consumption [1]. The purposes of this research are to find a model to forecast the electricity consumption in a household and to find the most suitable forecasting period, whether daily, weekly, monthly, or quarterly. The time series data in our study is the individual household electric power consumption [4]. To explore and understand the dataset, I used line plots for the series data and histograms for the data distribution. The data analysis has been performed with the ARIMA (Autoregressive Integrated Moving Average) model.
---------------------------------------------------------------------------------------------------------------------------------------------
Electricity load forecasting has gained substantial importance nowadays in modern electrical power management systems with elements of smart grid technology. A reliable forecast of electrical power consumption represents a starting point in policy development and improvement of energy production and distribution. At the level of individual households, the ability to accurately predict consumption of electric power significantly reduces prices via appropriate systems for energy storage. Therefore, the energy efficient power networks of the future will require entirely new ways of forecasting demand at the scale of individual households [2]. The analysis of a time series uses forecasting techniques to identify models from past data. With the assumption that the information will resemble itself in the future, we can thus forecast future events from the observed data. There are several techniques of forecasting, and these techniques provide forecasting models of different accuracy. The accuracy of the prediction is based on the minimum error of the forecast. The appropriate prediction methods are chosen considering several factors such as prediction interval, prediction period, characteristics of the time series, and size of the time series [4].

How to Load and Explore Household Electricity Usage Data: In this tutorial, you will discover a household power consumption dataset for multi-step time series forecasting and how to better understand the raw data using exploratory analysis. [5]

How to Develop an Autoregression Forecast Model for Household Electricity Consumption: In this tutorial, you will discover how to develop and evaluate an autoregression model for multi-step forecasting of household power consumption. [6]

Time Series Analysis of Household Electric Consumption with ARIMA and ARMA Models: In this research, we are interested in time series analysis with the most popular method, that is, the Box and Jenkins method. The resulting model of this method is quite accurate compared to other methods and can be applied to all types of data movement. Two forecasting techniques were used in this study: Autoregressive Integrated Moving Average (ARIMA) and Autoregressive Moving Average (ARMA). [1]
4. PRE-PROCESSING:
The dataset contains some missing values in the measurements (nearly 1.25% of the rows). All calendar timestamps are present in the dataset, but for some timestamps the measurement values are missing: a missing value is represented by the absence of a value between two consecutive semi-colon attribute separators. For instance, the dataset shows missing values on April 28, 2007. We cannot ignore the missing values in this dataset, and therefore cannot simply delete them. Instead, I copied the observation from the same time on the day before, implemented in a function named fill_missing() that takes the NumPy array of the data and copies values from exactly 24 hours earlier. Then we saved the cleaned-up version of the dataset to a new file, 'household_power_consumption.csv' [3].
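A minimal version of such a fill_missing() step, consistent with the description above (the dataset is sampled once per minute, so "exactly 24 hours ago" means 60*24 rows back), might look like the following sketch; the toy data below uses 4 rows per "day" purely for demonstration.

```python
import numpy as np

def fill_missing(values, rows_per_day=60 * 24):
    """Replace each NaN with the value from the same slot one day earlier.

    `values` is a 2-D array of shape (n_rows, n_cols), one row per minute.
    """
    values = values.copy()
    for row in range(values.shape[0]):
        for col in range(values.shape[1]):
            if np.isnan(values[row, col]):
                values[row, col] = values[row - rows_per_day, col]
    return values

# Tiny demonstration: 3 "days" of 4 samples each (rows_per_day=4).
day = np.arange(4, dtype=float).reshape(4, 1)
data = np.vstack([day, day + 10, day + 20])
data[5, 0] = np.nan            # lose one reading on day 2
filled = fill_missing(data, rows_per_day=4)
print(filled[5, 0])            # 1.0 — copied from the same slot on day 1
```

Note the implicit assumption that the corresponding reading one day earlier is itself present; runs of missing days would need a longer look-back.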
The above plots confirmed our previous discoveries. By year, consumption was steady. By quarter, the lowest average power consumption was in the 3rd quarter. By month, the lowest average power consumption was in July and August. By day, the lowest average power consumption was around the 8th of the month.
6. CONCLUSION:

7. REFERENCES:
[5] https://machinelearningmastery.com/how-to-load-and-explore-household-electricity-usage-data/
[6] https://machinelearningmastery.com/how-to-develop-an-autoregression-forecast-model-for-household-electricity-consumption/
-------------------------------------------------------------------------------------------------------------------------------------------
Abstract: Restaurant rating has become the most commonly used parameter for judging a restaurant for any individual. A lot of research has been done on different restaurants and the quality of food they serve. The rating of a restaurant depends on factors like reviews, the area it is situated in, average cost for two people, votes, cuisines and the type of restaurant.
The main goal of this work is to get insights on restaurants which people like to visit and to identify the rating of a restaurant. In this article we study different predictive models like Support Vector Machine (SVM), Random Forest, Linear Regression, XGBoost and Decision Tree, and have achieved a score of 83% with AdaBoost.
Key Words: Pre-processing, EDA, SVM Regressor, Linear Regression, XGBoost Regressor, Boosting.
-------------------------------------------------------------------------------------------------------------------------------------------------------------
1. INTRODUCTION
Zomato is the most reputed company in the field of food reviews. Founded in 2008, this company started in India and is now in 24 different countries. It is so big that people now use it as a verb: "Did you know about this restaurant? Zomato it." The rating is the most important feature of any restaurant, as it is the first parameter that people look into while searching for a place to eat. It portrays the quality, hygiene and environment of the place. Higher ratings lead to higher profit margins. Ratings are usually notated as stars or numbers scaling between 1 and 5.

Zomato has changed the way people browse through restaurants. It has helped customers find good places with respect to their dining budget.

Different machine learning algorithms like SVM, Linear Regression, Decision Tree and Random Forest can be used to predict the ratings of restaurants.

2. RELATED WORK
Various researchers and students have published related work in national and international research papers and theses; we reviewed them to understand the objectives, the types of algorithms used and the various techniques for pre-processing and feature selection.

[1] Shina, Sharma S. and Singha A. have used Random Forest and Decision Tree to classify restaurants into several classes based on their service parameters. Their results say that the Decision Tree classifier is more effective, with 63.5% accuracy, than Random Forest, whose accuracy is merely 56%.

[2] Chirath Kumarasiri's and Cassim Faroo's work focuses on a Part-of-Speech (POS) Tagger based NLP technique for aspect identification from reviews. A Naïve Bayes (NB) classifier is then used to classify the identified aspects into meaningful categories.

[3] I. K. C. U. Perera and H. A. Caldera have used data mining techniques like opinion mining and sentiment analysis to automate the analysis and extraction of opinions in restaurant reviews.

[4] Rrubaa Panchendrarajan, Nazick Ahamed, Prakhash Sivakumar, Brunthavan Murugaiah, Surangika Ranathunga and Akila Pemasiri wrote a paper on 'Eatery, a multi-aspect restaurant rating system' that identifies rating values for different aspects of a restaurant by means of aspect-level sentiment analysis. This research introduced a new taxonomy to the restaurant domain that captures the hierarchical relationships among entities and aspects.

[5] Neha Joshi wrote a paper in 2012, A Study on Customer Preference and Satisfaction towards Restaurant in Dehradun City, which aims to contribute to the limited research in this area and provide insight into the consumer decision making process, specifically for the Indian foodservice industry. She did hypothesis testing using the chi-square test.

[6] Bidisha Das Baksi, Harrsha P, Medha, Mohinishree Asthana and Dr. Anitha C wrote a paper that studies various attributes of existing restaurants and analyses them to predict an appropriate location for a higher success rate of a new restaurant. The study of existing restaurants in a particular location and the growth rate of that location is important prior to selection of the optimal location. The aim is to create a web application that determines a location suitable for establishing a new restaurant unit, using machine learning and data mining techniques.

3. DATA SET DESCRIPTION
This is a Kaggle dataset (https://www.kaggle.com/himanshupoddar/zomato-bangalore-restaurants). It represents information on restaurants in the city of Bangalore. It contains 17 columns and 51,000 rows.
This box plot helps us look into the outliers. We can also see that online ordering service affects the rating: restaurants with online ordering service have a rating from 3.5 to 4.
5. RESULTS

Algorithms               Accuracy
Linear Regression        30%
KNN                      44%
Support Vector Machine   43%
Decision Tree            69%
Random Forest            81%
XGBoost                  72.26%

[5] Neha Joshi. A Study on Customer Preference and Satisfaction towards Restaurant in Dehradun City. Global Journal of Management and Business Research (2012). Link: https://pdfs.semanticscholar.org/fef5/88622c39ef76dd773fcad8bb5d233420a270.pdf

[6] Bidisha Das Baksi, Harrsha P, Medha, Mohinishree Asthana, Dr. Anitha C. (2018). Restaurant Market Analysis. International Research Journal of Engineering and Technology (IRJET). Link: https://www.irjet.net/archives/V5/i5/IRJET-V5I5489.pdf
6. CONCLUSIONS
This paper studies a number of features of existing restaurants in different areas of a city and analyses them to predict the rating of a restaurant. This makes rating an important aspect to consider before making a dining decision, and such analysis is an essential part of planning before establishing a venture like a restaurant.
A lot of research has been done on factors which affect sales and the market in the restaurant industry. Various dine-scape factors have been analysed to improve customer satisfaction levels.
If the data for other cities is also collected, such predictions could be made more accurate.
7. REFERENCES
[1] Chirath Kumarasiri, Cassim Faroo, "User Centric Mobile Based Decision-Making System Using Natural Language Processing (NLP) and Aspect Based Opinion Mining (ABOM) Techniques for Restaurant Selection". Springer 2018. DOI: 10.1007/978-3-030-01174-1_4
[2] Shina, Sharma, S. & Singha, A. (2018). A study of tree-based Machine Learning Techniques for Restaurant reviews. 2018 4th International Conference on Computing Communication and Automation (ICCCA). DOI: 10.1109/CCAA.2018.8777649
Sachin Bhoite
Assistant Professor
Department of Computer
Science
Faculty of Science
MIT-WPU
Pune, India
Abstract: As we know, after the 12th board results, the main problem for a student is to find an appropriate college for their further education. It is a tough decision for many students as to which college they should apply to. We have built a system that compares a student's data with past admission data and suggests colleges in a sequence of preference. We have used Decision Tree, Support Vector Classifier, Extra Tree Classifier, Naïve Bayes, KNN and Random Forest as our statistical models to predict the probability of getting admission to a college. It was observed that Random Forest achieved the highest performance among all.
Keywords: Decision Tree, Random Forest, KNN, Extra Tree Classifier, SVC, Probabilities
This work intends to provide decision-makers in the enrolment management administration a better understanding of the factors that are highly correlated with the enrolment process. They used real data on applicants who were admitted to the University of New Mexico (UNM). In their dataset, they have different features like gender, GPA, parent's income and student's income. They had data issues like missing values and categorical variables. They divided classification into two parts: classification at the individual level and classification at the cohort level. For classification at the individual level, the model was used to check the probability of enrolment and whether the applicant enrolled or not. Logistic Regression (LR) provided an accuracy of 89% and Support Vector Machine (SVM) provided an accuracy of 91%, which was used in the classification at the individual level. The total enrolment in 2016 was actually 3402, but the prediction was 3478 using past year records (2015) with time series for classification at the cohort level. [3]

This system predicts the probability of binary classification. The feature vector encoding of a student's file indicates whether the applicant was rejected or admitted. The system was used to predict the probability of the admissions committee accepting an applicant or not; in our model, however, we are trying to make it easy for applicants to understand whether they should apply to that college or not. [7]

3. DATA EXTRACTION AND TRANSFORMATION
We have achieved our goals step-by-step to make the data steady, fitting it into our models and finding suitable machine learning algorithms for our system.

This step mainly contains Data Extraction, Data Cleaning, Pre-processing, removing unwanted columns, feature selection and label encoding. These steps are shown in Figure 1.
● Missing Values – Missing values are those values that failed to load information or where the data itself was corrupted. There are different techniques to handle missing values. The one we applied is deleting rows, because some of the rows were blank and could mislead the classification.
● Label Encoder – This is one of the most frequently used techniques for categorical variables. A label encoder converts labels into a numeric format so that the machine can recognize them. In our data, there are many attributes which are categorical variables, like gender, category and branch.
● Change in data type – Some attributes didn't contain proper input. For example, the Nationality attribute included values like Indian, India and IND, which all meant the same country. For that purpose, we needed to change such values into a single format. 'Object' data type values in some attributes had to be changed into the 'float' data type. Some records included CGPA for S.S.C. scores, so we converted those records into a percentage. We made all these changes so that they don't affect our accuracy.
● Drop Columns – As per domain knowledge, we removed some columns which were not needed in our model.

2. Feature Selection
As we proceed further, before fitting our model we must make sure that all the features we have selected contribute to the model properly and the weights assigned to them are good enough so that our model gives satisfactory accuracy. For that, we have used 4 feature selection techniques: Lasso, Ridge, F1 Score and Extra Tree Classifier. Lasso, Ridge and F1 Score were removing the features that I needed the most, while Extra Tree Classifier was giving an acceptable importance score, which is shown below.

4. EXPLORATORY DATA ANALYSIS
As we saw in feature selection, some features which seemed not so important were contributing to our model. So, to understand those features, we need to do exploratory analysis on this data. We did exploratory analysis on a few features by grouping them and plotting them on graphs.

EDA on Gender Column:
By grouping on gender and plotting the admissions in different colleges per gender, we identified some relations between a student's admission and his or her gender. As shown in Figure 4, for different genders, most students lie in different bins for different colleges. Even for different colleges, we get different bell curves. Looking at this, we can confirm that the gender column contributes to our model. For the Extra Tree Classifier, Gender contributes 1.3092% to the model.

EDA on Category Column:
Grouping by category and calculating the percentage of students who got admission with respect to their category is shown in Figure 3. For the different categories, we calculated the percentage of students that lie in each category. This percentage of students matched the reservation criteria as per Indian laws. This shows that the Category column contributes to our model. For the Extra Tree Classifier, Category contributes 9.6582% to the model.
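The "change in data type" normalisation and label-encoding steps described above can be combined in a small pure-Python sketch; the value lists and the alias table are illustrative, not the actual college data.

```python
# Normalise inconsistent spellings to a single format, then label-encode.
nationality_raw = ["Indian", "India", "IND", "Nepali", "India"]

aliases = {"india": "Indian", "ind": "Indian", "indian": "Indian"}
normalised = [aliases.get(v.strip().lower(), v) for v in nationality_raw]

# Label encoding: assign each distinct value a stable integer code.
codes = {v: i for i, v in enumerate(sorted(set(normalised)))}
encoded = [codes[v] for v in normalised]

print(normalised)  # ['Indian', 'Indian', 'Indian', 'Nepali', 'Indian']
print(encoded)     # [0, 0, 0, 1, 0]
```

Normalising before encoding matters: without it, Indian, India and IND would receive three different codes and the model would treat one country as three.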
5.2 Modeling
The following are the models we have applied to check which model gives better accuracy:
● Decision Tree:
A decision tree is a non-parametric supervised learning method. It is used for both classification and regression
problems. It is a flowchart-like structure in which each
internal node represents a “test” on an attribute, each
branch represents the outcome of the test, and each leaf
node represents a class label. The path between root and
leaf represents classification rules. It creates a
comprehensive analysis along with each branch and
identifies decision nodes that need further analysis.
● Random Forest:
Random Forest is a meta estimator that uses a number of decision trees to fit the various subsamples
drawn from the original dataset. We can also draw the data with replacement as per the requirements.

The following are the models we applied to check which model gives better accuracy:
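The "draw the data with replacement" idea behind Random Forest (bootstrap sampling) can be shown in a few lines; this is an illustrative sketch, not the paper's implementation.

```python
import random

def bootstrap_sample(rows, seed=0):
    """Draw len(rows) rows with replacement, as each tree in a forest does."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(len(rows))]

rows = list(range(10))
sample = bootstrap_sample(rows)
print(len(sample), len(set(sample)))  # same size, but usually with repeats
```

Each tree seeing a different bootstrap sample is what decorrelates the trees and makes averaging their votes effective.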
[5] Bhavya Ghai. "Analysis & Prediction of American Graduate Admissions Process". Department of Computer Science, Stony Brook University, Stony Brook, New York.
[6] Dineshkumar B Vaghela, Priyanka Sharma. "Students' Admission Prediction using GRBST with Distributed Data Mining". Gujarat Technological University, Chandkheda.
[7] Austin Waters, Risto Miikkulainen. "GRADE: Machine Learning Support for Graduate Admissions". University of Texas, Austin, Texas.
[8] https://www.datacamp.com/community/tutorials/k-nearest-neighbor-classification-scikit-learn
[9] https://www.meetup.com/Big-Data-Analytics-and-Machine-Learning/events/257926117/
Abstract: Wine classification is a difficult task since taste is the least understood of the human senses. A good wine quality
prediction can be very useful in the certification phase, since currently the sensory analysis is performed by human tasters, being
clearly a subjective approach. An automatic predictive system can be integrated into a decision support system, helping the
speed and quality of the performance. Furthermore, a feature selection process can help to analyze the impact of the analytical
tests. If it is concluded that several input variables are highly relevant to predict the wine quality, since in the production process
some variables can be controlled, this information can be used to improve the wine quality. Classification models used here are
1) Random Forest, 2) Stochastic Gradient Descent, 3) SVC, 4) Logistic Regression.
Dataset Description: The two datasets are related Other than that the selection is being done randomly with
to red wine of the Portuguese "Vinho Verde" wine. For uniform distribution.
more details, consult: [Web Link] or the reference [Cortez Various classification and regression algorithms are used
et al., 2009]. Due to privacy and logistic issues, only to fit the model. The algorithms used in this paper are as
physicochemical (inputs) and sensory (the output) follows:
variables are available (e.g. there is no data about grape
types, wine brand, wine selling price, etc.). For classification:
These datasets can be viewed as classification or Random Forest Decision Trees classifier
regression tasks. The classes are ordered and not balanced
(e.g. there are many more normal wines than excellent or Support Vector Machine classifier
poor ones). Outlier detection algorithms could be used to
Stochastic gradient descent
detect the few excellent or poor wines. Also, we are not
sure if all input variables are relevant. So it could be Logistic Regression classifier
interesting to test feature selection methods.
Input variables (based on physicochemical tests):
1) fixed acidity
2) volatile acidity
3) citric acid
4) residual sugar
5) chlorides
6) free sulfur dioxide
7) total sulfur dioxide
8) density
9) pH
10) sulphates
11) alcohol
Output variable (based on sensory data):
12) quality (score between 0 and 10)

Preprocessing: Label Encoding is used to convert the labels into numeric form so as to make them machine-readable. It is an important pre-processing step for a structured dataset in supervised learning. We have used label encoding to label the quality of the wine as good or bad, assigning 1 to good and 0 to bad.

Feature Selection: As we can clearly see, volatile acidity and residual sugar both have little impact on the quality of wine, hence we can eliminate these features. The features selected here may change according to the domain experts.
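A minimal sketch of the label-encoding step described above, in plain Python (mirroring what a library encoder would do). The quality cut-off of 7 is a hypothetical threshold; the paper does not state its exact value.

```python
# Binarize the 0-10 wine quality score into "bad"/"good" labels,
# then label-encode them as machine-readable 0/1 integers.
quality_scores = [5, 6, 7, 4, 8, 3, 6]

# Assumption: scores of 7 and above count as "good" (illustrative only).
labels = ["good" if q >= 7 else "bad" for q in quality_scores]

# Label encoding: 1 for good, 0 for bad, as in the paper.
encoding = {"bad": 0, "good": 1}
encoded = [encoding[lbl] for lbl in labels]
print(encoded)  # [0, 0, 1, 0, 1, 0, 0]
```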
IV. DATA PROCESSING METHODS
CONCLUSION
Based on the bar plots plotted, we come to the conclusion that not all input features are essential or affect the data. For example, from the bar plot of quality against residual sugar we see that as the quality increases, residual sugar stays moderate and does not change drastically. So this feature is not as essential as others like alcohol and citric acid, and we can drop it during feature selection.
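The bar-plot argument can be checked numerically. This sketch, on made-up numbers, averages residual sugar per quality level; a roughly flat profile across quality levels is what justifies dropping the feature.

```python
# Average residual sugar per quality level (hypothetical rows of
# (quality, residual sugar)); a flat profile suggests low relevance.
from collections import defaultdict

rows = [(5, 2.1), (5, 2.3), (6, 2.2), (6, 2.4), (7, 2.2), (7, 2.3)]

sums = defaultdict(lambda: [0.0, 0])  # quality -> [running sum, count]
for quality, sugar in rows:
    sums[quality][0] += sugar
    sums[quality][1] += 1
means = {q: round(total / n, 2) for q, (total, n) in sums.items()}
print(means)  # {5: 2.2, 6: 2.3, 7: 2.25}
```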
International Journal of Computer Applications Technology and Research
Volume 8–Issue 09, 389-393, 2019, ISSN:-2319–8656
Abstract: Data mining is a process of extracting and identifying previously unknown and potentially useful information or patterns from large amounts of data using different methods and techniques. Data mining in the domain of education is known as Educational Data Mining (EDM). This paper discusses an expert system which can be used as a student placement prediction system. A statistical model is applied on a reputed college's past data after data pre-processing and feature selection. This model can be used to predict the percentage chance of a student getting selected in campus placement. It will help students evaluate themselves and identify which skills are essential.
Keywords: Data Mining, Educational Data Mining, Expert system, Statistical model, Data pre-processing, Feature selection.
1. INTRODUCTION
This model concerns students who want to get a better placement for a better future. Sometimes a student gets sidetracked from studies in the initial semesters and only later realizes the importance of marks/CGPA. Basically, this model will help them to enhance their performance and will make them believe that they can achieve their dream job.

Application of EDM is an evolving trend worldwide [1]. It will help college faculty to show a precise roadmap to students when it comes to placement and choosing a career path. It will guide colleges and institutions in maintaining their reputation by making the most of placements. It drives students to ask questions regarding what can nurture them. It can give an overview to junior college students while selecting their stream:
1) What subjects to target?
2) What skills to improvise on?
3) Probability of getting placed after choosing their specialization?

Data visualization will help college students to get a clearer view regarding which stream they should choose. This can be done using different libraries of Python like Matplotlib, where students and faculty can visualize an overview of each stream.

This paper describes a model to predict the percentage of skills required by engineering students pursuing Bachelor's and Master's degrees with respect to a company's skillset requirements. According to the rules generated, the percentage of selection of students will vary. These rules are generated with the help of a Domain Expert.

Percentage of Selection = (Criteria Satisfied / Number of Criteria) * 100

1.1 Literature Review
The researchers have studied several related national & international research papers and theses to understand the aims, techniques used, various expert systems, datasets, data pre-processing approaches, feature selection methods, etc.

Siddu P. Algur, Prashant Bhat and Nitin Kulkarni used two algorithms, Random Tree and J48, to construct classification models using the Decision Tree concept. The Random Tree classification model is more effective as compared to the J48 classification model [2].

Machine learning algorithms were applied in the Weka environment and R Studio by K. Sreenivasa Rao, N. Swapna and P. Praveen Kumar. Results were tabulated and analyzed; they show that the random tree algorithm gives 100% accuracy in prediction on their dataset, and that in the R environment Recursive Partitioning & Regression Tree performs better and gives 90% accuracy. We also accept that performance depends on the nature of the dataset [3].

V. Ramesh, P. Parkavi and P. Yasodha also proved that the Multilayer Perceptron algorithm is most suitable for predicting student performance. MLP gives 87% prediction accuracy, which is comparatively higher than other algorithms [4].

K. Sripath Roy, K. Roopkanth, V. Uday Teja, V. Bhavana and J. Priyanka trained and tested the data with all three algorithms, and out of all of them SVM gave more accuracy with 90.3%, followed by XG Boost with 88.33% accuracy [5].

Ajay Kumar Pal met his goal and proved that the top algorithm is Naïve Bayes Classification, with an accuracy of 86.15% and an average error of 0.28 compared with others. He also conveyed that Naïve Bayes has the potential to outperform conventional classification methods [6].

Sudheep Elayidom, Dr. Suman Mary Idikkula and Joseph Alexander studied past data, followed the trend, and based on that gave a judgment for the future.

2. DATASET DESCRIPTION
The data used in this model is supplied by a well-known engineering college situated in Pune, Maharashtra. The data is collected from the details given by graduates, post-graduates and diploma holders in engineering of various streams during the year 2019. It includes students' 10th, 12th or Diploma marks and semester-wise aggregates for Bachelor's and Master's. The dataset contains 2330 tuples and 81 attributes holding multiple stream-wise data of the students.

2.1 Data Pre-Processing
The data has redundant, incomplete, inconsistent and inaccurate entries. We discovered that there were many attributes which seemed to be superfluous and which would not affect our results. By consulting our Domain Expert, we decided to remove those attributes as well as tuples using tools like Excel.
Entries with human errors seem to be illusory, so as per discussion with the Expert we decided to apply the mean to the data using Python.

2.2 Feature Selection
Attributes impacting the placements of students were taken into consideration with the help of Expert advice.
Factors like 10th, 12th or Diploma and Degree aggregates also affect the placement predictions for students, and non-educational attributes like work experience, projects and external certifications were also taken into consideration.

2. Form rules based on stream with the help of the expert.
3. Enter the data of new students for the prediction of the placement.
4. Calculate the number of criteria satisfied by that student.
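The rule-evaluation steps above, combined with the Percentage of Selection formula from the introduction, can be sketched as follows. The criteria names and thresholds are hypothetical, standing in for the Domain Expert's actual rules.

```python
# Percentage of Selection = (Criteria Satisfied / Number of Criteria) * 100.
def selection_percentage(student, criteria):
    # Count how many expert-defined criteria the student satisfies.
    satisfied = sum(1 for check in criteria if check(student))
    return satisfied / len(criteria) * 100

# Illustrative rules only; the paper's rules come from the Domain Expert.
criteria = [
    lambda s: s["cgpa"] >= 7.0,
    lambda s: s["backlogs"] == 0,
    lambda s: s["certifications"] >= 1,
    lambda s: s["projects"] >= 2,
]

student = {"cgpa": 8.1, "backlogs": 0, "certifications": 2, "projects": 1}
print(selection_percentage(student, criteria))  # 75.0 (3 of 4 satisfied)
```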
4. EXPLORATORY ANALYSIS
A pie chart is a circular statistical illustration, which is divided into different parts to demonstrate numerical proportion.
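Before drawing such a pie chart (e.g. with Matplotlib, as mentioned earlier), the class-wise proportions it displays can be computed directly. The grade classes and counts here are hypothetical.

```python
# Class-wise placement proportions for one stream, as a pie chart
# would display them; the grades of placed students are made up.
from collections import Counter

placed_students = ["First Class", "Distinction", "First Class",
                   "Second Class", "Distinction", "First Class"]
counts = Counter(placed_students)
total = sum(counts.values())
proportions = {grade: round(100 * n / total, 1) for grade, n in counts.items()}
print(proportions)  # {'First Class': 50.0, 'Distinction': 33.3, 'Second Class': 16.7}
```

These proportions are exactly the per-slice percentages a Matplotlib `pie()` call would label.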
Figure 3.3: Pie chart of class-wise placement for Civil Engineering
Figure 3.4: Pie chart of class-wise placement for E&TC Engineering
Figure 3.6: Pie chart of class-wise placement for Mechanical Engineering

The pie charts shown above give information about students placed in campus interviews based on their grade in each stream. This gives students an overview of each stream and of the importance of aggregates according to the stream.

5. CONCLUSION
This paper examines an application of Educational Data Mining (EDM). It elaborates a model to create awareness among students so they can build a better career pathway for their future. Students, with the help of their professors and placement team, can make use of this model to get better placement opportunities and enhance their skillsets.

In future, this model can be compared with existing machine learning algorithms like Linear Regression, Logistic Regression and Decision Tree, which will help us to understand the accuracy of the percentages given by machine learning and by statistics. By comparing statistics and machine learning, we will come to know which percentage is more accurate to rely on.
6. REFERENCES
[1] Dr. Mohd Maqsood Ali, "Role of Data Mining in Education Sector", IJCSMC, Vol. 2, Issue 4, April 2013, pp. 374-383.
[2] Siddu P. Algur, Prashant Bhat and Nitin Kulkarni, "Educational Data Mining: Classification Techniques for Recruitment Analysis", I.J. Modern Education and Computer Science, 2016, 2, 59-65.
[3] K. Sreenivasa Rao, N. Swapna, P. Praveen Kumar, "Educational Data Mining for Student Placement Prediction Using Machine Learning Algorithms", International Journal of Engineering & Technology, 7 (1.2) (2018) 43-46.
International Journal of Computer Applications Technology and Research
Volume 8–Issue 09, 394-396, 2019, ISSN:-2319–8656
Abstract: The number of liver disease patients in India is continuously increasing because of excessive consumption of alcohol, inhalation of harmful gases, and intake of contaminated food, pickles and drugs. It is expected that by 2025 India may become the world capital for liver diseases. The widespread occurrence of liver infection in India is attributed to a deskbound lifestyle, increased alcohol consumption and smoking. There are about 100 types of liver infections. Therefore, building a model that will help doctors predict at an early stage whether a patient is likely to have liver disease will be a great advantage. Diagnosis of liver disease at a preliminary stage is important for better treatment. We also compare different algorithms for better accuracy.
Keywords: Indian Liver Patients, Machine Learning, Logistic Regression, Support Vector Machine, Random Forest, AdaBoost, Bagging.
The main objective was to predict at an early stage whether a patient should be diagnosed or not, with algorithms such as SVM, Logistic Regression and Random Forest. These algorithms were also used in previous studies. We have now improved the accuracy of these algorithms by using Bagging and AdaBoost.

As we can see in the figure above, the accuracies of the algorithms increase. We got an accuracy of 73.5% for Logistic Regression; then, by applying the AdaBoost classifier, the accuracy increased to 74.35%.

For Support Vector Machine we got 70.94%, and for Random Forest classification 66.67%; here we got a considerable increase in accuracy by using Bagging, which gives an accuracy of 72.64%.
5. CONCLUSION:
We have applied machine learning algorithms on the Indian Liver Patient dataset to predict, at an early stage, patients with liver disease from their enzyme content. We have used different machine learning classification algorithms such as Logistic Regression, SVC and Random Forest, and further we have applied Bagging to Random Forest and AdaBoost to Logistic Regression. Logistic Regression is fast in processing and gave an accuracy of 73.5%; thus, to increase its accuracy, we used AdaBoost and got an accuracy of 74.36%.