Journal Description
Analytics is an international, peer-reviewed, open access journal on methodologies, technologies, and applications of analytics, published quarterly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 20.7 days after submission; acceptance to publication takes 5.6 days (median values for papers published in this journal in the first half of 2024).
- Recognition of Reviewers: APC discount vouchers, optional signed peer review, and reviewer names published annually in the journal.
- Analytics is a companion journal of Mathematics.
Latest Articles
Directed Topic Extraction with Side Information for Sustainability Analysis
Analytics 2024, 3(3), 389-405; https://doi.org/10.3390/analytics3030021 - 11 Sep 2024
Abstract
Topic analysis represents each document in a text corpus in a low-dimensional latent topic space. In some cases, the desired topic representation is subject to specific requirements or guidelines constituting side information. For instance, sustainability-aware investors might be interested in automatically assessing aspects of firm sustainability based on the textual content of its corporate reports, focusing on the established 17 UN sustainability goals. The main corpus consists of the corporate report texts, while the texts containing the definitions of the 17 UN sustainability goals represent the side information. Under the assumption that both text corpora share a common low-dimensional subspace, we propose representing them in such a space via directed topic extraction using matrix co-factorization. Both the main and the side text corpora are first represented as term–context matrices, which are then jointly decomposed into word–topic and topic–context matrices. The word–topic matrix is common to both text corpora, whereas the topic–context matrices contain specific representations in the shared topic space. A nuisance parameter, which allows us to shift the focus between the error minimization of individual factorization terms, controls the extent to which the side information is taken into account. With our approach, documents from the main and the side corpora can be related to each other in the resulting latent topic space. That is, the corporate reports are represented in the same latent topic space as the descriptions of the 17 UN sustainability goals, enabling a structured automatic sustainability assessment of the textual report’s content. We provide an algorithm for such directed topic extraction and propose techniques for visualizing and interpreting the results.
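The joint decomposition described in this abstract can be sketched with Lee–Seung-style multiplicative updates, where a weight `alpha` plays the role of the nuisance parameter shifting focus between the two factorization errors. This is an illustrative sketch under those assumptions, not the authors' implementation; the function name and update scheme are hypothetical.

```python
import numpy as np

def co_factorize(A, B, k, alpha=0.5, n_iter=200, seed=0):
    """Jointly factorize term-context matrices A (main corpus) and B (side
    corpus) as A ~ W @ Ha and B ~ W @ Hb with a shared word-topic matrix W.
    alpha shifts the focus between the two reconstruction errors."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))
    Ha = rng.random((k, A.shape[1]))
    Hb = rng.random((k, B.shape[1]))
    eps = 1e-9
    for _ in range(n_iter):
        # Multiplicative updates (Lee-Seung style) for the joint objective
        # alpha*||A - W Ha||^2 + (1 - alpha)*||B - W Hb||^2
        Ha *= (W.T @ A) / (W.T @ W @ Ha + eps)
        Hb *= (W.T @ B) / (W.T @ W @ Hb + eps)
        num = alpha * (A @ Ha.T) + (1 - alpha) * (B @ Hb.T)
        den = alpha * (W @ Ha @ Ha.T) + (1 - alpha) * (W @ Hb @ Hb.T) + eps
        W *= num / den
    return W, Ha, Hb
```

Columns of `Ha` and `Hb` then live in the same k-dimensional topic space, so report documents can be compared directly with the goal descriptions.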
(This article belongs to the Special Issue Business Analytics and Applications)
Open Access Article
SIMEX-Based and Analytical Bias Corrections in Stocking–Lord Linking
by Alexander Robitzsch
Analytics 2024, 3(3), 368-388; https://doi.org/10.3390/analytics3030020 - 6 Aug 2024
Cited by 1
Abstract
Stocking–Lord (SL) linking is a popular linking method for group comparisons based on dichotomous item responses. This article proposes a bias correction technique based on the simulation extrapolation (SIMEX) method for SL linking in the 2PL model in the presence of uniform differential item functioning (DIF). The SIMEX-based method is compared to the analytical bias correction methods of SL linking. A simulation study showed that SIMEX-based SL linking performed best; it is easy to implement and can be adapted to other linking methods straightforwardly.
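The SIMEX idea can be illustrated on the classical errors-in-variables regression slope: re-estimate the statistic under artificially inflated measurement error, then extrapolate the trend back to the no-error level lambda = -1. A minimal sketch, not the paper's SL-linking implementation; the quadratic extrapolant and names are assumptions.

```python
import numpy as np

def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), n_sim=200, seed=0):
    """SIMEX: re-estimate a slope under inflated measurement error variance
    (1 + lam) * sigma_u**2, then extrapolate a quadratic trend in lam back to
    lam = -1, the hypothetical no-measurement-error case."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w, float)
    y = np.asarray(y, float)

    def slope(x):
        x = x - x.mean()
        return float(x @ (y - y.mean()) / (x @ x))

    lams, ests = [0.0], [slope(w)]          # lam = 0 is the naive estimate
    for lam in lambdas:
        sims = [slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size))
                for _ in range(n_sim)]
        lams.append(lam)
        ests.append(float(np.mean(sims)))
    coef = np.polyfit(lams, ests, 2)        # quadratic extrapolant in lam
    return float(np.polyval(coef, -1.0))    # extrapolate to lam = -1
```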
Open Access Article
Comparative Analysis of Nature-Inspired Metaheuristic Techniques for Optimizing Phishing Website Detection
by Thomas Nagunwa
Analytics 2024, 3(3), 344-367; https://doi.org/10.3390/analytics3030019 - 6 Aug 2024
Abstract
The increasing number, frequency, and sophistication of phishing website-based attacks necessitate the development of robust solutions for detecting phishing websites to enhance the overall security of cyberspace. Drawing inspiration from natural processes, nature-inspired metaheuristic techniques have been proven to be efficient in solving complex optimization problems in diverse domains. Following these successes, this research paper aims to investigate the effectiveness of metaheuristic techniques, particularly Genetic Algorithms (GAs), Differential Evolution (DE), and Particle Swarm Optimization (PSO), in optimizing the hyperparameters of machine learning (ML) algorithms for detecting phishing websites. Using multiple datasets, six ensemble classifiers were trained on each dataset and their hyperparameters were optimized using each metaheuristic technique. As a baseline for assessing performance improvement, the classifiers were also trained with the default hyperparameters. To validate the genuine impact of the techniques over the use of default hyperparameters, we conducted statistical tests on the accuracy scores of all the optimized classifiers. The results show that the GA is the most effective technique, by improving the accuracy scores of all the classifiers, followed by DE, which improved four of the six classifiers. PSO was the least effective, improving only one classifier. It was also found that GA-optimized Gradient Boosting, LGBM and XGBoost were the best classifiers across all the metrics in predicting phishing websites, achieving peak accuracy scores of 98.98%, 99.24%, and 99.47%, respectively.
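The hyperparameter search the paper performs with a GA can be sketched generically: a population of configurations evolves by selection, crossover, and mutation, with `fitness` standing in for cross-validated accuracy. Everything below is illustrative, not the author's experimental code.

```python
import random

def genetic_search(space, fitness, pop_size=12, generations=25, p_mut=0.3, seed=0):
    """Tiny genetic algorithm over a dict of hyperparameter choices.
    `space` maps names to candidate-value lists; `fitness` scores a
    configuration (e.g. cross-validated accuracy in the paper's setting)."""
    rng = random.Random(seed)
    keys = list(space)
    pop = [{k: rng.choice(space[k]) for k in keys} for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]  # selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = {k: rng.choice((a[k], b[k])) for k in keys}          # crossover
            if rng.random() < p_mut:                                     # mutation
                k = rng.choice(keys)
                child[k] = rng.choice(space[k])
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```

Plugging in a real objective would mean training the classifier with each candidate configuration and returning its validation accuracy.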
Open Access Article
A Longitudinal Tree-Based Framework for Lapse Management in Life Insurance
by Mathias Valla
Analytics 2024, 3(3), 318-343; https://doi.org/10.3390/analytics3030018 - 5 Aug 2024
Cited by 1
Abstract
Developing an informed lapse management strategy (LMS) is critical for life insurers to improve profitability and gain insight into the risk of their global portfolio. Prior research in actuarial science has shown that targeting policyholders by maximising their individual customer lifetime value is more advantageous than targeting all those likely to lapse. However, most existing lapse analyses do not leverage the variability of features and targets over time. We propose a longitudinal LMS framework, utilising tree-based models for longitudinal data, such as left-truncated and right-censored (LTRC) trees and forests, as well as mixed-effect tree-based models. Our methodology provides time-informed insights, leading to increased precision in targeting. Our findings indicate that the use of longitudinally structured data significantly enhances the precision of models in predicting lapse behaviour, estimating customer lifetime value, and evaluating individual retention gains. The implementation of mixed-effect random forests enables the production of time-varying predictions that are highly relevant for decision-making. This paper contributes to the field of lapse analysis for life insurers by demonstrating the importance of exploiting the complete past trajectory of policyholders, which is often available in insurers’ information systems but has yet to be fully utilised.
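The CLV-based targeting baseline mentioned above can be illustrated with a toy scoring rule: a policyholder's expected retention gain is their lapse probability times customer lifetime value times the retention campaign's success rate, minus the contact cost. This simplified formula and its names are assumptions for illustration, not the paper's model.

```python
def target_policyholders(portfolio, contact_cost, retention_rate):
    """Rank policyholders by expected individual retention gain
    p_lapse * CLV * retention_rate - contact_cost and keep only those with a
    positive gain. `portfolio` holds (id, p_lapse, clv) tuples."""
    scored = [(pid, p_lapse * clv * retention_rate - contact_cost)
              for pid, p_lapse, clv in portfolio]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: -s[1])
```

In the paper's longitudinal setting, `p_lapse` and `clv` would be time-varying predictions from the tree-based models rather than fixed numbers.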
(This article belongs to the Special Issue Business Analytics and Applications)
Open Access Article
Enhancing Talent Recruitment in Business Intelligence Systems: A Comparative Analysis of Machine Learning Models
by Hikmat Al-Quhfa, Ali Mothana, Abdussalam Aljbri and Jie Song
Analytics 2024, 3(3), 297-317; https://doi.org/10.3390/analytics3030017 - 15 Jul 2024
Abstract
In the competitive field of business intelligence, optimizing talent recruitment through data-driven methodologies is crucial for better decision-making. This study compares the effectiveness of various machine learning models to improve recruitment accuracy and efficiency. Using the recruitment data from a major Yemeni organization (2019–2022), we evaluated models including K-Nearest Neighbors, Logistic Regression, Support Vector Machine, Naive Bayes, Decision Trees, Random Forest, Gradient Boosting Classifier, AdaBoost Classifier, and Neural Networks. Hyperparameter tuning and cross-validation were used for optimization. The Random Forest model achieved the highest accuracy (92.8%), followed by Neural Networks (92.6%) and Gradient Boosting Classifier (92.5%). These results suggest that advanced machine learning models, particularly Random Forest and Neural Networks, can significantly enhance the recruitment processes in business intelligence systems. This study provides valuable insights for recruiters, advocating for the integration of sophisticated machine learning techniques in talent acquisition strategies.
Open Access Communication
Modeling Sea Level Rise Using Ensemble Techniques: Impacts on Coastal Adaptation, Freshwater Ecosystems, Agriculture and Infrastructure
by Sambandh Bhusan Dhal, Rishabh Singh, Tushar Pandey, Sheelabhadra Dey, Stavros Kalafatis and Vivekvardhan Kesireddy
Analytics 2024, 3(3), 276-296; https://doi.org/10.3390/analytics3030016 - 5 Jul 2024
Abstract
Sea level rise (SLR) is a crucial indicator of climate change, primarily driven by greenhouse gas emissions and the subsequent increase in global temperatures. The impact of SLR, however, varies regionally due to factors such as ocean bathymetry, resulting in distinct shifts across different areas compared to the global average. Understanding the complex factors influencing SLR across diverse spatial scales, along with the associated uncertainties, is essential. This study focuses on the East Coast of the United States and Gulf of Mexico, utilizing historical SLR data from 1993 to 2023. To forecast SLR trends from 2024 to 2103, a weighted ensemble model comprising SARIMAX, LSTM, and exponential smoothing models was employed. Additionally, using historical greenhouse gas data, an ensemble of LSTM models was used to predict real-time SLR values, achieving a testing loss of 0.005. Furthermore, conductance and dissolved oxygen (DO) values were assessed for the entire forecasting period, leveraging forecasted SLR trends to evaluate the impacts on marine life, agriculture, and infrastructure.
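A weighted ensemble like the one described (SARIMAX, LSTM, and exponential smoothing members) can be blended by weighting each model's forecast path inversely to its validation error. A minimal sketch under that assumption; the weighting rule is illustrative, since the abstract does not specify how the weights were chosen.

```python
def ensemble_forecast(forecasts, val_errors):
    """Blend per-model forecast paths with weights inversely proportional to
    each model's validation error. `forecasts` is a list of equal-length
    forecast paths, one per member model."""
    inv = [1.0 / e for e in val_errors]
    total = sum(inv)
    weights = [w / total for w in inv]          # normalize to sum to 1
    horizon = len(forecasts[0])
    return [sum(w * path[t] for w, path in zip(weights, forecasts))
            for t in range(horizon)]
```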
Open Access Article
TaskFinder: A Semantics-Based Methodology for Visualization Task Recommendation
by Darius Coelho, Bhavya Ghai, Arjun Krishna, Maria Velez-Rojas, Steve Greenspan, Serge Mankovski and Klaus Mueller
Analytics 2024, 3(3), 255-275; https://doi.org/10.3390/analytics3030015 - 4 Jul 2024
Abstract
Data visualization has entered the mainstream, and numerous visualization recommender systems have been proposed to assist visualization novices, as well as busy professionals, in selecting the most appropriate type of chart for their data. Given a dataset and a set of user-defined analytical tasks, these systems can make recommendations based on expert coded visualization design principles or empirical models. However, the need to identify the pertinent analytical tasks beforehand still exists and often requires domain expertise. In this work, we aim to automate this step with TaskFinder, a prototype system that leverages the information available in textual documents to understand domain-specific relations between attributes and tasks. TaskFinder employs word vectors as well as a custom dependency parser along with an expert-defined list of task keywords to extract and rank associations between tasks and attributes. It pairs these associations with a statistical analysis of the dataset to filter out tasks irrelevant given the data. TaskFinder ultimately produces a ranked list of attribute–task pairs. We show that the number of domain articles needed to converge to a recommendation consensus is bounded for our approach. We demonstrate our TaskFinder over multiple domains with varying article types and quantities.
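The word-vector association ranking between tasks and attributes can be sketched with plain cosine similarity; the vectors and names below are hypothetical, and TaskFinder's actual scoring additionally uses a custom dependency parser and statistical filtering of the dataset.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def rank_task_attribute_pairs(attr_vecs, task_vecs):
    """Rank (attribute, task) pairs by word-vector cosine similarity --
    a simplified stand-in for TaskFinder's association scoring."""
    pairs = [(attr, task, cosine(av, tv))
             for attr, av in attr_vecs.items()
             for task, tv in task_vecs.items()]
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```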
Open Access Article
Customer Sentiments in Product Reviews: A Comparative Study with GooglePaLM
by Olamilekan Shobayo, Swethika Sasikumar, Sandhya Makkar and Obinna Okoyeigbo
Analytics 2024, 3(2), 241-254; https://doi.org/10.3390/analytics3020014 - 18 Jun 2024
Cited by 1
Abstract
In this work, we evaluated the efficacy of Google’s Pathways Language Model (GooglePaLM) in analyzing sentiments expressed in product reviews. Although conventional Natural Language Processing (NLP) techniques such as the rule-based Valence Aware Dictionary for Sentiment Reasoning (VADER) and the long sequence Bidirectional Encoder Representations from Transformers (BERT) model are effective, they frequently encounter difficulties when dealing with intricate linguistic features like sarcasm and contextual nuances commonly found in customer feedback. We performed a sentiment analysis on Amazon’s fashion review datasets using the VADER, BERT, and GooglePaLM models, respectively, and compared the results based on evaluation metrics such as precision, recall, accuracy, correct positive prediction, and correct negative prediction. We used the default values of the VADER and BERT models and slightly fine-tuned GooglePaLM with a Temperature of 0.0 and an N-value of 1. We observed that GooglePaLM performed better with correct positive and negative prediction values of 0.91 and 0.93, respectively, followed by BERT and VADER. We concluded that large language models surpass traditional rule-based systems for natural language processing tasks.
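The reported metrics can be reproduced from a confusion matrix. Since the abstract does not define "correct positive/negative prediction" precisely, the sketch below reads them as per-class recall, which is one plausible interpretation, not the authors' stated definition.

```python
def sentiment_metrics(y_true, y_pred, positive="pos"):
    """Binary-sentiment evaluation metrics from true vs. predicted labels.
    'correct_positive'/'correct_negative' are computed here as per-class
    recall (fraction of each true class predicted correctly)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "correct_positive": tp / (tp + fn) if tp + fn else 0.0,
        "correct_negative": tn / (tn + fp) if tn + fp else 0.0,
    }
```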
Open Access Feature Paper Article
Improving the Giant-Armadillo Optimization Method
by Glykeria Kyrou, Vasileios Charilogis and Ioannis G. Tsoulos
Analytics 2024, 3(2), 225-240; https://doi.org/10.3390/analytics3020013 - 10 Jun 2024
Abstract
Global optimization is now widely applied to a variety of practical and scientific problems. In this context, evolutionary techniques form a widely used group of methods. A relatively new evolutionary technique in this direction is Giant-Armadillo Optimization, which is based on the hunting strategy of giant armadillos. In this paper, modifications to this technique are proposed, such as the periodic application of a local minimization method as well as the use of modern termination techniques based on statistical observations. The proposed modifications were tested on a wide series of test functions available from the relevant literature and compared against other evolutionary methods.
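The proposed modification, periodically applying a local minimization method inside an evolutionary loop, is often called a memetic scheme. A minimal sketch on a toy objective follows; it is illustrative only and not the paper's Giant-Armadillo algorithm or its termination rules.

```python
import random

def local_search(f, x, step=0.1, iters=50, rng=None):
    """Simple random-perturbation hill climb, the periodic local minimizer."""
    rng = rng or random.Random(0)
    best, fb = list(x), f(x)
    for _ in range(iters):
        cand = [xi + rng.uniform(-step, step) for xi in best]
        fc = f(cand)
        if fc < fb:
            best, fb = cand, fc
    return best

def evolve(f, dim, bounds=(-5.0, 5.0), pop_size=20, generations=40,
           local_every=10, seed=0):
    """Evolutionary minimization with a local minimization step applied to
    the incumbent every `local_every` generations."""
    rng = random.Random(seed)
    pop = [[rng.uniform(*bounds) for _ in range(dim)] for _ in range(pop_size)]
    for g in range(1, generations + 1):
        pop.sort(key=f)                              # selection pressure
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)            # blend crossover + noise
            children.append([(ai + bi) / 2 + rng.gauss(0, 0.1)
                             for ai, bi in zip(a, b)])
        pop = parents + children
        if g % local_every == 0:                     # periodic local refinement
            pop[0] = local_search(f, pop[0], rng=rng)
    return min(pop, key=f)
```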
Open Access Editorial
Beyond the ROC Curve: The IMCP Curve
by Jesus S. Aguilar-Ruiz
Analytics 2024, 3(2), 221-224; https://doi.org/10.3390/analytics3020012 - 27 May 2024
Cited by 1
Abstract
The ROC curve [...]
Open Access Article
Interconnected Markets: Unveiling Volatility Spillovers in Commodities and Energy Markets through BEKK-GARCH Modelling
by Tetiana Paientko and Stanley Amakude
Analytics 2024, 3(2), 194-220; https://doi.org/10.3390/analytics3020011 - 16 Apr 2024
Abstract
Food commodities and energy bills have experienced rapid undulating movements and hikes globally in recent times. This spurred this study to examine the possibility that shocks arising from fluctuations in one market spill over to the other and to determine how the spillovers varied over time. The data were daily prices of grains and energy products from 1 July 2019 to 31 December 2022, as quoted in the markets. This period was chosen to capture the COVID pandemic and the Russian–Ukrainian war as events that could impact volatility. Returns were calculated and subjected to ADF stationarity tests, co-integration tests, and full BEKK-GARCH estimation. The results revealed a prolonged association between returns in the energy markets and food commodity market returns. Both markets were found to have volatility persistence individually, and time-varying bidirectional transmission of volatility across the markets was found. No lagged-effects spillover was found from one market to the other. The findings confirm that shocks emanating from fluctuations in energy markets affect the volatility of prices in food commodity markets and vice versa, but this impact occurs immediately after the shocks arise, on the same day the variation occurs.
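The preprocessing step before the ADF and BEKK-GARCH stages, turning daily prices into returns, is standard; a sketch using log returns (the abstract does not specify simple vs. log returns, so this choice is an assumption):

```python
import math

def log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1}) from a price series,
    the usual input to stationarity tests and (BEKK-)GARCH estimation."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]
```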
(This article belongs to the Special Issue Business Analytics and Applications)
Open Access Article
Learner Engagement and Demographic Influences in Brazilian Massive Open Online Courses: Aprenda Mais Platform Case Study
by Júlia Marques Carvalho da Silva, Gabriela Hahn Pedroso, Augusto Basso Veber and Úrsula Gomes Rosa Maruyama
Analytics 2024, 3(2), 178-193; https://doi.org/10.3390/analytics3020010 - 3 Apr 2024
Abstract
This paper explores the dynamics of student engagement and demographic influences in Massive Open Online Courses (MOOCs). The study analyzes multiple facets of Brazilian MOOC participation, including re-enrollment patterns, course completion rates, and the impact of demographic characteristics on learning outcomes. Using survey data and statistical analyses from the public Aprenda Mais Platform, this study reveals that MOOC learners exhibit a strong tendency toward continuous learning, with a majority re-enrolling in subsequent courses within a short timeframe. The average completion rate across courses is around 42.14%, with learners maintaining consistent academic performance. Demographic factors, notably, race/color and disability, are found to influence enrollment and completion rates, underscoring the importance of inclusive educational practices. Geographical location impacts students’ decision to enroll in and complete courses, highlighting the necessity for region-specific educational strategies. The research concludes that a diverse array of factors, including content interest, personal motivation, and demographic attributes, shape student engagement in MOOCs. These insights are vital for educators and course designers in creating effective, inclusive, and engaging online learning experiences.
(This article belongs to the Special Issue New Insights in Learning Analytics)
Open Access Article
Optimal Matching with Matching Priority
by Massimo Cannas and Emiliano Sironi
Analytics 2024, 3(1), 165-177; https://doi.org/10.3390/analytics3010009 - 19 Mar 2024
Abstract
Matching algorithms are commonly used to build comparable subsets (matchings) in observational studies. When a complete matching is not possible, some units must necessarily be excluded from the final matching. This may bias the final estimates comparing the two populations, and thus it is important to reduce the number of drops to avoid unsatisfactory results. Greedy matching algorithms may not reach the maximum matching size, thus dropping more units than necessary. Optimal matching algorithms do ensure a maximum matching size, but they implicitly assume that all units have the same matching priority. In this paper, we propose a matching strategy which is order optimal in the sense that it finds a maximum matching size which is consistent with a given matching priority. The strategy is based on an order-optimal matching algorithm originally proposed in connection with assignment problems by D. Gale. When a matching priority is given, the algorithm ensures that the discarded units have the lowest possible matching priority. We discuss the algorithm’s complexity and its relation with classic optimal matching. We illustrate its use with a problem in a case study concerning a comparison of female and male executives and a simulation.
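The order-optimal behaviour described above can be illustrated with Kuhn's augmenting-path algorithm processed in priority order: once a unit is matched, a later augmenting path may move it to a different control but never unmatch it, so the units dropped from the maximum matching are as low-priority as possible. This is a simplified stand-in for the Gale-based algorithm, not the authors' implementation.

```python
def priority_matching(edges, priority):
    """Maximum bipartite matching via augmenting paths, offering paths to
    treated units in priority order (lower rank = higher priority).
    `edges` maps each treated unit to the control units it may match;
    `priority` maps treated units to their rank."""
    match_of = {}                                   # control -> treated unit

    def augment(u, seen):
        for c in edges.get(u, ()):
            if c not in seen:
                seen.add(c)
                # c is free, or its current partner can be rematched elsewhere
                if c not in match_of or augment(match_of[c], seen):
                    match_of[c] = u
                    return True
        return False

    for u in sorted(edges, key=lambda u: priority[u]):
        augment(u, set())
    return {u: c for c, u in match_of.items()}
```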
Open Access Feature Paper Review
Artificial Intelligence and Sustainability—A Review
by Rachit Dhiman, Sofia Miteff, Yuancheng Wang, Shih-Chi Ma, Ramila Amirikas and Benjamin Fabian
Analytics 2024, 3(1), 140-164; https://doi.org/10.3390/analytics3010008 - 1 Mar 2024
Cited by 1
Abstract
In recent decades, artificial intelligence has undergone transformative advancements, reshaping diverse sectors such as healthcare, transport, agriculture, energy, and the media. Despite the enthusiasm surrounding AI’s potential, concerns persist about its potential negative impacts, including substantial energy consumption and ethical challenges. This paper critically reviews the evolving landscape of AI sustainability, addressing economic, social, and environmental dimensions. The literature is systematically categorized into “Sustainability of AI” and “AI for Sustainability”, revealing a balanced perspective between the two. The study also identifies a notable trend towards holistic approaches, with a surge in publications and empirical studies since 2019, signaling the field’s maturity. Future research directions emphasize delving into the relatively under-explored economic dimension, aligning with the United Nations’ Sustainable Development Goals (SDGs), and addressing stakeholders’ influence.
(This article belongs to the Special Issue Business Analytics and Applications)
Open Access Article
Visual Analytics for Robust Investigations of Placental Aquaporin Gene Expression in Response to Maternal SARS-CoV-2 Infection
by Raphael D. Isokpehi, Amos O. Abioye, Rickeisha S. Hamilton, Jasmin C. Fryer, Antoinesha L. Hollman, Antoinette M. Destefano, Kehinde B. Ezekiel, Tyrese L. Taylor, Shawna F. Brooks, Matilda O. Johnson, Olubukola Smile, Shirma Ramroop-Butts, Angela U. Makolo and Albert G. Hayward II
Analytics 2024, 3(1), 116-139; https://doi.org/10.3390/analytics3010007 - 5 Feb 2024
Abstract
The human placenta is a multifunctional, disc-shaped temporary fetal organ that develops in the uterus during pregnancy, connecting the mother and the fetus. The availability of large-scale datasets on the gene expression of placental cell types and scholarly articles documenting adverse pregnancy outcomes from maternal infection warrants the use of computational resources to aid in knowledge generation from disparate data sources. Using maternal Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infection as a case study in microbial infection, we constructed integrated datasets and implemented visual analytics resources to facilitate robust investigations of placental gene expression data in the dimensions of flow, curation, and analytics. The visual analytics resources and associated datasets can support a greater understanding of SARS-CoV-2-induced changes to the human placental expression levels of 18,882 protein-coding genes and at least 1233 human gene groups/families. We focus this report on the human aquaporin gene family that encodes small integral membrane proteins initially studied for their roles in water transport across cell membranes. Aquaporin-9 (AQP9) was the only aquaporin downregulated in term placental villi from SARS-CoV-2-positive mothers. Previous studies have found that (1) oxygen signaling modulates placental development; (2) oxygen tension could modulate AQP9 expression in the human placenta; and (3) SARS-CoV-2 can disrupt the formation of oxygen-carrying red blood cells in the placenta. Thus, future research could be performed on microbial infection-induced changes to (1) the placental hematopoietic stem and progenitor cells; and (2) placental expression of human aquaporin genes, especially AQP9.
(This article belongs to the Special Issue Visual Analytics: Techniques and Applications)
Open Access Article
Interoperable Information Flow as Enabler for Efficient Predictive Maintenance
by Marco Franke, Quan Deng, Zisis Kyroudis, Maria Psarodimou, Jovana Milenkovic, Ioannis Meintanis, Dimitris Lokas, Stefano Borgia and Klaus-Dieter Thoben
Analytics 2024, 3(1), 84-115; https://doi.org/10.3390/analytics3010006 - 1 Feb 2024
Abstract
Industry 4.0 enables the modernisation of machines and opens up the digitalisation of processes in the manufacturing industry. As a result, these machines are ready for predictive maintenance as part of Industry 4.0 services. The benefit of predictive maintenance is that it can significantly extend the life of machines. The integration of predictive maintenance into existing production environments faces challenges in terms of data understanding and data preparation for machines and legacy systems. Current AI frameworks lack adequate support for the ongoing task of data integration. In this context, adequate support means that the data analyst does not need to know the technical background of the pilot’s data sources in terms of data formats and schemas. It should be possible to perform data analyses without knowing the characteristics of the pilot’s specific data sources. The aim is to achieve a seamless integration of data as information for predictive maintenance. For this purpose, the developed data-sharing infrastructure enables automatic data acquisition and data integration for AI frameworks using interoperability methods. The evaluation, based on two pilot projects, shows that the step of data understanding and data preparation for predictive maintenance is simplified and that the solution is applicable for new pilot projects.
Open Access Article
Analysing the Influence of Macroeconomic Factors on Credit Risk in the UK Banking Sector
by Hemlata Sharma, Aparna Andhalkar, Oluwaseun Ajao and Bayode Ogunleye
Analytics 2024, 3(1), 63-83; https://doi.org/10.3390/analytics3010005 - 26 Jan 2024
Cited by 3
Abstract
Macroeconomic factors have a critical impact on banking credit risk, which cannot be directly controlled by banks, and therefore, there is a need for an early credit risk warning system based on the macroeconomy. By comparing different predictive models (traditional statistical and machine learning algorithms), this study aims to examine the macroeconomic determinants’ impact on the UK banking credit risk and assess the most accurate credit risk estimate using predictive analytics. This study found that the variance-based multi-split decision tree algorithm is the most precise predictive model with interpretable, reliable, and robust results. Our model performance achieved 95% accuracy and evidenced that unemployment and inflation rate are significant credit risk predictors in the UK banking context. Our findings provided valuable insights such as a positive association between credit risk and inflation, the unemployment rate, and national savings, as well as a negative relationship between credit risk and national debt, total trade deficit, and national income. In addition, we empirically showed the relationship between national savings and non-performing loans, thus proving the “paradox of thrift”. These findings benefit the credit risk management team in monitoring the macroeconomic factors’ thresholds and implementing critical reforms to mitigate credit risk.
Full article
Open Access Article
Code Plagiarism Checking Function and Its Application for Code Writing Problem in Java Programming Learning Assistant System
by
Ei Ei Htet, Khaing Hsu Wai, Soe Thandar Aung, Nobuo Funabiki, Xiqin Lu, Htoo Htoo Sandi Kyaw and Wen-Chung Kao
Analytics 2024, 3(1), 46-62; https://doi.org/10.3390/analytics3010004 - 17 Jan 2024
Abstract
A web-based Java programming learning assistant system (JPLAS) has been developed so that novice students can study Java programming by themselves while enhancing their code reading and code writing skills. One type of implemented exercise is the code writing problem (CWP), which asks students to create source code that passes a given test code. The correctness of the answer code is validated by running it on JUnit. In previous work, a Python-based answer code validation program was implemented to assist teachers; it automatically verifies the source codes from all students against one test code and reports the number of test cases passed by each code in a CSV file. While this program plays a crucial role in checking the correctness of code behavior, it cannot detect code plagiarism, which often happens in programming courses. In this paper, we implement a code plagiarism checking function in the answer code validation program and present the results of its application to a Java programming course at Okayama University, Japan. The function first removes whitespace characters and comments using regular expressions. Next, it calculates the Levenshtein distance and a similarity score for each pair of source codes from different students in the class. If the score is larger than a given threshold, the pair is regarded as plagiarism. Finally, it outputs the scores, together with the student IDs, as a CSV file. For evaluation, we applied the proposed function to a total of 877 source codes for 45 CWP assignments, each submitted by 9 to 39 students, and analyzed the results. We found that (1) CWP assignments asking for shorter source codes generate higher scores than those asking for longer codes due to the use of test codes, (2) proper thresholds differ across assignments, and (3) some students often copied source codes from certain other students.
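The pairwise-comparison steps the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names and sample snippets are invented:

```python
# Sketch of the described steps: strip comments and whitespace from Java
# source, then compute a Levenshtein-based similarity for a pair of codes.
import re

def normalize(java_source: str) -> str:
    # Remove block comments, then line comments, then all whitespace.
    no_block = re.sub(r"/\*.*?\*/", "", java_source, flags=re.DOTALL)
    no_line = re.sub(r"//[^\n]*", "", no_block)
    return re.sub(r"\s+", "", no_line)

def levenshtein(a: str, b: str) -> int:
    # Standard dynamic-programming edit distance, computed row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    na, nb = normalize(a), normalize(b)
    if not na and not nb:
        return 1.0
    return 1.0 - levenshtein(na, nb) / max(len(na), len(nb))

code1 = "int sum(int a, int b) { return a + b; } // adds"
code2 = "int sum(int x, int y) { /* add */ return x + y; }"
print(round(similarity(code1, code2), 2))
```

Pairs whose similarity exceeds a chosen threshold would then be flagged; as the paper notes, a suitable threshold varies by assignment.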
Full article
(This article belongs to the Special Issue New Insights in Learning Analytics)
Open Access Article
An Optimal House Price Prediction Algorithm: XGBoost
by
Hemlata Sharma, Hitesh Harsora and Bayode Ogunleye
Analytics 2024, 3(1), 30-45; https://doi.org/10.3390/analytics3010003 - 2 Jan 2024
Cited by 2
Abstract
An accurate prediction of house prices is a fundamental requirement for various sectors, including real estate and mortgage lending. It is widely recognized that a property’s value is not solely determined by its physical attributes but is significantly influenced by its surrounding neighborhood. Meeting the diverse housing needs of individuals while balancing budget constraints is a primary concern for real estate developers. To this end, we addressed house price prediction as a regression task and employed various machine learning (ML) techniques capable of expressing the significance of independent variables. We used the Ames housing dataset (Ames, Iowa, USA) to compare the XGBoost, support vector regressor, random forest regressor, multilayer perceptron, and multiple linear regression algorithms for house price prediction, and then identified the key factors that influence housing costs. Our results show that XGBoost is the best-performing model for house price prediction. Our findings offer valuable insights and tools for stakeholders, facilitating more accurate property price estimates and, in turn, more informed decision making to meet the housing needs of diverse populations while considering budget constraints.
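The regression setup the abstract describes can be sketched as a small gradient-boosted-tree example. Scikit-learn's `GradientBoostingRegressor` stands in for XGBoost here, and the table is synthetic with invented feature names, not the Ames columns:

```python
# Minimal sketch: gradient-boosted trees on a synthetic housing table.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
n = 600
area = rng.uniform(500, 3500, n)            # living area, square feet
quality = rng.integers(1, 11, n)            # overall quality score, 1-10
age = rng.uniform(0, 80, n)                 # years since construction
price = 50 * area + 8000 * quality - 300 * age + rng.normal(0, 5000, n)

X = np.column_stack([area, quality, age])
X_tr, X_te, y_tr, y_te = train_test_split(X, price, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print(f"test R^2 = {r2_score(y_te, model.predict(X_te)):.3f}")
```

With real data, `xgboost.XGBRegressor` would slot in the same way, and the fitted model's feature importances would indicate the key drivers of price.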
Full article
Open Access Article
Exploring Infant Physical Activity Using a Population-Based Network Analysis Approach
by
Rama Krishna Thelagathoti, Priyanka Chaudhary, Brian Knarr, Michaela Schenkelberg, Hesham H. Ali and Danae Dinkel
Analytics 2024, 3(1), 14-29; https://doi.org/10.3390/analytics3010002 - 31 Dec 2023
Cited by 2
Abstract
Background: Physical activity (PA) is an important aspect of infant development and has been shown to have long-term effects on health and well-being. Accurate analysis of infant PA is crucial for understanding their physical development, monitoring health and wellness, as well as identifying areas for improvement. However, individual analysis of infant PA can be challenging and often leads to biased results due to an infant’s inability to self-report and constantly changing posture and movement. This manuscript explores a population-based network analysis approach to study infants’ PA. The network analysis approach allows us to draw conclusions that are generalizable to the entire population and to identify trends and patterns in PA levels. Methods: This study aims to analyze the PA of infants aged 6–15 months using accelerometer data. A total of 20 infants from different types of childcare settings were recruited, including home-based and center-based care. Each infant wore an accelerometer for four days (2 weekdays, 2 weekend days). Data were analyzed using a network analysis approach, exploring the relationship between PA and various demographic and social factors. Results: The results showed that infants in center-based care have significantly higher levels of PA than those in home-based care. Moreover, the ankle acceleration was much higher than the waist acceleration, and activity patterns differed on weekdays and weekends. Conclusions: This study highlights the need for further research to explore the factors contributing to disparities in PA levels among infants in different childcare settings. Additionally, there is a need to develop effective strategies to promote PA among infants, considering the findings from the network analysis approach. Such efforts can contribute to enhancing infant health and well-being through targeted interventions aimed at increasing PA levels.
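One common form of population-based network analysis links subjects whose activity profiles correlate above a threshold and then reads off the connected groups. The sketch below illustrates that idea on synthetic hourly activity counts; it does not reproduce the study's actual pipeline:

```python
# Illustrative population network: infants with highly correlated daily
# activity profiles are linked; connected groups form population clusters.
import numpy as np

rng = np.random.default_rng(1)
# 20 infants x 24 hourly activity counts, drawn from two underlying profiles.
base_a, base_b = rng.random(24), rng.random(24)
profiles = np.array([(base_a if i < 10 else base_b) + 0.05 * rng.random(24)
                     for i in range(20)])

corr = np.corrcoef(profiles)                    # 20x20 correlation matrix
adj = (corr > 0.9) & ~np.eye(20, dtype=bool)    # threshold -> adjacency

def components(adj):
    # Connected components via iterative depth-first search.
    seen, groups = set(), []
    for start in range(len(adj)):
        if start in seen:
            continue
        stack, group = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.append(node)
            stack.extend(np.flatnonzero(adj[node]))
        groups.append(sorted(group))
    return groups

print(components(adj))
```

On real accelerometer summaries, the recovered groups could then be compared against childcare setting or other demographic factors, as in the study.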
Full article
(This article belongs to the Special Issue Feature Papers in Analytics)
Special Issues
Special Issue in
Analytics
Data Analytics and Quality 4.0: Innovations and Applications
Guest Editor: Elizabeth Cudney | Deadline: 31 October 2024
Special Issue in
Analytics
Advances in Applied Data Science: Bridging Theory and Practice
Guest Editor: R. Jordan Crouser | Deadline: 31 March 2025
Special Issue in
Analytics
Business Analytics and Applications
Guest Editors: Tatiana Ermakova, Benjamin Fabian | Deadline: 31 August 2025