Data | July 2024 - Browse Articles

16 pages, 12209 KiB

Open Access Review

Literature-Based Inventory of Chemical Substance Concentrations Measured in Organic Food Consumed in Europe

by Joanna Choueiri, Pascal Petit, Franck Balducci, Dominique J. Bicout and Christine Demeilliers

Data 2024, 9(7), 89; https://doi.org/10.3390/data9070089 - 3 Jul 2024

Viewed by 1486

Populations are exposed daily to numerous environmental pollutants, particularly through food. To address environmental issues, many agricultural production methods have been developed, including organic farming. To date, there is no exhaustive inventory of the contamination of organic foods as there is for conventional foods. The main objective of this work was to construct a growing and updatable database on chemical substances and their levels in organic foods consumed in Europe. To this end, a literature search was conducted, resulting in a total of 1207 concentration values from 823 food–substance pairs involving 166 food matrices and 209 chemical substances, among which 95% were not authorized in organic farming and 80% were pesticides. The most frequently encountered substance groups are “inorganic contaminants” and “organophosphate”, and the most studied food groups are “fruit used as fruit” and “Cereals and cereal primary derivatives”. Further studies are needed to continue updating the database with robust and comprehensive data on organic food contamination. This database could be used to study the health risks associated with these contaminants.

20 pages, 17344 KiB

Open Access Data Descriptor

Multi-Scale Earthquake Damaged Building Feature Set

by Guorui Gao, Futao Wang, Zhenqing Wang, Qing Zhao, Litao Wang, Jinfeng Zhu, Wenliang Liu, Gang Qin and Yanfang Hou

Data 2024, 9(7), 88; https://doi.org/10.3390/data9070088 - 28 Jun 2024

Viewed by 956

Abstract

Earthquake disasters are marked by their unpredictability and potential for extreme destructiveness. Accurate information on building damage, captured in post-earthquake remote sensing images, is critical for an effective post-disaster emergency response. The foundational features within these images are essential for the accurate extraction of building damage data following seismic events. Presently, the availability of publicly accessible datasets tailored specifically to earthquake-damaged buildings is limited, and existing collections of post-earthquake building damage characteristics are insufficient. To address this gap and foster research advancement in this domain, this paper introduces a new, large-scale, publicly available dataset named the Major Earthquake Damage Building Feature Set (MEDBFS). This dataset comprises image data sourced from five significant global earthquakes and captured by various optical remote sensing satellites, featuring diverse scale characteristics and multiple spatial resolutions. It includes over 7000 images of buildings pre- and post-disaster, each subjected to stringent quality control and expert validation. The images are categorized into three primary groups: intact/slightly damaged, severely damaged, and completely collapsed. This paper develops a comprehensive feature set encompassing five dimensions: spectral, texture, edge detection, building index, and temporal sequencing, resulting in 16 distinct classes of feature images. This dataset is poised to significantly enhance the capabilities for data-driven identification and analysis of earthquake-induced building damage, thereby supporting the advancement of scientific and technological efforts for emergency earthquake response.

(This article belongs to the Section Spatial Data Science and Digital Earth)

23 pages, 9558 KiB

Open Access Data Descriptor

A Point Cloud Dataset of Vehicles Passing through a Toll Station for Use in Training Classification Algorithms

by Alexander Campo-Ramírez, Eduardo F. Caicedo-Bravo and Eval B. Bacca-Cortes

Data 2024, 9(7), 87; https://doi.org/10.3390/data9070087 - 27 Jun 2024

Viewed by 1077

Abstract

This work presents a point cloud dataset of vehicles passing through a toll station in Colombia to be used to train artificial vision and computational intelligence algorithms. This article details the process of creating the dataset, covering initial data acquisition, range information preprocessing, point cloud validation, and vehicle labeling. Additionally, a detailed description of the structure and content of the dataset is provided, along with some of its potential applications. The dataset consists of 36,026 total objects divided into 6 classes: 31,432 cars, campers, vans and 2-axle trucks with a single tire on the rear axle; 452 minibuses with a single tire on the rear axle; 1158 buses; 1179 2-axle small trucks; 797 2-axle large trucks; and 1008 trucks with 3 or more axles. The point clouds were captured using a LiDAR sensor and Doppler effect speed sensors. The dataset can be used to train and evaluate algorithms for range data processing, vehicle classification, vehicle counting, and traffic flow analysis. The dataset can also be used to develop new applications for intelligent transportation systems.

25 pages, 686 KiB

Open Access Article

Tuning Data Mining Models to Predict Secondary School Academic Performance

by William Hoyos and Isaac Caicedo-Castro

Data 2024, 9(7), 86; https://doi.org/10.3390/data9070086 - 26 Jun 2024

Viewed by 1315

Abstract

In recent years, educational data mining has emerged as a growing discipline focused on developing models for predicting academic performance. The primary objective of this research was to tune classification models to predict academic performance in secondary school. The dataset employed for this study encompassed information from 19,545 high school students. We used descriptive statistics to characterise information contained in personal, school, and socioeconomic variables. We implemented two data mining techniques, namely artificial neural networks (ANN) and support vector machines (SVM). Parameter optimisation was conducted through five-fold cross-validation, and model performance was assessed using accuracy and F1-score. The results indicate a functional dependence between predictor variables and academic performance. The algorithms demonstrated an average performance exceeding 80% accuracy. Notably, ANN outperformed SVM in the dataset analysed. This type of methodology could help educational institutions to predict academic underachievement and thus generate strategies to improve students’ academic performance.

(This article belongs to the Special Issue Data Mining and Computational Intelligence for E-Learning and Education—2nd Edition)

14 pages, 1269 KiB

Open Access Data Descriptor

Evaluation of Online Inquiry Competencies of Chilean Elementary School Students: A Dataset

by Luz Chourio-Acevedo and Roberto González-Ibañez

Data 2024, 9(7), 85; https://doi.org/10.3390/data9070085 - 25 Jun 2024

Viewed by 1129

Abstract

In the age of abundant digital content, children and adolescents face the challenge of developing new information literacy competencies, particularly those pertaining to online inquiry, in order to thrive academically and personally. This article addresses the challenge encountered by Chilean students in developing online inquiry competencies (OICs) essential for completing school assignments, particularly in natural science education. A diagnostic study was conducted with 279 elementary school students (from fourth to eighth grade) from four educational institutions in Chile, representing diverse socioeconomic backgrounds. An instrument aligned with the national curriculum, featuring questions related to natural sciences, was administered through a game named NEURONE-Trivia, which integrates a search engine and a logging component to record students’ search behavior. The primary outcome of this study is a dataset comprising demographic, self-perception, and information-seeking behavior data collected during students’ online search sessions for natural science research tasks. This dataset serves as a valuable resource for researchers, educators, and practitioners interested in investigating the interplay between demographic characteristics, self-perception, and information-seeking behaviors among elementary students within the context of OIC development. Furthermore, it enables further examination of students’ search behaviors concerning source evaluation, information retrieval, and information utilization.

(This article belongs to the Special Issue Data Mining and Computational Intelligence for E-Learning and Education—2nd Edition)

8 pages, 1732 KiB

Open Access Data Descriptor

Gender Distribution of Scientific Prizes Is Associated with Naming of Awards after Men, Women or Neutral

by Katja Gehmlich and Stefan Krause

Data 2024, 9(7), 84; https://doi.org/10.3390/data9070084 - 25 Jun 2024

Viewed by 1079

Abstract

Women scientists have long been under-represented as recipients of academic prizes. The reasons for this lack of recognition are manifold, including potential gender bias amongst award panels and nomination practices. This dataset of the gender distribution of 8747 recipients of 345 scientific medals and prizes awarded by 11 General Scientific Societies as well as subject-specific societies in the Earth and Environmental Sciences and in Cardiology between 1731 and 2021 explores the magnitude, temporal trends and potential drivers of observed gender imbalances. Our analysis revealed that women were particularly under-represented in awards named after men, with awards not named after a person or named after a woman being more frequently awarded to women scientists. Time-series analysis confirmed persisting trends that have only started to change since the early 2000s, indicating that a lot remains to be accomplished to achieve true equity. We encourage the scientific community to extend our data and analysis, which represent important evidence on the recognition of academic achievements, to other under-represented groups and to include nomination information.

20 pages, 995 KiB

Open Access Article

Leveraging Sports Analytics and Association Rule Mining to Uncover Recovery and Economic Impacts in NBA Basketball

by Vangelis Sarlis, George Papageorgiou and Christos Tjortjis

Data 2024, 9(7), 83; https://doi.org/10.3390/data9070083 - 24 Jun 2024

Cited by 1 | Viewed by 1701

Abstract

This study examines the multifaceted field of injuries and their impacts on performance in the National Basketball Association (NBA), leveraging a blend of Data Science, Data Mining, and Sports Analytics. Our research is driven by three pivotal questions: Firstly, we explore how Association Rule Mining can elucidate the complex interplay between players’ salaries, physical attributes, and health conditions and their influence on team performance, including team losses and recovery times. Secondly, we investigate the relationship between players’ recovery times and their teams’ financial performance, probing interdependencies with players’ salaries and career trajectories. Lastly, we examine how insights gleaned from Data Mining and Sports Analytics on player recovery times and financial influence can inform strategic financial management and salary negotiations in basketball. Harnessing extensive datasets detailing player demographics, injuries, and contracts, we employ advanced analytic techniques to categorize injuries and transform contract data into a format conducive to deep analytical scrutiny. Our anomaly detection methodologies, an ensemble combination of DBSCAN, isolation forest, and Z-score algorithms, spotlight patterns and outliers in recovery times, unveiling the intricate dance between player health, performance, and financial outcomes. This nuanced understanding emphasizes the economic stakes of sports injuries. The findings of this study provide a rich, data-driven foundation for teams and stakeholders, advocating for more effective injury management and strategic planning. By addressing these research questions, our work not only contributes to the academic discourse in Sports Analytics but also offers practical frameworks for enhancing player welfare and team financial health, thereby shaping the future of strategic decisions in professional sports.

(This article belongs to the Special Issue Machine Learning and Data Mining in Exercise, Sports and Health Research)

Data, Volume 9, Issue 7 (July 2024) – 7 articles
