Performance Prediction of Compulsory Subjects and Recommendation of Subjects Options for China’s New College Entrance Examination
Publisher
Universiti Malaysia Sarawak
Description
This study addresses a critical gap in Educational Data Mining by concurrently predicting performance in the compulsory subjects of China’s New College Entrance Examination (NCEE) and recommending personalized combinations of three subjects from the six optional subjects. Drawing on Bronfenbrenner’s ecological framework, we collected data from 1,127 students and 88 teachers at an urban high school across four dimensions: individual, family, school, and social. Continuous predictors were normalized, and categorical variables were encoded as numerical values. The dataset was split 80/20 into training and test sets. Four machine learning algorithms, Naïve Bayes (NB), Decision Tree (DT), Artificial Neural Networks (ANNs), and Support Vector Machines (SVMs), were evaluated using accuracy, precision, recall, F1-score, and Matthews Correlation Coefficient (MCC). Pearson correlations quantified inter-subject dependencies. Feature importance analyses revealed that motivation level dominated Chinese performance prediction, followed by teaching method, gender, past Chinese performance, and teacher’s self-efficacy. The leading Mathematics predictors were test anxiety, parents’ education levels, socioeconomic status (SES), and peer relationships, while English performance hinged on annual family income, parental involvement, and past English performance. NB outperformed the other three algorithms, attaining accuracies of 95.1% for Chinese, 96.4% for Mathematics, and 90.7% for English. Correlation coefficients indicated a weak Chinese-Mathematics association (r = 0.124–0.267), a moderate Chinese-English link (r = 0.308–0.416), and a moderate Mathematics-English relationship (r = 0.365–0.402). From DT outputs, we distilled rules mapping student profiles to optional subject trios.
For example, high self-efficacy and strong peer relationships paired with quality Chemistry instruction yielded a “Physics–Chemistry–Biology” recommendation, whereas robust SES and moderate Biology performance suggested “History–Politics–Geography.” These DT rules give students an interpretable basis for choosing their optional subjects. Limitations include single-school sampling and potential regional biases. Future work should replicate the study across diverse contexts, explore ensemble methods to enhance both accuracy and interpretability, and implement longitudinal follow-up.
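The evaluation metrics and the Pearson coefficient named above can be illustrated with a minimal sketch. It is not the study's code: it assumes a binary (for example pass/fail) outcome, whereas the abstract does not state how performance classes were defined, and the function names `binary_metrics` and `pearson_r` and the counts in the usage note are hypothetical.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute the five metrics from binary confusion-matrix counts.

    Assumption: a two-class setting; the study's class definition is not given.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC uses all four cells, so it stays informative with imbalanced classes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Calling `binary_metrics(tp=40, fp=10, fn=5, tn=45)` on these made-up counts returns all five scores in one dictionary, and `pearson_r` can be applied to any two lists of subject scores of equal length.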
