ML challenge

Author: Collaborate

Date: April 2025

Category: Google Colab; Machine Learning

Description:

This project compared several models for predicting student questionnaire answers. My group first cleaned and preprocessed the data, standardizing and encoding the dataset for modeling. We then evaluated three models: K-Nearest Neighbors (K-NN), Naive Bayes, and Random Forest. K-NN performed moderately well, with validation accuracy peaking at around 70% at k = 40, but it was prone to overfitting and sensitive to feature scaling. Naive Bayes performed poorly across all sets, with a test accuracy of just 36.7%, which we attribute to its unrealistic assumption that the features are independent. Random Forest emerged as the best model, achieving a validation accuracy of 78.66% and a test accuracy of 71.52%. It generalized well, showed little overfitting, and handled both categorical and numerical features effectively without the need for scaling, making it the final model of choice.
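
The two K-NN points above (choosing k by validation accuracy, and sensitivity to feature scaling) can be illustrated with a small, dependency-free sketch. This is not the group's notebook code: the synthetic data, the helper names (make_data, fit_scaler, scale, knn_predict, accuracy), the split sizes, and the candidate k values are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def make_data(n, seed):
    """Synthetic two-class data (an assumption, not the questionnaire data).
    Feature 0 is informative; feature 1 is pure noise on a far larger scale."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = rng.gauss(label * 4.0, 1.0)
        noise = rng.gauss(0.0, 10000.0)
        rows.append(([informative, noise], label))
    return rows


def fit_scaler(rows):
    """Per-feature mean and standard deviation, computed on training rows only."""
    cols = list(zip(*(x for x, _ in rows)))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return means, stds


def scale(rows, means, stds):
    """Standardize every feature with the statistics from fit_scaler."""
    return [([(v - m) / s for v, m, s in zip(x, means, stds)], y) for x, y in rows]


def knn_predict(train, x, k):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def accuracy(train, rows, k):
    """Fraction of rows whose K-NN prediction matches the true label."""
    return sum(knn_predict(train, x, k) == y for x, y in rows) / len(rows)


# Train / validation / test split (sizes are illustrative).
data = make_data(600, seed=0)
train, val, test = data[:400], data[400:500], data[500:]

# Fit the scaler on the training split, then apply it to every split.
means, stds = fit_scaler(train)
train_s, val_s, test_s = (scale(part, means, stds) for part in (train, val, test))

# Pick k on the validation split; touch the test split only once, at the end.
candidate_ks = [1, 5, 15, 41]
val_scores = {k: accuracy(train_s, val_s, k) for k in candidate_ks}
best_k = max(val_scores, key=val_scores.get)
test_acc = accuracy(train_s, test_s, best_k)

# Same k without scaling: the large-scale noise feature dominates the distance.
unscaled_val_acc = accuracy(train, val, best_k)

print("validation accuracy by k:", val_scores)
print("best k:", best_k, "| test accuracy:", test_acc)
print("validation accuracy without scaling:", unscaled_val_acc)
```

Tree-based models such as Random Forest split on one feature at a time, so they do not depend on a distance computed across features. That is consistent with the observation that Random Forest needed no scaling while K-NN did.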