Understanding Global Happiness: Feature Analysis and Prediction
December 2024
This project identifies key drivers of global happiness using data from the World Happiness Report 2023, showing how socioeconomic factors such as GDP per capita, social support, and healthy life expectancy contribute to national well-being. My analysis found that social support and GDP per capita are the strongest predictors of happiness scores. A tuned Random Forest model explained approximately 86% of the variance in happiness scores (R² ≈ 0.86), a modest improvement over the roughly 84% achieved by a baseline linear regression model.
Built with Python, NumPy, pandas, scikit-learn, and Plotly
Introduction
The World Happiness Report serves as a tool for policymakers and researchers to understand factors that influence happiness across nations. This project analyzes the report to uncover significant predictors of happiness and develop a robust model to predict these scores, facilitating better policy decisions that enhance global well-being.
Data Preparation
The dataset comprises metrics such as Ladder Score (happiness), Logged GDP per capita, Social Support, and Healthy Life Expectancy from over 150 countries in 2023. Initial steps included handling missing data, checking for outliers, and engineering features to better capture each factor's effect on happiness. I also merged in a second dataset that maps each country to its region and subregion.
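The merge and cleaning steps can be sketched as follows. This is a minimal illustration rather than the project's actual code: the two small frames are placeholder data, and the column names ("Country name", "region", "sub-region") and the helper `prepare` are assumptions made for the example. Dropping incomplete rows is only one possible way to handle missing values.

```python
import pandas as pd

# Placeholder frames standing in for the two source files (values are made up).
happiness = pd.DataFrame({
    "Country name": ["Country A", "Country B", "Country C", "Country D"],
    "Ladder score": [7.0, 6.5, 6.0, 5.0],
    "Social support": [0.95, 0.90, None, 0.80],  # one missing value
})
regions = pd.DataFrame({
    "Country name": ["Country A", "Country B", "Country C"],
    "region": ["Region X", "Region X", "Region Y"],
    "sub-region": ["Sub X1", "Sub X2", "Sub Y1"],
})


def prepare(happiness, regions, key="Country name"):
    """Left-join region labels onto the happiness table, then drop incomplete rows.

    Returns the cleaned frame and the list of countries with no region match,
    so naming mismatches between the two sources can be inspected by hand.
    """
    merged = happiness.merge(regions, on=key, how="left", indicator=True)
    unmatched = merged.loc[merged["_merge"] == "left_only", key].tolist()
    merged = merged.drop(columns="_merge")
    cleaned = merged.dropna().reset_index(drop=True)
    return cleaned, unmatched


cleaned, unmatched = prepare(happiness, regions)
print(cleaned)
print("No region match:", unmatched)
```

Using a left join with `indicator=True` keeps every country from the happiness table and makes it easy to spot names that are spelled differently in the two sources before any rows are dropped.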
Exploratory Data Analysis
Through statistical analysis and visualizations like histograms and scatter plots, I identified strong correlations between happiness and features like GDP per capita and social support.
Correlation Matrix
Correlations with ladder score
Social support: 0.8489505
Logged GDP per Capita: 0.7933066
Healthy Life Expectancy: 0.771522
Freedom to Make Life Choices: 0.6598513
Generosity: 0.04101167
Perception of Corruption: -0.4977463
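A ranking like the one above can be produced from the correlation matrix with a few lines of pandas. The sketch below uses synthetic data, so its numbers will not match the values listed; the column names and the helper `correlations_with` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: one feature tied to the target, one unrelated.
rng = np.random.default_rng(0)
n = 150
support = rng.normal(size=n)
df = pd.DataFrame({
    "Ladder score": 0.8 * support + rng.normal(scale=0.5, size=n),
    "Social support": support,
    "Generosity": rng.normal(size=n),
})


def correlations_with(df, target="Ladder score"):
    """Pearson correlation of every numeric column with the target, highest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.sort_values(ascending=False)


ranked = correlations_with(df)
print(ranked)
```

Pearson correlation only measures linear association, so a feature with a weak coefficient here could still matter to a non-linear model.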
Feature Distribution
Modeling
I began with a linear regression model to establish a baseline, using the variables most strongly correlated with happiness. To improve accuracy and account for non-linear relationships, I then moved to a Random Forest model. Hyperparameters such as tree depth and the number of estimators were tuned with GridSearchCV.
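The workflow described here, a linear baseline followed by a grid-searched Random Forest, might look like the sketch below. The data are synthetic, and the parameter grid, fold count, and split size are illustrative assumptions; the grid actually used in the project is not stated.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic features and a target with one linear and one non-linear term.
rng = np.random.default_rng(42)
X = rng.uniform(size=(150, 3))
y = 3 + 2 * X[:, 0] + np.sin(3 * X[:, 1]) + rng.normal(scale=0.1, size=150)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: ordinary least squares on the training split.
baseline = LinearRegression().fit(X_train, y_train)

# Random Forest tuned over tree depth and number of estimators (assumed grid).
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="r2",
)
search.fit(X_train, y_train)
best_forest = search.best_estimator_

print("Best parameters:", search.best_params_)
print("Baseline test R^2:", baseline.score(X_test, y_test))
print("Forest test R^2:", best_forest.score(X_test, y_test))
```

Tuning with cross-validation on the training split only, and scoring both models on the same held-out test split, keeps the comparison between the baseline and the forest fair.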
Model Evaluation and Results
The Random Forest model demonstrated superior performance, with an R² of 0.857 and an RMSE of 0.378, suggesting it can predict happiness scores effectively. Feature importance analysis showed that social support was the most influential predictor, followed by GDP and life expectancy.
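For reference, the two metrics and the feature importance ranking can be computed as in the sketch below. It runs on synthetic data with assumed feature names, so it does not reproduce the R² of 0.857 or RMSE of 0.378 reported here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Assumed feature names, used only to label the synthetic columns.
feature_names = ["Social support", "Logged GDP per capita", "Healthy life expectancy"]

rng = np.random.default_rng(0)
X = rng.uniform(size=(150, 3))
y = 3 * X[:, 0] + 2 * X[:, 1] + X[:, 2] + rng.normal(scale=0.2, size=150)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)

# R^2: share of variance in the held-out target explained by the predictions.
r2 = r2_score(y_test, pred)
# RMSE: typical prediction error, in the same units as the target.
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))

# Impurity-based importances, sorted from most to least influential.
importances = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
for name, score in importances:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can be skewed when features are strongly correlated with each other, as GDP, social support, and life expectancy tend to be, so permutation importance is a common cross-check.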
Discussion and Limitations
The model shows strong explanatory power: its high R² indicates that it captures much of the variation in reported happiness across countries. However, its reliability is limited by the variability inherent in self-reported data, which can introduce bias and reduce the accuracy of predictions. The current feature set, while informative, also does not capture every potential influence on national happiness.
Variables such as GDP per capita and social support are significant, yet happiness is a complex phenomenon influenced by a multitude of socio-economic factors that extend beyond the current dataset. For example, employment rates, the type of governmental system, and population dynamics are also likely to play crucial roles in shaping overall well-being but were not included in this analysis.
Conclusion
My findings emphasize the importance of social infrastructure and economic factors in enhancing happiness. The predictive model not only helps in understanding these relationships but also offers a framework for simulating the potential impact of policy changes on national happiness scores.