Personal Project: Predicting Hourly Wage

Introduction

This notebook builds and compares four machine learning models to investigate factors influencing hourly wages, using the Beauty dataset from Introductory Econometrics: A Modern Approach (J.M. Wooldridge). The analysis is divided into two parts:

  1. Implement Ridge and Lasso Regression.
  2. Implement Random Forest and Neural Network, then provide a comparison of four models, and interpret relevant findings.

1. Ridge and Lasso Regression for Hourly Wage Prediction

1.1. Data Preparation

Data Loading

In [20]:
# Basic Setup and Imports
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.metrics import root_mean_squared_error, r2_score
from IPython.display import display, HTML # Display options
In [21]:
# Load the dataset
beauty = pd.read_csv('BEAUTY.csv')
beauty.describe()
Out[21]:
wage lwage belavg abvavg exper looks union goodhlth black female married south bigcity smllcity service expersq educ
count 1260.000000 1260.000000 1260.000000 1260.000000 1260.000000 1260.000000 1260.000000 1260.000000 1260.000000 1260.000000 1260.000000 1260.000000 1260.000000 1260.000000 1260.000000 1260.000000 1260.000000
mean 6.306690 1.658800 0.123016 0.303968 18.206349 3.185714 0.272222 0.933333 0.073810 0.346032 0.691270 0.174603 0.219048 0.466667 0.273810 474.482540 12.563492
std 4.660639 0.594508 0.328586 0.460152 11.963485 0.684877 0.445280 0.249543 0.261564 0.475892 0.462153 0.379778 0.413765 0.499086 0.446089 534.645425 2.624489
min 1.020000 0.019803 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 5.000000
25% 3.707500 1.310357 0.000000 0.000000 8.000000 3.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 64.000000 12.000000
50% 5.300000 1.667705 0.000000 0.000000 15.000000 3.000000 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 225.000000 12.000000
75% 7.695000 2.040570 0.000000 1.000000 27.000000 4.000000 1.000000 1.000000 0.000000 1.000000 1.000000 0.000000 0.000000 1.000000 1.000000 729.000000 13.000000
max 77.720000 4.353113 1.000000 1.000000 48.000000 5.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 2304.000000 17.000000

The dataset contains 1260 observations with wage information and 15 potential predictors, with no missing values. The summary statistics suggest that the target variable wage is right-skewed, so I also examine the distributions of wage and lwage below.

In [22]:
# Distribution Plot
plt.figure(figsize=(7, 3))

# 1. Distribution of 'wage'
plt.subplot(1, 2, 1) # 1 row, 2 columns, first plot
sns.histplot(beauty['wage'], kde=True, bins=30, color='skyblue') # Histogram with a density curve
plt.title('Distribution of Hourly Wage (wage)',size =10)
plt.xlabel('Hourly Wage',size =8)
plt.ylabel('Frequency',size =8)

# 2. Distribution of 'lwage'
plt.subplot(1, 2, 2) # 1 row, 2 columns, second plot
sns.histplot(beauty['lwage'], kde=True, bins=30, color='skyblue')
plt.title('Distribution of Log Hourly Wage (lwage)',size =10)
plt.xlabel('Log Hourly Wage',size =8)
plt.ylabel('Frequency',size =8)

plt.tight_layout() # Adjust layout to prevent overlap
plt.show()
[Figure: histograms with density curves of wage (left) and lwage (right)]

The distribution plots confirm that wage is right-skewed, while lwage is much closer to a normal distribution. Using the natural logarithm of wage (lwage) as the dependent variable for model training is therefore expected to yield better results. Final model performance is reported on both the log-wage and absolute-wage scales.
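The skewness can also be quantified rather than just eyeballed, using pandas' skew(). A minimal sketch on simulated lognormal data (not the actual BEAUTY sample; the distribution parameters are invented):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Simulated stand-in for the wage column: lognormal draws are right-skewed
wage = pd.Series(rng.lognormal(mean=1.6, sigma=0.6, size=1260))
lwage = np.log(wage)

# Sample skewness: ~0 for a symmetric distribution, clearly positive for a right skew
print(f"skew(wage):  {wage.skew():.2f}")
print(f"skew(lwage): {lwage.skew():.2f}")
```

Taking the log of a right-skewed positive variable pulls the skewness back towards zero, which is exactly the pattern the plots above show.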

Data Splitting and Scaling

The data is split 70% training / 30% test. Since Ridge and Lasso coefficients are scale-sensitive, predictors are standardized using StandardScaler fitted only on the training data.

Feature selection: I exclude the looks variable because its information is captured by the belavg and abvavg dummies, thus avoiding multicollinearity.
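As a sanity check on this redundancy, the dummies can be reconstructed from looks directly. The coding below (belavg for ratings of 2 or less, abvavg for ratings of 4 or more) is an assumption for illustration, with invented ratings:

```python
import pandas as pd

# Invented ratings for illustration; the dummy coding (belavg for looks <= 2,
# abvavg for looks >= 4) is an assumption about how the dataset was constructed
looks = pd.Series([1, 2, 3, 3, 4, 5], name='looks')
belavg = (looks <= 2).astype(int).rename('belavg')
abvavg = (looks >= 4).astype(int).rename('abvavg')

# belavg and abvavg are deterministic functions of looks, so keeping all three
# predictors would introduce strong collinearity
mapping = pd.DataFrame({'looks': looks, 'belavg': belavg, 'abvavg': abvavg})
print(mapping.drop_duplicates().to_string(index=False))
```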

In [23]:
# Split data into 70% training and 30% test set
train_set, test_set = train_test_split(beauty, test_size=0.3, random_state=42)

# Define Features (Predictors) and Target
X_cols = ['belavg', 'abvavg', 'exper', 'union',
       'goodhlth', 'black', 'female', 'married', 'south', 
       'bigcity', 'smllcity', 'service', 'expersq', 'educ'] # 14 predictors, excluding `looks`
y_col = ['lwage']

X_train_orig = train_set[X_cols]
y_train_orig  = train_set[y_col].values.ravel() # lwage
X_test_orig  = test_set[X_cols]
y_test_orig = test_set[y_col].values.ravel() # lwage
y_test_wage = test_set['wage'].values.ravel() # Actual wage
In [24]:
# Standardising the data using StandardScaler
scaler = StandardScaler()

# Fit on training data and transform both sets
X_train_scaled  = scaler.fit_transform(X_train_orig)
X_test_scaled = scaler.transform(X_test_orig)

1.2. Model Implementation and Evaluation

Next, I implement Ridge and Lasso using RidgeCV and LassoCV respectively, both with 10-fold cross-validation to select the optimal regularization parameter (alpha).
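One caveat worth knowing: unless an explicit alphas grid is passed, RidgeCV searches only its default grid of (0.1, 1.0, 10.0). A minimal sketch on simulated data showing a finer log-spaced search:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -1.0, 0.5, 0.0, 2.0]) + rng.normal(scale=0.5, size=200)

# A log-spaced grid searches far more candidates than RidgeCV's default (0.1, 1.0, 10.0)
alphas = np.logspace(-3, 3, 50)
ridge = RidgeCV(alphas=alphas, cv=10, scoring='neg_root_mean_squared_error')
ridge.fit(X, y)
print("best alpha:", ridge.alpha_)
```

LassoCV, by contrast, builds its own alpha path automatically, so its search is already fairly fine-grained by default.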

Cross-Validated Ridge Regression

Ridge uses L2 regularization, shrinking coefficients towards zero but typically not setting them exactly to zero.
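This shrinkage behaviour can be seen on simulated data (a minimal sketch, not the Beauty sample): as alpha grows, the coefficient norm shrinks, yet no coefficient is set exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.5, size=200)

# Increasing alpha shrinks the coefficient norm, but no coefficient hits exactly zero
for alpha in [0.01, 10.0, 1000.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>7}: ||coef|| = {np.linalg.norm(coef):.3f}, "
          f"exact zeros = {int(np.sum(coef == 0))}")
```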

In [25]:
# Cross Validated Ridge Regression
ridge_cv = RidgeCV(cv=10, scoring='neg_root_mean_squared_error')
ridge_cv.fit(X_train_scaled, y_train_orig)

# Predict and evaluate on lwage scale
y_test_pred = ridge_cv.predict(X_test_scaled)
RMSE_lwage_ridge = root_mean_squared_error(y_test_orig, y_test_pred)
r2_lwage_ridge = ridge_cv.score(X_test_scaled, y_test_orig) # R-squared on lwage

# Evaluate on wage scale
y_test_wage_pred = np.exp(y_test_pred) # Transform lwage to wage predictions
RMSE_wage_ridge = root_mean_squared_error(y_test_wage, y_test_wage_pred)

print(f"\n--- Ridge Results ---")
print(f"Ridge Best Alpha: {ridge_cv.alpha_}")
print(f"Ridge RMSE (lwage scale): {RMSE_lwage_ridge}")
print(f"Ridge R-squared (lwage scale): {r2_lwage_ridge}")
print(f"Ridge RMSE (wage scale): {RMSE_wage_ridge}")

# Extract Ridge coefficients
ridge_coeffs = pd.Series(ridge_cv.coef_, index=X_cols)
--- Ridge Results ---
Ridge Best Alpha: 1.0
Ridge RMSE (lwage scale): 0.4409644837921741
Ridge R-squared (lwage scale): 0.43367922539448
Ridge RMSE (wage scale): 3.5740573219647516

Cross-Validated LASSO

Lasso uses L1 regularization, which can shrink coefficients exactly to zero, performing automatic feature selection.
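Lasso's zeroing behaviour can likewise be sketched on simulated data with deliberately irrelevant features (an illustration, not the Beauty sample):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Only the first two features carry signal; the other four are pure noise
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# L1 regularization sets the noise features' coefficients exactly to zero
coef = Lasso(alpha=0.1).fit(X, y).coef_
print("coefficients:", np.round(coef, 3))
print("features dropped:", int(np.sum(coef == 0)))
```

When all candidate predictors are informative, as in the Beauty data below, Lasso may keep every feature and behave much like Ridge.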

In [26]:
# Cross Validated Lasso Regression
lasso_cv = LassoCV(cv=10, random_state=42) # Added random_state for reproducible cross-validation fold generation
lasso_cv.fit(X_train_scaled, y_train_orig)

# Predict and evaluate on lwage scale
y_test_pred = lasso_cv.predict(X_test_scaled)
RMSE_lwage_lasso = root_mean_squared_error(y_test_orig, y_test_pred)
r2_lwage_lasso = lasso_cv.score(X_test_scaled, y_test_orig) # R-squared on lwage

# Evaluate on wage scale
y_test_wage_pred = np.exp(y_test_pred) # Transform lwage to wage predictions
RMSE_wage_lasso = root_mean_squared_error(y_test_wage, y_test_wage_pred)

print(f"\n--- Lasso Results ---")
print(f"Lasso Best Alpha: {lasso_cv.alpha_}")
print(f"Lasso RMSE (lwage scale): {RMSE_lwage_lasso}")
print(f"Lasso R-squared (lwage scale): {r2_lwage_lasso}")
print(f"Lasso RMSE (wage scale): {RMSE_wage_lasso}")

# Extract Lasso Feature Selection
lasso_coeffs = pd.Series(lasso_cv.coef_, index=X_cols)
non_zero_coeffs = np.sum(lasso_coeffs != 0)
print(f"\nNumber of features used by Lasso: {non_zero_coeffs} out of {len(X_cols)}")
--- Lasso Results ---
Lasso Best Alpha: 0.0005055586885587609
Lasso RMSE (lwage scale): 0.44101492477569104
Lasso R-squared (lwage scale): 0.4335496575417561
Lasso RMSE (wage scale): 3.5744144564418434

Number of features used by Lasso: 14 out of 14

1.3. Comparison and Summary

In [27]:
# Compare Ridge and Lasso performance
results_data = {
    'RMSE (lwage scale)': [RMSE_lwage_ridge, RMSE_lwage_lasso],
    'RMSE (wage scale)': [RMSE_wage_ridge, RMSE_wage_lasso],
    'R-squared (lwage scale)': [r2_lwage_ridge, r2_lwage_lasso],
    'Best Alpha': [ridge_cv.alpha_, lasso_cv.alpha_],
    'Features Used': [len(X_cols), non_zero_coeffs] # Show total for Ridge, selected for Lasso
}
results_df = pd.DataFrame(results_data, index=['Ridge', 'Lasso'])
print ("The performance metrics and coefficients for Ridge and Lasso are compared below:\n")
display(results_df.style.format())

# Compare coefficients
print ("\n--- Coefficients Comparison (on lwage scale) ---")
coefficients_df = pd.DataFrame({
    'Ridge Coefficient': ridge_coeffs,
    'Lasso Coefficient': lasso_coeffs
})
# Format for readability
coefficients_df_formatted = coefficients_df.style.format('{:.4f}')
display(coefficients_df_formatted)
The performance metrics and coefficients for Ridge and Lasso are compared below:

  RMSE (lwage scale) RMSE (wage scale) R-squared (lwage scale) Best Alpha Features Used
Ridge 0.440964 3.574057 0.433679 1.000000 14
Lasso 0.441015 3.574414 0.433550 0.000506 14
--- Coefficients Comparison (on lwage scale) ---
  Ridge Coefficient Lasso Coefficient
belavg -0.0386 -0.0383
abvavg 0.0110 0.0105
exper 0.4722 0.4708
union 0.0737 0.0733
goodhlth 0.0199 0.0195
black -0.0159 -0.0154
female -0.1732 -0.1733
married 0.0189 0.0184
south 0.0316 0.0312
bigcity 0.0998 0.0990
smllcity 0.0399 0.0391
service -0.0537 -0.0532
expersq -0.3230 -0.3215
educ 0.1823 0.1823

Summary

The tables above show that both models deliver very similar predictive performance. Based on RMSE and R², Ridge performs marginally better, but the difference is negligible.

I manually excluded the redundant looks variable before modelling, and Lasso identified no further redundancy, retaining all 14 features. Like Ridge, Lasso shrank the coefficients but did not set any to zero, so the two models' coefficients are very similar across all predictors.

For Part 1, Ridge and Lasso are therefore similarly effective for predicting hourly wage.

2. Random Forest and Neural Network for Hourly Wage Prediction

Next, two more complex models, Random Forest and Neural Network, are implemented.

The same 70/30 train/test split established earlier is used for modeling the lwage variable.

2.1. Random Forest Regression

Random Forest builds many decision trees on bootstrap samples and averages their predictions. It captures non-linear relationships and feature interactions effectively, and is more robust to overfitting than a single tree.
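As a minimal illustration of this non-linear capacity (simulated data, not the Beauty sample), a forest easily fits a sinusoidal signal that a linear model cannot:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)  # non-linear signal

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
lin = LinearRegression().fit(X, y)

# In-sample fit: the forest tracks the curve, the line cannot
print(f"Random Forest R^2: {rf.score(X, y):.3f}")
print(f"Linear model  R^2: {lin.score(X, y):.3f}")
```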

In [28]:
# Import necessary packages
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Use the original train_set, test_set in part 1
# X_train_orig, y_train_orig
# X_test_orig, y_test_orig
# y_test_wage

Implement Random Forest

I use a Pipeline with StandardScaler and RandomForestRegressor. This ensures that scaling is correctly handled within each fold of the cross-validation process. GridSearchCV with 5-fold cross-validation is used to find the optimal hyperparameters.

In [29]:
# Create the Pipeline
rfr_pipeline = Pipeline([
    ('scaler', StandardScaler()), # Step 1: Scale the data
    ('rfr', RandomForestRegressor(random_state=42)) # Step 2: Random Forest 
])

# Define Hyperparameter Grid
param_grid = { 
    'rfr__n_estimators': [100, 200, 300],
    'rfr__max_features': ['log2', 'sqrt'],
    'rfr__max_depth': [15, 25, None],
    'rfr__criterion' : ['squared_error'], 
    'rfr__min_samples_leaf' : [1,3,5],
    'rfr__ccp_alpha' : [0, 0.01]
}

# Run Grid Search: 5-fold cross-validation and minimize RMSE
grid_search_rf = GridSearchCV(estimator=rfr_pipeline, param_grid=param_grid, cv=5,
                        scoring='neg_root_mean_squared_error', return_train_score=True, verbose=1)

# Fit model
grid_search_rf.fit(X_train_orig, y_train_orig)
best_rf_pipe = grid_search_rf.best_estimator_

# Optimal parameters
print("\nBest RF Pipeline Parameters found:")
print(grid_search_rf.best_params_)
Fitting 5 folds for each of 108 candidates, totalling 540 fits

Best RF Pipeline Parameters found:
{'rfr__ccp_alpha': 0, 'rfr__criterion': 'squared_error', 'rfr__max_depth': 25, 'rfr__max_features': 'log2', 'rfr__min_samples_leaf': 3, 'rfr__n_estimators': 300}

Evaluate Random Forest

Feature importance is analyzed using the feature_importances_ attribute (impurity-based importance, i.e., mean decrease in impurity across the trees).
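Impurity-based importance is computed on the training data and can overstate features with many possible split points; permutation importance on held-out data is a common cross-check. A sketch on simulated data (the variables here are invented, not the Beauty predictors):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
# x0 is a strong predictor, x1 is weak, x2 is pure noise
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Mean drop in test-set R^2 when each column is shuffled
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
for name, imp in zip(["x0", "x1", "x2"], result.importances_mean):
    print(f"{name}: {imp:.3f}")
```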

In [30]:
# Analyze Feature Importance
best_rfr_model = best_rf_pipe.named_steps['rfr']
importances_rf = best_rfr_model.feature_importances_

# Create a DataFrame for visualization
feature_importance_rf_df = pd.DataFrame({'Feature': X_cols, 'Importance': importances_rf
                                        }).sort_values(by='Importance', ascending=False).reset_index(drop=True)

# Plot feature importances
plt.figure(figsize=(6, 4))
ax = sns.barplot(x='Importance', y='Feature', data=feature_importance_rf_df, color='skyblue')
for i, (importance, feature) in enumerate(zip(feature_importance_rf_df['Importance'], feature_importance_rf_df['Feature'])):
    ax.text(importance + 0.005, i, f"{importance:.3f}", va='center', fontsize=9)
plt.title('Feature Importances from Random Forest',size =12)
plt.xlabel('Importance Score', size=10)
plt.ylabel('Feature', size=10)
plt.tight_layout()
plt.show()
[Figure: horizontal bar chart of Random Forest feature importances, sorted in descending order]
In [31]:
# Predict and evaluate on lwage scale
y_test_pred = best_rf_pipe.predict(X_test_orig)
RMSE_lwage_rf = root_mean_squared_error(y_test_orig, y_test_pred)
r2_lwage_rf = best_rf_pipe.score(X_test_orig, y_test_orig) # R-squared on lwage

# Evaluate on the original wage scale
y_test_wage_pred = np.exp(y_test_pred) # Transform lwage to wage predictions
RMSE_wage_rf = root_mean_squared_error(y_test_wage, y_test_wage_pred)

print(f"\n--- Random Forests Results ---")
print(f"Random Forest RMSE (log wage scale): {RMSE_lwage_rf}")
print(f"Random Forest  R-squared (lwage scale): {r2_lwage_rf}")
print(f"Random Forest RMSE (wage scale): {RMSE_wage_rf}")
--- Random Forests Results ---
Random Forest RMSE (log wage scale): 0.4496394322341057
Random Forest  R-squared (lwage scale): 0.41117796879429824
Random Forest RMSE (wage scale): 3.592793131443144

Random Forest Summary: The model explains about 41% of the variance in log wages. female, exper, expersq, and educ were identified as the most influential predictors, collectively accounting for roughly 74% of the total importance score.

2.2. Neural Network (NN) Regression

The second additional model is a Neural Network. A single validation-set approach is used: the Part 1 training data is further split into 75% NN-training and 25% NN-validation sets. Features are scaled with a StandardScaler fitted only on the NN-training data.

In [32]:
import tensorflow as tf
from tensorflow import keras

# Use the original train_set, test_set in part 1
# X_train_orig, y_train_orig
# X_test_orig, y_test_orig
# y_test_wage

# Using 25% of the X_train_orig for validation
X_train, X_valid, y_train, y_valid = train_test_split(X_train_orig, y_train_orig, test_size=0.25, random_state=42)

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_valid_scaled = scaler.transform(X_valid)
X_test_scaled = scaler.transform(X_test_orig)

Build the Neural Network Model

After manually trying some combinations of hyperparameters, a sequential Keras model is defined with:

  • An input layer (14 features).
  • One hidden layer with 36 neurons using ReLU activation function.
  • An output layer with a single neuron for regression.
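The architecture determines the parameter count directly: a Dense layer has inputs × units weights plus units biases. A quick arithmetic check of the counts reported by model.summary():

```python
# Dense layer parameters = inputs * units (weights) + units (biases)
n_features, hidden_units = 14, 36

hidden_params = n_features * hidden_units + hidden_units  # 14*36 + 36 = 540
output_params = hidden_units * 1 + 1                      # 36*1  + 1  =  37
print("total trainable params:", hidden_params + output_params)  # 577
```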
In [33]:
# Build the Neural Network Model
model_nn = keras.models.Sequential([
    keras.layers.InputLayer(shape=[X_train.shape[1]]),
    keras.layers.Dense(36, activation="relu"),   # first hidden layer
    keras.layers.Dense(1)                        # output layer
 ])
model_nn.summary() # Print model architecture
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
β”‚ dense_2 (Dense)                 β”‚ (None, 36)             β”‚           540 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ dense_3 (Dense)                 β”‚ (None, 1)              β”‚            37 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 Total params: 577 (2.25 KB)
 Trainable params: 577 (2.25 KB)
 Non-trainable params: 0 (0.00 B)

Compile and Train Model

The model is compiled with MSE loss and the Adam optimizer (learning rate 0.001), and trained with early stopping.

In [34]:
# Compile the model
model_nn.compile(loss="mean_squared_error",
                 optimizer=keras.optimizers.Adam(learning_rate=0.001), # Adam optimizer
                 metrics=[tf.keras.metrics.RootMeanSquaredError()])

# Train the model with early stopping
early_stopping_cb = keras.callbacks.EarlyStopping(patience=15, restore_best_weights=True, monitor='val_loss', verbose=1) 
history = model_nn.fit(X_train_scaled , y_train , epochs=100
                       , validation_data=(X_valid_scaled , y_valid)
                       , callbacks=[early_stopping_cb]
                      )
Epoch 1/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 2s 14ms/step - loss: 2.5352 - root_mean_squared_error: 1.5915 - val_loss: 1.5780 - val_root_mean_squared_error: 1.2562
Epoch 2/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 1.4161 - root_mean_squared_error: 1.1888 - val_loss: 0.9421 - val_root_mean_squared_error: 0.9706
Epoch 3/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 0.7811 - root_mean_squared_error: 0.8832 - val_loss: 0.6653 - val_root_mean_squared_error: 0.8157
Epoch 4/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 0.4774 - root_mean_squared_error: 0.6907 - val_loss: 0.5587 - val_root_mean_squared_error: 0.7474
Epoch 5/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 0.3808 - root_mean_squared_error: 0.6166 - val_loss: 0.5027 - val_root_mean_squared_error: 0.7090
Epoch 6/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.3403 - root_mean_squared_error: 0.5829 - val_loss: 0.4662 - val_root_mean_squared_error: 0.6828
Epoch 7/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.2906 - root_mean_squared_error: 0.5388 - val_loss: 0.4421 - val_root_mean_squared_error: 0.6649
Epoch 8/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.2902 - root_mean_squared_error: 0.5382 - val_loss: 0.4242 - val_root_mean_squared_error: 0.6513
Epoch 9/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 0.2740 - root_mean_squared_error: 0.5230 - val_loss: 0.4097 - val_root_mean_squared_error: 0.6401
Epoch 10/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 0.2426 - root_mean_squared_error: 0.4923 - val_loss: 0.3979 - val_root_mean_squared_error: 0.6308
Epoch 11/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.2267 - root_mean_squared_error: 0.4759 - val_loss: 0.3912 - val_root_mean_squared_error: 0.6254
Epoch 12/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.2163 - root_mean_squared_error: 0.4648 - val_loss: 0.3847 - val_root_mean_squared_error: 0.6203
Epoch 13/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.2127 - root_mean_squared_error: 0.4611 - val_loss: 0.3767 - val_root_mean_squared_error: 0.6138
Epoch 14/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.2161 - root_mean_squared_error: 0.4647 - val_loss: 0.3723 - val_root_mean_squared_error: 0.6101
Epoch 15/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.2021 - root_mean_squared_error: 0.4492 - val_loss: 0.3658 - val_root_mean_squared_error: 0.6049
Epoch 16/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 0.2106 - root_mean_squared_error: 0.4584 - val_loss: 0.3636 - val_root_mean_squared_error: 0.6030
Epoch 17/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1953 - root_mean_squared_error: 0.4419 - val_loss: 0.3589 - val_root_mean_squared_error: 0.5991
Epoch 18/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.2030 - root_mean_squared_error: 0.4502 - val_loss: 0.3545 - val_root_mean_squared_error: 0.5954
Epoch 19/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 0.2015 - root_mean_squared_error: 0.4486 - val_loss: 0.3532 - val_root_mean_squared_error: 0.5943
Epoch 20/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 0.2061 - root_mean_squared_error: 0.4530 - val_loss: 0.3500 - val_root_mean_squared_error: 0.5916
Epoch 21/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1862 - root_mean_squared_error: 0.4313 - val_loss: 0.3488 - val_root_mean_squared_error: 0.5906
Epoch 22/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.1570 - root_mean_squared_error: 0.3952 - val_loss: 0.3457 - val_root_mean_squared_error: 0.5879
Epoch 23/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.1871 - root_mean_squared_error: 0.4320 - val_loss: 0.3428 - val_root_mean_squared_error: 0.5855
Epoch 24/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.1770 - root_mean_squared_error: 0.4203 - val_loss: 0.3439 - val_root_mean_squared_error: 0.5864
Epoch 25/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1822 - root_mean_squared_error: 0.4264 - val_loss: 0.3415 - val_root_mean_squared_error: 0.5844
Epoch 26/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1823 - root_mean_squared_error: 0.4269 - val_loss: 0.3393 - val_root_mean_squared_error: 0.5825
Epoch 27/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1716 - root_mean_squared_error: 0.4141 - val_loss: 0.3363 - val_root_mean_squared_error: 0.5799
Epoch 28/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.1670 - root_mean_squared_error: 0.4085 - val_loss: 0.3376 - val_root_mean_squared_error: 0.5810
Epoch 29/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.1730 - root_mean_squared_error: 0.4157 - val_loss: 0.3334 - val_root_mean_squared_error: 0.5774
Epoch 30/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1707 - root_mean_squared_error: 0.4131 - val_loss: 0.3358 - val_root_mean_squared_error: 0.5795
Epoch 31/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1685 - root_mean_squared_error: 0.4102 - val_loss: 0.3336 - val_root_mean_squared_error: 0.5776
Epoch 32/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1522 - root_mean_squared_error: 0.3899 - val_loss: 0.3318 - val_root_mean_squared_error: 0.5760
Epoch 33/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1641 - root_mean_squared_error: 0.4050 - val_loss: 0.3302 - val_root_mean_squared_error: 0.5746
Epoch 34/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1630 - root_mean_squared_error: 0.4033 - val_loss: 0.3299 - val_root_mean_squared_error: 0.5743
Epoch 35/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1647 - root_mean_squared_error: 0.4055 - val_loss: 0.3292 - val_root_mean_squared_error: 0.5737
Epoch 36/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1540 - root_mean_squared_error: 0.3923 - val_loss: 0.3278 - val_root_mean_squared_error: 0.5726
Epoch 37/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1548 - root_mean_squared_error: 0.3933 - val_loss: 0.3293 - val_root_mean_squared_error: 0.5738
Epoch 38/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1552 - root_mean_squared_error: 0.3939 - val_loss: 0.3280 - val_root_mean_squared_error: 0.5728
Epoch 39/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1492 - root_mean_squared_error: 0.3856 - val_loss: 0.3268 - val_root_mean_squared_error: 0.5717
Epoch 40/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.1563 - root_mean_squared_error: 0.3953 - val_loss: 0.3278 - val_root_mean_squared_error: 0.5726
Epoch 41/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1420 - root_mean_squared_error: 0.3763 - val_loss: 0.3265 - val_root_mean_squared_error: 0.5714
Epoch 42/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1578 - root_mean_squared_error: 0.3971 - val_loss: 0.3255 - val_root_mean_squared_error: 0.5705
Epoch 43/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 0.1568 - root_mean_squared_error: 0.3958 - val_loss: 0.3240 - val_root_mean_squared_error: 0.5692
Epoch 44/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.1513 - root_mean_squared_error: 0.3889 - val_loss: 0.3259 - val_root_mean_squared_error: 0.5709
Epoch 45/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1511 - root_mean_squared_error: 0.3887 - val_loss: 0.3236 - val_root_mean_squared_error: 0.5688
Epoch 46/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1596 - root_mean_squared_error: 0.3993 - val_loss: 0.3234 - val_root_mean_squared_error: 0.5687
Epoch 47/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1564 - root_mean_squared_error: 0.3954 - val_loss: 0.3243 - val_root_mean_squared_error: 0.5694
Epoch 48/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1486 - root_mean_squared_error: 0.3853 - val_loss: 0.3236 - val_root_mean_squared_error: 0.5689
Epoch 49/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.1370 - root_mean_squared_error: 0.3698 - val_loss: 0.3228 - val_root_mean_squared_error: 0.5681
Epoch 50/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1463 - root_mean_squared_error: 0.3822 - val_loss: 0.3228 - val_root_mean_squared_error: 0.5682
Epoch 51/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1381 - root_mean_squared_error: 0.3715 - val_loss: 0.3235 - val_root_mean_squared_error: 0.5687
Epoch 52/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1473 - root_mean_squared_error: 0.3835 - val_loss: 0.3214 - val_root_mean_squared_error: 0.5669
Epoch 53/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1308 - root_mean_squared_error: 0.3607 - val_loss: 0.3231 - val_root_mean_squared_error: 0.5684
Epoch 54/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1517 - root_mean_squared_error: 0.3894 - val_loss: 0.3241 - val_root_mean_squared_error: 0.5693
Epoch 55/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1634 - root_mean_squared_error: 0.4037 - val_loss: 0.3212 - val_root_mean_squared_error: 0.5668
Epoch 56/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1352 - root_mean_squared_error: 0.3672 - val_loss: 0.3215 - val_root_mean_squared_error: 0.5670
Epoch 57/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1342 - root_mean_squared_error: 0.3661 - val_loss: 0.3213 - val_root_mean_squared_error: 0.5668
Epoch 58/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.1541 - root_mean_squared_error: 0.3923 - val_loss: 0.3230 - val_root_mean_squared_error: 0.5683
Epoch 59/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1493 - root_mean_squared_error: 0.3859 - val_loss: 0.3212 - val_root_mean_squared_error: 0.5668
Epoch 60/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1462 - root_mean_squared_error: 0.3821 - val_loss: 0.3206 - val_root_mean_squared_error: 0.5662
Epoch 61/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1333 - root_mean_squared_error: 0.3647 - val_loss: 0.3200 - val_root_mean_squared_error: 0.5656
Epoch 62/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1426 - root_mean_squared_error: 0.3774 - val_loss: 0.3217 - val_root_mean_squared_error: 0.5672
Epoch 63/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1435 - root_mean_squared_error: 0.3787 - val_loss: 0.3212 - val_root_mean_squared_error: 0.5667
Epoch 64/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1389 - root_mean_squared_error: 0.3725 - val_loss: 0.3193 - val_root_mean_squared_error: 0.5651
Epoch 65/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1330 - root_mean_squared_error: 0.3644 - val_loss: 0.3214 - val_root_mean_squared_error: 0.5669
Epoch 66/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 0.1467 - root_mean_squared_error: 0.3826 - val_loss: 0.3207 - val_root_mean_squared_error: 0.5663
Epoch 67/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.1425 - root_mean_squared_error: 0.3773 - val_loss: 0.3207 - val_root_mean_squared_error: 0.5663
Epoch 68/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1247 - root_mean_squared_error: 0.3528 - val_loss: 0.3193 - val_root_mean_squared_error: 0.5650
Epoch 69/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1335 - root_mean_squared_error: 0.3652 - val_loss: 0.3197 - val_root_mean_squared_error: 0.5654
Epoch 70/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1323 - root_mean_squared_error: 0.3633 - val_loss: 0.3216 - val_root_mean_squared_error: 0.5671
Epoch 71/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.1324 - root_mean_squared_error: 0.3637 - val_loss: 0.3202 - val_root_mean_squared_error: 0.5659
Epoch 72/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1394 - root_mean_squared_error: 0.3731 - val_loss: 0.3218 - val_root_mean_squared_error: 0.5673
Epoch 73/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1347 - root_mean_squared_error: 0.3662 - val_loss: 0.3223 - val_root_mean_squared_error: 0.5677
Epoch 74/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1394 - root_mean_squared_error: 0.3732 - val_loss: 0.3212 - val_root_mean_squared_error: 0.5668
Epoch 75/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1316 - root_mean_squared_error: 0.3625 - val_loss: 0.3213 - val_root_mean_squared_error: 0.5669
Epoch 76/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1505 - root_mean_squared_error: 0.3877 - val_loss: 0.3219 - val_root_mean_squared_error: 0.5674
Epoch 77/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1455 - root_mean_squared_error: 0.3813 - val_loss: 0.3221 - val_root_mean_squared_error: 0.5675
Epoch 78/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1421 - root_mean_squared_error: 0.3766 - val_loss: 0.3223 - val_root_mean_squared_error: 0.5677
Epoch 79/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1217 - root_mean_squared_error: 0.3485 - val_loss: 0.3223 - val_root_mean_squared_error: 0.5677
Epoch 80/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1237 - root_mean_squared_error: 0.3512 - val_loss: 0.3237 - val_root_mean_squared_error: 0.5689
Epoch 81/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1305 - root_mean_squared_error: 0.3609 - val_loss: 0.3207 - val_root_mean_squared_error: 0.5663
Epoch 82/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1251 - root_mean_squared_error: 0.3533 - val_loss: 0.3220 - val_root_mean_squared_error: 0.5674
Epoch 83/100
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.1416 - root_mean_squared_error: 0.3762 - val_loss: 0.3213 - val_root_mean_squared_error: 0.5669
Epoch 83: early stopping
Restoring model weights from the end of the best epoch: 68.

Monitor Training Process

Neural networks have many tunable hyperparameters. While an exhaustive search is beyond the scope here, monitoring the training process helps assess the suitability of the chosen configuration.

In [35]:
import pandas as pd
import matplotlib.pyplot as plt

pd.DataFrame(history.history).plot(figsize=(6,4))
plt.grid(True)
plt.title("NN Training History (MSE & RMSE on lwage scale)")
plt.xlabel("Epoch")
plt.ylabel("Metric Value")
plt.show()
[Figure: NN training history, loss and RMSE per epoch for training and validation sets]

Evaluate Neural Network

In [36]:
# Evaluate on the lwage scale; evaluate() returns [loss, rmse]
results_nn = model_nn.evaluate(X_test_scaled, y_test_orig)
y_test_pred = model_nn.predict(X_test_scaled).flatten() # Flatten (n, 1) output to (n,)
RMSE_lwage_nn = results_nn[1] # Index 1 corresponds to the RootMeanSquaredError metric
r2_lwage_nn = r2_score(y_test_orig, y_test_pred)

# Back-transform predictions and evaluate on the wage scale
y_test_wage_pred = np.exp(y_test_pred)
RMSE_wage_nn = root_mean_squared_error(y_test_wage, y_test_wage_pred)

print("\n--- Neural Networks Results ---")
print(f"Neural Networks RMSE (log wage scale): {RMSE_lwage_nn}")
print(f"Neural Networks R-squared (lwage scale): {r2_lwage_nn}")
print(f"Neural Networks RMSE (wage scale): {RMSE_wage_nn}")
12/12 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.2210 - root_mean_squared_error: 0.4696 
12/12 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step 

--- Neural Networks Results ---
Neural Networks RMSE (log wage scale): 0.4596092700958252
Neural Networks R-squared (lwage scale): 0.3847765523523943
Neural Networks RMSE (wage scale): 3.6098844297052466
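A caveat on the wage-scale RMSE above: it back-transforms log-scale predictions with a plain exponential. If the model is unbiased on the log scale, this naive back-transform systematically understates the conditional mean wage. Duan's smearing estimator is a standard correction; a minimal sketch on synthetic residuals (all numbers here are illustrative, not from the notebook's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative log-scale predictions and residuals (assumed roughly normal)
lwage_pred = rng.normal(1.66, 0.40, size=1000)
residuals = rng.normal(0.0, 0.45, size=1000)

naive_wage = np.exp(lwage_pred)             # naive back-transform, as in the cell above
smearing_factor = np.exp(residuals).mean()  # Duan's smearing estimator
corrected_wage = smearing_factor * naive_wage

# For roughly normal residuals the factor approaches exp(sigma^2 / 2) > 1,
# so the naive back-transform understates the expected wage.
print(f"smearing factor: {smearing_factor:.3f}")
```

With a residual spread comparable to the models' log-scale RMSE (~0.45), the naive predictions run roughly 10% low on average, so the correction could be worth checking before comparing wage-scale errors.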

Neural Network Summary: The model explains about 38% of the variance in log wages. The training history plot above shows the training loss decreasing steadily while the validation loss flattened early; early stopping halted training at epoch 83 and restored the weights from the best epoch (68), guarding against overfitting. The chosen configuration learns effectively, but a fuller hyperparameter search (layer sizes, learning rate, regularization) could potentially improve results.
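The stopping behavior in the log (halt at epoch 83, restore epoch 68) is consistent with a patience of 15. The actual callback is configured earlier in the notebook, but the logic can be sketched framework-independently:

```python
def early_stopping(val_losses, patience=15):
    """Return (stop_epoch, best_epoch), 1-indexed, mimicking a Keras-style
    EarlyStopping callback with restore_best_weights=True."""
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop here; restore best weights
    return len(val_losses), best_epoch    # trained to the last epoch

# Toy validation-loss curve: improves for 10 epochs, then plateaus
losses = [1.0 - 0.05 * i for i in range(10)] + [0.56] * 30
stop, best = early_stopping(losses, patience=15)
print(stop, best)  # prints "25 10": stop 15 epochs after the last improvement
```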

2.3. Model Comparison and Final SelectionΒΆ

We compare all four models on RMSE (log-wage and wage scales) and R-squared (log-wage scale).

In [37]:
# Consolidate results into a dictionary
results_data = {
    'RMSE (lwage scale)': [RMSE_lwage_ridge, RMSE_lwage_lasso, RMSE_lwage_rf, RMSE_lwage_nn],
    'RMSE (wage scale)': [RMSE_wage_ridge, RMSE_wage_lasso, RMSE_wage_rf, RMSE_wage_nn],
    'R-squared (lwage scale)': [r2_lwage_ridge, r2_lwage_lasso,r2_lwage_rf, r2_lwage_nn],
}

# Create DataFrame
results_df = pd.DataFrame(results_data, index=['Ridge', 'Lasso', 'Random Forests', 'Neural Networks'])
print("The performance of four models is summarized below:\n")
display(results_df.style.format())
The performance of four models is summarized below:

                 RMSE (lwage scale)  RMSE (wage scale)  R-squared (lwage scale)
Ridge                      0.440964           3.574057                 0.433679
Lasso                      0.441015           3.574414                 0.433550
Random Forests             0.449639           3.592793                 0.411178
Neural Networks            0.459609           3.609884                 0.384777

Discussion:

  • Comparing RMSE, the regularized linear models (Ridge and Lasso) deliver the best predictive performance, slightly outperforming Random Forest and the Neural Network.
  • This outcome suggests that the relationship between the predictors and log-wage is well approximated by a linear specification, consistent with the Mincer earnings function (Mincer, 1974). The inclusion of expersq likely captures the main expected non-linearity, so the more flexible models (Random Forest and Neural Network) gained no meaningful predictive advantage here.

Overall Preferred Model:

  • Both Ridge and Lasso are strong candidates. Excluding the looks variable during data preparation may already have addressed the main multicollinearity concern, so Lasso performed no further feature elimination and its potential sparsity advantage is not realized here. For this dataset, Ridge is therefore marginally preferred: it achieved the lowest RMSE and the highest R-squared, though the differences are negligible.
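The contrast between the two penalties is easy to demonstrate on synthetic data (the data and alpha values below are illustrative, unrelated to the Beauty dataset): the L1 penalty can drive irrelevant coefficients exactly to zero, while the L2 penalty only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 3))  # feature 0 is relevant; features 1 and 2 are pure noise
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=n)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("Ridge:", ridge.coef_)  # every coefficient shrunk, but none exactly zero
print("Lasso:", lasso.coef_)  # noise features eliminated (set exactly to zero)
```

When all informative predictors are already in the model, as appears to be the case here, that elimination has nothing left to remove, which is why the two methods score almost identically.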

Ridge Coefficients: The coefficient plot below (left panel) illustrates the estimated impact of each standardized predictor on log-wage:

  • The bar chart shows that exper (experience), expersq (experience squared), educ (education), and female are the most influential predictors.
  • As expected from the Mincer function, exper (0.47) and educ (0.18) have positive coefficients, indicating that more experience and education are associated with higher wages.
  • The negative coefficient on expersq (-0.32) captures the diminishing returns to experience: wages rise with experience, but at a decreasing rate. This concave relationship is visualized in the partial-effect plot below (right panel).
  • The negative coefficient for female (-0.17) indicates that, holding other factors constant, women earn less than men. Read directly as a log-point effect it corresponds to roughly a 16% wage gap (1 - e^(-0.17) ≈ 0.16), though because the predictors are standardized the exact magnitude depends on rescaling by the variable's standard deviation.
  • Being a union member (0.07) and living in a bigcity (0.10) are associated with higher wages.
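The concavity in experience implies a wage-maximizing experience level. Because these coefficients are estimated on standardized predictors, each must first be rescaled by its variable's sample standard deviation (from the describe() table: exper ≈ 11.96, expersq ≈ 534.65). A back-of-envelope sketch using the rounded coefficients above, so the result is approximate:

```python
# Standardized Ridge coefficients (rounded, read off the bar chart)
b_exper_std, b_expersq_std = 0.47, -0.32

# Sample standard deviations of exper and expersq (from the describe() table)
sd_exper, sd_expersq = 11.963485, 534.645425

# Convert to per-unit (unstandardized) slopes
b_exper = b_exper_std / sd_exper        # effect of one extra year
b_expersq = b_expersq_std / sd_expersq  # effect of one extra squared-year

# lwage ≈ ... + b_exper * exper + b_expersq * exper^2 is maximized where the
# derivative b_exper + 2 * b_expersq * exper equals zero
turning_point = -b_exper / (2 * b_expersq)
print(round(turning_point, 1))  # ≈ 32.8 years of experience
```

The implied peak near 33 years of experience sits inside the sample range (exper runs from 0 to well above that), which is consistent with the flattening visible in the partial-effect plot.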
In [38]:
fig, axes = plt.subplots(1, 2, figsize=(18, 8)) # 1 row, 2 columns

# --- Plot 1: Ridge Coefficients ---
ridge_coeffs_sorted = ridge_coeffs.sort_values()
sns.barplot(x=ridge_coeffs_sorted.values, y=ridge_coeffs_sorted.index, 
            ax=axes[0], hue=ridge_coeffs_sorted.index, palette="coolwarm_r")
axes[0].axvline(0, color='black', linewidth=0.8, linestyle='--')
for i, value in enumerate(ridge_coeffs_sorted.values):
    axes[0].text(value + 0.01 * np.sign(value), i, f"{value:.2f}", 
                 va='center', ha='left' if value > 0 else 'right', fontsize=9)
axes[0].set_title('Ridge Regression Coefficients\n(Impact on log-wage)')
axes[0].set_xlabel('Coefficient Value')
axes[0].set_ylabel('Predictors')

# --- Plot 2: Relationship between Experience and Hourly Wage ---
X_train_unscaled_df = pd.DataFrame(X_train_orig, columns=X_cols)
y_train_unscaled = y_train_orig
# 1. Create experience range
exper_min = X_train_unscaled_df['exper'].min()
exper_max = X_train_unscaled_df['exper'].max()
exper_range = np.linspace(exper_min, exper_max, 100)
expersq_range = exper_range**2

# 2. Create data for prediction, holding others constant at mean
means_other_vars = X_train_unscaled_df.drop(columns=['exper', 'expersq']).mean()
X_plot_unscaled = pd.DataFrame([means_other_vars.values] * len(exper_range), columns=means_other_vars.index)
# Add exper/expersq columns
X_plot_unscaled['exper'] = exper_range
X_plot_unscaled['expersq'] = expersq_range
X_plot_unscaled = X_plot_unscaled[X_cols] # Reorder to match original columns

# 3. Scale the plotting data
X_plot_scaled = scaler.transform(X_plot_unscaled)

# 4. Predict lwage using the scaled data
lwage_pred_curve = ridge_cv.predict(X_plot_scaled)

# 5. Create plot 
axes[1].scatter(X_train_unscaled_df['exper'], y_train_unscaled, alpha=0.2, label='Training Data Points', color='skyblue')
axes[1].plot(exper_range, lwage_pred_curve, label='Predicted log(wage) (Ridge Fit)', color='red', linewidth=2.5)
axes[1].axhline(y=np.mean(y_train_unscaled), color='gray', linestyle='--', label='Mean log(wage) of Training Data')
axes[1].set_title('Partial Effect of Experience on Predicted log(Wage)')
axes[1].set_xlabel('Years of Experience (exper)')
axes[1].set_ylabel('Predicted Log Hourly Wage (lwage)')
axes[1].grid(True)
axes[1].legend()

# --- Final Figure Adjustments ---
plt.tight_layout()
plt.show()
[Figure: Ridge regression coefficient bar chart (left panel) and partial effect of experience on predicted log-wage (right panel)]

Reference¶

  1. Mincer, J. (1974). Schooling, Experience, and Earnings. New York: Columbia University Press.