I would like to ask for help with the following problem. I am working with this dataset, which I have split into train and test sets:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Loading data
df = pd.read_csv("https://raw.githubusercontent.com/karsarobert/Machine_Learning_2024/main/train.csv")

# Setting target variable and predictors
y = df['target_reg']
corr_col = ['arbevexp_2014', 'arbevexp_2015', 'arbevexp_2016',
            'arbevert_2014', 'arbevert_2015', 'arbevert_2016',
            'ranyag_2014', 'ranyag_2015', 'ranyag_2016', 'rszem_2016']
X = df[corr_col]

# Splitting data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

# Fitting StandardScaler on the training data only, then applying it to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```
I have tried many machine learning methods and algorithms. So far, the most accurate (MAE: 50799) has been a Random Forest Regressor tuned with Bayesian optimization, which found the following hyperparameters:

```python
{'max_depth': 20, 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 49}
```
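For context, this is roughly how I evaluate that configuration (a minimal sketch reusing the `X_train_scaled`/`X_test_scaled` variables from the snippet above; the Bayesian search itself is omitted):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Refit a Random Forest with the hyperparameters the Bayesian search found
rf = RandomForestRegressor(
    n_estimators=49,
    max_depth=20,
    min_samples_split=2,
    min_samples_leaf=2,
    random_state=42,
)
rf.fit(X_train_scaled, y_train)

# Evaluate on the held-out 40% test split
y_pred = rf.predict(X_test_scaled)
print("MAE:", mean_absolute_error(y_test, y_pred))
```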
My question is: how can I find better hyperparameters? What search methods are there besides brute-force approaches like grid search? Is there a well-functioning genetic algorithm or TPE implementation for this?
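To make the question concrete, this is the kind of TPE search I have in mind (a sketch using Optuna, whose default sampler is TPE; the search ranges below are my own guesses around the current best values, not tuned settings):

```python
import optuna
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def objective(trial):
    # Illustrative search ranges, not verified as sensible for this dataset
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 20, 300),
        "max_depth": trial.suggest_int("max_depth", 5, 40),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 10),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 10),
    }
    model = RandomForestRegressor(random_state=42, **params)
    # Score by 5-fold CV on the training split only, to keep the test set untouched
    cv_mae = -cross_val_score(model, X_train_scaled, y_train,
                              scoring="neg_mean_absolute_error", cv=5).mean()
    return cv_mae

# Optuna's default sampler is TPESampler, so this runs a TPE search
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
print(study.best_params, study.best_value)
```

For the genetic-algorithm side, I have seen libraries like TPOT mentioned, but I don't know from experience how well they work for a problem like this.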
I have already tried linear regression, KNN, SVM, GridSearchCV/RandomizedSearchCV, SGB, CatBoost, Ridge regression, etc. I don't think a neural network is suitable for this problem, because it overfits.