Say a dataset has ~2400 features in total, of which 0.5% are continuous and 99.5% are categorical (binary). Each observation belongs to 1 of 2 classes: Fraud (1) or Not Fraud (0). There is also a large class imbalance, with only 2.6% of examples being Fraud and the other ~97.4% being Not Fraud.
Say we want to predict whether a given example is Fraud or Not Fraud, and we take an anomaly detection approach using autoencoders.
Given the mixed data types in the dataset, will an autoencoder trained only on the Not Fraud examples generally perform well at detecting Fraud examples? Is there any literature suggesting which architectures work best, or whether some preprocessing (e.g., scaling or PCA) should be performed beforehand? I ask because I feel an autoencoder may be hard to train on mostly binary features.
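To make the setup concrete, here is a minimal sketch of how I imagine scoring an example once an autoencoder has been trained on Not Fraud rows. It assumes the decoder outputs raw values for the (scaled) continuous features and sigmoid probabilities for the binary ones, and it combines mean squared error on the former with binary cross-entropy on the latter. The function names, the equal weighting of the two terms, and the 0.99 quantile cutoff are my own placeholders, not something taken from a library or a paper.

```python
import math


def anomaly_score(x_cont, x_cont_hat, x_bin, p_bin_hat, eps=1e-7):
    """Per-example reconstruction error for one row with mixed feature types.

    x_cont / x_cont_hat: scaled continuous inputs and their reconstructions.
    x_bin / p_bin_hat:   0/1 inputs and the decoder's sigmoid outputs.
    """
    # Mean squared error over the continuous features.
    mse = sum((x - y) ** 2 for x, y in zip(x_cont, x_cont_hat)) / len(x_cont)

    # Binary cross-entropy over the binary features; clip to avoid log(0).
    bce = 0.0
    for x, p in zip(x_bin, p_bin_hat):
        p = min(max(p, eps), 1.0 - eps)
        bce -= x * math.log(p) + (1 - x) * math.log(1.0 - p)
    bce /= len(x_bin)

    # Assumption: both terms are weighted equally.
    return mse + bce


def threshold_from_normal(scores, quantile=0.99):
    """Pick a cutoff as a high quantile of scores on held-out Not Fraud rows."""
    ordered = sorted(scores)
    index = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Toy rows: one well-reconstructed, one poorly reconstructed.
good = anomaly_score([0.1], [0.1], [1, 0, 0], [0.9, 0.1, 0.1])
bad = anomaly_score([0.1], [0.9], [1, 0, 0], [0.2, 0.8, 0.1])
```

An example would be flagged as Fraud when its score exceeds the threshold computed on held-out Not Fraud rows. With only about 12 continuous features against roughly 2388 binary ones, I am unsure whether averaging each term separately, as above, is the right way to balance them.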