
I am trying to do highly imbalanced binary classification using Linear Genetic Programming (LGP) to detect a certain spoken word, using mel coefficients as features. The instruction set includes basic arithmetic (excluding division), sine, cosine, and a select instruction (a = a if a >= 0 else b).
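For concreteness, a program under this instruction set might be executed like the minimal sketch below. The register layout, instruction encoding, and output convention are my assumptions for illustration, not details from the question:

```python
import math

def run_program(program, mel_features, n_work_registers=4):
    """Execute a linear GP program over a register bank.

    Registers hold the mel-coefficient inputs followed by zero-initialized
    working registers. Each instruction is a tuple (op, dst, a, b);
    unary ops (sin, cos) ignore the b operand.
    """
    r = list(mel_features) + [0.0] * n_work_registers
    for op, dst, a, b in program:
        if op == "add":
            r[dst] = r[a] + r[b]
        elif op == "sub":
            r[dst] = r[a] - r[b]
        elif op == "mul":
            r[dst] = r[a] * r[b]
        elif op == "sin":
            r[dst] = math.sin(r[a])
        elif op == "cos":
            r[dst] = math.cos(r[a])
        elif op == "select":  # a if a >= 0 else b, as in the question
            r[dst] = r[a] if r[a] >= 0 else r[b]
    # One convention: classify as positive if register 0 ends up >= 0.
    return r[0]

# Hypothetical 3-instruction program over 3 mel coefficients:
prog = [("mul", 0, 0, 1), ("sin", 2, 0, 0), ("select", 0, 2, 1)]
out = run_program(prog, [0.5, -0.3, 0.1])
```

Here `out` is the raw output register; thresholding it at zero gives the binary decision.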

The positive class is the spoken word, and the negative class is anything else, like just noise or other spoken words.

I have a dataset of about 50K entries: about 2K are positive and the rest are negative. I use about 3K for training, with 500 positive and 2.5K negative. During training I get 90%-99% accuracy on the positive class, depending on the word, and 100% or near-100% accuracy on the negative class.
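Since the classes are heavily imbalanced, it is worth tracking each class's accuracy (per-class recall) and the precision separately rather than a single pooled accuracy. A small sketch of those metrics, using hypothetical labels (1 = the target word, 0 = everything else):

```python
def per_class_metrics(y_true, y_pred):
    """Per-class recall and precision for a binary word detector."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos_acc = tp / (tp + fn) if tp + fn else 0.0    # recall on the word
    neg_acc = tn / (tn + fp) if tn + fp else 0.0    # recall on non-words
    precision = tp / (tp + fp) if tp + fp else 0.0  # fraction of alarms that are real
    return pos_acc, neg_acc, precision

# Toy example: 2 positive and 3 negative samples.
pos, neg, prec = per_class_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

With the ~1:5 imbalance described above, precision is the figure most likely to reveal a problem that pooled accuracy hides.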

As for the test set, it also includes samples of words that do not appear in the negative training set at all. For instance, if my negative training set contains "cat" and "dog", the test set includes both of them but also about 3K samples of other word(s) like "yes".

The problem is that the best program found performs almost as well on the test set as in training: positive and negative accuracy are each at most 1-2% behind the training figures.

This looks suspicious. Maybe the program I've written has a bug?

  • If you have class imbalance, it's possible to easily get high accuracy by always predicting the dominant class. So, accuracy might not be the best metric in your case. Check out other metrics. For example, you could check the accuracy for the positive class alone (i.e. how many positive samples are correctly labeled as such), not across all classes/samples.
    – nbro
    Commented Apr 4 at 9:27
  • Hello. You misread my question. I'm getting 100% accuracy for negative and 90%-99% accuracy for positive, and it performs almost as well on the test set. Let me know if the question needs clarification.
    – Commented Apr 4 at 12:28
