Compare classification performance of dataset subsets

Question

Let's say I have a dataset like this on which I want to perform classification:

id	feature	class	factor
1	...	1	A
2	...	1	B
3	...	2	A
4	...	2	B
$\vdots$

How can I compare the performance of a model given the values of the factor?

For instance, let's say I'm using a handwritten digit dataset where the factor is whether the writer is left- or right-handed. How could I check whether the model does better on left-handed or on right-handed data?

rehaqds · Accepted Answer · 2025-01-06 10:50:28Z

Once you have the predictions for the full dataset, you can split them into two subsets (one filtered on factor == A, the other on factor == B) and compute your score on each subset separately.
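As a minimal sketch of that idea, assuming accuracy is the score of interest and that the predictions are already available as plain lists: the helper name `accuracy_by_factor` and the toy values below are made up for illustration and are not from the question.

```python
from collections import defaultdict

def accuracy_by_factor(y_true, y_pred, factor):
    """Return {factor_value: accuracy}, computed on each subset separately."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, f in zip(y_true, y_pred, factor):
        total[f] += 1
        correct[f] += int(t == p)
    return {f: correct[f] / total[f] for f in total}

# Toy data (hypothetical): true classes, model predictions, factor per row.
y_true = [1, 1, 2, 2, 1, 2, 1, 2]
y_pred = [1, 2, 2, 2, 1, 1, 1, 2]
factor = ["A", "B", "A", "B", "A", "B", "A", "B"]

scores = accuracy_by_factor(y_true, y_pred, factor)
print(scores)  # {'A': 1.0, 'B': 0.5}
```

The same pattern works for any other metric: filter the rows by factor value, then apply the metric to each subset. With small subsets, a gap between the two scores may just be noise, so it is worth checking the subset sizes before reading much into it.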

wjktrs · Accepted Answer · 2025-01-12 20:39:22Z

Multinomial logistic regression could be used, with a dummy (0/1, i.e. "one-hot" encoded) feature added as a predictor, where 0 = A and 1 = B. This would first tell you whether A vs. B "matters" at all, i.e., whether the binary predictor helps explain the classification results. You could then determine the difference in classification results for the two types of writers.
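One possible way to try this, sketched with scikit-learn on synthetic data. The data, the effect size, and the choice of cross-validated log loss as the yardstick for whether the factor "matters" are all assumptions made for illustration; the answer itself does not prescribe a specific test.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 400 samples, 5 features, 3 classes.
n = 400
X = rng.normal(size=(n, 5))
factor = rng.integers(0, 2, size=n)  # dummy coding: 0 = A, 1 = B

# Make the class depend on the features and, for class 1, on the factor.
latent = np.column_stack([X[:, 0], X[:, 1] + 1.5 * factor, X[:, 2]])
y = latent.argmax(axis=1)

# Same features, with the factor dummy appended as an extra predictor.
X_with = np.column_stack([X, factor])

model = LogisticRegression(max_iter=1000)

# Negative log loss: values closer to 0 indicate a better fit.
score_without = cross_val_score(model, X, y, cv=5, scoring="neg_log_loss").mean()
score_with = cross_val_score(model, X_with, y, cv=5, scoring="neg_log_loss").mean()
print("without factor:", score_without)
print("with factor:   ", score_with)

# Per-class coefficient of the factor dummy (last column of X_with).
factor_coef = model.fit(X_with, y).coef_[:, -1]
print("factor coefficients per class:", factor_coef)
```

If the score with the dummy is clearly better than the score without it, that suggests the factor carries information about the class beyond the other features. The per-class coefficients then indicate in which direction it shifts the predicted probabilities.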
