Description
In several PRs (e.g. #11182#12916) the question arises whether we need to deprecate some object before it can be removed or changed. This goes back to defining what is public API in scikit-learn.
The lest controversial definition is that,
- import paths with that include a module with a leading
_
are private - other modules are public.
However, you could do, for instance,
fromsklearn.cluster.dbscan_importNearestNeighbors
Does it mean that we are supposed to preserve backward compatibility on sklearn.cluster.dbscan_.NearestNeighbors
in terms of import path? How about sklearn.preprocessing.data.sparse
(scipy.sparse
)?
I guess not, meaning that just because we have an import path without an underscore does not mean that it is part of the public API. At the very least it also needs to be documented or used in examples.
If we take this definition,
- most of
sklearn.utils
, is public API (as documented https://scikit-learn.org/stable/modules/classes.html#module-sklearn.utils) but certainly not all 151 objects listed in Public vs. private utils again #6616 (comment) sklearn.utils.fixes
is not.sklearn.externals
are not (except forsklearn.externals.joblib
) that we previously used in examples.
This would mean that we can e.g. remove sklearn.externals.six
in #12916 without a deprecation warning (but possibly with a what's new entry). I have a hard time seeing a user reasonably complaining that we didn't go through a deprecation cycle there.
This would also help resolving the "public vs private utils" discussion in #6616
WDYT, do you have other ideas of how we should define what is public API in scikit-learn?
cc @scikit-learn/core-devs