Scikit-learn has a set_output API that lets us get outputs as a Polars DataFrame instead of a NumPy array. In this example we change the output of the StandardScaler to Polars. We can also do this for ColumnTransformers and whole Pipelines (as we'll soon see!)
Does it also work for inputs?
You can actually set this configuration globally too, with `sklearn.set_config(transform_output="polars")`, which is part of the same set_output API. It was added in sklearn 1.2 to support pandas, and I believe polars has been supported since 1.4. Actually, I think any object that has a __dataframe__ attribute should work. I watched an interesting video from :probabl. this weekend where they use polars in a scikit-learn pipeline along with a compatible transformer from the skrub library. Anyone interested should give it a try: https://www.youtube.com/live/uevp7zJTM_c?si=QBloxvxBwvU-3qAI
Nice to see Polars and Rust gaining momentum in the classical toolset!
Neat tip, I'll be using this for a project instead of converting the NumPy array.
I haven't tried it, but now that I think of it, I wonder ... I guess this won't work for sparse featurizers? I could be mistaken, but I recall polars not having sparse types.