Liam Brannigan’s Post

Scikit-learn has a set_output API that lets us get outputs as a Polars DataFrame instead of a NumPy array. In this example we change the output of the StandardScaler to Polars. We can also do this for ColumnTransformers and whole Pipelines (as we'll soon see!)

👋 Vincent D. Warmerdam

Developer Relations Engineer @ :probabl.

9mo

I haven't tried, but I wonder now that I think of it ... I guess this won't work for sparse featurizers? I could be mistaken but I recall polars not having sparse types.

Does it also work for inputs?

You can also set this configuration globally, using set_config(transform_output=...) instead of calling set_output on each estimator. The set_output API was added in sklearn 1.2 to support pandas, and I believe polars has been supported since 1.4. I think any object that has a __dataframe__ attribute should work. This weekend I watched an interesting video from :probabl. where they use polars in a scikit-learn pipeline with a compatible transformer from the skrub library. Anyone interested should give it a try: https://www.youtube.com/live/uevp7zJTM_c?si=QBloxvxBwvU-3qAI

Miguel Palencia-Olivar

ML Engineer and Data Engineer | PhD in AI & Data Engineering | I help you make your systems more reliable through tech and science 🔐

9mo

Nice to see Polars and Rust gaining momentum in the classical toolset!

Adrian Fletcher

Data Scientist | Data Analyst | Business Intelligence Specialist | BI Consultant

9mo

Neat tip, I'll be using this for a project instead of converting the NumPy array.
