Liam Brannigan’s Post

9mo

Scikit-learn has a set_output API that lets us get outputs as a Polars DataFrame instead of a NumPy array. In this example we change the output of the StandardScaler to Polars. We can also do this for ColumnTransformers and whole Pipelines (as we'll soon see!)

9 Comments

👋 Vincent D. Warmerdam

Developer Relations Engineer @ :probabl.

9mo

I haven't tried, but I wonder now that I think of it ... I guess this won't work for sparse featurizers? I could be mistaken but I recall polars not having sparse types.


Petrica Radan

Data Scientist

9mo

Does it also work for inputs?

Nico Popescul

🐍

9mo

You can actually set this configuration globally with set_config(transform_output=...). The set_output API was added in sklearn 1.2 to support pandas, and I believe polars has been supported since 1.4. Actually, I think that any object that has a __dataframe__ attribute should work. I watched an interesting video from :probabl. this weekend in which they use polars in a scikit-learn pipeline together with a compatible transformer from the skrub library. Anyone interested should give it a try: https://www.youtube.com/live/uevp7zJTM_c?si=QBloxvxBwvU-3qAI


Miguel Palencia-Olivar

ML Engineer and Data Engineer | PhD in AI & Data Engineering | I help you make your systems more reliable through tech and science 🔐

9mo

Nice to see Polars and Rust gaining momentum in the classical toolset!


Adrian Fletcher

Data Scientist | Data Analyst | Business Intelligence Specialist | BI Consultant

9mo

Neat tip, I'll be using this for a project instead of converting the Numpy array



More Relevant Posts

Marcin Szczerski

AI Whisperer & Python Magician | Semiotic Data Analyst | Cultural Storyteller
5mo Edited
How Many Methods Do You Know to Retrieve the First 3 Rows of a Pandas #DataFrame? Here Are My 6 Solutions:
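The six solutions themselves are not included here. As a rough illustration only, and not necessarily the author's list, these are a few common ways to take the first three rows. The DataFrame is made up for the sketch.

```python
import pandas as pd

# Toy DataFrame, invented for this sketch.
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune", "Kyiv", "Doha"], "temp_c": [4, 19, 31, 8, 38]}
)

first_head = df.head(3)            # the dedicated method
first_iloc = df.iloc[:3]           # positional slicing
first_slice = df[:3]               # plain row slice
first_loc = df.loc[df.index[:3]]   # label-based, via the first three index labels

print(first_head)
```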
Bastien Molcrette, PhD

Data editor/biocurator at GigaDB, GigaScience Press
6mo
📊 My personal pipeline to gather data from #ClinicalTrials and process it --> Boost your analysis by combining the ClinicalTrials.gov API + Pandas dataframe + #LangChain framework to develop a #RAG agent
Paulo Cysne

Data Science Leader | 28,200+ followers | Transforming businesses with AI / Data Science / Machine Learning / LLMs
9mo
Since scikit-learn version 1.2, you can use the set_output method to obtain the results as a pandas DataFrame. This method is not limited to individual transformers but can also be applied to a whole scikit-learn pipeline.
Khuyen Tran

Sr. Data Engineer @ Accenture | Founder of CodeCut
9mo

By default, scikit-learn transformers return a NumPy array. This can pose a challenge if a pandas DataFrame is required for subsequent data processing steps. Luckily, since scikit-learn version 1.2, you can use the set_output method to obtain the results as a pandas DataFrame. This method is not limited to individual transformers but can also be applied to a whole scikit-learn pipeline. ⭐️ Bookmark this post: https://bit.ly/49cEX4m #pandas
Emmanuel Msafiri Phiri

Web and Software Student Coach & Mentor
4w
🛠️ The Ultimate Debugging Duo: Pandas + NumPy. Debugging messy datasets? Here’s why Pandas and NumPy shine: 🔍 Pandas lets you inspect your data's structure with .head() and .info(). 📏 NumPy gives you fast, vectorized numerical computations with explicit dtypes (it uses the same IEEE 754 floats as Python, so rounding quirks still apply). Together, they help you spot errors faster and fix them efficiently. #DebuggingTools #PythonTips #DataPreparation #CodingBestPractices
Dewa Sahu

ML Engineer | Student | Specializing in MLOps & Scalable Solutions | Crafting High-Performance Models & Automated Pipelines |
5mo
🔬 Revisiting ML fundamentals today with NumPy power! 🧠
Ever wonder how NumPy can simplify ML basics? 🤔 I stripped away fancy libraries and implemented a simple model with gradient descent using mainly NumPy! 🛠️
Key insight: NumPy's array operations make ML concepts crystal clear! 💡
What I implemented:
- Simple model with custom weight initialization 🎲
- Forward pass calculation ➡️
- Error computation 📉
- Gradient descent for weight updates 🔄
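The four steps listed in the post can be sketched roughly as follows. The author's actual model is not shown, so this assumes a one-feature linear regression with mean squared error. The data, learning rate, and iteration count are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for this sketch: y = 3x + 2 plus a little noise.
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=200)

# Custom weight initialization: small random weight, zero bias.
w = rng.normal(scale=0.01, size=1)
b = 0.0
lr = 0.1  # learning rate, chosen by hand for this toy problem

for _ in range(500):
    y_pred = X @ w + b                       # forward pass
    error = y_pred - y                       # error computation
    loss = np.mean(error ** 2)               # mean squared error
    grad_w = 2.0 * X.T @ error / len(y)      # gradient of the loss w.r.t. w
    grad_b = 2.0 * error.mean()              # gradient of the loss w.r.t. b
    w -= lr * grad_w                         # gradient descent update
    b -= lr * grad_b

print(w, b, loss)
```

The learned weight and bias should end up close to the 3 and 2 used to generate the data.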
Siddharta Govindaraj

Writer @ Playful Python | Tech Trainer, Coach, Python Geek, Author, Speaker
6mo
🎉 numpy 2.0.0 🎉 was just released this week. So what's so special about that? Well, apart from a bunch of new features and the removal of some deprecated APIs, this release is the first major release after numpy 1.0, which was released in ... 2006! That's right, numpy has been stable with no breaking changes for __18 YEARS__ until finally now a new major version. In this era of move fast and break things, numpy has been an ocean of stability, which has enabled large frameworks like pandas to build on top of it with full confidence.
Jaume Boguñá Urue

Aerospace Engineer | Python Developer | Data Scientist | Data Engineer | Content Creator
1mo
Make Your Pandas Code 2x Faster by using where()!
When working with Pandas, many developers rely on apply() for row-wise operations. However, this can be inefficient for simple conditional logic. Using np.where() can often yield 2x faster results.
Main Differences:
- apply(): Executes a lambda function on each row. It’s versatile but slower due to row-wise processing.
- where(): Uses vectorized operations, leveraging NumPy’s speed for element-wise conditional checks.
Switching from apply() to np.where() cut execution time by 50%, showcasing the power of vectorized operations in Pandas.
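A small sketch of the two approaches side by side. The DataFrame, threshold, and labels are invented for illustration, and no timings are claimed here because the speedup depends on data size and the logic involved.

```python
import numpy as np
import pandas as pd

# Toy data, invented for this sketch.
df = pd.DataFrame({"score": [35, 72, 50, 90, 49]})

# Row-wise: a Python lambda is called once per row.
df["label_apply"] = df.apply(
    lambda row: "pass" if row["score"] >= 50 else "fail", axis=1
)

# Vectorized: the condition is evaluated on the whole column at once.
df["label_where"] = np.where(df["score"] >= 50, "pass", "fail")

print(df)
```

Both columns hold the same labels, so the vectorized version is a drop-in replacement for this kind of simple conditional.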
Jason Pollock

Business Analyst | budding Data Scientist
4mo Edited
I just completed the "Supervised Learning with scikit-learn" course on DataCamp!

Jason Pollock's Statement of Accomplishment | DataCamp

datacamp.com

