About
Builds things
Articles by Lukas
Contributions
-
How can you improve customer segmentation accuracy with data cleaning?
There are a ton of tools and techniques to clean data. Some are statistically rigorous, some are ad hoc. But all of them require you, the data practitioner, to really *understand* the data you are working with: how it was collected, what it looks like, its shape, outliers, distributions, scale, time, and context. My number one recommendation for cleaner data: Understand. Your. Data. Then find the right tools to clean it up.
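As a rough illustration of what understanding the data can look like before any cleaning happens, here is a minimal Python sketch that profiles a single numeric column: how many values are present or missing, where the center sits, and which values fall outside the usual range. The function name `profile_column`, the sample `order_totals` values, and the 1.5 x IQR outlier rule are all assumptions made for this example, not a prescribed method.

```python
import statistics


def profile_column(values):
    """Summarize a numeric column: size, missing count, center, and IQR outliers."""
    # Treat None as a missing value (an assumption about how gaps are encoded).
    present = [v for v in values if v is not None]

    # Quartiles split the present values into four equally sized groups.
    q1, median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1

    # Common rule of thumb: flag values more than 1.5 * IQR beyond the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": median,
        "outliers": [v for v in present if v < low or v > high],
    }


# Hypothetical order totals with one gap and one suspicious value.
order_totals = [20, 22, 19, 21, None, 23, 20, 500, 18, 22]
summary = profile_column(order_totals)
print(summary)
```

A summary like this does not say whether the flagged value is an error or a genuinely large order. That judgment still depends on knowing how the data was collected.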
-
How can you improve customer segmentation accuracy with data cleaning?
In my experience, the quality of a data product is heavily determined by the quality of the input data. Maybe the data was improperly collected, or is incomplete, or contains outliers. Data is noisy. So we have no choice but to understand and improve every data set's quality. Data cleaning provides comprehensive, repeatable ways to sanitize input data. In short, data cleaning lets you cut down noise.
Activity
-
Write, Audit, Publish is slowly rising as the standard paradigm for running pipelines in production while preserving data quality. It builds trust…
Liked by Lukas Schulte
-
Shift Data Quality Left with SDF in CI/CD 🛠️ Are you ready to take your data quality to the next level? Join us for a demo and webinar on Tuesday…
Liked by Lukas Schulte
-
Apache DataFusion is now the fastest single node engine for querying Apache Parquet files! That's right - even faster than DuckDB. At SDF Labs, we…
Liked by Lukas Schulte
Experience
Education
-
Northeastern University
-
Activities and Societies: Northeastern Outdoors Club. NU IDEA. Directed research with Northeastern faculty.
Publications
-
Active Files as a Measure of Software Maintainability
International Conference on Software Engineering (ICSE)
In this paper, we explore the set of source files which are changed unusually often. We define these files as active files. Although discovery of active files relies only on version history and defect classification, the simple concept of active files can deliver key insights into software development activities. Active files can help focus code reviews, implement targeted testing, show areas for potential merge conflicts and identify areas that are central for program comprehension.
Honors & Awards
-
Nominated for Best Paper Award, ICSE 2014
ICSE 2014
-
Northeastern University Dean's List
Northeastern University College of Engineering
Awarded to students for excellent academic performance. Awarded in 2013 and 2014.
Languages
-
German
Native or bilingual proficiency
More activity by Lukas
-
This has been a long time coming. After the concerted efforts of many engineers from across companies and across continents, we were able to post a…
Liked by Lukas Schulte
-
Tomorrow morning at 9 AM PST I'll be speaking on a panel for Dagster Labs' Data Platform Week! We'll cover the broad strokes on how we see the data…
Liked by Lukas Schulte
-
SDF just released SDF lint, a sql linter written in rust and intended to be a drop in replacement for sqlfluff. The rust “this python library but…
Liked by Lukas Schulte
-
Want to level up your data platform knowledge over your morning coffee? ☕ Join SDF this Friday 9 am PT at Dagster Labs's Data Platform Week. You…
Liked by Lukas Schulte
-
"You're not a real SAAS founder until you've hosted a webinar" - Other SAAS founder who hosted a webinar Looks like I've finally crossed the…
Liked by Lukas Schulte
-
Another successful All Hands/Onsite is on the books when SDF-ers from all corners of the world gather in one location to hack, strategize, and have…
Liked by Lukas Schulte
-
Wow, what an inspiring Coalesce! After years of watching sessions online, I wasn’t sure what to expect attending in person, but it definitely…
Liked by Lukas Schulte