Nelson Liu's Blog

scikit-learn

scikit-learn test_size and train_size pitfalls and coming changes

I recently authored a scikit-learn PR to edit the behavior of train_size and test_size in most of the classes that use them; I thought that their interaction was...

My Journey in Open Source / How to Get Started Contributing

I just finished the Google Summer of Code Program, wherein I worked on the Python machine learning package scikit-learn. Since I began working with the project in November 2015, I've...

scikit-learn GSoC Summary, Lessons Learned, and Future Work

This summer, I was quite fortunate to work on the scikit-learn project with my mentors Jacob Schreiber and Raghav RV as part of the Google Summer of Code Program. I...

(GSoC Week 10) scikit-learn PR #6954: Adding pre-pruning to decision trees

The scikit-learn pull request I opened to add impurity-based pre-pruning to DecisionTrees and the classes that use them (e.g. the RandomForest, ExtraTrees, and GradientBoosting ensemble regressors and classifiers) was...

(GSoC Week 8) MAE PR #6667 Reflection: 15x speedup from beginning to end

If you've been following this blog, you'll notice that I've been talking a lot about the weighted median problem, as it is intricately related to optimizing the mean absolute error...

(GSoC Week 6) Efficient Calculation of Weighted Medians

In my previous blog post, I discussed a method for using two heaps to efficiently find the median for use in the MAE criterion for finding the best split. However,...

(GSoC Week 4) MAE and Median Calculation

In the first part of my project, I am implementing the Mean Absolute Error criterion for the scikit-learn DecisionTreeRegressor. In this blog post, I'll talk about what the criterion does,...

(GSoC Week 0) How fast is fast, how slow is slow? A look into Cython and Python

The scikit-learn tree module relies heavily on Cython to perform fast operations on NumPy arrays, so I've been learning the language (if you can even call it that) in order...

An Intro to Google Summer of Code

I'm participating in the Google Summer of Code, a program in which students work with an open source organization on a 3-month programming project over the summer; I'll be working...