Nelson Liu's Blog

(GSoC Week 10) scikit-learn PR #6954: Adding pre-pruning to decision trees

The scikit-learn pull request I opened to add impurity-based pre-pruning to DecisionTrees and the classes that use them (e.g. the RandomForest, ExtraTrees, and GradientBoosting ensemble regressors and classifiers) was...

Easy Progress Bars For Python File Reading with tqdm

I've been a fan of the tqdm Python module for quite some time, but I've struggled to find a reason to use it; generally, loops run fast enough...

(GSoC Week 8) MAE PR #6667 Reflection: 15x speedup from beginning to end

If you've been following this blog, you'll have noticed that I've been talking a lot about the weighted median problem, as it is intricately related to optimizing the mean absolute error...

(GSoC Week 6) Efficient Calculation of Weighted Medians

In my previous blog post, I discussed a method that uses two heaps to efficiently find the median needed by the MAE criterion when searching for the best split. However,...

(GSoC Week 4) MAE and Median Calculation

In the first part of my project, I am implementing the Mean Absolute Error criterion for the scikit-learn DecisionTreeRegressor. In this blog post, I'll talk about what the criterion does,...