## My Journey in Open Source / How to Get Started Contributing

I just finished the Google Summer of Code program, in which I worked on the Python machine learning package scikit-learn. Since I began working with the project in November 2015, I've occasionally received emails asking how…

## scikit-learn GSoC Summary, Lessons Learned, and Future Work

This summer, I was quite fortunate to work on the scikit-learn project with my mentors Jacob Schreiber and Raghav RV as part of the Google Summer of Code Program. I worked on various features for…

## (GSoC Week 10) scikit-learn PR #6954: Adding pre-pruning to decision trees

The scikit-learn pull request I opened to add impurity-based pre-pruning to DecisionTrees and the classes that use them (e.g. the RandomForest, ExtraTrees, and GradientBoosting ensemble regressors and classifiers) was merged a week ago, so…

## (GSoC Week 8) MAE PR #6667 Reflection: 15x speedup from beginning to end

If you've been following this blog, you'll notice that I've been talking a lot about the weighted median problem, as it is intricately related to optimizing the mean absolute error (MAE) impurity criterion. The scikit-learn…

## (GSoC Week 6) Efficient Calculation of Weighted Medians

In my previous blog post, I discussed a method for using two heaps to efficiently find the median for use in the MAE criterion for finding the best split. However, the post did not include…
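As a rough illustration of the two-heap idea mentioned above, here is a minimal sketch of a running (unweighted) median in plain Python. The class name `RunningMedian` and its methods are hypothetical and are not taken from the scikit-learn code; the actual implementation lives in Cython and handles sample weights, which this sketch does not.

```python
import heapq


class RunningMedian:
    """Hypothetical helper: tracks the median of a stream using two heaps."""

    def __init__(self):
        self._low = []   # max-heap (values stored negated): the lower half
        self._high = []  # min-heap: the upper half

    def push(self, value):
        # Route the new value through the lower half so that every element
        # in _low stays <= every element in _high.
        heapq.heappush(self._low, -value)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so _low holds the same number of items as _high,
        # or exactly one more.
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("median of an empty stream is undefined")
        if len(self._low) > len(self._high):
            return float(-self._low[0])
        return (-self._low[0] + self._high[0]) / 2.0


stream = RunningMedian()
for y in [5, 1, 3]:
    stream.push(y)
print(stream.median())  # median of 1, 3, 5
```

Each insertion costs O(log n) and reading the median costs O(1), which is the appeal over re-sorting the samples at every candidate split.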

## (GSoC Week 4) MAE and Median Calculation

In the first part of my project, I am implementing the Mean Absolute Error criterion for the scikit-learn DecisionTreeRegressor. In this blog post, I'll talk about what the criterion does, as well as a technical…
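For a concrete picture of what such a criterion measures, the sketch below computes the mean absolute error of a set of target values around their median, which is the constant prediction that minimizes absolute error. The function name `mae_impurity` is made up for illustration, and the sketch ignores sample weights; it is not the scikit-learn implementation.

```python
from statistics import median


def mae_impurity(targets):
    """Mean absolute deviation of the targets from their median (unweighted)."""
    center = median(targets)
    return sum(abs(y - center) for y in targets) / len(targets)


node_targets = [1, 2, 3, 10]
print(mae_impurity(node_targets))
```

Predicting the mean instead of the median would give a larger absolute error on this example, because the outlier at 10 pulls the mean away from most of the samples.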

## (GSoC Week 2) Intro to decision trees

Apologies for the late post; I had this sitting in my drafts and forgot to publish it! Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression, popular for their ease…

## (GSoC Week 0) How fast is fast, how slow is slow? A look into Cython and Python

The scikit-learn tree module relies heavily on Cython to perform fast operations on NumPy arrays, so I've been learning the language (if you can even call it that) in order to effectively contribute. At first,…

## An Intro to Google Summer of Code

I'm participating in the Google Summer of Code, a program in which students work with an open source organization on a three-month programming project over the summer; I'll be working with the scikit-learn project to…