Nelson Liu's Blog

python

Paraphrase Identification Models in TensorFlow

I've been loosely hacking on the Quora Question Pairs dataset in my free time to get some more experience working with vanilla TensorFlow for NLP in a practical setting. Yesterday,...

Installing and Updating GTX 1080 Ti Drivers / CUDA on Ubuntu

I recently had to figure out how to set up a new Ubuntu 16.04 machine with NVIDIA's new GTX 1080 Ti graphics card for use with CUDA-enabled machine learning...

Making autoenv + conda faster

I've recently switched over to using the fantastic autoenv to automatically activate my Anaconda environments and set necessary environment variables when I enter a directory in my terminal. You basically...

scikit-learn test_size and train_size pitfalls and coming changes

I recently authored a scikit-learn PR to change the behavior of train_size and test_size in most of the classes that use them; I thought that their interaction was...

(GSoC Week 10) scikit-learn PR #6954: Adding pre-pruning to decision trees

The scikit-learn pull request I opened to add impurity-based pre-pruning to DecisionTrees and the classes that use them (e.g. the RandomForest, ExtraTrees, and GradientBoosting ensemble regressors and classifiers) was...

Easy Progress Bars For Python File Reading with tqdm

I've been a fan of the tqdm Python module for quite some time, but I found it difficult to find a reason to use it; generally, loops run fast enough...

(GSoC Week 4) MAE and Median Calculation

In the first part of my project, I am implementing the Mean Absolute Error criterion for the scikit-learn DecisionTreeRegressor. In this blog post, I'll talk about what the criterion does,...

(GSoC Week 0) How fast is fast, how slow is slow? A look into Cython and Python

The scikit-learn tree module relies heavily on Cython to perform fast operations on NumPy arrays, so I've been learning the language (if you can even call it that) in order...