Software Archaeology: Re-generating the CoNLL 2000 Chunking Data

October 27, 2018 research, tooling, tutorial, open source, software archaeology, nlp

I've been using the data from the CoNLL 2000 shared task on syntactic chunking for some ongoing work, but the original dataset is tiny by modern standards. The train set...

Extracting last timestep outputs from PyTorch RNNs

January 24, 2018 research, tooling, tutorial, machine learning, nlp, pytorch

Here's some code I've been using to extract the last hidden states from an RNN with variable-length input....

Flattening the Gigaword Corpus

September 23, 2017

Code for flattening the Gigaword corpus and associated usage instructions are at nelson-liu/flatten_gigaword. The English Gigaword Corpus is a massive collection of newswire text; the unzipped corpus is...

Paraphrase Identification Models in Tensorflow

May 20, 2017 tensorflow, machine learning, open source, python, nlp, paraphrase-identification

I've been loosely hacking on the Quora Question Pairs dataset in my free time to get some more experience working with vanilla Tensorflow for NLP in a practical setting. Yesterday,...

Installing and Updating GTX 1080 Ti Drivers / CUDA on Ubuntu

April 29, 2017 machine learning, python, nvidia, CUDA, drivers, tensorflow

I recently had to figure out how to set up a new Ubuntu 16.04 machine with NVIDIA's new GTX 1080 Ti graphics card for use with CUDA-enabled machine learning...

Nelson Liu's Blog