Nelson Liu's Blog

Fixing system permissions when writing to Docker volumes

I've been using Docker a lot recently; it's a great way to run old code (think 2016-era Theano code) and ensure reproducible setups across machines. I typically mount my source...

Student Perspectives on Applying to NLP PhD Programs

This post was written by: Akari Asai, John Hewitt, Sidd Karamcheti, Kalpesh Krishna, Nelson Liu, Roma Patel, and Nicholas Tomlin. Thanks to our amazing survey respondents: Akari Asai, Aishwarya Kamath,...

Software Archaeology: Re-generating the CoNLL 2000 Chunking Data

I've been using the data from the CoNLL 2000 shared task on syntactic chunking for some ongoing work, but the original dataset is tiny by modern standards. The train set...

Extracting last timestep outputs from PyTorch RNNs

Here's some code I've been using to extract the last hidden states from an RNN with variable-length input....

Flattening the Gigaword Corpus

Code for flattening the Gigaword corpus and associated usage instructions are at nelson-liu/flatten_gigaword. The English Gigaword Corpus is a massive collection of newswire text; the unzipped corpus is...