Nelson Liu's Blog

Software Archaeology: Re-generating the CoNLL 2000 Chunking Data

I've been using the data from the CoNLL 2000 shared task on syntactic chunking for some ongoing work, but the original dataset is tiny by modern standards. The train set...

Extracting last timestep outputs from PyTorch RNNs

Here's some code I've been using to extract the last hidden states from an RNN with variable-length input....

Flattening the Gigaword Corpus

Code for flattening the Gigaword corpus and associated usage instructions are at nelson-liu/flatten_gigaword. The English Gigaword Corpus is a massive collection of newswire text; the unzipped corpus is...

Self-hosted CI for Research, Part 1: Running Jenkins builds in Docker Containers

This post is part of my series on setting up a self-hosted continuous integration server. Part 0 has a table of contents. Using Docker containers for Jenkins builds is attractive...

Self-hosted CI for Research, Part 0: Introduction and Motivation

This month, I'll be writing about how I set up my self-hosted continuous integration server (powered by Jenkins and Docker). In this initial post, I wanted to provide some motivation...