<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Nelson Liu's Blog]]></title><description><![CDATA[Nelson Liu's Blog]]></description><link>https://blog.nelsonliu.me/</link><image><url>https://blog.nelsonliu.me/favicon.png</url><title>Nelson Liu&apos;s Blog</title><link>https://blog.nelsonliu.me/</link></image><generator>Ghost 4.48</generator><lastBuildDate>Wed, 22 Apr 2026 11:07:44 GMT</lastBuildDate><atom:link href="https://blog.nelsonliu.me/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[PhD Statement of Purpose]]></title><description><![CDATA[<p>I&apos;ve been pretty liberal about sharing my PhD statement of purpose with folks who email me, so in the interest of open and equal access, I&apos;m putting them here for anyone to see. I also included some notes on thoughts I have looking back, with some</p>]]></description><link>https://blog.nelsonliu.me/2020/11/11/phd-personal-statement/</link><guid isPermaLink="false">5fab98cee0614f64c32bcb56</guid><dc:creator><![CDATA[Nelson Liu]]></dc:creator><pubDate>Wed, 11 Nov 2020 08:25:27 GMT</pubDate><content:encoded><![CDATA[<p>I&apos;ve been pretty liberal about sharing my PhD statement of purpose with folks who email me, so in the interest of open and equal access, I&apos;m putting them here for anyone to see. I also included some notes on thoughts I have looking back, with some things I would have changed if I were to rewrite it. 
If you&apos;re looking for advice on writing statements of purpose, I like <a href="https://docs.google.com/document/d/1lT-bsIP0GKfh8l5sQnM2hCzzR9prt-QLx16rimUOdIM/edit">this document</a> by Noah Smith (much of this advice shaped the final document below) and <a href="https://nschneid.medium.com/inside-ph-d-admissions-what-readers-look-for-in-a-statement-of-purpose-3db4e6081f80">this document</a> by Nathan Schneider. I applied in December 2018 [1]:</p><p><a href="https://nelsonliu.me/files/phd_sop/nfliu_phd_sop.pdf">PhD Statement of Purpose</a></p><p>I hesitate to do this because it&apos;s pretty cringe in hindsight (and my interests have definitely shifted considerably since writing this), but I think it does a reasonable job of avoiding a common failure mode: making your statement of purpose just your CV, but translated into paragraphs. I wrote a fair amount about the past projects that I worked on, but I mainly did so towards the goal of motivating / painting a coherent picture of how I reached my research interests at the time.</p><p>For example, the paragraph about my past work at AI2 was mostly to convey that I had hands-on experience with QA models and was pretty disillusioned by the gap between their (hyped) purported capabilities and their actual capabilities.</p><p>I also think the paragraph about future career goals could be omitted&#x2014;don&apos;t feel pressured to make any semblance of a decision of what you want to do post-PhD. I thought it might just be useful to mention, since I really enjoyed TAing as an undergrad and one of my letter-writers was someone who I had pretty much only interacted with in the context of TA-ing. You will not be held accountable for whatever career goals you set, people expect them to change :) (and I can confirm that your opinions will certainly develop during your PhD).</p><p>If I could change things, I&apos;d definitely write less about my past. 
For instance, I would shorten the part about the RepL4NLP paper&#x2014;there&apos;s no need to go into the concrete results; it would suffice to just say that my personal experience with the opacity of neural nets led me to do some initial work in the area, and motivated my future interests.</p><p>I also regret not being more concrete about specific projects I might want to pursue in the future, but I recognize that it&apos;s easy to say that in hindsight. As I was writing the statement, I was definitely afraid that my misinformed senior-year undergraduate research opinions would turn off any NLP faculty with the misfortune of reading my application, so I chose to be conservative instead and say less. Looking back, I think it might have actually looked better to have stronger personal opinions with more evidence for why I feel the way I do.</p><p>Have confidence in your ideas and research goals; you&apos;ve clearly thought them through, so use the statement to convey why <em>you</em> think these problems are interesting, and why your proposed solutions seem reasonable in light of prior work in the area.</p><p>[1]: For historical context, BERT was released in October 2018, and BERTology wasn&apos;t really a thing yet&#x2014;anyone writing about analyzing neural nets today probably needs to write more arguing why this is interesting, and what new things they want to bring to the table.</p>]]></content:encoded></item><item><title><![CDATA[Newer PyTorch Binaries for Older GPUs]]></title><description><![CDATA[<p>After recently upgrading some code to a newer version of PyTorch, I found that it would no longer successfully execute on NVIDIA Tesla K40 GPUs. 
In particular, I was seeing the following error whenever a model&apos;s <code>.forward()</code> function was called:</p><pre><code>RuntimeError: CUDA error: no kernel image is available</code></pre>]]></description><link>https://blog.nelsonliu.me/2020/10/13/newer-pytorch-binaries-for-older-gpus/</link><guid isPermaLink="false">5f1e405d1d23010b77dcb210</guid><dc:creator><![CDATA[Nelson Liu]]></dc:creator><pubDate>Tue, 13 Oct 2020 21:54:03 GMT</pubDate><content:encoded><![CDATA[<p>After recently upgrading some code to a newer version of PyTorch, I found that it would no longer successfully execute on NVIDIA Tesla K40 GPUs. In particular, I was seeing the following error whenever a model&apos;s <code>.forward()</code> function was called:</p><pre><code>RuntimeError: CUDA error: no kernel image is available for execution on the device</code></pre><p>Turns out that PyTorch v1.3.1 <a href="https://github.com/pytorch/pytorch/issues/30532#issuecomment-559258091">dropped support for GPUs with NVIDIA compute capability 3.5</a> in their prebuilt binaries that you&apos;d get from <code>pip</code> or <code>conda</code> &#x2014;the stated reason was that supporting these older GPUs would have pushed binary sizes past acceptable limits for distribution.</p><p>I built new binaries that add back support for these older GPUs (e.g., K40) and will be uploading them at <a href="https://github.com/nelson-liu/pytorch-manylinux-binaries/releases">https://github.com/nelson-liu/pytorch-manylinux-binaries/releases</a> for each PyTorch release. 
You can also find the download links at <a href="https://nelsonliu.me/files/pytorch/whl/torch_stable.html" rel="nofollow">nelsonliu.me/files/pytorch/whl/torch_stable.html</a>, and these binaries are pip-installable with the following command (change the PyTorch / CUDA versions as necessary):</p><pre><code>pip install torch==1.3.1+cu92 -f https://nelsonliu.me/files/pytorch/whl/torch_stable.html</code></pre>]]></content:encoded></item><item><title><![CDATA[NSF GRFP Application Materials]]></title><description><![CDATA[<p>I&apos;ve been pretty liberal about sharing my NSF GRFP materials with folks who email me, so in the interest of open and equal access, I&apos;m putting them here for anyone to see. I applied for the fellowship in late 2018 as a final-year undergraduate, and was a</p>]]></description><link>https://blog.nelsonliu.me/2020/10/13/nsf-grfp-materials/</link><guid isPermaLink="false">5f861cb1e0614f64c32bcb12</guid><dc:creator><![CDATA[Nelson Liu]]></dc:creator><pubDate>Tue, 13 Oct 2020 21:43:26 GMT</pubDate><content:encoded><![CDATA[<p>I&apos;ve been pretty liberal about sharing my NSF GRFP materials with folks who email me, so in the interest of open and equal access, I&apos;m putting them here for anyone to see. I applied for the fellowship in late 2018 as a final-year undergraduate, and was a fortunate recipient.</p><p><a href="http://nelsonliu.me/files/nsf_grfp/nfliu_nsf_grfp_research_statement.pdf">Research Statement</a></p><p><a href="http://nelsonliu.me/files/nsf_grfp/nfliu_nsf_grfp_reviews.pdf">Reviews</a></p><p>I was a bit personal in my personal statement, so I&apos;m not planning on sharing it publicly. At a high level, the structure looked like:</p><ul><li>Background / my motivations for pursuing a research career</li><li>The various past projects I&apos;ve worked on, with concrete examples of what I learned from each of them and how they&apos;ve helped me develop as a researcher. 
For instance, &quot;Designing this model was a highly iterative process, and took over a year&#x2014;struggling through numerous prototypes gave me experience with interpreting negative results to improve my method.&quot;</li><li>A section on broader impacts, mostly focusing on my participation in outreach activities and my involvement in open-source software development.</li><li>A brief paragraph on future goals.</li></ul><p>Good luck with your application!</p>]]></content:encoded></item><item><title><![CDATA[Fixing system permissions when writing to Docker volumes]]></title><description><![CDATA[<p>I&apos;ve been using Docker a lot recently; it&apos;s a great way to run old code (think 2016-era Theano code) and ensure reproducible setups across machines. I typically mount my source code as a Docker volume, so I read and write to the directory from my container.</p>]]></description><link>https://blog.nelsonliu.me/2020/03/28/fixing-permissions/</link><guid isPermaLink="false">5e7e9ca22aa85523bca5b6de</guid><dc:creator><![CDATA[Nelson Liu]]></dc:creator><pubDate>Sat, 28 Mar 2020 00:49:18 GMT</pubDate><content:encoded><![CDATA[<p>I&apos;ve been using Docker a lot recently; it&apos;s a great way to run old code (think 2016-era Theano code) and ensure reproducible setups across machines. I typically mount my source code as a Docker volume, so I read and write to the directory from my container.</p><p>However, because Docker containers run as <code>root</code> by default, the output files are owned by <code>root</code>, and I can&apos;t even delete them once I return to my system. A quick fix is to launch another Docker container that simply <code>chown</code>s everything in the current directory, recursively:</p><pre><code>docker run -it --rm \
    -v $(pwd):/workdir \
    --workdir /workdir \
    alpine \
    chown -R $(id -u):$(id -g) .</code></pre><p>Alternatively, you can often avoid creating root-owned files in the first place by passing <code>--user $(id -u):$(id -g)</code> to <code>docker run</code>, provided the image doesn&apos;t need to run as root.</p>]]></content:encoded></item><item><title><![CDATA[Student Perspectives on Applying to NLP PhD Programs]]></title><description><![CDATA[This post offers advice and perspectives on the NLP PhD application process. We asked 12 recent applicants a range of questions about their experience, and summarized the similarities and differences in the shared experience.]]></description><link>https://blog.nelsonliu.me/2019/10/24/student-perspectives-on-applying-to-nlp-phd-programs/</link><guid isPermaLink="false">5dafee482aa85523bca5b597</guid><category><![CDATA[nlp]]></category><category><![CDATA[phd]]></category><category><![CDATA[research]]></category><category><![CDATA[applications]]></category><category><![CDATA[advice]]></category><dc:creator><![CDATA[Nelson Liu]]></dc:creator><pubDate>Thu, 24 Oct 2019 17:23:03 GMT</pubDate><media:content url="https://blog.nelsonliu.me/content/images/2019/10/thumbnail.gif" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://blog.nelsonliu.me/content/images/2019/10/thumbnail.gif" alt="Student Perspectives on Applying to NLP PhD Programs"><p><strong>This post was written by</strong>: <a href="https://akariasai.github.io/">Akari Asai</a>, <a href="https://nlp.stanford.edu/~johnhew/">John Hewitt</a>, <a href="https://www.siddkaramcheti.com/">Sidd Karamcheti</a>, <a href="http://martiansideofthemoon.github.io/">Kalpesh Krishna</a>, <a href="http://nelsonliu.me">Nelson Liu</a>, <a href="http://cs.brown.edu/people/rpatel59/">Roma Patel</a>, and <a href="https://people.eecs.berkeley.edu/~nicholas_tomlin/">Nicholas Tomlin</a>.</p>
<p><strong>Thanks to our amazing survey respondents</strong>: <a href="https://akariasai.github.io/">Akari Asai</a>, <a href="https://ashkamath.github.io/">Aishwarya Kamath</a>,  <a href="https://www.siddkaramcheti.com/">Sidd Karamcheti</a>, <a href="http://martiansideofthemoon.github.io/">Kalpesh Krishna</a>, <a href="https://lucy3.github.io/">Lucy Li</a>, <a href="https://people.eecs.berkeley.edu/~kevinlin/">Kevin Lin</a>, <a href="http://nelsonliu.me">Nelson Liu</a>, <a href="https://sjmielke.com/">Sabrina Mielke</a>, <a href="http://cs.brown.edu/people/rpatel59/">Roma Patel</a>, <a href="https://people.eecs.berkeley.edu/~nicholas_tomlin/">Nicholas Tomlin</a>, <a href="http://www.ericswallace.com/">Eric Wallace</a>, and <a href="https://cs.stanford.edu/~myasu/">Michihiro Yasunaga</a>.</p>
<p>This post offers student advice and perspectives on the NLP PhD application process, with a focus on programs in the US. We asked twelve recently successful NLP PhD applicants a range of questions about the application process&#x2014;this post compiles the broader themes and advice that run through the majority of the responses. Make sure to check out <a href="https://drive.google.com/open?id=1hOSV633f59IKAdKNB6attME6O8qAIUB1"><strong>the complete set of responses</strong></a>! <a href="http://nelsonliu.me/files/nlp_phd_application_survey_responses.tar">A tarball is also available for those who cannot access Google Drive</a>.</p>
<p>&#x26A0;&#xFE0F;<strong>Disclaimer</strong>&#x26A0;&#xFE0F;: While we&#x2019;ve all gone through the application process and have thoughts to share, we aren&#x2019;t experts or authorities on this (highly random) process. Our advice comes from our unique perspectives and backgrounds, and not everything will generalize. That said, we hope that the differences and similarities in our shared experiences will be useful to consider.</p>
<p>Professors have also written advice to applicants from their side of the process; see <a href="http://martiansideofthemoon.github.io/2018/05/29/grad-resources.html">Kalpesh Krishna&#x2019;s compilation of graduate school application advice</a>.</p>
<h1 id="tableofcontentsanametableofcontentshreftableofcontentsiclassfafalinkia">Table of Contents <a name="table-of-contents" href="#table-of-contents"><i class="fa fa-link"></i></a></h1>
<ul>
<li><a href="#pre-application">Pre-application</a></li>
<li><a href="#statement-of-purpose">Statement of Purpose</a></li>
<li><a href="#letters-of-recommendation">Letters of Recommendation</a></li>
<li><a href="#publications">Publications</a></li>
<li><a href="#transcripts--grades">Transcripts / Grades</a></li>
<li><a href="#standardized-exams-gre--toefl">Standardized Exams: GRE / TOEFL</a></li>
<li><a href="#interviews--post-application-calls">Interviews / Post-application Calls</a></li>
<li><a href="#deciding-where-to-go">Deciding where to go</a></li>
<li><a href="#misc-topics">Misc. Topics</a></li>
<li><a href="#in-conclusion">In Conclusion</a></li>
</ul>
<hr>
<h1 id="preapplicationanamepreapplicationhrefpreapplicationiclassfafalinkia">Pre-application <a name="pre-application" href="#pre-application"><i class="fa fa-link"></i></a></h1>
<p>Deciding to apply at all is not an easy choice, and several respondents took additional time, either in school or in industry, to explore new fields and become more certain that pursuing a PhD was the right decision for them. Choosing where to apply is also an involved process, requiring trade-offs between factors like research-area fit, location, and (perceived) selectivity. This section explores this preliminary part of the application process, along with useful insights from applicants on different aspects of this decision.</p>
<p>A lot of the perspectives in this post are aimed towards people already seriously considering a PhD&#x2014;for instance, seniors or MS students. If you are a student considering a PhD, but still have a significant amount of time before you apply, <a href="https://nlp.stanford.edu//~johnhew//undergrad-researchers.html">John Hewitt&#x2019;s blog post</a> contains useful insights and advice on how to make the most of your time in school. In addition, <a href="http://martiansideofthemoon.github.io/2018/05/29/grad-resources.html">Kalpesh Krishna&#x2019;s extensive compilation of application advice</a> might yield things to keep in mind through the years.</p>
<h2 id="whyapplynow">Why apply now?</h2>
<p>For many of the respondents, starting a PhD was the natural &#x201C;next step&#x201D;&#x2014;they were in the final year of their undergraduate or masters degrees, and had spent enough time doing research to realize that a PhD was worth the opportunity cost to them.</p>
<blockquote>
<p>While I did not have any *ACL papers while applying...My goal was to get into a good PhD program and start doing research full-time (which is why I was applying to a PhD program in the first place) rather than get into the very best PhD program.<br>
&#x2013; Kalpesh Krishna</p>
</blockquote>
<p>Waiting to apply also has clear benefits&#x2014;many respondents felt that they would be stronger applicants after an additional year of research experience (and the associated publications and stronger letters of recommendation that might come with it).</p>
<blockquote>
<p>&#x201C;The year away from academia gave me the clarity on how much I really wanted to do a PhD and how much I love academic life. In this year I used my free time to explore interesting research directions and collaborated with friends. It made me realise that I enjoy research and to be able to do it for a living would be just perfect.&#x201D;<br>
&#x2013; Aishwarya Kamath</p>
</blockquote>
<blockquote>
<p>&#x201C;I was also unsure at that time what kinds of directions I wanted to go in or if I even wanted to commit so many years of my life to additional school...By the time fall 2018 came around, I&#x2019;d done a full year of thinking and growing my research skills, so I felt a lot better about diving into the process.&#x201D;<br>
&#x2013; Lucy Li</p>
</blockquote>
<p>Several people found value in waiting because it gave them the time to reflect on their next steps. For instance, Lucy and Aishwarya used the time to further develop their research interests and think about what areas were exciting to them. In particular, Aishwarya spent a year in industry, which made her realize what she was missing in an academic setting and drove her to apply and return.</p>
<p>On the other hand, several also offered caution about waiting with the sole intention of improving your profile. As PhD applications get more and more competitive each year, more papers or experience doesn&#x2019;t necessarily mean a stronger application since things are inherently relative. Several agreed that having publications at top conferences is not a necessary component of a strong application, especially if one has relatively limited research experience (e.g., applicants from undergrad) or has strong recommendation letters. <a href="https://timdettmers.com/2018/11/26/phd-applications/">A recent blog post about the machine learning PhD application process</a> investigates admission statistics at one of the top schools (Fall 2018), and shows that admission is not determined solely by publication records, but also depends on other factors, especially applicants&#x2019; backgrounds and letters of recommendation.</p>
<p>For instance, Kalpesh and Akari considered waiting a year since they did not have any top-tier NLP publications at the time, noting that:</p>
<ul>
<li>Things get more and more competitive each year, so more papers doesn&#x2019;t necessarily mean a stronger application since things are inherently relative.</li>
<li>Applicants with master&apos;s degrees are expected to have more publications and experience than undergraduate applicants.</li>
<li>There is a large amount of uncertainty involved in research / writing papers, so things are not always going to pan out for reasons out of your control.</li>
<li>They thought that they were still reasonably strong applicants for many of the places they were applying to.</li>
</ul>
<p>Kevin and Akari also mention that, if you have the resources, you can apply multiple times.</p>
<blockquote>
<p>If what you really want to do is to immediately get into a grad school and continue doing work that you are excited about, you should apply.<br>
&#x2013; Roma Patel</p>
</blockquote>
<h2 id="choosingwheretoapply">Choosing where to apply</h2>
<p>When choosing where to apply, the majority of respondents focused on a few factors:</p>
<ol>
<li>Overwhelmingly, the strongest factor for everyone was <em><strong>faculty</strong></em>: finding schools with professors that you&#x2019;d want to work with, and with a strong presence in allied fields. Several mentioned applying to places only if there were 2 or more relevant faculty.</li>
<li><em><strong>Location</strong></em> was also a key factor for many: finding schools in places that you think you&#x2019;d be happy living in for 5+ years.</li>
<li>Lastly, many also considered <em><strong>proximity to industry connections / possible external collaborators</strong></em>.</li>
</ol>
<p>Some also took the relative prestige of a school into account, with the thinking that prestigious schools attract strong peers, which means that you can learn more and work with amazing people.</p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://blog.nelsonliu.me/content/images/2019/10/number_of_applications.png" class="kg-image" alt="Student Perspectives on Applying to NLP PhD Programs" loading="lazy"></figure><!--kg-card-begin: markdown--><p>There&#x2019;s also a case to be made for applying to a mixture of (1) programs that you&#x2019;re relatively confident you can be admitted to and (2) &#x201C;top choice&#x201D; programs that might have a bit more randomness in the admissions process (of course, all the schools you apply to should be places you&#x2019;d be happy going to). However, it&#x2019;s easy to be a bit too conservative when choosing where to apply&#x2014;remember that you only really need 1 offer. The majority of respondents applied to between 8 and 13 schools, though almost everyone was happy with the number of applications they submitted (Kevin, who applied to 4, thought it would have been helpful to apply to more).</p>
<p>NLP applicants in particular are lucky&#x2014;there are amazing faculty scattered around the world in a variety of different environments. Start with a large list before filtering down, and focus on finding the right fit for you personally.</p>
<h2 id="talkingtofacultybeforehand">Talking to Faculty Beforehand?</h2>
<blockquote>
<p>I did not email faculty beforehand - I don&#x2019;t think this helps (and in the case of a poorly crafted email, could actually hurt!).<br>
&#x2013; Sidd Karamcheti</p>
</blockquote>
<p>The majority of students did not email faculty before applying. Some faculty ask students to reach out&#x2014;this will usually be explicitly mentioned on their webpage. In the absence of such a notice, a reasonable policy is to not send an email.</p>
<blockquote>
<p>But that said, if you are in the vicinity of a school or doing an academic visit -- feel free to reach out to the faculty there and ask if they have a half-hour slot to meet!<br>
&#x2013; Roma Patel</p>
</blockquote>
<blockquote>
<p>I emailed one prospective advisor and asked to meet at a conference. In general, I think this is a good strategy, especially if you have research-related things to talk about with them. (Which hopefully you will, if they&#x2019;re a good advisor fit!)<br>
&#x2013; Nicholas Tomlin</p>
</blockquote>
<p>Several respondents were fortunate to meet potential future advisors at workshops or conferences, or when they happened to be in the area, and found them to be quite receptive to short research meetings. It&#x2019;s good to go into these meetings with (1) a sense of what you&#x2019;d like to get out of the meeting and how to use it effectively, (2) an awareness of their recent work, and (3) a mental list of questions that you think have informative or interesting answers.</p>
<blockquote>
<p>...one of my undergrad advisors emailed a couple prospective grad advisors on my behalf, and asked them to look out for my application. I think this was particularly helpful and is maybe something worth mentioning to your undergraduate advisor.<br>
&#x2013; Nicholas Tomlin</p>
</blockquote>
<p>It is appropriate to <em>selectively</em> ignore advice about cold-emailing&#x2014;<a href="https://yonatanbisk.com/">Prof. Yonatan Bisk</a> has <a href="https://yonatanbisk.com/emailing_professors.html">a great guide that walks through the why, when, and how</a>.</p>
<p><a href="#table-of-contents">Back to the top.</a></p>
<hr>
<h1 id="statementofpurposeanamestatementofpurposehrefstatementofpurposeiclassfafalinkia">Statement of Purpose <a name="statement-of-purpose" href="#statement-of-purpose"><i class="fa fa-link"></i></a></h1>
<p>The statement of purpose is an opportunity for you to convey what you&#x2019;ve worked on and what you&#x2019;re interested in. Above all, make sure the statement is genuine and uniquely you. The &#x201C;accept/reject&#x201D; dichotomy of applications might make this process seem like a game&#x2014;leading many to believe that it&#x2019;s better to win the game (that is, be accepted) than to lose. While it&#x2019;s tempting to shape each application to say what you think faculty might want to hear, being yourself will lead to the best outcome in the end. Remember that programs and students are both looking for the right fit&#x2014;the statement is a fantastic opportunity for both sides to assess this.</p>
<blockquote>
<p>If your statement is genuine and makes clear why you want a PhD, it will resonate with the people you want it to resonate with.<br>
&#x2013; Sabrina Mielke</p>
</blockquote>
<h2 id="timelinewhentostartandfinishwriting">Timeline: When to Start and Finish Writing</h2>
<blockquote>
<p>With respect to starting writing, it is sometimes good to leave it late enough to wrap up any ongoing research projects at the end of the summer so you can write concrete things about them. For finishing writing, it&#x2019;s good to have a near-ready draft at least a month before.<br>
&#x2013; Roma Patel</p>
</blockquote>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://blog.nelsonliu.me/content/images/2019/10/statement_start_date.png" class="kg-image" alt="Student Perspectives on Applying to NLP PhD Programs" loading="lazy"></figure><!--kg-card-begin: markdown--><p>Try to set aside a fixed period of time to work on your statement. While starting earlier rather than later is usually better, try to start writing a draft once you think your current projects and interests are concrete enough to write something substantive. Strive to have a preliminary draft that you&#x2019;re happy with at least a month before the deadline. You can then send this to your advisors for feedback; continue editing and iterating until the deadline and/or you&#x2019;re happy with how things look.</p>
<h2 id="structuringastatementofpurpose">Structuring a Statement of Purpose</h2>
<blockquote>
<p>The goal of the statement is to talk about your past (research) experience, and how that has prepared you for a career in research (why you&#x2019;re qualified for grad school).<br>
&#x2013; Sidd Karamcheti</p>
</blockquote>
<p>Your statement of purpose should uniquely describe your research experience and elaborate on the process you went through as you undertook your first few research projects. Give enough detail about your past work to allow readers to assess its value and to concretely show that you knew what you were doing at every step of the process. Then fold this into the story of your research as a whole. Try to leverage insights from both the actual work as well as the experience of doing research to formulate how you would undertake future projects during your graduate school career.</p>
<blockquote>
<p>Many professors do tell you what they&#x2019;re looking for in a SoP (JHU CLSP for example has hints at <a href="https://www.clsp.jhu.edu/apply-for-phd/phd-admissions-faq/">https://www.clsp.jhu.edu/apply-for-phd/phd-admissions-faq/</a>), so do use that resource.<br>
&#x2013; Sabrina Mielke</p>
</blockquote>
<h2 id="tailoringeachstatementforspecificuniversities">Tailoring Each Statement for Specific Universities</h2>
<blockquote>
<p>I only tweaked the final paragraph. In this paragraph, I specifically mentioned 2--4 faculty that I wanted to work with and provided a one sentence rationale.<br>
&#x2013; Eric Wallace</p>
</blockquote>
<p>Our survey respondents were quite divided on this question. A few respondents significantly tweaked their statements for each university to reflect the subset of their interests relevant to the prospective advisor&#x2019;s research. However, most respondents kept 80-90% of their statement identical and only modified the last 1-2 paragraphs with university-specific information, such as the names of the professors they were interested in working with. Most agreed that it is good to have at least some university-specific information to form a connection between your own research goals and a prospective advisor&#x2019;s research directions.</p>
<blockquote>
<p>It is good to have concrete reasons laid out in your statement as to why you want to go to this school and work with these faculty on interesting problems. So definitely tweak the section of your statement that stresses on this.<br>
&#x2013; Roma Patel</p>
</blockquote>
<h2 id="gettingfeedbackonyourstatement">Getting Feedback on Your Statement</h2>
<blockquote>
<p>Your recommenders will get a better sense of your research interests so it can help them write your recommendation and they have also been through similar processes.<br>
&#x2013; Kevin Lin</p>
</blockquote>
<p>It is good to have a near-complete draft of your statement ready in time to send to your recommenders before they begin to write your letter of recommendation. There are multiple benefits to this. Reading your statement will help them better understand your research interests, which will not only allow them to write concretely about you in their letter, but might also prompt useful advice based on what they know of the people working in that research area. They will also usually give you feedback on the overall statement&#x2014;they have likely read countless statements over the course of their careers and will be able to fairly judge and evaluate yours in context. Your research advisors and recommenders are likely extremely knowledgeable and have your best interests at heart, so remember to ask for feedback and advice on your application!</p>
<h2 id="usingthisasalearningopportunity">Using this as a Learning Opportunity</h2>
<blockquote>
<p>In my statement, I mostly talked about my past experiences and how they feed into my current research interests. I tried to paint a picture that enables the reader to better understand how I reached / why I do the research I do.<br>
&#x2013; Nelson Liu</p>
</blockquote>
<p>Write out your journey as a researcher from the beginning to the present. This will convey important information about you and your research, which can be illuminating for both your reader and for yourself. Chances are that you will write dozens of similar statements in the future, whether they are research statements for fellowships, project proposals, or grant applications. Use this as a learning experience! Writing your statement of purpose is not only good practice for the future, but also a rare invitation to reflect upon your interests and motivations.</p>
<p><a href="#table-of-contents">Back to the top.</a></p>
<hr>
<h1 id="lettersofrecommendationanamelettersofrecommendationhreflettersofrecommendationiclassfafalinkia">Letters of Recommendation <a name="letters-of-recommendation" href="#letters-of-recommendation"><i class="fa fa-link"></i></a></h1>
<p>Letters of recommendation are <a href="https://www.cs.cmu.edu/~harchol/gradschooltalk.pdf">often cited</a> as the most important part of a PhD application. In our survey, every respondent marked letters as either the most or second-most important component. Given that the admissions committee is optimizing to admit candidates with a high likelihood of reliably producing excellent research, a letter from a fellow academic that effectively claims you&#x2019;ve been able to do so is a strong signal that you&#x2019;re a good candidate.</p>
<h2 id="whattolookforwhenchoosingletterwriters">What to look for when choosing letter writers</h2>
<blockquote>
<p>Your letter writers should be people who know you well enough to speak about your skills and your strengths as a PhD candidate ... people you have worked with who are doing relevant research in the field and people who have genuinely been advisors to you&#x2026;<br>
&#x2013; Roma Patel</p>
</blockquote>
<p>It can be helpful to view letter writers as your <strong>primary advocates</strong> in the admissions process. They want their excellent undergraduate students or research assistants to succeed, and they&#x2019;re singing your praises in order to argue for your spot in graduate school. From this view, it may be clear that they should know you, your strengths, and your goals. Of course, some of your letter writers will know you better than others, but each should be able to at least advocate for your excellence in how you worked or interacted with them.</p>
<p>There&#x2019;s often a tradeoff between (1) how well you know the letter writer, (2) how cool the work you did with them was, and (3) how well-known they are. As a first approximation, aim to have all three letter writers know you through some kind of research collaboration: simply doing well in someone&#x2019;s class or TAing for them does not necessarily make for a strong letter, whereas an industry researcher who can vouch for your research ability may be able to make a stronger statement. As for (3), letters from well-known members of the field are, perhaps unfortunately, (very) highly regarded. This may be due to fame bias: the professors on the admissions committee can rest assured that so-and-so from X university consistently recommends only excellent students. Fame will thus play some role in the tradeoff, but keep in mind that a famous professor who doesn&#x2019;t really know you won&#x2019;t write a strong letter.</p>
<p>Our respondents mentioned each of the components above: personal knowledge of you and your work, successful research, and the fame of the writer.</p>
<blockquote>
<p>I chose professors with whom I had completed somewhat successful research, and who were likely to be known by my prospective advisors. For better or worse (probably worse), connections between letter writers and prospective advisors seem to matter a lot.<br>
&#x2013; Nicholas Tomlin</p>
</blockquote>
<h2 id="whentostartlookingforrecommenders">When to start looking for recommenders</h2>
<p>People get started in research at different times, but by the time you apply, you need three people who can advocate for your spot in graduate school (though again, not all need to be equally strong or know you equally well). When should you start building these relationships? The easy answer is &#x201C;as early as possible&#x201D;. Research takes a long time, as does getting settled in a field and starting to make real progress. This creates a definite bias towards those who start research earlier and collaborate widely (3 professors means a lot of connections to make). However, everyone&#x2019;s research story looks different, and <strong>no student should think it &#x201C;too late&#x201D; to go for a PhD</strong> (though a master&#x2019;s and/or further years of research experience may be necessary).</p>
<p>To back this up, note the wide range of times that our respondents started working with the people who would end up being their LoR writers.</p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://blog.nelsonliu.me/content/images/2019/10/lor_writers_start_date.png" class="kg-image" alt="Student Perspectives on Applying to NLP PhD Programs" loading="lazy"></figure><!--kg-card-begin: markdown--><p>Note that this histogram includes one data point for each letter writer for each respondent. (Not everyone mentioned all three writers, and one mentioned four.) I counted &#x201C;summer before 3rd year&#x201D; as &#x201C;2nd year.&#x201D; That&#x2019;s a lot of letter writers from the third and fourth (!) years. Many respondents who met their letter writers after their third year did indicate that it would have been better to start earlier, but the data somewhat makes sense&#x2014;as you progress through your studies, you gain more research experience.</p>
<h2 id="askingforspecificsinyourletterandgettingthemsubmitted">Asking for specifics in your letter, and getting them submitted</h2>
<p>Recall that your letter writers are your advocates&#x2014;you should feel empowered to bring up all the awesome things that you did with them, and ask (but not demand) that they mention specific things. You might also ask them to tailor their letters to your statement of purpose. Think that your efforts in conducting replicable science in a world of AI hype are awesome? Your letter writer may agree, but likely wouldn&#x2019;t think to mention it if you don&#x2019;t remind them.</p>
<blockquote>
<p>I made sure to send a reminder email 2 weeks, then 1 week, then a few days before applications were due.<br>
&#x2013; Nelson Liu</p>
</blockquote>
<p>Likewise, remember that they&#x2019;re human and busy, and may well forget your letter if you don&#x2019;t send them a few reminders. PhD applications tend to have lenient letter-of-recommendation deadlines, but it&#x2019;s better to stay on top of them with tastefully-spaced reminder emails&#x2014;better not to test the waters in this context.</p>
<p><a href="#table-of-contents">Back to the top.</a></p>
<hr>
<h1 id="publicationsanamepublicationshrefpublicationsiclassfafalinkia">Publications <a name="publications" href="#publications"><i class="fa fa-link"></i></a></h1>
<blockquote>
<p>I think that having a published conference paper greatly increases your chances, but I think that papers are merely a signal for something more important: can you complete the full research process, from idea inception to experiment execution to writing things up?<br>
&#x2013; Nelson Liu</p>
</blockquote>
<p>Most respondents felt that publications are an important part of a strong application, but are not necessary if you have stellar recommendation letters talking about your research aptitude. Admission into PhD programs in computer science (especially at top schools) is quite competitive, and many candidates have publications, especially candidates applying after year-long research positions such as AI residency programs.</p>
<blockquote>
<p>Publications are just tangible evidence - if you can show other evidence that you are able to do research, that you learned something, that you have skills/conclusions that you&#x2019;ve taken away from the experience, then you should be fine.<br>
&#x2013; Sidd Karamcheti</p>
</blockquote>
<p>Publications are a good way to show concrete research output. This acts like &#x201C;hard evidence&#x201D; of research aptitude, which is the primary criterion used to judge PhD applicants. Alternative ways to show concrete research output could be excellent research code releases or insightful blog posts.</p>
<p><a href="#table-of-contents">Back to the top.</a></p>
<hr>
<h1 id="transcriptsgradesanametranscriptsgradeshreftranscriptsgradesiclassfafalinkia">Transcripts / Grades <a name="transcripts--grades" href="#transcripts--grades"><i class="fa fa-link"></i></a></h1>
<p>Almost all survey respondents thought that grades and GPA play only a minor role in NLP PhD admissions. It is wise not to stress too much about improving your GPA, especially if it compromises the time you spend doing research. Things might be different in more theoretical fields, though, where coursework is closer to research.</p>
<blockquote>
<p>Take an intro to NLP course! Take machine learning or a specific linguistics course or anything else that clearly shows that you have studied the topics you are excited about in depth.<br>
&#x2013; Roma Patel</p>
</blockquote>
<blockquote>
<p>Interesting classes off the beaten path may let you stand out from the crowd.<br>
&#x2013; Sabrina Mielke</p>
</blockquote>
<p>The choice of coursework typically acts like a skillset evaluation during PhD admissions, checking whether candidates are familiar with the fundamental techniques required to conduct their research. Coursework can also help present a coherent academic history when combined with the statement of purpose. Some courses might help an applicant stand out from the crowd, especially if they&#x2019;re uniquely relevant or off the beaten path.</p>
<blockquote>
<p>Sometimes, the exact preparation matters less than evidence that you&#x2019;re capable of learning important background material. E.g., despite me not having strong probability/stats background, a few professors said they were impressed by my (completely irrelevant) pure math background.<br>
&#x2013; Nicholas Tomlin</p>
</blockquote>
<p>While coursework does not play a major role in admission decisions, many respondents mentioned that courses are a great way to learn the fundamentals and get interested in a particular field, often acting like a precursor to research.</p>
<p><a href="#table-of-contents">Back to the top.</a></p>
<hr>
<h1 id="standardizedexamsgretoeflanamestandardizedexamsgretoeflhrefstandardizedexamsgretoefliclassfafalinkia">Standardized Exams: GRE / TOEFL <a name="standardized-exams-gre--toefl" href="#standardized-exams-gre--toefl"><i class="fa fa-link"></i></a></h1>
<blockquote>
<p>I get the sense that the GRE doesn&#x2019;t really matter unless you do abysmally.<br>
&#x2013; Nelson Liu</p>
</blockquote>
<p>Nearly everyone agreed that scores from required standardized tests are not deal-breaking as long as you meet a minimum threshold. A suspiciously low score could raise questions, but barring failing the exam, this should not significantly impact your application. That said, this is a required checkpoint, so set aside time to get it done correctly.</p>
<blockquote>
<p>There is no glory or shame in taking too much or too little time, so it is better to not compare to others and keep aside the right (and possibly minimal) amount of time you think you need to prepare.<br>
&#x2013; Roma Patel</p>
</blockquote>
<p>Try to give yourself at least 1-2 weeks of study time before the actual test. Don&#x2019;t compare against the amount of time others spend on this &#x2014; assess yourself and allocate more time to the topics you are uncertain about and think could use the extra effort. Review all the topics you need to, take a few practice tests, and then just take the exam without stressing about the score.</p>
<p>It is usually not worth the extra time, effort, and cost to retake the exam. So prepare well once, take the exam, and don&#x2019;t stress about the score once you are done with it. For what it&#x2019;s worth, this ambivalence towards test scores will likely only grow in future years &#x2014; lots of schools have already removed the GRE requirement, and others have definite plans to do so.</p>
<p>In general, international students must submit TOEFL (or IELTS) scores to demonstrate competency in English &#x2014; however, at some schools, international students who have received degrees from US schools or whose instruction was in English do not need to submit TOEFL scores. Unlike with the GRE, applicants MUST meet the minimum scores if a university sets them, and the minimum requirements vary from program to program. For example, the Cornell CS PhD program sets minimum scores for each section (Listening 15, Writing 20, Reading 20, Speaking 22), while the MIT EECS PhD program sets a total minimum score of 100. Make sure that you meet the TOEFL requirements before the application deadline; unfortunately, applicants whose TOEFL scores fall below the minimum are likely to be &#x201C;desk-rejected&#x201D;.</p>
<p><a href="#table-of-contents">Back to the top.</a></p>
<hr>
<h1 id="interviewspostapplicationcallsanameinterviewspostapplicationcallshrefinterviewspostapplicationcallsiclassfafalinkia">Interviews / Post-application Calls <a name="interviews--post-application-calls" href="#interviews--post-application-calls"><i class="fa fa-link"></i></a></h1>
<blockquote>
<p>Interviews in USA are less formal - more general discussions about research interests. Interviews for Europe in my experience were more in depth, as they expect you to already have knowledge of your field (since you can only apply after a Masters), have a research plan and expect you to have already surveyed literature in your chosen field of interest.<br>
&#x2013; Aishwarya Kamath</p>
</blockquote>
<p>The interviews and visit days will differ significantly over the range of schools you&#x2019;re considering&#x2014;both in their intended purpose and in the amount of information you can glean about the school and faculty from this one interaction. Some schools do pre-acceptance visit days, with offers conditioned on the interviews and ensuing discussions. Others do virtual interviews over the phone or video calls. And of course, some schools choose not to conduct interviews.</p>
<p>While each interview experience is largely dependent on the candidate in question, most of our survey respondents agreed that these conversations follow the same general pattern.</p>
<blockquote>
<p>The general format was like:</p>
<ol>
<li>&#x201C;Tell me about a research project you worked on (pick one that is most exciting and introduce)&#x201D;. The professor would ask some questions, like &#x201C;why did you consider this model / run this experiment?&#x201D;, &#x201C;what is the conclusion?&#x201D;, &#x201C;what did you learn through this project?&#x201D;</li>
<li>&#x201C;What is your research interest?&#x201D;, &#x201C;What are you interested in doing for your PhD (and your career)?&#x201D; -- it&#x2019;s good to think in both short term and long term</li>
<li>&#x201C;Do you have any questions?&#x201D; -- you can ask any questions about the lab, like the culture, research goals, how advising/meeting works.<br>
&#x2013; Michi Yasunaga</li>
</ol>
</blockquote>
<p>This is mostly a means of getting a sense of what you are like as a person and what your research interests are, to assess both compatibility and mutual interests. Your interviewers will generally ask you to talk about the research you have done &#x2014; and will interrupt with questions about things that they want to hear more about. Overall, this is less an assessment of your knowledge than a chance for them to see how you solve problems and talk about research.</p>
<blockquote>
<p>I didn&#x2019;t enjoy the whiteboard interview.<br>
&#x2013; Nicholas Tomlin</p>
</blockquote>
<p>This sometimes happens. If professors want to assess a specific component of your application, or want to know the extent of your knowledge about a certain topic, they will ask you technical questions: these can range from explaining or working through an algorithm to writing out equations or explaining computational and implementation-specific aspects of things you have done. Most of our survey respondents, however, did not have to go through this, and their interviews largely consisted of general research conversations.</p>
<blockquote>
<p>You should definitely know your own work inside-out, but don&#x2019;t stress about having to know every intricate detail about every subfield in NLP.<br>
&#x2013; Roma Patel</p>
</blockquote>
<p>While it is not important (or even possible) to know every little thing about every research area in NLP, you should be aware of work being done in areas related to yours. Most importantly, if you have written about something in your statement, you should be able to speak about it confidently and answer any questions thrown at you. Take time to review the details and ensure that you know the fundamentals of your work before your interview.</p>
<blockquote>
<p>Remember that this is a two way street&#x2014;while they&#x2019;re assessing whether you&#x2019;d be a good fit for their program, you should be probing whether this place / professor is a good match for you.<br>
&#x2013; Nelson Liu</p>
</blockquote>
<p>There is usually a part of the interview where the interviewer steps back and asks you to ask questions &#x2014; use this time to probe at any uncertainties or lingering questions that you have. If you have questions about their previous work, thoughts about future possibilities, or even just general questions about the program or the department, use this time to clear any doubts and get all the answers you will need to make a decision.</p>
<blockquote>
<p>if you don&#x2019;t know something, it is okay to say that you don&#x2019;t --- ask questions that help you understand it more and treat it as a learning experience.<br>
&#x2013; Roma Patel</p>
</blockquote>
<blockquote>
<p>The only thing I will tell you not to do in an interview: pretend. Professors are good at spotting that kind of thing and they will strongly judge you for it. Just be honest and genuine. You are starting your PhD. You don&#x2019;t need to know things -- just be willing to grow.<br>
&#x2013; Sabrina Mielke</p>
</blockquote>
<p>Also, don&#x2019;t worry if you do not know everything the interviewers ask. Just be as honest and genuine as you can, and show that you are willing to learn and grow rather than pretending to know things you don&#x2019;t.</p>
<blockquote>
<p>I think the interviews as an initial conversation really affected where I seriously considered&#x2014;the places with interviews that I thought were more fair / reasonable gained legitimacy. In the best case, it was basically a research conversation with a senior researcher, and a great opportunity to get feedback / hear what they think about the field. Overall, I thought they were quite valuable, and I wish that I had treated them less as assessments and more as opportunities.<br>
&#x2013; Nelson Liu</p>
</blockquote>
<p>Make the most of your interviews! All applicants agreed that overall, the interviews were friendly and engaging experiences. Think of this as an opportunity to speak about and answer questions about your work and to have a mutually engaging research conversation.</p>
<blockquote>
<p>One useful piece of advice from one of my undergrad advisors was to, &#x201C;Talk about your research ideas! Remember that what most faculty really want is to be able to discuss the research that is important to them &#x2014; and if you can do this and make exciting progress through these discussions, you will both mutually have a productive and happy career together.&#x201D;<br>
&#x2013; Roma Patel</p>
</blockquote>
<p><a href="#table-of-contents">Back to the top.</a></p>
<hr>
<h1 id="decidingwheretogoanamedecidingwheretogohrefdecidingwheretogoiclassfafalinkia">Deciding where to go <a name="deciding-where-to-go" href="#deciding-where-to-go"><i class="fa fa-link"></i></a></h1>
<p>If you&#x2019;re fortunate to be considering multiple options, congratulations! It is a hard problem, but a good one to have&#x2014;be aware of your privilege. The choice between graduate programs is an intensely personal one, and there are a variety of academic and non-academic factors to consider, all of which will influence your health, happiness, and productivity.</p>
<blockquote>
<p>Something that people do not always remember when making a decision is that your advisor is possibly someone you will be talking to for up to 3 hours every week for nearly 6 years of your life. It is good to rethink whether or not you will be happy doing this with the faculty in question, if the two of you see eye-to-eye, can comfortably talk about both research-things and also life-things when they come up, and that they will encourage and help guide you in everything you need to do the research that is important to you during your PhD.<br>
&#x2013; Roma Patel</p>
</blockquote>
<p>In general, most respondents agreed that the most important factor is your primary advisor&#x2014;who will you be working with during your PhD? Do you have mutual research interests? Are your communication and working styles compatible? Would you be comfortable talking to them about your struggles, both academic and non-academic? Do you have much to learn from them and their group? Do you feel supported by them? While it is hard to assess these deep questions before spending time to work with them, conversations and interactions during visit days will help you get a sense of whether things feel right. Trust your instinct&#x2014;if things feel odd or unnatural, even during these initial conversations, you have plenty of reason to reconsider and be hesitant.</p>
<blockquote>
<p>As an undergrad at a school with a large NLP community, I really benefited from having senior researchers around (e.g., grad students and postdocs)---I have so much to learn from them! I felt like I wanted to keep having such an environment in graduate school, which actually ended up being one of the defining factors in my final choice.<br>
&#x2013; Nelson Liu</p>
</blockquote>
<p>Many students also took note of the NLP community at every school they were considering. For instance, some prefer larger groups with many senior students and postdocs, while others prefer smaller, more-intimate groups. There are benefits and drawbacks to both sorts of research environments, and it ultimately boils down to personal preference and taste. It&#x2019;s important that you feel like you have enough people around to talk about research and life&#x2014;while your advisor is an important figure in the PhD, you will spend the majority of your time talking to and working alongside fellow students. Make sure that these are people that you&#x2019;d love to be around for the next stage of your research career.</p>
<blockquote>
<p>Sure, you&#x2019;re picking a place to do research for the next 5+ years of your life, but you also need to be happy / have a life outside of research...I went climbing during a lot of my visits, mostly to assess convenience.<br>
&#x2013; Nelson Liu</p>
</blockquote>
<p>Another important factor to consider is location. Several respondents expressed weather or culture preferences (mostly along the east-coast-vs-west-coast divide). Many also wanted to be somewhere affordable for students and conveniently located for their favorite hobbies or recreational activities. While research fit is certainly important, you won&#x2019;t be productive if you&#x2019;re miserable&#x2014;put your happiness and your health first, and make sure that you&#x2019;ll be happy as both a student on campus and as a resident of the area.</p>
<blockquote>
<p>Prestigious schools attract strong peers, which means you can learn more and collaborate with amazing people.<br>
&#x2013; Eric Wallace</p>
</blockquote>
<p>Several also considered the relative &#x201C;ranking&#x201D; of a university or program (though this is almost impossible to objectively evaluate without implicitly considering the other factors). While rankings can tell part of the story, they&#x2019;re no substitute for your own feelings and intuitions about where you belong.</p>
<blockquote>
<p>At some schools, it was very clear who my advisors would be, while at others, it wouldn&#x2019;t be decided until I&#x2019;d enrolled. I preferred the former scenario since it involved less uncertainty.<br>
&#x2013; Lucy Li</p>
</blockquote>
<p>It&#x2019;s also useful to consider the program&#x2019;s requirements and logistics around advising. Are you guaranteed to be able to work with the advisor(s) you are interested in? Does the department have extensive qualification exams or requirements that might be hindrances to your productivity? Will you have to worry about funding?</p>
<blockquote>
<p>Personal feelings actually do matter. If you feel (even slightly) uncomfortable, these negative feelings will grow during the five years.<br>
&#x2013; Akari Asai</p>
</blockquote>
<blockquote>
<p>Once you have done an extensive comparison on all parameters (professional and personal), you might be stuck between 2-3 very good options. Try reweighting the parameters and see if the balance shifts towards one end. If you are still confused, don&#x2019;t worry :) If it&#x2019;s so confusing, both places are surely very good. You will need to work very hard wherever you go, and you won&#x2019;t lose much choosing one over the other. Go with your heart.<br>
&#x2013; Kalpesh Krishna</p>
</blockquote>
<p>When it comes to the final decision, everyone agreed: go with your heart and what feels right to you. We&#x2019;re all logical and analytical people (perhaps to a fault), but if you can&#x2019;t make up your mind about where to go or are stuck between several options, pick the one that you feel the best about inside. One way to discern this: suppose you&#x2019;re picking between two places (this strategy generalizes to N). Take a coin, and assign one place to heads and another to tails. Tell yourself that the result of the coin flip will be where you end up going. Flip the coin, and observe the result. Are you relieved? Would you have preferred the other side? The answers to these questions might help you better understand how you really feel about the decision.</p>
<blockquote>
<p>Whatever you do end up deciding, though, don&#x2019;t regret it&#x2014;the decision is done now, and you just have to put in the work to ensure that it is a good one.<br>
&#x2013; Nelson Liu</p>
</blockquote>
<h2 id="makingthemostofvisitdays">Making the most of visit days</h2>
<blockquote>
<p>I didn&#x2019;t end up going to most visit days -- which is not something that you should do. Go to every visit day! Talk to the other students visiting, the other students currently pursuing PhDs there and to the faculty there. Keep a list of standard questions about schools (requirements, professors, exams, time taken) and make a note of these for every school so that you have an easy way to compare at decision-making time.<br>
&#x2013; Roma Patel</p>
</blockquote>
<p>Many of our survey respondents recommend making the most of the visit days. Treasure this priceless opportunity to talk to professors (both in and outside of your field), meet PhD students, and get to know the other students in your cohort. As you continue your academic career, you&#x2019;ll be seeing all of these people around in the future&#x2014;get to know them now!</p>
<blockquote>
<p>Talk to students most of all -- disturb them when they&#x2019;re working to see what it&#x2019;s like in the lab!<br>
&#x2013; Sabrina Mielke</p>
</blockquote>
<p>Before each visit, it&#x2019;s useful to think a bit about what you&#x2019;d like to get out of it. This might result in a list of questions you&#x2019;d like to answer, or people that you&#x2019;d like to talk to. Don&#x2019;t be afraid to contact PhD students in the department and ask to meet; the majority are happy to do so, and would love to give you advice, hear about what you&#x2019;re working on, and talk about their research. Talking to students is of the utmost importance; they will tell you what it&#x2019;s really like in the department, and it&#x2019;s useful for getting a sense of the overall department culture and graduate student community.</p>
<blockquote>
<p>My advisor, in her infinite wisdom, gave me a useful piece of insight that had not struck me before. &quot;What most people don&apos;t realise, is that the people that you are meeting and talking to over these visits will likely be in your life, for the rest of your life. Go to as many visits and talk to as many prospective students as you can &#x2014; some of your closest friends and advisors will come out of these interactions.&quot;<br>
&#x2013; Roma Patel</p>
</blockquote>
<p><a href="#table-of-contents">Back to the top.</a></p>
<hr>
<h1 id="misctopicsanamemisctopicshrefmisctopicsiclassfafalinkia">Misc. Topics <a name="misc-topics" href="#misc-topics"><i class="fa fa-link"></i></a></h1>
<h2 id="residencyprogramsasprecursorstoyourphd">Residency Programs as Precursors to your PhD</h2>
<blockquote>
<p>I see a couple benefits of working in AI residency which I did at AI2. 1) if you aren&#x2019;t sure if you want to do a PhD, this is a pretty good way to find out, and after the residency you will be in a reasonable position to pursue both industrial and PhD positions. 2) You will be exposed to a new set of people, and it is helpful to learn from different ways of doing research 3) I personally changed my research direction towards more NLP and this was a great way to explore different research topics and build up the skills I needed to pursue those topics.<br>
&#x2013; Kevin Lin</p>
</blockquote>
<blockquote>
<p>Be really really clear why you&#x2019;re doing the residency - the reason to do the residency/work is to do something you could not otherwise do at grad school/if you&#x2019;re not sure about grad school.<br>
&#x2013; Sidd Karamcheti</p>
</blockquote>
<p>It&#x2019;s really important to consider why you want to do a residency program. As our survey respondents mentioned, there are a few different paths that lead to residencies&#x2014;foremost among them is if you&#x2019;re not too sure about wanting to do a PhD, and you want some more research experience (working with a couple of different mentors with possibly different areas/interests than what you were exposed to as an undergraduate) before making a final decision.</p>
<p>Another reason a residency program is a good idea is if you&#x2019;re sure about doing a PhD, but had limited exposure to different areas as an undergraduate. Especially if you&#x2019;re considering PhD programs where you&#x2019;re paired with an advisor/placed in a specific area outright, having a year to explore a bunch of different areas and work with different mentors with different styles will let you make a more informed decision. It&#x2019;s totally possible that the residency program will introduce you to areas you would never have otherwise considered!</p>
<p>That being said, it&#x2019;s worth noting that not all residency opportunities are created equal. Several companies are only in their first or second year of offering residency programs, which means they&#x2019;re subject to growing pains: without structured onboarding and tutorials, you might spend a lot of time figuring out how to use company infrastructure, what different folks at the company are working on, and how research works in industry.</p>
<p>More importantly, you need to make sure your residency mentors are committed to the same goals that you are&#x2014;a mismatch in expectations between you and your residency mentors is going to significantly sour your experience! If you want to explore a bunch of different sub-areas of your chosen research area, make sure your mentor is on board to try a few different projects over the course of the year! If you want to instead work on more long-term projects/existing initiatives at the company, make sure that your host is willing to connect you with these existing teams, and that there&#x2019;s some structure in place that will let you (1) learn, and (2) contribute.</p>
<p>Finally, don&#x2019;t feel like you need to do a residency to get the industry experience, or to explore different research areas. There is definitely a large amount of time you can spend exploring different areas in grad school, and you&#x2019;ll have multiple summers to do internships where you&#x2019;ll possibly get to work on projects very different from your core research agenda.</p>
<blockquote>
<p>FWIW, you will likely intern at a lot of the places during the course of your PhD and will have a similar experience, so if the only reason you are considering a residency is because you think that is an experience you will never get at a later time --- this is likely not true.<br>
&#x2013; Roma Patel</p>
</blockquote>
<blockquote>
<p>When submitting my application, I was pretty sure that I would defer for a year if I got an offer---there&#x2019;s no rush, and the extra year might give me some interesting perspective.<br>
&#x2013; Nelson Liu</p>
</blockquote>
<p><a href="#table-of-contents">Back to the top.</a></p>
<hr>
<h1 id="inconclusionanameinconclusionhrefinconclusioniclassfafalinkia">In Conclusion <a name="in-conclusion" href="#in-conclusion"><i class="fa fa-link"></i></a></h1>
<p>If you&#x2019;ve read this far, we hope that this discussion was useful. The admissions process is inherently stochastic, and there&#x2019;s much that you can&#x2019;t control&#x2014;relax, have confidence in yourself, and good luck!</p>
<blockquote>
<p>Another good advice I received from my friend was &#x201C;Don&#x2019;t reject (by?) yourself&#x201D;. I remember how uneasy and stressful I felt at the time of application, as I did not have strong publication records, and came from non top undergraduate schools in the US. Sometimes people value your unique background, experience in other fields or find really positive signals in the letters of recommendation. Don&#x2019;t hesitate to apply for good schools, because &#x201C;I think I&#x2019;m not good enough&#x201D;.<br>
&#x2013; Akari Asai</p>
</blockquote>
<p><a href="#table-of-contents">Back to the top.</a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Software Archaeology: Re-generating the CoNLL 2000 Chunking Data]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>I&apos;ve been using the data from the <a href="https://www.clips.uantwerpen.be/conll2000/chunking/">CoNLL 2000 shared task on syntactic chunking</a> for some ongoing work, but the original dataset is tiny by modern standards. The train set is sections 15-18 of the Penn Treebank, and the test set is section 20---there is no development split.</p>]]></description><link>https://blog.nelsonliu.me/2018/10/28/re-generating-conll-2000-chunking-data/</link><guid isPermaLink="false">5d5962bc2aa85523bca5b522</guid><category><![CDATA[research]]></category><category><![CDATA[tooling]]></category><category><![CDATA[tutorial]]></category><category><![CDATA[open source]]></category><category><![CDATA[software archaeology]]></category><category><![CDATA[nlp]]></category><dc:creator><![CDATA[Nelson Liu]]></dc:creator><pubDate>Sun, 28 Oct 2018 00:43:53 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>I&apos;ve been using the data from the <a href="https://www.clips.uantwerpen.be/conll2000/chunking/">CoNLL 2000 shared task on syntactic chunking</a> for some ongoing work, but the original dataset is tiny by modern standards. The train set is sections 15-18 of the Penn Treebank, and the test set is section 20---there is no development split.</p>
<p>Since my specific application doesn&apos;t need to be comparable to past work and models on the task, I set about re-generating the data from a larger portion of the Penn Treebank. This was more involved than anticipated, maybe because the data and task are so old---I had to do a bit of software archaeology, and the steps are detailed below.</p>
<blockquote class="twitter-tweet" data-cards="hidden" data-lang="en"><p lang="en" dir="ltr">If you ever do a career in science with computers, you&apos;ll be doing software archeology more often than you might think: rewamping old code/simulations/analysis to work in new environments. <a href="https://t.co/4ZI2qOnwEe">pic.twitter.com/4ZI2qOnwEe</a></p>&#x2014; Gael Varoquaux (@GaelVaroquaux) <a href="https://twitter.com/GaelVaroquaux/status/952994313692164096?ref_src=twsrc%5Etfw">January 15, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<h2 id="step1sourcethescriptusedtogeneratethedata">Step 1: Source the script used to generate the data</h2>
<p>The CoNLL 2000 shared task site helpfully notes:</p>
<blockquote>
<p><a href="http://ilk.uvt.nl/team/sabine/homepage/software.html">http://ilk.uvt.nl/team/sabine/homepage/software.html</a><br>
The Perl script that was used for generating these training and test data sets from the Penn Treebank. It has been written by Sabine Buchholz from Tilburg University.</p>
</blockquote>
<p>However, following the link and proceeding to the script download results in a dead link. Perhaps that&apos;s to be expected, since it&apos;s been almost 20 years.</p>
<p>By searching for the filename on GitHub (a great tool for finding old software and scripts), I stumbled upon <a href="https://github.com/mgormley/concrete-chunklink">this repo</a> from Matt Gormley that has a modified version of the <code>chunklink</code> Perl script. Here&apos;s a gist to the script for posterity: <a href="https://gist.github.com/nelson-liu/4a1872d7062868cbc1affb545710b836">https://gist.github.com/nelson-liu/4a1872d7062868cbc1affb545710b836</a></p>
<h2 id="step2runthescriptusedtogeneratethedata">Step 2: Run the script used to generate the data.</h2>
<p>Perl was before my time, but I managed to run the script with the stock <code>perl</code> on my MacBook. Here&apos;s the output of <code>perl -v</code>:</p>
<pre><code>$ perl -v

This is perl 5, version 18, subversion 2 (v5.18.2) built for darwin-thread-multi-2level
(with 2 registered patches, see perl -V for more detail)
</code></pre>
<p>To run the script, I downloaded the Penn Treebank and wrote a quick bash script to invoke the script on each Penn Treebank section in turn, redirecting the output for each section to a file.</p>
<p>The files generated by <code>chunklink_2-2-2000_for_conll.pl</code> are not in the CoNLL 2000 format, so I wrote a separate Python script called <code>convert_to_conll2000_format.py</code> to massage the output into proper space-separated CoNLL chunking format. You can download that script here: <a href="https://gist.github.com/nelson-liu/4faaf5ccc67636939b299b289720ea94">https://gist.github.com/nelson-liu/4faaf5ccc67636939b299b289720ea94</a>, and it should be Python 2.x / 3.x compatible.</p>
<pre><code class="language-bash">#! /usr/bin/env bash
set -e

# Untar the raw PTB data
echo &quot;Unzipping raw PTB data&quot;
tar -xf treebank_3_LDC99T42.tgz

# Make chunking data for each PTB section
mkdir -p chunklink_generated_data
mkdir -p conll2000_data
for section_num in {00..24}
do
    echo &quot;Creating chunking data for section ${section_num}&quot;
    cat treebank_3/parsed/mrg/wsj/${section_num}/*.mrg | perl chunklink_2-2-2000_for_conll.pl -N -ns &gt; chunklink_generated_data/${section_num}.chunklink
    python convert_to_conll2000_format.py chunklink_generated_data/${section_num}.chunklink &gt; conll2000_data/${section_num}.conll
done
</code></pre>
<p>This produces a folder named <code>conll2000_data</code> with <code>00.conll</code>, <code>01.conll</code>, etc. with the CoNLL 2000-formatted data for each of the Penn Treebank sections. You can use <code>cat</code> to combine sections and create whatever train, dev, and test splits you might want.</p>
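<p>As a concrete sketch of that last step, here is a small Python helper that stitches per-section files into named split files. The helper name and the section groupings are my own illustration, not the canonical CoNLL 2000 splits:</p>

```python
from pathlib import Path

def combine_sections(data_dir, splits):
    """Concatenate per-section .conll files into one file per split.

    `splits` maps a split name to a list of two-digit section numbers,
    e.g. {"train": ["02", "03"], "dev": ["22"], "test": ["23"]}.
    """
    data_dir = Path(data_dir)
    for split_name, sections in splits.items():
        with (data_dir / (split_name + ".conll")).open("w", encoding="utf-8") as out_file:
            for section in sections:
                # Each section file already holds CoNLL 2000-formatted text.
                out_file.write((data_dir / (section + ".conll")).read_text(encoding="utf-8"))
```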
<p>Happy chunking!</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Extracting last timestep outputs from PyTorch RNNs]]></title><description><![CDATA[Here's some code I've been using to extract the last hidden states from an RNN with variable length input.]]></description><link>https://blog.nelsonliu.me/2018/01/25/extracting-last-timestep-outputs-from-pytorch-rnns/</link><guid isPermaLink="false">5d5962bc2aa85523bca5b521</guid><category><![CDATA[research]]></category><category><![CDATA[tooling]]></category><category><![CDATA[tutorial]]></category><category><![CDATA[machine learning]]></category><category><![CDATA[nlp]]></category><category><![CDATA[pytorch]]></category><dc:creator><![CDATA[Nelson Liu]]></dc:creator><pubDate>Thu, 25 Jan 2018 05:05:51 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>Here&apos;s some code I&apos;ve been using to extract the last hidden states from an RNN with variable length input. In the code example below:</p>
<ul>
<li><code>lengths</code> is a <em>list</em> of length <code>batch_size</code> with the sequence lengths for each element in the batch. It&apos;s a list because <code>pack_padded_sequence</code> also takes a list, so you probably already have it lying around.</li>
<li><code>batch_first</code> is a boolean indicating whether the RNN is in <code>batch_first</code> mode or not.</li>
<li><code>output</code> is the output of a PyTorch RNN as a <code>Variable</code>. If your output isn&apos;t a <code>Variable</code> for some reason, just remove the <code>Variable</code> call in the last line on <code>idx</code>.</li>
</ul>
<pre><code class="language-python">idx = (torch.LongTensor(lengths) - 1).view(-1, 1).expand(
    len(lengths), output.size(2))
time_dimension = 1 if batch_first else 0
idx = idx.unsqueeze(time_dimension)
if output.is_cuda:
    idx = idx.cuda(output.data.get_device())
# Shape: (batch_size, rnn_hidden_dim)
last_output = output.gather(
    time_dimension, Variable(idx)).squeeze(time_dimension)
</code></pre>
<p>Here&apos;s a full code example with an RNN and variable-length input, adapted from <a href="https://discuss.pytorch.org/t/simple-working-example-how-to-use-packing-for-variable-length-sequence-inputs-for-rnn/2120/14">an example on the PyTorch forums</a>:</p>
<pre><code class="language-python">import torch
from torch.autograd import Variable
import torch.nn as nn

batch_size = 4
max_length = 3
hidden_size = 2
n_layers = 1
input_dim = 1
batch_first = True

# Data
vec_1 = torch.FloatTensor([[1, 2, 3]])
vec_2 = torch.FloatTensor([[1, 2, 0]])
vec_3 = torch.FloatTensor([[1, 0, 0]])
vec_4 = torch.FloatTensor([[2, 0, 0]])

# Put the data into a tensor.
batch_in = torch.zeros((batch_size, max_length, input_dim))
batch_in[0] = vec_1
batch_in[1] = vec_2
batch_in[2] = vec_3
batch_in[3] = vec_4

# Wrap RNN input in a Variable. Shape: (batch_size, max_length, input_dim)
batch_in = Variable(batch_in)
# The lengths of each example in the batch. Padding is 0.
lengths = [3, 2, 1, 1]

# Wrap input in packed sequence, with batch_first=True
packed_input = torch.nn.utils.rnn.pack_padded_sequence(
    batch_in, lengths, batch_first=True)

# Create an RNN object, set batch_first=True
rnn = nn.RNN(input_dim, hidden_size, n_layers, batch_first=True) 

# Run input through RNN 
packed_output, _ = rnn(packed_input)

# Unpack, with batch_first=True.
output, _ = torch.nn.utils.rnn.pad_packed_sequence(
    packed_output, batch_first=True)
print(&quot;Unpacked, padded output: &quot;)
print(output)

# Extract the outputs for the last timestep of each example
idx = (torch.LongTensor(lengths) - 1).view(-1, 1).expand(
    len(lengths), output.size(2))
time_dimension = 1 if batch_first else 0
idx = idx.unsqueeze(time_dimension)
if output.is_cuda:
    idx = idx.cuda(output.data.get_device())
# Shape: (batch_size, rnn_hidden_dim)
last_output = output.gather(
    time_dimension, Variable(idx)).squeeze(time_dimension)
print(&quot;Last output: &quot;)
print(last_output)
</code></pre>
<p>and the output:</p>
<pre><code class="language-dark">Unpacked, padded output:
Variable containing:
(0 ,.,.) =
 -0.0279  0.8709
  0.7806  0.7903
  0.5799  0.9227

(1 ,.,.) =
  0.7244  0.7105
  0.5795  0.8988
  0.0000  0.0000

(2 ,.,.) =
 -0.7699  0.9169
  0.0000  0.0000
  0.0000  0.0000

(3 ,.,.) =
  0.4918  0.4545
  0.0000  0.0000
  0.0000  0.0000
[torch.FloatTensor of size 4x3x2]

Last output:
Variable containing:
 0.5799  0.9227
 0.5795  0.8988
-0.7699  0.9169
 0.4918  0.4545
[torch.FloatTensor of size 4x2]
</code></pre>
<p>As you can see, the code successfully extracted the last-timestep outputs for each example in the batch.</p>
<h2 id="somemorecontextforthosewhomightnotbesuperfamiliarwithpytorch">Some more context for those who might not be super familiar with PyTorch</h2>
<p>PyTorch RNNs return a tuple of <code>(output, h_n)</code>:</p>
<ul>
<li><code>output</code> contains the hidden states of the last RNN layer at <em>every</em> timestep --- this is usually what you want to pass downstream for sequence prediction tasks.</li>
<li><code>h_n</code> is the hidden state for <code>t=seq_len</code> (for all RNN layers and directions).</li>
</ul>
<p><code>output</code> is a tensor of shape <code>seq_len, batch_size, hidden_size * num_directions</code> if <code>batch_first=False</code> in the RNN, and it&apos;s a tensor of shape <code>batch_size, seq_len, hidden_size * num_directions</code> if <code>batch_first=True</code>.</p>
<p>If you&apos;re using an RNN with variable-length input (made possible with a <code>PackedSequence</code>), <code>seq_len</code> refers to the longest sequence in the <code>PackedSequence</code>. In this case, you often need to extract the output features for each example in the batch at its last timestep (where the &quot;last timestep&quot; is the length of that particular example&apos;s sequence). Note that it doesn&apos;t work to simply use <code>output[-1]</code> (or <code>output[:, -1]</code> if <code>batch_first=True</code>), since outputs beyond an example&apos;s length are padded with 0.</p>
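<p>Conceptually, the <code>gather</code>/<code>squeeze</code> combination is just fancy indexing: for each example <code>i</code> in the batch, it picks <code>output[i][lengths[i] - 1]</code>. Here is the same logic as a plain-Python sketch over nested lists, purely for illustration (no tensors, GPU support, or autograd):</p>

```python
def last_timestep_outputs(output, lengths):
    """Pick the hidden state at each example's final valid timestep.

    `output` is a batch-first nested list of shape
    (batch_size, max_length, hidden_size); `lengths` holds each
    example's true (unpadded) length.
    """
    return [example[length - 1] for example, length in zip(output, lengths)]
```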
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Flattening the Gigaword Corpus]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p><em>Code for flattening the Gigaword corpus and associated usage instructions are at <a href="https://github.com/nelson-liu/flatten_gigaword/">nelson-liu/flatten_gigaword</a></em></p>
<p>The <a href="https://catalog.ldc.upenn.edu/ldc2011t07">English Gigaword Corpus</a> is a massive collection of newswire text; the unzipped corpus is ~26 gigabytes, and there are ~4 billion tokens. It&apos;s a commonly used corpus for language modeling and</p>]]></description><link>https://blog.nelsonliu.me/2017/09/23/flattening-the-gigaword-corpus/</link><guid isPermaLink="false">5d5962bc2aa85523bca5b520</guid><dc:creator><![CDATA[Nelson Liu]]></dc:creator><pubDate>Sat, 23 Sep 2017 07:39:00 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p><em>Code for flattening the Gigaword corpus and associated usage instructions are at <a href="https://github.com/nelson-liu/flatten_gigaword/">nelson-liu/flatten_gigaword</a></em></p>
<p>The <a href="https://catalog.ldc.upenn.edu/ldc2011t07">English Gigaword Corpus</a> is a massive collection of newswire text; the unzipped corpus is ~26 gigabytes, and there are ~4 billion tokens. It&apos;s a commonly used corpus for language modeling and other NLP tasks that require large amounts of monolingual English data.</p>
<p>Despite its relative ubiquity, I couldn&apos;t find anything online to do something very simple --- extract the text from all the files in the corpus into one large text file. My motivation for doing this was to train an n-gram language model, but there are a variety of other uses for the flattened data as well.</p>
<h2 id="decompressing">Decompressing</h2>
<p>The Gigaword corpus comes with seven directories of data compressed in <code>gzip</code> format. Naturally, the first step is to unzip all of it. To recursively unzip all the data in these directories, pass the <code>-r</code> flag to <code>gunzip</code>.</p>
<pre><code>gunzip -r /gigaword_path/data/
</code></pre>
<p>If your <code>gunzip</code> doesn&apos;t have this flag, piping the results of <code>find</code> to <code>gunzip</code> should do the trick.</p>
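<p>If you would rather stay in Python, here is a rough stdlib-only equivalent of a recursive <code>gunzip</code>. Note that unlike <code>gunzip</code>, this sketch keeps the original <code>.gz</code> files around:</p>

```python
import gzip
import shutil
from pathlib import Path

def gunzip_recursive(root):
    """Decompress every .gz file under `root`, writing each result
    next to the original with the .gz suffix stripped."""
    for gz_path in Path(root).rglob("*.gz"):
        with gzip.open(gz_path, "rb") as src:
            with open(gz_path.with_suffix(""), "wb") as dst:
                shutil.copyfileobj(src, dst)
```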
<h2 id="parsingandtokenizinganindividualdatafile">Parsing and tokenizing an individual data file</h2>
<p>In each of the directories, there are a variable number of files. Each of these data files is in SGML format. To parse a single file, I used the <a href="https://www.crummy.com/software/BeautifulSoup/"><code>BeautifulSoup</code></a> library. Extracting the raw text was as simple as finding all the words between <code>&lt;p&gt;</code> tags.</p>
<p>However, after looking at the data I quickly realized that it includes the original linebreaks as found inside the newswire text. Thus, one sentence can often have multiple newlines within it --- this confuses many tokenizers. To deal with this, I replace all consecutive newlines with spaces, and then tokenize each paragraph (block of text in a <code>&lt;p&gt;</code> tag) with <a href="https://spacy.io/"><code>SpaCy</code></a>.</p>
<p>Thus, to parse a file, I:</p>
<ul>
<li>Iterate through all the paragraphs in the SGML</li>
<li>Extract the text, and tokenize it</li>
<li>Write a new flattened file with one paragraph per line.</li>
</ul>
<p>Each line in the output flattened file is thus a paragraph, and the tokens (as delimited by SpaCy) are space-separated. These files are perfectly compatible with language modeling toolkits like <a href="http://kheafield.com/code/kenlm/"><code>KenLM</code></a>.</p>
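<p>The newline cleanup is the only subtle part. As a sketch, collapsing the whitespace runs inside a paragraph before tokenization looks something like this (the real script then hands the result to SpaCy; no tokenization is shown here):</p>

```python
import re

def flatten_paragraph(paragraph_text):
    """Collapse newlines and other whitespace runs into single spaces
    so each paragraph ends up on one line."""
    return re.sub(r"\s+", " ", paragraph_text).strip()
```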
<p>The script to parse a single file is at <a href="https://github.com/nelson-liu/flatten_gigaword/blob/master/flatten_one_gigaword.py">flatten_one_gigaword.py</a></p>
<h2 id="makingitfastwithparallelprocessing">Making it fast with parallel processing</h2>
<p>Parsing one file can take quite a while (up to around 3 minutes). Combined with the fact that the Gigaword corpus has 1010 files, it&apos;s easy to see how processing the whole dataset can be quite slow.</p>
<p>However, the task is embarrassingly parallel, so let&apos;s use multiple cores to flatten files simultaneously and merge them all at the end! This was pretty easily accomplished with <a href="https://www.gnu.org/software/parallel/"><code>GNU parallel</code></a>, like so:</p>
<pre><code>find ${GIGAWORDDIR}/data/*/* | parallel --gnu --progress -j ${NUMJOBS} \
    python flatten_one_gigaword.py \
           --gigaword-path \{\} \
           --output-dir ${OUTPUTDIR}
</code></pre>
<p>This command <code>find</code>s all the data files in the Gigaword directory on the disk, and then runs the <code>flatten_one_gigaword.py</code> file on each of them. The output directory is where the flattened version of each data file is written, and we can simply <code>cat</code> them together at the end to get our desired output. The final output is a file named <code>flattened_gigaword.txt</code> with one paragraph per line and with tokens delimited by spaces.</p>
<pre><code>cat ${OUTPUTDIR}/*.flat &gt; ${OUTPUTDIR}/flattened_gigaword.txt
</code></pre>
<p>The script to parse the entire dataset in parallel is at <a href="https://github.com/nelson-liu/flatten_gigaword/blob/master/flatten_all_gigaword.sh">flatten_all_gigaword.sh</a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Paraphrase Identification Models in Tensorflow]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>I&apos;ve been loosely hacking on the <a href="https://www.kaggle.com/c/quora-question-pairs">Quora Question Pairs</a> dataset in my free time to get some more experience working with vanilla Tensorflow for NLP in a practical setting. Yesterday, I opened sourced the code I&apos;ve written (with some contributions from <a href="https://github.com/ohkhan">Omar Khan</a>, thanks!) and you</p>]]></description><link>https://blog.nelsonliu.me/2017/05/20/paraphrase-identification-in-tensorflow/</link><guid isPermaLink="false">5d5962bc2aa85523bca5b51b</guid><category><![CDATA[tensorflow]]></category><category><![CDATA[machine learning]]></category><category><![CDATA[open source]]></category><category><![CDATA[python]]></category><category><![CDATA[nlp]]></category><category><![CDATA[paraphrase-identification]]></category><dc:creator><![CDATA[Nelson Liu]]></dc:creator><pubDate>Sat, 20 May 2017 17:44:50 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>I&apos;ve been loosely hacking on the <a href="https://www.kaggle.com/c/quora-question-pairs">Quora Question Pairs</a> dataset in my free time to get some more experience working with vanilla Tensorflow for NLP in a practical setting. Yesterday, I opened sourced the code I&apos;ve written (with some contributions from <a href="https://github.com/ohkhan">Omar Khan</a>, thanks!) and you can <a href="https://github.com/nelson-liu/paraphrase-id-tensorflow">find it on GitHub (nelson-liu/paraphrase-id-tensorflow)</a>.</p>
<p>To be specific, I implemented the following models in the repo:</p>
<ul>
<li>
<p>A basic Siamese LSTM baseline, loosely based on the model<br>
in<br>
<a href="https://www.semanticscholar.org/paper/Siamese-Recurrent-Architectures-for-Learning-Sente-Mueller-Thyagarajan/6812fb9ef1c2dad497684a9020d8292041a639ff">Mueller, Jonas and Aditya Thyagarajan. &quot;Siamese Recurrent Architectures for Learning Sentence Similarity.&quot; AAAI (2016).</a></p>
</li>
<li>
<p>A Siamese LSTM model with an added &quot;matching layer&quot;, as described<br>
in<br>
<a href="https://www.semanticscholar.org/paper/Learning-Natural-Language-Inference-using-Bidirect-Liu-Sun/f93a0a3e8a3e6001b4482430254595cf737697fa">Liu, Yang et al. &quot;Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention.&quot; CoRR abs/1605.09090 (2016)</a>.</p>
</li>
<li>
<p>The more-or-less state of the art Bilateral Multi-Perspective Matching model<br>
from<br>
<a href="https://www.semanticscholar.org/paper/Bilateral-Multi-Perspective-Matching-for-Natural-L-Wang-Hamza/b9d220520a5da7d302107aacfe875b8e2977fdbe">Wang, Zhiguo et al. &quot;Bilateral Multi-Perspective Matching for Natural Language Sentences.&quot; CoRR abs/1702.03814 (2017)</a>.</p>
</li>
</ul>
<p>Anecdotally, when I was first starting out with Tensorflow, by far the most effective learning strategy for me was to (1) read a paper, (2) find an open source implementation of it, and then (3) read through the code. Through this, I reinforced my knowledge of both the work described in paper as well as how people write Tensorflow code in practice.</p>
<p>Unfortunately, the largest issue was mostly (2) --- it&apos;s hard to find well-written projects that explain what&apos;s going on with minimal magic involved. This is fairly understandable, though, as research code is basically written ... for the sole purpose of research. Other people are rarely going to run your code after the fact, and frankly researchers get a gold star just for open-sourcing anything regardless of quality; there&apos;s thus little incentive to actually write your code as if other people would use it.</p>
<p>To this end, I&apos;ve taken painstaking care to implement the models the &quot;right way&quot; by adhering to Tensorflow best practices, documenting the code (through function docstrings and comments in general), and providing tests (98% coverage + running on a CI server).</p>
<p>Hopefully someone finds this helpful, and I&apos;m happy to answer questions about the project and/or provide more info about how to get started with Tensorflow.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Installing and Updating GTX 1080 Ti Drivers / CUDA on Ubuntu]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>I recently had to figure out how to set up a new Ubuntu 16.04 machine with NVIDIA&apos;s new GTX 1080 Ti graphics card for use with CUDA-enabled machine learning libraries, e.g. Tensorflow and PyTorch; since the card (as of this writing) is relatively new, the process</p>]]></description><link>https://blog.nelsonliu.me/2017/04/30/installing-and-updating-gtx-1080-ti-cuda-drivers-on-ubuntu/</link><guid isPermaLink="false">5d5962bc2aa85523bca5b518</guid><category><![CDATA[machine learning]]></category><category><![CDATA[python]]></category><category><![CDATA[nvidia]]></category><category><![CDATA[CUDA]]></category><category><![CDATA[drivers]]></category><category><![CDATA[tensorflow]]></category><dc:creator><![CDATA[Nelson Liu]]></dc:creator><pubDate>Sun, 30 Apr 2017 00:41:01 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>I recently had to figure out how to set up a new Ubuntu 16.04 machine with NVIDIA&apos;s new GTX 1080 Ti graphics card for use with CUDA-enabled machine learning libraries, e.g. Tensorflow and PyTorch; since the card (as of this writing) is relatively new, the process was pretty involved. The same tricks should also work for the newer Titan Xp graphics card.</p>
<p>Edits<br>
(02/01/2019):<br>
I&apos;ve updated the install instructions to use driver version 410 (necessary for CUDA 10, but should retain backwards compatibility with older CUDA versions).</p>
<p>(1/27/2018):<br>
Tensorflow 1.5.0 and PyTorch 0.3 now have pre-built binaries for CUDA 9. <strong>If you install CUDA 9, the driver version that comes with it should be fully compatible with the 1080 Ti</strong>. You can easily install CUDA 9 on most Linux distributions with your package manager (<a href="http://docs.nvidia.com/cuda/cuda-installation-guide-linux/#package-manager-installation">see here for details</a>).</p>
<p>If you want to use CUDA 8 for some reason (e.g. using an older Tensorflow), read on...</p>
<p>(5/10/2017):<br>
Looks like driver version 381 is out of beta and on the PPA, so I&apos;ve updated the recommended driver versions and install instructions accordingly.</p>
<h3 id="1installcudawithoutthedriver">1. Install CUDA without the driver</h3>
<p>I couldn&apos;t just install CUDA and have it work, since certain CUDA version (e.g., 8.0) come with a driver version (in the case of CUDA 8.0, driver version <code>375.26</code>) that doesn&apos;t support the GTX 1080 Ti and other newer cards. As a result, installing CUDA from <code>apt-get</code> doesn&apos;t work since it installs this driver version. Thus, <strong>you have to <a href="http://docs.nvidia.com/cuda/cuda-installation-guide-linux/#runfile">install with the runfile</a>, to opt-out of installing the driver.</strong></p>
<p>When running the installer, make sure to not install the driver that comes with CUDA. We&apos;ll install the driver with <code>apt-get</code> in the next step.</p>
<p><strong>Post Install Notes</strong> (Thanks to Jake Boggan for mentioning this in the comments): After installing, check that the CUDA folders are where you expect them to be (usually <code>/usr/local</code>). The CUDA installer creates a symlink at <code>/usr/local/cuda</code> that automatically points to the version of CUDA installed.</p>
<p>Make sure to add <code>/usr/local/cuda/bin</code> to your <code>$PATH</code>, and <code>/usr/local/cuda/lib64</code> to your <code>$LD_LIBRARY_PATH</code> if you&apos;re on a 64-bit machine / <code>/usr/local/cuda/lib</code> to your <code>$LD_LIBRARY_PATH</code> if you&apos;re on a 32-bit machine. There&apos;s a bit more info at the <a href="http://docs.nvidia.com/cuda/cuda-installation-guide-linux/#post-installation-actions">CUDA docs</a>, but the paths will likely differ based on version so be sure to manually verify that the folders you&apos;re adding to the environment variables exist.</p>
<h3 id="2installingthedriverwithaptget">2. Installing the driver with apt-get</h3>
<p>To install the driver with <code>apt-get</code>, I used the Ubuntu <a href="https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa">graphics-drivers PPA</a>. This method isn&apos;t officially supported by NVIDIA, but it seems to work well for many people.</p>
<p>At the <a href="https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa">graphics-drivers PPA homepage</a>, there&apos;s a listing of the various graphics drivers that they offer; check the <a href="http://www.nvidia.com/Download/index.aspx">NVIDIA download website</a> to figure out what version of the driver you need for your card. If it&apos;s in the PPA, great! If not, you unfortunately have to wait for them to add it. They&apos;re pretty timely, though.</p>
<p>Add the PPA to <code>apt-get</code> and update the index by running:</p>
<pre><code class="language-language-bash">sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
</code></pre>
<p>Now, we use it to install the desired driver versions (Major version <code>410</code> as of this writing):</p>
<pre><code class="language-language-bash">sudo apt-get install nvidia-410
</code></pre>
<p>Reboot your computer, and the GPU should run on the new driver. To verify, run <code>nvidia-smi</code> and confirm that the <code>Driver Version</code> at the top of the output is what you expect and that the rest of the information looks good.</p>
<p>You should now be able to fire up Python and <a href="https://www.tensorflow.org/tutorials/using_gpu#logging_device_placement">test that it works with Tensorflow</a> or your favorite deep learning framework.</p>
<h3 id="3verifyingtheinstallationworked">3. Verifying the installation worked</h3>
<h4 id="cuda">CUDA</h4>
<p>To test the CUDA installation, you can run the <code>deviceQuery</code> example bundled with CUDA. If you navigate to the CUDA samples folder (<code>/usr/local/cuda#.#/samples</code> or <code>~/NVIDIA_CUDA-#.#_Samples</code> by default), you can find the <code>deviceQuery</code> example in <code>&lt;samples_dir&gt;/1_Utilities/deviceQuery</code>.</p>
<p>Running <code>make</code> in this directory should compile the CUDA source file into a binary that prints a variety of statistics about your GPU and runs some tests on it. Run the binary with <code>./deviceQuery</code>, and you should see a bunch of output about your device; here&apos;s <a href="https://gist.github.com/nelson-liu/623eb54d977c98db005eaf2fbc449238">my output with a 1080 Ti for comparison</a>.</p>
<h4 id="drivers">Drivers</h4>
<p>If the driver installation went properly, you should be able to run <code>nvidia-smi</code> and get an output like the one below (the memory usage / temp / fan / GPU utilization will probably differ, since this was measured under load). Make sure that the version displayed in the top-left corner is the same as the one you expect:</p>
<pre><code>Sun May  7 19:54:19 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.09                 Driver Version: 381.09                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 0000:02:00.0     Off |                  N/A |
| 42%   73C    P2   194W / 250W |   8417MiB / 11172MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
</code></pre>
<p>When I was using driver version 378, it oddly didn&apos;t show the name as <code>GeForce GTX 108...</code>, but rather as just <code>Graphics Device</code>. The card worked fine with TensorFlow, though.</p>
<p>It&apos;d probably be a good idea to test that your GPU works with your machine learning library of choice, <a href="https://www.tensorflow.org/tutorials/using_gpu#logging_device_placement">here are instructions for doing so on Tensorflow</a>.</p>
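<p>If you want to check the driver version from a script rather than by eye, the version string can be pulled out of the <code>nvidia-smi</code> banner with a small helper like the (hypothetical) one below; in practice you would pass it the decoded output of <code>subprocess.check_output</code>:</p>

```python
import re

def parse_driver_version(nvidia_smi_output):
    """Extract the driver version (e.g. '381.09') from the banner at
    the top of nvidia-smi's output; returns None if it is missing."""
    match = re.search(r"Driver Version:\s*([\d.]+)", nvidia_smi_output)
    return match.group(1) if match else None
```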
<h3 id="forthefutureupdatingtheaptgetdrivers">For the future: updating the apt-get drivers</h3>
<p>It&apos;s pretty easy to upgrade the drivers to a different version.</p>
<p>First, remove the old drivers:</p>
<pre><code>sudo apt-get purge nvidia*
</code></pre>
<p>Now, just install the new driver with the PPA as detailed above and reboot.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Making autoenv + conda faster]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>I&apos;ve recently switched over to using the fantastic <a href="https://github.com/kennethreitz/autoenv">autoenv</a> to automatically activate my anaconda environments and set necessary environment variables when I enter a directory on my terminal. You basically write some bash code in a <code>.env</code> file, put it into a directory, and autoenv will automatically run</p>]]></description><link>https://blog.nelsonliu.me/2017/03/26/making-autoenv-conda-faster/</link><guid isPermaLink="false">5d5962bc2aa85523bca5b517</guid><category><![CDATA[python]]></category><category><![CDATA[conda]]></category><category><![CDATA[tooling]]></category><dc:creator><![CDATA[Nelson Liu]]></dc:creator><pubDate>Sun, 26 Mar 2017 21:51:32 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>I&apos;ve recently switched over to using the fantastic <a href="https://github.com/kennethreitz/autoenv">autoenv</a> to automatically activate my anaconda environments and set necessary environment variables when I enter a directory on my terminal. You basically write some bash code in a <code>.env</code> file, put it into a directory, and autoenv will automatically run <code>.env</code> when you enter the directory or any of its subdirectories.</p>
<p>However, I found that putting just <code>source activate desired_environment</code> in my <code>.env</code> (to activate the <code>desired_environment</code> conda environment) made my shell <em>very</em> slow: I&apos;d have to wait ~2 seconds after issuing a <code>cd</code> into a directory with a <code>.env</code> file (or a subdirectory of one).</p>
<p>The following bash snippet makes activating conda environments with <code>autoenv</code> a lot faster:</p>
<pre><code class="language-bash">current_environment=&quot;&quot;
environment_to_activate=test

# $CONDA_PREFIX is non-empty when in an environment
if [[ $CONDA_PREFIX != &quot;&quot; ]]; then
  # Get the name of the environment from the path
  current_environment=&quot;${CONDA_PREFIX##*/}&quot;
fi

if [[ &quot;$current_environment&quot; != &quot;$environment_to_activate&quot; ]]; then
  # We are not in the environment to activate, so activate it.
  source activate $environment_to_activate
fi
</code></pre>
<p>The snippet basically checks if you&apos;re already in the conda environment you want to activate (called <code>test</code> in this case, and assigned to <code>environment_to_activate</code>), and doesn&apos;t rerun the slow <code>activate</code> script if you are. Handy!</p>
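<p>To see exactly what the <code>${CONDA_PREFIX##*/}</code> expansion is doing, you can try it in isolation (the path below is a made-up example):</p>
<pre><code class="language-bash"># Hypothetical prefix for an active conda environment named "test".
CONDA_PREFIX="/home/nelson/miniconda3/envs/test"

# ##*/ deletes the longest leading match of "*/",
# leaving just the last path component: the environment name.
current_environment="${CONDA_PREFIX##*/}"
echo "$current_environment"  # prints: test
</code></pre>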
<p>To use this snippet, just drop it into your <code>.env</code> file and replace <code>test</code> with the name of whatever environment you want to activate; your shell should feel a lot less slow.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Easy Progress Bars For Python File Reading with tqdm]]></title><description><![CDATA[I've been a fan of the tqdm Python module for quite some time, but I found it difficult to find a reason to use it.]]></description><link>https://blog.nelsonliu.me/2016/07/30/progress-bars-for-python-file-reading-with-tqdm/</link><guid isPermaLink="false">5d5962bc2aa85523bca5b511</guid><category><![CDATA[python]]></category><dc:creator><![CDATA[Nelson Liu]]></dc:creator><pubDate>Sat, 30 Jul 2016 03:47:00 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>I&apos;ve been a fan of the <a href="https://github.com/noamraph/tqdm"><code>tqdm</code></a> Python module for quite some time, but I found it difficult to find a reason to use it; generally, loops run fast enough that a progress bar is unnecessary. However, I found a perfect use for it in reading large files.</p>
<p>If the task isn&apos;t something I can speed up via <code>multiprocessing</code>, I can use <code>tqdm</code> to decide whether I can grab a cup of coffee or work on something else while I let it run. <code>tqdm</code> allows me to easily add a progress bar to the read operation, like so:</p>
<pre><code class="language-python">from tqdm import tqdm

with open(file_path) as f:
    for line in tqdm(f, total=get_num_lines(file_path)):
        # various operations here
        pass
</code></pre>
<p>As you can see, adding this functionality is as simple as wrapping the file object with <code>tqdm</code>. However, to display a meaningful progress bar, <code>tqdm</code> needs to know how many total lines it will process. I use <a href="http://stackoverflow.com/a/850962/1877942">this code snippet</a> from StackOverflow to quickly get that number when instantiating the progress bar:</p>
<pre><code class="language-python">import mmap

def get_num_lines(file_path):
    # Map the file read-only, and close the handle when done.
    with open(file_path, &quot;r&quot;) as fp:
        buf = mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ)
        lines = 0
        while buf.readline():
            lines += 1
        return lines
</code></pre>
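<p>If you&apos;d rather skip <code>mmap</code>, a plain loop over the file gives the same count (it can be slower on very large files, but it&apos;s dead simple). Here&apos;s a sketch, with a made-up helper name, sanity-checked on a throwaway temp file:</p>
<pre><code class="language-python">import tempfile

def count_lines(path):
    # Let Python iterate over the lines and count them.
    with open(path) as f:
        return sum(1 for _ in f)

# Sanity check on a small throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\nthree\n")

print(count_lines(tmp.name))  # prints: 3
</code></pre>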
<p>Here&apos;s what it looks like in action:<br>
<video width="702" height="514" autoplay loop poster="http://thumbs.gfycat.com/ElasticSameAmericanrobin-poster.jpg"><br>
<source src="http://zippy.gfycat.com/ElasticSameAmericanrobin.webm" type="video/webm"><br>
<source src="http://zippy.gfycat.com/ElasticSameAmericanrobin.mp4" type="video/mp4"><br>
</video></p>
<p>Pretty neat, in my opinion!</p>
<p>If you have any questions, comments, or suggestions, you&apos;re welcome to leave a comment below.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item></channel></rss>