30 March 2015

Data Carpentry hackathon for genomics

I'm pleased to report back from the Data Carpentry (DC) genomics hackathon, which I attended last week with ~26 other folks at Cold Spring Harbor Laboratory in New York. The goal of this meeting was to develop modules for a DC workshop focused on analysis of next-generation sequencing and other genomic data. The original DC lessons were designed for a very general audience using ecological data, so we were tasked with outlining, organizing, and starting to write materials for a two-day workshop specifically for genomics.

Each of the following points could be thoroughly explored in its own post, but here are a few highlights from this meeting:

  • Attendees were a great mix of biology researchers and educators from a range of institutions (research intensive, primarily undergraduate), computer scientists, and assessment specialists. This meant we were pulling from a broad range of skills, and incorporating multiple perspectives in planning.
  • The length of the meeting (2.5 days) allowed us to get a running start on actually developing materials (GitHub repos here prefixed with "genomics"). In addition to "intro to Unix" material that would largely remain constant from the original DC lesson, we started developing six modules that cover a general genomics workflow: setting up a project, getting to know your data, data wrangling (QC and alignment), analysis and visualization, and cloud/HPC. I personally found it remarkable and gratifying to see so much attention paid to the initial preparatory stages of a project.
  • Numerous folks emphasized the importance of understanding your target audience. Some of these discussions related to the assumed skill level (or prerequisites) for workshop attendees. Other conversations related to the need to accommodate particular cultural or gender issues while teaching, so that the learning environment is comfortable for everyone.
  • What makes DC workshops special and distinct from other courses? In developing the modules described above, we talked about the distinction between Software Carpentry and Data Carpentry, as well as whether and when instructors should be expected to teach about biology (rather than computing/data analysis). The general consensus was that DC's focus on telling a narrative about data means we should be emphasizing "best practices" for improving productivity and reproducibility, rather than advocating for particular types of analyses. That being said, there is ample opportunity during lessons to model rigorous methods, as well as to provide extra resources for students to improve their skills in experimental design and statistical reasoning.
  • A particularly challenging aspect of developing such resources is assessing student improvement following a workshop. It's difficult to evaluate how much students will retain after such a short period of time (2 days), as well as whether these skills will transfer to their research methods. One breakout group focused on developing a strategy for surveying students prior to and directly following a workshop to measure immediate learning, as well as 3-6 months afterward to measure long-term gains. We targeted question formats that would address student learning in the following areas: declarative knowledge (Can you recall this fact?), skills (Can you write this code?), and attitude (Will you use this skill?).

I was initially on the fence about whether to apply for the hackathon. I'm a first-year professor wallowing in the murky depths of teaching a new course, and my overtaxed brain was whispering that maybe it would cause too much stress. My gut, thankfully, doesn't always listen to my brain. Moreover, the class I'm piloting this semester is an undergraduate bioinformatics class focused on genomics, so the DC hackathon fit naturally into my preparation for the last few weeks of the semester. I'm looking forward to reporting back soon about how my semester-long class is wrapping up, as well as my first teaching experience for a Software Carpentry workshop in a few weeks.