Learning and unlearning how to code like a scientist

While this may not be the case for younger scientists and students in my field who have grown up in a culture that emphasizes learning to code, climate scientists my age and older are largely self-taught programmers. They, and I (a former scientist), learned Fortran, C, or some other compiled language to write or run weather/climate models for research. Most of them use scripting languages, such as MATLAB, NCAR’s NCL, R, or Python, to perform statistical analysis on observational data and model output. Unlike computer scientists, who (hopefully) draw up design plans, optimize algorithms, and review code in their development processes, scientist-programmers write code primarily with an end in mind. That is, they write and modify code on the fly depending on what they need to accomplish during the course of a specific project. Scientific code is rarely version-controlled, and it is rarely tested consistently for reproducibility. Comments are often an afterthought, and provide little information for users (usually grad students and postdocs) tasked with running programs written by someone else.

I was lucky enough to learn Fortran 77 (yes, Fortran. No, I am not 60 years old) in a classroom setting as an undergraduate meteorology student, which helped instill good coding practices like commenting, writing readable code, and, above all, not using GOTO statements. My final project was some of the best code I have ever written. Unfortunately, I did not save it, but I recall that I wrote a program to read in monthly time series of the NAO and either temperature or precipitation data from ASCII files, calculate the correlation coefficients for each month (I think), and write formatted output to a text file. I did all of this via a terminal (C shell) and the gedit text editor on old Sun systems in the lab, or in my dorm room on my now ancient laptop (running Windows XP), remotely accessing the lab workstations with portable PuTTY.
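For readers who want to picture that project, here is a rough sketch in Python rather than Fortran 77. It is not the original program: the function names, the `(year, month)` keys, and the output layout are all assumptions made for illustration, and it assumes the two series have already been read from their files into dictionaries.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def monthly_correlations(index, field):
    """Correlate two monthly series separately for each calendar month.

    index, field: dicts mapping (year, month) -> value (hypothetical layout).
    Returns a dict mapping calendar month (1-12) -> correlation coefficient.
    """
    pairs = defaultdict(lambda: ([], []))
    # Only use year/month pairs present in both series.
    for year, month in sorted(index.keys() & field.keys()):
        xs, ys = pairs[month]
        xs.append(index[(year, month)])
        ys.append(field[(year, month)])
    return {m: pearson_r(xs, ys) for m, (xs, ys) in sorted(pairs.items())}


def format_report(correlations):
    """Return one fixed-width line per calendar month."""
    return "\n".join(f"{m:2d} {r:8.3f}" for m, r in sorted(correlations.items()))
```

Grouping by calendar month before correlating keeps the seasonal cycle from dominating the result, which is presumably why the original worked month by month.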

Some people are planners, and try to map out their programs before coding them up like a proper computer scientist. I, on the other hand, coded like a scientist from the get-go, preferring to write a program, compile it, wait for the crash, and debug: lather, rinse, repeat. There are advantages to both approaches, with the former being a better option for long-term projects, and the latter working better when you need to solve a problem NOW. Unfortunately, the nature of graduate research (and research in general) promotes bad coding habits, as scientists are rewarded for output in the form of publications, grants, and attention-grabbing news blurbs that bring in the $$$$. It doesn’t matter (at least in the short term) if your code is sloppy, or only runs on a specific machine with a specific compiler when the moon is waxing, so long as you complete your thesis/paper before your project funding dries up. In the end, though, buggy code will come back to bite you, whether it involves having to redo your analysis (if you’re lucky like me), or having to retract your work (e.g., [1] and [2]).

My coding sins included copy-pasting code blocks, re-writing the same program to create new figures rather than learning to pass in arguments for things like titles, labels, and file names, and failing to name programs clearly, which meant opening several of them to find the one I needed. These are all fairly benign problems; I mean, I completed my work satisfactorily, and was (usually) able to back up and fix problems when they arose. Grad school is just as much about learning to recover from falls as it is a way to level up your career. Still, I wasted a lot of time redoing work, and engaged in way too much of what I call “reactive analysis”: the rushed, frantic, and error-prone analysis performed to meet a deadline. Reactive analysis generates figures quickly, but does not provide sufficient time to interpret them, meaning that you’ll have to redo them later when your advisor points out that your error bars make no sense, or the units are incorrect. Reactive analysis provides soundbites for your grant application, and plenty of stupid typos you probably would have caught if you’d had time to proofread the document one more time before you submitted it.
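What I should have learned earlier looks something like the sketch below: one plotting script that takes the title, labels, and file names as command-line arguments instead of a fresh copy of the program for every figure. It assumes a whitespace-delimited text file with two columns and that matplotlib is installed; the function names, option names, and defaults are made up for illustration.

```python
import argparse


def build_parser():
    """Command-line options for the things that change from figure to figure."""
    parser = argparse.ArgumentParser(description="Plot a two-column text file.")
    parser.add_argument("infile", help="whitespace-delimited file with x and y columns")
    parser.add_argument("--title", default="", help="figure title")
    parser.add_argument("--xlabel", default="x", help="x-axis label")
    parser.add_argument("--ylabel", default="y", help="y-axis label")
    parser.add_argument("--outfile", default="figure.png", help="output image file")
    return parser


def read_columns(path):
    """Read the first two columns of a text file, skipping comments and short lines."""
    xs, ys = [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2 or parts[0].startswith("#"):
                continue
            xs.append(float(parts[0]))
            ys.append(float(parts[1]))
    return xs, ys


def make_figure(args):
    """Draw the figure described by the parsed arguments and save it."""
    # Imported here so the argument handling works even without a display.
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    xs, ys = read_columns(args.infile)
    fig, ax = plt.subplots()
    ax.plot(xs, ys)
    ax.set_title(args.title)
    ax.set_xlabel(args.xlabel)
    ax.set_ylabel(args.ylabel)
    fig.savefig(args.outfile)
    plt.close(fig)


def main(argv=None):
    """Parse the command line (or a list of strings) and make the figure."""
    make_figure(build_parser().parse_args(argv))
```

Saved as, say, `plot_series.py` with a call to `main()` at the bottom, the same script could produce every figure in a paper: `python plot_series.py nao.txt --title "NAO index" --ylabel "index" --outfile nao.png`. A clearly named script with clearly named options also takes care of the "which program was this again?" problem.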

Now, less-than-stellar code is not the only cause of reactive analysis, but it is one that the programmer can control to a degree, unlike shifting deadlines, personal crises, or system outages. Thankfully, I have managed to quell some of my bad habits at my current job out of necessity. No, the scripts I write aren’t paragons of Python, Fortran, or Bash, but I do take the time to comment and improve the code I use frequently or plan to share with others.
