The purpose of this blog
This blog presents a collection of case studies (simulations, data exploration exercises, etc.) that have required me to do a little scientific computing. Each post (listed under “Home”) contains a short description of the problem, the overall approach in writing a script (in Python, D, Julia, or R), and some example results. In most cases, the scripts themselves, not to mention the accompanying parameter input files, are too large and unwieldy to list in the post, so I’ve placed everything in its own respective GitHub repository, with hyperlinks to the location.
A little background
I have frequently observed that my colleagues in the environmental consulting world (mainly hydrogeology, but I do other stuff …) are often not able to extend their analytical tool sets beyond spreadsheets, commercial interfaces to simulation packages such as MODFLOW (for groundwater flow), and occasionally ArcGIS. However, with a little effort, a far more versatile tool set can be developed by learning to custom code environmental models. I am writing this blog to provide a place to share my passion for building environmental models, to illustrate some examples, and to provide links to my repository of scripts and associated input files. The applications range in purpose from support scripts to help formulate environmental simulation problems to sandbox exercises for just kicking around ideas.
I am a big fan of using open source software whenever I can for all sorts of data analysis tasks as well as for developing numerical models. Posting some of my scripts on GitHub (see individual blog posts) is my admittedly small attempt to give back. Some of my favorite applicable packages, which include but are not limited to programming languages, are listed here, in no particular order at all:
- Python, including the numpy (numerical arrays), scipy (math and statistics packages), matplotlib (scientific graphics), pandas (R-like data frames), and scikit-learn (machine learning) packages: These are my go-to set of tools, by far. Python is extremely easy to learn, and there are specialized packages for a vast range of applications. You can use python to script simple data processing routines, or to write sophisticated numerical models. The combination of scipy and scikit-learn (both supported with matplotlib) also provides a good alternative to R. The downside: python can be slow, especially for code that loops over individual array elements instead of using vectorized operations on whole arrays.
- D-language: D seems to me to be a hugely under-utilized/under-appreciated programming language. It was intended as an improvement on C++ and consequently uses syntax similar to C++, C#, etc. However, its garbage collection and its dispensing with the need for pointers, plus a variety of added language features, provide the computational environmental scientist with a programming platform that produces fast code but is also comparatively easy to learn and use. Its statically-typed nature distinguishes it readily from python, but this is relatively easy to get accustomed to. I’ve included a few D-language applications on this blog as demos for how easy it is to set up fast executables: a multi-phase flow model, an alluvial fan formation simulator, and purely as a sandbox experiment, an ecology food-web model (still a work in progress, to be sure).
- Julia: Julia also generates fast code, but employs a just-in-time compiler to achieve this performance. It does not require variables to be explicitly typed (although type annotations can help performance in some situations) and does not depend on vectorization. It is oriented around functions and does not use the class concept promoted in object-oriented languages like python or D (this can be viewed either as a plus or minus). I’ve used julia to write a simplified river sediment transport and deposition model and a multiphase reactive transport model for partially water-saturated porous media with an active multicomponent gas-phase.
- SciLab: an open-source, partial clone of MATLAB. I have only written a few scripts with it, as I choose to code most of my numerical models using python + numpy/scipy. However, a big plus for SciLab is the integrated nature of its tools (no separate installations of numpy, matplotlib, etc., and none of the occasional dependency problems encountered with python), plus some additional algorithm options for some routines. I’ve included a couple of small examples here on my blog involving 1-D multiphase flow modeling (e.g., Buckley-Leverett equation solution) using the numerical method of lines.
- R: I am not much of an expert with respect to R, although I’ve written a small set of scripts for various applications. Its obvious advantage is the enormous number of packages available for addressing statistical modeling needs. However, its syntax is somewhat odd in places, making the code a little less easy to follow. I have also found it to be a little on the slow side compared to python in the limited number of opportunities I’ve had to compare the two.
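To illustrate the vectorization caveat noted for python above, here is a minimal sketch that computes the same result two ways: with an explicit loop over array elements, and with a single numpy expression over whole arrays. The first-order decay calculation, the function names, and the array sizes are all hypothetical, chosen only for illustration; they are not taken from any of the scripts on this blog.

```python
import math

import numpy as np


def decay_loop(c0, k, t):
    """First-order decay, c = c0 * exp(-k * t), one element at a time."""
    out = np.empty(len(c0))
    for i in range(len(c0)):
        # Each pass through this loop is executed by the Python interpreter.
        out[i] = c0[i] * math.exp(-k[i] * t)
    return out


def decay_vectorized(c0, k, t):
    """Same calculation as a single whole-array numpy expression."""
    return c0 * np.exp(-k * t)


# Hypothetical inputs: initial concentrations and decay rate constants.
rng = np.random.default_rng(0)
c0 = rng.uniform(1.0, 10.0, size=100_000)
k = rng.uniform(0.01, 0.1, size=100_000)

c_loop = decay_loop(c0, k, t=5.0)
c_vec = decay_vectorized(c0, k, t=5.0)

# Both approaches should agree to within floating-point round-off.
print(np.allclose(c_loop, c_vec))
```

Timing the two functions (for example with the standard-library `timeit` module) should show the vectorized version running considerably faster on arrays of this size, although the exact ratio depends on the machine and the numpy build.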
Beyond programming languages, my other favorite public-domain tools for working with environmental data include:
- Quantum GIS (QGIS): QGIS is an awesome tool for processing and analyzing geospatial data. It features a nice variety of plugins for creating maps and processing vector and raster layers. It is easy to learn and offers many of the everyday capabilities of ArcGIS, but is open-source.
- ModelMuse (MODFLOW pre- and post-processor): MODFLOW is a key tool in the hydrogeologist’s toolbox for quantifying groundwater flow, as constrained by the subsurface environment as well as a variety of boundary/recharge conditions. While several commercial pre-/post-processors are in wide use, the USGS’s ModelMuse, which is public-domain, offers an easy-to-use, very cost-effective alternative that provides much of the same functionality, including the ability to read and write GIS shape files.
- PHREEQC and PHAST: the USGS’s PHREEQC code is a well-vetted geochemical model, with accompanying thermodynamic databases, that can simulate aqueous complexation, mineral precipitation-dissolution reactions, adsorption processes, etc. PHAST extends PHREEQC’s geochemical modeling capabilities to 2-D/3-D reactive transport.
Lastly, I will mention a couple of exceptions to my open-source preference policy. I am very fond of Surfer and Voxler from Golden Software for contouring and visualization of 2-D and 3-D data sets, respectively. Both packages offer ease-of-use plus great looking graphics.
My name is Walt McNab. I am a computational environmental scientist by vocation (and by passion!), which means that I create quantitative models to help understand the physical and chemical processes that influence environmental data sets. My academic training is in hydrogeology, but I have taught myself to code along my career path in research and consulting. I am presently semi-retired but I am continuing to work on a variety of environmental and water supply consulting projects.