Background

Groundwater contaminant plumes stemming from historic releases in industrial areas are often co-mingled with other plumes. Sorting out what portion of plume mass came from what source can be difficult, because release histories are generally not well known and the subsurface environment is typically heterogeneous and poorly characterized. Nonetheless, because remediation is oftentimes very costly, any tool that can support allocation of costs among potential responsible parties, even when constrained by only limited data, can be helpful.

Analytical plume models – typically amenable to solution by spreadsheet – are often used to inform allocation calculations. Such models are conceptually very simple and require only limited sets of parameters (e.g., mean groundwater velocity, dispersion coefficient tensor components) to run. This simple approach contrasts significantly with discrete numerical models that can address complex flow and transport processes, often in response to time- and spatially-variable boundary conditions. In this context, the lack of complexity in analytical plume models is both an advantage and disadvantage. A particular problem specific to allocation considerations is the inflexibility of the source term provided by simple analytical models. The two- and three-dimensional plume models proposed by Domenico and Robbins (1985) or Domenico (1987), for example, rely upon a rectangular fixed-concentration boundary condition. This approach is limiting because (1) it ignores any mass influx history (or history-matching attempt), and (2) there are conceptual mathematical issues associated with superimposing multiple constant-concentration sources from different source areas.

The Model

A more flexible approach entails employing an instantaneous point source plume model (e.g., Bear, 1972):

equation-1
Equation-1: An instantaneous source (slug) release plume model in two dimensions.

The key to extending the utility of this equation is to recognize that it can be numerically integrated in both space and time, allowing simulation over shaped source regions, with arbitrary source influx terms that are expressed as a function of time:

equation-2
Equation-2: An extended plume model that includes a source term that is finite in space and time.

Equation-2 also allows the dispersion coefficient to be defined as a function of plume length scale by simple superposition. For example, the longitudinal dispersion coefficient, which is defined as the product of the longitudinal dispersivity (units = length) and groundwater velocity, can be set equal to some fraction of the overall plume length (e.g., the product of groundwater velocity and time), so that DL = 0.1 (v*t) *v.

To evaluate Equation-2 for a set of source terms representing an industrial area I am currently evaluating, I wrote a couple of scripts, one in python and one in julia, strictly to get a feel for the performance issues involved. Both script are prototypes, and both could be tweaked to gain better respective performances if warranted.

Computational Considerations

Equation-2 represents a triple numerical integration of Equation-1. This can become a computational burden for certain applications, including (1) evaluating multiple solutions to address parameter uncertainty, and (2) populating a grid to generate a contour map. An additional issue is addressing the flexibility allowed by furnishing arbitrary functions for the time- and spatial dependencies of the source term(s) for the mass fluxes. Ideally, these should be provided as external, user-specified functions that do not require any editing of the code itself. In other words, the time and space functions for the source term mass fluxes should be specified as strings in the input file which can be evaluated by the script.

Both python and julia provide tools with which to efficiently address both the integration and string-specifying-function problems. These include:

  1. Canned numerical integration schemes. Python’s scipy package includes the quad function for single-variable integration and dblquad for two variables, where the integration limits of the first can be used to generate those for the second, as functions. Julia offers the quadgk function for single-variable integration as standard fare, while other multivariable integration packages can be added separately.
  2. An ability to evaluate external strings as functions at run time. Incidentally, this is something difficult to easily realize with a fully-compiled executable one could generate with C, FORTRAN, etc.
  3. A just-in-time (JIT) compiler to improve execution speed.

Although there are some apples-versus-oranges issues here in comparing the two languages, the codes that I came up with performed very differently, with the python code running much slower than the julia equivalent (to the point of being impractical to use, at least in some cases). More on this below …

Example Application

The example problem for the two scripts involves an industrial area that I am currently evaluating as part of an ongoing project. I employed a few trial source terms for four different source areas (see figure); the equations used to describe the temporal and spatial variabilities in the source terms are expressed as polynomials:

  1. Polynomials that describe the mass influx per unit time, per unit area, for each source zone as a function of time (see graph).
  2. Polynomials that describe the limits of integration delineating the extent of each source area, with respect to groundwater flow direction. Source zones with irregular shapes are subdivided into separate, adjoining source areas that are defined piece-wise (included in source term input file, as discussed below).
source_time_graph
Postulated release histories for four separate plume source regions in an industrial area.
plume_map
Modeled plumes stemming from posited source areas (indicated by blue shaded regions) after 50 years of elapsed time. Shaded contours indicate concentration, while pie charts indicate local distributions of plume mass attributable to different source areas. Note that the use of scale-/time-dependent dispersion coefficients eliminates the problem of spurious upstream pseudo-diffusive transport that sometimes plagues analytical plume models.

Python

Here is a complete listing of the python script used to represent the model:

###########################################################################
#
# ShapeSource.py
#
# time- and-space integrated slug source term for advective-dispersive
# solute transport
#
# source is integrate dover an arbitrary shape
#
###########################################################################

from numpy import *
from scipy.integrate import quad, dblquad
from numba import jit
import time

##### classes #####

class Aquifer:

    def __init__(self):
        # aquifer transport properties
        line_input = []
        input_file = open('aquifer.txt','r')
        for line in input_file: line_input.append(line.split())
        input_file.close()
        self.b = float(line_input[0][1]) 				# aquifer thickness
        self.phi = float(line_input[1][1]) 			# porosity
        self.v = float(line_input[2][1]) 				# mean pore velocity
        self.theta = float(line_input[3][1]) 			# flow direction, c-clockwise from due east (radians)
        self.R = float(line_input[4][1]) 				# retardation coefficient
        self.k = float(line_input[5][1]) 				# solute first-order degradation rate constant
        self.fl = float(line_input[6][1]) 				# ratio of longitudinal dispersivity to scale
        self.ft = float(line_input[7][1]) 				# ratio of transverse dispersivity to scale
        self.Dlmin = float(line_input[8][1])*self.v 		# minimum longitudinal dispersion coefficient
        self.Dtmin = float(line_input[9][1])*self.v 		# minimum transverse dispersion coefficient

class Source:

    def __init__(self, name, x0, y0, tend, xs, xf, tfunc, xtopf, xbotf):
        # source zone properties
        self.name = name 				# name of source (to track allocation)
        self.x0 = x0 						# center point location (absolute coordinate); corresponds to (0, 0) in relative coordinates
        self.y0 = y0
        self.tend = tend 				# time at (abrupt) cessation of source flux
        self.xs = xs 						# start of source area w.r.t. flow direction (relative coordinate)
        self.xf = xf						# end of source area w.r.t. flow direction (relative coordinate)
        self.tfunc = compile(tfunc, '<string>', 'eval') 	# source mass influx as a function of absolute t (mass time^-1 area^-1)
        self.xtopf = compile('lambda x: ' + xtopf, '<string>', 'eval') 	# upper boundary to source area, = f(x) (relative coordinate)
        self.xbotf = compile('lambda x: ' + xbotf, '<string>', 'eval') 	# lower boundary to source area, = f(x) (relative coordinate)		

    def dCdt(self, tau, x, y, teval, aquifer):
        # slice-in-time for a small [point] source, based on Bear, 1972 solute transport analytical solution
        # tau = integration value (along source history; absolute value for time)
        # teval = elapsed time since time zero;
        t = teval - tau 				# lag time in evaluation time versus source term
        Dl, Dt = Dispers(tau, aquifer.v, aquifer.R, aquifer.fl, aquifer.ft, aquifer.Dlmin, aquifer.Dtmin)
        dM = eval(self.tfunc)
        return Cslug(x, y, tau, aquifer.v/aquifer.R, dM/aquifer.R, aquifer.phi, aquifer.b, Dl, Dt, aquifer.k)

    def Ct(self, y, x, xp, yp, t, aquifer):
        # concentration from a point source after time-integration
        return quad(self.dCdt, t-self.tend, t, args=(xp-x, yp-y, t, aquifer))[0]

    def C(self, xp, yp, t, aquifer):
        # compute integrated concentration at location (x, y) and time t for this particular source
        return dblquad(self.Ct, self.xs, self.xf, eval(self.xbotf), eval(self.xtopf), args=([xp, yp, t, aquifer]))[0]

class Grid:

    def __init__(self, numSources):

        # read grid settings and set up grid
        line_input = []
        input_file = open('grid.txt','r')
        for line in input_file: line_input.append(line.split())
        input_file.close()
        x0 = float(line_input[1][1])
        y0 = float(line_input[1][2])
        xf = float(line_input[2][1])
        yf = float(line_input[2][2])
        num_x = int(line_input[3][1])
        num_y = int(line_input[3][2])
        x = linspace(x0, xf, num_x+1)    # create a uniform grid with supplied specs
        y = linspace(y0, yf, num_y+1)
        X, Y = meshgrid(x,y)
        self.points =  array([X.flatten(), Y.flatten()])          # locations
        self.concs = zeros((len(self.points[0]), numSources), float)    # concentrations, columns = sources

    def WriteOutput(self, sources):
        # write grid results to output file
        output_file = open('grid_output.txt','w')
        output_file.writelines(['x','\t','y'])
        for i in xrange(len(sources)): output_file.writelines(['\t', sources[i].name])
        output_file.writelines(['\t', 'total'])
        output_file.writelines(['\n'])
        for i in xrange(len(self.points[0])):
            output_file.writelines([str(self.points[0][i]),'\t',str(self.points[1][i])])
            for j in xrange(len(sources)): output_file.writelines(['\t', str(self.concs[i,j])])
            output_file.writelines(['\t', str(self.concs[i,:].sum())])
            output_file.writelines(['\n'])
        output_file.close()

##### utility functions (not assigned to a class) #####	

@jit
def Cslug(x, y, t, v, M, phi, b, Dl, Dt, k):
    C = M/(4.*pi*phi*b*sqrt(Dl*Dt)*t) * exp(-((x-v*t)**2)/(4.*Dl*t) - y**2/(4.*Dt*t) - k*t)
    return C

@jit
def Dispers(t, v, R, fl, ft, Dlmin, Dtmin):
    # compute the dispersion coefficient tensor components as a function of plume length scale
    Dl = fl * v**2 * t / R + Dlmin	      # longitudinal dispersion coefficient
    Dt = ft * v**2 * t / R + Dtmin	      # transverse dispersiion coefficient
    return Dl, Dt	

def Rotate(points, theta):
    # rotate coordinates about standard origin (0., 0.) by angle theta
    xprime = points[0]*cos(theta) - points[1]*sin(theta)
    yprime = points[0]*sin(theta) + points[1]*cos(theta)
    return xprime, yprime	

def ReadSources():
    # read sources file and assign source object
    line_input = []
    input_file = open('sources.txt','r')
    for line in input_file: line_input.append(line.split())
    input_file.close()
    sources = []
    for i in xrange(1, len(line_input)):
        name = line_input[i][0]
        x0 = float(line_input[i][1])
        y0 = float(line_input[i][2])
        tend = float(line_input[i][3])
        xs = float(line_input[i][4])
        xf = float(line_input[i][5])
        tfunc = line_input[i][6]
        xtopf = line_input[i][7]
        xbotf = line_input[i][8]
        sources.append(Source(name, x0, y0, tend, xs, xf, tfunc, xtopf, xbotf))
    return sources

##### main script #####	

def ShapeSource(timeOut):

    start  = time.time()            # track execution time

    # define problem

    aquifer = Aquifer()
    print 'Read aquifer properties.'

    sources = ReadSources()
    print 'Read source(s) properties.'

    # inform contouring

    grid = Grid(len(sources))
    print 'Read grid file.'

    for j, src in enumerate(sources):

        print 'Working on ... ', src.name
        shiftX = grid.points[0] - src.x0
        shiftY = grid.points[1] - src.y0
        xprime, yprime = Rotate(array([shiftX, shiftY]), -aquifer.theta)                   

        for i in xrange(len(xprime)):
            grid.concs[i, j] = src.C(xprime[i], yprime[i], timeOut, aquifer)

    print 'Completed computing; time = ',time.time()-start, 'seconds.'

    grid.WriteOutput(sources)
    print 'Done.'

##### run script #####

timeOut = 50.1
ShapeSource(timeOut)

The text input files that support the script, which address aquifer properties, source term definitions (including spatial and temporal dependencies, expressed as polynomials), and gridding to support generating concentration contours, are the same as those used by the equivalent julia script; these are provided with that script under my GitHub repository.

The bottlenecks in the code performance involve, of course, the nested multiple integrations. Some regions of the grid that lie within source area boundaries are particularly slow with regard to convergence rates for the integration algorithm(s). Therefore, I suspect that while proper vectorization of the calculations could provide a modest performance improvement, the awkwardness of the resulting code would probably not justify the effort. Instead, I added a couple of other features to boost performance. These included (1) compiling the source term string input that is subject to python’s eval() function:

    def __init__(self, name, x0, y0, tend, xs, xf, tfunc, xtopf, xbotf):
        # source zone properties
        self.name = name 				# name of source (to track allocation)
        self.x0 = x0 						# center point location (absolute coordinate); corresponds to (0, 0) in relative coordinates
        self.y0 = y0
        self.tend = tend 				# time at (abrupt) cessation of source flux
        self.xs = xs 						# start of source area w.r.t. flow direction (relative coordinate)
        self.xf = xf						# end of source area w.r.t. flow direction (relative coordinate)
        self.tfunc = compile(tfunc, '<string>', 'eval') 	# source mass influx as a function of absolute t (mass time^-1 area^-1)
        self.xtopf = compile('lambda x: ' + xtopf, '<string>', 'eval') 	# upper boundary to source area, = f(x) (relative coordinate)
        self.xbotf = compile('lambda x: ' + xbotf, '<string>', 'eval') 	# lower boundary to source area, = f(x) (relative coordinate)

and, (2) adding some JIT assistance via Numba (Continuum Analytics), which provides a convenient means to compile individual python functions at runtime, generating a considerable speed up just by applying simple decorators:

@jit
def Dispers(t, v, R, fl, ft, Dlmin, Dtmin):
    # compute the dispersion coefficient tensor components as a function of plume length scale
    Dl = fl * v**2 * t / R + Dlmin	      # longitudinal dispersion coefficient
    Dt = ft * v**2 * t / R + Dtmin	      # transverse dispersiion coefficient
    return Dl, Dt

Execution time for the python script for the 1,200 grid points was approximately 550 seconds, with both the numba and compile features added, on my reasonably fast core i7-based laptop. Separate testing of both additions indicated that each resulted in a speedup of roughly a factor of two. Further improvements in execution time might be achievable with additional numba decorators elsewhere in the code, but there is a problem to overcome. The @jit decorator does not work for functions inside a class (i.e., methods); instead, the entire class is compiled using numba’s @jitclass decorator. This requires that the variables defined as members of the class have their data types (e.g., integer, float, etc.) specified in advance. However, it’s not clear to me how the compile() & eval() functions, which are included as source objects as defined by the Source class, not to mention – separately – the quad and dblquad integration routines provided by scipy, can all play together properly. Suggestions are welcome …

Julia

The julia language code equivalent to the python script shown above is provided in full under my GitHub repository. The corresponding execution time for the julia implementation on the same example problem and same computer was around 150 seconds, thus yielding a substantial improvement over the python version. The julia script employs the quadgk routine three times for the nested integrals shown in Equation-2; perhaps additional performance improvements could be achieved by applying multivariable integration approaches (e.g., cuba ) instead, but that is left for another time. Perhaps some re-writing of the code to do some parallel processing might also help?

My only other concerns regarding the julia script are aesthetic in nature. First, julia is not really object oriented, so the code needs to be organized a little differently. In particular, functions cannot be associated with user-defined composite data types in the same way that methods are assigned to classes in python. This issue applies to julia in general and is not specific to this particular example application. However, a second issue is this: the @eval macro feels a bit clunky in that it only reads strings defined at the global scope level. This forced me to read portions of the input file separately at the global level to populate the source term strings, and then to read other parameters and assign them to the “Source” type:

### read plume source terms; this must be implemented as global scope ###

tfunc_string = []
xtopf_string = []
xbotf_string = []
data = readdlm("sources.txt", '\t', header=true)
for i = 1:size(data[1], 1) 							# can have multiple sources
	push!(tfunc_string, data[1][i, 7])
	push!(xtopf_string, data[1][i, 8])
	push!(xbotf_string, data[1][i, 9])
end

### main script ###

function ShapeSource(time_output)

	# read in model parameter sets
	aquifer = GetAquifer() 						# read aquifer properties
	sources = GetSources() 						# read source term properties
	xgrid, ygrid = GetGrid() 					# read grid settings

	# calculate
	concs = zeros(Float64, length(xgrid), length(sources))
	for (j, src) in enumerate(sources)
	println("Source = " * "\t" * src.name)
		# redefine global functions to match sources
		@eval tfunc(t) = $(parse(tfunc_string[j]))
		@eval xtopf(x) = $(parse(xtopf_string[j]))
		@eval xbotf(x) = $(parse(xbotf_string[j]))
		for i = 1:length(xgrid)
			xprime, yprime = Rotate(xgrid[i]-src.x0, ygrid[i]-src.y0, -aquifer.theta)
			concs[i, j] = C(xprime, yprime, time_output, aquifer, src)
		end
	end

	# write to output file
	WriteOutput(xgrid, ygrid, sources, concs)

	println("Done.")

end

In addition, because @eval macros cannot be assigned to composite user-defined types (as opposed to the equivalent python implementation), they are, in effect, redefined each time a new source term is read. This generates a warning message for every separate source term, which is an annoying artifact for this particular example.

Otherwise, the performance improvement over python is welcome for this particular application, so I won’t complain too loudly 🙂