# RDycore Code Structure and Organization
In this section, we describe the organization of RDycore's source files. We have attempted to group functions and data types together into subsystems that can be easily understood separately from one another.
Because RDycore's behavior is determined entirely by input parameters specified in the YAML input format, it's a good idea to familiarize yourself with this format as you read this section.
## A Brief Tour of RDycore's Source Tree
The root level of the RDycore source tree contains a `README.md` file, a `CMakeLists.txt` file that defines the build system, and some configuration files for documentation and tooling.
There are several folders at the root of the source tree:
```
RDycore
+- .github/workflows
|
+- cmake
|
+- config
|
+- docs
|
+- driver
|
+- external
|
+- include
|
+- share
|
+- src
|
+- tools
```
- `.github/workflows`: workflows that support our GitHub continuous integration environment
- `cmake`: CMake scripts that support the build system, testing, etc.
- `config`: shell scripts to help with running RDycore on our target platforms
- `docs`: Markdown source files for our mkdocs-based documentation
- `driver`: C and Fortran driver programs, including the standalone RDycore drivers and a few other development tools
- `external`: third-party libraries used by RDycore that aren't provided by PETSc
- `include`: the `rdycore.h.in` template that generates the `rdycore.h` API header file, and all private header files (located within the `private` subfolder)
- `share`: data files for initial conditions, material properties, and unstructured grids (used mainly for testing)
- `src`: the source code for RDycore
- `tools`: miscellaneous tools and scripts for building and deploying Docker images
Take a look at each of these folders to familiarize yourself with their contents. In particular, the `src` folder has a few important subfolders:
- `src/f90-mod`: the RDycore Fortran module that mirrors the C library's capabilities
- `src/swe`: functions and data structures specific to the solution of the 2D shallow water equations
- `src/sediment`: functions and data structures specific to the solution of the 2D shallow water equations coupled with an advection-diffusion equation for sediment dynamics
- `src/tests`: unit tests for subsystems within RDycore
The private headers in the `include` folder contain definitions of opaque types and functions that are helpful throughout RDycore but are not part of the API.
The `driver` folder also has its own `tests` subfolder that defines tests that run the standalone drivers and perform convergence tests for selected problems.
## The RDycore Object and its Lifecycle
RDycore (the "river dynamical core") is represented in code by a data structure called `RDy`, declared as an opaque type in `include/rdycore.h` and defined in `include/private/rdycoreimpl.h`.
In this section, we describe how to manipulate `RDy` to set up and run simulations. It may be helpful to refer to the standalone C driver (and/or the corresponding Fortran driver) as you read this section.
### Initialization

Before RDycore can be used in any program, the supporting systems must be initialized with a call to `RDyInit`:
```c
// initialize RDycore subsystems
PetscCall(RDyInit(argc, argv, helpString));
```
This function accepts the `argc` and `argv` command-line argument parameters accepted by any C program, plus a string printed to display help/usage information.
Next, create an `RDy` object by passing an MPI communicator, the path to a YAML input file, and a pointer to the desired `RDy` object:
```c
// create an RDy on the given communicator with the given input
RDy rdy;
PetscCall(RDyCreate(MPI_COMM_WORLD, "my-input.yaml", &rdy));
```
This only creates the object; it does not read input or allocate any resources for the problem described within the input.
### Setup
To read the input file and ready your `RDy` object to run the simulation defined therein, call `RDySetup`:
```c
// read input and set up the dycore
PetscCall(RDySetup(rdy));
```
After this call, you're ready to run a simulation. You can also call any query or utility functions on your `RDy` object to do whatever you need for your own simulation. For example, you might need information about the unstructured grid, or perhaps you are setting boundary values for a Dirichlet boundary condition. See the API header file for all the possibilities.
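For instance, a post-setup grid query might look something like the sketch below. The function name `RDyGetNumLocalCells` is a hypothetical stand-in; consult `rdycore.h` for the actual query API.

```c
// hypothetical sketch of a post-setup query; the real query functions
// (and their exact names) live in rdycore.h
PetscInt num_cells;
PetscCall(RDyGetNumLocalCells(rdy, &num_cells));  // hypothetical name
PetscCall(PetscPrintf(MPI_COMM_WORLD, "local cells: %" PetscInt_FMT "\n", num_cells));
```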
### Timestepping

The simplest way to start running your simulation after setup is to call `RDyAdvance` (which takes a single step) within a `while` loop that uses `RDyFinished` as the termination condition:
```c
// run the simulation to completion
while (!RDyFinished(rdy)) {
  // ... pre-step logic here ...

  // advance the simulation by a single step
  PetscCall(RDyAdvance(rdy));

  // ... post-step logic here ...
}
```
If you look in the standalone C driver, you can see various pre- and post-step logic to accommodate data transfer, restarts, etc.
### Finalization

At the end of an RDycore-enabled program, you must destroy the `RDy` object with a call to `RDyDestroy`:
```c
// destroy the dycore
PetscCall(RDyDestroy(&rdy));
```
Finally, call `RDyFinalize` to reclaim all resources used by RDycore and its subsystems:
```c
// clean up
PetscCall(RDyFinalize());
```
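Putting these calls together, a complete minimal driver is just the lifecycle functions above in sequence. Here's a sketch (the help string and input file name are placeholders):

```c
#include <rdycore.h>

static const char help_string[] = "minimal RDycore driver";  // placeholder

int main(int argc, char *argv[]) {
  // initialize RDycore subsystems
  PetscCall(RDyInit(argc, argv, help_string));

  // create the dycore on the world communicator with a YAML input file
  RDy rdy;
  PetscCall(RDyCreate(MPI_COMM_WORLD, "my-input.yaml", &rdy));

  // read input and allocate resources for the simulation
  PetscCall(RDySetup(rdy));

  // run the simulation to completion
  while (!RDyFinished(rdy)) {
    PetscCall(RDyAdvance(rdy));
  }

  // clean up
  PetscCall(RDyDestroy(&rdy));
  PetscCall(RDyFinalize());
  return 0;
}
```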
## Computational Domain
RDycore solves equations on a topologically two-dimensional domain with topography represented by an elevation coordinate \(z(x, y)\). RDycore uses two objects to represent the computational domain:
- `DMPlex`: a PETSc data structure that stores the minimum set of information needed to define an unstructured grid
- `RDyMesh`: an RDycore-specific data structure (implementation here) that stores data for cells, edges, and vertices, constructed using the `DMPlex` object
Both of these objects are created by a call to `RDySetup`. The `RDyMesh` object is a simple container (sketched below) with

- metadata like numbers of cells, edges, and vertices
- a `cells` field that stores cell data in an `RDyCells` data structure
- an `edges` field that stores edge data in an `RDyEdges` data structure
- a `vertices` field that stores vertex data in an `RDyVertices` data structure
- metadata useful for output/visualization
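Schematically, the container looks something like this simplified sketch (the actual definition lives in the private headers and carries more fields):

```c
// a simplified sketch of the RDyMesh container; the real definition in the
// private headers has additional fields and metadata
typedef struct {
  PetscInt num_cells, num_edges, num_vertices;  // entity counts

  RDyCells    cells;     // per-cell data
  RDyEdges    edges;     // per-edge data
  RDyVertices vertices;  // per-vertex data

  // ... metadata used for output/visualization ...
} RDyMesh;
```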
The domain is partitioned into disjoint regions, each of which consists of a set of contiguous cells, usually triangles or quads. Additionally, the domain is bounded by one or more boundaries, each of which consists of a set of edges (which can be thought of as two-dimensional "faces"). We describe how regions and boundaries work below.
### Regions
When a mesh file is read to create a `DMPlex` object during `RDySetup`, `DMLabel` objects are created that represent disjoint sets of cells. Each cell set represents a region. From each label/cell set, RDycore constructs an `RDyRegion`, which is basically a named array of local cell IDs. RDycore then uses information in the YAML input file to associate initial condition and source data with each region by name. This process is implemented in the [`InitRegions`](https://github.com/RDycore/RDycore/blob/main/src/rdysetup.c#L182) function.
### Boundaries
Similarly to regions, the `DMPlex` object created during `RDySetup` contains `DMLabel` objects representing disjoint sets of edges, each of which represents a boundary. From each label/edge set, RDycore constructs an `RDyBoundary`, a named array of local edge IDs. RDycore then uses information in the YAML input file to associate boundary condition data with each boundary by name. This process is implemented in the `InitBoundaries` function, which is called in `RDySetup`.
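Conceptually, regions and boundaries are just named index arrays, roughly like the following sketch (field names are illustrative; see the private headers for the actual definitions):

```c
// illustrative sketches only; the actual definitions live in the
// private headers and differ in detail
typedef struct {
  char      name[128];  // region name used to match YAML input by name
  PetscInt  num_cells;  // number of local cells in this region
  PetscInt *cell_ids;   // local IDs of those cells
} RDyRegion;

typedef struct {
  char      name[128];  // boundary name used to match YAML input by name
  PetscInt  num_edges;  // number of local edges on this boundary
  PetscInt *edge_ids;   // local IDs of those edges
} RDyBoundary;
```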
## Operators
For purposes of our discussion, an "operator" is a mathematical construct for computing interior and boundary fluxes and source terms. Each operator is represented by a specific data structure associated with functions that implement its operations.
### Overview
RDycore uses two different but numerically equivalent approaches for solving its governing equations:

- a CPU-only version that uses PETSc, which we refer to as the PETSc version
- a version that supports both CPUs and GPUs using PETSc and CEED, which we refer to as the CEED version

The PETSc version is mainly used to troubleshoot the CEED version. Both versions are wrapped in a C struct called `Operator`, defined in `rdyoperatorimpl.h`.
The CEED version of the operator code has two parts:

- "host" code that runs on the CPU and dispatches calls to the "device" (a CPU or GPU, depending on your runtime configuration) to create, update, manipulate, and destroy the operator
- "device" code that performs the mathematical operations associated with the operator using the libCEED exascale library
To keep the two versions structurally similar, the PETSc operator is also split into these two parts; however, the PETSc version runs only on CPUs.
The `AddPhysicsOperators()` function in `operator.c` sets up the operators for the PETSc or CEED version of RDycore. Each version then sets up physics-specific (i.e., SWE or sediment dynamics) operators.

NOTE: at the time of writing, RDycore's operators support the following configurations: (1) SWE: PETSc and CEED versions, and (2) sediment dynamics: PETSc version only.
### Shallow Water Equations (SWE) operators
All shallow-water-equations-specific code lives within the `swe` source subfolder. The SWE operators are created and configured by the `AddPetscFlowOperators` and `AddCeedFlowOperators` functions for PETSc and CEED, respectively. These two functions live in `operator.c`. The interface for the "SWE operator" is visible in the private header `rdysweimpl.h`. A few details about the PETSc and CEED versions:
- PETSc version: the creation of the operators and the physics implementation both live in `swe/swe_petsc.c`.
- CEED version: the creation of the operators and the physics implementation live in two separate files. The operators are created in `swe/swe_ceed.c`, while the header file `swe/swe_ceed_impl.h` defines the physics implementation as inline `CeedQFunction`s that run on the device (see the sketch below).
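To give a feel for what such device code looks like, here is a minimal, generic Q-function in libCEED's standard `CEED_QFUNCTION` form. This is a toy pointwise kernel, not one of RDycore's actual SWE flux kernels:

```c
#include <ceed/types.h>

// a generic libCEED Q-function sketch; RDycore's real SWE kernels in
// swe/swe_ceed_impl.h are considerably more involved
CEED_QFUNCTION(ScaleByWeight)(void *ctx, const CeedInt Q,
                              const CeedScalar *const in[],
                              CeedScalar *const out[]) {
  const CeedScalar *u = in[0], *w = in[1];  // inputs at quadrature points
  CeedScalar       *v = out[0];             // output at quadrature points

  for (CeedInt i = 0; i < Q; i++) {
    v[i] = w[i] * u[i];  // pointwise operation at each quadrature point
  }
  return 0;
}
```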
For both the PETSc and CEED versions, we implement the following SWE operators:
- Riemann flux operator: computes finite-volume fluxes between interior cells and on the boundaries, according to specified boundary conditions. In the CEED implementation, these two operators (for interior and boundary fluxes) are combined into a composite operator that is called using in-place device data. In the PETSc version, these operators are implemented by CPU code that is applied sequentially to the relevant data.
- Source operator: computes the source term for the shallow water equations.
### Sediment Dynamics operators
The sediment-dynamics-specific code, which comprises the SWE coupled with an advection-diffusion equation, lives within the `sediment` source subfolder. Presently, only the PETSc version of the operator is implemented, via `AddPetscSedimentOperators`, which lives in `operator.c`.
## Input
The parsing of input is dispatched from `RDySetup` and implemented in `ReadConfigFile`. At the top of `yaml_input.c`, several structs are defined that represent a YAML "schema". In this schema, each struct represents a YAML mapping, with fields in the mapping corresponding to fields in the struct. Accordingly, YAML arrays within mappings are represented by array fields.
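For example, a hypothetical input mapping and its corresponding schema struct might look like this (the section and field names here are purely illustrative, not RDycore's actual schema):

```c
// Illustrative only: a hypothetical YAML mapping such as
//
//   time:
//     final_time: 10.0
//     unit: hours
//
// would correspond to a schema struct along these lines:
typedef struct {
  PetscReal final_time;  // maps to the "final_time" field of the mapping
  char     *unit;        // maps to the "unit" field
} TimeSection;           // hypothetical name; see yaml_input.c for the real structs
```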
For a more detailed explanation of how this YAML schema is parsed, see the libcyaml documentation.
## Output
RDycore can produce output for visualization, as well as diagnostic output helpful for troubleshooting and debugging.
### Visualization
RDycore supports three visualizable output formats:

- XDMF: an ancient, marginally-supported format, popular in the earth science community, that stores data in HDF5 files and metadata in XML files
- CFD General Notation System (CGNS): a standard format with lots of traction in the CFD community but less visibility within the earth science community
- PETSc's binary output format, which is better than nothing but probably not very full-featured
The writing of XDMF files is implemented in the `WriteXDMFOutput` function in `xdmf_output.c`. CGNS and binary output are handled mostly by PETSc's built-in output machinery.
All output functions are registered using PETSc's TSMonitor feature and called in `RDyAdvance` (within `rdyadvance.c`) at the frequency specified in the YAML input file.
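Registration uses PETSc's standard `TSMonitorSet` mechanism, which looks roughly like the sketch below. The monitor function here is a hypothetical stand-in for RDycore's actual output monitors:

```c
#include <petscts.h>

// a hypothetical monitor with PETSc's standard TSMonitor signature;
// RDycore's real output monitors follow the same shape
static PetscErrorCode WriteOutputMonitor(TS ts, PetscInt step, PetscReal time, Vec u, void *ctx) {
  PetscFunctionBegin;
  // ... write visualization output for this step ...
  PetscFunctionReturn(PETSC_SUCCESS);
}

// register the monitor so PETSc calls it after every step
static PetscErrorCode RegisterOutputMonitor(TS ts) {
  PetscFunctionBegin;
  PetscCall(TSMonitorSet(ts, WriteOutputMonitor, NULL, NULL));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```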
### Diagnostics: time series
RDycore can write out time series data useful for debugging simulations. Currently, the only quantity that can be recorded as a time series is the "boundary flux": the total flux through the domain boundary, which is needed to account for mass non-conservation in simulations with open boundaries.
Time series data are accumulated and written by logic in `time_series.c`. The function `InitTimeSeries` initializes this subsystem and is called in `RDyAdvance`.
These time series diagnostics are invasive, and can negatively affect simulation performance, particularly when using GPUs. Therefore they are intended only for debugging and troubleshooting.
## Checkpoints and Restarts
The writing of checkpoint files (which store all state data necessary to restart a simulation) and the process of restarting a previously-running simulation are largely handled by PETSc:
- `InitCheckpoints` (defined in `checkpoint.c`) sets up a TSMonitor that calls the `WriteCheckpoint` function at an interval specified in the YAML input file. This function is called in `RDySetup`.
- `ReadCheckpointFile` (also defined in `checkpoint.c`) reads the checkpoint file (in a format specified within the YAML input). This function is called in `RDySetup` if the input file specifies that a simulation should be restarted.
## Running Ensembles
RDycore can run ensembles, which are sets of concurrently-running simulations ("ensemble members") with variations in selected input parameters. RDycore uses an extremely simple mechanism to run these simulations within a single MPI job:
- on initialization, RDycore splits its "world" communicator into a number of smaller communicators of equal size
- each of these smaller communicators is assigned to a single ensemble member
- each ensemble member is configured according to the ensemble specification in the YAML input file
- all ensemble members are run concurrently to completion
Currently, ensemble calculations only run ensemble members, possibly generating member-specific output. No post-processing or analysis has been implemented, though it would likely be simple to do so.
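The communicator splitting in the first step can be sketched with plain MPI; this illustrates the idea rather than RDycore's actual ensemble code:

```c
#include <mpi.h>

// illustrative only: split the world communicator into equal-size member
// communicators; RDycore's actual ensemble logic differs in detail
int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int num_members  = 4;                  // number of ensemble members
  const int ranks_per    = size / num_members; // assumes size divides evenly
  const int member_index = rank / ranks_per;   // this rank's ensemble member

  // ranks with the same "color" land in the same member communicator
  MPI_Comm member_comm;
  MPI_Comm_split(MPI_COMM_WORLD, member_index, rank, &member_comm);

  // ... each ensemble member would create and run its own RDy object
  //     on member_comm ...

  MPI_Comm_free(&member_comm);
  MPI_Finalize();
  return 0;
}
```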
### Ensemble member configuration
The data for each ensemble member is listed explicitly in the ensemble section of the YAML input specification. All parameters for ensemble members override the corresponding parameters in other sections of the file. This logic is implemented in the `ConfigEnsembleMember` function within `ensemble.c`.
## Support for Fortran
A Fortran version of the public C interface is available in an `rdycore` Fortran module in the `src/f90-mod` directory. This module is hand-written and uses the Fortran 2003 `iso_c_binding` standard intrinsic module to map C functions to equivalent Fortran subroutines and functions, and defines appropriate data types.
The mapping from a C function to a Fortran subroutine (or function) is accomplished in two parts:
1. A C function is made available to Fortran by defining a Fortran function within an `interface` block that uses a `bind(C)` annotation specifying the case-sensitive name of the C function. All data types in this Fortran function must correspond to supported C data types via the `iso_c_binding` module. The function returns an integer corresponding to the `PetscErrorCode` return type of the C function. We refer to such a function as a "C-bound Fortran function".
2. A Fortran "wrapper" subroutine is defined that calls the C-bound Fortran function defined in item 1. This subroutine follows the PETSc convention in which the last argument (`ierr`) gets the return value of the C-bound Fortran function.
For example, let's take a look at the Fortran function for `RDyCreate`, which calls the C function `RDyCreateF90`. This separate C function is required because `RDyCreate` accepts an `MPI_Comm` input parameter, and Fortran uses integers for MPI communicators.
First, we create a C-bound Fortran function `rdycreate_` and associate it with the C function `RDyCreateF90` within the `interface` block near the top of `f90-mod/rdycore.F90`:
```fortran
interface
  ...
  integer(c_int) function rdycreate_(comm, filename, rdy) bind(c, name="RDyCreateF90")
    use iso_c_binding, only: c_int, c_ptr
    integer, intent(in) :: comm
    type(c_ptr), value, intent(in) :: filename
    type(c_ptr), intent(out) :: rdy
  end function
  ...
end interface
```
Then we define a subroutine `RDyCreate` that calls `rdycreate_` and sets `ierr` to its return value:
```fortran
subroutine RDyCreate(comm, filename, rdy_, ierr)
  use iso_c_binding, only: c_null_char
  character(len=1024), intent(in) :: filename
  integer, intent(in) :: comm
  type(RDy), intent(out) :: rdy_
  integer, intent(out) :: ierr

  integer :: n
  character(len=1024), pointer :: config_file

  n = len_trim(filename)
  allocate(config_file)
  config_file(1:n) = filename(1:n)
  config_file(n+1:n+1) = c_null_char
  ierr = rdycreate_(comm, c_loc(config_file), rdy_%c_rdy)
  deallocate(config_file)
end subroutine
```
Notice that we have to do some work to construct a NULL-terminated C string for the filename with a character array and `c_null_char`, and then pass a pointer to this array with `c_loc`. This is how things are done with Fortran's `iso_c_binding` module, which you can learn about in the link at the top of this section.
Whenever a new C function is added to the public interface in `rdycore.h`, these two corresponding Fortran items must be added to the template from which it is generated to support its use in Fortran.
### Special considerations
- **C pointers**: Perhaps counterintuitively, C pointers must be passed by value to Fortran-bound C functions. In other words, any argument of type `type(c_ptr)` must have the `value` attribute and `intent(in)`. You can think of a pointer as a memory location, which is morally equivalent to an integer of the appropriate size. The `intent(in)` expresses that the pointer itself remains unchanged even if the data it points to is modified by the function. NOTE: an `RDy` C object is expressed as a C pointer in the Fortran module.
- **C primitives**: Because C passes arguments to functions by value and Fortran passes them by reference, it is necessary to add the `value` attribute to any parameter in a C-bound Fortran function that has a non-pointer C type. This includes most `intent(in)` primitive parameters in C-bound Fortran functions. Note that `intent(out)` parameters in these functions must necessarily be pointers in C functions, so they must not have the `value` attribute.
- **Enumerated types**: An enumerated type in C can be mapped to Fortran as a set of related integer parameters. See, for example, the way time units are expressed in the `rdycore` Fortran module.
- **PETSc types**: PETSc types like `Vec` are passed to C-bound Fortran functions as `PetscFortranAddr` with the `value` attribute and `intent(in)`. This is very similar to the way we treat C pointers, with some magic PETSc pixie dust sprinkled on it to satisfy conditions required by PETSc.