llcreader

This module provides optimized routines for reading LLC data from disk or over the web, via xarray and dask. It was motivated by the need to provide easy access to the NASA-JPL LLC4320 family of simulations, in support of the NASA SWOT Science Team.

Warning

These routines are new and experimental. APIs are subject to change. Eventually the functionality provided by llcreader may become part of the main xmitgcm module.

Limitations

llcreader uses Dask to provide a lazy view of the data; no data are actually loaded into memory until required for computation. The data are represented as a graph, with each node corresponding to a specific chunk of the full array. Because of the vast size of the LLC datasets, the user is advised to read about Dask best practices. If you wish to work with the full-size datasets, you must be aware of Dask's inherent limitations, and first gain experience with smaller problems. In particular, it is quite easy to create very large Dask graphs that can overwhelm your computer's memory and CPU capacity, even without loading any data!

To make this warning more concrete, let's consider a single variable such as Theta from the LLC4320 simulation. In "face" form, the dimensions of this array are 9030 timesteps x 90 vertical levels x 13 faces x 4320 j points x 4320 i points = 197170122240000 values. With 32-bit precision, this requires 788 TB of memory! If we divide this array into 1 GB chunks, we will have roughly 788,000 chunks. The lazy representation of the array requires about 1 kB of memory per chunk, and processing the graph for computation requires about 200 µs per chunk. Therefore, the lazy representation will occupy nearly 1 GB of memory and take nearly 3 minutes even to begin computation. There are 5 such 3D variables in the LLC4320 dataset, plus another 10 2D variables. The bottom line is that creating graphs of this size will seriously bog down a laptop.
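
These figures are easy to verify with back-of-the-envelope arithmetic (plain Python, no dask required):

>>> nt, nz, nfaces, ny, nx = 9030, 90, 13, 4320, 4320
>>> nvalues = nt * nz * nfaces * ny * nx
>>> print(f"{nvalues * 4 / 1e12:.1f} TB")          # total size at 32-bit precision
788.7 TB
>>> nchunks = nvalues * 4 / 1e9                    # number of ~1 GB chunks
>>> print(f"{nchunks * 200e-6 / 60:.1f} minutes")  # graph overhead at 200 µs per chunk
2.6 minutes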

There are several strategies that can be used to mitigate this problem:

  • Subset variables: llcreader allows you to create datasets that contain only the specific variables you need, via the varnames keyword.

  • Subset vertical levels: if you don’t need full depth fields, you can select just the depth levels you need using the k_levels keyword.

  • Subset in time: you can specify the iteration number start, stop, and step manually, via the iter_start, iter_stop, and iter_step keywords.

  • Use larger chunks: llcreader allows you to vary the size of the chunks in the k dimension using the k_chunksize keyword. Using full-depth chunks (k_chunksize=90) will produce chunks up to 20 GB in size, but a much smaller graph.
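
These strategies can be combined in a single get_dataset call. Here is a sketch (the iteration subsetting is purely illustrative; consult the model's iters attribute and defaults for valid values):

>>> from xmitgcm import llcreader
>>> model = llcreader.ECCOPortalLLC4320Model()
>>> ds = model.get_dataset(varnames=['Theta'],              # one variable only
...                        k_levels=[0, 10, 40],            # three depth levels
...                        iter_step=model.iter_step * 24,  # illustrative: every 24th stored output
...                        k_chunksize=3)                   # all selected levels in one chunk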

Because of how the files are compressed and stored on disk, it is unfortunately not possible to subset in space at the time of dataset creation. However, this can be done later, via xarray, as sketched below.
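
For example, once a dataset ds has been created as in the examples below, a region can be carved out lazily with standard xarray indexing (a sketch; the face number and index ranges are arbitrary):

>>> # this only trims the dask graph; no data are read until .load() or .compute()
>>> ds_region = ds.isel(face=1, j=slice(0, 1080), i=slice(2160, 3240))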

Pre-Defined Models

We provide some pre-defined references to existing stores and models that can be used right away. These are:

  • llcreader.ECCOPortalLLC2160Model: LLC2160 accessed via ECCO data portal

  • llcreader.ECCOPortalLLC4320Model: LLC4320 accessed via ECCO data portal

  • llcreader.PleiadesLLC2160Model: LLC2160 accessed on the Pleiades filesystem

  • llcreader.PleiadesLLC4320Model: LLC4320 accessed on the Pleiades filesystem

  • llcreader.CRIOSPortalASTE270Model: ASTE Release 1 accessed via AWS

  • llcreader.SverdrupASTE270Model: ASTE Release 1 accessed on Sverdrup filesystem at UT Austin

Below are a few examples of how to use these.

ECCO HTTP Data Portal

NAS has created an experimental data portal to access the LLC data over the web via standard HTTP calls. More info and a data browser can be found at https://data.nas.nasa.gov/ecco/. The llcreader module provides a way to access these data directly via xarray and dask. These examples can be run from anywhere; however, the speed at which data can be loaded depends on the network bandwidth to the NAS HTTP server and the rate at which this server can extract data from the underlying filesystem.

In this example, we lazily construct and display the whole dataset, following the advice above to use a large k_chunksize. By default, all variables and all timesteps are included:

>>> from xmitgcm import llcreader
>>> model = llcreader.ECCOPortalLLC4320Model()
>>> ds = model.get_dataset(k_chunksize=90)
>>> print(ds)
<xarray.Dataset>
Dimensions:   (face: 13, i: 4320, i_g: 4320, j: 4320, j_g: 4320, k: 90, k_l: 90, k_p1: 91, k_u: 90, time: 9030)
Coordinates:
  * face      (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * i         (i) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * i_g       (i_g) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * j         (j) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * j_g       (j_g) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * k         (k) int64 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
  * k_u       (k_u) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * k_l       (k_l) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * k_p1      (k_p1) int64 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89 90
    niter     (time) int64 ...
  * time      (time) datetime64[ns] 2011-09-13 ... 2012-09-23T05:00:00
    drC       (k_p1) >f4 dask.array<chunksize=(91,), meta=np.ndarray>
    drF       (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    dxC       (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dxF       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dxG       (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyC       (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyF       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyG       (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    hFacC     (k, face, j, i) float32 dask.array<chunksize=(90, 3, 4320, 4320), meta=np.ndarray>
    hFacS     (k, face, j_g, i) float32 dask.array<chunksize=(90, 3, 4320, 4320), meta=np.ndarray>
    hFacW     (k, face, j, i_g) float32 dask.array<chunksize=(90, 3, 4320, 4320), meta=np.ndarray>
    PHrefC    (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    PHrefF    (k_p1) >f4 dask.array<chunksize=(91,), meta=np.ndarray>
    rA        (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    rAs       (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    rAw       (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    Z         (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    Zp1       (k_p1) >f4 dask.array<chunksize=(91,), meta=np.ndarray>
    rhoRef    (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    XC        (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    YC        (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    Zl        (k_l) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    Zu        (k_u) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
Data variables:
    Eta       (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    KPPhbl    (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    oceFWflx  (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    oceQnet   (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    oceQsw    (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    oceSflux  (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    oceTAUX   (time, face, j, i_g) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    oceTAUY   (time, face, j_g, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    PhiBot    (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    Salt      (time, k, face, j, i) float32 dask.array<chunksize=(1, 90, 3, 4320, 4320), meta=np.ndarray>
    SIarea    (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    SIheff    (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    SIhsalt   (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    SIhsnow   (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    SIuice    (time, face, j, i_g) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    SIvice    (time, face, j_g, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    Theta     (time, k, face, j, i) float32 dask.array<chunksize=(1, 90, 3, 4320, 4320), meta=np.ndarray>
    U         (time, k, face, j, i_g) float32 dask.array<chunksize=(1, 90, 3, 4320, 4320), meta=np.ndarray>
    V         (time, k, face, j_g, i) float32 dask.array<chunksize=(1, 90, 3, 4320, 4320), meta=np.ndarray>
    W         (time, k_l, face, j, i) float32 dask.array<chunksize=(1, 90, 3, 4320, 4320), meta=np.ndarray>

This dataset is useless for computations on a laptop, because the individual chunks require nearly 20 GB of memory. Some more practical examples follow.

Get a single 2D variable:

>>> ds = model.get_dataset(varnames=['Eta'])
>>> print(ds)

<xarray.Dataset>
Dimensions:  (face: 13, i: 4320, i_g: 4320, j: 4320, j_g: 4320, k: 90, k_l: 90, k_p1: 91, k_u: 90, time: 9030)
Coordinates:
  * face     (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * i        (i) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * i_g      (i_g) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * j        (j) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * j_g      (j_g) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * k        (k) int64 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
  * k_u      (k_u) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * k_l      (k_l) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * k_p1     (k_p1) int64 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89 90
    niter    (time) int64 ...
  * time     (time) datetime64[ns] 2011-09-13 ... 2012-09-23T05:00:00
    drC      (k_p1) >f4 dask.array<chunksize=(91,), meta=np.ndarray>
    drF      (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    dxC      (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dxF      (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dxG      (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyC      (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyF      (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyG      (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    hFacC    (k, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    hFacS    (k, face, j_g, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    hFacW    (k, face, j, i_g) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    PHrefC   (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    PHrefF   (k_p1) >f4 dask.array<chunksize=(91,), meta=np.ndarray>
    rA       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    rAs      (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    rAw      (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    Z        (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    Zp1      (k_p1) >f4 dask.array<chunksize=(91,), meta=np.ndarray>
    rhoRef   (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    XC       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    YC       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    Zl       (k_l) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    Zu       (k_u) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
Data variables:
    Eta      (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
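
Even over HTTP, modest reductions of this dataset are feasible. For example, one Eta snapshot is 13 x 4320 x 4320 x 4 bytes, roughly 1 GB, so a sketch like the following triggers about 1 GB of network transfer:

>>> eta0 = ds.Eta.isel(time=0)   # a single snapshot, still lazy
>>> eta0.mean().compute()        # downloads the chunks and reduces them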

Get a few vertical levels from some 3D variables:

>>> ds = model.get_dataset(varnames=['Salt', 'Theta'], k_levels=[1, 10, 40])
>>> print(ds)
<xarray.Dataset>
Dimensions:  (face: 13, i: 4320, i_g: 4320, j: 4320, j_g: 4320, k: 3, k_l: 3, k_p1: 6, k_u: 3, time: 9030)
Coordinates:
  * face     (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * i        (i) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * i_g      (i_g) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * j        (j) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * j_g      (j_g) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * k        (k) int64 1 10 40
  * k_u      (k_u) int64 1 10 40
  * k_l      (k_l) int64 1 10 40
  * k_p1     (k_p1) int64 1 2 10 11 40 41
    niter    (time) int64 ...
  * time     (time) datetime64[ns] 2011-09-13 ... 2012-09-23T05:00:00
    drC      (k_p1) >f4 dask.array<chunksize=(6,), meta=np.ndarray>
    drF      (k) >f4 dask.array<chunksize=(3,), meta=np.ndarray>
    dxC      (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dxF      (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dxG      (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyC      (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyF      (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyG      (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    hFacC    (k, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    hFacS    (k, face, j_g, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    hFacW    (k, face, j, i_g) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    PHrefC   (k) >f4 dask.array<chunksize=(3,), meta=np.ndarray>
    PHrefF   (k_p1) >f4 dask.array<chunksize=(6,), meta=np.ndarray>
    rA       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    rAs      (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    rAw      (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    Z        (k) >f4 dask.array<chunksize=(3,), meta=np.ndarray>
    Zp1      (k_p1) >f4 dask.array<chunksize=(6,), meta=np.ndarray>
    rhoRef   (k) >f4 dask.array<chunksize=(3,), meta=np.ndarray>
    XC       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    YC       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    Zl       (k_l) >f4 dask.array<chunksize=(3,), meta=np.ndarray>
    Zu       (k_u) >f4 dask.array<chunksize=(3,), meta=np.ndarray>
Data variables:
    Salt     (time, k, face, j, i) float32 dask.array<chunksize=(1, 1, 3, 4320, 4320), meta=np.ndarray>
    Theta    (time, k, face, j, i) float32 dask.array<chunksize=(1, 1, 3, 4320, 4320), meta=np.ndarray>

Note that when vertical levels are subset like this, any vertical coordinate associated with the dimension k_p1 will include the levels immediately above and below each selected k level; the selected k levels themselves are at cell centers.

A list of all available variables can be seen as follows:

>>> print(model.varnames)
['Eta', 'KPPhbl', 'oceFWflx', 'oceQnet', 'oceQsw', 'oceSflux', 'oceTAUX',
'oceTAUY', 'PhiBot', 'Salt', 'SIarea', 'SIheff', 'SIhsalt', 'SIhsnow',
'SIuice', 'SIvice', 'Theta', 'U', 'V', 'W']

The full set of options for these commands is enumerated at xmitgcm.llcreader.BaseLLCModel.get_dataset().

Pleiades Filesystem

The home of the LLC simulations is NASA's Pleiades supercomputer. llcreader provides classes for quickly initializing and loading these data. These classes only work from Pleiades itself and will raise an error if invoked on other systems.

The Pleiades models work very similarly to the ones defined above:

>>> model = llcreader.PleiadesLLC2160Model()
>>> ds = model.get_dataset(varnames=['Eta'], type='latlon')
>>> print(ds)
<xarray.Dataset>
Dimensions:  (face: 13, i: 8640, i_g: 8640, j: 6480, j_g: 6480, k: 90, k_l: 90, k_p1: 90, k_u: 90, time: 18679)
Coordinates:
  * time     (time) datetime64[ns] 2011-03-06 ... 2013-04-22T06:00:00
  * k_p1     (k_p1) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * face     (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * k        (k) int64 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
  * i        (i) int64 0 1 2 3 4 5 6 7 ... 8633 8634 8635 8636 8637 8638 8639
  * k_u      (k_u) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * j_g      (j_g) int64 0 1 2 3 4 5 6 7 ... 6473 6474 6475 6476 6477 6478 6479
  * i_g      (i_g) int64 0 1 2 3 4 5 6 7 ... 8633 8634 8635 8636 8637 8638 8639
  * k_l      (k_l) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * j        (j) int64 0 1 2 3 4 5 6 7 ... 6473 6474 6475 6476 6477 6478 6479
Data variables:
    Eta      (time, j, i) >f4 dask.array<shape=(18679, 6480, 8640), chunksize=(1, 6480, 2160)>

Because of the high-performance Lustre filesystem on Pleiades, data throughput should be much higher than via the ECCO data portal.

ASTE Release 1 on AWS

Monthly time mean output from the Arctic Subpolar gyre sTate Estimate (ASTE) Release 1 has been made available on AWS servers. ASTE is a medium-resolution, data-constrained, and dynamically consistent ocean-sea ice synthesis spanning 2002-2017. Read more about this effort in [Nguyen et al., 2020].

Users can access this output in essentially the same way as they access LLC4320/2160 output on the ECCO Data Portal. The key differences are:

  • passing type='latlon' to get_dataset is not available, because most of the model grid is not in a regular lat/lon configuration. Note that specifying this for the global models above discards the Arctic, which is most of the ASTE domain!

  • there are 6 “faces” compared to 13 in the global grids

  • the grid is much smaller; for example, a single horizontal level of a 3D temperature field is 6 x 270 x 270 points vs. 13 x 4320 x 4320 above, less than 1% of the size

  • some variables are named differently than the LLC4320/LLC2160 output, following MITgcm standard naming conventions. These are:

    THETA       : Potential Temperature [degC]
    SALT        : Salinity [psu]
    ETAN        : Sea level anomaly [m]
    UVELMASS    : Zonal Velocity [m/s] (mass weighted)
    VVELMASS    : Meridional Velocity [m/s] (mass weighted)
    WVELMASS    : Vertical Velocity [m/s] (mass weighted)
    

Here "mass-weighted" refers to the fact that ASTE uses the time-varying r* vertical coordinate; this changing coordinate has been taken into account during the time averages (see the sketch after the example dataset below).

Example usage to get temperature and salinity:

>>> aste = llcreader.CRIOSPortalASTE270Model()
>>> ds = aste.get_dataset(varnames=['THETA','SALT'])
>>> ds
<xarray.Dataset>
Dimensions:    (face: 6, i: 270, i_g: 270, j: 270, j_g: 270, k: 50, k_l: 50,
k_p1: 51, k_u: 50, time: 193)
Coordinates:
  * face       (face) int64 0 1 2 3 4 5
  * i          (i) int64 0 1 2 3 4 5 6 7 8 ... 262 263 264 265 266 267 268 269
  * i_g        (i_g) int64 0 1 2 3 4 5 6 7 8 ... 262 263 264 265 266 267 268 269
  * j          (j) int64 0 1 2 3 4 5 6 7 8 ... 262 263 264 265 266 267 268 269
  * j_g        (j_g) int64 0 1 2 3 4 5 6 7 8 ... 262 263 264 265 266 267 268 269
  * k          (k) int64 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49
  * k_u        (k_u) int64 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49
  * k_l        (k_l) int64 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49
  * k_p1       (k_p1) int64 0 1 2 3 4 5 6 7 8 9 ... 42 43 44 45 46 47 48 49 50
    niter      (time) int64 ...
  * time       (time) datetime64[ns] 2002-02-01 ... 2018-01-01T01:20:00
    CS         (face, j, i) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    SN         (face, j, i) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    drC        (k_p1) >f8 dask.array<chunksize=(51,), meta=np.ndarray>
    drF        (k) >f8 dask.array<chunksize=(50,), meta=np.ndarray>
    dxC        (face, j, i_g) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    dxG        (face, j_g, i) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    dyC        (face, j_g, i) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    dyG        (face, j, i_g) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    Depth      (face, j, i) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    PHrefC     (k) >f8 dask.array<chunksize=(50,), meta=np.ndarray>
    PHrefF     (k_p1) >f8 dask.array<chunksize=(51,), meta=np.ndarray>
    rA         (face, j, i) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    rAs        (face, j_g, i) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    rAw        (face, j, i_g) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    rAz        (face, j_g, i_g) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    Z          (k) >f8 dask.array<chunksize=(50,), meta=np.ndarray>
    Zp1        (k_p1) >f8 dask.array<chunksize=(51,), meta=np.ndarray>
    rhoRef     (k) >f8 dask.array<chunksize=(50,), meta=np.ndarray>
    XC         (face, j, i) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    XG         (face, j_g, i_g) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    YC         (face, j, i) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    YG         (face, j_g, i_g) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    hFacC      (k, face, j, i) float64 dask.array<chunksize=(1, 2, 270, 270), meta=np.ndarray>
    hFacS      (k, face, j_g, i) float64 dask.array<chunksize=(1, 2, 270, 270), meta=np.ndarray>
    hFacW      (k, face, j, i_g) float64 dask.array<chunksize=(1, 2, 270, 270), meta=np.ndarray>
    maskC      (k, face, j, i) float64 dask.array<chunksize=(1, 2, 270, 270), meta=np.ndarray>
    maskCtrlC  (k, face, j, i) float64 dask.array<chunksize=(1, 2, 270, 270), meta=np.ndarray>
    maskCtrlS  (k, face, j_g, i) float64 dask.array<chunksize=(1, 2, 270, 270), meta=np.ndarray>
    maskCtrlW  (k, face, j, i_g) float64 dask.array<chunksize=(1, 2, 270, 270), meta=np.ndarray>
    maskInC    (face, j, i) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    maskInS    (face, j_g, i) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    maskInW    (face, j, i_g) float64 dask.array<chunksize=(2, 270, 270), meta=np.ndarray>
    maskS      (k, face, j_g, i) float64 dask.array<chunksize=(1, 2, 270, 270), meta=np.ndarray>
    maskW      (k, face, j, i_g) float64 dask.array<chunksize=(1, 2, 270, 270), meta=np.ndarray>
    Zl         (k_l) >f8 dask.array<chunksize=(50,), meta=np.ndarray>
    Zu         (k_u) >f8 dask.array<chunksize=(50,), meta=np.ndarray>
Data variables:
    THETA      (time, k, face, j, i) float32 dask.array<chunksize=(1, 1, 2, 270, 270), meta=np.ndarray>
    SALT       (time, k, face, j, i) float32 dask.array<chunksize=(1, 1, 2, 270, 270), meta=np.ndarray>
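
As noted above, the velocity diagnostics are mass-weighted. If an approximate unweighted velocity is needed, a common workaround (not part of the llcreader API, and only approximate because it neglects the time variation of r*) is to divide by the static open-cell fraction:

>>> ds = aste.get_dataset(varnames=['UVELMASS'])
>>> # hFacW is the static open-cell thickness fraction at west cell faces;
>>> # the .where() masks land points to avoid division by zero
>>> u_approx = ds.UVELMASS / ds.hFacW.where(ds.hFacW > 0)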

All available diagnostics are shown here:

>>> aste.varnames
['ADVr_SLT', 'ADVr_TH', 'ADVxHEFF', 'ADVxSNOW', 'ADVx_SLT', 'ADVx_TH',
'ADVyHEFF', 'ADVySNOW', 'ADVy_SLT', 'ADVy_TH', 'DETADT2', 'DFrE_SLT', 'DFrE_TH',
'DFrI_SLT', 'DFrI_TH', 'DFxEHEFF', 'DFxESNOW', 'DFxE_SLT', 'DFxE_TH',
'DFyEHEFF', 'DFyESNOW', 'DFyE_SLT', 'DFyE_TH', 'ETAN', 'ETANSQ', 'GM_PsiX',
'GM_PsiY', 'KPPg_SLT', 'KPPg_TH', 'MXLDEPTH', 'PHIBOT', 'SALT', 'SFLUX',
'SIaaflux', 'SIacSubl', 'SIarea', 'SIatmFW', 'SIatmQnt', 'SIheff', 'SIhsnow',
'SIsnPrcp', 'SItflux', 'SIuice', 'SIvice', 'SRELAX', 'TFLUX', 'THETA', 'TRELAX',
'UVELMASS', 'VVELMASS', 'WSLTMASS', 'WTHMASS', 'WVELMASS', 'oceFWflx',
'oceQnet', 'oceQsw', 'oceSPDep', 'oceSPflx', 'oceSPtnd', 'oceSflux', 'oceTAUX',
'oceTAUY', 'sIceLoad']

Nguyen, A. T., H. Pillar, V. Ocana, A. Bigdeli, T. A. Smith, and P. Heimbach, 2020: The Arctic Subpolar gyre sTate Estimate (ASTE): Description and assessment of a data-constrained, dynamically consistent ocean-sea ice estimate for 2002-2017. J. Adv. Model. Earth Syst., submitted. https://doi.org/10.1002/essoar.10504669.3

ASTE Release 1 on Sverdrup

Much in the same way that LLC4320/2160 output is available on Pleiades, ASTE Release 1 is available on Sverdrup, a cluster at the University of Texas at Austin. Those with access can get Release 1 output with:

>>> aste = llcreader.SverdrupASTE270Model()

Manual Dataset Creation

Another way to use this module is to manually create the necessary objects. This would be needed if you are working with LLC output that is not stored in a known location. First we create a Store object:

>>> from xmitgcm import llcreader
>>> from fsspec.implementations.local import LocalFileSystem
>>> fs = LocalFileSystem()
>>> store = llcreader.BaseStore(fs, base_path='/path/to/global_oce_llc90')

Then we use this to initialize a Model object:

>>> model = llcreader.LLC90Model(store)

From this object, we can create datasets:

>>> ds_faces = model.get_dataset(varnames=['S', 'T', 'U', 'V', 'Eta'],
...                              iter_start=0, iter_stop=9, iter_step=8)
>>> ds_faces
<xarray.Dataset>
Dimensions:  (face: 13, i: 90, i_g: 90, j: 90, j_g: 90, k: 50, k_l: 50, k_p1: 50, k_u: 50, time: 2)
Coordinates:
  * face     (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * i        (i) int64 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
  * i_g      (i_g) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * j        (j) int64 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
  * j_g      (j_g) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * k        (k) int64 0 1 2 3 4 5 6 7 8 9 10 ... 40 41 42 43 44 45 46 47 48 49
  * k_u      (k_u) int64 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49
  * k_l      (k_l) int64 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49
  * k_p1     (k_p1) int64 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49
    niter    (time) int64 ...
  * time     (time) datetime64[ns] 1948-01-01T12:00:00 1948-01-01T20:00:00
Data variables:
    S        (time, k, face, j, i) >f4 dask.array<shape=(2, 50, 13, 90, 90), chunksize=(1, 1, 3, 90, 90)>
    T        (time, k, face, j, i) >f4 dask.array<shape=(2, 50, 13, 90, 90), chunksize=(1, 1, 3, 90, 90)>
    U        (time, k, face, j, i_g) >f4 dask.array<shape=(2, 50, 13, 90, 90), chunksize=(1, 1, 3, 90, 90)>
    V        (time, k, face, j_g, i) >f4 dask.array<shape=(2, 50, 13, 90, 90), chunksize=(1, 1, 3, 90, 90)>
    Eta      (time, face, j, i) >f4 dask.array<shape=(2, 13, 90, 90), chunksize=(1, 3, 90, 90)>
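
The faces dataset can then be converted to a rectangular quasi-lat-lon layout, discarding the Arctic face, with the utility function documented below:

>>> ds_latlon = llcreader.faces_dataset_to_latlon(ds_faces)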

There are many options you can pass to get_dataset to control the output and dask chunking. See the class documentation for more details: xmitgcm.llcreader.BaseLLCModel.get_dataset().

API Documentation

Models

class xmitgcm.llcreader.BaseLLCModel(store)[source]

Class representing an LLC Model Dataset.

Parameters

store : llcreader.BaseStore

The store object where the data can be found

mask_ds : zarr.Group

Must contain the variables mask_c, mask_w, mask_s

Attributes

dtype

(numpy.dtype) Datatype of the data in the dataset

nx

(int) Number of gridpoints per face (e.g. 90, 1080, 4320, etc.)

nz

(int) Number of vertical gridpoints

delta_t

(float) Numerical timestep

time_units

(str) Date unit string, e.g. 'seconds since 1948-01-01 12:00:00'

iter_start

(int) First model iteration number (inclusive; follows python range conventions)

iter_stop

(int) Final model iteration number (exclusive; follows python range conventions)

iter_step

(int) Spacing between iterations

iters

(list of ints) Specific iteration numbers in a list, possibly with nonuniform spacing. Either provide this or the iter parameters above.

varnames, grid_varnames

(list) List of data variable and grid variable names contained in the dataset

mask_override

(dict) Override inference of masking variable, e.g. {'oceTAUX': 'c'}

get_dataset(varnames=None, iter_start=None, iter_stop=None, iter_step=None, iters=None, k_levels=None, k_chunksize=1, type='faces', read_grid=True, grid_vars_to_coords=True)[source]

Create an xarray Dataset object for this model.

Parameters

varnames : list of strings, optional

The variables to include, e.g. ['Salt', 'Theta']. Otherwise include all known variables.

iter_start : int, optional

Starting iteration number (inclusive, following standard range conventions). Otherwise use model default.

iter_stop : int, optional

Stopping iteration number (exclusive, following standard range conventions). Otherwise use model default.

iter_step : int, optional

Iteration number stepsize. Otherwise use model default.

iters : list of ints, optional

Specific iteration numbers in a list, possibly with nonuniform spacing. Either provide this or the iter parameters above.

k_levels : list of ints, optional

Vertical levels to extract. Default is to get them all.

k_chunksize : int, optional

How many vertical levels per Dask chunk.

type : {‘faces’, ‘latlon’}, optional

What type of dataset to create

read_grid : bool, optional

Whether to read the grid info

grid_vars_to_coords : bool, optional

Whether to promote grid variables to coordinate status

Returns

ds : xarray.Dataset

Stores

class xmitgcm.llcreader.BaseStore(fs, base_path='/', shrunk=False, mask_fs=None, mask_path=None, grid_fs=None, grid_path=None, shrunk_grid=False, join_char=None)[source]

Basic storage class for LLC data.

Parameters

fs : fsspec.AbstractFileSystem

base_path : str, optional

Where to find the data within the filesystem

shrunk : bool, optional

Whether the data files have been tagged with .shrunk

mask_fs, grid_fs : fsspec.AbstractFileSystem, optional

Where to find the mask or grid datasets to decode the compression

mask_path, grid_path : str, optional

Path to the mask or grid datasets on the mask_fs or grid_fs filesystem

shrunk_grid : bool, optional

Whether the grid files have been tagged with .shrunk; this is not always the same as for the data variables

join_char : str or None

Character to use to join paths. Falls back on os.path.join if None.

get_fs_and_full_path(varname, iternum)[source]

Return references to a filesystem and path within it for a specific variable and iteration number.

Parameters

varname : str

iternum : int

Returns

fs : fsspec.AbstractFileSystem

The filesystem where the file can be found

path : str

The path to open

open_data_file(varname, iternum)[source]

Open the file for a specific variable and iteration number.

Parameters

varname : str

iternum : int

Returns

fobj : file-like object

open_mask_group()[source]

Open the zarr group that contains the masks

Returns

mask_group : zarr.Group

class xmitgcm.llcreader.NestedStore(fs, base_path='/', shrunk=False, mask_fs=None, mask_path=None, grid_fs=None, grid_path=None, shrunk_grid=False, join_char=None)[source]

Store where the variables are stored in subdirectories according to iteration number.

Utility Functions

The llcreader module also has a standalone function for converting 13-face-style LLC datasets into rectangular quasi-lat-lon datasets.

xmitgcm.llcreader.faces_dataset_to_latlon(ds, metric_vector_pairs=[('dxC', 'dyC'), ('dyG', 'dxG')])[source]

Transform a 13-face LLC xarray Dataset into a rectangular grid, discarding the Arctic.

Parameters

ds : xarray.Dataset

A 13-face LLC dataset

metric_vector_pairs : list, optional

Pairs of variables that are positive-definite metrics located at grid edges.

Returns

out : xarray.Dataset

Transformed rectangular dataset

Developer Notes

Documentation for some complicated low-level functions, with the ASTE 270 domain as an example.

xmitgcm.llcreader.llcmodel._pad_facet(data, facet_shape, reshape, pad_before, pad_after, dtype)[source]

Add padding to facet data that are irregularly shaped, returning data of size facet_shape to make equally sized faces. See https://xmitgcm.readthedocs.io/en/latest/llcreader.html#aste-release-1-example for an example.

Parameters

data : array like

data to be reshaped

facet_shape : tuple

“expected” facet shape determined by the _facet_shape function

reshape : bool

Whether to reshape each face; True if the facet is "rotated". This determines the axis to which pad_before and pad_after refer.

pad_before, pad_after : int

Size of padding added to either the i or j dimension; pad_before puts the padding before the data, pad_after puts it after.

dtype : numpy.dtype

Datatype of the data

Returns

padded_data : array like

Array of shape facet_shape, with NaNs for padding

ASTE Release 1 Example

The ASTE model output has the following unique definitions, and the figure below shows two examples of the _pad_facet function.

  • nx = 270

  • pad_before = [90,0,0,0,0]

  • pad_after = [0,0,0,90,90]

  • nfaces = 6

  • _facet_strides(6) = ( (0,2), (2,2), (2,3), (3,4), (4,6) )

[Figure: aste_pad.png, showing the two facet-padding examples described below]

Left: faces 0 and 1 live on facet 0, as shown by _facet_strides. This facet has an expected shape of (2*nx, nx) = (540, 270), as determined by _facet_shape. The data shape is only (450, 270), however, as shown in color. To make two nx x nx tiles, the facet must be padded with a (90, 270) array, shown in white. On this facet, reshape = False, so the j dimension is padded.

Right: face 3 lives on facet 2, and the expected data shape would be (nx, nx) = (270, 270), but the data only cover (270, 180), shown in color. Therefore, the data are padded with an array of size (270, 90), shown in white; the padding is added along the i dimension since reshape = True for this facet.
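
The facet-0 case on the left can be mimicked with plain numpy (a minimal sketch, not the actual _pad_facet implementation):

>>> import numpy as np
>>> nx = 270
>>> data = np.zeros((450, nx), dtype='>f4')   # facet 0 data as stored
>>> # pad_before[0] = 90: a (90, 270) block of NaNs goes before the data
>>> # along j, giving the expected facet shape (2*nx, nx) = (540, 270)
>>> padded = np.pad(data, ((90, 0), (0, 0)), constant_values=np.nan)
>>> faces = padded.reshape(2, nx, nx)         # two nx-by-nx faces
>>> faces.shape
(2, 270, 270)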