llcreader

This module provides optimized routines for reading LLC data from disk or over the web, via xarray and dask. This was motivated explicitly by the need to provide easy access to the NASA-JPL LLC4320 family of simulations, in support of the NASA SWOT Science Team.

Warning

These routines are new and experimental. APIs are subject to change. Eventually the functionality provided by llcreader may become part of the main xmitgcm module.

Limitations

llcreader uses Dask to provide a lazy view of the data; no data are actually loaded into memory until required for computation. The data are represented using a graph, with each node corresponding to a specific chunk of the full array. Because of the vast size of the LLC datasets, the user is advised to read about Dask best practices. If you wish to work with the full-size datasets, you must be aware of Dask’s inherent limitations and first gain experience with smaller problems. In particular, it is quite easy to create very large Dask graphs that can overwhelm your computer’s memory and CPU capacity, even without loading any data!

To make this warning more concrete, let’s consider a single variable such as Theta from the LLC4320 simulation. In “face” form, the dimensions of this array are 9030 timesteps x 90 vertical levels x 13 faces x 4320 j points x 4320 i points = 197170122240000 values. With 32-bit precision, this requires 788 TB of memory! If we divide this array into 1 GB chunks, we will have about 788,000 chunks. The lazy representation of the array requires about 1 kB of memory per chunk, and processing the graph for computation requires about 200 µs per chunk. Therefore, the lazy representation alone will occupy nearly 1 GB of memory, and it will take nearly 3 minutes just to begin computation. There are 5 such 3D variables in the LLC4320 dataset, plus another 10 2D variables. The bottom line is that creating graphs of this size will seriously bog down a laptop.
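The arithmetic above can be checked directly. This is a back-of-the-envelope sketch; the per-chunk overhead figures (~1 kB of memory, ~200 µs of graph processing) are the rough estimates quoted above, not measured values:

```python
# Rough size estimate for one 3D LLC4320 variable (e.g. Theta), in "faces" form.
ntime, nk, nface, nj, ni = 9030, 90, 13, 4320, 4320
nvalues = ntime * nk * nface * nj * ni
nbytes = nvalues * 4  # 32-bit floats

tb = nbytes / 1e12
print(f"{nvalues:,} values -> {tb:.1f} TB")  # ~788 TB, as quoted above

# Divide into ~1 GB chunks and estimate graph overhead, assuming
# ~1 kB of memory and ~200 µs of processing time per chunk.
nchunks = nbytes / 1e9
graph_mem_gb = nchunks * 1e3 / 1e9
startup_minutes = nchunks * 200e-6 / 60
print(f"{nchunks:,.0f} chunks, ~{graph_mem_gb:.1f} GB of graph, ~{startup_minutes:.1f} min to start")
```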

There are several strategies that can be used to mitigate this problem:

  • Subset variables: llcreader allows you to create datasets that contain only the specific variables you need, via the varnames keyword.

  • Subset vertical levels: if you don’t need full depth fields, you can select just the depth levels you need using the k_levels keyword.

  • Subset in time: you can specify the iteration number start, stop, and step manually.

  • Use larger chunks: llcreader allows you to vary the size of the chunks in the k dimension using the k_chunksize keyword. Using full-depth chunks (k_chunksize=90) will produce chunks up to 20 GB in size, but a much smaller graph.

Because of how the files are compressed and stored on disk, it is unfortunately not possible to subset in space at the time of dataset creation. However, this can be done later via xarray.
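Spatial subsetting after the fact is ordinary xarray indexing. The sketch below uses a small synthetic dataset with the same dimension names (face, j, i) standing in for a real llcreader dataset; with a real dataset, the same .isel call selects lazily, without loading any data:

```python
import numpy as np
import xarray as xr

# Toy stand-in for a dataset returned by model.get_dataset() (reduced sizes).
ds = xr.Dataset(
    {"Eta": (("time", "face", "j", "i"), np.zeros((2, 13, 48, 48), dtype="f4"))},
    coords={"face": np.arange(13), "j": np.arange(48), "i": np.arange(48)},
)

# Select one face and a rectangular window of grid points.
region = ds.Eta.isel(face=2, j=slice(10, 30), i=slice(20, 40))
print(region.shape)  # (2, 20, 20)
```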

Pre-Defined Models

We provide some pre-defined references to existing stores and models that can be used right away. These are:

  • llcreader.ECCOPortalLLC2160Model: LLC2160 accessed via ECCO data portal

  • llcreader.ECCOPortalLLC4320Model: LLC4320 accessed via ECCO data portal

  • llcreader.PleiadesLLC2160Model: LLC2160 accessed on the Pleiades filesystem

  • llcreader.PleiadesLLC4320Model: LLC4320 accessed on the Pleiades filesystem

Below are a few examples of how to use these.

ECCO HTTP Data Portal

NAS has created an experimental data portal to access the LLC data over the web via standard HTTP calls. More info and a data browser can be found at https://data.nas.nasa.gov/ecco/. The llcreader module provides a way to access these data directly via xarray and dask. These examples can be run from anywhere; however, the speed at which data can be loaded depends on the network bandwidth to the NAS HTTP server and the rate at which that server can extract data from the underlying filesystem.

In this example, we display the whole dataset lazily, following the advice above to use a large k_chunksize. By default, all variables and all timesteps are loaded:

>>> from xmitgcm import llcreader
>>> model = llcreader.ECCOPortalLLC4320Model()
>>> ds = model.get_dataset(k_chunksize=90)
>>> print(ds)
<xarray.Dataset>
Dimensions:   (face: 13, i: 4320, i_g: 4320, j: 4320, j_g: 4320, k: 90, k_l: 90, k_p1: 91, k_u: 90, time: 9030)
Coordinates:
  * face      (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * i         (i) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * i_g       (i_g) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * j         (j) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * j_g       (j_g) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * k         (k) int64 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
  * k_u       (k_u) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * k_l       (k_l) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * k_p1      (k_p1) int64 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89 90
    niter     (time) int64 ...
  * time      (time) datetime64[ns] 2011-09-13 ... 2012-09-23T05:00:00
    drC       (k_p1) >f4 dask.array<chunksize=(91,), meta=np.ndarray>
    drF       (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    dxC       (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dxF       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dxG       (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyC       (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyF       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyG       (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    hFacC     (k, face, j, i) float32 dask.array<chunksize=(90, 3, 4320, 4320), meta=np.ndarray>
    hFacS     (k, face, j_g, i) float32 dask.array<chunksize=(90, 3, 4320, 4320), meta=np.ndarray>
    hFacW     (k, face, j, i_g) float32 dask.array<chunksize=(90, 3, 4320, 4320), meta=np.ndarray>
    PHrefC    (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    PHrefF    (k_p1) >f4 dask.array<chunksize=(91,), meta=np.ndarray>
    rA        (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    rAs       (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    rAw       (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    Z         (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    Zp1       (k_p1) >f4 dask.array<chunksize=(91,), meta=np.ndarray>
    rhoRef    (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    XC        (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    YC        (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    Zl        (k_l) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    Zu        (k_u) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
Data variables:
    Eta       (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    KPPhbl    (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    oceFWflx  (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    oceQnet   (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    oceQsw    (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    oceSflux  (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    oceTAUX   (time, face, j, i_g) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    oceTAUY   (time, face, j_g, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    PhiBot    (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    Salt      (time, k, face, j, i) float32 dask.array<chunksize=(1, 90, 3, 4320, 4320), meta=np.ndarray>
    SIarea    (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    SIheff    (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    SIhsalt   (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    SIhsnow   (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    SIuice    (time, face, j, i_g) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    SIvice    (time, face, j_g, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    Theta     (time, k, face, j, i) float32 dask.array<chunksize=(1, 90, 3, 4320, 4320), meta=np.ndarray>
    U         (time, k, face, j, i_g) float32 dask.array<chunksize=(1, 90, 3, 4320, 4320), meta=np.ndarray>
    V         (time, k, face, j_g, i) float32 dask.array<chunksize=(1, 90, 3, 4320, 4320), meta=np.ndarray>
    W         (time, k_l, face, j, i) float32 dask.array<chunksize=(1, 90, 3, 4320, 4320), meta=np.ndarray>

This dataset is useless for computations on a laptop, because the individual chunks require nearly 20 GB of memory. Some more practical examples follow.

Get a single 2D variable:

>>> ds = model.get_dataset(varnames=['Eta'])
>>> print(ds)

<xarray.Dataset>
Dimensions:  (face: 13, i: 4320, i_g: 4320, j: 4320, j_g: 4320, k: 90, k_l: 90, k_p1: 91, k_u: 90, time: 9030)
Coordinates:
  * face     (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * i        (i) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * i_g      (i_g) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * j        (j) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * j_g      (j_g) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * k        (k) int64 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
  * k_u      (k_u) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * k_l      (k_l) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * k_p1     (k_p1) int64 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89 90
    niter    (time) int64 ...
  * time     (time) datetime64[ns] 2011-09-13 ... 2012-09-23T05:00:00
    drC      (k_p1) >f4 dask.array<chunksize=(91,), meta=np.ndarray>
    drF      (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    dxC      (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dxF      (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dxG      (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyC      (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyF      (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyG      (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    hFacC    (k, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    hFacS    (k, face, j_g, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    hFacW    (k, face, j, i_g) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    PHrefC   (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    PHrefF   (k_p1) >f4 dask.array<chunksize=(91,), meta=np.ndarray>
    rA       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    rAs      (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    rAw      (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    Z        (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    Zp1      (k_p1) >f4 dask.array<chunksize=(91,), meta=np.ndarray>
    rhoRef   (k) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    XC       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    YC       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    Zl       (k_l) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
    Zu       (k_u) >f4 dask.array<chunksize=(90,), meta=np.ndarray>
Data variables:
    Eta      (time, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
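
With a lazy dataset like this, the key pattern is to select a small piece before triggering computation with .load() or .compute(). The sketch below illustrates the pattern with a toy dask-backed array of the same layout, since operating on the real dataset requires network access to the ECCO portal:

```python
import dask.array as da
import xarray as xr

# Toy stand-in for the lazy Eta variable above (real dims: time, face, j, i).
eta = xr.DataArray(
    da.zeros((10, 13, 64, 64), chunks=(1, 3, 64, 64), dtype="f4"),
    dims=("time", "face", "j", "i"),
)

# Nothing is read until .load(); select a small piece first, then load it.
snapshot = eta.isel(time=0, face=2).load()
print(snapshot.shape)  # (64, 64)
```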

Get a few vertical levels from some 3D variables:

>>> ds = model.get_dataset(varnames=['Salt', 'Theta'], k_levels=[1, 10, 40])
>>> print(ds)
<xarray.Dataset>
Dimensions:  (face: 13, i: 4320, i_g: 4320, j: 4320, j_g: 4320, k: 3, k_l: 3, k_p1: 6, k_u: 3, time: 9030)
Coordinates:
  * face     (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * i        (i) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * i_g      (i_g) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * j        (j) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * j_g      (j_g) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * k        (k) int64 1 10 40
  * k_u      (k_u) int64 1 10 40
  * k_l      (k_l) int64 1 10 40
  * k_p1     (k_p1) int64 1 2 10 11 40 41
    niter    (time) int64 ...
  * time     (time) datetime64[ns] 2011-09-13 ... 2012-09-23T05:00:00
    drC      (k_p1) >f4 dask.array<chunksize=(6,), meta=np.ndarray>
    drF      (k) >f4 dask.array<chunksize=(3,), meta=np.ndarray>
    dxC      (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dxF      (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dxG      (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyC      (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyF      (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    dyG      (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    hFacC    (k, face, j, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    hFacS    (k, face, j_g, i) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    hFacW    (k, face, j, i_g) float32 dask.array<chunksize=(1, 3, 4320, 4320), meta=np.ndarray>
    PHrefC   (k) >f4 dask.array<chunksize=(3,), meta=np.ndarray>
    PHrefF   (k_p1) >f4 dask.array<chunksize=(6,), meta=np.ndarray>
    rA       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    rAs      (face, j_g, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    rAw      (face, j, i_g) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    Z        (k) >f4 dask.array<chunksize=(3,), meta=np.ndarray>
    Zp1      (k_p1) >f4 dask.array<chunksize=(6,), meta=np.ndarray>
    rhoRef   (k) >f4 dask.array<chunksize=(3,), meta=np.ndarray>
    XC       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    YC       (face, j, i) float32 dask.array<chunksize=(3, 4320, 4320), meta=np.ndarray>
    Zl       (k_l) >f4 dask.array<chunksize=(3,), meta=np.ndarray>
    Zu       (k_u) >f4 dask.array<chunksize=(3,), meta=np.ndarray>
Data variables:
    Salt     (time, k, face, j, i) float32 dask.array<chunksize=(1, 1, 3, 4320, 4320), meta=np.ndarray>
    Theta    (time, k, face, j, i) float32 dask.array<chunksize=(1, 1, 3, 4320, 4320), meta=np.ndarray>

Note that when vertical levels are subset like this, any vertical coordinate associated with the dimension k_p1 will include the levels above and below each selected k level, since the k_levels values refer to cell centers.
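The k_p1 values in the example above (1 2 10 11 40 41) can be reproduced with a few lines of plain Python. This is a sketch of the indexing logic only, not necessarily llcreader’s internal implementation:

```python
def k_p1_for(k_levels):
    """For each selected cell-center level k, include the cell
    interfaces above (k) and below (k + 1), without duplicates."""
    return sorted({k for lev in k_levels for k in (lev, lev + 1)})

print(k_p1_for([1, 10, 40]))  # [1, 2, 10, 11, 40, 41]
```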

A list of all available variables can be seen as follows:

>>> print(model.varnames)
['Eta', 'KPPhbl', 'oceFWflx', 'oceQnet', 'oceQsw', 'oceSflux', 'oceTAUX',
'oceTAUY', 'PhiBot', 'Salt', 'SIarea', 'SIheff', 'SIhsalt', 'SIhsnow',
'SIuice', 'SIvice', 'Theta', 'U', 'V', 'W']

The full set of options for these commands is enumerated at xmitgcm.llcreader.BaseLLCModel.get_dataset().

Pleiades Filesystem

The home of the LLC simulations is NASA’s Pleiades supercomputer. llcreader provides classes for quickly initializing and loading these data. These classes only work from Pleiades itself and will raise an error if invoked on other systems.

The Pleiades models work very similarly to the ones defined above:

>>> model = llcreader.PleiadesLLC2160Model()
>>> ds = model.get_dataset(varnames=['Eta'], type='latlon')
>>> print(ds)
<xarray.Dataset>
Dimensions:  (face: 13, i: 8640, i_g: 8640, j: 6480, j_g: 6480, k: 90, k_l: 90, k_p1: 90, k_u: 90, time: 18679)
Coordinates:
  * time     (time) datetime64[ns] 2011-03-06 ... 2013-04-22T06:00:00
  * k_p1     (k_p1) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * face     (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * k        (k) int64 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
  * i        (i) int64 0 1 2 3 4 5 6 7 ... 8633 8634 8635 8636 8637 8638 8639
  * k_u      (k_u) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * j_g      (j_g) int64 0 1 2 3 4 5 6 7 ... 6473 6474 6475 6476 6477 6478 6479
  * i_g      (i_g) int64 0 1 2 3 4 5 6 7 ... 8633 8634 8635 8636 8637 8638 8639
  * k_l      (k_l) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * j        (j) int64 0 1 2 3 4 5 6 7 ... 6473 6474 6475 6476 6477 6478 6479
Data variables:
    Eta      (time, j, i) >f4 dask.array<shape=(18679, 6480, 8640), chunksize=(1, 6480, 2160)>

Because of the high-performance Lustre filesystem on Pleiades, data throughput should be much higher than via the ECCO data portal.

Manual Dataset Creation

Another way to use this module is to manually create the necessary objects. This would be needed if you are working with LLC output that is not stored in a known location. First we create a Store object:

>>> from xmitgcm import llcreader
>>> from fsspec.implementations.local import LocalFileSystem
>>> fs = LocalFileSystem()
>>> store = llcreader.BaseStore(fs, base_path='/path/to/global_oce_llc90')

Then we use this to initialize a Model object:

>>> model = llcreader.LLC90Model(store)

From this object, we can create datasets:

>>> ds_faces = model.get_dataset(varnames=['S', 'T', 'U', 'V', 'Eta'],
...                              iter_start=0, iter_stop=9, iter_step=8)
>>> ds_faces
<xarray.Dataset>
Dimensions:  (face: 13, i: 90, i_g: 90, j: 90, j_g: 90, k: 50, k_l: 50, k_p1: 50, k_u: 50, time: 2)
Coordinates:
  * face     (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * i        (i) int64 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
  * i_g      (i_g) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * j        (j) int64 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
  * j_g      (j_g) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * k        (k) int64 0 1 2 3 4 5 6 7 8 9 10 ... 40 41 42 43 44 45 46 47 48 49
  * k_u      (k_u) int64 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49
  * k_l      (k_l) int64 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49
  * k_p1     (k_p1) int64 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49
    niter    (time) int64 ...
  * time     (time) datetime64[ns] 1948-01-01T12:00:00 1948-01-01T20:00:00
Data variables:
    S        (time, k, face, j, i) >f4 dask.array<shape=(2, 50, 13, 90, 90), chunksize=(1, 1, 3, 90, 90)>
    T        (time, k, face, j, i) >f4 dask.array<shape=(2, 50, 13, 90, 90), chunksize=(1, 1, 3, 90, 90)>
    U        (time, k, face, j, i_g) >f4 dask.array<shape=(2, 50, 13, 90, 90), chunksize=(1, 1, 3, 90, 90)>
    V        (time, k, face, j_g, i) >f4 dask.array<shape=(2, 50, 13, 90, 90), chunksize=(1, 1, 3, 90, 90)>
    Eta      (time, face, j, i) >f4 dask.array<shape=(2, 13, 90, 90), chunksize=(1, 3, 90, 90)>

There are many options you can pass to get_dataset to control the output and dask chunking. See the class documentation for more details: xmitgcm.llcreader.BaseLLCModel.get_dataset().

API Documentation

Models

class xmitgcm.llcreader.BaseLLCModel(store)[source]

Class representing an LLC Model Dataset.

Parameters

store : llcreader.BaseStore

The store object where the data can be found

mask_ds : zarr.Group

Must contain variables mask_c, mask_w, mask_s

Attributes

dtype

(numpy.dtype) Datatype of the data in the dataset

nx

(int) Number of gridpoints per face (e.g. 90, 1080, 4320, etc.)

nz

(int) Number of vertical gridpoints

delta_t

(float) Numerical timestep

time_units

(str) Date unit string, e.g. ‘seconds since 1948-01-01 12:00:00’

iter_start

(int) First model iteration number (inclusive; follows python range conventions)

iter_stop

(int) Final model iteration number (exclusive; follows python range conventions)

iter_step

(int) Spacing between iterations

varnames, grid_varnames

(list) List of data variable and grid variable names contained in the dataset

mask_override

(dict) Override inference of masking variable, e.g. {'oceTAUX': 'c'}

get_dataset(varnames=None, iter_start=None, iter_stop=None, iter_step=None, k_levels=None, k_chunksize=1, type='faces', read_grid=True, grid_vars_to_coords=True)[source]

Create an xarray Dataset object for this model.

Parameters

varnames : list of strings, optional

The variables to include, e.g. ['Salt', 'Theta']. Otherwise include all known variables.

iter_start : int, optional

Starting iteration number. Otherwise use model default. Follows standard range conventions. (inclusive)

iter_stop : int, optional

Stopping iteration number. Otherwise use model default. Follows standard range conventions. (exclusive)

iter_step : int, optional

Iteration number stepsize. Otherwise use model default.

k_levels : list of ints, optional

Vertical levels to extract. Default is to get them all

k_chunksize : int, optional

How many vertical levels per Dask chunk.

type : {‘faces’, ‘latlon’}, optional

What type of dataset to create

read_grid : bool, optional

Whether to read the grid info

grid_vars_to_coords : bool, optional

Whether to promote grid variables to coordinate status

Returns

ds : xarray.Dataset

Stores

class xmitgcm.llcreader.BaseStore(fs, base_path='/', shrunk=False, mask_fs=None, mask_path=None, grid_fs=None, grid_path=None, shrunk_grid=False, join_char=None)[source]

Basic storage class for LLC data.

Parameters

fs : fsspec.AbstractFileSystem

base_path : str, optional

Where to find the data within the filesystem

shrunk : bool, optional

Whether the data files have been tagged with .shrunk

mask_fs, grid_fs : fsspec.AbstractFileSystem, optional

Where to find the mask or grid datasets to decode the compression

mask_path, grid_path : str, optional

Path to the mask or grid datasets on the mask_fs or grid_fs filesystem

shrunk_grid : bool, optional

Whether the grid files have been tagged with .shrunk; this is not always the same as for the data variables

join_char : str or None

Character to use to join paths. Falls back on os.path.join if None.

get_fs_and_full_path(varname, iternum)[source]

Return references to a filesystem and path within it for a specific variable and iteration number.

Parameters

varname : str

iternum : int

Returns

fs : fsspec.AbstractFileSystem

The filesystem where the file can be found

path : str

The path to open

open_data_file(varname, iternum)[source]

Open the file for a specific variable and iteration number.

Parameters

varname : str

iternum : int

Returns

fobj : file-like object

open_mask_group()[source]

Open the zarr group that contains the masks

Returns

mask_group : zarr.Group

class xmitgcm.llcreader.NestedStore(fs, base_path='/', shrunk=False, mask_fs=None, mask_path=None, grid_fs=None, grid_path=None, shrunk_grid=False, join_char=None)[source]

Store where the variables are stored in subdirectories according to iteration number.

Utility Functions

The llcreader module also has a standalone function for converting 13-face-style LLC datasets into rectangular quasi-lat-lon datasets.

xmitgcm.llcreader.faces_dataset_to_latlon(ds, metric_vector_pairs=['dxC', 'dyC', 'dyG', 'dxG'])[source]

Transform a 13-face LLC xarray Dataset into a rectangular grid, discarding the Arctic.

Parameters

ds : xarray.Dataset

A 13-face LLC dataset

metric_vector_pairs : list, optional

Pairs of variables that are positive-definite metrics located at grid edges.

Returns

out : xarray.Dataset

Transformed rectangular dataset