11# netchdf
2- _last updated: 5/17/2025_
2+ _last updated: 5/22/2025_
33
44This is a rewrite in Kotlin of parts of the devcdm and netcdf-java libraries.
55
@@ -17,24 +17,24 @@ The Netcdf-Java library prototyped a "Common Data Model" (CDM) to provide a sing
1717The netcdf* and hdf* file formats are similar enough to make a common API a practical and useful goal.
1818By focusing on read-only access to just these formats, the API and the code are kept simple.
1919
20- In short, a library that focuses on simplicity and clarity is a safeguard for the huge investment in these
20+ In short, a library that focuses on simplicity and clarity is a safeguard for the irreplaceable investment in these
2121scientific datasets.
2222
2323### Why do we need another library besides the standard reference libraries?
2424
25- Its a huge advantage to have independent implementations of any standard. If you dont have multiple implementations, its
26- very easy for the single implementator to mistake the implementation for the actual standard. Its easy to hide problems
25+ It's necessary to have independent implementations of any standard. If you don't have multiple implementations, it's
26+ easy for the single implementer to mistake the implementation for the actual standard. It's easy to hide problems
2727that are actually in the standard by adding work-arounds in the code, instead of documenting problems and creating new
28- versions with clear fixes. For Netcdf/Hdf, the standard is the file formats, along with their semantic descriptions. The API
29- is language and library specific, and is secondary to the standard.
28+ versions of the standard with clear fixes. For Netcdf/Hdf, the standard is the file formats, along with their semantic
29+ descriptions. The API is language and library specific, and is secondary to the standard.
3030
3131Having multiple implementations is a huge win for the reference library, in that bugs are more quickly found, and
3232ambiguities more quickly identified.
3333
3434### What's wrong with the standard reference libraries?
3535
3636The reference libraries are well maintained but complex. They are coded in C, which is a difficult language to master
37- and keep bug free, with implication for memory safety and security. The libraries require various machine and OS dependent
37+ and keep bug free, with implications for memory safety and security. The libraries require various machine and OS dependent
3838toolchains. Shifts in funding could wipe out much of the institutional knowledge needed to maintain them.
3939
4040The HDF file formats are overly complicated, which impacts code complexity and clarity. The data structures do not
@@ -72,32 +72,42 @@ For HDF5 files using deflate filters, the deflate library dominates the read tim
7272are about 2X slower than native code. Unless the deflate libraries get better, there's not much gain in trying to make
7373other parts of the code faster.
7474
75- Its possible we can use Kotlin coroutines to speed up performance bottlenecks. TBD .
75+ We will investigate using Kotlin coroutines to speed up performance bottlenecks.
7676
77- ## What version of the JVM?
77+ ### What version of the JVM, Kotlin, and Gradle?
7878
79- We will always use the latest the latest LTS (long term support) Java version, and will not be explicitly supporting older versions.
79+ We will always use the latest LTS (long term support) Java version, and will not be explicitly supporting older versions.
8080Currently that is Java 21.
8181
82+ We also use the latest stable version of Kotlin that is compatible with the Java version. Currently that is Kotlin 2.1.
83+
84+ Gradle is our build system. We will use the latest stable version of Gradle compatible with our Java and Kotlin versions.
85+ Currently that is Gradle 8.14.
86+
87+ For now, you must download and build the library yourself. Eventually we will publish it to Maven Central.
88+ The IntelliJ IDE is highly recommended for all JVM development.
89+
90+
8291### Scope
8392
84- We have the goal to give read access to all the content in NetCDF, HDF5, HDF4, and HDF-EOS files.
93+ Our goal is to give read access to all the content in NetCDF, HDF5, HDF4, and HDF-EOS files.
8594
8695The library will be thread-safe for reading multiple files concurrently.
8796
8897We are focussing on earth science data, and don't plan to support other uses except as a byproduct.
8998
90- We will not provide write capabilities.
99+ The core module will remain pure Kotlin with very minimal dependencies and no write capabilities. In particular,
100+ there will be no dependency on the reference C libraries (except for testing).
91101
92- The core module will remain pure Kotlin with very minimal dependencies. In particular, there will be no dependency on the reference C libraries
93- (except for testing). There will be no dependencies on native libraries in the core module, but other modules or
94- projects that use the core are free to use dependencies as needed. We will add runtime discovery to facilitate this, for example
95- HDF5 filters that use native libraries.
102+ There will be no dependencies on native libraries in the core module, but other modules or
103+ projects that use the core are free to use dependencies as needed. We will add runtime discovery to facilitate this,
104+ for example, to use HDF5 filters that link to native libraries.
96105
97106
98107### Testing
99108
100- We use the Foreign Function & Memory API for testing against the Netcdf, HDF5, and HDF4 C libraries.
109+ We use the Java [Foreign Function & Memory API](https://docs.oracle.com/en/java/javase/21/core/foreign-function-and-memory-api.html)
110+ for testing against the Netcdf, HDF5, and HDF4 C libraries.
101111With these tools we can be confident that our library gives the same results as the reference libraries.
102112
103113Currently we have this test coverage from core/test:
@@ -143,26 +153,30 @@ with T indicating the data type returned when read, eg:
143153 fun <T> readArrayData(v2: Variable<T>, section: SectionPartial? = null) : ArrayTyped<T>
144154````
145155
146- For example, a Variable of datatype Float will return an ArrayFloat, which is ArrayTyped<Float>.
156+ For example, a Variable of datatype Float will return an ArrayFloat, which is ArrayTyped\<Float\>.
157+
158+ #### Cdl Names
159+
160+ * spaces are replaced with underscores
147161
148162#### Datatype
149- * __Datatype.ENUM__ returns an array of the corresponding UBYTE/USHORT/UINT. Call _data.convertEnums()_ to turn this into
163+ * _Datatype.ENUM_ returns an array of the corresponding UBYTE/USHORT/UINT. Call _data.convertEnums()_ to turn this into
150164 an ArrayString of corresponding enum names.
151- * __Datatype.CHAR__: All Attributes of type CHAR are assumed to be Strings. All Variables of type CHAR return data as
165+ * _Datatype.CHAR_: All Attributes of type CHAR are assumed to be Strings. All Variables of type CHAR return data as
152166 ArrayUByte. Call _data.makeStringsFromBytes()_ to turn this into Strings with the array rank reduced by one.
153- * _Netcdf-3_ does not have STRING or UBYTE types. In practice, CHAR is used for either.
154- * _Netcdf-4/HDF5_ library encodes CHAR values as HDF5 string type with elemSize = 1, so we use that convention to detect
167+ * Netcdf-3 does not have STRING or UBYTE types. In practice, CHAR is used for either.
168+ * Netcdf-4/HDF5 library encodes CHAR values as HDF5 string type with elemSize = 1, so we use that convention to detect
155169 legacy CHAR variables in HDF5 files. NC_CHAR should not be used in Netcdf-4, use NC_UBYTE or NC_STRING.
156- * _HDF4_ does not have a STRING type, but does have signed and unsigned CHAR, and signed and unsigned BYTE.
170+ * HDF4 does not have a STRING type, but does have signed and unsigned CHAR, and signed and unsigned BYTE.
157171 We map both signed and unsigned to Datatype.CHAR and handle it as above (Attributes are Strings, Variables are UBytes).
158- * __Datatype.STRING__ is always variable length, regardless of whether the data in the file is variable or fixed length.
172+ * _Datatype.STRING_ is always variable length, regardless of whether the data in the file is variable or fixed length.
159173
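The "rank reduced by one" behavior described for CHAR variables can be illustrated with a small standalone sketch. This is not the netchdf implementation: the function name `makeStringsFromBytesSketch`, its flat-array-plus-shape signature, and the choice to stop at the first zero byte are all assumptions made for illustration.

```kotlin
// Hypothetical helper, NOT the netchdf API. It shows how CHAR data of shape
// [n, strlen] collapses into n strings, i.e. the rank drops by one.
@OptIn(ExperimentalUnsignedTypes::class)
fun makeStringsFromBytesSketch(data: UByteArray, shape: IntArray): List<String> {
    require(shape.isNotEmpty()) { "need at least one dimension" }
    val strlen = shape.last()                                   // innermost dimension holds the characters
    val nstrings = shape.dropLast(1).fold(1) { acc, len -> acc * len }
    require(data.size == nstrings * strlen) { "data size does not match shape" }

    return List(nstrings) { row ->
        val start = row * strlen
        val bytes = ByteArray(strlen) { data[start + it].toByte() }
        // Assumption: the first zero byte terminates the string (zero padding).
        val end = bytes.indexOf(0.toByte()).let { if (it < 0) strlen else it }
        String(bytes, 0, end, Charsets.UTF_8)
    }
}

@OptIn(ExperimentalUnsignedTypes::class)
fun main() {
    // Two zero-padded strings of length 4, i.e. shape [2, 4].
    val raw = "abc\u0000de\u0000\u0000".toByteArray(Charsets.UTF_8).toUByteArray()
    println(makeStringsFromBytesSketch(raw, intArrayOf(2, 4)))  // prints [abc, de]
}
```

A rank-2 input yields a flat list of strings here; for higher ranks the sketch simply flattens the outer dimensions, whereas the library presumably keeps them as the shape of the resulting array.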
160174#### Typedef
161175Unlike Netcdf-Java, we follow Netcdf-4 "user defined types" and add typedefs for Compound, Enum, Opaque, and Vlen.
162- * __Datatype.ENUM__ typedef has a map from integer to name (same as Netcdf-Java)
163- * __Datatype.COMPOUND__ typedef contains a description of the members of the Compound (aka Structure).
164- * __Datatype.OPAQUE__ typedef may contain the byte length of OPAQUE data.
165- * __Datatype.VLEN__ typedef has the base type. An array of VLEN may have different lengths for each object.
176+ * _Datatype.ENUM_ typedef has a map from integer to name (same as Netcdf-Java)
177+ * _Datatype.COMPOUND_ typedef contains a description of the members of the Compound (aka Structure).
178+ * _Datatype.OPAQUE_ typedef may contain the byte length of OPAQUE data.
179+ * _Datatype.VLEN_ typedef has the base type. An array of VLEN may have different lengths for each object.
166180
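As a rough illustration of how an ENUM typedef's integer-to-name map could turn raw values into names, here is a minimal sketch. The function `convertEnumsSketch`, its signature, and the fallback text for unmapped values are hypothetical; the library's own _data.convertEnums()_ may behave differently.

```kotlin
// Hypothetical sketch, NOT the netchdf API: names and signature are assumptions.
// The map plays the role of the ENUM typedef (integer value -> enum name).
fun convertEnumsSketch(values: List<Int>, enumNames: Map<Int, String>): List<String> =
    // Assumption: values with no entry in the map get a placeholder name.
    values.map { v -> enumNames[v] ?: "Unknown enum value=$v" }

fun main() {
    val cloudTypes = mapOf(0 to "clear", 1 to "cumulus", 2 to "stratus")
    println(convertEnumsSketch(listOf(1, 0, 2, 7), cloudTypes))
    // prints [cumulus, clear, stratus, Unknown enum value=7]
}
```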
167181#### Dimension
168182* Unlike Netcdf-3 and Netcdf-4, dimensions may be "anonymous", in which case they have a length but not a name, and are
@@ -187,8 +201,6 @@ local to the variable they are referenced by.
187201
188202An independent implementation of HDF4/HDF5/HDF-EOS in Kotlin.
189203
190- I am working on an independent library implementation of HDF4/HDF5/HDF-EOS in Kotlin
191- [here](https://github.com/JohnLCaron/netchdf).
192204This will be complementary to the important work of maintaining the primary HDF libraries.
193205The goal is to give read access to all the content in NetCDF, HDF5, HDF4 and HDF-EOS files.
194206