72 changes: 42 additions & 30 deletions Readme.md
@@ -1,5 +1,5 @@
# netchdf
_last updated: 5/17/2025_
_last updated: 5/22/2025_

This is a rewrite in Kotlin of parts of the devcdm and netcdf-java libraries.

@@ -17,24 +17,24 @@ The Netcdf-Java library prototyped a "Common Data Model" (CDM) to provide a sing
The netcdf* and hdf* file formats are similar enough to make a common API a practical and useful goal.
By focusing on read-only access to just these formats, the API and the code are kept simple.

In short, a library that focuses on simplicity and clarity is a safeguard for the huge investment in these
In short, a library that focuses on simplicity and clarity is a safeguard for the irreplaceable investment in these
scientific datasets.

### Why do we need another library besides the standard reference libraries?

Its a huge advantage to have independent implementations of any standard. If you dont have multiple implementations, its
very easy for the single implementator to mistake the implementation for the actual standard. Its easy to hide problems
It's necessary to have independent implementations of any standard. If you don't have multiple implementations, it's
easy for the single implementer to mistake the implementation for the actual standard. It's easy to hide problems
that are actually in the standard by adding work-arounds in the code, instead of documenting problems and creating new
versions with clear fixes. For Netcdf/Hdf, the standard is the file formats, along with their semantic descriptions. The API
is language and library specific, and is secondary to the standard.
versions of the standard with clear fixes. For Netcdf/Hdf, the standard is the file formats, along with their semantic
descriptions. The API is language and library specific, and is secondary to the standard.

Having multiple implementations is a huge win for the reference library, in that bugs are more quickly found, and
ambiguities more quickly identified.

### What's wrong with the standard reference libraries?

The reference libraries are well maintained but complex. They are coded in C, which is a difficult language to master
and keep bug free, with implication for memory safety and security. The libraries require various machine and OS dependent
and keep bug free, with implications for memory safety and security. The libraries require various machine and OS dependent
toolchains. Shifts in funding could wipe out much of the institutional knowledge needed to maintain them.

The HDF file formats are overly complicated, which impacts code complexity and clarity. The data structures do not
@@ -72,32 +72,42 @@ For HDF5 files using deflate filters, the deflate library dominates the read tim
are about 2X slower than native code. Unless the deflate libraries get better, there's not much gain in trying to make
other parts of the code faster.

Its possible we can use Kotlin coroutines to speed up performance bottlenecks. TBD.
We will investigate using Kotlin coroutines to speed up performance bottlenecks.

## What version of the JVM?
### What version of the JVM, Kotlin, and Gradle?

We will always use the latest the latest LTS (long term support) Java version, and will not be explicitly supporting older versions.
We will always use the latest LTS (long term support) Java version, and will not be explicitly supporting older versions.
Currently that is Java 21.

We also use the latest stable version of Kotlin that is compatible with the Java version. Currently that is Kotlin 2.1.

Gradle is our build system. We will use the latest stable version of Gradle compatible with our Java and Kotlin versions.
Currently that is Gradle 8.14.

For now, you must download and build the library yourself. Eventually we will publish it to Maven Central.
The IntelliJ IDE is highly recommended for all JVM development.


### Scope

We have the goal to give read access to all the content in NetCDF, HDF5, HDF4, and HDF-EOS files.
Our goal is to give read access to all the content in NetCDF, HDF5, HDF4, and HDF-EOS files.

The library will be thread-safe for reading multiple files concurrently.

We are focusing on earth science data, and don't plan to support other uses except as a byproduct.

We will not provide write capabilities.
The core module will remain pure Kotlin with very minimal dependencies and no write capabilities. In particular,
there will be no dependency on the reference C libraries (except for testing).

The core module will remain pure Kotlin with very minimal dependencies. In particular, there will be no dependency on the reference C libraries
(except for testing). There will be no dependencies on native libraries in the core module, but other modules or
projects that use the core are free to use dependencies as needed. We will add runtime discovery to facilitate this, for example
HDF5 filters that use native libraries.
There will be no dependencies on native libraries in the core module, but other modules or
projects that use the core are free to use dependencies as needed. We will add runtime discovery to facilitate this,
for example, to use HDF5 filters that link to native libraries.


### Testing

We use the Foreign Function & Memory API for testing against the Netcdf, HDF5, and HDF4 C libraries.
We use the Java [Foreign Function & Memory API](https://docs.oracle.com/en/java/javase/21/core/foreign-function-and-memory-api.html)
for testing against the Netcdf, HDF5, and HDF4 C libraries.
With these tools we can be confident that our library gives the same results as the reference libraries.

Currently we have this test coverage from core/test:
@@ -143,26 +153,30 @@ with T indicating the data type returned when read, eg:
fun <T> readArrayData(v2: Variable<T>, section: SectionPartial? = null) : ArrayTyped<T>
````

For example, a Variable of datatype Float will return an ArrayFloat, which is ArrayTyped<Float>.
For example, a Variable of datatype Float will return an ArrayFloat, which is ArrayTyped\<Float\>.

#### Cdl Names

* spaces are replaced with underscores

#### Datatype
* __Datatype.ENUM__ returns an array of the corresponding UBYTE/USHORT/UINT. Call _data.convertEnums()_ to turn this into
* _Datatype.ENUM_ returns an array of the corresponding UBYTE/USHORT/UINT. Call _data.convertEnums()_ to turn this into
an ArrayString of corresponding enum names.
* __Datatype.CHAR__: All Attributes of type CHAR are assumed to be Strings. All Variables of type CHAR return data as
* _Datatype.CHAR_: All Attributes of type CHAR are assumed to be Strings. All Variables of type CHAR return data as
ArrayUByte. Call _data.makeStringsFromBytes()_ to turn this into Strings with the array rank reduced by one.
* _Netcdf-3_ does not have STRING or UBYTE types. In practice, CHAR is used for either.
* _Netcdf-4/HDF5_ library encodes CHAR values as HDF5 string type with elemSize = 1, so we use that convention to detect
* Netcdf-3 does not have STRING or UBYTE types. In practice, CHAR is used for either.
* Netcdf-4/HDF5 library encodes CHAR values as HDF5 string type with elemSize = 1, so we use that convention to detect
legacy CHAR variables in HDF5 files. NC_CHAR should not be used in Netcdf-4, use NC_UBYTE or NC_STRING.
* _HDF4_ does not have a STRING type, but does have signed and unsigned CHAR, and signed and unsigned BYTE.
* HDF4 does not have a STRING type, but does have signed and unsigned CHAR, and signed and unsigned BYTE.
We map both signed and unsigned to Datatype.CHAR and handle it as above (Attributes are Strings, Variables are UBytes).
* __Datatype.STRING__ is always variable length, regardless of whether the data in the file is variable or fixed length.
* _Datatype.STRING_ is always variable length, regardless of whether the data in the file is variable or fixed length.
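
As a rough illustration of the CHAR convention described in the list above, the following Java sketch turns a CHAR array into strings by treating the innermost dimension as the string length, so the rank drops by one. The class `CharArrayStrings` and the method `makeStrings` are hypothetical names chosen for this example and are not the netchdf API; stopping at the first zero byte and decoding as UTF-8 are assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not part of netchdf: CHAR bytes -> strings, rank reduced by one.
class CharArrayStrings {

    // data:  the CHAR values in row-major order
    // shape: the array shape; the last entry is taken as the string length (assumption)
    // Returns the strings in row-major order; their logical shape is `shape` without its last entry.
    static String[] makeStrings(byte[] data, int[] shape) {
        int strlen = shape[shape.length - 1];
        int nstrings = (strlen == 0) ? 0 : data.length / strlen;
        String[] result = new String[nstrings];
        for (int i = 0; i < nstrings; i++) {
            int start = i * strlen;
            int len = 0;
            // Assumption: a zero byte terminates or pads the string.
            while (len < strlen && data[start + len] != 0) {
                len++;
            }
            result[i] = new String(data, start, len, StandardCharsets.UTF_8);
        }
        return result;
    }

    public static void main(String[] args) {
        // A CHAR variable of shape [2, 5] becomes two strings.
        byte[] data = {'a', 'b', 'c', 0, 0, 'h', 'e', 'l', 'l', 'o'};
        for (String s : makeStrings(data, new int[] {2, 5})) {
            System.out.println(s);
        }
    }
}
```

For a rank-3 CHAR array of shape [2, 3, 5], the same sketch would yield six strings with logical shape [2, 3].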

#### Typedef
Unlike Netcdf-Java, we follow Netcdf-4 "user defined types" and add typedefs for Compound, Enum, Opaque, and Vlen.
* __Datatype.ENUM__ typedef has a map from integer to name (same as Netcdf-Java)
* __Datatype.COMPOUND__ typedef contains a description of the members of the Compound (aka Structure).
* __Datatype.OPAQUE__ typedef may contain the byte length of OPAQUE data.
* __Datatype.VLEN__ typedef has the base type. An array of VLEN may have different lengths for each object.
* _Datatype.ENUM_ typedef has a map from integer to name (same as Netcdf-Java)
* _Datatype.COMPOUND_ typedef contains a description of the members of the Compound (aka Structure).
* _Datatype.OPAQUE_ typedef may contain the byte length of OPAQUE data.
* _Datatype.VLEN_ typedef has the base type. An array of VLEN may have different lengths for each object.
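
To make the ENUM typedef concrete, here is a minimal Java sketch of the idea that an enum typedef is a map from integer to name, and that converting enum data means looking up each stored integer in that map. `EnumNames` and `toNames` are hypothetical names for this example and not the netchdf API, and the fallback text used for unmapped values is an assumption.

```java
import java.util.Map;

// Hypothetical sketch, not part of netchdf: map stored enum integers to their names.
class EnumNames {

    // values:  the integers read from an ENUM variable (already widened to int)
    // typedef: the enum typedef, a map from integer value to name
    static String[] toNames(int[] values, Map<Integer, String> typedef) {
        String[] names = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            // Assumption: values missing from the typedef get a placeholder name.
            names[i] = typedef.getOrDefault(values[i], "Unknown enum value " + values[i]);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<Integer, String> typedef = Map.of(0, "clear", 1, "partly_cloudy", 2, "cloudy");
        for (String name : toNames(new int[] {2, 0, 1}, typedef)) {
            System.out.println(name);
        }
    }
}
```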

#### Dimension
* Unlike Netcdf-3 and Netcdf-4, dimensions may be "anonymous", in which case they have a length but not a name, and are
@@ -187,8 +201,6 @@ local to the variable they are referenced by.

An independent implementation of HDF4/HDF5/HDF-EOS in Kotlin.

I am working on an independent library implementation of HDF4/HDF5/HDF-EOS in Kotlin
[here](https://github.com/JohnLCaron/netchdf).
This will be complementary to the important work of maintaining the primary HDF libraries.
The goal is to give read access to all the content in NetCDF, HDF5, HDF4 and HDF-EOS files.

@@ -28,10 +28,14 @@ final class constants$10 {
constants$7.const$3
);
static final FunctionDescriptor const$5 = FunctionDescriptor.of(JAVA_INT,
JAVA_INT,
JAVA_INT,
RuntimeHelper.POINTER,
JAVA_LONG,
RuntimeHelper.POINTER
);
static final MethodHandle const$6 = RuntimeHelper.downcallHandle(
"nc_free_vlen",
"nc_put_vlen_element",
constants$10.const$5
);
}
@@ -11,32 +11,38 @@ final class constants$11 {

// Suppresses default constructor, ensuring non-instantiability.
private constants$11() {}
static final FunctionDescriptor const$0 = FunctionDescriptor.of(JAVA_INT,
JAVA_LONG,
static final MethodHandle const$0 = RuntimeHelper.downcallHandle(
"nc_get_vlen_element",
constants$7.const$3
);
static final FunctionDescriptor const$1 = FunctionDescriptor.of(JAVA_INT,
JAVA_INT,
JAVA_INT,
RuntimeHelper.POINTER,
RuntimeHelper.POINTER,
RuntimeHelper.POINTER,
RuntimeHelper.POINTER,
RuntimeHelper.POINTER
);
static final MethodHandle const$1 = RuntimeHelper.downcallHandle(
"nc_free_vlens",
constants$11.const$0
static final MethodHandle const$2 = RuntimeHelper.downcallHandle(
"nc_inq_user_type",
constants$11.const$1
);
static final FunctionDescriptor const$2 = FunctionDescriptor.of(JAVA_INT,
static final FunctionDescriptor const$3 = FunctionDescriptor.of(JAVA_INT,
JAVA_INT,
JAVA_INT,
RuntimeHelper.POINTER,
JAVA_INT,
JAVA_LONG,
RuntimeHelper.POINTER
);
static final MethodHandle const$3 = RuntimeHelper.downcallHandle(
"nc_put_vlen_element",
constants$11.const$2
);
static final MethodHandle const$4 = RuntimeHelper.downcallHandle(
"nc_get_vlen_element",
constants$7.const$3
"nc_put_att",
constants$11.const$3
);
static final MethodHandle const$5 = RuntimeHelper.downcallHandle(
"nc_free_string",
constants$11.const$0
"nc_get_att",
constants$7.const$0
);
}

@@ -11,38 +11,36 @@ final class constants$12 {

// Suppresses default constructor, ensuring non-instantiability.
private constants$12() {}
static final FunctionDescriptor const$0 = FunctionDescriptor.of(JAVA_INT,
JAVA_INT,
JAVA_INT,
RuntimeHelper.POINTER,
RuntimeHelper.POINTER,
RuntimeHelper.POINTER,
RuntimeHelper.POINTER,
RuntimeHelper.POINTER
static final MethodHandle const$0 = RuntimeHelper.downcallHandle(
"nc_def_enum",
constants$7.const$0
);
static final MethodHandle const$1 = RuntimeHelper.downcallHandle(
"nc_inq_user_type",
constants$12.const$0
"nc_insert_enum",
constants$7.const$0
);
static final FunctionDescriptor const$2 = FunctionDescriptor.of(JAVA_INT,
JAVA_INT,
JAVA_INT,
RuntimeHelper.POINTER,
JAVA_INT,
JAVA_LONG,
RuntimeHelper.POINTER,
RuntimeHelper.POINTER,
RuntimeHelper.POINTER
);
static final MethodHandle const$3 = RuntimeHelper.downcallHandle(
"nc_put_att",
"nc_inq_enum",
constants$12.const$2
);
static final MethodHandle const$4 = RuntimeHelper.downcallHandle(
"nc_get_att",
constants$7.const$0
static final FunctionDescriptor const$4 = FunctionDescriptor.of(JAVA_INT,
JAVA_INT,
JAVA_INT,
JAVA_INT,
RuntimeHelper.POINTER,
RuntimeHelper.POINTER
);
static final MethodHandle const$5 = RuntimeHelper.downcallHandle(
"nc_def_enum",
constants$7.const$0
"nc_inq_enum_member",
constants$12.const$4
);
}

@@ -11,42 +11,31 @@ final class constants$13 {

// Suppresses default constructor, ensuring non-instantiability.
private constants$13() {}
static final MethodHandle const$0 = RuntimeHelper.downcallHandle(
"nc_insert_enum",
constants$7.const$0
);
static final FunctionDescriptor const$1 = FunctionDescriptor.of(JAVA_INT,
static final FunctionDescriptor const$0 = FunctionDescriptor.of(JAVA_INT,
JAVA_INT,
JAVA_INT,
RuntimeHelper.POINTER,
RuntimeHelper.POINTER,
RuntimeHelper.POINTER,
JAVA_LONG,
RuntimeHelper.POINTER
);
static final MethodHandle const$1 = RuntimeHelper.downcallHandle(
"nc_inq_enum_ident",
constants$13.const$0
);
static final MethodHandle const$2 = RuntimeHelper.downcallHandle(
"nc_inq_enum",
constants$13.const$1
"nc_def_opaque",
constants$6.const$1
);
static final FunctionDescriptor const$3 = FunctionDescriptor.of(JAVA_INT,
JAVA_INT,
JAVA_INT,
JAVA_INT,
RuntimeHelper.POINTER,
RuntimeHelper.POINTER
static final MethodHandle const$3 = RuntimeHelper.downcallHandle(
"nc_inq_opaque",
constants$7.const$0
);
static final MethodHandle const$4 = RuntimeHelper.downcallHandle(
"nc_inq_enum_member",
constants$13.const$3
);
static final FunctionDescriptor const$5 = FunctionDescriptor.of(JAVA_INT,
JAVA_INT,
JAVA_INT,
JAVA_LONG,
RuntimeHelper.POINTER
"nc_put_var",
constants$7.const$5
);
static final MethodHandle const$6 = RuntimeHelper.downcallHandle(
"nc_inq_enum_ident",
constants$13.const$5
static final MethodHandle const$5 = RuntimeHelper.downcallHandle(
"nc_get_var",
constants$7.const$5
);
}

@@ -12,28 +12,28 @@ final class constants$14 {
// Suppresses default constructor, ensuring non-instantiability.
private constants$14() {}
static final MethodHandle const$0 = RuntimeHelper.downcallHandle(
"nc_def_opaque",
constants$6.const$1
"nc_put_var1",
constants$7.const$0
);
static final MethodHandle const$1 = RuntimeHelper.downcallHandle(
"nc_inq_opaque",
"nc_get_var1",
constants$7.const$0
);
static final MethodHandle const$2 = RuntimeHelper.downcallHandle(
"nc_put_var",
constants$7.const$5
"nc_put_vara",
constants$7.const$3
);
static final MethodHandle const$3 = RuntimeHelper.downcallHandle(
"nc_get_var",
constants$7.const$5
"nc_get_vara",
constants$7.const$3
);
static final MethodHandle const$4 = RuntimeHelper.downcallHandle(
"nc_put_var1",
constants$7.const$0
"nc_put_vars",
constants$12.const$2
);
static final MethodHandle const$5 = RuntimeHelper.downcallHandle(
"nc_get_var1",
constants$7.const$0
"nc_get_vars",
constants$12.const$2
);
}
