BUFR ECCODES : 2.44.0 data wrongly encoded when precision is too high / width is too big #144

@antoinemerle

What happened?

Hey @shahramn,

I’m seeing another issue that feels related: when the change of scale or width is too large, the encoded values get corrupted.

Sequence before encoding

2 02 132 | Change scale
2 01 142 | Change data width
0 14 044 | Channel Radiance

Values before encoding for #3#channelRadiance (0 14 044)

[0.04446902 0.0443561  0.04421728 0.04409232 0.04456472 0.04444372
 0.04430212 0.04417145 0.04465705 0.04452827 0.04438906 0.04425236
 0.04473894 0.04461614 0.04447776 0.0443286  0.04394687 0.04385525
 0.04377634 0.04370359 0.04401626 0.04391416 0.04382203 0.04374794
 0.04408451 0.04396805 0.04386687 0.04378401 0.04414416 0.0440255
 0.04391074 0.04381992 0.04363312 0.04357282 0.04352307 0.04349085
 0.0436744  0.04360713 0.0435598  0.04351918 0.04369757 0.04363627
 0.04359001 0.04354533 0.04371746 0.04366205 0.04361034 0.04356573
 0.04344143 0.04340577 0.04337966 0.04336203 0.04346674 0.04343274
 0.04340791 0.04339026 0.04349062 0.04346098 0.04343556 0.04341582
 0.04350889 0.04348587 0.04345985 0.04343704 0.04334192 0.04333122
 0.04331655 0.04330126 0.04336793 0.04336292 0.04334869 0.04333252
 0.04338875 0.0433871  0.04337548 0.0433577  0.04340639 0.04340467
 0.04340379 0.04338043 0.04329825 0.0433037  0.04326704 0.04322596
 0.04332169 0.04332502 0.0432953  0.04326439 0.04334372 0.0433383
 0.04331811 0.04329461 0.04336591 0.04335373 0.04333759 0.04331805
 0.0431785  0.04314479 0.04310482 0.04305345 0.04322112 0.04318363
 0.04313881 0.0430943  0.04325825 0.04322009 0.04318298 0.04314296
 0.04327949 0.04325058 0.04322099 0.04318557 0.04298948 0.04293785
 0.04288587 0.04283365 0.04303745 0.04298583 0.04293556 0.04288307
 0.04307953 0.04303213 0.04298223 0.04293146 0.04311695 0.04307364
 0.04302703 0.04297666 0.04275687 0.0427003  0.04263294 0.04256347
 0.04280465 0.04274773 0.04267966 0.04260267 0.04284727 0.04279273
 0.04272599 0.04264922 0.04289066 0.04283379 0.04276797 0.04269554
 0.0424805  0.04241407 0.04234686 0.04227745 0.04251381 0.0424507
 0.04238236 0.04231047 0.04255567 0.04249222 0.04242066 0.04234224
 0.04259613 0.04253162 0.04245618 0.04237467 0.04219047 0.04212669
 0.04206411 0.04200748 0.04221739 0.04214675 0.04207996 0.04202104
 0.04224307 0.04216729 0.04209596 0.04203505 0.04226664 0.04218952
 0.04211556 0.04204924 0.04194257 0.04190085 0.04187385 0.04186409
 0.04195225 0.04190436 0.0418725  0.04185942 0.04195988 0.04190829
 0.04187289 0.04185517 0.04196549 0.04191292 0.04187324 0.0418501
 0.04187265 0.04190863 0.04196642 0.04205053 0.04186384 0.04189453
 0.04194862 0.04202445 0.04185389 0.04187894 0.04192615 0.04199718
 0.04184338 0.04186244 0.04190138 0.04196844 0.04220955 0.04238364
 0.04264227 0.04296868 0.04217612 0.04234864 0.04258724 0.04290241
 0.04214178 0.04230801 0.042535   0.04283334 0.0421091  0.04226496
 0.04248085 0.04275971]

Dump right after encoding (incorrect)

#3#channelRadiance={
  0.00151934, 0.00140643, 0.00126760, 0.00114265, 0.00161505, 0.00149405,
  0.00135245, 0.00122178, 0.00170738, 0.00157860, ... -0.000189959
}

When I switch to smaller bit additions for both scale and width, the data is preserved:

Sequence with smaller changes

2 02 129 | Change scale
2 01 139 | Change data width
0 14 044 | Channel Radiance

Values before encoding (same as above)

Dump after encoding (correct)

#3#channelRadiance={
  0.044469, 0.0443561, 0.0442173, 0.0440923, 0.0445647, 0.0444437,
  0.0443021, 0.0441714, 0.0446571, 0.0445283, ... 0.0427597
}

It seems that when the precision is too high or the width is too big, ecCodes cannot preserve the data and does not report any error or warning. With smaller increments on both scale and width, everything decodes correctly.

Is this overflow expected for this descriptor in ecCodes as well? I think this is a bug.

Here is a small test script you can use to reproduce the problem:

#!/usr/bin/env python3

"""

Case A (bad):  2 02 132 | Change scale,   2 01 142 | Change data width, 0 14 044 | Channel Radiance
Case B (good): 2 02 129 | Change scale,   2 01 139 | Change data width, 0 14 044 | Channel Radiance

"""

import numpy as np

import eccodes as ec

DATA = np.array([
    0.04446902, 0.04435610, 0.04421728, 0.04409232, 0.04456472, 0.04444372,
    0.04430212, 0.04417145, 0.04465705, 0.04452827, 0.04438906, 0.04425236,
    0.04473894, 0.04461614, 0.04447776, 0.04432860, 0.04394687, 0.04385525,
    0.04377634, 0.04370359, 0.04401626, 0.04391416, 0.04382203, 0.04374794,
    0.04408451, 0.04396805, 0.04386687, 0.04378401, 0.04414416, 0.04402550,
    0.04391074, 0.04381992, 0.04363312, 0.04357282, 0.04352307, 0.04349085,
    0.04367440, 0.04360713, 0.04355980, 0.04351918, 0.04369757, 0.04363627,
    0.04359001, 0.04354533, 0.04371746, 0.04366205, 0.04361034, 0.04356573,
    0.04344143, 0.04340577, 0.04337966, 0.04336203, 0.04346674, 0.04343274,
    0.04340791, 0.04339026, 0.04349062, 0.04346098, 0.04343556, 0.04341582,
    0.04350889, 0.04348587, 0.04345985, 0.04343704, 0.04334192, 0.04333122,
    0.04331655, 0.04330126, 0.04336793, 0.04336292, 0.04334869, 0.04333252,
    0.04338875, 0.04338710, 0.04337548, 0.04335770, 0.04340639, 0.04340467,
    0.04340379, 0.04338043, 0.04329825, 0.04330370, 0.04326704, 0.04322596,
    0.04332169, 0.04332502, 0.04329530, 0.04326439, 0.04334372, 0.04333830,
    0.04331811, 0.04329461, 0.04336591, 0.04335373, 0.04333759, 0.04331805,
    0.04317850, 0.04314479, 0.04310482, 0.04305345, 0.04322112, 0.04318363,
    0.04313881, 0.04309430, 0.04325825, 0.04322009, 0.04318298, 0.04314296,
    0.04327949, 0.04325058, 0.04322099, 0.04318557, 0.04298948, 0.04293785,
    0.04288587, 0.04283365, 0.04303745, 0.04298583, 0.04293556, 0.04288307,
    0.04307953, 0.04303213, 0.04298223, 0.04293146, 0.04311695, 0.04307364,
    0.04302703, 0.04297666, 0.04275687, 0.04270030, 0.04263294, 0.04256347,
    0.04280465, 0.04274773, 0.04267966, 0.04260267, 0.04284727, 0.04279273,
    0.04272599, 0.04264922, 0.04289066, 0.04283379, 0.04276797, 0.04269554,
    0.04248050, 0.04241407, 0.04234686, 0.04227745, 0.04251381, 0.04245070,
    0.04238236, 0.04231047, 0.04255567, 0.04249222, 0.04242066, 0.04234224,
    0.04259613, 0.04253162, 0.04245618, 0.04237467, 0.04219047, 0.04212669,
    0.04206411, 0.04200748, 0.04221739, 0.04214675, 0.04207996, 0.04202104,
    0.04224307, 0.04216729, 0.04209596, 0.04203505, 0.04226664, 0.04218952,
    0.04211556, 0.04204924, 0.04194257, 0.04190085, 0.04187385, 0.04186409,
    0.04195225, 0.04190436, 0.04187250, 0.04185942, 0.04195988, 0.04190829,
    0.04187289, 0.04185517, 0.04196549, 0.04191292, 0.04187324, 0.04185010,
    0.04187265, 0.04190863, 0.04196642, 0.04205053, 0.04186384, 0.04189453,
    0.04194862, 0.04202445, 0.04185389, 0.04187894, 0.04192615, 0.04199718,
    0.04184338, 0.04186244, 0.04190138, 0.04196844, 0.04220955, 0.04238364,
    0.04264227, 0.04296868, 0.04217612, 0.04234864, 0.04258724, 0.04290241,
    0.04214178, 0.04230801, 0.04253500, 0.04283334, 0.04210910, 0.04226496,
    0.04248085, 0.04275971
], dtype=np.float64)

DESC_BAD  = [202132, 201142, 14044]  # larger width and scale changes ==> decoded values are corrupted
DESC_GOOD = [202129, 201139, 14044]  # smaller width and scale changes ==> decoded values are preserved

def encode_decode(descriptors, data, out_path):
    n = int(data.size)
    h = ec.codes_new_from_samples("BUFR4", ec.CODES_PRODUCT_BUFR)
    try:
        ec.codes_set(h, "compressedData", 1)
        ec.codes_set(h, "numberOfSubsets", n)
        ec.codes_set_array(h, "unexpandedDescriptors", descriptors)
        ec.codes_set_array(h, "channelRadiance", data)
        ec.codes_set(h, "pack", 1)
        with open(out_path, "wb") as f:
            ec.codes_write(h, f)
    finally:
        ec.codes_release(h)

    # Decode
    with open(out_path, "rb") as f:
        d = ec.codes_bufr_new_from_file(f)
        if d is None:
            raise RuntimeError("Could not create handle from encoded BUFR.")
        try:
            ec.codes_set(d, "unpack", 1)
            decoded = np.array(ec.codes_get_array(d, "channelRadiance"), dtype=np.float64)
        finally:
            ec.codes_release(d)

    return decoded

def summarize(case_name, original, decoded, show=20):
    diffs = decoded - original
    max_abs = float(np.max(np.abs(diffs)))
    mae = float(np.mean(np.abs(diffs)))
    print(f"\n=== {case_name} ===")
    print(f"ORIGINAL (first {show}): {np.array2string(original[:show], precision=8, floatmode='maxprec')}")

    print("\n\n")
    print(f"Decoded  (first {show}): {np.array2string(decoded[:show],  precision=8, floatmode='maxprec')}")
    print(f"Len: {decoded.size}, Max|Δ|: {max_abs:.8g}, MAE: {mae:.8g}")

def main():
    # Case A: large scale/width change, corruption is expected
    try:
        bad_out = "/tmp/width_scale_bad.bufr"
        bad_dec = encode_decode(DESC_BAD, DATA, bad_out)
        summarize("ERROR Case A (2 02 132, 2 01 142)  expected BAD", DATA, bad_dec)
    except Exception as e:
        print(f"Case A failed with exception: {e}")

    # Case B: smaller scale/width change, values are expected to round-trip correctly
    try:
        good_out = "/tmp/width_scale_good.bufr"
        good_dec = encode_decode(DESC_GOOD, DATA, good_out)
        print("========================================================================")
        summarize("SUCCESS Case B (2 02 129, 2 01 139) expected GOOD", DATA, good_dec)
    except Exception as e:
        print(f"Case B failed with exception: {e}")

if __name__ == "__main__":
    main()

And here is the output I get on my side (no exception is raised in either case):

=== ERROR Case A (2 02 132, 2 01 142)  expected BAD ===
ORIGINAL (first 20): [0.04446902 0.0443561  0.04421728 0.04409232 0.04456472 0.04444372
 0.04430212 0.04417145 0.04465705 0.04452827 0.04438906 0.04425236
 0.04473894 0.04461614 0.04447776 0.0443286  0.04394687 0.04385525
 0.04377634 0.04370359]



Decoded  (first 20): [0.00151935 0.00140643 0.00126761 0.00114265 0.00161505 0.00149405
 0.00135245 0.00122178 0.00170738 0.0015786  0.00143939 0.00130269
 0.00178927 0.00166647 0.00152809 0.00137893 0.0009972  0.00090558
 0.00082667 0.00075392]
Len: 224, Max|Δ|: 0.042949673, MAE: 0.042949673
========================================================================

=== SUCCESS Case B (2 02 129, 2 01 139) expected GOOD ===
ORIGINAL (first 20): [0.04446902 0.0443561  0.04421728 0.04409232 0.04456472 0.04444372
 0.04430212 0.04417145 0.04465705 0.04452827 0.04438906 0.04425236
 0.04473894 0.04461614 0.04447776 0.0443286  0.04394687 0.04385525
 0.04377634 0.04370359]



Decoded  (first 20): [0.04446902 0.0443561  0.04421728 0.04409232 0.04456472 0.04444372
 0.04430212 0.04417145 0.04465705 0.04452827 0.04438906 0.04425236
 0.04473894 0.04461614 0.04447776 0.0443286  0.04394687 0.04385525
 0.04377634 0.04370359]
Len: 224, Max|Δ|: 1.3877788e-17, MAE: 8.2089593e-18
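
Since no error is raised in Case A, a caller currently has to detect the corruption itself. One option is a round-trip check after decoding, sketched below. `check_round_trip` and its tolerance of one unit of the encoded precision are hypothetical choices for illustration, not part of ecCodes. The `scale` argument is the effective scale after the operator, assumed here to be 11 for Case A and 10 for Case B.

```python
import numpy as np


def check_round_trip(original, decoded, scale):
    """Return the largest absolute error, or raise if it exceeds 10**-scale."""
    original = np.asarray(original, dtype=np.float64)
    decoded = np.asarray(decoded, dtype=np.float64)
    if original.shape != decoded.shape:
        raise ValueError(f"shape mismatch: {original.shape} vs {decoded.shape}")
    tolerance = 10.0 ** (-scale)  # hypothetical: one unit of the encoded precision
    worst = float(np.max(np.abs(decoded - original)))
    if worst > tolerance:
        raise ValueError(
            f"round trip lost data: max error {worst:.8g} > {tolerance:.1g}"
        )
    return worst


# First three values from the report: original, Case A decode, Case B decode.
original = [0.04446902, 0.0443561, 0.04421728]
case_a = [0.00151935, 0.00140643, 0.00126761]
case_b = [0.04446902, 0.0443561, 0.04421728]

print(check_round_trip(original, case_b, scale=10))  # passes
try:
    check_round_trip(original, case_a, scale=11)
except ValueError as exc:
    print(exc)  # Case A is rejected
```

In the reproduction script this could be called on `DATA` and the array returned by `encode_decode`, turning the silent corruption into an explicit failure.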

What are the steps to reproduce the bug?

See the description and the test script above.

Version

2.44.0

Platform (OS and architecture)

RHEL
