start-stop-mwaa-environment - mwaa_import_data.py - variable.csv - fails for field larger than field limit  #74

@mvitale-kensu


Hello guys,

In our case the resume step fails because the mwaa_import_data DAG fails while importing variable.csv.
This is the error:

[2024-05-16, 08:01:16 UTC] {{taskinstance.py:1937}} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/operators/python.py", line 192, in execute
    return_value = self.execute_callable()
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/operators/python.py", line 209, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dags/mwaa_import_data.py", line 146, in importVariable
    for row in reader:
_csv.Error: field larger than field limit (131072)
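For anyone who wants to reproduce this outside MWAA, here is a minimal sketch. It assumes the failure is triggered by a single variable value longer than the csv module's default field limit of 131072 characters. The key name, the value, and the `read_rows` helper are hypothetical, since the real variable.csv is not shown here.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text and return the rows as a list of lists."""
    return list(csv.reader(io.StringIO(text)))


# Hypothetical data: one variable whose value exceeds the default limit.
big_value = "x" * 200_000
sample = f"my_key,{big_value}\n"

# Pin the limit to the default for this demo and remember the old value.
previous_limit = csv.field_size_limit(131072)
try:
    read_rows(sample)
    error_message = None
except csv.Error as exc:
    error_message = str(exc)
finally:
    # Restore whatever limit was active before the demo.
    csv.field_size_limit(previous_limit)

print(error_message)
```

The message printed should match the last line of the traceback above.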

Just FYI I've fixed it by adding these few lines of code to mwaa_import_data.py:

import sys
import csv
maxInt = sys.maxsize

while True:
    # Decrease maxInt by a factor of 10 for as long as
    # csv.field_size_limit raises OverflowError.
    try:
        csv.field_size_limit(maxInt)
        break
    except OverflowError:
        # Integer division keeps maxInt an exact int (no float rounding).
        maxInt = maxInt // 10

The fix comes from this answer: https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072

I am not sure if this is the correct way to handle this, but from what I've seen it works fine for us.
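A possible alternative, shown only as a sketch: raise the limit to a fixed value just around the read and restore it afterwards, which avoids the retry loop and does not change the limit for the rest of the process. This assumes that 2**31 - 1 fits in a C long on the platform, so `csv.field_size_limit` accepts it without raising OverflowError. `LARGE_FIELD_LIMIT` and `read_rows_with_limit` are hypothetical names and are not part of mwaa_import_data.py.

```python
import csv
import io

# Assumption: this value fits in a C long on common platforms.
LARGE_FIELD_LIMIT = 2**31 - 1


def read_rows_with_limit(text, limit=LARGE_FIELD_LIMIT):
    """Parse CSV text with a temporarily raised field size limit."""
    previous = csv.field_size_limit(limit)  # returns the old limit
    try:
        return list(csv.reader(io.StringIO(text)))
    finally:
        # Put the previous limit back, even if parsing fails.
        csv.field_size_limit(previous)


limit_before = csv.field_size_limit()

# Hypothetical data: one variable with a 200,000-character value.
rows = read_rows_with_limit("my_key," + "x" * 200_000 + "\n")

print(len(rows), len(rows[0][1]))
```

In the DAG the same idea would wrap the loop that reads variable.csv; whether a temporary or a global limit is preferable depends on how many other CSV files the import touches.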

We are on Airflow 2.7.2, and I am using the latest code from this project's main branch.

Labels: bug
