
OOM Error when running evaluation step #2

@wtheune


Hi there! I'm getting an out-of-memory error when running the evaluation code with 'python main.py --inference_option Evaluation'. I tried it on an A10 (20GB) and an A100 (80GB) and get the same error either way.

2024-01-09 15:35:39.453000: W external/local_tsl/tsl/framework/bfc_allocator.cc:497] ******************************************************************************__________________
2024-01-09 15:35:39.453020: W tensorflow/core/framework/op_kernel.cc:1827] RESOURCE_EXHAUSTED: failed to allocate memory
Traceback (most recent call last):
  File "/miniconda3/envs/TAG-DTA_mantis/lib/python3.9/site-packages/TAG-DTA/TAG-DTA/source/main.py", line 694, in <module>
    run_eval_model(FLAGS)
  File "/miniconda3/envs/TAG-DTA_mantis/lib/python3.9/site-packages/TAG-DTA/TAG-DTA/source/main.py", line 664, in run_eval_model
    aff_preds = model([affinity_data_val[0], affinity_data_val[1]], training=False)[1]
  File "miniconda3/envs/TAG-DTA_mantis/lib/python3.9/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/miniconda3/envs/TAG-DTA_mantis/lib/python3.9/site-packages/TAG-DTA/TAG-DTA/source/transformer_encoder.py", line 174, in call
    x, attn_enc_w = layer(x, mask)
  File "/miniconda3/envs/TAG-DTA_mantis/lib/python3.9/site-packages/TAG-DTA/TAG-DTA/source/transformer_encoder.py", line 83, in call
    attn_out, attn_w = self.mha_layer([x, x, x], mask=mask)
  File "/miniconda3/envs/TAG-DTA_mantis/lib/python3.9/site-packages/TAG-DTA/TAG-DTA/source/mha_layer.py", line 154, in call
    attention_output, attention_weights = self.attention([query, key, value], mask=mask)
  File "/miniconda3/envs/TAG-DTA_mantis/lib/python3.9/site-packages/TAG-DTA/TAG-DTA/source/mha_layer.py", line 62, in call
    scaled_attention_scores += (mask * -1e9)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Exception encountered when calling layer 'scaled_dot_product_attention' (type scaled_dot_product_attention).

{{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:GPU:0}} failed to allocate memory [Op:AddV2] name:

Call arguments received by layer 'scaled_dot_product_attention' (type scaled_dot_product_attention):
• inputs=['tf.Tensor(shape=(4867, 4, 576, 64), dtype=float32)', 'tf.Tensor(shape=(4867, 4, 576, 64), dtype=float32)', 'tf.Tensor(shape=(4867, 4, 576, 64), dtype=float32)']
• mask=tf.Tensor(shape=(4867, 1, 1, 576), dtype=float32)
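For context on why the 80GB A100 fails too, here is a rough back-of-the-envelope sketch (my own arithmetic, not code from TAG-DTA) based on the shapes in the log. It assumes the layer computes float32 attention scores of shape (batch, heads, 576, 576) from the (4867, 4, 576, 64) query and key tensors. One such tensor for the whole validation set is about 25.8 GB, and the masked add on line 62 of mha_layer.py allocates another tensor of the same size. The sketch also includes a hypothetical `predict_in_batches` helper that splits the validation arrays into slices instead of passing all 4867 samples to the model in a single call; `model_fn`, `max_batch_size` and the `copies` factor are illustrative assumptions, not names from the repository.

```python
# Rough memory estimate for the attention step that failed, using the shapes
# printed in the traceback. Assumes float32 scores (4 bytes per element).

BYTES_PER_FLOAT32 = 4


def attention_scores_bytes(batch, heads, seq_len, bytes_per_elem=BYTES_PER_FLOAT32):
    """Size in bytes of one (batch, heads, seq_len, seq_len) score tensor."""
    return batch * heads * seq_len * seq_len * bytes_per_elem


def max_batch_size(budget_bytes, heads, seq_len, copies=1):
    """Largest batch whose `copies` score-sized tensors fit in budget_bytes.

    `copies` is a hypothetical safety factor for intermediates (raw scores,
    masked scores, softmax output) that may be alive at the same time.
    """
    per_sample = attention_scores_bytes(1, heads, seq_len) * copies
    return budget_bytes // per_sample


def predict_in_batches(model_fn, inputs, batch_size):
    """Call model_fn on slices of `inputs` and concatenate the predictions.

    model_fn: hypothetical callable mapping a list of input slices to a list
        of per-sample predictions (stands in for the model call in main.py).
    inputs: list of equally long sequences (e.g. the two validation arrays).
    """
    n = len(inputs[0])
    preds = []
    for start in range(0, n, batch_size):
        batch = [x[start:start + batch_size] for x in inputs]
        preds.extend(model_fn(batch))
    return preds


full = attention_scores_bytes(batch=4867, heads=4, seq_len=576)
print(f"{full / 1e9:.1f} GB")  # prints "25.8 GB" for a single score tensor
```

In the actual script the equivalent change would be to call the model on slices of affinity_data_val[0] and affinity_data_val[1] and concatenate the outputs, or, assuming the subclassed model supports it, to use Keras `Model.predict(..., batch_size=...)` instead of calling the model directly. This is untested against the repository.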
