Error obtaining paragraphs #114

@robertatakenaka

Problem description

celeryworker-1  | [2025-10-24 12:21:56,513: INFO/ForkPoolWorker-23081] section: {'lang': 'pt', 'text': 'Teorias, Investigações e Estudos de Caso'}
celeryworker-1  | [2025-10-24 12:21:56,516: INFO/ForkPoolWorker-23081] duplicated section: Teorias, Investigações e Estudos de Caso
celeryworker-1  | [2025-10-24 12:21:57,416: INFO/ForkPoolWorker-23080] Skip update: equal
celeryworker-1  | [2025-10-24 12:21:57,510: INFO/ForkPoolWorker-23080] {'input_data': {'sps_pkg_name': '0101-3106-ide-46-78-44', 'pid_v3': 'n9DWRr9cCM4DvTBdrzKyDLh', 'pid_v2': 'S0101-31062024000200044', 'aop_pid': None, 'filename': '0101-3106-ide-46-78-44.xml', 'files': ['0101-3106-ide-46-78-44.xml'], 'filenames': ['0101-3106-ide-46-78-44.xml'], 'origin': '/tmp/tmpytgbpjeo/0101-3106-ide-46-78-44.zip'}, 'xml_adapter_data': {'pkg_name': '0101-3106-ide-46-78-44', 'issn_print': '0101-3106', 'issn_electronic': None, 'article_pub_year': '2025', 'pub_year': '2024', 'main_doi': '10.5935/0101-3106.v46n78.04', 'elocation_id': None, 'volume': '46', 'number': '78', 'suppl': None, 'fpage': '44', 'fpage_seq': None, 'lpage': '54', 'z_surnames': None, 'z_collab': None, 'z_links': None, 'z_partial_body': '211f8e44c34c0bf4b2bb43fac23a8d510d962e8c06b2967becc526ff789a73ef'}, 'registered': True, 'v3': 'n9DWRr9cCM4DvTBdrzKyDLh', 'v2': 'S0101-31062024000200044', 'aop_pid': None, 'pkg_name': '0101-3106-ide-46-78-44', 'finger_print': '45a4f62547e7f53334103a98021ad6e36c4f7793a77b3ab12b438fcc71bab12c', 'created': '2025-08-25T17:21:33.309081+00:00', 'updated': '2025-08-25T17:21:33.373595+00:00', 'record_status': 'updated', 'registered_in_core': True, 'is_equal': False, 'do_remote_registration': True, 'do_local_registration': True, 'xml_changed': {}, 'apply_xml_changes': False, 'skip_update': True, 'xml_with_pre': <packtools.sps.pid_provider.xml_sps_lib.XMLWithPre object at 0x7f5b5270d290>, 'registered_in_upload': True, 'synchronized': True, 'filename': '0101-3106-ide-46-78-44.xml', 'changed': True}
celeryworker-1  | [2025-10-24 12:21:57,725: INFO/ForkPoolWorker-23081] first_reference: 105
celeryworker-1  | [2025-10-24 12:21:57,726: INFO/ForkPoolWorker-23081] last_reference: 156
celeryworker-1  | [2025-10-24 12:21:57,810: ERROR/ForkPoolWorker-23081] Type 'NoneType' cannot be serialized.
celeryworker-1  | Traceback (most recent call last):
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_body.py", line 172, in get_text_block
celeryworker-1  |     return build_text(paragraphs)
celeryworker-1  |            ^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_body.py", line 209, in build_text
celeryworker-1  |     document = "".join(fix_paragraphs(p_records))
celeryworker-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_body.py", line 221, in fix_paragraphs
celeryworker-1  |     yield html_fixer.avoid_mismatched_p(item.data["text"])
celeryworker-1  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  | AttributeError: module 'scielo_classic_website.htmlbody.html_fixer' has no attribute 'avoid_mismatched_p'
celeryworker-1  | 
celeryworker-1  | During handling of the above exception, another exception occurred:
celeryworker-1  | 
celeryworker-1  | Traceback (most recent call last):
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_fixer.py", line 321, in html2xml
celeryworker-1  |     content = tostring(body, method="xml", encoding="utf-8").decode("utf-8")
celeryworker-1  |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/lxml/html/__init__.py", line 1841, in tostring
celeryworker-1  |     html = etree.tostring(doc, method=method, pretty_print=pretty_print,
celeryworker-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "src/lxml/etree.pyx", line 3513, in lxml.etree.tostring
celeryworker-1  | TypeError: Type 'NoneType' cannot be serialized.
celeryworker-1  | [2025-10-24 12:21:57,896: INFO/ForkPoolWorker-23081] main_html_paragraphs
celeryworker-1  | [2025-10-24 12:21:57,896: ERROR/ForkPoolWorker-23081] Type 'NoneType' cannot be serialized.
celeryworker-1  | Traceback (most recent call last):
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_body.py", line 172, in get_text_block
celeryworker-1  |     return build_text(paragraphs)
celeryworker-1  |            ^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_body.py", line 209, in build_text
celeryworker-1  |     document = "".join(fix_paragraphs(p_records))
celeryworker-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_body.py", line 221, in fix_paragraphs
celeryworker-1  |     yield html_fixer.avoid_mismatched_p(item.data["text"])
celeryworker-1  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  | AttributeError: module 'scielo_classic_website.htmlbody.html_fixer' has no attribute 'avoid_mismatched_p'
celeryworker-1  | 
celeryworker-1  | During handling of the above exception, another exception occurred:
celeryworker-1  | 
celeryworker-1  | Traceback (most recent call last):
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/models/document.py", line 147, in main_html_paragraphs
celeryworker-1  |     return BodyFromISIS(self.p_records).parts
celeryworker-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_body.py", line 153, in parts
celeryworker-1  |     parts["before references"] = get_text_block(self.before_references_paragraphs)
celeryworker-1  |                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_body.py", line 174, in get_text_block
celeryworker-1  |     return get_text(get_paragraphs_data(paragraphs))
celeryworker-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_body.py", line 203, in get_text
celeryworker-1  |     return "".join([item["text"] for item in items or []])
celeryworker-1  |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_body.py", line 203, in <listcomp>
celeryworker-1  |     return "".join([item["text"] for item in items or []])
celeryworker-1  |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_body.py", line 184, in get_paragraphs_data
celeryworker-1  |     hc = HTMLContent(item.data["text"])
celeryworker-1  |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_body.py", line 32, in __init__
celeryworker-1  |     self.fixed_or_original = html_fixer.get_html_fixed_or_original(content)
celeryworker-1  |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_fixer.py", line 24, in get_html_fixed_or_original
celeryworker-1  |     fixed_html = get_fixed_html(original)
celeryworker-1  |                  ^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_fixer.py", line 76, in get_fixed_html
celeryworker-1  |     return html2xml(tree)
celeryworker-1  |            ^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/scielo_classic_website/htmlbody/html_fixer.py", line 321, in html2xml
celeryworker-1  |     content = tostring(body, method="xml", encoding="utf-8").decode("utf-8")
celeryworker-1  |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "/usr/local/lib/python3.11/site-packages/lxml/html/__init__.py", line 1841, in tostring
celeryworker-1  |     html = etree.tostring(doc, method=method, pretty_print=pretty_print,
celeryworker-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celeryworker-1  |   File "src/lxml/etree.pyx", line 3513, in lxml.etree.tostring
celeryworker-1  | TypeError: Type 'NoneType' cannot be serialized.
celeryworker-1  | [2025-10-24 12:21:58,034: ERROR/ForkPoolWorker-23081] XMLBodyPipe not found raw.xml_body
celeryworker-1  | NoneType: None
celeryworker-1  | [2025-10-24 12:21:58,153: INFO/ForkPoolWorker-23080] Generating new SciELO Publishing Package /tmp/tmp9zjrf9rb/0101-3106-ide-46-78-44.zip
celeryworker-1  | [2025-10-24 12:21:58,153: INFO/ForkPoolWorker-23080] Optimizing XML file 0101-3106-ide-46-78-44.xml [0/1]
celeryworker-1  | [2025-10-24 12:21:58,169: INFO/ForkPoolWorker-23080] Writing remained files from package in new SciELO Publishing Package
celeryworker-1  | [2025-10-24 12:21:58,933: INFO/ForkPoolWorker-23081] dict_items([])
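The log above shows two chained failures: the call to `html_fixer.avoid_mismatched_p` raises `AttributeError`, and the code path taken while handling it then raises `TypeError: Type 'NoneType' cannot be serialized.` The following is a minimal sketch of that chaining pattern, not the actual `scielo_classic_website` code. The `html_fixer` stand-in module, `serialize`, and `run` are hypothetical names, and the sketch assumes the fallback ends up serializing `None`, as the last traceback frame suggests.

```python
import types

# Hypothetical stand-in for the html_fixer module: it has no
# avoid_mismatched_p attribute, matching the AttributeError in the log.
html_fixer = types.ModuleType("html_fixer")


def serialize(node):
    # Hypothetical stand-in for a serializer that rejects None,
    # mimicking the message reported in the log.
    if node is None:
        raise TypeError("Type 'NoneType' cannot be serialized.")
    return str(node)


def get_text_block(paragraphs):
    try:
        # Primary path: fails on the first paragraph because the
        # attribute does not exist on the module.
        parts = []
        for p in paragraphs:
            parts.append(html_fixer.avoid_mismatched_p(p))
        return "".join(parts)
    except AttributeError:
        # Fallback path (assumption): an error raised here is chained
        # to the AttributeError that is being handled.
        parts = []
        for _ in paragraphs:
            parts.append(serialize(None))
        return "".join(parts)


def run(paragraphs):
    try:
        return get_text_block(paragraphs)
    except TypeError as exc:
        # __context__ holds the exception that was being handled
        # when the TypeError was raised.
        return type(exc).__name__, type(exc.__context__).__name__
```

In this sketch the `TypeError` reaches the caller, but the `AttributeError` stored in its `__context__` is the first failure, which mirrors the "During handling of the above exception, another exception occurred" lines in the traceback.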

Steps to reproduce the problem

  1. Go to the page ...
  2. Click the link ...
  3. Scroll the page to ...
  4. Observe the error shown

Expected behavior

Clearly describe the expected (correct) behavior when the steps above are reproduced.

Screenshots or videos

To give more detail and context about the error, consider attaching photos or videos of the problem.

Attachments

This section is optional; use it to reference files that serve as input for reproducing the error, e.g.:

  • XML used
  • HTML produced
  • PDF created

Environment used

When applicable, provide details about the environment used, e.g.:

  • Mozilla Firefox browser, version 30
  • Windows XP
  • PC programs version 1.0
  • iPhone 7 mobile device, iOS 7

Labels: bug