
@hon1nbo
Contributor

@hon1nbo hon1nbo commented Aug 4, 2025

Fix PR for Issue #110

Whilst adding "Quantum Vibe", I found a bug in the URL parsing logic in comic.py:

    parsed_filepath = urlparse(url).path
    file_extension = parsed_filepath[parsed_filepath.rindex(".") :]

This raises a ValueError when the image file name is in a query parameter rather than in the URL path: the path then contains no ".", so str.rindex fails.
Example URL from "Quantum Vibe" that raises the exception:
https://quantumvibe.com/disppageV3?story=qv&file=/simages/qv/qv1-001.jpg
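A minimal standard-library reproduction of the failure. The URL is the Quantum Vibe example above, and the extraction line is the same expression used in comic.py:

```python
from urllib.parse import urlparse

url = "https://quantumvibe.com/disppageV3?story=qv&file=/simages/qv/qv1-001.jpg"
parts = urlparse(url)

# The image name is in the query string; the path has no "." at all.
print(parts.path)   # /disppageV3
print(parts.query)  # story=qv&file=/simages/qv/qv1-001.jpg

raised = False
parsed_filepath = parts.path
try:
    # Same expression as in comic.py; str.rindex raises ValueError if "." is absent.
    file_extension = parsed_filepath[parsed_filepath.rindex(".") :]
except ValueError as error:
    raised = True
    print(f"ValueError: {error}")  # ValueError: substring not found
```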

To remedy this, I implemented exception handling and query parsing to identify where the file name and extension live within the full URL. If no extension can be identified, the file is saved with a .unknown extension, which preserves the data and allows later recovery by renaming it once the file type is known. I could probably add some MIME detection for that scenario, but have not yet.
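The MIME detection mentioned above is not part of this PR. As one hypothetical approach, the fallback could look at the response's Content-Type header using the standard library's mimetypes module. The helper name extension_from_content_type, and the assumption that the downloader has the header value on hand, are illustrative only:

```python
import mimetypes


def extension_from_content_type(content_type, default=".unknown"):
    """Hypothetical helper: map a Content-Type header value to a file extension.

    Falls back to `default` (".unknown", matching this PR) when the header is
    missing or the MIME type is not recognised.
    """
    if not content_type:
        return default
    # Drop parameters such as "; charset=binary" and normalise case.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    extension = mimetypes.guess_extension(mime_type)
    return extension if extension else default


print(extension_from_content_type("image/png"))                      # .png
print(extension_from_content_type("application/x-not-a-real-type"))  # .unknown
```

Note that the exact extension returned for some types depends on the Python version and the system's MIME tables; for example, "image/jpeg" may come back as ".jpe" rather than ".jpg" on some setups.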

Sample config for Quantum Vibe for testing:

    "QuantumVibe": {
        "name": "QuantumVibe",
        "start_url": "https://quantumvibe.com/strip?page=1",
        "comic_image_selector": "//a[contains(@href, 'strip?page=')]//img[contains(@src, 'disppage')]/@src",
        "next_page_selector": "//a[img[contains(@src, 'nav/NextStrip2.gif')]]/@href",
    },

Limitations with this draft PR:

  • Does not handle the edge case where the path has an extension that is not the image's, and the image extension appears later in the URL (for example, something.php?comic=somecomic&page=pagexyz.jpg). In that case the path's extension (.php) is used.
  • Although the logic has changed, I am leaving the version number bump to the package maintainer.
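One possible way to handle the first limitation, sketched as an assumption rather than what this PR does: accept the path's extension only if it is a known image extension, and otherwise search the query-string values for one. The IMAGE_EXTENSIONS set and the helper name guess_image_extension are hypothetical:

```python
import posixpath
from urllib.parse import parse_qsl, urlparse

# Hypothetical whitelist; extend as needed for the comics being scraped.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp"}


def guess_image_extension(url, default=".unknown"):
    """Return the first image extension found in the path or any query value.

    Candidates are checked in order: the URL path, then each query value.
    Extensions are lower-cased, so ".JPG" is reported as ".jpg".
    """
    parts = urlparse(url)
    candidates = [parts.path] + [value for _, value in parse_qsl(parts.query)]
    for candidate in candidates:
        # splitext returns "" instead of raising when there is no extension.
        extension = posixpath.splitext(candidate)[1].lower()
        if extension in IMAGE_EXTENSIONS:
            return extension
    return default


print(guess_image_extension("something.php?comic=somecomic&page=pagexyz.jpg"))  # .jpg
```

The trade-off is that a URL whose only extension is a non-image one (such as .php) falls back to ".unknown" instead of being saved as .php.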

@hon1nbo
Contributor Author

hon1nbo commented Aug 4, 2025

Looks like this is failing on JackRabbit again; I swapped in the line you edited in the last PR.

Testing locally, I still couldn't get it running after your proposed change. I opened a separate PR that changes JackRabbit to use a hard comparison of @rel='Next' instead of the contains function. Building with Poetry and running the downloader with that change seems to work smoothly.

@coveralls

Coverage Status

coverage: 92.951% (-1.5%) from 94.439%
when pulling b02e708 on hon1nbo:queryparsing
into 28bed86 on J-CPelletier:master.

Comment on lines +202 to +230
    parts = urlparse(url)

    try:
        parsed_filepath = parts.path
        file_extension = parsed_filepath[parsed_filepath.rindex(".") :]
    except:
        if "?" in url:
            if "&" in parts.query:
                parsed_queries = parts.query.split("&")
                for current_query in parsed_queries:
                    try:
                        file_extension = current_query[current_query.rindex(".") :]
                        break
                    except:
                        # nothing, loop again
                        dummy_var = ""
            else:
                try:
                    file_extension = parts.query[parts.query.rindex(".") :]
                except:
                    print(
                        "File extension unknown; setting as '.unknown' to preserve data"
                    )
                    file_extension = ".unknown"
        else:
            # worst case we can't identify the extension, setting 'unknown' to allow saving file for evaluation
            print("File extension unknown; setting as '.unknown' to preserve data")
            file_extension = ".unknown"

Owner


  1. We'd probably want to extract this into a separate method (like get_file_path, returning both the file path and the extension).
  2. We could also use the parse_qs(...) function available in the urllib.parse module. Here's an example of what the results would look like:

         >>> from urllib.parse import parse_qs, urlparse
         >>> parts = urlparse("https://quantumvibe.com/disppageV3?story=qv&file=/simages/qv/qv1-001.jpg")
         >>> parse_qs(parts.query)
         {'story': ['qv'], 'file': ['/simages/qv/qv1-001.jpg']}

  3. We should also add unit tests for the extracted method.
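A minimal sketch of the suggested refactor. It assumes get_file_path takes the image URL and returns a (file_path, extension) tuple, where file_path is whichever URL component holds the file name; that signature, the UNKNOWN_EXTENSION constant, and the example.com test URLs are assumptions, with the ".unknown" fallback carried over from this PR. The tests are plain functions in pytest style:

```python
import posixpath
from urllib.parse import parse_qs, urlparse

UNKNOWN_EXTENSION = ".unknown"  # same fallback as the current PR


def get_file_path(url):
    """Return (file_path, extension) for an image URL.

    Checks the URL path first, then every query-string value (via parse_qs).
    If no component has an extension, returns the path with ".unknown".
    """
    parts = urlparse(url)
    _, extension = posixpath.splitext(parts.path)
    if extension:
        return parts.path, extension
    # parse_qs maps each key to a list of values, e.g. {'file': ['/simages/qv/qv1-001.jpg']}
    for values in parse_qs(parts.query).values():
        for value in values:
            _, extension = posixpath.splitext(value)
            if extension:
                return value, extension
    return parts.path, UNKNOWN_EXTENSION


# Unit tests in pytest style (plain functions using assert).
def test_extension_in_path():
    assert get_file_path("https://example.com/comics/page-1.png") == ("/comics/page-1.png", ".png")


def test_extension_in_query():
    url = "https://quantumvibe.com/disppageV3?story=qv&file=/simages/qv/qv1-001.jpg"
    assert get_file_path(url) == ("/simages/qv/qv1-001.jpg", ".jpg")


def test_no_extension_anywhere():
    assert get_file_path("https://example.com/image?id=42") == ("/image", ".unknown")
```

Using posixpath.splitext instead of str.rindex avoids the ValueError altogether, since splitext returns an empty extension rather than raising, so no try/except is needed.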

Contributor Author


Ah yeah, parse_qs would probably be more efficient; I didn't catch that in the urllib.parse documentation.

I'll break this out over the next couple of days (delayed due to #job).

@hon1nbo
Contributor Author

hon1nbo commented Oct 28, 2025

Quick comment: I haven't forgotten about this; life stuff just happened. I still plan on doing the refactor.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants