added exception handling & query parameter parsing for comics where fi… #111

hon1nbo · 2025-08-04T02:27:04Z

Fix PR for Issue #110

Whilst adding "Quantum Vibe", I found a bug in the URL parsing within comic.py:

    parsed_filepath = urlparse(url).path
    file_extension = parsed_filepath[parsed_filepath.rindex(".") :]

This raises a ValueError when the image filename is carried in a query parameter rather than in the URL path: the path then contains no ".", so str.rindex() fails.
Example from "Quantum Vibe" that triggers the exception:

    https://quantumvibe.com/disppageV3?story=qv&file=/simages/qv/qv1-001.jpg
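A minimal reproduction of the failure, using only the two lines quoted above and the Quantum Vibe URL. The print and the None fallback are added here purely for illustration:

```python
from urllib.parse import urlparse

url = "https://quantumvibe.com/disppageV3?story=qv&file=/simages/qv/qv1-001.jpg"

# urlparse splits the URL; the image filename ends up in .query, not .path
parsed_filepath = urlparse(url).path  # "/disppageV3" contains no "."

try:
    file_extension = parsed_filepath[parsed_filepath.rindex(".") :]
except ValueError as exc:
    # str.rindex raises ValueError when the substring is not found
    file_extension = None
    print(f"ValueError on path {parsed_filepath!r}: {exc}")
    # prints: ValueError on path '/disppageV3': substring not found
```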

To remedy this, I added exception handling and query-string parsing to locate where the filename and extension live within the full URL. If no extension can be identified, the file is saved with a .unknown extension so the data is preserved and can be recovered later by renaming it once the file type is known. I could add some MIME-type detection for that case, but have not yet.
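The MIME detection mentioned above is not part of this PR. One possible approach, sketched here as an assumption rather than anything in webcomix, is to sniff the downloaded file's leading "magic bytes" and fall back to .unknown only when nothing matches. The helper name guess_extension_from_bytes and its small signature table are hypothetical and cover only a few common image formats:

```python
# Hypothetical helper, not part of webcomix: maps well-known file signatures
# ("magic bytes") at the start of the downloaded content to an extension.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
]


def guess_extension_from_bytes(data: bytes, default: str = ".unknown") -> str:
    """Return an extension guessed from the file's leading bytes, or `default`."""
    for signature, extension in _SIGNATURES:
        if data.startswith(signature):
            return extension
    # WebP: b"RIFF" + 4-byte little-endian size + b"WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return ".webp"
    return default


print(guess_extension_from_bytes(b"\xff\xd8\xff\xe0" + b"\x00" * 16))  # .jpg
print(guess_extension_from_bytes(b"<html>not an image</html>"))  # .unknown
```

Sniffing the content rather than trusting a server's Content-Type header keeps the check independent of how the comic site labels its responses.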

Sample Quantum Vibe config for testing:

    "QuantumVibe": {
        "name": "QuantumVibe",
        "start_url": "https://quantumvibe.com/strip?page=1",
        "comic_image_selector": "//a[contains(@href, 'strip?page=')]//img[contains(@src, 'disppage')]/@src",
        "next_page_selector": "//a[img[contains(@src, 'nav/NextStrip2.gif')]]/@href",
    },

Limitations with this draft PR:

- It does not gracefully handle the edge case where the path has an extension that is not the image's extension, with the real image filename located deeper in the URL (example: something.php?comic=somecomic&page=pagexyz.jpg, where the path check picks up .php).
- Despite the logic changes, I'm leaving the version-number bump to the package maintainer.
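For the first limitation, one option (sketched here, not implemented in this PR) is to accept a path extension only when it looks like an image format, and otherwise keep looking through the query values. The IMAGE_EXTENSIONS allow-list and the function name pick_image_extension are hypothetical:

```python
import posixpath
from urllib.parse import parse_qsl, urlparse

# Hypothetical allow-list; extend as needed for the comics being scraped.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def pick_image_extension(url: str) -> str:
    """Prefer an image extension from the path, then from any query value."""
    parts = urlparse(url)
    # Check the path first, then every query value in order of appearance
    candidates = [parts.path] + [value for _, value in parse_qsl(parts.query)]
    for candidate in candidates:
        # splitext returns "" as the extension when the last component has no dot
        extension = posixpath.splitext(candidate)[1].lower()
        if extension in IMAGE_EXTENSIONS:
            return extension
    return ".unknown"
```

With this rule, "something.php?comic=somecomic&page=pagexyz.jpg" skips .php and returns .jpg. Note that the .lower() call also normalizes ".JPG" to ".jpg", which may or may not be wanted for saved filenames.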

…le is not in URL.path

hon1nbo · 2025-08-04T02:45:10Z

Looks like this is failing on JackRabbit again; I swapped it to the line you had edited in the last PR.

Testing locally, I still couldn't get it running after your proposed change. I opened a separate PR that changes the JackRabbit selector to a hard comparison, @rel='Next', instead of the contains function. Building with Poetry and running the downloader with that change seems to run smoothly.

coveralls · 2025-08-04T23:38:38Z

coverage: 92.951% (-1.5%) from 94.439%
when pulling b02e708 on hon1nbo:queryparsing
into 28bed86 on J-CPelletier:master.

J-CPelletier · 2025-08-05T03:10:33Z

webcomix/comic.py

+        parts = urlparse(url)
+
+        try:
+            parsed_filepath = parts.path
+            file_extension = parsed_filepath[parsed_filepath.rindex(".") :]
+        except:
+            if "?" in url:
+                if "&" in parts.query:
+                    parsed_queries = parts.query.split("&")
+                    for current_query in parsed_queries:
+                        try:
+                            file_extension = current_query[current_query.rindex(".") :]
+                            break
+                        except:
+                            # nothing, loop again
+                            dummy_var = ""
+                else:
+                    try:
+                        file_extension = parts.query[parts.query.rindex(".") :]
+                    except:
+                        print(
+                            "File extension unknown; setting as '.unknown' to preserve data"
+                        )
+                        file_extension = ".unknown"
+            else:
+                # worst case we can't identify the extension, setting 'unknown' to allow saving file for evaluation
+                print("File extension unknown; setting as '.unknown' to preserve data")
+                file_extension = ".unknown"
+


We'd probably want to extract this into a separate method (like get_file_path, returning both the file path and the extension).
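A rough sketch of what that extracted method might look like, written as a free function so it runs on its own. The name get_file_path comes from the comment above; returning (file path, extension) and falling back to ".unknown" follow the PR description, but the exact signature is an assumption. posixpath.splitext is swapped in for the manual rindex slice: it only inspects the last path component, so a dot in a directory name is not mistaken for an extension.

```python
import posixpath
from typing import Tuple
from urllib.parse import parse_qs, urlparse

UNKNOWN_EXTENSION = ".unknown"


def get_file_path(url: str) -> Tuple[str, str]:
    """Return (file_path, extension) for an image URL.

    Looks at the URL path first, then at each query-string value, and falls
    back to (path, ".unknown") when no extension can be found.
    """
    parts = urlparse(url)
    _, extension = posixpath.splitext(parts.path)
    if extension:
        return parts.path, extension

    # parse_qs maps each key to a list of (percent-decoded) values
    for values in parse_qs(parts.query).values():
        for value in values:
            _, extension = posixpath.splitext(value)
            if extension:
                return value, extension

    return parts.path, UNKNOWN_EXTENSION
```

Unlike the diff above, every branch returns a value, so there is no path where file_extension is left unassigned (the diff's "&" loop never sets it if no query part contains a "."), and no bare except is needed because nothing here raises on a missing dot.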

We could also use the parse_qs(...) function from the urllib.parse module. Here's what the result looks like:

    >>> from urllib.parse import parse_qs, urlparse
    >>> parts = urlparse("https://quantumvibe.com/disppageV3?story=qv&file=/simages/qv/qv1-001.jpg")
    >>> parse_qs(parts.query)
    {'story': ['qv'], 'file': ['/simages/qv/qv1-001.jpg']}
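One practical difference from splitting the query on "&" by hand: parse_qs separates keys from values and percent-decodes the values, so the result is the actual file path rather than a raw "key=value" fragment. That matters if the extracted method is to return the file path as well as the extension. A small illustration (the encoded URL is a made-up example, not one Quantum Vibe is known to serve):

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical URL with the filename percent-encoded in the query string
url = "https://example.com/disppage?file=%2Fsimages%2Fqv1-001.jpg&story=qv"
query = urlparse(url).query

manual = query.split("&")  # raw "key=value" pairs, still percent-encoded
parsed = parse_qs(query)   # decoded values grouped by key

print(manual)  # ['file=%2Fsimages%2Fqv1-001.jpg', 'story=qv']
print(parsed)  # {'file': ['/simages/qv1-001.jpg'], 'story': ['qv']}
```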

We should also add unit tests for the extracted method.
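A sketch of such tests, assuming the project collects plain test_ functions with assert (pytest style). So the file runs standalone, a compact hypothetical stand-in for get_file_path (returning (file path, extension) with a ".unknown" fallback) is defined inline; in the repository the tests would import the real method instead.

```python
import posixpath
from urllib.parse import parse_qs, urlparse


def get_file_path(url):
    """Hypothetical stand-in for the extracted method under test."""
    parts = urlparse(url)
    extension = posixpath.splitext(parts.path)[1]
    if extension:
        return parts.path, extension
    for values in parse_qs(parts.query).values():
        for value in values:
            extension = posixpath.splitext(value)[1]
            if extension:
                return value, extension
    return parts.path, ".unknown"


def test_extension_in_path():
    assert get_file_path("https://example.com/comics/001.png") == (
        "/comics/001.png",
        ".png",
    )


def test_extension_in_query_parameter():
    # The Quantum Vibe case from issue #110
    url = "https://quantumvibe.com/disppageV3?story=qv&file=/simages/qv/qv1-001.jpg"
    assert get_file_path(url) == ("/simages/qv/qv1-001.jpg", ".jpg")


def test_no_extension_falls_back_to_unknown():
    assert get_file_path("https://quantumvibe.com/strip?page=1") == (
        "/strip",
        ".unknown",
    )
```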

hon1nbo · 2025-08-05

Ah, yeah, parse_qs would probably be more efficient; I didn't catch that in the urllib.parse documentation.

I'll break this out over the next couple of days; #job is keeping me busy.

hon1nbo · 2025-10-28T22:18:12Z

Quick comment: I haven't forgotten about this; life stuff just happened. I still plan on doing the refactor.

b02e708: added exception handing & query parameter parsing for comics where fi… …le is not in URL.path

eeb50a7: Merge branch 'master' into queryparsing

J-CPelletier requested changes Aug 5, 2025
