@windymilla
Collaborator

At bottom of Tools menu.

Load one of the files, e.g. the HTML file.
In the dialog, choose the HTML file and text file.
Click "Compare" (top right of dialog)

Some of the checkboxes work, such as Ignoring Case, and also suppressing Word Joiners - maybe some others 🤷

ppcomp sample.zip

@windymilla windymilla requested a review from srjfoo January 27, 2026 21:13
@rtonsing
Contributor

rtonsing commented Jan 28, 2026

Might warn that HTML must pass W3C first. and/or report a failure better. With a missing end </span>, I get this:
error

Guiguts version: 2.0.14
Python version: 3.14.0 (tags/v3.14.0:ebf955d, Oct 7 2025, 10:15:03) [MSC v.1944 64 bit (AMD64)]
Tk/Tcl version: 8.6.15
OS Platform: Windows-11-10.0.26200-SP0

@rtonsing
Contributor

The "Show Line Numbers for:" option seems awkward, since only one file can be in the main window, clicking a line only goes to the correct line if that file is the one chosen. Unless I'm missing something?

Must be some serious processing, takes a couple of minutes and spins up the fan on my Xeon notebook, with fairly routine files.

@rtonsing
Contributor

Is the spaced punctuation on purpose? OK if it is, just takes getting used to.
"(3 +a.m.+)" shows as "( 3 + a . m . + )"

Interesting that inserting an actual space in the text file "(3 +a. m.+)" is not reported in either this or the online ppcomp, & is shown exactly the same as above.

@windymilla
Collaborator Author

windymilla commented Jan 28, 2026

> Might warn that HTML must pass W3C first. and/or report a failure better. With a missing end </span>, I get this:...

In this draft, I am not attempting to catch any of the exceptions raised by the ppcomp code - they will be caught before this becomes eligible for merging into the main GG code.
(I've taken the ppcomp code from PPWB and made the smallest changes I can to link it into GG2)
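
As a rough illustration of the "warn before comparing" suggestion, here is a minimal sketch of a pre-check that reports unclosed tags such as a missing `</span>`. It uses only Python's standard `html.parser`; the names `TagBalanceChecker` and `check_html` are hypothetical and are not part of ppcomp or Guiguts. It assumes XHTML-style markup where every non-void element is closed explicitly, so it would flag HTML that legitimately omits optional end tags.

```python
from html.parser import HTMLParser

# Elements that never take an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: tracks open elements and records imbalances."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, line number) for each open element
        self.problems = []   # human-readable messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag not in [open_tag for open_tag, _ in self.stack]:
            self.problems.append(f"line {self.getpos()[0]}: unexpected </{tag}>")
            return
        # Pop back to the matching open tag; anything skipped was never closed.
        while self.stack:
            open_tag, line = self.stack.pop()
            if open_tag == tag:
                break
            self.problems.append(f"line {line}: <{open_tag}> never closed")


def check_html(source):
    """Return a list of tag-balance problems (empty if none were found)."""
    checker = TagBalanceChecker()
    checker.feed(source)
    checker.close()
    leftovers = [f"line {line}: <{tag}> never closed" for tag, line in checker.stack]
    return checker.problems + leftovers


print(check_html("<p><span>text</p>"))
```

This is far weaker than W3C validation; it only shows how a clearer message than a raw traceback might be produced.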

@windymilla
Collaborator Author

windymilla commented Jan 28, 2026

> The "Show Line Numbers for:" option seems awkward, since only one file can be in the main window, clicking a line only goes to the correct line if that file is the one chosen. Unless I'm missing something?

Do you have an alternative suggestion? We can only load one file (non-negotiable), and the display of differences can only click correctly for one of the two files.

> Must be some serious processing, takes a couple of minutes and spins up the fan on my Xeon notebook, with fairly routine files.

It was only 10-15 seconds for an average file on my laptop, I think - definitely not two minutes. Yes, it does quite a lot of processing! Bear in mind that when you run it at PPWB, it's running on the server, not your own computer.

@windymilla
Collaborator Author

> Is the spaced punctuation on purpose? OK if it is, just takes getting used to. "(3 +a.m.+)" shows as "( 3 + a . m . + )"

You'll notice lots of spacing in the online ppcomp too - maybe not exactly the same, but things like underscores often have additional spaces.

> Interesting that inserting an actual space in the text file "(3 +a. m.+)" is not reported in either this or the online ppcomp, & is shown exactly the same as above.

Unsurprising that it's like online ppcomp, since it uses some of the same code.
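
One possible explanation for both observations, offered as an assumption rather than a description of the actual ppcomp code: if the comparison splits text into word and punctuation tokens and ignores whitespace, then "a.m." and "a. m." produce the same token list, and joining tokens with spaces for display gives the spaced-out punctuation. A minimal sketch (the `tokenize` and `display` names and the regular expression are illustrative only):

```python
import re


def tokenize(text):
    """Split into runs of word characters and single punctuation marks.

    Whitespace is discarded, so it cannot affect the comparison.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def display(text):
    """Join tokens with single spaces, as a diff display might."""
    return " ".join(tokenize(text))


print(display("(3 +a.m.+)"))
print(tokenize("(3 +a.m.+)") == tokenize("(3 +a. m.+)"))
```

Under this assumption, the first call prints `( 3 + a . m . + )` and the second prints `True`, which matches the behaviour described above.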

@srjfoo
Member

srjfoo commented Jan 28, 2026

> > The "Show Line Numbers for:" option seems awkward, since only one file can be in the main window, clicking a line only goes to the correct line if that file is the one chosen. Unless I'm missing something?

> Do you have an alternative suggestion? We can only load one file (non-negotiable), and the display of differences can only click correctly for one of the two files.

Probably document it in the manual. I did not have problems with it. The linked line numbers were on the left, and beside them were the line numbers for the other file. I very much appreciated that, because I was able to find the corresponding place in the other file, open in another instance of GG. I can't think of a better way of doing it.

> > Must be some serious processing, takes a couple of minutes and spins up the fan on my Xeon notebook, with fairly routine files.

> It was only 10-15 seconds for an average file on my laptop, I think - definitely not two minutes. Yes, it does quite a lot of processing! Bear in mind that when you run it at PPWB, it's running on the server, not your own computer.

Same here. Definitely took some time, but not enough that I got worried. I didn't time it, but probably in the range of 15 to 20 seconds.

@rtonsing
Contributor

All just general feedback, not complaining, especially the time. I imagine it has to do with no native dwdiff on Windows, I'm impressed that it works at all.

Except for the show line numbers: it would be better to auto-detect which file is in the main window, it is useless to have the numbers for the other file on the left, as far as I can tell. Otherwise, clicking on the line causes the main window to go to a line that is unrelated.

Possibly auto-fill the relevant "Choose file" also. The user can always override it.

Although there may be scenarios where a 3rd file is active, etc., I imagine it would be most common to want to go to the line in the file being edited. If it is a 3rd file, line selection would not work at all.

My 2 cents.

@windymilla
Collaborator Author

@rtonsing - adding as a comment here to make it easier for comments - I've pushed a commit with some of what you requested:

  1. Auto-detect if loaded file is an HTML file (e.g. abc.html). If so, put the HTML line numbers first. If not, put the text line numbers first. This should work for the vast majority of uses.
  2. Remove the previous radio button to flip HTML/Text mode
  3. Attempt to stop the tinycss macOS warning in the terminal
  4. Attempt to stop the HTML parser outputting warnings or errors in the terminal if the HTML file is faulty.
  5. Catch the various exceptions, particularly the faulty HTML file one. Just report the error message that PPcomp returns.
  6. Improve spacing around punctuation (a bit)
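
For item 1, a minimal sketch of extension-based detection, assuming the decision rests only on the loaded file's suffix as the `abc.html` example suggests. The function name `line_number_order` and the set of suffixes are hypothetical, not taken from the Guiguts code:

```python
from pathlib import Path

# Assumed set of suffixes treated as HTML; the real check may differ.
HTML_SUFFIXES = {".html", ".htm", ".xhtml"}


def line_number_order(loaded_file):
    """Return which file's line numbers come first in the results.

    If the file in the main window looks like HTML, its line numbers go
    first (and are the clickable ones); otherwise the text file's do.
    """
    if Path(loaded_file).suffix.lower() in HTML_SUFFIXES:
        return ("html", "text")
    return ("text", "html")


print(line_number_order("abc.html"))
print(line_number_order("abc.txt"))
```

A suffix check cannot handle the case where a third file is loaded; the text line numbers would simply come first.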

@windymilla
Collaborator Author

Still untested/unimplemented (and therefore probably/certainly not working):

  1. Extract footnotes
  2. Custom CSS

@windymilla
Collaborator Author

> All just general feedback, not complaining, especially the time. I imagine it has to do with no native dwdiff on Windows, I'm impressed that it works at all.

Thanks - the feedback is helpful. That's right - I've used Python's difflib to try to do the job of dwdiff.
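
For readers unfamiliar with the approach, here is a minimal sketch of a word-level diff built on `difflib.SequenceMatcher`, marking deletions as `[-...-]` and insertions as `{+...+}` in the style of dwdiff's default output. It is an illustration only, not the code used in this PR; the `word_diff` name and the whitespace-based splitting are assumptions.

```python
import difflib


def word_diff(old, new):
    """Return a dwdiff-style, word-level diff of two strings."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("the quick brown fox", "the quick red fox"))
# the quick [-brown-] {+red+} fox
```

`SequenceMatcher` runs in pure Python, which may partly account for the longer run times reported above compared with a native dwdiff.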

> Except for the show line numbers: it would be better to auto-detect which file is in the main window, it is useless to have the numbers for the other file on the left, as far as I can tell. Otherwise, clicking on the line causes the main window to go to a line that is unrelated.

Done (after a fashion)

> Possibly auto-fill the relevant "Choose file" also. The user can always override it.

I don't think this is worth it. It only takes a moment to choose the two files.

> Although there may be scenarios where a 3rd file is active, etc., I imagine it would be most common to want to go to the line in the file being edited. If it is a 3rd file, line selection would not work at all.

Yes, by far the majority of the time, there would be no 3rd file.

@windymilla
Collaborator Author

Latest commit:

  1. Hide "Bold" option - not visible at ppwb
  2. Hide "Greek title" option - we don't do transcription
  3. Hide "Suppress Zero Space" - there is no code to support it. At ppwb, you can check the box but it does nothing
  4. Re-organize options to make space for custom CSS.
  5. Make Custom CSS be added to the transformations used.
  6. Make "Extract Footnotes" work. As at ppwb - line number in HTML (& maybe text sometimes) will be wrong. This is noted in the tooltip over the checkbox.

3 participants