Add script to render codebook #83
base: main
Claude Sonnet 4.5 prompt usage
Pull request overview
This PR adds infrastructure for rendering and formatting the Party Facts codebook by introducing:
- A new R script (`codebook-render.R`) that renders the codebook to HTML, Markdown, and PDF formats
- A formatting prompt document that provides guidelines for synchronizing and standardizing Markdown formatting between the source Rmd and generated Markdown files
- Extensive formatting updates to both `codebook.Rmd` and `codebook.md`, including text wrapping, consistent use of en-dashes, proper escaping of special characters, and standardized spacing
Key Changes
- Added a `codebook-render.R` script that automates rendering to multiple output formats, with error handling for optional PDF generation
- Created formatting prompt documentation in `.github/prompts/` with a convenience symlink in `scripts/`
- Applied comprehensive formatting standardization throughout the codebook files, including line wrapping, escape character handling, and punctuation consistency
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `codebook/codebook-render.R` | New R script to render the codebook to HTML, Markdown, and PDF formats with error handling |
| `.github/prompts/format-codebook.prompt.md` | Documentation of the formatting workflow and guidelines for maintaining consistency |
| `scripts/format-codebook.prompt.md` | Symlink to the prompt file for easier access from the `scripts` directory |
| `codebook/codebook.Rmd` | Source RMarkdown file with extensive formatting updates, including text wrapping, consistent punctuation, proper escaping, and spacing standardization |
| `codebook/codebook.md` | Generated Markdown output reflecting all formatting changes from the Rmd source |
```r
cat("\n\n2. Rendering Markdown (GitHub-flavored)...")
rmarkdown::render(
  input = input_file,
  output_format = "md_document",
```
Copilot (AI), Jan 3, 2026
The comment on line 27 says "GitHub-flavored", but the output format actually specified is `"md_document"`, which on its own is a generic Markdown format. To render GitHub-flavored Markdown, specify `output_format = "github_document"` instead, or use `md_document` with `variant: gfm` as in the YAML header of `codebook.Rmd`.
Note that `md_document` with `variant: gfm` (as specified in the YAML) should produce GitHub-flavored Markdown, but the explicit call to `rmarkdown::render()` passes only `"md_document"`, without a variant. Consider using `github_document` for clarity and consistency with the comment.
Suggested change:
```diff
-output_format = "md_document",
+output_format = "github_document",
```
```markdown
description: 'Synchronize codebook.Rmd and codebook.md with consistent formatting.'
---

1. Render the RMarkdown file: `Rscript -e "rmarkdown::render('codebook.Rmd', output_format = 'github_document')"`
```
Copilot (AI), Jan 3, 2026
The formatting prompt instructions (line 6) specify `output_format = 'github_document'`, but the rendering script (`codebook-render.R`, line 31) uses `output_format = "md_document"`. These should be consistent to avoid confusion.
If the intent is to produce GitHub-flavored Markdown (as suggested by the prompt and the YAML header's `variant: gfm`), the prompt and the rendering script should use the same format specification.
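One way to keep the two files in step is a small check that extracts the `output_format` value from each and compares them. A sketch, assuming both files spell the argument as `output_format = "..."` or `output_format = '...'`; `extract_output_formats()` and `formats_agree()` are hypothetical helper names:

```r
# Hypothetical consistency check: pull the first output_format = "..." (or '...')
# value from each line that contains one.
extract_output_formats <- function(lines) {
  pattern <- "output_format\\s*=\\s*['\"]([A-Za-z_]+)['\"]"
  hits <- regmatches(lines, regexpr(pattern, lines, perl = TRUE))
  # Keep only the captured format name, e.g. "github_document".
  sub(pattern, "\\1", hits, perl = TRUE)
}

# TRUE when both sources name the same set of output formats.
formats_agree <- function(prompt_lines, script_lines) {
  setequal(extract_output_formats(prompt_lines),
           extract_output_formats(script_lines))
}
```

In practice the inputs would come from `readLines()` on `.github/prompts/format-codebook.prompt.md` and `codebook/codebook-render.R`. With the current PR contents, the prompt yields `github_document` and the script yields `md_document`, so the check would report a mismatch.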
```diff
 low = "thistle2", high = "darkblue",
 # low="#fff7bc", high="#d95f0e",
-guide="colorbar", na.value="lightgrey", name = expression(parties (log[e]))
+guide = "colorbar", na.value = "lightgrey", name = expression(parties(log[e]))
```
Copilot (AI), Jan 3, 2026
The change from `expression(parties (log[e]))` to `expression(parties(log[e]))` removes the space between "parties" and the opening parenthesis. This may alter the visual appearance of the legend title (the color scale's `name`) in the plot.
If this formatting change is intentional for consistency, it's acceptable. However, if the previous spacing was deliberate, consider whether this change is desired. The space might have been added so that "parties (log_e)" reads as "parties (in logarithmic scale)" rather than as a function call.
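Whether the space matters can be checked directly in base R. In this minimal sketch, both spellings parse to the same call, so they should produce the same label. A visible gap would need plotmath's `~` operator, for example `name = expression(parties ~ (log[e]))` in the scale call. The variable names below are illustrative:

```r
# R's parser ignores whitespace between a function name and "(", so both
# spellings from the diff produce the same call object.
with_space    <- quote(parties (log[e]))
without_space <- quote(parties(log[e]))
identical(with_space, without_space)  # TRUE
deparse(with_space)                   # "parties(log[e])"

# In plotmath, `~` is rendered as a space, so this label reads
# "parties (log_e)" instead of looking like a function call.
spaced_label <- expression(parties ~ (log[e]))
```

If this holds, the diff is a no-op for the rendered legend. The readability concern applies to the old spelling as much as to the new one.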
```css
}

body {
body {
```
Copilot (AI), Jan 3, 2026
The change from an empty line to `body {` combines the CSS style tag closing with the `body` selector, using an HTML entity for a newline. While this may be part of the formatting standardization, it reduces the readability of the source code.
The HTML entity represents a line feed character and appears to be an artifact of the Markdown rendering/formatting process. Consider whether this change improves or harms readability: CSS selectors on their own lines are typically more readable than HTML entities standing in for line breaks within the code block.
Suggested change:
```diff
-body {
+body {
```
```diff
 For public datasets we complete the linking of all parties that meet
 the
 Party Facts population criteria (\>5%, see above). Public datasets are
 visible on PF-Web without login and are archived regularly in the
 PF-Dataverse.
 
-For smaller parties, we may apply a staggered approach. All parties >5%, are linked during the initial import. Parties **<5% and >1%** (or 2%) are imported but **may not be linked completely** in Party Facts. A section "Linking status" in the "readme" of the dataset documents the status.
+For smaller parties, we may apply a staggered approach. All parties
+\>5%, are linked during the initial import. Parties **\<5% and \>1%**
+(or 2%) are imported but **may not be linked completely** in Party
+Facts. A section "Linking status" in the "readme" of the dataset
+documents the status.
 
-**Note** -- Staggered inclusion for parties <5% introduced in March 2021. Previously, all public and archived datasets were completely linked.
+**Note** – Staggered inclusion for parties \<5% introduced in March
```
Copilot (AI), Jan 3, 2026
Lines 308, 309, 313, and 319 contain trailing spaces (two spaces at the end of the line), which Markdown uses to create hard line breaks. While this is valid Markdown syntax, trailing spaces are invisible and are often removed automatically by editors or pre-commit hooks.
Consider using the explicit line-break syntax (a backslash `\` at the end of the line) instead of trailing spaces for better visibility and maintainability. Some lines in the same section already use backslash line breaks (e.g., lines 225-226 and 230-231), so using backslashes consistently would be better.
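The suggested cleanup can be sketched in base R, assuming the trailing-space breaks occur only in prose and not inside fenced code. CommonMark and GFM both treat a backslash before the line ending as a hard line break. Both function names are hypothetical:

```r
# Hypothetical helper: rewrite Markdown hard breaks written as two or more
# trailing spaces into an explicit trailing backslash. It works line by line
# and does not skip fenced code blocks, so run it only on prose.
trailing_spaces_to_backslash <- function(lines) {
  # (\S) keeps the last visible character; " {2,}$" is the hard-break marker.
  sub("(\\S) {2,}$", "\\1\\\\", lines, perl = TRUE)
}

# Indices of lines that currently end in a trailing-space hard break.
find_trailing_space_breaks <- function(lines) {
  which(grepl("\\S {2,}$", lines, perl = TRUE))
}
```

Lines made up only of spaces, and lines with a single trailing space (not a hard break), are left untouched. Running `find_trailing_space_breaks()` first gives a list to review before rewriting.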
```r
script_path <- sub("--file=", "", args[grep("--file=", args)])
script_dir <- dirname(normalizePath(script_path))
```
Copilot (AI), Jan 3, 2026
The script extracts its own path with `grep("--file=", args)`, which returns an empty result when no `--file=` argument is present (e.g., when the code runs interactively, via `Rscript -e`, or through `source()`). `sub()` and `normalizePath()` then yield an empty character vector, so `script_dir` is empty and the subsequent working-directory setup fails.
Consider adding error handling to check whether the script path was found, or provide a fallback. For example, check `length(script_path) > 0` before calling `normalizePath()`, or wrap the working-directory setup in `tryCatch()`.
Suggested change:
```diff
-script_path <- sub("--file=", "", args[grep("--file=", args)])
-script_dir <- dirname(normalizePath(script_path))
+script_arg_index <- grep("--file=", args)
+if (length(script_arg_index) > 0) {
+  script_path <- sub("^--file=", "", args[script_arg_index[1L]])
+  script_dir <- dirname(normalizePath(script_path))
+} else {
+  # Fallback: use current working directory if script path is not available
+  script_dir <- getwd()
+}
```
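The fallback logic suggested in this review could also be wrapped in a small function, which makes it testable by passing the argument vector explicitly. In this sketch, `get_script_dir()` is a hypothetical name. It assumes `args` comes from `commandArgs(trailingOnly = FALSE)`, since `--file=` only appears in the full argument list:

```r
# Hypothetical wrapper: locate the directory of a script run via Rscript,
# falling back to the working directory when no --file= argument exists.
get_script_dir <- function(args = commandArgs(trailingOnly = FALSE)) {
  file_arg <- grep("^--file=", args, value = TRUE)
  if (length(file_arg) == 0L) {
    # Interactive session, `Rscript -e`, or source(): no script path available.
    return(getwd())
  }
  script_path <- sub("^--file=", "", file_arg[1L])
  # mustWork = FALSE avoids a warning if the path cannot be resolved.
  dirname(normalizePath(script_path, mustWork = FALSE))
}
```

In the render script this would become `script_dir <- get_script_dir()`, followed by whatever working-directory setup the script already does.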
Add codebook rendering script and Markdown formatting.