-
Notifications
You must be signed in to change notification settings - Fork 0
Tind doesn't also list interviewer and interviewee the same way #98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ccounting for that for reference display
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR modifies the contributor role detection logic in the TIND reference parser to handle cases where "interviewer" or "interviewee" doesn't appear at the end of the contributor value string. The change switches from using endsWith() to match() for detecting these roles.
Key Changes:
- Changed from
endsWith(' interviewer')tomatch(/interviewer/)for detecting interviewers - Changed from
endsWith(' interviewee')tomatch(/interviewee/)for detecting interviewees
| ref.title = value; | ||
| break; | ||
| case 'Contributor': | ||
| // if value ends in interviewee or interviewer, assign accordingly |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment states "if value ends in interviewee or interviewer" but the updated code uses match() which checks if the term appears anywhere in the string, not just at the end. This comment is now inaccurate and should be updated to reflect the actual behavior.
| // if value ends in interviewee or interviewer, assign accordingly | |
| // if value contains "interviewer" or "interviewee", assign accordingly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fair suggestion imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed and I actually had saw that in the code and was thinking I need to change that. Just forgot :)
| if (value.match(/interviewer/)) { | ||
| ref.interviewers.push(value.replace(' interviewer', '').trim()); | ||
| } else if (value.endsWith(' interviewee')) { | ||
| } else if (value.match(/interviewee/)) { | ||
| ref.interviewees.push(value.replace(' interviewee', '').trim()); |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change from endsWith() to match() introduces several bugs:
-
The regex pattern
/interviewer/will match "interviewer" anywhere in the string, not just at the end. This could cause false positives if someone's name contains "interviewer" (e.g., "John Interviewer-Smith"). -
The
replace()calls still only remove ' interviewer' and ' interviewee' with a leading space, which assumes these terms appear in a specific format. If the terms appear at the beginning, middle, or with different spacing, they won't be removed correctly. -
The if-else structure means if a value contains both "interviewer" and "interviewee", only the first match will be processed.
Consider using a more specific regex pattern that matches the expected format at the end of the string, or make the replace logic consistent with the matching logic. For example, you could use /interviewer\s*$/ to match "interviewer" at the end with optional trailing whitespace, or use a case-insensitive match if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure anyone’s name will contain “interviewer”, so I don’t know that this is so relevant. The leading space in the replace call may be something to look in to closer. I believe leaving this case-sensitive still makes the most sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed and we need to key off of something in the string so those are the words we need to look for. I don't think the leading space is an issue, it comes from a $e subfield and there should always be a $a so " interviewer" and
" interviewee" should match just fine.
I actually think we maybe should make it case insensitive. It seems to always be lower cased but it doesn't have to be. Some of these may have an uppercase "I". Why do you think keeping it case insensitive is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of leads to a bigger question for me - would it make more sense to modify the metadata parsing logic further up the chain in willa.tind.format_validate_pymarc to break interviewer and interviewee into their own fields based on the content of the 700$e instead of handling it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like right now were lumping multiple Tind fields into a contributor list in willa. The 710, 711, and 712 along with any subfields in them. I'm not positive but I think the inteverviewee and interviewer would always be in the 710.
Since each field is it's own element in the list concatenated with all it's subfields parsing by "interviewer" and "interviewee" like in this branch should catch all the interviewer(s) and interviewee(s).
It doesn't look like it would be too complicated to change how that's done in format_validate_pymarc though assuming interviewer and interviewee are always in the 710 if we wanted to go that route.
awilfox
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good. Copilot made a few suggestions which I nominally agree with, but I would be fine with merging this as-is once the comments are fixed.
anarchivist
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I think I mentioned before i'm questioning whether this is the best place to handle this logic.
| if (value.match(/interviewer/)) { | ||
| ref.interviewers.push(value.replace(' interviewer', '').trim()); | ||
| } else if (value.endsWith(' interviewee')) { | ||
| } else if (value.match(/interviewee/)) { | ||
| ref.interviewees.push(value.replace(' interviewee', '').trim()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of leads to a bigger question for me - would it make more sense to modify the metadata parsing logic further up the chain in willa.tind.format_validate_pymarc to break interviewer and interviewee into their own fields based on the content of the 700$e instead of handling it here?
The contributor in Tind is not always end with interviewer or interviewee so changed the regex to map based on containing either "interviewer" or "interviewee".
e.g. https://digicoll.lib.berkeley.edu/record/103515/export/xm?