Skip to content

Conversation

@davezuckerman
Copy link
Contributor

The contributor in Tind is not always end with interviewer or interviewee so changed the regex to map based on containing either "interviewer" or "interviewee".

e.g. https://digicoll.lib.berkeley.edu/record/103515/export/xm?

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR modifies the contributor role detection logic in the TIND reference parser to handle cases where "interviewer" or "interviewee" doesn't appear at the end of the contributor value string. The change switches from using endsWith() to match() for detecting these roles.

Key Changes:

  • Changed from endsWith(' interviewer') to match(/interviewer/) for detecting interviewers
  • Changed from endsWith(' interviewee') to match(/interviewee/) for detecting interviewees

ref.title = value;
break;
case 'Contributor':
// if value ends in interviewee or interviewer, assign accordingly
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states "if value ends in interviewee or interviewer" but the updated code uses match() which checks if the term appears anywhere in the string, not just at the end. This comment is now inaccurate and should be updated to reflect the actual behavior.

Suggested change
// if value ends in interviewee or interviewer, assign accordingly
// if value contains "interviewer" or "interviewee", assign accordingly

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fair suggestion imo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed and I actually had saw that in the code and was thinking I need to change that. Just forgot :)

Comment on lines +30 to 33
if (value.match(/interviewer/)) {
ref.interviewers.push(value.replace(' interviewer', '').trim());
} else if (value.endsWith(' interviewee')) {
} else if (value.match(/interviewee/)) {
ref.interviewees.push(value.replace(' interviewee', '').trim());
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change from endsWith() to match() introduces several bugs:

  1. The regex pattern /interviewer/ will match "interviewer" anywhere in the string, not just at the end. This could cause false positives if someone's name contains "interviewer" (e.g., "John Interviewer-Smith").

  2. The replace() calls still only remove ' interviewer' and ' interviewee' with a leading space, which assumes these terms appear in a specific format. If the terms appear at the beginning, middle, or with different spacing, they won't be removed correctly.

  3. The if-else structure means if a value contains both "interviewer" and "interviewee", only the first match will be processed.

Consider using a more specific regex pattern that matches the expected format at the end of the string, or make the replace logic consistent with the matching logic. For example, you could use /interviewer\s*$/ to match "interviewer" at the end with optional trailing whitespace, or use a case-insensitive match if needed.

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure anyone’s name will contain “interviewer”, so I don’t know that this is so relevant. The leading space in the replace call may be something to look in to closer. I believe leaving this case-sensitive still makes the most sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed and we need to key off of something in the string so those are the words we need to look for. I don't think the leading space is an issue, it comes from a $e subfield and there should always be a $a so " interviewer" and
" interviewee" should match just fine.

I actually think we maybe should make it case insensitive. It seems to always be lower cased but it doesn't have to be. Some of these may have an uppercase "I". Why do you think keeping it case insensitive is better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of leads to a bigger question for me - would it make more sense to modify the metadata parsing logic further up the chain in willa.tind.format_validate_pymarc to break interviewer and interviewee into their own fields based on the content of the 700$e instead of handling it here?

Copy link
Contributor Author

@davezuckerman davezuckerman Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like right now were lumping multiple Tind fields into a contributor list in willa. The 710, 711, and 712 along with any subfields in them. I'm not positive but I think the inteverviewee and interviewer would always be in the 710.

Since each field is it's own element in the list concatenated with all it's subfields parsing by "interviewer" and "interviewee" like in this branch should catch all the interviewer(s) and interviewee(s).

It doesn't look like it would be too complicated to change how that's done in format_validate_pymarc though assuming interviewer and interviewee are always in the 710 if we wanted to go that route.

Copy link
Member

@awilfox awilfox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good. Copilot made a few suggestions which I nominally agree with, but I would be fine with merging this as-is once the comments are fixed.

Copy link
Member

@anarchivist anarchivist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I think I mentioned before i'm questioning whether this is the best place to handle this logic.

Comment on lines +30 to 33
if (value.match(/interviewer/)) {
ref.interviewers.push(value.replace(' interviewer', '').trim());
} else if (value.endsWith(' interviewee')) {
} else if (value.match(/interviewee/)) {
ref.interviewees.push(value.replace(' interviewee', '').trim());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of leads to a bigger question for me - would it make more sense to modify the metadata parsing logic further up the chain in willa.tind.format_validate_pymarc to break interviewer and interviewee into their own fields based on the content of the 700$e instead of handling it here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants