@martindanka
Contributor

Fixes

  • Current qualification studied (educaim)

    • educaim19 (age 19, W6Saim)

      • Code 13 "Other level 1" is now grouped as 1 = NVQ 1–3 instead of −3 = Not asked at the fieldwork stage / participated / interviewed.
    • educaim20 (age 20, W7SAim)

      • Code −94 "Insufficient information" is now mapped to −8 = Don't know / insufficient information instead of −2 = Script error / information lost.
    • educaim25 (age 25, S8 tick-box items)

      • W8ACQUC0P "Academic qualifications studying: Don't know" is now mapped to −8 = Don't know / insufficient information rather than 4 = None of these qualifications.
      • W8VCQUC0P "Vocational qualifications studying: None of the above" is now mapped to 4 = None of these qualifications instead of 2 = None / entry, so that "none of the listed qualifications" is not treated as a low-level qualification.
      • When both “Don’t know” and “Refusal” are selected, the code now gives priority to Refusal (−9) over Don't know (−8) in the final educaim25 variable, as intended.
    • educaim32 (age 32, S9)

      • The code 5 = Not studying is now only assigned when W9ECONACT2 is in 1–5 or 8–14, that is, when there is a valid code showing the respondent is doing something other than education.
      • If W9ECONACT2 == −1, respondents now stay as −1 = Item not applicable, instead of being re-coded as “Not studying” via the older rule W9ECONACT2 != 6 & W9ECONACT2 != 7.
  • Parents’ highest qualifications (educma / educpa)

    • Code 19 "Qualification, level unspecified" is now mapped to 3 = Other instead of −2 = Script error / information lost.
    • For mother’s full education (educdtlma), the order for pulling information across sweeps is now S1 → S2 → S4. This follows a simple rule (use the first positive, non-missing code) and is in line with how father’s education is handled.
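
Two of the fixes above can be illustrated with a minimal sketch. This is not the code in this PR: the helper name first_positive(), the fallback to the first sweep's code when no sweep has a positive code, and all example values are assumptions made for illustration. Only W9ECONACT2, the code ranges 1–5 and 8–14, and the S1 → S2 → S4 order come from the description.

```r
# --- educaim32: old vs new "Not studying" condition ------------------------
# Made-up values of W9ECONACT2 for four respondents
econact <- c(1, 6, -1, 9)

# Older rule: anything other than 6 or 7, which also catches -1
old_not_studying <- econact != 6 & econact != 7

# Newer rule: only valid codes for activities other than education
new_not_studying <- econact %in% c(1:5, 8:14)

# --- educdtlma: first positive code across sweeps --------------------------
# Hypothetical helper: walk the sweeps in the order given and keep the first
# positive code. Assumption: if no sweep has a positive code, the first
# sweep's (missing) code is kept.
first_positive <- function(...) {
  sweeps <- list(...)
  out <- sweeps[[1]]
  for (x in sweeps[-1]) {
    needs_fill <- is.na(out) | out <= 0   # still missing so far
    candidate_ok <- !is.na(x) & x > 0     # this sweep has a valid code
    fill <- needs_fill & candidate_ok
    out[fill] <- x[fill]
  }
  out
}

# Made-up codes for three respondents (negative values are missing codes)
s1 <- c(3, -9, -1)
s2 <- c(5, 2, -8)
s4 <- c(1, 4, 7)

mother_detail <- first_positive(s1, s2, s4)
```

In this toy data the third respondent (W9ECONACT2 == −1) is flagged by the older condition but not by the newer one, and each respondent's detailed code comes from the earliest sweep with a positive value.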

Improvements

  • Current qualification studied (educaim)

    • For S8 and S9, I added lookup lists (for example, educaim_groups_s8, educaim_groups_s9) that group the tick-box educational qualifications.

    • I added a small helper function has_any_tick() to derive “any tick in this group”.

    • For S8 and S9, I amended the recode logic so that the helpers are used.

  • Parental education

    • I defined explicit label vectors, parent_edu_detailed_labels and parent_edu_simple_labels, and applied them via haven::labelled(). This keeps the numeric codes and labels aligned with the documentation and makes them easier to inspect.
    • The code that builds the detailed education (educdtlma, educdtlpa) and the simpler 3-level variables (educma, educpa) is now more compact, and separates:
      • sweep-level recoding from
      • the final 3-level summary.
  • All derived variables are now stored as haven::labelled rather than factors, so that values/labels match the documentation.
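
The lookup-list and helper pattern described above might look roughly like the sketch below. The signature of has_any_tick(), the group names, the column names, and the convention that a tick is coded as 1 are all assumptions for illustration; the actual educaim_groups_s8 / educaim_groups_s9 lists and helper in this PR may differ.

```r
# Hypothetical lookup list: group name -> tick-box columns in that group
# (column names are made up, not real S8/S9 variable names)
example_groups <- list(
  degree = c("tick_a", "tick_b"),
  other  = c("tick_c")
)

# Assumed helper: TRUE where at least one column in `cols` is ticked (== 1).
# Missing values are treated as "not ticked".
has_any_tick <- function(data, cols) {
  rowSums(data[, cols, drop = FALSE] == 1, na.rm = TRUE) > 0
}

# Made-up tick-box data for three respondents
df <- data.frame(
  tick_a = c(1, 0, 0),
  tick_b = c(0, 0, NA),
  tick_c = c(0, 1, 0)
)

any_degree <- has_any_tick(df, example_groups$degree)
any_other  <- has_any_tick(df, example_groups$other)
```

Keeping the column groupings in a named list means the recode step only refers to group names, so adding or moving a tick-box item is a one-line change to the list.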

Unresolved

  • Own qualifications: I am not able to sign off on this derivation at this stage, as the intended logic is not documented clearly enough. The code combines derived variables with individual tick-box items, but it is not clear why these sources are combined, what the precedence rules are, or how different missing-value codes should be handled when they conflict. If helpful, I'd be happy to revisit it, but I would need more code comments or brief documentation.

Suggestions for documentation

  • It may help users if the User Guide briefly explained how the various educational qualification variables were derived. This would also make the code easier to maintain, since complicated logic without documentation is error-prone.

@martindanka martindanka requested review from dbann and fwwu December 20, 2025 03:20
@martindanka martindanka self-assigned this Dec 20, 2025