@joshwlambert
Member

This PR addresses #252 by adding the ability for users to specify the probability of sampling male and female contacts and cases using the new prob_male option in create_config(). The config list is used in .sim_internal() to sample the sex of each contact.

The default (prob_male = 0.5) is backwards compatible. However, because the prob argument is now specified for sample(), the random number chain changes, so the output differs for any line list or contact tracing data that is sampled after sex (e.g. age, name, case type).

Unit tests are added to ensure the simulation errors as expected when prob_male $\geq$ 1 or $\leq$ 0. Due to the change in the random number generation chain, the snapshots for sim_linelist(), sim_contacts() and sim_outbreak() have all been updated.
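
As a rough illustration of the behaviour described above, the sketch below samples sex with `sample()` and a `prob` vector, and rejects values of `prob_male` outside the open interval (0, 1). `sample_sex()` is a hypothetical helper written for this example; it is not the code in `.sim_internal()`, which may validate and sample differently.

```r
# Hypothetical helper for illustration only; not simulist's internal code.
sample_sex <- function(n, prob_male = 0.5) {
  # Error unless prob_male is a single number strictly between 0 and 1
  stopifnot(
    is.numeric(prob_male),
    length(prob_male) == 1L,
    prob_male > 0,
    prob_male < 1
  )
  # Draw n sexes; "m" has probability prob_male, "f" the remainder
  sample(
    c("m", "f"),
    size = n,
    replace = TRUE,
    prob = c(prob_male, 1 - prob_male)
  )
}

set.seed(1)
sex <- sample_sex(1000, prob_male = 0.2)
table(sex)

# Boundary values are rejected, mirroring the unit tests described above
try(sample_sex(10, prob_male = 1))
try(sample_sex(10, prob_male = 0))
```

Treating 0 and 1 as invalid matches the tests described in this PR; a different design could allow them in order to simulate single-sex outbreaks.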

@joshwlambert joshwlambert added the enhancement New feature or request label Nov 10, 2025
Member

@avallecam avallecam left a comment


Thanks @joshwlambert, I just ran this reprex to test it. I think it is ready to merge 🚀

# pak::pak("epiverse-trace/simulist@sex_sampling")

library(simulist)
library(magrittr)
sessioninfo::package_info(pkgs = "attached")
#>  package  * version date (UTC) lib source
#>  magrittr * 2.0.3   2022-03-30 [1] RSPM
#>  simulist * 0.6.0   2025-11-24 [1] Github (epiverse-trace/simulist@4b81aea)
#> 
#>  [1] C:/Users/AndreeValleCampos/Documents/0projects/epicatador/renv/library/windows/R-4.5/x86_64-w64-mingw32
#>  [2] C:/Users/AndreeValleCampos/AppData/Local/R/cache/R/renv/sandbox/windows/R-4.5/x86_64-w64-mingw32/0eea1ca5
#>  [3] C:/Program Files/R/R-4.5.1/library
#>  * ── Packages attached to the search path.

set.seed(1)

sim_data <- simulist::sim_linelist(
  outbreak_size = c(1000, 1500),
  config = simulist::create_config(prob_male = 0.2)
) %>%
  dplyr::as_tibble()
#> Warning: Number of cases exceeds maximum outbreak size. 
#> Returning data early with 1546 cases and 3059 total contacts (including cases).

sim_data %>%
  dplyr::count(sex)
#> # A tibble: 2 × 2
#>   sex       n
#>   <chr> <int>
#> 1 f      1261
#> 2 m       285

out <- sim_data %>%
  incidence2::incidence(
    date_index = "date_onset",
    interval = "day",
    groups = "sex"
  )

incidence2::estimate_peak(x = out)
#> # A tibble: 2 × 8
#>   sex   count_variable observed_peak observed_count bootstrap_peaks lower_ci  
#>   <chr> <chr>          <date>                 <int> <list>          <date>    
#> 1 f     date_onset     2023-05-01                18 <df [100 × 1]>  2023-03-24
#> 2 m     date_onset     2023-04-18                 5 <df [100 × 1]>  2023-02-17
#> # ℹ 2 more variables: median <date>, upper_ci <date>

Created on 2025-11-24 with reprex v2.1.1
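
One way to sanity-check the counts in the reprex above is an exact binomial test of the observed male share against the requested `prob_male = 0.2`. This is only a sketch using base R's `binom.test()`; the counts are copied from the `dplyr::count(sex)` output, and the variable names are illustrative.

```r
# Counts copied from the dplyr::count(sex) output above
n_male <- 285
n_female <- 1261
n_cases <- n_male + n_female  # 1546 cases in the line list

# Observed share of male cases (about 0.184)
n_male / n_cases

# Exact binomial test against the requested prob_male = 0.2
res <- stats::binom.test(x = n_male, n = n_cases, p = 0.2)
res$estimate
res$conf.int
```

Only cases appear in the line list, so this checks the sex split among cases; contacts would need the same check on `sim_contacts()` output.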

