Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mean() output discrepancy #189

Open
pratikunterwegs opened this issue Sep 13, 2023 · 2 comments
Open

mean() output discrepancy #189

pratikunterwegs opened this issue Sep 13, 2023 · 2 comments

Comments

@pratikunterwegs
Copy link
Collaborator

pratikunterwegs commented Sep 13, 2023

There's a potential issue with mean.epidist() where the mean returned when the distribution is parameterised and when it is not are different. Is this the expected behaviour, or should the two be mostly similar? See reprex.

I haven't tried this for distributions other than the gamma, so it might be an issue with convert_params_gamma() alone.

Alternatively, is this an issue with the values for some of the delay distributions in the database, where the summary stats and parameters don't match?

library(epiparameter)

onset_to_death_ebola <- epiparameter::epidist_db(
  disease = "Ebola Virus Disease",
  epi_dist = "onset_to_death",
  author = "Barry_etal",
  single_epidist = TRUE
)
#> Using Barry, etal (2018). "Outbreak of Ebola virus disease in the Democratic
#> Republic of the Congo, April–May, 2018: an epidemiological study." _The
#> Lancet_. doi:10.1016/S0140-6736(18)31387-4
#> <https://doi.org/10.1016/S0140-6736%2818%2931387-4>. 
#> To retrieve the citation use the 'get_citation' function

# code to convert params to summary stats
dist <- family(onset_to_death_ebola)
params <- get_parameters(onset_to_death_ebola)
args <- c(dist, as.list(params))
summary_stats <- do.call(convert_params_to_summary_stats, args = args)

# mean estimated from summary stats
summary_stats$mean
#> [1] 7.999999

# mean from centre_spread
mean(onset_to_death_ebola)
#> [1] 9.3

# remove mean and rely on parameter conversion
onset_to_death_ebola$summary_stats$centre_spread$mean = NA

mean(onset_to_death_ebola)
#> [1] 7.999999

Created on 2023-09-13 by the reprex package (v2.0.1)

@joshwlambert
Copy link
Member

One reason for the discrepancy is that some <epiparameter> objects are parameterised using numerical optimisation and sometimes this doesn't use the mean.

For example in the reprex below the Chikungunya generation time is reported with a mean and percentiles and as a gamma distribution. In this case the gamma parameters are inferred using the percentiles (using extract_param()) and then when mean() is called the <epiparameter> object it returns the mean from the paper, this differs from the mean calculated from the parameters due to numerical imprecision of the optimisation.

library(epiparameter)
db <- epiparameter_db()  
#> Returning 125 results that match the criteria (100 are parameterised). 
#> Use subset to filter by entry variables or single_epiparameter to return a single entry. 
#> To retrieve the citation for each use the 'get_citation' function
chik <- db[[124]]
chik
#> Disease: Chikungunya
#> Pathogen: Chikungunya Virus
#> Epi Parameter: generation time
#> Study: Guzzetta G, Vairo F, Mammone A, Lanini S, Poletti P, Manica M, Rosa R,
#> Caputo B, Solimini A, della Torre A, Scognamiglio P, Zumla A, Ippolito
#> G, Merler S (2020). "Spatial modes for transmission of chikungunya
#> virus during a large chikungunya outbreak in Italy: a modeling
#> analysis." _BMC Medicine_. doi:10.1186/s12916-020-01674-y
#> <https://doi.org/10.1186/s12916-020-01674-y>.
#> Distribution: gamma (days)
#> Parameters:
#>   shape: 8.633
#>   scale: 1.447
chik$prob_distribution
#> <distribution[1]>
#> [1] Γ(8.6, 0.69)
chik$summary_stats
#> $mean
#> [1] 12.4
#> 
#> $mean_ci_limits
#> [1] 11.7 13.3
#> 
#> $mean_ci
#> [1] 95
#> 
#> $quantiles
#>  2.5 97.5 
#>  5.6 22.1
mean(chik)
#> [1] 12.4
dist <- epi_namedist <- family(chik)
params <- get_parameters(chik)
args <- c(dist, as.list(params))
summary_stats <- do.call(convert_params_to_summary_stats, args = args)
summary_stats$mean
#> [1] 12.49139
# difference
abs(mean(chik) - summary_stats$mean)
#> [1] 0.09139252

Created on 2025-01-08 with reprex v2.1.0

In these cases I don't think there is much we can do, and just make clear that users should use the mean() function. I'll add it somewhere to the documentation.

@pratikunterwegs
Copy link
Collaborator Author

Ah I see, thanks for explaining!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants