Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Purging data_path from flepiMoP #480

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from
Open

Conversation

emprzy
Copy link
Collaborator

@emprzy emprzy commented Jan 23, 2025

Describe your changes.

This pull request attempts to remove the deprecated configuration option data_path from flepiMoP code and documentation. I found all instances of "data_path" in flepiMoP via a grep search, so I may have also removed data_path as a CLI option, which I can revert if needed.

Presently, there are still data_path references in the following files for the following reasons:

What does your pull request address? Tag relevant issues.

This pull request addresses GH #472.

@emprzy emprzy changed the base branch from main to dev January 23, 2025 20:17
@emprzy
Copy link
Collaborator Author

emprzy commented Jan 23, 2025

Some files that may be concerned with CLI (rather than config keys) that we may want to revert:

  • post0processing scripts
  • pre-precoessing scripts

@TimothyWillard TimothyWillard added documentation Relating to ReadMEs / gitbook / vignettes / etc. medium priority Medium priority. cli Relating to command line interfaces labels Jan 23, 2025
@TimothyWillard TimothyWillard linked an issue Jan 23, 2025 that may be closed by this pull request
@TimothyWillard TimothyWillard added the next release Marks a PR as a target to include in the next release. label Jan 23, 2025
Copy link
Contributor

@TimothyWillard TimothyWillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks fine to me, CI passes and purges outdated concept. However, I doubt I'm the best reviewer for this, particularly the post/preprocessing scripts since those exist outside of my knowledge base at the moment. It's also not clear to me that those are commonly used today anyways.

Copy link
Contributor

@saraloo saraloo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had practically removed this in operations already so was mainly just documentation and unused functionality in common and config afaik. Rest looks good, thanks

# subpop_setup:
# modeled_states: <list of state postal codes> e.g. MD, CA, NY
# mobility: <path to file relative to data_path> optional; default is 'mobility.csv'
# geodata: <path to file relative to data_path> optional; default is 'geodata.csv'
# mobility: <path to file relative> optional; default is 'mobility.csv'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relative to ... what?

I think unfortunately we don't have a super-clear answer to that. For the time being, maybe more like "path to a file; may be absolute or relative" - and leave specification of how relative works to future updates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also applies to next line)

# subpop_setup:
# modeled_states: <list of country ISO3 codes> e.g. ZMB, BGD, CAN
# mobility: <path to file relative to data_path> optional; default is 'mobility.csv'
# geodata: <path to file relative to data_path> optional; default is 'geodata.csv'
# mobility: <path to file> optional; default is 'mobility.csv'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this seems fine as well, though a little less explicit - probably just use this language for ..._US_... as well?

@@ -59,7 +58,7 @@ filtering:

With inference model runs, the number of simulations `nsimulations` refers to the number of final model simulations that will be produced. The `filtering$simulations_per_slot` setting refers to the number of iterative simulations that will be run in order to produce a single final simulation (i.e., number of simulations in a single MCMC chain).

<table><thead><tr><th>Item</th><th width="104.33333333333331">Required?</th><th>Type/Format</th></tr></thead><tbody><tr><td>simulations_per_slot</td><td><strong>required</strong></td><td>number of iterations in a single MCMC inference chain</td></tr><tr><td>do_filtering</td><td>required</td><td>TRUE if inference should be performed</td></tr><tr><td>data_path</td><td>required</td><td>file path where observed data are saved</td></tr><tr><td>likelihood_directory</td><td>required</td><td>folder path where likelihood evaluations will be stored as the inference algorithm runs</td></tr><tr><td>statistics</td><td>required</td><td>specifies which data will be used to calibrate the model. see <code>filtering::statistics</code> for details</td></tr><tr><td>hierarchical_stats_geo</td><td>optional</td><td>specifies whether a hierarchical structure should be applied to any inferred parameters. See <code>filtering::hierarchical_stats_geo</code> for details.</td></tr><tr><td>priors</td><td>optional</td><td>specifies prior distributions on inferred parameters. See <code>filtering::priors</code> for details</td></tr></tbody></table>
<table><thead><tr><th>Item</th><th width="104.33333333333331">Required?</th><th>Type/Format</th></tr></thead><tbody><tr><td>simulations_per_slot</td><td><strong>required</strong></td><td>number of iterations in a single MCMC inference chain</td></tr><tr><td>do_filtering</td><td>required</td><td>TRUE if inference should be performed</td></tr><tr></tr><tr><td>likelihood_directory</td><td>required</td><td>folder path where likelihood evaluations will be stored as the inference algorithm runs</td></tr><tr><td>statistics</td><td>required</td><td>specifies which data will be used to calibrate the model. see <code>filtering::statistics</code> for details</td></tr><tr><td>hierarchical_stats_geo</td><td>optional</td><td>specifies whether a hierarchical structure should be applied to any inferred parameters. See <code>filtering::hierarchical_stats_geo</code> for details.</td></tr><tr><td>priors</td><td>optional</td><td>specifies prior distributions on inferred parameters. See <code>filtering::priors</code> for details</td></tr></tbody></table>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add some white space to this or switch to markdown formatting? impossible tell what's actually going to be in here and really incovenient to render locally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is explicitly "old" can probably leave it untouched?

@@ -10,7 +10,7 @@ These functions are used to print specific sections of the configuration files.

Used to generate the global header. For more information on global headers click [HERE](../../gempyor/model-implementation/introduction-to-configuration-files.md#global-header).

<table><thead><tr><th width="172.33333333333331">Variable name</th><th>Required (default value if optional)</th><th>Description</th></tr></thead><tbody><tr><td>sim_name</td><td><strong>Required</strong></td><td>Name of the configuration file to be generated. Generally based on the type of simulation</td></tr><tr><td>setup_name</td><td><strong>Optional</strong> (SMH)</td><td>Type of run - a Scenario Modeling Hub ("SMH") or Forecasting Hub ("FCH") Simulation.</td></tr><tr><td>disease</td><td><strong>Optional</strong> (covid19)</td><td>Pathogen or disease being simulated</td></tr><tr><td>smh_round</td><td><strong>Optional</strong> (NA)</td><td>Round number for Scenario Modeling Hub Submission</td></tr><tr><td>data_path</td><td><strong>Optional</strong> (data)</td><td>Folder path which contains where population data (size, mobility, etc) and ground truth data files are stored</td></tr><tr><td>model_output_dir_name</td><td><strong>Optional</strong> (model_output)</td><td>Folder path where the outputs of the simulated model is stored</td></tr><tr><td>sim_start_date</td><td><strong>Required</strong></td><td>Start date for model simulation</td></tr><tr><td>sim_end_date</td><td><strong>Required</strong></td><td>End date for model simulation</td></tr><tr><td>start_date_groundtruth</td><td><strong>Optional</strong> (NA)</td><td>Start date for fitting data for inference runs</td></tr><tr><td>end_date_groundtruth</td><td><strong>Optional</strong> (NA)</td><td>End date for fitting data for inference runs</td></tr><tr><td>nslots</td><td><strong>Required</strong></td><td>number of independent simulations to run</td></tr></tbody></table>
<table><thead><tr><th width="172.33333333333331">Variable name</th><th>Required (default value if optional)</th><th>Description</th></tr></thead><tbody><tr><td>sim_name</td><td><strong>Required</strong></td><td>Name of the configuration file to be generated. Generally based on the type of simulation</td></tr><tr><td>setup_name</td><td><strong>Optional</strong> (SMH)</td><td>Type of run - a Scenario Modeling Hub ("SMH") or Forecasting Hub ("FCH") Simulation.</td></tr><tr><td>disease</td><td><strong>Optional</strong> (covid19)</td><td>Pathogen or disease being simulated</td></tr><tr><td>smh_round</td><td><strong>Optional</strong> (NA)</td><td>Round number for Scenario Modeling Hub Submission</td></tr><tr></tr><tr><td>model_output_dir_name</td><td><strong>Optional</strong> (model_output)</td><td>Folder path where the outputs of the simulated model is stored</td></tr><tr><td>sim_start_date</td><td><strong>Required</strong></td><td>Start date for model simulation</td></tr><tr><td>sim_end_date</td><td><strong>Required</strong></td><td>End date for model simulation</td></tr><tr><td>start_date_groundtruth</td><td><strong>Optional</strong> (NA)</td><td>Start date for fitting data for inference runs</td></tr><tr><td>end_date_groundtruth</td><td><strong>Optional</strong> (NA)</td><td>End date for fitting data for inference runs</td></tr><tr><td>nslots</td><td><strong>Required</strong></td><td>number of independent simulations to run</td></tr></tbody></table>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as previous re table rendering.

@@ -573,18 +561,6 @@ validation_list$inference$do_inference<- function(value,full_config,config_name)
return(TRUE)
}

validation_list$inference$data_path<-function(value,full_config,config_name){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does inference still support a data_path or not? might be different from the top level option (definitely no longer supported).

@@ -12012,7 +12011,6 @@ outcomes:
inference:
iterations_per_slot: 1000
do_inference: TRUE
data_path: data/us_data.csv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same Q re inference

cases_deaths <- readr::read_csv(data_path)
print(paste("Successfully loaded data from ", data_path, "for seeding."))
cases_deaths <- readr::read_csv(gt_data_path)
print(paste("Successfully loaded data from ", gt_data_path, "for seeding."))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something weird w/ spaces here - either there's auto space, and the first string is wrong, or there isn't, and the last string is wrong.

@@ -200,7 +200,7 @@ def generate_pdf(
# In[5]:

# gempyor.config.set_file(run_info.config_filepath)
# gt = pd.read_csv(gempyor.config["inference"]["data_path"].get())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is now the correct argument name for inference, need to revise the previous tweaks that totally delete data path to instead refer to gt_data_path.

@@ -310,7 +302,6 @@ save_reps <- smh_or_fch=="smh" & !full_fit

scenario_dir <- opt$results_path
round_directory <- opt$results_path
data_path <- opt$data_path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this variable go unused in the rest of the file?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli Relating to command line interfaces documentation Relating to ReadMEs / gitbook / vignettes / etc. medium priority Medium priority. next release Marks a PR as a target to include in the next release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Refactor]: purge deprecated configuration option data_path
4 participants