Purging `data_path` from flepiMoP #480

emprzy · 2025-01-23T20:16:50Z

Describe your changes.

This pull request attempts to remove the deprecated configuration option data_path from flepiMoP code and documentation. I found all instances of "data_path" in flepiMoP via a grep search, so I may have also removed data_path as a CLI option, which I can revert if needed.

Presently, there are still data_path references in the following files for the following reasons:

model_ouput_notebook.Rmd (postprocessing), inference_job_launcher.py, and model_output_notebook.Rmd (flepiMoP examples), because they are for the CLI.
yaml_utils.R and documentation.Rmd, because they cite data_var, which is explained as a column in data_path; these files are, respectively, being reworked and old.
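
For anyone wanting to repeat the search, here is a minimal sketch of a whole-word scan in Python. It is not the command that was actually run for this PR; the function name `find_references`, the suffix list, and the choice to match `data_path` only as a whole identifier (so that names like `gt_data_path` are skipped) are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Match `data_path` only as a whole identifier. `_` counts as a word
# character, so longer names such as `gt_data_path` are not matched.
PATTERN = re.compile(r"\bdata_path\b")


def find_references(root, suffixes=(".py", ".R", ".Rmd", ".md", ".yml")):
    """Return (file, line number, line text) for each whole-word match.

    Hypothetical helper: the suffix list is an assumption, not a
    description of which file types flepiMoP actually contains.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_references(".")` from the repository root would list the remaining references, which could then be sorted by hand into config-key uses and CLI uses.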

What does your pull request address? Tag relevant issues.

This pull request addresses GH #472.

emprzy · 2025-01-23T21:47:16Z

Some files that may be concerned with CLI (rather than config keys) that we may want to revert:

post-processing scripts
pre-processing scripts

TimothyWillard

This looks fine to me: CI passes and the PR purges an outdated concept. However, I doubt I'm the best reviewer for this, particularly for the post/preprocessing scripts, since those are outside my knowledge base at the moment. It's also not clear to me that those are commonly used today anyway.

saraloo

We had practically removed this in operations already, so what remained was mainly documentation and unused functionality in common and config, afaik. The rest looks good, thanks.

pearsonca · 2025-01-27T18:16:47Z

datasetup/build_US_setup.R

 # subpop_setup:
 #   modeled_states: <list of state postal codes> e.g. MD, CA, NY
-#   mobility: <path to file relative to data_path> optional; default is 'mobility.csv'
-#   geodata: <path to file relative to data_path> optional; default is 'geodata.csv'
+#   mobility: <path to file relative> optional; default is 'mobility.csv'


relative to ... what?

I think unfortunately we don't have a super-clear answer to that. For the time being, maybe more like "path to a file; may be absolute or relative" - and leave specification of how relative works to future updates.

(also applies to next line)

pearsonca · 2025-01-27T18:18:29Z

datasetup/build_nonUS_setup.R

 # subpop_setup:
 #   modeled_states: <list of country ISO3 codes> e.g. ZMB, BGD, CAN
-#   mobility: <path to file relative to data_path> optional; default is 'mobility.csv'
-#   geodata: <path to file relative to data_path> optional; default is 'geodata.csv'
+#   mobility: <path to file> optional; default is 'mobility.csv'


yeah, this seems fine as well, though a little less explicit - probably just use this language for ..._US_... as well?

pearsonca · 2025-01-27T18:21:48Z

documentation/gitbook/model-inference/inference-implementation/configuration-options.md

@@ -59,7 +58,7 @@ filtering:

 With inference model runs, the number of simulations `nsimulations` refers to the number of final model simulations that will be produced. The `filtering$simulations_per_slot` setting refers to the number of iterative simulations that will be run in order to produce a single final simulation (i.e., number of simulations in a single MCMC chain).

-<table><thead><tr><th>Item</th><th width="104.33333333333331">Required?</th><th>Type/Format</th></tr></thead><tbody><tr><td>simulations_per_slot</td><td><strong>required</strong></td><td>number of iterations in a single MCMC inference chain</td></tr><tr><td>do_filtering</td><td>required</td><td>TRUE if inference should be performed</td></tr><tr><td>data_path</td><td>required</td><td>file path where observed data are saved</td></tr><tr><td>likelihood_directory</td><td>required</td><td>folder path where likelihood evaluations will be stored as the inference algorithm runs</td></tr><tr><td>statistics</td><td>required</td><td>specifies which data will be used to calibrate the model. see <code>filtering::statistics</code> for details</td></tr><tr><td>hierarchical_stats_geo</td><td>optional</td><td>specifies whether a hierarchical structure should be applied to any inferred parameters. See <code>filtering::hierarchical_stats_geo</code> for details.</td></tr><tr><td>priors</td><td>optional</td><td>specifies prior distributions on inferred parameters. See <code>filtering::priors</code> for details</td></tr></tbody></table>
+<table><thead><tr><th>Item</th><th width="104.33333333333331">Required?</th><th>Type/Format</th></tr></thead><tbody><tr><td>simulations_per_slot</td><td><strong>required</strong></td><td>number of iterations in a single MCMC inference chain</td></tr><tr><td>do_filtering</td><td>required</td><td>TRUE if inference should be performed</td></tr><tr></tr><tr><td>likelihood_directory</td><td>required</td><td>folder path where likelihood evaluations will be stored as the inference algorithm runs</td></tr><tr><td>statistics</td><td>required</td><td>specifies which data will be used to calibrate the model. see <code>filtering::statistics</code> for details</td></tr><tr><td>hierarchical_stats_geo</td><td>optional</td><td>specifies whether a hierarchical structure should be applied to any inferred parameters. See <code>filtering::hierarchical_stats_geo</code> for details.</td></tr><tr><td>priors</td><td>optional</td><td>specifies prior distributions on inferred parameters. See <code>filtering::priors</code> for details</td></tr></tbody></table>


can we add some white space to this or switch to markdown formatting? impossible to tell what's actually going to be in here and really inconvenient to render locally.
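
One way to check what a single-line HTML table like this contains is to convert it to a markdown table. Below is a rough sketch using only Python's standard-library `html.parser`; the names `TableRows`, `html_table_to_markdown`, and `count_empty_rows` are hypothetical helpers, not part of flepiMoP or the gitbook tooling. It assumes a single, non-nested table whose cells contain no `|` characters.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <strong> or <code> lands here too.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def _parse_rows(html):
    parser = TableRows()
    parser.feed(html)
    return parser.rows


def count_empty_rows(html):
    """Count rows with no cells, e.g. a leftover <tr></tr>."""
    return sum(1 for row in _parse_rows(html) if not row)


def html_table_to_markdown(html):
    """Render the table as markdown, treating the first row as the header."""
    rows = [row for row in _parse_rows(html) if row]  # skip empty rows
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```

Run against the `+` line above, `count_empty_rows` would also flag the empty `<tr></tr>` left behind where the `data_path` row was removed.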

pearsonca · 2025-01-27T18:22:14Z

documentation/gitbook/model-inference/inference-implementation/old-configuration-setup.md

since this is explicitly "old" can probably leave it untouched?

pearsonca · 2025-01-27T18:22:37Z

documentation/gitbook/more/setting-up-the-model-and-post-processing/config-writer.md

@@ -10,7 +10,7 @@ These functions are used to print specific sections of the configuration files.

 Used to generate the global header. For more information on global headers click [HERE](../../gempyor/model-implementation/introduction-to-configuration-files.md#global-header).

-<table><thead><tr><th width="172.33333333333331">Variable name</th><th>Required (default value if optional)</th><th>Description</th></tr></thead><tbody><tr><td>sim_name</td><td><strong>Required</strong></td><td>Name of the configuration file to be generated. Generally based on the type of simulation</td></tr><tr><td>setup_name</td><td><strong>Optional</strong> (SMH)</td><td>Type of run - a Scenario Modeling Hub ("SMH") or Forecasting Hub ("FCH") Simulation.</td></tr><tr><td>disease</td><td><strong>Optional</strong> (covid19)</td><td>Pathogen or disease being simulated</td></tr><tr><td>smh_round</td><td><strong>Optional</strong> (NA)</td><td>Round number for Scenario Modeling Hub Submission</td></tr><tr><td>data_path</td><td><strong>Optional</strong> (data)</td><td>Folder path which contains where population data (size, mobility, etc) and ground truth data files are stored</td></tr><tr><td>model_output_dir_name</td><td><strong>Optional</strong> (model_output)</td><td>Folder path where the outputs of the simulated model is stored</td></tr><tr><td>sim_start_date</td><td><strong>Required</strong></td><td>Start date for model simulation</td></tr><tr><td>sim_end_date</td><td><strong>Required</strong></td><td>End date for model simulation</td></tr><tr><td>start_date_groundtruth</td><td><strong>Optional</strong> (NA)</td><td>Start date for fitting data for inference runs</td></tr><tr><td>end_date_groundtruth</td><td><strong>Optional</strong> (NA)</td><td>End date for fitting data for inference runs</td></tr><tr><td>nslots</td><td><strong>Required</strong></td><td>number of independent simulations to run</td></tr></tbody></table>
+<table><thead><tr><th width="172.33333333333331">Variable name</th><th>Required (default value if optional)</th><th>Description</th></tr></thead><tbody><tr><td>sim_name</td><td><strong>Required</strong></td><td>Name of the configuration file to be generated. Generally based on the type of simulation</td></tr><tr><td>setup_name</td><td><strong>Optional</strong> (SMH)</td><td>Type of run - a Scenario Modeling Hub ("SMH") or Forecasting Hub ("FCH") Simulation.</td></tr><tr><td>disease</td><td><strong>Optional</strong> (covid19)</td><td>Pathogen or disease being simulated</td></tr><tr><td>smh_round</td><td><strong>Optional</strong> (NA)</td><td>Round number for Scenario Modeling Hub Submission</td></tr><tr></tr><tr><td>model_output_dir_name</td><td><strong>Optional</strong> (model_output)</td><td>Folder path where the outputs of the simulated model is stored</td></tr><tr><td>sim_start_date</td><td><strong>Required</strong></td><td>Start date for model simulation</td></tr><tr><td>sim_end_date</td><td><strong>Required</strong></td><td>End date for model simulation</td></tr><tr><td>start_date_groundtruth</td><td><strong>Optional</strong> (NA)</td><td>Start date for fitting data for inference runs</td></tr><tr><td>end_date_groundtruth</td><td><strong>Optional</strong> (NA)</td><td>End date for fitting data for inference runs</td></tr><tr><td>nslots</td><td><strong>Required</strong></td><td>number of independent simulations to run</td></tr></tbody></table>


same comment as previous re table rendering.

pearsonca · 2025-01-27T18:23:52Z

flepimop/R_packages/flepicommon/R/config_test_new.R

@@ -573,18 +561,6 @@ validation_list$inference$do_inference<- function(value,full_config,config_name)
  return(TRUE)
 }

-validation_list$inference$data_path<-function(value,full_config,config_name){


does inference still support a data_path or not? might be different from the top level option (definitely no longer supported).

pearsonca · 2025-01-27T18:24:45Z

flepimop/R_packages/flepiconfig/tests/testthat/sample_config.yml

@@ -12012,7 +12011,6 @@ outcomes:
 inference:
  iterations_per_slot: 1000
  do_inference: TRUE
-  data_path: data/us_data.csv


same Q re inference

pearsonca · 2025-01-27T18:28:40Z

flepimop/main_scripts/create_seeding_added.R

-cases_deaths <- readr::read_csv(data_path)
-print(paste("Successfully loaded data from ", data_path, "for seeding."))
+cases_deaths <- readr::read_csv(gt_data_path)
+print(paste("Successfully loaded data from ", gt_data_path, "for seeding."))


something weird w/ spaces here - either there's auto space, and the first string is wrong, or there isn't, and the last string is wrong.
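
For reference, R's `paste` joins its arguments with a single space by default (`sep = " "`), so the first string's trailing space produces a double space while the last string is spaced correctly. A small Python analogue of that joining behaviour, for illustration only (the script itself is R, the `paste` function here is a stand-in, and the path value is just an example):

```python
def paste(*parts, sep=" "):
    """Stand-in for R's paste(): join the arguments with `sep`."""
    return sep.join(str(part) for part in parts)


gt_data_path = "data/us_data.csv"  # example value, not taken from the script

# As written in the diff: the trailing space plus the separator doubles up.
as_written = paste("Successfully loaded data from ", gt_data_path, "for seeding.")

# Dropping the trailing space leaves single spaces throughout.
consistent = paste("Successfully loaded data from", gt_data_path, "for seeding.")

print(repr(as_written))
print(repr(consistent))
```

Under that default, removing the trailing space from the first string would be the smaller fix.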

pearsonca · 2025-01-27T18:29:43Z

postprocessing/postprocess_auto.py

@@ -200,7 +200,7 @@ def generate_pdf(
        # In[5]:

        # gempyor.config.set_file(run_info.config_filepath)
-        # gt = pd.read_csv(gempyor.config["inference"]["data_path"].get())


if this is now the correct argument name for inference, need to revise the previous tweaks that totally delete data path to instead refer to gt_data_path.

pearsonca · 2025-01-27T18:53:53Z

postprocessing/run_sim_processing_SLURM.R

@@ -310,7 +302,6 @@ save_reps <- smh_or_fch=="smh" & !full_fit

 scenario_dir <- opt$results_path
 round_directory <- opt$results_path
-data_path <- opt$data_path


does this variable go unused in the rest of the file?

Removing data_path from flepiMoP

1b18ec1

emprzy changed the base branch from main to dev January 23, 2025 20:17

emprzy requested review from pearsonca, TimothyWillard, saraloo and MacdonaldJoshuaCaleb January 23, 2025 21:48

TimothyWillard added the documentation, medium priority, and cli labels Jan 23, 2025

TimothyWillard linked an issue Jan 23, 2025 that may be closed by this pull request

[Refactor]: purge deprecated configuration option data_path #472

Open

TimothyWillard added this to the Installation/Usage Documentation And Ease milestone Jan 23, 2025

TimothyWillard added the next release label Jan 23, 2025

TimothyWillard reviewed Jan 24, 2025

saraloo approved these changes Jan 24, 2025

pearsonca requested changes Jan 27, 2025
