-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Purging data_path
from flepiMoP
#480
base: dev
Are you sure you want to change the base?
Conversation
Some files that may be concerned with CLI (rather than config keys) that we may want to revert:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks fine to me, CI passes and purges outdated concept. However, I doubt I'm the best reviewer for this, particularly the post/preprocessing scripts since those exist outside of my knowledge base at the moment. It's also not clear to me that those are commonly used today anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had practically removed this in operations already so was mainly just documentation and unused functionality in common and config afaik. Rest looks good, thanks
# subpop_setup: | ||
# modeled_states: <list of state postal codes> e.g. MD, CA, NY | ||
# mobility: <path to file relative to data_path> optional; default is 'mobility.csv' | ||
# geodata: <path to file relative to data_path> optional; default is 'geodata.csv' | ||
# mobility: <path to file relative> optional; default is 'mobility.csv' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
relative to ... what?
I think unfortunately we don't have a super-clear answer to that. For the time being, maybe more like "path to a file; may be absolute or relative" - and leave specification of how relative works to future updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also applies to next line)
# subpop_setup: | ||
# modeled_states: <list of country ISO3 codes> e.g. ZMB, BGD, CAN | ||
# mobility: <path to file relative to data_path> optional; default is 'mobility.csv' | ||
# geodata: <path to file relative to data_path> optional; default is 'geodata.csv' | ||
# mobility: <path to file> optional; default is 'mobility.csv' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this seems fine as well, though a little less explicit - probably just use this language for ..._US_...
as well?
@@ -59,7 +58,7 @@ filtering: | |||
|
|||
With inference model runs, the number of simulations `nsimulations` refers to the number of final model simulations that will be produced. The `filtering$simulations_per_slot` setting refers to the number of iterative simulations that will be run in order to produce a single final simulation (i.e., number of simulations in a single MCMC chain). | |||
|
|||
<table><thead><tr><th>Item</th><th width="104.33333333333331">Required?</th><th>Type/Format</th></tr></thead><tbody><tr><td>simulations_per_slot</td><td><strong>required</strong></td><td>number of iterations in a single MCMC inference chain</td></tr><tr><td>do_filtering</td><td>required</td><td>TRUE if inference should be performed</td></tr><tr><td>data_path</td><td>required</td><td>file path where observed data are saved</td></tr><tr><td>likelihood_directory</td><td>required</td><td>folder path where likelihood evaluations will be stored as the inference algorithm runs</td></tr><tr><td>statistics</td><td>required</td><td>specifies which data will be used to calibrate the model. see <code>filtering::statistics</code> for details</td></tr><tr><td>hierarchical_stats_geo</td><td>optional</td><td>specifies whether a hierarchical structure should be applied to any inferred parameters. See <code>filtering::hierarchical_stats_geo</code> for details.</td></tr><tr><td>priors</td><td>optional</td><td>specifies prior distributions on inferred parameters. See <code>filtering::priors</code> for details</td></tr></tbody></table> | |||
<table><thead><tr><th>Item</th><th width="104.33333333333331">Required?</th><th>Type/Format</th></tr></thead><tbody><tr><td>simulations_per_slot</td><td><strong>required</strong></td><td>number of iterations in a single MCMC inference chain</td></tr><tr><td>do_filtering</td><td>required</td><td>TRUE if inference should be performed</td></tr><tr></tr><tr><td>likelihood_directory</td><td>required</td><td>folder path where likelihood evaluations will be stored as the inference algorithm runs</td></tr><tr><td>statistics</td><td>required</td><td>specifies which data will be used to calibrate the model. see <code>filtering::statistics</code> for details</td></tr><tr><td>hierarchical_stats_geo</td><td>optional</td><td>specifies whether a hierarchical structure should be applied to any inferred parameters. See <code>filtering::hierarchical_stats_geo</code> for details.</td></tr><tr><td>priors</td><td>optional</td><td>specifies prior distributions on inferred parameters. See <code>filtering::priors</code> for details</td></tr></tbody></table> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add some white space to this or switch to markdown formatting? impossible tell what's actually going to be in here and really incovenient to render locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is explicitly "old" can probably leave it untouched?
@@ -10,7 +10,7 @@ These functions are used to print specific sections of the configuration files. | |||
|
|||
Used to generate the global header. For more information on global headers click [HERE](../../gempyor/model-implementation/introduction-to-configuration-files.md#global-header). | |||
|
|||
<table><thead><tr><th width="172.33333333333331">Variable name</th><th>Required (default value if optional)</th><th>Description</th></tr></thead><tbody><tr><td>sim_name</td><td><strong>Required</strong></td><td>Name of the configuration file to be generated. Generally based on the type of simulation</td></tr><tr><td>setup_name</td><td><strong>Optional</strong> (SMH)</td><td>Type of run - a Scenario Modeling Hub ("SMH") or Forecasting Hub ("FCH") Simulation.</td></tr><tr><td>disease</td><td><strong>Optional</strong> (covid19)</td><td>Pathogen or disease being simulated</td></tr><tr><td>smh_round</td><td><strong>Optional</strong> (NA)</td><td>Round number for Scenario Modeling Hub Submission</td></tr><tr><td>data_path</td><td><strong>Optional</strong> (data)</td><td>Folder path which contains where population data (size, mobility, etc) and ground truth data files are stored</td></tr><tr><td>model_output_dir_name</td><td><strong>Optional</strong> (model_output)</td><td>Folder path where the outputs of the simulated model is stored</td></tr><tr><td>sim_start_date</td><td><strong>Required</strong></td><td>Start date for model simulation</td></tr><tr><td>sim_end_date</td><td><strong>Required</strong></td><td>End date for model simulation</td></tr><tr><td>start_date_groundtruth</td><td><strong>Optional</strong> (NA)</td><td>Start date for fitting data for inference runs</td></tr><tr><td>end_date_groundtruth</td><td><strong>Optional</strong> (NA)</td><td>End date for fitting data for inference runs</td></tr><tr><td>nslots</td><td><strong>Required</strong></td><td>number of independent simulations to run</td></tr></tbody></table> | |||
<table><thead><tr><th width="172.33333333333331">Variable name</th><th>Required (default value if optional)</th><th>Description</th></tr></thead><tbody><tr><td>sim_name</td><td><strong>Required</strong></td><td>Name of the configuration file to be generated. Generally based on the type of simulation</td></tr><tr><td>setup_name</td><td><strong>Optional</strong> (SMH)</td><td>Type of run - a Scenario Modeling Hub ("SMH") or Forecasting Hub ("FCH") Simulation.</td></tr><tr><td>disease</td><td><strong>Optional</strong> (covid19)</td><td>Pathogen or disease being simulated</td></tr><tr><td>smh_round</td><td><strong>Optional</strong> (NA)</td><td>Round number for Scenario Modeling Hub Submission</td></tr><tr></tr><tr><td>model_output_dir_name</td><td><strong>Optional</strong> (model_output)</td><td>Folder path where the outputs of the simulated model is stored</td></tr><tr><td>sim_start_date</td><td><strong>Required</strong></td><td>Start date for model simulation</td></tr><tr><td>sim_end_date</td><td><strong>Required</strong></td><td>End date for model simulation</td></tr><tr><td>start_date_groundtruth</td><td><strong>Optional</strong> (NA)</td><td>Start date for fitting data for inference runs</td></tr><tr><td>end_date_groundtruth</td><td><strong>Optional</strong> (NA)</td><td>End date for fitting data for inference runs</td></tr><tr><td>nslots</td><td><strong>Required</strong></td><td>number of independent simulations to run</td></tr></tbody></table> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as previous re table rendering.
@@ -573,18 +561,6 @@ validation_list$inference$do_inference<- function(value,full_config,config_name) | |||
return(TRUE) | |||
} | |||
|
|||
validation_list$inference$data_path<-function(value,full_config,config_name){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does inference still support a data_path or not? might be different from the top level option (definitely no longer supported).
@@ -12012,7 +12011,6 @@ outcomes: | |||
inference: | |||
iterations_per_slot: 1000 | |||
do_inference: TRUE | |||
data_path: data/us_data.csv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same Q re inference
cases_deaths <- readr::read_csv(data_path) | ||
print(paste("Successfully loaded data from ", data_path, "for seeding.")) | ||
cases_deaths <- readr::read_csv(gt_data_path) | ||
print(paste("Successfully loaded data from ", gt_data_path, "for seeding.")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something weird w/ spaces here - either there's auto space, and the first string is wrong, or there isn't, and the last string is wrong.
@@ -200,7 +200,7 @@ def generate_pdf( | |||
# In[5]: | |||
|
|||
# gempyor.config.set_file(run_info.config_filepath) | |||
# gt = pd.read_csv(gempyor.config["inference"]["data_path"].get()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is now the correct argument name for inference, need to revise the previous tweaks that totally delete data path to instead refer to gt_data_path.
@@ -310,7 +302,6 @@ save_reps <- smh_or_fch=="smh" & !full_fit | |||
|
|||
scenario_dir <- opt$results_path | |||
round_directory <- opt$results_path | |||
data_path <- opt$data_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this variable go unused in the rest of the file?
Describe your changes.
This pull request attempts to remove the deprecated configuration option
data_path
from flepiMoP code and documentation. I found all instances of "data_path" in flepiMoP via agrep
search, so I may have also removeddata_path
as a CLI option, which I can revert if needed.Presently, there are still
data_path
references in the following files for the following reasons:data_var
, which is explained as a column indata_path
and are respectively being reworked and old.What does your pull request address? Tag relevant issues.
This pull request addresses GH #472.