Calculate the five categories for each state #35

natemcintosh · 2025-01-23T21:38:06Z

This pull request changes cfa_rt_postprocessing/main_functions.py so that it calculates categories from sample data and saves them. The two main changes are a new function that calculates the categories and the integration of that function into the existing workflow.

New functionality:

Added a new function, calculate_categories, that uses duckdb to calculate five categories for each combination of geo_value, disease, and reference_date from sample data. The function returns a DataFrame with the calculated categories.

Workflow integration:

Integrated the calculate_categories function into the merge_and_render_anomaly function to calculate categories from final samples and save the results as both parquet and CSV files.

I figured that small PRs are better than big ones.

natemcintosh · 2025-01-23T21:41:23Z

I would like to add some tests for this. The general strategy I'm thinking of would be to take a sample of ~100 rows from a samples.parquet file, alter the values, and save it in a test folder. Then test that we get the general categories we expect.

I think that will work, but it feels kind of clunky. Does anyone have any other suggestions?

first attempt at calculating categories

ac1aa72

natemcintosh marked this pull request as draft January 23, 2025 21:38
