Calculate the five categories for each state #35

Draft
wants to merge 1 commit into base: main

natemcintosh
Collaborator

This pull request changes cfa_rt_postprocessing/main_functions.py so that categories are calculated from sample data and saved. It adds a new function for calculating the categories and integrates that function into the existing workflow.

New functionality:

  • Added a new function calculate_categories to calculate five categories for each geo_value, disease, and reference_date from sample data using duckdb. This function returns a DataFrame with the calculated categories.

Workflow integration:

  • Integrated the calculate_categories function into the merge_and_render_anomaly function to calculate categories from final samples and save the results as both parquet and CSV files.

I figured that small PRs are better than big ones.

@natemcintosh marked this pull request as draft on January 23, 2025 at 21:38
@natemcintosh
Collaborator Author

I would like to add some tests for this. The general strategy I'm thinking of would be to take a sample of ~100 rows from a samples.parquet file, alter the values, and save it in a test folder. Then test that we get the general categories we expect.

I think that will work, but it feels kind of clunky. Does anyone have any other suggestions?
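
One possible alternative, sketched here under stated assumptions: instead of sampling and altering rows from a real samples.parquet file, generate the rows synthetically so the expected category is known by construction. The categorize function below is a hypothetical stand-in for calculate_categories, and the value column, cutoffs, and labels are illustrative guesses rather than the PR's actual rule.

```python
import datetime as dt


def make_samples(geo_value, n_above, n_total=100,
                 disease="COVID-19", reference_date=dt.date(2025, 1, 18)):
    """Build n_total sample rows for one group; exactly n_above have value > 1."""
    return [
        {
            "geo_value": geo_value,
            "disease": disease,
            "reference_date": reference_date,
            "value": 1.5 if i < n_above else 0.5,
        }
        for i in range(n_total)
    ]


def categorize(rows):
    """Hypothetical stand-in for calculate_categories (assumed cutoffs)."""
    p = sum(r["value"] > 1 for r in rows) / len(rows)
    if p > 0.9:
        return "Growing"
    if p > 0.75:
        return "Likely Growing"
    if p >= 0.25:
        return "Not Changing"
    if p >= 0.1:
        return "Likely Declining"
    return "Declining"


# One case per category: (geo_value, samples above 1 out of 100, expected label).
CASES = [
    ("AK", 100, "Growing"),
    ("AL", 80, "Likely Growing"),
    ("AZ", 50, "Not Changing"),
    ("CA", 20, "Likely Declining"),
    ("CO", 5, "Declining"),
]

results = {geo: categorize(make_samples(geo, n_above)) for geo, n_above, _ in CASES}
for geo, _, expected in CASES:
    assert results[geo] == expected, (geo, results[geo], expected)
```

In an actual test, the generated rows could be written to a temporary parquet file and passed to the real function, which would avoid checking a fixture file into the repository and make each expected category traceable to the inputs that produce it.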
