Problem Description
DataTransformer.transform() in ctgan/data_transformer.py is quite slow when the data has both many rows and many columns. For a table with ~1.5M rows and ~1500 discrete columns, the transformation takes ~3 hours.
Expected behavior
Have you considered traversing the columns in parallel (a rough sketch of the idea is included after the Additional context section below)? Or is that not possible at all?
Additional context
I think this question is also related to #141, which proposes saving intermediate results so that repeating the whole process is faster. Here the question is whether the transformation process itself can be accelerated.
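For discussion, here is a minimal sketch of what per-column parallelism could look like using joblib's Parallel/delayed. The helper names (_transform_one, parallel_transform) are illustrative placeholders, not the existing DataTransformer API, and the per-column work is deliberately simplified (one-hot encoding for discrete columns, z-scoring for continuous ones) rather than CTGAN's actual mode-specific normalization.

```python
import numpy as np
import pandas as pd
from joblib import Parallel, delayed


def _transform_one(column: pd.Series, is_discrete: bool) -> np.ndarray:
    """Stand-in for the per-column work: one-hot encode discrete columns,
    z-score continuous ones (the real transformer fits a mixture model)."""
    if is_discrete:
        return pd.get_dummies(column).to_numpy(dtype=float)
    values = column.to_numpy(dtype=float)
    return ((values - values.mean()) / (values.std() + 1e-9)).reshape(-1, 1)


def parallel_transform(data: pd.DataFrame, discrete_columns, n_jobs=-1):
    """Transform each column in a separate worker, then concatenate the
    per-column blocks in the original column order."""
    discrete = set(discrete_columns)
    blocks = Parallel(n_jobs=n_jobs)(
        delayed(_transform_one)(data[name], name in discrete)
        for name in data.columns
    )
    return np.concatenate(blocks, axis=1)


if __name__ == '__main__':
    # Small synthetic example: one continuous and one discrete column.
    df = pd.DataFrame({
        'age': np.random.randint(18, 90, size=1000),
        'color': np.random.choice(['red', 'green', 'blue'], size=1000),
    })
    out = parallel_transform(df, discrete_columns=['color'])
    print(out.shape)  # (1000, 1 + number of one-hot categories)
```

One practical caveat: with a process-based backend each column's data has to be serialized to the workers, so at ~1500 columns it may be necessary to batch several columns per task for the parallelism to pay off.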
liuzrcc changed the title from "Single thread data transform is slow for huge tabular" to "Single thread data transform is slow for huge table" on Apr 28, 2021.
Hi Zhuoran, thanks for filing this feedback! We're aware that there are many performance-related suggestions and I think it'll make a good focus for a future release.
Let's keep this open and I'll label this as a new feature. We can use this space to discuss parallelization in CTGAN, and will update it once we have improvements.