Replies: 1 comment 1 reply
It is an interesting topic. I can probably write a book about it at this point. A few ideas:
Custom ETL works well when split into the following pair of steps:
1. From the source system to object storage, e.g. S3.
2. From S3 to Snowflake or any other DBMS.

This approach makes your ETL highly modular and resilient to schema changes and sudden outages.

The best ingestion format for Snowflake is CSV; other formats are substantially slower. My guess is that generating the VARIANT data structure used for the other formats is expensive.

It might be a good idea to avoid Snowpipe if possible. Snowpipe makes it harder to deal with duplicates and reloads, and monitoring is harder. You still need to implement reload logic anyway, so you might as well go with DELETE + COPY INTO from the start.
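To illustrate that last point, here is a minimal sketch of an idempotent DELETE + COPY INTO reload using the snowflake-connector-python driver. The table name (EVENTS), stage name (EVENTS_STAGE), partition column (LOAD_DATE), and connection details are hypothetical placeholders, not anything prescribed in this thread.

```python
# Minimal sketch of an idempotent "DELETE + COPY INTO" reload, assuming:
#   - a target table EVENTS with a LOAD_DATE column identifying each batch
#   - an external stage EVENTS_STAGE whose prefix contains one folder per batch
#   - CSV files produced by the extraction step
import snowflake.connector

BATCH_DATE = "2024-01-01"  # hypothetical reload key

conn = snowflake.connector.connect(
    account="my_account",   # placeholder connection details
    user="etl_user",
    password="...",
    warehouse="LOAD_WH",
    database="RAW",
    schema="RAW",
)

try:
    cur = conn.cursor()
    # 1. Remove any rows from a previous (possibly partial) load of this batch,
    #    so re-running the job never produces duplicates.
    cur.execute(
        "DELETE FROM EVENTS WHERE LOAD_DATE = %s",
        (BATCH_DATE,),
    )
    # 2. Reload the batch from the staged CSV files.
    cur.execute(
        f"""
        COPY INTO EVENTS
        FROM @EVENTS_STAGE/load_date={BATCH_DATE}/
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"')
        ON_ERROR = ABORT_STATEMENT
        """
    )
finally:
    conn.close()
```

Because the DELETE scopes the reload to a single batch key, re-running the job after a failure simply replaces that batch instead of duplicating it, which is exactly the part that Snowpipe makes awkward.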
Hey everyone,
As we all work with SnowDDL, it's a given that we're using Snowflake. I wanted to open up this discussion to gather insights, opinions, and shared experiences on the different solutions available for getting data into Snowflake.
I'm particularly interested in batch processing with file formats like Parquet, JSON, or CSV. However, if you have experience with streaming data pipelines, your input is definitely welcome too!
For batch processing, I've observed that there are typically two main stages:
1. Extracting data from source systems and staging it in a cloud storage service (e.g., AWS S3, Azure Blob Storage, etc.).
2. Loading data from the staging area into Snowflake tables using COPY INTO commands, whether these are run manually or automated through services like Snowpipe.
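For concreteness, here is a rough sketch of those two stages with Parquet files, assuming boto3 for stage 1 and snowflake-connector-python for stage 2. The bucket, prefix, external stage (LANDING_STAGE), target table (EVENTS), and credentials are made-up placeholders for illustration.

```python
# Rough sketch of the two batch stages, assuming:
#   - stage 1 writes Parquet files to s3://my-data-lake/landing/events/<date>/
#   - an external stage LANDING_STAGE already points at s3://my-data-lake/landing/
#   - a target table EVENTS whose column names match the Parquet schema
import boto3
import snowflake.connector

BATCH_DATE = "2024-01-01"  # made-up batch key

# Stage 1: extract from the source system and land the file in object storage.
s3 = boto3.client("s3")
s3.upload_file(
    Filename="/tmp/events.parquet",  # file produced by the extraction step
    Bucket="my-data-lake",
    Key=f"landing/events/{BATCH_DATE}/events.parquet",
)

# Stage 2: load the staged files into Snowflake with COPY INTO.
conn = snowflake.connector.connect(
    account="my_account",   # placeholder connection details
    user="etl_user",
    password="...",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)
try:
    conn.cursor().execute(
        f"""
        COPY INTO EVENTS
        FROM @LANDING_STAGE/events/{BATCH_DATE}/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
        """
    )
finally:
    conn.close()
```

Stage 2 could equally be triggered by Snowpipe watching the same S3 prefix instead of running COPY INTO explicitly, which is part of the trade-off I'm hoping to hear about.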
I’d love to hear what solutions you all use for both of these stages.
Looking forward to your thoughts and insights!