We upgraded EMR from 6.11.1 to 7.2.0 and Hudi from 0.13 to 0.14.1-amzn-1.
I am trying to run a Hudi job that processes 4 data sources. The job executes successfully for 3 of the data sources, but it keeps failing for 1 source with the error below.
I have tried re-ingesting the source tables used for this job, as well as re-creating the table where the data is written.
I am using the following Hudi options:
hudi_options = {
    # table identity and write type
    'hoodie.table.name': table_name,
    'hoodie.datasource.write.table.type': table_type or 'MERGE_ON_READ',
    'hoodie.datasource.write.table.name': table_name,
    'hoodie.datasource.write.payload.class': payload_class,
    # key generation and partitioning
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.CustomKeyGenerator',
    'hoodie.datasource.write.recordkey.field': primary_keys.replace(' ', ''),
    'hoodie.datasource.write.precombine.field': precombine_key,
    'hoodie.datasource.write.partitionpath.field': 'src_db_id:SIMPLE',
    'hoodie.datasource.write.hive_style_partitioning': True,
    # indexing, file sizing, and cleaning
    'hoodie.embed.timeline.server': False,
    'hoodie.index.type': 'BLOOM',
    'hoodie.parquet.compression.codec': 'snappy',
    'hoodie.clean.async': True,
    'hoodie.clean.max.commits': 3,
    'hoodie.parquet.max.file.size': 125829120,
    'hoodie.parquet.small.file.limit': 104857600,
    'hoodie.parquet.block.size': 125829120,
    # metadata table (disabled when overwriting)
    'hoodie.metadata.enable': not overwrite,
    'hoodie.metadata.validate': True,
    'hoodie.allow.empty.commit': True,
    # Hive sync
    'hoodie.datasource.hive_sync.enable': True,
    'hoodie.datasource.hive_sync.support_timestamp': True,
    'hoodie.datasource.hive_sync.jdbcurl': hive_jdbcurl,
    'hoodie.datasource.hive_sync.username': hive_username,
    'hoodie.datasource.hive_sync.password': hive_password,
    'hoodie.datasource.hive_sync.database': cdm_db,
    'hoodie.datasource.hive_sync.table': table_name,
    'hoodie.datasource.hive_sync.partition_fields': 'src_db_id',
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
    # inline compaction
    'hoodie.compact.inline': True,
    'hoodie.compact.inline.trigger.strategy': 'NUM_OR_TIME',
    'hoodie.compact.inline.max.delta.commits': 1,
    'hoodie.compact.inline.max.delta.seconds': 3600
}
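For context, a minimal sketch of how these options are applied at write time; df, the source read, and the S3 target path are placeholders, not the exact code from our job:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('hudi-ingest').getOrCreate()

# Placeholder source; the real job builds df from each of the 4 data sources.
df = spark.read.table('source_table')

(df.write
    .format('hudi')
    .options(**hudi_options)   # the options dict shown above
    .mode('append')            # upsert semantics against the existing Hudi table
    .save('s3://my-bucket/path/to/table'))  # placeholder target path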
Applications being used:
- EMR 7.2.0
- Spark 3.5.1
- Hadoop 3.3.6
- Hudi 0.14.1-amzn-1
The same job works without any issue on the old EMR version.
Another suggestion I found from AWS was to use Java 8 instead of Java 17, but even with Java 8 the issue persists.
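In case it is relevant, a sketch of roughly how Java 8 can be pinned for the Spark job via standard Spark confs; the JAVA_HOME path is an assumption and depends on where Java 8 is installed on the cluster nodes:

from pyspark.sql import SparkSession

# Assumed install location; verify the actual path on the EMR nodes.
java8_home = '/usr/lib/jvm/java-1.8.0'

# Pin the JVM for the YARN application master and the executors.
spark = (SparkSession.builder
    .appName('hudi-ingest-java8')
    .config('spark.yarn.appMasterEnv.JAVA_HOME', java8_home)
    .config('spark.executorEnv.JAVA_HOME', java8_home)
    .getOrCreate())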