Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for HASH key computation of Natural keys with null replacement #254

Open
Kalyani5665 opened this issue Nov 18, 2024 · 2 comments
Open
Assignees
Labels
feature This is is requesting a new feature

Comments

@Kalyani5665
Copy link

Is your feature request related to a problem? Please describe.
I am using [staging macros] for Natural(composite) key implementation with null replacement tokens for required and optional business keys . Hash Key should be computed from Natural(composite)key.

Describe the solution you'd like

BK1 BK2 BK3 Natural Key Hash key  HASH(Natural Key)
ABC 1 null ABC||1||-1 HASH(ABC||1||-1)
XYZ null 100 XYZ||-2||100 HASH(XYZ||-2||100)
DEF 1.1 200 DEF||1.1||200 HASH(DEF||1.1||200)

Since Null columns replacement macro gets executed after derived columns macro , null replacement happens after the Natural Key calculation . Hence records with null business keys are lost before they reach null replacement stage . We need records with null replaced business keys as above.

Describe alternatives you've considered
Assigned Natural Key with null replacement with required and optional tokens as below in derived columns section of staging macro -
NATURAL_KEY:
-IFNULL(NULLIF(UPPER(TRIM(CAST(BK1 AS STRING))), ''), '-1')
-IFNULL(NULLIF(UPPER(TRIM(CAST(BK2 AS STRING))), ''), '-2')
-IFNULL(NULLIF(UPPER(TRIM(CAST(BK3 AS STRING))), ''), '-1')

Additional context
Overall if support for Natural Key and its hash calculation is provided it will be helpful , as it will be reusable with staging macro .

@Kalyani5665 Kalyani5665 added the feature This is is requesting a new feature label Nov 18, 2024
@DVAlexHiggs
Copy link
Member

Hi @Kalyani5665. Thanks or this suggestion.

I may be misunderstanding but you should already be able to do this. The raw source columns from the source table are in-scope of the null key step of the stage() macro, you do not have to only reference derived columns.

@Kalyani5665
Copy link
Author

Hi @DVAlexHiggs Thanks for quick update !!
I would like to present 2 scenarios for better understanding.
The only difference in both scenarios is how hash key is configured in the macro. In both scenarios , in output , hash key and natural key are not in sync.
Scenario 1:
Staging macro:
NATURAL_KEY:
- BK1
- BK2
- BK3

NATURAL_KEY_HKEY:
- BK1
- BK2
- BK3

null_columns:
required:
- BK3
optional:
- BK2
Hub macro:
{%- set src_pk = "NATURAL_KEY_HKEY"-%}
{%- set src_nk = "NATURAL_KEY"-%}

Output:
Code which run -
In derived columns part=>
CONCAT(
BK1, '||',
BK2, '||',
BK3
) AS NATURAL_KEY,

Above discards null records .

In null columns part =>
BK3 AS BK3_ORIGINAL,
IFNULL(BK3, '-1') AS BK3,
BK2 AS BK2_ORIGINAL,
IFNULL(BK2, '-2') AS BK2

Output in this scenario:

BK1 BK2 BK3 Natural Key Hash key  HASH(Natural Key)
ABC 1 null null HASH(ABC||1||-1)
XYZ null 100 null HASH(XYZ||-2||100)
DEF 1.1 200 DEF||1.1||200 HASH(DEF||1.1||200)

As we can see above , records with null BKs are null since concatenation happens before null replacement .
Please note hash keys for same records are not null since hash keys refer to BKs with required and optional replacement tokens .
So the requirement is to have not null natural key i.e. correct natural key in the hub, in such scenarios.

Expected Output:

BK1 BK2 BK3 Natural Key Hash key  HASH(Natural Key)
ABC 1 null ABC||1||-1 HASH(ABC||1||-1)
XYZ null 100 XYZ||-2||100 HASH(XYZ||-2||100)
DEF 1.1 200 DEF||1.1||200 HASH(DEF||1.1||200)

Scenario 2:
Staging macro:
NATURAL_KEY:
- BK1
- BK2
- BK3

NATURAL_KEY_HKEY: NATURAL_KEY

null_columns:
required:
- BK3
optional:
- BK2

Hub macro:
{%- set src_pk = "NATURAL_KEY_HKEY"-%}
{%- set src_nk = "NATURAL_KEY"-%}

Output:
Code which run -
In derived columns part=>
CONCAT(
BK1, '||',
BK2, '||',
BK3
) AS NATURAL_KEY,
Above discards 2 null records . Hence in hashed columns only 1 record is seen for hash calculation.

In null columns part =>
BK3 AS BK3_ORIGINAL,
IFNULL(BK3, '-1') AS BK3,
BK2 AS BK2_ORIGINAL,
IFNULL(BK2, '-2') AS BK2

Hash calcualtion=>
CAST(SHA256(NULLIF(UPPER(TRIM(CAST(NATURAL_KEY AS STRING))), '')) AS BYTES) AS NATURAL_KEY_HKEY,

Output in this scenario:

BK1 BK2 BK3 Natural Key Hash key  HASH(Natural Key)
DEF 1.1 200 DEF||1.1||200 HASH(DEF||1.1||200)

Expected output:

BK1 BK2 BK3 Natural Key Hash key  HASH(Natural Key)
ABC 1 null ABC||1||-1 HASH(ABC||1||-1)
XYZ null 100 XYZ||-2||100 HASH(XYZ||-2||100)
DEF 1.1 200 DEF||1.1||200 HASH(DEF||1.1||200)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature This is is requesting a new feature
Projects
None yet
Development

No branches or pull requests

2 participants