uadk supports heterogeneous computing #658

Open
wants to merge 15 commits into
base: develop

Liulongfang
Collaborator

@Liulongfang Liulongfang commented Jan 3, 2025

uadk already supports both hardware acceleration and instruction acceleration. Users expect to be able to use the two together: once the hardware accelerator is fully loaded, instructions are used to keep improving business performance. The framework should also adapt automatically to a variety of acceleration devices.
This patchset was developed for that purpose, and it has been fully adapted to all algorithm types in uadk.

With the updated framework, hybrid acceleration (HW + CE) outperforms hardware acceleration alone. In the 1024-byte benchmarks below, the gain ranges from 2.24% to 79.21% for SM3 and from 14.63% to 221.58% for SM4, depending on the configuration.

sm3 test cmd:
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm3 --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm3 --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch --init2

SM3 1024B Performance (MB/s)

| tds | init1 (HW) | init2 (HW + CE) | increase |
|-----|------------|-----------------|----------|
| 1   | 393.3      | 437.1           | 11.14%   |
| 2   | 762.1      | 823.4           | 8.04%    |
| 4   | 1508.4     | 1564.1          | 3.69%    |
| 8   | 3007.4     | 3074.9          | 2.24%    |
| 16  | 4851.8     | 5429.2          | 11.90%   |
| 32  | 4854.1     | 8698.8          | 79.21%   |
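The increase column appears to be the relative gain of init2 over init1, that is (init2 - init1) / init1. A minimal Python sketch, assuming that formula (the function and variable names are illustrative and not part of uadk), recomputes the column from the SM3 figures above:

```python
# Recompute the "increase" column from the SM3 1024B figures above.
# Assumption: increase = (init2 - init1) / init1, expressed in percent.

def increase_pct(init1, init2):
    """Relative gain of init2 over init1, in percent."""
    return (init2 - init1) / init1 * 100.0

# (tds, init1 HW MB/s, init2 HW+CE MB/s), copied from the table.
SM3_1024B = [
    (1, 393.3, 437.1),
    (2, 762.1, 823.4),
    (4, 1508.4, 1564.1),
    (8, 3007.4, 3074.9),
    (16, 4851.8, 5429.2),
    (32, 4854.1, 8698.8),
]

if __name__ == "__main__":
    for tds, hw, hybrid in SM3_1024B:
        print(f"{tds:>3} {hw:>8.1f} {hybrid:>8.1f} {increase_pct(hw, hybrid):7.2f}%")
```

The same helper can be applied to the SM4 tables to cross-check their increase columns.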

sm4 test cmd:
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm4-128-ecb --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm4-128-ecb --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch --init2
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm4-128-ecb --mode sva --opt 0 --async --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch --init2
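The sm4 commands above differ only in the --sync/--async switch and the presence of --init2. The following Python sketch builds the same command lines programmatically so that a sweep is easier to script. It uses only flags that appear in the commands above; the helper name `build_cmd` is hypothetical, and mapping the tds column of the tables to `--thread` is an assumption, since the posted commands all use `--thread 1`.

```python
import shlex

# Flags shared by every run, copied from the commands above.
BASE = [
    "numactl", "--cpunodebind=0", "--membind=0",
    "uadk_tool", "benchmark",
    "--mode", "sva", "--opt", "0",
    "--pktlen", "1024", "--seconds", "10",
    "--multi", "1", "--ctxnum", "1", "--prefetch",
]

def build_cmd(alg, threads=1, async_mode=False, init2=False):
    """Return the argv list for one benchmark run (hypothetical helper)."""
    cmd = BASE + ["--alg", alg, "--thread", str(threads)]
    cmd.append("--async" if async_mode else "--sync")
    if init2:
        # --init2 selects the hybrid (HW + CE) path in the tables above.
        cmd.append("--init2")
    return cmd

if __name__ == "__main__":
    # Assumption: the tds column corresponds to --thread.
    for threads in (1, 2, 4, 8, 16, 32):
        for init2 in (False, True):
            print(shlex.join(build_cmd("sm4-128-ecb", threads, init2=init2)))
```

The script only prints the command lines; running them requires a machine with uadk_tool installed.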

SM4 1024B Performance (MB/s)

| tds | init1 (HW) | init2 (HW + CE) | increase |
|-----|------------|-----------------|----------|
| 1   | 461        | 1482.5          | 221.58%  |
| 2   | 914        | 2575.4          | 181.77%  |
| 4   | 1699.9     | 4737.6          | 178.70%  |
| 8   | 3301.5     | 7327.8          | 121.95%  |
| 16  | 5837.5     | 9737.4          | 66.81%   |
| 32  | 8897.7     | 10432.4         | 17.25%   |

SM4 1024B async Performance (MB/s)

| tds | init1 (HW) | init2 (HW + CE) | increase |
|-----|------------|-----------------|----------|
| 1   | 1368.3     | 1683.9          | 23.07%   |
| 2   | 2652       | 3235.5          | 22.00%   |
| 4   | 3979.5     | 5094.5          | 28.02%   |
| 8   | 6667.7     | 8587            | 28.79%   |
| 16  | 8900.9     | 11067.8         | 24.34%   |
| 32  | 8905.9     | 10209.1         | 14.63%   |

Liulongfang and others added 6 commits December 30, 2024 12:02
Unify the software ctx and hardware ctx in uadk and merge
them in the scheduler, so that software and hardware
computation can run together.

Signed-off-by: Longfang Liu <[email protected]>
After the uadk framework updates the heterogeneous scheduling
function, the corresponding scheduler needs to add a new
scheduling solution.

Signed-off-by: Longfang Liu <[email protected]>
After adapting to the new heterogeneous hybrid acceleration
function, the initialization of the device driver needs to be
updated.
In addition, the instruction acceleration algorithm driver needs
to fully adapt to the synchronous and asynchronous modes of the uadk
framework.

Signed-off-by: Longfang Liu <[email protected]>
After the uadk framework updates the heterogeneous scheduling
function, the internal implementation functions of the aead
algorithm need to be adapted and modified.

Signed-off-by: Longfang Liu <[email protected]>
Longfang Liu added 7 commits January 3, 2025 14:46
After the uadk framework updates the heterogeneous scheduling
function, the internal implementation functions of the hash-agg
algorithm need to be adapted and modified.

Signed-off-by: Longfang Liu <[email protected]>
After the uadk framework updates the heterogeneous scheduling
function, the internal implementation functions of the cipher
algorithm need to be adapted and modified.

Signed-off-by: Longfang Liu <[email protected]>
After the uadk framework updates the heterogeneous scheduling
function, the internal implementation functions of the comp
algorithm need to be adapted and modified.

Signed-off-by: Longfang Liu <[email protected]>
After the uadk framework updates the heterogeneous scheduling
function, the internal implementation functions of the dh
algorithm need to be adapted and modified.

Signed-off-by: Longfang Liu <[email protected]>
After the uadk framework updates the heterogeneous scheduling
function, the internal implementation functions of the digest
algorithm need to be adapted and modified.

Signed-off-by: Longfang Liu <[email protected]>
After the uadk framework updates the heterogeneous scheduling
function, the internal implementation functions of the ECC
algorithm need to be adapted and modified.

Signed-off-by: Longfang Liu <[email protected]>
After the uadk framework updates the heterogeneous scheduling
function, the internal implementation functions of the rsa
algorithm need to be adapted and modified.

Signed-off-by: Longfang Liu <[email protected]>
@gaozhangfei
Collaborator

gaozhangfei commented Jan 3, 2025

Do you have data for CE alone?
Also, please post the test commands if convenient.

@Liulongfang
Collaborator Author

Liulongfang commented Jan 3, 2025

The CE-only data is as follows:

SM4 1024B CE Performance (MB/s)

| tds | init1 (CE) |
|-----|------------|
| 1   | 2955.9     |
| 2   | 3446.6     |
| 4   | 5774.3     |
| 8   | 8399.2     |
| 16  | 10035.8    |
| 32  | 10638.9    |

SM3 1024B CE Performance (MB/s)

| tds | init1 (CE) |
|-----|------------|
| 1   | 436.2      |
| 2   | 824.9      |
| 4   | 1571.6     |
| 8   | 3107.2     |
| 16  | 5571.2     |
| 32  | 9071.6     |

@gaozhangfei
Collaborator

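The hardware performance looks low. Have you tested with --thread 8 --ctxnum 8?
Are there any cases where 1+1 > 2?
Scheduling can be turned on or off as an option, right?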

After adapting to uadk's heterogeneous scheduling framework,
all uadk algorithms have completed functional adaptation. After
the adaptation is completed, the old functions need to be deleted.

Signed-off-by: Longfang Liu <[email protected]>
@Liulongfang Liulongfang force-pushed the master branch 2 times, most recently from b628563 to 6bb4cdb Compare January 10, 2025 07:09
Update the uadk test tool to adapt to the heterogeneous
scheduling function.

Signed-off-by: Longfang Liu <[email protected]>
@Liulongfang
Collaborator Author

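1+1 > 2 cannot really be achieved; the best case is 1+1 ≈ 2. In other words, without increasing CPU usage, mixing software and hardware computation improves business performance, so that the combined throughput uses as much of the computing power of all devices as possible:

SM4 algorithm, 8 KB packet length: performance of hardware computation (HW), software computation (CE), and hybrid computation (HW+CE), plus the achievement rate (hybrid throughput / (hardware throughput + software throughput))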
sync mode:

| tds | HW     | CE      | (HW+CE) | achievement rate |
|-----|--------|---------|---------|------------------|
| 1   | 1417.1 | 5299.5  | 3629.4  | 54.04%           |
| 2   | 2817   | 7439.3  | 6175.3  | 60.21%           |
| 4   | 5438.4 | 9680.8  | 9854.2  | 65.18%           |
| 8   | 9032.7 | 11140.2 | 11701.2 | 58.00%           |
| 16  | 9143.3 | 11837.1 | 12495.5 | 59.56%           |
| 32  | 9128.6 | 12115.2 | 13709.7 | 64.54%           |

async mode:

| tds | HW     | CE      | (HW+CE) | achievement rate |
|-----|--------|---------|---------|------------------|
| 1   | 9113.1 | 5372.8  | 7837.7  | 54.11%           |
| 2   | 9139.1 | 7211.1  | 11365.7 | 69.51%           |
| 4   | 9132.6 | 9750.4  | 13306.6 | 70.47%           |
| 8   | 9144.1 | 11145.6 | 13948.9 | 68.75%           |
| 16  | 9139.3 | 11727.8 | 14644.3 | 70.18%           |
| 32  | 9124.8 | 11951   | 13959.6 | 66.24%           |

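SM3 algorithm, 8 KB packet length: performance of hardware computation (HW), software computation (CE), and hybrid computation (HW+CE), plus the achievement rate (hybrid throughput / (hardware throughput + software throughput))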
sync mode:

| tds | HW     | CE      | (HW+CE) | achievement rate |
|-----|--------|---------|---------|------------------|
| 1   | 962.2  | 508.7   | 549.9   | 37.39%           |
| 2   | 1905.4 | 998.9   | 1094.2  | 37.68%           |
| 4   | 3810.9 | 2000.1  | 2163.1  | 37.22%           |
| 8   | 5161.1 | 3989.5  | 4305.7  | 47.05%           |
| 16  | 5161.1 | 7606.1  | 8107.1  | 63.50%           |
| 32  | 5161.1 | 13482.8 | 14493.8 | 77.74%           |

async mode:

| tds | HW     | CE      | (HW+CE) | achievement rate |
|-----|--------|---------|---------|------------------|
| 1   | 5161.2 | 508.5   | 1419.7  | 25.04%           |
| 2   | 5161.2 | 1005.7  | 2046.6  | 33.19%           |
| 4   | 5161.1 | 2014.1  | 5683.1  | 79.20%           |
| 8   | 5161.0 | 4001.3  | 8801.7  | 96.06%           |
| 16  | 5159.2 | 7529.5  | 12098.6 | 95.35%           |
| 32  | 5160.8 | 12587.8 | 17534.1 | 98.79%           |
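The achievement rate in this comment is defined as the hybrid throughput divided by the sum of the hardware-only and CE-only throughputs. A minimal Python sketch of that calculation follows, using the SM3 async figures above; the function and variable names are illustrative and not part of uadk.

```python
# Achievement rate as defined in the comment above:
#   hybrid throughput / (HW throughput + CE throughput), in percent.

def achievement_rate(hw, ce, hybrid):
    """Share of the combined HW + CE capacity reached by the hybrid run."""
    return hybrid / (hw + ce) * 100.0

# (tds, HW, CE, HW+CE), copied from the SM3 8 KB async table.
SM3_8K_ASYNC = [
    (1, 5161.2, 508.5, 1419.7),
    (2, 5161.2, 1005.7, 2046.6),
    (4, 5161.1, 2014.1, 5683.1),
    (8, 5161.0, 4001.3, 8801.7),
    (16, 5159.2, 7529.5, 12098.6),
    (32, 5160.8, 12587.8, 17534.1),
]

if __name__ == "__main__":
    for tds, hw, ce, hybrid in SM3_8K_ASYNC:
        print(f"{tds:>3} {achievement_rate(hw, ce, hybrid):6.2f}%")
```

A rate close to 100% means the hybrid run delivers nearly the sum of the two standalone throughputs, which is the 1+1 ≈ 2 case described above.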

2 participants