Hypre build failure with xsdk+rocm #227

Open
balay opened this issue Oct 12, 2023 · 15 comments

@balay
Member

balay commented Oct 12, 2023

balay@petsc-gpu-02:/scratch/balay/spack$ ./bin/spack spec xsdk+rocm amdgpu_target=gfx90a |grep hypre@
 -       ^hypre@2.30.0%gcc@11.4.0~caliper~complex~cuda~debug+fortran~gptune~int64~internal-superlu~magma~mixedint+mpi~openmp~rocm+shared+superlu-dist~sycl~umpire~unified-memory build_system=autotools arch=linux-ubuntu22.04-zen4

I thought this build was successful last week [but don't know for sure]

ref: ./bin/spack install -j64 xsdk+rocm amdgpu_target=gfx90a

spack-build-out.txt

@balay
Member Author

balay commented Oct 12, 2023

tried the following change and still get errors:

diff --git a/var/spack/repos/builtin/packages/xsdk/package.py b/var/spack/repos/builtin/packages/xsdk/package.py
index b52d692b78..629f240a8f 100644
--- a/var/spack/repos/builtin/packages/xsdk/package.py
+++ b/var/spack/repos/builtin/packages/xsdk/package.py
@@ -109,8 +109,8 @@ class Xsdk(BundlePackage, CudaPackage, ROCmPackage):
     variant("hiop", default=True, description="Enable hiop build")
     variant("raja", default=(sys.platform != "darwin"), description="Enable raja for hiop, exago")
 
-    xsdk_depends_on("hypre@develop+superlu-dist+shared", when="@develop", cuda_var="cuda")
-    xsdk_depends_on("[email protected]+superlu-dist+shared", when="@1.0.0", cuda_var="cuda")
+    xsdk_depends_on("hypre@develop+superlu-dist+shared", when="@develop", cuda_var="cuda", rocm_var="rocm")
+    xsdk_depends_on("[email protected]+superlu-dist+shared", when="@1.0.0", cuda_var="cuda", rocm_var="rocm")
     xsdk_depends_on("[email protected]+superlu-dist+shared", when="@0.8.0", cuda_var="cuda")
     xsdk_depends_on("[email protected]+superlu-dist+shared", when="@0.7.0", cuda_var="cuda")

spack-build-out.txt
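
For readers unfamiliar with the `cuda_var`/`rocm_var` arguments in the diff above, here is a minimal sketch of the idea: the bundle forwards its own GPU variant to a dependency only when it is told which variant name to use. `dependency_spec` and the `bundle` dict are hypothetical stand-ins written for illustration; this is not Spack's `xsdk_depends_on` implementation.

```python
def dependency_spec(base, bundle_variants, cuda_var="", rocm_var=""):
    """Build the spec a bundle would request for one dependency.

    Hypothetical stand-in for the idea behind
    xsdk_depends_on(..., cuda_var=, rocm_var=); not Spack's implementation.
    """
    spec = base
    for bundle_variant, dep_variant in (("cuda", cuda_var), ("rocm", rocm_var)):
        if not dep_variant:
            # No variant name supplied: the bundle's setting is not forwarded,
            # so the dependency keeps whatever default it has.
            continue
        sign = "+" if bundle_variants.get(bundle_variant, False) else "~"
        spec += sign + dep_variant
    return spec


bundle = {"cuda": False, "rocm": True}  # roughly `xsdk+rocm`

before = dependency_spec("hypre+superlu-dist+shared", bundle, cuda_var="cuda")
after = dependency_spec(
    "hypre+superlu-dist+shared", bundle, cuda_var="cuda", rocm_var="rocm"
)
print(before)  # hypre+superlu-dist+shared~cuda
print(after)   # hypre+superlu-dist+shared~cuda+rocm
```

Without `rocm_var`, nothing about ROCm is requested for hypre, which is consistent with the `~rocm` shown in the `spack spec` output at the top of this issue.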

@victorapm

victorapm commented Oct 12, 2023

We need to merge PR hypre-space/hypre#869 into hypre's master to make hypre +rocm +superlu-dist work

Related: #225

@victorapm

Oops, now I notice that the issue here was hypre ~rocm +superlu-dist

I'm wondering why SuperLU_dist is defining GPU stuff while this build isn't supposed to use GPUs:

/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/openmpi-4.1.6-726wgocxnrdcpvhnyctkrwk5brwxpble/bin/mpicc -O2  -fPIC -DHAVE_CONFIG_H -I.. -I../distributed_ls/Euclid -I. -I./.. -I./../blas -I./../lapack -I./../multivector -I./../utilities -I./../krylov -I./../seq_mv -I./../parcsr_mv -I./../distributed_matrix -I./../matrix_matrix -I./../IJ_mv -I./../parcsr_block_mv -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include           -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/openmpi-4.1.6-726wgocxnrdcpvhnyctkrwk5brwxpble/include -c par_nodal_systems.c
In file included from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_api_utils.h:26,
                 from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/superlu_defs.h:104,
                 from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/superlu_ddefs.h:37,
                 from par_amg.c:18:
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_wrapper.h:151:20: error: unknown type name 'hipEvent_t'
  151 | #define gpuEvent_t hipEvent_t
      |                    ^~~~~~~~~~
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/util_dist.h:128:5: note: in expansion of macro 'gpuEvent_t'
  128 |     gpuEvent_t *GemmStart, *GemmEnd, *ScatterEnd;  /*GPU events to store gemm and scatter's begin and end*/
      |     ^~~~~~~~~~
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_wrapper.h:151:20: error: unknown type name 'hipEvent_t'
  151 | #define gpuEvent_t hipEvent_t
      |                    ^~~~~~~~~~
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/util_dist.h:129:5: note: in expansion of macro 'gpuEvent_t'
  129 |     gpuEvent_t *ePCIeH2D;
      |     ^~~~~~~~~~
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_wrapper.h:151:20: error: unknown type name 'hipEvent_t'
  151 | #define gpuEvent_t hipEvent_t
      |                    ^~~~~~~~~~
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/util_dist.h:130:5: note: in expansion of macro 'gpuEvent_t'
  130 |     gpuEvent_t *ePCIeD2H_Start;
      |     ^~~~~~~~~~
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_wrapper.h:151:20: error: unknown type name 'hipEvent_t'
  151 | #define gpuEvent_t hipEvent_t
      |                    ^~~~~~~~~~
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/util_dist.h:131:5: note: in expansion of macro 'gpuEvent_t'
  131 |     gpuEvent_t *ePCIeD2H_End;
      |     ^~~~~~~~~~

@balay
Member Author

balay commented Oct 12, 2023

> Related: #225

Ah, sorry for creating a duplicate issue. We can close this one [if needed]

> I'm wondering why SuperLU_dist is defining GPU stuff while this build isn't supposed to use GPUs:

Yeah - Ideally we should have both superlu-dist+rocm and hypre+rocm

Current mode of superlu-dist+rocm hypre~rocm is a carry-over from prior xsdk release.
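
A mixed configuration like this can be spotted from the spec strings alone. The sketch below parses boolean variants out of compact spec strings and reports whether the packages disagree on one backend. The helper names and the shortened spec strings are made up for illustration, and the regex is a simplification of Spack's spec syntax (it only handles `+name` / `~name`).

```python
import re

# Matches boolean variants such as "+rocm" or "~internal-superlu".
VARIANT_RE = re.compile(r"([+~])([A-Za-z0-9][A-Za-z0-9_-]*)")


def parse_variants(spec):
    """Map each boolean variant in a compact spec string to True/False."""
    return {name: sign == "+" for sign, name in VARIANT_RE.findall(spec)}


def backend_settings(specs, backend="rocm"):
    """Return {package: bool} for one backend; a missing variant counts as off."""
    return {pkg: parse_variants(spec).get(backend, False) for pkg, spec in specs.items()}


def is_mixed(settings):
    """True when at least two packages disagree on the backend."""
    return len(set(settings.values())) > 1


# Shortened, illustrative spec strings (not full `spack spec` output).
specs = {
    "superlu-dist": "superlu-dist+rocm",
    "hypre": "hypre~rocm+superlu-dist",
}
settings = backend_settings(specs)
print(settings)            # {'superlu-dist': True, 'hypre': False}
print(is_mixed(settings))  # True
```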

@victorapm

> Ideally we should have both superlu-dist+rocm and hypre+rocm

Ah ok, I see! We will have that with the hypre PR I mentioned

@balay
Member Author

balay commented Oct 12, 2023

Ok - adding 'hypre+rocm' to 'xsdk@1.0.0' now [so this is the mode that will get tested].

@balay
Member Author

balay commented Oct 13, 2023

> We need to merge PR hypre-space/hypre#869 into hypre's master to make hypre +rocm +superlu-dist work

@victorapm, I tried building with the above change [i.e., using the dsuperlu branch instead of the master branch] - I still see build failures

spack-build-out.txt

In file included from dsuperlu.c:12:
In file included from ./dsuperlu.h:11:
In file included from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/superlu_ddefs.h:37:
In file included from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/superlu_defs.h:104:
In file included from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_api_utils.h:26:
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_wrapper.h:108:10: fatal error: 'hipblas.h' file not found
#include "hipblas.h"
         ^~~~~~~~~~~

Hm - maybe it's an issue with the hypre [or superlu-dist?] spec in spack wrt rocm dependencies..

cc: @xiaoyeli @liuyangzhuan

@victorapm

Thanks for the feedback!

hypre links to rocblas with the rocm build. It seems superlu_dist needs hipblas? We could add this as an additional LDFLAGS maybe?

@balay
Member Author

balay commented Oct 13, 2023

This gets the hypre+rocm build going for me

diff --git a/var/spack/repos/builtin/packages/hypre/package.py b/var/spack/repos/builtin/packages/hypre/package.py
index ede99fafcc..5364a3bb73 100644
--- a/var/spack/repos/builtin/packages/hypre/package.py
+++ b/var/spack/repos/builtin/packages/hypre/package.py
@@ -24,7 +24,7 @@ class Hypre(AutotoolsPackage, CudaPackage, ROCmPackage):
     test_requires_compiler = True
 
     version("develop", branch="master")
-    version("2.30.0", branch="master")
+    version("2.30.0", branch="dsuperlu")
     version("2.29.0", sha256="98b72115407a0e24dbaac70eccae0da3465f8f999318b2c9241631133f42d511")
     version("2.28.0", sha256="2eea68740cdbc0b49a5e428f06ad7af861d1e169ce6a12d2cf0aa2fc28c4a2ae")
     version("2.27.0", sha256="507a3d036bb1ac21a55685ae417d769dd02009bde7e09785d0ae7446b4ae1f98")
@@ -108,6 +108,7 @@ def patch(self):  # fix sequential compilation in 'src/seq_mv'
     depends_on("rocthrust", when="+rocm")
     depends_on("rocrand", when="+rocm")
     depends_on("rocprim", when="+rocm")
+    depends_on("hipblas", when="+rocm")
     depends_on("umpire", when="+umpire")
     depends_on("caliper", when="+caliper")
 
@@ -258,7 +259,7 @@ def configure_args(self):
                 configure_args.append("--disable-cub")
 
         if "+rocm" in spec:
-            rocm_pkgs = ["rocsparse", "rocthrust", "rocprim", "rocrand"]
+            rocm_pkgs = ["rocsparse", "rocthrust", "rocprim", "rocrand", "hipblas"]
             rocm_inc = ""
             for pkg in rocm_pkgs:
                 if "^" + pkg in spec:
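
The last hunk only shows the start of the loop over `rocm_pkgs`, so what follows is a guess at its intent rather than hypre's actual `package.py` code: collect an `-I` flag for every listed ROCm package that is present, so that headers such as `hipblas.h` become visible to the compiler. The `prefixes` dict and the `/opt/...` paths are placeholders standing in for Spack's `spec[pkg]` lookups.

```python
def rocm_include_flags(prefixes, rocm_pkgs):
    """Join -I flags for every requested ROCm package that has a known prefix.

    `prefixes` is a plain {package: install_prefix} dict used here in place of
    Spack's spec object (an assumption made to keep the sketch self-contained).
    """
    flags = []
    for pkg in rocm_pkgs:
        prefix = prefixes.get(pkg)
        if prefix is None:
            # Package is not part of this build; skip it.
            continue
        flags.append(f"-I{prefix}/include")
    return " ".join(flags)


prefixes = {"rocsparse": "/opt/rocsparse", "hipblas": "/opt/hipblas"}  # placeholder paths

old_pkgs = ["rocsparse", "rocthrust", "rocprim", "rocrand"]
new_pkgs = old_pkgs + ["hipblas"]
print(rocm_include_flags(prefixes, old_pkgs))  # -I/opt/rocsparse/include
print(rocm_include_flags(prefixes, new_pkgs))  # -I/opt/rocsparse/include -I/opt/hipblas/include
```

With the old list the hipblas include directory is never passed on, which would match the `'hipblas.h' file not found` error earlier in this thread.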

@balay
Member Author

balay commented Oct 13, 2023

Looks like this issue will affect all packages that use superlu-dist.

@xiaoyeli can this dependency [on hipblas.h] be kept out of the public include files? [assuming it's primarily required in superlu-dist sources]
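
To see how far the GPU headers leak into the installed interface, one could scan the public include directory for GPU-related `#include` lines. This is only a rough sketch: the prefix list is a heuristic I picked for illustration, and it does not evaluate `#ifdef` guards, so it reports includes even when they are conditionally compiled out.

```python
import re
from pathlib import Path

# Heuristic list of GPU header name prefixes (an assumption, not exhaustive).
GPU_HEADER_PREFIXES = ("hip", "roc", "cuda", "cublas", "cusparse", "cusolver")
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*[<"]([^>"]+)[>"]', re.MULTILINE)


def gpu_includes_in_text(text):
    """Return included header names whose file name starts with a GPU prefix."""
    return [
        name
        for name in INCLUDE_RE.findall(text)
        if Path(name).name.startswith(GPU_HEADER_PREFIXES)
    ]


def gpu_includes_in_dir(include_dir):
    """Map each *.h file in include_dir to the GPU headers it includes."""
    report = {}
    for header in sorted(Path(include_dir).glob("*.h")):
        hits = gpu_includes_in_text(header.read_text(errors="replace"))
        if hits:
            report[header.name] = hits
    return report


sample = '#include "gpu_api_utils.h"\n#include "hipblas.h"\n'
print(gpu_includes_in_text(sample))  # ['hipblas.h']
```

Running `gpu_includes_in_dir` on the superlu-dist `include/` directory from the logs above would be the intended use; I have not run it against that install.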

@victorapm

Maybe we need to incorporate this into hypre's configure/CMakeLists for folks not building it via spack. The spack fix wouldn't be necessary then (although much appreciated!)

@victorapm

> can this dependency [on hipblas.h] be avoided from public include files

This sounds great if possible :)

@balay
Member Author

balay commented Oct 13, 2023

@xiaoyeli I get the following with petsc [this warning breaks the build]

stderr:
In file included from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_wrapper.h:108,
                 from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_api_utils.h:26,
                 from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/superlu_defs.h:104,
                 from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/superlu_ddefs.h:37,
                 from /tmp/petsc-xuiyjmih/config.headers/conftest.cc:3:
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/hipblas-5.5.1-inaizutkrwbopn5i5pvghmqlmuhj2ahw/include/hipblas.h:16:2: warning: #warning "This file is deprecated. Use the header file from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/hipblas-5.5.1-inaizutkrwbopn5i5pvghmqlmuhj2ahw/include/hipblas/hipblas.h by using #include <hipblas/hipblas.h>" [-Wcpp]
   16 | #warning "This file is deprecated. Use the header file from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/hipblas-5.5.1-inaizutkrwbopn5i5pvghmqlmuhj2ahw/include/hipblas/hipblas.h by using #include <hipblas/hipblas.h>"
      |  ^~~~~~~
Source:
#include "confdefs.h"
#include "conffix.h"
#include <superlu_ddefs.h>

I guess this should go into a "new" issue..
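
The warning text itself names the replacement include, `<hipblas/hipblas.h>`. As a sketch of one possible header-side change (not necessarily the fix that was eventually applied), the helper below rewrites the deprecated include form in a header's source text. The function name is hypothetical, and it assumes every targeted ROCm release ships the namespaced header, which older installs may not.

```python
import re

# Matches `#include "hipblas.h"` or `#include <hipblas.h>` but not the
# namespaced `<hipblas/hipblas.h>` form.
OLD_INCLUDE_RE = re.compile(r'#[ \t]*include[ \t]*["<]hipblas\.h[">]')


def use_namespaced_hipblas(source):
    """Replace the deprecated hipblas include with the namespaced one."""
    return OLD_INCLUDE_RE.sub("#include <hipblas/hipblas.h>", source)


print(use_namespaced_hipblas('#include "hipblas.h"'))  # #include <hipblas/hipblas.h>
```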

@liuyangzhuan

> @xiaoyeli I get the following with petsc [this warning breaks the build]

> I guess this should go into a "new" issue..

Fixed in #236 (comment)

@balay
Member Author

balay commented Nov 9, 2023

> This gets the hypre+rocm build going for me

The updated change (for hypre+rocm with superlu-dist+rocm) is now at spack/spack#40980
