
Compiling CalculiX Extras with NVIDIA CUDA Support and the Cholesky Method

CalculiX, well known in narrow circles, is a pre/postprocessor and a solver for problems in the mechanics of deformable solids and in fluid and gas mechanics. The main code is written entirely by a team of two people: Guido Dhondt (the solver) and Klaus Wittig (the pre/postprocessor). In addition to the main code, you can find a convenient, functional graphical launcher for preparing a calculation file, CalculiX Launcher. Somewhat more detailed information is in the Russian Wiki, which a few years ago was kept going by me and a couple of caring users (ETCartman and Prul, hello! By the way, I suspect that ETCartman wrote CalculiX Launcher, but that is a story shrouded in the mystery of nicknames).

From the point of view of an ordinary Russian engineer, CalculiX is not so important or necessary for daily work as to deserve much attention. It is quite different for researchers who previously ran computational experiments in Abaqus: they can actually treat CalculiX as an open Abaqus clone, since the two share one founder, Guido Dhondt.

CalculiX has two big advantages - it is cross-platform and open source - and two big minuses - almost complete obscurity among engineers in the CIS, and somewhat less functionality compared to Abaqus. Nevertheless, I decided to write a short note on how to get a CalculiX binary with CUDA solver support, in the faint hope that this information will be useful to someone in the CIS.

I assume this article will be read by a person (or a cyborg, LOL) who has already become acquainted with the structure of a typical *.inp calculation file and with how a calculation is organized in CalculiX, who is familiar with Linux at least at the level of "I opened the console and I know about apt-get", and who is, besides, a geek interested in making CalculiX and CUDA work together, or at least in compiling a project with Cholesky decomposition (CHOLMOD).
To get started, look at exactly what the author of CalculiX Extras proposes on the project page, and do not miss the link to the instructions for compiling the project on Ubuntu. An attentive reader of the author's page will realize that compiling the project will not happen over a beer on a weekend without a detailed manual. Therefore, I took the liberty of deciphering all the gaps that the author of CalculiX Extras kindly left between the lines.

1. Required libraries


I don't know how things stand on Linux distributions other than Mint, but on Mint 18 you will need to install approximately the following minimal set of libraries for the project (an installation sketch follows the list):
binutils
cpp-5
gcc-5
gfortran-5
libstdc++6
libstdc++6:i386
autoconf
autoconf2.64
g++
g++-5
libarpack++2-dev
libarpack++2c2a
libbtf1.2.1
libcr0
libcsparse3.1.4
libcxsparse3.1.4
libhdf5-mpi-dev
libhdf5-mpich-10
libhdf5-openmpi-10
libhdf5-openmpi-dev
libldl2.2.1
metis
libmetis-dev
libmetis5
libmpich12
netcdf-bin
libnetcdf-c++4
libnetcdf-c++4-1
libnetcdf-c++4-dev
libopenblas-base
libopenblas-dev
libparpack2-dev
libstdc++-5-dev
libsuitesparse-dev
libexodusii5
libexodusii-dev
libnemesis3
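
For convenience, here is a single-command sketch of installing the list above (package names copied from the list; on anything other than Mint 18 some names and version suffixes will differ, so treat this as a starting point, not gospel):

 sudo apt-get update
 # toolchain, solver and I/O libraries from the list above
 sudo apt-get install binutils cpp-5 gcc-5 gfortran-5 g++ g++-5 \
     autoconf autoconf2.64 libstdc++6 libstdc++6:i386 libstdc++-5-dev \
     libarpack++2-dev libarpack++2c2a libparpack2-dev \
     libopenblas-base libopenblas-dev libsuitesparse-dev \
     libbtf1.2.1 libcsparse3.1.4 libcxsparse3.1.4 libldl2.2.1 \
     metis libmetis-dev libmetis5 libmpich12 libcr0 \
     libhdf5-mpi-dev libhdf5-mpich-10 libhdf5-openmpi-10 libhdf5-openmpi-dev \
     netcdf-bin libnetcdf-c++4 libnetcdf-c++4-1 libnetcdf-c++4-dev \
     libexodusii5 libexodusii-dev libnemesis3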

2. Video card drivers


It is best to install the latest driver through the Setup -> Driver Manager utility (in Linux Mint) and reboot.

The second option: download the NVIDIA driver with CUDA support for your video card from the official NVIDIA website and install it:
i) `dpkg -i nvidia-diag-driver-local-repo-ubuntu1604_375.66-1_amd64.deb` for Ubuntu
ii) `apt-get update`
iii) `apt-get install cuda-drivers`
iv) `reboot`

It is also possible to install the driver through the Synaptic package manager. Search for packages named NVIDIA.

3. Installing the CUDA-Toolkit


Download version 8 from the official site (with 7.5 I got errors when compiling cudacusp.cu) and start the installation in the console with the command:

sudo ./cuda_8.0.61_375.26_linux.run --override 

Do not install the driver bundled with the Toolkit. Answers to the installer's questions:

Do you accept the previously read EULA?
accept/decline/quit: accept

You are attempting to install on an unsupported configuration. Do you wish to continue?
(y)es/(n)o [default is no]: y

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26?
(y)es/(n)o/(q)uit: n

Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
[default is /usr/local/cuda-8.0]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
[default is /home/usr]:
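
If you prefer an unattended install, the runfile also has a silent mode; to the best of my knowledge the flags below do the same as the answers above, but check `./cuda_8.0.61_375.26_linux.run --help` before trusting them:

 # toolkit and samples only, no bundled driver, skipping the compatibility check
 sudo ./cuda_8.0.61_375.26_linux.run --silent --toolkit --samples --override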

The second installation option is via the Synaptic package manager, after updating the system to the latest state.

The third option - as written by the NVIDIA developers:

Update the CUDA network repo key:
# sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
Add the network repository to the APT sources:
# sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
# sudo apt-get update
To install the NVIDIA driver, install the cuda-drivers meta-package (at the time of writing this pulls in the 375.66 driver). There is no need for any additional software.
# sudo apt-get -y --no-install-recommends install cuda-drivers
# sudo reboot
If you also need the CUDA toolkit, install CUDA 8:
# sudo apt-get -y install cuda-toolkit-8-0

The CUDA Installation Guide for the CUDA Toolkit is located at the following URL: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
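
Before moving on, it makes sense to verify that the driver and the toolkit are both alive (standard checks, assuming the /usr/local/cuda symbolic link was created above):

 # driver side: should show the GPU and the 375.xx driver version
 nvidia-smi
 # toolkit side: should report release 8.0
 /usr/local/cuda/bin/nvcc --version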

4. Download the sources: CalculiX, ARPACK, CUDA cusp


The ARPACK build process is described in detail in the article. Yes, it is not short, but you are adults after all. You will figure it out.

I put the cusp folder from the CUDA cusp archive into my home directory (/home/usr in my case), as sketched below. Take version 0.4.0 specifically (this is important!), from here.
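
A sketch of putting cusp in place (the archive name cusplibrary-0.4.0.zip is my assumption about how the release is packaged; substitute whatever you actually downloaded):

 cd /home/usr
 unzip ~/Downloads/cusplibrary-0.4.0.zip
 # only the inner cusp/ folder with the headers is needed in the home directory
 cp -r cusplibrary-0.4.0/cusp /home/usr/cusp
 # quick check: the version header should correspond to 0.4.0
 grep CUSP_VERSION /home/usr/cusp/version.h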

At this stage, you can check the build of CalculiX without CUDA (see the article).

5. Changing system variables


I added to /home/usr/.bashrc the paths to the CUDA libraries and the path to the CalculiX source code:
 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/usr/CalculiX/ccx/src
 PATH=$PATH:/usr/local/cuda/bin
 LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/lib
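
Re-read the file and make sure the variables took effect (a trivial check):

 source /home/usr/.bashrc
 # nvcc should now resolve to the freshly installed toolkit
 which nvcc
 echo $LD_LIBRARY_PATH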

6. Connecting libraries


The only problem for beginners may be specifying the dynamic libraries that will be required when linking the ccx_2.12.a archive with the cudacusp.a library. To find out the required libraries, go to the CUDA cusp examples directory (/home/usr/cusp/examples in my case) and compile a binary with the command:

 nvcc -o example example.cu -I/home/usr 

Then find out the list of required libraries:

 ldd example 

In my case, the resulting lines in the Makefile look like this:

 CUDACUSPLDFLAGS = -L/lib64 -l:libcufft_static.a -lstdc++ -lcuda -lcudart -lm -lgcc_s -lc -l:ld-linux-x86-64.so.2 -ldl -lpthread -lrt #-llinux-vdso 

7. Patch CCX-Extras and build cudacusp.a


Apply the patches as described on the page of the CCX-Extras author. You will end up with modified sources, including the files cudacusp.h, cudacusp.cu and cudacusp.thrustassembly.cu. Delete (or rename) cudacusp.cu, then copy cudacusp.thrustassembly.cu -> cudacusp.cu. Open cudacusp.cu in a text editor and remove the word "thrustassembly" from the function name (these renaming steps are also scripted right after the compile command below). Next, compile the library:

 nvcc -O3 -lib -o cudacusp.a -c cudacusp.cu -arch=sm_20 -I. -I/home/usr -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib -L. -lstdc++ -lcuda -lcudart -DCUDACUSP 
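
The same file shuffling, scripted (a sketch; the sed pattern assumes the word "thrustassembly" appears only in the function name, so look at the file before running it blindly):

 cd /home/usr/CalculiX/ccx/src
 mv cudacusp.cu cudacusp.cu.orig               # keep the original out of the way
 cp cudacusp.thrustassembly.cu cudacusp.cu
 # strip "thrustassembly" from the function name
 sed -i 's/thrustassembly//g' cudacusp.cu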

8. Build ccx


Change the Makefile (see mine below):
 ## This file is heavily modified from the Makefile distributed with
 ## ccx.
 ## pkg-config is used to pull all flags from local enviroment. You
 ## must set this up. Typically this would be defining
 ## PKG_CONFIG_PATH=$PKG_CONFIG_PATH:~/local/lib64/pkgconfig or
 ## similar. You will also have to add .pc file for each library you
 ## use. I did this because I have several machines with different
 ## library enviroments. You could alternatively use compiler flags
 ## such as -L/path/to/lib -I/path/to/include -llib. Examples of this
 ## on my distribution are in the comments below

 ################################################################################
 ##  Flags related to CPU, optimizations, etc                                  ##
 ##  GPU based optimization later in the make file                             ##
 ################################################################################

 CC=gcc
 FC=gfortran

 ## CFLAGS=-march=native -g -O0 ## debugging -pg
 ## FFLAGS=-march=native -g -O0 ## debugging -pg
 ## LDFLAGS += -g
 ## CFLAGS=-march=native -O2 -Wall ## conservative
 ## FFLAGS=-march=native -O2 -Wall ## conservative
 CFLAGS=-march=native -O3 -Wall ## -flto ## aggressive -fprofile-generate -fprofile-use
 FFLAGS=-march=native -O3 -Wall ## -flto ## aggressive -fprofile-generate -fprofile-use

 ## Integer8. Note at least Arpack and Pardiso need to be compiled for
 ## Integer8 as well. Spooles does not appear to be int8 according to Guido
 ## LONGLONG = -DLONGLONG
 ## CFLAGS += $(LONGLONG)
 ## FFLAGS += -fdefault-integer-8
 ## INTEXT = 64

 ## Other CCX Options
 CFLAGS += -DARCH="Linux" -DMATRIXSTORAGE
 LDFLAGS +=

 ## Multi Threaded and MPI
 CFLAGS += -DUSE_MT
 #CFLAGS += -DCALCULIX_MPI -fopenmp

 ## This is now default for calculix and relates to CFD
 CFLAGS += -DNETWORKOUT

 ################################################################################
 ##  Flags related to CPU based solvers                                        ##
 ################################################################################

 ## SPOOLES
 CFLAGS += -I/usr/include/spooles -I/usr/include/spooles/MT -DSPOOLES
 LDFLAGS += -lspooles -lpthread
 ## CFLAGS += `pkg-config --cflags spooles` -DSPOOLES
 ## LDFLAGS += `pkg-config --libs spooles`

 ## ARPACK
 CFLAGS += -DARPACK
 LDFLAGS += -L/home/usr/ARPACK/ -l:libarpack_linux.a
 ## CFLAGS += `pkg-config --cflags arpack$(INTEXT)` -DARPACK
 ## LDFLAGS += `pkg-config --libs arpack$(INTEXT)`

 ## TAUCS
 ## CFLAGS += -DTAUCS
 ## LDFLAGS += -ltaucs -lmetis

 ## LAPACK
 ## CFLAGS += -I/usr/include/openblas
 ## LDFLAGS += -lreflapack -lopenblas
 CFLAGS += `pkg-config --cflags lapack$(INTEXT)`
 LDFLAGS += `pkg-config --libs lapack$(INTEXT)`

 ## BLAS
 ## CFLAGS += -I/usr/include/openblas
 ## LDFLAGS += -lopenblas
 CFLAGS += `pkg-config --cflags blas$(INTEXT)`
 LDFLAGS += `pkg-config --libs blas$(INTEXT)`

 ## PARDISO
 ## CFLAGS += -DPARDISO
 ## LDFLAGS += -L/home/pete/local/lib64/ -lpardiso -lgfortran -lpthread -lm -fopenmp
 ## CFLAGS += `pkg-config --cflags pardiso` -DPARDISO
 ## LDFLAGS += `pkg-config --libs pardiso`

 ################################################################################
 ##  Flags related to GPU based solvers                                        ##
 ################################################################################

 # these libraries you can see when compile examples in cusp folder
 # and see results of command "ldd <binary_name>"
 CUDACUSPLDFLAGS = -L/lib64 -l:libcufft_static.a -lstdc++ -lcuda -lcudart -lm -lgcc_s -lc -l:ld-linux-x86-64.so.2 -ldl -lpthread -lrt #-llinux-vdso
 CUDACUSPCFLAGS = -I/usr/include -I/usr/local/include -I/usr/local/cuda-8.0/include -I/usr/local/cuda-8.0/include/crt

 ## Flags for the gpu compiler
 NVCCCFLAGS = $(CUDACUSPCFLAGS) -arch=sm_20 -I. -I/home/usr
 NVCCLDFLAGS = -lib -L. -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib -DCUDACUSP
 NVCCLDFLAGS += $(CUDACUSPLDFLAGS)
 ##
 #NVCC=nvcc -O3 $(LONGLONG) `pkg-config --cflags cusp` `pkg-config --libs cusp` $(NVCCCFLAGS) # -Xcompiler -fopenmp
 NVCC=nvcc -O3 $(LONGLONG) -o cudacusp.a -c cudacusp.cu $(NVCCCFLAGS) $(NVCCLDFLAGS) # -Xcompiler -fopenmp

 # wrong: nvcc -O3 --compiler-options '-fPIC' -dc cudacusp.cu -arch=sm_20 -I. -I/home/usr -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib -L. -lstdc++ -lcuda -lcudart -DCUDACUSP
 # wrong: nvcc -O3 -lib -o cudacusp.a cudacusp.a -arch=sm_20 -I. -I/home/usr -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib -L. -lstdc++ -L. -lcuda -lcudart -DCUDACUSP
 # wrong: cd /home/usr/cusp/examples/Solvers/
 # right for compilation cudacusp.a:
 # nvcc -O3 -lib -o cudacusp.a -c cudacusp.cu -arch=sm_20 -I. -I/home/usr -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib -L. -lstdc++ -lcuda -lcudart -DCUDACUSP
 #nvcc -O3 -lib -o cudacusp.o -c cudacusp.cu -arch=sm_20 -I. -I/home/usr -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib -L. -lstdc++ -lcuda -lcudart -DCUDACUSP

 ## CUDACUSP
 ## This is unique because it a template library rather than binary library
 CFLAGS += -I/home/usr/local/include -DCUDACUSP -I/home/usr/cusp -I.
 LDFLAGS += -L. -L/usr/local/cuda-8.0/lib64/stubs -L/opt/cuda/lib64 -l:cudacusp.a
 CFLAGS += $(CUDACUSPCFLAGS)
 LDFLAGS += $(CUDACUSPLDFLAGS)
 ## CFLAGS += `pkg-config --cflags cusp` `pkg-config --libs cusp` -DCUDACUSP
 ## LDFLAGS += `pkg-config --libs cusp`

 ## CHOLDMOD
 ## This is unique because it can be CPU or GPU based, depending on how
 ## SuiteSparse was compiled. Here it is assumed that SuiteSparse also
 ## uses CUDA
 CFLAGS += -DSUITESPARSE
 LDFLAGS += -L/usr/local/cuda-8.0/lib64 -lcublas
 LDFLAGS += -lcholmod -lmetis -lcolamd -lccolamd -lamd -lcamd -ldl -lcxsparse -lbtf
 ## LDFLAGS += `pkg-config --libs cublas$(INTEXT)`
 ## LDFLAGS += `pkg-config --libs cholmod$(INTEXT)`
 ## LDFLAGS += `pkg-config --libs metis$(INTEXT)`
 ## LDFLAGS += `pkg-config --libs colamd$(INTEXT)`
 ## LDFLAGS += `pkg-config --libs ccolamd$(INTEXT)`
 ## LDFLAGS += `pkg-config --libs amd$(INTEXT)`
 ## LDFLAGS += `pkg-config --libs camd$(INTEXT)`
 ## LDFLAGS += `pkg-config --libs ldl$(INTEXT)`
 ## LDFLAGS += `pkg-config --libs cxsparse$(INTEXT)`
 ## LDFLAGS += `pkg-config --libs btf$(INTEXT)`

 ################################################################################
 ##  Flags related to ExodusII output                                          ##
 ################################################################################

 ## EXODUSII
 CFLAGS += -DEXODUSII
 LDFLAGS += -lexoIIv2c -lnetcdf
 ## CFLAGS += `pkg-config --cflags exodusii` -DEXODUSII
 ## LDFLAGS += `pkg-config --libs exodusii`

 ################################################################################
 ##  Recipes                                                                   ##
 ################################################################################

 ## .cu file so not have a default implicit rule. Define all implicit rules used.
 .SUFFIXES: .o .c .cu
 .c.o:
 	$(CC) $(CFLAGS) -c $<
 .f.o:
 	$(FC) $(FFLAGS) -c $<
 .cu.o:
 	$(NVCC) -DCUDACUSP -c $<

 include Makefile.inc

 SCCXMAIN = ccx_2.12.c

 ## Define all the object file rules to identify dependencies
 OCCXCU = $(SCCXCU:.cu=.o)
 OCCXF = $(SCCXF:.f=.o)
 OCCXC = $(SCCXC:.c=.o)
 OCCXMAIN = $(SCCXMAIN:.c=.o)

 ## Link to math and standard c
 CFLAGS += -lm -lc

 ccx_2.12: $(OCCXMAIN) ccx_2.12.a
 	./date.pl; $(CC) $(CFLAGS) -c ccx_2.12.c $(LDFLAGS); $(FC) -Wall $(FFLAGS) -o $@ $(OCCXMAIN) ccx_2.12.a $(LDFLAGS)

 ccx_2.12.a: $(OCCXF) $(OCCXC) # $(OCCXCU)
 	ar vr $@ $?

 clean:
 	rm *.a *.o

Compile, then check that the binary runs:

 make
 ./ccx_2.12 --help
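
To make sure the GPU paths actually made it into the binary, inspect the linkage with standard tools:

 # the CUDA runtime, cuBLAS and CHOLMOD should appear among the resolved libraries
 ldd ./ccx_2.12 | grep -Ei 'cuda|cublas|cholmod'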

9. Test ccx with CUDA support


Take an example with a static analysis and check the solvers (the trailing comment on each line records how it performed; a timing-loop sketch follows the list):

 *STEP
 *STATIC                                ** working (fast) (this is SPOOLES)
 ** *STATIC, SOLVER=CUDACUSP            ** working (slow)
 ** *STATIC, SOLVER=CHOLMOD             ** working (fast)
 ** *STATIC, SOLVER=SUITESPARSEQR       ** working (slow)
 ** *STATIC, SOLVER=ITERATIVE SCALING   ** working (very slow)
 ** *STATIC, SOLVER=ITERATIVE CHOLESKY  ** working (very slow)
 ** *STATIC, SOLVER=SPOOLES             ** working (fast)
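
To compare solvers without editing the deck by hand each time, a loop like this one works (a sketch: the jobname beam, the beam.inp deck and its single active *STATIC line are my assumptions about your test case):

 # rewrite the *STATIC line for each solver and keep separate logs
 for SOLVER in SPOOLES CUDACUSP CHOLMOD SUITESPARSEQR; do
     sed -i "s/^\*STATIC.*/*STATIC, SOLVER=${SOLVER}/" beam.inp
     { time ./ccx_2.12 -i beam ; } > "beam_${SOLVER}.log" 2>&1
 done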

Conclusion


I did not enjoy solving static test problems with the CalculiX + CUDA combination - too slow - even though the CUSP library is designed to solve equations of the form A*X = B with sparse matrices of large dimension. Perhaps the combination works much more efficiently on fluid and gas mechanics (CFD) problems. But I do not dare to try that field of numerical aerodynamics myself - maybe the reader will?

P.S. Checking the performance of the CalculiX + CUDA combination was my personal initiative, aimed at analyzing whether CalculiX Extras could be used for metal forming and sheet metal stamping problems. What can I say after this test? A bummer; as in that bash.org joke, "the flower did not grow for you" - about which, by the way, Peter A. Gustafson (the author of CalculiX Extras) warned me in advance in a personal letter:

I note your doing metal forming. Fyi cuda is not implemented. Also, it's not worth it. This includes the static cuda solver implemented so far.

for which he has my deep gratitude. Still, for the sake of sporting interest (and contrary to common sense), I compiled the project anyway, getting a storm of fiery emotions from the process (in particular, from translating from the Japanese).

With respect to the authors of CalculiX, CalculiX Extras, CalculiX Launcher and the other add-ons,
and also to the residents of Habrahabr and Geektimes, with gratitude to the OpenSource community, AlexKaz.

Source: https://habr.com/ru/post/370677/

