Analysis of GPU Computation of Parabolic, Bessel, Wright and Riemann Zeta Functions

. This paper deals with GPU computing of special mathematical functions that are used in Fractional Calculus. The graphics processing unit (GPU) has grown to be an integral part of nowadays’s mainstream computing structures. The special mathematical functions are an integral part of Fractional Calculus. This paper deals with a novel parallel approach for computing special mathematical functions used in Fractional Calculus. NVIDIA’s GPU hardware is used to speed up the parallel algorithm. A comparison of the sequential code, vectorized code and GPU code is performed. We have successfully reduced the computation time of special mathematical functions using the parallel computing capabilities of GPU.


Introduction
Fractional calculus (FC) is widely used in the field of engineering because of its ability to model real-world systems with more precision [1]. In recent years the use of this technology has grown exponentially these days as a growing community of researchers has come together to revitalize the field with the help of advanced computing technology [3]. In most cases, this leads to certain differential equations whose solutions are given by Special Mathematical Function. These are nonelementary functions that are found to be useful in some applications and satisfy some special properties. Computation of such functions has proven to be a resource and time intensive task. To tackle this problem, coprocessors such as GPU and parallel algorithms have been used by researchers in the past [4]. The general purpose use of GPUs for computing is commonly known as GPGPU (General Purpose Computing on Graphics Processing Unit) [3]. This is achieved by Parallel Computing. In Parallel Computing, a large number of computing nodes or CPU cores are used to carry out logical and arithmetic operations from an algorithm in parallel fashion. This increases efficiency and reduces compute time.
It has been found that, for the special functions of mathematics are also very useful for finding solutions to the initial-and / or regional issues governed by partial differential equations, and fractional partial differential equations. These special functions are having wide application in different areas of mathematics. Some commonly used special functions are Wright function, Bessel function, Riemann Zeta function, Parabolic Cylinder function, etc.
In this paper we have done computation of different special functions using hardware Section 2 deals with the Introduction GPU. Section 3 will go through the Definition of Special Mathematical Function which we have done the analysis. In section 4 deals with the Implementation and the design part. In section 5 we have taken the results of the functions. And lastly in Section 6 we have concluded.

Graphics Processing Unit (GPU)
The Graphics Processing Unit (GPU) is a many-core device used for parallel computing applications. Initially intended only for graphics rendering purposes, now have found its way into scientific computing as well. The CPU has been designed for low latency to be able to switch between operations quickly. In the past, CPU designers opted towards making the multicore design for overcoming the ludicrously high power requirements that came with increasing clock speed [7]. Because graphics rendering is a simultaneous task, GPU designers chose the many-core technique early on. GPUs are designed for high throughput, allowing users to perform as many operations as possible. There is an ample amount of infrastructure on the chip to get low latency on the CPU, such as caches to ensure that a vast amount of data is readily accessible. There is a significant amount of control flow silicon. Counter to that, the balance has shifted in the GPU, where more arithmetic and logic units are needed. It is designed for compute-intensive, highly parallel computations and hence is specialized for them. The GPUs use stream processing architecture that is well suited for compute-intensive parallel tasks [6]. Coupled with the current spike in interest in similar research endeavours, collectively known as GPGPU computing, GPUs have become a viable and efficient tool for implementing compute-intensive algorithms [8].
MATLAB is an abbreviation for Matrix Laboratory. It is one of the most widely used proprietary software and programming language by scientists and engineers for numerical computation, due to its excellent graphical capabilities, it has become a well-known tool for data analysis. Researchers also have been known to be using MATLAB scripts and toolboxes to construct immensely complicated systems [7]. The most crucial part of MATLAB is its Toolboxes. MATLAB Toolboxes are software packages/libraries tailored for a specific domain or application. One such toolbox was used in this study, The Parallel Computing Toolbox (PCT), which is the backbone of parallel computing in MATLAB [6]. We will be using the GPU enabled functions from PCT for a single GPU system, a detailed explanation of GPU focused part of PCT is in [3]

Special Mathematical Functions Used in Fractional Calculus
Special functions implemented in this paper are Wright function, Bessel function, Riemann Zeta function, Parabolic Cylinder function. The Wright function plays an important part in the theory of the partial differential equations of the fractional order that are used for modeling of most phenomena including. The anomalous diffusion processes or in the theory of the complex systems. Bessel functions are used to solve in 3D wave equation at a given frequency. The solution is generally a sum of spherical bessels functions that gives the acoustic pressure at a given location of the 3D space. The Riemann zeta function used in analytic number theory and has applications in applied statistics, physics, probability theory. We have used numerical method of this functions and coded that using MATLAB.

Parabolic Function:
The parabolic cylinder functions (PCFs), also known as Weber parabolic cylinder functions. It is a class of functions sometimes called Weber functions. (1) The above two equations 1 and 2 are the even and odd equation respectively. Where, 1F1(a:b:z) is a confluent hypergeometric function of the first kind. We have checked the results for the values of N = 200, and parameter A = 1, 4, 6, and having step size -10 : 1e-3 : 10, -10 : 1e-4 : 10, -10 : 1e-5 : 10.

Bessel Function:
Bessel's function was first formulated by mathematician Daniel Bernoulli. Bessel's function is a solution of differential equation which is known as Bessel equation (3) For an arbitrary complicated range α, the order of the bessel function. Although α and −α produce the equal differential equation, it is conventional to define different bessel capabilities for those two values in this sort of manner that the bessel features are by and large easy capabilities of α. Bessel function are mainly important for most of the problems of wave propogation and in static potentials and in cylindrical co-ordinate systems. It is can be used to define the function by its series expansion around x = 0 Here, Γ(z) is the gamma function. Using the above equation 4 we have analysed the function and obtained the results. We have checked the results for the values of N=200, parameter Alpha = 0.9, 1.5, 1.9, and have plot against step size 0 : 1e-3 : 3, 0 : 1e-4 : 3, 0 : 1e-5 : 3.

Wright Function:
The Wright function is defined by (5) Where, Γ(z) is the gamma function. It became added for the primary time via E.M. Wright in [8] in connection together with his investigations inside the asymptotic principle of partitions. The Wright function characteristic belongs to the magnificence of the hypergeometric capabilities; it may be reduced to the recognized special function for some values of the parameters ρ and β because of the relation. The Wright function along with the Mittag-Leffler function plays a prominent role in the theory of the partial differential equations of the fractional order. Using the above equation 5 we have analysed the function and and computed results for the values N=200, and have plot against step size 0 : 1e-2 : 3, 0 : 1e-3 : 3, 0 : 1e-4 : 3.

Riemann Zeta Function :
Riemann expanded the definition of Euler (s) function in all complex numbers (except for a simple pole in s = 1 and one remainder). Euler's product description for this function is still valid if the actual s is more than one. To help understand the values of some of the more complex numbers, Riemann found the performance efficiency of the Riemann zeta function: Using the above equation 6 we have analysed the function and results for the values of N=200, and have plot against step size: 1e6, 1e4, 1e5.

Design and Implementation
While changing over MATLAB code to run on the GPU, it is best to begin with a correct and optimized MATLAB code. The code should provide the required results accurately. For CPU programming, vectorization is considered to be a great practice, whereas for GPU, vectorization is essential for achieving high execution. Subsequently it is essential to vectorize the code by replacing looped scalar operations with MATLAB matrix and vector operations. Profiling gives accurate information about the time taken by each line or section of the code. The Run & Time include of MATLAB is an viable tool which profiles the code while executing it.
The section of code that the profiler appears taking the maximum execution time on the CPU are the ones that we need to focus on. Writing the GPU code utilizing the functions in MATLAB which are overloaded to run on GPU is very useful. The variables or data required for execution on GPU device is to begin with transferred to device memory using gpuArray() function. The capacities with these variables as arguments are specifically executed on GPU. After the handling on GPU, the results are replicated back utilizing gather() function.
Following algorithm is used for converting the MATLAB code for GPU execution: 1. Start with the code that gives correct results. 2. Make the vectorization code of the function. 3. Run the code using Run & Time Function. 4. Analyze and Convert the section that takes more time to execute on GPU. 5. Transfer the variables used in the section described in step #4 to GPU memory using GPU Array function. 6. After GPU execution, transfer the output variables back to CPU memory using gather function. 7. After receiving the gathered data from GPU, rest of the execution is performed on CPU.
To calculate the speed up use the following equation

Result and Analysis
First we have written sequential code using the definition of functions. After checking the correctness of the code we have profiled the sequential code using MATLAB profiler. Then we have observed the section of code which consume more time for executions, that section we have vectorized using Parallel Computing toolbox of MATLAB. Output of vectorized code is also benchmark with sequential code for verifying correctness of code. Finally we have converted the vectorized code to GPU code by transferring data to GPU memory using inbuilt function of MATLAB like gpuArray(). Fig 1, Fig 2, Fig  3 and Fig 4 shows the output benchmarking plots of different functions. From the benchmarking plots it is clear that output obtained from vectorized code is same as the output obtained from GPU code.

Conclusion
The graphical processing unit is visually and visibly changing the path of standard cause computing. In this work, different special mathematical functions are computed using Graphical processing unit (GPU). While computing special function we have observed acceleration because of multiple processing cores of graphics processor. Special codes for these functions were developed. Then using Parallel Computing Toolbox of MATLAB vectorized version of these codes are developed and computed using GPU hardware. Thus we can observe that the special mathematical functions can be successfully implemented on GPU.