Some properties of various types of matrix factorization

Matrix factorizations, or matrix decompositions, are methods that represent a matrix as a product of two or more matrices. There are various types of matrix factorization, such as LU factorization, Cholesky factorization, and singular value decomposition (SVD). Matrix factorization is widely used in pattern recognition, image denoising, data clustering, and related areas. Motivated by these applications, we study some properties and applications of various types of matrix factorization. One purpose of matrix factorization is to ease computation; thus, we compare the computation times of various matrix factorizations on several different tasks.


Introduction
Matrix factorizations, or matrix decompositions, are methods that represent a matrix as a product of two or more matrices. Matrix factorization is usually used to simplify the computations in a problem that is relatively difficult to solve in its original form. We list the notations and definitions used in this paper in the next section.

Notations and definitions
Let m and n be positive integers and let M_{m,n}(F) denote the linear space of m × n matrices over a field F. We write M_n(F) for M_{n,n}(F). A^T stands for the transpose of A. If A = (a_{ij}) ∈ M_{m,n}(ℂ), then Ā = (ā_{ij}), where ā is the conjugate of a ∈ ℂ. Next, we state the definitions of the various types of matrix factorization. An LU factorization of a matrix A ∈ M_{m,n}(F) represents A as the product of a lower triangular matrix L ∈ M_m(F) and an upper triangular matrix U ∈ M_{m,n}(F), i.e. A = LU. A Cholesky factorization is a factorization of a positive-definite Hermitian matrix or a positive-definite symmetric matrix A into the form A = L L̄^T, where L is a lower triangular matrix with nonnegative diagonal entries. A QR factorization of a matrix A ∈ M_{m,n}(ℂ) of rank n is a factorization of the form A = QR, where Q ∈ M_m(ℂ) satisfies Q̄^T Q = I and R ∈ M_{m,n}(ℂ) is an upper triangular matrix. A singular value decomposition (SVD) of A ∈ M_{m,n}(ℂ) is a factorization A = UΣV̄^T, where U ∈ M_m(ℂ) and V ∈ M_n(ℂ) satisfy Ū^T U = I and V̄^T V = I, and Σ is an m × n diagonal matrix with nonnegative entries. Nonnegative matrix factorization (NMF) refers to a group of algorithms in which a nonnegative matrix A ∈ M_{m,n}(ℝ) is factorized as A ≈ WH, where W ∈ M_{m,r}(ℝ) and H ∈ M_{r,n}(ℝ) are two nonnegative matrices obtained by minimizing the distance between A and WH, min ‖A − WH‖².
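As a quick numerical illustration of these definitions, the sketch below (using NumPy and SciPy on a randomly generated real matrix; the variable names are ours) computes each factorization and verifies its defining properties. Note that SciPy's LU routine includes a permutation matrix P from partial pivoting, so it returns A = PLU:

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# LU (with the permutation from partial pivoting): A = P L U
P, L, U = lu(A)
assert np.allclose(L, np.tril(L)) and np.allclose(U, np.triu(U))
assert np.allclose(P @ L @ U, A)

# QR: A = Q R with Q orthogonal (Q^T Q = I) and R upper triangular
Q, R = np.linalg.qr(A)
assert np.allclose(Q.T @ Q, np.eye(4)) and np.allclose(R, np.triu(R))

# SVD: A = U S V^T with nonnegative singular values on the diagonal of S
Us, s, Vt = np.linalg.svd(A)
assert (s >= 0).all() and np.allclose(Us @ np.diag(s) @ Vt, A)

# Cholesky: requires a positive definite Hermitian matrix, A = L L^T
S_pd = A @ A.T + 4 * np.eye(4)   # shift guarantees positive definiteness
C = np.linalg.cholesky(S_pd)
assert np.allclose(C @ C.conj().T, S_pd)
```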

Some properties and applications of matrix factorizations
The LU factorization method was introduced by the mathematician Tadeusz Banachiewicz in 1938 [1]. In 2014, Sudipto and Anindya [2] proved that a non-singular matrix A ∈ M_n(F) has an LU factorization if and only if all its leading principal submatrices are non-singular. They also proved that the LU factorization of a non-singular matrix is unique. LU factorization is useful in solving linear systems. As stated by Molala [3], matrix factorization helps to ease certain matrix operations by breaking the matrix down into smaller pieces. Let A be a non-singular matrix with A = LU. Then det(A) = det(LU) = det(L) det(U). Since L and U are both triangular matrices, their determinants can be found by taking the product of the diagonal entries of the respective matrix. Similarly, if the inverse of A is to be found, instead of inverting A directly, one may find the inverses of L and U, which is much easier. Hence, A^{-1} = U^{-1} L^{-1}.
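The determinant and inverse computations above can be sketched as follows. This is a minimal example with a small made-up matrix; SciPy's LU routine performs partial pivoting, so the factorization is A = PLU with a permutation matrix P, and det(A) = det(P) det(U):

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[4.0, 3.0], [6.0, 3.0]])
P, L, U = lu(A)  # scipy returns A = P @ L @ U (partial pivoting)

# det(A) = det(P) det(L) det(U); L has unit diagonal, so det(L) = 1,
# and det(U) is the product of its diagonal entries
det_A = np.linalg.det(P) * np.prod(np.diag(U))
assert np.isclose(det_A, np.linalg.det(A))

# inverse via the factors: A^{-1} = U^{-1} L^{-1} P^T
A_inv = np.linalg.inv(U) @ np.linalg.inv(L) @ P.T
assert np.allclose(A_inv @ A, np.eye(2))
```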
Cholesky factorization was developed by André-Louis Cholesky, a French military officer and mathematician [4]. He used the factorization in his surveying work. Cholesky factorization is a factorization of a positive-definite Hermitian matrix or a positive-definite symmetric matrix A into A = L L̄^T. [2] mentioned that if A is a positive definite symmetric matrix, a Cholesky factorization of A always exists. In many econometric contexts, the matrices used are mostly positive definite symmetric matrices. An algorithm for finding the inverse of a matrix by using Cholesky factorization was proposed by Krishnamoorthy and Menon in 2013 [5]. The algorithm reduces the number of operations by 16% to 17%, achieved by avoiding the computation of some known intermediate results. To find the inverse of a positive definite matrix A ∈ M_n(ℝ), [5] let X ∈ M_n(ℝ) be such that X = A^{-1}. Writing A = L L^T, the matrix X satisfies L^T X = L^{-1}, and since L^T is upper triangular, backward substitution can be used to solve for X.
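A minimal sketch of inverting a positive definite matrix through its Cholesky factor, on a randomly generated symmetric positive definite matrix. This illustrates the idea of reusing the single factor L (so that only one triangular inverse is needed), not the specific operation-saving scheme of [5]:

```python
import numpy as np

# build a random symmetric positive definite matrix
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)

L = np.linalg.cholesky(A)        # A = L @ L.T with L lower triangular
assert np.allclose(L @ L.T, A)

# only L^{-1} is needed: A^{-1} = (L^T)^{-1} L^{-1} = (L^{-1})^T L^{-1}
L_inv = np.linalg.inv(L)
assert np.allclose(L_inv.T @ L_inv @ A, np.eye(4))
```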
Ye and Qin [6] used QR factorization to determine the number of hidden nodes in generalized single-hidden-layer feedforward networks. In [7,8], parallel QR factorization algorithms are used in shared memory, synchronous message passing, and asynchronous message passing settings. QR factorization is also used in subspace-based blind channel identification [9,10]. According to these papers, the QR factorization method is computationally more efficient because it requires fewer computations than SVD.
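As an illustration of why QR factorization is computationally convenient, the sketch below uses NumPy's reduced QR to solve a least-squares problem by back-substitution on the triangular factor R; the data are randomly generated and the variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))   # tall matrix of full column rank
Q, R = np.linalg.qr(A)            # reduced QR: Q is 5x3, R is 3x3 upper triangular
assert np.allclose(Q.T @ Q, np.eye(3))
assert np.allclose(Q @ R, A)

# least squares via QR: minimize ||Ax - b|| by solving R x = Q^T b
b = rng.standard_normal(5)
x = np.linalg.solve(R, Q.T @ b)
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```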
In 2009, Konda and Nakamura [11] introduced an algorithm for SVD in parallel computing. Besides parallel computing, SVD is also used in compression to obtain low bit rate and good quality image coding [12]. Sudipto and Anindya [2] proved that it is always possible to factorize a real matrix A ∈ M_{m,n}(ℝ) in the form A = UΣV^T, where U and V are real orthogonal matrices (U^T U = I and V^T V = I) and Σ, which is unique, is a diagonal matrix whose diagonal entries appear in decreasing order along the diagonal. In the example below, each row of a matrix A contains the ratings (1–5) of a person on five different movies; a rating of zero means the person has never watched the movie. We factorize A = UΣV^T. In this example, the first three movies are action movies while the other two are romance. The first column of U represents the action concept of the movies, as the first four persons tend to watch action movies. The second column represents the romance concept, so the last three rows of the second column have higher values. The matrix Σ represents the strength of the concepts; the fourth and fifth concepts have very low strength, and thus we can ignore them in our data. The matrix V is a movie-to-concept similarity matrix, and we can see that the first three movies correspond heavily to the action concept. From this example, we can see that SVD is good at reducing the dimension of a matrix while keeping the important information.
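The movie-rating example can be reproduced with a small hypothetical ratings matrix (the numbers below are made up for illustration). The first two singular values dominate, so a rank-2 truncation keeps the action and romance "concepts" while discarding the weak ones:

```python
import numpy as np

# hypothetical ratings matrix: rows = persons, columns = movies;
# the first three columns are action movies, the last two romance
A = np.array([
    [5, 5, 4, 0, 0],
    [4, 5, 5, 0, 0],
    [5, 4, 4, 0, 0],
    [4, 4, 5, 0, 0],
    [0, 0, 0, 5, 4],
    [0, 0, 0, 4, 5],
    [0, 0, 0, 5, 5],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U @ np.diag(s) @ Vt, A)   # exact reconstruction
assert np.all(s[:-1] >= s[1:])               # singular values decrease

# keep only the two strongest concepts
A2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]
rel_err = np.linalg.norm(A - A2) / np.linalg.norm(A)
assert rel_err < 0.15   # the rank-2 approximation is already close
```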
In data mining applications, the matrices obtained are often nonnegative. Thus, NMF is used to factorize a nonnegative matrix into two nonnegative matrices. NMF is not unique, and convergence is not guaranteed for all NMF algorithms. In 1999, Lee and Seung [13] investigated the properties of an algorithm for finding an NMF, after which NMF became more widely known. NMF and principal component analysis (PCA) are used to extract facial features [13]. A face image is represented by a matrix A and factorized as A ≈ WH, where W represents the basis components of a face and H represents the weightage of each component. Lee and Seung also published some simple and useful algorithms in [14], showing that NMF is able to learn to represent an object as a combination of its various parts. [15] mentioned that NMF is extensively used in machine learning: it is used to analyze high-dimensional data because it can extract sparse and meaningful features from nonnegative data vectors. A local coordinate-based graph regularized NMF method (LCGNMF), which induces sparse coefficients and takes the geometric structure of the data space into account, was proposed in [16].
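A minimal sketch of the multiplicative update rules popularized by Lee and Seung [13,14] for minimizing ‖A − WH‖²; the fixed iteration count, the small stabilizing constant, and the synthetic data are simplifying assumptions of ours:

```python
import numpy as np

def nmf_multiplicative(A, r, n_iter=2000, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates minimizing ||A - W H||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, r)) + 0.1   # positive initialization
    H = rng.random((r, n)) + 0.1
    for _ in range(n_iter):
        # each update multiplies by a nonnegative ratio, so W, H stay >= 0
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
A = rng.random((20, 2)) @ rng.random((2, 8))   # exact rank-2 nonnegative data
W, H = nmf_multiplicative(A, r=2)
assert W.min() >= 0 and H.min() >= 0
# the product approximates A, but the factorization is not unique
assert np.linalg.norm(A - W @ H) / np.linalg.norm(A) < 0.1
```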

Comparisons
In this section, we compare the effectiveness of various matrix factorizations in solving systems of linear equations, finding the inverse of a matrix, and image processing. We used Python to carry out these comparisons; the matrices are generated randomly using Python.

Solving systems of linear equations
Consider a system of linear equations with n equations and n unknowns, Ax = b, where A ∈ M_n(ℂ) is non-singular. There are many ways to solve such a linear system. It can be solved by finding the inverse of A and computing x = A^{-1}b. It can also be solved by using LU factorization, QR factorization, Cholesky factorization, or singular value decomposition (SVD). We compare the efficiency of the above-mentioned methods in solving the linear system, applying them when A is a real matrix, a Hermitian matrix, and a complex matrix.
The time (in seconds) taken to find x in the equation Ax = b using different types of matrix factorization is shown in Table 1. We first use Python to randomly generate a 1000 × 1000 non-singular matrix A and a column vector b with 1000 entries. Then, we use the different methods to solve the system of linear equations and record the computation time. The same process is repeated for the other types of matrices, namely Hermitian and complex matrices. Table 2 shows the results when A is a 1500 × 1500 non-singular matrix, whereas Table 3 shows the results when A is a 2000 × 2000 non-singular matrix. As defined earlier, Cholesky factorization applies to Hermitian matrices; thus, the method is used in the comparison for Hermitian matrices only. From the results shown in the three tables, we found that Cholesky factorization is the most efficient way of solving the linear system when A is Hermitian. When Cholesky factorization is not applicable, LU factorization is the fastest among the methods tested.
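The comparison can be reproduced in outline with SciPy's factorization-based solvers. The sketch below (on a smaller random system, without the timing harness) shows each route yielding the same solution:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve, cho_factor, cho_solve

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))  # a random real matrix is almost surely non-singular
b = rng.standard_normal(n)

# LU with partial pivoting (the route a general-purpose solver takes)
x_lu = lu_solve(lu_factor(A), b)
assert np.allclose(A @ x_lu, b)

# Cholesky applies only when the matrix is Hermitian positive definite
S = A @ A.T + n * np.eye(n)
x_chol = cho_solve(cho_factor(S), b)
assert np.allclose(S @ x_chol, b)

# solving via the explicit inverse gives the same answer but costs more
assert np.allclose(np.linalg.inv(A) @ b, x_lu)
```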
The time complexity shown in the three tables above is the computational complexity that describes the amount of time taken to run an algorithm; an algorithm with time complexity O(n^k) for some constant k is a polynomial time algorithm. If LU factorization is computed using Gaussian elimination, the time complexity is O(2n³/3). If QR factorization is computed using Householder reflections, the time complexity is O(4n³/3), which is two times slower than LU factorization. However, the times shown in the tables differ by more than a factor of two. This is because Python provides a built-in function that solves a linear system using LU factorization with partial pivoting. Solving a linear system via the inverse is much slower than solving it via LU factorization, as the results show. The reason is that finding the matrix A^{-1} amounts to solving the linear system AX = I for X ∈ M_n(F), in which more unknowns are involved, so more processing time is needed. The time complexity of Cholesky factorization is O(n³/3), which is two times faster than LU factorization.

Matrix inversion
In this section, we study the efficiency of LU factorization, QR factorization, Cholesky factorization and SVD in computing the inverse of A ∈ M_n(ℂ) when A is real or Hermitian. The following three tables show the time taken (in seconds) to find the inverse of a matrix A: Table 4 shows the results when A is 1000 × 1000, Table 5 when A is 1500 × 1500, and Table 6 when A is 2000 × 2000. Again, we observe that Cholesky factorization is the fastest among the tested factorization methods whenever it is applicable. One reason is that Cholesky factorization is two times faster than LU factorization based on the complexity of the two methods. Another reason is that if A is factorized using Cholesky factorization, i.e. A = L L̄^T, only L^{-1} needs to be found to obtain A^{-1} = (L̄^T)^{-1} L^{-1}. However, if A = LU, both L^{-1} and U^{-1} need to be found in order to obtain A^{-1} = U^{-1} L^{-1}. When compared with QR factorization, LU factorization is slightly faster in finding the inverse of A. Q is an orthogonal matrix with Q^{-1} = Q^T, which should ease the computation of A^{-1} = R^{-1} Q^{-1} = R^{-1} Q^T. However, based on the complexities of the LU and QR factorizations, computing a QR factorization takes longer than an LU factorization. The time taken for SVD is much longer, and hence SVD is not a good way of finding the inverse of a matrix.
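A sketch of the three factorization-based inversion routes on a small random symmetric positive definite matrix (so that all the methods apply); each route recovers the same inverse:

```python
import numpy as np
from scipy.linalg import lu, cholesky, qr

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)   # symmetric positive definite
A_inv = np.linalg.inv(A)      # reference answer

# Cholesky: A = L L^T, so A^{-1} = (L^T)^{-1} L^{-1}; one inverse needed
L = cholesky(A, lower=True)
Li = np.linalg.inv(L)
assert np.allclose(Li.T @ Li, A_inv)

# LU (with pivoting): A = P L U, so A^{-1} = U^{-1} L^{-1} P^T; two inverses
P, L2, U = lu(A)
assert np.allclose(np.linalg.inv(U) @ np.linalg.inv(L2) @ P.T, A_inv)

# QR: A = Q R with Q^{-1} = Q^T, so A^{-1} = R^{-1} Q^T
Q, R = qr(A)
assert np.allclose(np.linalg.inv(R) @ Q.T, A_inv)
```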

Image processing
SVD and NMF are commonly used in image processing. One reason is that when a matrix is factorized using SVD or NMF, the factor matrices can be truncated so that they are smaller than the original matrix, whereas in LU and QR factorizations the sizes of the factor matrices remain the same. Thus, in this section we only compare the speed of SVD and NMF in compressing an image.
Every image is made up of pixels, and each pixel of an RGB image is made up of three colors: red, green, and blue. An RGB image is split into three matrices, each representing one color; the entries of each matrix indicate the intensity of that color in the respective pixel. Since each color channel is 8-bit, each entry ranges from 0 to 255, and as all entries are nonnegative, NMF can be used. We consider an image with 512 × 512 pixels, as shown in Fig. 1, so each matrix has size 512 × 512. After the matrices of the image are extracted, we apply SVD or NMF to compress them. For a fair comparison, the matrix is compressed to the same size in both methods: the number of components in NMF is the same as the rank of the matrix kept in SVD. The compressed images with rank 80 are shown in Fig. 2. The results in Table 7 show that NMF needs more time than SVD to compress an image. In addition, the time taken for NMF increases much faster than for SVD as the number of components increases. The Frobenius norm is used to measure the distance between two matrices; the results in Table 8 show that SVD has lower approximation errors than NMF. However, SVD does not always preserve the nonnegativity of the matrix. Hence, NMF remains one of the most popular low-rank approximations used for image processing, text mining, and related tasks.
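A rank-k SVD compression of one color channel can be sketched as below, using a random stand-in matrix instead of the actual image (the Eckart–Young theorem guarantees that the truncated SVD is the best rank-k approximation in the Frobenius norm):

```python
import numpy as np

rng = np.random.default_rng(0)
# random stand-in for one colour channel (intensities 0-255)
img = rng.integers(0, 256, size=(256, 256)).astype(float)

def svd_compress(M, k):
    """Best rank-k approximation of M in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# a higher rank keeps more singular values, hence a smaller error
err40 = np.linalg.norm(img - svd_compress(img, 40))
err80 = np.linalg.norm(img - svd_compress(img, 80))
assert err80 < err40

# the rank-80 factors need far fewer numbers than the full matrix
assert 80 * (256 + 256 + 1) < 256 * 256
```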

Conclusion
This paper is an overview of various matrix factorizations. We have done three types of comparisons. These comparisons are the preliminary results of our future project. We shall continue this research by performing more comparisons.

Acknowledgement
The authors would like to express their sincere appreciation to Universiti Tunku Abdul Rahman (Malaysia) for all kinds of support in working out this paper.