Performance Assessment of Hetero-Junction Intrinsic Thin Film HIT Photovoltaic Module Using Machine Learning Methods

A solar cell built of ultra-thin amorphous silicon and high-quality mono-crystalline silicon is known as a hetero-junction intrinsic thin-film cell. It has a pyramid-textured front surface that increases sunlight absorption. The operating environment has a significant impact on the performance of hetero-junction intrinsic thin-film photovoltaic modules with real I–V (current-voltage) characteristics. Changes in the environment, including cloud cover, significantly affect the solar irradiation that a PV cell receives. In this project, we use the Random Forest Regression machine learning algorithm to investigate the effects of sudden changes in environmental conditions on the power output and module temperature of an HIT (Heterojunction with Intrinsic Thin Layer) module, where irradiance, temperature, and module efficiency parameters are taken into account when designing modules. The algorithm's output is studied to gain a better understanding of performance variations as well as the behavior of the power output and module temperature when subjected to random influences induced by various environmental variables. The suggested algorithm is not restricted to a certain module technology or geographic location.


INTRODUCTION
PV cells, also referred to as solar cells, are electrical components that generate electricity when they are exposed to photons, or light particles. This photovoltaic effect was discovered in 1839 by the French physicist Edmond Becquerel. PV modules started appearing on roofs towards the end of the 1980s. As the photovoltaic industry has flourished in the 21st century, the construction of vast solar farms has steadily increased installed photovoltaic capacity.
A photovoltaic cell is made up of several layers of materials, each with its own function. The specially treated semiconductor layer is the most crucial layer in a solar cell. It has two separate layers (p-type and n-type) and is responsible for converting the Sun's energy into usable power via the photovoltaic effect. A layer of conducting material is present on both sides of the semiconductor, which "collects" the electricity generated. Note that the back side of the cell, which is shaded, can be covered entirely by the conductor, whereas the front, which is illuminated, must use conductors sparingly to avoid blocking too much of the Sun's energy from reaching the semiconductor. The anti-reflection coating is the final layer, which is only applied to the illuminated side of the cell. Reflection loss can be severe because all semiconductors are naturally reflective. To limit the amount of solar radiation reflected off the cell's surface, one or more layers of an anti-reflection coating can be applied. Solar cells can be grouped into arrays. Homeowners have installed solar cells in considerably smaller designs on their rooftops, referred to as solar cell panels or just solar panels, to replace or supplement their normal energy source. The temperature of a solar photovoltaic (PV) module affects the module's efficiency, which in turn affects the overall power output of the PV system [5]. PV modules lose output power when they heat up, and degradation accelerates as a result. There is a minor increase in short-circuit current as the module temperature rises; nevertheless, the open-circuit voltage of the module drops dramatically, lowering the module's output power.
Wind is one of the most critical environmental factors affecting the temperature of the module, and its effect ultimately translates into the PV module's performance [3]. The PV module is more efficient when its temperature is lower. In addition to wind speed, the wind direction relative to the PV module's orientation determines how much the air circulation around the module reduces its temperature. Comparing parallel wind situations with perpendicular and low wind situations shows that, in parallel wind, the variation of the module temperature with ambient temperature and irradiance is nonlinear and quite irregular. The wind effect is more noticeable at lower ambient temperatures, and module temperature has a very poor association with irradiance levels. Furthermore, when the irradiance is high, the parallel wind direction affects the module temperature more than the other two wind situations. In addition, the area and period of contact between the module surface and the wind are reduced, lowering the overall cooling mechanism in a PV module. The data was measured using a Young wind sensor, model no. 05103 [14]. This model was developed considering how module temperature and several meteorological characteristics, including wind speed, wind direction, and in-plane irradiance, interrelate.
PV modules can be connected to create arrays with the desired peak DC voltage and current capacity, with or without independent MPPTs (maximum power point trackers) or module-level power electronics (MLPE) such as microinverters or DC-DC optimizers. In arrays with series/parallel-connected cells, shunt diodes help reduce shadowing power loss.
The data log from the National Institute of Solar Energy (NISE), Gurgaon, was analyzed in this paper. The experimental setup and data logging strategy have been described in [14].

LITERATURE SURVEY
Today, photovoltaic modules are commonly used. These modules are available in various shapes and sizes. The Hetero-Junction Intrinsic Thin Film module is one of them, and it is among the most modern and efficient photovoltaic modules currently available. Such modules are, nevertheless, still influenced by the conditions in which they operate. Wind speed, sunlight, weather, and other environmental conditions all have an impact on the module's performance. The temperature of the module is strongly affected by wind movement around it, which has an impact on the module's output power.
In prior work, "Wind Effect Modeling and Analysis for Estimation of Photovoltaic Module Temperature," Magare et al. [14] used the linear regression approach of machine learning to estimate module temperature and maximum power production. However, machine learning has advanced considerably in recent years, and researchers continue to develop new ways to improve prediction. They are also looking for methods to improve the module's efficiency in delivering power.

Random Forest Regression
Random forest regression is a supervised learning technique that uses ensemble learning for regression. An ensemble learning method combines the predictions of several machine learning models to make a more accurate prediction than any single model.

Fig. 1. Random Forest Regression
The structure of a Random Forest is depicted in Fig. 1. The trees run in parallel, with no interaction between them. When a Random Forest is trained, many decision trees are constructed, and the mean of all the trees' predictions is taken as the output. The steps of the Random Forest algorithm are as follows:
• Pick k data points at random from the training set.
• Build a decision tree from these k data points.
• Choose the number of trees N to build, and repeat steps 1 and 2 for each tree.
• For each new data point, have each of the N trees predict the value of y, and assign the new data point the average of all the predicted y values.
In terms of accuracy and power, Random Forest Regression stands out. It works well on a wide range of problems, including those with non-linear relationships. The ensemble of decision trees achieves a high level of accuracy because it exploits randomization on two levels:
• At each split, the algorithm selects a subset of features at random to be used as candidates. This prevents many decision trees from relying on the same set of features, which decorrelates the individual trees.
• Each tree is built from a random sample of data drawn from the training dataset. This adds a further layer of randomness, which reduces the tendency of the ensemble to overfit the data.
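The steps and the two levels of randomization described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the helper names (`build_tree`, `fit_forest`, `predict_forest`), the depth limit, and the minimum leaf size are all hypothetical choices made for readability.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean (impurity of a node)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, n_candidates, rng, depth=0, max_depth=4, min_leaf=5):
    """Grow a regression tree; every split considers a random subset of features."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)  # leaf: predict the average target of the node
    best = None
    for f in rng.sample(range(len(X[0])), n_candidates):  # random candidate features
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return mean(y)
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          n_candidates, rng, depth + 1, max_depth, min_leaf)

    return (f, t, grow(left), grow(right))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, k=None, n_candidates=1, seed=0):
    """Steps 1 to 3: build N trees, each from k randomly drawn data points."""
    rng = random.Random(seed)
    k = k or len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(k)]  # sampled with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_candidates, rng))
    return forest


def predict_forest(forest, row):
    """Step 4: average the predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)
```

A call such as `fit_forest(X, y, n_trees=15)` followed by `predict_forest(forest, row)` reproduces the train-then-average flow; production libraries add many refinements that this sketch omits.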

Working of PV Cell
A photovoltaic cell, as shown in Fig. 2, is composed of semiconductor materials that capture photons from the sun and generate an electrical charge [1]. Solar radiation arrives in the form of photons, elementary particles that travel at about 300,000 kilometers per second. When photons strike a semiconductor material such as silicon, electrons in its atoms are released, each leaving behind a vacancy, or "hole". The freed electrons wander until they find a new hole to fill.
To generate an electric current, however, the electrons must move in the same direction. Two differently doped forms of silicon are used to achieve this. The side facing outward is doped with phosphorus atoms, which have one more electron than silicon, whereas the side facing inward is doped with boron atoms, which have one fewer electron than silicon. With the sandwich constructed, the layer with excess electrons serves as the negative terminal (n), while the layer with an electron shortfall serves as the positive terminal (p). At the junction between the two layers, an electric field is formed.
When electrons are excited by photons, the electric field sweeps them to the n-side, while holes drift to the p-side [1]. The electrons and holes are directed to the electrical contacts on both sides, through which electrical energy is transferred to the external circuit. This results in a direct current. The top of the cell is coated with an anti-reflective material to reduce photon loss due to surface reflection.

Photovoltaic Cell Efficiency
Because the visible range of electromagnetic radiation contains most of the energy in sunshine and artificial light, a solar cell absorber should be effective at absorbing radiation at those wavelengths. Solar cells are rated according to their energy efficiency, that is, how much electricity they produce compared to the amount of light they receive. To test efficiency, the cells are wired together into modules, which are then assembled into arrays. The panels are placed in front of a solar simulator that reproduces optimal sunlight conditions: 1,000 watts (W) per square meter at 25 degrees Celsius. The electrical power generated under these conditions, known as peak power, is a fraction of the solar power received. The efficiency of a panel of one square meter that provides 200 W of electrical power is 20%. Single-junction PV cells have a theoretical maximum efficiency of roughly 33%. This limit is known as the Shockley-Queisser limit.
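The efficiency figure quoted above follows directly from dividing the electrical output by the incident solar power. The short sketch below reproduces that arithmetic; the function name `module_efficiency` is a hypothetical helper introduced only for illustration.

```python
STC_IRRADIANCE_W_PER_M2 = 1000.0  # test irradiance quoted in the text


def module_efficiency(peak_power_w, area_m2,
                      irradiance_w_per_m2=STC_IRRADIANCE_W_PER_M2):
    """Return efficiency as a fraction: electrical output / incident solar power."""
    incident_power_w = irradiance_w_per_m2 * area_m2
    return peak_power_w / incident_power_w


# The 1 square meter, 200 W panel from the text.
print(f"{module_efficiency(200.0, 1.0):.0%}")
```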
A solar cell's output, or how much electricity it can produce, is determined by several factors, including the solar radiation levels in the area, the efficiency of the cell, and the type of installation. In the Paris area, incident solar radiation is 1 megawatt-hour per square meter per year (MWh/sq.m/y), compared to 1.70 MWh/sq.m/y in southern France and approximately 3.0 MWh/sq.m/y in the Sahara Desert. In Paris, a solar panel with a 15% efficiency rating will create 150.0 kWh/sq.m/y, but in the Sahara, it will generate 450.0 kWh/sq.m/y [11].
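The yield figures above are the product of the annual incident solar energy and the panel efficiency. The sketch below repeats that calculation for the three sites named in the text; the function name is hypothetical, and real installations would also lose energy to factors (orientation, temperature, wiring) that this simple product ignores.

```python
def annual_yield_kwh_per_m2(insolation_mwh_per_m2_year, efficiency):
    """Electrical energy per square meter per year, in kWh."""
    return insolation_mwh_per_m2_year * 1000.0 * efficiency


# Annual incident solar radiation quoted in the text, in MWh per square meter.
sites = {"Paris": 1.0, "Southern France": 1.70, "Sahara": 3.0}

for name, insolation in sites.items():
    print(f"{name}: {annual_yield_kwh_per_m2(insolation, 0.15):.1f} kWh/sq.m/y")
```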

Structure of HIT
HIT is an abbreviation for Heterojunction with Intrinsic Thin-layer. Because Sanyo Corporation of Japan applied for HIT as a registered trademark, the technology is also known as HJT or SHJ (Silicon Heterojunction solar cell). Three layers of photovoltaic material make up heterojunction solar panels, as shown in Fig. 3. HJT cells combine crystalline and amorphous "thin-film" silicon technologies into a single device. The top layer of amorphous silicon captures sunlight before it reaches the crystalline layer, as well as light that reflects off the lower layers. The middle layer, monocrystalline silicon, is responsible for converting most of the sunlight into energy. Finally, there is an amorphous thin-film silicon layer behind the crystalline silicon [16]. The remaining photons that pass through the first two layers are captured in this final layer. When these technologies are used together, more energy can be gathered than if they were used separately, with efficiencies of 25% or higher. The basic structure of an HIT solar cell is characterized by a p-i type a-Si:H film (film thickness 5-10 nm) on the light irradiation side and an n-type a-Si:H film (film thickness 5-10 nm) sandwiching a crystalline silicon wafer, with transparent electrodes and collector electrodes formed on both sides to form an asymmetrical HIT solar cell. HIT combines the best features of crystalline silicon with those of amorphous silicon thin film to create a high-power hybrid cell that outperforms the industry's standard, PERC. There is the potential for significant cost reductions because the HIT manufacturing process requires four fewer steps than PERC technology.

Fabrication of HIT Cells
The fabrication order differs from one group to the next. The absorber layer of HIT cells is usually made of high-quality CZ/FZ grown c-Si wafers. The surface of the wafer is textured with alkaline etchants such as NaOH or (CH3)4NOH to generate pyramids 5-10 µm in height. After that, peroxide and HF solutions are used to clean the wafer. This is followed by the deposition of an intrinsic a-Si passivation layer, which is usually done by PECVD or hot-wire CVD. For deposition, silane gas (SiH4) is diluted with H2. The deposition temperature and pressure are 300°C and 0.1-1 Torr, respectively.

Fig. 4. Fabrication of HIT PV Cell
To avoid the growth of defective epitaxial Si, this stage must be precisely controlled. Deposition and annealing cycles, as well as H2 plasma treatment, have been shown to provide excellent surface passivation. The p-type a-Si layers are deposited by mixing diborane with silane, whereas the n-type a-Si layers are deposited by mixing phosphine gas with silane. When doped a-Si layers are deposited directly on a c-Si wafer, very poor passivation properties are observed. Dopant-induced defect formation in the a-Si layers is most likely to blame. Because a-Si has a high lateral resistance, bifacial designs typically have a transparent conductive oxide (TCO) layer, commonly Indium Tin Oxide (ITO), on top of both the front and back a-Si layers.

DESIGN METHODOLOGY
This project proposes a prediction model using the Random Forest Regression machine learning algorithm that can predict the module temperature and maximum power output of a given photovoltaic module under different environmental conditions. A data log consisting of more than 18,000 data points covering a whole year was used. Of these, 80% of the data points were used for training the model and 20% for testing.
The data set consists of parameters such as short-circuit current (Isc), open-circuit voltage (Voc), current at maximum power (Ipmax), voltage at maximum power (Vpmax), maximum power (Pmax), module temperature (Temp_mid_avg), solar irradiance (Gt), wind speed (WS) and ambient temperature (Tamb). The parameters were measured at intervals of 10 minutes. Of these parameters, Temp_mid_avg and Pmax are the dependent parameters. Gt, WS and Tamb are the independent parameters for predicting Temp_mid_avg. Isc and Voc are the independent parameters for predicting Pmax. The temperature coefficient of maximum power for the HIT module is -0.33 %/℃ [14]. The linear regression machine learning algorithm has been used before, but it is a very basic algorithm and is limited to linear relationships. It also does not work efficiently with large data sets. The PV module's measured maximum output power was compared to the predicted maximum output power, which was obtained using Random Forest Regression. Random forest uses an ensemble technique, which means that it combines multiple models.
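To make the meaning of the temperature coefficient concrete, the sketch below applies it in the commonly used linear derating form, in which maximum power changes by a fixed percentage per degree of deviation from a reference temperature. The linear form, the 25 °C reference temperature, and the 200 W reference power are assumptions made for illustration; only the coefficient of -0.33 %/℃ comes from the text.

```python
GAMMA_PMAX_PCT_PER_C = -0.33  # temperature coefficient of Pmax for the HIT module [14]
T_REF_C = 25.0                # assumed reference module temperature


def pmax_at_temperature(pmax_ref_w, module_temp_c,
                        gamma_pct_per_c=GAMMA_PMAX_PCT_PER_C, t_ref_c=T_REF_C):
    """Scale a reference Pmax linearly with the deviation from the reference temperature."""
    return pmax_ref_w * (1.0 + gamma_pct_per_c / 100.0 * (module_temp_c - t_ref_c))


# Hypothetical module rated at 200 W, operating at a module temperature of 60 deg C.
print(round(pmax_at_temperature(200.0, 60.0), 1))
```

Under these assumptions a 35 ℃ rise costs 35 × 0.33% = 11.55% of the reference power, which is why accurate module temperature prediction matters for power estimation.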

Required Libraries
Various libraries were used to build the prediction model. Pandas is used for data processing and manipulation. Scikit-learn (sklearn) is used for training and testing the model. Matplotlib and seaborn are used for data visualization.

Data Visualisation
Graphs have been plotted to compare the various parameters. Using this comparison, correlations between the parameters were observed.
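One common way to quantify the correlations that such plots reveal is the Pearson correlation coefficient. The sketch below computes it in plain Python; the readings are invented for illustration and are not taken from the NISE data log, and the choice of Pearson correlation is an assumption rather than a method stated in this work.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical readings, for illustration only.
irradiance = [200.0, 400.0, 600.0, 800.0, 1000.0]   # W per square meter
module_temp = [28.0, 33.5, 40.0, 45.5, 52.0]        # deg C

print(f"r(irradiance, module temperature) = {pearson_r(irradiance, module_temp):.3f}")
```

A value near +1 or -1 indicates a strong linear association, while a value near 0 indicates that a linear model is unlikely to capture the relationship.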

Training and testing
The Random Forest regression model operates by building several decision trees at training time, and the final output is the mean of the predictions of all the decision trees. It picks random data points from the data set and builds a decision tree from the picked points. The number of trees is then chosen, and the same process of picking random data points and building decision trees is repeated for each tree.
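The training and testing workflow can be sketched with scikit-learn as follows. Because the NISE data log is not reproduced here, the sketch generates synthetic data: the formula linking irradiance, wind speed, and ambient temperature to module temperature is invented purely so that the example runs, and the hyperparameters (`n_estimators=100`, `random_state=0`) are assumptions rather than the settings used in this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the data log (NOT the real measurements).
rng = np.random.default_rng(0)
n = 2000
G_t = rng.uniform(0.0, 1000.0, n)    # in-plane irradiance, W per square meter
WS = rng.uniform(0.0, 10.0, n)       # wind speed, m/s
T_amb = rng.uniform(15.0, 40.0, n)   # ambient temperature, deg C
# Assumed non-linear relation, invented for this sketch only.
temp_mid_avg = T_amb + 0.035 * G_t * np.exp(-0.3 * WS) + rng.normal(0.0, 0.5, n)

X = np.column_stack([G_t, WS, T_amb])

# 80% of the data points for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, temp_mid_avg, test_size=0.2, random_state=0
)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))
    print(f"{name}: RMSE = {rmse[name]:.2f} deg C")
```

Any ordering of the two RMSE values printed here reflects only the synthetic relation above and says nothing about the measured data.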

RESULTS AND OBSERVATIONS
To predict the maximum power and temperature of the module, the prediction model was applied to the data log from the National Institute of Solar Energy (NISE), Gurgaon. Detailed information about the experimental setup and data logging strategy is presented in [14]. The predicted values were compared to the actual values given in the data set by plotting "predicted vs actual value" curves. The difference between the values, i.e., the error, was calculated. RMSE values for each of the dependent parameters were also calculated. As can be seen from Fig. 5 and Fig. 6, random forest regression is much more accurate and efficient in predicting the module temperature than linear regression. Similarly, Fig. 7 and Fig. 8 show that random forest regression is also more accurate and efficient in predicting the maximum power output of the module.
The temperature of the module has a non-linear relationship with its independent parameters, so its RMSE value is higher than that of maximum power, which has a linear relationship with its independent parameters. The RMSE values for linear and random forest regression are compared in Table 1. As is apparent from the table, random forest regression has a much lower RMSE value than linear regression. Thus, random forest provides much better prediction and is more efficient.
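For reference, RMSE is the square root of the mean of the squared differences between actual and predicted values. The sketch below computes it in plain Python; the sample values are hypothetical and are not results from this work.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical module temperatures in deg C, for illustration only.
actual = [41.0, 45.5, 50.0, 38.0]
predicted = [40.0, 46.5, 52.0, 38.0]
print(round(rmse(actual, predicted), 3))
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones, so a lower RMSE indicates predictions that stay consistently close to the measurements.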

CONCLUSION
This project assessed PV module temperature and maximum power. Using the random forest regression machine learning model, a simple model for predicting PV module temperature and maximum power has been developed that considers wind speed and direction, in-plane irradiance, and module efficiency. Overall, the model's results show a good match between actual and predicted module temperatures and maximum power outputs for HIT PV cells. The methodology given in this project can be used to anticipate module temperatures of HIT PV cells under varying weather conditions with great consistency. The RMSE values of linear regression and random forest regression were compared in this project. Linear regression was observed to have a much higher RMSE value than random forest regression, which makes the latter more accurate and efficient in predicting module temperature and maximum power. Actual vs predicted graphs were also plotted, and random forest regression was again more accurate.