Comprehensive Review of Deep Learning Techniques in Electronic Medical Records

Patient health care data, such as diagnosis history, treatment details, and medical prescriptions, are collected and stored electronically. This electronic patient health record (EPHR) model provides a huge volume of real-time data that is used for clinical research. Natural language processing (NLP) automatically retrieves patient information for use in decision support systems. NLP draws on traditional machine learning techniques and deep learning algorithms, focusing on word embeddings, classification and prediction, extraction, knowledge graphs, phenotyping, etc. Extracting and analysing information from clinical data with NLP techniques provides valuable patient medical information. Clinical NLP systems are evaluated on document-level annotations, which cover the patient report and the patient's health status; document section types, such as past medical history and discharge summary; and semantic properties, such as the severity of disease expressed as positive or negative. These annotations are developed and implemented at the word level or sentence level. In this survey article, we summarize the recent NLP techniques used in EPHR applications. The survey focuses on prediction, classification, extraction, embedding, phenotyping, multilingual techniques, etc.


I Introduction
In recent years, the use of digital data has increased rapidly, and it has become necessary to use these data in various research fields. The digital collection of patients' health care data, past medical history, treatment details, observations of the patient's health status, etc., stored electronically, is called the Electronic Patient Health Record (EPHR) [1]. Data stored in the EPHR come in various formats, both structured and unstructured [2]. Structured EPHR data contain heterogeneous data such as medications and disease diagnoses. Unstructured data contain discharge summaries, medical notes, etc. [3]. NLP processes and analyses text data and extracts clinical text information for different applications such as data mining, prediction of various chronic diseases, and analysis of diseases and their side effects. NLP based on deep learning and machine learning techniques produces better performance on clinical text in health care and biomedicine [4]. For information based on patients' health records, NLP is a research field at the intersection of medical informatics, biomedicine, and computational linguistics [5].
The Electronic Patient Health Record (EPHR) is the main source of data about a patient's treatment details, diagnostic assessments, and health history in clinical care. This source includes both structured and unstructured data. Laboratory test results, signs of disease, and procedures to be conducted in analysing the disease are considered structured data, whereas observations of the patient's health issues and treatment plans are considered unstructured data [6]. Disease prediction from EPHR data has mainly focused on structured data and on detecting the presence or absence of disease [7]. Structured data alone, however, is not enough to identify a patient's health status. Along with structured data, the patient's age, past history, and other related information are needed [8].
Clinical NLP requires an automated system that extracts patient information for detecting disease. Such a system extracts only the disease status and medication from the clinical record. Consequently, the information must be pre-processed before extraction, because decision support and summarization cannot be performed on raw input data. Pre-processing analyses the structure of the document by tokenization, sentence splitting, spell checking, part-of-speech tagging, word sense disambiguation, and so on [9]. Different extraction techniques are available in clinical NLP models: pattern matching, machine learning, deep learning, statistical techniques, and rule-based techniques. Analysis of the extracted information supports the patient's clinical care and improves automated decision support systems [10].
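As an illustration of the first two pre-processing steps, the following minimal Python sketch splits a clinical note into sentences and tokens using regular expressions. The function names, the example note, and the splitting rules are assumptions made for this illustration; they are not taken from the surveyed systems, and a production pipeline would also need to handle abbreviations such as "Dr." that this naive splitter would break on.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by
    whitespace and a capital letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Return word, number and punctuation tokens in order of appearance."""
    pattern = r"[A-Za-z]+(?:-[A-Za-z0-9]+)*|\d+(?:\.\d+)?|[^\sA-Za-z0-9]"
    return re.findall(pattern, sentence)


# Hypothetical clinical note used only for demonstration.
note = "Patient denies chest pain. BP was 120/80 today. Started metformin 500 mg."
sentences = split_sentences(note)
tokens = [tokenize(s) for s in sentences]
```

Later steps such as part-of-speech tagging or spell checking would take these token lists as input.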
The paper is organized as follows: Section 2 discusses preliminaries, Section 3 describes various tasks in health care using NLP, Section 4 discusses methods in health care using NLP, Section 5 covers applications of NLP in the health care domain, Section 6 discusses frameworks of NLP, Section 7 discusses challenges in EPHR, Section 8 describes the datasets, and Section 9 concludes the paper with future work.

1) Sentiment Analysis
In natural language processing, sentiment analysis, also known as opinion mining, extracts subjective information from text. Its purpose is to determine the strength of sentiment in the text, which is then used in decision making. Sentiment is characterized by subjectivity polarity and by strength of polarity. Polarity is determined as positive or negative. Strength of polarity is determined as strongly positive, mildly positive, weakly positive, etc., for example for a review document.
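The distinction between polarity and strength can be sketched with a small lexicon-based scorer. The lexicon, its scores, and the strength thresholds below are hypothetical values chosen for illustration only; practical systems rely on curated sentiment lexicons or trained classifiers.

```python
# Hypothetical toy lexicon: word -> polarity score in [-1, 1].
LEXICON = {
    "improved": 0.6, "stable": 0.3, "excellent": 0.9,
    "pain": -0.5, "worse": -0.7, "severe": -0.8,
}


def sentiment(text):
    """Return (polarity, strength) from the mean lexicon score of the text."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    if not scores:
        return ("neutral", "none")
    mean = sum(scores) / len(scores)
    polarity = "positive" if mean > 0 else "negative" if mean < 0 else "neutral"
    magnitude = abs(mean)
    # Assumed thresholds separating strong, mild and weak sentiment.
    strength = "strong" if magnitude >= 0.7 else "mild" if magnitude >= 0.4 else "weak"
    return (polarity, strength)
```

For instance, a text containing only the word "excellent" from this lexicon is scored as strongly positive, while a text with no lexicon words is reported as neutral.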

2) Information Retrieval
Information retrieval returns the relevant information from a dataset for a given query. An information retrieval model searches a collection of natural language documents and retrieves the set of documents that match the user's query text.
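One common way to realize such a model, used here purely as an assumed example and not as the method of any cited work, is to weight terms by TF-IDF and rank documents by cosine similarity to the query. The documents, the smoothing in the IDF formula, and all function names in the sketch below are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    cleaned = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in cleaned if w]


def build_index(docs):
    """Return (idf, vectors): smoothed IDF weights and TF-IDF document vectors."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, top_k=2):
    """Return up to top_k (doc_index, score) pairs with non-zero similarity."""
    idf, vectors = build_index(docs)
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(i, score) for score, i in scored[:top_k] if score > 0]


# Hypothetical document collection used only for demonstration.
docs = [
    "Patient reports chest pain and shortness of breath.",
    "Discharge summary: diabetes managed with metformin.",
    "Follow-up visit for chest infection.",
]
results = search("chest pain", docs)
```

In this toy collection the first document matches both query terms and ranks above the third, which matches only "chest"; the second document shares no term with the query and is not returned.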

3) Extracting information in NLP
In NLP, information extraction is a powerful concept that makes it possible to parse through textual information. Textual data contains a huge amount of information, but not all of it is needed for processing. Therefore, it is necessary to extract the specific textual information and to establish the relationships within the textual data.

4) Machine Translation
In NLP, machine translation (MT) converts text or speech from one natural language into another while preserving the meaning of the input text. MT is a subfield of artificial intelligence.

5) Question Answering
Question answering focuses on building models that automatically answer questions asked by humans in natural language.

6) Natural Language Processing in Healthcare Domain
Natural language processing (NLP) faces many challenges in the health care domain. A huge amount of digital information related to health care is available on the internet, including e-health records, publications on medicine and symptoms, and treatments for diseases. A critical issue is that much of this digital information is in free-text form rather than a pre-structured format. Entering a patient's health history as text, and then controlling and using that information in health research, is tedious. NLP is therefore needed to convert the textual information of the health care domain into a structured format that can be used in computer applications [12]. Once all medical information is in a structured format, it is easy to use, saving time and reducing cost. This medical information is stored electronically in health care records (EPHR). Physicians have also adopted EPHR because of the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 [13]. According to a study from the Office of the National Coordinator for Health Information Technology (ONC), a basic EPHR is used in 84% of hospitals, a 9-fold increase since 2008. Furthermore, the use of EPHR by office-based physicians has increased from 44% to 88%. The patient details stored in the EPHR model include disease diagnoses, demographic information, laboratory reports, drugs, clinical notes, radiological images, etc.
The use of EPHR has increased in both hospital and outpatient care. The main benefits of EPHR in the health care domain are enhancing patient care, minimizing the error rate, improving treatment quality efficiently, and providing rich medical information to researchers [14]. Because the EPHR model stores such varied types of information, some of it is difficult to handle, for example numerical quantities, time series, and date/time information. These complicated kinds of information in the EPHR are given below:

a. Numerical form of Quantities
Numerical quantities include, for example, the body mass index value.

b. Date/Time information
This information relates to the patient's date of birth and the date and time of the patient's admission.

c. Natural Language Free Text of information
This information includes the health status of the patient, discharge summaries, etc. It is collected and stored in the EPHR in chronological order.
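The three kinds of information listed above can be pictured together in one record structure. The following Python sketch is a hypothetical data model written for this illustration (the class name, field names, and example values are assumptions, not an EPHR standard): it holds numerical quantities from which the body mass index is derived, date/time fields, and free-text notes kept in chronological order.

```python
from dataclasses import dataclass, field
from datetime import date, datetime


@dataclass
class PatientRecord:
    """Hypothetical minimal EPHR entry mixing numeric, date/time and free-text data."""
    weight_kg: float
    height_m: float
    date_of_birth: date
    admitted_at: datetime
    notes: list = field(default_factory=list)  # list of (timestamp, text) pairs

    @property
    def bmi(self):
        """Body mass index: weight in kilograms divided by height in metres squared."""
        return self.weight_kg / self.height_m ** 2

    def age_at_admission(self):
        """Age in whole years on the admission date."""
        d = self.admitted_at.date()
        had_birthday = (d.month, d.day) >= (self.date_of_birth.month, self.date_of_birth.day)
        return d.year - self.date_of_birth.year - (0 if had_birthday else 1)

    def add_note(self, when, text):
        """Store a free-text note and keep the notes in chronological order."""
        self.notes.append((when, text))
        self.notes.sort(key=lambda n: n[0])


# Hypothetical patient used only for demonstration.
record = PatientRecord(70.0, 1.75, date(1980, 6, 15), datetime(2023, 3, 1, 10, 30))
record.add_note(datetime(2023, 3, 2, 9, 0), "Condition stable.")
record.add_note(datetime(2023, 3, 1, 11, 0), "Admitted with chest pain.")
```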
This section presents a survey of various techniques used in EHR analysis. It also shows how NLP plays a vital role in EHR practice. Table 1 presents a survey of various papers on EHR data extraction.

Table 1. Survey of papers on EHR data extraction (fields per row: reference; area; technique; limitation)
... Biomedical Entity recognition models; The data analysis has more prediction error
Shinyama et al. [24]; Language analysis; Human language processing; Less efficiency and less robustness
Rink et al. [25]; EHR clinical analysis; Automatic data extraction; Semantic analysis is missing
Si et al. [27]; Cancer analysis; ML and NLP text extraction; Accuracy can be improved
Xu et al. [29] and Aramaki et al. [30]; Drug analysis; Drug label extraction and clinical drug analysis; Need of more efficiency in text analysis
Bethard et al. [32]; Clinical data extraction; Semantic data analysis; Analysing a single text takes more run time

EHR analysis requires NLP and deep learning models for effective data analysis and classification. Recently, artificial intelligence and deep learning techniques have come to play a vital role in this analysis. In the clinical health care domain, NLP includes several tasks: detection of adverse drug events (DADE), extracting the information (EI), recognition of name entity (RNE), clinical relation extraction (CRE), and disambiguation of word sense.

A) Detection of Adverse Drug Events (DADE)
Adverse drug events arise from medical interventions involving medicines, such as wrong diagnosis, prescription mistakes, overdosage, and allergic reactions to medicines [15].
Detection of adverse drug events benefits both medical research and the treatment given by hospitals. The relevant information in the EPHR is hidden in unstructured data such as clinical notes, the patient's medical history, treatment procedure notes, discharge summaries, and laboratory test reports [16][17][18].
Identifying and detecting DADE-related information in clinical notes is difficult and time-consuming. NLP is therefore needed to develop automatic EPHR-based models for detecting drug activity and adverse drug events, together with their interactions with the patient [19].

B) Extracting the Information (EI)
Information extraction is essential in the health care domain: NLP uses the EPHR to support clinical decision making, improve quality, and enable research based on clinical information. In the medical domain, EI automatically extracts and encodes clinical information from clinical notes. In the general domain, EI is a recognized specialization area of NLP that automatically extracts concepts, events, and entities, along with the relations between their attributes, from textual data [20].

C) Recognition of name entity (RNE)
RNE is a subtask of information extraction (EI) and plays a vital role in health care NLP. It converts unstructured text into structured information that is easily readable by a computer [21]. The aim of the RNE task is to identify expressions in the text of clinical notes that denote entities such as medications, lab tests, and diseases. Techniques used for RNE include deep learning, rule-based models, dictionary-based models, hybrid approaches, and statistical approaches [22,23].
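Of the technique families listed, the dictionary-based model is the simplest to sketch. The example below assumes a tiny hand-made dictionary (the terms and labels are hypothetical) and marks every occurrence of a dictionary term in a note with its entity type and character offsets; real systems draw on large clinical vocabularies and handle spelling variants, which this sketch does not.

```python
import re

# Hypothetical mini-dictionary mapping entity types to surface terms.
DICTIONARY = {
    "MEDICATION": ["metformin", "aspirin"],
    "LAB_TEST": ["hba1c", "blood glucose"],
    "DISEASE": ["type 2 diabetes", "hypertension"],
}


def recognize_entities(text):
    """Return (start, end, matched_text, label) tuples sorted by position."""
    entities = []
    for label, terms in DICTIONARY.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                entities.append((m.start(), m.end(), m.group(0), label))
    return sorted(entities)


# Hypothetical clinical sentence used only for demonstration.
note = "Type 2 diabetes noted; HbA1c ordered and metformin started."
entities = recognize_entities(note)
```

The output turns the free-text sentence into structured tuples, which is the conversion from unstructured to structured information described above.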

D) Clinical Relation Extraction (CRE)
CRE is also a subtask of information extraction (EI). It focuses on identifying and detecting semantic relationships between clinical concepts in clinical notes [24,25]. For example, a clinical note may state that the patient's MRI report reveals cord compression in a C5-6 disc herniation. The report mentions two problems, cord compression and C5-6 disc herniation, and the note relates them to one another clinically. This is a relation between one disease and another drawn from a single report. Existing research covers various types of relations, such as extraction of disease-attribute pairs and identification of temporal relations [26][27][28][29][30]. Clinical NLP has been advanced by shared tasks on clinical notes, such as the Informatics for Integrating Biology and the Bedside (i2b2) challenges [31] and the Semantic Evaluation (SemEval) challenges [32].

E) Disambiguation of Word Sense concept
In NLP, word sense disambiguation is the process of determining which meaning of a word is activated by a particular context. That is, it automatically determines the correct meaning of the word in that context. In the health care domain, NLP tasks require the correct meaning of ambiguous words. Word sense disambiguation starts from the list of all possible meanings of a word in the health care domain. Lexical ambiguity and syntactic or semantic ambiguity are common problems in NLP. Resolving semantic ambiguity is termed word sense disambiguation, and it is more difficult than resolving syntactic ambiguity. A word's syntactic ambiguity can be resolved by accurate part-of-speech (POS) tagging [33][34][35][36][37].
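One classical way to choose among the listed meanings, shown here as an assumed illustration and not as the approach of the cited works, is a simplified Lesk procedure: pick the sense whose definition shares the most content words with the context. The sense inventory, glosses, and stop-word list below are hypothetical.

```python
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "for", "with", "or", "has"}

# Hypothetical sense inventory for the ambiguous clinical word "cold".
SENSES = {
    "cold": {
        "common_cold": "viral infection of the nose and throat with cough and congestion",
        "low_temperature": "sensation of low temperature in the skin or extremities",
    }
}


def content_words(text):
    """Lowercased words without punctuation, stop words or empty strings."""
    words = {w.strip(".,;:").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def disambiguate(word, context):
    """Simplified Lesk: return the sense whose gloss overlaps most with the context."""
    ctx = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(ctx & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense
```

With this inventory, a context mentioning cough and congestion selects the infection sense, while a context mentioning skin temperature selects the temperature sense.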

Methods in Healthcare Using NLP
In the EPHR, medical information is extracted from clinical text and represented as structured clinical data. Deep learning (DL), machine learning (ML), and rule-based techniques have been explored in the clinical domain. Machine and deep learning models employ features with appropriate algorithms. Figure 2 compares the number of publications per year that apply each of these approaches in the EPHR domain.

Fig 2. Comparison of various algorithmic concepts
Figure 2 shows that publications using machine learning algorithms are growing faster than those using deep learning or rule-based algorithms. The efficiency of machine learning algorithms is highlighted and compared in [38]. In recent years, research in health care NLP has shown that deep learning algorithms play a vital role. In the evaluation of biomedical text, Recurrent Neural Networks (RNN) perform well on the recognition of name entity (RNE) task. One proposed model combines Bidirectional Long Short-Term Memory (Bi-LSTM) with a Conditional Random Field (CRF), based on character-level word embeddings [39]. Habibi et al. [40] proposed a model based on the BiLSTM-CRF implemented by Lample et al. [41], using the word embeddings developed by Pyysalo et al. [42]. Among these models, character-level word embedding achieves a high success rate in health care NLP.

A) Rule-Based Approach
Rules are applied to the textual data, focusing on pattern matching or parsing in the document. These rules are defined over words or POS tags as regular expressions. Figure 3 shows the steps involved in the rule-based approach in NLP. In some models, rules are treated as patterns. Based on the pattern, the text document is represented using index terms. Doc2vec then converts the textual document into a vector. The document is classified using a classifier whose rules follow an IF-THEN pattern. Rules are generated in two ways: manually, or automatically from the dataset. The rule-based approach is evaluated using the performance metrics precision and recall. It provides high precision but low recall, since rules created for a specific dataset do not carry over to other datasets [43].
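The precision and recall behaviour of a hand-written rule can be seen in a small sketch. The regular expression, the example note, and the gold annotations below are hypothetical and were written only for this illustration: the rule finds a drug name followed by a numeric dose in mg, so it is correct whenever it fires but misses doses written in words.

```python
import re

# Hypothetical rule: a word followed by a numeric dose in mg.
DOSE_RULE = re.compile(r"\b([A-Za-z]+)\s+(\d+(?:\.\d+)?)\s*mg\b", re.IGNORECASE)


def extract_doses(text):
    """Apply the rule and return a set of (drug, dose_mg) pairs."""
    return {(m.group(1).lower(), float(m.group(2))) for m in DOSE_RULE.finditer(text)}


def precision_recall(predicted, gold):
    """Precision = TP / predicted, recall = TP / gold, with TP the set overlap."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical note and manually assigned gold annotations.
note = ("Started metformin 500 mg twice daily. Aspirin 81mg continued. "
        "Lisinopril ten milligrams at night.")
gold = {("metformin", 500.0), ("aspirin", 81.0), ("lisinopril", 10.0)}
predicted = extract_doses(note)
precision, recall = precision_recall(predicted, gold)
```

Here both extracted pairs are correct, giving a precision of 1.0, but the lisinopril dose written as "ten milligrams" is missed, so recall is two thirds.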

B) Machine Learning Approach
In analysing the EPHR with NLP, machine learning, a subfield of artificial intelligence, plays a vital role. Machine learning techniques fall into two categories, supervised and unsupervised learning. Supervised learning is represented by $y = f(x)$ (1), where the mapping function $f$ maps the input $x$ to the output $y$. Supervised models are used for classification and regression. Commonly used machine learning techniques are logistic regression (LR) and support vector machines (SVM). Unsupervised techniques include principal component analysis (PCA) and cluster analysis. Unsupervised learning learns the distribution of the input features, where the features are a set of attributes extracted for every data point. In recent years, SVM, LR, and random forest (RF) have been used for processing the EPHR [44]. Modern NLP platforms are refined using these machine learning techniques. They are composed of four parts: models, data, loss functions, and an algorithm [45][46].
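To make the supervised mapping from input to output concrete, the sketch below fits a logistic regression classifier by stochastic gradient descent on a toy dataset. The feature values, labels, learning rate, and number of epochs are assumptions chosen for illustration; the code does not reproduce any experiment from the cited works.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Learn weights w and bias b so that sigmoid(w.x + b) approximates y."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            error = p - yi  # gradient of the log loss with respect to the score
            w = [wj - lr * error * xj for wj, xj in zip(w, xi)]
            b -= lr * error
    return w, b


def predict(w, b, x):
    """The learned mapping f: return class 1 if the probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical scaled features (e.g. glucose, BMI) and labels (1 = disease present).
X = [[0.2, 0.3], [0.3, 0.2], [0.8, 0.7], [0.9, 0.9]]
y = [0, 0, 1, 1]
w, b = train_logistic_regression(X, y)
```

After training, `predict(w, b, x)` plays the role of the mapping function for a new input.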
Figure 4 shows three layers, namely input, hidden, and output. In some situations, more than one hidden layer is constructed, depending on the problem or the data used in constructing the architecture [47]. Deep learning techniques learn multilayer representations automatically from the training dataset, including from unlabeled data. They have been enabled by GPU computing resources and new algorithmic frameworks [48]. Deep learning architectures build on ANN concepts. Recurrent network architectures provide an unsupervised hierarchical data representation.
As shown in Figure 4, the ANN architecture contains multiple interconnected nodes (neurons) organized in three layers: input, hidden, and output. During training, the weights are updated between the input layer and hidden layer 1, between hidden layer 1 and hidden layer 2, and finally between the last hidden layer and the output layer. To obtain an optimized output, the ANN is trained by minimizing the loss function, defined as: $L(X, y) = -\sum_{i=1}^{M} \log P[Y = y_i \mid x_i, \theta] + \lambda \lVert \theta \rVert$ (2), where $M$ is the number of training examples, $\theta$ denotes the learned parameters, and $\lambda$ is a tunable regularization parameter.
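A loss of the form in equation (2), a summed log loss plus a weighted norm of the parameters, can be evaluated as in the following sketch. The use of the Euclidean norm, the function name, and the example numbers are assumptions for illustration, since the text does not specify which norm is meant.

```python
import math


def regularized_log_loss(probs, labels, theta, lam):
    """Summed log loss over the examples plus lam times the norm of theta.

    probs[i][k] is the predicted probability of class k for example i,
    labels[i] is the index of the true class of example i.
    """
    data_term = -sum(math.log(p[y]) for p, y in zip(probs, labels))
    # Assumption: Euclidean (L2) norm of the parameter vector.
    penalty = lam * math.sqrt(sum(t * t for t in theta))
    return data_term + penalty


# Hypothetical predictions for two training examples and a two-parameter model.
probs = [[0.9, 0.1], [0.2, 0.8]]
labels = [0, 1]
loss = regularized_log_loss(probs, labels, theta=[3.0, 4.0], lam=0.1)
```

With these numbers the data term is about 0.329 and the penalty is 0.5, so the loss is about 0.829; a larger regularization weight penalizes large parameters more strongly.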

C) Deep Learning Model
Deep learning is a subfield of machine learning based on multi-layer neural network architectures. Figure 4 shows the architecture of a deep neural network. The multiple layers are used to store hierarchical representations of the data. The first term of the loss function minimizes the log loss over the training dataset of size $M$. The second term minimizes the norm of the learned parameters $\theta$, weighted by the tunable parameter $\lambda$. This prevents the system from overfitting. The backpropagation technique is used to minimize the loss function by propagating the final-layer loss back through the network [49].

Applications of NLP in Health Care Domain
NLP platforms for the EPHR use both machine learning and deep learning techniques in many ways. Table 2 shows applications of machine learning approaches in NLP for the health care domain.

Table 2. Applications of machine learning approaches (fields per row: algorithm; applications)
Naïve Bayes [50][51][52][53][54][55][56]; Heart disease prediction, detection of multiple sclerosis, cancer and obesity classification
SVM [57][58][59]; Heart disease prediction, diabetes, analysis of breast radiology reports
Conditional random fields [60][61][62]; Heart disease prediction, diabetes, detection of multiple sclerosis, analysis of breast radiology reports, tumour detection
Random Forest [63,64]; Heart disease prediction, tumour detection, classification of cancer

As Table 2 shows, machine learning algorithms are used in the health care domain in various forms. Although they are simple, easy to use, and interpretable in EPHR applications, their drawback is that they have difficulty handling complex, large-scale data. Deep learning algorithms are used to overcome this drawback. They construct data features hierarchically and handle long-range data dependencies efficiently. Numerous EPHR research projects in the health care domain use deep learning algorithms, which provide improved results and take less time. Deep learning models such as CNN, feed-forward NN (FFNN), and RNN can be applied in the analysis of EPHR [65]. In pre-processing, vector embedding is applied, and transfer learning has improved model performance [66]. CNNs perform effectively in health care NLP. S. Baker et al. [67] used a CNN for the identification of cancer. Y. Peng et al. [68] identified protein-protein interaction relations in biomedical reports. M. Asada et al. [69] used a CNN with a mechanism for extracting drug-drug interactions. M. C. Chen et al. [70] classified pulmonary embolism from radiology reports using a CNN. L. Sulieman et al. [71] effectively classified patient portal messages using a CNN. CNNs have also been applied to recognizing named entities in biomedical text [72].

Framework of NLP
The commonly available NLP frameworks are UIMA (Unstructured Information Management Architecture) and GATE (General Architecture for Text Engineering), both of which are open-source software.

a. GATE
GATE was developed in 1995 at Sheffield University as an NLP platform. It includes basic NLP tools for low-level processing, such as part-of-speech taggers, tokenizers, and sentence splitters, which are combined into a single wrapper unit called the CREOLE wrapper. For high-level processing, it includes entity recognition in ANNIE.

b. UIMA
UIMA was initially designed by IBM and has belonged to the Apache Software Foundation since 2006. It is based on a pluggable architecture into which users can easily plug their own components, which reduces duplicated analytics development. The best-known UIMA-based system is IBM's Watson, which took part in the 2011 Jeopardy! challenge. In addition to textual information, UIMA is used to analyze audio and video data, and it extracts cancer features using various biomedical NLP models such as MedEx, MedKAT/P, and cTAKES [73,74].

Challenges in EPHR using NLP
Clinical data is surveyed with a focus on EPHR analyses. This information has significant features related to health care. This section discusses the major challenges encountered in EPHR-related research.

A) Annotations Lacking
Traditional deep learning and machine learning techniques use supervised models, which need labelled data in the training phase. Annotating EPHR data is challenging because of its variability and the cognitive complexity of ensuring data quality. Neural networks, moreover, require a huge amount of textual data for training. In some situations, only data from qualified annotators can be used for training, which makes it difficult to obtain such data and train the model.

B) Privacy
The EPHR contains sensitive information. In the US, the Health Insurance Portability and Accountability Act (HIPAA) protects the privacy of patient information, including summaries of treatment procedures. Therefore, privacy-preserving steps must be taken before textual data in the EPHR is shared.
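One such privacy-preserving step is to mask direct identifiers in the text before it is shared. The sketch below assumes a handful of hypothetical regular expressions for record numbers, dates, phone numbers, and social security numbers in US-style formats; it is far from a complete de-identification procedure, since names, addresses, and many other identifiers are not covered.

```python
import re

# Hypothetical patterns for a few identifier formats; order matters.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[DATE]"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
    (re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE), "[MRN]"),
]


def deidentify(text):
    """Replace each matched identifier with a placeholder tag."""
    for pattern, tag in PATTERNS:
        text = pattern.sub(tag, text)
    return text


# Hypothetical note fragment used only for demonstration.
masked = deidentify("MRN: 483920 seen on 03/14/2021, phone 555-123-4567.")
```

The clinical content of the note is left untouched, so the masked text can still be used for the NLP tasks discussed earlier.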

C) Interpretability
Deep neural networks produce superior results compared with other existing algorithms. A neural network requires a huge number of parameters to fit the training dataset, which makes the model difficult to interpret. Unlike linear models, a neural network has a complex architecture with non-linear layers, so additional work is needed to make a deep neural network transparent.

Datasets used in NLP
The dataset is available from the Clinical Practice Research Datalink (CPRD), https://www.cprd.com/. Data access is explained at https://www.cprd.com/primary-care: a license permits full access to CPRD data under detailed terms of use.

Conclusion
This paper presented a qualitative review of recent work on electronic health records in NLP. It gave an overview of NLP in general and in the health care domain, and discussed the tasks involved in health care NLP. We then presented the methods used in NLP for the health care domain and the application areas of machine learning and deep learning techniques. We concluded the review by explaining the challenges and difficulties of existing techniques, chiefly preserving privacy, interpretability, and lack of annotations. This paper conveys the complexity of textual clinical data and challenges researchers to explore new NLP methods for health care data. Future work includes analysis for managing diseases and improving quality in all aspects of the health care domain.