From Customer Data to Smart Customer Data: The Smart Data Transformation Process

Nowadays, smart data has emerged as a new trend in creating more business value for enterprises; it is defined as data that is gathered and processed to create new insights that support business decisions. However, transforming data into actionable insights remains a real challenge for enterprises. For this reason, this paper presents a smart data transformation process, which aims at transforming customer data into smart customer data in order to offer actionable insights. The purpose of the study is to propose a transformation process that can be used to operate a knowledge structure for a smart service system, which can manage and deliver smart data as a service. The process covers the three dimensions of a service system: data processing corresponding to the engineering dimension, information processing corresponding to the science dimension, and knowledge processing corresponding to the management dimension. Accordingly, a case study on the smart data transformation process of a customer journey management system as a smart service system is presented to demonstrate the application of the proposed process.


Introduction
Nowadays, smart data is emerging as a new source for service innovation [1]; it is defined as data gathered and processed to create new insights that support business decisions. Smart data has also emerged as a trend in creating more business value for enterprises. However, the link between data and business value that generates actionable insights is still missing, which poses a challenge for the information systems research community [2]. Accordingly, there is a need for a smart service system to handle the transformation of data into actionable insights. A smart service system is a typical service system that is "capable of learning, dynamic adaptation, and decision-making based upon data received, transmitted, and/or processed to improve its response to a future situation" [3].
Focusing on the operations of the smart service system for supporting smart customer data, the paper proposes a smart data transformation process that can be used for operating a knowledge structure for a smart service system to promote customer intelligence [4]. In other words, the paper presents a smart data transformation process for transforming customer data into smart customer data in order to offer actionable insights and to deliver smart data as a service.
The paper is structured as follows. Section 2 presents the theoretical background. Section 3 explains the research design, and Section 4 continues with the smart data transformation process. Accordingly, Section 5 introduces a case study about the smart data transformation process for a particular smart service system: a customer journey management system. The more satisfied customers are, the more they can be expected to spend, which indicates the greater lifetime value that customers can offer enterprises [18,19].

Research design

Problem statement
Actionable insights from customer data are meaningful knowledge that can be acquired through the application of data analytics techniques [20]. In fact, moving from data to insights is a real challenge when there is no background for understanding why the data matters in certain business situations, which requires context-related data [3].
Smart data starts with reliable and verified data; those data then need to be enriched, contextualized, and featured in order to generate actionable insights [21]. However, current studies often focus on a single level of data transformation (i.e., the data, information, or knowledge level). Therefore, there is an urgent need for frameworks, models, and processes that transform raw data into smart data in a coherent manner.

Research question and objectives
This study aims at proposing a smart data transformation process; hereafter called the SDT process, based on the service science perspective to promote customer intelligence. The proposed process includes three levels: data processing, information processing, and knowledge processing. According to the perspective of service science, the data processing level corresponds to the engineering dimension, the information processing corresponds to the science dimension, and the knowledge processing level corresponds to the management dimension [22].
The key research question is: "How to support the transformation of customer data into smart data in a smart service system?"
The previous section presented the literature review to better understand the potential of smart customer data for generating actionable insights and the contributions of the proposed study. First, actionable insights fill the gap between smart customer data and business value [5]. Therefore, actionable insights involve practical applications rather than theoretical contributions or mere reports [2,20].
With an aim to demystify actionable insights, the SMART characteristics of the SDT process, including Strategy-oriented, Measurable, Analytic-based, Result-visualized, and Transforming, are presented as follows:
• Strategy-oriented: actionable insights should be aligned with the goals and strategies of an enterprise to drive actions [2,23].
• Measurable: actionable insights should be measurable by identifying relevant data for key performance indicators related to the strategic goals [7].
• Analytic-based: actionable insights result from the application of descriptive, predictive, and prescriptive analytics to transform data into insights [7,24].
• Result-visualized: actionable insights need to be visualized through digital dashboards, graphs, or models to support the decision-making process [7].
• Transforming: actionable insights should transform businesses in terms of i) economic value (how to make the most profit from customers), ii) social value (how to use customers for social influence), and iii) cognitive value (how to take advantage of customers' knowledge and experience) [25].

Research methodology
In this study, different processes to transform data and information into knowledge and insights will be examined to determine the specific requirements of the transformation process. Thus, the SDT process as the key deliverable of the study is presented based on the design science research methodology (DSR) [26].
Accordingly, in terms of conception and design, this study proposes three types of artefacts: constructs, models, and processes [26]. Constructs are objects and modules involving three phases of the SDT process (Data processing, Information processing, and Knowledge processing) (cf. Fig. 1). For instance, key constructs of the information processing phase are two modules (Knowledge Organization, Knowledge Discovery) and a knowledge base. Models, which present interrelationships between constructs [26], primarily involve abstract structures for organizing databases and knowledge bases. Processes, which mostly focus on methods and/or steps to produce constructs, include the SDT process, ETL and real-time data processing (cf. Section 4.1), and knowledge base construction process (cf. Section 4.2).
In terms of the demonstration and evaluation phases of the DSR methodology, a real-world application of the SDT process (i.e., a customer journey management system as a smart service system), including a technical experiment and a case study, is presented in Section 5.

Smart data transformation process
The smart data transformation process is introduced in Fig. 1, which includes three processing levels: data processing, information processing, and knowledge processing. Each level may have different components, including the constructs and methods of the process [27]. This section continues with data processing, information processing, and knowledge processing. Since the main contribution of the study is smart data transformation, knowledge processing is presented in more detail.

Data processing
The term "data processing" refers to the transformation of data from a range of low-level data sources into purposed data that can then be stored in specified data storages, called data repositories, most notably database management systems [22]. In the era of big data, customer data increasingly arrives from a variety of data sources that are highly dissimilar in format, including relational databases, websites, mobile devices, company servers, social media, and third-party data providers [20]. By leveraging new technologies, particularly big data tools, it is possible to capture and store a diverse collection of customer data in a data repository, which is constantly available to high-level tasks for generating smart customer data.
In this study, data processing consists of the following components: Data source, Data repository, Real-time data ingestion, and ETL tool. These components enable the processing of a variety of customer data sources in order to support data collection, provision, and analytics [28]. Currently, customer data is derived from a variety of data sources that fall into two main categories:
• Batch data sources: store large amounts of data in bulk, such as Customer Relationship Management (CRM), Enterprise Resource Planning (ERP), Supply Chain Management (SCM), and Point of Sale (POS) systems, as well as third-party services [29]. Customer data from these sources can be in the form of customer profiles or transactions, and it is typically stored in SQL databases, NoSQL databases, or log files [30,31]. Because of the size of this data, it is processed on a schedule to generate daily customer metric reports, which take a long time to process and calculate [32].
• Real-time data sources: sources from which customer data is created continuously, every millisecond, as customers engage with websites, mobile devices, or social media platforms [33]. The majority of the data from these sources is unstructured or semi-structured and comprises messages, emails, comments, posts, likes, images, and clickstreams [34]. With these types of data, it is critical that the system processes and analyses them within milliseconds or seconds so that customers receive timely responses or services that improve their experience [19,35].

Real-time data ingestion.
This component enables the collecting, processing, and temporary storage of all sorts of customer data generated in real-time from social networks, websites, and mobile devices in order to ensure that data is sent to the next level of the system without interruption [36]. The most widely used real-time data processing technologies are REST APIs, Kafka, RabbitMQ, and ActiveMQ [37]. Following this stage, the data can be processed in real-time for streaming analysis or stored in a data repository.
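To make the ingestion step concrete, the following is a minimal sketch in which an in-memory bounded buffer stands in for a production message broker such as Kafka or RabbitMQ; the class name, fields, and capacity are illustrative assumptions, not part of the paper's system.

```python
import time
from collections import deque

class EventIngestor:
    """Minimal in-memory stand-in for a real-time ingestion buffer.
    A production system would use Kafka, RabbitMQ, or a REST endpoint."""

    def __init__(self, maxlen=10_000):
        # Bounded buffer: the oldest events are dropped once capacity is reached.
        self.buffer = deque(maxlen=maxlen)

    def ingest(self, source, payload):
        # Timestamp each event on arrival so downstream consumers
        # can order and window the stream.
        event = {"source": source, "ts": time.time(), "payload": payload}
        self.buffer.append(event)
        return event

    def drain(self):
        # Hand all buffered events to the next processing level.
        events, self.buffer = list(self.buffer), deque(maxlen=self.buffer.maxlen)
        return events

ingestor = EventIngestor()
ingestor.ingest("website", {"customer_id": "c1", "action": "page_view"})
ingestor.ingest("mobile", {"customer_id": "c2", "action": "add_to_cart"})
batch = ingestor.drain()
```

After draining, the batch can either be analyzed in a streaming fashion or persisted to the data repository, mirroring the two paths described above.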

ETL tool.
This component facilitates the processing of significant batches of data by reading data from source files or databases and then converting it to the required format for storage in data repositories (a database management system or the Hadoop Distributed File System) [28]. This process may contain several pre-processing steps, including data cleansing, normalization, deduplication, and structure mapping, as well as processing steps such as validation, sorting, and aggregation [38]. Numerous popular ETL (Extract, Transform, and Load) solutions are currently available at this step to bulk-load data from a range of data sources, including CSV files, NoSQL databases, and relational databases, into a particular database management system. For example, Apache Sqoop is commonly used to export and import huge volumes of data efficiently between the Hadoop Distributed File System (HDFS) and structured data stores such as relational databases [39].
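The extract–transform–load sequence with the pre-processing steps above can be sketched as follows; the CSV columns, the dict-based "repository", and the cleansing rules are illustrative assumptions rather than the paper's actual pipeline.

```python
import csv
import io

def extract(csv_text):
    # Extract: read raw rows from a CSV source (file export or API dump).
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    # Transform: cleanse (drop rows missing a customer id), normalize
    # (strip whitespace, lower-case emails), and deduplicate on customer id.
    seen, clean = set(), []
    for row in rows:
        cid = row.get("customer_id", "").strip()
        if not cid or cid in seen:
            continue
        seen.add(cid)
        clean.append({"customer_id": cid,
                      "email": row.get("email", "").strip().lower()})
    return clean

def load(rows, repository):
    # Load: write the normalized rows into the target repository
    # (a dict keyed by customer id stands in for a DBMS table here).
    for row in rows:
        repository[row["customer_id"]] = row
    return repository

raw = ("customer_id,email\n"
       "C1, Ann@Example.COM \n"
       "C1,ann@example.com\n"      # duplicate id, dropped
       ",missing@id.org\n"         # missing id, dropped
       "C2,bob@example.com\n")
repo = load(transform(extract(raw)), {})
```

In practice each stage would be a separate job (e.g., Sqoop for the load step), but the contract between stages is the same.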

Data repository.
A data repository is defined as a storage component that manages the huge volume of customer data processed by the ETL tools and real-time data ingestion [33]. From the perspective of customer focus, customer data can be categorized into four main groups [29]:
• Demographic data provides information about groups of customers based on their age, gender, residence, family status, or income [20,40]. Customer relationship management (CRM) systems, customer data platforms (CDP), third-party data providers, and social media are the key sources of this sort of data.
• Behavioral data includes data generated by a customer's interaction with a business service or company product [19,41]. This type of data can comprise critical customer behaviors (such as registration, login, add-to-cart, purchase, page view, and clickstream), as well as consumer habits (communication channel, device, browser, and operating system) [4]. Websites, mobile applications, CRM systems, marketing automation systems, call centers, help desks, and billing systems are all common sources of behavioral data.
• Transactional data includes information about customer purchases such as orders, total amount, kind of item, number of items, transaction date, payment method, and shipping address [41,42]. This sort of data can be stored in an organization's RDBMS or through third-party services in the form of transaction, order, billing, and invoice records.
• Psychographic data includes information about customer emotions, evaluations, and preferences regarding the company's services and products [8].
The Hadoop Distributed File System (HDFS), on the other hand, is a data repository suitable for storing large amounts of data because it is a well-known fault-tolerant distributed file system of Apache Hadoop, a dominant framework for big data processing whose large infrastructures are deployed and used in a variety of application fields [43].
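One way to picture the four customer data categories inside a relational repository is the minimal sketch below, using Python's built-in sqlite3 module as a stand-in DBMS; the table and column names are illustrative, not the paper's schema.

```python
import sqlite3

# Illustrative relational sketch of the four customer data categories;
# an in-memory SQLite database stands in for the production DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id TEXT PRIMARY KEY, age INT, gender TEXT);   -- demographic
CREATE TABLE action   (customer_id TEXT, touchpoint TEXT, ts TEXT);  -- behavioral
CREATE TABLE "order"  (customer_id TEXT, total REAL, ts TEXT);       -- transactional
CREATE TABLE rating   (customer_id TEXT, product TEXT, score INT);   -- psychographic
""")
conn.execute("INSERT INTO customer VALUES ('C1', 34, 'F')")
conn.execute("INSERT INTO action VALUES ('C1', 'website', '2022-01-01')")
row = conn.execute("SELECT COUNT(*) FROM customer").fetchone()
```

A relational layout such as this suits the structured portion of the data; large unstructured or semi-structured collections would go to HDFS instead, as noted above.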

Information processing
The information processing level aims at transforming pre-processed data from the data processing level into well-structured information to assist business insight generation [28]. Turning data into useful information is a challenging task due to the complexity and diversity of the data.
The study considers herein two significant issues: i) How to provide support for knowledge discovery and ii) How to structure data in an appropriate organization for knowledge retrieval. Accordingly, the information processing level consists of the following components: Knowledge discovery, Knowledge base construction, and Knowledge base.

Knowledge discovery.
Knowledge discovery activities involve transforming data, primarily structured in tabular form, into the knowledge base by applying appropriate business analytics techniques. Those activities may also involve extracting knowledge entities and relations from unstructured or semi-structured data by applying data mining and/or machine learning techniques.

Knowledge base construction.
The knowledge base (KB) construction method aims at structuring a large amount of heterogeneous data from diverse sources in a suitable organization. There are two well-known approaches for this construction: top-down and bottom-up [44]. The first approach consists of pre-defining a knowledge model and then importing knowledge instances into the model. The second, on the other hand, focuses on knowledge extraction from raw data. This study proposes a hybrid approach that leverages a pre-defined knowledge component (i.e., the CAK model [45]) and is suitable for the research problems concerned. According to the adopted method, this study proposes a KB construction component consisting of four essential elements (know-what, know-who, know-how, know-why).
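The hybrid approach can be sketched as follows: the knowledge model is pre-defined (top-down) and then populated by extracting instances from raw records (bottom-up). The component names follow the paper; the record format is an illustrative assumption.

```python
# Pre-defined knowledge model (top-down): the four essential components.
KNOWLEDGE_MODEL = ("know-what", "know-who", "know-how", "know-why")

def build_kb(records):
    # Bottom-up population: extract instances from raw interaction records
    # and slot them into the pre-defined components.
    kb = {component: set() for component in KNOWLEDGE_MODEL}
    for r in records:
        kb["know-who"].add(r["customer"])               # who acted
        kb["know-what"].add(r["product"])               # on which product/service
        kb["know-how"].add(r["action"])                 # through which interaction
        kb["know-why"].add(r.get("value", "economic"))  # toward which value
    return kb

records = [
    {"customer": "C1", "product": "laptop", "action": "purchase"},
    {"customer": "C2", "product": "laptop", "action": "page_view",
     "value": "cognitive"},
]
kb = build_kb(records)
```

A production KB would of course also record the links between instances (see the knowledge base component below), not just the instances themselves.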

Knowledge base.
The knowledge base, a storage component serving the reasoning process, is built on top of the abstract structure through knowledge importing and extraction activities. Based on the CAK model [45], an abstract structure for the KB is proposed to feature the knowledge components know-what, know-who, know-how, know-why, know-with, know-when, and know-where [1-3] and their links to different data in the data repository.

Knowledge processing
Knowledge processing is defined as the process of putting knowledge into the form of actions. To put it another way, knowledge processing is the process of creating value from the knowledge components for actionable insights [8,46]. The knowledge processing level includes two methods: Knowledge reasoning and Insight generating.

Knowledge Reasoning.
Knowledge reasoning provides services to exploit the knowledge base. Various knowledge reasoning methods have been studied in recent years [44], e.g., reasoning via patterns, query languages, inference rules, and so forth. In this study, the reasoning engine, whose purpose is to serve the higher levels, will be implemented as APIs relying on a knowledge query language (e.g., SPARQL, Cypher, etc.) and a set of inference rules. According to the adopted method, a service receives a knowledge component (e.g., know-what, know-who [29]) as an input and then exploits the KB to retrieve all direct and/or indirect knowledge components related to that input. In particular, knowledge reasoning enables different services featuring various types of knowledge. More precisely, this study focuses primarily on the interrelations among customer (know-who), product/service (know-what), process/transition (know-how), and customer value (know-why). A set of predefined patterns is proposed to determine the reasoning process based on the input knowledge component (Table 1). The output of the reasoning process can be displayed by various tools (e.g., a dashboard, a graph, or a simple tabular structure). For instance, customer behaviors, preferences, and historical purchases can be exploited to discover interesting relationships between the know-who, know-with, and know-how components, which can help to generate business value.
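The retrieval of all directly and indirectly related components can be pictured as a graph traversal over the KB. The sketch below models the KB as subject–predicate–object triples and walks the graph from one input component; the triples and node labels are illustrative assumptions, not the paper's actual KB, and a real engine would issue SPARQL or Cypher queries instead.

```python
from collections import defaultdict

# Toy KB as triples linking instances of knowledge components.
triples = [
    ("C1:know-who", "interacts", "purchase:know-how"),
    ("purchase:know-how", "targets", "laptop:know-what"),
    ("laptop:know-what", "creates", "economic:know-why"),
]

graph = defaultdict(set)
for subject, _, obj in triples:
    graph[subject].add(obj)

def reason(start):
    # Traversal collects every reachable knowledge component,
    # i.e., all direct and indirect relations of the input.
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

related = reason("C1:know-who")
```

Given the customer `C1` (know-who) as input, the service returns the linked interaction (know-how), product (know-what), and value (know-why) instances, matching the pattern-driven reasoning described above.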

Insight generating.
From the perspective of smart customer data, the paper identifies different types of actionable insights and applies relevant knowledge components (know-what, know-how, know-why, and know-who) through each stage of the customer management process, including customer profiling, customer engagement, customer experience, and customer value [4,41].

Customer profiling focuses on the know-what component by developing products/services that can meet customers' needs [11,25]. Derived from behavioral and psychographic data, the know-what component reveals the preferences and behaviors of customers [47]. With an aim to promote actionable insights as a service, the know-what component can be demonstrated through graphs of product/service profiles or developed into a product/service ontology [35]. In this light, marketers can rely on such actionable insights for product development and innovation. Accordingly, these insights deal with product/service innovation by optimizing the product/service features and characteristics that offer value for customers [4].
Customer engagement relies on the know-who component to promote interactions with customers. Enterprises make use of know-who to develop customer profiles by identifying similar historical purchases from demographic and transactional data [48,49]. The know-who component contains knowledge on demography (age, gender), buying behaviors (needs, purchasing power, preferences, lifestyle), and purchasing attributes (recency, frequency, size)  [19,50]. Actionable insights acquired from know-who can be illustrated as customer profiles on a dashboard. Such dashboards would facilitate the process of customer segmentation, customer engagement, and customer profile management. With the support of actionable insights from know-who, enterprises will be able to define the target audience and choose the most profitable segment. Furthermore, enterprises will be able to better adjust their marketing strategies, particularly strategies related to segmentation, targeting, and positioning for optimal outcomes [4].
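The recency, frequency, and size attributes mentioned above are the ingredients of classic RFM-style segmentation, which the following sketch illustrates; the thresholds, segment names, and order records are illustrative assumptions, not the paper's method.

```python
from datetime import date

# Toy order history: customer id -> list of (order date, order value).
orders = {
    "C1": [(date(2022, 3, 1), 120.0), (date(2022, 3, 20), 80.0)],
    "C2": [(date(2021, 6, 5), 15.0)],
}

def rfm_segment(customer_orders, today=date(2022, 4, 1)):
    # Score a customer on recency (days since last order),
    # frequency (order count), and monetary size (total spend),
    # then map the scores to a named segment (thresholds are assumed).
    recency = (today - max(d for d, _ in customer_orders)).days
    frequency = len(customer_orders)
    monetary = sum(v for _, v in customer_orders)
    if recency <= 30 and frequency >= 2 and monetary >= 100:
        return "loyal"
    if recency > 180:
        return "at-risk"
    return "regular"

segments = {cid: rfm_segment(o) for cid, o in orders.items()}
```

Segments like these are exactly what a know-who dashboard would surface for targeting and positioning decisions.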
The customer experience process highlights the importance of the know-how component to manage customer experience and reinvent customer journeys [11,41]. Derived from behavioral data, the know-how component contains knowledge related to the interactions of customers-products/services and customers-customers [46,51]. From the perspective of knowledge processing, actionable insights from know-how can be developed into dashboards or interactive reports so that marketers can keep track of customer experience in each journey from pre-purchase and purchase to post-purchase [52]. With the support of such dashboards, marketers can identify key touchpoints to acquire customers and improve sales [16,20].
The customer value process aims at maximizing customer value for enterprises through the transformation of know-what into know-why, i.e., what to do next with customers to achieve the enterprise's goals [29,53]. The know-why component contains actionable insights related to i) economic value (how to make the most profit from customers), ii) social value (how to use customers for social influence), and iii) cognitive value (how to take advantage of customers' knowledge and experience) [25,54]. In terms of economic value, the know-why component contains actionable insights related to customer lifetime value and up-/cross-selling. On the other hand, actionable insights on social value take advantage of customers' social status and networks to influence other customers through word of mouth (WOM) [8,25]. Finally, actionable insights on cognitive value figure out what value to offer customers, what resources to provide, and what forms of engagement to facilitate customer co-creation [55,56].

A customer journey management system
This section of the paper continues with a case study to demonstrate how the SDT process is used for designing and developing a customer journey management system, called Customer Journey Master (CJMA). CJMA aims at assisting small and medium-sized enterprises (SMEs) in generating customer journey insights. The three processing levels (Data, Information, and Knowledge processing) of CJMA are presented in the following parts (Fig. 3.).

The Django framework [57], a free and open-source Python web platform, serves as the backbone of the CJMA system. The smart data transformation phases are implemented on the Django framework by integrating Python libraries for data processing, data analytics, data science, and machine learning.

At the Data processing level, Daily data exporting and Data structure mapping are the two main components that help in collecting, extracting, and pre-processing raw data into structured data before storing it in data repositories. With the help of the python-crontab package, Daily data exporting automates the collection of customer data from Google Analytics and BigQuery, which export daily customer events and demographic data into large CSV files. These file data, as well as tracking data collected from the SME's website and server via the predefined API, are cleaned, normalised, validated, and then restructured according to CJMA's data repositories using the Pandas library. MySQL was utilized for data storage because it is an open-source relational database management system that is suitable for SMEs [58].

At the Information processing level, the system provides data mining and machine learning modules to assist in the transformation of processed data into suitable information for analysis. These modules are built on a variety of Python libraries, including NumPy (for data vectorization) [59], Pandas (for tabular format transformation) [60], and PyMySQL (for MySQL database connection) [61].
Finally, at the Knowledge processing level, several reasoning and insight generating components are implemented for customer journey analytics, such as metric reports, customer segmentation, customer journey visualisation, customer journey clustering, and customer decision mining. In this stage, the Scikit-learn library (which provides various classification, clustering, and decision tree algorithms) [62] and the PM4Py library (which provides methods for process discovery and sequential mining) [63] are used to extract useful customer knowledge and insights. To gain actionable insights, business actors interact with the CJMA system via a graphical user interface built with ReactJS. Dashboards, scorecards, process models, report tables, and bar charts are used to display actionable insights.

Data processing
The CJMA data sources include customer data from CRM systems, third-party platforms such as Google Analytics, company websites, and company mobile applications. Google Analytics is a rich source of demographic and behavioral data, since it can track customer actions such as clickstreams and the events that customers initiate on e-commerce websites that use the Google Analytics tracking code. These data comprise not only customer information, such as customer activity history, but also customer preferences, such as communication channels, devices, browsers, and operating system information. Company websites and mobile applications can also capture behavioral and transactional data through customer interactions.

In terms of the CJMA model, the data processing level leverages several data sources and data processing procedures to extract, transform, and load data into a centralized data repository. Concerning the ETL process, the CJMA system has a data structure mapping mechanism that facilitates the integration of customer data from huge CSV files. For instance, customer journey data from Google Analytics, containing many rows of customer events, can be exported daily into CSV files (Fig. 4.) and then loaded into CJMA data storage to combine and create daily customer metric reports. In terms of real-time data ingestion, CJMA offers a list of predefined RESTful APIs that other systems, such as websites and mobile applications, can utilize for sending real-time customer event data to the system.

ITM Web of Conferences 41, 05002 (2022) IESS 2.2 https://doi.org/10.1051/itmconf/20224105002
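The daily metric report step can be sketched with Pandas, the library CJMA actually uses for this stage; the column names of the event export below are illustrative assumptions, not the real Google Analytics schema.

```python
from io import StringIO

import pandas as pd

# Toy daily event export (illustrative columns), loaded as a DataFrame
# and aggregated into per-day, per-action counts for a metric report.
csv_export = StringIO(
    "date,customer_id,action\n"
    "2022-01-01,C1,page_view\n"
    "2022-01-01,C1,add_to_cart\n"
    "2022-01-01,C2,page_view\n"
    "2022-01-02,C1,purchase\n"
)
events = pd.read_csv(csv_export)
daily_metrics = (events.groupby(["date", "action"])
                       .size()
                       .reset_index(name="count"))
```

The resulting table is exactly the kind of daily customer metric report that the scheduled batch job combines from the CSV exports.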
MySQL was chosen as the data repository for CJMA because the system targets SMEs with limited budgets for infrastructure development. The data repository contains several primary data tables relating to the four types of customer data mentioned previously: demographic (Customer), behavioral (Touchpoint, Action, Device Category, OS, Browser), psychographic (Email, Post, Comment, Rating), and transactional (Order). Customer data in the database management system (DBMS) can then be exported into CSV/JSON files and accessed by higher layers through APIs to acquire new customer knowledge.

Information processing
In the CJMA running example, data comes from two primary sources: traditional database systems and Google Analytics. After pre-processing, the data is transferred, validated, and reorganized in the data repository to feature data classification and indexation based on the corresponding knowledge components in the knowledge base. Those data, namely demographic, behavioral, transactional, and psychographic data, are then promptly forwarded to real-time processing for analysis in order to discover new knowledge.
Demographic data, coming from enterprise systems and Google Analytics, may be used to exploit the similarity among customers (know-who) based on their profiles. This information is essential to segment customers, better understand the target group of customers, create personas, and plan advertising campaigns [41].
Behavioral data, mainly containing customers' clickstream and event information, may be used to exploit the interrelations among customers (know-who) and products (know-what) through their interactions (know-how) on enterprise websites, as well as context information (i.e., where and when customers' interactions take place). This information helps to analyze the customer journey, personalize customer acquisition, and forecast consumer behaviors in order to enhance the customer experience [18,64].
Transactional data, retrieved from customers' orders, may be used for daily revenue reports, consumer behavior analysis, purchase likelihood prediction, frequent itemset mining, and sequential pattern mining [65].
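As a sketch of the frequent itemset idea on transactional data, the snippet below counts item pairs that co-occur in customer orders and keeps those meeting a minimum support; the orders and the support threshold are illustrative assumptions, and a full miner (e.g., Apriori) would extend this to larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Toy transactional data: each order is the set of items purchased.
orders = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "bag"},
    {"laptop", "bag"},
    {"mouse"},
]

def frequent_pairs(transactions, min_support=2):
    # Count co-occurring item pairs; sorting each basket gives every
    # pair one canonical key regardless of insertion order.
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    # Keep only pairs appearing in at least min_support orders.
    return {pair: n for pair, n in counts.items() if n >= min_support}

pairs = frequent_pairs(orders)
```

Pairs like (laptop, mouse) surviving the support filter are the raw material for cross-selling insights.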
Psychographic data, primarily involving the comments and ratings of a customer (know-who) for a product (know-what) via his/her interactions (know-how), may be used to discover customer preferences. This information is frequently utilized for text mining, emotion classification, and sentiment analysis in order to turn raw data (text) into smart customer data [66].
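A minimal lexicon-based sentiment sketch over such comments is shown below; the word lists are toy assumptions, and a production pipeline would use a trained classifier rather than this rule-of-thumb scoring.

```python
# Toy sentiment lexicons (assumed for illustration only).
POSITIVE = {"great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "slow", "broken", "hate"}

def sentiment(comment):
    # Tokenize crudely, strip trailing punctuation, and score the
    # comment as (#positive words - #negative words).
    words = [w.strip(".,!?") for w in comment.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

labels = [sentiment(c) for c in
          ["Great laptop, fast delivery", "Arrived broken, bad support"]]
```

Each labeled comment links a customer (know-who) to a product (know-what) with a preference signal, which is the smart-data form of this raw text.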

Knowledge processing
The CJMA not only supports the knowledge and insights mentioned above but also offers additional insights related to customer journeys, such as customer journey discovery, customer journey clustering, and customer decision mining.

The Knowledge reasoning component focuses on customer journey discovery and clustering:
• Customer journey discovery is a function that maps an event log to a process model, which can fully describe a customer journey [67]. The basic idea of journey discovery is as follows: given an event log L, process discovery aims at mining the workflow of customers in terms of actions and behaviors (Fig. 5.). Using the discovered process model, users can determine the main flow of customers as they interact through the different touchpoints, such as websites or mobile apps, and then build a recommendation system or improve the available services.
• Customer journey clustering, also called behavior pattern discovery, aims at clustering the event log into several small event logs based on their similar properties to produce multiple simple process models. In other words, the CJMA attempts to map customer journeys with key touchpoints in each stage [20]. Thus, users can visualize the process model as a graph, which can be used to predict the behavior of customers in order to improve customer experience, recommend appropriate products, and create a better marketing strategy [40].

The Insight generating component deals with customer decision mining, an extended version of process enhancement that mines the workflow perspective of behaviors. Fig. 6. illustrates a decision process model, i.e., a predicted customer journey including future activities that customers may decide to take, based on information about the customer's device usage habits (device category, operating system, etc.). Indeed, smart customer data can be applied to optimize customer journeys, such as recognizing customer needs, requesting a product or service meeting those needs, and responding to the delivery of services and products [18].
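The core structure behind journey discovery can be sketched as a directly-follows graph built from an event log, which is also the starting point of the discovery algorithms in libraries such as PM4Py; the traces below are illustrative assumptions, not CJMA data.

```python
from collections import Counter

# Toy event log: each trace is one customer's ordered sequence of actions.
event_log = [
    ["visit", "view_product", "add_to_cart", "purchase"],
    ["visit", "view_product", "visit"],
    ["visit", "add_to_cart", "purchase"],
]

def directly_follows(log):
    # Count how often activity a is immediately followed by activity b
    # across all traces; the resulting counts are the edges of the
    # directly-follows graph from which a process model is derived.
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

dfg = directly_follows(event_log)
```

High-count edges such as visit → view_product identify the main flow of customers through the touchpoints, which is exactly what the discovered process model visualizes.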

Conclusion
Nowadays, smart data, such as smart customer data, is becoming a new source for generating actionable insights that create more business value for enterprises. In order to establish a solid link between data and business value, the paper presents a smart data transformation process, called the SDT process, for transforming customer data into smart customer data to deliver smart data as a service and to promote customer intelligence. The SDT process is presented from the perspective of service science and includes the data, information, and knowledge processing levels.
In order to develop the SDT process, different data transformation processes were investigated. Firstly, general processes such as the Data-Information-Knowledge-Wisdom (DIKW) process [68] and the knowledge management process [69] were evaluated. Secondly, a specific process at the data level, the Extraction-Transformation-Loading (ETL) process, was also explored [70]. Finally, several data mining processes at the information level, such as KDD, CRISP-DM, and SEMMA [71], were examined.
The main difference between the SDT process and other processes is that the SDT process supports data transformation in a coherent manner and at all levels of the transformation. Compared with general processes such as the knowledge management process and DIKW, the SDT process focuses on processing mostly structured data to create actionable insights, whereas the other processes address different types of data. Compared with the specific processes, the SDT process deals with all three levels (data, information, and knowledge), whereas ETL concerns the data level and the data mining processes concern mostly the information level. In general, similar processes have paid little attention to real-time data ingestion and insight generation. It is argued that the SDT process is one of the first approaches to focus on smart data transformation from the service science perspective.
Concerning the implications of our work in practice, the SDT process can be enhanced and customized to meet the specific requirements of an enterprise's smart service system in order to offer smart services effectively and efficiently. Concerning the implications for research, this approach needs to be evaluated on a broader scale with different types of smart services for promoting customer intelligence.

Finally, our future studies aim at enhancing the approach for context-aware smart services to support artificial intelligence (AI) based services such as chatbots, recommender systems, and insight engines. Moreover, the SDT process can also be extended so that smart data can support new trends in today's business ecosystem, such as resilience and sustainability [72].