The Uptake of Open Science: Mapping the Results of a Systematic Literature Review

This paper contributes to “Open Science” theory, with a specific focus on Open Science data generated by scholars. To this end, a mixed-method systematic literature review, including science mapping techniques, was conducted. Our preliminary results reveal the potential of Open Science as a domain for interdisciplinary research. A keyword co-occurrence network analysis using the VOSviewer visualisation tool identified five clusters of interrelated sub-concepts within Open Science research. The key distinctive characteristics and the various categories of Open Science data have been identified, and relevant data platforms are given as examples of each category. Finally, the distinction between Open Science data and Open Government data is explored, and their point of convergence is presented.


Introduction
The aim of this study is to propose some initial conceptual foundations for the "Open Science" (OS) field of scholarship, with a particular focus on OS data. The relatively new phenomenon called "Open Science" has emerged from fundamental, revolutionary changes in government and science policies triggered by the development of Information and Communication Technologies (ICT). ICT advancements, together with authorities' commitment, allow both government and science to become transparent, participatory, and collaborative.
Three ICT factors make up the phenomenon of OS. First, data sharing technologies are extending the dissemination of scientific knowledge to the public, far beyond academic communities. Second, online collaboration tools, including those designed to engage the general public, contribute to networked science, which in turn speeds up scientific discovery and knowledge creation. Finally, in the era of Big Data and Linked Open Data, data-driven intelligence enhances our ability to extract new knowledge and create value, provided that the world's scientific data sources are widely accessible and the data are linked.
- RQ1. […] Output: concept map (network analysis based on keyword co-occurrence).
- RQ2. What are the main fields of scientific knowledge (disciplines) under which the concept is studied? Output: frequency pie chart based on international standard classifications of a paper's or a journal's knowledge domain.
- RQ3. What are the key characteristics and categories of OS data? Output: descriptive taxonomy of OS data characteristics and categories (based on the paper review and the results of the keyword co-occurrence network analysis).
- RQ4. What is the relationship between OS data and Open Government (OG) data? Output: evidence of the relationship between OS data and OG data.

Approach to literature review
We carried out a mixed-method systematic literature review (MMLR) [9] to answer the above-mentioned research questions. We used both qualitative and quantitative methods to collect and analyse data and to report our findings (see details in section 2.2). The MMLR included the following main steps:
- Selecting international scientific databases and developing a search strategy.
- Literature search; screening; full-text evaluation; paper acceptance or rejection; recording each paper's RQ-relevant findings in a table; retrieval of bibliographic data of accepted papers.
- Developing a research strategy for quantitative data analysis and for reporting results (RQ1 and RQ2): methods, software, etc.
- Analysing data and integrating all quantitative and qualitative findings.

Review methods
We used the PRISMA Flow Diagram protocol [10] for the identification, screening, and inclusion/exclusion of research papers (see Figure 1). We searched three international scientific databases (Web of Science, SCOPUS, and EBSCO) for English-language papers published from January 2014 to August 2019. We used the following search terms: […]. Mendeley software was then applied to retrieve and organise bibliographic data. We identified 33 papers for review and analysis. The eligibility criteria for paper inclusion were as follows:
- The paper addresses the general conceptualisation of the OS movement.
- The paper introduces categories and characteristics of OS data.
- The paper addresses OS data within the context of government-funded R&D.
- The paper examines the relationship (if any) between OS data and OG data.
To answer RQ1, a keyword co-occurrence network analysis was conducted, using an adjacency matrix for network building and the VOSviewer science mapping tool (www.vosviewer.com) for network visualisation. Before carrying out the network analysis, we cleaned the keyword data by identifying and merging synonymous keywords. For the final network analysis and visualisation, we selected only keywords that were used two or more times.
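The keyword cleaning and network-building steps described above can be sketched in Python. This is a minimal illustration, not the authors' actual pipeline (VOSviewer performed the network construction in the study): the synonym table, the example keywords, and the function names `normalise` and `adjacency_matrix` are hypothetical. Each cell of the resulting symmetric matrix counts the papers in which two retained keywords occur together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical synonym table used during keyword cleaning: variant -> canonical form.
SYNONYMS = {
    "open access publishing": "open access",
    "reproducible research": "reproducibility",
    "citizen scientists": "citizen science",
}

def normalise(keyword: str) -> str:
    """Lower-case a keyword and replace it with its canonical synonym, if any."""
    k = keyword.strip().lower()
    return SYNONYMS.get(k, k)

def adjacency_matrix(papers, min_occurrences=2):
    """Build a symmetric keyword co-occurrence matrix.

    papers: one list of author keywords per paper.
    Only keywords occurring in at least `min_occurrences` papers are kept.
    Cell [i][j] counts the papers in which keywords i and j occur together.
    """
    cleaned = [{normalise(k) for k in kws} for kws in papers]  # merge synonyms, drop duplicates
    counts = Counter(k for kws in cleaned for k in kws)
    nodes = sorted(k for k, c in counts.items() if c >= min_occurrences)
    index = {k: i for i, k in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for kws in cleaned:
        present = sorted(index[k] for k in kws if k in index)
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return nodes, matrix

if __name__ == "__main__":
    papers = [  # hypothetical keyword lists
        ["Open Access", "Altmetrics", "Reproducibility"],
        ["open access publishing", "Altmetrics"],
        ["Reproducible research", "Citizen Science"],
        ["Citizen scientists", "Open Government"],
    ]
    nodes, matrix = adjacency_matrix(papers)
    for name, row in zip(nodes, matrix):
        print(f"{name:16} {row}")
```

In this toy input, "open government" occurs in only one paper and is therefore dropped, mirroring the two-occurrence threshold used in the study.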
To answer RQ2, we used the following procedure. First, the fields of knowledge were identified for each paper based on the classification of the paper, or of the publishing journal's fields, in the international scientific database from which the paper was retrieved. The authors' affiliations and fields of expertise were also taken into consideration. Second, the retrieved fields of knowledge were matched against the UNESCO standard nomenclature for fields of science and technology (http://skos.um.es/unesco6/view.php?l=en&alf=). From the UNESCO knowledge categories, which are divided into three hierarchical levels, we selected the most relevant 'Fields' (and 'Disciplines' where possible). Where no relevant category existed in the UNESCO nomenclature, we relied on the fields of knowledge retrieved from the international scientific databases. Lastly, frequencies were calculated using MS Excel.
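The frequency calculation for RQ2 (done in MS Excel in this study) amounts to counting field assignments and converting them to percentages. The sketch below assumes that a paper classified under several fields contributes one count to each field and that percentages are taken over all assignments; the paper does not state the denominator, and the field labels in the usage example are hypothetical.

```python
from collections import Counter

def field_distribution(paper_fields):
    """Return each field's percentage share of all field assignments.

    paper_fields maps a paper ID to the fields assigned to it; a paper
    classified under two fields contributes one count to each field.
    """
    counts = Counter(f for fields in paper_fields.values() for f in set(fields))
    total = sum(counts.values())
    return {field: round(100 * n / total, 1) for field, n in counts.most_common()}

if __name__ == "__main__":
    example = {  # hypothetical classifications, not the study's data
        "paper-1": ["Library and Information Sciences"],
        "paper-2": ["Library and Information Sciences", "Computer Sciences"],
        "paper-3": ["Psychology"],
        "paper-4": ["Computer Sciences"],
    }
    print(field_distribution(example))
```

Counting per paper instead (dividing by the number of papers) would make the shares sum to more than 100% whenever papers carry several fields, which is why the denominator choice matters.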

Open Science Taxonomy based on keyword co-occurrence
Having conducted the keyword co-occurrence network analysis, we created a concept map of the keywords (nodes) that tend to occur together in OS-related research papers (see Figure 2). The closer two nodes are in the network, the stronger the relationship between them. The size of a node represents the total strength of its links with other nodes in the network (for co-occurrence links, the total number of times the keyword co-occurs with other keywords), the so-called "Total Link Strength" metric. Figure 3 shows a graph of the […]. Each node is coloured (green, red, blue, yellow, or purple) according to its cluster: the VOSviewer tool divided all nodes into five clusters of the most closely related keywords, each represented by a particular colour. The clusters' composition is presented in Table 2. All clusters, and the keywords within them, are sorted by Total Link Strength in descending order, i.e., from the most influential cluster of OS research (Cluster #1) to the least influential (Cluster #5).
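To make the distinction between a node's number of links and its Total Link Strength concrete, the following sketch computes both from a weighted co-occurrence matrix and ranks clusters by the summed Total Link Strength of their members. Clustering itself is taken as given (VOSviewer assigns clusters with its own algorithm, which is not reproduced here), and ranking clusters by the sum of member strengths is an assumption about how the cluster order was obtained. The keywords and counts are hypothetical.

```python
def link_metrics(matrix):
    """Per node: (number of links, total link strength) from a symmetric weighted matrix."""
    return [(sum(1 for w in row if w > 0), sum(row)) for row in matrix]

def rank_clusters(nodes, matrix, clusters):
    """Order clusters, and keywords within each, by total link strength (descending).

    A cluster's score is assumed to be the sum of its members' total link strengths.
    """
    strength = {name: sum(row) for name, row in zip(nodes, matrix)}
    ranked = []
    for members in clusters:
        ordered = sorted(members, key=lambda name: strength[name], reverse=True)
        ranked.append((sum(strength[name] for name in members), ordered))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked

if __name__ == "__main__":
    nodes = ["altmetrics", "citizen science", "open access", "reproducibility"]
    matrix = [[0, 0, 2, 1], [0, 0, 0, 1], [2, 0, 0, 1], [1, 1, 1, 0]]  # toy co-occurrence counts
    print(link_metrics(matrix))
    print(rank_clusters(nodes, matrix, [["citizen science", "reproducibility"],
                                        ["altmetrics", "open access"]]))
```

In the toy matrix, "altmetrics" has two links but a Total Link Strength of 3, because it co-occurs with "open access" in two papers, while "reproducibility" has three links with the same total strength; the two metrics can therefore rank nodes differently.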

Fig. 2. Open Science concept map based on a keyword co-occurrence network analysis.
The composition of the clusters reveals the core areas of OS research (elements of OS). The first area (Cluster #1) relates to general issues of the OS movement, which aims to make the whole research process transparent in order to achieve research reproducibility and to prevent fabricated, false, and biased findings, the so-called "questionable research practices" (e.g., HARKing, p-hacking, cherry-picking, or selective omission). The second area (Cluster #2) centres on Open Access to published research results (such as journal papers), the most discussed and elaborated aspect of OS [6,7]². Open Access, which depends on research funding and publishing policies, is closely related to research impact. More precisely, Open Access is expected to improve the evaluation of research impact by increasing the visibility of research results and introducing new impact metrics, such as altmetrics.
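One reason practices such as p-hacking and selective omission produce false findings can be shown with a short calculation: if a researcher runs several independent tests on data in which no true effect exists and reports only a "significant" one, the chance of at least one false positive grows quickly with the number of tests. The sketch below assumes independent tests, all null hypotheses true, and a significance level of 0.05; it compares the closed-form probability 1 - (1 - alpha)^k with a simple simulation based on the fact that p-values are uniformly distributed under a true null. The function names are hypothetical.

```python
import random

def false_positive_probability(num_tests, alpha=0.05):
    """Exact P(at least one p < alpha) for independent tests when every null is true."""
    return 1 - (1 - alpha) ** num_tests

def simulated_false_positive_rate(num_tests, alpha=0.05, trials=20_000, seed=1):
    """Monte Carlo estimate: under a true null, each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # A "study" yields a false finding if any of its tests comes out significant.
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(false_positive_probability(k), 3), simulated_false_positive_rate(k))
```

With five independent outcomes the probability of at least one spurious "significant" result is about 23%, and with twenty it is about 64%, even though no real effect exists.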
The third element (Cluster #3) is Open Research Data generated in the research process. It combines the characteristics of open data with issues such as Personal Data Protection and Intellectual Property regulations in data management policies and data dissemination models (see section 3.3 for more details). The cluster analysis also placed studies on researchers' incentives for sharing research data in this group.
The fourth area (Cluster #4) relates to collaboration between different actors and the involvement of the general public in the research process. This emerging ICT-enabled trend in science is conceptualised as Citizen Science: "a form of open collaboration where members of the public participate in the scientific process, including identifying research questions, collecting and analyzing the data, interpreting the results, and problem solving" [11, p. 98]. Examples include the eBird (https://ebird.org) and Galaxy Zoo (www.zooniverse.org/projects/zookeeper/galaxy-zoo/) projects. The cluster analysis showed that the Citizen Science element of OS research is also closely tied to Open Government (OG).
The last element of OS research (Cluster #5) is institutionalised science policy concerned with the management of research e-infrastructures as "Commons". The "Commons" are resources (in the original concept, natural resources) jointly used and managed by a group of people, meaning that the group members neither privately appropriate nor commercially distribute them [12]. Research infrastructures include scientific equipment, knowledge-based resources, computing systems and communication networks, and other related services [13]. An example is the European Plate Observing System (EPOS), a pan-European infrastructure for Earth science (https://www.epos-ip.org). This cluster also includes studies on researchers' general perceptions of OS policy.
² Remarkably, the OS movement started among academics as a protest against the rising subscription costs of academic journals. In addition, the main aim of many governments' R&D funding policies is to provide Open Access to research project results. This is one reason why OS is sometimes mistakenly equated with Open Access alone.

Knowledge domains of Open Science
The OS movement is gaining traction across many disciplines [14]. The reviewed papers show the range of fields in which OS has been studied, spanning the sciences, the social sciences, and the humanities (see Figure 4).

Fig. 4. Fields of OS research (frequency distribution).
As the pie chart shows, the largest shares of reviewed papers come from the Library and Information Sciences (24%) and Mathematics (Computer Sciences) (21%) subject domains. Other frequently represented domains are Life Sciences; Economic Sciences; Psychology; Juridical Sciences and Law; and Social Sciences (multiple interrelated fields that study science and technology in a social context, particularly Science, Technology, and Society Studies, STS).
Our results reveal the potential of OS as a domain for interdisciplinary research. Despite differences between individual fields in research focus³, in the state of development, and in the context of OS practices, a shared praxis has started to emerge within the above-mentioned fields. By shared praxis we mean a common body of knowledge and its practical application, which might be used for interdisciplinary collaboration on the further development of the OS domain.

Open Science data characteristics and categories
OS data has the basic characteristics of open data⁴, such as those captured by the so-called "FAIR Data Principles" (findable, accessible, interoperable, and reusable) [16, p. 177]. However, OS data also has characteristics of its own, which may vary by data type. The literature identifies the following key characteristics of OS data:
• The ownership of the copyright and other intellectual property rights in OS data is a critical issue [17].
• Access to particular data may be licensed, restricted, and controlled to comply with Personal Data Protection and Intellectual Property laws and regulations [17][18][19][20].
• Risk of an erroneous interpretation of OS data, especially by non-specialists (to minimise this risk, data should be inter-linked, at least within a research project, and be accompanied by proper metadata) [17].
• OS data is highly filtered according to a shared praxis (standard scientific knowledge) [4].
• OS data is supposed to be trackable and uniquely identifiable, particularly via DOI and ORCID, which is related to research impact evaluation [21].
• OS data has intrinsic (inherent) value, and some of it is potentially commercially valuable [20].
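The requirement that OS data be trackable and uniquely identifiable via DOI and ORCID can be partly checked by software. The sketch below validates the check character of an ORCID iD, which ORCID computes with the ISO 7064 MOD 11-2 algorithm, and applies a deliberately loose syntactic pattern to a DOI (prefix "10.", a registrant code, a slash, and a suffix). The DOI pattern and the function names are simplifying assumptions: passing the DOI check does not mean that the DOI is registered or resolves.

```python
import re

ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")
# Deliberately loose DOI check (assumption): "10." + 4-9 digit registrant code + "/" + non-blank suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")

def orcid_check_character(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def orcid_is_valid(orcid: str) -> bool:
    """True if the iD has the 0000-0000-0000-000X layout and a correct check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:-1]) == digits[-1]

def doi_looks_valid(doi: str) -> bool:
    """Syntactic check only: it does not confirm that the DOI is registered or resolves."""
    return DOI_PATTERN.fullmatch(doi) is not None

if __name__ == "__main__":
    print(orcid_is_valid("0000-0002-1825-0097"))  # True (sample iD from ORCID's documentation)
    print(doi_looks_valid("10.1051/itmconf/20203301001"))  # True
```

A production system would resolve identifiers against the DOI and ORCID registries rather than rely on syntax alone; the local check is only a cheap first filter for malformed metadata.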
Based on the literature review and the keyword co-occurrence network analysis, we identified five categories of OS data [21][22][23] (see Figure 5). The most developed OS practices relate to sharing government-funded research results. These are scientific products, for example peer-reviewed published papers, patents, and research reports submitted to a funding agency, which are produced on completion of a funded research project. Many governments now oblige beneficiaries to share research results with the public and create national platforms that provide public access to all funded research results. Examples of such data platforms are the National Technical Information Service of the U.S. (classic.ntis.gov), Korea's National Technical Information Service (ntis.go.kr), the EU's CORDIS platform (cordis.europa.eu), and Japan's KAKEN public database (kaken.nii.ac.jp). For the other categories of OS data, however, there are usually no clear national policies yet. Consequently, researchers lack sufficient incentives to share non-mandatory (not legally prescribed) data. This may be due to concerns about intellectual property theft and privacy compliance, or simply a wish to prevent competing researchers from accessing the raw data. In most cases, science policy-makers, research communities, and researchers themselves are quite sceptical about these types of OS data.
The OECD defines open research data as "factual records used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings" [18, p. 228-229]. These include any "raw data" generated in the course of research, in particular experimental protocols, measurement results, observations from fieldwork, survey results, interview recordings, etc. Examples of platforms that publish research data include the EU's OpenAIRE (explore.openaire.eu/search/find/datasets), Harvard Dataverse (dataverse.harvard.edu), Korea's Research Data Platform (dataon.kisti.re.kr), and Research Data Australia (researchdata.ands.org.au).
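Platforms such as Harvard Dataverse also expose their dataset catalogues programmatically. The sketch below builds a request to the Dataverse Search API (the `/api/search` endpoint with the `q`, `type`, and `per_page` parameters) and, if called, reads dataset names and persistent identifiers from the JSON response. It is illustrative only: the response fields used (`data`, `items`, `name`, `global_id`) follow the published Dataverse Search API and should be checked against the current API guide before use, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://dataverse.harvard.edu/api/search"

def build_search_url(query: str, per_page: int = 10) -> str:
    """Return a Dataverse Search API URL restricted to datasets."""
    params = {"q": query, "type": "dataset", "per_page": per_page}
    return f"{BASE_URL}?{urlencode(params)}"

def search_datasets(query: str, per_page: int = 10):
    """Fetch matching datasets as (name, persistent identifier) pairs (needs network access)."""
    with urlopen(build_search_url(query, per_page), timeout=30) as response:
        payload = json.load(response)
    return [(item.get("name"), item.get("global_id")) for item in payload["data"]["items"]]

if __name__ == "__main__":
    print(build_search_url("open science", per_page=5))
```

Separating URL construction from the network call keeps the query logic testable offline; `search_datasets` is only exercised when a live connection is available.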
Another data type generated by OS practices is the study pre-registration plan (report), defined as "the formal documentation of the study design, methods, measures, analysis plans, and hypothesis prior to commencing the research" [22, p. 6]. Pre-registration aims to subject the planned research framework and methodology to peer review in order to avoid questionable research practices that lead to publication bias. In effect, study pre-registration changes the traditional research and publishing model by introducing an additional peer-review stage that takes place before data collection (see Figure 6). So far, however, study pre-registration has rarely been practised outside psychology and medical research such as clinical trials with human subjects. For example, in the U.S., registration of clinical trials on ClinicalTrials.gov is required by law and is a condition for publishing their results in relevant journals [24, p. 11-12]. In addition, the independent Open Science Framework platform supports study pre-registration for research across all areas of science (https://osf.io/sgrk6/).
ITM Web of Conferences 33, 01001 (2020), ICTeSSH 2020, https://doi.org/10.1051/itmconf/20203301001
Open-source research software code is another category of OS data. It is critically important for modelling and simulation research, because models and simulation runs can only be reproduced and validated with code that is otherwise accessible only in the execution environment in which it was created [21]. For example, the GitHub platform (www.github.com) provides services for maintaining, sharing, and collaborating on open research software.
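Because simulation results may be reproducible only in the environment that produced them, one common practice is to publish the random seed and a description of the execution environment alongside the code and results. The sketch below records the Python version, the platform, and selected package versions using only the standard library; the toy random-walk model and the function names are hypothetical stand-ins for a real simulation, and a full record would usually also pin dependencies (for example in a lock file or container image).

```python
import json
import platform
import random
from importlib import metadata

def capture_environment(packages=()):
    """Record the interpreter, operating system and selected package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "packages": versions,
    }

def run_simulation(seed, steps=1000):
    """Hypothetical toy model: a seeded random walk, so the same seed gives the same result."""
    rng = random.Random(seed)
    position = 0
    for _ in range(steps):
        position += rng.choice((-1, 1))
    return position

def reproducible_run(seed, packages=()):
    """Bundle a result with the seed and environment needed to re-run it."""
    return {
        "seed": seed,
        "result": run_simulation(seed),
        "environment": capture_environment(packages),
    }

if __name__ == "__main__":
    print(json.dumps(reproducible_run(42), indent=2))
```

Publishing the resulting JSON next to the code (for instance in the same GitHub repository) lets others check whether a differing result stems from the model or from the environment.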
Finally, posts and comments on scientific network platforms can also be valuable OS content, worth analysing to extract new knowledge patterns and solve unconventional scientific tasks. These platforms include science blogs (e.g. www.gowers.wordpress.com, www.terrytao.wordpress.com, www.theness.com/neurologicablog), academic and professional social networks (e.g. ResearchGate, LinkedIn), and citizen science platforms (e.g. https://talk.galaxyzoo.org).
According to the National Academic Research and Collaboration Information System (NARCIS) of the Netherlands, a portal offering access to a wide range of scientific information (www.narcis.nl), OS data can be provided to the public in different access modes: fully open, restricted, embargoed, or closed [25, p. 93]. Fully Open Access means that OS data is available on the web to anyone without any restrictions, such as paywalls or a registration/authorisation process. In most cases, however, there is no fully open access to OS data: at minimum, a registration/authorisation procedure is typically required. An example of fully open access is read access to many science blogs, such as those mentioned above.
There are different forms of restricted access to OS data, including a prescribed registration/authorisation process; different access levels to OS content depending on the type of user; availability of only partial information (e.g. metadata); paywalls; and restricted computational access for analysing anonymised data in a virtual computing environment. An example of restricted computational access is the cloud data analysis service of Korea's National Technical Information Service (NTIS). It enables users to view and analyse anonymised raw data related to national R&D projects under controlled circumstances (e.g. an application is required to obtain access, access is granted for a limited period, or only the results of the analysis, not the raw datasets, may be downloaded).
The embargoed access mode entails a delay (typically 6-24 months) before the OS data is made open; during the embargo period, only the data owners can access and use the data. Lastly, closed access means that OS data is never available to third parties. For example, research data in the previous version of Korea's Research Data Platform were closed to the public and available only to researchers affiliated with particular research institutes.
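The four NARCIS access modes can be read as a simple access policy. The sketch below encodes them as an enumeration and decides whether a given user may access a dataset. The rules are simplifying assumptions, not NARCIS's actual implementation: restricted data is open to owners and authorised users, embargoed data becomes fully open once the embargo ends, and the embargo end date is computed by calendar-month arithmetic.

```python
import calendar
from datetime import date
from enum import Enum
from typing import Optional

class AccessMode(Enum):
    OPEN = "open"
    RESTRICTED = "restricted"
    EMBARGOED = "embargoed"
    CLOSED = "closed"

def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of shorter months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def can_access(mode: AccessMode, today: date, *, is_owner: bool = False,
               is_authorised: bool = False, deposited: Optional[date] = None,
               embargo_months: int = 0) -> bool:
    """Decide whether a user may access OS data under the given access mode."""
    if is_owner:
        return True  # data owners can always use their own data
    if mode is AccessMode.OPEN:
        return True
    if mode is AccessMode.RESTRICTED:
        return is_authorised  # e.g. after registration/authorisation
    if mode is AccessMode.EMBARGOED:
        if deposited is None:
            raise ValueError("an embargoed dataset needs a deposit date")
        return today >= add_months(deposited, embargo_months)  # open once the embargo ends
    return False  # CLOSED: never available to third parties

if __name__ == "__main__":
    deposit = date(2019, 8, 31)
    print(add_months(deposit, 6))
    print(can_access(AccessMode.EMBARGOED, date(2020, 3, 1), deposited=deposit, embargo_months=6))
```

Clamping to the last day of the month means that a six-month embargo starting on 31 August ends on the last day of February rather than overflowing into March.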
However, the criteria for determining which data should be kept secure and which should be open, and the appropriate degree of openness of particular OS data, are not yet well developed and need further discussion [26, p. 113].

The relationship between Open Science data and Open Government data
Our keyword co-occurrence network analysis provides empirical evidence of a relationship between the OS and OG domains. In particular, we found a close relationship between Citizen Science and Open Government within OS-related research (see Figure 2). This relationship can be explained by principles that the OG and OS movements share, such as transparency, citizen participation (engagement), and multi-stakeholder collaboration in policy-making and science. For example, in the U.S., both data sharing policies for federally funded scientific research and multiple space-focused Citizen Science projects were developed directly in line with the Obama Administration's OG Directive (2009) [11,27]. Moreover, as some studies point out, OS data from government-funded R&D is very similar to public sector information (PSI / OG data), since both categories of data are created using taxpayers' money. Consequently, the PSI dissemination model can be partially applied to OS data as well [18, p. 236]. At the same time, despite these similarities, OG data and OS data have distinct natures and different management policies mandated by separate laws. The main differences between these two categories of data are depicted in Figure 7.

Fig. 7. Relationship between OS data and OG data.
As Figure 7 shows, OG and OS data have different producers (owners): OS data is generated by researchers themselves, whereas OG data is produced by government officials. A small part of unclassified/unrestricted OS data serves as a primary source for OG data and can appear in OG public repositories and portals in processed form, as OG data related to Science. Examples of the latter are R&D statistics (e.g., expenditure), trends and analytic information related to Science and Technology (S&T), information about national research facilities, announcements of scientific exhibitions, researchers' public profiles, etc. In contrast to OS data, OG data related to Science should be accessible to and usable by anyone without any restrictions.
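The flow described for Figure 7, in which only unclassified/unrestricted OS data reaches OG portals in processed form, can be illustrated with a small aggregation. The record structure, field names, unit, and expenditure figures in the sketch below are hypothetical; it shows only the filtering-then-aggregation step that turns project-level OS records into OG-style R&D statistics.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ProjectRecord:
    """Hypothetical OS-side record of one government-funded R&D project."""
    project_id: str
    field: str
    expenditure: float  # unit is an assumption, e.g. millions of a national currency
    restricted: bool    # True if the record is classified or otherwise restricted

def rd_expenditure_by_field(records):
    """Aggregate unrestricted project records into OG-style R&D statistics by field."""
    totals = defaultdict(float)
    for record in records:
        if record.restricted:
            continue  # restricted OS data stays out of the OG portal
        totals[record.field] += record.expenditure
    return dict(sorted(totals.items()))

if __name__ == "__main__":
    records = [  # hypothetical records
        ProjectRecord("P1", "Life Sciences", 120.0, restricted=False),
        ProjectRecord("P2", "Life Sciences", 80.0, restricted=False),
        ProjectRecord("P3", "Computer Sciences", 50.0, restricted=True),
        ProjectRecord("P4", "Computer Sciences", 30.0, restricted=False),
    ]
    print(rd_expenditure_by_field(records))
```

Only the aggregated figures leave the OS side; the project-level records, including the restricted one, are not published, which matches the distinction between OS data and OG data related to Science.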
The close relationship between OG and OS data implies an opportunity to link them more effectively using ICT, in order to facilitate government decisions and policies driven by scientific data, scientific discoveries, and innovative public services [28].

Conclusions and future research
The results of this research show that OS has been studied in a wide variety of knowledge areas, among which Library & Information Sciences and Computer Sciences dominate. Our keyword co-occurrence network analysis identified five clusters of interrelated concepts within OS research (the basic issues of OS theory related to research reproducibility; Open Access to published research results; Open Research Data; Citizen Science; and policy for the implementation and management of research e-infrastructures as commons). Our study has demonstrated the potential of science mapping as a tool for uncovering the conceptual aspects of a rapidly developing research field.
We have identified the distinctive characteristics and the diverse categories of OS data. These categories cover open R&D results (outcomes), open research data, study pre-registration plans/reports, open research software code, and posts and comments on scientific network platforms. In addition, our research has shown how OS and OG data can possibly be linked together. To our knowledge, this is the first systematic analysis of the content of articles on this subject. We argue that the presented results can provide genuine insights both for policy-makers developing adequate OS data sharing policies and practices, and for individual researchers planning future studies in this still-developing field.
We are aware of the limitations of our research. We examined only a sample of the relevant literature: the search terms we used were limited owing to the vagueness of the very concept of OS, only a limited number of English-language papers were selected, and the time period covered was limited. There was also, inevitably, a degree of subjectivity in identifying the papers' keywords and merging them as synonyms, as well as in identifying the papers' fields of knowledge. Further work is needed to analyse a greater range of sources, including non-English-language items, over a longer time period.
Based on our results, we suggest the following directions for future OS research: to investigate more deeply the interdisciplinary potential of OS for collaborative research (i.e., to identify co-occurrence patterns between independent disciplines by conducting a network analysis on a larger set of papers); to develop a conceptual model of an ideal national OS e-infrastructure enabling the sharing and reuse of diverse categories of OS data by different users (including the general public); and to investigate opportunities for OS and OG data convergence in order to design and provide better services to both scholars and citizens.
This research was supported by the Korea Institute of Science and Technology Information (KISTI).