Beyond Patent Analytics: Insights From A Scientific and Technological Data Mashup Based On A Case Example

World Patent Information 55 (2018) 61–77
Alessandro Comai
International University of Japan, Japan
ARTICLE INFO

Keywords:
Data mashup
Network analysis
Visual analytics
Patent analysis
Scientific paper analysis
News article analysis

ABSTRACT

Although access to open and commercial digital sources is easily available thanks to the proliferation of the internet, R&D departments still face the challenge of how to analyze information from several sources. This paper addresses this issue specifically when technological and scientific information needs to be analyzed in an integrated manner. 12,577 families of patents, 2601 scientific papers and 706 news articles are combined, normalized and analyzed using their own metadata and text. A software tool is used to extract insights from semi-structured and unstructured data by means of text mining. Additionally, interactive force-directed graph visualization is employed to show the multiple relations of concepts during different time periods with regard to the entire technology ecosystem. Through a case study of 3D printing technology, this paper shows how to apply mashup and obtain the benefits, and it defines the challenges of using interactive visualization representation.
https://doi.org/10.1016/j.wpi.2018.10.002
Received 27 February 2018; Received in revised form 12 October 2018; Accepted 14 October 2018
0172-2190/ © 2018 Elsevier Ltd. All rights reserved.

1. Introduction

Intellectual property (IP) software has been around for quite some time [1–3]. These software tools are used by R&D analysts and managers, as well as other professional bodies such as intellectual property experts, technology transfer institutions and universities, in order to analyze the “state of the art” of a particular technology. Although patents are utilized as a main source for measuring the level of technology innovation [4], they do not represent the whole technology ecosystem. Patent analysis can be enriched with other information sources, such as scientific papers, technical reports and technology-oriented news. Furthermore, sources unrelated to science, such as social media, can also play an important role in understanding how technological information is disseminated today and how web sources can significantly enrich the analysis of technology [5].

Although inbound information flow is beneficial for the innovation process [6], R&D managers have to face several key challenges with respect to external knowledge acquisition: the sourcing and processing of the sources. The great number and variety of sources makes it difficult to aggregate data in a normalized manner. Additionally, unstructured information, such as technological reports and news, and semi-structured information, such as patents, pose an additional barrier for R&D managers when extracting and interpreting the data. These issues become more relevant when the number of documents to be analyzed increases significantly. In these circumstances, mashup shows itself to be a promising solution in many fields.

To date, substantial efforts have been invested in this area of research and several solutions have been proposed in recent years [7]. The shift towards multi-database-based analysis and networked research has created a new opportunity in social science. However, many challenges remain to be met. The review of patent analysis software tools made by Abbas et al. [8] shows that text mining is a dominant tool in contrast to visualization. Nevertheless, both techniques “generate charts and insights that are hidden or not easy to identify by conventional manual analysis” [3] which can be applied to answer specific questions frequently raised by IP professionals [9].

To address these issues, we have organized the paper in four main sections. The first section reviews the current literature in the different domains. In the second section, we present a methodology which includes the process for capturing data and combining it into one data source. We also outline how data is analyzed in the network graph. The third section discusses the results and how to obtain insights from the data and the visualization. The fourth and final section of the paper is devoted to discussion and a conclusion.

2. Literature review

This literature review has six sections. Firstly, the most recent techniques for extracting knowledge from structured to unstructured datasets are described. Secondly, we explore the data aggregation process also known as “mashup” applied to IP. Thirdly, we highlight the role of bibliometric techniques not only developed in scientific
literature but also in other data sources. Fourthly, a description of the benefit of visualization is provided. Fifthly, some principles and applications of network theory in the technology field are discussed. Finally, two research questions are presented: the first addresses the key benefits of mashup datasets and the second the visualization options.

2.1. Extraction methods and taxonomies

One of the main challenges analysts and researchers need to deal with is how to extract insights and intelligence from unstructured data quickly and conveniently. Extracting data refers to “the task of automatically extracting structured information from unstructured/semi-structured machine-readable documents” [10].

So far, many techniques have been proposed as potential solutions. Examples include statistical methods of analyzing metadata [5,11], text mining [12–14], computational linguistics and natural language processing [3,10,15,16], and semantic keyword extraction [17,18]. These methods range from common statistical techniques to artificial intelligence, when advanced algorithms are used for analysis of the data [8,19].

The vast majority of the techniques described above serve to extract terms, keywords and metadata. In patents, for example, extracting keywords from either the full text or the patent claims [15] may be very beneficial for creating taxonomies [10,20], as well as ontologies [21]. Open source tools, such as Wikipedia Miner, prove to be very useful in extracting and comparing keywords from social media sources such as Wikipedia, among others [22].

Specifically, taxonomies collate terms in groups in a hierarchical tree for the purpose of gaining a better understanding of concepts related to a specific domain [23]. Both taxonomies and ontologies provide a conceptual representation of knowledge of a specific technology domain. However, the section of the document to be used for extracting keywords is important, as this is dependent on the objective of the analysis [24]. In this way, keywords and metadata can finally be used for displaying clusters and graphing semantic networks [18,25].

2.2. Mashup

Combining technical and scientific datasets enriches the interpretation of the information with additional detail. Abraham and Moitra [26] stated that patent analysis is not limited to technology, but patent statistics can show economic development. Moreover, according to Nelson [27], utilizing multiple sources reduces potential biases in the analysis of data retrieved from a single source.

When a technology is being studied using different sources, the options are limited to two alternatives. On the one hand, the contents are studied independently and separate conclusions are drawn. This is perhaps the most common practice applied when studying two or more topics [28–30]. On the other hand, the information is merged together. This option is the most desirable and undoubtedly the most advantageous – for example, reducing potentially biased results when using a single source [27] – but it requires careful data processing.

The process of grouping and normalizing several datasets is known as mashup (also mash-up). The term mashup is also often used to refer to the blending of heterogeneous digital data [7]. Merging scientific and technology domains from heterogeneous as well as homogenous information sources may provide a number of positive outcomes. It helps to discover or identify new information in real time [7] and keep abreast of dynamic business [31], brokers, gatekeepers, coordinators [32,33], linkages between university and industry technology [33] and intermediaries [34], among others.

To accomplish the task of mashing up information, especially when using non-patent literature (NPL), several processes are required. Abbas et al. [8] suggested three main phases for processing the data: pre-processing, processing and post-processing. Firstly, data needs to be collected and transformed. Secondly, data needs to be structured. Finally, information will be analyzed using a variety of visualization techniques in the post-processing phase.

Although merging tech and social type information can reveal new insights [31], there are still some constraints. According to Järvenpääa et al. [35], comparing different sets of source-based data on specific technologies reveals that different sources do not always play an important role in showing how a technology evolves over time.

2.3. Bibliometrics

A large body of research papers has used bibliometric techniques for studying patents, scientific papers and articles [29,36], defined as NPL [30,32,37]. Bibliometrics is defined as a “quantitative analysis of academic publishing” [38]. On the other hand, White and McCain [39] define bibliometrics as “the quantitative study of literatures as they are reflected in bibliographies. Its task, immodestly enough, is to provide evolutionary models of science, technology, and scholarship.” Although bibliometrics can also be applied to other domains, such as the web, for example [38], this paper focuses primarily on technological data. Bibliometrics are used for measuring research outputs [40] or counting the number of publications and the number of citations that the publications receive [41–43].

To do so, bibliometric studies tend to use software tools for producing automated analyses of large corpora of data [18]. The number of tools, as well as features, has increased significantly in the last decade [1], but most of the current software is focused on extracting information from data or metadata using statistics-based visualization, rather than text mining techniques [2]. Even if tools have started adding functionalities capable of extracting meaning from text [2] or graphing data using compelling visualization techniques, there is still room for improvement.

2.4. Visualization

The large amount of text available through digital media has increased the need for deciphering the contents and accelerating the data interpretation process. According to Kochtchi et al. [16], “text visualization attempts to turn large text corpora into more accessible visual representations”. Over the course of decades, visualizations have played an important role in enabling decision-makers to quickly identify information of interest [44], facilitating the knowledge acquisition process [45–47], digesting considerable amounts of data [48] and overcoming information overload [49].

More recently, new visualization representations have been developed and rapidly adopted to explore data quickly [50] or for data triangulation [51]. Among visualization models, there are visualization tools that have increased their value. Interactive visual data exploration allows “the human to get insight into the data, draw conclusions, and directly interact with the data” [52]. In order to achieve these benefits, the process can be very structured and follows three main steps, namely extracting, converting and generating visual representations [53].

2.5. Network graph

According to Newman [54], a network consists of “vertices” or “nodes” connected by “edges”. It includes many properties, such as the size of the edges, degree of nodes, number of discrete nodes or edges, strength or weight of nodes or edges, node clustering, betweenness centrality of nodes, edge direction, color of node or edge, and network diameter [54]. Based on the principles of network and visualization theory, network graphs are used for showing multiple connections between concepts, such as topics, keywords or metadata, or between key stakeholders, such as institutions, firms, universities, inventors, owners or people [3,16,42,55]. Nodes can also represent the relationship between documents themselves based on technology domains [56] or semantic similarity [10,17,57].
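To make two of the network properties listed above concrete, the following minimal Python sketch computes node degree and node strength (the sum of attached edge weights) from a weighted edge list. The edge list and node names are invented for illustration and are not taken from the study's data.

```python
from collections import defaultdict

# Hypothetical weighted, undirected edge list: (node_a, node_b, weight).
edges = [
    ("inventor A", "firm X", 3),
    ("inventor A", "keyword: polymer", 2),
    ("firm X", "keyword: polymer", 1),
    ("inventor B", "firm X", 1),
]


def node_properties(edge_list):
    """Return (degree, strength) dictionaries for an undirected weighted graph."""
    degree = defaultdict(int)    # number of edges attached to each node
    strength = defaultdict(int)  # sum of the weights of those edges
    for node_a, node_b, weight in edge_list:
        for node in (node_a, node_b):
            degree[node] += 1
            strength[node] += weight
    return dict(degree), dict(strength)


degree, strength = node_properties(edges)
```

In this toy network "firm X" has the highest degree because it is attached to three edges, while strength also reflects how heavily each connection is weighted.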
Network graphs are built on several streams of literature: Information visualization [58], Visual analytics [59], Interactive visual analysis [60] and Visual network analysis [61]. Network theory has been used in computer science, social sciences [62] and in scientometrics for many purposes. For example, in order to show the number of citations between documents [63]; to study a firm's connections, such as mergers, acquisitions and strategic alliances [64]; and to study co-inventions [3,65] or technology breakthroughs [25]. Finally, and in perhaps the most popular context, online social networks or social media [66] are likely to utilize network graphs to display an entire ecosystem around a specific conversation [67].

According to Sternitzke et al. [42], the spread of social networks has encouraged IP software to adopt new methods of analysis, which has resulted in an increasing use of networks to visualize large amounts of information [2,3,19,55,68]. Perhaps one of the reasons that network graphs are so popular in science and technology fields is that they can show many dimensions at once, rather than just two dimensions [42]. However, visual analytics of networks driven by data represents a novel process for studying technology ecosystems [53].

2.6. Concluding remarks

Despite the value of the different research streams, a review of the literature shows that there are still some limitations. Several points need to be taken into consideration. Firstly, the integration of different datasets has mainly been focused on social media and little has been carried out in the technological field. New techniques for analyzing patents and articles together are needed [8]. Secondly, although visualizations and networks have been widely used in patent analysis software, approaches continue to focus on a single source of information rather than aggregating a great variety of technically oriented sources. Thirdly, when searching large volumes of data which result in large ecosystems [25], there is a technological challenge with respect to visualizing all the data. Fekete and Plaisant [50] addressed several visualization limitations which are still valid when using web-based tools. Interactive Scalable Vector Graphics (SVG) or canvas-based visualizations, which are rendered in the user's personal computer, require good computational resources and time for the process. In contrast, some network graphs (see: http://www.mapofscience.com/) use a pre-defined calculation of the position of the nodes in the server to render the graph quickly, but the vast majority of them are static.

Nevertheless, limited technical capability makes the analysis of a large amount of data more challenging [69]. Finally, visualization of networks needs to take into consideration the aesthetics of the node allocation in the layout, which makes the graph easy or difficult to interpret. In this context, Sugiyama et al. [70] suggested several rules to improve the readability of a graph.

Based on these conclusions, in our opinion it is important to explore how to integrate the data and show at least one solution that could be considered feasible. Thus, two key questions are suggested as follows:

Q1. Does a mashup dataset provide a significant contribution enriching data analysis?
Q2. Which options can be adopted to meet the technological challenges of analyzing large volumes of data?

The following section explains how we approach the integration and analysis of technology information.

3. Methodology

The goal of this paper is to present a case concerned with how to visualize a technology ecosystem based on a combination of different sources. We have combined Patents, Scientific Literature and Technological News.

To do so, we used 3D printing technology as the main subject for the research. We selected this key topic because a growing number of publications have been devoted to 3D printing in recent years [71]. Additionally, 3D printing is a fast growing technology with a significant impact on business, according to Rayna and Striukova [72]. The timeframe for this study, 2008–2017, was selected for several reasons. Firstly, we wanted to make sure that significant data among the data sources was retrieved. For patents, we used the priority date instead of the publication date, because this date is the closest date to the invention. Secondly, we wanted to identify key time events to see whether or not the different sources were interacting across the stages, as suggested by Brenner [73]. Finally, we wanted to ensure that our timeframes across datasets were aligned.

In order to process and analyze the data, we used Mira Analytics (http://www.mira-analytics.com/), an analytical platform developed with the Hypertext Preprocessor (PHP) scripting language. The software is able to process different unstructured and semi-structured data from a variety of sources, visualize it, combine sources into one, utilize a taxonomy for retrieving key data, and finally, graph nodes and relationships between concepts using a force-directed layout.

The process for retrieving, cleaning, combining and visualizing data consisted of five main steps (Fig. 1), which are described in the following sections.

3.1. Data collection

We used three different commercial databases – Patbase, Scopus and Proquest – for retrieving patent, scientific and technology information respectively. However, before describing which data query we used for each database, a few observations need to be made.

Firstly, patent and scientific literature, unlike data from other open sources, such as articles, blogs, social media or web pages, is not easily available from the web. We decided to use semi-structured databases to avoid crawling and scraping data from open sources [74,75]. Secondly, although there are other databases on the market that can substitute the ones we used, we chose these databases because they are very popular among scholars and IP professionals. Thirdly, although commercial data provides metadata which unstructured data would not, there are still some restrictions. Specifically, Scopus does not provide all the metadata that is available in its database in batch download. Fourthly, commercial databases do not provide data cleaning, which is unfortunate, and therefore we needed to apply harmonization between authors, for example.

To collect data, we applied a query formed by “3D printing” or “three-dimensional printing” [56], limiting the records to the years 2008 to 2017 (for patents, we limited the search query to the priority date, PRD). Although it is not clear when a breakthrough technology is visible or successfully introduced in the market [76], we used a ten-year period of interest, because it concentrates the great majority of the volume (94.07% for patents, 92.26% for scientific literature and 92.65% for trade articles). A specific query was used for each database. For this paper, collection was performed as follows:

• 12,577 families of patents were found in the Patbase database (please see: https://minesoft.com/our-products/patbase/). The search strategy applied for retrieving the corpus was (TA=("3D printing" or "three-dimensional printing") and PD = 2008:2017) limited to the title and abstract field. The data collected from each record included the following fields:
○ Title
○ Abstract
○ Patent assignee (standard)
○ Inventor (standard)
○ Earliest Priority Date
○ Int. class (IC)
○ 1st main claim
• 2601 papers (scholarly articles) and conference proceedings were extracted from the Scopus database using the following query ("3D Printing" OR "three-dimensional printing") limited to the title and abstract field and limited to Journals and Conference proceedings. The data collected from each record included the following fields:
○ Title of the scientific paper
○ Abstract of the scientific paper
○ Publishing date
○ Author name(s)
○ Affiliation
○ Name of the source
• 706 trade journal (magazine) articles were extracted from the Proquest Central database using the query (ti(("3D printing" OR "three-dimensional printing")) OR ab(("3D printing" OR "three-dimensional printing")) filtered by date (2008–2017) and document type (Article) excluding Scholarly Journals, Conference Papers & Proceedings and Books. The data collected from each record included the following fields:
○ Title of the article
○ Publishing date
○ Author name(s)
○ Companies
○ Source
○ Summary of the article
○ Full text of the article
○ Keywords

To retrieve both scholarly and trade journal articles, we limited our query to the title and abstract only (avoiding the use of keywords) in order to keep the search strategy as close as possible to that used for patents. Table 1 below shows the differences and similarities between the fields available in each database.

Table 1
Export options for each database supplier.
Field | Patent (Patbase) | Papers (Scopus) | Articles (Proquest)
Title | yes | yes | yes
Summary | yes | yes | yes
Key words | – | – | yes
Main Claims | yes | – | –
Int. class (IC)* | yes | – | –
Priority Date | yes | – | –
Publication Date | yes | Partially (publication)** | yes (publication)
Inventor/s or Author/s | yes | yes | yes
Applicant/s | yes | – | Company
Institution/s | – | yes | Companies
Source | – | yes | yes
Link | yes | yes | yes
(*) For the purpose of this study we did not use the IPC as mandatory in the software. Although Patbase includes other metadata that can be used in Patent analysis, we only used those that are comparable with the metadata of the other databases.
(**) To be precise, the date of an article should refer to its submission date. Similarly, like a patent, the submission date represents the closest date to when the topic was discussed by the author/s. It is partial, since Scopus does not provide the month when downloading the papers, only the year of the paper.

Fig. 1. Five-step procedure of this research.

3.2. Uploading and data mashup

To upload the data to the tool, we implemented two parsing tools which allowed us to upload the data from the files that were available in the commercial databases for download. A CSV file format was used for the Scopus and the Proquest databases, and for Patbase we used an XML file for uploading patents. Data was stored in MongoDB, an open source NoSQL database. This document-based database was used in order not to limit the data volume. The uploading process was performed manually, once each file had been downloaded from each data source.

Based on Table 1, it is possible to observe that data is not homogeneous and not all the fields can be compared directly. To combine the three datasets, first we standardized the temporal data by using the patent priority date instead of the publication date, since the date of the invention is closer to the priority date than the publication date. Secondly, we merged the authors of a paper with the inventors of a patent, as suggested by Breschi and Catalini [32]. The owner is kept as a separate entity from the inventors, although in patents inventors and applicants could be used together. Finally, titles and abstracts are used together as text fields in order to extract concepts for all datasets. Table 2 shows the result of this process.

Table 2
Conversion table between Patbase, Scopus and Proquest datasets.
Field | Type | Patent (Patbase) | Papers (Scopus) | News Articles (Proquest)
Title | Text field | Yes | Yes | Yes
Summary or Abstract | Text field | Yes | Yes | Yes
Main Claims | Text field | Yes | – | –
Date | Date | Yes (priority) | Yes (Publication) | Yes (publication)
Inventors | Metadata | Yes (Inventors) | Yes (Author/s) | Yes (author/s)
Organization | Metadata | Yes (Applicants) | Yes (Institutions) | Yes (Companies)
Journal or Magazine Name | Metadata | – | Yes (Source) | Yes
Link | Metadata | Yes | Yes | Yes

3.3. Data cleaning and harmonization

As discussed earlier, one of the key challenges when combining datasets is to avoid having metadata, such as names of authors, entities or concepts, written in different ways. If a harmonization process is not applied or used correctly, then the entire work might result in erroneous analysis and visualization. Thus this activity is critical in any patent or NPL database, particularly in the metadata provided by several databases.
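As an illustration of the field conversion in Table 2 and of the name variants that harmonization has to catch, the following Python sketch maps one exported row onto a shared schema and derives a comparison key for personal names. It is only a sketch under stated assumptions: the column names in `FIELD_MAP` are hypothetical (the actual export headers are not given in the paper), and `name_key` is an automated aid for surfacing candidate duplicates, not the manual review the study applied.

```python
import re
import unicodedata

# Hypothetical export column names; the real Patbase, Scopus and Proquest
# headers are not given in the paper.
FIELD_MAP = {
    "patbase": {"title": "Title", "summary": "Abstract", "claims": "1st main claim",
                "date": "Earliest Priority Date", "people": "Inventor",
                "organizations": "Patent assignee"},
    "scopus": {"title": "Title", "summary": "Abstract", "date": "Publishing date",
               "people": "Author name(s)", "organizations": "Affiliation",
               "source": "Name of the source"},
    "proquest": {"title": "Title", "summary": "Summary", "date": "Publishing date",
                 "people": "Author name(s)", "organizations": "Companies",
                 "source": "Source"},
}
COMMON_FIELDS = ("title", "summary", "claims", "date",
                 "people", "organizations", "source")


def to_common_record(raw, database):
    """Map one exported row (a dict) onto the shared schema of Table 2."""
    mapping = FIELD_MAP[database]
    record = {"database": database}
    for field in COMMON_FIELDS:
        # Fields a database does not export (e.g. claims for papers) become None.
        record[field] = raw.get(mapping[field]) if field in mapping else None
    return record


def name_key(name):
    """Comparison key: drop accents, case and punctuation, then sort the tokens."""
    ascii_name = (unicodedata.normalize("NFKD", name)
                  .encode("ascii", "ignore").decode("ascii"))
    tokens = re.findall(r"[a-z]+", ascii_name.lower())
    return " ".join(sorted(tokens))


patent = to_common_record({"Title": "Printer head", "Inventor": "Doe, J."}, "patbase")
same_person = name_key("Comai, Alessandro") == name_key("Alessandro COMAI")
```

Two spellings that share a key are only candidates for merging; abbreviated first names or homonyms would still need a manual check.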
Table 4
Frequency of Categories based on the threshold used in each dataset (number of: Papers, Patent and News Articles). Columns 1–4 refer to Papers, columns 5–8 to Patent (a) and columns 9–12 to News Articles.
# Papers | # Authors | # Universities | # Keywords | # Patent | # Inventors | # Applicants | # Keywords | # Articles | # Authors | # Organizations | # Keywords
1 | 4973 | 1088 | 125 | 1 | 1047 | 954 | 95 | 1 | 729 | 142 | 91
2 | 826 | 198 | 114 | 2 | 94 | 540 | 76 | 2 | | |
7 | 90 | 33 | 106 | 3 | 38 | 91 | 66 | 5 | 9 | 2 | 55
13 | 18 | 9 | 98 | 4 | 17 | 61 | 63 | 9 | 2 | 0 | 40
18 | 6 | 3 | 91 | 6 | 4 | 31 | 53 | 13 | 2 | 0 | 31
19 | 6 | 2 | 90 | 7 | 0 | 16 | 47 | 17 | 2 | 0 | 26
25 | 0 | 0 | 78 | 9 | 0 | 6 | 40 | 29 | 1 | 0 | 26
– | – | – | – | 10 | 0 | 10 | 33 | – | – | – | –
– | – | – | – | 11 | 0 | 0 | 33 | – | – | – | –
(a) Based on the first five hundred most recent families.

Fig. 2. Tool for building Taxonomy.

Although there are several computer-based functions for identifying similarities between metadata [10], we applied a manual process to harmonize records, focused on personal names and entities in each database. For example, to detect the same author in several news articles, we organized a spreadsheet sorting the author's column and then manually reviewed all the authors' names. On the other hand, for patents, we drew up a list of the main authors ordered according to the number of patents invented, using the Patbase classification list. We then compared these listings with the other two datasets. Although the process can take a long time, its implementation is feasible when removing authors who only have one document. For example, Table 4 shows how quickly the number of documents is reduced by adopting this threshold. Other mechanical cleaning techniques could have been considered, but as these are not supported by the tool used, we opted for a manual process. Additionally, the date also needs to be normalized and converted, for instance from the MM/DD/YYYY to the DD/MM/YYYY format. Additionally, for those databases that provide only one export option, for instance based on Excel files, conversion to the CSV (Comma Separated Value) format is necessary in order to upload the file to the software. To do so, an additional cleaning process is required in order to eliminate some punctuation marks such as commas or semicolons which could lead to an error during the interpretation of the CSV file in the visualization software.
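The cleaning steps described above can be sketched in a few lines of Python. This is an illustration under assumptions rather than the procedure the study followed: the record layout (a `people` list per record) and the function names are hypothetical, and the CSV step quotes every field with the standard `csv` module instead of deleting commas and semicolons, which is a different technique that avoids the same parsing errors.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def convert_date(value, source_format="%m/%d/%Y", target_format="%d/%m/%Y"):
    """Convert a date string, by default from MM/DD/YYYY to DD/MM/YYYY."""
    return datetime.strptime(value, source_format).strftime(target_format)


def frequent_names(records, min_documents=2):
    """Keep only the names that appear in at least `min_documents` records."""
    counts = Counter(name for record in records for name in set(record["people"]))
    return {name: n for name, n in counts.items() if n >= min_documents}


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text; quoting protects embedded commas and semicolons."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


converted = convert_date("10/14/2018")
kept = frequent_names([{"people": ["A", "B"]}, {"people": ["A"]}])
text = to_csv([{"title": "3D printing, a review; part 1"}], ["title"])
round_trip = list(csv.DictReader(io.StringIO(text)))[0]["title"]
```

Raising `min_documents` reproduces the kind of threshold shown in Table 4: the higher the threshold, the fewer names remain to be reviewed by hand.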
Table 3 The taxonomy is constructed of a hierarchical list of keywords

Parsing fields. formed by categories. Each category groups a list of key word, which in
Field Parsing category turn group a list of similar words or a synonymous term. Therefore, a
taxonomy can be constructed with a maximum of three levels of terms.
Title text However, although all keywords are used for analyzing the text, only
Summary text
two of those that are at the second level are visible in the network
Main Claims text (only for patent)
Date date
graph. Fig. 2 shows a screenshot of the tool used for building the tax-
Owners metadata onomy.
Authors metadata In order to build the taxonomy, we used 144 keywords suggested by
Applicants metadata the Scopus database, each composed of a single word as the starting
Companies metadata
point. Specifically, Scopus shows a list of keywords after searching in
Universities metadata
Magazine or Journal name metadata the database for tailoring, filtering and identifying a wider research list
Link link of papers. The list is developed by using the “indexed keywords” that
the publisher produces for each individual paper. However, some re-
dundant words relating to 3D printing, such as “Printing”, “3D Printer”,
3.4. Taxonomy and text mining “3-D Printing”, “3D Printing” and “Three Dimensional Printing” were
removed from the list, since these were the words and alternatives used
Once the dataset has been created, an extraction method needs to be in the search strategies. Additionally, ten identical words written in
applied to identify keywords and metadata from the datasets. For this singular and plural forms were blended together. The final list consists
study we applied a rule-based extraction method [77] similar to text of 135 keywords grouped under one category title: “concepts”.
mining [14], but using a pre-defined list of keywords or tokens (Tax- The goal is to find the occurrence of metadata and keywords in
onomy). Additionally, metadata was identified and extracted from each unstructured text and then represent the results in a network graph. The
dataset to enrich the analysis of the dataset(s). calculation is made by the software which uses a batch script developed
Fig. 3. Patents 3D Printing Ecosystem (limited to first 500 patents).
66
Fig. 4. Patents 3D Printing Sub-Ecosystems (limited to first 500 patents).
in Node.js.5 Metadata is calculated simply, but in the case of keywords, many times the metadata is repeated in each record. If the data is
we text mine them from the title and the abstract for each record. For patents, however, we also text mine the main claims, as they are widely used in patent analysis [24]. The resulting analysis is stored in a JSON (JavaScript Object Notation) file format, which is then used for the visualization. The date and link to the original document are maintained in their original format (see Table 3).

3.5. Analyzing and visualization

Analysis of the mashup dataset was mainly performed by the D3.js tool, which is also responsible for drawing the network graph. Based on the JSON document provided in the previous step, the JavaScript code calculates frequency and co-occurrence using the following rules:

• Node definition: a node can either be a metadata item or a keyword.
• Node color: the node color depends on the colors assigned to a metadata item or a category of keyword in the taxonomy. The color of a keyword group can be custom assigned and is entirely subjective. However, some metadata colors, such as those for authors or applicants, are hardcoded.
• Node size: for determining the size of a node we applied two calculations. If the node is a metadata item, then we calculate how [...] extracted from un/structured text like title and summary, we calculate how many times a keyword appears in the text and then we calculate the total number. Both methods are based on word frequency. Although this method is quite simple to perform, capturing and visualizing the most frequent words is of value for organizations [78].
• Edge thickness: the thickness of an edge shows the number of records in which both nodes appear together. It measures the co-occurrence of metadata or keywords in each record. This feature is quite popular in patent software when the co-occurrence of two patent fields is shown [1]. Similarly, in scientific papers, collaboration networks are studied to determine the connection between two authors that appear in the same paper [79]. In a network graph, the relationship between two authors (edge) shows the co-occurrence of the two authors in a document.
• Edge color: edges use an axial color gradient based on the color of the connected nodes.

For graphing nodes and edges, a force-directed graph layout is used. The graph shows the co-occurrence of metadata or keywords in the records of the mashup dataset, with connected nodes placed in closer proximity.
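The node-size and edge-thickness rules above can be illustrated with a minimal Python sketch. It is not the tool's JavaScript implementation: the record layout (`metadata`, `text`), the function name `build_graph`, the substring-based keyword matching and the sample data are all hypothetical, and counting each metadata item once per record is an assumption.

```python
from collections import Counter
from itertools import combinations


def build_graph(records, keywords):
    """Return (node_size, edge_weight) for a list of records.

    Assumed record layout (hypothetical): {"metadata": [...], "text": "..."}.
    """
    node_size = Counter()
    edge_weight = Counter()
    for rec in records:
        text = rec["text"].lower()
        labels = set(rec["metadata"])
        # Metadata item: counted once per record that contains it (assumption).
        node_size.update(labels)
        for kw in keywords:
            # Keyword: number of times it appears in the text, summed over records.
            hits = text.count(kw.lower())
            if hits:
                node_size[kw] += hits
                labels.add(kw)
        # Edge thickness: number of records in which both nodes appear together.
        edge_weight.update(combinations(sorted(labels), 2))
    return node_size, edge_weight


records = [
    {"metadata": ["A. Author", "B. Author"],
     "text": "Tissue engineering with printed tissue scaffolds"},
    {"metadata": ["A. Author"], "text": "Design of a tissue scaffold"},
    {"metadata": ["C. Author"], "text": "Design rules"},
]
node_size, edge_weight = build_graph(records, ["tissue", "design"])
print(node_size["tissue"], edge_weight[("A. Author", "tissue")])
```

Edge keys are stored as alphabetically sorted pairs so that the same pair of nodes always maps to the same counter entry, regardless of the order in which the labels appear in a record.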
⁵ See: https://nodejs.org.

4. Results

The aim of this paper is to answer the previously stated questions that were posed to fill part of the gap in this particular research topic. The analysis of the data will allow us to learn from the mashup dataset and gather insights through network visualizations. The results are described following each research question.

4.1. Q1. Does a mashup dataset provide a significant contribution enriching data analysis?

In order to answer this question, we made three separate visualizations and then we merged them together. Firstly, we generated a patent-based visualization showing 18,298 inventors by family (red color), 5650 applicants (purple color) and 92 keywords detected using the taxonomy (yellow color). Fig. 3 shows the result of the entire ecosystem of patent metadata and keywords based on analysis of the first 500 records.

On the other hand, if the ecosystem graph is simplified into smaller graphs, each representing one of the principal categories, the analysis becomes more feasible. This process is done by selecting one category at a time. In this way, the resulting graph provides specific data to be analyzed separately. Fig. 4 shows three “Sub-ecosystems” of the main graph: Inventors, Applicants, and Keywords.⁶

⁶ When a single category is graphed in the software, only linked nodes are shown. This means that unconnected nodes, such as an author that published just one article, will not be visualized by the graph. Therefore, any graph based on a single category shows at least one pair of nodes which, in the case of the authors, will be the smallest group. As a result, the total number of nodes in each category is smaller when shown separately rather than together.

Secondly, the Scopus dataset presents a network graph with the connections of 3333 authors, 655 organizations and 117 keywords. Fig. 5 shows the analysis of 2601 papers. Fig. 6 shows four sub-ecosystems based on the four categories available (metadata and keywords).

Fig. 5. Papers 3D printing ecosystem.

Fig. 6. Papers 3D printing sub-ecosystems.

Thirdly, the graph based on news articles shows the connections of 727 authors, 142 companies, 280 magazines and 91 keywords (Fig. 7). From this representation, it is possible to observe that the maximum size of the edges is one. In other words, all the edges of the graph have a weight of 1. This means that no pair of metadata items or keywords co-occurs in more than one publication. In this particular case, it makes no sense to reduce the graph to a smaller number of nodes by filtering it using the size of the edges, because above the value of one the visualization will be empty. Fig. 8 shows details of each category analyzed separately.

In all three graphs, the largest-size nodes show the most frequent category resulting from the occurrence of either a metadata item or a keyword. Additionally, the force-directed algorithm⁷ tends to allocate the biggest nodes in the central part of the graphs, due to the proximities of nodes and the number of interconnections between them. This may be observed in Figs. 3–6. However, the graph resulting from the analysis of news articles shows a separately located group of authors in the upper right-hand corner, because the authors are not connected to any keyword.

⁷ For more information about the algorithm used in the visualization, please see https://bl.ocks.org/mbostock/4062045 developed by Mike Bostock. The force between nodes can be modified using a “force layout” control panel; a tool that allows modifying: min. and max. Link distance, Gravity, Charge, Friction, Link strength and Charge max. distance. The values of the default settings can be changed in order to improve, for instance, highly connected nodes or large clustered nodes.

On the other hand, the most frequent nodes also tend to be highly interrelated and have the greatest number of connections. Specifically, the graph relating to papers (Fig. 8) shows that there are a great number of authors who work in 3D printing, but this number falls drastically from 8273 to 90 when the node size is filtered to one or more articles. Consequently, the reduced network graph tends to show the most active authors. This filter can be applied to each graph. Fig. 9 shows a graph of each dataset made by limiting the minimum size of nodes.

From a bibliometric perspective, Table 4 shows the results of the occurrence of each key category of the three datasets based on the filter applied. It may be observed that the number of stakeholders is reduced more quickly than the number of keywords. This is because one topic is of interest to many stakeholders.

The network resulting from the combination of the three datasets shows that the number of nodes significantly increases the complexity of the entire ecosystem. This increase in complexity enriches the dataset for the analysis. For instance, combining three unrelated datasets that include the same metadata (see Table 3) simplifies the analysis and visualization. The resulting metadata nodes grow in size, and the sum of the three metadata sets highlights relevant insights. Where the metadata differ, the mashup is able to show new communities that are otherwise not visible.

4.2. Q2. Which option can be adopted to meet the technological challenges of visualizing large volumes of data?

In order to present a less cluttered, more readable graph [70], we reduced the number of non-relevant nodes. Specifically, we have applied a threshold based on the idea that those nodes with fewer than three occurrences are not relevant, because authors, organizations or keywords that do not appear at least once in the three databases do not show any links with the datasets (see Fig. 10).

However, even when the filters were applied, the graph remained very demanding to analyze, due in particular to the fragmentation of authors and organizations. The majority of the groups were still detected in separate databases and the mashup did not provide any additional insight. On the other hand, rendering the network graph takes considerable time, which makes navigation and the extraction of insight more difficult when using the whole datasets.

To tackle the problem of visualizing a large amount of data we used
the “maximum spanning tree” (MST) algorithm, which highlights the strongest edges, and we focused only on concepts. In other words, the MST algorithm prunes the non-relevant connections. Fig. 11 shows an example of the interconnection between the most important topics that have been added into the taxonomy (see section 3.2). From each node, it is also possible to analyze how the keyword evolved during the ten-year period in comparison with others. MST is very useful when working with a large number of concepts, as it clusters them into hierarchical relationships of words. However, MST is not suitable for studying actors and therefore a different method is needed to reduce complexity (see Fig. 12).

Fig. 7. News articles 3D printing ecosystem.

An alternative solution to working with large datasets is to apply filters which reduce the number of nodes as well as the number of edges. To our knowledge, the options can be focused on the selection of (1) a few metadata or categories, (2) the most important period of years and (3) a threshold to the node size. In order to build the mashup, we used a combination of these three filters as follows: first, we selected the keywords category focused on title and abstract; secondly, we used the last three years of data available (2015–2017), which includes 80.69% of all Patents (11,659 records), 76.85% of Papers (1999 records), and 55.96% of News Articles (395 records); and finally, we applied a threshold of ≥1 node size. The final mashup graph, based initially on 15,555 records extracted from the three databases, resulted in a total of 10,519 records. The resultant sub-ecosystem is shown in Fig. 9.

We consider, therefore, that the most convenient solution when coping with a large volume of data is re-arranging the datasets and the visualization according to the questions that the analyst needs to answer. To do so, the dataset should be reduced first (backend), for instance by looking for a specific group of metadata, and then visualized (frontend). Additionally, using one or more filters (see the examples provided in Fig. 9) can reduce backend data processing and the visualization load. Breaking down large datasets into smaller ones may help to cope with technological challenges and offer suitable solutions. In the next section some examples are discussed where the application of filters provides insights.

5. Discussion

The findings from this exploratory analysis show that the mashup of scientific and technological data enriches the knowledge of a particular technological domain. Specifically, each dataset produced events at different times which can potentially show how a technology evolves over time. For instance, Air Products and Chemicals, Inc. identified how each technology source contributes to the strength of the technology signals, when a possible new product could be launched by a firm [73]. When data sources come into play at different times and if a technology is detected at an early stage, an early warning sign can be provided which can also be used for assessing strengths and weaknesses of competitors [80]. Additionally, the connections of multiple isolated events increase an understanding of the whole evolution of a particular technology, which may be possible to see by using a timeline or a historical map, as suggested by Bruce et al. [81]. This would be very difficult and extremely time-consuming if it were managed manually.

Combining three datasets proves to be synergetic and brings some benefits. Moreover, if three key filters like (1) Timeline, (2) Size of the nodes and (3) Size of the edges are applied in combination to analyze the mashup dataset, then it is possible to extract additional insights. Hereunder, we will describe them starting from the most generic benefit and finishing with the most specific, which requires the application of filters:
1. Frequency of nodes: The most frequent keywords and metadata in the mashup are those that are more relevant or are leading. Identifying important themes helps gain an understanding of the state of the art of a particular technology. Fig. 9 shows an example of how a relevant author and/or inventor as well as a relevant topic can be spotted in the network graph by using node filtering. Additionally, the combination of both stakeholders and themes can be used to see who is working in what. On the other hand, the identification of underdeveloped themes may offer an opportunity for some stakeholders.
2. Interrelated words: The detection of multiple connected words may show a cluster of concepts. Fig. 8 shows that Tissue engineering, Tissue bone and Surgery are strongly connected words in 3D printing. Additionally, Human and Cells are connected to Tissue. The layout is made by applying the “maximum spanning tree” algorithm, which helps focus on what are potentially the most relevant and discussed topics in the whole corpus when looking at the different main branches as isolated clusters.
3. Gatekeepers: The detection of gatekeepers that are visible in all datasets shows that there are very strong groups formed of co-authors or co-inventors, but in separate datasets (Figs. 6 and 8). However, the connections that provide bridges between the different scientific and technology datasets are less visible, as suggested by the findings of [32].
4. Early or late entrants: Early or late entrants or emerging and over-discussed themes can be detected by using the timeline available in each graph. The timeline helps detect new stakeholders that stand out at the beginning or at the end of the period of time.
5. Trend detection: Detecting which topics are growing or declining over time may show emerging interest in the scientific arena. This analysis can be made by specifically looking at the keywords that appear in the mashup ecosystem. For example, descriptive analysis (Fig. 11) shows that the keyword “design” is not only the most frequent word in the corpus, but also the one that has grown most in the last five years with respect to the overall number of documents published per year (Figs. 3 and 4 for patents and papers respectively). Similarly, other keywords can be observed, specifically those that are the most frequent or those that appear least in recent or previous years.

Fig. 8. News articles 3D printing sub-ecosystems.

Fig. 9. Network of patents, papers and news articles.

In addition to the previous insights, a combination of different filters can also be used to enrich the analysis. As there is no one size fits all, it is necessary to select filters and datasets (as shown in Fig. 9) in relation to the insights users want to obtain from the data. Table 5 summarizes ten potential insights which can be obtained by combining two filters together from a total of five available in the tool.

Fig. 10. Mashup ecosystem.

Fig. 11. News articles 3D printing ecosystem.

Highly interactive visualizations engage practitioners in an intensive user experience, not only because data can be explored visually [52] or focused on a specific item [52], but also because filters and other mechanisms adapt the graph according to users’ needs and priorities.

Additionally, it is imperative that users understand how the visual representation is made to avoid misrepresentation, especially for concepts analysis and semantic visualizations. “While the existence of a link indicates the existence of a relationship, viewing the sources allows the user to gain an understanding of the relationship's semantics” [16].
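Because understanding how the concept view is built matters for its interpretation, the maximum spanning tree pruning mentioned in Section 4.2 can be sketched as follows. This is a minimal illustration using Kruskal's algorithm with a union-find structure, not the implementation used in the tool; the function name and the edge weights are hypothetical, and only the keyword labels echo the examples discussed above.

```python
def maximum_spanning_tree(edges):
    """Keep the strongest edges that connect all nodes without forming cycles.

    `edges` is a list of (node_a, node_b, weight) tuples. For a disconnected
    graph the result is a maximum spanning forest.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    # Kruskal's algorithm, visiting the heaviest edges first.
    for a, b, weight in sorted(edges, key=lambda e: e[2], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            kept.append((a, b, weight))
    return kept


# Illustrative co-occurrence weights (made up for this sketch).
edges = [
    ("tissue", "bone", 9),
    ("tissue", "surgery", 7),
    ("bone", "surgery", 3),
    ("tissue", "design", 2),
    ("design", "surgery", 1),
]
tree = maximum_spanning_tree(edges)
print(tree)
```

In this example the two weakest edges that would close a cycle are dropped, which is the pruning effect described for the concept graph: every keyword stays reachable, but only through its strongest connections.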
The procedure for determining the links and the weights between the patents did not use any complex calculation, such as the artificial intelligence techniques suggested by Chao-Chan and Ching-Bang [19]. We are convinced that when the user cannot understand the process by which the links are defined, it is more difficult to interpret the final results.

Fig. 12. Mashup 3D Printing Sub-ecosystem filtered by Category, Years, Size of edges.

On the other hand, although academic papers have some similarities with patents [38], we find it very hard to combine both. The most challenging part is when data need a very important harmonization procedure and/or the data have not been cleaned correctly by the original source. Harmonization is a time-consuming process and needs to be applied in each dataset. Surprisingly, to date this issue has yet to be resolved.

6. Conclusion

This study uses a data-driven approach to show how different datasets can be combined and value extracted from them. Text analytics techniques are able to enrich the analysis of any dataset by providing compelling and interactive visualizations that aid summarization and show insights, but only when appropriate filters are applied. In contrast, the visualization of large amounts of data turns them into noise. Although any potential integration of data is technically possible, the process must meet the needs of the user. Thus integration should be user-driven, rather than technically driven. The implications of our study include at least five points.

Firstly, visualization per se needs to be tailored to the user's needs. We consider that preset filters should be prepared to address specific user needs as well as answer key technology [9] or business oriented questions. Additionally, in order to reduce the amount of noise and focus on the most relevant information, filters should be used. According to Ellis and Dix [82], “most of the visualizations need to adopt strategies for dealing with overcrowded displays”. Filters, however, can be either specific settings available in the tool that users can adjust according to their needs (as described in Table 5), or algorithms embedded in the tool to produce a readable visualization, as suggested for instance by Nick et al. (2013).

Secondly, it is imperative to understand how tools interpret the data. We strongly believe that this is key for any data visualization tool and specifically for users who are not specialists with this kind of tool. It is not surprising that simple word counting has been widely accepted and valued [78]. In contrast, complex algorithms may make the interpretation of the data harder, and they require experience and time. This does not form part of the work of those who simply need quick insights in order to take decisions. For instance, software such as Quid,⁸ which embeds sophisticated private algorithms, is less transparent about the processing of the data. In contrast, other tools like Leximancer [18] show the logic behind the data processing. Perhaps open sources can play an important role in facilitating analysis procedures and socializing them among users. The software Gephi,⁹ for example, is a tool that is mostly used by data science experts, rather than by general management.

⁸ See: https://quid.com/.

⁹ https://gephi.org/.

Thirdly, working with large corpora of data does not guarantee gathering insights. Data and network visualizations should be adjusted to the specific research question, for instance, by focusing on a particular subject or by filtering nodes and relationships to obtain the most significant data. In addition, applying MST algorithms helps to eliminate the less relevant relationships. These techniques prove effective while human efforts seek to process a large number of visual stimuli, especially in large network visualization.

Fourthly, particular attention needs to be paid to taxonomy when using text mining capacity. Very different results are achieved according to the key words used and how these are organized in the taxonomy. Anecdotal observations show that two companies in the same sector could conceive a taxonomy in different ways. On the other hand, a taxonomy may be enriched with new categories and/or terms over time, in order to show the most up-to-date information possible, or terms need to be translated to overcome language barriers when working with multiple foreign databases. Additionally, utilizing a single taxonomy in the mashup dataset improves the analysis not only of the aggregated dataset, but also of each separate dataset. This is because the taxonomy is provided from a single tool, rather than from separate databases or providers with different graphs and logics.

Fifthly, the mashup could be represented by the intersection of all the analyzed datasets (in our case “Papers ∩ Patent ∩ News” on a specific category), this being a subset of the total. If this approach were applied to see what is in common across all datasets, it could dramatically reduce the workload and computational operation when datasets are treated separately.

Finally, although there is still work to be done in this area, we are confident that more scholars will devote attention to this issue and produce norms, models and software to provide solutions. We consider that future tools should incorporate mechanisms that facilitate

Table 5
Insights from cross filtering.

Single filters:
• Larger node size: It shows the most frequent nodes. It helps gain an understanding of what the most shared themes are and/or who the main leaders (stakeholders) are.
• Smaller node size: It shows the less frequent nodes. It helps gain an understanding of what the least shared themes are and/or who the least active (stakeholders) are.
• Larger edge size: It shows higher concurrence between two nodes. It helps gain an understanding that at least two concepts or topics are strongly related or strong partnerships are built between two or more stakeholders.
• Smaller edge size: It shows lesser concurrence between two nodes. It helps gain an understanding that at least two concepts or topics are weakly connected or sporadic partnerships are built between two or more stakeholders.
• Last in time: It produces a recent network. It helps focus on the most up-to-date network that can showcase new issues or stakeholders that have emerged or become involved in a new project.
• Early in time: It produces an early network. It helps focus on the historical network which may show stakeholders and topics that have disappeared over time.

Combinations of two filters (row filter × column filter):
• Smaller node size × Larger node size: *Filters do not apply.
• Larger edge size × Larger node size: It shows confirmed relationships between strong stakeholders and/or topics.
• Larger edge size × Smaller node size: It shows confirmed relationships between weak stakeholders and/or topics.
• Smaller edge size × Larger node size: Not relevant.
• Smaller edge size × Smaller node size: Not relevant.
• Smaller edge size × Larger edge size: *Filters do not apply.
• Last in time × Larger node size: It shows confirmed stakeholders and topics.
• Last in time × Smaller node size: It shows new entrants (stakeholders) or emergent topics which may pose a threat.
• Last in time × Larger edge size: It shows new or emergent strong concurrence between two or more concepts or strong partnerships between two or more stakeholders are built.
• Last in time × Smaller edge size: It shows new or emergent weak concurrence between two or more concepts or weak partnerships between two or more stakeholders are built.
• Early in time × Larger node size: It shows important stakeholders and topics at an early stage.
• Early in time × Smaller node size: It shows weak stakeholders and topics which may disappear. Most infrequent themes.
• Early in time × Larger edge size: It shows past strong concurrence between two or more concepts or past strong partnerships built between two or more stakeholders.
• Early in time × Smaller edge size: It helps seeing beyond weak concurrence between two or more concepts or beyond weak partnerships built between two or more stakeholders.
• Early in time × Last in time: *Filters do not apply.

The symbol (*) in the cells shows that the combination of the two filters does not add any value.

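The cross filtering summarized in Table 5 can be illustrated with a minimal Python sketch that combines a time window with node-size and edge-size thresholds. The record layout (`year`, `labels`), the function name `filtered_graph`, its parameters and the sample records are hypothetical and chosen only for illustration; the sketch does not reproduce the tool's own code.

```python
from collections import Counter
from itertools import combinations


def filtered_graph(records, years=None, min_node=1, min_edge=1):
    """Apply the time filter, then the node-size and edge-size filters."""
    nodes, edges = Counter(), Counter()
    for rec in records:
        # Time filter: keep only records inside the inclusive (first, last) window.
        if years is not None and rec["year"] not in range(years[0], years[1] + 1):
            continue
        labels = sorted(set(rec["labels"]))
        nodes.update(labels)
        edges.update(combinations(labels, 2))
    # Node-size filter: drop nodes below the minimum number of occurrences.
    keep = {n for n, size in nodes.items() if size >= min_node}
    kept_nodes = {n: nodes[n] for n in keep}
    # Edge-size filter: keep edges that are heavy enough and join surviving nodes.
    kept_edges = {e: w for e, w in edges.items()
                  if w >= min_edge and e[0] in keep and e[1] in keep}
    return kept_nodes, kept_edges


records = [
    {"year": 2010, "labels": ["Org X", "design"]},
    {"year": 2016, "labels": ["Org X", "design", "tissue"]},
    {"year": 2017, "labels": ["Org X", "tissue"]},
    {"year": 2017, "labels": ["Org Y", "tissue"]},
]
# "Last in time" combined with "Larger node size".
recent_nodes, recent_edges = filtered_graph(records, years=(2015, 2017), min_node=2)
print(recent_nodes, recent_edges)
```

With these sample records, the combination keeps only the stakeholders and topics that appear at least twice in the most recent period, which corresponds to the "confirmed stakeholders and topics" cell of the table.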
procedures, highlights the key features to avoid “hairball” effects [83], taxonomies from text, Decis. Support Syst. 62 (2014) 78–93 https://doi.org/10.
and even facilitate human interaction, such as, for example, through 1016/j.dss.2014.03.006.
[21] T.R. Gruber, A translation approach to portable ontology specifications, Knowl.
voice recognition, as suggested by Srinivasan and Stasko [84], new Acquis. 5 (2) (1993) 199–220 https://doi.org/10.1006/knac.1993.1008.
visualization prototypes [85], or predefined settings as introduced in [22] S. Tuarob, P. Mitra, C. Lee Giles, Taxonomy-based query-dependent schemes for
Table 5, which accelerate and facilitate the interpretation of data, but profile similarity measurement, Conference Paper Number 8 Presented at the “1st
Joint International Workshop on Entity-oriented and Semantic Search, JIWES 2012
focus on answering the most difficult management questions. - Co-located with the 35th ACM SIGIR Conference”, Portland, OR, United States,
This study has focused on a specific technology using a limited August 2012, pp. 12–16.
number of keywords in the search query. Further work can be devoted [23] L.M. Garshol, Metadata? thesauri? taxonomies? topic maps! making sense of it all,
J. Inf. Sci. 20 (4) (2004) 378–391 https://doi.org/10.1177/0165551504045856.
to analyzing addictive manufacturing and/or the various applications [24] Z. Xie, K. Miyazaki, Evaluating the effectiveness of keyword search strategy for
separately and in several data sources, in order to understand the entire patent identification, World Patent Inf. 35 (1) (2013) 20–30 https://doi.org/10.
3D printing ecosystem. An additional limitation is related to the tax- 1016/j.wpi.2012.10.005.
[25] R.C. Basole, Visualizing ecosystems of hype, Proceedings of the 51st Hawaii
onomy, which is based on a predetermined list of keywords. The tax-
International Conference on System Sciences, 2018, pp. 4964–4973.
onomy we used was not intended to be exhaustive nor to satisfy specific [26] B.P. Abraham, S.D. Moitra, Innovation assessment through patent analysis,
interests. Thus, in order to conduct research into specific applications or Technovation 21 (4) (2001) 245–252 https://doi.org/10.1016/S0166-4972(00)
processes, several taxonomies can be personalized and applied. On the 00040-7.
[27] A.J. Nelson, Measuring knowledge spillovers: what patents, licenses and publica-
other hand, in order to explore new fields of application or new tech- tions reveal about innovation diffusion, Res. Pol. 38 (6) (2009) 994–1005 https://
nologies, it is possible to apply static methods to detect keywords that doi.org/10.1016/j.respol.2009.01.023.
were explicitly or implicitly ignored by humans. Public algorithms such [28] F. Narin, E. Noma, Is technology becoming science? Scientometrics 7 (3) (1985)
369–381 https://doi.org/10.1007/BF02017155.
as the “TF/iDF” (Term frequency/inverse document frequency) [19] or [29] H.R. Coward, J.J. Franklin, Identifying the science-technology interface: matching
the “Jaccard index”, as well as proprietary algorithms which use Nat- patent data to a bibliometric model, Sci. Technol. Hum. Val. 14 (1) (1989) 50–77
ural Language Processing (NLP) or statistical calculations like the one https://doi.org/10.1177/016224398901400106.
[30] E.C.M. Noyons, A.F.J. van Raan, H. Grupp, U. Schmoch, Exploring the science and
used by Randhawa et al. [18], can help to extract important keywords technology interface: inventor-author relations in laser medicine research, Res. Pol.
that could definitely enrich the analysis. 23 (4) (1994) 443–457 https://doi.org/10.1016/0048-7333(94)90007-8.
[31] D. Bonino, A. Ciaramella, F. Corno, Review of the state-of-the-art in patent in-
formation and forthcoming evolutions in intelligent patent informatics, World
References Patent Inf. 32 (1) (2010) 30–38 https://doi.org/10.1016/j.wpi.2009.05.008.
[32] S. Breschi, C. Catalini, Tracing the links between science and technology: an ex-
[1] A.J. Trippe, Patinformatics: tasks to tools, World Patent Inf. 25 (3) (2003) 211–221 ploratory analysis of scientists' and inventors' networks, Res. Pol. 39 (1) (2010)
https://doi.org/10.1016/S0172-2190(03)00079-6. 14–26 https://doi.org/10.1016/j.respol.2009.11.004.
[2] J.C. Vergara, A. Comai, J. Tena Millán, Evaluation of Software and Technological [33] F. Lissoni, Academic inventors as brokers, Res. Pol. 39 (7) (2010) 843–857 https://
Intelligence Needs, Emecom Ediciones S.L., Barcelona, 2006. doi.org/10.1016/j.respol.2010.04.005.
[3] Y.Y. Yang, L. Akers, C.B. Yang, T. Klose, S. Pavlek, Enhancing patent landscape [34] J. Howells, Intermediation and the role of intermediaries in innovation, Res. Pol. 35
analysis with visualization output, World Patent Inf. 32 (3) (2010) 203–220 https:// (5) (2006) 715–728 https://doi.org/10.1016/j.respol.2006.03.005.
doi.org/10.1016/j.wpi.2009.12.006. [35] H.M. Järvenpääa, S.J. Mäkinena, M. Seppänena, Patent and publishing activity
[4] O. Alexy, P. Criscuolo, A. Salter, Does IP strategy have to cripple open innovation? sequence over a technology's life cycle, Technol. Forecast. Soc. Change 78 (2)
MIT Sloan Manag. Rev. 51 (1) (2009) 71. (2011) 283–293 https://doi.org/10.1016/j.techfore.2010.06.020.
[5] T. Albert, Measuring Technology Maturity. Operationalizing Information from [36] F. Narin, Patent bibliometrics, Scientometrics 30 (1) (1994) 147–155 https://doi.
Patents, Scientific Publications, and the Web, Springer Gabler, 2016. org/10.1007/BF02017219.
[6] H. Chesbrough, The logic of open innovation: managing intellectual property, Calif. [37] F. Narin, D. Olivastro, Status report: linkage between technology and science, Res.
Manag. Rev. 45 (3) (2003) 33–58. Pol. 21 (3) (1992) 237–249 https://doi.org/10.1016/0048-7333(92)90018-Y.
[7] J. Yu, B. Benatallah, D. Casati, D. Florian, Understanding mashup development, [38] M. Thelwall, Bibliometrics to webometrics, J. Inf. Sci. 34 (4) (2008) 605–621
IEEE Internet Computing 12 (5) (2008) 44–52 https://doi.org/10.1109/MIC.2008. https://doi.org/10.1177/0165551507087238.
114. [39] H.D. White, K.W. McCain, Bibliometrics, in: M.E. Williams (Ed.), Annual Review of
[8] A. Abbas, L. Zhang, S.U. Khan, A literature review on the state-of-the-art in patent Information Science and Technology, vol. 24, Elsevier Science Publishers B.V.,
analysis, World Patent Inf. 37 (2014) 3–13 https://doi.org/10.1016/j.wpi.2013.12. 1989, pp. 119–186.
006. [40] S. Haustein, V. Larivière, The use of bibliometrics for assessing research: possibi-
[9] A.L. Porter, S.W. Cunningham, Generating and presenting innovation indicators, in: lities, limitations and adverse effects, in: I.M. Welpe, J. Wollersheim, S. Ringelhan,
A.L. Porter, S.W. Cunningham (Eds.), Tech Mining: Exploiting New Technologies for M. Osterloh (Eds.), Incentives and Performance, Springer International Publishing
Competitive Advantage, John Wiley & Sons Inc., Hoboken, 2005, pp. 249–288. Switzerland, 2015, pp. 121–139.
[10] S.M.R. Beheshti, B. Benatallah, S. Venugopal, et al., A systematic review and [41] R.J.W. Tussen, R.K. Buter, ThN. van Leeuwen, Technological relevance of science:
comparative analysis of cross-document coreference resolution methods and tools, an assessment of citation linkages between patents and research papers,
Computing 99 (4) (2017) 313–349 https://doi.org/10.1007/s00607-016-0490-0. Scientometrics 47 (2) (2000) 389–412 https://doi.org/10.1023/A:100560351.
[11] C. Sternitzke, A. Bartkowski, H. Schwanbeck, R. Schramm, Patent and literature statistics – the case of optoelectronics, World Patent Inf. 29 (4) (2007) 327–338 https://doi.org/10.1016/j.wpi.2007.03.003.
[12] M. Fattori, G. Pedrazzi, R. Turra, Text mining applied to patent mapping: a practical business case, World Patent Inf. 25 (4) (2003) 335–342 https://doi.org/10.1016/S0172-2190(03)00113-3.
[13] B. Yoon, Y. Park, A text-mining-based patent network: analytical tool for high-technology trend, J. High Technol. Manag. Res. 15 (1) (2004) 37–50 https://doi.org/10.1016/j.hitech.2003.09.003.
[14] Y. Tseng, C. Lin, Y. Lin, Text mining techniques for patent analysis, Inf. Process. Manag. 43 (5) (2007) 1216–1247 https://doi.org/10.1016/j.ipm.2006.11.011.
[15] S.-Y. Yang, V.-W. Soo, Extract conceptual graphs from plain texts in patent claims, Eng. Appl. Artif. Intell. 25 (4) (2012) 874–887 https://doi.org/10.1016/j.engappai.2011.11.006.
[16] A. Kochtchi, T. von Landesberger, C. Biemann, Networks of names: visual exploration and semi-automatic tagging of social networks from newspaper articles, Comput. Graph. Forum 33 (3) (2014) 211–220 https://doi.org/10.1111/cgf.12377.
[17] Y.G. Kim, J.H. Suh, S.C. Park, Visualization of patent analysis for emerging technology, Expert Syst. Appl. 34 (3) (2008) 1804–1812 https://doi.org/10.1016/j.eswa.2007.01.033.
[18] K. Randhawa, R. Wilden, J.S. Hohberger, A bibliometric review of open innovation: setting a research agenda, J. Prod. Innovat. Manag. 33 (6) (2016) 750–772 https://doi.org/10.1111/jpim.12312.
[19] W. Chao-Chan, Y. Ching-Bang, Constructing an intelligent patent network analysis, Data Sci. J. 11 (2012) 110–125 https://doi.org/10.2481/dsj.011-003.
[20] K. Meijer, F. Frasincar, F. Hogenboom, A semantic approach for extracting domain
[42] C. Sternitzke, A. Bartkowski, R. Schramm, Visualizing patent statistics by means of social network analysis tools, World Patent Inf. 30 (2) (2008) 115–131 https://doi.org/10.1016/j.wpi.2007.08.003.
[43] Next-generation Metrics: Responsible Metrics and Evaluation for Open Science, (2017) European Commission https://ec.europa.eu/research/openscience/pdf/report.pdf, Accessed date: 21 March 2018.
[44] D.F. Jerding, J.T. Stasko, The information mural: a technique for displaying and navigating large information spaces, IEEE Trans. Visual. Comput. Graph. 4 (3) (1998) 257–271 https://doi.org/10.1109/2945.722299.
[45] H. Chen, R.H. Chiang, V.C. Storey, Business intelligence and analytics: from big data to big impact, MIS Q. 36 (4) (2012) 1165–1188 https://doi.org/10.1145/2133806.2133826.
[46] T.H. Davenport, Analytics 3.0, Harv. Bus. Rev. 91 (12) (2013) 64–72 https://hbr.org/2013/12/analytics-30.
[47] H.A. van den Berg, Three shapes of organisational knowledge, J. Knowl. Manag. 17 (2) (2013) 159–174 https://doi.org/10.1108/13673271311315141.
[48] N.H. Lurie, C.H. Mason, Visual representation: implications for decision making, J. Market. 71 (1) (2007) 160–177 https://doi.org/10.1509/jmkg.71.1.160.
[49] D.A. Keim, F. Mansmann, J. Schneidewind, H. Ziegler, Challenges in visual data analysis, Proceedings of the International Conference on Information Visualisation, 2006, pp. 9–14. Article number 1648235.
[50] J.-D. Fekete, C. Plaisant, Interactive information visualization of a million items, Proceedings - IEEE Symposium on Information Visualization Vol. 2002 INFO VIS, January, 2002, pp. 117–124. Article number 1173156.
[51] R.C. Basole, M.G. Russell, J. Huhtamäki, N. Rubens, K. Still, H. Park, Understanding business ecosystem dynamics: a data-driven approach, ACM Trans. Manag. Inform. Syst. (TMIS) 6 (2) (2015) 1–32 https://doi.org/10.1145/2724730.
[52] D.A. Keim, Information visualization and visual data mining, IEEE Trans. Visual. Comput. Graph. 8 (1) (2002) 1–8 https://doi.org/10.1109/2945.981847.
[53] J. Huhtamäki, M. Garrett Russell, K. Still, Processing data for visual network analytics: innovation ecosystem experiences, in: E. Bendoly, S. Clark (Eds.), Visual Analytics for Management: Translational Science and Applications in Practice, Taylor & Francis, 2016, pp. 56–71.
[54] M.E. Newman, The structure and function of complex networks, SIAM Rev. 45 (2) (2003) 167–256 https://doi.org/10.1137/S003614450342480.
[55] P.L. Chang, C.C. Wu, H.J. Leu, Using patent analyses to monitor the technological trends in an emerging field of technology: a case of carbon nanotube field emission display, Scientometrics 82 (1) (2010) 5–19 https://doi.org/10.1007/s11192-009-0033-y.
[56] H. Dou, P. Clerc, Trends in 3-D printing from a patent information analysis (APA), Int. J. Technol. Intell. Plann. 10 (3/4) (2015) 354–372 https://doi.org/10.1504/IJTIP.2015.070854.
[57] C. Sternitzke, Bergmann, Similarity measures for document mapping: a comparative study on the level of an individual scientist, Scientometrics 78 (1) (2009) 113–130 https://doi.org/10.1007/s11192-007-1961-z.
[58] S.K. Card, J.D. Mackinlay, B. Shneiderman, Readings in Information Visualization: Using Vision to Think, 1st ed., Morgan Kaufmann, San Francisco, CA, 1999.
[59] C. Wong, J. Thomas, Visual analytics, IEEE Comput. Graph. Appl. 24 (5) (2004) 20–21 https://doi.org/10.1109/MCG.2004.39.
[60] J. Heer, B. Shneiderman, Interactive dynamics for visual analysis: a taxonomy of tools that support the fluent and flexible use of visualizations, Commun. ACM 55 (4) (2012) 45–54 https://doi.org/10.1145/2133806.2133821.
[61] Visualizing social networks, J. Soc. Struct. (2000), http://www.cmu.edu/joss/content/articles/volume1/Freeman.html, Accessed date: 21 March 2018.
[62] S.P. Borgatti, A. Mehra, D.J. Brass, G. Labianca, Network analysis in the social sciences, Science 323 (5916) (2009) 892–895 https://doi.org/10.1126/science.1165821.
[63] N. De Bellis, Bibliometrics and Citation Analysis: from the Science Citation Index to Cybermetrics, Scarecrow Press Inc., Lanham, 2009.
[64] P. Ritala, J. Hallikas, Network position of a firm and the tendency to collaborate with competitors – a structural embeddedness perspective, Int. J. Strategic Bus. Alliances (IJSBA) 2 (4) (2011) 307–328 https://doi.org/10.1504/IJSBA.2011.044859.
[65] T. Satomi, T. Ryoko, Generation of weak ties in a growing network: network analysis of co-inventor relationships, Int. J. Knowl. Learn. 5 (1) (2009) 26–36 https://doi.org/10.1504/IJKL.2009.024544.
[66] G.C. Kane, M. Alavi, G. Labianca, S.P. Borgatti, What's different about social media networks? A framework and research agenda, MIS Q. 38 (1) (2014) 275–304 https://doi.org/10.25300/MISQ/2014/38.1.13.
[67] D.L. Hansen, B. Shneiderman, M.A. Smith, Analyzing Social Media Networks with NodeXL: Insights from a Connected World, Elsevier Inc., 2011.
[68] C. Lee, B. Song, Y. Park, How to assess patent infringement risks: a semantic patent claim analysis using dependency relationships, Technol. Anal. Strat. Manag. 25 (1) (2013) 23–38 https://doi.org/10.1080/09537325.2012.748893.
[69] Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute. https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation 2011. (accessed 21 March 2018).
[70] K. Sugiyama, S. Tagawa, M. Toda, Methods for visual understanding of hierarchical system structures, IEEE Trans. Syst., Man Cybernetics 11 (2) (1981) 109–125 https://doi.org/10.1109/TSMC.1981.4308636.
[71] B. Xu, L. Yun, Technology resources distribution characteristics of 3D printing: based on patent bibliometric analysis, Int. J. Technol. Transf. Commer. 14 (2) (2016) 171–195 https://doi.org/10.1504/IJTTC.2016.081646.
[72] T. Rayna, L. Striukova, From rapid prototyping to home fabrication: how 3D printing is changing business model innovation, Technol. Forecast. Soc. Change 102 (2015) 214–224 https://doi.org/10.1016/j.techfore.2015.07.023.
[73] M.S. Brenner, Technology intelligence and technology scouting, Compet. Intell. Rev. 7 (3) (1996) 20–27 https://doi.org/10.1002/cir.3880070306.
[74] S. Chakrabarti, M. van den Berg, B. Dom, Focused crawling: a new approach to topic-specific Web resource discovery, Comput. Network. 31 (11–16) (1999) 1623–1640 https://doi.org/10.1016/S1389-1286(99)00052-3.
[75] D. Glez-Peña, A. Lourenço, H. López-Fernández, M. Reboiro-Jato, F. Fdez-Riverola, Web scraping technologies in an API world, Briefings Bioinf. 15 (5) (2014) 788–797 https://doi.org/10.1093/bib/bbt026.
[76] J.R. Ortt, D.J. Langley, N. Pals, Exploring the market for breakthrough technologies, Technol. Forecast. Soc. Change 74 (9) (2007) 1788–1804 https://doi.org/10.1016/j.techfore.2007.05.009.
[77] S. Sarawagi, Information extraction, Found. Trends Databases 1 (3) (2008) 261–377 https://doi.org/10.1561/1900000003.
[78] Z. Khan, T. Vorley, Big data text analytics: an enabler of knowledge management, J. Knowl. Manag. 21 (1) (2017) 18–34 https://doi.org/10.1108/JKM-06-2015-0238.
[79] M.E. Newman, Scientific collaboration networks. I. Network construction and fundamental results, Phys. Rev. E 64 (2001) 1–8 https://doi.org/10.1103/PhysRevE.64.016131.
[80] F. Narin, E. Noma, Patents as indicators of corporate technological strength, Res. Pol. 16 (2–4) (1987) 143–155 https://doi.org/10.1016/0048-7333(87)90028-X.
[81] C.D. Bruce, B. Davis, N. Sinclair, et al., Understanding gaps in research networks: using “spatial reasoning” as a window into the importance of networked educational research, Educ. Stud. Math. 95 (2) (2017) 143–161 https://doi.org/10.1007/s10649-016-9743-2.
[82] G. Ellis, A. Dix, A taxonomy of clutter reduction for information visualization, IEEE Trans. Visual. Comput. Graph. 13 (6) (2007) 1216–1223 https://doi.org/10.1109/TVCG.2007.70535.
[83] Graphs from the Internet: Analysis and Visualization of Large Dynamic Networks, (2017) https://drive.google.com/file/d/0BxO7XNG6i53WVVVwRHpqM0ZNZFU/view, Accessed date: 12 August 2018.
[84] A. Srinivasan, J. Stasko, Orko: facilitating multimodal interaction for visual network exploration and analysis, InfoVis (2017).
[85] K. Bharat, Visualization of Large Diverse Collections of Scholarly Outputs, 2018, Northern Illinois University, ProQuest Dissertations Publishing, 201810750921.

Alessandro Comai is an associate professor of marketing at the Graduate School of Management of the International University of Japan (IUJ). Before joining IUJ, he was an associate professor at the University of Pompeu Fabra (Barcelona, Spain) and a visiting professor at Tampere University of Technology (Tampere, Finland), where he researched and taught business, marketing and technology intelligence (CI). He is currently doing research in Competitive Technology Intelligence, Open Innovation and data-text Analytics. Alessandro holds a Ph.D. in management science (ESADE), an MBA and a BSc (Honors) in Engineering.
