An Adaptable Big Data Value Chain Framework for End-to-End Big Data Monetization
Abstract
1. Introduction
- Can the BDVC support data monetization?
- How does one achieve end-to-end Big Data monetization?
- What are the different models to ensure data monetization throughout the BDVC?
2. Research Methodology
- Research, identification, and selection of articles of primary study.
- Filter and evaluate the selected papers.
- Validate papers for data synthesis and analysis of findings.
- Google Scholar,
- ScienceDirect,
- Wiley,
- IEEE Xplore digital library,
- ACM (Association for Computing Machinery) digital library,
- Springer,
- Other diverse sources (Taylor and Francis, Emerald, Oxford, and Books).
3. Related Work and Background
3.1. Big Data Value Chain to Support Data-Driven Operations
3.2. Data Monetization: Definitions and Theoretical Foundations
4. Big Data Monetization Strategies and Business Models
- The strategy of retaining proprietary data, which can lead to various specific monetization strategies, such as data licensing.
- The strategy of trading data with partners for shared benefits covers two monetization strategies: trading data with business suppliers or with downstream business partners.
- The strategy of selling data to many customers under several big data monetization strategies depends on the customer priority premium.
- The strategy to make data open and available to everyone would lead to monetization strategies such as “if you are not paying for it, you are the product.”
- Return of Advantage: The organization uses its internal performance data, triangulated with external data, to create a business advantage. Customer targeting, risk mitigation, and fraud detection are examples of this model.
- Premium Service: The data are processed, presented, and delivered for end-user consumption via an access interface offering data products.
- Differentiator: Consists of offering a service or value to the customer at zero or negligible cost in order to build brand or customer loyalty or to develop other services.
- Syndication: Refers to data delivered in a nonraw format to third-party entities.
- Data Users: Refer to businesses that use data retrieved from customers’ internet activities, product usage, behaviors, and preferences to develop strategies and enable data-driven decision-making.
- Data Providers: Refer to data brokerage firms engaged in primary and secondary data collection and sales activities.
- Data Aggregators: Provide customers with aggregated services and data, enabling them to produce a targeted advertising business model.
- Data Facilitators: Correspond to technical platforms based on tools for the collection, processing, storage, analysis, and visualization of data. They enable businesses to make more informed decisions.
5. Big Data Value Chain Framework
- Data Generation: Refers to data that are generated from various sources. It is classified by either the nature of data (structured, semi-structured, and unstructured data) or the source of data (IoT, social media, operational, and commercial data) [56,57,58]. Moreover, data might be generated by business, machine, or human processes [13]:
  - Machine-generated data, coming from equipment and connected objects.
  - Human-generated data, coming directly from people through forms, e-mails, search engines, and social media platforms.
  - Business-generated data, coming from giant business data aggregators, internal platforms such as data warehouses, government agencies, and public institutions.
- Data Acquisition: Refers to the way data can be received and collected. This phase consists of identifying the data flow mode when connecting to generation platforms. This data flow could be:
  - Batch loading mode can be performed on large datasets, grouped in a defined time interval. It is often used for data sources from legacy systems with proven processes or when data streams cannot be technically delivered.
  - Stream loading mode handles continuous data inputs. It should operate in real time or near real time, with a loading rate faster than the incoming data rate.
  - Micro-batch loading mode divides input flows into micro-batches. As a result, the data are obtained in near real time.
  - Data identification: Refers to determining the content that should be considered.
  - Data collection: Refers to identifying the preliminary data structures that should be followed to suit the data management strategy.
  - Data transfer: Refers to transferring the collected raw data to a specific data storage infrastructure. Most of the time, it is a Data Lake [59].
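The micro-batch loading mode described above can be illustrated with a minimal sketch. It assumes count-based batches for simplicity, whereas production engines typically cut batches by time interval; the function name `micro_batches` and the sample readings are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a (possibly unbounded) stream into fixed-size micro-batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(stream)
    while True:
        # Pull at most batch_size items without consuming the rest of the stream.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical sensor readings arriving one by one.
readings = range(7)
batches = list(micro_batches(readings, batch_size=3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because the generator is lazy, each batch becomes available as soon as it is full, which is what gives micro-batching its near-real-time behavior compared with loading one large batch per time interval.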
- Data Pre-processing: Data collected from several heterogeneous sources contain a lot of noise, redundancy, and anomalies; retaining such unneeded data increases storage space and could affect the data management workflow. Besides, analytical methods require a certain level of data quality [3,60]. Data preprocessing is therefore a crucial step to ensure efficient data processing. This phase includes the following sub-phases:
  - Filtration: Refers to eliminating data considered corrupt according to the organization’s data strategy requirements. Several techniques could be applied (e.g., filtration of URLs from web data, low-memory pre-filtration of data streams).
  - Extraction: Refers to reworking incompatible data, often specifically grouped or compressed. This sub-phase transforms disparate data into supported formats [61].
  - Transformation: Refers to modifying, adapting, and packaging data into appropriate forms, and to the scaling and standardization of attributes to improve data analytics processes [13].
  - Validation: Refers to establishing validation and deletion rules to manage the syntactic and semantic structures of data and remove invalid and unknown data [62].
  - Cleaning: Refers to identifying incomplete, inaccurate, and unreasonable data in order to remove or complete them.
  - Aggregation: Refers to processing together the content of datasets belonging to the same field. This aggregation makes it possible to deal with voluminous data by combining similar and correlated data and eliminating redundancy to produce a sizeable unified view [62].
  - Denormalization: Refers to the data modeling process that involves collecting information from multiple tables to form a larger one. It optimizes query performance and supports building data-oriented applications [68].
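As an illustration of how some of these sub-phases chain together, the following sketch applies filtration, validation, cleaning, and aggregation to a handful of records. The record layout, the temperature bounds, and the mean-based completion rule are all assumptions made for the example, not rules taken from the framework.

```python
from collections import defaultdict
from statistics import mean


def filtrate(records):
    """Filtration: drop records treated as corrupt (here: no sensor id)."""
    return [r for r in records if r.get("sensor")]


def validate(records, low=-50.0, high=60.0):
    """Validation: keep missing values for cleaning, reject out-of-range ones."""
    return [r for r in records if r["temp"] is None or low <= r["temp"] <= high]


def clean(records):
    """Cleaning: complete missing values with the mean of the known ones."""
    known = [r["temp"] for r in records if r["temp"] is not None]
    if not known:
        return []  # nothing to impute from
    fill = mean(known)
    return [{**r, "temp": fill if r["temp"] is None else r["temp"]} for r in records]


def aggregate(records):
    """Aggregation: one average value per sensor."""
    groups = defaultdict(list)
    for r in records:
        groups[r["sensor"]].append(r["temp"])
    return {sensor: mean(values) for sensor, values in groups.items()}


# Hypothetical raw records as they might arrive from acquisition.
raw = [
    {"sensor": "a", "temp": 20.0},
    {"sensor": "a", "temp": None},    # incomplete
    {"sensor": "b", "temp": 30.0},
    {"sensor": "b", "temp": 999.0},   # out of range
    {"sensor": None, "temp": 25.0},   # corrupt: no source
]

result = aggregate(clean(validate(filtrate(raw))))
print(result)  # {'a': 22.5, 'b': 30.0}
```

Keeping each sub-phase as a separate function mirrors the framework's decomposition and lets an organization reorder or replace steps according to its own data strategy.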
- Data Storage: Refers to storing a massive amount of collected and preprocessed data. Storage system strategies have a significant impact on the scalability and performance of the BDVC in terms of data access and exposition. Storage is based on several aspects, namely:
  - Storage models: Developed mainly around three storage models: Block, File, and Object.
  - Data models: Often follow NoSQL topologies such as key-value, column-oriented, graph-oriented, or document-oriented. This NoSQL view is reasonable for efficient storage, leading to effective processing and, above all, native exposition capabilities.
  - Distributed storage systems: Operate as CA systems (consistent and highly available), CP systems (consistent and partition-tolerant), or AP systems (highly available and partition-tolerant).
- Data Analysis: Refers to manipulating massive data to identify patterns, find correlations, and discover new emerging knowledge models. This phase mainly relies on dedicated Big Data analytics capabilities categorized as descriptive, diagnostic, predictive, or prescriptive [4,57,62,69].
  - Descriptive analysis refers to the description and synthesis of knowledge models using statistical methods that describe a situation, such as standard reports, dashboards, and detailed analysis.
  - Diagnostic analysis identifies the causes leading to better performance by reviewing past performance.
  - Predictive analysis refers to estimating the probabilities used to define future trends. It uses supervised, unsupervised, and semi-supervised learning models to provide predictive analytical models.
  - Prescriptive analysis is applied to anticipate future events and drive proactive decisions beyond the bounds of human interaction.
  - Machine learning (ML) belongs to the scope of artificial intelligence (AI). It relies on analytical methods to create predictive models. ML models can be supervised, semi-supervised, or unsupervised. The most common ML techniques are classification, association analysis, regression, graph analysis, clustering, and decision trees [70,71,72].
  - Deep learning is a set of methods for creating computational models based on nonlinear processing and hierarchical representations. It is used to build classification patterns and learn feature representations from multiple layers of abstraction.
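The difference between descriptive and predictive analysis can be shown on a toy series. The sketch below assumes hypothetical monthly sales figures and uses an ordinary least-squares line as a stand-in for a predictive model; it is not the analysis method used in the simulation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical monthly sales figures.
months = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]

# Descriptive analysis: summarize what has already happened.
summary = {"mean": mean(sales), "min": min(sales), "max": max(sales)}

# Predictive analysis: use the fitted trend to estimate the next month.
slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept

print(summary)   # {'mean': 13.0, 'min': 10.0, 'max': 16.0}
print(forecast)  # 18.0
```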
- Data Visualization: Refers to illustrating data relationships with an artistic visual representation such as graphs, maps, data grids, and alerts, which help rapid and efficient decision-making [73,74,75]. Big Data Visualization uses suitable tools with extended capabilities that allow business users to find new trends or discover answers to questions not formulated [62].
- Data Exposition: Refers to making data available for consumption. This exposition consists of setting up many APIs (application programming interfaces) that respect security and confidentiality policies and allow access to data in different states: analyzed, preprocessed, transformed, or even as raw as collected. Data exposition generally serves many internal applications, such as CRM (Customer Relationship Management), to promote specific products, but it could be extended to serve partners as well.
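The exposition phase can be pictured as an access layer that checks a confidentiality policy before returning data in the requested state. The following is a minimal sketch under that assumption; `ExpositionService`, `Consumer`, and the sample stores are hypothetical names introduced for illustration, not components of the framework.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, List


class DataState(Enum):
    """States in which data can be exposed, as listed in the text."""
    RAW = "raw"
    PREPROCESSED = "preprocessed"
    TRANSFORMED = "transformed"
    ANALYZED = "analyzed"


@dataclass(frozen=True)
class Consumer:
    """Hypothetical consumer together with the data states its policy allows."""
    name: str
    allowed_states: FrozenSet[DataState]


class ExpositionService:
    """Hypothetical access layer placed in front of the BDVC stores."""

    def __init__(self, stores: Dict[DataState, List[dict]]) -> None:
        self._stores = stores

    def fetch(self, consumer: Consumer, state: DataState) -> List[dict]:
        # Enforce the confidentiality policy before exposing anything.
        if state not in consumer.allowed_states:
            raise PermissionError(f"{consumer.name} may not read {state.value} data")
        return self._stores.get(state, [])


stores = {
    DataState.RAW: [{"customer": "c1", "clicks": 14}],
    DataState.ANALYZED: [{"segment": "frequent buyers", "size": 1}],
}
service = ExpositionService(stores)
crm = Consumer("internal-crm", frozenset(DataState))             # internal: all states
partner = Consumer("partner", frozenset({DataState.ANALYZED}))   # partner: insights only

print(service.fetch(partner, DataState.ANALYZED))
```

In this sketch an internal application such as a CRM can read every state, while a partner is limited to analyzed output; a request for raw data from the partner raises `PermissionError`.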
6. Big Data Monetization through the BDVC Framework
6.1. Reduced Big Data Monetization Model
- At the storage phase, data can be shared either raw or specially prepared for extensive analysis processes. This kind of sharing is often useful for data scientists who run their own models or for entities with their own analysis platforms. The challenge here is to find a balance between confidentiality and exposure. Therefore, an efficient sharing strategy must be implemented, with suitable tools that control this exposure down to the finest detail.
- At the visualization phase, insights are shared as final components. This kind of sharing, which consists of dashboards, maps, or just some text reflecting strategic information, is often useful for entities that prefer not to deal with the BDVC phases and to rely instead on ready-for-use insights.
6.2. Full Big Data Monetization Model
- Data lake storage: Contains the data as it is collected. As many use cases need the original data to run their own BDVC, this storage allows the sharing of the raw data gathered from several operating systems.
- Data preprocessing storage: Contains the filtered, transformed, extracted, validated, cleaned, merged, and reduced data. This kind of sharing allows many entities to save time by relying on data that has already reached a certain maturity, so they can implement their Big Data use cases faster.
- Data analysis storage: Contains the outputs of the analysis models and programs. This kind of sharing is most useful either for use cases that aim at building their own visualization components or as input to other BDVCs seeking extensive analysis targets.
6.3. Value Co-Creation via the BDVC and Cloud Computing
7. Simulation and Evaluation
7.1. Use Case Description and Scenario
7.2. Adopted Platform and Tools
- 1. Data Acquisition
- Spark-Streaming: An extension of the core Spark API. It enables high-speed and scalable processing of real-time data from a variety of sources with fault tolerance. Once processed, data are delivered in real time to databases, file systems, or dashboards. Spark Streaming works internally by dividing input data streams into batches and processing them through the Spark engine [86].
- Kafka: An open-source distributed event streaming platform. It offers high-performance data pipelines, stream analysis, and integration of complex data and applications. It provides three main capabilities: publishing and distributing event streams to and from other systems, storing event streams durably and reliably, and processing event streams on the fly or retrospectively. Kafka also provides high scalability and message consistency. Here, it acts as a message broker that feeds a data flow pipeline to Spark-Streaming, where the flow is divided into micro-batches for processing [87,88].
- NiFi: An open-source dataflow management system that integrates data streaming and simple event processing. It automates the movement of data streams between different source systems and other systems. Fault-tolerant and scalable, NiFi secures the entire data stream, with authentication and access authorization via Kerberos [89,90].
- 2. Data Pre-processing
- Hive: A data platform that exploits Hadoop’s capacities to handle massive preprocessed and post-processed datasets. It is based on HQL (Hive query language), which is similar to standard SQL statements for data query and analysis. It is used to summarize Big Data and facilitate querying and data aggregation. Hive is oriented toward batch analytical queries rather than low-latency transactional workloads, and row-level updates require transactional (ACID) tables [91].
- Spark: A parallel and unified analysis engine for large-scale data processing, known for its speed, ease of use, and versatility. It provides high-level APIs in several languages such as Java, Scala, and Python. It consists of several components, such as Spark SQL, which preprocesses and processes structured data, and Structured Streaming [92].
- 3. Data Storage
- HDFS (Hadoop distributed file system): A distributed file system with a high level of fault tolerance that stores files as a replicated series of blocks. It is one of the core components of the Apache Hadoop framework. It provides high-speed data access and is suitable for Big Data applications based on distributed processing [93].
- HBase: Based on the concept and features of Google Bigtable as a nonrelational structure (NoSQL). It relies on a column-family-oriented model with key-value-pair data stores [94].
- Hive (warehouse): Besides its processing capability, it is a data warehouse that allows reading, writing, and managing large dataset files stored in HDFS. Hive tables are similar to those in a relational database, are organized from largest to most granular, and are queried using HQL. At the same time, storage is more scalable than in a relational database, and the schema is applied when data are read. It supports many formats, such as Avro, ORC, and Parquet [91].
- 4. Data Analysis
- Spark: In addition to what was presented, Spark allows the application of machine learning analysis through Spark MLlib and GraphX. The MLlib API provides several functionalities for learning, underlying statistics, optimization, and linear algebra. It supports multiple languages, harnesses the rich Spark ecosystem, and feeds the ML pipeline from end to end. In addition, the GraphX component of Spark is dedicated to graph modeling and graph processing [92].
- 5. Data Visualization
- Zeppelin: A tool that provides a web interface in notebook form to analyze and display, visually and interactively, a large volume of processed data. It is coupled with various software components, such as Spark. It is based on a set of plugins that make it more flexible [95].
- Superset: A fast and intuitive tool that allows simple exploration and ready-to-use data visualizations by creating shared dashboards. It supports charts ranging from line charts to very detailed geospatial maps. It also integrates with most SQL-speaking RDBMS (Relational DataBase Management System) [96].
- 6. Exchange/Data Monetization
- Kafka: Presented above, it remains the centralized standard platform for data exchange pipelines, for both data consumption and data production.
- APIzation: Refers to using third-party services and data access interfaces to allow external/internal applications to connect to a resource application to exchange data and outsource services [97].
7.3. Simulation and Results
- Data quality dimensions: To assess the impact on data handling and to measure data relevance and maturity, i.e., credibility, consistency, time penalty, accuracy, and reliability.
- Computational capabilities: To calculate the speed of processing and responses to different tasks.
- Generated insights: To verify the two processes’ ability to deliver value, knowledge, and wisdom.
- Data monetization and exchange: To check the ability of both processes to monetize valuable data.
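Two of these criteria, data quality and computational capability, lend themselves to simple measurements. The sketch below is only an illustration of how such indicators could be computed: the consistency rule, the sample records, and the helper names `consistency_rate` and `timed` are assumptions, not the metrics or data used in the simulation.

```python
import time
from typing import Callable, Iterable, Tuple, TypeVar

R = TypeVar("R")


def consistency_rate(records: Iterable[dict], rule: Callable[[dict], bool]) -> float:
    """Share of records that satisfy a consistency rule (0.0 for empty input)."""
    rows = list(records)
    if not rows:
        return 0.0
    return sum(1 for row in rows if rule(row)) / len(rows)


def timed(task: Callable[[], R]) -> Tuple[R, float]:
    """Run a task and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = task()
    return result, time.perf_counter() - start


# Hypothetical records and rule: ages must fall within a plausible range.
records = [{"age": 34}, {"age": 51}, {"age": -3}, {"age": 28}]
rate, seconds = timed(lambda: consistency_rate(records, lambda r: 0 <= r["age"] <= 120))

print(f"consistency={rate:.2f}, elapsed={seconds:.6f}s")
```

Running the same measurements on the outputs of both processes would give comparable figures for one quality dimension and for processing speed; the other dimensions would need their own rules.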
8. Conclusions and Research Outlooks
Author Contributions
Funding
Conflicts of Interest
References
- IDC’s 2016 Global IoT Decision Maker Survey Finds Organizations Moving Past Pilot Projects and Toward Scalable Deployments. Available online: https://www.businesswire.com/news/home/20160921005122/en/IDCs-2016-Global-IoT-Decision-Maker-Survey (accessed on 14 September 2020).
- More Than 30 Billion Devices Will Wirelessly Connect to the Internet of Everything in 2020. Available online: https://www.abiresearch.com/press/more-than-30-billion-devices-will-wirelessly-conne/ (accessed on 28 April 2019).
- Curry, E. The Big Data Value Chain: Definitions, Concepts, and Theoretical Approaches. In New Horizons for a Data-Driven Economy; Cavanillas, J.M., Curry, E., Wahlster, W., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 29–37. ISBN 978-3-319-21568-6.
- Alaoui, I.E.; Gahi, Y.; Messoussi, R. Full Consideration of Big Data Characteristics in Sentiment Analysis Context. In Proceedings of the 2019 IEEE 4th International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), Chengdu, China, 12–15 April 2019; pp. 126–130.
- Moro Visconti, R.; Morea, D. Big Data for the Sustainability of Healthcare Project Financing. Sustainability 2019, 11, 3748.
- Forgó, N.; Hänold, S.; Schütze, B. The Principle of Purpose Limitation and Big Data. In New Technology, Big Data and the Law; Corrales, M., Fenwick, M., Forgó, N., Eds.; Perspectives in Law, Business and Innovation; Springer: Singapore, 2017; pp. 17–42. ISBN 978-981-10-5037-4.
- Yang, C.; Huang, Q.; Li, Z.; Liu, K.; Hu, F. Big Data and cloud computing: Innovation opportunities and challenges. Int. J. Digit. Earth 2017, 10, 13–53.
- Liu, J.; Li, J.; Li, W.; Wu, J. Rethinking big data: A review on the data quality and usage issues. ISPRS J. Photogramm. Remote Sens. 2016, 115, 134–142.
- Vahi, K.; Rynge, M.; Juve, G.; Mayani, R.; Deelman, E. Rethinking data management for big data scientific workflows. In Proceedings of the 2013 IEEE International Conference on Big Data, Silicon Valley, CA, USA, 6–9 October 2013; pp. 27–35.
- Miller, H.G.; Mork, P. From Data to Decisions: A Value Chain for Big Data. IT Prof. 2013, 15, 57–59.
- Faroukhi, A.Z.; El Alaoui, I.; Gahi, Y.; Amine, A. Big data monetization throughout Big Data Value Chain: A comprehensive review. J. Big Data 2020, 7.
- Moro Visconti, R.; Larocca, A.; Marconi, M. Big Data-Driven Value Chains and Digital Platforms: From Value Co-Creation to Monetization. SSRN Electron. J. 2017.
- Saggi, M.K.; Jain, S. A survey towards an integration of big data analytics to big insights for value-creation. Inf. Process. Manag. 2018, 54, 758–790.
- Big Data Led Big Monetization—ProQuest. Available online: https://search.proquest.com/openview/7cc2f1e5ca16b5f000da83e0e96eeb2d/1?pq-origsite=gscholar&cbl=936333 (accessed on 1 September 2020).
- Tranfield, D.; Denyer, D.; Smart, P. Towards a Methodology for Developing Evidence-Informed Management Knowledge by Means of Systematic Review. Br. J. Manag. 2003, 14, 207–222.
- Porter, M.E. Clusters and the new economics of competition. Harv. Bus. Rev. 1998, 76, 77–90.
- Micek, G. Competition, Competitive Advantage and Clusters: The Ideas of Michael Porter—Edited by Robert Huggins & Hiro Izushi: BOOK REVIEWS. Tijdschr. Voor Econ. En Soc. Geogr. 2012, 103, 250–252.
- Holsapple, C.W.; Singh, M. The knowledge chain model: Activities for competitiveness. Expert Syst. Appl. 2001, 20, 77–98.
- Carlucci, D.; Schiuma, G. Knowledge asset value spiral: Linking knowledge assets to company’s performance. Knowl. Process Manag. 2006, 13, 35–46.
- Chyi Lee, C.; Yang, J. Knowledge value chain. J. Manag. Dev. 2000, 19, 783–794.
- Pil, F.K.; Holweg, M. Evolving from Value Chain to Value Grid. Available online: https://www.researchgate.net/publication/285703652_Evolving_from_value_chain_to_value_grid (accessed on 12 February 2019).
- Hahn, I.; Kodó, K. Literature Review of the Value Grid Model. Open Access DiVA 2017, 11. Available online: http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-33421 (accessed on 22 November 2020).
- Latif, A.; Saeed, A.U.; Hoefler, P.; Stocker, A.; Wagner, C. The Linked Data Value Chain: A Lightweight Model for Business Engineers. In Proceedings of the I-KNOW ’09 and I-SEMANTICS ’09, Graz, Austria, 2–4 September 2009; pp. 568–575.
- Peppard, J.; Rylander, A. From Value Chain to Value Network. Eur. Manag. J. 2006, 24, 128–141.
- Attard, J.; Orlandi, F.; Auer, S. Data Value Networks: Enabling a New Data Ecosystem. In Proceedings of the 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), Omaha, NE, USA, 13–16 October 2016; pp. 453–456.
- Bhatt, G.D.; Emdad, A.F. An analysis of the virtual value chain in electronic commerce. Logist. Inf. Manag. 2001, 14, 78–85.
- Kasim, H.; Hung, T.; Li, X. Data Value Chain as a Service Framework: For Enabling Data Handling, Data Security and Data Analysis in the Cloud. In Proceedings of the 2012 IEEE 18th International Conference on Parallel and Distributed Systems, Singapore, 17–19 December 2012; pp. 804–809.
- Han, H.; Yonggang, W.; Tat-Seng, C.; Xuelong, L. Toward Scalable Systems for Big Data Analytics: A Technology Tutorial. IEEE Access 2014, 2, 652–687.
- ur Rehman, M.H.; Chang, V.; Batool, A.; Wah, T.Y. Big data reduction framework for value creation in sustainable enterprises. Int. J. Inf. Manag. 2016, 36, 917–928.
- Rajpurohit, A. Big data for business managers—Bridging the gap between potential and value. In Proceedings of the 2013 IEEE International Conference on Big Data, Silicon Valley, CA, USA, 6–9 October 2013; pp. 29–31.
- Petrova-Antonova, D.; Georgieva, O.; Ilieva, S. Modelling of Educational Data Following Big Data Value Chain. In Proceedings of the 18th International Conference on Computer Systems and Technologies—CompSysTech’17, Ruse, Bulgaria, 23–24 June 2017; pp. 88–95.
- Munshi, A.A.; Mohamed, Y.A.-R.I. Big data framework for analytics in smart grids. Electr. Power Syst. Res. 2017, 151, 369–380.
- Daki, H.; El Hannani, A.; Aqqal, A.; Haidine, A.; Dahbi, A. Big Data management in smart grid: Concepts, requirements and implementation. J. Big Data 2017, 4.
- Korpela, K.; Hallikas, J.; Dahlberg, T. Digital Supply Chain Transformation toward Blockchain Integration. In Proceedings of the 50th Hawaii International Conference on System Sciences, Hilton Waikoloa Village, HI, USA, 4–7 January 2017.
- News Room, TM Forum Gartner: Companies Missing Out on Opportunities to Monetize Data. Available online: https://inform.tmforum.org/news/2015/10/gartner-companies-missing-out-on-opportunities-to-monetize-data/ (accessed on 19 November 2020).
- Data Monetization—Gartner IT Glossary. Available online: https://www.gartner.com/it-glossary/data-monetization (accessed on 28 April 2019).
- Moore, S. How to Monetize Your Customer Data. Available online: //www.gartner.com/smarterwithgartner/how-to-monetize-your-customer-data/ (accessed on 7 September 2020).
- Cashing In on Your Data. Available online: https://cisr.mit.edu/publication/2014_0801_DataMonetization_Wixom (accessed on 31 August 2020).
- Wixom, B.H.; Ross, J.W. How to Monetize Your Data. MIT Sloan Management Review. Available online: https://sloanreview.mit.edu/article/how-to-monetize-your-data/ (accessed on 19 November 2020).
- Data Monetization Strategies. How to Make Money or Save Money With Data and Analytics. Available online: https://www.irmconnects.com/white-papers/data-monetization-strategies-how-to-make-money-and-save-money-with-data-and-analytics/ (accessed on 19 November 2020).
- Liu, C.-H.; Chen, C.-L. A review of data monetization: Strategic use of big data. In Proceedings of the Fifteenth International Conference on Electronic Business (ICEB 2015), Hong Kong, China, 6–10 December 2015; p. 7.
- Opher, A.; Chou, A.; Onda, A. The Rise of the Data Economy: Driving Value through Internet of Things Data Monetization. IBM Glob. Serv. 2016. Available online: https://assets.toolbox.com/research/the-rise-of-the-data-economy-driving-value-through-internet-of-things-data-monetization-42098 (accessed on 22 November 2020).
- Najjar, M.S.; Kettinger, W.J. Data Monetization: Lessons from a Retailer’s Journey. MIS Q. Exec. 2013, 12, 14.
- Gomez-Arias, J.T.; Genin, L. Beyond monetization: Creating value through online social networks. Int. J. Electron. Bus. Manag. 2009, 7, 79–85.
- Nagarajan, M.; Baid, K.; Sheth, A.; Wang, S. Monetizing User Activity on Social Networks - Challenges and Experiences. In Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, Milan, Italy, 15–18 September 2009; pp. 92–99.
- Bataineh, A.S.; Mizouni, R.; Barachi, M.E.; Bentahar, J. Monetizing Personal Data: A Two-Sided Market Approach. Procedia Comput. Sci. 2016, 83, 472–479.
- Elragal, A.; Klischewski, R. Theory-driven or process-driven prediction? Epistemological challenges of big data analytics. J. Big Data 2017, 4.
- Hanafizadeh, P.; Harati Nik, M.R. Configuration of Data Monetization: A Review of Literature with Thematic Analysis. Glob. J. Flex. Syst. Manag. 2020, 21, 17–34.
- Franzetti, A. Data Monetization in the Big Data Era: Evidence from the Italian Market. Available online: https://www.academia.edu/34951960/Data_monetization_in_the_big_data_era_evidence_from_the_Italian_market/ (accessed on 19 November 2020).
- Walker, R. From Big Data to Big Profits: Success with Data and Analytics; Oxford University Press: New York, NY, USA, 2015; ISBN 978-0-19-937832-6.
- Berman, S.J. Digital transformation: Opportunities to create new business models. Strategy Leadersh. 2012, 40, 16–24.
- Wells, A.R.; Chiang, K. Monetizing Your Data: A Guide to Turning Data into Profit-Driving Strategies and Solutions; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2017; ISBN 978-1-119-35627-1.
- KPMG Framing a Winning Data Monetization Strategy. Available online: https://home.kpmg/mu/en/home/insights/2015/10/framing-a-winning-data.html/ (accessed on 19 November 2020).
- Schroeder, R. Big data business models: Challenges and opportunities. Cogent Soc. Sci. 2016, 2.
- The Data Monetization | Big Data Business Models. Available online: https://www.feedough.com/the-data-monetization-big-data-business-models/ (accessed on 13 February 2019).
- Rathore, M.M.; Paul, A.; Hong, W.-H.; Seo, H.; Awan, I.; Saeed, S. Exploiting IoT and big data analytics: Defining Smart Digital City using real-time urban data. Sustain. Cities Soc. 2018, 40, 600–610.
- Kibria, M.G.; Nguyen, K.; Villardi, G.P.; Zhao, O.; Ishizu, K.; Kojima, F. Big Data Analytics, Machine Learning, and Artificial Intelligence in Next-Generation Wireless Networks. IEEE Access 2018, 6, 32328–32338.
- Ge, M.; Bangui, H.; Buhnova, B. Big Data for Internet of Things: A Survey. Future Gener. Comput. Syst. 2018, 87, 601–614.
- Mehmood, H.; Gilman, E.; Cortes, M.; Kostakos, P.; Byrne, A.; Valta, K.; Tekes, S.; Riekki, J. Implementing Big Data Lake for Heterogeneous Data Sources. In Proceedings of the 2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW), Macao, China, 8–12 April 2019; pp. 37–44.
- Azeroual, O. Treatment of Bad Big Data in Research Data Management (RDM) Systems. Big Data Cogn. Comput. 2020, 4, 29.
- Grzegorowski, M.; Stawicki, S. Window-Based Feature Extraction Framework for Multi-Sensor Data: A Posture Recognition Case Study. In Proceedings of the 2015 Federated Conference on Computer Science and Information Systems (FedCSIS), Lodz, Poland, 13–16 September 2015; pp. 397–405.
- Erl, T.; Khattak, W.; Buhler, P. Big data fundamentals: Concepts, drivers & techniques. In The Prentice Hall Service Technology Series from Thomas Erl, 1st ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2016; ISBN 978-0-13-429121-5.
- Yan, Z.; Liu, J.; Yang, L.T.; Chawla, N. Big data fusion in Internet of Things. Inf. Fusion 2018, 40, 32–33.
- Chen, M.; Mao, S.; Liu, Y. Big Data: A Survey. Mob. Netw. Appl. 2014, 19, 171–209.
- Siddiqa, A.; Hashem, I.A.T.; Yaqoob, I.; Marjani, M.; Shamshirband, S.; Gani, A.; Nasaruddin, F. A survey of big data management: Taxonomy and state-of-the-art. J. Netw. Comput. Appl. 2016, 71, 151–166.
- Trovati, M.; Bessis, N. An influence assessment method based on co-occurrence for topologically reduced big data sets. Soft Comput. 2016, 20, 2021–2030.
- Zhang, Y.; Cheung, Y.-M. Discretizing Numerical Attributes in Decision Tree for Big Data Analysis. In Proceedings of the 2014 IEEE International Conference on Data Mining Workshop, Shenzhen, China, 14 December 2014; pp. 1150–1157.
- Rodrigues, R.A.; Filho, L.A.L.; Gonçalves, G.S.; Mialaret, L.F.S.; da Cunha, A.M.; Dias, L.A.V. Integrating NoSQL, Relational Database, and the Hadoop Ecosystem in an Interdisciplinary Project involving Big Data and Credit Card Transactions. In Information Technology—New Generations; Latifi, S., Ed.; Springer International Publishing: Cham, Switzerland, 2018; Volume 558, pp. 443–451. ISBN 978-3-319-54977-4.
- Hassani, H.; Beneki, C.; Unger, S.; Mazinani, M.T.; Yeganegi, M.R. Text Mining in Big Data Analytics. Big Data Cogn. Comput. 2020, 4, 1.
- Qiu, J.; Wu, Q.; Ding, G.; Xu, Y.; Feng, S. A survey of machine learning for big data processing. EURASIP J. Adv. Signal Process. 2016, 2016.
- Zhou, L.; Pan, S.; Wang, J.; Vasilakos, A.V. Machine learning on big data: Opportunities and challenges. Neurocomputing 2017, 237, 350–361.
- Alfrjani, R.; Osman, T.; Cosma, G. A Hybrid Semantic Knowledgebase-Machine Learning Approach for Opinion Mining. Data Knowl. Eng. 2019, 121, 88–108.
- Becker, T. Big Data Usage. In New Horizons for a Data-Driven Economy; Cavanillas, J.M., Curry, E., Wahlster, W., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 143–165. ISBN 978-3-319-21568-6.
- Hirve, S.; Pradeep Reddy, C.H. A Survey on Visualization Techniques Used for Big Data Analytics. In Advances in Computer Communication and Computational Sciences; Bhatia, S.K., Tiwari, S., Mishra, K.K., Trivedi, M.C., Eds.; Springer: Singapore, 2019; Volume 924, pp. 447–459. ISBN 9789811368608.
- Miu, M.; Zhang, X.; Dewan, M.; Wang, J. Development of Framework for Aggregation and Visualization of Three-Dimensional (3D) Spatial Data. Big Data Cogn. Comput. 2018, 2, 9.
- Rowley, J. The wisdom hierarchy: Representations of the DIKW hierarchy. J. Inf. Sci. 2007, 33, 163–180.
- Sharma, S. Expanded cloud plumes hiding Big Data ecosystem. Future Gener. Comput. Syst. 2016, 59, 63–92.
- Adner, R. Match your innovation strategy to your innovation ecosystem. Harv. Bus. Rev. 2006, 84, 98.
- Asaithambi, S.P.R.; Venkatraman, R.; Venkatraman, S. MOBDA: Microservice-Oriented Big Data Architecture for Smart City Transport Systems. Big Data Cogn. Comput. 2020, 4, 17.
- Xie, J.; Yu, F.R.; Huang, T.; Xie, R.; Liu, J.; Liu, Y. A Survey on the Scalability of Blockchain Systems. IEEE Netw. 2019, 33, 166–173.
- Atlam, H.F.; Azad, M.A.; Alzahrani, A.G.; Wills, G. A Review of Blockchain in Internet of Things and AI. Big Data Cogn. Comput. 2020, 4, 28.
- Xie, S.; Zheng, Z.; Chen, W.; Wu, J.; Dai, H.-N.; Imran, M. Blockchain for cloud exchange: A survey. Comput. Electr. Eng. 2020, 81, 106526.
- Moro Visconti, R. Blockchain Valuation: Internet of Value and Smart Transactions. In The Valuation of Digital Intangibles; Springer International Publishing: Cham, Switzerland, 2020; pp. 401–422. ISBN 978-3-030-36917-0.
- Moro Visconti, R. Big Data Valuation. In The Valuation of Digital Intangibles; Springer International Publishing: Cham, Switzerland, 2020; pp. 345–360. ISBN 978-3-030-36917-0. [Google Scholar]
- Cloudera. Getting Started with HDP Sandbox: Loading Sensor Data into HDFS. Available online: https://www.cloudera.com/content/dam/www/marketing/tutorials/getting-started-with-hdp-sandbox/assets/datasets/Geolocation.zip (accessed on 12 November 2020).
- Spark Streaming—Spark 3.0.1 Documentation. Available online: https://spark.apache.org/docs/latest/streaming-programming-guide.html (accessed on 12 November 2020).
- Apache Kafka. Available online: https://kafka.apache.org/ (accessed on 12 November 2020).
- Gurcan, F.; Berigel, M. Real-Time Processing of Big Data Streams: Lifecycle, Tools, Tasks, and Challenges. In Proceedings of the 2018 2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Ankara, Turkey, 19–21 October 2018; pp. 1–6.
- Apache NiFi Overview. Available online: https://nifi.apache.org/docs/nifi-docs/html/overview.html (accessed on 12 November 2020).
- Cloudera. Apache NiFi: A real-time integrated data logistics and simple event processing platform. Available online: https://www.cloudera.com/content/www/en-us/products/open-source/apache-hadoop/apache-nifi.html (accessed on 12 November 2020).
- What is Apache Hive? | IBM. Available online: https://www.ibm.com/analytics/hadoop/hive (accessed on 12 November 2020).
- Salloum, S.; Dautov, R.; Chen, X.; Peng, P.X.; Huang, J.Z. Big data analytics on Apache Spark. Int. J. Data Sci. Anal. 2016, 1, 145–164.
- HDFS Architecture Guide. Available online: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html (accessed on 12 November 2020).
- Apache HBase—Apache HBase™ Home. Available online: https://hbase.apache.org/ (accessed on 12 November 2020).
- Zeppelin. Available online: https://zeppelin.apache.org/ (accessed on 12 November 2020).
- Welcome | Superset. Available online: https://superset.apache.org/ (accessed on 12 November 2020).
- Schöne, P. APIzation in the B2B Space: Integration & Infrastructure. API Friends 2017. Available online: https://apifriends.com/api-management/apization/ (accessed on 19 November 2020).
- Cloudera. Hortonworks Data Platform (HDP) on Sandbox. Available online: https://www.cloudera.com/content/www/en-us/downloads/hortonworks-sandbox/hdp.html (accessed on 12 November 2020).
Criteria | | with BDVC: Acquisition | with BDVC: Pre-Processing | with BDVC: Storage | with BDVC: Analysis | with BDVC: Visualization | with BDVC: Monetization | without BDVC: Discover | without BDVC: Storage | without BDVC: Processing | without BDVC: Visualization | Impact | Conclusions
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Data Quality Dimensions | Credibility | - | 91% | - | 89% | - | - | - | - | 55% | - | Medium | In the first scenario, the dataset was preprocessed by filtering, cleaning, and detecting missing and outlier values, which generated more credible insights. In the second scenario, the collected raw data only underwent null-value treatment, leaving the information impure and sometimes biased. The data generated throughout the BDVC were therefore much more credible than in the second case.
Data Quality Dimensions | Consistency | - | 81% | - | - | - | - | - | - | 42% | - | Medium | In the BDVC, we applied several techniques to obtain consistent data, such as transformation, reduction, and aggregation operations on the collected data using Spark and Hive SQL. These operations allow us to build smart datasets that are preprocessed and directly exploitable by the analysis or monetization phases. By contrast, in the second scenario, this process was limited to simple SQL operations on unreliable data.
Data Quality Dimensions | Time Penalty | - | - | 90% | 90% | - | - | - | 51% | 31% | - | Strong | Access time to stored data (in HDFS, HBase, and Hive in our case) in the BDVC is variable. However, as a data warehouse, Hive offers high-speed and granular access to data, far exceeding the performance of the second scenario. In addition, the use of Spark RDD during data analysis in the BDVC allows data to be displayed quickly compared to the second scenario, which uses standard JDBC access.
Data Quality Dimensions | Accuracy and Reliability | - | 91% | - | 89% | - | 90% | - | - | 45% | 35% | Strong | In the BDVC, we prepared and processed the data using different techniques implemented on Spark and Hive, which gave us a reliable and smart dataset. We then analyzed the data using Spark SQL, which made them more accurate. The processing pipeline through which the data flow in the BDVC ensures efficient data processing, whereas in the second case the raw data are not fully preprocessed, which raises serious questions about the accuracy and reliability of the process outputs. In our case, this drastically reduced the accuracy of the results and made queries on the database inaccurate.
Data Quality Dimensions | Computation | 80% | 80% | 90% | 90% | 90% | 90% | 25% | 50% | 40% | 45% | Strong | Hortonworks, as a big data platform, offers high computation capabilities when combined with dedicated hardware. Despite the modest test platform, the queries were executed in milliseconds, and dataset loading did not exceed one second. Meanwhile, without the BDVC, data loading and the execution of nested or complex analyses are penalized. If not prepared in boxes or views, they can exhibit polynomial behavior in terms of execution time in the Big Data context.
Data Quality Dimensions | Insights | 35% | 70% | 80% | 90% | 90% | 80% | 25% | 50% | 40% | 45% | Strong | The simulated BDVC provides multiple valuable outputs. The storage contains raw data (in HDFS), which are then filtered and updated, resulting in preprocessed data (in HBase). These are then transformed and aggregated into intelligent datasets (in Hive), ready for analysis and visualization. It is essential to highlight that this configuration allows insights to be generated at several points along the BDVC. By contrast, without the BDVC, we only have access to stored, unreliable data and their visualization through a query. Thus, using the BDVC allows the generation of more valuable insights in different formats.
Exchange/Monetization | | 90% | 90% | 90% | 90% | 90% | 99% | - | 40% | - | 40% | Strong | As mentioned above, the BDVC generates end-to-end insights in different formats (e.g., raw, preprocessed, smart data, analysis model, web page). These data can be exchanged or shared as needed with other systems using APIzation and exchange protocols. The low level of insight without the BDVC limits monetization to the process output and possibly to the storage level.
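The contrast drawn in the Credibility and Consistency rows (full preprocessing versus null-value treatment only, followed by aggregation) can be illustrated with a minimal sketch. It uses plain Python rather than the Spark and Hive stack used in the study, and everything in it is an assumption made for illustration: the records, the field names `truck_id` and `velocity`, and the median-absolute-deviation rule for outliers, which is not necessarily the technique the authors applied.

```python
from statistics import median

# Hypothetical sensor readings; field names are illustrative, not the paper's schema.
RAW = [
    {"truck_id": "A1", "velocity": 52.0},
    {"truck_id": "A1", "velocity": 55.0},
    {"truck_id": "A1", "velocity": None},    # missing value
    {"truck_id": "A1", "velocity": 900.0},   # implausible outlier
    {"truck_id": "B2", "velocity": 61.0},
    {"truck_id": "B2", "velocity": 63.0},
    {"truck_id": None, "velocity": 58.0},    # missing key
]


def drop_nulls(rows):
    """Second scenario: null-value treatment only."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_outliers(rows, field, k=3.0):
    """Drop rows lying more than k median absolute deviations from the median.

    The MAD rule is an assumed stand-in for the paper's outlier detection.
    """
    values = [r[field] for r in rows]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(rows)
    return [r for r in rows if abs(r[field] - med) <= k * mad]


def mean_by_key(rows, key, field):
    """Aggregation step: mean of `field` for each distinct `key`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r[field])
    return {k: sum(vs) / len(vs) for k, vs in groups.items()}


def preprocess_full(rows):
    """First scenario: null treatment followed by outlier removal."""
    return drop_outliers(drop_nulls(rows), "velocity")


if __name__ == "__main__":
    print("null treatment only:", mean_by_key(drop_nulls(RAW), "truck_id", "velocity"))
    print("full preprocessing: ", mean_by_key(preprocess_full(RAW), "truck_id", "velocity"))
```

On this toy input, the mean velocity for truck `A1` is 53.5 after full preprocessing but roughly 335.7 when only nulls are removed, because the single implausible reading survives and skews the aggregate.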
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Faroukhi, A.Z.; El Alaoui, I.; Gahi, Y.; Amine, A. An Adaptable Big Data Value Chain Framework for End-to-End Big Data Monetization. Big Data Cogn. Comput. 2020, 4, 34. https://doi.org/10.3390/bdcc4040034