
BDA 1st Unit

The document provides an overview of Big Data, including its definition, types (structured, semi-structured, and unstructured), characteristics (volume, variety, velocity, and variability), and advantages for businesses such as improved decision-making and enhanced customer insights. It also discusses the applications of Big Data across various industries, including retail, healthcare, and finance, highlighting specific examples like Amazon and IBM Watson Health. Additionally, the document touches on web analytics as a method for understanding visitor behavior on websites.

Uploaded by

startrader196
Copyright
© © All Rights Reserved
UNIT I
UNDERSTANDING BIG DATA

Introduction to big data - convergence of key trends - unstructured data - industry examples of big data - web analytics - big data applications - big data technologies - introduction to Hadoop - open source technologies - cloud and big data - mobile business intelligence - crowd sourcing analytics - inter and trans firewall analytics.

1.1 INTRODUCTION TO BIG DATA

Big Data is a collection of data that is huge in volume and growing exponentially with time. Its size and complexity are so great that traditional data management tools cannot store or process it efficiently.

Big Data analytics is the process of extracting meaningful insights from such data, such as hidden patterns, unknown correlations, market trends, and customer preferences. It provides various advantages: among other things, it can be used for better decision-making and for preventing fraudulent activities.

1.1.1 Types of Big Data

There are three main types of big data:
• Structured
• Semi-structured
• Unstructured

Structured data: Structured data is highly organized and typically stored in a database. Because it is formatted in a specific way, it can be easily analyzed using traditional data analysis tools and techniques. Examples of structured data include transactional data, customer data, financial data, and inventory data.

Semi-structured data: Semi-structured data is a mixture of structured and unstructured data. It has a defined data model, but the data itself is not as strictly organized as structured data. Examples of semi-structured data include XML and JSON data files and sensor data.

Unstructured data: Unstructured data is not organized in any particular way and does not have a defined data model. It is generated in a variety of forms and can be difficult to analyze using traditional data analysis tools and techniques. Examples of unstructured data include emails, social media data, images, video, and text data.

1.1.2 Characteristics of Big Data

Big data can be described by the following characteristics:
• Volume
• Variety
• Velocity
• Variability

Volume
• The name Big Data itself refers to enormous size. The size of the data plays a crucial role in determining how much value can be derived from it.
• Whether a particular data set can be considered Big Data at all also depends on its volume. Hence, volume is one characteristic that must be considered when dealing with Big Data solutions.

Variety
• Variety refers to the heterogeneous sources and nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only sources of data considered by most applications.
• Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications. This variety of unstructured data poses certain issues for storing, mining, and analyzing data.

Velocity
• The term "velocity" refers to the speed at which data is generated. How fast the data is generated and processed to meet demand determines the real potential of the data.
• Big Data velocity deals with the speed at which data flows in from sources such as business processes, application logs, networks, social media sites, sensors, and mobile devices. The flow of data is massive and continuous.

Variability
• Variability refers to the inconsistency that the data can show at times, which hampers the ability to handle and manage the data effectively.
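
The variety characteristic, together with the three data types of Section 1.1.1, can be illustrated with a small sketch. The Python example below is only an illustration under stated assumptions: the three sample inputs and the helper names (`read_structured`, `read_semi_structured`, `read_unstructured`) are hypothetical and do not come from the notes. It shows the same kind of order information arriving as structured CSV, semi-structured JSON, and unstructured free text, and how much more guesswork each step requires.

```python
import csv
import io
import json
import re

# Hypothetical sample inputs, invented purely for illustration.
STRUCTURED_CSV = "order_id,customer,amount\n1001,Asha,250.0\n1002,Ravi,120.5\n"
SEMI_STRUCTURED_JSON = (
    '{"order_id": 1003, "customer": "Meena", "items": [{"sku": "A1", "qty": 2}]}'
)
UNSTRUCTURED_TEXT = "Hi, my order 1004 arrived late and the box was damaged. Please help!"


def read_structured(text):
    """Structured: a fixed schema, every row has the same named columns."""
    return list(csv.DictReader(io.StringIO(text)))


def read_semi_structured(text):
    """Semi-structured: self-describing keys, but fields may be nested or absent."""
    return json.loads(text)


def read_unstructured(text):
    """Unstructured: no data model, so we can only search the raw text for patterns."""
    return re.findall(r"\b\d{4}\b", text)


rows = read_structured(STRUCTURED_CSV)
doc = read_semi_structured(SEMI_STRUCTURED_JSON)
order_ids = read_unstructured(UNSTRUCTURED_TEXT)

print(rows[0]["amount"])       # direct column access by name
print(doc["items"][0]["qty"])  # access through a nested structure
print(order_ids)               # best-effort pattern match on free text
```

The structured reader relies entirely on the schema, the semi-structured reader has to know where a field is nested, and the unstructured reader can only guess with a pattern. That growing uncertainty is why unstructured data is harder to store, mine, and analyze.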
1.1.3 Advantages of Big Data

Big data has several advantages for businesses and organizations, including:

• Improved decision-making: Big data gives organizations the ability to collect and analyze vast amounts of data from different sources. With access to a wider range of information, businesses can make more informed, data-driven decisions.

• Enhanced customer insights: Big data analytics can help businesses gain deeper insights into their customers' behavior and preferences. This can help companies personalize their products and services, improve customer experiences, and increase customer satisfaction.

• Improved operational efficiency: Big data analytics can help businesses optimize their operations by identifying inefficiencies, reducing waste, and improving productivity. This can result in cost savings and improved overall performance.

• New revenue opportunities: Big data can reveal new revenue opportunities. By analyzing customer data and market trends, companies can identify new products and services to offer or new markets to target.

• Competitive advantage: Big data can give businesses a competitive advantage by enabling them to make better and faster decisions, respond to market trends quickly, and identify new opportunities before their competitors.

1.2 CONVERGENCE OF KEY TRENDS

Several key trends have converged in recent years to drive the growth and importance of big data. These trends include:

• Digital transformation: The increasing use of digital technologies and the growing adoption of the Internet of Things (IoT) have led to the generation of data from various sources. As more devices become connected to the internet, the amount of data generated is expected to continue to increase.

• Cloud computing: The widespread adoption of cloud computing has made it easier and more cost-effective for organizations to store and process large amounts of data.
Cloud-based big data platforms have become more accessible, allowing businesses to process and analyze large amounts of data without investing in expensive on-premise infrastructure.

• Machine learning and AI: The growth of machine learning and artificial intelligence (AI) has made it possible to extract insights from large and complex data sets. These technologies can help automate data analysis, identify patterns and trends, and make predictions based on data.

• Data privacy and security: The importance of data privacy and security has increased significantly in recent years, as data breaches and cyber-attacks have become more common. Big data solutions must ensure that sensitive data is properly protected and that security protocols are in place to prevent unauthorized access, while complying with data privacy regulations.

Together, these trends make big data a critical component of modern business operations. As the amount of data continues to grow, organizations will need to adopt new technologies and processes to ensure that they can effectively manage and extract value from their data.

1.3 UNSTRUCTURED DATA

Unstructured data is data that does not have a well-defined data model or structure. It includes forms such as images, videos, and audio. Unstructured data is generated from a variety of sources, such as social media, email, mobile devices, sensors, and web logs.

Some examples of unstructured data include:

• Social media data: This includes data from social media platforms such as Facebook, Twitter, LinkedIn, and Instagram. It can include images, videos, and other types of multimedia content.

• Emails: Email data includes the text, attachments, and metadata of messages sent and received by individuals or organizations.

• Web content: Web content includes text-based data from websites, such as blog posts, news articles, and product reviews.
• Audio and video files: Audio and video files include data from recorded phone calls, interviews, and surveillance footage.

One popular application is customer analytics. Retailers, manufacturers, and other companies analyze unstructured data to improve customer experience and enable targeted marketing. Sentiment analysis can be used to better understand customers and to identify attitudes about products, customer service, and corporate brands.

1.3.1 Structured vs. Unstructured Data

The main differences between structured and unstructured data lie in the type of analysis each can be used for, the schema used, the format, and the way the data is stored.

Traditional structured data, such as the transaction data in financial systems and other business applications, conforms to a rigid format to ensure consistency in processing and analysis. Sets of unstructured data, on the other hand, can be maintained in formats that are not uniform.

Structured data is typically stored in a relational database (RDBMS), which provides access to data points that are related to one another through columns and tables. For example, customer information kept in a spreadsheet and categorized by phone numbers, addresses, or other criteria is considered structured data.

1.4 INDUSTRY EXAMPLES OF BIG DATA

Many industries are using big data to drive innovation and improve business operations.

Retail Industry:

The retail industry collects vast amounts of data, including sales transactions, customer interactions, inventory data, and more. By analyzing this data, retailers can gain insights into customer behavior, preferences, and purchase patterns, which can be used to inform business decisions and drive revenue growth.

One example of big data in the retail industry is Amazon. Amazon uses big data to personalize its customer experience by recommending products based on customers' purchase history and browsing behavior.
This helps to increase customer engagement and loyalty, which in turn drives sales and revenue.

Another example of big data in the retail industry is Walmart. Walmart uses big data to optimize its supply chain operations by analyzing data from suppliers, distributors, and its own stores. This allows Walmart to better forecast demand, optimize inventory levels, and reduce waste, resulting in cost savings and improved operational efficiency.

Healthcare Industry:

• The healthcare industry is one of the fastest-growing industries for big data, as it generates and manages vast amounts of data from sources such as electronic health records (EHRs), medical imaging, clinical trials, and genomics.

• By analyzing this data, healthcare organizations can gain insights into patient health, disease diagnosis and treatment, and operational efficiency.

• One example of big data in healthcare is IBM Watson Health. IBM Watson Health uses big data and artificial intelligence (AI) to help healthcare organizations improve patient outcomes and the patient experience.

• Watson Health analyzes vast amounts of patient data, including medical records, lab results, and imaging data, to provide clinicians with personalized treatment recommendations and insights into disease trends and patterns.

• Another example of big data in healthcare is Pfizer. Pfizer uses big data to accelerate drug discovery and development by analyzing vast amounts of genomic, clinical, and operational data. This allows Pfizer to identify new drug targets and develop more effective treatments, while also improving operational efficiency and reducing costs.

Finance Industry:

• The finance industry is another industry that generates and manages vast amounts of data, including financial transactions, market data, and customer information.
• By analyzing this data, financial institutions can gain insights into market trends, customer behavior, and risk management, which can be used to inform business decisions and drive revenue growth.

• One example of big data in the finance industry is Mastercard. Mastercard uses big data to identify and prevent fraud in real time by analyzing vast amounts of transaction data from its global network.

• This allows Mastercard to detect fraudulent activity and alert cardholders and merchants before any fraudulent transactions are processed, reducing financial losses and improving the customer experience.

• Another example of big data in the finance industry is Goldman Sachs. Goldman Sachs uses big data to inform investment decisions by analyzing vast amounts of market data, social media data, and news articles.

• This allows Goldman Sachs to identify emerging market trends and opportunities, which can be used to inform investment decisions and drive revenue growth.

1.5 WEB ANALYTICS

Web analytics is the practice of measuring, analyzing, and reporting on the behavior of website visitors. Web analytics helps website owners and digital marketers understand how visitors interact with their website, which pages are popular, and how visitors navigate through the site. This information can be used to improve website design, user experience, and online marketing efforts.

The data is typically collected using a web analytics tool such as Google Analytics, which tracks visitor activity on the website. Web analytics data tracks metrics such as:

• Pageviews: The number of times a specific page on the website is viewed.
• Unique visitors: The number of individuals who visit the website over a given period of time.
• Bounce rate: The percentage of visitors who leave the website after viewing only one page.
• Session duration: The length of time that visitors spend on the website.
• Conversion rate: The percentage of visitors who complete a specific goal, such as filling out a contact form or making a purchase.

1.5.1 Types of Web Analytics

There are two main types of web analytics: on-site analytics and off-site analytics.

On-site analytics:

• On-site analytics tracks user behavior on a specific website. It collects data on website traffic, pageviews, bounce rates, conversion rates, and other metrics. On-site analytics tools include Google Analytics, Adobe Analytics, and Piwik (now Matomo).

• On-site analytics data can be used to identify user behavior patterns, popular pages, and areas for improvement on the website. This information can be used to optimize website design, user experience, and marketing efforts.

Off-site analytics:

• Off-site analytics tracks traffic from external sources, such as search engines, social media, and referral sites. It collects data on the number of visitors, referral sources, and user behavior on the website.

• Off-site analytics tools include SimilarWeb, among others. Off-site analytics data can be used to track the effectiveness of marketing efforts, identify popular referral sources, and optimize marketing strategies.

1.5.2 Web Analytics Process

• Setting business goals: Defining the key metrics that will determine the success of your business and website.
• Collecting data: Gathering information, statistics, and data on website visitors using analytics tools.
• Processing data: Converting the raw data you have gathered into meaningful ratios, KPIs, and other information that tell a story.
• Reporting data: Displaying the processed data in an easy-to-read format.
• Developing a strategy: Creating a plan to optimize the website experience to meet business goals.
• Experimenting: Running A/B tests to determine the best way to optimize the website.

1.5.3 Benefits of Web Analytics

• Understanding visitor behavior: Web analytics shows how visitors are interacting with the website.

• Measuring marketing performance: Web analytics data can be used to track the effectiveness of online marketing campaigns and make adjustments to improve their performance.
• Increasing website traffic: By identifying popular pages and optimizing website content, website owners can increase website traffic and engagement.

1.6 BIG DATA APPLICATIONS

Big data has numerous applications across various industries, including:

• Healthcare: Big data is used in healthcare to improve patient outcomes, reduce costs, and optimize treatment plans. Healthcare providers use big data to analyze patient data, predict disease outbreaks, and improve diagnostic accuracy.

• Retail: Retailers use big data to analyze customer behavior, optimize pricing and inventory, and personalize the customer experience. Big data helps retailers make data-driven decisions about which products to stock, how to price them, and how to market them.

• Finance: Big data is used in finance to detect fraud, improve risk management, and predict market trends. Financial institutions use big data to analyze customer behavior, identify potential risks, and optimize investment strategies.

• Manufacturing: Manufacturers use big data to optimize production processes, improve quality control, and reduce costs. Big data helps manufacturers identify inefficiencies in production processes, track product quality, and optimize supply chain management.

• Energy: Energy companies use big data to optimize energy production, reduce costs, and improve energy efficiency. Big data helps energy companies predict demand and optimize production processes.

• Transportation: Transportation companies use big data to optimize routes, reduce fuel consumption, and improve safety. Big data helps transportation companies track vehicle performance, optimize routing, and improve safety measures.

• Marketing: Marketers use big data to analyze customer behavior, personalize the customer experience, and optimize marketing campaigns.
Big data helps marketers identify target audiences, track customer behavior, and optimize marketing strategies.

1.7 BIG DATA TECHNOLOGIES

Several big data technologies and tools are commonly used to store, process, and analyze large and complex data sets. Some of the popular big data technologies include:

• Hadoop: Hadoop is an open-source framework used for distributed storage and processing of large data sets across clusters of commodity hardware. It is designed to handle large and complex data sets and can scale up or down as needed.

• Apache Spark: Apache Spark is a fast and powerful open-source big data processing engine. It is designed for large-scale data processing and can process data in batches.

• NoSQL databases: NoSQL databases are non-relational databases that are designed to handle large and unstructured data sets. Examples of NoSQL databases include MongoDB, Cassandra, and HBase.

• Apache Kafka: Apache Kafka is a distributed streaming platform that is used for building real-time data pipelines and streaming applications. It can handle high volumes of data and can process data in real time.

• Apache Storm: Apache Storm is an open-source distributed real-time processing system that is used for processing large streams of data in real time. It is commonly used in applications such as fraud detection, sensor data processing, and social media analytics.

• Apache Flink: Apache Flink is an open-source big data processing platform that is designed for both batch and stream processing. It can process large and complex data sets in real time and can handle both structured and unstructured data.

• Data warehousing tools: Data warehousing tools such as Amazon Redshift, Google BigQuery, and Microsoft Azure SQL Data Warehouse are used for storing and managing large data sets for analytics and reporting purposes.

• Apache Cassandra: Apache Cassandra is a distributed NoSQL database designed for high scalability and fault tolerance. It is often used for managing large amounts of time-series data, such as sensor data or user activity logs.

• Apache Beam: Apache Beam is an open-source unified programming model for both batch and stream data processing. It supports multiple languages and can run on various big data processing engines, including Apache Spark, Apache Flink, and Google Cloud Dataflow.

• Elastic Stack: Elastic Stack (formerly known as the ELK stack) is an open-source software suite used for search, analytics, and visualization of large data sets. It includes Elasticsearch, Logstash, and Kibana.

• Apache NiFi: Apache NiFi is an open-source data integration tool used for ingesting, processing, and distributing data across various systems. It is often used in data lakes and data hubs.

1.8 INTRODUCTION TO HADOOP

Apache Hadoop is an open-source platform written in Java and built around the MapReduce framework. It is used for distributed storage and distributed processing of huge volumes of data on commodity clusters.

Apache Hadoop was derived from Google's white papers on MapReduce and the Google File System (GFS).

Hadoop is designed to scale up from single servers to thousands of computers, each offering local processing and storage.

The Hadoop Distributed File System (HDFS), job scheduling, and resource management are among the tools offered by the Hadoop platform.

Hadoop can be viewed as a cluster of computers that consists of one master node and many worker nodes. The master node schedules the tasks, and the workers are responsible for executing the map and reduce tasks.

Hadoop can be deployed in three modes:

• Standalone mode: It is used for debugging in a single-node environment. Hadoop is installed on a single node as a single standalone instance.

• Fully distributed mode: A Hadoop cluster is formed by connecting multiple nodes of commodity hardware.
• Pseudo-distributed mode: All Hadoop daemons run on a single machine, each in its own Java process, so that a multi-node cluster is simulated on a single instance. It is also called a single-node cluster.

There are two versions of Hadoop: Hadoop 1.x and Hadoop 2.x.

1. Hadoop 1.x:
• It supports the MapReduce model only.
• Non-MapReduce tools are not supported.
• It is less scalable than Hadoop 2.x, since the number of nodes per cluster is more tightly limited.
• MapReduce in Hadoop 1.x is responsible for both data processing and cluster resource management.

2. Hadoop 2.x:
• It supports the MapReduce model as well as other distributed models such as Spark, HBase, Giraph, etc.
• It can scale up to 10,000 nodes per cluster.
• Cluster resource management is handled by YARN (Yet Another Resource Negotiator), and data processing is done using different processing models.
• Resource information is maintained for each node. In YARN this is the memory and CPU available for containers, which replaces the fixed map and reduce slots of Hadoop 1.x.
• It allows other frameworks to run on top of HDFS (Hadoop Distributed File System) using the YARN API.
• MRv2 is the next-generation MapReduce framework that runs within YARN.

Figure 1.1 Hadoop 1.x version (layers, top to bottom: MapReduce for resource management and data processing; HDFS for file storage)

Figure 1.2 Hadoop 2.x version (layers, top to bottom: MapReduce and others for data processing; YARN for resource management; HDFS for file storage)

1.8.1 Hadoop Core Components

Hadoop Components | Tasks Performed
Hadoop Common | Common utilities: the Java libraries and files used by the other components (HDFS, YARN, and MapReduce) for running the Hadoop cluster.
HDFS (storage layer) | Allows the storage of a huge volume of data across multiple nodes. Data is stored in the form of blocks distributed across the cluster.
Hadoop YARN (resource management layer) | Responsible for job scheduling and resource management.
MapReduce (data processing layer) | Parallel processing of huge datasets.

1. Hadoop Distributed File System (HDFS)

• HDFS is the file system of Hadoop.
• It is an open-source implementation of the distributed Google File System.
• It is designed to work with huge datasets or files. HDFS splits the data into block-sized chunks. The default block size is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x, and users can configure the block size as required.
• Storing many small files in HDFS is inefficient, because the NameNode must hold the metadata for every file and block in memory.
• Data is stored in HDFS in two forms, actual data and metadata. The actual data is stored in DataNodes and the metadata is stored in the NameNode. The metadata includes the timestamp, file size, and location of blocks.
• Replication in HDFS ensures data availability.

Features of HDFS
• Provides distributed storage
• Can be implemented on commodity hardware
• Provides data security
• Highly fault-tolerant: if one machine goes down, its data remains available from the replicas on other machines

2. Hadoop YARN

• Hadoop YARN stands for Yet Another Resource Negotiator. It is the resource management unit of Hadoop and is available as a component of Hadoop version 2.
• Hadoop YARN acts like an operating system for Hadoop. It is a resource management layer that runs on top of HDFS.
• It is responsible for managing cluster resources to make sure that no single machine is overloaded.
• It performs job scheduling to make sure that jobs are scheduled in the right place.
• In YARN, the two major functions of the JobTracker have been split into (1) a global ResourceManager and (2) a per-application ApplicationMaster.
• Cluster resource management and job scheduling are separated into two separate daemons.
• The main components of the YARN architecture are the ResourceManager, NodeManager, ApplicationMaster, and container.

3. MapReduce

• MapReduce is the data processing layer of Hadoop.
• It is a software framework for easily writing applications that process the vast amounts of structured and unstructured data stored in the Hadoop Distributed File System (HDFS).
• It processes huge amounts of data in parallel by dividing the submitted job into a set of independent tasks (sub-jobs).
• In Hadoop, MapReduce works by breaking the processing into two phases: Map and Reduce.
• Map is the first phase of processing, where we specify all the complex logic, business rules, and costly code.
• Reduce is the second phase of processing, where we specify lightweight processing such as aggregation or summation.

1.8.2 Key Features and Benefits of Hadoop

• Scalability: Hadoop is designed to be highly scalable and can easily handle petabytes of data by adding additional nodes to the cluster. This makes it ideal for organizations with large and growing data sets.

• Fault-tolerance: Hadoop is designed to be highly fault-tolerant, which means that it can continue to operate even if a node in the cluster fails. This is achieved by replicating data across multiple nodes in the cluster, so that there is always a backup copy available.

• Cost-effectiveness: Hadoop is built using commodity hardware, which makes it much more cost-effective than traditional high-end servers and storage systems. This is particularly important for organizations with limited budgets.

• Flexibility: Hadoop is a very flexible platform and can handle a wide range of data types, including structured, semi-structured, and unstructured data. It can also integrate with a wide range of tools and technologies, such as data warehouses, ETL tools, and BI platforms.

• Real-time processing: The Hadoop ecosystem has evolved to include real-time processing capabilities, which allow it to handle data streams and process data in near real time. This is achieved through companion technologies such as Apache Kafka and Apache Storm.
• Community support: Hadoop is an open-source platform that is supported by a large and active community of developers and users. This gives users access to a wealth of resources, such as documentation, tutorials, and forums.

1.9 OPEN-SOURCE TECHNOLOGIES

Open-source technologies are software or computer programs whose source code is available to the public, allowing anyone to access, modify, and distribute it. This means that users can see and edit the code, which can result in greater transparency, collaboration, and innovation in software development.

The open-source movement started in the late 1990s and has since become a significant force in the tech industry. Many popular software tools and platforms, including Linux, Apache, MySQL, and WordPress, are open source.

Open-source technologies offer several advantages, such as cost-effectiveness, flexibility, and customization. They also provide opportunities for collaboration and community building, as developers from around the world can contribute to the software's development and improvement.

In the context of big data, open-source technologies are software tools and platforms that are freely available and enable the processing, analysis, and storage of large volumes of data. These technologies have become an essential part of the big data ecosystem because they provide scalable and cost-effective solutions for managing large and complex data sets.

Some popular open-source big data technologies include:

• Hadoop: A distributed computing platform that allows for the storage and processing of large data sets across clusters of computers.

• Spark: A fast and general-purpose data processing engine that can handle both batch and real-time processing.
• Cassandra: A NoSQL database that is designed to handle large amounts of data across multiple servers.

• Elasticsearch: A distributed search and analytics engine that can quickly and easily search large amounts of data.

• Kafka: A distributed streaming platform that can handle real-time data streams.

These technologies have enabled organizations to process and analyze data at scale, leading to improved decision-making, increased efficiency, and better business outcomes. Additionally, the open-source nature of these technologies allows for collaborative development and community contributions, which can lead to faster innovation and better features.

1.10 CLOUD AND BIG DATA

Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet).

It works as a resource on demand, whether that resource is storage, computing, or something else. The cloud follows a pay-per-usage model: the service charges you only for the amount of computing resources you use.

For example, if you want to give a demo to a client on a cluster of more than 100 machines and you do not have that many machines available, cloud computing plays a very important role.

Cloud plays an important role within the Big Data world by providing horizontally expandable and optimized infrastructure that supports the practical implementation of Big Data.

1.10.1 Cloud Computing and Big Data

In cloud computing, all data is gathered in data centers and then distributed to the end users. Automatic backup and recovery of data is also provided for business continuity, and all such resources are available in the cloud.

There are multiple ways to access the cloud:
1. Applications or software as a service (SaaS), e.g. Salesforce.com, Dropbox, Google Drive
2. Platform as a service (PaaS)
3. Infrastructure as a service (IaaS)

Features of Cloud Computing
• Scalability:
  ◦ Scalability is provided by using distributed computing.
• Elasticity:
  ◦ Customers use and pay for only as much resource as they actually consume.
  ◦ In cloud computing, elasticity is defined as the degree to which a system is able to adapt to workload changes in an autonomic manner, so that at any time the available resources match the current demand as closely as possible.
• Resource pooling:
  ◦ The same resources are allowed to be used by multiple organizations.
  ◦ The computing resources are pooled to serve multiple consumers using a multi-tenant model, with different resources assigned according to consumer demand.
• Self-service:
  ◦ Customers are provided an easy-to-use interface through which they can choose the services they want.
  ◦ A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed without requiring human interaction.
• Low costs:
  ◦ The cloud charges you based only on the amount of computing resources you use, and you need not buy expensive infrastructure.
  ◦ Pricing on a utility computing basis is usage-based, and fewer IT skills are required for implementation.
• Fault tolerance:
  ◦ Service continues even when a part of the cloud system fails to respond.

Cloud Deployment Models

There are mainly two types of cloud deployment models:
• Public cloud: a cloud is called a "public cloud" when the services are open over a network for public use.
• Private cloud: a cloud operated for a single organization, whether managed internally or externally.

Cloud Delivery Models

Cloud services are categorized as below:

• Infrastructure as a service (IaaS):
  ◦ The complete infrastructure is provided to you.
  ◦ Maintenance-related tasks are done by the cloud provider, and you can use the infrastructure as per your requirement.
  ◦ It can be used as both public and private.
  ◦ Examples of IaaS are virtual machines, load balancers, and network-attached storage.
• Platform as a service (PaaS):
  ◦ Here we have object storage, queuing, databases, runtime, etc.
  ◦ All of these we can get directly from the cloud provider.
  ◦ It is our responsibility to configure and use them.
  ◦ Providers will give us the resources, but connectivity to our database and other similar activities are our responsibility.
  ◦ Examples of PaaS are Windows Azure and Google App Engine (GAE).
• Applications or software as a service (SaaS):
  ◦ e.g., Salesforce.com, Dropbox, Google Drive.
  ◦ Here we do not have any responsibility.
  ◦ We are using the application that is running on the cloud. All infrastructure setup is the responsibility of the service provider.
  ◦ For SaaS to work, the infrastructure (IaaS) and the platform (PaaS) must be in place.

Cloud for Big Data

Below are some examples of how cloud applications are used for Big Data:

• IaaS in a public cloud:
  ◦ Using a cloud provider's infrastructure for Big Data services gives access to almost limitless storage and compute power.
  ◦ IaaS can be utilized by enterprise customers to create cost-effective and easily scalable IT solutions where cloud providers bear the complexities and expenses of managing the underlying hardware.
  ◦ If the scale of a business customer's operations fluctuates, or they are looking to expand, they can tap into the cloud resource as and when they need it rather than purchase, install and integrate hardware themselves.
• PaaS in a private cloud:
  ◦ PaaS vendors are beginning to incorporate Big Data technologies such as Hadoop and MapReduce into their PaaS offerings, which eliminates dealing with the complexities of managing individual software and hardware elements.
  ◦ For example, web developers can use individual PaaS environments at every stage of development, testing and ultimately hosting their websites.
  ◦ However, businesses that are developing their own internal software can also utilize Platform as a Service, particularly to create distinct ring-fenced development and testing environments.
• SaaS in a hybrid cloud:
  ◦ Many organizations feel the need to analyze the customer's voice, especially on social media. SaaS vendors provide the platform for the analysis as well as the social media data.
  ◦ Office software is the best example of businesses utilizing SaaS. Tasks related to accounting, sales, invoicing, and planning can all be performed through SaaS.
  ◦ Businesses may wish to use one piece of software that performs all of these tasks or several that each perform different tasks.
  ◦ The software can be subscribed to through the internet and then accessed online via any computer in the office using a username and password. If needed, they can switch to software that fulfills their requirements in a better manner.
  ◦ Everyone who needs access to a particular piece of software can be set up as a user, whether it is one or two people or every employee in a corporation that employs hundreds.

Providers in the Big Data Cloud Market

• Cloud computing companies come in all shapes and sizes. All large software vendors either already have offerings in the cloud space or are in the process of launching one.
• In addition, there are many startups that have interesting products in the cloud space. Here we have a list of major cloud vendors.

Infrastructure as a Service cloud computing companies

• A few of the cloud providers are Google, Citrix, NetMagic, Red Hat, Rackspace, etc.
• Amazon (AWS) is the leading cloud provider amongst all.
• Microsoft also provides cloud services, under the name Azure.
• Amazon's offerings include S3 (data storage/file system), SimpleDB (non-relational database) and EC2 (computing servers).
• Rackspace's offerings include Cloud Drive (data storage/file system), Cloud Sites (web site hosting on cloud) and Cloud Servers (computing servers).
• IBM's offerings include Smart Business Storage Cloud and Computing on Demand (CoD).
• AT&T provides Synaptic Storage and Synaptic Compute as a service.

Platform as a Service cloud computing companies

• Google's App Engine is a development platform that is built upon Python and Java.
• Force.com provides a development platform that is based upon Apex.
• Microsoft Azure provides a development platform based upon .NET.

Software as a Service companies

• In SaaS, Google provides space that includes Google Docs, Gmail, Google Calendar and Picasa.
• IBM provides LotusLive iNotes, a web-based email service for messaging and calendaring capabilities to business users.
• Zoho provides online products similar to the Microsoft Office suite.

Some important issues in using cloud services are listed below:

• Data security
  ◦ Organizations must ensure that their agreement with the cloud service provider ensures data security.
  ◦ Handing over private data to others worries some people. Corporate executives might hesitate to take advantage of a cloud computing system because they can't keep their company's information under lock and key.
• Performance
  ◦ Parameters of cloud performance must be specified in the agreement and quantified wherever possible.
• Compliance
  ◦ Cloud services must be compatible with the compliance needs of the business. Some companies are also concerned about regulatory issues.
  ◦ People also worry that they will be tied to one provider of cloud storage; market observers put the figure at around 50 percent.
• Legal issues
  ◦ The organization must ensure that the location of the physical resources of the cloud does not bring any legal issue.
  ◦ The cloud presents a number of legal challenges around privacy, because data is stored in multiple locations in the cloud, which additionally increases the risk of confidentiality and privacy breaches.
• Costs
  ◦ Organizations should be aware of all the costs involved with the use of cloud, and use the services in a controlled manner, because the cost incurred by the company follows a pay-as-per-usage model.
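The pay-per-usage idea above can be illustrated with a small calculation. The following is only a sketch under stated assumptions: the hourly rate and the purchase price are hypothetical numbers chosen for the example, not real provider prices, and the function names are illustrative.

```python
def pay_per_use_cost(machines: int, hours: float, rate_per_machine_hour: float) -> float:
    """Cost of renting `machines` cloud machines for `hours` hours (usage-based pricing)."""
    return machines * hours * rate_per_machine_hour


def owned_cluster_cost(machines: int, price_per_machine: float) -> float:
    """Up-front cost of buying the same number of machines outright."""
    return machines * price_per_machine


# Hypothetical scenario from the text: a client demo on a 100-machine cluster.
# Both prices below are assumptions made purely for illustration.
demo_cost = pay_per_use_cost(machines=100, hours=3, rate_per_machine_hour=0.25)
purchase_cost = owned_cluster_cost(machines=100, price_per_machine=1000.0)

print(f"Renting for the demo: {demo_cost:.2f}")
print(f"Buying the cluster:   {purchase_cost:.2f}")
```

Under these assumed prices the short-lived demo is far cheaper to rent than to buy, which is the point of the pay-per-usage model. For a cluster that runs continuously the comparison can go the other way, which is why the costs should be monitored.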
1.11 MOBILE BUSINESS INTELLIGENCE

• Mobile BI refers to the access and use of information via mobile devices. With the increasing use of mobile devices for business, not only in management positions, mobile BI is able to bring business intelligence and analytics closer to the user when done properly.
• Whether during a train journey, in the airport departure lounge or during a meeting break, information can be consumed almost anywhere and anytime with mobile BI.
• Mobile BI, driven by the success of mobile devices, was considered by many as a big wave in BI and analytics a few years ago. Nowadays, there is a level of disillusion in the market and users attach much less importance to this trend.
• BI delivers relevant and trustworthy information to the right person at the right time. Mobile business intelligence is the transfer of business intelligence from the desktop to mobile devices such as the BlackBerry, iPad, and iPhone.
• The ability to access analytics and data on mobile devices or tablets rather than desktop computers is referred to as mobile business intelligence. The business metric dashboard and key performance indicators (KPIs) are more clearly displayed.
• As the use of mobile devices has risen, so has the technology that we all use in our daily lives to make our lives easier, including in business.
• Many businesses have benefited from mobile business intelligence. This section is a guide for business owners and others to the benefits and pitfalls of mobile BI.

1.11.1 Need for Mobile BI

• Mobile phones' data storage capacity has grown in tandem with their use.
• You are expected to make decisions and act quickly in this fast-paced environment. The number of businesses receiving assistance in such a situation is growing by the day.
• To expand your business or boost your business productivity, mobile BI can help, and it works with both small and large businesses.
• Mobile BI can help you whether you are a salesperson or a CEO.
• There is a high demand for mobile BI in order to reduce the time spent obtaining information and use that time for quick decision-making.
• As a result, timely decision-making can boost customer satisfaction and improve an enterprise's reputation among its customers.
• It also aids in making quick decisions in the face of emerging risks.

1.11.2 Advantages of Mobile BI

Simple access
• Mobile BI is not restricted to a single mobile device or a certain place.
• You can view your data at any time and from any location.
• Having real-time visibility into a firm improves production and the daily efficiency of the business.
• Obtaining a company's perspective with a single click simplifies the process.

Competitive advantage
• Many firms are seeking better and more responsive methods to do business in order to stay ahead of the competition.
• Easy access to real-time data improves company opportunities and raises sales and capital.
• This also aids in making the necessary decisions as market conditions change.

Simple decision-making
• As previously stated, mobile BI provides access to real-time data at any time and from any location.
• Mobile BI offers the information on demand.
• This assists consumers in obtaining what they require at the time.
• As a result, decisions are made quickly.

Increase productivity
• By extending BI to mobile, the organization's teams can access critical company data when they need it.
• Obtaining all of the corporate data with a single click frees up a significant amount of time to focus on the smooth and efficient operation of the business.
• Increased productivity results in a smooth and quick-running business.

1.12 CROWD SOURCING ANALYTICS

• Crowdsourcing is the collection of information, opinions, or work from a group of people, usually sourced via the Internet.
• Crowdsourcing work allows companies to save time and money while tapping into people with different skills or thoughts from all over the world.
• While crowdsourcing seeks information or work, crowdfunding seeks money to support individuals, charities, or startup companies.
• The advantages of crowdsourcing include cost savings, speed, and the ability to work with people who have skills that an in-house team may not have.
• Crowdsourcing allows companies to farm out work to people anywhere in the country or around the world; as a result, crowdsourcing lets businesses tap into a vast array of skills and expertise without incurring the normal overhead costs of in-house employees.
• Crowdsourcing is becoming a popular method to raise capital for special projects. As an alternative to traditional financing options, crowdsourcing taps into the shared interest of a group, bypassing the conventional gatekeepers and intermediaries required to raise capital.
• Crowdsourcing usually involves taking a large job and breaking it into many smaller jobs that a crowd of people can work on separately.

1.12.1 Crowd Sourcing and Crowd Funding

While crowdsourcing seeks information or workers' labor, crowdfunding instead solicits money or resources to help support individuals, charities, or startups. People can contribute to crowdfunding requests with no expectation of repayment, or companies can offer shares of the business to contributors.

Advantages
• Crowdsourcing brings together communities around a common project or cause.
• It is an efficient way of solving time-intensive problems.
• It gives deeper engagement by communities, who resonate with and build loyalty to the product or solution.

Disadvantages
• Results can be easily skewed based on the crowd being sourced.
• Lack of confidentiality or ownership of an idea.
• Potential to miss the best ideas, talent, or direction and fall short of the goal or purpose.

1.12.2 Types of Crowdsourcing

Crowdsourcing involves obtaining information or resources from a wide swath of people. In general, we can break this up into four main categories:

Wisdom of the crowd:
• It is the collective opinion of different individuals gathered in a group.
• This type is used for decision-making since it allows one to find the best solution for problems.
• Many brands pay attention to the collective opinion of their customers because it helps bring their businesses new ways of thinking, ideas, and strategies.
• As a result, the overall performance of a company improves.

Crowd creation:
• This type involves a company asking its customers to help with new products.
• This way, companies get brand new ideas and thoughts that help a business stand out. For instance, McDonald's is open to new ideas from its consumers.
• The famous fast food company asked customers to create their perfect burgers and submit their ideas to the brand.
• The company released winners' burgers each week, including the creator's short bio.

Crowd voting:
• It is a type of crowdsourcing where customers are allowed to choose a winner. They can vote to decide which of the options is the best for them.
• This type can be applied to different situations. Consumers can choose one of the options provided by experts or products created by consumers.
• For instance, if a brand asks its consumers to create a new taste, packaging, or
design of a product, other consumers vote to identify the best one.

Crowd funding:
• It is when people collect money and ask for investments for charities, projects, and startups without planning to return the money to the owners.
• People do it voluntarily.
• Often, companies gather money to help individuals and families suffering from natural disasters, poverty, social problems, etc.

1.13 INTER AND TRANS FIREWALL ANALYTICS

1.13.1 Inter Firewall Analytics

• Inter Firewall Analytics is a type of security analytics that focuses on monitoring and analyzing traffic flowing between different zones or segments of a network that are separated by firewalls. The goal is to identify and prevent potential threats that may be hiding within the traffic.
• Firewalls are a common security measure used to control traffic flow between different segments of a network, such as between an internal network and the internet or between different departments within a company.
• However, firewalls alone cannot provide complete protection against all threats, especially those that may be hiding within the allowed traffic.
• Inter Firewall Analytics involves deploying specialized tools and techniques to monitor the traffic passing through the firewalls and analyzing it for signs of potential threats.
• This can include detecting anomalies in the traffic patterns, identifying unusual or unauthorized access attempts, and flagging suspicious activity.
Some common techniques used in Inter Firewall Analytics include:

• Packet capture and analysis: This involves capturing and analyzing network traffic to identify potential threats, such as malware, exploits, or suspicious behavior.
• Behavior-based analysis: This involves analyzing the behavior of network traffic over time to identify anomalies or patterns that may indicate a potential threat.
• Machine learning: This involves using machine learning algorithms to analyze network traffic and identify patterns that may indicate a potential threat.
• Threat intelligence: This involves integrating threat intelligence feeds into the analytics process to identify known threats and help prioritize the analysis of potential threats.

The goal of Inter Firewall Analytics is to provide a more complete view of network traffic flowing between different segments of the network and to detect and prevent potential threats before they can cause harm.

By monitoring and analyzing traffic at the network level, Inter Firewall Analytics can help organizations identify and respond to potential threats more quickly and effectively.

1.13.2 Trans Firewall Analytics

• The main purpose of trans firewall analytics is to identify and prevent network threats and attacks such as malware, viruses, phishing attempts, and other types of cyber threats that try to penetrate an organization's network.
• Trans Firewall Analytics involves monitoring and analyzing network traffic logs generated by the firewall.
• These logs contain information about the source and destination IP addresses, the protocols and ports used, the size of the packets, and other network traffic metadata.
• By analyzing these logs, security analysts can detect patterns of network traffic that indicate a potential threat or attack.
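The log analysis described above can be sketched in a few lines. This is a minimal illustration, not a real firewall tool: it assumes the firewall log has been exported as CSV, and the column names (src_ip, dst_ip, protocol, dst_port, bytes, action), the sample records, and the threshold are all hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical firewall log export; column names and values are illustrative only.
SAMPLE_LOG = """src_ip,dst_ip,protocol,dst_port,bytes,action
10.0.0.5,192.168.1.10,TCP,443,1200,ALLOW
203.0.113.7,192.168.1.10,TCP,22,60,DENY
203.0.113.7,192.168.1.10,TCP,23,60,DENY
198.51.100.2,192.168.1.11,UDP,53,80,DENY
203.0.113.7,192.168.1.12,TCP,3389,60,DENY
10.0.0.5,192.168.1.10,TCP,443,900,ALLOW
"""


def suspicious_sources(log_text, deny_threshold=3):
    """Return source IPs with at least `deny_threshold` denied connections."""
    reader = csv.DictReader(io.StringIO(log_text))
    # Count denied connection attempts per source address.
    denied = Counter(row["src_ip"] for row in reader if row["action"] == "DENY")
    return sorted(ip for ip, count in denied.items() if count >= deny_threshold)


print(suspicious_sources(SAMPLE_LOG))
```

Counting denied attempts per source is only one simple pattern. A real deployment would combine many such signals (ports probed, packet sizes, time windows) before raising an alert.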
Key features of trans firewall analytics tools:

• Real-time monitoring: Trans firewall analytics tools monitor network traffic in real time, allowing security teams to respond quickly to any threat.
• Threat detection: These tools use advanced algorithms and machine learning to detect and analyze patterns of network traffic that may indicate a threat or attack.
• Behavioral analysis: Trans firewall analytics tools can detect unusual or abnormal network behavior that may be a sign of a security breach.
• Alerting and reporting: These tools generate alerts and reports when a potential threat or attack is detected, providing security teams with the information they need to take action.
• Integration with other security tools: Trans firewall analytics tools can integrate with other security tools such as SIEM systems, IDS/IPS, and endpoint protection solutions to provide a comprehensive view of an organization's security posture.

Trans firewall analytics is an important component of any organization's network security strategy, helping to identify and prevent cyber threats and attacks before they can cause harm.

Aspect | Inter Firewall Analytics | Trans Firewall Analytics
Type of firewall | Firewall between networks | Firewall between networks and internet
Purpose | Monitoring traffic flow between two networks | Monitoring traffic flow between a network and the internet
Traffic direction | Both inbound and outbound | Primarily inbound
Use cases | Intrusion detection, data loss prevention, network security monitoring | Web application security, malware detection, threat intelligence
Data sources | NetFlow, syslog, SNMP | NetFlow, HTTP logs, DNS logs
Metrics collected | IP addresses, ports, protocols, packets, bytes, session duration | Source and destination IP addresses, URLs, file types, malware signatures, response codes
Analysis techniques | Statistical analysis, anomaly detection, signature-based detection | Machine learning, behavior-based detection, heuristics
Benefits | Enhanced network security, early detection of potential threats, improved incident response | Better understanding of web traffic, improved detection and prevention of web-based attacks
Challenges | Complexity of traffic analysis, difficulty in identifying attacks that span multiple flows | Overwhelming volume of traffic, limited visibility into encrypted traffic, high false positive rates
Tools and technologies | Firewall logs, network traffic analysis tools, SIEM systems | Web application firewalls, content delivery networks, cloud-based security services

QUESTION AND ANSWERS

TWO MARKS

1. What do you mean by big data?
• Big Data is a collection of data that is huge in volume, yet growing exponentially with time.
• It is data with so large a size and complexity that none of the traditional data management tools can store it or process it efficiently.

2. Name the types of big data.
There are three main types of big data:
• Structured,
• Semi-structured, and
• Unstructured data.

3. List out the characteristics of big data.
Big data can be described by the following characteristics:
i. Volume
ii. Variety
iii. Velocity
iv. Variability

4. What is the advantage of big data?
Big data has several advantages for businesses and organizations, including:
• Improved decision-making
• Enhanced customer insights
• Improved operational efficiency
• New revenue opportunities
• Competitive advantage

5. What do you mean by unstructured data?
• Unstructured data is data that does not have a well-defined data model or structure.
• It is typically text-based data, but it can also include multimedia data such as images, videos, and audio.
• Unstructured data is generated from a variety of sources such as social media, email, mobile devices, sensors, and web logs.
Some examples of unstructured data include:
• Social media data
• Emails
• Web content
• Audio and video files

6. Difference between structured and unstructured data.

Structured Data | Unstructured Data
Data that has a clearly defined schema and is easily searchable and organized | Data that has no clear structure or schema and is often difficult to search and organize
Relational databases, spreadsheets | Text documents, social media posts, images, videos
Typically stored in a tabular format | Can be stored in a variety of formats such as text, JSON, XML, binary, etc.
Can be processed using traditional data processing tools like SQL | Often requires specialized tools such as natural language processing, machine learning, and computer vision
Typically smaller in size and easier to manage | Can be extremely large and difficult to manage
Changes to structured data are often slow and predictable | Changes to unstructured data can be rapid and unpredictable
Structured data is usually uniform in format and easier to analyze | Unstructured data can be very diverse in format and harder to analyze
Structured data is valuable for traditional data analysis and reporting | Unstructured data is valuable for gaining insights into customer sentiment, social media trends, and other areas where traditional data may not provide enough context

7. Define Web Analytics.
• Web analytics is the process of analyzing and measuring website traffic and visitor behavior to improve the overall effectiveness of a website.
• It involves the collection, measurement, and analysis of website data in order to understand user behavior and identify areas for improvement.
• Web analytics helps website owners make data-driven decisions to optimize website performance, increase traffic, and improve user experience.
• It involves various techniques such as data mining, user profiling, clickstream analysis, and web metrics to gain insights into website traffic, visitor behavior, and website performance.
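As an illustration of clickstream analysis and web metrics, the sketch below computes pageviews, unique visitors, and bounce rate from a small list of page hits. The hit records and field layout are hypothetical, and the bounce rate is taken here to mean the share of sessions with exactly one page view, which is a common convention assumed for this example rather than something defined in these notes.

```python
from collections import defaultdict

# Hypothetical clickstream: one (visitor_id, session_id, page) tuple per page hit.
HITS = [
    ("v1", "s1", "/home"),
    ("v1", "s1", "/products"),
    ("v2", "s2", "/home"),
    ("v1", "s3", "/home"),
    ("v3", "s4", "/blog"),
]


def web_metrics(hits):
    """Compute simple web metrics from a list of page hits."""
    pages_per_session = defaultdict(int)
    visitors = set()
    for visitor, session, _page in hits:
        pages_per_session[session] += 1
        visitors.add(visitor)
    sessions = len(pages_per_session)
    # A "bounce" is assumed to be a session with exactly one page view.
    bounces = sum(1 for count in pages_per_session.values() if count == 1)
    return {
        "pageviews": len(hits),
        "unique_visitors": len(visitors),
        "sessions": sessions,
        "bounce_rate": bounces / sessions if sessions else 0.0,
    }


print(web_metrics(HITS))
```

In this sample there are 5 pageviews from 3 unique visitors across 4 sessions, and 3 of those sessions contain a single page, so the bounce rate is 0.75.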
8. What are the data collection metrics in web analytics?
The data collected can include metrics such as:
• Pageviews
• Unique visitors
• Bounce rate
• Session duration
• Conversion rate

9. List out the types of web analytics.
There are two main types of web analytics:
• On-site analytics
• Off-site analytics

10. Name some benefits of web analytics.
• Understanding visitor behavior
• Improving website design
• Measuring marketing performance
• Increasing website traffic

11. List out some applications of big data.
Here are some applications of big data:
• Business analytics
• Healthcare
• Finance
• Government
• Retail
• Manufacturing
• Energy
• Education
• Transportation
• Media and entertainment

12. Name some big data technologies.
Here are some popular big data technologies:
• Hadoop
• Spark
• Hive
• HBase
• MongoDB
• ZooKeeper

13. What is Hadoop?
Hadoop's main components include the Hadoop Distributed File System (HDFS) and Yet Another Resource Negotiator (YARN).

14. Explain the core components of Hadoop.
Hadoop is an open-source framework intended to store and process big data in a distributed manner. Hadoop's essential components:
1. HDFS (Hadoop Distributed File System): HDFS is Hadoop's key storage system. The extensive data is stored on HDFS. It is mainly devised for storing massive datasets on commodity hardware.
2. Hadoop MapReduce: MapReduce is responsible for processing data in Hadoop. There are two stages, Map and Reduce. In simple terms, Map is a stage where data blocks are read and made available to the executors (computers/nodes/containers) for processing. Reduce is a stage where all the processed data is collected and collated.
3. YARN: The framework used for processing in Hadoop is YARN. Resource management, and the provision of multiple data processing engines such as real-time streaming, data science, and batch processing, is done by YARN.

15. Explain the features of Hadoop.
Hadoop assists not only in storing data but also in processing big data. It is a reliable way to handle significant data hurdles. Some salient features of Hadoop are:
1. Distributed processing: Hadoop helps in distributed processing of data, i.e., quicker processing. In Hadoop HDFS, the data is stored in a distributed manner and processed in parallel, and MapReduce is responsible for this.
2. Open source: Hadoop is free of cost as it is an open-source framework. Changes are allowed in the source code as per the user's requirements.
3. Fault tolerance: Hadoop is highly fault-tolerant. By default, for every block, it creates three replicas at distinct nodes. This number of replicas can be modified according to the requirement. So, we can retrieve the data from a different node if one of the nodes fails. The discovery of node failure and restoration of data is made automatically.
4. Scalability: It fits with different hardware, and new devices can be brought into use promptly.
5. Reliability: The data in Hadoop is stored on the cluster in a safe manner that is independent of any one machine. So, the data stored in the Hadoop ecosystem does not get affected by any machine breakdowns.

16. What do you mean by HDFS?
• Hadoop Distributed File System (HDFS) is a distributed file system that is designed to run on commodity hardware.
• It is a core component of the Hadoop framework and provides a distributed storage system for large data sets.
• HDFS is designed to handle very large files with streaming data access patterns, and to provide high-throughput access to data.

17. What do you mean by YARN?
• YARN (Yet Another Resource Negotiator) is one of the core components of Hadoop, responsible for managing resources and scheduling tasks across a Hadoop cluster.
• It was introduced in Hadoop 2.x to take over resource management from the original MapReduce engine, which suffered from scalability limitations.
• YARN moves the job scheduling and resource management functions of MapReduce into separate daemons: a cluster-wide ResourceManager (RM) and a per-node NodeManager (NM).

18. List out the benefits of Hadoop.
Hadoop offers several benefits in the world of big data processing, including:
• Scalability
• Fault tolerance
• Cost-effectiveness
• Processing speed
• Flexibility
• Data storage
• Integration
• Open source

19. Define open-source technology.
• Open-source technologies refer to software or computer programs that have their source code available to the public, allowing anyone to access, modify, and distribute it.
• This means that users can see and edit the code, which can result in greater collaboration and innovation in software development.
• The open-source movement started in the late 1990s and has since become a significant force in the tech industry.
• Many popular software tools and platforms, including Linux, Apache, MySQL, and WordPress, are open source.

20. How does cloud technology impact big data?
Cloud technology has a significant impact on big data in the following ways:
• Scalability
• Cost-effectiveness
• Accessibility
• Flexibility
• Agility
• Security
• Reliability
• Innovation

21. What do you mean by cloud computing?
• Cloud computing is the delivery of computing services, including servers, storage, databases, networking, software, analytics, and intelligence, over the Internet ("the cloud") to offer faster innovation, flexible resources, and economies of scale.
• The services provided by cloud computing can be categorized into three main models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).

22. List out the features of cloud computing.
• Scalability
• Elasticity
• Resource pooling
• Self-service
• Low costs
• Fault tolerance

23. What are the issues in using cloud services?
Some important cloud services issues are as listed:
• Data security
• Performance
• Compliance
• Legal issues
• Costs

24. Define mobile business intelligence.
• Mobile business intelligence (Mobile BI) is a type of business intelligence that enables the access and analysis of business data on mobile devices such as smartphones and tablets.
• It enables users to share and collaborate on data insights with others, providing real-time updates and notifications, to make data-driven decision-making more efficient.

25. What is the need for mobile BI?
• Mobile phones' data storage capacity has grown in tandem with their use.
• You are expected to make decisions and act quickly in this fast-paced environment. The number of businesses receiving assistance in such a situation is growing by the day.
• To expand your business or boost your business productivity, mobile BI can help, and it works with both small and large businesses.

26. What are the advantages of mobile business intelligence?
• Simple access
• Competitive advantage
• Simple decision-making
• Increase productivity

27. Define crowd sourcing.
• Crowdsourcing is the collection of information, opinions, or work from a group of people, usually sourced via the Internet.
• Crowdsourcing work allows companies to save time and money while tapping into people with different skills or thoughts from all over the world.

28. What are the types of crowd sourcing?
1. Wisdom of the crowd
2. Crowd creation
3. Crowd voting
4. Crowdfunding

29. What is inter firewall analytics?
• Inter Firewall Analytics is a type of security analytics that focuses on monitoring and analyzing traffic flowing between different zones or segments of a network that are separated by firewalls.
• The goal is to identify and prevent potential threats that may be hiding within the traffic.

30. What do you mean by trans firewall analytics?
• Trans Firewall Analytics is a type of network security analytics that focuses on monitoring and analyzing network traffic passing through an organization's perimeter firewall(s).
• The main purpose of trans firewall analytics is to identify and prevent network threats and attacks such as malware, viruses, phishing attempts, and other types of cyber threats that try to penetrate an organization's network.
• Trans Firewall Analytics involves monitoring and analyzing network traffic logs generated by the firewall.

31. Difference between inter and trans firewall analytics.

Inter-Firewall | Trans-Firewall
Firewall analytics performed between firewalls | Firewall analytics performed within a single firewall
Spans multiple firewalls or security domains | Limited to a single firewall or security domain
Captures and analyzes traffic between firewalls, including ingress and egress traffic | Captures and analyzes traffic within a single firewall, including traffic between different network zones or segments
FireEye, Palo Alto Networks, Cisco ASA | Suricata, Snort, pfSense

PART - B

1. What is big data? Describe the main features of big data in detail.
2. List out the characteristics of big data.
3. Discuss web analytics in detail.
4. What do you mean by open-source technology? Discuss in brief.
5. What is the relationship between cloud and big data?
6. Explain Mobile BI and its types in detail.
7. What is crowdsourcing? Discuss it.
8. Compare and contrast inter and trans firewall analytics in detail.
9. Can you think of any big data application that impacts your daily life? How?
10. Discuss two big data applications in detail.
