Big Data Shivani

shivani of big data

Uploaded by

Umashankar Mishra


Shivani Complete Book for Engineering Students
As Per New Scheme & Syllabus (AICTE)

SYLLABUS

UNIT-I - Introduction to big data, Big data characteristics, Types of big data, Traditional versus big data, Evolution of big data, Challenges with Big Data, Technologies available for Big Data, Infrastructure for Big Data, Use of Data Analytics, Desired properties of Big Data system.

UNIT-II - Introduction to Hadoop, Core Hadoop components, Hadoop Ecosystem, Hive Physical Architecture, Hadoop limitations, RDBMS Versus Hadoop, Hadoop Distributed File System, Processing Data with Hadoop, Managing Resources and Applications with Hadoop YARN, MapReduce programming.

UNIT-III - Introduction to Hive, Hive Architecture, Hive Data types, Hive Query Language, Introduction to Pig, Anatomy of Pig, Pig on Hadoop, Use Case for Pig, ETL Processing, Data types in Pig, Running Pig, Execution model of Pig, Operators, Functions, Data types of Pig.

UNIT-IV - Introduction to NoSQL, NoSQL Business Drivers, NoSQL Data architectural patterns, Variations of NoSQL architectural patterns, Using NoSQL to Manage Big Data, Introduction to MongoDB.

UNIT-V - Mining social Network Graphs: Introduction, Applications of social Network mining, Social Networks as a Graph, Types of social Networks, Clustering of social Graphs, Direct Discovery of communities in a social Graph, Introduction to recommender system.

Price : Rs. 90.00 (Rs. Ninety Only)
Edition : 2020
UNIT 1
INTRODUCTION OF BIG DATA

INTRODUCTION TO BIG DATA, BIG DATA CHARACTERISTICS, TYPES OF BIG DATA, TRADITIONAL VERSUS BIG DATA, EVOLUTION OF BIG DATA, CHALLENGES WITH BIG DATA

Q.1. What is big data? Explain.

Ans. "Data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges" is known as big data. The term also refers to the technologies and initiatives that involve data that is too diverse, fast-changing or massive for conventional technologies, skills and infrastructure to address efficiently. Big data describes data sets that are so large and complex that they are impractical to manage with traditional software tools.
But now the information from big data can be analyzed by using new technologies; e.g., retailers can track user web clicks to identify behavioural trends that improve campaigns, pricing and stockage. Major web companies such as Google, Amazon and Facebook pioneered businesses built on monetizing massive data volumes over the last decade. They invented new paradigms not only for extracting value from data but also for managing data and compute resources, from data center design, to hardware, to software, to application provisioning.

Another definition of big data is as follows - "The collection, processing, discovery, analysis and storage of large volumes and disparate types of data, very quickly and cost effectively, is enabled by emerging technologies and practices."

Q.2. What is the importance of big data?

Ans. The importance of big data depends upon how the data is used. Data can be fetched from any source and analyzed to find answers that bring benefits in terms of -
(i) Cost reductions
...

Big data analysis also helps in -
(i) Finding the root cause of failures, issues and defects.
(ii) Generating coupons at the point of sale based on the customer's habit of buying goods.
(iii) Recalculating entire risk portfolios in just minutes.
(iv) Detecting fraudulent behaviour before it affects and risks the organization.

Q.3. Write short note on drivers for big data.

Ans. There are three contributing factors or drivers for big data. These drivers are consumers, automation and monetization. More than each of these contributing factors, it is their interaction that speeds up the creation of big data. With increasing automation, it is easier to offer big data creation and consumption opportunities to consumers, and monetization increasingly provides an efficient marketplace for big data. These drivers are explained below -

(i) Sophisticated Consumers - The increase in information level and the associated tools has created a new breed of sophisticated consumers. These consumers are far more analytic, far savvier at using statistics, and far more connected, using social media to rapidly collect and collate opinion from others.

(ii) Automation - Marketing and sales have received their biggest boost in instrumentation from Internet-driven automation over the past 10 years. Browsing, shopping, ordering and customer service on the web have not only given more control to the user but have also created an enormous flow of information for marketing, product and sales teams about the buyer's behavior. Each sequence of web clicks can be collected, collated and analyzed for customer delight, puzzlement, dysphoria or outright defection. More information can also be obtained about the sequences leading up to a decision.

(iii) Monetization - From a big data analytics perspective, a data bazaar is the biggest enabler to create an external marketplace where we collect, exchange and sell customer information. We are seeing a new trend in the marketplace, in which customer experience from one industry is anonymized, packaged and sold to other industries.

Q.4. Discuss four V's and five V's characteristics of big data with suitable ...

Ans. Big data is about managing and making sense of vast amounts of data at the right speed, at the right time, to gain the right insights. Big data can be defined on the basis of some of its characteristics. These characteristics, known as the V's, have been used to define big data.

Fig. 1.1 Five V's Big Data Characteristics

Volume - It refers to the quantity of data gathered by a company. This data must be used further to gain important knowledge. Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information (e.g., turning 12 terabytes of tweets per day into improved product sentiment analysis, or converting 350 billion annual meter readings to better predict power consumption).

Variety - It refers to the type of data that big data can comprise. Data may be structured or unstructured. Big data consists of different types of data, including structured and unstructured data such as text, sensor data, audio, video, click streams, log files and so on. The analysis of combined data types brings new problems, situations and so on, such as monitoring live video feeds from surveillance cameras to target points of interest, or exploiting the 80% data growth in images, video and documents to improve customer satisfaction.

Value - It refers to the important feature of the data which is defined by the added value that the collected data can bring to the intended process, activity or predictive analysis/hypothesis. Data value depends on the events or processes the data represents, such as stochastic, probabilistic, regular or random ones. Depending on this, requirements may be imposed to collect all data, store it for a longer period, and so on. Data value is closely related to data volume and variety.

Veracity - It refers to the degree to which a leader trusts information in order to make a decision. Trust in the data is very important for the future of the business. However, as some leaders do not trust the information they use, generating trust in big data presents a huge challenge as the number and type of sources grows.

Q.5. Explain big data types with examples.
Or
Differentiate between structured and unstructured data.
Or
Write short note on structured data.

Ans. Big data encompasses everything, from dollar transactions to tweets to images to audio. Therefore, taking advantage of big data requires all this information to be integrated for analysis and data management. This is more difficult than it appears. Big data includes a huge volume of data of the following types -

(i) Structured Data - Data that is stored in a relational database table, in the format of rows and columns, is structured data. Its structure is defined by creating a data model. Structured query language (SQL) is used for managing this data.

(ii) Semi-structured Data - Data which has some of the form of structured data but does not fit the data model of a relational table is semi-structured data. This form of data increased rapidly after the introduction of the World Wide Web, where various forms of data need a medium for interchange, like XML and JSON.
Example - CSV, XML and JSON documents are semi-structured. NoSQL databases are considered as semi-structured.

(iii) Unstructured Data - Data that has no specific structure and cannot be stored in a databank in row and column form is unstructured data. It is tough to manage and analyze.

Fig. 1.2 Big Data Types
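The three data types from Q.5 can be made concrete with a small sketch. It is only an illustration under stated assumptions: the table name, the field names and the sample values are hypothetical and are not taken from the text. It uses Python's standard `sqlite3` and `json` modules.

```python
import json
import sqlite3

# Structured data: fixed rows and columns in a relational table, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("asha", 120.0), ("ravi", 80.5), ("asha", 30.0)],
)
total_by_customer = dict(
    conn.execute("SELECT customer, SUM(amount) FROM sales GROUP BY customer")
)

# Semi-structured data: self-describing JSON records whose fields may differ.
raw_json = '[{"user": "asha", "tags": ["big data"]}, {"user": "ravi", "city": "Bhopal"}]'
records = json.loads(raw_json)
field_names = sorted({key for record in records for key in record})

# Unstructured data: free text with no schema; any structure has to be derived.
review = "Great phone, great battery, poor camera"
word_counts = {}
for word in review.lower().replace(",", "").split():
    word_counts[word] = word_counts.get(word, 0) + 1

print(total_by_customer)
print(field_names)
print(word_counts)
```

The structured part can be queried directly with SQL, the semi-structured part has to be inspected record by record because the fields vary, and the unstructured part needs processing before any analysis is possible.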
Q.6. Give advantages of big data over traditional data.

Ans. The traditional database was the major source for storing and analyzing data about 30-40 years back. ... Big data reduces price and improves performance.

Big data architecture is based on microprocessors, which is economical compared to a centralized database based on mainframes, and a distributed database has more computational power than a traditional one. Traditional database systems are based on structured data, whereas big data uses semi-structured as well as unstructured data. A traditional database stores a small amount of data, ranging from some gigabytes to terabytes; big data, however, can store and analyze data ranging from hundreds of terabytes to petabytes and more. Storing a large amount of data reduces the cost, which helps business intelligence (BI). A traditional database has a fixed schema, which cannot be changed once saved. A traditional database system requires complex and expensive software and hardware for managing a large amount of data, while in big data the large data is divided across several systems, so the amount of data in each system is reduced. This makes the use of big data simple and cheap.

Q.7. Compare traditional data and big data.

Ans. The comparison of traditional data and big data is given in table 1.1.

Table 1.1 Comparison of Traditional Data and Big Data

Parameter | Traditional Data | Big Data | Advantage of Big Data
Data architecture | Centralized database | Distributed database | Cost effective
Volume | Small amount of data. Range - gigabytes to terabytes | Large amount of data. Range - hundreds of terabytes to petabytes | Improves ...
Data schema | Fixed schema | Dynamic schema | Preserves the information in data
Accuracy | Less accurate results | High accurate results | Confident and reliable results

Q.8. Describe history of big data.

Ans. Big data is a long evolution of capturing and using data, and not a new phenomenon. Big data is a further step that will bring change in the way we run society, just like the other developments in storage of data, processing of data and the internet.

The ancient history of data begins when humans used tally sticks for storing and analyzing data, about c. 18,000 BCE. Tribal peoples used to mark notches into bones or sticks for calculations, which helped them predict how long their food would last. One of the earliest prehistoric data storage devices is the Ishango Bone, discovered in 1960 at Ishango, in what is now the Democratic Republic of the Congo near the Ugandan border. Then, in c. 2400 BCE, came the very first device built particularly for performing calculations, the abacus.

In the 1880s, Herman Hollerith used punch cards in a tabulating machine that reduced 10 years of census work to 3 months, an early design for automated computation. Then, in 1928, the German-Austrian engineer Fritz Pfleumer invented a method of storing information magnetically on tape.

Then came business intelligence and the start of large data centers, along with relational databases and Material Requirements Planning systems. The first use of the term big data was made by Erik Larson in 1989, when he wrote: "The keepers of big data say they do it for the consumer's benefit. But data have a way of being used ..."

Big data tools based on batch processing -

MapReduce processes large amounts of data in parallel. It provides a general mechanism to distribute aggregate workload across different machines.

Tool | Use | Advantage
Apache Hadoop and MapReduce | Infrastructure and platform | High scalability, reliability, completeness
Dryad | Infrastructure and platform | High performance distributed execution engine, good programmability
Apache Mahout | Machine learning algorithms in business | Good maturity
Jaspersoft BI suite | Business intelligence software | Cost-effective, self-service BI at scale
Pentaho business analytics | Business analytics platform | Robustness, scalability, flexibility in knowledge discovery
Skytree server | Machine learning and advanced analytics | Process massive datasets accurately at high speeds
Tableau | Data visualization, business analytics | Faster, smart, fit, beautiful and easy to use dashboards
Karmasphere | Big data workspace | Collaborative and standards based unconstrained analytics
Talend Open Studio | ... | ...

Big data tools based on stream processing -

Hadoop is designed for batch processing. It is not a real-time and high performance engine, since there is high throughput latency in its implementations.

Tool | Use | Advantage
S4 | Processing continuous unbounded streams of data | Fault-tolerant, pluggable platform
SQLstream s-Server | Sensor, M2M and telematics applications | SQL-based, real-time streaming big data platform
Splunk | Collect and harness machine data | Fast and easy to use, dynamic environments, scales from laptop to datacenter
Apache Kafka | Distributed publish-subscribe messaging system | High-throughput stream of immutable activity data
SAP Hana | Platform for real-time business | Fast in-memory computing and real-time analytics

(iii) Big Data Tools Based on Interactive Analysis - Interactive analysis presents the data in an interactive environment, allowing users to undertake their own analysis of information. Users are directly connected to the computer and hence can interact with it in real time. The data can be viewed, compared and analyzed in tabular or graph format, or both at the same time.

(a) In 2010, Google proposed an interactive analysis system named Dremel, which is scalable for processing nested data. Dremel has a very different architecture compared with the well-known Apache Hadoop, and acts as a successful complement to MapReduce-based computations. It has the capability to run aggregation queries over trillion-row tables in seconds by combining multi-level execution trees and a columnar data layout.

(b) Apache Drill is another distributed system for interactive analysis of big data. It is similar to Google's Dremel. For Drill, there is more flexibility to support various different query languages, data formats and data sources.
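Two ideas recur in the answers above: dividing large data across several systems so that each holds less, and distributing an aggregation workload across them. The following is a minimal sketch of that idea, not the way Hadoop, Dremel or Drill are implemented. Worker threads stand in for separate machines, and the function names and click data are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_nodes):
    """Split the records round-robin into num_nodes smaller chunks."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def partial_sum(chunk):
    """Aggregate one chunk locally: total amount per key."""
    totals = Counter()
    for key, amount in chunk:
        totals[key] += amount
    return totals


def distributed_sum(records, num_nodes=3):
    """Aggregate every chunk in parallel, then merge the partial results."""
    chunks = partition(records, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(partial_sum, chunks))
    merged = Counter()
    for totals in partials:
        merged.update(totals)  # Counter.update adds the counts together
    return dict(merged)


# Hypothetical (page, clicks) records.
clicks = [("home", 1), ("cart", 2), ("home", 3), ("search", 1), ("cart", 4), ("home", 5)]
result = distributed_sum(clicks)
print(result)
```

Each worker only sees its own chunk, so the per-worker data volume shrinks as workers are added. The final merge is cheap because it combines small partial totals rather than raw records.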
This approach leads to faster have been developed : realy a Whe pc ao ae ized to store and query data management system. Hadoop, is used for stor Before processing big information it must be recorded feom ig sources. In the order of its happening it mast ought tobe recarde by mete icipated or unpredi ‘data has to be different than that for tra analytic tools and : : Redundancy i npr: handle a reat dat of data 0.15, Describe the architecture for big data, Irom many sources. Redundancy comes in many forms. For instance, i he Ans. The architecture for big data is shown in fig. 1.5. ‘company has created a private cloud, company may want to crete redundancy win private areas so that it can scale out to support changing workoads. If 2 ae company needs to limit intemal IT growth, it may use external cloud services to to mm resources. In some cases, this redundancy may come inthe form as a Service (Sax), allowing companies to carry out advanced data ig Data Applications Reporting aod Vwalzation catihuercians “Analyties (Traditional and Advanced) £ cmographies or shifts in patient needs. This data about patients needs to be Analytical Data Warehouses and Data Marts requirements and to protect patient privacy. 10 is allowed to see the data and when they io be able to verily the identity of as protect the identity of patients. These "must be part of the big data fabric from the out set, and not an after thous (x) Operational Data Sources — Concerning big 6a nteruces and Feeds from/to the Internet Fig, 1.5 Big Data Architecture enone litte and Feeds What makes big data bigs the ‘ies on picking up lots of data from lots of sources. 
Therefore, open opp faces (APIs) ae a core part of any big data architec level and between every ayer hivecture also must work to porting infrastructure of organization oF company.22 Big Dota For instance, the company might be interested in running model, Intro is safe to drill for oil in an offshore area, provided ae 10 how these data elements offer context ba essed, With big data, rportingand dat vin ontext of how data is related and the iships on the future, oe Big Date Applications ~ Tradonally busines is ancipeted data would be used to answer questions about what o do and when todo data oftemperature, salinity, sediment resuspension, and me chemical, and physical properties of the water column fun this model using a traditional server configuration. Howe, distributed computing model, a day's long task may take right also determine the Kind of database that company would wae Certain circumstances, stakeholders may want to understand how su! il distinct data elements are related, or the relationship between social no a activity and growth in sales. This isnot the typical query the corppany wi ask ofa structured, relational database. A graphical database might be g,74 sing the developm: it eboice, as it may be tailored to separate the Fes get advantage of the unig “properties” or the information that defines that nd the “edge relationship between nodes and properties. Using the right database may, met, AM of thes sp application might beable to monitor pematre infest ‘termine if data indicates when intervention is needed. In mansfacturns, a (i) Organizing Data Services and Tools ~ Indeed, 10a he big data application can be sed to prevent a machine from shuting down that organizations use is operational. A growing amount of data comes fron, ‘a production un. 
A big data traffic management application may reduce umber of sources that are not quite as organized or straightforwarc id, the number of traffic jams on busy city highways, decreasing the number of data that comes from machines or sensors, and massive public and priv) accidents while saving fuel and reducing pollution. data sources. In the past, most companies were not able to either capaci) 9.16, What do yow inean by big data analytics ? Explain various types store this vast amount of data. It was simply too expensive of too overw! of analytics, Even if companies are able to capture the data, they do not have the tools “Ans. Big dain analyticg, is the process of examining large data eet that anything about it. Very few tools can make sense of these vast amoun's\ containing a variety of data types ie, big data to uncover all hidden pattems, data, The tools that did exist were complex to use and did not produce 4 arknown correlations, market trends, customer preferences and other useful within a reasonable time frame. In the end, companies who really want! business information, Then analytical findings can lead to more effecive and technical applications, | ' improve performance. Typically, a graph database may be used ia sci do the enormous effort of analyzing this data were forced to work i) marketing, new revenue opportunities, better customer service, improved Snapshots of data: This means that stakeholders may miss ott o” 7H operational efiiency, competitive advantages over rival organizations and events as they may not have been captured in a certain snapshot. other business benefits. ta Mares—Aferacont| — The primary goal of big data analytics is to help companies make more s often import informative business decisions by enabling data scientists, predictive modellers ‘and other analyties professionals to analyse large volumes of transactional data, as well as other forms of daa that may be untapped by more conventional business intelligence (Bl) programs. 
That could include web server logs and Internet click stream data social media content and social network activity always 0 SPO, ex from customer e-mails and survey responses, mobile phone cll on the capability to ereate reports to give them an understanding of W'" dctal records and machine data captured by sensors and connected 10 the ‘aia tells them about everything from monthly sales figures t0 pe Internet of Things. {powth Big data changes the way the data is managed and used asec Bie data burst upon the scene in the first decade ofthe 21st cenur. od Scola manage an aay enough data, tay we ET he tego to enact were nie an ts is ‘Arguably. ‘oolstoelp management truly understand the impact not just oF ims like Google, Linkedin, eBay and Facebook were round big data (vii) Analytical Data Warehouses anc24 ‘Big Data from the beginning. They did not have to reconcile or in egrate bi Introduction of Big Data. 25 dag, ‘more traditional sources of data and the analytics perform : they did not have that much of at They id each feral espe mec nate ig big data technologies with thei TT infrastructures ns m4 a infrastructures did mot exist tand alone, big ace Sy could be the only focus of analytics, and big data technology acy" gush a5 ne alte Sali eaurcmaae could be the only architecture. hig joo pounpe cece Analytics can be classified into following three types — {versus building a new one. (i). Predictive analytics, 0.17. Explain core components of analytical data architecture. IR.GRY., May 2019 (VIII-Sem,)} 1 big data storage and analytics platform provides resources and ' for storage as well as for batch and real-time processing of the vig provides main integration interfaces between the site operational plstform and the cloud data lab platform and the programming interfaces for {implementation of the data processes. The internal structure of the Pig data trage and analytes platform s given in Fe 1.6 Gi) Descriptive analytics Gi) Prescriptive analytics. 
(Predictive Analytics —Predictive analysis establish patterns and gives list of solutions which may come for gi Predictive analysis study the present as well as past data and, happen in future, give probal sed onl big data to forecast other data which we do not have This analytical mel is one of the most commonly used methods used for sales lead scoring sa} media and consumer relationship management data ‘Three basic elements of predictive analytics are as follows — ision analysis and optimization ransaction profiling. (i) Descriptive Analytics ~ Descriptive analytics also known ‘mining, ope ‘appening in real-time. It is one of the simples of analytics as it converts big data into small bytes. The result Fig. 1.6 The Internal Architecture of the Big Data Storage and Analytics Platform stored in the distributed file system, which s responsible id replication of large datasets across the multiple servers access tothe structured data is provided bythe distributed standard SQL interface. The main component responsible the distributed data processing framework, which provides el API fr the implementation of the data pre-processing tasks and for mn ofthe predictive functions. Predictive functions are-. ing revenue, reducing oPeX, Chum and ony, we as Key business objectives. Introduction of Big Data. 29 ven eatment sage done a prone o each dae near ns 7 ON ee fon can take better decisions respo stage umber o vanced analytics where they are investing Now and where aia acs hece yeas. 
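The difference between descriptive and predictive analytics, and the idea of a predictive function built from past data, can be sketched in a few lines. This is a hedged illustration only: the monthly sales figures are hypothetical, and a straight-line least-squares fit stands in for whatever model a real platform would use.

```python
from statistics import mean

# Hypothetical past data: sales for five consecutive months.
monthly_sales = [100.0, 110.0, 125.0, 130.0, 145.0]

# Descriptive analytics: summarize what has already happened.
summary = {
    "average": mean(monthly_sales),
    "best": max(monthly_sales),
    "worst": min(monthly_sales),
}


def fit_line(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ..."""
    xs = range(len(values))
    x_mean, y_mean = mean(xs), mean(values)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Predictive analytics: use past data to forecast a value we do not have yet.
slope, intercept = fit_line(monthly_sales)
forecast_next_month = slope * len(monthly_sales) + intercept

print(summary)
print(forecast_next_month)
```

The summary only restates the past, while the fitted line is a very simple predictive function: it extrapolates the observed trend to the next, unseen month.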
invest in th Operators face an uphill challenge whet they need to g compelling, revenue generating services without overload (vi) In Agriculture ~ & biotechnology firm uses treatment, increasing number Frequent post treaiment (iv) For Insurance Companies ~ Governent for giving medical ia to patients do large amount of expenditure, By using BDA analysis, prediction and minimizing fraud medical claims ean be done IDA technique ter period oft ich are effective in vernment analyzes this massive tons between weather and disease and accordingly preventive measures are taken. Public health surveillan Ans, Advantages of using Big Data Analytics in Healthare Se|™?"°'% % "=U ie esPone io dssase outbreak quick by wing BD Advantages of using big data analytics in healthcare sector are as For Research ~The large amount of data produced, gives Q.19. Explain advantages of using big data analytics in hea | 4s! 0 predict epidemics, by finding cor sector and banking sector. For Pharma Companies ~'To improve workflow quality and quantity ike predictive modeling, statistical tool and algorithms. These improve the oulcome of experiment and provide better understanding of developing ‘drugs pharma companies need new tools. Tis tool successfully navigates the regulatory approval and marketing process. Advantages of using Big Data Anal Advantages of using big data analytics in banking in Banking Sector ~ are as follows ~ ents is taken into account. The move is towards formuis (patient on personalized treatment) on the geno response to certain medicines, allergy, and family history. When gen« kaown completely, some kinds of relations are and the disease. T} reputation or account igh all the information and provide fork process, save time and formation and its proper knowledge allow organization to identify 1es before they affect their customers. Fraud Detection and Prevention ~ One of the most important ed by banking sectors is fraud. 
Big data transactions are done and provides security as well as safety 10 re system, lual. The patient gets advantage by various ways such as correc “Selfectv line of treatment, beter health related decisions, prev ime, continuous health monitoring of patients by wireless devices, Petsonal line treatment, increase life quality and expectancy. ‘get readmit ater treatment, identi that hosp identification of patents Vagal, provider could develop pre health plans to prevent ho ‘queries that could be answered: jusing these BDA tools, include W"30. Big Data iv) Enhanced Reporting - Getting 0263s to huge i also contain different needs of different customers. Then ban ™%sr,| InroicSono Big Data 31 needs in a meaningful way. Banking industry provides the ct® of § ft of mritingl Talend rove source dt required by the customer by using big data _ : Brows oa vendor that ys havea (x). Risk Management ~ Early detection of hat it has to coexist withthe tary solution fora longtime for many restons, For example atin trou Hadoop to a database req sting aur cleansing and the datatype ae whic i the case wi most ircumsta sg castomer, loyalty programs are created, Targeted marketing yy Tee abjegralpernec aire ‘made as well as relationships are build between valuable custome. i sn Hadoop eliste. SQL-H i sofware a ( Customer Feedback ~ Customers sont ze problem. collected in text form from various social media sites and afte afer wat are the desired properties of big data system ? aad negative TH alley, Q.24. What are the desired properties of big data syste used to provideseniz) Ans, The desired properties of big data system are as follows — ; i) Error Tolerance and Robustness ~ Because ofthe cllenges 0.20. Explain open-source technology for big data analytics. 
‘very much dificult to bud a system Open-source software is computer sofware that savant “ao the Hah hing” Stems a Temieed Se Oy te machines going dovin randomly, the complex semantics of wifors and improve and at times also to di the software. The opensn cee 08 many more: These tallenges make iteorplicat even name came out of a 1998 meeting in Palo Alto in reaction to Nesey sbusnes of big data system is needed to overcome the complexiiesassocited announce! Source code release for Navigator (as Mozilla). fy ee . a Although the source code is released, there are still governing bai (Ud) Scalability — itis the ability to maintain the performance with agreements in place. The m pleis the GNUG ne growing data and load by adding resources to the system. The lambda Public License (GPL), wt der the conti architecture is horizontally sealable across all layers of the system stack i. further developments and applications are put unde the same licens.” Thi¥scaling is achieved by including more mumber of chines. thatthe products keep improving over time for the greater population of (ii) Generalization ~ Avwide range of applications ean be function ina ‘Some other open-source projects are managed and supported by om uencral system. As lambda architecture is based on function ofall data, it generalizes companies, such as Cloudera, that provide extra capabilities, taint Hal aplication, whet ani managers systems so ia ames professional services that support open-source projects suchas Halo | - ae me aa | similar to what Red Hat has done for the open-source project Lin! iv) Debuggabitity— Aig data system must provide the information 1 system when things go Wr should be able to f the open-source analytics stack is ti! fequired to debug the system when things go wrong. We shou to have that value. ied by someone else’s predetermined ideas oF Vs Champagne, chief technology officer at Revolution Analytics Ft . 
"hature of the batch layer ‘by preferring to use recomputat ‘The open-source stack does not pul yo" ee * Pu you con® upvhen p 5 ret (0) Ad hoc Queries ~ The ability to perform ad hoe queries on the Gatais significant. Every large dataset contains unanticipated value init, Having customer. One ofthe great benefits of open-source lies inthe flexibility model — Yeu download and deploy itwihen you need said YS32 Big Data the ability of data WADOOP INTRODUCTION TO HADOOP, CORE HADOOP COMPONENTS, HADOOP ECOSYSTEM, HIVE PHYSICAL ARCHITECTURE, HADOOP LIMITATIONS, RDBMS VERSUS HADOOP need low latency re aire the update latency requirements may vary widely. In som. pes eed propagate immediately, but nother applications uptelgg of few hour is allowed. 0.1. Whats Hadoop ? | ans Hadoop was developed inthe year of 2008 by Doug Cuting and Ilitc Carta tis the Apache open source software which allows to sore Hea is huge volume of Jian a dstbuedenvzoomen nd ten in ava Hadoop is also called MRI The major social networking ses ae ney book. Yohoo, Google, Twit and Linkedln uses the Hadoop 5 chnology are fst, sealable, } loop consist of two main frame work Map reduce layer and prs layer Map reduce layer is used for processing the big dat (wherethe Serapplicaion executes) and 11DFS is used wo store the big dat whenethe bern zesien : | 0.2. Expluin main components of Hadoop. | Ans. Two main components of Hadoop are as follows ~ (The Hadoop Distributed File System (HDES) SHDFS is the DES breaks it‘Computer Cluster Hedoop 95 Hadoop’s parallel world has following two major layers — ‘Processing/computation ayer is called MapReduce Storage layer is called adoop Distributed Fite Sytem (HDS) (Gut. Explain the ecosystem of Hadoop. vans. tladoop is an open source framework maintained by the Apache on for reliable, sealable and distributed computing According tothe hadoop apache org the eomponents of Hadoop ae defined as projects ‘imeiion different to cach other's. 
Some of the widely used Hadoop (). Pig ~ tis a platform for HDFS. It consists of 2 compiles for eduer programs and a high-level language called Pig Latin. I provides perform daa extractions, transformations and fading andbasi nays nt having to write MapReduce programs. iy Hive — It's a distributed data warehouse. A data warehouse and tinge that presents data in the form of tables. Hive to database programming. (It was initially developed Fig. 2.1 HDFS & Map Reduce (i) Map Reduce ~ Because Hadoop stores the entire da ‘small pieces across a number of servers, analytical jobs can be dis to each of the servers storing part of the data. Each ser fragment simt a comprehensive answe 3 Facsbook). (iy HBase— 1fHadoop. HBase tab (iv) Zookeeper ~ = isa data mining software that can be easily are designed it (0) Mahout ~ Mabout is a data mining software that ¢ signed to continue to work even if there are failures. HDFScon, 4. Syanout offers java libraries or scalable machine learning algorithm monitors the data stored on the cluster. If a server becomes unavil ic Toa ee svg the dat, These machine Kaming algorithms disk drive fails or data is damaged due to hardware or software Pic, user to perform a task such as classification, clustering, association rule HDFS automatically restores the data from one of the known slats, ‘and predictive analysis. ter{ MapReduce monitors te ROB) (9) Casandra —Hadoop Cassandra provides datas that an be ating in the job, when an analysis job i ui sy scalable and highly avaiable without interruption in the job performance of them is'slow in returning an answer or fail (il) Chukwa — Chukvva isa data collections system which is mainly MapReduce automatically starts another-instance-of the task °% [titer tects outcomes of the collected data ‘a non-relational, distributed database that runs on ‘serve as input and output for MapReduce jobs, is an application that coordinates distributed server that has a copy of the data. i 1g system whicl. 
is used for Because of the way that HDFS and MapReduce work, Had00? Mbnfiguring the Hadoop cluster for fast processing ‘of Hadocp data. Spark sealable, reliable and fault-tolerant services for data storage ané axilifes not use MapReduce job of execution engine to run the job. It uses its ‘wn distributed runtime to complete the job. | Gx) Tez — Tez is a data-flow programming language build in the 0.3. Write short note on Hadoop’s parallel world. fier ‘Yam to execute an arbitrary DAG of tasks to process data for both Ans. The Hadoop framework app oat atch and interactive use-case Provides distibuted storage and computation across clusters °F Hadoop is designed to scale up from single server to thousands of online each offering local computation and storage. (x) Avro ~ Avro is used for data seri which provides a file for storing persistent data. Avro was ereated by Doug Cutting36 Big Date formaking Hadoop tobe writable in many programy ‘JavaScript, Python, Ruby. MeHg a web interface for managin RAM Size —For using Hadoo components. o oe Processor ~ Two or more core processors are needed for data between Hadoop (xiv) Oozie ~Itis a Hadoop job scheduler. ‘The Hadoop ecosystem is shown in fig. 2.2. fe] TaN [ome] adoop AWS} tntosphere cone] aca ding. Java, Eclipse, and Intell are also a use these software iho available from commer and Horton works are Tey charge for service. Their fee version can be used butte software “ppt is not available when needed. "Quer the requirements are mect, the Hadoop software canbe installed fe of cost to get started with the simple project. Later the software and vradware can be upgraded to work on more complex project with bigger jolumes and variety of big dats. es 0.6. What is S40 Ans. Sqoop is mainly used to transfer ie huge amount data between Hadoep land relational database. 
Sqoop refers “SQL to Hadoop and Hadoop to SQL I ‘witer |] pce || premet the relational database such as Mya Orci, poste SO = 91'S, Hive, HBase) and exports the data from HDFS to relational Tel Fig, 2.2 The Hadoop Ecosystem 0.5. What are the system requirements for installing Hade Ans, There is vation data. In order to address in the hadoop apache.org webs hardware and software requirement for using single node clust* listed below ~ op are noe the migration of heterogeneous data (a) Operating System — Hadoop project c2 Pe ee the Linux oF Windows operating system. Windows 10 versio® system has been found to be most efficient.38 Big Data ‘i - bi 0.7, What is zookeeper ? Also write its advantages ang ‘Ans In a traditional distributed environment coordinating 9%, Hadoop 99 askisquite complex and complicated. Butthe zookeeperoveyee on May iges of Maliout ~ with the help of simple architecture and its API. In the cluster nes th a to maintain the shared data and coordinating among then. isa doesn't support sala version in the development hhas no decision tree algorithm, ly developed at Yahoo for thei complex work scarch engine. Later it was acquired by open source Apache incu 1 orflow scheduler for managing Hadoop jobs. There are two major types of | rovie jobs are available, ic. oozie workflow and oaze sesie workflow it follows Directed Acyclic Graph (DAC | SEeuentil execution of jabs inthe Hadoop. | ‘The control flow node controls the begi xecution. In the oozie coordination, workflow jobs are triggered by time. sg advantage of Once — Advantages of Zookeeper ~ | () It allows the workflow of execution can be restarted from the @_ Itprovides reli (i 1 offers high synchronization and serialization. (iii) The atomicity eliminates the inconsistency of data among chu is fast and simple. | Disadvantages of Zookeeper ~ failure. It provide web service API (ie. we can control the jobs from ages of Oozie ~ not a resource scheduler. Its not suitable for off grid scheduling. 
(The large number of stacks needs to be maintained. 0.10. Give some applications of Hadoop. 4. What is Mahout ? Give its advantages and disadvantages. Ans, Now-a-days, with the rapid growth ofthe data volume, the storage os Epes tah eae Tromework wd 24 processing of Big Data has become the most pressing reeds ofthe : Mahont is the Apache open source software framework #)| enterprises. Hadoop as the open source distributed computing platform has provides data mining library. The processing task can be split in © ™H) become a brilliant choice for the business, The users can develop their own ‘segments and each segment can be computed on a different machine isc distributed ‘ations on Hadoop and processing Big Data even if they do to speed up the computation process. The primary goals of the Melos) not know the bottom-level details of the system. Due tothe high performance statistical modeli® of Tadoop, it has been widely used in many companies. Some applications of 1g and machine es!) Hadoop are given below ~ (2) Hadoop in Yahoo! ~ Yahoo! isthe leade in Hadoop technology Haadoop on various products, which include i, content optimization, anti-spam e-mail system, and advertising iy used in user interests) prediction, searching ranking, and advertising location. data clustering, classification, regression t collaborative filtering, It provides scalable data minin (it makes the decision based on the current and previous history © approaches for the data, Advantages of Mahout — (It supports complementary and distributed naiv classification. Cc a0 and Linkedin intemally uses Mahout for data mining. In the Yahoo! home page pet the real-time service system Temines the huge volume of data. silica the data from he database othe intrest maping tough the Apacs, ‘The companies such as Adobe, Twitter, Foursquare, Fae" | Every 5 minutes, the system will rearrange the contents ‘based on Hadoop yi Tvatien cluster and update the contents every 7 minutes. 
Yahoo! also uses Mahout for pattern mining. More generally, Hadoop is applied to search, log processing, recommendation systems, data warehousing, and video/image analysis.
(ii) Hadoop in Facebook - Facebook is a very large social network. Its user base grew rapidly from 2004 to 2009, and the data created every day is huge. This means that Facebook faces a Big Data processing problem which covers content maintenance, photo sharing, comments, and user access histories. These data are not easy to process, so Facebook has adopted Hadoop and HBase.

Q.11. Why Facebook has chosen Hadoop ?
Ans. As Facebook developed, it discovered that MySQL could not meet all of its requirements. After long-term research and experiment, Facebook chose Hadoop and HBase as its data processing system. The reason for this choice has two aspects.
On the one hand, HBase meets the requirements of Facebook. HBase supports rapid access to the data. Although HBase does not support the traditional cross-table operations, its column-oriented storage model brings high flexibility in searching. HBase is also a good choice for intensive data: it can maintain huge data, support complex indexes with flexible scalability, and guarantee the speed of data access.
On the other hand, Facebook has the confidence to solve the Hadoop problems in real use. For now, HBase is already able to provide high-consistency and high-throughput key-value storage. The Namenode, as the only manager node in HDFS, may become the bottleneck of the system, so Facebook designed a high-availability Namenode called AvatarNode to solve this problem. In the aspect of fault tolerance, HDFS can tolerate and isolate faults in the disk subsystem. The failures of whole clusters of HBase and HDFS are also part of the fault tolerance system.
Overall, with the improvements made by Facebook, Hadoop can meet most of Facebook's requirements and can provide a stable, efficient and safe service for Facebook users.

Q.12. What are the advantages of Hadoop ? Explain Hadoop architecture and its components with proper diagram. [R.G.P.V., May 2019 (VIII-Sem)]
Ans. Advantages of Hadoop -
(i) The scalability and elasticity of free open source Hadoop running on standard hardware allow organizations to hold onto more data and take advantage of all their data to increase operational efficiency and gain a competitive edge.
(ii) Hadoop supports complex analysis across large collections of data at one tenth the cost of traditional solutions.
Hadoop Architecture - Apache Hadoop is an open source framework that runs on clusters of commodity servers, each of which offers local CPUs and disk storage that can be leveraged by the system. Hadoop can store and process data across clusters of computers, and it scales from single servers to thousands of machines with a high degree of fault tolerance. Data in a Hadoop cluster is broken down into smaller pieces and distributed throughout the cluster, and this provides the scalability needed for large scale data processing.
The Hadoop framework includes four modules -
(i) Hadoop Common - It contains the libraries and utilities that are required by the other Hadoop modules. It provides file system and OS level abstraction, and it contains the necessary Java files that are required to start Hadoop.
(ii) HDFS, YARN and MapReduce - For the storage and processing components, refer to Q.2.

Q.13. Explain the Hive physical architecture.
Ans. Hive works with Hadoop. The main components of Hive are -
(i) External Interfaces - Hive provides both user interfaces, like the command line (CLI) and web UI, and application programming interfaces (API), like JDBC and ODBC.
(ii) Thrift Server - The Hive Thrift Server exposes a very simple client API to execute HiveQL statements. Thrift is a framework for cross-language services, in which a server written in one language (like Java) can also support clients in other languages. The Thrift Hive clients generated in different languages are used to build common drivers like JDBC (Java) and ODBC (C++).
(iii) Driver - The Driver manages the life cycle of a HiveQL statement.

Fig. 2.5 Hive Physical Architecture

Data in Hive can be stored and queried in tables. On dropping an internal table, both the data and the metadata are deleted; on dropping an external table, only the metadata is removed.
(i) Creating an Internal Table -
    CREATE TABLE STUDENTS (roll_number INT, name STRING, age INT, address STRING)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ',';
(ii) External Table -
    CREATE EXTERNAL TABLE STUDENTS (roll_number INT, name STRING, age INT, address STRING)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LOCATION '...';
ROW FORMAT should give the delimiters used to terminate the fields and lines. In the above examples the fields are terminated with a comma (',').
Load command:
    hive> LOAD DATA LOCAL INPATH '/home/hadoop/fi...' INTO TABLE students;
Select command:
    hive> SELECT * FROM students;

Q.14. Give the limitations of Hadoop.
Ans. Limitations of Hadoop are as follows -
(i) Security Concerns - Hadoop is missing encryption at the storage level. This is a major limitation from the point of view of government agencies and other organizations that prefer to keep their data under wraps.
(ii) Vulnerable by Nature - Speaking of security, the very makeup of Hadoop makes running it a risky proposition. The framework is written almost entirely in Java, and Java has been heavily exploited by cyber criminals.
(iii) Not Fit for Small Data - Hadoop is not recommended for organizations with small quantities of data.

Q.15. Differentiate Hadoop vs distributed data base. [R.G.P.V., May 2019 (VIII-Sem)]
Ans. Differences between Hadoop and a distributed database (RDBMS) are as follows -

Parameter            RDBMS                                  Hadoop
Type of data         Structured data with known schemas     Unstructured and structured
Data groups          Records, long fields, objects, XML     Files
Data modification    Updates allowed                        -

Further entries for RDBMS: SQL & XQuery; 30+ years of innovation; a large DBA and application development community; widely used.
Further entries for Hadoop: simple file compression; batch processing; streaming access to files.

HADOOP DISTRIBUTED FILE SYSTEM, PROCESSING DATA WITH HADOOP, MANAGING RESOURCES AND APPLICATION WITH HADOOP YARN, MAPREDUCE PROGRAMMING

Ans. HDFS, also known as the Hadoop Distributed File System, is one of the Hadoop components and handles the storage of data. If users want to add more storage to the system, they can easily increase the capacity by adding nodes. HDFS is a distributed system: the data is stored on different slave nodes, as shown in fig. 2.6, while the Namenode (master node) manages the file system.
In HDFS, files are broken into blocks. Other HDFS concepts include replica placement, heartbeat and block report, and high-throughput access.
HDFS blocks are large compared to disk blocks, in order to minimize the cost of seeks. By making a block large enough, the time to transfer the data from the disk can be made significantly larger than the time to seek to the start of the block.

Q.19. Explain the architecture of Hadoop (HDFS).
Ans. HDFS has a master/slave structure, consisting of the Namenode, the Secondary Namenode and the Datanodes.
Namenode - The Namenode is the manager of HDFS and is responsible for the namespace of the file system. It puts the metadata of all the files and directories into a file system tree. At the same time, the Namenode also keeps the relations between each file and the location of its data blocks. The block locations are not stored on the hard drive of the Namenode but are collected from the Datanodes when the system starts, and they are used to find the server that holds the required documents.
Secondary Namenode - The Secondary Namenode is a backup node for the Namenode. If there is only one Namenode in the Hadoop cluster environment, the Namenode obviously becomes the weakest point of HDFS: once the Namenode fails, the whole operation of the system is affected. This is the reason why Hadoop designed the Secondary Namenode as an alternative backup. The Secondary Namenode usually runs on a separate machine and communicates with the Namenode at certain time intervals to keep a snapshot of the file system metadata, so that the metadata can be recovered immediately after an error.
Datanode - The Datanode is the place where the real data is saved. Files in HDFS are divided into data blocks, which are stored in the form of redundancy backup in the Datanodes. Each Datanode reports its data storage lists to the Namenode regularly.

Q.20. Write short note on the followings -
(i) Authority management of HDFS.
(ii) Limitations of HDFS.
Ans. (i) Authority Management of HDFS - HDFS shares a similar authority system to POSIX. Each file or directory has an owner and a group. The permissions for the files or the directories are different for the owner, for the users in the same group, and for other users. On the one hand, for files, users require the -r authority to read and the -w authority to write. On the other hand, for directories, users need the -r authority to list the directory content and the -w authority to create or delete. Unlike the POSIX system, there is no sticky, setuid or setgid bit for directories, because there is no concept of executable files in HDFS.
(ii) Limitations of HDFS - HDFS is the open source implementation of GFS (Google File System). It is an excellent distributed file system and has many advantages. HDFS was designed to run on cheap commodity hardware, not on expensive machines. This means that the probability of node failure is slightly high. Giving full consideration to the design of HDFS, we find that HDFS has not only advantages but also limitations in dealing with some specific problems. The limitations of HDFS are as follows -
(a) Request Latency - Because HDFS has only one single Master system, all the file requests need to be processed by the Master. When there is a huge number of requests, there is an inevitable delay. Currently, there are some projects that address this limitation, such as HBase, which uses an upper data management project to manage the data.
(b) Poor Small Files Performance - HDFS needs to use the Namenode to manage the metadata of the file system and to respond to the client, so the number of files it can hold is determined by the Namenode. It is possible to manage millions of files. However, when the number of files extends to billions, the work pressure on the Namenode is heavier and the time taken to retrieve data is unacceptable.

Q.21. Describe in detail about dataflow of file read in HDFS.
Ans. To get an idea of how data flows between the client interacting with HDFS, the Namenode and the Datanode, consider fig. 2.8, which shows the main sequence of events when reading a file.
The client opens the file it wishes to read by calling open() on the FileSystem object, which for HDFS is an instance of DistributedFileSystem (step 1). DistributedFileSystem calls the Namenode to determine the locations of the first few blocks in the file (step 2). For each block, the Namenode returns the addresses of the Datanodes that have a copy of that block. Furthermore, the Datanodes are sorted according to their proximity to the client. If the client is itself a Datanode (in the case of a MapReduce task, for instance), it reads from the local Datanode if that Datanode hosts a copy of the block.
DistributedFileSystem returns an FSDataInputStream to the client for it to read data from. FSDataInputStream in turn wraps a DFSInputStream, which manages the Datanode and Namenode I/O. The client then calls read() on the stream (step 3). DFSInputStream, which has stored the Datanode addresses for the first few blocks in the file, connects to the first (closest) Datanode for the first block. Data is streamed from the Datanode back to the client, which calls read() repeatedly on the stream (step 4). When the end of the block is reached, DFSInputStream closes the connection to the Datanode and then finds the best Datanode for the next block (step 5). This happens transparently to the client, which from its point of view is just reading a continuous stream.
Blocks are read in order, with the DFSInputStream opening new connections to Datanodes as the client reads through the stream. It also calls the Namenode to retrieve the Datanode locations for the next batch of blocks as needed. When the client has finished reading, it calls close() on the FSDataInputStream (step 6).
One important aspect of this design is that the client contacts Datanodes directly to retrieve data, and is guided by the Namenode to the best Datanode for each block. This design allows HDFS to scale to a large number of concurrent clients, since the data traffic is spread across all the Datanodes in the cluster. The Namenode meanwhile merely has to service block location requests (which it stores in memory, making them very efficient), and does not, for example, serve data, which would quickly become a bottleneck as the number of clients grew.

Q.22. Describe in detail about dataflow of file write in HDFS.
Ans. The case considered here is the case of creating a new file, writing data to it, and then closing the file.
The client creates the file by calling create() on DistributedFileSystem (step 1). DistributedFileSystem asks the Namenode to create a new file in the file system's namespace, with no blocks associated with it (step 2). DistributedFileSystem returns an FSDataOutputStream for the client to start writing data to. Just as in the read case, FSDataOutputStream wraps a DFSOutputStream, which handles communication with the Datanodes and the Namenode.
As the client writes data (step 3), DFSOutputStream splits it into packets, which it writes to an internal queue called the data queue. The data queue is consumed by the DataStreamer, whose responsibility it is to ask the Namenode to allocate new blocks by picking a list of suitable Datanodes to store the replicas. The list of Datanodes forms a pipeline; we will assume the replication level is three, so there are three nodes in the pipeline. The DataStreamer streams the packets to the first Datanode in the pipeline, which stores each packet and forwards it to the second Datanode in the pipeline. Similarly, the second Datanode stores the packet and forwards it to the third (and last) Datanode in the pipeline (step 4). DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by Datanodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the Datanodes in the pipeline (step 5).
If a Datanode fails while data is being written to it, then the following actions are taken, which are transparent to the client writing the data. First the pipeline is closed, and any packets in the ack queue are added to the front of the data queue, so that Datanodes that are downstream from the failed node will not miss any packets. The current block on the good Datanodes is given a new identity, which is communicated to the Namenode, so that the partial block on the failed Datanode will be deleted if the failed Datanode recovers later on. The failed Datanode is removed from the pipeline, and the remainder of the block's data is written to the two good Datanodes in the pipeline. The Namenode notices that the block is under-replicated, and it arranges for a further replica to be created on another node. Subsequent blocks are then treated as normal.
When the client has finished writing data, it calls close() on the stream (step 6). This action flushes all the remaining packets to the Datanode pipeline and waits for acknowledgements before contacting the Namenode to signal that the file is complete (step 7). The Namenode already knows which blocks the file is made up of (via the DataStreamer asking for block allocations), so it only has to wait for blocks to be minimally replicated before returning successfully.

InputFormat and RecordReader - InputFormat is also responsible for creating the input splits and dividing them into records. The data is divided into splits (typically 64/128 MB) in HDFS. An input split is a chunk of the input that is processed by a single map. The InputFormat class calls the getSplits() function and computes the splits for each file, and then sends them to the jobtracker, which uses their storage locations to schedule map tasks to process them on the tasktrackers. On a tasktracker, the map task passes the split to the createRecordReader() method on InputFormat to obtain a RecordReader for that split. The RecordReader loads data from its source and converts it into key-value pairs suitable for reading by the mapper. The default InputFormat is TextInputFormat, which treats each line of input as a new value; the associated key is the byte offset.
A RecordReader is little more than an iterator over records, and the map task uses one to generate record key-value pairs, which it passes to the map function. We can see this by looking at the Mapper's run() method -
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        cleanup(context);
    }

Q.23. What is the Google file system ? Explain architecture of GFS.
Ans. The Google File System (GFS) is a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. GFS provides a familiar file system interface, though it does not implement a standard API such as POSIX. Files are organized hierarchically in directories and identified by path-names. GFS supports the usual operations such as create, delete, open, close, read and write.
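The HDFS file-write dataflow discussed in this unit (data queue, ack queue and Datanode pipeline) can be illustrated with a small simulation. The following is only a minimal Python sketch under stated assumptions, not Hadoop code: the function simulate_write, its parameters (window, fail_node, fail_on_send) and the Datanode names are hypothetical and chosen for this example. It models a Datanode failure by removing the node from the pipeline and moving the unacknowledged packets back to the front of the data queue. The new block identity and the re-replication arranged by the Namenode are not modelled.

```python
from collections import deque


def simulate_write(packets, pipeline, window=2, fail_node=None, fail_on_send=None):
    """Toy model of the write pipeline (hypothetical helper, not a Hadoop API).

    packets      -- packet ids, in the order the client wrote them
    pipeline     -- Datanode names, in pipeline order
    window       -- assumed maximum number of unacknowledged packets
    fail_node    -- Datanode that fails (None for no failure)
    fail_on_send -- 1-based count of the send during which the failure occurs
    """
    data_queue = deque(packets)               # packets waiting to be sent
    ack_queue = deque()                       # packets sent but not yet acknowledged
    live = list(pipeline)                     # Datanodes still in the pipeline
    stored = {node: [] for node in pipeline}  # packets held by each Datanode
    sends = 0

    while data_queue or ack_queue:
        if data_queue and len(ack_queue) < window:
            packet = data_queue.popleft()
            ack_queue.append(packet)
            sends += 1
            if fail_node in live and sends == fail_on_send:
                # Failure: drop the node from the pipeline and put every
                # unacknowledged packet back at the front of the data queue,
                # keeping the original order.
                live.remove(fail_node)
                data_queue.extendleft(reversed(ack_queue))
                ack_queue.clear()
                continue
            for node in live:
                # A good Datanode may already hold a packet that is resent.
                if packet not in stored[node]:
                    stored[node].append(packet)
        else:
            # The oldest in-flight packet is acknowledged by the whole pipeline.
            ack_queue.popleft()
    return stored, live


stored, live = simulate_write(list(range(5)), ["dn1", "dn2", "dn3"],
                              fail_node="dn2", fail_on_send=3)
print(live)
print(stored)
```

In this run the failed Datanode keeps only the packets it received before the failure, while the two remaining Datanodes end up with every packet in order, which is the property the ack queue is meant to protect.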
