Introduction to Data Analytics

Que 1.1. What is data analytics ?

Answer
1. Data analytics is the science of analyzing raw data in order to make conclusions about that information.
2. Any type of information can be subjected to data analytics techniques to get insight that can be used to improve things.
3. Data analytics techniques can help in finding the trends and metrics that would be used to optimize processes to increase the overall efficiency of a business or system.
4. Many of the techniques and processes of data analytics have been automated into mechanical processes and algorithms that work over raw data for human consumption.
5. For example, manufacturing companies often record the runtime, downtime, and work queue for various machines and then analyze the data to better plan the workloads so the machines operate closer to peak capacity.

Que 1.2. Explain the sources of data (or Big Data).

Answer
Three primary sources of Big Data are :
1. Social data :
   a. Social data comes from the likes, tweets and retweets, comments, video uploads, and general media that are uploaded and shared via social media platforms.
   b. This kind of data provides invaluable insights into consumer behaviour and sentiment and can be enormously influential in marketing analytics.
   c. The public web is another good source of social data, and tools such as Google Trends can be used to good effect to increase the volume of big data.
2. Machine data :
   a. Machine data is defined as information which is generated by industrial equipment, sensors that are installed in machinery, and web logs which track user behaviour.
   b. This type of data is expected to grow exponentially as the Internet of Things grows ever more pervasive and expands around the world.
   c. Sensors such as medical devices, smart meters, road cameras, satellites, games and the rapidly growing Internet of Things will deliver high velocity, value, volume and variety of data in the very near future.
3. Transactional data :
   a. Transactional data is generated from all the daily transactions that take place both online and offline.
   b. Invoices, payment orders, storage records, and delivery receipts are all characterized as transactional data.

Que 1.3. Write short notes on classification of data.

Answer
1. Unstructured data :
   a. Unstructured data is the rawest form of data.
   b. It is data that has no inherent structure, which may include text documents, PDFs, images, and video.
   c. This data is often stored in a repository of files.
2. Structured data :
   a. Structured data is tabular data (rows and columns) which is very well defined.
   b. It is data containing a defined data type, format, and structure, which may include transaction data, traditional RDBMS tables, CSV files, and simple spreadsheets.
3. Semi-structured data :
   a. Semi-structured data consists of textual data files with a distinct pattern that enables parsing, such as Extensible Markup Language (XML) data files or JSON.
   b. The format is defined, however the structure is not strict.
   c. Semi-structured data are often stored as files.

Que 1.4. Differentiate between structured, semi-structured and unstructured data.

Answer

| Basis                  | Structured data                                          | Semi-structured data                                                        | Unstructured data                                    |
| Technology             | It is based on relational database tables.               | It is based on XML/RDF.                                                     | It is based on character and binary data.            |
| Transaction management | Matured transactions and various concurrency techniques. | Transaction management is adapted from DBMS.                                | No transaction management and no concurrency.        |
| Flexibility            | It is schema dependent and less flexible.                | It is more flexible than structured data but less than unstructured data.   | It is very flexible and there is absence of schema.  |
| Scalability            | It is very difficult to scale the database schema.       | It is more scalable than structured data.                                   | It is very scalable.                                 |
| Query performance      | Structured queries allow complex joining.                | Queries over anonymous nodes are possible.                                  | Only textual queries are possible.                   |

Que 1.5. Explain the characteristics of Big Data.
Answer
Big Data is characterized into four dimensions :
1. Volume :
   a. Volume is concerned with the scale of data, i.e., the volume at which the data is growing.
   b. The volume of data is growing rapidly, due to several applications of business, social, web and scientific explorations.
2. Velocity :
   a. Velocity is the speed at which data is increasing, thus demanding analysis of streaming data.
   b. The velocity is due to the growing speed of business intelligence applications such as trading, transactions in the telecom and banking domains, and the growing number of internet connections with the increased usage of the internet.
3. Variety : It depicts the different forms of data available for analysis, such as structured, semi-structured and unstructured data.
4. Veracity :
   a. Veracity is concerned with the uncertainty or inaccuracy of the data.
   b. In many cases the data will be inaccurate, hence filtering the data which is actually needed is a complicated task.
   c. A lot of statistical and analytical processing has to go into data cleansing to retain the intrinsic data for decision making.

Que 1.6. What is big data platform ?

Answer
1. Big data platform is a type of IT solution that combines the features and capabilities of several big data applications and utilities within a single solution.
2. It is an enterprise-class IT platform that enables an organization in developing, deploying, operating and managing a big data infrastructure/environment.
3. A big data platform generally consists of big data storage, servers, databases, big data management, business intelligence and other big data management utilities.
4. It also supports custom development, querying and integration with other systems.
5. The primary benefit behind a big data platform is to reduce the complexity of multiple vendors/solutions into one cohesive solution.
6. Big data platforms are also delivered through the cloud, where the provider provides all-inclusive big data solutions and services.

Que 1.7. What are the features of big data platform ?
Answer
Features of Big Data analytics platform :
1. A Big Data platform should be able to accommodate new platforms and tools based on the business requirement.
2. It should support linear scale-out.
3. It should have capability for rapid deployment.
4. It should support a variety of data formats.
5. The platform should provide data analysis and reporting tools.
6. It should provide real-time data analysis software.
7. It should have tools for searching the data through large data sets.

Que 1.8. Why there is need of data analytics ?

Answer
Need of data analytics :
1. It optimizes the business performance.
2. It helps to make better decisions.
3. It helps to analyze customer trends and solutions.

Que 1.9. What are the steps involved in data analysis ?

Answer
Steps involved in data analysis are :
1. Determine the data :
   a. The first step is to determine the data requirements or how the data is grouped.
   b. Data may be separated by age, demographic, income, or gender.
   c. Data values may be numerical or be divided by category.
2. Collection of data :
   a. The second step in data analytics is the process of collecting the data.
   b. This can be done through a variety of sources such as computers, online sources, cameras, environmental sources, or through personnel.
3. Organization of data :
   a. The third step is to organize the data.
   b. Once the data is collected, it must be organized so it can be analyzed. Organization may take place on a spreadsheet or other form of software that can take statistical data.
4. Cleaning of data :
   a. In the fourth step, the data is cleaned up before analysis.
   b. This means it is scrubbed and checked to ensure there is no duplication or error, and that it is not incomplete.
   c. This step helps correct any errors before the data goes on to be analyzed.

Que 1.10. Write short note on evolution of analytic scalability.

Answer
1. In analytic scalability, we have to pull the data together in a separate analytics environment and then start performing analysis. The heavy processing occurs in the analytic environment (an analytic server or PC).
2. Analysts do the merge operation on the data sets, which contain rows and columns.
3. The columns represent information about the customers such as spending level, or status.
4. In a merge or join, two or more data sets are combined together; data sets are typically merged/joined so that specific rows of one data table are combined with specific rows of another.
5. Analysts also do data preparation. Data preparation is made up of joins, aggregations, derivations, and transformations. In this process, they pull data from various sources and merge it all together to create the variables required for an analysis.
6. A Massively Parallel Processing (MPP) system is the most mature, proven, and widely deployed mechanism for storing and analyzing large amounts of data.
7. An MPP database breaks the data into independent pieces managed by independent storage and central processing unit (CPU) resources.

   (Figure : A one-terabyte table is broken into ten 100 GB chunks, so ten simultaneous 100 GB queries run in parallel; a traditional database would query the one-terabyte table one row at a time.)

8. MPP systems build in redundancy to make recovery easy.
9. MPP systems have resource management tools :
   a. To manage the CPU and disk space.
   b. A query optimizer.

Que 1.11. Write short note on analytic sandbox.

Answer
1. With an increased level of scalability, analytic processes need to be updated to take advantage of it.
2. This can be achieved with the use of analytic sandboxes, which provide analytic professionals with a scalable environment to build advanced analytics processes.
3. One of the uses of an MPP database system is to facilitate the building and deployment of advanced analytic processes.
4. An analytic sandbox is the mechanism to utilize an enterprise data warehouse.
5. If used appropriately, an analytic sandbox can be one of the primary drivers of value in the world of big data.

Analytical sandbox :
1. An analytic sandbox provides a set of resources with which in-depth analysis can be done to answer critical business questions.
2. An analytic sandbox is ideal for development, proofs of concept, and prototyping.
3. Once things progress into ongoing, user-managed processes or production processes, the sandbox should not be involved.
4. A sandbox is going to be leveraged by a fairly small set of users.
5. There will be data created within the sandbox that is segregated from the production database.
6. Sandbox users will also be allowed to load data of their own for brief time periods as part of a project, even if that data is not part of the official enterprise data model.

Que 1.12. Explain modern data analytic tools.

Answer
Modern data analytic tools :
1. Apache Hadoop :
   a. Apache Hadoop is a big data analytics tool which is a Java based free software framework.
   b. It helps in effective storage of huge amounts of data in a storage place known as a cluster.
   c. It runs in parallel on a cluster and also has the ability to process huge data across all nodes in it.
   d. There is a storage system in Hadoop popularly known as the Hadoop Distributed File System (HDFS), which helps to split the large volume of data and distribute it across the many nodes present in a cluster.
2. KNIME :
   a. KNIME analytics platform is one of the leading open solutions for data-driven innovation.
   b. This tool helps in discovering the potential hidden in a huge volume of data; it can also mine for fresh insights, or predict new futures.
3. OpenRefine :
   a. OpenRefine is one of the efficient tools to work on messy and large volumes of data.
   b. It includes cleansing data and transforming that data from one format to another.
   c. It helps to explore large data sets easily.
4. Orange :
   a. Orange is a famous open-source data visualization tool and helps in data analysis for beginners as well as experts.
   b. This tool provides interactive workflows with a large toolbox option to create workflows, which helps in the analysis and visualization of data.
5. RapidMiner :
   a. The RapidMiner tool operates using visual programming and is capable of manipulating, analyzing and modeling data.
   b. RapidMiner makes data science teams easier and more productive by using an open-source platform for all their jobs like machine learning, data preparation, and model deployment.
6. R-programming :
   a. R is a free open source software programming language and a software environment for statistical computing and graphics.
   b. It is used by data miners for developing statistical software and data analysis.
   c. It has become a highly popular tool for big data in recent years.
7. Datawrapper :
   a. It is an online data visualization tool for making interactive charts.
   b. It uses data files in a csv, pdf or excel format.
   c. Datawrapper generates visualizations in the form of bar, line, map etc. They can be embedded into any other website as well.
8. Tableau :
   a. Tableau is another popular big data tool. It is simple and very intuitive to use.
   b. It communicates the insights of the data through data visualization. Through Tableau, an analyst can check a hypothesis and explore the data before starting to work on it extensively.

Que 1.13. What are the benefits of analytic sandbox from the view of an analytic professional ?
Answer
Benefits of analytic sandbox from the view of an analytic professional :
1. Independence : Analytic professionals will be able to work independently on the database system without needing to continually go back and ask for permissions for specific projects.
2. Flexibility : Analytic professionals will have the flexibility to use whatever business intelligence, statistical analysis, or visualization tools they need to use.
3. Efficiency : Analytic professionals will be able to leverage the existing enterprise data warehouse or data mart, without having to move or migrate data.
4. Freedom : Analytic professionals can reduce focus on the administration of systems and production processes by shifting those maintenance tasks to IT.
5. Speed : Massive speed improvement will be realized with the move to parallel processing. This also enables rapid iteration and the ability to "fail fast" and take more risks to innovate.

Que 1.14. What are the benefits of analytic sandbox from the view of IT ?

Answer
Benefits of analytic sandbox from the view of IT :
1. Centralization : IT will be able to centrally manage a sandbox environment just as every other database environment on the system is managed.
2. Streamlining : A sandbox will greatly simplify the promotion of analytic processes into production, since there will be a consistent platform for both development and deployment.
3. Simplicity : There will be no more processes built during development that have to be totally rewritten to run in the production environment.
4. Control : IT will be able to control the sandbox environment, balancing sandbox needs and the needs of other users. The production environment is safe from an experiment gone wrong in the sandbox.
5. Costs : Big cost savings can be realized by consolidating many analytic data marts into one central system.

Que 1.15. Explain the application of data analytics.

Answer
Application of data analytics :
1. Security : Data analytics applications or, more specifically, predictive analysis has also helped in dropping crime rates in certain areas.
2. Transportation :
   a. Data analytics can be used to revolutionize transportation.
   b. It can be used especially in areas where we need to transport a large number of people to a specific area and require seamless transportation.
3. Risk detection :
   a. Many organizations were struggling under debt, and they wanted a solution to the problem of fraud.
   b. They already had enough customer data in their hands, so they applied data analytics.
   c. They used a 'divide and conquer' policy with the data, analyzing recent expenditure, profiles, and any other important information to understand any probability of a customer defaulting.
4. Delivery :
5. Fast internet allocation :
   a. While it might seem that allocating fast internet in every area makes a city 'smart', in reality it is more important to engage in smart allocation.
   b. It is also important to shift the data allocation based on timing and priority. It is assumed that financial and commercial areas require the most bandwidth during weekdays, while residential areas require it during the weekends. But the situation is much more complex. Data analytics can solve it.
   c. For example, using applications of data analysis, a community can draw the attention of high-tech industries, and in such cases higher bandwidth will be required in such areas.
6. Internet searching :
   a. When we use Google, we are using one of the many data analytics applications employed by the company.
   b. Most search engines like Google, Bing, Yahoo, AOL etc. use data analytics. These search engines use different algorithms to deliver the best result for a search query.
7. Digital advertisement :
   a. Data analytics has revolutionized digital advertising.
   b. Digital billboards in cities as well as banners on websites, that is, most of the advertisement sources nowadays, use data analytics and data algorithms.

Que 1.16. What are the different types of Big Data analytics ?

Answer
Different types of Big Data analytics :
1. Descriptive analytics :
   a. It uses data aggregation and data mining to provide insight into the past.
   b. It summarizes raw data into a form interpretable by humans.
2. Predictive analytics :
   a. It uses statistical models and forecast techniques to understand the future.
   b. Predictive analytics provides companies with actionable insights based on data. It provides estimates about the likelihood of a future outcome.
3. Prescriptive analytics :
   a. It uses optimization and simulation algorithms to advise on possible outcomes.
   b. It allows users to "prescribe" a number of different possible actions and guides them towards a solution.
4. Diagnostic analytics :
   a. It is used to determine why something happened in the past.
   b. It is characterized by techniques such as drill-down, data discovery, data mining and correlations.
   c. Diagnostic analytics takes a deeper look at data to understand the root causes of events.

Que 1.17. Explain the key roles for a successful analytics project.

Answer
Key roles for a successful analytics project :
1. Business user :
   a. The business user is someone who understands the domain area and usually benefits from the results.
   b. This person can consult and advise the project team on the context of the project, the value of the results, and how the outputs will be operationalized.
   c. Usually a business analyst, line manager, or deep subject matter expert in the project domain fulfills this role.
2. Project sponsor :
   a. The project sponsor is responsible for the start of the project, provides all the requirements for the project and defines the core business problem.
   b. Generally provides the funding and gauges the degree of value from the final outputs of the working team.
   c. This person sets the priorities for the project and clarifies the desired outputs.
3. Project manager : The project manager ensures that key milestones and objectives are met on time and at the expected quality.
4. Business Intelligence Analyst :
   a. The analyst provides business domain expertise based on a deep understanding of the data, Key Performance Indicators (KPIs), key metrics, and business intelligence from a reporting perspective.
   b. Business Intelligence Analysts generally create dashboards and reports and have knowledge of the data feeds and sources.
5. Database Administrator (DBA) :
   a. The DBA provisions and configures the database environment to support the analytics needs of the working team.
   b. These responsibilities may include providing access to key databases or tables and ensuring the appropriate security levels are in place related to the data repositories.
6. Data engineer : Data engineers have deep technical skills to assist with tuning SQL queries for data management and data extraction, and provide support for data ingestion into the analytic sandbox.
7. Data scientist :
   a. The data scientist provides subject matter expertise for analytical techniques, data modeling, and applying valid analytical techniques to given business problems.
   b. They ensure overall analytics objectives are met.
   c. They design and execute analytical methods and approaches with the data available to the project.

Que 1.18. Explain various phases of data analytics life cycle.

Answer
Various phases of the data analytics lifecycle are :

Phase 1 : Discovery :
1. In Phase 1, the team learns the business domain, including relevant history, such as whether the organization or business unit has attempted similar projects in the past from which they can learn.
2. The team assesses the resources available to support the project in terms of people, technology, time, and data.
3. Important activities in this phase include framing the business problem as an analytics challenge and formulating initial hypotheses (IHs) to test and begin learning the data.

Phase 2 : Data preparation :
1. Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project.
2. The team needs to execute extract, load, and transform (ELT) or extract, transform and load (ETL) to get data into the sandbox.
3. Data should be transformed in the ETL process so the team can work with it and analyze it.
4. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.

Phase 3 : Model planning :
1. Phase 3 is model planning, where the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase.
2. The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.

Phase 4 : Model building :
1. In Phase 4, the team develops data sets for testing, training, and production purposes.
2. In addition, in this phase the team builds and executes models based on the work done in the model planning phase.
3. The team also considers whether its existing tools will be adequate for running the models, or if it will need a more robust environment for executing models and workflows.

Phase 5 : Communicate results :
1. In Phase 5, the team, in collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1.
2. The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.

Phase 6 : Operationalize :
1. In Phase 6, the team delivers final reports, briefings, code, and technical documents.
2. In addition, the team may run a pilot project to implement the models in a production environment.

Que 1.19. What are the activities that should be performed while identifying potential data sources during discovery phase ?

Answer
Main activities that are performed while identifying potential data sources during the discovery phase are :
1. Identify data sources :
   a. Make a list of candidate data sources the team may need to test the initial hypotheses outlined in the discovery phase.
   b. Make an inventory of the datasets currently available and those that can be purchased or otherwise acquired for the tests the team wants to perform.
2. Capture aggregate data sources :
   a. This is for previewing the data and providing high-level understanding.
   b. It enables the team to gain a quick overview of the data and perform further exploration on specific areas.
   c. It also points the team to possible areas of interest within the data.
3. Review the raw data :
   a. Obtain preliminary data from initial data feeds.
   b. Begin understanding the interdependencies among the data attributes, and become familiar with the content of the data, its quality, and its limitations.
4. Evaluate the data structures and tools needed :
   a. The data type and structure dictate which tools the team can use to analyze the data.
   b. This evaluation gets the team thinking about which technologies may be good candidates for the project and how to start getting access to these tools.
5. Scope the sort of data infrastructure needed for this type of problem : In addition to the tools needed, the data influences the kind of infrastructure required, such as disk storage and network capacity.
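The "review the raw data" activity above can be illustrated with a quick profiling pass over a preliminary feed. The sketch below is not from the text; the feed contents and column names are hypothetical, and only the Python standard library is used:

```python
import csv
import io

# Hypothetical preliminary extract from a candidate data source
raw_feed = io.StringIO(
    "customer_id,age,spend\n"
    "1,34,120.5\n"
    "2,,80.0\n"
    "3,41,\n"
)

rows = list(csv.DictReader(raw_feed))
columns = rows[0].keys()

# Profile each attribute: count missing values per column, a first
# look at the data's quality and limitations
missing = {c: sum(1 for r in rows if not r[c]) for c in columns}
```

A profile like `missing` gives the team an early signal of which attributes are incomplete before deciding on tools and infrastructure.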
Que 1.20. Explain the sub-phases of data preparation.

Answer
Sub-phases of data preparation are :
1. Preparing an analytics sandbox :
   a. The first sub-phase of data preparation requires the team to obtain an analytic sandbox, in which the team can explore the data without interfering with live production databases.
   b. When developing the analytic sandbox, it is a best practice to collect all kinds of data there, as team members need access to high volumes and varieties of data for a Big Data analytics project.
   c. This can include everything from summary-level aggregated data and structured data to raw data feeds and unstructured text data from call logs or web logs.
2. Performing ETLT :
   a. In ETL, users perform extract, transform, load processes to extract data from a data store, perform data transformations, and load the data back into the data store.
   b. In ELT, the data is extracted in its raw form and loaded into the data store, where analysts can choose to transform the data into a new state or leave it in its original, raw condition.
3. Learning about the data :
   a. A critical aspect of a data science project is to become familiar with the data itself.
   b. Spending time to learn the nuances of the datasets provides context to understand what constitutes a reasonable value and expected output.
   c. In addition, it is important to catalogue the data sources that the team has access to and identify additional data sources that the team can leverage.
4. Data conditioning :
   a. Data conditioning refers to the process of cleaning data, normalizing datasets, and performing transformations on the data.
   b. Data conditioning can involve many complex steps to join or merge datasets, or otherwise get datasets into a state that enables analysis in further phases.
   c. It is viewed as a preprocessing step for data analysis.

Que 1.21. What are the activities that are performed in model planning phase ?

Answer
Activities that are performed in the model planning phase are :
1.
Assess the structure of the datasets :
   a. The structure of the data sets is one factor that dictates the tools and analytical techniques for the next phase.
   b. Depending on whether the team plans to analyze textual data or transactional data, different tools and approaches are required.
2. Ensure that the analytical techniques enable the team to meet the business objectives and accept or reject the working hypotheses.
3. Determine if the situation allows a single model or a series of techniques as part of a larger analytic workflow.

Que 1.22. What are the common tools for the model planning phase ?

Answer
Common tools for the model planning phase :
1. R :
   a. It has a complete set of modeling capabilities and provides a good environment for building interpretive models with high-quality code.
   b. It has the ability to interface with databases via an ODBC connection and execute statistical tests and analyses against Big Data via an open source connection.
2. SQL Analysis Services : SQL Analysis Services can perform in-database analytics of common data mining functions, involved aggregations, and basic predictive models.
3. SAS/ACCESS :
   a. SAS/ACCESS provides integration between SAS and the analytics sandbox via multiple data connectors such as ODBC, JDBC, and OLE DB.
   b. SAS itself is generally used on file extracts, but with SAS/ACCESS, users can connect to relational databases (such as Oracle) and data warehouse appliances, files, and enterprise applications.

Que 1.23. Explain the common commercial tools for the model building phase.

Answer
Commercial tools for the model building phase :
1. SAS Enterprise Miner :
   a. SAS Enterprise Miner allows users to run predictive and descriptive models based on large volumes of data from across the enterprise.
   b. It interoperates with other large data stores, has many partnerships, and is built for enterprise-level computing and analytics.
2. SPSS Modeler (provided by IBM) : It offers methods to explore and analyze data through a GUI.
3.
Matlab : Matlab provides a high-level language for performing a variety of data analytics, algorithms, and data exploration.
4. Alpine Miner : Alpine Miner provides a GUI frontend for users to develop analytic workflows and interact with Big Data tools and platforms on the backend.
5. STATISTICA and Mathematica are also popular and well-regarded data mining and analytics tools.

Que 1.24. Explain common open-source tools for the model building phase.

Answer
Free or open source tools are :
1. R and PL/R :
   a. R provides a good environment for building interpretive models, and PL/R is a procedural language for PostgreSQL with R.
   b. Using this approach means that R commands can be executed in-database.
2. Octave :
   a. It is a free software programming language for computational modeling and has some of the functionality of Matlab.
   b. Octave is used in major universities when teaching machine learning.
3. WEKA : WEKA is a free data mining software package with an analytic workbench. The functions created in WEKA can be executed within Java code.
4. Python : Python is a programming language that provides toolkits for machine learning and analysis, such as numpy, scipy, pandas, and related data visualization using matplotlib.
5. MADlib : SQL in-database implementations, such as MADlib, provide an alternative to in-memory desktop analytical tools. MADlib provides an open-source machine learning library of algorithms that can be executed in-database, for PostgreSQL.

Data Analysis

Competitive Learning, Principal Component Analysis, Neural Networks

Que. Explain regression modelling.

Answer
1. Regression models are widely used in analytics, in general being among the easiest types of analytics techniques to understand and interpret.
2.
Regression techniques allow the identification and estimation of possible relationships between a pattern or variable of interest, and factors that influence that pattern.
3. For example, a company may be interested in understanding the effectiveness of its marketing strategies.
4. A regression model can be used to understand and quantify which of its marketing activities actually drive sales, and to what extent.
5. Regression models are built to understand historical data and relationships in order to assess effectiveness, as in the marketing effectiveness models.
6. Regression techniques are used across a range of industries, including financial services, retail, telecom, pharmaceuticals, and medicine.

Que. What are the various types of regression analysis techniques ?

Answer
Various types of regression analysis techniques :
1. Linear regression : Linear regression assumes that there is a linear relationship between the predictors (or the factors) and the target variable.
2. Non-linear regression : Non-linear regression allows modeling of non-linear relationships.
3. Logistic regression : Logistic regression is useful when our target variable is binomial (accept or reject).
4. Time series regression : Time series regression is used to forecast future behavior of variables based on historical time-ordered data.

Que. Explain the linear regression model.

Answer
1. We consider the modeling between the dependent and one independent
Tee parameters aro usually called ax ogretioncoafcients, The sae rerccirarenmonea, axons forte ture of data ton, ereraine sod reprevate the diffrence between Uh rue aad obearvd eatin 4 Thorocan be several retsons for such difreace, ich athe ft of ltd varlablesin the modo variables nay b gual, inherent Fandomness inthe cbvrvations a. ‘Wena that isohered as independent and dential ditibted Tandon arable withncan sero and constant varia o and asnune ‘that eis normally distribated. ‘Theindopendnt variables ar viewed as controlled hy the experiment, toltinconasarod as nowstochaste whereney x viwed as a and ‘variable with iy) = y+ BX and Var ) = 0%. 7. Sometimes X can also be a random variable. In such a ease, instead of ‘the sample mean and sainple variance ofy, we consider the conditions! ‘mean of y given X= x as ‘and the conditional varisnee ofy given X = a8 * Vari) When the values of fy B, and o* are known, the model is completely
