Chapter 6 - Data Science and K Nearest Neighbour Model (PART B)

Data scientist and nearest neighb

Uploaded by

uug449162
Copyright
© All Rights Reserved
CHAPTER 6
DATA SCIENCES AND K-NEAREST NEIGHBOUR MODEL

GET SET GO (Page No. - 239)
1. Surveys
2. Interviews
3. Document and Records
4. Polls
5. Questionnaire

CHECKBOT (Page No. - 243)
Regression
1. In regression, the algorithm generates a mapping function from the given data, represented by the solid line.
2. The dots shown in the graph are the data values, and the solid line represents the mapping done for them.
3. With the help of this mapping function, we can predict future data. To apply the regression modelling technique, we need continuous data.

Classification
1. In classification, the algorithm can determine which set a given data point belongs to by utilising a classification function, represented by the dotted line.
2. The model classifies datasets according to the rules given to it. Usually, the datasets used for classification are labelled, and the data is then sorted according to its labels.
3. Classification works on discrete datasets.

CHECKBOT (Page No. - 244)
To be done by the students.

CHECKBOT (Page No. - 247)
The sources from which relevant data can be collected are as follows:
1. Sensors
2. Surveys
3. Interviews
4. Observations
5. Open-sourced Government Portals
6. Reliable Websites (Kaggle)
7. World Organisations' open-sourced statistical websites

CHECKBOT (Page No. - 252)
The K-nearest neighbours (KNN) algorithm is used to solve both classification and regression problems. The algorithm is simple and easy to implement. It is a supervised learning algorithm, so it is trained with data and the corresponding labels. After training, the algorithm can label data that is not yet labelled. Classification problems are those in which the data can be classified into two or more categories. The KNN algorithm assumes that similar things exist nearby.

EXERCISEBOT
A. 1. d  2. a  3. c  4. c  5. b  6. a  7. d  8. c  9. a
B. 1. 2008  2. data science application  3. patterns  4. data scientist  5. median  6. Keras  7. Scatter plot  8. nearby
C. 1. F  2. T  3. T  4. F  5. F
D.
1.
A supervised learning algorithm analyses the training data and produces an inferred function, which can be used for mapping new examples. It is the simplest and most easily implemented type of learning algorithm. A fully trained algorithm will be able to observe new, never-before-seen examples and predict a good label for them. It demands more and more examples until it can accurately perform the task.

2. Mode: This function returns the most common value in a set of data.
>>> import statistics as st
>>> nums = [1, 2, 3, 5, 7, 9, 7, 2, 7, 6]
>>> print(st.mode(nums))
The output will be 7.

3. A plot is an effective way to display data in pictorial form. It makes it easier to draw comparisons and to analyse the growth, relationships and trends among the values in a table. Different types of plots used in Python are as follows:
a. Line plot
b. Bar plot
c. Histogram plot
d. Scatter plot
e. Box and Whisker plot

4. stdev() returns the standard deviation of the sample. This is equal to the square root of the sample variance.
>>> import statistics as st
>>> nums = [1, 2, 3, 5, 7, 9, 7, 2, 7, 6]
>>> print(st.stdev(nums))
The output will be 2.7264140062238043.

5. The K-nearest neighbours (KNN) algorithm is used to solve both classification and regression problems. The algorithm is simple and easy to implement. It is a supervised learning algorithm, so it is trained with data and the corresponding labels. After training, the algorithm can label data that is not yet labelled. Classification problems are those in which the data can be classified into two or more categories. The KNN algorithm assumes that similar things exist nearby.

E.
1. Statistical learning is a framework for Machine Learning drawn from the fields of statistics and functional analysis. It deals with the problem of finding a predictive function based on data. Learning can be of the following types:
a.
Supervised Learning: A supervised learning algorithm analyses the training data and produces an inferred function, which can be used for mapping new examples. It is the simplest and most easily implemented type of learning algorithm. A fully trained algorithm will be able to observe new, never-before-seen examples and predict a good label for them. It demands more and more examples until it can accurately perform the task.
b. Unsupervised Learning: Unsupervised learning is a type of Machine Learning that looks for previously undetected patterns in a data set with no pre-existing labels and minimal human supervision. It can learn to group, cluster and organise data in such a way that humans can make sense of the newly organised data. Such an algorithm can take terabytes of unlabelled data and make sense of it.
c. Reinforcement Learning: When a signal is provided to the algorithm that associates good behaviour with a positive signal and bad behaviour with a negative one, the algorithm can be reinforced to prefer good behaviour over bad behaviour. Over time, the algorithm learns and makes fewer mistakes. This approach is strongly influenced by neuroscience and psychology.

2. Companies require data scientists to make data-driven decisions. The models they build can be used to give a better customer experience. Some of the applications are as follows: (any five)
a. For Better Marketing: Companies use data and feedback for marketing, but they do not directly know what customers will say about the product. Data scientists collect that data and, after analysis, can suggest a better marketing strategy. Marketing is an important step in business. A data scientist can also tell which advertisements are having an impact and which are not, which saves money and effort. By studying customer feedback, companies can create the best advertisements.
b. For Customer Acquisition: Data scientists can analyse feedback and other data. They can tell us the needs of the customer.
The company can use this information to tailor the product, so that it best suits the customer's requirements. It also helps to find potential customers: the data scientist can help in recognising potential customers and their needs.
c. For Innovation and Manufacturing: Customer feedback can be used to innovate and manufacture the product. The data scientist helps to innovate the product. Feedback can be used to craft a new product or make changes to an existing one. The information given by the data scientist can lead you in the right direction in decision-making and product innovation.
d. For Banking: A data scientist can give information about fraud. Data science is applied in the areas of customer service, forecasting, understanding consumer sentiment, customer profiling and targeted marketing, by analysing customer feedback and queries. Banks also use data science to approve loans.
e. In E-commerce: Data analysis can be used to find potential customers and to recommend products by analysing reviews and feedback. For that purpose, a skilled data scientist is required. The total sales of an e-commerce business depend on this analysis.
f. In Healthcare: Medical images and reports can be analysed and compared with those of other patients showing the same symptoms. The analysis of reports and case studies of various patients can be used to suggest the drugs used in treatment. Even during the Covid-19 pandemic, many people used virtual assistants for their treatment.
g. In Transportation: Self-driving cars can be improved by analysing collected data and related accidents. The driving experience can be improved by calculating the traffic on the road. Google Maps uses such data analysis to tell us the estimated travel time.
h. In Finance: Customers can be segmented based on their purchasing and saving habits, so that the data can be used to suggest loans and investment schemes. Data analysis can be used to predict the market, so that shares can be bought and sold accordingly. The risk in an investment can be analysed, and investments can be made with these risks in mind.
i. In Education: Educational institutions use various techniques to analyse and evaluate data. This data helps them to understand student requirements, course content demand, teaching methodologies, etc. Data science also reduces the chance of evaluator bias. Data science makes it possible for institutions to devise innovative curriculums. The performance of students is measured by teachers, and data science helps in measuring the performance of teachers.

3. The steps involved in the AI project cycle are as follows:
a. Acquire data that will become the base of the project, as it will help in understanding the parameters related to problem scoping.
b. Data acquisition is done by collecting data from various reliable and authentic sources. Since the data collected will be in large quantities, it is very important to visualise it using different types of representations such as graphs, databases, flowcharts, maps, etc. This makes it easier to interpret the patterns that the acquired data follows.
c. After exploring the patterns, it is easy to decide which model should be built to achieve the goal. For this, online research can be done, and various models that give a suitable output can be selected.
d. Test the selected models and figure out which is the most efficient one.
e. The most efficient model is now the base of your AI project. Develop an algorithm around it.
f. Once the modelling is complete, test your model on some newly obtained data. The results will help in evaluating the model and improving it.
g. Finally, after evaluation, the project cycle is complete and the AI project is ready.
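
Steps d to f of the project cycle (test the selected models, keep the most efficient one, then check it on newly obtained data) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three threshold "models", the study-hours data and the names `accuracy`, `candidates`, `test_data` and `new_data` are all hypothetical and do not come from the chapter.

```python
def accuracy(model, examples):
    """Fraction of (input, label) examples that the model labels correctly."""
    correct = sum(1 for x, y in examples if model(x) == y)
    return correct / len(examples)

# Hypothetical candidate models: each predicts "pass" once the number of
# study hours reaches its own threshold.
candidates = {
    "threshold_2": lambda hours: "pass" if hours >= 2 else "fail",
    "threshold_4": lambda hours: "pass" if hours >= 4 else "fail",
    "threshold_6": lambda hours: "pass" if hours >= 6 else "fail",
}

# Hypothetical labelled data: (study hours, result).
test_data = [(1, "fail"), (2, "fail"), (3, "fail"),
             (4, "pass"), (5, "pass"), (7, "pass")]
new_data = [(0, "fail"), (3, "fail"), (4, "pass"), (8, "pass")]

# Step d: test every selected model on the same data.
scores = {name: accuracy(model, test_data) for name, model in candidates.items()}

# Step e: the most efficient model becomes the base of the project.
best_name = max(scores, key=scores.get)

# Step f: evaluate the chosen model on newly obtained data.
final_score = accuracy(candidates[best_name], new_data)

print(scores)
print(best_name, final_score)
```

In a real project the candidates would be trained models rather than fixed rules, but the comparison loop keeps the same shape.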
4. There exist various sources of data from which you can collect the required data, and the data collection process can be categorised in two ways: Offline and Online.
Offline data sources are as follows:
a. Sensors
b. Surveys
c. Interviews
d. Observations
Online data sources are as follows:
a. Open-sourced Government Portals
b. Reliable Websites (Kaggle)
c. World Organisations' open-sourced statistical websites
The following points should be kept in mind while accessing data from any of the data sources:
• Only data that is available for public usage should be taken up.
• Personal datasets should only be used with the consent of the owner.
• One should never breach someone's privacy to collect data.
• Data should only be taken from reliable sources, as data collected from random sources can be wrong or unusable.
Reliable data sources ensure the authenticity of data, which helps in the proper training of the AI model.

5.
Classification
1. In classification, the algorithm can determine which set a given data point belongs to by utilising a classification function, represented by the dotted line.
2. The model classifies datasets according to the rules given to it.
3. Classification works on discrete datasets.

Regression
1. In regression, the algorithm generates a mapping function from the given data, represented by the solid line.
2. The dots shown in the graph are the data values, and the solid line represents the mapping done for them.
3. With the help of this mapping function, we can predict future data. To apply the regression modelling technique, we need continuous data.
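
The contrast between classification (discrete labels) and regression (continuous values) can be seen by running KNN in both modes. The sketch below is a minimal pure-Python version, assuming Euclidean distance and k = 3. The function names `k_nearest`, `knn_classify` and `knn_regress` and both small datasets are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist
from statistics import mean

def k_nearest(train, query, k):
    """Return the k training rows whose features lie closest to the query."""
    return sorted(train, key=lambda row: dist(row[0], query))[:k]

def knn_classify(train, query, k=3):
    """Classification: majority vote over the labels of the k nearest rows."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Regression: average of the continuous values of the k nearest rows."""
    return mean(value for _, value in k_nearest(train, query, k))

# Hypothetical labelled data: (features, discrete label).
fruit = [((1.0, 1.0), "apple"), ((1.2, 0.8), "apple"), ((0.9, 1.1), "apple"),
         ((5.0, 5.0), "mango"), ((5.2, 4.8), "mango"), ((4.9, 5.1), "mango")]

# Hypothetical continuous data: (features, numeric value).
prices = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0),
          ((4.0,), 40.0), ((5.0,), 50.0)]

label = knn_classify(fruit, (1.1, 0.9))   # nearest neighbours are all "apple"
price = knn_regress(prices, (2.1,))       # mean of the values at 2.0, 3.0 and 1.0

print(label, price)
```

Both functions rely on the same assumption stated in the chapter, that similar things exist nearby. Only the final step differs: a vote for discrete labels, an average for continuous values.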
