Chapter 6 - Data Science and K Nearest Neighbour Model (PART B)

Data scientist and nearest neighb

Uploaded by

uug449162


CHAPTER 6: DATA SCIENCES AND K-NEAREST NEIGHBOUR MODEL

GET SET GO (Page No. 239)
1. Surveys
2. Interviews
3. Documents and Records
4. Polls
5. Questionnaire

CHECKBOT (Page No. 243)
Regression:
1. In regression, the algorithm generates a mapping function from the given data, represented by the solid line.
2. The dots shown in the graph are the data values, and the solid line represents the mapping done for them.
3. With the help of this mapping function, we can predict future data. To apply the regression modelling technique, we need continuous data.
Classification:
1. In classification, the algorithm can determine which set a given data point belongs to by utilising a classification function, represented by the dotted line.
2. The model classifies datasets according to the rules given to it. Usually, the datasets used for classification are labelled, and the data then gets sorted according to its labels.
3. Classification works on discrete datasets.

CHECKBOT (Page No. 244)
To be done by the students.

CHECKBOT (Page No. 247)
The sources from which relevant data can be collected are as follows:
Sensors
Surveys
Interviews
Observations
Open-sourced Government Portals
Reliable Websites (Kaggle)
World Organisations' open-sourced statistical websites

CHECKBOT (Page No. 252)
The K-nearest neighbours (KNN) algorithm is used to solve both classification and regression problems. The algorithm is simple and easy to implement. As a supervised learning algorithm, it is trained with data and the corresponding labels. After training, it can label data that has not been labelled yet. Classification problems are problems in which the data can be classified into two or more categories. The KNN algorithm assumes that similar things exist nearby.

EXERCISEBOT
A. 1. [illegible]  2. a  3. c  4. c  5. b  6. a  7. d  8. c  9. a
B. 1. 2008  2. data science application  3. patterns  4. data scientist  5. median  6. Keras  7. Scatter plot  8. nearby
C. 1. F  2. T  3. T  4. F  5. F
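The Checkbot (Page No. 252) answer describes KNN only in words. The following is a minimal from-scratch sketch in Python, not the textbook's own method. It assumes Euclidean distance as the measure of "nearby" and a simple majority vote among the k closest points. The data points, labels and the names `euclidean` and `knn_classify` are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train_points, train_labels, query, k=3):
    """Label `query` by a majority vote among its k nearest training points."""
    # Pair each training point's distance to the query with its label.
    distances = [(euclidean(p, query), label)
                 for p, label in zip(train_points, train_labels)]
    # Keep the k closest neighbours ("similar things exist nearby").
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # The most common label among those neighbours becomes the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled training data: (height_cm, weight_kg) -> category
points = [(150, 50), (155, 55), (160, 58), (175, 80), (180, 85), (185, 90)]
labels = ["small", "small", "small", "large", "large", "large"]

# Label two new, previously unlabelled points.
print(knn_classify(points, labels, (158, 56)))  # small
print(knn_classify(points, labels, (178, 82)))  # large
```

With two categories, an odd k such as 3 avoids tied votes.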
D.
1. A supervised learning algorithm analyses the training data and produces an inferred function, which can be used for mapping new examples. It is the simplest and most easily implemented kind of learning algorithm. A fully trained algorithm will be able to observe new, never-seen-before examples and predict a good label for them. It needs more and more examples until it can accurately perform the task.

2. Mode: The mode() function returns the most common value in a set of data.
>>> import statistics as st
>>> nums = [1, 2, 3, 5, 7, 9, 7, 2, 7, 6]
>>> print(st.mode(nums))
The output will be 7.

3. A plot is an effective way to display data in pictorial form. It makes it easier to draw comparisons and analyse the growth, relationships and trends among the values in a table. Different types of plots used in Python are as follows:
a. Line plot
b. Bar plot
c. Histogram plot
d. Scatter plot
e. Box and Whisker plot

4. stdev() returns the standard deviation of the sample. This is equal to the square root of the sample variance.
>>> import statistics as st
>>> nums = [1, 2, 3, 5, 7, 9, 7, 2, 7, 6]
>>> print(st.stdev(nums))
The output will be 2.7264140062238043.

5. The K-nearest neighbours (KNN) algorithm is used to solve both classification and regression problems. The algorithm is simple and easy to implement. As a supervised learning algorithm, it is trained with data and the corresponding labels. After training, it can label data that has not been labelled yet. Classification problems are problems in which the data can be classified into two or more categories. The KNN algorithm assumes that similar things exist nearby.

E.
1. Statistical learning is a framework for Machine Learning drawn from the fields of statistics and functional analysis. It deals with the problem of finding a predictive function based on data. Learning can be of the following types:
a. Supervised Learning: A supervised learning algorithm analyses the training data and produces an inferred function, which can be used for mapping new examples. It is the simplest and most easily implemented kind of learning algorithm. A fully trained algorithm will be able to observe new, never-seen-before examples and predict a good label for them. It needs more and more examples until it can accurately perform the task.
b. Unsupervised Learning: Unsupervised learning is a type of Machine Learning that looks for previously undetected patterns in a dataset with no pre-existing labels and minimum human supervision. It can learn to group, cluster and organise data in such a way that humans can make sense of the newly organised data. It is an intelligent algorithm that can take terabytes of unlabelled data and make sense of it.
c. Reinforcement Learning: When some sort of signal is provided to the algorithm that associates good behaviour with a positive signal and bad behaviour with a negative one, the algorithm can be reinforced to prefer good behaviour over bad behaviour. Over time, the algorithm learns and makes fewer mistakes. It is strongly influenced by neuroscience and psychology.

2. Companies require data scientists to make data-driven decisions. Data science can also be used to give customers a better experience. Some of the applications are as follows (any five):
a. For Better Marketing: Companies use data and feedback for marketing, but they do not directly know what customers will say about a product. Data scientists collect that data and, after analysis, can suggest a better marketing strategy. Marketing is an important step in business. A data scientist can also tell which advertisement is having an impact and which is not, which saves money and effort. By studying customer feedback, companies can create the best advertisement.
b. For Customer Acquisition: Data scientists can analyse feedback and other data. They can tell us the needs of the customer.
The company can use this information to tailor the product. A product tailored on the basis of this information can be the one that best suits the customer's requirements. It also helps to find potential customers: the data scientist can help in recognising potential customers and their needs.
c. For Innovation and Manufacturing: Customer feedback can be used to innovate and manufacture the product. The data scientist helps to innovate the product. The analysis can be used to craft a new product or to make changes to an existing one. The information given by the data scientist can point you in the right direction in decision-making and product innovation.
d. For Banking: A data scientist can give information about frauds. Data science is used in customer service, forecasting, understanding consumer sentiment, customer profiling and targeted marketing, by analysing customer feedback and queries. Banks also use data science to approve loans.
e. In E-commerce: Data analysis can be used to find potential customers and to recommend products by analysing reviews and feedback, but a skilled data scientist is required for this purpose. Total e-commerce sales depend heavily on such analysis.
f. In Healthcare: Medical images and reports can be analysed and compared with those of other patients showing the same symptoms. The analysis of reports and case studies of various patients can be used to suggest the drugs used in treatment. Even during the Covid-19 pandemic, many people used virtual assistants for their treatment.
g. In Transportation: Self-driving cars can be improved by analysing the collected data and related accidents. The driving experience can be improved by estimating the traffic on the road. Google Maps uses this kind of data analysis to tell us the estimated travel time.
h. In Finance: Customers can be segmented based on their purchasing and saving habits, and that data can be used to suggest loans and investment schemes. Data analysis can be used to predict the market so that shares can be bought and sold. The risk in an investment can be analysed, and investments can be made considering these risks.
i. In Education: Educational institutions use various techniques to analyse and evaluate data. This data helps them to understand student requirements, demand for course content, teaching methodologies, etc. Data science also reduces the chances of evaluator bias. Data science makes it possible for institutions to devise innovative curriculums. The performance of students is measured by teachers, and data science helps in measuring the performance of teachers.

The steps involved in the AI project cycle are as follows:
a. Acquire data that will become the base of the project, as it will help in understanding the parameters related to problem scoping.
b. Data acquisition is done by collecting data from various reliable and authentic sources. Since the collected data will be in large quantities, it is important to visualise it using different types of representations such as graphs, databases, flowcharts, maps, etc. This makes it easier to interpret the patterns that the acquired data follows.
c. After exploring the patterns, it is easier to decide which model should be built to achieve the goal. For this, online research can be done, and the various models that give a suitable output can be selected.
d. Test the selected models and figure out which is the most efficient one.
e. The most efficient model now becomes the base of your AI project. Develop an algorithm around it.
f. Once modelling is complete, test your model on some newly obtained data. The results will help in evaluating and improving the model.
g. Finally, after evaluation, the project cycle is complete and the AI project is ready.
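Steps d and f of the AI project cycle (testing candidate models, then testing the chosen one on newly obtained data) can be sketched in Python. This is a minimal illustration under stated assumptions: the candidate models are KNN classifiers with different values of k, "most efficient" is taken to mean the highest accuracy on a separate validation set, and the dataset and the helper names `knn_predict` and `accuracy` are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training rows nearest to `query`.
    `train` is a list of ((x, y), label) pairs."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of rows in `data` whose label the model predicts correctly."""
    correct = sum(knn_predict(train, point, k) == label for point, label in data)
    return correct / len(data)

# Steps a-b: hypothetical acquired (and already explored) labelled data.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((2, 2), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B"), ((7, 7), "B"),
         ((3, 3), "B")]  # one noisy point sitting near the "A" group
validation = [((3, 4), "A"), ((0, 1), "A"), ((8, 7), "B")]
new_data = [((0, 2), "A"), ((7, 5), "B")]  # step f: newly obtained data

# Steps c-d: candidate models are KNN with different k; keep the most accurate.
candidates = [1, 3, 5]
scores = {k: accuracy(train, validation, k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])  # first k with the top score

# Step f: evaluate the chosen model on the new data.
print("Validation accuracy per k:", scores)
print("Chosen k:", best_k)
print("Accuracy on new data:", accuracy(train, new_data, best_k))
```

In this made-up data the noisy point at (3, 3) misleads k = 1 on the validation point (3, 4), so a larger k scores better. With real data, the best k has to be found the same way, by testing.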
There are various sources from which the required data can be collected, and the data collection process can be categorised in two ways: Offline and Online.
Offline data sources are as follows:
a. Sensors
b. Surveys
c. Interviews
d. Observations
Online data sources are as follows:
a. Open-sourced Government Portals
b. Reliable Websites (Kaggle)
c. World Organisations' open-sourced statistical websites
The following points should be kept in mind while accessing data from any of the data sources:
• Only data that is available for public use should be taken up.
• Personal datasets should only be used with the consent of the owner.
• One should never breach someone's privacy to collect data.
• Data should only be taken from reliable sources, as data collected from random sources can be wrong or unusable.
Reliable data sources ensure the authenticity of data, which helps in the proper training of the AI model.

Classification:
1. In classification, the algorithm can determine which set a given data point belongs to by utilising a classification function, represented by the dotted line.
2. The model classifies datasets according to the rules given to it.
3. Classification works on discrete datasets.
Regression:
1. In regression, the algorithm generates a mapping function from the given data, represented by the solid line.
2. The dots shown in the graph are the data values, and the solid line represents the mapping done for them.
3. With the help of this mapping function, we can predict future data. To apply the regression modelling technique, we need continuous data.
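Because KNN can solve both kinds of problem, the contrast between classification and regression can be sketched in Python with a single neighbour search. In this minimal sketch (an illustration, not the textbook's code), the neighbours vote on a discrete label for classification, and their continuous values are averaged for regression. Averaging is one common choice for KNN regression and is assumed here. The study-hours data and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean

def nearest(train_x, query, k):
    """Indices of the k training inputs closest to `query` (1-D inputs)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return order[:k]

def knn_classify(train_x, train_labels, query, k=3):
    """Classification: majority vote over discrete labels."""
    idx = nearest(train_x, query, k)
    return Counter(train_labels[i] for i in idx).most_common(1)[0][0]

def knn_regress(train_x, train_values, query, k=3):
    """Regression: average of the neighbours' continuous values."""
    idx = nearest(train_x, query, k)
    return mean(train_values[i] for i in idx)

# Hypothetical data: hours studied by eight students
hours = [1, 2, 3, 4, 5, 6, 7, 8]
result = ["fail", "fail", "fail", "fail", "pass", "pass", "pass", "pass"]  # discrete
marks = [20, 30, 35, 45, 55, 65, 70, 80]                                  # continuous

print(knn_classify(hours, result, 4.2))  # fail  (a category)
print(knn_regress(hours, marks, 4.2))    # 45    (a number)
```

The same three neighbours (4, 5 and 3 hours) drive both predictions. Only the way their outputs are combined changes, which is why classification needs discrete labels and regression needs continuous values.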
