Summary - Applied Data Science With Python and Jupyter

This document discusses machine learning strategies and predictive modeling in Jupyter Notebooks. It covers preprocessing data using scikit-learn and pandas, training classification models like SVM, k-Nearest Neighbors, and Random Forest, and using validation curves and dimensionality reduction. The next steps of data acquisition like analyzing HTTP requests and web scraping are also mentioned.

Summary

In this chapter, we have seen how predictive models can be trained in Jupyter
Notebooks.

To begin with, we talked about how to plan a machine learning strategy. We
thought about how to design a plan that can lead to actionable business
insights and stressed the importance of using the data to help set realistic
business goals. We also explained machine learning terminology such as
supervised learning, unsupervised learning, classification, and regression.

Next, we discussed methods for preprocessing data using scikit-learn and
pandas. This included lengthy discussions and examples of a surprisingly
time-consuming part of machine learning: dealing with missing data.
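
As a reminder of what handling missing data can look like, here is a minimal sketch. It is not the chapter's own code: the toy DataFrame and its column names (`age`, `salary`) are hypothetical, and median imputation is just one of several reasonable strategies.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; the column names are illustrative only
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 31.0],
    "salary": [50000.0, 62000.0, np.nan, 58000.0],
})

# Count the missing values in each column
missing_counts = df.isnull().sum()

# Option 1: drop every row that contains a missing value
dropped = df.dropna()

# Option 2: fill each missing value with the median of its column
imputer = SimpleImputer(strategy="median")
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(missing_counts)
print(dropped)
print(imputed)
```

Dropping rows is simple but discards data (here, half of the rows), while imputation keeps every row at the cost of introducing estimated values. Which trade-off is acceptable depends on the dataset.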

In the latter half of the chapter, we trained predictive classification models for
our binary problem, comparing how decision boundaries are drawn for various
models such as the SVM, k-Nearest Neighbors, and Random Forest. We then
showed how validation curves can be used to make good parameter choices
and how dimensionality reduction can improve model performance. Finally, at
the end of our activity, we explored how the final model can be used in
practice to make data-driven decisions.
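
The validation-curve idea can be sketched as follows. This is an illustrative example rather than the chapter's code: it assumes a synthetic dataset from `make_classification`, a Random Forest, and `max_depth` as the parameter being tuned, all of which are stand-ins for the chapter's actual data and choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

# Synthetic binary classification data (an assumption, not the chapter's dataset)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Candidate values for the parameter under study
max_depths = [1, 2, 4, 8]

# Score the model at each candidate value using 5-fold cross-validation
train_scores, val_scores = validation_curve(
    RandomForestClassifier(n_estimators=20, random_state=0),
    X,
    y,
    param_name="max_depth",
    param_range=max_depths,
    cv=5,
    scoring="accuracy",
)

# Average over folds; each row corresponds to one candidate value
mean_train = train_scores.mean(axis=1)
mean_val = val_scores.mean(axis=1)

# Pick the depth with the highest mean validation accuracy
best_depth = max_depths[int(np.argmax(mean_val))]
print("mean validation accuracy per depth:", mean_val)
print("selected max_depth:", best_depth)
```

A large gap between `mean_train` and `mean_val` at higher depths suggests overfitting, which is the kind of signal a validation curve is meant to expose when choosing a parameter.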

In the next chapter, we will focus on data acquisition. Specifically, we will
analyze HTTP requests, scrape tabular data from a web page, build and
transform pandas DataFrames, and finally create visualizations.

