Reviewer

Statistical Descriptions of Data

Agenda
• Basic Statistical Descriptions of Data
• Measuring the Central Tendency of Data
• Symmetric vs. Skewed Distribution
• Measuring the Dispersion of Data

Basic Statistical Descriptions of Data
• Many data mining and data analysis tools make use of statistical descriptions to provide results.
• Statistical descriptions also allow us to better understand our data.
• We can analyze aspects of the data, such as central tendency, variation, and spread.

Measuring the Central Tendency of Data
• A measure of central tendency is a single value that attempts to describe a data set.
• It identifies a central position within the data set.
• It helps with finding the average of a dataset.
• The three most common measures of central tendency are:
• Mode
• Median
• Mean

Mean
• The sum of all values divided by the total number of values.
• Can also be thought of as the weighted arithmetic mean.
• Trimmed mean: a variation where you remove extreme values from the data set.

Median
• This is the middle number in an ordered data set.
• It is the middle value if there is an odd number of values.
• It is the average of the middle two values otherwise.
• For grouped data, it can be estimated by interpolation.

Mode
• This is the most frequent value in the data set.
• The empirical formula relates the mean, median, and mode for moderately skewed data:

mean − mode = 3(mean − median)

Symmetric Distribution
• A symmetric distribution occurs when the values of variables appear at regular frequencies.
• The two sides of the distribution are a mirror image of each other.
• The normal distribution is the most common example.
• The characteristics of a symmetric distribution are:
• Symmetrical shape of the curve
• Mode, Median, and Mean are the same and are together in the center of the curve
• There can only be one Mode
• Most of the data is clustered around the center, with extreme values on the sides

Skewed Distribution
• A skewed distribution occurs when one tail of the distribution is longer than the other.
• Skewness is the tendency for the values to be more frequent around the high or low end of the x-axis.
• A left-skewed distribution has a long left tail. It is also called a negatively-skewed distribution.
• A right-skewed distribution has a long right tail. It is also called a positively-skewed distribution.
• The characteristics of a skewed distribution are:
• Asymmetrical shape of the curve
• Mean and Median have different values and do not all lie at the center of the curve
• There can be more than one Mode
• The distribution of the data tends towards the high or low end of the data set

Measuring the Dispersion of Data
• Dispersion is the state of being dispersed or spread out.
• Statistical dispersion means the extent to which numerical data is likely to vary about an average value.
• Statistical dispersion helps us understand the distribution of our data.
• These are the different statistical methods that we can use to analyze the dispersion of data:
• Boxplot Analysis
• Histogram
• Quantile Plot
• Quantile-Quantile (Q-Q) Plot
• Scatter Plot

Web Scraping
• Web scraping is a tool used to collect content and data from the internet.
• It allows us to collect large amounts of data automatically.
• This data can range from text to images to videos.
• The amount of data you can scrape depends on a variety of factors.
• Data commonly scraped from the web is unstructured.
• We then convert this data to structured data so that it can be used for data analysis.
• You can scrape for data using online services, specialized APIs or libraries, or by creating a scraper from scratch.
• Web scraping has two parts: the crawler and the scraper.
• The crawler is an algorithm that browses the web to look for particular data.
• The scraper is a tool that extracts data from a website.

Types of Web Scrapers
• These are the main types of web scrapers:
• Self-built Web Scrapers
• Pre-built Web Scrapers
• Browser Extension Web Scrapers
• Software Web Scrapers
• Cloud Web Scrapers
• Local Web Scrapers

Self-built Web Scrapers
• These are scrapers which are built from the ground up by a programmer.
• Their features depend on what can be added by the developer.

Pre-built Web Scrapers
• These are scrapers which are ready to use with minimal setup.
• They are usually in the form of libraries and packages that you can import.
• They have a variety of features but may require code to set up the scraper.

Browser Extension Web Scrapers
• These are scrapers that are integrated into your web browser.
• They are easy to install but can have limitations due to the platform they are on.

Software Web Scrapers
• These are scrapers which are downloaded and installed on your computer.
• They are more complicated than browser extension web scrapers, but have more features.

Cloud Web Scrapers
• These are scrapers run on a cloud service.
• They allow you to utilize the computing power of a cloud service instead of your own hardware.
• They may suffer delays due to latency limitations inherent to cloud services.

Local Web Scrapers
• These are scrapers which run on your computer using local resources.
• If your hardware is not able to meet the needs of the scraper, the application might slow down or fail.

Web Scraping Steps
• There are many tools we can use to collect data, but there are four general steps to keep in mind:
1. Set up the scraper
2. Define the scraping logic
3. Inspect the source
4. Test the scraper

Python for Scraping
• Python is a commonly used programming language for web scraping.
• It has various libraries for collecting data, cleaning data, and analyzing data.
• These libraries also make it easy to conduct web scraping.

Installing Python Libraries
• Using the right libraries with Python is the key to web scraping.
• You can install libraries manually using the command line.
• Package Installer for Python (pip) is the built-in tool for Python to install libraries and packages.
• You can also use a tool like Conda to install and manage packages for Python.
• Conda can be installed separately or packaged with the Anaconda Python Distribution.
• Either tool will work as long as you select the right libraries or commands.

Scrapy
• Scrapy is an open-source Python framework used for collecting data from the web.
• Scrapy has three main features:
1. It provides tools to extract data from websites.
2. It provides tools to process the data (cleaning, modifying, etc.).
3. It allows you to store the data in your preferred structure and format.
• You can install Scrapy for Python using the command below:
• pip install scrapy
• This will install the Scrapy library onto your system so that it can be utilized by Python.

Beautiful Soup
• Beautiful Soup is another Python package for parsing HTML and XML documents.
• Similar to Scrapy, it has many options to collect, modify, and extract data from the web.
• Both packages achieve the same goal, so which one you use is up to your preference.
• You can install Beautiful Soup using the following command:
• pip install beautifulsoup4
• This allows you to use the different features of the library.

Image Scraping
• Image scraping works with the same principles as web scraping but instead gathers image data.
• Images are commonly gathered from search engines, social media sites, and public image sharing sites.
• This is most commonly used for machine learning purposes.
• There is a large variety of image scraping libraries available for various search engines.
• Many of these libraries are found on GitHub.
• Desktop and web-based apps also exist for image scraping, but these require payment or have limitations.
• Image scraping follows these general processes no matter the library or implementation:
• Determine the source of data
• Specify the image tags
• Adjust the image parameters
• Test and clean the image results

Attributes and File Classification
Data Sets
Data Objects
Attributes
Nominal
Types of Binary Attributes
- Symmetric
- Asymmetric
Qualitative
- Ordinal
- Numeric
Types of Numeric
- Interval-Scaled
- Ratio-Scaled
- Discrete
- Continuous
File Types
- xls/xlsx
- csv
- arff
- txt
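As a sketch, the spread described above can also be quantified numerically with the statistics module; the quartiles computed here are exactly what a boxplot draws (the sample data is made up):

```python
import statistics

# Made-up sample; 40 is an extreme value
data = [4, 7, 8, 9, 10, 12, 15, 40]

# Numeric measures of spread about the average value
stdev = statistics.pstdev(data)        # population standard deviation
variance = statistics.pvariance(data)  # population variance

# Quartiles are the basis of a boxplot: the box spans Q1..Q3
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range, the height of the box

print(stdev, variance, q1, q2, q3, iqr)
```

A boxplot built from these numbers would flag 40 as an outlier, since it lies far beyond Q3 + 1.5 × IQR.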
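The four steps above can be sketched with only the Python standard library: set up a parser, define the scraping logic, inspect the page source, and test the result. The page content here is a hard-coded stand-in; a real scraper would first download it (for example with urllib):

```python
from html.parser import HTMLParser

# Step 3 (inspect the source): a stand-in for a downloaded page
PAGE = '<html><body><h1>Items</h1><a href="/item/1">One</a><a href="/item/2">Two</a></body></html>'

# Steps 1-2 (set up the scraper, define the scraping logic)
class LinkScraper(HTMLParser):
    """Collects the href attribute of every anchor tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

# Step 4 (test the scraper)
scraper = LinkScraper()
scraper.feed(PAGE)
print(scraper.links)
```

The extracted links are the structured data; converting unstructured HTML into a plain list like this is the unstructured-to-structured conversion described earlier.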
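As a sketch of Beautiful Soup's options, the snippet below parses a hard-coded HTML string (a stand-in for a downloaded page) and collects text and links:

```python
from bs4 import BeautifulSoup

# Stand-in for a downloaded page
html = '<html><body><h1>News</h1><a href="/a">First</a><a href="/b">Second</a></body></html>'

soup = BeautifulSoup(html, "html.parser")  # parse using the stdlib parser

heading = soup.h1.get_text()                     # extract an element's text
links = [a["href"] for a in soup.find_all("a")]  # collect every link target
titles = [a.get_text() for a in soup.find_all("a")]

print(heading, links, titles)
```

This is the same job as the HTMLParser sketch earlier, which illustrates the point above: both packages achieve the same goal, and Beautiful Soup mainly trades a little setup (an extra install) for a much more convenient search API.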
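The general process above can be sketched with the standard library: specifying the image tags amounts to collecting each img tag's src attribute, which a scraper would then download. Downloading is omitted here, and the page and URL are made-up stand-ins:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE_URL = "https://example.com/gallery"  # made-up source of data
PAGE = '<html><body><img src="/cat.jpg" alt="cat"><img src="/dog.png" alt="dog"></body></html>'

class ImageScraper(HTMLParser):
    """Collects the absolute URL from every img tag's src attribute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src":
                    # Resolve relative paths against the page URL
                    self.image_urls.append(urljoin(self.base_url, value))

scraper = ImageScraper(PAGE_URL)
scraper.feed(PAGE)
print(scraper.image_urls)
```

The "adjust the image parameters" and "test and clean" steps would follow: filtering the collected URLs (by extension, size, and so on) before downloading each one.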
