100 Days Python
In this course, you'll learn the fundamentals of the Python programming language, along with
programming best practices. You’ll learn to represent and store data using Python data types and
variables, and use conditionals and loops to control the flow of your programs. You’ll harness the power
of complex data structures like lists, sets, dictionaries, and tuples to store collections of related data.
You’ll define and document your own custom functions, write scripts, and handle errors. Lastly, you’ll
learn to find and use modules in the Python Standard Library and other third-party libraries.
I will learn Python to solve practical problems. Python is case sensitive and whitespace sensitive, because it
uses indentation rather than braces to delimit blocks of code.
1. Arithmetic operators
1. Only use ordinary letters, numbers and underscores in your variable names. They can’t have spaces,
and need to start with a letter or underscore.
2. You can’t use reserved words or built-in identifiers that have important purposes in Python, which
you’ll learn about throughout this course. A list of python reserved words is described here. Creating
names that are descriptive of the values often will help you avoid using any of these words. A quick table
of these words is also available below.
3. The pythonic way to name variables is to use all lowercase letters and underscores to separate words.
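The naming rules above can be checked from code. This is a minimal sketch; the candidate names and the helper is_valid_name are hypothetical and exist only for illustration. Note that str.isidentifier() also accepts non-ASCII letters, so it is slightly more permissive than rule 1.

```python
import keyword

# Hypothetical candidate names, chosen only to illustrate the rules above.
candidates = ["my_height", "_total", "2nd_place", "my height", "class"]

def is_valid_name(name):
    """Return True if name is a legal identifier and not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)

for name in candidates:
    print(name, "->", is_valid_name(name))
```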
Assignment Operators
Below are the assignment operators from the video. You can also use *= in a similar way, but this is less
common than the operations shown below. You can find some practice with much of what we have
already covered here.
More operators: https://www.programiz.com/python-programming/operators
There are two Python data types that could be used for numeric values: int, for integer values, and float,
for decimal or floating-point values.
Exceptions
Syntax
An exception is a problem that occurs while the code is running, whereas a SyntaxError
is a problem detected when Python checks the code before it runs it.
The bool data type holds one of the values True or False, which are often encoded as 1 or 0,
respectively.
Strings
Strings in Python are shown as the variable type str. You can define a string with either double
quotes " or single quotes '. If the string you are creating contains one of these quote characters, then
you need to take care that your code doesn't give an error.
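A short sketch of the usual ways to put a quote character inside a string. The sample values are hypothetical.

```python
pet = "Simon's dog"            # double quotes around a string containing '
quote = 'She said "hello"'     # single quotes around a string containing "
escaped = 'Simon\'s dog'       # or escape the quote with a backslash

print(pet)
print(quote)
print(escaped)
```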
The len() function
len() is a built-in Python function that returns the length of an object, like a string. The length of a string
is the number of characters in the string. This will always be an integer.
You got a quick look at type() from an earlier video, and it can be used to check the data
type of any variable you are working with.
Checking your variable types is really important to ensure that you are retrieving the results
you want when programming.
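As a small illustration of len() and type() together (the sample word is an assumption):

```python
word = "great"            # hypothetical sample value
length = len(word)

print(length)             # 5
print(type(length))       # <class 'int'>
print(type(word))         # <class 'str'>
print(type(4.0))          # <class 'float'>
```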
String Methods
In this video you were introduced to methods. Methods are like some of the functions you have already
seen:
1. len("this")
2. type(12)
3. print("Hello world")
These three above are functions - notice they use parentheses, and accept one or more arguments.
Functions will be studied in much more detail in a later lesson!
A method in Python behaves similarly to a function. Methods actually are functions that are called using
dot notation. For example, lower() is a string method that can be used like this, on a string called
"sample string": sample_string.lower().
Methods are specific to the data type for a particular variable. So there are some built-in methods that
are available for all strings, different methods that are available for all integers, etc.
Below are some methods that are possible with any string.
.islower()
.count('a')
.find('a')
.format()
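A quick sketch of the four methods listed above, called with dot notation on a string. The value of sample_string is a hypothetical example.

```python
sample_string = "banana split"    # hypothetical sample value

lower_check = sample_string.islower()    # True: every cased character is lowercase
a_count = sample_string.count('a')       # 3
first_a = sample_string.find('a')        # 1: index of the first 'a'
message = "{} has {} characters".format(sample_string, len(sample_string))

print(lower_check, a_count, first_a)
print(message)
```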
Another important string method: split()
A helpful string method when working with strings is the .split method. This function or method returns
a data container called a list that contains the words from the input string. We will be introducing you to
the concept of lists in the next video.
The split method has two additional arguments (sep and maxsplit). The sep argument stands for
"separator". It can be used to identify how the string should be split up (e.g., whitespace characters like
space, tab, return, newline; specific punctuation (e.g., comma, dashes)). If the sep argument is not
provided, the default separator is whitespace.
True to its name, the maxsplit argument provides the maximum number of splits. The argument gives
maxsplit + 1 number of elements in the new list, with the remaining string being returned as the last
element in the list. You can read more about these methods in the Python documentation too.
.split()
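To make sep and maxsplit concrete, here is a minimal sketch on a hypothetical sample sentence:

```python
new_str = "The cow jumped over the moon."    # hypothetical sample sentence

default_split = new_str.split()          # no arguments: split on whitespace
limited_split = new_str.split(' ', 3)    # sep=' ', maxsplit=3 gives 4 elements
period_split = new_str.split('.')        # split on a period instead

print(default_split)    # ['The', 'cow', 'jumped', 'over', 'the', 'moon.']
print(limited_split)    # ['The', 'cow', 'jumped', 'over the moon.']
print(period_split)     # ['The cow jumped over the moon', '']
```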
Debugging Code
Here are some tips on successful debugging that we'll discuss in more detail below:
Understand common error messages you might receive and what to do about them.
"SyntaxError: unexpected EOF while parsing" Take a look at the two lines of code below. Executing
these lines produces this syntax error message - do you see why?
This message is often produced when you have accidentally left out something, like a parenthesis. The
message is saying it has unexpectedly reached the end of file ("EOF") and it still didn't find that right
parenthesis. This can easily happen with code syntax involving pairs, like beginning and ending quotes
also.
"TypeError: len() takes exactly one argument (0 given)" This kind of message could be given for many
functions, like len in this case, if I accidentally do not include the required number of arguments when
I'm calling a function, as below. This message tells me how many arguments the function requires (one
in this case), compared with how many I gave it (0). I meant to use len(chars) to count the number of
characters in this long word, but I forgot the argument.
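Both errors can be reproduced safely, as sketched below. compile() is used here only so that the syntax error can be caught instead of stopping the script, and the long word is a hypothetical example. The exact wording of the messages depends on the Python version (newer versions report an unclosed parenthesis rather than "unexpected EOF").

```python
# compile() parses source code without running it, so the SyntaxError
# can be caught and inspected here.
try:
    compile("print('hello'", "<example>", "exec")    # missing right parenthesis
except SyntaxError as err:
    syntax_message = err.msg
    print("SyntaxError:", syntax_message)

chars = "supercalifragilisticexpialidocious"    # hypothetical long word
try:
    len()                    # forgot the argument
except TypeError as err:
    type_message = str(err)
    print("TypeError:", type_message)

print(len(chars))            # the intended call
```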
Google Introduction
The [ ] syntax and the len() function actually work on any sequence type -- strings, lists, etc.
String Methods
A method is like a function, but it runs "on" an
object. If the variable s is a string, then the code s.lower() runs the lower() method on that string object
and returns the result (this idea of a method running on an object is one of the basic ideas that make up
Object Oriented Programming, OOP). Here are some of the most common string methods:
s.strip() -- returns a string with whitespace removed from the start and end
s.isalpha()/s.isdigit()/s.isspace()... -- tests if all the string chars are in the various character classes
s.startswith('other'), s.endswith('other') -- tests if the string starts or ends with the given other string
s.find('other') -- searches for the given other string (not a regular expression) within s, and returns the
first index where it begins or -1 if not found
s.replace('old', 'new') -- returns a string where all occurrences of 'old' have been replaced by 'new'
s.split('delim') -- returns a list of substrings separated by the given delimiter. The delimiter is not a
regular expression, it's just text. 'aaa,bbb,ccc'.split(',') -> ['aaa', 'bbb', 'ccc']. As a convenient special case
s.split() (with no arguments) splits on all whitespace chars.
s.join(list) -- opposite of split(), joins the elements in the given list together using the string as the
delimiter. e.g. '---'.join(['aaa', 'bbb', 'ccc']) -> aaa---bbb---ccc
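The methods above can be tried on a small sample, as in this sketch. The string s is a hypothetical value.

```python
s = "  Hello, World  "                     # hypothetical sample string

stripped = s.strip()                       # 'Hello, World'
print(stripped)
print(stripped.startswith('Hello'))        # True
print(stripped.endswith('World'))          # True
print(s.find('World'))                     # 9
print(s.find('xyz'))                       # -1
print(stripped.replace('l', 'L'))          # 'HeLLo, WorLd'
print('abc'.isalpha(), '123'.isdigit())    # True True

parts = 'aaa,bbb,ccc'.split(',')           # ['aaa', 'bbb', 'ccc']
joined = '---'.join(parts)                 # 'aaa---bbb---ccc'
print(parts, joined)
```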
String Slices
The "slice" syntax is a handy way to refer to sub-parts of sequences -- typically strings and lists. The slice
s[start:end] is the elements beginning at start and extending up to but not including end. Suppose we
have s = "Hello"
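With s = "Hello", a few slices look like this (a minimal sketch):

```python
s = "Hello"

print(s[1:4])    # 'ell' -- chars at index 1, 2 and 3
print(s[1:])     # 'ello' -- an omitted end runs to the end of the string
print(s[:2])     # 'He' -- an omitted start begins at index 0
print(s[-1])     # 'o' -- negative indexes count back from the end
print(s[:-3])    # 'He' -- up to but not including the last 3 chars
```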
If Statement
Python does not use { } to enclose blocks of code for if/loops/function etc. Instead, Python uses the
colon (:) and indentation/whitespace to group statements. The boolean test for an if does not need to
be in parentheses (big difference from C++/Java), and it can have *elif* and *else* clauses (mnemonic:
the word "elif" is the same length as the word "else").
Any value can be used as an if-test. The "zero" values all count as false: None, 0, empty string, empty list,
empty dictionary. There is also a Boolean type with two values: True and False (converted to an int,
these are 1 and 0). Python has the usual comparison operations: ==, !=, <, <=, >, >=. Unlike Java and C, ==
is overloaded to work correctly with strings. The boolean operators are the spelled out words *and*,
*or*, *not* (Python does not use the C-style && || !). Here's what the code might look like for a
policeman pulling over a speeder -- notice how each block of then/else statements starts with a : and
the statements are grouped by their indentation:
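One way the speeder example might look is sketched below. The function name traffic_stop, its parameters, the thresholds and the messages are all assumptions made for illustration.

```python
def traffic_stop(speed, mood):
    """Return what the officer says; speed and mood are hypothetical inputs."""
    if speed >= 80:
        # Nested if/elif/else, grouped purely by indentation
        if mood == 'terrible' or speed >= 100:
            return 'You have the right to remain silent.'
        elif mood == 'bad' or speed >= 90:
            return "I'm going to have to write you a ticket."
        else:
            return "Let's try to keep it under 80 ok?"
    return 'Drive safely.'

print(traffic_stop(85, 'good'))
print(traffic_stop(105, 'good'))
```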
Python Lists
Python has a great built-in list type named "list". List literals are written within square brackets [ ]. Lists
work similarly to strings -- use the len() function and square brackets [ ] to access data, with the first
element at index 0. (See the official python.org list docs.)
The "empty list" is just an empty pair of brackets [ ]. The '+' works to append two lists, so [1, 2] + [3, 4]
yields [1, 2, 3, 4] (this is just like + with strings).
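A minimal sketch of list literals, indexing, len() and +, using a hypothetical list of colors:

```python
colors = ['red', 'blue', 'green']    # hypothetical sample list

print(colors[0])         # red
print(colors[2])         # green
print(len(colors))       # 3

empty = []
combined = [1, 2] + [3, 4]
print(len(empty))        # 0
print(combined)          # [1, 2, 3, 4]
```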
FOR and IN
Python's *for* and *in* constructs are extremely useful, and the first use of them we'll see is with lists.
The *for* construct -- for var in list -- is an easy way to look at each element in a list (or other collection).
Do not add or remove from the list during iteration.
The *in* construct on its own is an easy way to test if an element appears in a list (or other collection) --
value in collection -- tests if the value is in the collection, returning True/False.
You can also use for/in to work on a string. The string acts like a list of its chars, so for ch in
s: print(ch) prints all the chars in a string.
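The three uses of for and in described above might look like this. The lists and names are hypothetical sample data.

```python
squares = [1, 4, 9, 16]
total = 0
for num in squares:          # visit each element of the list
    total += num
print(total)                 # 30

names = ['larry', 'curly', 'moe']
found = 'curly' in names     # membership test, True here
print(found)

for ch in 'abc':             # a string acts like a list of its chars
    print(ch)
```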
Range
The range(n) function yields the numbers 0, 1, ... n-1, and range(a, b) yields a, a+1, ... b-1 -- up to but
not including the last number. The combination of the for-loop and the range() function allows you to
build a traditional numeric for loop:
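A sketch of such a numeric loop, collecting the numbers instead of printing all of them (the variable name numbers is an assumption):

```python
## Collect the numbers from 0 through 99
numbers = []
for i in range(100):
    numbers.append(i)

print(numbers[0], numbers[-1])    # 0 99
print(list(range(3, 7)))          # [3, 4, 5, 6]
```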
While Loop
Python also has the standard while-loop, and the *break* and *continue* statements
work as in C++ and Java, altering the course of the innermost loop. The above for/in
loop solves the common case of iterating over every element in a list, but the while
loop gives you total control over the index numbers. Here's a while loop which accesses
every 3rd element in a list:
## Access every 3rd element in a list (a is any list)
i = 0
while i < len(a):
    print(a[i])
    i = i + 3
List Methods
list.append(elem) -- adds a single element to the end of the list. Common error: does not return
the new list, just modifies the original.
list.insert(index, elem) -- inserts the element at the given index, shifting elements to the right.
list.extend(list2) -- adds the elements in list2 to the end of the list. Using + or += on a list is similar
to using extend().
list.index(elem) -- searches for the given element from the start of the list and returns its index.
Throws a ValueError if the element does not appear (use "in" to check without a ValueError).
list.remove(elem) -- searches for the first instance of the given element and removes it (throws
ValueError if not present)
list.sort() -- sorts the list in place (does not return it). (The sorted() function shown later is
preferred.)
list.pop(index) -- removes and returns the element at the given index. Returns the rightmost
element if index is omitted (roughly the opposite of append()).
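The list methods above can be traced step by step on a hypothetical list:

```python
names = ['larry', 'curly', 'moe']    # hypothetical sample list

names.append('shemp')                # ['larry', 'curly', 'moe', 'shemp']
names.insert(0, 'xxx')               # ['xxx', 'larry', 'curly', 'moe', 'shemp']
names.extend(['yyy', 'zzz'])         # adds both elements to the end
curly_index = names.index('curly')   # 2
names.remove('curly')                # removes the first 'curly'
last = names.pop()                   # removes and returns 'zzz'
first = names.pop(0)                 # removes and returns 'xxx'
sort_result = names.sort()           # sorts in place and returns None

print(curly_index, last, first, sort_result)
print(names)                         # ['larry', 'moe', 'shemp', 'yyy']
```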
The re.search() method takes a regular expression pattern and a string and searches for that
pattern within the string. If the search is successful, search() returns a match object or None
otherwise. Therefore, the search is usually immediately followed by an if-statement to test if the
search succeeded, as shown in the following example which searches for the pattern 'word:'
followed by a 3 letter word (details below):
import re

str = 'an example word:cat!!'
match = re.search(r'word:\w\w\w', str)
# If-statement after search() tests if it succeeded
if match:
    print('found', match.group())  ## 'found word:cat'
else:
    print('did not find')
The 'r' at the start of the pattern string designates a python "raw" string which passes through
backslashes without change which is very handy for regular expressions (Java needs this
feature badly!). I recommend that you always write pattern strings with the 'r' just as a habit.
Basic Examples
The basic rules of regular expression search for a pattern within a string are:
The search proceeds through the string from start to end, stopping at the first
match found
All of the pattern must be matched, but not all of the string
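These rules can be seen in a few small searches. The sample strings below are assumptions chosen for illustration.

```python
import re

m1 = re.search(r'iii', 'piiig')      # found, m1.group() == 'iii'
m2 = re.search(r'igs', 'piiig')      # not found, m2 is None
m3 = re.search(r'..g', 'piiig')      # . matches any char except newline: 'iig'
m4 = re.search(r'\d\d\d', 'p123g')   # \d matches a digit: '123'

print(m1.group(), m2, m3.group(), m4.group())
```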
Repetition
Things get more interesting when you use + and * to specify repetition in the pattern
+ -- 1 or more occurrences of the pattern to its left, e.g. 'i+' = one or more i's
## In this example, note that it does not get to the second set of i's.
match = re.search(r'i+', 'piigiiii') # found, match.group() == "ii"
## Here look for 3 digits, possibly separated by whitespace.
match = re.search(r'\d\s*\d\s*\d', 'xx1 2 3xx') # found, match.group() == "1 2 3"
Emails Example
Suppose you want to find the email address inside the string 'xyz [email protected]
purple monkey'. We'll use this as a running example to demonstrate more regular
expression features. Here's an attempt using the pattern r'\w+@\w+':
The search does not get the whole email address in this case because the \w does not
match the '-' or '.' in the address. We'll fix this using the regular expression features
below.
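A sketch of that first attempt. The address used here is a hypothetical stand-in (any address containing '-' and '.' shows the same behavior), and the variable name text is an assumption.

```python
import re

# Hypothetical address, chosen to contain both '-' and '.'.
text = 'xyz alice-b@example.com purple monkey'
match = re.search(r'\w+@\w+', text)
if match:
    partial = match.group()
    print(partial)    # 'b@example' -- \w stops at '-' and at '.'
```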
Square Brackets
Square brackets can be used to indicate a set of chars, so [abc] matches 'a' or 'b' or 'c'.
The codes \w, \s etc. work inside square brackets too with the one exception that dot (.)
just means a literal dot. For the emails problem, the square brackets are an easy way to
add '.' and '-' to the set of chars which can appear around the @ with the pattern
r'[\w.-]+@[\w.-]+' to get the whole email address:
match = re.search(r'[\w.-]+@[\w.-]+', str)
if match:
    print(match.group())  ## '[email protected]'
(More square-bracket features) You can also use a dash to indicate a range, so [a-z]
matches all lowercase letters. To use a dash without indicating a range, put the dash
last, e.g. [abc-]. An up-hat (^) at the start of a square-bracket set inverts it, so [^ab]
means any char except 'a' or 'b'.
Group Extraction
The "group" feature of a regular expression allows you to pick out parts of the matching
text. Suppose for the emails problem that we want to extract the username and host
separately. To do this, add parentheses ( ) around the username and host in the pattern,
like this: r'([\w.-]+)@([\w.-]+)'. In this case, the parentheses do not change what the
pattern will match, instead they establish logical "groups" inside of the match text. On a
successful search, match.group(1) is the match text corresponding to the 1st left
parenthesis, and match.group(2) is the text corresponding to the 2nd left parenthesis.
The plain match.group() is still the whole match text as usual.
A common workflow with regular expressions is that you write a pattern for the thing
you are looking for, adding parenthesis groups to extract the parts you want.
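Group extraction might look like the sketch below. The text and the address in it are hypothetical.

```python
import re

text = 'purple alice-b@example.com monkey dishwasher'    # hypothetical sample
match = re.search(r'([\w.-]+)@([\w.-]+)', text)
if match:
    print(match.group())     # 'alice-b@example.com' (the whole match)
    print(match.group(1))    # 'alice-b' (the username, group 1)
    print(match.group(2))    # 'example.com' (the host, group 2)
```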
findall
findall() is probably the single most powerful function in the re module. Above we used
re.search() to find the first match for a pattern. findall() finds *all* the matches and
returns them as a list of strings, with each string representing one match.
## Suppose we have a text with many email addresses
str = 'purple [email protected], blah monkey [email protected] blah dishwasher'
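A minimal findall() sketch along those lines. The text and both addresses are hypothetical stand-ins, and the variable names text and emails are assumptions.

```python
import re

# Hypothetical text with two made-up addresses.
text = 'purple alice@example.com, blah monkey bob@example.org blah dishwasher'

# findall() returns a list of all the matching strings
emails = re.findall(r'[\w.-]+@[\w.-]+', text)
for email in emails:
    # do something with each found email string
    print(email)
```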
Python Utilities
In this section, we look at a few of Python's many standard utility modules to solve
common problems.
Exceptions
An exception represents a run-time error that halts the normal execution at a particular
line and transfers control to error handling code. This section just introduces the most
basic uses of exceptions. For example a run-time error might be that a variable used in
the program does not have a value (NameError .. you've probably seen that one a few
times), or a file open operation error because a file does not exist (IOError). Learn more
in the exceptions tutorial and see the entire exception list.
Without any error handling code (as we have done thus far), a run-time exception just
halts the program with an error message. That's a good default behavior, and you've
seen it many times. You can add a "try/except" structure to your code to handle
exceptions, like this:
import sys

try:
    ## Either of these two lines could throw an IOError, say
    ## if the file does not exist or the read() encounters a low level error.
    f = open(filename, 'r')
    text = f.read()
    f.close()
except IOError:
    ## Control jumps directly to here if any of the above lines throws IOError.
    sys.stderr.write('problem reading:' + filename)
## In any case, the code then continues with the line after the try/except
The try: section includes the code which might throw an exception. The except: section
holds the code to run if there is an exception. If there is no exception, the except:
section is skipped (that is, that code is for error handling only, not the "normal" case for
the code). You can get a pointer to the exception object itself with syntax "except
IOError as e: .." (e points to the exception object).
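The "except IOError as e" form can be sketched as follows. The filename is hypothetical and is assumed not to exist, so the except branch runs.

```python
import sys

filename = 'no_such_file_for_this_example.txt'    # hypothetical, assumed missing
try:
    with open(filename) as f:
        text = f.read()
except IOError as e:
    # e points to the exception object; str(e) gives its message
    text = None
    sys.stderr.write('problem reading: ' + filename + ' (' + str(e) + ')\n')

print(text)
```

Using "with" closes the file automatically, which is a design choice of this sketch rather than something the notes above require.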
PROGRAMIZ
Python Keywords
We cannot use a keyword as a variable name, function name or any other identifier. They are used to
define the syntax and structure of the Python language.
There are 35 keywords in Python 3.7. This number can vary slightly over the course of time.
All the keywords except True, False and None are in lowercase and they must be written as they are.
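The keyword list for the running interpreter can be inspected with the standard keyword module, as in this short sketch:

```python
import keyword

print(len(keyword.kwlist))            # the count depends on the Python version
print(keyword.iskeyword('for'))       # True
print(keyword.iskeyword('count'))     # False

# The only keywords that are not all lowercase
non_lowercase = [kw for kw in keyword.kwlist if not kw.islower()]
print(non_lowercase)                  # ['False', 'None', 'True']
```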
Python Identifiers
An identifier is a name given to entities like class, functions, variables, etc. It helps to differentiate one
entity from another.
Things to Remember
Always give the identifiers a name that makes sense. While c = 10 is a valid name, writing count =
10 would make more sense, and it would be easier to figure out what it represents when you look at
your code after a long gap.
In this tutorial, you will learn about Python statements, why indentation is important and use of
comments in programming.
Python Statement
Instructions that a Python interpreter can execute are called statements. For example, a = 1 is an
assignment statement. if statement, for statement, while statement, etc. are other kinds of statements
which will be discussed later.
Python Indentation
Most of the programming languages like C, C++, and Java use braces { } to define a block of code.
Python, however, uses indentation.
A code block (body of a function, loop, etc.) starts with indentation and ends with the first unindented
line. The amount of indentation is up to you, but it must be consistent throughout that block.
Generally, four spaces are used for indentation and are preferred over tabs.
Python Comments
Comments are very important while writing a program. They describe what is going on inside a program,
so that a person looking at the source code does not have a hard time figuring it out.
You might forget the key details of the program you just wrote in a month's time. So taking the time to
explain these concepts in the form of comments is always fruitful.
Multi-line comments
In this tutorial, you will learn about Python variables, constants, literals and their use cases.
Python Variables
A variable is a named location used to store data in the memory. It is helpful to think of variables as a
container that holds data that can be changed later in the program.
As you can see from the above example, you can use the assignment operator = to assign a value to a
variable.
Constants
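A constant is a type of variable whose value is not meant to be changed. Python does not enforce this; by
convention, constants are written in all capital letters and are often declared in a separate module. It is
helpful to think of constants as containers that hold information which should not be changed later.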
You can think of constants as a bag to store some books which cannot be replaced once placed inside
the bag.
# constant.py
PI = 3.14
GRAVITY = 9.8

# main.py
import constant
print(constant.PI)
print(constant.GRAVITY)
Literals
A literal is raw data given in a variable or constant. In Python, there are various types of literals, as
follows:
Numeric Literals
Numeric Literals are immutable (unchangeable). Numeric literals can belong to 3 different numerical
types: Integer, Float, and Complex.
String literals
A string literal is a sequence of characters surrounded by quotes. We can use single, double, or
triple quotes for a string. A character literal is a single character surrounded by single or double
quotes.
Boolean literals
Special literals
Python contains one special literal i.e. None. We use it to specify that the field has not been created.
Literal Collections
There are four different literal collections: List literals, Tuple literals, Dict literals, and Set literals.
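The literal kinds named above can be written out as in this sketch. All of the values and variable names are hypothetical.

```python
# Numeric literals
integer_literal = 42
float_literal = 10.5
complex_literal = 3 + 4j

# String literals (single, double and triple quotes)
char_literal = 'C'
string_literal = "This is Python"
multiline_literal = """first line
second line"""

# Boolean and special literals
boolean_literal = True
special_literal = None

# Literal collections
list_literal = ["apple", "mango"]
tuple_literal = (1, 2, 3)
dict_literal = {"a": "apple", "b": "ball"}
set_literal = {"a", "e", "i"}

print(complex_literal.real, complex_literal.imag)    # 3.0 4.0
print(type(set_literal))                             # <class 'set'>
```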
In this tutorial, you will learn about different data types you can use in Python.
Every value in Python has a datatype. Since everything is an object in Python programming, data types
are actually classes and variables are instance (object) of these classes.
There are various data types in Python. Some of the important types are listed below. We can use
the type() function to know which class a variable or a value belongs to.
A floating-point number is accurate to about 15 significant digits. Integers and floating-point numbers are
distinguished by the decimal point: 1 is an integer, 1.0 is a floating-point number.
Python List
A list is an ordered sequence of items. It is one of the most used data types in Python and is very flexible.
The items in a list do not all need to be of the same type.
Declaring a list is pretty straightforward. Items separated by commas are enclosed within brackets [ ].
We can use the slicing operator [ ] to extract an item or a range of items from a list. The index starts
from 0 in Python. Lists are mutable, meaning the value of elements of a list can be altered.
Python Tuple
Tuple is an ordered sequence of items same as a list. The only difference is that tuples are immutable.
Tuples once created cannot be modified.
Tuples are used to write-protect data and are usually faster than lists as they cannot change
dynamically.
We can use the slicing operator [] to extract items but we cannot change its value.
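A short sketch of slicing a tuple and of what happens when code tries to change it. The tuple contents are hypothetical.

```python
t = (5, 'program', 1 + 3j)    # hypothetical sample tuple

print(t[1])       # 'program'
print(t[0:2])     # (5, 'program')

try:
    t[0] = 10     # tuples are immutable, so this raises TypeError
except TypeError as err:
    error_text = str(err)
    print("TypeError:", error_text)
```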
Python Strings
A string is a sequence of Unicode characters. We can use single quotes or double quotes to represent
strings. Multi-line strings can be denoted using triple quotes, ''' or """. Just like a list and tuple, the slicing
operator [ ] can be used with strings. Strings, however, are immutable.
Python Set
A set is an unordered collection of unique items. A set is defined by values separated by commas inside
braces { }. We can perform set operations like union and intersection on two
sets. Sets hold unique values, so they eliminate duplicates. Since a set is an unordered collection, indexing
has no meaning. Hence, the slicing operator [] does not work.
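The set behavior described above, sketched with hypothetical values:

```python
a = {5, 2, 3, 1, 4, 2, 2}    # duplicates are dropped
b = {4, 5, 6}

print(len(a))                # 5
union = a | b                # every item that is in either set
common = a & b               # only the items that are in both sets
print(union, common)

try:
    a[1]                     # sets do not support indexing
except TypeError as err:
    print("TypeError:", err)
```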
Python Dictionary
A dictionary is a collection of key-value pairs. It is generally used when we have a huge amount of data.
Dictionaries are optimized for retrieving data. We must know the key to retrieve the value.
In Python, dictionaries are defined within braces {} with each item being a pair in the form key:value.
A key must be of an immutable (hashable) type; a value can be of any type.
We use a key to retrieve the respective value, but not the other way around.
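A minimal sketch of lookup by key, using a hypothetical dictionary:

```python
d = {1: 'value', 'key': 2}    # hypothetical sample dictionary

print(d[1])          # 'value'
print(d['key'])      # 2
print(d.get(2))      # None: 2 is a value here, not a key

try:
    d[2]             # looking up a missing key raises KeyError
except KeyError as err:
    print("KeyError:", err)
```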
We can convert between different data types by using different type conversion functions
like int(), float(), str(), etc. Conversion from float to int will truncate the value (make it closer to zero).
Conversion to and from string must contain compatible values. We can even convert one sequence to
another.
>>> set([1,2,3])
{1, 2, 3}
>>> tuple({5,6,7})
(5, 6, 7)
>>> list('hello')
['h', 'e', 'l', 'l', 'o']
>>> dict([[1,2],[3,4]])
{1: 2, 3: 4}
>>> dict([(3,26),(4,44)])
{3: 26, 4: 44}
In this article, you will learn about the Type conversion and uses of type conversion.
Type Conversion
The process of converting the value of one data type (integer, string, float, etc.) to another data type is
called type conversion. Python has two types of type conversion.
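1. Implicit Type Conversion
In Implicit Type Conversion, Python automatically converts one data type to another without any user
involvement.
num_int = 123
num_flo = 1.23
num_new = num_int + num_flo
Value of num_new: 124.23
datatype of num_new: <class 'float'>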
Also, we can see the num_new has a float data type because Python always converts smaller data types
to larger data types to avoid the loss of data.
2. Explicit Type Conversion
In Explicit Type Conversion, users convert the data type of an object to required data type. We use the
predefined functions like int(), float(), str(), etc to perform explicit type conversion.
This type of conversion is also called typecasting because the user casts (changes) the data type of the
objects.
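Explicit conversion might look like this sketch. The variable names and values are hypothetical.

```python
num_int = 123
num_str = "456"

# num_int + num_str would raise TypeError, so convert the string first
num_sum = num_int + int(num_str)
print(num_sum)          # 579

print(int(10.6))        # 10  (truncates toward zero)
print(int(-10.6))       # -10
print(float(5))         # 5.0
print(str(25))          # '25'

try:
    int('1p')           # the string must contain a compatible value
except ValueError as err:
    print("ValueError:", err)
```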
This tutorial focuses on two built-in functions, print() and input(), to perform I/O tasks in Python. Also, you
will learn to import modules and use them in your program.
We use the print() function to output data to the standard output device (screen). We can also output
data to a file, but this will be discussed later.
After all values are printed, end is printed. It defaults to a newline.
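The sep and end arguments of print() can be seen in this short sketch:

```python
print(1, 2, 3, 4)                      # 1 2 3 4
print(1, 2, 3, 4, sep='*')             # 1*2*3*4
print(1, 2, 3, 4, sep='#', end='&')    # 1#2#3#4&  (no newline is added)
print()                                # finish the line
```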
Python Input
input([prompt])
Python Import
When our program grows bigger, it is a good idea to break it into different modules.
A module is a file containing Python definitions and statements. Python modules have a filename and
end with the extension .py.
Definitions inside a module can be imported to another module or the interactive interpreter in Python.
We use the import keyword to do this.
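For example, with the standard math module (a minimal sketch):

```python
import math            # import the whole module
from math import pi    # import a single name from the module

print(math.pi)
print(pi)
print(math.sqrt(16))   # 4.0
```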
4. Python Operators
In this tutorial, you'll learn everything about different types of operators in Python, their syntax and how
to use them with examples.
Arithmetic operators
Arithmetic operators are used to perform mathematical operations like addition, subtraction,
multiplication, etc.
/ Divide left operand by the right one (always results in a float) x / y
** Exponent - left operand raised to the power of the right x**y (x to the power y)
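A quick sketch of the arithmetic operators on two hypothetical values:

```python
x = 15
y = 4

print(x + y)     # 19
print(x - y)     # 11
print(x * y)     # 60
print(x / y)     # 3.75  (always a float)
print(x // y)    # 3     (floor division)
print(x % y)     # 3     (remainder)
print(x ** y)    # 50625
```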
Comparison operators
Operator Meaning Example
> Greater than - True if left operand is greater than the right x > y
< Less than - True if left operand is less than the right x < y
>= Greater than or equal to - True if left operand is greater than or equal to the right x >= y
<= Less than or equal to - True if left operand is less than or equal to the right x <= y
Logical operators
Assignment operators
a = 5 is a simple assignment operator that assigns the value 5 on the right to the variable a on the left.
There are various compound operators in Python like a += 5 that adds to the variable and later assigns
the same. It is equivalent to a = a + 5.
Operator Example Equivalent to
= x = 5 x = 5
+= x += 5 x = x + 5
-= x -= 5 x = x - 5
*= x *= 5 x = x * 5
/= x /= 5 x = x / 5
%= x %= 5 x = x % 5
//= x //= 5 x = x // 5
**= x **= 5 x = x ** 5
|= x |= 5 x = x | 5
^= x ^= 5 x = x ^ 5
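A short trace of several compound operators applied one after another. The starting value is hypothetical and each comment shows the value of x after that line.

```python
x = 5
x += 5      # 10
x -= 3      # 7
x *= 2      # 14
x //= 4     # 3
x **= 2     # 9
x %= 5      # 4
x |= 1      # 5  (binary 100 | 001 = 101)
x ^= 4      # 1  (binary 101 ^ 100 = 001)
print(x)
```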
Special operators
Python language offers some special types of operators like the identity operator or the membership
operator. They are described below with examples.
Identity operators
is and is not are the identity operators in Python. They are used to check if two values (or variables) are
located on the same part of the memory. Two variables that are equal are not necessarily
identical.
is True if the operands are identical (refer to the same object) x is True
is not True if the operands are not identical (do not refer to the same object) x is not True
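The difference between equal and identical is easiest to see with two separate lists. The names below are hypothetical.

```python
x1 = [1, 2, 3]
y1 = [1, 2, 3]    # equal contents, but a separate object
z1 = x1           # a second name for the same object

print(x1 == y1)       # True
print(x1 is y1)       # False
print(x1 is z1)       # True
print(x1 is not y1)   # True
```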
Membership operators
in and not in are the membership operators in Python. They are used to test whether a value or variable
is found in a sequence (string, list, tuple, set and dictionary).
In a dictionary we can only test for presence of key, not the value.
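Membership tests on a string and on a dictionary, sketched with hypothetical values:

```python
x = 'Hello world'
y = {1: 'a', 2: 'b'}

print('H' in x)             # True
print('hello' not in x)     # True (the test is case sensitive)
print(1 in y)               # True: 1 is a key
print('a' in y)             # False: 'a' is a value, not a key
print('a' in y.values())    # True: values can be tested explicitly
```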
In this article, you will learn to create decisions in a Python program using different forms of if..else
statement. The if…elif…else statement is used in Python for decision making.
if test expression:
    statement(s)
Here, the program evaluates the test expression and will execute statement(s) only if the test expression
is True.
Syntax of if...else
if test expression:
    Body of if
else:
    Body of else
The if..else statement evaluates test expression and will execute the body of if only when the test
condition is True.
If the condition is False, the body of else is executed. Indentation is used to separate the blocks.
Syntax of if...elif...else
if test expression:
    Body of if
elif test expression:
    Body of elif
else:
    Body of else
The elif is short for else if. It allows us to check for multiple expressions.
If the condition for if is False, it checks the condition of the next elif block and so on.
Only one block among the several if...elif...else blocks is executed according to the condition.
Any number of these statements can be nested inside one another. Indentation is the only way to figure
out the level of nesting. They can get confusing, so they should be avoided unless necessary.
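A small if...elif...else sketch. The function name describe and the messages are hypothetical.

```python
def describe(num):
    """Classify a number; only one of the three blocks runs."""
    if num > 0:
        return "Positive number"
    elif num == 0:
        return "Zero"
    else:
        return "Negative number"

print(describe(3.4))
print(describe(0))
print(describe(-2))
```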
In this article, you'll learn to iterate over a sequence of elements using the different variations of for
loop.
The for loop in Python is used to iterate over a sequence (list, tuple, string) or other iterable objects.
Iterating over a sequence is called traversal.
for val in sequence:
    Body of for
Here, val is the variable that takes the value of the item inside the sequence on each iteration.
Loop continues until we reach the last item in the sequence. The body of for loop is separated from the
rest of the code using indentation.
We can also define the start, stop and step size as range(start, stop,step_size). step_size defaults to 1 if
not provided.
The range object is "lazy" in a sense because it doesn't generate every number that it "contains" when
we create it. However, it is not an iterator since it supports in, len and __getitem__ operations.
This function does not store all the values in memory; it would be inefficient. So it remembers the start,
stop, step size and generates the next number on the go.
To force this function to output all the items, we can use the function list().
print(range(10))
print(list(range(10)))
print(list(range(2, 8)))
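A further sketch showing a step size and the operations that a range object supports without building a list:

```python
stepped = list(range(2, 20, 3))
print(stepped)      # [2, 5, 8, 11, 14, 17]

r = range(10)
print(len(r))       # 10
print(5 in r)       # True
print(r[3])         # 3
```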
# iterate over the list using index
genre = ['pop', 'rock', 'jazz']    # example list
for i in range(len(genre)):
    print("I like", genre[i])
The break keyword can be used to stop a for loop. In such cases, the else part is ignored.
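A for loop can carry an else part, which runs only when the loop finishes without hitting break. A minimal sketch, where the function name search and its inputs are hypothetical:

```python
def search(digits, target):
    """Return 'found' if target is in digits, otherwise 'not found'."""
    for d in digits:
        if d == target:
            result = "found"
            break               # leaving with break skips the else part
    else:
        result = "not found"    # runs only if the loop was never broken
    return result

print(search([0, 1, 5], 5))
print(search([0, 1, 5], 7))
```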
Loops are used in programming to repeat a specific block of code. In this article, you will learn to create
a while loop in Python.
The while loop in Python is used to iterate over a block of code as long as the test expression (condition)
is true.
We generally use this loop when we don't know the number of times to iterate beforehand.
while test_expression:
    Body of while
In the while loop, test expression is checked first. The body of the loop is entered only if
the test_expression evaluates to True. After one iteration, the test expression is checked again. This
process continues until the test_expression evaluates to False.
The body starts with indentation and the first unindented line marks the end.
# Program to add natural numbers up to n
# sum = 1+2+3+...+n
n = 10
# initialize sum and counter
sum = 0
i = 1
while i <= n:
    sum = sum + i
    i = i+1    # update counter
# print the sum
print("The sum is", sum)
We need to increase the value of the counter variable in the body of the loop. This is very important
(and mostly forgotten). Failing to do so will result in an infinite loop (never-ending loop).
The while loop can be terminated with a break statement. In such cases, the else part is ignored. Hence,
a while loop's else part runs if no break occurs and the condition is false.
'''Example to illustrate
the use of else statement
with the while loop'''
counter = 0
while counter < 3:
    print("Inside loop")
    counter = counter + 1
else:
    print("Inside else")

Output:
Inside loop
Inside loop
Inside loop
Inside else
In this article, you will learn to use break and continue statements to alter the flow of a loop.
Loops iterate over a block of code until the test expression is false, but sometimes we wish to terminate
the current iteration or even the whole loop without checking test expression.
The break statement terminates the loop containing it. Control of the program flows to the statement
immediately after the body of the loop.
If the break statement is inside a nested loop (loop inside another loop), the break statement will
terminate the innermost loop.
for val in "string":
    if val == "i":
        break
    print(val)
print("The end")
Output:
s
t
r
The end
27
The continue statement is used to skip the rest of the code inside a loop for the current iteration only.
Loop does not terminate but continues on with the next iteration.
for val in "string":
    if val == "i":
        continue
    print(val)
print("The end")
Output
s
t
r
n
g
The end
DAY 2
Python Functions
In this article, you'll learn about functions, what a function is, the syntax, components, and types of
functions. Also, you'll learn to create a function in Python.
Functions help break our program into smaller and modular chunks. As our program grows larger and
larger, functions make it more organized and manageable.
Syntax of Function
def function_name(parameters):
    """docstring"""
    statement(s)
2. Parameters (arguments) through which we pass values to a function. They are optional.
3. A colon (:) to mark the end of the function header.
4. One or more valid python statements that make up the function body. Statements must have
the same indentation level (usually 4 spaces).
The return statement is used to exit a function and go back to the place from where it was called.
Types of Functions
1. Built-in functions - Functions that are built into Python. Like str(), int()
In Python, you can define a function that takes variable number of arguments. In this article, you will
learn to define such functions using default, keyword and arbitrary arguments.
In Python, there are other ways to define a function that can take variable number of arguments:
When we call a function with some values, these values get assigned to the arguments according to their
position.
# 2 keyword arguments
greet(name = "Bruce",msg = "How do you do?")
# 2 keyword arguments (out of order)
greet(msg = "How do you do?",name = "Bruce")
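The calls above can be tried against a small definition. This is a minimal sketch: the body of greet() and its default message are assumptions made for illustration, not taken from the notes.

```python
def greet(name, msg="Good morning!"):
    """Return a greeting for name; msg falls back to a default value."""
    return "Hello " + name + ", " + msg

# positional arguments: values are matched to parameters by position
print(greet("Bruce", "How do you do?"))
# keyword arguments: the order no longer matters
print(greet(msg="How do you do?", name="Bruce"))
# default argument: msg is omitted, so the default is used
print(greet("Kate"))
```

Returning the string instead of printing it inside the function keeps the function easy to reuse.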
Sometimes, we do not know in advance the number of arguments that will be passed into a function.
Python allows us to handle this kind of situation through function calls with an arbitrary number of
arguments.
def greet(*names):
    """This function greets all
    the person in the names tuple."""
    # names is a tuple with arguments
    for name in names:
        print("Hello", name)
greet("Monica", "Luke", "Steve", "John")
Output
Hello Monica
Hello Luke
Hello Steve
Hello John
In this tutorial, you will learn to create a recursive function (a function that calls itself).
What is recursion?
A physical world example would be to place two parallel mirrors facing each other. Any object in
between them would be reflected recursively.
In Python, we know that a function can call other functions. It is even possible for the function to call
itself. These types of construct are termed as recursive functions.
The following image shows the working of a recursive function called recurse .
Our recursion ends when the number reduces to 1. This is called the base condition.
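A base condition like the one described above is easiest to see in a factorial function. The sketch below assumes that factorial is the example being discussed; the function name is chosen here for illustration.

```python
def factorial(x):
    """Return the factorial of a positive integer x using recursion."""
    if x == 1:
        # base condition: stop recursing when the number reduces to 1
        return 1
    # recursive step: x! = x * (x - 1)!
    return x * factorial(x - 1)

num = 3
print("The factorial of", num, "is", factorial(num))
```

Without the base condition the function would keep calling itself until Python raises a RecursionError.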
Advantages of Recursion
2. A complex task can be broken down into simpler sub-problems using recursion.
3. Sequence generation is easier with recursion than using some nested iteration.
Disadvantages of Recursion
2. Recursive calls are expensive (inefficient) as they take up a lot of memory and time.
In this article, you'll learn about the anonymous function, also known as lambda functions. You'll learn
what they are, their syntax and how to use them (with examples).
In Python, an anonymous function is a function that is defined without a name.
While normal functions are defined using the def keyword in Python, anonymous functions are defined
using the lambda keyword.
Lambda functions can have any number of arguments but only one expression. The expression is
evaluated and returned. Lambda functions can be used wherever function objects are required.
double = lambda x: x * 2
print(double(5))
In the above program, lambda x: x * 2 is the lambda function. Here x is the argument and x * 2 is the
expression that gets evaluated and returned.
In Python, we generally use it as an argument to a higher-order function (a function that takes in other
functions as arguments). Lambda functions are used along with built-in functions
like filter() , map() etc.
With filter(), the function is called with all the items in the list, and the result contains only the items for
which the function evaluates to True .
With map(), the function is called with all the items in the list, and the result contains the value
returned by that function for each item. In Python 3, both return iterators, so wrap the result in list() to get a list.
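A short sketch of both built-ins with lambda functions; the sample list is made up for illustration.

```python
my_list = [1, 5, 4, 6, 8, 11, 3, 12]

# filter() keeps only the items for which the lambda returns True
even_items = list(filter(lambda x: x % 2 == 0, my_list))
# map() applies the lambda to every item
doubled_items = list(map(lambda x: x * 2, my_list))

print(even_items)     # [4, 6, 8, 12]
print(doubled_items)  # [2, 10, 8, 12, 16, 22, 6, 24]
```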
In this tutorial, you’ll learn about Python Global variables, Local variables, Nonlocal variables and where
to use them.
In this article, you’ll learn about the global keyword, global variable and when to use global keywords.
In Python, global keyword allows you to modify the variable outside of the current scope. It is used to
create a global variable and make changes to the variable in a local context.
Python Modules
In this article, you will learn to create and import custom modules in Python. Also, you will find different
techniques to import and use custom and built-in modules in Python.
We use modules to break down large programs into small manageable and organized files. Furthermore,
modules provide reusability of code.
We can define our most used functions in a module and import it, instead of copying their definitions
into different programs.
Python has tons of standard modules. You can check out the full list of Python standard modules and
their use cases. These files are in the Lib directory inside the location where you installed Python.
Standard modules can be imported the same way as we import our user-defined modules.
We can use the dir() function to find out names that are defined inside a module.
>>> dir(example)
['__builtins__',
'__cached__',
'__doc__',
'__file__',
'__initializing__',
'__loader__',
'__name__',
'__package__',
'add']
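The same inspection works on any standard module. The sketch below uses math rather than the example module from the notes, since that file is not shown here.

```python
import math

# dir() returns every name defined in the module; skip the private ones
names = [name for name in dir(math) if not name.startswith("_")]
print(names[:5])
print("sqrt" in names)
```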
Python Package
In this article, you'll learn to divide your code base into clean, efficient modules using Python packages.
Also, you'll learn to import and use your own or third-party packages in your Python program.
We don't usually store all of our files on our computer in the same location. We use a well-organized
hierarchy of directories for easier access.
We can import modules from packages using the dot (.) operator.
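As a sketch of the dot operator, the standard library's xml package can stand in for a user-defined package (an assumption made so the example runs without creating files).

```python
# xml is a package, etree is a sub-package, ElementTree is a module inside it
import xml.etree.ElementTree as ET
# the same module, imported with the from ... import form
from xml.etree import ElementTree

root = ET.fromstring("<course><day>3</day></course>")
print(root.find("day").text)
print(ET is ElementTree)  # both names refer to the same module object
```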
DAY 3
We will learn numpy(Basics Functionalities and Matrix Operations)
We can create a NumPy array (a.k.a. the mighty ndarray) by passing a Python list to np.array():
np.array([1,2,3])
Array Arithmetic
Adding = the arithmetic is applied position-wise, element by element
Multiplication:
See how NumPy understood that operation to mean that the multiplication should happen with
each cell? That concept is called broadcasting, and it’s very useful.
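A small sketch of position-wise arithmetic and broadcasting; the array values are made up for illustration.

```python
import numpy as np

data = np.array([1, 2])
ones = np.ones(2)

print(data + ones)  # position-wise addition: [2. 3.]
print(data * data)  # position-wise multiplication: [1 4]
# broadcasting: the scalar 1.6 is applied to every cell
print(data * 1.6)   # [1.6 3.2]
```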
Indexing
We can index and slice NumPy arrays in all the ways we can slice python lists:
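For instance, with a made-up three-element array:

```python
import numpy as np

data = np.array([1, 2, 3])

print(data[0])    # first element: 1
print(data[0:2])  # elements at positions 0 and 1: [1 2]
print(data[1:])   # everything from position 1 on: [2 3]
print(data[-2:])  # the last two elements: [2 3]
```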
Aggregation
Additional benefits NumPy gives us are aggregation functions:
you get all the greats like mean to get the average, prod to get the result of multiplying all the
elements together, std to get standard deviation,
https://jakevdp.github.io/PythonDataScienceHandbook/02.04-computation-on-arrays-aggregates.html
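A quick sketch of the aggregation methods named above, on a made-up array:

```python
import numpy as np

data = np.array([1, 2, 3])

print(data.max())   # 3
print(data.min())   # 1
print(data.sum())   # 6
print(data.mean())  # 2.0
print(data.prod())  # 6
print(data.std())   # population standard deviation
```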
A key part of the beauty of NumPy is its ability to apply everything we’ve looked at so far to any
number of dimensions.
Creating Matrices
np.array([[1,2],[3,4]])
Matrix Arithmetic
We can add and multiply matrices using arithmetic operators (+-*/) if the two matrices are the
same size. NumPy handles those as position-wise operations:
Dot Product
A key distinction to make with arithmetic is the case of matrix multiplication using the dot
product. NumPy gives every matrix a dot() method we can use to carry-out dot product
operations with other matrices:
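The difference between position-wise arithmetic and the dot product can be sketched as follows; the matrices are made up for illustration.

```python
import numpy as np

data = np.array([[1, 2], [3, 4]])
ones = np.ones((2, 2))

print(data + ones)  # position-wise addition
print(data * data)  # position-wise multiplication, not a matrix product

vector = np.array([1, 2, 3])
powers = np.array([[1, 10], [100, 1000], [10000, 100000]])
# dot product: shape (3,) times shape (3, 2) gives shape (2,)
print(vector.dot(powers))  # [ 30201 302010]
```

The inner dimensions must match for dot(): here the vector has 3 entries and the matrix has 3 rows.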
Matrix Indexing
Indexing and slicing operations become even more useful when we’re manipulating matrices:
Where data[row, column]
Matrix Aggregation
We can aggregate matrices the same way we aggregated vectors:
Not only can we aggregate all the values in a matrix, but we can also aggregate across the rows
or columns by using the axis parameter:
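Matrix indexing and the axis parameter can be sketched together on a made-up 3 x 2 matrix:

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])

print(data[0, 1])    # row 0, column 1: 2
print(data[1:3])     # rows 1 and 2
print(data[0:2, 0])  # column 0 of rows 0 and 1: [1 3]

print(data.max())        # over all values: 6
print(data.max(axis=0))  # down each column: [5 6]
print(data.max(axis=1))  # across each row: [2 4 6]
```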
Yet More Dimensions
NumPy can do everything we’ve mentioned in any number of dimensions. Its central data
structure is called ndarray (N-Dimensional Array) for a reason.
In a lot of ways, dealing with a new dimension is just adding a comma to the parameters of a
NumPy function:
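For example, a three-dimensional array only needs a third number in the shape (the sizes here are arbitrary):

```python
import numpy as np

cube = np.ones((4, 3, 2))  # one extra comma-separated dimension

print(cube.ndim)   # 3
print(cube.shape)  # (4, 3, 2)
# aggregating along axis 0 removes that dimension
print(cube.sum(axis=0).shape)  # (3, 2)
```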
Practical Usage
And now for the payoff. Here are some examples of the useful things NumPy will help you
through.
Formulas
Implementing mathematical formulas that work on matrices and vectors is a key use case to
consider NumPy for. It’s why NumPy is the darling of the scientific python community. For
example, consider the mean square error formula that is central to supervised machine learning
models tackling regression problems:
The beauty of this is that numpy does not care if predictions and labels contain one or a
thousand values (as long as they’re both the same size). We can walk through an example
stepping sequentially through the four operations in that line of code:
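A sketch of the mean square error as one line of NumPy, assuming the usual definition (the mean of the squared differences). The sample values of predictions and labels are made up for illustration.

```python
import numpy as np

predictions = np.array([1, 1, 1])
labels = np.array([1, 2, 3])
n = len(predictions)

# four operations: subtract, square, sum, scale by 1/n
error = (1 / n) * np.sum(np.square(predictions - labels))
print(error)
```

Stepping through it: the subtraction gives [0, -1, -2], squaring gives [0, 1, 4], the sum is 5, and dividing by n = 3 gives the error.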
Data Representation
Think of all the data types you’ll need to crunch and build models around (spreadsheets,
images, audio…etc). So many of them are perfectly suited for representation in an n-
dimensional array:
Tables and Spreadsheets
Images
If the image is colored, then each pixel is represented by three numbers - a value for each of
red, green, and blue. In that case we need a 3rd dimension (because each cell can only contain
one number). So a colored image is represented by an ndarray of dimensions: (height x width x
3).
Domain-specific packages,
Mayavi for 3-D visualization
pandas, statsmodels, seaborn for statistics
sympy for symbolic computing
scikit-image for image processing
scikit-learn for machine learning
DAY 4
Learn Pandas basic functionalities and playing with dummy data sets
What is it?¶
pandas is an open source Python library for data analysis. Python has always been great for
prepping and munging data, but it's never been great for analysis - you'd usually end up using R
or loading it into a database and using SQL (or worse, Excel). pandas makes Python great for
analysis.
Data Structures¶
pandas introduces two new data structures to Python - Series and DataFrame, both of which
are built on top of NumPy (this means it's fast).
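import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', 50)  # show up to 50 columns when printing
# IPython/Jupyter magic: render plots inline (omit in a plain .py script)
%matplotlib inline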
Series¶
A Series is a one-dimensional object similar to an array, list, or column in a table. It will assign a
labeled index to each item in the Series. By default, each item will receive an index label from 0
to N, where N is the length of the Series minus one.
For a Series you can specify an index, or convert a dictionary into a Series using its keys as the
index:
s = pd.Series([7, 'Heisenberg', 3.14, -1789710578, 'Happy Eating!'], index=['A', 'Z', 'C', 'Y', 'E'])
d = {'Chicago': 1000, 'New York': 1300, 'Portland': 900, 'San Francisco': 1100, 'Austin': 450,
     'Boston': None}  # dictionary
cities = pd.Series(d)  # convert the dictionary to a Series; the keys become the index
You can also change the values in a Series based on indexing or Boolean indexing
cities['Chicago'] = 1400 # changing based on the index
cities[cities < 1000] = 750 # changing values using boolean logic
You can add two Series together, which returns a union of the two Series with
the addition occurring on the shared index values. Index values that appear in only one Series produce NaN.
print(cities[['Chicago', 'New York', 'Portland']] + cities[['Austin', 'New York']])
DataFrame¶
A DataFrame is a tabular data structure composed of rows and columns, akin to a spreadsheet,
database table, or R's data.frame object. You can also think of a DataFrame as a group of Series
objects that share an index (the column names).
You can create a DataFrame from a dictionary or read one from a CSV file:
data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions', 'Lions', 'Lions'],
        'wins': [11, 8, 10, 15, 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
football = pd.DataFrame(data, columns=['year', 'team', 'wins', 'losses'])
football
from_csv = pd.read_csv('mariano-rivera.csv')
You can delete a DataFrame with del, and you can read a DataFrame directly from a database or a URL:
del football # delete the DataFrame
import sqlite3
conn = sqlite3.connect('/Users/gjreda/Dropbox/gregreda.com/_code/towed')
query = "SELECT * FROM towed WHERE make = 'FORD';"
results = pd.read_sql(query, con=conn) # run the query and load the result into a DataFrame
url = 'https://raw.github.com/gjreda/best-sandwiches/master/data/best-sandwiches-geocode.tsv'
# fetch the text from the URL and read it into a DataFrame
from_url = pd.read_table(url, sep='\t')
from_url.head(3)
Inspection
dataframe.info() # column types, non-null counts and memory usage
movies.dtypes # data type of each column
users.describe() # basic statistics for the numeric columns
movies.head() # first five rows of the DataFrame
users['occupation'].head() # first five values of a single column
movies.tail(3) # last three rows of the DataFrame
movies[20:22] # slice rows by position
users.iloc[[1, 50, 300]] # select specific rows by integer position
users[users.age > 25].head(3) # users older than 25
users[(users.age == 40) & (users.sex == 'M')].head(3) # users aged 40 AND male
users[(users.sex == 'F') | (users.age < 30)].head(3) # users younger than 30 OR female
Joining
Throughout an analysis, we'll often need to merge/join datasets, as data is typically stored in
a relational manner. The pandas merge function provides a series of parameters (on, left_on, right_on,
left_index, right_index) that specify which columns or indexes to join on.
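A minimal sketch of pd.merge with the on and how parameters. The two small DataFrames and their column names are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({'key': range(5),
                     'left_value': ['a', 'b', 'c', 'd', 'e']})
right = pd.DataFrame({'key': range(2, 7),
                      'right_value': ['f', 'g', 'h', 'i', 'j']})

# inner join: only keys present in both frames (2, 3, 4)
inner = pd.merge(left, right, on='key', how='inner')
# outer join: every key from either frame, with NaN where a side is missing
outer = pd.merge(left, right, on='key', how='outer')

print(inner)
print(outer)
```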
Grouping
Grouping in pandas took some time for me to grasp, but it's pretty awesome once it clicks.
Assume we have a DataFrame and want to get the average for each group - visually, the split-
apply-combine method looks like this
chicago = pd.read_csv('city-of-chicago-salaries.csv',
                      converters={'salary': lambda x: float(x.replace('$', ''))})
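Getting the average for each group can be sketched on a tiny stand-in table. The rows below are made up and are not the Chicago salary data.

```python
import pandas as pd

# hypothetical salaries, for illustration only
salaries = pd.DataFrame({
    'department': ['LAW', 'LAW', 'POLICE', 'POLICE', 'FIRE'],
    'salary': [70000.0, 90000.0, 60000.0, 80000.0, 75000.0],
})

by_dept = salaries.groupby('department')  # split
mean_salary = by_dept['salary'].mean()    # apply mean, then combine
print(mean_salary)
print(by_dept.size())                     # number of rows per group
```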
split-apply-combine
The real power of groupby comes from its split-apply-combine ability.
def ranker(df):
    """Assigns a rank to each employee based on salary, with 1 being the highest paid.
    Assumes the data is DESC sorted."""
    df['dept_rank'] = np.arange(len(df)) + 1
    return df
chicago.sort_values('salary', ascending=False, inplace=True)
chicago = chicago.groupby('department').apply(ranker)
print(chicago[chicago.dept_rank == 1].head(7))
chicago[chicago.department == "LAW"][:5]
The above movies are rated so rarely that we can't count them as quality films.
Let's only look at movies that have been rated at least 100 times.
atleast_100 = movie_stats['rating']['size'] >= 100
movie_stats[atleast_100].sort_values([('rating', 'mean')], ascending=False)[:15]
labels = ['0-9', '10-19', '20-29', '30-39', '40-49', '50-59', '60-69', '70-79']
lens['age_group'] = pd.cut(lens.age, range(0, 81, 10), right=False, labels=labels)
lens[['age', 'age_group']].drop_duplicates()[:10]
lens.set_index('movie_id', inplace=True)
by_age = lens.loc[most_50.index].groupby(['title', 'age_group'])
by_age.rating.mean().head(15)
pivoted.reset_index('movie_id', inplace=True)
disagreements = pivoted[pivoted.movie_id.isin(most_50.index)]['diff']
disagreements.sort_values().plot(kind='barh', figsize=[9, 15])
plt.title('Male vs. Female Avg. Ratings\n(Difference > 0 = Favored by Men)')
plt.ylabel('Title')
plt.xlabel('Average Rating Difference');
What is LA?
Linear Algebra is a branch of mathematics that lets you concisely describe coordinates and
interactions of planes in higher dimensions and perform operations on them.
Linear Algebra is about working on linear systems of equations (linear regression is an
example: y = Ax). Rather than working with scalars, we start working with matrices and
vectors (vectors are really just a special type of matrix).
Why LA?
A list of the top 10 algorithms of science and engineering during the 20th century includes
the matrix decompositions approach to linear algebra. It also includes the QR algorithm, which
we'll cover, and Krylov iterative methods, which we'll see an example of.
As a field, it’s useful to you because you can describe (and even execute with the right
libraries) complex operations used in machine learning using the notation and formalisms
from linear algebra.
Matrix: an m × n matrix A is a collection of scalar values arranged in a rectangle of m rows and n
columns.
Tensor
Dot product
Equation of line
Inverse: the inverse of a matrix plays the same role as the reciprocal of a nonzero number.
A A^{-1} = A^{-1} A = I
Here's the idea: if A is an n by n matrix and it has an inverse (I will say a bit more about that
later), then the inverse is written A to the minus one, and A times A inverse equals A inverse
times A, which gives us back the identity matrix.
Determinant of a matrix: the factor by which the transformation described by the matrix A scales volume.
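Both ideas can be checked numerically with NumPy's linalg module. The 2 x 2 matrix below is made up for illustration.

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

A_inv = np.linalg.inv(A)
# A times its inverse gives the identity matrix (up to rounding)
print(A.dot(A_inv))
print(A_inv.dot(A))

# determinant: 4*6 - 7*2 = 10; a determinant of 0 would mean no inverse exists
det = np.linalg.det(A)
print(det)
```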
To describe the supervised learning problem slightly more formally, our goal is, given a training
set, to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value
of y. For historical reasons, this function h is called a hypothesis. The hypothesis in this case means
that we are going to predict that y is a linear function of x.
Cost function (squared error) = this lets us figure out how to fit the best possible straight
line to our data. We measure the accuracy of our hypothesis function by using a cost function, which
is based on the difference between the predicted value and the actual value.
Choose the parameters so that our hypothesis h(x) is close to y for our training examples (x, y).
Theta 0 and theta 1 are what I call the parameters of the model. With different choices of the
parameters theta 0 and theta 1, we get different hypothesis functions.
Each value of theta 1 corresponds to a different hypothesis, or to a different straight-line fit.
For each value of theta 1, we can then derive a different value of J(theta 1).
That is why minimizing J(theta 1) corresponds to finding a straight line that fits the
data well.
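A sketch of how J changes with theta 1. It assumes the common squared-error form with a 1/(2m) factor, which these notes do not spell out, and uses made-up training data.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared-error cost J for the hypothesis h(x) = theta0 + theta1 * x."""
    m = len(x)
    predictions = theta0 + theta1 * x
    # assumption: the 1/(2m) scaling convention
    return np.sum((predictions - y) ** 2) / (2 * m)

# hypothetical training set lying exactly on the line y = x
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

for theta1 in (0.0, 0.5, 1.0):
    print(theta1, cost(0.0, theta1, x, y))
```

The cost shrinks as theta 1 approaches 1 and reaches 0 when the line passes through every point.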
The graph above minimizes the cost function as much as possible and consequently, the results
for θ1 and θ0 tend to be around 0.12 and 250 respectively. Plotting those values
on our graph to the right seems to put our point in the center of the innermost 'circle'.
I want to tell you about an algorithm called gradient descent for minimizing the cost function J.
This algorithm keeps changing the parameters to reduce the cost until we end up
at a minimum.
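A minimal sketch of batch gradient descent for a straight-line fit. The learning rate alpha, the iteration count, and the training data are assumptions chosen so that this small example converges.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x by repeatedly stepping downhill on J."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        error = theta0 + theta1 * x - y
        # partial derivatives of the squared-error cost
        grad0 = np.sum(error) / m
        grad1 = np.sum(error * x) / m
        # update both parameters simultaneously
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# hypothetical data generated from y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
theta0, theta1 = gradient_descent(x, y)
print(theta0, theta1)
```

If alpha is too large the updates overshoot and the cost grows instead of shrinking, so the step size usually needs some tuning.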
A_{ij} refers to the element in the ith row and jth column of matrix A.
A vector with 'n' rows is referred to as an 'n'-dimensional vector.
v_i refers to the element in the ith row of the vector.
In general, all our vectors and matrices will be 1-indexed. Note that for some programming
languages, the arrays are 0-indexed.
Matrices are usually denoted by uppercase names while vectors are lowercase.
"Scalar" means that an object is a single value, not a vector or matrix.
Let's say we have a set of four houses with four sizes, and let's say I have a
hypothesis for predicting the price of a house. It turns out that the result vector of the
matrix-vector product is the prediction based on my hypothesis for each of my houses.
To multiply a matrix by a matrix, the number of columns of the first matrix needs to match the
number of rows of the second matrix. The result has the number of rows of the first matrix and the
number of columns of the second matrix.
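The house example can be sketched as a matrix-vector product. The house sizes and hypothesis parameters below are illustrative values, not data from these notes.

```python
import numpy as np

# hypothetical house sizes
sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])
# 4 x 2 matrix: a column of ones (for the intercept) next to the sizes
X = np.column_stack([np.ones(len(sizes)), sizes])

# one hypothesis h(x) = -40 + 0.25 * x, stored as a vector of parameters
theta = np.array([-40.0, 0.25])
predictions = X.dot(theta)  # (4 x 2) times (2,) gives one price per house
print(predictions)

# three hypotheses at once: (4 x 2) times (2 x 3) gives a 4 x 3 result
thetas = np.array([[-40.0, 200.0, -150.0],
                   [0.25, 0.1, 0.4]])
print(X.dot(thetas).shape)  # (4, 3)
```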
DAY 6
10 obvious and concrete examples of linear algebra in machine learning.
Linear algebra is a sub-field of mathematics concerned with vectors, matrices, and linear
transforms.
It is a key foundation to the field of machine learning, from notations used to describe the
operation of algorithms to the implementation of algorithms in code.
Download the photograph and save it in your current working directory with the file name
“opera_house.jpg“.
# load and show an image with Pillow
from PIL import Image
image = Image.open('opera_house.jpg')
# summarize some details about the image
print(image.format)
print(image.mode)
print(image.size)
# show the image
image.show()
# For example, you could easily load all images in a directory as a list as follows:
# load all images in a directory
from os import listdir
from matplotlib import image
loaded_images = list()
for filename in listdir('images'):
    # load image
    img_data = image.imread('images/' + filename)
    # store loaded image
    loaded_images.append(img_data)
    print('> loaded %s %s' % (filename, img_data.shape))
One-Hot Encoding
# example of a one hot encoding
from numpy import asarray
from sklearn.preprocessing import OneHotEncoder
# define data
data = asarray([['red'], ['green'], ['blue']])
print(data)
# define one hot encoding (scikit-learn 1.2+; older releases spell this sparse=False)
encoder = OneHotEncoder(sparse_output=False)
# transform data
onehot = encoder.fit_transform(data)
print(onehot)
Dummy Variable Encoding
# example of a dummy variable encoding
from numpy import asarray
from sklearn.preprocessing import OneHotEncoder
# define data
data = asarray([['red'], ['green'], ['blue']])
print(data)
# define dummy variable encoding (scikit-learn 1.2+; older releases spell this sparse=False)
encoder = OneHotEncoder(drop='first', sparse_output=False)
# transform data
onehot = encoder.fit_transform(data)
print(onehot)
4. Linear Regression
Linear regression is an old method from statistics for describing the relationships between
variables.
The objective of creating a linear regression model is to find the values for the coefficient
values (b) that minimize the error in the prediction of the output variable y.
The way this is typically achieved is by finding a solution where the values for b in the model
minimize the squared error. This is called linear least squares.
If you have used a machine learning tool or library, the most common way of solving linear
regression is via a least squares optimization that is solved using matrix factorization
methods from linear algebra, such as an LU decomposition or a singular-value
decomposition, or SVD.
y = A . b
Where y is the output variable, A is the dataset, and b are the model coefficients.
[0.31, 0.35],
[0.42, 0.38],
[0.5, 0.49],
])
X, y = data[:,0], data[:,1]
X = X.reshape((len(X), 1))
# QR decomposition
Q, R = qr(X)
b = inv(R).dot(Q.T).dot(y)
print(b)
# predict using coefficients
yhat = X.dot(b)
# plot data and predictions
pyplot.scatter(X, y)
pyplot.plot(X, yhat, color='red')
pyplot.show()
The QR decomposition approach is more computationally efficient and more numerically
stable than calculating the normal equation directly, but does not work for all data matrices.
How to solve linear regression using SVD and the pseudoinverse.
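A sketch comparing the QR approach with the SVD-based pseudoinverse, using NumPy's linalg functions. The data is made up for illustration, and the model has a single coefficient with no intercept.

```python
import numpy as np

# hypothetical data: one input column X and an output y
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.1, 5.9, 8.0])

# QR decomposition: b = R^-1 . Q^T . y
Q, R = np.linalg.qr(X)
b_qr = np.linalg.inv(R).dot(Q.T).dot(y)

# pseudoinverse (computed via SVD internally): b = X^+ . y
b_pinv = np.linalg.pinv(X).dot(y)

print(b_qr)
print(b_pinv)
```

Both routes solve the same least squares problem, so the coefficients agree up to rounding.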
5. Regularization
A technique that is often used to encourage a model to minimize the size of coefficients
while it is being fit on data is called regularization. Common implementations include the L2
and L1 forms of regularization.
Both of these forms of regularization are in fact a measure of the magnitude or length of the
coefficients as a vector and are methods lifted directly from linear algebra called the vector
norm.
The L1 norm is calculated as the sum of the absolute values of the vector.
# l1 norm of a vector
from numpy import array
from numpy.linalg import norm
a = array([1, 2, 3])
print(a)
l1 = norm(a, 1)
print(l1)
The L2 norm is calculated as the square root of the sum of the squared vector
values.
# l2 norm of a vector
from numpy import array
from numpy.linalg import norm
a = array([1, 2, 3])
print(a)
l2 = norm(a)
print(l2)
Often, a dataset has many columns, perhaps tens, hundreds, thousands, or more.
Modeling data with many features is challenging, and models built from data that include
irrelevant features are often less skillful than models trained from the most relevant data.
It is hard to know which features of the data are relevant and which are not.
Methods for automatically reducing the number of columns of a dataset are called
dimensionality reduction, and perhaps the most popular method is called principal component
analysis, or PCA for short.
This method is used in machine learning to create projections of high-dimensional data for
both visualization and for training models.
The core of the PCA method is a matrix factorization method from linear algebra.
The eigendecomposition can be used and more robust implementations may use the
singular-value decomposition, or SVD
Steps:
The first step is to calculate the mean values of each column.
Next, we need to center the values in each column by subtracting the mean column value.
The next step is to calculate the covariance matrix of the centered matrix C. Where
covariance is a generalized and unnormalized version of correlation across multiple
columns. A covariance matrix is a calculation of covariance of a given matrix with
covariance scores for every column with every other column, including itself.
Finally, we calculate the eigendecomposition of the covariance matrix V. This results in a list
of eigenvalues and a list of eigenvectors. If all eigenvalues have a similar value, then we
know that the existing representation may already be reasonably compressed or dense and
that the projection may offer little. If there are eigenvalues close to zero, they represent
components or axes of B that may be discarded.
A total of m or fewer components must be selected to comprise the chosen subspace.
Once chosen, data can be projected into the subspace via matrix multiplication.
P = B^T . A
Where A is the original data that we wish to project, B^T is the transpose of the chosen
principal components and P is the projection of A.
This is called the covariance method for calculating the PCA, although there are alternative
ways to calculate it.
# eigendecomposition of covariance matrix
values, vectors = eig(V)
print(vectors)
print(values)
# project data
P = vectors.T.dot(C.T)
print(P.T)
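The steps above can be sketched end to end. The 3 x 2 matrix A is a made-up example, and the names M, C, V and P follow the steps in the text.

```python
import numpy as np

# hypothetical data: 3 rows (observations), 2 columns (features)
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

M = A.mean(axis=0)  # mean of each column
C = A - M           # center the columns
V = np.cov(C.T)     # covariance matrix of the centered data

# eigendecomposition of the covariance matrix
values, vectors = np.linalg.eig(V)
print(vectors)
print(values)

# project the centered data onto the eigenvectors
P = vectors.T.dot(C.T)
print(P.T)
```

Here one eigenvalue is close to zero, so that component could be discarded without losing information.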
Descriptive statistics refers to methods for summarizing and organizing the information in a
data set, while inferential statistics is about drawing conclusions about a population from a sample.
Why statistics for machine learning – because Machine Learning is an interdisciplinary field that
uses statistics, probability, algorithms to learn from data and provide insights which can be
used to build intelligent applications
In descriptive statistics we identify the type of each variable, the number of observations,
the distribution of each variable, and any outliers. We talk about three kinds of measure:
Measure of center = indicates where on the number line the central part of the data is located.
Measure of variability = quantifies the amount of variation, spread or dispersion present in the data.
Measure of position = indicates the relative position of a particular data value in the data
distribution.
There are two types of descriptive statistics:
Uni-variate descriptive statistics = the plots typically used to visualize uni-variate data are
bar charts, histograms, pie charts, etc.
Bi-variate descriptive statistics = bi-variate analysis involves the analysis of two variables for
the purpose of determining the empirical relationship between them. The plots typically used to
visualize bi-variate data are scatter plots and box plots.
Scatter Plots = The simplest way to visualize the relationship between two quantitative
variables , x and y. Scatter plots are sometimes called correlation plots because they show how
two variables are correlated.
Correlation = A correlation is a statistic intended to quantify the strength of the relationship
between two variables. The correlation coefficient r quantifies the strength and direction of the
linear relationship between two quantitative variables.
Box Plots = A box plot is also called a box and whisker plot and it’s used to picture the
distribution of values. When one variable is categorical and the other continuous, a box-plot is
commonly used.
Histogram = created by dividing the values into bins and stacking the values that fall in the same bin.
The bins should be neither too wide nor too narrow.
Mean = the average of a distribution: the sum of the values divided by the number of values.
Mode = the data value that occurs with the greatest frequency.
Median = the middle data value when the data are sorted.
Variance = how spread out the data in a variable are: the average squared deviation from the mean.
SD = square root of the variance. The standard deviation, or SD, of a bunch of numbers tells you how
much the individual numbers tend to differ from the mean.
Parameter = a measurement that describes the distribution of a variable.
Gaussian/normal distribution = a function that shows the possible values for a variable and
how often they occur. The width of the curve is defined by the SD.
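The measures of center and spread defined above can be computed with Python's standard statistics module. The data values are made up, and the population forms of variance and SD are used here as an assumption.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)           # sum divided by count: 30 / 6
median = statistics.median(data)       # middle of the sorted values: (3 + 5) / 2
mode = statistics.mode(data)           # most frequent value
variance = statistics.pvariance(data)  # average squared deviation from the mean
sd = statistics.pstdev(data)           # square root of the variance

print(mean, median, mode, variance, sd)
```

For a sample rather than a whole population, statistics.variance and statistics.stdev divide by n - 1 instead of n.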
Probability and statistics are related areas of mathematics which concern themselves with
analyzing the relative frequency of events. But probability deals with predicting the likelihood
of future events, while statistics involves the analysis of the frequency of past events.
Bayes’s theorem is a relationship between the conditional probabilities of two events
1.2 Variance
When you start working with a new dataset, I suggest you explore the variables you are planning to use
one at a time, and a good way to start is by plotting a histogram.
1.5 Outliers
2. Data cleaning: when you import data like this, you often have to check for
errors, deal with special values, convert data into different formats, and perform
calculations.
3. Validation: One way to validate data is to compute basic statistics and compare
them with published results.
4. Interpretation: To work with data effectively, you have to think on two levels at
the same time: the level of statistics and the level of context.
Summary statistics are concise, but dangerous, because they obscure the data. An alternative is to look
at the distribution of the data, which describes how often each value appears. The most common
representation of a distribution is a histogram, which is a graph that shows the frequency or probability
of each value.
Mode: The most common value in a distribution is called the mode. In Figure 2.1 there is a clear mode at
39 weeks. In this case, the mode is the summary statistic that does the best job of describing the typical
value.
Shape: Around the mode, the distribution is asymmetric; it drops off quickly to the right and more slowly
to the left. From a medical point of view, this makes sense. Babies are often born early, but seldom later
than 42 weeks. Also, the right side of the distribution is truncated because doctors often intervene after
42 weeks.
Outliers: Values far from the mode are called outliers. Some of these are just unusual cases, like babies
born at 30 weeks. But many of them are probably due to errors, either in the reporting or recording of
data.
A variable is a value that may change or differ between individuals in an experiment. The moon's
circumference will always have the same value, so it is called a constant.
Frequency is the count of occurrences of each value, and relative frequency is the frequency expressed
as a proportion of the total. With frequency tables, we have exact counts, so we can always create the
histogram, but not the other way around.
The range does not represent the variability of the data well, because outliers inflate it.
A good way to reduce the influence of outliers is to cut off the tails (the lower 25% and the upper 25%)
and look at the spread of the middle half of the data.
SD is the square root of the average squared deviation. The number of SDs a value lies from the mean
is a way to look for unusual values.
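A sketch of both ideas on made-up data with one obvious outlier. The cut-off of 2 standard deviations is an assumption chosen for illustration.

```python
import numpy as np

# hypothetical data with one extreme value
data = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 50.0])

# the range is dominated by the outlier
value_range = data.max() - data.min()

# cutting off the lower and upper 25% leaves the interquartile range
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# number of standard deviations each value lies from the mean
z_scores = (data - data.mean()) / data.std()
unusual = data[np.abs(z_scores) > 2]

print(value_range, iqr)
print(unusual)
```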
If I know the distribution, I can think critically about which of the mean, median, and mode best
describes the data set. For this it is good to read the histogram in proportions, because they are easy
to read and give you an idea of the whole. A smaller bin size gives us more detail.
When variables are measured in different units, it is good to standardize the distribution, because the
standardized scale uses 0 as the reference point (the middle) and has SD = 1. This transforms each value
into a standard score; for a normal distribution, the result is a standard normal variate.
A continuous distribution allows us to calculate the proportion between any two values on the x axis.
The shape of a distribution describes the form of a data set's distribution: uniform, bimodal, normal, or skewed.
Kurtosis describes the tails: a negative (excess) kurtosis means light tails (little data in the tails),
and a positive value means heavy tails (more data in the tails).
Normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging
between 0 and 1. It is also known as Min-Max scaling.
Standardization is another scaling technique where the values are centered around the mean with a unit
standard deviation. This means that the mean of the attribute becomes zero and the resultant
distribution has a unit standard deviation.
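Both scaling techniques can be written in one line each with NumPy. The values are made up for illustration.

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# normalization (min-max scaling): shift and rescale into the range 0 to 1
normalized = (values - values.min()) / (values.max() - values.min())

# standardization: subtract the mean, then divide by the standard deviation
standardized = (values - values.mean()) / values.std()

print(normalized)    # [0.   0.25 0.5  0.75 1.  ]
print(standardized.mean(), standardized.std())
```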