harvard python for research

The document provides an overview of Python programming concepts, including the Fibonacci function, mutable vs immutable objects, and the use of modules. It explains various data types such as lists, tuples, sets, and dictionaries, along with their characteristics and operations. Additionally, it covers basic programming structures like loops and statements, emphasizing the importance of object identity, value, and type in Python.

Uploaded by

mavagoncalves
Copyright
© © All Rights Reserved

Video 1.0.

The Fibonacci function, as you may have guessed, computes the first
terms of the Fibonacci sequence.

The code underneath the function calls the function 10,000 times, asking
Python to compute the first 10,000 numbers in the Fibonacci sequence.

Then, finally, it adds those numbers up.


For mutable objects, = makes the name on the left refer to the same object
as the name on the right; no copy is made. In contrast, slicing with ":"
(for example, b = a[:]) makes a new copy of a mutable sequence containing
the same elements. Therefore, a and b are different objects, but each
with the same elements.
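
A minimal sketch of the difference between plain assignment and a slice copy. The names a, b, and c are illustrative and not taken from the course.

```python
a = [1, 2, 3]
b = a        # b refers to the same list object as a
c = a[:]     # c is a new list holding the same elements

b.append(4)  # changes the single list that a and b share

print(a)       # [1, 2, 3, 4]
print(c)       # [1, 2, 3]
print(a is b)  # True
print(a is c)  # False
```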

x = "Hello, world!"
y = x[5:]

What is the value of y?


Answer: ', world!' (correct)

Explanation
This slice returns all characters at position 5 or later.

Video 1.1.1 Python basics

The interactive mode is meant for experimenting with your code one line or
one expression at a time.

In contrast, the standard mode is ideal for running your programs from
start to finish.
Video 1.1.2

The value of some objects can change in the course of program execution.

Objects whose value can change are said to be mutable objects, whereas
objects whose value is unchangeable after they've been created are
called immutable.

The bulk of the Python library consists of modules.


In order for you to be able to make use of modules in your own code, you
first need to import those modules using the import statement.

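Each object in Python has three characteristics. These characteristics are
called object type, object value, and object identity.

Object type tells Python what kind of object it is dealing with, for
example a number or a string.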

Object value is the data value that is contained by the object. This could
be a specific number, for example.

Finally, you can think of object identity as an identity number for the
object. Each distinct object in the computer's memory will have its own
identity number.

Most Python objects have either data or functions or both associated with
them.

These are known as attributes. The name of the attribute follows the
name of the object, and the two are separated by a dot.

The two types of attributes are called data attributes and methods.

A data attribute is a value that is attached to a specific object.

In contrast, a method is a function that is attached to an object. In other
words, depending on the type of the object, different methods may be
available to you as a programmer. For example, you could have two strings.
They may have different values stored in them, but they nevertheless
support the same set of methods.

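Syntax: statistics.mean(data)

import statistics

# list of positive integer numbers
data1 = [1, 3, 4, 5, 7, 9, 2]

x = statistics.mean(data1)

# Printing the mean
print("Mean is :", x)

If x is instead a NumPy array:

x.mean() is a method (a function attached to the array)

x.shape is a data attribute (a value attached to the array)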


Video 1.1.3

Python modules are libraries of code, and you can import Python modules
using the import statement.

What is a namespace?

A namespace is a container of names shared by objects that typically go
together. Its intention is to prevent naming conflicts.

What exactly happens when you run the Python import statement?

Three things happen. The first thing that happens is Python creates a new
namespace for all the objects which are defined in the new module.

So, in abstract sense, this is our new namespace. That's the first step. The
second step that Python does is it executes the code of the module and it
runs it within this newly created namespace.

The third thing that happens is Python creates a name-- let's say np for
numpy-- and this name references this new namespace object.

You can find out which methods an object supports in two different ways. You
can pass the object itself to the dir function to get a directory of its
methods, or you can pass the object's type, for example dir(str).

We're then going to import the numpy module as np. Now, the math
module has a square root function, sqrt, but numpy also has a square root
function, sqrt. What is the difference between these two functions? Well,
let's try an example. If I type math.sqrt, I can ask Python to calculate the
value of the square root of 2. I can do the same exact thing using the
square root function from the numpy module. So far, it appears that these
two functions are identical, but actually these two functions are quite
separate and they exist in different namespaces. It turns out that the
numpy square root function can do things that the math square root
function doesn't know how to do.

1.1.4 Numbers and basic calculations

Python, in fact, provides three different numeric types.

These are called integers, floating point numbers, and complex numbers.
Python integers have unlimited precision.
That means your integer will never be too long to fit into Python's integer
type.
We can also raise a number to a power. This is accomplished by using two
asterisks, **.
Another operation is floor division, or integer division.
This is accomplished by using two slash signs, //.

It divides the numbers and then rounds the result down to the closest
integer that is less than or equal to the actual floating point answer. In
interactive mode, if I type an underscore, Python returns the value of the
latest operation.
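
A short sketch of these operators. The numbers are arbitrary illustrations, not values from the course.

```python
print(2 ** 10)    # exponentiation: 1024
print(15 / 7)     # true division always returns a float
print(15 // 7)    # floor division: 2
print(-15 // 7)   # rounds down, not toward zero: -3

big = 2 ** 200    # integers grow as large as needed
print(big)
```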

math.factorial

import math

def fact(n):
    # return n! using the standard library
    return math.factorial(n)

num = int(input("Enter the number:"))

f = fact(num)
print("Factorial of", num, "is", f)

1.1.5 Random Choice

I can’t figure it out, check later

Video 1.1.6

Expression is a combination of objects and operators that computes a


value.
Many expressions involve what is known as the boolean data type. Objects
of the boolean type have only two values. These are called True and False.
There are only three boolean operations, which are "or", "and", and "not".

There are a total of eight different comparison operations in Python.


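Although these are commonly used for numeric types, we can actually apply
them to other types as well.

==  tests whether two objects are identical in content
is  tests whether they are the same object
!=  tests whether two objects differ in content
2   is an integer
2.0 is a floating point number (2 == 2.0 is True)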
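
A minimal sketch of content comparison versus identity comparison, using two illustrative lists a and b that are not from the course.

```python
a = [2, 3]
b = [2, 3]

print(a == b)    # True: same content
print(a is b)    # False: two distinct objects
print(a != b)    # False: the contents do not differ
print(2 == 2.0)  # True: equal in value, although the types differ
print(type(2), type(2.0))
```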
1.2.1 Sequences

A sequence is a collection of objects ordered by their position.


In Python, there are three basic sequences, which are lists, tuples, and so-
called "range objects".

S[0:2], where 0 is the start location and 2 is the stop location

Python is going to return a slice to you, which consists of the objects in
locations 0 and 1, but it will not return to you the object at location 2.

1.2.2 Lists

Lists are mutable sequences of objects of any type. And they're typically
used to store homogeneous items. If we compare a string and a list, one
difference is that strings are sequences of individual characters, whereas
lists are sequences of any type of Python objects.

In Python, indexes start at zero.


numbers[-1] returns the last element of the list

numbers.append(10) adds 10 to the end of the list

To show the content of the list, type its name: numbers

Another operation we commonly would like to do is to concatenate two or
more lists together (list + list):
numbers + x

Reverse the content of the list in place: numbers.reverse()

Sort the content in place: names.sort()

sorted: we're actually asking Python to construct a completely new list.
It will construct this new list using the objects in the previous list in such a
way that the objects in the new list will appear in a sorted order.

sorted_names = sorted(names)
Finally, if you wanted to find out how many objects our list contains, we can
use a generic sequence function, len. So we can type len(names), and
Python tells us that our list contains four objects.
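
The list operations above can be sketched together as follows. The list contents are made up for illustration.

```python
numbers = [2, 4, 6, 8]
names = ["Zofia", "Alvaro", "Tanisha", "Sandrine"]

print(numbers[0], numbers[-1])  # first and last element

numbers.append(10)              # add to the end, in place
combined = numbers + [12, 14]   # concatenation builds a new list
numbers.reverse()               # reverses in place

sorted_names = sorted(names)    # new sorted list, names is unchanged
print(names[0])                 # still "Zofia"
names.sort()                    # sorts names itself, in place

print(len(names))               # 4
```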

Q:Consider a list x=[1,2,3]. Enter the code below for how you
would use the append method to add the number 4 to the end
of list x.
A: x.append(4)

1.2.3 Tuples

Tuples are immutable sequences typically used to store heterogeneous data.

The best way to view tuples is as a single object that consists of
several different parts.

Because tuples are sequences, the way you access different objects
within a tuple is by their position.
>>> T = (1,3,5,7)
>>> len(T)
4
>>> T + (9,11)
(1, 3, 5, 7, 9, 11)

1. How to pack tuples

>>> x = 35
>>> y = 78
>>> coordinate=(x,y)
>>> type(coordinate)
<class 'tuple'>

2. How to unpack a tuple

>>> coordinate
(35, 78)
>>> (x,y) = coordinate
>>> x
35
But what if you just have one object within your tuple? To construct a
tuple with just one object, we have to use the following syntax.

We start by saying c is equal to. We put our tuple parentheses. We put
in our number 2. And we add the comma. Without the comma, (2) is just the
integer 2 in parentheses.

>>> c=(2,3)
>>> type(c)
<class 'tuple'>
>>> c=(2,)
>>> type(c)
<class 'tuple'>
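
A small sketch of packing, unpacking, and the one-element tuple. The variable names and the coordinates list are illustrative assumptions.

```python
x, y = 35, 78
coordinate = (x, y)        # packing two values into one tuple
(a, b) = coordinate        # unpacking them again

single = (2,)              # the trailing comma makes this a tuple
not_a_tuple = (2)          # just the integer 2 in parentheses
print(type(single), type(not_a_tuple))

# unpacking also works directly in a for loop
coordinates = [(0, 0), (2, 4), (3, 9)]
for px, py in coordinates:
    print(px, py)
```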

1.2.4 Ranges

Ranges are immutable sequences of integers,


and they are commonly used in for loops.

>>> range(5)
range(0, 5)
>>> list(range(5))
[0, 1, 2, 3, 4]

Ranges require less memory than lists, so don't turn them into lists before
using them in a for loop.

1.2.5 Strings

Strings are immutable sequences of characters.

In Python, you can enclose strings in either single quotes, double quotes,
or triple quotes.

Operations with strings

>>> s = "Python"
>>> len(s)   # the len() function
6
>>> s[0]
'P'
>>> s[-1]
'n'

Slicing

>>> s[0:5]
'Pytho'

>>> s[-3]
'h'
>>> s[-3:]
'hon'

>>> "y" in s
True
The in operator tests membership.

Polymorphism means that what an operator does depends on the type
of objects it is being applied to.

I can also add two strings together. In that case, the operation is not
called addition, but concatenation.

>>> "hello" + "world"
'helloworld'
>>> s = "python"
>>> 3 * s
'pythonpythonpython'

dir(str): Python gives me a long list of different attributes that are
available for strings.
str.replace?: in IPython, this gives a short definition of the method

Because strings are immutable objects, Python doesn't actually modify
your string. Instead, it returns a new string to you.

The split method takes a string and breaks it down into substrings.
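
A minimal sketch showing that string methods return new objects. The example string is an assumption chosen for illustration.

```python
name = "Tina Fey"

replaced = name.replace("T", "t")  # returns a new string
print(replaced)                    # tina Fey
print(name)                        # Tina Fey, the original is unchanged

parts = name.split(" ")            # split returns a list of substrings
print(parts)                       # ['Tina', 'Fey']
```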

1.2.6 sets

Sets are unordered collections of distinct hashable objects. But what
does it mean for an object to be hashable?

In practice, what that means is you can use sets for immutable objects
like numbers and strings, but not for mutable objects like lists and
dictionaries.

One type of set is called just "a set". And the other type of set is called
"a frozen set". The difference between these two is that a frozen set is
not mutable once it has been created. In other words, it's immutable. In
contrast, your usual, normal set is mutable.

One of the key ideas about sets is that they cannot be indexed. So the
objects inside sets don't have locations.

Another key feature about sets is that the elements can never be
duplicated.

>>> ids = set(range(10))
>>> ids
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> males = set([1,3,5,7,8])
>>> females= ids- males
>>> type(females)
<class 'set'>
>>> females
{0, 2, 4, 6, 9}
>>> males
{1, 3, 5, 7, 8}
>>> everyone = males | females
>>> everyone
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> everyone & set([1,2,3])
{1, 2, 3}
>>> word="antidisestablishmentarianism"

>>> letter = set(word)


>>> len(letter)
12

>>> x = {1, 2, 3}
>>> y = {2, 3, 4}
>>> x.symmetric_difference(y)
{1, 4}

1.2.7 Dictionaries

Dictionaries are mappings from key objects to value objects.

Dictionaries consist of key:value pairs, where the keys must be
immutable. Dictionaries themselves are mutable, so once you create your
dictionary, you can modify its contents on the fly.

Another key point about dictionaries is that they are not sequences, and
therefore cannot be indexed by position. (Since Python 3.7, dictionaries do
remember the order in which keys were inserted.)

Uses
age = {"Tim":24,"Jenna":37,"Jim":3}
age["Jim"]
age["Tim"] += 3

The keys and values methods return the keys and the values of a dictionary.
The type of the returned object is what's called a "view object". View
objects do precisely what you might imagine that they do. They
provide a dynamic view of the keys or values in the dictionary.

A key point here is that as you update or modify your dictionary, the
views will also change correspondingly.
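
A short sketch of this dynamic behaviour, reusing the age dictionary from above. The added key "Tom" is an illustrative assumption.

```python
age = {"Tim": 24, "Jenna": 37, "Jim": 3}

names = age.keys()     # view object over the keys
ages = age.values()    # view object over the values

age["Tom"] = 50        # modify the dictionary after creating the views

print(names)           # the view now includes 'Tom'
print(ages)            # the view now includes 50
print("Tom" in names)  # True
```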

1.3.1 Dynamic Typing

What a type does is two things. First, it tells a program how to read the
bits stored in memory, for example in chunks of, let's say, 32 bits. Second,
it tells the computer what this sequence of bits represents.

Does it represent a floating-point number, or a character, or a piece of
music, or something else?

If you move data from one variable to another, if the types of these
variables do not match, you could potentially lose information.

Static typing means that type checking is performed during compile
time, whereas dynamic typing means that type checking is performed
at run time.

A key point to remember here is that variable names always link to
objects, never to other variables.

Remember, mutable objects, like lists and dictionaries, can be modified
at any point of program execution.

In contrast, immutable objects, like numbers and strings, cannot be
altered after they've been created in the program.

Remember, a variable cannot reference another variable. A variable
can only reference an object.

Each object in Python has a type, value, and an identity. Mutable
objects in Python can be identical in content and yet be actually
different objects.

Another way to create a copy of a list is to use the slicing syntax.

M = L[:]

1.3.2 Copies

The copy module can be used to create identical copies of objects.
There are two types of copies available.

A shallow copy constructs a new compound object and then inserts
references into it to the objects found in the original.

In contrast, a deep copy constructs a new compound object and then
recursively inserts copies into it of the objects found in the original.
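
A minimal sketch of the two kinds of copy, using a nested list chosen for illustration. The variable names are assumptions.

```python
import copy

original = [[1, 2], [3, 4]]

shallow = copy.copy(original)    # new outer list, same inner lists
deep = copy.deepcopy(original)   # new outer list and new inner lists

original[0].append(99)           # modify an inner list of the original

print(shallow[0])  # [1, 2, 99]  shares the inner list with original
print(deep[0])     # [1, 2]      fully independent
```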
1.3.3 Statements

Statements are used to compute values, assign values, and modify
attributes. The return statement is used to return values from a
function.

Another example is the import statement, which is used to import
modules.

Finally, the pass statement is used to do nothing in situations where we
need a placeholder for syntactical reasons.

Compound statements contain groups of other statements, and they
affect or control the execution of those other statements in some way.
A compound statement consists of one or more clauses, where a clause
consists of a header and a block or a suite of code.

The clause headers of a particular compound statement start with a
keyword, end with a colon, and are all at the same indentation level.

A block or a suite of code of each clause, however, must be indented to
indicate that it forms a group of statements that logically fall under
that header.

Remember, the absolute value of the difference between two numbers tells us
how far they are from one another.

1.3.4 For and While Loops

The for loop is a sequence iteration that assigns the items in a sequence
to a target one at a time and runs the block of code for each item.

Unless the loop is terminated early with the break statement, the block
of code is run as many times as there are items in the sequence.
However, remember that the key value pairs themselves don't follow
any particular ordering inside the dictionary.

The Python while statement is used for repeated execution of code as long
as a given expression is true.

For a While Loop you're testing some condition some number of times.
When you enter that loop you don't know how many times exactly
you'll be running through that loop. This is in contrast with For Loops
where when beginning the loop, you know exactly how many times you
would like to run through the block of code.
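
The contrast can be sketched as follows. The numbers are arbitrary and chosen only for illustration.

```python
# for loop: the block runs once per item, so the count is known up front
total = 0
for n in [1, 2, 3, 4]:
    total += n

# while loop: the block repeats until the condition fails,
# and the number of passes is not known when the loop starts
value = 100
halvings = 0
while value > 1:
    value = value / 2
    halvings += 1

print(total)     # 10
print(halvings)  # 7
```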

for bear in bears:
    if bears[bear] == ...:  # the value compared against is missing in these notes
        print("Hello, " + bear + " bear!")
    else:
        print("odd")

1.3.5 List comprehensions

A common task is to take an existing list, apply some operation to all of
the items on the list, and then create a new list that contains the results.

In Python, there is a construct for this task known as a "list
comprehension".

>>> numbers=range(10)
>>> squares=[]
>>> for number in numbers:
...     square = number**2
...     squares.append(square)
...
>>>
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> squares2=[number**2 for number in numbers]
>>> squares2
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

sum(x for x in range(1,10) if x % 2)  # sum of the odd numbers from 1 to 9, which is 25


1.3.6

I'm going to do a re-assignment to line by typing "line = line.rstrip()". I
now have line.rstrip().split().

Inside the split, as an argument, I have to provide the character that I
want to use for splitting the line. Note that the string split method returns
not a string but a list.

It has split every line wherever there is whitespace, and it returns a
list. When writing a file, we need a second argument, which tells
Python that we would like to create a file object for writing, not for
reading.

We indicate this by providing that second argument as a string, and
the content of the string is simply "w". What this does is it creates a file
object for writing.
F.write()
F.close()
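
A minimal sketch of writing a file and reading it back line by line. The file name, its location in a temporary directory, and the text written are all assumptions for illustration. The with statement is used here instead of an explicit close() call.

```python
import os
import tempfile

# hypothetical file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# "w" creates a file object for writing
with open(path, "w") as f:
    f.write("first line here\n")
    f.write("second line here\n")

rows = []
with open(path) as f:               # the default mode is reading
    for line in f:
        line = line.rstrip()        # drop the trailing newline
        rows.append(line.split(" "))  # split returns a list

print(rows)
```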

1.3.7 Functions

Functions are devices for grouping statements so that they can be
easily run more than once in a program.

Functions maximize code reuse and minimize code redundancy.
Functions enable dividing larger tasks into smaller chunks, an approach
that is called procedural decomposition.

Functions are written using the def statement. You can send the result
object back to the caller using the return statement.
>>> def add(a, b):
...     mysum = a + b
...     return mysum
...
>>> add(12,15)
27

To modify the value of a global variable from inside a function, you can
use the global statement.
Arguments to Python functions are matched by position.
>>> def add_and_sub(a, b):
...     mysum = a + b
...     mydiff = a - b
...     return (mysum, mydiff)
...
>>> add_and_sub(20,15)
(35, 5)

A function is not executed until the given function is called using the
function name followed by parentheses syntax.

The def statement creates an object and assigns it to a name. This
means that we can later in the code reassign the function object to
another name.

>>> newadd = add

>>> def modify(mylist):
...     mylist[0] += 10
...
>>> L=[1,2,4,7]
>>> modify(L)
>>> L
[11, 2, 4, 7]

1.3.8

>>> intersect([1,2,3,4,5],[3,4,5,6,7])
[3, 4, 5]
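
The notes show only the call to intersect, not its definition. One possible sketch that reproduces the output above is given here; the body is an assumption, not necessarily the version used in the course.

```python
def intersect(s1, s2):
    """Return the items of s1 that also appear in s2, in the order of s1."""
    result = []
    for item in s1:
        if item in s2:
            result.append(item)
    return result

print(intersect([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))  # [3, 4, 5]
```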

# Program that creates passwords
import random

def password(length):
    pw = str()
    characters = "abcdefghijklmnopqrstuvwxyz" + "1234567890"
    for i in range(length):
        # pick one character at random and append it
        pw = pw + random.choice(characters)
    return pw

1.3.9 Common mistake

Whenever you're accessing objects in a sequence, make sure you know
how long that sequence is.
Remember in a dictionary, a given key object is always coupled with its
value object, but the key value pairs themselves can appear in any
order inside the dictionary.

So, the lesson here is, make sure you know the type of the object you
are working with, and you know what are the methods that the object
supports.

Therefore, whenever accessing dictionaries, make sure you know the
type of your key objects.

The fundamental problem here is strings are immutable objects.
Therefore their content cannot be modified.

Therefore, make sure that you always know the type of your objects.

2.1.1 Scope Rules

Python looks up names in the order LEGB: L stands for "local," E stands for
"enclosing function," G for "global," and B stands for "built-in."

In other words, local is the current function you're in.

Enclosing function is the function in which the current function was defined,
if any.

Global refers to the module in which the function was defined.

And built-in refers to Python's built-in namespace.

If a name is not found in any of these scopes, Python raises a NameError.
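
A small sketch of the lookup order. All function and variable names here are hypothetical and chosen for illustration.

```python
x = "global x"

def outer():
    x = "enclosing x"
    def inner():
        return x           # not local to inner, found in the enclosing function
    return inner()

def use_global():
    return x               # not local, no enclosing function, found in the module

def use_builtin():
    return len("abc")      # len is found in the built-in namespace

def missing():
    return undefined_name  # defined in no scope, so this raises NameError

print(outer())        # enclosing x
print(use_global())   # global x
print(use_builtin())  # 3
```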

Video 2.1.2: Classes and Object-Oriented Programming

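Inheritance means that you can define a new object type, a new class,
that inherits properties from an existing object type.

For example, a new class based on the built-in list type inherits list
methods such as list.sort():
class name(list):

So another way to state what I just said is that the class statement doesn't
create any instances of the class.

min(list) returns the smallest item in the list
list.remove() removes the first occurrence of a given item
dir(x) lists the methods available for x
remove_min is a new method defined in the class, in addition to the inherited
list methods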
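
A minimal sketch of such a class. The class name MyList, the remove_max method, and the sample values are assumptions used for illustration.

```python
class MyList(list):
    """A list that can also remove its smallest or largest item."""

    def remove_min(self):
        self.remove(min(self))

    def remove_max(self):
        self.remove(max(self))

# the class statement creates no instances; this line creates one
x = MyList([10, 1, 5, 7, 3])
x.remove_min()
x.remove_max()
print(x)  # [5, 7, 3]
```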

Video 2.2.1: Introduction to NumPy Arrays

NumPy arrays are n-dimensional array objects, and they are a core
component of scientific and numerical computation in Python.

NumPy arrays are an additional data type provided by NumPy, and they are
used for representing vectors and matrices.

Unlike dynamically growing Python lists, NumPy arrays have a size that is
fixed when they are constructed.

Elements of NumPy arrays are also all of the same data type, leading to
more efficient and simpler code than using Python's standard data types.

By default, the elements are floating point numbers.

>>> import numpy as np
>>> zero_vector = np.zeros(5)
>>> zero_matrix = np.zeros((5,3))
>>> zero_vector
array([0., 0., 0., 0., 0.])
>>> zero_matrix
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

>>> x = np.array([1,2,3])
>>> y = np.array([2,4,6])
>>> [[1,3],[5,9]]
[[1, 3], [5, 9]]
>>> np.array([[1, 3], [5, 9]])
array([[1, 3],
       [5, 9]])
>>> A = np.array([[1, 3], [5, 9]])
>>> A.transpose()
array([[1, 5],
       [3, 9]])

2.2.2 Slicing Numpy arrays

With one-dimensional arrays, we can index a given element by its position,
keeping in mind that indices start at 0.

With two-dimensional arrays, the first index specifies the row of the array
and the second index specifies the column of the array.


2.2.3 Indexing numpy arrays

ind = [list of index positions]

array_name[ind] returns the elements at those positions

NumPy arrays can also be indexed using logical indices: a Boolean (logical)
array can be used as an index, for example array_name[array_name > 2].

When you slice an array using the colon operator, you get a view of the
object.

This means that if you modify it, the original array will also be modified.

This is in contrast with what happens when you index an array, in which case
what is returned to you is a copy of the original data.

In summary, for all cases of indexed arrays, what is returned is a copy of
the original data, not a view as one gets for slices.

2.2.4 Building and Examining NumPy Arrays


np.linspace(starting point, ending point, number of points I want
to have in my array)
np.logspace(log of the starting point, log of the ending point,
number of elements in our array)

array_name.shape
array_name.size
np.random.random(10)
np.any(x < 0.9)
np.all(condition)

x % i == 0 tests whether x is divisible by i, that is, whether x leaves no
remainder when divided by i. If this is not true for any value of i strictly
between 1 and x, then x must be prime!
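
A minimal sketch of this primality test. It uses Python's built-in all rather than np.all, and the function name is_prime is an assumption.

```python
def is_prime(x):
    """Return True if no i strictly between 1 and x divides x."""
    if x < 2:
        return False
    return all(x % i != 0 for i in range(2, x))

primes = [n for n in range(2, 30) if is_prime(n)]
print(primes)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```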

2.3.1 introduction to matplotlib and Pyplot

Matplotlib is a Python plotting library that produces publication-quality
figures.

It can be used both in Python scripts and when using Python's interactive
mode.

Pyplot is a collection of functions that make matplotlib work like Matlab,
which you may be familiar with.

Pyplot is especially useful for interactive work, for example, when you'd
like to explore a dataset or visually examine your simulation results.

import matplotlib.pyplot as plt

plt.plot([list])

plt.show() displays the plot when working in the terminal

In short, a keyword argument is an argument which is supplied to the
function by explicitly naming each parameter and specifying its value.

plt.plot(x, y, "bo-", linewidth=2, markersize=12)

The first letter is the color, so b is blue and g is green
The second letter is the marker shape: o is a circle, s is a square
The third character, -, asks for a solid line connecting the points

import matplotlib.patches as mpatches
red_patch = mpatches.Patch(color='red', label='The red data')
ax.legend(handles=[red_patch])  # ax is an existing Axes object

2.3.2 customizing your plots

You can use LaTeX syntax in labels and titles

loc = location of the legend

The working directory is the directory where you have launched your Python.

2.3.3 Plotting using Logarithmic axes

In some plots, it's helpful to have one or both axes be logarithmic.

This means that for any given point to be plotted, its x or y-coordinate,
or both, are transformed using the log function.

plt.semilogx()
plt.semilogy()
plt.loglog()

semilogx() plots the x-axis on a log scale and the y-axis in the original scale;
semilogy() plots the y-axis on a log scale and the x-axis in the original scale;
loglog() plots both x and y on logarithmic scales.

So the lesson here is that functions of the form y = x^alpha show up as
straight lines on a loglog() plot, because log(y) = alpha * log(x).

The exponent alpha is given by the slope of the line.
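
A numerical sketch of why the slope on log-log axes equals the exponent. The value of alpha and the sample points are arbitrary assumptions, and no plotting is done here.

```python
import math

alpha = 2.5                      # hypothetical exponent
xs = [1.0, 10.0, 100.0, 1000.0]
ys = [x ** alpha for x in xs]    # y = x^alpha

# transform both coordinates, as a loglog() plot would
log_x = [math.log10(x) for x in xs]
log_y = [math.log10(y) for y in ys]

# slope between consecutive points in log-log space
slopes = [
    (log_y[i + 1] - log_y[i]) / (log_x[i + 1] - log_x[i])
    for i in range(len(xs) - 1)
]
print(slopes)  # each slope is approximately alpha
```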


2.3.4 Generating histograms

x = np.random.normal(size=1000)
plt.hist(x, density=True, bins=np.linspace(-5, 5, 21));

(Older versions of matplotlib used normed=True instead of density=True.)

To have 20 bins you need 20 + 1 = 21 bin edges


plt.subplot(number of rows, number of columns, plot number)
The first integer describes the number of subplot rows, the
second integer describes the number of subplot columns, and the
third integer describes the location index of the subplot to be
created, where indices are laid out along the rows and columns in
the same order as reading Latin characters on a page.

np.random.gamma(2, 3, 100000)
cumulative=True, histtype="step"
plt.figure()

import matplotlib.pyplot as plt # importing matplotlib
import numpy as np # importing numpy
# see plot in Jupyter notebook (the magic must be on a line by itself)
%matplotlib inline
x=np.arange(0,10,0.5) # define x
y1=2*x+3 # define y1
y2=3*x # define y2
plt.figure(1,figsize=(12,12)) # create a figure object
plt.subplot(331) # divide the figure 3*3, 1st item
plt.plot(x,y1,'r-') # Functional plot
plt.title('Line graph') # define title of the plot
plt.subplot(332) # divide the figure 3*3, 2nd item
plt.plot([1,3,2,7],[10,20,30,40]) # Functional plot
plt.title('Two-dimensional data') # define title of the plot
plt.subplot(333) # divide the figure 3*3, 3rd item
plt.plot([10,20,30,40],[2,6,8,12], label='price') # set label name
plt.axis([15,35,3,10]) # axis scale - x:15-35, y:3-10
plt.legend() # show legends on plot
plt.title('Axis scaled graph') # define title of the plot
plt.subplot(334) # divide the figure 3*3, 4th item
plt.plot(x,y1,'r-',x,y2,'b--') # y1 - red line, y2 - blue line
plt.title('Multi line plot') # define title of the plot
plt.subplot(335) # divide the figure 3*3, 5th item
x=np.random.normal(5,3,1000) # normal distribution - mean: 5, standard deviation: 3, number of data: 1000
y=np.random.normal(5,3,1000) # normal distribution - mean: 5, standard deviation: 3, number of data: 1000
plt.scatter(x,y,c='k') # Functional plot
plt.title('Scatter plot')
plt.subplot(336) # divide the figure 3*3, 6th item
player=('Ronaldo','Messi','Son')
goal=[51,48,25]
plt.bar(player,goal,align='center',color='red',width=0.5)
plt.title('Bar plot') # define title of the plot
plt.show() # show plot

import numpy as np # importing numpy
import matplotlib.pyplot as plt # importing matplotlib
from mpl_toolkits.mplot3d import Axes3D # importing Axes3D, 3d plot
fig = plt.figure() # create a figure object
ax = fig.add_subplot(projection='3d') # defining 3d figure ax (Axes3D(fig) no longer attaches itself in recent matplotlib)
X = np.arange(-4, 4, 0.25) # defining X
Y = np.arange(-4, 4, 0.25) # defining Y
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X ** 2 + Y ** 2)
Z = np.sin(R)
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=plt.cm.hot)#plot
ax.contourf(X, Y, Z, zdir='z', offset=-2, cmap=plt.cm.hot)
ax.set_zlim(-2, 2) # set z-axes limit
plt.show()

import matplotlib.pyplot as plt # importing matplotlib
# see plot in Jupyter notebook
%matplotlib inline
x = [0, 2, 4, 6] # define x
y = [1, 3, 4, 8] # define y
plt.plot(x,y) # functional plot
plt.xlabel('x values') # define x label
plt.ylabel('y values') # define y label
plt.title('plotted x and y values') # define title
plt.legend(['line 1']) # define legend
# save the figure
plt.savefig('plot.png', dpi=300, bbox_inches='tight')
plt.savefig(r'E:\Bioinformatics/foo.png') # or save to an explicit path
plt.show()

import matplotlib.pyplot as plt # importing matplotlib
import numpy as np # importing numpy
# see plot in Jupyter notebook
%matplotlib inline
x = np.array([0, 2, 4, 6]) # define x
fig = plt.figure() # create a figure object
ax = fig.add_axes([0,0,1,1]) # axes with (L,B,W,H) value
ax.plot(x, x**2, label = 'X2', color='red') # functional plot
ax.plot(x, x**3, label = 'X3', color='black') # functional plot
ax.set_xlim([0, 5]) # set x-axes limit
ax.set_ylim([0,100]) # set y-axes limit
ax.legend() # show legend
plt.show() # show plot
2.4.1 simulating randomness

We often use randomness when modeling complicated systems to abstract
away those aspects of the phenomenon for which we do not have useful
simple models.

import random
random.choice([list])
random.choice(range(1,7))

Choosing a die at random, then rolling it
random.choice(random.choice([range(1,7), range(1,9), range(1,11)]))
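
The nested call can be unpacked into two steps, as in this sketch. The seed value and the variable names are assumptions added for illustration.

```python
import random

random.seed(1)  # hypothetical seed, only to make runs repeatable

# three dice with 6, 8, and 10 faces
dice = [range(1, 7), range(1, 9), range(1, 11)]

die = random.choice(dice)   # first pick one of the dice
roll = random.choice(die)   # then roll the chosen die

print(len(die), roll)
```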

2.4.2 examples involving randomness

Our first example is to roll the die 100 times and plot a histogram of the
outcomes, meaning a histogram that shows how frequently the numbers from 1
to 6 appeared in the 100 samples.

import random
import numpy as np
import matplotlib.pyplot as plt

rolls = []
for k in range(100):
    rolls.append(random.choice([1,2,3,4,5,6]))

plt.hist(rolls, bins=np.linspace(0.5, 6.5, 7))

ys = []
for rep in range(100):
    y = 0
    for k in range(10):
        x = random.choice([1,2,3,4,5,6])
        y = y + x
    ys.append(y)

The central limit theorem states that if you have a population with mean μ and
standard deviation σ and take sufficiently large random samples from the
population with replacement, then the distribution of the sample means will be
approximately normally distributed.

2.4.3 numpy random module

import numpy as np
np.random.random(size of the 1d array)
np.random.random((number of rows, number of columns)), with the shape given
as a tuple

np.random.normal(0, 1): the first argument is the mean of the distribution,
in this case 0.

And the second argument is the standard deviation, which is equal to 1.

To generate several numbers from the same distribution, we can specify the
length of the 1d array as the third argument.

Finally, we can use the same function to generate 2d, or even 3d arrays of
random numbers.

In that case, we need to insert another pair of parentheses because the
dimensions of the array will be added as a tuple.

The only problem is that we don't know how to generate an array of random
integers in NumPy.

np.random.randint(low, high, size), where high is excluded

X = np.random.randint(1, 7, (100, 10))
X.shape

np.sum(X)
np.sum(X, axis=1)

Y = np.sum(X, axis=1)
plt.hist(Y);

And we can see that the histogram looks smoother.


As we increase

2.4.4 measuring time

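import time
start_time = time.perf_counter()  # time.clock() was removed in Python 3.8
end_time = time.perf_counter()
print(end_time - start_time)

time_1 / time_2
Dividing the two elapsed times tells you how many times faster the second
approach is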
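
A minimal sketch of timing two approaches and comparing them. It uses only the standard library; the two approaches and the sample size n are assumptions for illustration, and no particular speed-up is implied.

```python
import random
import time

faces = [1, 2, 3, 4, 5, 6]
n = 20_000  # hypothetical number of rolls

# approach 1: an explicit loop, one roll at a time
start = time.perf_counter()
total_loop = 0
for _ in range(n):
    total_loop += random.choice(faces)
time_loop = time.perf_counter() - start

# approach 2: draw all rolls in a single call
start = time.perf_counter()
total_bulk = sum(random.choices(faces, k=n))
time_bulk = time.perf_counter() - start

print(time_loop, time_bulk)
if time_bulk > 0:
    print("ratio:", time_loop / time_bulk)
```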

2.4.5 random walks

x(t=k) = x_0 + delta_x(t=1) + ... + delta_x(t=k)

delta_X = np.random.normal(0, 1, (2, 5))

plt.plot(delta_X[0], delta_X[1], "go")

# cumulative sum
X = np.cumsum(delta_X, axis=1)

# prepend the starting point X_0 to the walk
X_0 = np.array([[0], [0]])
X = np.concatenate((X_0, np.cumsum(delta_X, axis=1)), axis=1)

plt.plot(X[0], X[1], "ro-")
plt.savefig("name.pdf")

Example of a cumulative sum: np.cumsum([2, 4, 6, 8]) gives [2, 6, 12, 20]
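
The same construction can be sketched with the standard library only, using itertools.accumulate in place of np.cumsum. The seed, the number of steps, and the variable names are assumptions.

```python
import random
from itertools import accumulate

random.seed(0)  # hypothetical seed for repeatability
n_steps = 5

# independent normal displacements in x and y
dx = [random.gauss(0, 1) for _ in range(n_steps)]
dy = [random.gauss(0, 1) for _ in range(n_steps)]

# positions: the origin followed by the cumulative sums of the displacements
xs = [0.0] + list(accumulate(dx))
ys = [0.0] + list(accumulate(dy))

print(list(accumulate([2, 4, 6, 8])))  # [2, 6, 12, 20]
print(list(zip(xs, ys)))
```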
Topics: Array, Linked List, Stack, Queue, Binary Tree, Binary Search Tree,
Heap, Hashing, Graph, Matrix, Misc, Advanced Data Structure

3.1.1 DNA translation

Adenine
Cytosine
Guanine
Thymine

The so-called central dogma of molecular biology describes the flow of
genetic information in a biological system.

Instructions in the DNA are first transcribed into RNA, and the RNA is
then translated into proteins.

3.1.2 NCBI
3.1.3 Import DNA data into Python

inputfile = "dna.txt"

f = open(inputfile, "r")

seq = f.read()
seq
print(seq)

To remove the newline characters (\n)

seq = seq.replace("\n", "")

Remove other visible or non-visible extra characters, such as carriage
returns (\r)

seq = seq.replace("\r", "")

3.1.4 Translating the DNA sequence

table = {
'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W',
}

def translate(seq):  # function name assumed; the notes give only the body
    protein = ""
    if len(seq) % 3 == 0:
        for i in range(0, len(seq), 3):
            codon = seq[i : i + 3]
            protein += table[codon]
    return protein
Slicing a string:
seq[0:3]
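
To see the codon loop in action without the full 64-entry table, here is a small sketch. The names mini_table and translate_sketch are hypothetical, and the table is only a five-codon subset chosen for the demo.

```python
# A small, hypothetical subset of the codon table, just enough for the demo.
mini_table = {
    "ATG": "M", "GCC": "A", "AAA": "K", "TGG": "W", "TAA": "_",
}

def translate_sketch(seq, table):
    """Translate a DNA string into amino-acid letters, three bases at a time.

    Returns an empty string when the length is not a multiple of three.
    """
    protein = ""
    if len(seq) % 3 == 0:
        for i in range(0, len(seq), 3):
            codon = seq[i:i + 3]      # slice out one codon
            protein += table[codon]   # look up its amino acid
    return protein

print(translate_sketch("ATGGCCAAATGGTAA", mini_table))  # MAKW_
print(repr(translate_sketch("ATGGC", mini_table)))      # ''
```

A codon missing from the table would raise a KeyError, which is one reason the sequence has to be cleaned of newline characters before translation.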

3.1.5 comparing your translation


How to use a with statement to read a full file

def read_seq(inputfile):
    """Reads and returns the input sequence with special characters removed."""
    with open(inputfile, "r") as f:
        seq = f.read()
    seq = seq.replace("\n", "")
    seq = seq.replace("\r", "")
    return seq

dna = read_seq("filename")
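
The reading function can be tried on a throwaway file. In this sketch the function name read_seq_sketch, the temporary file, and its contents are all hypothetical; the point is only that newline and carriage-return characters are gone from the result.

```python
import os
import tempfile

def read_seq_sketch(inputfile):
    """Read a sequence file and strip newline and carriage-return characters."""
    with open(inputfile, "r") as f:
        seq = f.read()
    seq = seq.replace("\n", "")
    seq = seq.replace("\r", "")
    return seq

# Write a small made-up sequence with mixed line endings to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, newline="") as tmp:
    tmp.write("ATGGCC\r\nAAATGG\nTAA\n")
    path = tmp.name

dna = read_seq_sketch(path)
os.remove(path)  # clean up the temporary file

print(dna)           # ATGGCCAAATGGTAA
print(len(dna) % 3)  # 0
```

The with statement closes the file automatically once the block ends, even if reading fails.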

3.2.1 language processing


Project Gutenberg
3.2.2 counting words

Before we start coding the function itself
it's helpful to create a test string.
I'm going to call that text, and I'll just
copy paste a short string that I wrote previously.
The purpose behind having a test string like this
is to be able to test our function as we make progress with it.
Since this function will keep track of all unique words
and count their frequencies, I'm going to call this function count_words.
It's a function so we'll need the def statement
and the input argument is going to be just simply text in this case.
We agreed that using a dictionary would be
a good solution for this specific task.
I'm going to create an empty dictionary called word_counts.
The next step for us is to break the text down into words.
To accomplish that we'll be using the split method and the character
we want to use for splitting is just an empty space.
This will give us a list that we can loop over so this calls for a for loop.
Because the items in the sequence or list are words,
I'm going to be using word as my loop variable.
So for word in text.split.
And now we're ready to loop.
There are two possible things that can happen as we loop over our text.
We can either come across a word that we've seen before, in which case,
we have to increase the counter associated with that word by one.
In case we see a word we haven't seen before,
we have to establish that entry in the dictionary
and initialize the counter to be equal to 1.
So let's divide this into two subtasks.
We have the case where we have a known word
and the second case is where we have an unknown word.
Let's deal with the known word case first.
What we'd like to test for is, whether this word
has appeared in this dictionary before.
This calls for an if statement.
If word in word_counts.
So now we know that we have seen this word before.
What we need to do is we need to access our dictionary word_counts
and we want to increase the counter that's
associated with this specific word.
We want to increase that by 1.
And I'm using the shorthand operation here.
This is the known word case.
The other situation is where we come across a previously unseen word.
We can use the else statement here.
In this case we'd still like to access the word_counts dictionary.
But in this case we have to set that counter to be equal to 1,
because this is the first instance of the word that we're seeing.
That deals with the second case, the unknown word.
At this point, we are ready to return the dictionary
to whoever called the function.
We need one more statement in our code, in our function,
which is the return statement.
So we need to return word_counts.
Before we move on, let's make sure to add a docstring in our function.
Now that we have defined the function, let's run it.
And the function has now been defined.
We also want to make sure we have our test text defined
and now we can try running the function and see what happens.
And as expected Python returns a dictionary to us
where the keys are words, unique words,
and the values associated with these keys
are the number of times each word occurs in the text.
Having some test data handy is very useful.
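
The steps described so far can be sketched as follows. The function name, the dictionary name, and the split on a single space come from the walkthrough; the test string is a hypothetical stand-in, since the original string is not shown in these notes.

```python
def count_words(text):
    """Count the number of times each word occurs in text (str).

    Return a dictionary where keys are unique words and values are counts.
    """
    word_counts = {}
    for word in text.split(" "):
        if word in word_counts:
            # known word: increase its counter by one
            word_counts[word] += 1
        else:
            # unknown word: create the entry and start counting at 1
            word_counts[word] = 1
    return word_counts

text = "This is a test. This test is short."  # hypothetical test string
print(count_words(text))
# {'This': 2, 'is': 2, 'a': 1, 'test.': 1, 'test': 1, 'short.': 1}
```

Note how 'test.' and 'test' are counted as different words at this stage.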
Looking at the dictionary, one obvious shortcoming of our current routine
is that it includes punctuation like periods, or full stops,
as part of the word.
This would lead to an inflation of the word count
because, for example, a word that appears in the middle of the sentence
will be counted separately from the same word
if it appears at the end of a sentence and is immediately
followed by a period.
Another problem is that if the word appears at the beginning of a sentence,
its first letter is capitalized, again giving rise
to double counting of some words.
To address these issues, we're first going to turn the text into lower case.
This means that any word, whether capitalized or not,
will count as one word.
Addressing punctuation is a bit more complex.
Our strategy is to first specify all the punctuation marks
that we'd like to skip, and then loop over that container
and replace every occurrence of a punctuation mark with an empty string.
As the first task we need to turn the text into lower case.
We can do that using the lower method and then
we just have to recapture that new text.
So we're typing text = text.lower().
The second thing we need to do is, we need
to define the characters that we will be skipping
as we're looping over the text.
We'll construct a list for this purpose, and we
can include a few of the most common punctuation marks
here that we'd like to skip.
For example, we can include period, comma, semicolon,
colon, single quote, and we can also include double quote.
In this case we have to use single quotes for Python's own string.
The reason we cannot use double quotes for the last string is because double
quotes are also used to begin and end a string.
This is why we'll be using single quotes, which
surround the character that we really want to represent,
which is a double quote.
The next step for us is to loop over all of the skip characters
and replace them with an empty string.
This calls for a for loop.
We'll be taking our text and we will replace ch, the skip character
in question, with an empty string.
We then also want to capture the modified string that the replace method
returns, and this part is done.
Finally, to complete this modification to our function,
we want to make sure to update the docstring
to reflect the change we just made.
We'll just say skip punctuation.
Let's then run the definition of our function.
And now we can try running the function using our test
string that we had defined before.
In this case, looking at the output, it's a dictionary as before,
but you'll notice that all of the keys are lowercase, which is what we wanted.
We also got rid of the punctuation marks that we included in the skips list.
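
A sketch of the modified function, following the steps just described. The names skips and ch come from the walkthrough; the test string is hypothetical.

```python
def count_words(text):
    """Count the number of times each word occurs in text (str).

    Return a dictionary where keys are unique words and values are counts.
    Skip punctuation.
    """
    text = text.lower()
    # The double quote is written inside single quotes so Python can parse it.
    skips = [".", ",", ";", ":", "'", '"']
    for ch in skips:
        text = text.replace(ch, "")

    word_counts = {}
    for word in text.split(" "):
        if word in word_counts:
            word_counts[word] += 1
        else:
            word_counts[word] = 1
    return word_counts

text = 'This is a test. "This" test is short; it is a test.'  # hypothetical
print(count_words(text))
# {'this': 2, 'is': 3, 'a': 2, 'test': 3, 'short': 1, 'it': 1}
```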
It's useful to be able to write your own counting routine like we just did.
However, counting the frequency of objects
is such a common operation that Python provides
what is known as a Counter tool to support rapid tallies.
We first need to import it from the collections module, which
provides many additional high performance data types.
The object returned by Counter behaves much like a dictionary,
although strictly speaking it's a subclass of the Python dictionary
object.
Let's modify our function to use the Counter object.
In this case, I would like to retain both my original function and the one
that uses the Counter object.
Our first step is going to be to import that,
so from collections import Counter.
To start the function I'm going to take my previous function
and I'm just going to copy paste it here underneath.
This is the code that I'll be working with.
Because this is a different function because it's
using the Counter object from collections,
I'm going to call this something else.
I'm going to add the word fast at the end.
The counting takes place in the last few lines of the code.
We don't change the first part where we simply convert the text to lowercase,
and we also want to keep the part that skips over punctuation characters.
The only thing that will be changed is the looping
over individual words in our text string.
The last several lines of the code can all simply
be replaced with a single expression.
We will define word_counts on this line, which is the first time we're using it.
The input to our Counter object will be the text
that we would like to use for counting.
We'll take our text, we'll split it to get the words, and we're done.
Before we run the function let's first do the import.
We can now run the definition of the function
and then we can test it on our test dataset.
In this case, again as expected, the function
returns a Counter object which looks essentially identical to the dictionary
object.
Let's see if the objects returned by these two different functions
are actually the same.
We'll first call the count_words function using our text.
And we want to ask Python if that's equal to the object which is returned
by count_words_fast on that same input.
In this case, the answer is true, therefore
we know that these two different implementations of the same function
return identical objects.
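
Both versions side by side, as a sketch of the comparison described above. The function names follow the walkthrough; the test string is hypothetical.

```python
from collections import Counter

def count_words(text):
    """Count word occurrences with a plain dictionary. Skip punctuation."""
    text = text.lower()
    skips = [".", ",", ";", ":", "'", '"']
    for ch in skips:
        text = text.replace(ch, "")
    word_counts = {}
    for word in text.split(" "):
        if word in word_counts:
            word_counts[word] += 1
        else:
            word_counts[word] = 1
    return word_counts

def count_words_fast(text):
    """Count word occurrences with collections.Counter. Skip punctuation."""
    text = text.lower()
    skips = [".", ",", ";", ":", "'", '"']
    for ch in skips:
        text = text.replace(ch, "")
    # Counter does the tallying that the explicit loop did above.
    word_counts = Counter(text.split(" "))
    return word_counts

text = "This is a test. This test is short."  # hypothetical test string
print(count_words(text) == count_words_fast(text))  # True
```

Because Counter is a subclass of dict, the equality check compares keys and counts in the usual dictionary way.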

3.2.4 computing word frequency statistics


def word_stats(word_counts):
    num_unique = len(word_counts)
    counts = word_counts.values()
    return (num_unique, counts)
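
A short usage sketch for word_stats. The input dictionary is hypothetical; summing the returned counts gives the total number of words.

```python
def word_stats(word_counts):
    """Return the number of unique words and the word frequencies."""
    num_unique = len(word_counts)
    counts = word_counts.values()
    return (num_unique, counts)

word_counts = {"this": 2, "is": 3, "a": 2, "test": 3}  # hypothetical counts
num_unique, counts = word_stats(word_counts)

print(num_unique)   # 4
print(sum(counts))  # 10
```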
3.2.5 reading multiple files
read_book
3.2.6 plotting book statistics

Pandas
>>> pd.Series([1,2,3],index = ["q","w","e"])
q    1
w    2
e    3
dtype: int64
>>> x= pd.Series([1,2,3],index = ["q","w","e"])
>>> x["w"]
2
>>> age = {"Tim":29, "Jim":31, "Pam":27, "Sam":35}
>>> x = pd.Series(age)
>>> x
Tim    29
Jim    31
Pam    27
Sam    35
dtype: int64
>>> #dataframe
>>> data = {"name" : ["Tim", "Jim", "Pam", "Sam"],
... "age" : [29, 31, 27,35],
... "ZIP" : ["02115","02130","67700","00100"]}
>>> x = pd.DataFrame(data, columns = ["name","age","ZIP"])
>>> x
  name  age    ZIP
0  Tim   29  02115
1  Jim   31  02130
2  Pam   27  67700
3  Sam   35  00100
>>> x.name
0    Tim
1    Jim
2    Pam
3    Sam
Name: name, dtype: object
>>> x= pd.Series([1,2,3,4],index = ["q","w","e","r"])
>>> x
q    1
w    2
e    3
r    4
dtype: int64
>>> x.index
Index(['q', 'w', 'e', 'r'], dtype='object')
>>> sorted(x.index)
['e', 'q', 'r', 'w']
>>> x.reindex(sorted(x.index))
e    3
q    1
r    4
w    2
dtype: int64
>>> x= pd.Series([1,2,3,4],index = ["q","w","e","r"])
>>> y= pd.Series([5,6,7,8],index = ["q","w","t","w"])
>>> x + y
e     NaN
q     6.0
r     NaN
t     NaN
w     8.0
w    10.0
dtype: float64
>>>
