harvard python for research
The Fibonacci function, as you may have guessed, computes the first
terms of the Fibonacci sequence.
The code underneath the function calls the function 10,000 times, asking
Python to compute the first 10,000 numbers in the Fibonacci sequence.
x = "Hello, world!"
y = x[5:]
Explanation
This slice returns all characters at position 5 or later, so y is ", world!".
The interactive mode is meant for experimenting with your code one line or
one expression at a time.
In contrast, the standard mode is ideal for running your programs from
start to finish.
Video 1.1.2
The value of some objects can change in the course of program execution.
Objects whose value can change are said to be mutable objects, whereas
objects whose value is unchangeable after they've been created are
called immutable.
These characteristics are called object type, object value, and object
identity.
Object value is the data value that is contained by the object. This could
be a specific number, for example.
Finally, you can think of object identity as an identity number for the
object. Each distinct object in the computer's memory will have its own
identity number.
Most Python objects have either data or functions or both associated with
them.
These are known as attributes. The name of the attribute follows the
name of the object, and the two are separated by a dot.
The two types of attributes are called data attributes and methods.
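As a small illustration of the two kinds of attributes, the sketch below uses a built-in complex number. The variable names are chosen only for this example.

```python
# A complex number carries both kinds of attributes.
z = 3 + 4j

# Data attributes: values stored on the object, accessed without parentheses.
real_part = z.real
imag_part = z.imag

# Method: a function attached to the object, called with parentheses.
conjugate = z.conjugate()

print(real_part, imag_part, conjugate)
```

Leaving out the parentheses on a method, as in z.conjugate, gives back the method object itself rather than calling it.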
Syntax: statistics.mean(data), where data is a sequence of numbers
import statistics
# list of positive integer numbers
data1 = [1, 3, 4, 5, 7, 9, 2]
x = statistics.mean(data1)
What is a namespace?
What exactly happens when you run the Python import statement?
Three things happen. The first thing that happens is Python creates a new
namespace for all the objects which are defined in the new module.
So, in an abstract sense, this is our new namespace. That's the first step. The
second thing Python does is execute the code of the module, running it
within this newly created namespace.
The third thing that happens is Python creates a name-- let's say np for
numpy-- and this name references this new namespace object.
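The three steps can be observed from the interpreter. The sketch below uses the standard math module instead of numpy so that it runs without extra packages; the alias m is an arbitrary choice.

```python
import math as m  # the alias m is an arbitrary choice for this sketch

# The name m now refers to a module object ...
print(type(m))

# ... whose namespace holds the objects defined by the module's code.
namespace = vars(m)
print("sqrt" in namespace)

# Attribute access looks the name up inside that namespace.
same_function = m.sqrt is namespace["sqrt"]
print(same_function)
```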
You can do this in two different ways. We can use the dir function to
get a directory of the methods. I can use the object type.
We're then going to import the numpy module as np. Now, the math
module has a square root method, sqrt, but numpy also has a square root
method, sqrt. What is the difference between these two functions? Well,
let's try an example. If I type math.sqrt, I can ask Python to calculate the
value of the square root of 2. I can do the same exact thing using the
square root function from the numpy module. So far, it appears that these
two functions are identical, but actually these two functions are quite
separate and they exist in different namespaces. It turns out that the
numpy square root function can do things that the math square root
function doesn't know how to do.
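One thing the numpy version can do is operate on a whole sequence at once. This is a minimal sketch, assuming numpy is installed; the input values are made up for illustration.

```python
import math
import numpy as np

# On a single number the two functions agree.
print(math.sqrt(2))
print(np.sqrt(2))

# np.sqrt also works element-wise on a sequence of numbers.
roots = np.sqrt([4, 9, 16])
print(roots)

# math.sqrt only accepts a single number, so a list raises TypeError.
try:
    math.sqrt([4, 9, 16])
except TypeError as error:
    print("math.sqrt failed:", error)
```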
It then rounds that number down to the closest integer, which is less than the
actual floating point answer. If I type an underscore, Python returns the
value of the latest operation.
math.factorial
import math
def fact(n):
    return math.factorial(n)
Video 1.1.6
== tests whether two objects are identical in content
is tests whether they are the same object
2 integer
2.0 floating point
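The difference between comparing content and comparing identity can be checked directly. The lists below are illustrative values only.

```python
a = [2, 3]
b = [2, 3]

same_content = (a == b)   # compares values
same_object = (a is b)    # compares identities

# An integer and a float can be equal in value while differing in type.
equal_numbers = (2 == 2.0)
same_type = (type(2) is type(2.0))

print(same_content, same_object, equal_numbers, same_type)
```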
1.2.1 Sequences
1.2.2 Lists
Lists are mutable sequences of objects of any type. And they're typically
used to store homogeneous items. If we compare a string and a list, one
difference is that strings are sequences of individual characters, whereas
lists are sequences of any type of Python objects.
numbers.append(10)
list + list concatenates two lists into a new list
sorted_names = sorted(names)
Finally, if you wanted to find out how many objects our list contains, we can
use a generic sequence function, len. So we can type len(names), and
Python tells us that our list contains four objects.
Q:Consider a list x=[1,2,3]. Enter the code below for how you
would use the append method to add the number 4 to the end
of list x.
A: x.append(4)
1.2.3 Tuples
Because tuples are sequences, the way you access different objects
within a tuple is by their position.
>>> T = (1,3,5,7)
>>> len(T)
4
>>> T + (9,11)
(1, 3, 5, 7, 9, 11)
>>> x = 35
>>> y = 78
>>> coordinate=(x,y)
>>> type(coordinate)
<class 'tuple'>
>>> coordinate
(35, 78)
>>> (x,y) = coordinate
>>> x
35
But what if you just have one object within your tuple? To construct a
tuple with just one object, we have to use the following syntax.
>>> c=(2,3)
>>> type(c)
<class 'tuple'>
>>> c=(2,)
>>> type(c)
<class 'tuple'>
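The trailing comma matters because parentheses alone only group an expression. A short check, with names chosen for this sketch:

```python
not_a_tuple = (2)    # parentheses only group the expression
one_tuple = (2,)     # the trailing comma is what makes the tuple

print(type(not_a_tuple))
print(type(one_tuple))
print(len(one_tuple))
```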
1.2.4 Ranges
>>> range(5)
range(0, 5)
>>> list(range(5))
[0, 1, 2, 3, 4]
Ranges require less memory than lists, so don't turn them into lists before using them.
1.2.5 Strings
>>> s = "Python"
>>> len(s)  # len() function
6
>>> s[0]
'P'
>>> s[-1]
'n'
Slicing
>>> s[0:5]
'Pytho'
>>> s[-3]
'h'
>>> s[-3:]
'hon'
>>> "y" in s
True  (membership test)
I can also add two strings together. In that case, the operation is not
called addition, but concatenation.
The split method takes a string and breaks it down into substrings.
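Both operations in a few lines; the strings are arbitrary examples.

```python
greeting = "Hello, " + "world!"          # concatenation builds a new string
words = "Python for Research".split()    # splits on whitespace by default
fields = "a,b,c".split(",")              # splits on an explicit separator

print(greeting)
print(words)
print(fields)
```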
1.2.6 Sets
In practice, what that means is you can use sets for immutable objects
like numbers and strings, but not for mutable objects like lists and
dictionaries.
One type of set is called just "a set". And the other type of set is called
"a frozen set". The difference between these two is that a frozen set is
not mutable once it has been created. In other words, it's immutable. In
contrast, your usual, normal set is mutable.
One of the key ideas about sets is that they cannot be indexed. So the
objects inside sets don't have locations.
Another key feature about sets is that the elements can never be
duplicated.
>>> ids = set(range(10))
>>> ids
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> males = set([1,3,5,7,8])
>>> females= ids- males
>>> type(females)
<class 'set'>
>>> females
{0, 2, 4, 6, 9}
>>> males
{1, 3, 5, 7, 8}
>>> everyone = males | females
>>> everyone
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> everyone & set([1,2,3])
{1, 2, 3}
>>> word="antidisestablishmentarianism"
>>> x.symmetric_difference(y)
{1, 4}
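The last two lines of the session above rely on definitions that the notes do not show. The sketch below fills them in under stated assumptions: the values of x and y are guesses chosen to be consistent with the {1, 4} result, and turning word into a set is one plausible use of that variable.

```python
word = "antidisestablishmentarianism"
letters = set(word)          # duplicates collapse, one entry per distinct letter
distinct_count = len(letters)

# Assumed values, chosen to be consistent with the {1, 4} result in the notes.
x = {1, 2, 3}
y = {2, 3, 4}
either_not_both = x.symmetric_difference(y)

print(distinct_count, either_not_both)
```

The symmetric difference keeps the elements that belong to exactly one of the two sets.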
1.2.7 Dictionaries
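A key point about dictionaries is that they are not sequences, so their
items cannot be accessed by position. Since Python 3.7 dictionaries do
preserve insertion order, but items are still looked up by key.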
Uses
age = {"Tim":24,"Jenna":37,"Jim":3}
age["Jim"]
age["Tim"] += 3
The type of the returned object is what's called a "view object". View
objects do precisely what you might imagine that they do. They
provide a dynamic view of the keys or values in the dictionary.
A key point here is that as you update or modify your dictionary, the
views will also change correspondingly.
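The dynamic behaviour of a view can be seen by modifying the dictionary after the view has been created. The extra entry "Tom" is a hypothetical addition for this sketch.

```python
age = {"Tim": 24, "Jenna": 37, "Jim": 3}
names = age.keys()       # a view object, not a list

before = list(names)
age["Tom"] = 50          # hypothetical new entry added after the view was created
after = list(names)

print(type(names).__name__)
print(before)
print(after)
```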
What a type does is two things. First, it tells a program, you should be
reading these sequences in chunks of, let's say, 32 bits. The second
thing that it tells the computer is, what does this number
here, this sequence of bits, represent?
If you move data from one variable to another, if the types of these
variables do not match, you could potentially lose information.
M = L[:]
1.3.2 Copies
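The copy module can be used for creating identical copies of an
object. There are two types of copies available: shallow and deep.
A shallow copy constructs a new compound object and then inserts
references into it to the objects found in the original.
A deep copy constructs a new compound object and then, recursively,
inserts copies into it of the objects found in the original.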
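The practical consequence shows up with nested objects. This sketch contrasts copy.copy with copy.deepcopy, the module's other kind of copy; the nested list is an arbitrary example.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, same inner lists
deep = copy.deepcopy(original)     # new outer list and new inner lists

original[0].append(99)             # mutate an inner list in place

print(shallow[0])  # shares the inner list with original
print(deep[0])     # unaffected by the change
```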
Remember, the absolute value of the difference tells us how far two numbers
are from one another.
Unless the loop is terminated early with the break statement, the block
of code is run as many times as there are items in the sequence.
However, remember that the key-value pairs themselves are not sorted
inside the dictionary; since Python 3.7 they are kept in insertion order.
For a while loop, you're testing some condition some number of times.
When you enter that loop, you don't know exactly how many times
you'll be running through it. This is in contrast with for loops,
where, when beginning the loop, you know exactly how many times you
would like to run through the block of code.
A common task is to take an existing list, apply some operation to all of
the items on the list, and then create a new list that contains the results.
>>> numbers=range(10)
>>> squares=[]
>>> for number in numbers:
...     square=number**2
...     squares.append(square)
...
>>>
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> squares2=[number**2 for number in numbers]
>>> squares2
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
1.3.7 Functions
Functions are written using the def statement. You can send the result
object back to the caller using the return statement
>>> def add(a,b):
...     mysum=a+b
...     return mysum
...
>>> add(12,15)
27
To modify the value of a global variable from inside a function, you can
use the global statement.
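A minimal sketch of the global statement; the names counter and increment are made up for this example.

```python
counter = 0  # a global variable

def increment():
    # Without this declaration, the assignment below would raise
    # UnboundLocalError, because counter would be treated as local.
    global counter
    counter = counter + 1

increment()
increment()
print(counter)
```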
Arguments to Python functions are matched by position.
>>> def add_and_sub(a,b):
...     mysum=a+b
...     mydiff=a-b
...     return(mysum, mydiff)
...
>>> add_and_sub(20,15)
(35, 5)
A function is not executed until the given function is called using the
function name followed by parentheses syntax.
1.3.8
>>> intersect([1,2,3,4,5],[3,4,5,6,7])
[3, 4, 5]
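The notes show the call but not the function. One possible definition that reproduces the output above is sketched here; it is an assumption, not necessarily the version used in the course.

```python
def intersect(s1, s2):
    """Return the items of s1 that also appear in s2, in s1's order."""
    result = []
    for item in s1:
        if item in s2:
            result.append(item)
    return result

common = intersect([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(common)
```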
So, the lesson here is, make sure you know the type of the object you
are working with, and which methods the object
supports.
Therefore, make sure that you always know the type of your objects.
Enclosing function is the function in which the current function is defined, if any.
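Python resolves names lexically, so the enclosing scope belongs to the function in which the current one is defined, and the locals of whichever function happens to make the call are not searched. The sketch below illustrates both cases with made-up function names.

```python
x = "global"

def outer():
    x = "enclosing"

    def inner():
        # inner has no local x, so Python searches the enclosing function next.
        return x

    return inner()

def show_x():
    # show_x is defined at module level, so its lookup goes to the global x.
    return x

def caller():
    x = "caller's local"   # not visible inside show_x
    return show_x()

print(outer())
print(caller())
```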
3. type.
list.sort()
class ClassName(list):
1. So another way to state what I just said is that the class statement doesn't
3. in Python.
>>> import numpy as np
>>> x = np.array([1,2,3])
>>> y = np.array([2,4,6])
>>> [[1,3],[5,9]]
[[1, 3], [5, 9]]
>>> np.array([[1, 3], [5, 9]])
array([[1, 3],
[5, 9]])
>>> A = np.array([[1, 3], [5, 9]])
>>> A.transpose()
array([[1, 5],
[3, 9]])
Ind = [elements]
When you slice an array using the colon operator, you get a view of the
object. This means that if you modify it, the original array will also be
modified. This is in contrast with what happens when you index an array with
a list or array of indices, in which case what is returned
is a copy of the original data, not a view as one gets for slices.
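The view-versus-copy distinction can be verified directly. This is a sketch assuming numpy is installed; the array values and the index list ind are illustrative.

```python
import numpy as np

z = np.array([1, 3, 5, 7, 9])

view = z[0:3]          # slicing returns a view onto z's data
view[0] = 100          # so this write also changes z

ind = [1, 2]           # hypothetical list of indices
picked = z[ind]        # indexing with a list returns a copy
picked[0] = -1         # so this write leaves z untouched

print(z)
print(np.shares_memory(view, z), np.shares_memory(picked, z))
```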
array_name.shape
array_name.size
np.random.random(10)
np.any(x < 0.9)
np.all()
plt.plot([list])
The working directory is the directory where you have launched Python.
2.3.3 Plotting using Logarithmic axes
This means that for any given point to be plotted, its x- or y-coordinate,
or both, is transformed using the logarithm.
semilogx()
semilogy()
loglog()
1. semilogx() plots the x-axis on a log scale and the y-axis on the original scale;
2. semilogy() plots the y-axis on a log scale and the x-axis on the original scale;
3. loglog() plots both the x-axis and the y-axis on a log scale.
So the lesson here is that functions of the form y = x^α
show up as straight lines on a loglog() plot.
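The straight-line claim follows from taking logarithms: if y = x^α then log y = α log x, which is a line through the origin with slope α in log-log coordinates. The sketch below checks this numerically without plotting; the exponent 2.5 is an arbitrary choice.

```python
import math

alpha = 2.5                       # hypothetical exponent for this sketch
xs = [1.0, 10.0, 100.0, 1000.0]
ys = [x ** alpha for x in xs]

# Coordinates as they would be placed on log-log axes.
log_xs = [math.log10(x) for x in xs]
log_ys = [math.log10(y) for y in ys]

# Slope between consecutive points; a constant slope means a straight line.
slopes = [
    (log_ys[i + 1] - log_ys[i]) / (log_xs[i + 1] - log_xs[i])
    for i in range(len(xs) - 1)
]
print(slopes)
```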
x = np.random.normal(size=1000)
plt.hist(x, density=True, bins=np.linspace(-5, 5, 21));  # density replaces the removed normed argument
np.random.gamma(2, 3, 100000)
cumulative=True, histtype="step"
plt.figure()
Choosing a die
import random
random.choice(random.choice([range(1,7), range(1,9), range(1,11)]))
rolls = []
for k in range(100):
    rolls.append(random.choice([1,2,3,4,5,6]))
ys = []
for rep in range(100):
    y = 0
    for k in range(10):
        x = random.choice([1,2,3,4,5,6])
        y = y + x
    ys.append(y)
The central limit theorem states that if you have a population with mean μ and
standard deviation σ and take sufficiently large random samples from the
population with replacement , then the distribution of the sample means will be
approximately normally distributed.
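A small simulation can make the statement concrete. The sketch below uses a fair six-sided die as the population (mean 3.5); the sample size of 30, the 2,000 repetitions, and the seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n rolls of a fair six-sided die (sampling with replacement)."""
    return statistics.mean(random.choice([1, 2, 3, 4, 5, 6]) for _ in range(n))

# 2,000 sample means, each computed from a sample of 30 rolls.
means = [sample_mean(30) for _ in range(2000)]

mean_of_means = statistics.mean(means)     # expected to be close to 3.5
spread_of_means = statistics.stdev(means)  # expected to be close to sigma / sqrt(30)
print(mean_of_means, spread_of_means)
```

A histogram of means would show the bell shape, even though the distribution of a single roll is flat.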
import numpy as np
np.random.random(size of the 1d array)
np.random.random((number of rows, number of columns)), with the shape
given as a tuple
3. from the same distribution, we can specify the length of the 1d array
X = np.random.randint(1, 7, (100, 10))
X.shape
np.sum(X)
np.sum(X, axis=1)
X = np.random.randint(1, 7, (100, 10))
Y = np.sum(X, axis=1)
plt.hist(Y);
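The axis argument decides which dimension is collapsed by the sum. A sketch assuming numpy is installed; the names total, per_column and per_row are introduced here for illustration.

```python
import numpy as np

X = np.random.randint(1, 7, (100, 10))   # 100 rows of 10 die rolls, values 1 to 6

total = np.sum(X)                # one number: the sum over every element
per_column = np.sum(X, axis=0)   # collapses the rows, leaving shape (10,)
per_row = np.sum(X, axis=1)      # collapses the columns, leaving shape (100,)

print(X.shape, per_column.shape, per_row.shape)
```

Note that shape is an attribute, so it is written without parentheses.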
import time
start_time = time.perf_counter()  # time.clock() was removed in Python 3.8
end_time = time.perf_counter()
print(end_time - start_time)
first elapsed time / second elapsed time:
how many times faster the second one is
delta_X = np.random.normal(0, 1, (2, 5))
# cumulative sum
X = np.cumsum(delta_X, axis=1)
X = np.concatenate((X_0, np.cumsum(delta_X, axis=1)), axis=1)
Example: the cumulative sum of 2, 4, 6, 8 is 2, 6, 12, 20
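Putting the pieces together gives a short random walk. The notes do not show how X_0 is defined; the sketch assumes a starting point at the origin, stored as a 2-by-1 array so that it can be concatenated along axis 1. The seed is arbitrary.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed so the sketch is repeatable

delta_X = np.random.normal(0, 1, (2, 5))   # 5 steps; row 0 holds x, row 1 holds y
X_0 = np.array([[0.0], [0.0]])             # assumed starting point at the origin

# Positions after each step, with the starting point prepended as column 0.
X = np.concatenate((X_0, np.cumsum(delta_X, axis=1)), axis=1)
print(X.shape)

# The cumulative sum on its own, using the numbers from the notes.
check = np.cumsum([2, 4, 6, 8])
print(check)
```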
Topics:
Array
Linked List
Stack
Queue
Binary Tree
Binary Search Tree
Heap
Hashing
Graph
Matrix
Misc
Advanced Data Structure
Adenine
Cytosine
Guanine
Thymine
3.1.2 NCBI
3.1.3 Import DNA data into Python
inputfile = "dna.txt"
f = open(inputfile, "r")
seq = f.read()
seq
print(seq)
To remove \n
seq = seq.replace("\n", "")
table = {
'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W',
}
def translate(seq):  # function name assumed; the notes omit the def line
    protein = ""
    if len(seq) % 3 == 0:
        for i in range(0, len(seq), 3):
            codon = seq[i : i+3]
            protein += table[codon]
    return protein
Slicing a string
seq[0:3]
def read_seq(inputfile):
    """Reads and returns the input sequence with special characters removed."""
    with open(inputfile, "r") as f:
        seq = f.read()
    seq = seq.replace("\n", "")
    seq = seq.replace("\r", "")
    return seq
dna = read_seq("filename")
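To see reading and translation work end to end without a real data file, the sketch below writes a tiny made-up sequence to a temporary file. It uses only three entries of the codon table, and the function name translate is an assumption, since the notes do not show the def line for the translation code.

```python
import os
import tempfile

# Tiny subset of the codon table, enough for this sketch.
table = {"ATG": "M", "GCC": "A", "TAA": "_"}

def read_seq(inputfile):
    """Read a sequence and strip newline and carriage-return characters."""
    with open(inputfile, "r", newline="") as f:
        seq = f.read()
    return seq.replace("\n", "").replace("\r", "")

def translate(seq):
    """Translate DNA into protein, three bases (one codon) at a time."""
    protein = ""
    if len(seq) % 3 == 0:
        for i in range(0, len(seq), 3):
            protein += table[seq[i:i + 3]]
    return protein

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "dna.txt")
    with open(path, "w", newline="") as f:
        f.write("ATG\nGCC\r\nTAA\n")   # made-up sequence with mixed line endings
    dna = read_seq(path)

protein = translate(dna)
print(dna, protein)
```

If the line breaks were replaced with spaces instead of being removed, the length check and the codon lookups would fail, which is why the replacement string is empty.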
Pandas
>>> import pandas as pd
>>> pd.Series([1,2,3],index = ["q","w","e"])
q 1
w 2
e 3
dtype: int64
>>> x= pd.Series([1,2,3],index = ["q","w","e"])
>>> x["w"]
2
>>> age = {"Tim":29, "Jim":31, "Pam":27, "Sam":35}
>>> x = pd.Series(age)
>>> x
Tim 29
Jim 31
Pam 27
Sam 35
dtype: int64
>>> #dataframe
>>> data = {"name" : ["Tim", "Jim", "Pam", "Sam"],
... "age" : [29, 31, 27,35],
... "ZIP" : ["02115","02130","67700","00100"]}
>>> x = pd.DataFrame(data, columns = ["name","age","ZIP"])
>>> x
name age ZIP
0 Tim 29 02115
1 Jim 31 02130
2 Pam 27 67700
3 Sam 35 00100
>>> x.name
0 Tim
1 Jim
2 Pam
3 Sam
Name: name, dtype: object
>>> x= pd.Series([1,2,3,4],index = ["q","w","e","r"])
>>> x
q 1
w 2
e 3
r 4
dtype: int64
>>> x.index
Index(['q', 'w', 'e', 'r'], dtype='object')
>>> sorted(x.index)
['e', 'q', 'r', 'w']
>>> x.reindex(sorted(x.index))
e 3
q 1
r 4
w 2
dtype: int64
>>> x= pd.Series([1,2,3,4],index = ["q","w","e","r"])
>>> y= pd.Series([5,6,7,8],index = ["q","w","t","w"])
>>> x + y
e NaN
q 6.0
r NaN
t NaN
w 8.0
w 10.0
dtype: float64
>>>