Python Summary

Copyright
© All Rights Reserved

1   "String".upper()
2   Series.index
3   Series + / - / * number
4   Series.size
5   Series.is_unique
6   Series.values
7   Series1 + Series2
        NaN where there is no matching label between the two Series.
8   sales_h1 = sales_q1.add(sales_q2, fill_value=0)
        Series.add(other, level=None, fill_value=None, axis=0)
9   Series.value_counts()
        Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
        dropna=False: also show NaN.
        normalize=True: the object returned will contain the relative frequencies of the unique values.
10  dict(series)
11  sorted(series)
12  Series.squeeze(axis=None)
        Series or DataFrames with a single element are squeezed to a scalar.
        DataFrames with a single column or a single row are squeezed to a Series.
        Otherwise the object is unchanged.
13  Series.sort_values()
        Series.sort_values(*, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
14  Series.sort_index()
15  value in [] or "value" in series.values
        Note that `in` applied to the Series itself checks the index labels, not the values.
16  Series.get(key, default=None)
        Returns the default value if the key is not found. Can be used for DataFrames as well.
17  pokemon[[1, 2, 4]] = ["Firemon", "Flamemon", "Blazemon"]
        Overwrites the values at those positions.
18  pokemon_df = pd.read_csv("pokemon.csv", usecols = ["Pokemon"])
    pokemon_series = pokemon_df.squeeze("columns").copy()
19  google = google.sort_values()
        Reassigning the result has the same effect as google.sort_values(inplace = True).
20  google.describe()
21  Series.apply()
22  Series.map(arg, na_action=None)
        arg: mapping correspondence
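A few of the Series entries above (index alignment in addition, `fill_value`, `value_counts`, `squeeze`) can be tried in a short sketch. The data below is hypothetical and only there to make the behavior visible; it is not taken from the original sheet.

```python
import pandas as pd

# Hypothetical quarterly sales indexed by product (illustrative data only)
sales_q1 = pd.Series({"apples": 10, "pears": 5})
sales_q2 = pd.Series({"apples": 7, "plums": 3})

# Plain addition aligns on the index; labels present on one side only give NaN
plain_sum = sales_q1 + sales_q2

# With fill_value=0 the missing side is treated as 0 instead
sales_h1 = sales_q1.add(sales_q2, fill_value=0)

colors = pd.Series(["red", "blue", "red", None])
counts = colors.value_counts(dropna=False)    # also counts the missing value
shares = colors.value_counts(normalize=True)  # relative frequencies, NaN dropped

# A Series with a single element is squeezed to a scalar
single = pd.Series([42]).squeeze()

print(plain_sum)
print(sales_h1)
print(counts)
print(shares)
print(single)
```

Comparing `plain_sum` with `sales_h1` shows why `fill_value` matters when the two indexes only partly overlap.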
Time method
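list / tuple

Order of elements
    list:  Yes (sequence)
    tuple: Yes (sequence)

Mutability
    list:  Mutable
    tuple: Immutable

Duplicates
    list:  Duplicate elements are allowed
    tuple: Duplicate elements are allowed

Notes
    list:  Cannot be used as a dictionary key
           Cannot be an element of a set
    tuple: Uses less memory than a list
           Can be used as a dictionary key (if all of its elements are hashable)
           Can be an element of a set
           Named tuples can be used in place of a simple class

Creating an empty one
    list:  l1 = list()
           l1 = []
    tuple: t1 = tuple()
           t1 = ()

Initialization
    list:  l1 = ['a', 'b', 'c']
    tuple: t1 = ('a', 'b', 'c')
           t1 = 'a', 'b', 'c'
           # Do not forget the comma for a single element
           t1 = ('a',)
           t1 = 'a',

Initialization (via the class)
    list:  l1 = list(['a', 'b', 'c'])
           l1 = list(('a', 'b', 'c'))
    tuple: t1 = tuple(('a', 'b', 'c'))
           t1 = tuple(['a', 'b', 'c'])

Number of elements
    list:  len(l1)
    tuple: len(t1)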

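Add
    list:  # At the end
           l1.append('d')
           l1 += ['d']
           # At a specific position
           l1.insert(1, 'e')
           l1[1:1] = ['e']
    tuple: -

Replace
    list:  l1 = ['a', 'b', 'c']
           l1[2] = 'x'
           # IndexError if the index does not exist
           l1[9] = 'x'
           # This is OK (appends at the end)
           l1[9:] = ['x']
    tuple: -

Delete (by position)
    list:  del l1[2]
    tuple: -

Delete (by value)
    list:  l1.remove('a')
    tuple: -

Delete (by key)
    list:  -
    tuple: -

Delete (clear everything)
    list:  l1.clear()
           del l1[:]
    tuple: t1 = tuple()  # a workaround: rebinds the name to a new empty tuple

Element access (slicing)
    list:  # start
           l1[0]
           # start:end
           l1[0:2]
           # last
           l1[-1]
           # every 2nd element
           l1[::2]
    tuple: Same as list

Get and remove
    list:  # Default is the last element (-1)
           # IndexError if the element does not exist
           l1.pop()
           # By position
           l1.pop(2)
    tuple: -

LIFO (stack)
    list:  # Add
           append()
           # Take out (same as pop(-1))
           pop()
    tuple: -

FIFO (queue)
    list:  # Add
           append()
           # Take out
           pop(0)
    tuple: -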
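Position of an element
    list:  l1.index('b')
    tuple: Same as list

Existence check
    list:  # True/False
           'a' in l1
    tuple: Same as list

Two-dimensional
    list:  l1 = [[1, 2], [3, 4], [5, 6]]
           # Element access
           l1[1][1]
    tuple: t1 = ((1, 2), (3, 4), (5, 6))
           # Element access
           t1[1][1]

Merge
    list:  l1 = [1, 2, 3]
           l2 = [4, 5, 6]
           l1.extend(l2)
    tuple: -

Merge (2)
    list:  l1 = ['a', 'b', 'c']
           l2 = ['d', 'e', 'f']
           l1 += l2
           # This gives a different result
           l1 = ['a', 'b']
           l2 = ['c', 'd']
           l1.append(l2)
           --> ['a', 'b', ['c', 'd']]
    tuple: t1 = (1, 2, 3)
           t2 = (4, 5, 6)
           t3 = t1 + t2

Number of elements with a given value
    list:  l1.count('a')
    tuple: t1.count('a')

Sort (in place)
    list:  l1.sort()
           l1.sort(reverse=True)
    tuple: -

Sort (not in place)
    list:  # sorted() returns a list
           l2 = sorted(l1)
    tuple: t2 = tuple(sorted(t1))

Reverse the order (in place)
    list:  l1.reverse()
    tuple: -

Reverse the order (not in place)
    list:  l2 = list(reversed(l1))  # reversed() alone returns an iterator
           l2 = l1[::-1]
    tuple: t2 = tuple(reversed(t1))
           t2 = t1[::-1]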
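Assignment (no copy: both names refer to the same object)
    list:  a = [1, 2, 3]
           b = a
    tuple: a = (1, 2, 3)
           b = a

Copy (shallow)
    list:  a = [1, 2, 3]
           b = a.copy()
           ---
           c = list(a)
           ---
           d = a[:]
    tuple: import copy
           a = (1, 2, 3)
           b = copy.copy(a)
           ---
           c = tuple(a)
           ---
           d = a[:]

Copy (deep)
    list:  import copy
           b = copy.deepcopy(a)
    tuple: import copy
           a = (1, 2, 3)
           b = copy.deepcopy(a)

Sum of values
    list:  sum(l1)
    tuple: Same as list

Maximum value
    list:  max(l1)
    tuple: Same as list

Minimum value
    list:  min(l1)
    tuple: Same as list

Convert (to string)
    list:  l1 = ['a', 'b', 'c']
           ','.join(l1)
           --> a,b,c
    tuple: Same as list

Convert (to list)
    list:  -
    tuple: list(t1)

Convert (to tuple)
    list:  tuple(l1)
    tuple: -

Convert (to set)
    list:  set(l1)
    tuple: set(t1)

Convert (to dict)
    list:  l1 = [['a', 'b'], ['c', 'd'], ['e', 'f']]
           d1 = dict(l1)
           ---
           k = ['a', 'b', 'c']
           v = [1, 2, 3]
           d1 = dict(zip(k, v))
    tuple: t1 = (('a', 'b'), ('c', 'd'), ('e', 'f'))
           d1 = dict(t1)

Iterate over several sequences in parallel
    list:  zip(l1, l2)
    tuple: Same as list

Comprehension
    list:  [x for x in l1]
    tuple: tuple(x for x in t1)

Storing mutable objects
    list:  Allowed
           l1 = ['a', [1, 2, 3]]
    tuple: Allowed
           t1 = ('a', [1, 2, 3])

Set operation (union)
    list:  -
    tuple: -

Set operation (difference)
    list:  -
    tuple: -

Set operation (intersection)
    list:  -
    tuple: -

Set operation (symmetric difference)
    list:  -
    tuple: -

Access by key
    list:  -
    tuple: -

Getting and looping over keys
    list:  -
    tuple: -

Getting and looping over values
    list:  -
    tuple: -

Getting and looping over key-value pairs
    list:  -
    tuple: -

Swapping keys and values
    list:  -
    tuple: -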
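Three list rows in the table above are easy to mix up: append versus extend, slice assignment, and the difference between assignment, a shallow copy, and a deep copy. The following is a minimal sketch of those behaviors; the variable names are made up for illustration.

```python
import copy

# append adds its argument as ONE element; extend adds each item separately
nested = ['a', 'b']
nested.append(['c', 'd'])      # ['a', 'b', ['c', 'd']]
flat = ['a', 'b']
flat.extend(['c', 'd'])        # ['a', 'b', 'c', 'd']

# Assigning to an empty slice inserts; an out-of-range slice appends
letters = ['a', 'b', 'c']
letters[1:1] = ['e']           # insert at position 1
letters[9:] = ['x']            # no IndexError, 'x' goes to the end

# Assignment vs shallow copy vs deep copy
original = ['a', [1, 2, 3]]
alias = original               # same object, not a copy
shallow = original.copy()      # new outer list, inner list is shared
deep = copy.deepcopy(original) # inner list is copied as well

original[0] = 'z'              # rebinding an outer slot
original[1].append(4)          # mutating the shared inner list

print(nested, flat, letters)
print(alias, shallow, deep)
```

After the two changes to `original`, `alias` sees both, `shallow` sees only the inner-list mutation, and `deep` sees neither.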
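set / dictionary

Order of elements
    set:  No
    dict: Yes from Python 3.7 (insertion order)

Mutability
    set:  Mutable
    dict: Mutable

Duplicates
    set:  Duplicate elements are not allowed
    dict: Duplicate keys are not allowed
          Duplicate values are allowed

Notes
    set:  Set operations are possible
          Elements are unique
          Adding and replacing are upserts
          Can be used to remove duplicates from a list or tuple
    dict: Keys must be unique
          If a key is duplicated, its value is overwritten (upsert)

Creating an empty one
    set:  s1 = set()
    dict: d1 = dict()
          d1 = {}

Initialization
    set:  s1 = {'a', 'b', 'c'}
    dict: d1 = {'a': 1, 'b': 2, 'c': 3}

Initialization (via the class)
    set:  s1 = set({'a', 'b', 'c'})
          s1 = set(['a', 'b', 'c'])
          s1 = set(('a', 'b', 'c'))
    dict: d1 = dict(a=1, b=2, c=3)
          d1 = dict({'a':1, 'b':2, 'c':3})
          d1 = dict((('a',1), ('b',2), ('c',3)))

Number of elements
    set:  len(s1)
    dict: len(d1)

Add
    set:  s1.add('d')
          s1 |= {'d'}
    dict: d1[key] = val
          d1.update({'e': 4})
          d1.update(e=4)
          d1.update(dict(e=4))

Replace
    set:  Same as add (upsert)
    dict: Same as add (upsert)

Delete (by position)
    set:  -
    dict: -

Delete (by value)
    set:  s1.remove('d')
          s1 -= {'d'}
    dict: -

Delete (by key)
    set:  -
    dict: del d1[key]

Delete (clear everything)
    set:  s1.clear()
          s1 = set()
    dict: d1.clear()
          d1 = {}

Element access (slicing)
    set:  -
    dict: -

Get and remove
    set:  # Takes no argument; removes and returns an arbitrary element
          # KeyError if the set is empty
          s1.pop()
          # To remove a specific element without an error when it is missing
          s1.discard('a')
    dict: # KeyError if the key does not exist
          d1.pop(key)
          # Returns default if the key does not exist
          d1.pop(key, default)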

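LIFO (stack)
    set:  -
    dict: -

FIFO (queue)
    set:  -
    dict: -

Position of an element
    set:  -
    dict: -

Existence check
    set:  Same as list
    dict: key in d1  # True/False
          val in d1.values()  # True/False

Nesting
    set:  s1 = {(1,2), (3,4)}
          # Sets cannot be nested (TypeError)
          s1 = {{1, 2}, {3, 4}}
    dict: # A dict can be stored as a value
          d1 = {'a': {'x': 1}, 'b': {'y': 2}}

Merge
    set:  s1 = {1, 2, 3}
          s2 = {4, 5, 6}
          s3 = s1.union(s2)
    dict: d1 = {'a': 1, 'b': 2}
          d2 = {'b': 9, 'c':3}
          d1.update(d2)
          * On duplicate keys, the value from the latter (d2) is used

Merge (2)
    set:  s1 = {1, 2, 3}
          s2 = {4, 5, 6}
          s3 = s1 | s2
    dict: -

Number of elements with a given value
    set:  -
    dict: d1 = {'a': 3, 'b': 2, 'c': 1, 'd': 3}
          len({k: v for k, v in d1.items() if v == 3})
          ---
          sum(v == 3 for v in d1.values())

Sort (in place)
    set:  -
    dict: -

Sort (not in place)
    set:  # sorted() returns a list
          s2 = set(sorted(s1))
          # Purpose unclear: converting back to a set discards the order
    dict: d2 = sorted(d1.items(), key=lambda x: x[1])

Reverse the order (in place)
    set:  -
    dict: -

Reverse the order (not in place)
    set:  -
    dict: -

Assignment (no copy: both names refer to the same object)
    set:  a = {1, 2, 3}
          b = a
    dict: a = {'a': 1, 'b': 2, 'c': 3}
          b = a

Copy (shallow)
    set:  a = {1, 2, 3}
          b = a.copy()
          ---
          c = set(a)
    dict: a = {'a': 1, 'b': 2, 'c': 3}
          b = a.copy()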

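Sum of values
    set:  Same as list
    dict: sum(d1.keys())  # only if the keys are numeric
          sum(d1.values())

Maximum value
    set:  Same as list
    dict: max(d1.keys())
          max(d1.values())

Minimum value
    set:  Same as list
    dict: min(d1.keys())
          min(d1.values())

Convert (to string)
    set:  Same as list
    dict: ','.join(d1.keys())
          ','.join(d1.values())  # only if the values are strings

Convert (to list)
    set:  list(s1)
    dict: list(d1.keys())
          list(d1.values())

Convert (to tuple)
    set:  tuple(s1)
    dict: tuple(d1.keys())
          tuple(d1.values())
          tuple(d1.items())

Convert (to set)
    set:  -
    dict: set(d1.keys())
          set(d1.values())
          set(d1.items())

Convert (to dict)
    set:  s1 = {('a',1),('b',2),('c', 3)}
          d1 = dict(s1)
          ---
          s1 = {'a', 'b', 'c'}
          s2 = {1, 2, 3}
          d1 = dict(zip(s1, s2))
    dict: -

Iterate over several sequences in parallel
    set:  zip(s1, s2) is possible, but the pairing and order are not guaranteed
          s1 = {'a', 'b', 'c'}
          s2 = {1, 2, 3}
          l3 = set(zip(s1, s2))
          --> for example {('a', 1), ('c', 3), ('b', 2)}
    dict: -

Comprehension
    set:  {x for x in s1}
    dict: {k: v for k, v in d1.items()}

Storing mutable objects
    set:  Not allowed
          s1 = {'a', [1, 2, 3]}
          --> TypeError
    dict: Not allowed as a key (TypeError)
          d1 = {[1, 2, 3]: 1}
          Allowed as a value
          d1 = {'a': [1, 2, 3]}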

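Set operation (union)
    set:  s1 | s2
          s1.union(s2)
    dict: -

Set operation (difference)
    set:  s1 - s2
          s1.difference(s2)
    dict: -

Set operation (intersection)
    set:  s1 & s2
          s1.intersection(s2)
    dict: -

Set operation (symmetric difference)
    set:  s1 ^ s2
          s1.symmetric_difference(s2)
    dict: -

Access by key
    set:  -
    dict: # KeyError if the key does not exist
          d1[key]
          # Returns None if the key does not exist
          d1.get(key)
          # Returns default if the key does not exist
          d1.get(key, default)

Getting and looping over keys
    set:  -
    dict: d1.keys()
          for key in d1.keys():

Getting and looping over values
    set:  -
    dict: d1.values()
          for val in d1.values():

Getting and looping over key-value pairs
    set:  -
    dict: # The (k, v) pairs are returned as tuples
          d1.items()
          for key, value in d1.items():

Swapping keys and values
    set:  -
    dict: d2 = {v: k for k, v in d1.items()}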
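The set and dict rows on removal, merging, key lookup, and key/value swapping can be checked with a short sketch. The values are illustrative only. Note that `set.pop()` takes no argument, unlike `dict.pop(key, default)`.

```python
# Removing from a set
s1 = {'a', 'b', 'c'}
s1.discard('zzz')              # missing element: no error
try:
    s1.remove('zzz')           # missing element: KeyError
    remove_failed = False
except KeyError:
    remove_failed = True
popped = s1.pop()              # removes and returns an arbitrary element

# Merging dicts: on duplicate keys the later value wins (upsert)
d1 = {'a': 1, 'b': 2}
d2 = {'b': 9, 'c': 3}
d1.update(d2)                  # {'a': 1, 'b': 9, 'c': 3}

# Lookups that do not raise when the key is missing
missing = d1.get('x')          # None
fallback = d1.get('x', 0)      # 0
removed = d1.pop('x', 'n/a')   # 'n/a', d1 is unchanged

# Swapping keys and values (only safe when the values are unique and hashable)
inverted = {v: k for k, v in d1.items()}

# Removing duplicates from a list via a set (the original order is lost)
unique_sorted = sorted(set(['b', 'a', 'b', 'c', 'a']))

print(remove_failed, popped, d1, inverted, unique_sorted)
```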
              Category      Continuous
Category      Chi square    t-test, Anova
Continuous    t-test        Correlation

Paired t test
・A paired t-test is used when we are interested in the difference between two variables for the same subject.
・Often the two variables are separated by time.
・For example, in the Dixon and Massey data set we have cholesterol levels in 1952 and cholesterol levels in 1962 for each subject.

Two samples t test
・A method used to test whether the unknown population means of two groups are equal or not.
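As a rough sketch of the paired case: the paired t statistic is the mean of the per-subject differences divided by its standard error. The function name and the before/after numbers below are hypothetical and are not the Dixon and Massey data.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """t statistic for the mean of the per-subject differences (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error

# Hypothetical measurements for four subjects at two points in time
before = [240, 260, 250, 270]
after = [230, 255, 240, 262]

t_stat = paired_t_statistic(before, after)
print(round(t_stat, 3))  # compare against a t distribution with n - 1 degrees of freedom
```

A two-sample test would instead compare the means of two independent groups, so the observations are not paired up subject by subject.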


Confidence interval for difference of two means, dependent samples
Weight loss example, lbs

Background  The 365 team has developed a diet and an exercise program for losing weight. It seems that it works like a charm. However, you are interested in how much weight you are likely to lose.
            You have a sample of 10 people who have already completed the 12-week program. The second sheet shows the data in kg, if you feel more comfortable using kg as a unit of measurement.
Task 1      Calculate the mean and standard deviation of the dataset
Task 2      Determine the appropriate statistic to use
Task 3      Calculate the 95% confidence interval
Task 4      Interpret the result
Optional    You can try to calculate the 90% and 99% confidence intervals to see the difference. There is no solution provided for these cases.

Solution:

Subject   Weight before (lbs)   Weight after (lbs)   Difference
1         228.58                204.74               -23.83
2         244.01                223.95               -20.06
3         262.46                232.94               -29.52
4         224.32                212.04               -12.28
5         202.14                191.74               -10.41
6         246.98                233.47               -13.51
7         195.86                177.60               -18.25
8         231.88                213.85               -18.03
9         243.32                218.85               -24.47
10        266.74                236.86               -29.87

Task 1:   Mean            -20.02
          St. deviation   6.86

Task 2:   Population variance is unknown
          We have a small sample
          We assume that the population is normally distributed
          The appropriate statistic to use is the t-statistic

Task 3:   95% CI, t(9, 0.025) = 2.26

          CI     T       CI low    CI high
          95%    2.26    -24.93    -15.12

Task 4:   You are 95% confident that you will lose between 15.12 lbs and 24.93 lbs,
          given that you follow the program as strictly as the sample

Note that the solution is exactly the same no matter the unit of measurement
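The figures in the solution can be reproduced with the standard library, using the interval: mean of differences ± t × (standard deviation / √n). This is a sketch rather than the sheet's own calculation; the critical value 2.262 is an assumption (the sheet shows it rounded to 2.26), and the differences are the rounded values from the table.

```python
import math
import statistics

# Differences (after - before) in lbs, as listed in the table
differences = [-23.83, -20.06, -29.52, -12.28, -10.41,
               -13.51, -18.25, -18.03, -24.47, -29.87]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)   # sample standard deviation (n - 1)

# Assumed critical value: t for 9 degrees of freedom, 2.5% in each tail
t_crit = 2.262

margin = t_crit * sd_diff / math.sqrt(n)
ci_low = mean_diff - margin
ci_high = mean_diff + margin

print(round(mean_diff, 2), round(sd_diff, 2))
print(round(ci_low, 2), round(ci_high, 2))
```

Swapping `t_crit` for the 90% or 99% critical value from a t table gives the optional intervals.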
# A custom IQR function
def iqr(column):
    return column.quantile(0.75) - column.quantile(0.25)
    
# Print IQR of the temperature_c column
print(sales["temperature_c"].agg(iqr))
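The snippet above assumes a `sales` DataFrame that is not defined in this document. A self-contained sketch with a hypothetical stand-in DataFrame (the five temperatures are made up) might look like this:

```python
import pandas as pd

# A custom IQR function: 75th percentile minus 25th percentile
def iqr(column):
    return column.quantile(0.75) - column.quantile(0.25)

# Hypothetical stand-in for the sales data, with made-up temperatures
sales = pd.DataFrame({"temperature_c": [10.0, 12.0, 14.0, 16.0, 18.0]})

# agg applies the custom function to the whole column
temperature_iqr = sales["temperature_c"].agg(iqr)
print(temperature_iqr)
```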
