Python Project
Out[2]:
MovieID Title Genres
In [3]: df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3883 entries, 0 to 3882
Data columns (total 3 columns):
MovieID 3883 non-null int64
Title 3883 non-null object
Genres 3883 non-null object
dtypes: int64(1), object(2)
memory usage: 91.1+ KB
Out[4]:
UserID MovieID Rating Timestamp
0 1 1193 5 978300760
1 1 661 3 978302109
2 1 914 3 978301968
3 1 3408 4 978300275
4 1 2355 5 978824291
In [5]: df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000209 entries, 0 to 1000208
Data columns (total 4 columns):
UserID 1000209 non-null int64
MovieID 1000209 non-null int64
Rating 1000209 non-null int64
Timestamp 1000209 non-null int64
dtypes: int64(4)
memory usage: 30.5 MB
Out[6]:
UserID Gender Age Occupation Zip-code
0 1 F 1 10 48067
1 2 M 56 16 70072
2 3 M 25 15 55117
3 4 M 45 7 02460
4 5 M 25 20 55455
In [7]: df3.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6040 entries, 0 to 6039
Data columns (total 5 columns):
UserID 6040 non-null int64
Gender 6040 non-null object
Age 6040 non-null int64
Occupation 6040 non-null int64
Zip-code 6040 non-null object
dtypes: int64(3), object(2)
memory usage: 236.0+ KB
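The row counts above (3,883 movies, 1,000,209 ratings, 6,040 users) match the sizes of the MovieLens 1M dataset, but the cells that loaded the files are not part of this export. The following is only a sketch of how `::`-delimited files without a header row are commonly read with pandas. The `::` separator, the file name mentioned in the comment, and the helper `load_dat` are assumptions, not taken from the notebook.

```python
import io
import pandas as pd

def load_dat(source, columns, dtype=None):
    """Read a '::'-delimited file that has no header row (assumed layout)."""
    # A multi-character separator requires the Python parsing engine.
    return pd.read_csv(source, sep="::", engine="python",
                       header=None, names=columns, dtype=dtype)

# Tiny in-memory stand-ins; in practice `source` would be a path such as
# "movies.dat" (hypothetical file name).
movies_sample = io.StringIO(
    "1::Toy Story (1995)::Animation|Children's|Comedy\n"
    "48::Pocahontas (1995)::Animation|Children's|Musical|Romance\n"
)
users_sample = io.StringIO("1::F::1::10::48067\n4::M::45::7::02460\n")

df1 = load_dat(movies_sample, ["MovieID", "Title", "Genres"])
# Reading Zip-code as text keeps leading zeros such as "02460".
df3 = load_dat(users_sample,
               ["UserID", "Gender", "Age", "Occupation", "Zip-code"],
               dtype={"Zip-code": str})
```

Keeping `Zip-code` as a string is consistent with the `object` dtype reported by `df3.info()`.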
Merging datasets
Out[8]:
   MovieID  Title                                      Genres                                UserID  Rating  Timestamp  Gender  Age  Occupation  Zip-code
0        1  Toy Story (1995)                           Animation|Children's|Comedy                1       5  978824268       F    1          10     48067
1       48  Pocahontas (1995)                          Animation|Children's|Musical|Romance       1       5  978824351       F    1          10     48067
2      150  Apollo 13 (1995)                           Drama                                      1       5  978301777       F    1          10     48067
3      260  Star Wars: Episode IV - A New Hope (1977)  Action|Adventure|Fantasy|Sci-Fi            1       4  978300760       F    1          10     48067
4      527  Schindler's List (1993)                    Drama|War                                  1       5  978824195       F    1          10     48067
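The column order in Out[8] (movie columns, then rating columns, then user columns) is what two chained merges would produce. The merge cell itself is not shown, so this is a sketch under that assumption; the sample rows are small stand-ins and the second user is made up for illustration.

```python
import pandas as pd

# Stand-in frames with the same columns as df1, df2 and df3.
movies = pd.DataFrame({
    "MovieID": [1, 48],
    "Title": ["Toy Story (1995)", "Pocahontas (1995)"],
    "Genres": ["Animation|Children's|Comedy",
               "Animation|Children's|Musical|Romance"],
})
ratings = pd.DataFrame({
    "UserID": [1, 1, 2],
    "MovieID": [1, 48, 1],
    "Rating": [5, 5, 4],
    "Timestamp": [978824268, 978824351, 978300000],
})
users = pd.DataFrame({
    "UserID": [1, 2],
    "Gender": ["F", "M"],
    "Age": [1, 56],
    "Occupation": [10, 16],
    "Zip-code": ["48067", "70072"],
})

# Movies joined to ratings on MovieID, then to users on UserID (inner joins).
merged = movies.merge(ratings, on="MovieID").merge(users, on="UserID")
```

Each rating row ends up carrying its movie's title and genres plus the rating user's demographics, so the merged frame has one row per rating.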
Out[9]: Age
1 222
18 1103
25 2096
35 1193
45 550
50 496
56 380
dtype: int64
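The counts in Out[9] sum to 6,040, the number of users, so they look like a per-age-group user count. A minimal sketch of one way to get such a series, assuming a `groupby` on `Age` (the original cell is not shown); the sample rows are the five users listed in Out[6].

```python
import pandas as pd

users = pd.DataFrame({
    "UserID": [1, 2, 3, 4, 5],
    "Age": [1, 56, 25, 45, 25],
})

# Number of users in each age group, indexed by the Age value.
age_counts = users.groupby("Age").size()
```

If the data is MovieLens 1M, the `Age` values are bracket codes rather than exact ages (for example, 1 stands for under 18 and 56 for 56 and over).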
In [12]: df2['Rating'].unique()
Rating
1 56174
2 107557
3 261197
4 348971
5 226310
Name: UserID, dtype: int64
Rating
1 16
2 61
3 345
4 835
5 820
Name: UserID, dtype: int64
In [17]: Toy1995_rating.plot(kind='pie')
Rating
1 25
2 44
3 214
4 578
5 724
Name: UserID, dtype: int64
In [19]: Toy1999_rating.plot(kind='pie')
Rating
1 41
2 105
3 559
4 1413
5 1544
Name: UserID, dtype: int64
In [21]: Toy_rating.plot(kind='pie')
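The three rating series above appear to be counts of `UserID` per `Rating` value for one or more titles (the last series is the element-wise sum of the two before it). The filtering cells are missing, so the sketch below assumes a title filter followed by a `groupby`; the helper `rating_counts` and the sample rows are hypothetical.

```python
import pandas as pd

merged = pd.DataFrame({
    "Title": ["Toy Story (1995)", "Toy Story (1995)",
              "Toy Story 2 (1999)", "Toy Story 2 (1999)",
              "Apollo 13 (1995)"],
    "UserID": [1, 2, 1, 3, 1],
    "Rating": [5, 4, 5, 5, 5],
})

def rating_counts(frame, titles):
    """Count how many users gave each rating to the listed titles."""
    subset = frame[frame["Title"].isin(titles)]
    return subset.groupby("Rating")["UserID"].count()

toy1995 = rating_counts(merged, ["Toy Story (1995)"])
toy_all = rating_counts(merged, ["Toy Story (1995)", "Toy Story 2 (1999)"])
```

Calling `toy1995.plot(kind="pie")` on such a series draws the pie chart used in the cells above, provided matplotlib is installed.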
MovieID
2858 3428
260 2991
1196 2990
1210 2883
480 2672
2028 2653
589 2649
2571 2590
1270 2583
593 2578
1580 2538
1198 2514
608 2513
2762 2459
110 2443
2396 2369
1197 2318
527 2304
1617 2288
1265 2278
1097 2269
2628 2250
2997 2241
318 2227
858 2223
Name: Rating, dtype: int64
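The series above lists 25 `MovieID` values with their rating counts in descending order. One way to produce such a ranking is sketched here, assuming a count per `MovieID` followed by a sort; the sample ratings are made up.

```python
import pandas as pd

ratings = pd.DataFrame({
    "MovieID": [2858, 2858, 2858, 260, 260, 1196],
    "Rating": [5, 4, 5, 4, 5, 3],
})

# Count ratings per movie, sort from most to least rated, keep the top 25.
top_movies = (
    ratings.groupby("MovieID")["Rating"]
    .count()
    .sort_values(ascending=False)
    .head(25)
)
```

Note that this ranks movies by how many ratings they received, not by how high those ratings are.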
Rating
1 2
2 3
3 3
4 11
5 1
Name: Rating, dtype: int64
In [26]: Rating_of_2696.plot(kind='pie')
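The variable name `Rating_of_2696` suggests that the series above holds the rating distribution of the user with `UserID` 2696. The cell that built it is not shown; a minimal sketch, assuming a boolean filter on `UserID` and a count per `Rating`, with made-up sample rows:

```python
import pandas as pd

ratings = pd.DataFrame({
    "UserID": [2696, 2696, 2696, 1],
    "MovieID": [10, 20, 30, 10],
    "Rating": [4, 4, 2, 5],
})

# Keep only this user's rows, then count how often each rating occurs.
one_user = ratings[ratings["UserID"] == 2696]
rating_of_user = one_user.groupby("Rating")["Rating"].count()
```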
Out[28]:
   Action  Action|Adventure  Action|Adventure|Animation  Action|Adventure|Animation|Children's|Fantasy
0       0                 0                           0                                              0
1       0                 0                           0                                              0
2       0                 0                           0                                              0
3       0                 0                           0                                              0
4       0                 0                           0                                              0
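The column names in Out[28] are complete genre strings such as `Action|Adventure`, which is what one-hot encoding the `Genres` column as a whole would give. The encoding cell is missing, so the sketch below assumes `pd.get_dummies`; it also shows `Series.str.get_dummies`, which splits on `|` and yields one column per individual genre instead. The sample rows are illustrative.

```python
import pandas as pd

genres = pd.DataFrame({
    "Genres": ["Animation|Children's|Comedy", "Drama", "Drama|War", "Drama"],
})

# One column per distinct genre combination (as in Out[28]).
combo_dummies = pd.get_dummies(genres["Genres"])

# Alternative: one column per single genre, splitting on the '|' separator.
genre_dummies = genres["Genres"].str.get_dummies(sep="|")
```

The first form grows with the number of distinct combinations in the data, while the second is bounded by the number of individual genres.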
Machine Learning
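# Features: MovieID, Age, Occupation (columns 0, 7, 8); first 500 rows only
features = finalDF.iloc[:500,[0,7,8]]
# Label: Rating (column 4 of the merged frame)
label = finalDF.iloc[:500,4]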
In [30]: features.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 500 entries, 0 to 499
Data columns (total 3 columns):
MovieID 500 non-null int64
Age 500 non-null int64
Occupation 500 non-null int64
dtypes: int64(3)
memory usage: 15.6 KB
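In [31]: #train_test_split (test_size=0.2 is assumed; the original split call is not shown)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
X_train, X_test, y_train, y_test = train_test_split(features, label, test_size=0.2)
model = KNeighborsClassifier(n_neighbors=15)
model.fit(X_train,y_train)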
In [33]: print(model.score(X_train,y_train))
print(model.score(X_test,y_test))
0.4575
0.47
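The two scores are consistent with 400 training rows and 100 test rows (0.4575 = 183/400, 0.47 = 47/100), which would correspond to an 80/20 split of the 500 selected rows. That split ratio is an inference, not something the notebook states. The sketch below reruns the same steps on synthetic data with a fixed `random_state` so the run is repeatable; the generated values are random and will not reproduce the scores above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the first 500 rows of the merged frame.
rng = np.random.RandomState(0)
frame = pd.DataFrame({
    "MovieID": rng.randint(1, 3953, size=500),
    "Age": rng.choice([1, 18, 25, 35, 45, 50, 56], size=500),
    "Occupation": rng.randint(0, 21, size=500),
    "Rating": rng.randint(1, 6, size=500),
})

features = frame[["MovieID", "Age", "Occupation"]]
label = frame["Rating"]

# Assumed 80/20 split; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    features, label, test_size=0.2, random_state=0
)

model = KNeighborsClassifier(n_neighbors=15)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # accuracy on the training rows
test_acc = model.score(X_test, y_test)     # accuracy on the held-out rows
```

`score` returns plain accuracy for a classifier, so with five rating classes a value near 0.47 means slightly fewer than half of the ratings were predicted exactly.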