
Prog Found Final

The document appears to be a programming exercise related to data analysis using Python and pandas, focusing on a dataset of salaries by college type. It includes various problems and statements that require the reader to manipulate and analyze the data, such as loading CSV files, calculating averages, and filtering data. The document contains numerous code snippets and questions aimed at testing understanding of data handling in Python.

Uploaded by

luffyzoroonep365

CS 5000 Final

Name: ____________________

For all the problems below, assume the following statements have already been executed.

    import numpy as np
    import pandas as pd

Ryan is working on a project to analyze salary by college type. The data is stored in a csv file under the same directory as the Python script with the name salaries-by-college-type.csv. The first five rows of the dataset are displayed below.

   School Name                                   School Type  Starting Median Salary  Mid-Career Median Salary  Mid-Career 10th Percentile Salary  Mid-Career 25th Percentile Salary  Mid-Career 75th Percentile Salary  Mid-Career 90th Percentile Salary
0  Massachusetts Institute of Technology (MIT)   Engineering  $72,200.00              $126,000.00               $76,800.00                         $99,200.00                         $168,000.00                        $220,000.00
1  California Institute of Technology (CIT)      Engineering  $75,500.00              $123,000.00               NaN                                $104,000.00                        $161,000.00                        NaN
2  Harvey Mudd College                           Engineering  $71,800.00              $122,000.00               NaN                                $96,000.00                         $180,000.00                        NaN
3  Polytechnic University of New York, Brooklyn  Engineering  $62,400.00              $114,000.00               $66,800.00                         $94,300.00                         $143,000.00                        $190,000.00
4  Cooper Union                                  Engineering  $62,200.00              $114,000.00               NaN                                $80,200.00                         $142,000.00                        NaN

1. Which of the following statements is correct to load the dataset into pandas as a DataFrame with the variable name df?
A. df = pd.read_csv('salaries-by-college-type.csv')
B. df = pd.load_csv('salaries-by-college-type.csv')
C. df = pd.read_csvfile('salaries-by-college-type.csv')
D. df = pd.load_csvfile('salaries-by-college-type.csv')
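As a rough illustration of loading CSV text into a DataFrame, the sketch below calls `pd.read_csv` on an in-memory string so that it runs without the exam's file. The two sample rows and the name `csv_text` are assumptions made for the example only.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for salaries-by-college-type.csv
csv_text = (
    "School Name,School Type,Starting Median Salary\n"
    'Cooper Union,Engineering,"$62,200.00"\n'
    'Harvey Mudd College,Engineering,"$71,800.00"\n'
)

# pd.read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
```

With the real file in the working directory, the file name would be passed in place of the `io.StringIO` object.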
2. To get a list of the column names from the DataFrame obtained in Problem 1, which of the following statements should be executed?
A. df.index
B. df.indexes
C. df.column
D. df.columns

3. Which of the following statements should be used to find how many different school types the dataset has?
A. df['School Type'].unique()
B. df['School Type'].nunique()
C. df['School Type'].sum()
D. df['School Type'].total()
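The short sketch below shows how column labels and distinct values can be inspected. The four-row frame is a hypothetical miniature of the salary table, not the exam's data.

```python
import pandas as pd

# Hypothetical miniature of the salary table
df = pd.DataFrame({
    "School Name": ["A", "B", "C", "D"],
    "School Type": ["Engineering", "Party", "Engineering", "Liberal Arts"],
})

column_names = list(df.columns)           # column labels as a plain list
distinct = df["School Type"].unique()     # array holding each distinct value once
n_distinct = df["School Type"].nunique()  # number of distinct values
print(column_names, n_distinct)
```

Note that `unique()` returns the values themselves, while `nunique()` returns only their count.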
4. Which of the following statements should be used to find out the actual data type of each item in the Mid-Career Median Salary column?
A. type(df['Mid-Career Median Salary'])
B. type(df['Mid-Career Median Salary'].dtype)
C. type(df['Mid-Career Median Salary'].dtypes)
D. type(df['Mid-Career Median Salary'][0])
5. From Problem 4, the type of the data points in the Mid-Career Median Salary column is string. Which of the following statements should be used to remove the $ sign and convert the data points from string to floating point numbers?
A. df['Mid-Career Median Salary'].replace('[^.0-9]', '').astype(float)
B. df['Mid-Career Median Salary'].str.replace('[^.0-9]', '').astype(float)
C. df['Mid-Career Median Salary'].sub('[^.0-9]', '').astype(float)
D. df['Mid-Career Median Salary'].str.sub('[^.0-9]', '').astype(float)
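A minimal sketch of cleaning currency strings, assuming values formatted like those in the table above. The sample Series is made up for illustration. One caveat: recent pandas versions treat the pattern as a literal string unless `regex=True` is passed explicitly.

```python
import pandas as pd

# Hypothetical salary strings in the same format as the dataset
salary = pd.Series(["$126,000.00", "$123,000.00", "$114,000.00"])

# Remove every character that is not a digit or a decimal point, then cast.
# regex=True is required in recent pandas for the pattern to act as a regex.
cleaned = salary.str.replace(r"[^.0-9]", "", regex=True).astype(float)
print(cleaned.tolist())
```

The character class removes both the dollar sign and the thousands separators in one pass.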
6. In order to obtain the information shown below about the dataset, which of the following statements should be executed?

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 269 entries, 0 to 268
Data columns (total 8 columns):
 #   Column                             Non-Null Count  Dtype
---  ------                             --------------  -----
 0   School Name                        269 non-null    object
 1   School Type                        269 non-null    object
 2   Starting Median Salary             269 non-null    object
 3   Mid-Career Median Salary           269 non-null    object
 4   Mid-Career 10th Percentile Salary  231 non-null    object
 5   Mid-Career 25th Percentile Salary  269 non-null    object
 6   Mid-Career 75th Percentile Salary  269 non-null    object
 7   Mid-Career 90th Percentile Salary  231 non-null    object
dtypes: object(8)
memory usage: 16.9+ KB

A. df.dtype
B. df.dtypes
C. df.info
D. df.info()

7. Based on the screenshot shown in Problem 6, which of the following statements is incorrect?
A. The dataset has 269 rows and 8 columns.
B. The information contained in the Dtype column most likely is not accurate.
C. The Mid-Career 90th Percentile Salary column has 30 missing values.
D. Two columns have missing values.
8. Which of the following statements should be used to find the average Mid-Career Median Salary for each college type, respectively?
A. df['School Type']['Mid-Career Median Salary'].mean()
B. df['Mid-Career Median Salary'].mean()
C. df.groupby['School Type']['Mid-Career Median Salary'].mean()
D. df.groupby('School Type')['Mid-Career Median Salary'].mean()
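The sketch below illustrates a per-group mean. It assumes the salary column has already been converted to numbers, and the four rows are invented for the example.

```python
import pandas as pd

# Hypothetical rows; salaries are assumed to be numeric already
df = pd.DataFrame({
    "School Type": ["Engineering", "Engineering", "Party", "Party"],
    "Mid-Career Median Salary": [126000.0, 114000.0, 90000.0, 80000.0],
})

# groupby is a method: call it with the key, select the column, then aggregate
by_type = df.groupby("School Type")["Mid-Career Median Salary"].mean()
print(by_type)
```

The result is a Series indexed by school type, with one mean per group.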
9. Which of the following statements should be used to find out which university has the highest Mid-Career Median Salary?
A. df.sort_values(by = 'Mid-Career Median Salary', ascending = False)
B. df.sort_values(by = 'Mid-Career Median Salary', ascending = False).iloc[0]
C. df.sort_values(by = 'Mid-Career Median Salary', ascending = True)
D. df.sort_values(by = 'Mid-Career Median Salary', ascending = True).iloc[0]
10. Which of the following statements should be used to find out the total number of universities whose Mid-Career Median Salary is above $100,000?
A. df[df['Mid-Career Median Salary'] > 100000]['School Name'].count()
B. df[df['Mid-Career Median Salary'] > 100000]['School Name'].sum()
C. df[df['Mid-Career Median Salary'] > 100000]['School Name'].size()
D. df[df['Mid-Career Median Salary'] > 100000].sum()
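Sorting and boolean filtering, as used in Problems 9 and 10, can be sketched as follows. The three schools and their numeric salaries are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows with numeric salaries
df = pd.DataFrame({
    "School Name": ["A", "B", "C"],
    "Mid-Career Median Salary": [126000.0, 95000.0, 114000.0],
})

# Sort descending and take the first row to get the top school
top = df.sort_values(by="Mid-Career Median Salary", ascending=False).iloc[0]

# A boolean mask keeps matching rows; count() tallies the non-null names
above = df[df["Mid-Career Median Salary"] > 100000]["School Name"].count()
print(top["School Name"], above)
```

Here `size` would be an attribute rather than a method on the filtered Series, so calling it with parentheses raises an error.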
11. Which of the following statements should be used to find out the total number of university names which contain the word 'State'?
A. df['School Name'].str.contains('State').count()
B. df['School Name'].str.contains('State').sum()
C. df['School Name'].contains('State').count()
D. df['School Name'].contains('State').sum()
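A small sketch of substring matching on a string column, using three made-up school names. It contrasts `sum()` and `count()` on the resulting boolean mask.

```python
import pandas as pd

# Hypothetical school names
names = pd.Series(["Ohio State University", "Cooper Union", "Penn State"])

mask = names.str.contains("State")  # True where the name contains 'State'
n_matches = mask.sum()              # True counts as 1, so this tallies matches
n_rows = mask.count()               # number of non-null entries, matches or not
print(n_matches, n_rows)
```

String methods live under the `.str` accessor; a Series has no `contains` method of its own.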

12. For the DataFrame df shown in the left below, which of the following statements should be used to obtain the result shown in the right?

df (left):

subject      Bob         Guido       Sue
type         HR    Temp  HR    Temp  HR    Temp
year visit
2013 1       47.0  39.1  29.0  36.3  63.0  34.8
     2       49.0  38.4  51.0  37.0  27.0  39.1
     3       34.0  36.9  40.0  37.4  33.0  36.3
2014 1       21.0  39.0  28.0  37.0  56.0  38.3
     2       30.0  36.7  36.0  37.7  38.0  36.7
     3       35.0  37.1  41.0  36.6  38.0  37.8
2015 1       37.0  37.5  46.0  37.8  33.0  37.8
     2       44.0  38.2  54.0  37.7  31.0  36.8
     3       46.0  37.8  40.0  37.2  35.0  36.1
2016 1       34.0  35.8  29.0  35.1  31.0  35.1
     2       50.0  37.5  39.0  36.3  47.0  35.1
     3       [?]3.0  38.2  25.0  37.1  41.0  36.4

Result (right):

subject      Bob   Guido  Sue
type         HR    HR     HR
year visit
2013 1       47.0  29.0   63.0
2014 1       21.0  28.0   56.0
2015 1       37.0  46.0   33.0
2016 1       34.0  29.0   31.0

A. df.loc[(slice(None), 1), (slice(None), 'HR')]
B. df.loc[(:, 1), (:, 'HR')]
C. df.loc[[:, 1], [:, 'HR']]
D. df.loc[1, 'HR']
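Selecting on the inner level of a MultiIndex can be sketched as below. The frame is a reduced, hypothetical version of the health table (two years, two visits, two subjects) filled with placeholder numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical, smaller frame with the same index and column structure
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=["year", "visit"])
columns = pd.MultiIndex.from_product([["Bob", "Sue"], ["HR", "Temp"]],
                                     names=["subject", "type"])
health = pd.DataFrame(np.arange(16, dtype=float).reshape(4, 4),
                      index=index, columns=columns)

# Inside a tuple, slice(None) stands in for ':' (a bare ':' is a syntax error there)
first_visit_hr = health.loc[(slice(None), 1), (slice(None), "HR")]

# pd.IndexSlice offers the same selection with the familiar ':' notation
idx = pd.IndexSlice
same = health.loc[idx[:, 1], idx[:, "HR"]]
print(first_visit_hr)
```

Both forms rely on the index being sorted, which `from_product` guarantees here.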

14. For the DataFrame df shown in the left below, which of the following statements should be used to obtain the result shown in the right?

df (left):

     0    1  2   3
0  1.0  NaN  2 NaN
1  2.0  3.0  5 NaN
2  NaN  4.0  6 NaN

Result (right):

     0    1  2   3
1  2.0  3.0  5 NaN

A. df.dropna(thresh = 1)
B. df.dropna(thresh = 2)
C. df.dropna(thresh = 3)
D. None of the above
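To see how the `thresh` parameter behaves, the sketch below rebuilds the frame from the question and records which rows survive for each threshold. The dictionary name `kept` is an assumption of the example.

```python
import numpy as np
import pandas as pd

# The frame from the question above
df = pd.DataFrame([[1, np.nan, 2, np.nan],
                   [2, 3, 5, np.nan],
                   [np.nan, 4, 6, np.nan]])

# thresh=n keeps only the rows that have at least n non-null values
kept = {t: df.dropna(thresh=t).index.tolist() for t in (1, 2, 3)}
print(kept)
```

Rows 0 and 2 each hold two non-null values and row 1 holds three, which explains the different outcomes.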
15. For the DataFrame df shown in the left below, which of the following statements should be used to obtain the result shown in the right?

df (left):

     0    1  2   3
0  1.0  NaN  2 NaN
1  2.0  3.0  5 NaN
2  NaN  4.0  6 NaN

Result (right):

     0    1  2    3
0  1.0  0.3  2  0.0
1  2.0  3.0  5  0.0
2  NaN  4.0  6  0.0

A. df.fillna(method = 'ffill')
B. df.fillna(method = 'bfill')
C. df.fillna(0)
D. df.fillna({1: 0.3, 3: 0})
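The sketch below contrasts per-column fill values with a forward fill on the same frame. It uses `df.ffill()` for the forward fill because the `method=` keyword of `fillna` is deprecated in newer pandas releases.

```python
import numpy as np
import pandas as pd

# The frame from the question above
df = pd.DataFrame([[1, np.nan, 2, np.nan],
                   [2, 3, 5, np.nan],
                   [np.nan, 4, 6, np.nan]])

# A dict maps column labels to fill values; unlisted columns are left alone
filled = df.fillna({1: 0.3, 3: 0})

# Forward fill copies the last valid value down each column
forward = df.ffill()
print(filled)
```

Column 0 still contains its NaN after the dict-based fill, since it has no entry in the mapping.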
16. For the DataFrame df shown in the left below, which of the following statements should be used to obtain the result shown in the right?

df (left):

item                            realgdp  infl  unemp
date
1959-03-31 23:59:59.999999999  2710.349  0.00    5.8
1959-06-30 23:59:59.999999999  2778.801  2.34    5.1
1959-09-30 23:59:59.999999999  2775.488  2.74    5.3
1959-12-31 23:59:59.999999999  2785.204  0.27    5.6
1960-03-31 23:59:59.999999999  2847.699  2.31    5.2

Result (right):

date                           item
1959-03-31 23:59:59.999999999  realgdp    2710.349
                               infl          0.000
                               unemp         5.800
1959-06-30 23:59:59.999999999  realgdp    2778.801
                               infl          2.340

A. df.unstack()
B. df.stack()
C. df.reindex()
D. df.set_index()
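A rough sketch of pivoting columns into an inner index level, using the first two rows of the table above. The dates are shortened to plain strings, and the names `wide` and `long` are assumptions of the example.

```python
import pandas as pd

# First two rows of the table above, with dates shortened to strings
wide = pd.DataFrame(
    {"realgdp": [2710.349, 2778.801], "infl": [0.00, 2.34], "unemp": [5.8, 5.1]},
    index=pd.Index(["1959-03-31", "1959-06-30"], name="date"),
)
wide.columns.name = "item"

# stack() moves the column labels into the innermost index level
long = wide.stack()
print(long)
```

The result is a Series whose index has two levels, `date` and `item`.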
17. For the DataFrame df shown in the left below, which of the following statements should be used to obtain the result shown in the right?

df (left):

                            date     item     value    value2
0  1959-03-31 23:59:59.999999999  realgdp  2710.349  1.057270
1  1959-03-31 23:59:59.999999999     infl     0.000 -0.014329
2  1959-03-31 23:59:59.999999999    unemp     5.800 -0.868826
3  1959-06-30 23:59:59.999999999  realgdp  2778.801  1.090907
4  1959-06-30 23:59:59.999999999     infl     2.340 -0.009227

Result (right):

                                           value    value2
date                           item
1959-03-31 23:59:59.999999999  realgdp  2710.349  1.057270
                               infl        0.000 -0.014329
                               unemp       5.800 -0.868826
1959-06-30 23:59:59.999999999  realgdp  2778.801  1.090907
                               infl        2.340 -0.009227

A. df.set_index(['date', 'item'])
B. df.reindex(['date', 'item'])
C. df.reset_index(['date', 'item'])
D. None of the above
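Moving columns into the index and back can be sketched as follows. The three-row frame is a hypothetical cut-down of the table above.

```python
import pandas as pd

# Hypothetical cut-down of the long-format table
long = pd.DataFrame({
    "date": ["1959-03-31", "1959-03-31", "1959-06-30"],
    "item": ["realgdp", "infl", "realgdp"],
    "value": [2710.349, 0.0, 2778.801],
})

# set_index turns the listed columns into a hierarchical row index
indexed = long.set_index(["date", "item"])

# reset_index reverses the operation and restores the default integer index
restored = indexed.reset_index()
print(indexed)
```

By contrast, `reindex` only conforms existing row labels to a new set of labels and does not promote columns.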

18. Unsupervised machine learning uses ___ algorithms.

[The answer options for Problem 18 are illegible in the scan.]

19. For the DataFrames df5 and df6 shown in the left below, which of the following statements should be used to obtain the result shown in the right?

df5:

    A   B   C
1  A1  B1  C1
2  A2  B2  C2

df6:

    B   C   D
3  B3  C3  D3
4  B4  C4  D4

Result (right):

    B   C
1  B1  C1
2  B2  C2
3  B3  C3
4  B4  C4

A. pd.concat([df5, df6], join = 'inner')
B. pd.concat([df5, df6], join = 'outer')
C. pd.concat([df5, df6])
D. df5.append(df6)
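The effect of the `join` argument can be checked with the two frames from Problem 19. This sketch rebuilds them by hand and compares the inner and default (outer) joins.

```python
import pandas as pd

# The two frames from Problem 19
df5 = pd.DataFrame({"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]},
                   index=[1, 2])
df6 = pd.DataFrame({"B": ["B3", "B4"], "C": ["C3", "C4"], "D": ["D3", "D4"]},
                   index=[3, 4])

# join='inner' keeps only the columns present in both inputs
inner = pd.concat([df5, df6], join="inner")

# The default, join='outer', keeps every column and fills gaps with NaN
outer = pd.concat([df5, df6])
print(inner)
```

`DataFrame.append` was removed in pandas 2.0, so code written for current releases generally uses `pd.concat` instead.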
20. For the DataFrames df1 and df3 shown in the left below, which of the following statements should be used to obtain the result shown in the right?

df1:

  employee        group
0      Bob   Accounting
1     Jake  Engineering
2     Lisa  Engineering
3      Sue           HR

df3:

   name  salary
0   Bob   70000
1  Jake   80000
2  Lisa  120000
3   Sue   90000

Result (right):

  employee        group  salary
0      Bob   Accounting   70000
1     Jake  Engineering   80000
2     Lisa  Engineering  120000
3      Sue           HR   90000

A. pd.merge(df1, df3).drop('name', axis = 0)
B. pd.merge(df1, df3).drop('name', axis = 1)
C. pd.merge(df1, df3, left_on = "employee", right_on = "name").drop('name', axis = 1)
D. pd.merge(df1, df3, left_on = "employee", right_on = "name").drop('name', axis = 0)
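Merging on differently named key columns can be sketched with the two frames from Problem 20. The intermediate name `merged` is an assumption of the example.

```python
import pandas as pd

# The two frames from Problem 20
df1 = pd.DataFrame({"employee": ["Bob", "Jake", "Lisa", "Sue"],
                    "group": ["Accounting", "Engineering", "Engineering", "HR"]})
df3 = pd.DataFrame({"name": ["Bob", "Jake", "Lisa", "Sue"],
                    "salary": [70000, 80000, 120000, 90000]})

# left_on / right_on name the key column on each side when the labels differ
merged = pd.merge(df1, df3, left_on="employee", right_on="name")

# Both key columns survive the merge; axis=1 drops the redundant column
result = merged.drop("name", axis=1)
print(result)
```

Without `left_on` and `right_on`, `pd.merge` looks for shared column names, and these two frames have none.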
21. For the DataFrames df1a and df3 shown in the left below, which of the following statements should be used to obtain the result shown in the right?

df1a:

                group
employee
Bob        Accounting
Jake      Engineering
Lisa      Engineering
Sue                HR

df3:

   name  salary
0   Bob   70000
1  Jake   80000
2  Lisa  120000
3   Sue   90000

Result (right):

         group  name  salary
0   Accounting   Bob   70000
1  Engineering  Jake   80000
2  Engineering  Lisa  120000
3           HR   Sue   90000

A. pd.merge(df1a, df3, left_index = 'employee', right_on = 'name')
B. pd.merge(df1a, df3, left_index = True, right_on = 'name')
C. pd.merge(df1a, df3, left_on = 'employee', right_on = 'name')
D. pd.merge(df1a, df3, left_on = True, right_on = 'name')
22. Which of the following statements is false?
A. The k-nearest neighbors algorithm attempts to predict a test sample's class by looking at the k training samples that are nearest (in distance) to the test sample.
B. Always pick an even value of k for the k-nearest neighbors algorithm.
C. Scikit-learn supports many classification algorithms, including the simplest: k-nearest neighbors (k-NN).
D. In the k-nearest neighbors algorithm, the class with the most "votes" wins.
23. Consider the confusion matrix for the Digits dataset's predictions:

array([[45,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 45,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0, 54,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0, 42,  0,  1,  0,  1,  0,  0],
       [ 0,  0,  0,  0, 49,  0,  0,  1,  0,  0],
       [ 0,  0,  0,  0,  0, 38,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 42,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0, 45,  0,  0],
       [ 0,  1,  1,  2,  0,  0,  0,  0, 39,  1],
       [ 0,  0,  0,  0,  1,  0,  0,  0,  1, 41]])

Which of the following statements is false?
A. The columns within a row specify how many of the test samples were classified incorrectly into each distinct class 0-9.
B. The nonzero values that are not on the principal diagonal indicate incorrect predictions (that is, misses).
C. Each row represents one distinct class, that is, one of the digits 0-9.
D. The correct predictions are shown on the diagonal from top-left to bottom-right; this is called the principal diagonal.

24. The sklearn.metrics module's classification_report function produces a table of classification metrics based on the expected and predicted values for the Digits dataset's predictions, as in the figure below.

    from sklearn.metrics import classification_report
    names = [str(digit) for digit in digits.target_names]
    print(classification_report(expected, predicted,
          target_names=names))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        45
           1       0.98      1.00      0.99        45
           2       0.98      1.00      0.99        54
           3       0.95      0.95      0.95        44
           4       0.98      0.98      0.98        50
           5       0.97      1.00      0.99        38
           6       1.00      1.00      1.00        42
           7       0.96      1.00      0.98        45
           8       0.97      0.89      0.93        44
           9       0.98      0.95      0.96        43

Which of the following statements about the report is false?
A. The precision column shows the total number of correct predictions for a given digit divided by the total number of predictions for that digit. You can confirm the precision by looking at each column in the confusion matrix.
B. The f1-score column is the average of the precision and recall, and the support column is the number of samples with a given expected value. For example, 50 samples were labeled as 4s, and 38 samples were labeled as 5s.
C. The recall column is the total number of correct predictions for a given digit divided by the total number of samples that should have been predicted as that digit. You can confirm the recall by looking at each row in the confusion matrix.
D. Only statements A and B are correct.
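The report's columns can be cross-checked against the confusion matrix from Problem 23 with plain NumPy. The sketch below assumes the usual definitions (precision from column totals, recall from row totals, f1 as the harmonic mean of the two). The variable names are illustrative.

```python
import numpy as np

# Confusion matrix from Problem 23 (rows: expected digit, columns: predicted digit)
confusion = np.array([
    [45, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 45, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 54, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 42, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 49, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 38, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 42, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 45, 0, 0],
    [0, 1, 1, 2, 0, 0, 0, 0, 39, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 1, 41],
])

correct = np.diag(confusion)                  # correct predictions per digit
precision = correct / confusion.sum(axis=0)   # divide by column totals
recall = correct / confusion.sum(axis=1)      # divide by row totals
support = confusion.sum(axis=1)               # samples per expected digit
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
print(np.round(precision, 2), np.round(recall, 2))
```

For example, digit 8 has 39 correct predictions out of 44 expected samples, which gives the 0.89 recall shown in the report.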

25. Consider the following code and output for the Digits dataset's predictions:

    for k in range(1, 20, 2):
        kfold = KFold(n_splits=10, random_state=11, shuffle=True)
        knn = KNeighborsClassifier(n_neighbors=k)
        scores = cross_val_score(estimator=knn,
            X=digits.data, y=digits.target, cv=kfold)
        print(f'k={k:<2}; mean accuracy={scores.mean():.2%}; ' +
              f'standard deviation={scores.std():.2%}')

    k=1 ; mean accuracy=98.83%; standard deviation=0.58%
    k=3 ; mean accuracy=98.78%; standard deviation=0.78%
    k=5 ; mean accuracy=98.72%; standard deviation=0.75%
    k=7 ; mean accuracy=98.44%; standard deviation=0.96%
    k=9 ; mean accuracy=98.33%; standard deviation=0.80%
    k=11; mean accuracy=98.39%; standard deviation=0.80%
    k=13; mean accuracy=97.89%; standard deviation=0.89%
    k=15; mean accuracy=97.89%; standard deviation=1.02%
    k=17; mean accuracy=97.50%; standard deviation=1.00%
    k=19; mean accuracy=97.66%; standard deviation=0.96%

Which of the following statements is false?

A. The k value 7 in kNN produces the most accurate predictions for the Digits dataset.
B. The loop creates KNeighborsClassifiers with odd k values from 1 through 19 and performs k-fold cross-validation on each.
C. The accuracy tends to decrease for higher k values.
D. Compute time grows with k, because k-NN needs to perform many more calculations to find the nearest neighbors.

26. Which of the following statements is false?

A. The machine-learning parameters which the estimator calculates as it learns from the data are called hyperparameters; in the k-nearest neighbors algorithm, k is a hyperparameter.
B. There are two parameter types in machine learning: those the estimator calculates as it learns from the data you provide and those you specify in advance when you create the scikit-learn estimator object that represents the model.
C. In machine learning, a model implements a machine-learning algorithm. In scikit-learn, models are called estimators.
D. For simplicity, we use scikit-learn's default hyperparameter values. In real-world machine-learning studies, you'll want to experiment with different values of k to produce the best possible models for your studies. This process is called hyperparameter tuning.

27. Anusha is working on a project to analyze movie ratings. The first five rows of the dataset are displayed below. Assume the DataFrame is referred to by the variable data.

   user_id  movie_id  rating  timestamp            gender  age       occupation            zip    title                                   genre
0  1        1193      5       2000-12-31 22:12:40  F       Under 18  K-12 student          48067  One Flew Over the Cuckoo's Nest (1975)  Drama
1  2        1193      5       2000-12-31 21:33:33  M       56+       self-employed         70072  One Flew Over the Cuckoo's Nest (1975)  Drama
2  12       1193      4       2000-12-30 23:49:39  M       25-34     programmer            32793  One Flew Over the Cuckoo's Nest (1975)  Drama
3  15       1193      4       2000-12-30 18:01:19  M       25-34     executive/managerial  22903  One Flew Over the Cuckoo's Nest (1975)  Drama
4  17       1193      5       2000-12-30 06:41:11  M       50-55     academic/educator     95350  One Flew Over the Cuckoo's Nest (1975)  Drama

In order to display users' age distribution like the one shown below, which statement should be used?

25-34       395556
35-44       199003
18-24       183536
45-49        83633
50-55        72490
56+          38780
Under 18     27211
Name: age, dtype: int64

A. data['age'].describe()
B. data['age'].count()
C. data['age'].size()
D. data['age'].value_counts()
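A minimal sketch of tallying categorical values, using six made-up age labels rather than the exam's data.

```python
import pandas as pd

# Hypothetical age-group labels
ages = pd.Series(["25-34", "18-24", "25-34", "56+", "25-34", "18-24"], name="age")

# value_counts tallies each distinct value, most frequent first
counts = ages.value_counts()
print(counts)
```

By comparison, `count()` returns a single number (the non-null entries) and `describe()` returns summary statistics.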
28. In order to obtain the figure below for the rating's distribution, which statement should be used? The x-axis represents the rating and the y-axis represents the total number of ratings.

[Figure: distribution of ratings; x-axis: rating, y-axis: total number of ratings, with ticks from 0 to 350000]

A. data['rating'].plot.box()
B. data['rating'].value_counts().plot.box()
C. data['rating'].value_counts().plot.bar()
D. data['rating'].plot.bar()
29. To get average movie ratings for each film grouped by gender like the one shown below, which statement should be used?

gender                                F         M
title
$1,000,000 Duck (1971)         3.375000  2.761905
'Night Mother (1986)           3.388889  3.352941
'Til There Was You (1997)      2.675676  2.733333
'burbs, The (1989)             2.793478  2.962085
...And Justice for All (1979)  3.828571  3.689024

A. data.pivot_table(index = 'title', columns = 'gender', aggfunc = 'mean')
B. data.pivot_table('rating', index = 'title', columns = 'gender', aggfunc = 'mean')
C. data.pivot_table('rating', index = 'gender', columns = 'title', aggfunc = 'mean')
D. data.pivot_table(index = 'gender', columns = 'title', aggfunc = 'mean')
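The sketch below shows how `pivot_table` arranges one value column by row and column keys. The five ratings and the film titles are invented for illustration.

```python
import pandas as pd

# Hypothetical ratings
data = pd.DataFrame({
    "title": ["Film A", "Film A", "Film A", "Film B", "Film B"],
    "gender": ["F", "M", "M", "F", "M"],
    "rating": [5, 3, 4, 2, 4],
})

# First argument: the values to aggregate; index gives rows, columns gives columns
mean_ratings = data.pivot_table("rating", index="title", columns="gender",
                                aggfunc="mean")

# The pivoted frame can then be ordered by any of its columns
ranked = mean_ratings.sort_values(by="F", ascending=False)
print(mean_ratings)
```

Swapping `index` and `columns` would transpose the table, putting one row per gender and one column per title.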
30. To get the top 6 movies by age 18-24 users like the one shown below, which statement should you use?

age                                                             18-24     25-34     35-44     45-49  50-55       56+  Under 18
title
I Am Cuba (Soy Cuba/Ya Kuba) (1964)                               5.0  4.666667       NaN  5.000000    NaN       NaN       NaN
Saragossa Manuscript, The (Rekopis znaleziony w Saragossie) (1965)  5.0  2.800000  2.666667  4.500000    NaN       NaN       NaN
Arguing the World (1996)                                          5.0  4.200000  4.000000       NaN    2.5  4.000000       NaN
Under the Rainbow (1981)                                          5.0  2.314286  2.181818  2.750000    3.0  1.666667       2.0
City, The (1998)                                                  5.0  3.200000  3.000000  3.500000    NaN  4.000000       NaN
Twice Upon a Yesterday (1998)                                     5.0  3.500000  3.666667  3.333333    NaN  1.000000       NaN

A. data.pivot_table('rating', index = 'title', columns = 'age', aggfunc = 'mean').sort_values(by = '18-24', ascending = False)[:6]
B. data.pivot_table('rating', index = 'title', columns = 'age', aggfunc = 'mean').sort_values(by = '18-24')[:6]
C. data.pivot_table('rating', index = 'age', columns = 'title', aggfunc = 'mean').sort_values(by = '18-24')[:6]
D. data.pivot_table('rating', index = 'age', columns = 'title', aggfunc = 'mean').sort_values(by = '18-24', ascending = False)[:6]
