Python for Data Science
Q: What are the advantages of a NumPy array vs a regular Python list?
A: 1.) Can perform calculations over entire arrays
   2.) Easy and fast
Q: How are calculations performed on elements of a NumPy array?
A: Element-wise
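
As a minimal sketch of the two cards above, the snippet below contrasts element-wise array arithmetic with list behavior. The height and weight values are made up for illustration.

```python
import numpy as np

# Hypothetical sample data (metres and kilograms); not from the flashcards.
heights = [1.73, 1.68, 1.71]
weights = [65.4, 59.2, 63.6]

np_heights = np.array(heights)
np_weights = np.array(weights)

# Arithmetic on arrays is applied element by element over the whole array.
bmi = np_weights / np_heights ** 2
doubled = np_heights * 2

# The same operator on plain lists does something else: + concatenates.
joined = heights + weights
```
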
Q: What is an easy way to subset a NumPy array by using logical operators?
A: 1.) Create an array of booleans by defining the logical statement and assigning it to an alias (a variable)
   2.) Apply square brackets to the array, with the alias from step 1 inside them
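
The two steps can be sketched as follows. The array values and the 21.5 threshold are assumptions chosen only to make the example concrete.

```python
import numpy as np

# Hypothetical BMI values for illustration.
bmi = np.array([21.85, 20.97, 21.75, 24.74, 21.44])

# Step 1: the logical statement yields a boolean array, stored under a name.
is_high = bmi > 21.5

# Step 2: put that boolean array inside square brackets to subset.
high_bmi = bmi[is_high]
```
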
Q: How can you create an N-dimensional NumPy array?
A: Create a list of lists and pass it to np.array(); each inner list can be viewed as a row in the N-dim NumPy array
Q: What are the two ways that you can select an individual value in an N-dim NumPy array? Select the third element from the first row of the 2-d NumPy array, np_2d.
A: np_2d[0][2]
   np_2d[0, 2]
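Q: Select the second and third element from the first and second row of the 2-d NumPy array, np_2d.
A: np_2d[:, 1:3] (this works when np_2d has exactly two rows; for a taller array, np_2d[0:2, 1:3] limits the result to the first two rows)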
Q: Select all columns from the second row of the 2-d NumPy array, np_2d.
A: np_2d[1, :]
Q: When subsetting an N-dim NumPy array, what does the : tell Python?
A: [:, 1]: the colon tells Python to include all rows, so this selects the elements in the second column
   [1, :]: the colon tells Python to include all columns, so this selects the elements in the second row (index 1)
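
The indexing cards above can be tried on a small array. This is a sketch with invented values; np_2d here has two rows and five columns.

```python
import numpy as np

# Hypothetical 2-D array: row 0 holds heights, row 1 holds weights.
np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

# Two equivalent ways to pick the third element of the first row.
single_a = np_2d[0][2]
single_b = np_2d[0, 2]

# Second and third columns from every row (both rows of this array).
block = np_2d[:, 1:3]

# All columns of the second row, and all rows of the second column.
second_row = np_2d[1, :]
second_col = np_2d[:, 1]
```
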
Q: If you wanted 5,000 random values drawn from a normal distribution with mean = 1.75, sd = 0.20, rounded to 2 decimals, how would you do it in NumPy?
A: x = np.round(np.random.normal(1.75, 0.20, 5000), 2)
Python for Data Science
Study this set online at: https://www.cram.com/flashcards/python-for-data-science-10501605
Q: What function allows you to "glue together" two 1-D NumPy arrays to create a 2-D NumPy array?
A: np.column_stack((array1, array2))
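
A short sketch combining the random-draw card with np.column_stack. The seed and the weight parameters (mean 60.32, sd 15) are assumptions added for the example; only the height parameters come from the card.

```python
import numpy as np

np.random.seed(123)  # arbitrary seed so the sketch is repeatable

# 5,000 draws from a normal distribution, rounded to 2 decimals.
height = np.round(np.random.normal(1.75, 0.20, 5000), 2)
# Second 1-D array with assumed parameters, purely for illustration.
weight = np.round(np.random.normal(60.32, 15, 5000), 2)

# Glue the two 1-D arrays together as the columns of one 2-D array.
np_city = np.column_stack((height, weight))
```
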
Q: When reading csv's into Pandas, what keyword is used to denote that column headers are not the first row of the csv?
A: header=None
Q: When reading csv's into Pandas, what keyword is used to pass explicit column names into the dataframe?
A: names=
Q: When reading in a csv into Pandas, we learn that missing values are coded as -1. How would you specify an argument to notify Pandas of this?
A: na_values='-1'
Q: When reading in a csv into Pandas, what keyword would be used to infer the date columns if the date columns are 0, 1, 2?
A: parse_dates=[[0, 1, 2]] ... notice that the column references must be passed as a list of lists
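
The header, names and na_values keywords can be combined in one pd.read_csv call. The sketch below reads from an in-memory string instead of a file; the data and the column names are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical csv with no header row; -1 marks a missing value.
raw = io.StringIO("1,10.5,-1\n2,-1,7.0\n3,12.0,8.5\n")

df = pd.read_csv(
    raw,
    header=None,                   # the first row is data, not column names
    names=["id", "temp", "wind"],  # explicit column names (assumed)
    na_values="-1",                # treat -1 as NaN
)
```
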
Q: What method could be used to infer basic information about the data possessed in the Pandas dataframe, df?
A: df.info()
Q: What is an advantage of inferring the date columns when reading a csv into Pandas?
A: The resultant column is of data type datetime64, which is invaluable in conducting many time-based calculations
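
A minimal sketch of what the datetime64 type buys you. To keep it short it parses a single date column by name rather than combining three columns as in the card above; the data is invented.

```python
import io

import pandas as pd

# Hypothetical price data with one date column.
raw = io.StringIO("date,close\n2015-01-02,109.33\n2015-01-05,106.25\n")

df = pd.read_csv(raw, parse_dates=["date"])

# Time-based calculations become available on the parsed column.
years = df["date"].dt.year
gap_days = (df["date"].iloc[1] - df["date"].iloc[0]).days

# df.info() prints the dtypes, including the datetime64 column.
df.info()
```
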
Q: What attribute is used to explicitly set the index values of the dataframe, df?
A: df.index =
Q: What attribute is used to explicitly set the name of the index column of the dataframe, df?
A: df.index.name =
Q: If explicitly renaming column headers when reading a csv into Pandas, what keyword other than names must also be set?
A: header=0
Q: When reading a csv into Pandas, which keyword can be specified to identify symbols that should indicate a value is a comment?
A: comment=
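
The sketch below puts the renaming, comment and index cards together. The file contents, the new column names and the index labels are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical csv: one comment line, an existing header row, two data rows.
raw = io.StringIO("# exported by some tool\nold_a,old_b\n1,2\n3,4\n")

df = pd.read_csv(
    raw,
    comment="#",       # lines starting with # are ignored
    header=0,          # the existing header row is replaced ...
    names=["a", "b"],  # ... by these explicit names
)

# Set the index values and the index name explicitly.
df.index = ["x", "y"]
df.index.name = "label"
```
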
Q: How could you select a column from a Pandas dataframe so that it can be saved to a variable as a NumPy array?
A: Use the .values attribute of the column, e.g. df['col'].values
Q: In PyPlot visualizations, what command must be called to make a plot visible?
A: plt.show()
Q: How would you plot all columns in the Pandas dataframe, df, simultaneously on the same plot?
A: df.plot()
   plt.show()
   (less than ideal method: plt.plot(df))
Q: What function would we use to set the y-scale to a log scale?
A: plt.yscale('log')
Q: What method is used to save a matplotlib figure? ex: save a figure with the name 'aapl.jpg'
A: plt.savefig('aapl.jpg')
   savefig can infer the file type from the suffix in the file name
Q: If plotting a multi-column Pandas dataframe with each column on its own subplot, what keyword needs to be specified inside the .plot() call?
A: subplots=True
Q: In order to create a scatterplot, what keyword and value must be specified in the .plot() call?
A: kind='scatter'
Q: In order to create a boxplot, what keyword and value must be specified in the .plot() call?
A: kind='box'
Q: In order to create a histogram, what keyword and value must be specified in the .plot() call?
A: kind='hist'
Q: What are the different re-tooling options for histograms?
A: bins (int): # of intervals or bins
   range (tuple): extrema of bins (minimum, maximum)
   normed (bool): whether to normalize to one (current Matplotlib releases call this keyword density)
   cumulative (bool): compute cumulative distribution function (CDF)
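
A hedged sketch of a histogram using these options through the pandas .plot() interface. It assumes a recent Matplotlib, so it uses density rather than normed, selects the non-interactive Agg backend so it runs without a display, and saves to an in-memory buffer instead of a file. The data is invented.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for scripted runs
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values for illustration.
df = pd.DataFrame({"units": [1, 2, 2, 3, 5, 7, 8, 9]})

ax = df["units"].plot(
    kind="hist",
    bins=5,           # number of intervals
    range=(0, 10),    # extrema of the bins
    density=True,     # normalize to one
    cumulative=True,  # draw the cumulative distribution
)

# savefig normally infers the type from a file name suffix; with a buffer
# the format is given explicitly.
buf = io.BytesIO()
plt.savefig(buf, format="png")
plt.close("all")
```

In an interactive session, plt.show() would be called instead of saving to a buffer.
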
Q: What method will give the descriptive statistics (count, mean, std, min, 25%, 50%, 75%, max) for continuous variables in a Pandas dataframe?
A: .describe()
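
For example, on a small dataframe with invented values (one of them missing):

```python
import pandas as pd

# Hypothetical data; None becomes NaN and is excluded from the count.
df = pd.DataFrame({
    "eggs": [47, 110, 221, 77, 132, 205],
    "salt": [12.0, 50.0, 89.0, 87.0, None, 60.0],
})

# One row per statistic, one column per numeric variable.
stats = df.describe()
```
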
Q: What is the primary difference between using the .loc and the .iloc accessor in Pandas?
A: .loc - uses labels
   .iloc - uses index positions
Q: How can a Pandas dataframe be described in the context of Pandas series?
A: A Pandas dataframe is a labeled 2-D array, with series for columns, sharing common column/row labels
Q: What is the primary difference between slicing with label ranges vs positional ranges?
A: Positional ranges exclude the end point; label ranges include it
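
The difference between the accessors, and between the two kinds of slice, shows up on a small labeled dataframe. The month labels and values below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical dataframe with string row labels.
df = pd.DataFrame(
    {"eggs": [47, 110, 221], "salt": [12, 50, 89], "spam": [17, 31, 72]},
    index=["jan", "feb", "mar"],
)

# .loc uses labels; .iloc uses integer positions.
by_label = df.loc["feb", "salt"]
by_position = df.iloc[1, 1]

# Label slices include the end label: rows jan..mar, columns eggs..salt.
label_slice = df.loc["jan":"mar", "eggs":"salt"]

# Positional slices stop before the end position: rows 0-1, column 0.
pos_slice = df.iloc[0:2, 0:1]
```
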
Q: What is the difference between the two following commands executed on the Pandas dataframe, df?
   df['eggs']
   df[['eggs']]
A: df['eggs'] - returns a series
   df[['eggs']] - returns a Pandas dataframe
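
A quick check of the returned types, using an invented dataframe. The last line also pulls a column out as a NumPy array through its .values attribute.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"eggs": [47, 110, 221], "salt": [12, 50, 89]})

as_series = df["eggs"]    # single brackets: a Series
as_frame = df[["eggs"]]   # double brackets: a one-column DataFrame

# The underlying values of a column as a NumPy array.
as_array = df["eggs"].values
```
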
Q: What syntax would be used to select columns with all nonzero values? (dataframe = df)
A: df.loc[:, df.all()]
Q: What syntax would be used to select columns with any nonzero values? (dataframe = df)
A: df.loc[:, df.any()]
Q: What syntax would be used to select columns with any NaN values? (dataframe = df)
A: df.loc[:, df.isnull().any()]
Q: What syntax would be used to select columns with no NaN values? (dataframe = df)
A: df.loc[:, df.notnull().all()]
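
The four column filters can be compared side by side on one dataframe. The values are invented so that each filter keeps a different set of columns. Note that df.all() skips NaN by default, so a column containing NaN can still count as "all nonzero".

```python
import numpy as np
import pandas as pd

# Hypothetical data: zeros in some columns, one NaN in "salt".
df = pd.DataFrame({
    "eggs": [47, 110, 0],
    "salt": [12.0, np.nan, 89.0],
    "spam": [17, 31, 72],
    "zeros": [0, 0, 0],
})

all_nonzero = df.loc[:, df.all()]           # every value nonzero
any_nonzero = df.loc[:, df.any()]           # at least one nonzero value
any_nan = df.loc[:, df.isnull().any()]      # at least one NaN
no_nan = df.loc[:, df.notnull().all()]      # no NaN at all
```
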
Q: What functions (both Pandas and NumPy) would convert units to whole dozens, rounded down?
A: df.floordiv(12)
   np.floor_divide(df, 12) - NumPy vectorized function
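
Both calls give the same result, as this sketch with invented unit counts suggests:

```python
import numpy as np
import pandas as pd

# Hypothetical unit counts.
df = pd.DataFrame({"eggs": [47, 110, 221], "spam": [17, 31, 72]})

# Pandas method: integer-divide every value by 12.
dozens = df.floordiv(12)

# NumPy's vectorized equivalent applied to the same dataframe.
dozens_np = np.floor_divide(df, 12)
```
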
Q: What is the helper attribute that allows string transformations to be made on series and index objects (including a dataframe's index and columns)? ex: make the index all upper case
A: .str
   df.index.str.upper()
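
A minimal sketch of the .str accessor on an index and on a series. The labels and names are made up. The accessor returns a new object, so the result is assigned back.

```python
import pandas as pd

# Hypothetical dataframe with lower-case month labels.
df = pd.DataFrame({"eggs": [47, 110]}, index=["jan", "feb"])

# Upper-case the index labels and store the new index.
df.index = df.index.str.upper()

# The same accessor works on a Series of strings.
names = pd.Series(["alice", "bob"]).str.title()
```
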
Q: How do we move one level of a multi-level index to the columns, in order to make the dataset shorter and wider?
A: .unstack(level= )
   Specify which level of the index (by label or index position) should be moved to the columns in the level argument.
Q: What method would be used to swap the levels of a multi-level index?
A: .swaplevel(0, 1)
   The int index positions indicate that we want to swap the first and second levels in the index
Q: After swapping levels in a multi-level index, what must we do to properly format the index?
A: .sort_index()
   This will sort the new levels of the index appropriately
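
The three multi-level index cards can be sketched together. The level names ("state", "month") and the sales figures are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical dataframe with a two-level row index.
df = pd.DataFrame(
    {"sales": [10, 20, 30, 40]},
    index=pd.MultiIndex.from_tuples(
        [("CA", 2), ("CA", 1), ("TX", 2), ("TX", 1)],
        names=["state", "month"],
    ),
)

# Move the "month" level into the columns: shorter and wider.
wide = df.unstack(level="month")

# Swap the first and second index levels, then sort the new index.
swapped = df.swaplevel(0, 1)
ordered = swapped.sort_index()
```
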