

Python for Data Science

Study this set online at: https://www.cram.com/flashcards/python-for-data-science-10501605

What are the advantages of a NumPy array vs a regular Python list?
1.) Can perform calculations over entire arrays
2.) Easy and fast

How can I create a NumPy array? np.array()

What is the input for the command to create a NumPy array?
(command to create a NumPy array: np.array())
The input is a regular Python list

How are calculations performed on elements of a NumPy array? element-wise
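A minimal sketch of what element-wise calculation looks like in practice, contrasted with a plain list. The height and weight values and all variable names are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical sample values (not from the card set).
heights = [1.73, 1.68, 1.71]
weights = [65.4, 59.2, 63.6]

np_heights = np.array(heights)
np_weights = np.array(weights)

# The calculation runs over the entire arrays, element by element,
# with no explicit loop.
bmi = np_weights / np_heights ** 2

# The same operator behaves differently on the two types:
doubled_array = np_heights * 2   # every element is multiplied by 2
doubled_list = heights * 2       # the list is repeated, giving 6 items
```

The list version of `* 2` concatenates the list with itself, which is why arithmetic over whole datasets is usually done on arrays.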

How many different data types is a NumPy array assumed to possess?
answer: 1
If the list that's used as the input to create the NumPy array contains two or more different dtypes, NumPy will coerce all values to a single dtype.
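The coercion can be observed directly by inspecting `.dtype`. This is a small sketch; the input lists are made up for illustration.

```python
import numpy as np

# Mixed int, float and bool: everything is coerced to a float dtype.
mixed_numbers = np.array([1, 2.5, True])

# Mixing in a string coerces every value to a string dtype.
mixed_with_text = np.array([1, 2.5, "three"])

print(mixed_numbers.dtype)
print(mixed_with_text.dtype)
```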

How should a NumPy array be viewed in comparison to other Python objects?
The same as objects like lists, dicts, floats, strings
This means that NumPy arrays have their own methods

What is an easy way to subset a NumPy array by using logical operators?
1.) Create an array of booleans by writing the logical statement and assigning the result to a variable
2.) Apply square brackets to the array, with the variable from step 1 inside them
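The two steps can be sketched as follows. The `bmi` values and the threshold of 23 are assumptions made up for the example.

```python
import numpy as np

bmi = np.array([21.9, 20.9, 24.7, 21.4, 26.1])  # hypothetical values

# Step 1: the logical statement yields an array of booleans.
over_23 = bmi > 23

# Step 2: the boolean array goes inside the square brackets.
selected = bmi[over_23]
```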

How do you check the Python type of an object? type()



What NumPy attribute will tell you the dimensions of an N-d array? .shape

How can you create an N-dimensional NumPy array?
Create a list of lists; each inside list can be viewed as a row in the N-dim NumPy array

What are the two ways that you can select an individual value in an N-dim NumPy array?
ex: Select the third element from the first row of the 2-d NumPy array, np_2d.
np_2d[0][2]
np_2d[0, 2]

Select the second and third element from the first and second row of the 2-d NumPy array, np_2d.
np_2d[0:2, 1:3]
(np_2d[:, 1:3] gives the same result only when np_2d has exactly two rows)

Select all columns from the second row of the 2-d NumPy array, np_2d. np_2d[1, :]

When subsetting an N-dim NumPy array, what does the : tell Python?
[:, 1], colon is telling Python to include all rows and the elements in the second column
[1, :], colon is telling Python to include all columns and the elements in the second row
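A short sketch that pulls the 2-d cards together. The contents of `np_2d` are hypothetical; only the name comes from the cards.

```python
import numpy as np

# A list of lists: each inner list becomes one row.
np_2d = np.array([[1.73, 1.68, 1.71, 1.89],
                  [65.4, 59.2, 63.6, 88.4]])

dims = np_2d.shape            # (rows, columns)

# Two equivalent ways to pick the third element of the first row.
single_a = np_2d[0][2]
single_b = np_2d[0, 2]

# Second and third elements of the first and second rows.
block = np_2d[0:2, 1:3]

# A bare colon means "everything along this axis".
second_row = np_2d[1, :]      # all columns of the second row
second_col = np_2d[:, 1]      # all rows of the second column
```

Indexing starts at 0, so index 1 always refers to the second row or second column.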

What does element-wise mean? element by element

If you wanted 5,000 random values drawn from a normal distribution with mean = 1.75, sd = 0.20, rounded to 2 decimals, how would you do it in NumPy?
x = np.round(np.random.normal(1.75, 0.20, 5000), 2)
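The one-liner can be run as a self-contained sketch. The fixed seed is an assumption added here only so that repeated runs give the same draws; the card itself does not set one.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed for repeatable output

# Arguments are (mean, standard deviation, number of samples).
x = np.round(np.random.normal(1.75, 0.20, 5000), 2)

print(x.shape)
print(x.mean(), x.std())
```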

What function allows you to "glue together" two 1-D NumPy arrays to create a 2-D NumPy array?
np.column_stack((array_1, array_2))
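A minimal sketch of stacking two 1-D arrays as columns. The array names and values are hypothetical.

```python
import numpy as np

height = np.array([1.73, 1.68, 1.71])   # hypothetical values
weight = np.array([65.4, 59.2, 63.6])

# Note the inner parentheses: the arrays are passed as one tuple.
stacked = np.column_stack((height, weight))

print(stacked.shape)   # one row per element, one column per input array
```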

When reading csv's into Pandas, what keyword is used to denote that column headers are not the first row of the csv?
header = None

When reading csv's into Pandas, what keyword is used to pass explicit column names into the dataframe?
names =

When reading in a csv into Pandas, we learn that missing values are coded as -1. How would you specify an argument to notify Pandas of this?
na_values = '-1'
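The three `read_csv` keywords above (`header`, `names`, `na_values`) can be combined in one call. This is a sketch under stated assumptions: the CSV text and the column names are invented, and `io.StringIO` stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical file contents with no header row; -1 marks a missing value.
csv_text = "2015,1,-1\n2015,2,58.0\n2015,3,-1\n"
col_names = ["year", "month", "sunspots"]   # hypothetical column names

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,        # the first row is data, not column headers
    names=col_names,    # explicit column names
    na_values="-1",     # treat -1 as missing in every column
)

print(df)
```

A single string passed to `na_values` applies to all columns, so this only suits data where -1 is never a legitimate value.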

When reading in a csv into Pandas, what keyword would be used to infer the date columns if the date columns are 0, 1, 2?
parse_dates = [[0, 1, 2]] ... notice that the col references must be passed as a list of lists, so that the three columns are combined into a single date column

What method could be used to infer basic information about the data possessed in the Pandas dataframe, df?
df.info()

What is an advantage of inferring the date columns when reading a csv into Pandas?
The resultant column is of data type datetime64, which is invaluable in conducting many time-based calculations

What attribute is used to explicitly set the index values of the dataframe, df? df.index =

What attribute is used to explicitly set the name of the index column of the dataframe, df?
df.index.name =

What method is used to write a dataframe, df, to csv? df.to_csv('name_of_file.csv')

If explicitly renaming col headers when reading a csv into Pandas, what keyword other than names must also be set?
header = 0

When reading a csv into Pandas, which keyword can be specified to identify the symbol that marks the rest of a line as a comment?
comment =
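A small sketch of renaming headers while skipping a comment line. The file contents, the `#` comment symbol and the new column names are all assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical file: a comment line, an existing header row, then data.
csv_text = (
    "# exported by a hypothetical tool\n"
    "a,b\n"
    "1,2\n"
    "3,4\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,                    # the existing header row, to be replaced
    names=["first", "second"],   # the new column names
    comment="#",                 # lines starting with # are ignored
)

print(df)
```

Without `header=0`, the original header row `a,b` would be read in as a row of data.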

How could you select a col from a Pandas dataframe so that it can be saved to a variable as a NumPy array?
ex: dataframe = df
col = close
NumPy array = close_df
close_df = df['close'].values
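A runnable sketch of the conversion. The dataframe contents are hypothetical; `to_numpy()` is shown alongside `.values` as an alternative that pandas also provides.

```python
import numpy as np
import pandas as pd

# Hypothetical price data.
df = pd.DataFrame({"close": [10.0, 10.5, 11.2],
                   "volume": [100, 120, 90]})

close_df = df["close"].values        # the form used on the card
close_alt = df["close"].to_numpy()   # equivalent method call

print(type(close_df))
```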

In PyPlot visualizations, what command must be called to make a plot visible? plt.show()

What are three different options to plot a col, close, from a Pandas dataframe df? Which option is best?
1.) Convert col to NumPy array, then plot
2.) Convert col to Pandas series, then plot
3.) Convert col to Pandas series, plot using the .plot method on the series
Best: 3.) because it has the most auto-formatting

How would you plot all columns in the Pandas dataframe, df, simultaneously on the same plot?
Use the .plot method on the dataframe.
df.plot()
plt.show()
(less than ideal method: plt.plot(df))
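A minimal sketch of plotting every column at once. The data is hypothetical, and the non-interactive "Agg" backend is an assumption made so the example runs without a display; in an interactive session that line can be dropped and `plt.show()` called as on the card.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical price data with two columns.
df = pd.DataFrame({"open": [10.0, 10.4, 11.0],
                   "close": [10.2, 10.9, 10.8]})

# The dataframe's own .plot method draws one line per column on
# shared axes and adds a legend automatically.
ax = df.plot()

# In an interactive session, plt.show() would now display the figure.
```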

What function would we use to set the yscale to a log scale? plt.yscale('log')

What method is used to save a matplotlib figure? ex: save a figure with the name 'aapl.jpg'
plt.savefig('aapl.jpg')
savefig can infer the file type from the suffix in the file name

If plotting a multi-column Pandas dataframe, what keyword needs to be specified inside the .plot() call to draw each column on its own subplot?
subplots = True

In order to create a scatterplot, what keyword and value must be specified in the .plot() call?
kind = 'scatter'

In order to create a boxplot, what keyword and value must be specified in the .plot() call?
kind = 'box'

In order to create a histogram, what keyword and value must be specified in the .plot() call?
kind = 'hist'

What are the different re-tooling options for histograms?
bins (int): # of intervals or bins
range (tuple): extrema of bins (minimum, maximum)
normed (bool): whether to normalize to one (named density in current Matplotlib)
cumulative (bool): compute cumulative distribution function (CDF)

What method will give the descriptive statistics (count, mean, std, min, 25%, 50%, 75%, max) for continuous variables in a Pandas dataframe?
.describe()

What is the primary difference between using the .loc and the .iloc accessor in Pandas?
.loc - uses labels
.iloc - uses index positions

When indexing, to ensure that your result is a Pandas dataframe, what syntax should you use?
nested bracket syntax
ex: df[['salt', 'eggs']]

What is a Pandas series?
A 1-D array with a labeled index; like a hybrid between a NumPy array and a dictionary

How can a Pandas dataframe be described in the context of Pandas series?
A Pandas dataframe is a labeled 2-D array, with series for columns, and sharing common column/row labels

What is the primary difference between slicing with label ranges vs positional ranges?
Positional ranges exclude the end position; label ranges include the end label
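The difference between label and positional ranges shows up in the shape of the result. A minimal sketch, assuming a small made-up dataframe with month labels as the index:

```python
import pandas as pd

# Hypothetical data; 'salt' and 'eggs' echo the column names used on the cards.
df = pd.DataFrame({"salt": [12, 50, 89], "eggs": [47, 110, 221]},
                  index=["Jan", "Feb", "Mar"])

# .loc slices by label, and both endpoints are included.
by_label = df.loc["Jan":"Feb", "salt":"eggs"]

# .iloc slices by position, and the stop position is excluded.
by_position = df.iloc[0:1, 0:1]

print(by_label.shape, by_position.shape)
```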

What is the difference between the two following commands executed on the Pandas dataframe, df?
df['eggs']
df[['eggs']]
df['eggs'] - returns a series
df[['eggs']] - returns a Pandas dataframe
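The return types can be checked directly. A small sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"salt": [12, 50], "eggs": [47, 110]})  # hypothetical values

as_series = df["eggs"]             # single brackets: a Series
as_frame = df[["eggs"]]            # nested brackets: a one-column DataFrame
two_cols = df[["salt", "eggs"]]    # nested brackets with several columns

print(type(as_series).__name__, type(as_frame).__name__)
```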

If slicing by labels, how would you slice in reverse order?
Place another colon after the slice, followed by a step of -1
ex: df.loc['Potter':'Perry':-1, :]
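A sketch of reverse label slicing through the `.loc` accessor. The index labels other than 'Potter' and 'Perry' and the column values are invented for the example.

```python
import pandas as pd

# Hypothetical dataframe with a sorted label index.
df = pd.DataFrame({"eggs": [1, 2, 3, 4]},
                  index=["Adams", "Perry", "Potter", "Smith"])

# Start at 'Potter', walk backwards to 'Perry' (both included).
rev = df.loc["Potter":"Perry":-1, :]

print(rev)
```

The start label has to come after the stop label in the index, otherwise the reversed slice is empty.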

What syntax would be used to select columns w/all nonzero values? (dataframe = df)
df.loc[:, df.all()]

What syntax would be used to select columns w/any nonzero values? (dataframe = df)
df.loc[:, df.any()]

What syntax would be used to select columns w/any NaN values? (dataframe = df)
df.loc[:, df.isnull().any()]

What syntax would be used to select columns w/no NaN values? (dataframe = df)
df.loc[:, df.notnull().all()]

What syntax would be used to drop any rows w/ NaN values? (dataframe = df)
df.dropna(how = 'any')
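The column filters and the row drop from the last few cards can be tried on one small dataframe. The column names and values are hypothetical, picked so that each filter keeps a different set of columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],             # all nonzero
    "b": [0, 5, 6],             # some zeros
    "c": [0, 0, 0],             # all zeros
    "d": [1.0, np.nan, 3.0],    # contains a NaN
})

all_nonzero = df.loc[:, df.all()]            # columns with no zeros
any_nonzero = df.loc[:, df.any()]            # columns with at least one nonzero
has_nan = df.loc[:, df.isnull().any()]       # columns with at least one NaN
no_nan = df.loc[:, df.notnull().all()]       # columns with no NaN
complete_rows = df.dropna(how="any")         # rows with no NaN
```

By default `.all()` and `.any()` skip NaN, which is why column `d` still counts as all nonzero here.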

What are the two best ways to transform Pandas dataframes?
1.) Use methods that are inherent to dataframes
2.) Use NumPy ufuncs (universal functions) to transform columns element-wise

What functions (both Pandas and NumPy) would convert units to whole dozens, rounded down?
df.floordiv(12)
np.floor_divide(df, 12) - np vectorized function
df.apply(lambda n: n//12)
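All three approaches give the same result, which a short sketch can confirm. The unit counts are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"eggs": [47, 110, 221],
                   "salt": [12, 50, 89]})   # hypothetical unit counts

dozens_method = df.floordiv(12)              # dataframe method
dozens_ufunc = np.floor_divide(df, 12)       # NumPy universal function
dozens_apply = df.apply(lambda n: n // 12)   # custom function per column

print(dozens_method)
```

The built-in method and the ufunc are vectorized, so they are usually preferred over `.apply` with a lambda.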

What is the helper attribute that allows string transformations to be made on series, dataframes, and index objects?
.str
ex: make the index all upper case
df.index.str.upper()

Is there an apply method for the index?
No, instead you must use the .map() method
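A brief sketch of both index transformations, the `.str` accessor and `.map()`. The dataframe and its labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"eggs": [47, 110]}, index=["jan", "feb"])  # hypothetical

# .str exposes vectorized string methods on the index.
df.index = df.index.str.upper()

# The index has no .apply, so .map() applies a function to each label.
lowered = df.index.map(str.lower)

print(df.index.tolist(), lowered.tolist())
```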

How do we move one level of a multi-level index to the columns, in order to make the dataset shorter and wider?
.unstack(level = )
Specify which level of the index (by label or index position) should be moved to the columns in the level argument.

What method would be used to swap the levels of a multi-level index?
.swaplevel(0, 1)
The int index positions indicate that we want to swap the first and second levels in the index

After swapping levels in a multi-level index, what must we do to properly format the index?
.sort_index()
This will sort the new levels of the index appropriately

What problems does an unsorted index cause? slicing failures
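The multi-level index cards can be followed end to end on a tiny example. This is a sketch: the level names 'state' and 'month' and the sales figures are made up.

```python
import pandas as pd

# Hypothetical dataframe with a two-level index.
df = pd.DataFrame(
    {"sales": [10, 20, 30, 40]},
    index=pd.MultiIndex.from_tuples(
        [("CA", 1), ("CA", 2), ("TX", 1), ("TX", 2)],
        names=["state", "month"],
    ),
)

# Move the 'month' level to the columns: fewer rows, more columns.
wide = df.unstack(level="month")

# Swap the first and second index levels; the rows keep their old order,
# so the swapped index is no longer sorted.
swapped = df.swaplevel(0, 1)

# Sorting restores an ordered index that can be sliced safely.
sorted_swapped = swapped.sort_index()

print(sorted_swapped)
```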
