UNIT I Material
LECTURE NOTES
Data science: Definition, Datafication, Exploratory Data Analysis, The Data
science process, A data scientist’s role in this process.
NumPy Basics: The NumPy ndarray: A Multidimensional Array Object, Creating
ndarrays, Data Types for ndarrays, Operations between Arrays and Scalars, Basic
Indexing and Slicing, Boolean Indexing, Fancy Indexing, Data Processing Using
Arrays, Expressing Conditional Logic as Array Operations, Methods for Boolean
Arrays, Sorting, Unique.
Data Science:
Definition
⮚ So, what is data science? Is it new, or is it just statistics or analytics rebranded? Is it real, or is
it pure hype? And if it’s new and if it’s real, what does that mean?
⮚ “What is Data Science?” Here is Metamarkets CEO Mike Driscoll’s answer:
⮚ Data science is the civil engineering of data. Its acolytes possess a practical knowledge of tools
and materials, coupled with a theoretical understanding of what’s possible.
⮚ Drew Conway’s Venn diagram of data science from 2010 places data science at the intersection
of hacking skills, math and statistics knowledge, and substantive expertise.
⮚ Data science is an emerging field in industry, and as yet, it is not well defined as an academic
subject.
⮚ Over the past few years, there’s been a lot of hype in the media about “data science” and “Big
Data.”
⮚ Data Science is a blend of various tools, algorithms, and machine learning principles, with the
goal of discovering hidden patterns in raw data.
⮚ Data Science is primarily used to make decisions and predictions making use of predictive
causal analytics, prescriptive analytics (predictive plus decision science) and machine
learning.
We have massive amounts of data about many aspects of our lives, and, simultaneously, an
abundance of inexpensive computing power.
Shopping, communicating, reading news, listening to music, searching for information,
expressing our opinions—all this is being tracked online
It’s not just Internet data, though—it’s finance, the medical industry, pharmaceuticals,
bioinformatics, social welfare, government, education, retail, and the list goes on.
It’s not only the massiveness that makes all this new data interesting (or poses challenges).
It’s that the data itself, often in real time, becomes the building blocks of data products.
On the Internet, this means Amazon recommendation systems, friend recommendations on
Facebook, film and music recommendations, and so on.
We’re witnessing the beginning of a massive, culturally saturated feedback loop where our
behavior changes the product and the product changes our behavior.
Datafication:
Datafication is the process of “taking all aspects of life and turning them into data.” As
examples, Twitter datafies stray thoughts; LinkedIn datafies professional networks.
Datafication is an interesting concept and led us to consider its importance with respect to
people’s intentions about sharing their own data.
We are being datafied, or rather our actions are, and when we “like” someone or something
online, we are intending to be datafied, or at least we should expect to be.
But when we merely browse the Web, we are unintentionally, or at least passively, being
datafied through cookies.
When we walk around in a store, or even on the street, we are being datafied in a completely
unintentional way, via sensors, cameras, or Google Glass.
Once we datafy things, we can transform their purpose and turn the information into
new forms of value.
Exploratory Data Analysis (EDA):
Exploratory data analysis (EDA) is the first step toward building a model.
It’s traditionally presented as a bunch of histograms and stem-and-leaf plots.
But EDA is a critical part of the data science process.
In EDA, there is no hypothesis and there is no model. The “exploratory” aspect means that your
understanding of the problem you are solving, or might solve, is changing as you go.
The basic tools of EDA are plots, graphs and summary statistics. Generally speaking, it’s a method of
systematically going through the data, plotting distributions of all variables (using box plots), plotting
time series of data, transforming variables, looking at all pairwise relationships between variables
using scatterplot matrices, and generating summary statistics for all of them.
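A minimal sketch of these basic EDA moves in code (illustrative only; the data here is made up):
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample: 1,000 observations of a single variable.
data = np.random.randn(1000) * 15 + 100

# Summary statistics for the variable.
print("mean:", data.mean(), "std:", data.std())
print("quartiles:", np.percentile(data, [25, 50, 75]))

# Plot the distribution: a histogram and a box plot.
fig, axes = plt.subplots(1, 2)
axes[0].hist(data, bins=30)
axes[1].boxplot(data)
plt.show()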
But as much as EDA is a set of tools, it’s also a mindset. And that mindset is about your relationship
with the data. You want to understand the data—gain intuition, understand the shape of it, and try to
connect your understanding of the process that generated the data to the data itself.
EDA happens between you and the data and isn’t about proving anything to anyone else yet.
There are important reasons anyone working with data should do EDA. Namely,
o to gain intuition about the data;
o to make comparisons between distributions;
o for sanity checking, to find out where data is missing or if there are outliers; and
o to summarize the data.
In the context of data generated from logs, EDA also helps with debugging the logging process.
In the end, EDA helps you make sure the product is performing as intended.
The Data Science Process:
1) First we have the Real World. Inside the Real World are lots of people busy at various activities.
2) Specifically, we’ll start with raw data—logs, Olympics records, Enron employee emails, or recorded
genetic material.
3) We want to process this to make it clean for analysis. So we build and use pipelines of data munging:
joining, scraping, wrangling, or whatever you want to call it. To do this we use tools such as Python,
shell scripts, R, or SQL, or all of the above.
4) Eventually we get the data down to a nice format, like something with columns:
name | event | year | gender | event time
5) Once we have this clean dataset, we should be doing some kind of EDA. In the course of doing EDA,
we may realize that it isn’t actually clean because of duplicates, missing values, absurd outliers, and
data that wasn’t logged or was incorrectly logged.
6) Next, we design our model to use some algorithm like k-nearest neighbors (k-NN), linear regression,
Naive Bayes, or something else.
The model we choose depends on the type of problem we’re trying to solve, of course, which could be
a classification problem, a prediction problem, or a basic description problem.
7) We then can interpret, visualize, report, or communicate our results.
8) Alternatively, our goal may be to build or prototype a “data product”; e.g., a spam classifier, or a search
ranking algorithm, or a recommendation system.
9) NOTE: Now the key here that makes data science special and distinct from statistics is that this data
product then gets incorporated back into the real world, and users interact with that product, and that
generates more data, which creates a feedback loop.
10) This is very different from predicting the weather, say, where your model doesn’t influence the
outcome at all.
NumPy Basics:
import numpy as np
my_arr = np.arange(1000000)
my_list = list(range(1000000))
NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python
counterparts and use significantly less memory.
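To see the difference, you can time the same operation both ways (a rough sketch building on my_arr and my_list above; exact numbers vary by machine):
import time

start = time.perf_counter()
my_arr2 = my_arr * 2                  # vectorized: one NumPy operation
numpy_secs = time.perf_counter() - start

start = time.perf_counter()
my_list2 = [x * 2 for x in my_list]   # pure Python: loops over a million items
python_secs = time.perf_counter() - start

print(numpy_secs, python_secs)        # the NumPy version is typically far faster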
One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast,
flexible container for large datasets in Python.
Arrays enable you to perform mathematical operations on whole blocks of data.
An ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements
must be the same type.
Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an object
describing the data type of the array:
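These examples assume data was created earlier as a small array of random values (a reconstruction consistent with the outputs below; the In number is illustrative):
In [15]: data = np.random.randn(2, 3)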
In [17]: data.shape
Out[17]: (2, 3)
In [18]: data.dtype
Out[18]: dtype('float64')
Creating ndarrays:
The easiest way to create an array is to use the array function. This accepts any sequence-like object
(including other arrays) and produces a new NumPy array containing the passed data.
In [19]: data1 = [6, 7.5, 8, 0, 1]
In [20]: arr1 = np.array(data1)
In [21]: arr1
Out[21]: array([ 6. , 7.5, 8. , 0. , 1. ])
Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:
In [22]: data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
In [23]: arr2 = np.array(data2)
In [24]: arr2
Out[24]:
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
np.array tries to infer a good data type for the array that it creates. The data type is stored in a special
dtype metadata object:
In [28]: arr2.dtype
Out[28]: dtype('int64')
In addition to np.array, there are a number of other functions for creating new arrays. As examples,
zeros and ones create arrays of 0s or 1s, respectively, with a given length or shape. empty creates an
array without initializing its values to any particular value. To create a higher dimensional array with
these methods, pass a tuple for the shape:
In [29]: np.zeros(10)
Out[29]: array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
In [30]: np.zeros((3, 6))
Out[30]:
array([[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.]])
In [31]: np.empty((2, 3, 2))
Out[31]:
array([[[ 0., 0.],
[ 0., 0.],
[ 0., 0.]],
[[ 0., 0.],
[ 0., 0.],
[ 0., 0.]]])
NOTE: It’s not safe to assume that np.empty will return an array of all zeros. In some cases, it may
return uninitialized “garbage” values.
arange is an array-valued version of the built-in Python range function:
In [32]: np.arange(15)
Out[32]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
If you have an array of strings representing numbers, you can use astype to convert
them to numeric form:
In [44]: numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.string_)
In [45]: numeric_strings.astype(float)
Out[45]: array([ 1.25, -9.6 , 42. ])
There are shorthand type code strings you can also use to refer to a dtype:
In [49]: empty_uint32 = np.empty(8, dtype='u4')
In [50]: empty_uint32
Out[50]:
array([ 0, 1075314688, 0, 1075707904, 0,
1075838976, 0, 1072693248], dtype=uint32)
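The examples that follow assume a small floating-point array (a reconstruction consistent with the outputs below; the In number is illustrative):
In [51]: arr = np.array([[1., 2., 3.], [4., 5., 6.]])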
Arithmetic operations with scalars propagate the scalar argument to each element in the array:
In [55]: 1 / arr
Out[55]:
array([[ 1. , 0.5 , 0.3333],
       [ 0.25 , 0.2 , 0.1667]])
In [56]: arr ** 0.5
Out[56]:
array([[ 1. , 1.4142, 1.7321],
       [ 2. , 2.2361, 2.4495]])
Basic Indexing and Slicing
If you assign a scalar value to a slice, as in arr[5:8] = 12, the value is propagated (or
broadcast) to the entire selection:
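For example (reconstructing the setup the outputs below assume; the In numbers are illustrative):
In [60]: arr = np.arange(10)
In [61]: arr[5:8] = 12
In [62]: arr
Out[62]: array([ 0, 1, 2, 3, 4, 12, 12, 12, 8, 9])
In [67]: arr_slice = arr[5:8]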
An important first distinction from Python’s built-in lists is that array slices are views on the
original array.
This means that the data is not copied, and any modifications to the view will be reflected in the
source array.
Now, when I change values in arr_slice, the mutations are reflected in the original
array arr:
In [68]: arr_slice[1] = 12345
In [69]: arr
Out[69]: array([ 0, 1, 2, 3, 4, 12, 12345, 12, 8, 9])
The “bare” slice [:] will assign to all values in an array:
In [70]: arr_slice[:] = 64
In [71]: arr
Out[71]: array([ 0, 1, 2, 3, 4, 64, 64, 64, 8, 9])
NOTE: If you want a copy of a slice of an ndarray instead of a view, you will need to explicitly copy
the array—for example, arr[5:8].copy().
With higher dimensional arrays, you have many more options. In a two-dimensional array, the
elements at each index are no longer scalars but rather one-dimensional arrays:
In [72]: arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
In [73]: arr2d[2]
Out[73]: array([7, 8, 9])
Thus, individual elements can be accessed recursively. But that is a bit too much work, so you can
pass a comma-separated list of indices to select individual elements.
So these are equivalent:
In [74]: arr2d[0][2]
Out[74]: 3
In [75]: arr2d[0, 2]
Out[75]: 3
In multidimensional arrays, if you omit later indices, the returned object will be a lower dimensional
ndarray consisting of all the data along the higher dimensions.
In [76]: arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
In [77]: arr3d
Out[77]:
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 7, 8, 9],
[10, 11, 12]]])
arr3d[0] is a 2 × 3 array:
In [78]: arr3d[0]
Out[78]:
array([[1, 2, 3],
       [4, 5, 6]])
Similarly, arr3d[1, 0] gives you all of the values whose indices start with (1, 0), forming a
1-dimensional array:
In [84]: arr3d[1, 0]
Out[84]: array([7, 8, 9])
This expression is the same as though we had indexed in two steps:
In [85]: x = arr3d[1]
In [86]: x
Out[86]:
array([[ 7, 8, 9],
[10, 11, 12]])
In [87]: x[0]
Out[87]: array([7, 8, 9])
Note that in all of these cases where subsections of the array have been selected, the returned arrays
are views.
Consider the two-dimensional array from before, arr2d. Slicing this array is a bit different:
In [90]: arr2d
Out[90]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [91]: arr2d[:2]
Out[91]:
array([[1, 2, 3],
[4, 5, 6]])
As you can see, it has sliced along axis 0, the first axis. A slice, therefore, selects a range of elements
along an axis. It can be helpful to read the expression arr2d[:2] as “select the first two rows of arr2d.”
You can pass multiple slices just like you can pass multiple indexes:
In [92]: arr2d[:2, 1:]
Out[92]:
array([[2, 3],
[5, 6]])
When slicing like this, you always obtain array views of the same number of dimensions.
By mixing integer indexes and slices, you get lower dimensional slices. For example, I can select the
second row but only the first two columns like so:
In [93]: arr2d[1, :2]
Out[93]: array([4, 5])
Similarly, I can select the third column but only the first two rows like so:
In [94]: arr2d[:2, 2]
Out[94]: array([3, 6])
Note that a colon by itself means to take the entire axis, so you can slice only higher dimensional
axes by doing:
In [95]: arr2d[:, :1]
Out[95]:
array([[1],
       [4],
       [7]])
Boolean Indexing
Let’s consider an example where we have some data in an array and an array of names with
duplicates. I’m going to use here the randn function in numpy.random to generate some random
normally distributed data:
In [98]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
In [99]: data = np.random.randn(7, 4)
In [100]: names
Out[100]:
array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')
In [101]: data
Out[101]:
array([[ 0.0929, 0.2817, 0.769 , 1.2464],
[ 1.0072, -1.2962, 0.275 , 0.2289],
[ 1.3529, 0.8864, -2.0016, -0.3718],
[ 1.669 , -0.4386, -0.5397, 0.477 ],
[ 3.2489, -1.0212, -0.5771, 0.1241],
[ 0.3026, 0.5238, 0.0009, 1.3438],
[-0.7135, -0.8312, -2.3702, -1.8608]])
Suppose each name corresponds to a row in the data array and we wanted to select all the rows with
corresponding name 'Bob'. Like arithmetic operations, comparisons (such as ==) with arrays are also
vectorized. Thus, comparing names with the string 'Bob' yields a boolean array:
In [102]: names == 'Bob'
Out[102]: array([ True, False, False, True, False, False, False], dtype=bool)
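This boolean array can be passed when indexing the array, selecting the rows where the value is True (a reconstruction; the rows match Out[101] above):
In [103]: data[names == 'Bob']
Out[103]:
array([[ 0.0929, 0.2817, 0.769 , 1.2464],
       [ 1.669 , -0.4386, -0.5397, 0.477 ]])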
You can even mix and match boolean arrays with slices or integers.
To select everything but 'Bob', you can either use != or negate the condition using ~:
In [106]: names != 'Bob'
Out[106]: array([False, True, True, False, True, True, True], dtype=bool)
In [107]: data[~(names == 'Bob')]
Out[107]:
array([[ 1.0072, -1.2962, 0.275 , 0.2289],
[ 1.3529, 0.8864, -2.0016, -0.3718],
[ 3.2489, -1.0212, -0.5771, 0.1241],
[ 0.3026, 0.5238, 0.0009, 1.3438],
[-0.7135, -0.8312, -2.3702, -1.8608]])
The ~ operator can be useful when you want to invert a general condition:
In [108]: cond = names == 'Bob'
In [109]: data[~cond]
Out[109]:
array([[ 1.0072, -1.2962, 0.275 , 0.2289],
[ 1.3529, 0.8864, -2.0016, -0.3718],
[ 3.2489, -1.0212, -0.5771, 0.1241],
[ 0.3026, 0.5238, 0.0009, 1.3438],
[-0.7135, -0.8312, -2.3702, -1.8608]])
To select two of the three names, combining multiple boolean conditions, use
boolean arithmetic operators like & (and) and | (or):
In [110]: mask = (names == 'Bob') | (names == 'Will')
In [111]: mask
Out[111]: array([ True, False, True, True, True, False, False], dtype=bool)
In [112]: data[mask]
Out[112]:
array([[ 0.0929, 0.2817, 0.769 , 1.2464],
[ 1.3529, 0.8864, -2.0016, -0.3718],
[ 1.669 , -0.4386, -0.5397, 0.477 ],
[ 3.2489, -1.0212, -0.5771, 0.1241]])
Selecting data from an array by boolean indexing always creates a copy of the data, even if the
returned array is unchanged.
NOTE: The Python keywords and and or do not work with boolean arrays. Use & (and) and | (or)
instead.
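Setting values with boolean arrays works by assigning to the selection. For instance, to set all of the negative values in data to 0 (a step reflected in the outputs below; the In number is illustrative):
In [113]: data[data < 0] = 0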
Setting whole rows or columns using a one-dimensional boolean array is also easy:
In [115]: data[names != 'Joe'] = 7
In [116]: data
Out[116]:
array([[ 7. , 7. , 7. , 7. ],
[ 1.0072, 0. , 0.275 , 0.2289],
[ 7. , 7. , 7. , 7. ],
[ 7. , 7. , 7. , 7. ],
[ 7. , 7. , 7. , 7. ],
[ 0.3026, 0.5238, 0.0009, 1.3438],
[ 0. , 0. , 0. , 0. ]])
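The syllabus also lists methods for boolean arrays; a brief sketch of the most common ones (illustrative values):
# Boolean values are coerced to 1 (True) and 0 (False) in arithmetic,
# so sum counts True values; any and all test for at least one / all True.
bools = np.array([False, False, True, False])
bools.any()                       # True: at least one value is True
bools.all()                       # False: not every value is True
(np.random.randn(100) > 0).sum()  # number of positive values (varies per run)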
Fancy Indexing
Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.
Suppose we had an 8 × 4 array:
In [117]: arr = np.empty((8, 4))
In [118]: for i in range(8):
.....: arr[i] = i
In [119]: arr
Out[119]:
array([[ 0., 0., 0., 0.],
[ 1., 1., 1., 1.],
[ 2., 2., 2., 2.],
[ 3., 3., 3., 3.],
[ 4., 4., 4., 4.],
[ 5., 5., 5., 5.],
[ 6., 6., 6., 6.],
[ 7., 7., 7., 7.]])
To select out a subset of the rows in a particular order, you can simply pass a list or ndarray of
integers specifying the desired order:
In [120]: arr[[4, 3, 0, 6]]
Out[120]:
array([[ 4., 4., 4., 4.],
[ 3., 3., 3., 3.],
[ 0., 0., 0., 0.],
[ 6., 6., 6., 6.]])
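Passing multiple index arrays does something slightly different: it selects a one-dimensional array of elements corresponding to each tuple of indices. The remarks below assume arr has been redefined (a reconstruction consistent with the outputs that follow; the In numbers are illustrative):
In [122]: arr = np.arange(32).reshape((8, 4))
In [124]: arr[[1, 5, 7, 2], [0, 3, 1, 2]]
Out[124]: array([ 4, 23, 29, 10])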
Here the elements (1, 0), (5, 3), (7, 1), and (2, 2) were selected. Regardless of how many dimensions
the array has (here, only 2), the result of fancy indexing with multiple index arrays is always one-dimensional.
The behavior of fancy indexing in this case is a bit different from what some users might have
expected (myself included), which is the rectangular region formed by selecting a subset of the
matrix’s rows and columns. Here is one way to get that:
In [125]: arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]
Out[125]:
array([[ 4, 7, 5, 6],
[20, 23, 21, 22],
[28, 31, 29, 30],
[ 8, 11, 9, 10]])
Keep in mind that fancy indexing, unlike slicing, always copies the data into a new array.
For higher dimensional arrays, transpose will accept a tuple of axis numbers to permute
the axes (for extra mind bending):
In [132]: arr = np.arange(16).reshape((2, 2, 4))
In [133]: arr
Out[133]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]],
[[ 8, 9, 10, 11],
[12, 13, 14, 15]]])
In [134]: arr.transpose((1, 0, 2))
Out[134]:
array([[[ 0, 1, 2, 3],
        [ 8, 9, 10, 11]],
       [[ 4, 5, 6, 7],
        [12, 13, 14, 15]]])
Here, the axes have been reordered with the second axis first, the first axis second, and the last axis
unchanged.
Simple transposing with .T is a special case of swapping axes. ndarray has the method swapaxes,
which takes a pair of axis numbers and switches the indicated axes to rearrange the data:
In [135]: arr
Out[135]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]],
[[ 8, 9, 10, 11],
[12, 13, 14, 15]]])
In [136]: arr.swapaxes(1, 2)
Out[136]:
array([[[ 0, 4],
[ 1, 5],
[ 2, 6],
[ 3, 7]],
[[ 8, 12],
[ 9, 13],
[10, 14],
[11, 15]]])
swapaxes similarly returns a view on the data without making a copy.
The np.meshgrid function takes two 1D arrays and produces two 2D matrices corresponding to all
pairs of (x, y) in the two arrays. Evaluating a function over a grid of values is then a matter of writing
the same expression you would write with two points:
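A minimal sketch, assuming we want to evaluate sqrt(x^2 + y^2) across a grid of values (the In numbers are illustrative):
In [155]: points = np.arange(-5, 5, 0.01) # 1,000 equally spaced points
In [156]: xs, ys = np.meshgrid(points, points)
In [157]: z = np.sqrt(xs ** 2 + ys ** 2)
In [158]: z.shape
Out[158]: (1000, 1000)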
Sorting
Like Python’s built-in list type, NumPy arrays can be sorted in-place with the sort method:
In [195]: arr = np.random.randn(6)
In [196]: arr
Out[196]: array([ 0.6095, -0.4938, 1.24 , -0.1357, 1.43 , -0.8469])
In [197]: arr.sort()
In [198]: arr
Out[198]: array([-0.8469, -0.4938, -0.1357, 0.6095, 1.24 , 1.43 ])
You can sort each one-dimensional section of values in a multidimensional array in place along an axis
by passing the axis number to sort:
In [199]: arr = np.random.randn(5, 3)
In [200]: arr
Out[200]:
array([[ 0.6033, 1.2636, -0.2555],
[-0.4457, 0.4684, -0.9616],
[-1.8245, 0.6254, 1.0229],
[ 1.1074, 0.0909, -0.3501],
[ 0.218 , -0.8948, -1.7415]])
In [201]: arr.sort(1)
In [202]: arr
Out[202]:
array([[-0.2555, 0.6033, 1.2636],
[-0.9616, -0.4457, 0.4684],
[-1.8245, 0.6254, 1.0229],
[-0.3501, 0.0909, 1.1074],
[-1.7415, -0.8948, 0.218 ]])
The top-level method np.sort returns a sorted copy of an array instead of modifying
the array in-place. A quick-and-dirty way to compute the quantiles of an array is to
sort it and select the value at a particular rank:
In [203]: large_arr = np.random.randn(1000)
In [204]: large_arr.sort()
In [205]: large_arr[int(0.05 * len(large_arr))] # 5% quantile
Out[205]: -1.5311513550102103
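Unique
The syllabus also lists Unique. np.unique returns the sorted unique values in an array; for example, with the names array from earlier (the In numbers are illustrative):
In [206]: np.unique(names)
Out[206]: array(['Bob', 'Joe', 'Will'], dtype='<U4')
In [207]: np.unique(np.array([3, 3, 3, 2, 2, 1, 1, 4, 4]))
Out[207]: array([1, 2, 3, 4])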