BTech 5 CSE Data Analytics With Python Unit 2 and 3 Notes

The document provides an introduction to data analysis, outlining key knowledge domains such as data cleaning, exploratory data analysis, statistics, and machine learning. It also discusses the importance of ethics and privacy in data management, highlighting regulations like GDPR and HIPAA. Additionally, it differentiates between quantitative and qualitative data, providing examples and applications in Python for both types.

Uploaded by

yeeshandas

SRGI, BHILAI

Unit 2
An Introduction to Data Analysis
Knowledge Domains of the Data Analysis
A knowledge domain refers to a specialized area of expertise within a
broader discipline. It encompasses a defined body of knowledge, including
theories, principles, methodologies, and best practices that are essential for
proficiency in that area. Knowledge domains serve as frameworks for
professionals to structure their learning, problem-solving approaches, and
application of skills.
In the field of data analysis, there are several important knowledge domains that
provide a foundation for understanding, analyzing, and deriving insights from
data. These include:
1. Data Cleaning and Preprocessing
2. Exploratory Data Analysis (EDA)
3. Data Visualization and Communication
4. Statistics
5. Machine Learning
6. Data Mining
7. Programming and Scripting
8. Big Data Technologies
9. Data Management
10. Ethics and Privacy in Data
1. Data Cleaning and Preprocessing: Data cleaning and preprocessing
involves preparing data for analysis by ensuring its quality and usability. It
starts with data collection, gathering accurate and relevant data from
various sources. Next, data cleaning addresses missing values, outliers, and
errors. Data transformation follows, where tasks like normalization,
scaling, and encoding categorical variables are applied. Finally, data
reduction techniques, such as PCA, reduce dimensionality, simplifying the
dataset while preserving essential information for analysis.
2. Exploratory Data Analysis (EDA): Data visualization, summarization,
and feature engineering are the key steps in EDA. Data visualization
uses plots like scatter plots, histograms, and bar charts to reveal patterns
and trends in the data. Summarization identifies key relationships and
correlations, enabling better understanding. Feature engineering creates
new features to enhance model performance and capture important data
patterns.
3. Data Visualization and Communication: Mastery of visualization tools
like Matplotlib, Seaborn, Power BI, and Tableau is crucial for presenting
data insights clearly. Effective storytelling with data translates complex
information into compelling narratives for decision-makers. Additionally,
dashboarding allows for the creation of interactive, real-time dashboards,
enabling continuous monitoring and up-to-date insights for informed
decision-making.
4. Statistics: Descriptive statistics provides a summary of data through key
metrics such as mean, median, mode, and standard deviation, offering
insights into its central tendency and variability. In contrast, inferential
statistics enables predictions and inferences about a population based on
sample data, utilizing methods like hypothesis testing and confidence
intervals. Additionally, probability theory addresses the likelihood of
events, forming a foundational understanding of distributions, risks, and
patterns within data, thereby enhancing the analytical framework for data-
driven decision-making.
5. Machine Learning: Supervised learning involves models that utilize
labeled data to make predictions, employing algorithms such as regression
and classification techniques like decision trees and support vector
machines (SVM). In contrast, unsupervised learning identifies patterns in
data without labels, using methods like clustering and dimensionality
reduction to uncover hidden structures. Reinforcement learning, on the
other hand, focuses on learning optimal actions based on feedback or
rewards in dynamic environments, enabling agents to improve their
decision-making over time.
6. Data Mining: Data mining involves techniques for extracting insights
from large datasets, including pattern recognition, which identifies and
extracts meaningful trends. Association rules, such as market basket
analysis, uncover relationships between data points, revealing item
purchase patterns. Additionally, clustering groups similar items based on
attributes using methods like K-means and DBSCAN, facilitating a better
understanding of the dataset's structure and aiding in informed decision-
making.
7. Programming and Scripting: Programming and scripting are vital in data
analysis, utilizing various languages and libraries. Python is prominent,
featuring libraries like Pandas for data manipulation, NumPy for numerical
computations, Scikit-learn for machine learning, and TensorFlow for deep
learning. R is popular for statistical analysis and machine learning due to
its extensive packages. SQL is essential for querying relational databases
and manipulating data, while specialized environments like SAS and
MATLAB offer advanced tools for data manipulation and statistical modeling.
8. Big Data Technologies: Big Data technologies are important for managing
and processing large datasets effectively. Hadoop and Spark are key
distributed computing frameworks that enable parallel processing and
efficient data analysis. NoSQL databases like MongoDB and Cassandra
offer flexible, non-relational data storage solutions. Additionally, cloud
computing platforms such as AWS, Google Cloud, and Azure provide
scalable resources for data processing and analytics, enhancing the
deployment and management of big data applications.
9. Data Management: Database management involves various practices
essential for organizing and analyzing large sets of structured data. Data
warehousing focuses on the efficient organization and storage of this data,
enabling comprehensive analysis. The ETL process, which stands for
Extract, Transform, Load, prepares data from various sources for analysis,
ensuring it is clean and structured appropriately. Understanding the
distinctions between SQL and NoSQL databases is also necessary, as SQL
databases are relational and structured, while NoSQL databases offer more
flexible, non-relational storage options.
10. Ethics and Privacy in Data: Ethics and privacy in data are critical
considerations in data management. Data governance encompasses the
policies, procedures, and controls necessary to maintain data quality and
security. Adhering to data privacy regulations like GDPR and HIPAA is
essential to ensure compliance and protect individuals' personal
information. Additionally, recognizing and mitigating bias in data analysis
and machine learning models is crucial for promoting fairness and
accuracy in decision-making processes.
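The cleaning and transformation steps described under domain 1 can be sketched with pandas. This is only an illustrative sketch: the column names and values are hypothetical, and median imputation, min-max scaling, and one-hot encoding are just one reasonable choice for each step.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a missing value and a categorical column
raw = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0],
    "city": ["Bhilai", "Raipur", "Bhilai", "Durg"],
})

# Cleaning: fill the missing age with the column median
clean = raw.copy()
clean["age"] = clean["age"].fillna(clean["age"].median())

# Transformation: min-max scale age to the range [0, 1]
age_min, age_max = clean["age"].min(), clean["age"].max()
clean["age_scaled"] = (clean["age"] - age_min) / (age_max - age_min)

# Transformation: one-hot encode the categorical column
encoded = pd.get_dummies(clean, columns=["city"])

print(encoded)
```

In a real project the imputation and scaling strategy would depend on the data and on the model that consumes it.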
GDPR (General Data Protection Regulation) and HIPAA (Health Insurance
Portability and Accountability Act) are two significant regulations focused on
data protection and privacy, but they apply to different contexts and types of
information.

GDPR (General Data Protection Regulation)

- Purpose: GDPR is a comprehensive data protection law in the European Union
(EU) that governs how the personal data of individuals in the EU is collected,
processed, and stored, regardless of their citizenship.
Key Features:
- Data Protection Rights: It grants individuals rights such as the right to access
their data, the right to be forgotten, and the right to data portability.
- Consent: Where processing relies on consent as its lawful basis, organizations
must obtain clear consent from individuals before processing their personal data.
- Accountability: Businesses are required to implement appropriate technical
and organizational measures to protect personal data and to report qualifying
data breaches to the supervisory authority within 72 hours of becoming aware of
them.
- Fines: Non-compliance can lead to significant penalties, including fines of up
to 4% of annual global revenue or €20 million, whichever is higher.

HIPAA (Health Insurance Portability and Accountability Act)

- Purpose: HIPAA is a U.S. regulation that sets standards for the protection of
sensitive patient health information.
Key Features:
- Protected Health Information (PHI): HIPAA defines and safeguards PHI,
which includes any individually identifiable health information held by covered
entities.
- Privacy and Security Rules: It mandates the confidentiality, integrity, and
availability of PHI, requiring healthcare providers and related entities to
implement security measures.
- Patient Rights: Individuals have the right to access their medical records,
request corrections, and receive notifications of breaches affecting their health
information.
- Penalties: Violations can result in civil and criminal penalties, including fines
and, in severe cases, imprisonment.

In summary, while both GDPR and HIPAA aim to protect personal data, GDPR
focuses on the privacy rights of individuals in the EU regarding all types of
personal data, whereas HIPAA specifically addresses the privacy and security of
health information in the United States.

Quantitative Data
Quantitative data refers to numerical information that can be measured or
counted. This type of data is used for statistical analysis and often involves
operations like addition, subtraction, or averaging.
• Types:
o Discrete Data: Countable values (e.g., number of students).
o Continuous Data: Measurable values within a range (e.g., height,
weight).
Example in Python:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Creating a DataFrame with quantitative data
data = {
'Student_ID': [1, 2, 3, 4, 5],
'Math_Score': [78, 85, 92, 88, 76],
'Science_Score': [72, 89, 95, 84, 80],
'Study_Hours': [3.5, 4.0, 5.0, 4.5, 3.0]
}
df = pd.DataFrame(data)
# Statistical Analysis
print("Summary Statistics:")
print(df[['Math_Score', 'Science_Score', 'Study_Hours']].describe())
# Plotting the data
plt.scatter(df['Study_Hours'], df['Math_Score'])
plt.title("Study Hours vs Math Score")
plt.xlabel("Study Hours")
plt.ylabel("Math Score")
plt.show()
Output:
• The summary statistics report the count, mean, standard deviation, minimum,
quartiles (the 50% row is the median), and maximum of each column.
• The scatter plot visualizes the relationship between study hours and math
scores.
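The relationship shown in the scatter plot can also be quantified. The sketch below rebuilds the same scores as a DataFrame and computes Pearson correlation coefficients with pandas; the variable names corr_matrix and r are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Math_Score": [78, 85, 92, 88, 76],
    "Science_Score": [72, 89, 95, 84, 80],
    "Study_Hours": [3.5, 4.0, 5.0, 4.5, 3.0],
})

# Pearson correlation between every pair of numeric columns
corr_matrix = df.corr()
print(corr_matrix.round(3))

# Correlation between study hours and math score only
r = df["Study_Hours"].corr(df["Math_Score"])
print("Study hours vs math score:", round(r, 3))
```

A coefficient close to 1 indicates a strong positive linear relationship. With only five observations, the value should be read as an illustration rather than as evidence.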

Qualitative Data
Qualitative data refers to non-numerical information that describes qualities or
characteristics. This type of data is often categorical and used for classification or
grouping.
• Types:
o Nominal Data: Categories without an order (e.g., colors, gender).
o Ordinal Data: Categories with a meaningful order (e.g., satisfaction
levels).
Example in Python:
1. Customer Feedback in a Shopping App
• Data:
o "Satisfied," "Neutral," "Dissatisfied," "Very Satisfied."
• Type: Ordinal Data (since there is an order of satisfaction levels).
Python Example:
import pandas as pd
data = {'Customer_ID': [101, 102, 103, 104],
'Feedback': ['Satisfied', 'Neutral', 'Dissatisfied', 'Very Satisfied']}
df = pd.DataFrame(data)
print(df)

2. Product Categories
• Data:
o "Electronics," "Clothing," "Groceries," "Books."

• Type: Nominal Data (no inherent order among categories).
Python Example:
data = {'Product_ID': [1, 2, 3, 4],
'Category': ['Electronics', 'Clothing', 'Groceries', 'Books']}
df = pd.DataFrame(data)
print(df)

3. Employee Roles
• Data:
o "Manager," "Engineer," "Analyst," "Technician."
• Type: Nominal Data.
Python Example:
data = {'Employee_ID': [1, 2, 3, 4],
'Role': ['Manager', 'Engineer', 'Analyst', 'Technician']}
df = pd.DataFrame(data)
print(df)

4. Movie Genres
• Data:
o "Action," "Comedy," "Drama," "Horror," "Sci-Fi."
• Type: Nominal Data.
Python Example:
data = {'Movie_ID': [101, 102, 103, 104],
'Genre': ['Action', 'Comedy', 'Drama', 'Horror']}
df = pd.DataFrame(data)
print(df)

5. Education Levels
• Data:
o "High School," "Bachelor's," "Master's," "PhD."
• Type: Ordinal Data (since education levels follow a meaningful order).
Python Example:
data = {'Person_ID': [1, 2, 3, 4],
'Education_Level': ["High School", "Bachelor's", "Master's", "PhD"]}
df = pd.DataFrame(data)
print(df)

6. Car Colors
• Data:
o "Red," "Blue," "Black," "White," "Green."
• Type: Nominal Data.
Python Example:
data = {'Car_ID': [1001, 1002, 1003, 1004],
'Color': ['Red', 'Blue', 'Black', 'White']}
df = pd.DataFrame(data)
print(df)

7. Survey Responses
• Data:
o "Yes," "No," "Maybe."
• Type: Nominal Data.
Python Example:
data = {'Respondent_ID': [1, 2, 3],
'Response': ['Yes', 'No', 'Maybe']}

df = pd.DataFrame(data)
print(df)

8. Marital Status
• Data:
o "Single," "Married," "Divorced," "Widowed."
• Type: Nominal Data.
Python Example:
data = {'Person_ID': [1, 2, 3, 4],
'Marital_Status': ['Single', 'Married', 'Divorced', 'Widowed']}
df = pd.DataFrame(data)
print(df)

9. Weather Descriptions
• Data:
o "Sunny," "Cloudy," "Rainy," "Windy."
• Type: Nominal Data.
Python Example:
data = {'Day': ['Monday', 'Tuesday', 'Wednesday', 'Thursday'],
'Weather': ['Sunny', 'Cloudy', 'Rainy', 'Windy']}
df = pd.DataFrame(data)
print(df)

10. Social Media Sentiments

• Data:
o "Positive," "Negative," "Neutral."
• Type: Ordinal Data (can also be nominal if no order is implied).

Python Example:
data = {'Post_ID': [1, 2, 3],
'Sentiment': ['Positive', 'Negative', 'Neutral']}
df = pd.DataFrame(data)
print(df)
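The examples above store ordinal values as plain strings, so pandas does not know their order. One possible way to record the order is an ordered categorical, sketched here for the customer feedback example; the list name levels and the chosen ranking are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer_ID": [101, 102, 103, 104],
    "Feedback": ["Satisfied", "Neutral", "Dissatisfied", "Very Satisfied"],
})

# Declare the order explicitly; without this pandas treats the values as plain strings
levels = ["Dissatisfied", "Neutral", "Satisfied", "Very Satisfied"]
df["Feedback"] = pd.Categorical(df["Feedback"], categories=levels, ordered=True)

# Sorting and comparisons now follow the satisfaction order, not the alphabet
ranked = df.sort_values("Feedback")
print(ranked)
print(df[df["Feedback"] >= "Satisfied"])

# Frequency counts work for nominal and ordinal data alike
print(df["Feedback"].value_counts(sort=False))
```

For nominal data such as product categories, the same pd.Categorical call can be used without ordered=True.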
Key Differences
Aspect          Quantitative Data                        Qualitative Data
Nature          Numerical (e.g., age, salary)            Non-numerical (e.g., gender, color)
Analysis Type   Statistical and mathematical analysis    Classification and grouping
Representation  Numbers                                  Text or categories
Visualization   Line charts, scatter plots, histograms   Bar charts, pie charts

Applications in Python:
1. Quantitative: Calculating trends, correlation, regression analysis.

2. Qualitative: Sentiment analysis, clustering, decision tree classification.

Both types of data are often used together to provide a comprehensive analysis in data science
projects.

Unit 3
A NumPy array object (ndarray) represents a multidimensional, homogeneous
array of fixed-size items. An associated data-type object (dtype) describes the
format of each element in the array (its byte order, how many bytes it occupies
in memory, whether it is an integer, a floating-point number, or something else).
Arrays should be constructed using array, zeros or empty. Calling the ndarray
constructor directly (ndarray(…)) is a low-level way of instantiating an array
and is rarely needed.
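As a brief illustration of the three constructors named above and of the information the dtype carries, consider the sketch below. The variable names a, z, and e are arbitrary.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)  # from existing data
z = np.zeros((2, 3))                                   # filled with 0.0
e = np.empty((2, 3))                                   # uninitialized; contents are arbitrary

print(a.shape, a.ndim, a.dtype, a.itemsize)  # shape, number of axes, element type, bytes per element
print(z.dtype)                               # default floating-point type
print(a.dtype.byteorder)                     # byte-order flag of the dtype
```

Because np.empty does not initialize memory, its values should always be overwritten before use.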

Indexing on ndarray
ndarrays can be indexed using the standard Python x[obj] syntax,
where x is the array and obj the selection. There are different kinds of indexing
available depending on obj: basic indexing, advanced indexing and field access.
Most of the following examples show the use of indexing when referencing data
in an array. The examples work just as well when assigning to an array.
Note that in Python, x[(exp1, exp2, ..., expN)] is equivalent
to x[exp1, exp2, ..., expN]; the latter is just syntactic sugar for the former.
Basic indexing
Single element indexing
Single element indexing works exactly like that for other standard Python
sequences. It is 0-based, and accepts negative indices for indexing from the end
of the array.
>>> import numpy as np
>>> x = np.arange(10)
>>> x[2]
2
>>> x[-2]
8
It is not necessary to separate each dimension’s index into its own set of square
brackets.
>>> x.shape = (2, 5) # now x is 2-dimensional
>>> x[1, 3]
8
>>> x[1, -1]
9
Note that if one indexes a multidimensional array with fewer indices than
dimensions, one gets a subdimensional array. For example:
>>> x[0]
array([0, 1, 2, 3, 4])
That is, each index specified selects the array corresponding to the rest of
the dimensions selected. In the above example, choosing 0 means that the
remaining dimension of length 5 is being left unspecified, and that what is
returned is an array of that dimensionality and size. It must be noted that the
returned array is a view, i.e., it is not a copy of the original, but points to the same
values in memory as does the original array. In this case, the 1-D array at the first
position (0) is returned. So using a single index on the returned array results in a
single element being returned. That is:
>>> x[0][2]
2
So note that x[0, 2] == x[0][2], though the second form is less efficient because a
new temporary array is created after the first index and is then indexed by 2.
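The statement that the returned sub-array is a view can be checked directly. The following sketch writes through a view and then contrasts it with an explicit copy; the names row and independent are illustrative.

```python
import numpy as np

x = np.arange(10).reshape(2, 5)
row = x[0]            # basic indexing returns a view
row[2] = 99           # writing through the view changes x as well

print(x[0, 2])
print(np.shares_memory(x, row))

independent = x[0].copy()   # an explicit copy is detached from x
independent[0] = -1
print(x[0, 0])
```

Use .copy() whenever a sub-array must be modified without affecting the original.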

Slicing and striding

Basic slicing extends Python’s basic concept of slicing to N dimensions. Basic
slicing occurs when obj is a slice object (constructed by start:stop:step notation
inside of brackets), an integer, or a tuple of slice objects and integers. Ellipsis and
newaxis objects can be interspersed with these as well.
The simplest case of indexing with N integers returns an array scalar representing
the corresponding item. As in Python, all indices are zero-based: for the i-th
index n_i, the valid range is 0 ≤ n_i < d_i, where d_i is the i-th element of the
shape of the array. Negative indices are interpreted as counting from the end of
the array (i.e., if n_i < 0, it means n_i + d_i).
Arrays generated by basic slicing are always views of the original array.
The standard rules of sequence slicing apply to basic slicing on a per-dimension
basis (including using a step index). Some useful concepts to remember include:
• The basic slice syntax is i:j:k where i is the starting index, j is the stopping index,
and k is the step (k ≠ 0). This selects the m elements (in the corresponding
dimension) with index values i, i + k, …, i + (m - 1) k, where m = q + (r ≠ 0)
and q and r are the quotient and remainder obtained by dividing j - i by k:
j - i = q k + r, so that i + (m - 1) k < j. For example:
>>> x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> x[1:7:2]
array([1, 3, 5])
• Negative i and j are interpreted as n + i and n + j where n is the number of
elements in the corresponding dimension. Negative k makes stepping go towards
smaller indices. From the above example:
>>> x[-2:10]
array([8, 9])
>>> x[-3:3:-1]
array([7, 6, 5, 4])
• Assume n is the number of elements in the dimension being sliced. Then, if i is
not given it defaults to 0 for k > 0 and n - 1 for k < 0. If j is not given it defaults
to n for k > 0 and -n-1 for k < 0. If k is not given it defaults to 1. Note that :: is
the same as : and means select all indices along this axis. From the above
example:
>>> x[5:]
array([5, 6, 7, 8, 9])
• If the number of objects in the selection tuple is less than N, then : is assumed for
any subsequent dimensions. For example:
>>> x = np.array([[[1],[2],[3]], [[4],[5],[6]]])
>>> x.shape
(2, 3, 1)
>>> x[1:2]
array([[[4],
        [5],
        [6]]])
• An integer, i, returns the same values as i:i+1 except the dimensionality of the
returned object is reduced by 1. In particular, a selection tuple with the p-th
element an integer (and all other entries :) returns the corresponding sub-array
with dimension N - 1. If N = 1 then the returned object is an array scalar.
• If the selection tuple has all entries : except the p-th entry which is a slice
object i:j:k, then the returned array has dimension N formed by stacking, along
the p-th axis, the sub-arrays returned by integer indexing of elements i, i+k, …, i
+ (m - 1) k < j.
• Basic slicing with more than one non-: entry in the slicing tuple, acts like repeated
application of slicing using a single non-: entry, where the non-: entries are
successively taken (with all other non-: entries replaced by :).
Thus, x[ind1, ..., ind2,:] acts like x[ind1][..., ind2, :] under basic slicing.
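The element-count formula m = q + (r ≠ 0) from the first bullet can be checked against NumPy itself. The helper slice_length below is a hypothetical function written for this check, not part of NumPy; it assumes that i and j have already been normalized to non-negative, in-range indices and that the slice is non-empty.

```python
import numpy as np

def slice_length(i, j, k):
    """Number of elements selected by i:j:k.

    Assumes i and j are already normalized to non-negative, in-range indices
    and that the slice is non-empty (j - i has the same sign as k).
    """
    q, r = divmod(j - i, k)   # j - i = q*k + r
    return q + (r != 0)       # one extra element when the division is not exact

x = np.arange(10)
for i, j, k in [(1, 7, 2), (7, 3, -1), (0, 10, 3)]:
    selected = x[i:j:k]
    print((i, j, k), selected, slice_length(i, j, k))
```

In each printed row the computed length should equal the number of elements in the slice.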

Array Concatenation
import numpy as np
# Creating two arrays
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
# Concatenating along axis 0 (row-wise)
concatenated = np.concatenate((arr1, arr2), axis=0)
print(concatenated)
# Concatenating along axis 1 (column-wise)
concatenated_col = np.concatenate((arr1, arr2), axis=1)
print(concatenated_col)
Output:
# Concatenation along axis 0
[[1 2]
[3 4]
[5 6]
[7 8]]
# Concatenation along axis 1
[[1 2 5 6]
[3 4 7 8]]

Splitting Array
# Creating an array
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
# Splitting into 2 arrays along axis 1 (column-wise)
split_arr = np.hsplit(arr, 2)
print(split_arr)
# Splitting into 2 arrays along axis 0 (row-wise)

split_arr_row = np.vsplit(arr, 2)
print(split_arr_row)
Output:
# Splitting column-wise
[array([[1, 2],
[5, 6]]), array([[3, 4],
[7, 8]])]
# Splitting row-wise
[array([[1, 2, 3, 4]]), array([[5, 6, 7, 8]])]

Shape manipulation
In Python's NumPy library, shape manipulation allows you to change the structure
of arrays without changing the data they contain. Common shape manipulation
functions include reshaping, flattening, transposing, expanding, and squeezing
arrays.
Common shape manipulation methods in NumPy:
1. Reshape
• Changes the shape of an array to a specified new shape, provided the total
number of elements remains the same.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr.reshape(2, 3)
print(reshaped_arr)
Output:
[[1 2 3]
[4 5 6]]
2. Flatten
• Converts a multi-dimensional array into a 1D array.
arr = np.array([[1, 2, 3], [4, 5, 6]])
flattened_arr = arr.flatten()
print(flattened_arr)
Output:
[1 2 3 4 5 6]
3. Transpose
• Reverses or permutes the axes of an array, commonly used for matrices.
arr = np.array([[1, 2, 3], [4, 5, 6]])
transposed_arr = arr.T
print(transposed_arr)
Output:
[[1 4]
[2 5]
[3 6]]
4. Expand Dimensions
• Adds an extra dimension to an array, useful for aligning shapes for
operations like broadcasting.
arr = np.array([1, 2, 3])
expanded_arr = np.expand_dims(arr, axis=0)
print(expanded_arr)
print("Shape:", expanded_arr.shape)
Output:
[[1 2 3]]
Shape: (1, 3)
5. Squeeze
• Removes single-dimensional entries from the shape of an array, often used
to simplify results.
arr = np.array([[[1, 2, 3]]])
squeezed_arr = np.squeeze(arr)
print(squeezed_arr)
print("Shape:", squeezed_arr.shape)
Output:
[1 2 3]
Shape: (3,)
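Two details of the methods above are worth a short sketch: reshape accepts -1 for one dimension that NumPy should infer, and flatten always returns a copy whereas the related ravel method returns a view when it can. The variable names below are illustrative.

```python
import numpy as np

arr = np.arange(1, 13)

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4)
grid = arr.reshape(-1, 3)
print(grid.shape)

flat_copy = grid.flatten()   # always a copy
flat_view = grid.ravel()     # a view when the memory layout allows it

print(np.shares_memory(grid, flat_copy))
print(np.shares_memory(grid, flat_view))
```

Prefer flatten when the result will be modified independently, and ravel when avoiding a copy matters.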

Array Manipulations:
Array manipulation in Python, especially with NumPy, allows for powerful
operations like adding, removing, splitting, and modifying elements.
Some common array manipulation techniques:
1. Appending Elements
• Use np.append() to add elements to an array. It returns a new array with the
appended values.
import numpy as np
arr = np.array([1, 2, 3])
appended_arr = np.append(arr, [4, 5, 6])
print(appended_arr)
Output:
[1 2 3 4 5 6]
2. Inserting Elements
• Use np.insert() to insert values at a specific index.
arr = np.array([1, 2, 3])
inserted_arr = np.insert(arr, 1, [9, 10])
print(inserted_arr)
Output:
[ 1 9 10 2 3]
Here, [9, 10] is inserted starting at index 1
3. Deleting Elements
• Use np.delete() to remove elements at specific indices.
arr = np.array([1, 2, 3, 4, 5])
deleted_arr = np.delete(arr, [1, 3]) # Remove elements at indices 1 and 3
print(deleted_arr)
Output:
[1 3 5]
4. Concatenating Arrays
• Combine arrays along an existing axis using np.concatenate().
arr1 = np.array([1, 2])
arr2 = np.array([3, 4])
concatenated_arr = np.concatenate((arr1, arr2))
print(concatenated_arr)
Output:
[1 2 3 4]
5. Splitting Arrays
• Use np.split() to split an array into multiple sub-arrays.
arr = np.array([1, 2, 3, 4, 5, 6])
split_arr = np.split(arr, 3) # Split into 3 equal parts
print(split_arr)
Output:
[array([1, 2]), array([3, 4]), array([5, 6])]
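np.split with an integer requires that the array divide evenly. The sketch below shows what happens when it does not, along with two alternatives: np.array_split, and np.split with a list of split positions. The names parts, head, middle, and tail are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# np.split requires an equal division; 6 elements cannot be split into 4 equal parts
try:
    np.split(arr, 4)
except ValueError as err:
    print("np.split failed:", err)

# np.array_split allows sub-arrays of unequal length
parts = np.array_split(arr, 4)
print([p.tolist() for p in parts])

# Passing a list of indices splits at those positions instead
head, middle, tail = np.split(arr, [1, 4])
print(head, middle, tail)
```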

6. Reshaping Arrays
• reshape() changes the shape of an array without modifying the data.
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr.reshape(2, 3)
print(reshaped_arr)
Output:
[[1 2 3]
[4 5 6]]
7. Flattening Arrays
• Convert a multi-dimensional array into a 1D array with flatten().
arr = np.array([[1, 2], [3, 4]])
flattened_arr = arr.flatten()
print(flattened_arr)
Output:
[1 2 3 4]
8. Stacking Arrays
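• Stack arrays with np.vstack() for vertical stacking or np.hstack() for
horizontal stacking. For 1-D inputs, np.vstack() returns a 2-D array with one
row per input, while np.hstack() joins the inputs end to end.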
arr1 = np.array([1, 2])
arr2 = np.array([3, 4])
vstacked_arr = np.vstack((arr1, arr2))
hstacked_arr = np.hstack((arr1, arr2))
print("Vertical Stack:\n", vstacked_arr)
print("Horizontal Stack:\n", hstacked_arr)

Output:
Vertical Stack:
[[1 2]
[3 4]]
Horizontal Stack:
[1 2 3 4]
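When the goal is specifically to join arrays along a new axis, np.stack lets you choose where that axis goes. A small sketch using the same two arrays follows; the names stack0 and stack1 are illustrative.

```python
import numpy as np

arr1 = np.array([1, 2])
arr2 = np.array([3, 4])

stack0 = np.stack((arr1, arr2), axis=0)  # new leading axis: inputs become rows
stack1 = np.stack((arr1, arr2), axis=1)  # new trailing axis: inputs become columns

print(stack0)
print(stack1)
print(np.hstack((arr1, arr2)).shape)     # 1-D inputs stay 1-D
```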
9. Reversing an Array
• Reverse an array with slicing or by using np.flip().
arr = np.array([1, 2, 3, 4, 5])
reversed_arr = np.flip(arr)
print(reversed_arr)
Output:
[5 4 3 2 1]
10. Repeating Elements
• Use np.repeat() to repeat each element a specified number of times.
arr = np.array([1, 2, 3])
repeated_arr = np.repeat(arr, 2)
print(repeated_arr)
Output:
[1 1 2 2 3 3]

Vectorization:
In Python, vectorization refers to performing operations on entire arrays rather
than individual elements, allowing for faster execution, especially with large
datasets. Libraries like NumPy implement these whole-array operations in
compiled code, which avoids the overhead of an explicit Python loop.
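To make the difference concrete, the sketch below computes the same result with an explicit loop and with a single array expression, then times both. The function names are hypothetical, and the timings depend on the machine, so no particular speed-up is claimed here.

```python
import timeit
import numpy as np

values = np.arange(100_000, dtype=np.float64)

def squares_loop(arr):
    # Element-by-element Python loop
    out = np.empty_like(arr)
    for idx in range(arr.size):
        out[idx] = arr[idx] * arr[idx]
    return out

def squares_vectorized(arr):
    # One array expression; the loop runs inside NumPy
    return arr * arr

# Both approaches must give identical results
assert np.array_equal(squares_loop(values), squares_vectorized(values))

loop_time = timeit.timeit(lambda: squares_loop(values), number=3)
vec_time = timeit.timeit(lambda: squares_vectorized(values), number=3)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```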
1. Adding Two Arrays
Let's add two arrays element-wise using vectorization.
import numpy as np
# Creating two arrays
array1 = np.array([1, 2, 3, 4])
array2 = np.array([10, 20, 30, 40])
# Adding arrays using vectorized operation
result = array1 + array2
print(result)
Output:
[11 22 33 44]
2. Scalar Operations on Arrays
Performing a scalar operation on each element in an array without a loop.
# Multiply each element in the array by 5
array = np.array([1, 2, 3, 4, 5])
result = array * 5
print(result)
Output:
[ 5 10 15 20 25]
3. Element-wise Multiplication
In this example, we'll multiply two arrays element by element.
array1 = np.array([1, 2, 3, 4])
array2 = np.array([5, 6, 7, 8])
# Element-wise multiplication
result = array1 * array2
print(result)
Output:
[ 5 12 21 32]
4. Using Mathematical Functions on Arrays
Vectorized operations can be applied using mathematical functions on entire
arrays.
# Creating an array
array = np.array([0, np.pi / 2, np.pi, 3 * np.pi / 2])
# Applying the sine function to each element
result = np.sin(array)
print(result)
Output:
[ 0.0000000e+00 1.0000000e+00 1.2246468e-16 -1.0000000e+00]
5. Boolean Indexing
Vectorization also allows for conditional operations on arrays.
# Creating an array
array = np.array([1, 2, 3, 4, 5])
# Get elements greater than 3
result = array[array > 3]
print(result)
Output:
[4 5]

6. Dot Product of Vectors

The dot product of vectors is a common operation that can be vectorized in
Python.
array1 = np.array([1, 2, 3])

array2 = np.array([4, 5, 6])
# Dot product
result = np.dot(array1, array2)
print(result)
Output:
32
7. Matrix Multiplication
Matrix multiplication can also be efficiently performed through vectorization.
# Creating two 2x2 matrices
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])
# Matrix multiplication
result = np.dot(matrix1, matrix2)
print(result)
Output:
[[19 22]
[43 50]]
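For 2-D arrays the same product can also be written with the @ operator or np.matmul. The sketch below compares these forms with np.dot and shows that * remains element-wise; the variable names are illustrative.

```python
import numpy as np

matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

product_at = matrix1 @ matrix2                # operator form
product_matmul = np.matmul(matrix1, matrix2)  # function form

print(product_at)
print(np.array_equal(product_at, np.dot(matrix1, matrix2)))

# Note: * is element-wise, not matrix multiplication
print(matrix1 * matrix2)
```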

Broadcasting
In NumPy, broadcasting refers to the ability of NumPy to perform
element-wise operations on arrays of different shapes in a way that avoids making
unnecessary copies of data. Broadcasting allows NumPy to "stretch" smaller
arrays along specific dimensions so that they are compatible with larger arrays
when performing operations like addition, multiplication, and comparison.
Instead of forcing arrays to have the same shape by manually reshaping or
duplicating data, broadcasting automatically adjusts the shapes of the arrays so
that operations can be performed efficiently.
Broadcasting Rules
For broadcasting to work, NumPy follows specific rules to determine how arrays
of different shapes are treated:
1. Rule 1: If the arrays differ in the number of dimensions, prepend ones to
the shape of the array with fewer dimensions until both arrays have the
same number of dimensions.
2. Rule 2: Two dimensions are compatible when their sizes are equal or when
one of them is 1; a dimension of size 1 is stretched to match the other.
3. Rule 3: If any pair of dimensions is not compatible under the above two
rules, broadcasting fails with a ValueError.
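The three rules can be written out as a small function that works on shapes alone. broadcast_shape below is a hypothetical helper written for illustration, not a NumPy function; its result is compared with the shape NumPy itself reports through np.broadcast.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the three rules above to two shapes and return the result shape."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: left-pad the shorter shape with ones
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b:
            result.append(dim_a)          # Rule 2: equal sizes
        elif dim_a == 1:
            result.append(dim_b)          # Rule 2: size 1 stretches to the other
        elif dim_b == 1:
            result.append(dim_a)
        else:
            # Rule 3: incompatible dimensions
            raise ValueError(f"shapes {shape_a} and {shape_b} are not broadcastable")
    return tuple(result)

for pair in [((2, 3), (3,)), ((3, 3), (3, 1)), ((2, 3, 4), (4,))]:
    expected = np.broadcast(np.empty(pair[0]), np.empty(pair[1])).shape
    print(pair, "->", broadcast_shape(*pair), "NumPy:", expected)
```

Calling broadcast_shape((2, 2), (3,)) raises ValueError, because 2 and 3 differ and neither is 1.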

Example 1: Broadcasting a Scalar with an Array

Broadcasting is simplest when a scalar is involved. NumPy broadcasts the scalar
across the entire array.
import numpy as np
arr = np.array([1, 2, 3, 4])
scalar = 10
# Broadcasting the scalar to each element in the array
result = arr + scalar
print(result)

Output:
[11 12 13 14]
In this example, the scalar 10 is treated as if it were an array of the same shape as
arr ([10, 10, 10, 10]), and the addition is applied element-wise.
Example 2: Broadcasting a 1D Array to a 2D Array
Broadcasting also allows you to apply operations between arrays of different
dimensions. Let’s take a 2D array and a 1D array.
import numpy as np
matrix = np.array([[1, 2, 3],
[4, 5, 6]])
vector = np.array([10, 20, 30])
# Broadcasting the 1D array 'vector' across each row of the 2D array 'matrix'
result = matrix + vector
print(result)
Output:
[[11 22 33]
[14 25 36]]
Here, the shape of matrix is (2, 3), and the shape of vector is (3,). Since the
trailing dimension (3) matches, vector is broadcast to each row of matrix. NumPy
treats the vector as if it had shape (1, 3) and stretches it to match the (2, 3) shape
of matrix.
Example 3: Broadcasting with Arrays of Different Shapes
In this case, let’s broadcast arrays of different dimensions.
import numpy as np
a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
b = np.array([[10],
[20],
[30]])

# Broadcasting the column vector across each row

result = a + b
print(result)
Output:
[[11 12 13]
[24 25 26]
[37 38 39]]
Here:
• a has shape (3, 3).
• b has shape (3, 1) (a column vector).
NumPy broadcasts b to match the shape of a by "stretching" b along the columns,
treating b as if it were:
[[10, 10, 10],
[20, 20, 20],
[30, 30, 30]]
Then, the element-wise addition is performed.
Example 4: Broadcasting Across Multiple Dimensions
Let’s look at an example with higher-dimensional arrays.
import numpy as np
# 3D array of shape (2, 3, 4)
a = np.array([[[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]],

[[13, 14, 15, 16],

[17, 18, 19, 20],
[21, 22, 23, 24]]])
# 1D array of shape (4,)
b = np.array([1, 2, 3, 4])
# Broadcasting 'b' across the last dimension of 'a'
result = a + b
print(result)
Output:
[[[ 2 4 6 8]
[ 6 8 10 12]
[10 12 14 16]]

[[14 16 18 20]
[18 20 22 24]
[22 24 26 28]]]
Here:
• a has shape (2, 3, 4) (a 3D array),
• b has shape (4,) (a 1D array).
The array b is broadcast across the last dimension of a, so it is added to every
length-4 row of each (3, 4) subarray in a.
Example 5: Incompatible Shapes (Broadcasting Failure)
Broadcasting will fail if the shapes of the arrays are not compatible under the
broadcasting rules.
import numpy as np
a = np.array([[1, 2],
[3, 4]])
b = np.array([1, 2, 3])
# This will raise a ValueError because the shapes are incompatible
result = a + b
Error:
ValueError: operands could not be broadcast together with shapes (2,2) (3,)
In this case:
• a has shape (2, 2).
• b has shape (3,).
These shapes are not compatible because the trailing dimension of a (size 2)
does not match the size of b (3), and neither is 1, so neither array can be
broadcast to fit the other.
Example 6: Broadcasting with Uneven Shapes
Broadcasting can also handle cases where only one of the arrays has a dimension
of size 1, enabling "stretching."
import numpy as np
a = np.array([[1, 2, 3],
[4, 5, 6]])
b = np.array([[1],
[2]])
# Broadcasting 'b' to match the shape of 'a'
result = a * b
print(result)
Output:
[[ 1 2 3]
[ 8 10 12]]
Here:
• a has shape (2, 3).
• b has shape (2, 1).
b is broadcast along the second dimension, effectively treating it as:
[[1, 1, 1],
[2, 2, 2]]
The multiplication is then performed element-wise.
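When the multipliers start out as a 1D array, they first need a size-1 second dimension before they can be stretched this way. A minimal sketch, assuming a hypothetical 1D array named weights:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
weights = np.array([1, 2])          # shape (2,), hypothetical per-row multipliers

# a * weights would fail: trailing dimensions 3 and 2 do not match.
column = weights[:, np.newaxis]     # shape (2, 1)
result = a * column
print(column.shape)
print(result)
```

Indexing with np.newaxis only adds a dimension of size 1; the stretching itself still happens during the multiplication.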

Structured Arrays
A structured array in NumPy is a specialized ndarray for storing and
manipulating heterogeneous data: each element of the array is a record made up
of fields, each with its own name and data type. This lets you represent tabular
or record-based data (e.g., rows of a database table) more clearly than a typical
homogeneous ndarray, because each field can have a different data type.
Features of a Structured Array:
1. Named Fields: Each element in the structured array consists of fields with
distinct names, similar to columns in a database or fields in a record.
2. Heterogeneous Data Types: Fields can have different data types (e.g.,
integers, floats, strings), allowing for complex data structures.
3. Efficient Memory Management: Structured arrays ensure efficient
memory storage and access through fixed-size fields, enabling fast
computations and data retrieval.
4. Field Access: You can access fields by name (e.g., array['field_name']),
allowing for clear and intuitive data manipulation.
5. Support for Nested Structures: Structured arrays can support fields
containing other arrays or even nested structured arrays, enabling complex
data hierarchies.
Syntax for Creating a Structured Array:

A structured array is created using a custom dtype (data type) specification that
defines the names and data types of each field.

Creating a Structured Array

You can create a structured array by defining a dtype with named fields. Each
field is assigned a data type and optionally a shape.

Example 1: Basic Structured Array

import numpy as np
# Define structured array with named fields
person_dtype = np.dtype([('name', 'S20'), ('age', 'i4'), ('height', 'f4')])
# Create a structured array
people = np.array([('Ram', 25, 5.5), ('Shyam', 30, 6.0)], dtype=person_dtype)

# Access the structured array
print(people)

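Output (shown in repr form, as when people is evaluated in an interactive session;
print(people) omits the array(...) wrapper and the dtype):

array([(b'Ram', 25, 5.5), (b'Shyam', 30, 6. )],
      dtype=[('name', 'S20'), ('age', '<i4'), ('height', '<f4')])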

Here, each entry has a name (string), age (integer), and height (float).

In structured arrays in NumPy, S20, i4, and f4 refer to data types and their sizes.
These codes define the type of each field in the array. Let’s break them down:

1. S20 (String Data Type)

• S: Refers to a string (character) data type.

• 20: Specifies the maximum length of the string in bytes. In this case, S20
means the field holds up to 20 bytes (20 ASCII characters); longer values
are truncated.

2. i4 (Integer Data Type)

• i: Refers to a signed integer data type.

• 4: Specifies the number of bytes (size) used to store the integer. In this
case, i4 means a 4-byte (32-bit) signed integer.

3. f4 (Floating-Point Data Type)

• f: Refers to a floating-point (decimal) data type.

• 4: Specifies the number of bytes used to store the floating-point number.
In this case, f4 means a 4-byte (32-bit) single-precision floating-point
number. Python's built-in float is 64-bit, which corresponds to f8.

In the array representation array([(b'Ram', 25, 5.5)]), the b before the string
'Ram' indicates that the string is a byte string or a bytes literal, rather than a
regular Unicode string. In Python, strings can be stored as either Unicode or
bytes:

• Unicode String: A regular string in Python, represented by str. It holds
Unicode text and can handle characters from different languages.
• Byte String: A sequence of bytes (binary data), represented by bytes in
Python, which is why the letter b precedes the string literal.
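If regular strings are preferred, a field can be declared with the Unicode code U instead of S, or a bytes value can be decoded after the fact. The sketch below assumes the same sample record as above; the variable names are illustrative only.

```python
import numpy as np

# 'S20' stores raw bytes; 'U20' stores Unicode text (up to 20 characters).
bytes_people = np.array([('Ram', 25)], dtype=[('name', 'S20'), ('age', 'i4')])
text_people = np.array([('Ram', 25)], dtype=[('name', 'U20'), ('age', 'i4')])

print(bytes_people['name'][0])                  # bytes value
print(bytes_people['name'][0].decode('utf-8'))  # decoded to str
print(text_people['name'][0])                   # already a str
```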
Accessing Fields in Structured Arrays

You can access individual fields (columns) in a structured array using the field
names.

Example 2: Accessing a Field

# Access the 'name' field

print(people['name'])
# Access the 'age' field
print(people['age'])

Output:

[b'Ram' b'Shyam']
[25 30]

We can treat fields like individual arrays, which can be useful for data
manipulation.
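As a small illustration of treating a field like an ordinary array, the sketch below computes simple statistics on single fields and shows that a field is a view onto the original data. It redefines the sample array so that it runs on its own.

```python
import numpy as np

person_dtype = np.dtype([('name', 'S20'), ('age', 'i4'), ('height', 'f4')])
people = np.array([('Ram', 25, 5.5), ('Shyam', 30, 6.0)], dtype=person_dtype)

ages = people['age']            # a view onto the 'age' field, not a copy
print(ages.mean())              # average age
print(people['height'].max())   # largest height

ages += 1                       # in-place change is visible in the original array
print(people['age'])
```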

Modifying Structured Arrays

You can modify the values in structured arrays using standard NumPy array
indexing and assignment.

Example 3: Modifying a Field

# Modify 'age' field for the first person

people['age'][0] = 26
# Print the updated array
print(people)

Output (repr form; print(people) omits the array(...) wrapper and the dtype):

array([(b'Ram', 26, 5.5), (b'Shyam', 30, 6. )],
      dtype=[('name', 'S20'), ('age', '<i4'), ('height', '<f4')])

Complex Data Types

Structured arrays also support more complex data types, such as arrays within
fields.

Example 4: Structured Array with Arrays in Fields

# Define a structured array with a field containing an array

complex_dtype = np.dtype([('name', 'S20'), ('grades', 'i4', (3,))])
students = np.array([('Ram', [85, 90, 92]), ('Shyam', [75, 80, 85])],
dtype=complex_dtype)
# Access the array
print(students)
# Access the grades field
print(students['grades'])

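Output (the structured array is shown in repr form; print(students) omits the
array(...) wrapper and the dtype):

array([(b'Ram', [85, 90, 92]), (b'Shyam', [75, 80, 85])],
      dtype=[('name', 'S20'), ('grades', '<i4', (3,))])
[[85 90 92]
 [75 80 85]]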

Here, each student has a list of three grades stored in the grades field.

Nested Structured Arrays

You can even nest structured data types, where one field is itself another
structured type.

Example 5: Nested Structured Array

# Define nested dtype

address_dtype = np.dtype([('street', 'S20'), ('city', 'S20')])
person_dtype = np.dtype([('name', 'S20'), ('age', 'i4'), ('address', address_dtype)])

# Create array with nested dtype

people = np.array([('Ram', 25, ('123 Ave', 'New York')),
('Shyam', 30, ('456 St', 'Chicago'))], dtype=person_dtype)

# Access the nested field

print(people['address']['city'])

Output:

[b'New York' b'Chicago']

Operations on Structured Arrays

You can perform NumPy operations on structured arrays, such as sorting or
filtering by field.

Example 6: Sorting by Field

# Sort by 'age' field

sorted_people = np.sort(people, order='age')

print(sorted_people)

Output (repr form; print(sorted_people) omits the array(...) wrapper and the dtype):

array([(b'Ram', 25, (b'123 Ave', b'New York')),
       (b'Shyam', 30, (b'456 St', b'Chicago'))],
      dtype=[('name', 'S20'), ('age', '<i4'), ('address', [('street', 'S20'), ('city', 'S20')])])

In this example, we sorted the structured array by the age field.
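Filtering by field is mentioned above but not shown, and the sample records are already in ascending age order, so the sort has no visible effect. The sketch below uses a simpler dtype and a hypothetical third record ('Sita') in unsorted order to make both operations visible.

```python
import numpy as np

person_dtype = np.dtype([('name', 'S20'), ('age', 'i4'), ('height', 'f4')])
# 'Sita' is a hypothetical extra record added for illustration.
people = np.array([('Shyam', 30, 6.0), ('Ram', 25, 5.5), ('Sita', 28, 5.2)],
                  dtype=person_dtype)

# Filtering: a boolean mask built from one field selects whole records.
older = people[people['age'] > 26]
print(older['name'])

# Sorting: order= names the field to sort by; [::-1] reverses to descending.
by_age = np.sort(people, order='age')
print(by_age['name'])
print(by_age[::-1]['age'])
```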

Reading and Writing Array Data

In Python, you can read and write array data using libraries like NumPy,
which provides efficient methods to handle array-like data structures. Let's go
over the basic methods for reading and writing arrays.

1. Reading Array Data

Reading array data typically means loading arrays from files or converting data
into arrays.

• From a list (manually creating an array): You can manually create an
array from a Python list using NumPy's array() function.

Example:

import numpy as np

# Creating an array from a list

data = [1, 2, 3, 4, 5]

array = np.array(data)
print(array)

Output:

[1 2 3 4 5]

• From a file: You can read array data from a file using functions like
np.loadtxt() or np.genfromtxt(), which are useful for text files.

Example: Let's assume you have a file called data.txt with the following content:

1, 2, 3

4, 5, 6

7, 8, 9

You can read this file as follows:

import numpy as np

# Reading array from a file

array = np.loadtxt('data.txt', delimiter=',')

print(array)

Output:

[[1. 2. 3.]

[4. 5. 6.]

[7. 8. 9.]]

2. Writing Array Data

You can write array data to files using NumPy's built-in functions such as
np.savetxt() or np.save().

• Saving as text file (CSV format): You can save the array as a text file
using savetxt().

Example:
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Writing array to a text file

np.savetxt('output.txt', array, delimiter=',', fmt='%d')

This will create a file called output.txt with the following content:

1,2,3

4,5,6

7,8,9
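A round trip can confirm that the saved text file reads back to the same values. This is a minimal sketch that writes to a temporary directory (an assumption made so the example leaves no files behind); note that np.loadtxt returns floats unless a dtype is requested.

```python
import os
import tempfile
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'output.txt')
    np.savetxt(path, array, delimiter=',', fmt='%d')

    as_float = np.loadtxt(path, delimiter=',')             # default dtype is float
    as_int = np.loadtxt(path, delimiter=',', dtype=int)    # request integers

print(as_float.dtype, as_int.dtype)
print(np.array_equal(array, as_int))
```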

• Saving as a binary file (for faster I/O): You can save arrays as binary
files using np.save() for more efficient storage.

Example:

import numpy as np

array = np.array([1, 2, 3, 4, 5])

# Writing array to a binary file

np.save('array_data.npy', array)

You can later load this binary file using np.load():

array_loaded = np.load('array_data.npy')

print(array_loaded)

Output:

[1 2 3 4 5]

Summary of Functions:

• np.array(data): Convert a list to an array.

• np.loadtxt('filename', delimiter=','): Read array from a text file.
• np.genfromtxt('filename', delimiter=','): Another method to read array
data, handling missing values.
• np.savetxt('filename', array, delimiter=','): Save array as a text file.
• np.save('filename.npy', array): Save array as a binary .npy file.
• np.load('filename.npy'): Load a binary .npy file.

These methods make it easy to work with arrays in both textual and binary
formats.
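The missing-value handling of np.genfromtxt() can be sketched as follows. The input is an in-memory stand-in for a text file (an assumption, so the example needs no file on disk) with one empty entry in the second row.

```python
import io
import numpy as np

# Hypothetical file content with one missing value (second row, second column).
content = "1,2,3\n4,,6\n7,8,9\n"

# Missing entries become nan by default.
with_nan = np.genfromtxt(io.StringIO(content), delimiter=',')
print(with_nan)

# filling_values replaces missing entries with a chosen value instead.
filled = np.genfromtxt(io.StringIO(content), delimiter=',', filling_values=0)
print(filled)
```

By contrast, np.loadtxt() expects every entry to be present, which is why np.genfromtxt() is the usual choice for incomplete data.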
