Practical 1

The document provides an overview of Python libraries commonly used in data science, highlighting their functionalities and commonly used functions. Key libraries include NumPy for numerical computing, Pandas for data manipulation, Matplotlib and Seaborn for visualization, and Scikit-learn and TensorFlow for machine learning. Each library is briefly described along with examples of functions that facilitate various data science tasks.

Uploaded by

BHOJAK ADITYA

22SS02IT080 SSCA3021 Data Science

What are Python libraries?

Python libraries are collections of pre-written code that developers can use to perform
common tasks, making it easier to develop applications and avoid writing repetitive code. They
provide modules and functions that can be imported into a Python script to extend its functionality
without having to write the code from scratch.

Common Python libraries used for data science:

1. NumPy

Provides support for large, multi-dimensional arrays and matrices, along with mathematical
functions to operate on these arrays.

2. Pandas

Offers data structures and operations for manipulating numerical tables and time series.

3. Matplotlib

A plotting library used for creating static, animated, and interactive visualizations in Python.

4. Seaborn

A statistical data visualization library based on Matplotlib, designed to make plots more
attractive and informative.

5. Plotly

An interactive graphing library that makes publication-quality graphs online.

6. SciPy

Builds on NumPy and provides additional functionality for scientific and technical
computing.

7. Scikit-learn

A machine learning library that offers simple and efficient tools for data mining and data
analysis.

8. TensorFlow

An open-source platform for machine learning, especially useful for neural networks.

9. Keras

A high-level neural networks API, written in Python and capable of running on top of
TensorFlow.

10. PyTorch

An open-source machine learning library based on the Torch library, used for applications
such as natural language processing.

11. Statsmodels

Provides classes and functions for the estimation of many different statistical models.

12. NLTK (Natural Language Toolkit)

A library for natural language processing (NLP) with a suite of text processing libraries.

13. SpaCy

An open-source software library for advanced natural language processing.

14. XGBoost

An optimized distributed gradient boosting library designed to be highly efficient, flexible,
and portable.

15. LightGBM

A fast, distributed, high-performance gradient boosting framework based on decision tree
algorithms.

16. CatBoost

A high-performance open-source library for gradient boosting on decision trees.

17. Scrapy

An open-source and collaborative web crawling framework for Python.

18. Beautiful Soup

A library for parsing HTML and XML documents and extracting data from them.

19. OpenCV

An open-source computer vision and machine learning software library.



Explanation:
1. Pandas

Description: Pandas is a powerful data manipulation and analysis library that provides data
structures like DataFrames, which are useful for handling structured data.

Commonly Used Functions:

• pd.read_csv(filepath): Reads a CSV file into a DataFrame.
• df.head(n): Displays the first n rows of the DataFrame.
• df.tail(n): Displays the last n rows of the DataFrame.
• df.describe(): Generates descriptive statistics.
• df.info(): Provides a concise summary of the DataFrame.
• df.drop(labels, axis): Removes specified labels from rows or columns.
• df.groupby(by): Groups data by a specified column for aggregation.
• df.merge(right, how, on): Merges two DataFrames.
• df.fillna(value): Fills missing values with the specified value.

Example:
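
The functions listed above can be tried on a small table. This is only a minimal sketch: the employee data, the column names and the second "locations" table are made up for illustration and are not part of the original practical. The DataFrame is built in memory instead of calling pd.read_csv so that no file is needed.

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only).
df = pd.DataFrame({
    "name": ["emp1", "emp2", "emp3", "emp4"],
    "dept": ["IT", "IT", "HR", "HR"],
    "salary": [50000.0, None, 42000.0, 46000.0],
})

print(df.head(2))      # first two rows
print(df.tail(2))      # last two rows
print(df.describe())   # descriptive statistics of the numeric column
df.info()              # column types and non-null counts

# Fill the missing salary with the mean of the known salaries.
mean_salary = df["salary"].mean()
filled = df.fillna({"salary": mean_salary})

# Average salary per department.
avg_by_dept = filled.groupby("dept")["salary"].mean()
print(avg_by_dept)

# Merge with a second (hypothetical) table on the shared "dept" column.
locations = pd.DataFrame({"dept": ["IT", "HR"], "city": ["Surat", "Vadodara"]})
merged = filled.merge(locations, how="left", on="dept")

# Drop a column that is no longer needed (axis=1 means columns).
trimmed = merged.drop("name", axis=1)
print(trimmed)
```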

2. NumPy

Description: NumPy is the fundamental package for numerical computing in Python. It
provides support for arrays, matrices, and a large collection of mathematical functions.

Commonly Used Functions:


• np.array(object): Creates an array.
• np.zeros(shape): Creates an array filled with zeros.
• np.ones(shape): Creates an array filled with ones.
• np.linspace(start, stop, num): Creates an array with linearly spaced elements.
• np.mean(array): Computes the mean of array elements.
• np.median(array): Computes the median of array elements.
• np.std(array): Computes the standard deviation.
• np.sum(array): Sums the elements of an array.
• np.dot(a, b): Computes the dot product of two arrays.

Example:
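
A short sketch of the functions listed above, using small arrays chosen only for illustration. Note that np.std computes the population standard deviation by default (ddof=0).

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])   # array from a Python list
b = np.ones(4)                       # [1, 1, 1, 1]
zeros = np.zeros((2, 3))             # 2 x 3 array of zeros
grid = np.linspace(0.0, 1.0, 5)      # 5 evenly spaced points from 0 to 1

mean_a = np.mean(a)       # arithmetic mean
median_a = np.median(a)   # middle value
std_a = np.std(a)         # population standard deviation (ddof=0)
total = np.sum(a)         # sum of all elements
dot_ab = np.dot(a, b)     # dot product of a and b

print(mean_a, median_a, std_a, total, dot_ab)
print(grid)
```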

3. Matplotlib

Description: Matplotlib is a plotting library used for creating static, animated, and interactive
visualizations in Python.

Commonly Used Functions:

• plt.plot(x, y): Plots a line graph.
• plt.scatter(x, y): Creates a scatter plot.
• plt.bar(x, height): Creates a bar chart.
• plt.hist(x, bins): Creates a histogram.
• plt.xlabel(xlabel): Sets the label for the x-axis.
• plt.ylabel(ylabel): Sets the label for the y-axis.
• plt.title(label): Sets the title of the plot.
• plt.legend(): Adds a legend to the plot.
• plt.show(): Displays the plot.

Example:
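
A minimal sketch that combines several of the functions above on a sine curve. The data, the labels and the output file name sine_curve.png are assumptions made for illustration. The "Agg" backend is selected so that the script also runs on a machine without a display; in an interactive session that line can be dropped and plt.show() used instead of plt.savefig.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

plt.plot(x, y, label="sin(x)")                             # line graph
plt.scatter(x[::5], y[::5], color="red", label="samples")  # every 5th point
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("Sine curve")
plt.legend()

# Hypothetical output file; use plt.show() in an interactive session.
plt.savefig("sine_curve.png")
```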

4. Seaborn

Description: Seaborn is built on top of Matplotlib and provides a high-level interface for
drawing attractive and informative statistical graphics.

Commonly Used Functions:

• sns.scatterplot(data, x, y): Creates a scatter plot.
• sns.lineplot(data, x, y): Creates a line plot.
• sns.barplot(data, x, y): Creates a bar plot.
• sns.histplot(data, bins): Creates a histogram.
• sns.boxplot(data, x, y): Creates a box plot.
• sns.heatmap(data, annot): Creates a heatmap.
• sns.pairplot(data): Creates a pair plot.
• sns.catplot(data, x, y): Creates a categorical plot.

In recent Seaborn releases every argument other than data should be passed by keyword,
for example sns.scatterplot(data=df, x="a", y="b").

Example:
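
A small sketch of two of the plot types above. The DataFrame, its column names and the output file names are invented for illustration. As with any Matplotlib-based script, the "Agg" backend is chosen here only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: hours studied and exam score for two groups.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [35, 45, 50, 62, 70, 85],
    "group": ["A", "B", "A", "B", "A", "B"],
})

# Scatter plot, coloured by group.
ax = sns.scatterplot(data=df, x="hours", y="score", hue="group")
ax.set_title("Score vs. hours studied")
plt.savefig("seaborn_scatter.png")

# Heatmap of the correlation matrix, drawn on a new figure.
plt.figure()
corr = df[["hours", "score"]].corr()
heat_ax = sns.heatmap(corr, annot=True)
plt.savefig("seaborn_heatmap.png")
```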

5. Plotly

Description: Plotly is a graphing library for making interactive, publication-quality
graphs online. It supports various types of plots and integrates with several data
visualization tools.

Commonly Used Functions:

• go.Figure(data): Creates a new figure.
• go.Scatter(x, y): Creates a scatter plot.
• go.Bar(x, y): Creates a bar plot.
• go.Histogram(x): Creates a histogram.
• go.Box(y): Creates a box plot.
• go.Heatmap(z): Creates a heatmap.
• fig.update_layout(): Updates the layout of the figure.
• fig.show(): Displays the figure.

Example:

6. SciPy

Description: Builds on NumPy and provides additional functionality for scientific and
technical computing.

Commonly Used Functions:

• scipy.optimize.minimize(): Minimizes a function.
• scipy.integrate.quad(): Performs numerical integration.
• scipy.stats.norm(): Represents a normal (Gaussian) distribution.
• scipy.spatial.distance.cdist(): Computes the distance between each pair of points from
two collections of inputs.
• scipy.fft.fft(): Computes the fast Fourier transform (scipy.fft itself is a module, not a
function).
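
Each SciPy function above can be checked on a problem with a known answer. This is a sketch with inputs chosen only for illustration: a parabola with its minimum at 3, the integral of x^2 from 0 to 1, the standard normal distribution, a 3-4-5 triangle, and a constant signal.

```python
import numpy as np
from scipy import integrate, optimize, stats
from scipy.fft import fft
from scipy.spatial import distance

# Minimize (x - 3)^2 starting from x = 0; the minimum is at x = 3.
result = optimize.minimize(lambda x: (x[0] - 3.0) ** 2, x0=[0.0])

# Integrate x^2 from 0 to 1; the exact value is 1/3.
area, abs_err = integrate.quad(lambda x: x ** 2, 0.0, 1.0)

# Standard normal distribution: half of the probability lies below 0.
p_below_zero = stats.norm(loc=0.0, scale=1.0).cdf(0.0)

# Euclidean distances from (0, 0) to (3, 4) and to (0, 1).
d = distance.cdist(np.array([[0.0, 0.0]]), np.array([[3.0, 4.0], [0.0, 1.0]]))

# FFT of a constant signal: everything ends up in the zero-frequency bin.
spectrum = fft(np.ones(4))

print(result.x, area, p_below_zero, d, spectrum)
```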

7. Scikit-learn

Description: A machine learning library that offers simple and efficient tools for data mining
and data analysis.

Commonly Used Functions:

• train_test_split(): Splits data into training and testing sets.
• StandardScaler(): Standardizes features by removing the mean and scaling to unit
variance.
• LinearRegression(): Implements linear regression.
• LogisticRegression(): Implements logistic regression.
• KMeans(): Implements k-means clustering.
• RandomForestClassifier(): Implements a random forest classifier.
• cross_val_score(): Evaluates a score by cross-validation.
• GridSearchCV(): Performs grid search with cross-validation.
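
A minimal sketch of a typical workflow with some of the functions above. The choice of the built-in Iris dataset, the 25% test split and the use of make_pipeline (which is not in the list above) are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Built-in example dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# 5-fold cross-validation on the full dataset.
cv_scores = cross_val_score(model, X, y, cv=5)

print(test_accuracy, cv_scores.mean())
```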

8. TensorFlow

Description: An open-source platform for machine learning, especially useful for neural
networks.

Commonly Used Functions:

• tf.constant(): Creates a constant tensor.
• tf.Variable(): Creates a variable tensor.
• tf.GradientTape(): Records operations for automatic differentiation.
• tf.keras.Model(): Defines a Keras model.
• tf.keras.layers.Dense(): Adds a dense layer to a neural network.
• tf.data.Dataset: Represents a dataset for an input pipeline; it is usually built with a
method such as tf.data.Dataset.from_tensor_slices() rather than called directly.
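
A short sketch of automatic differentiation and a tiny input pipeline, assuming TensorFlow 2.x with eager execution. The values are chosen only so the result can be checked by hand: for y = x * w^2 with x = 3 and w = 2, dy/dw = 2 * x * w = 12.

```python
import tensorflow as tf

x = tf.constant(3.0)   # fixed value
w = tf.Variable(2.0)   # trainable value

# Record the computation so the gradient can be taken afterwards.
with tf.GradientTape() as tape:
    y = x * w * w      # y = x * w^2

grad = tape.gradient(y, w)   # dy/dw = 2 * x * w

# Build a dataset from a list and group it into batches of two.
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4]).batch(2)
batches = [batch.numpy().tolist() for batch in dataset]

print(float(grad), batches)
```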

9. Keras

Description: A high-level neural networks API, written in Python and capable of running on
top of TensorFlow.

Commonly Used Functions:

• keras.models.Sequential(): Creates a sequential model.
• keras.layers.Dense(): Adds a dense layer.
• keras.layers.Conv2D(): Adds a convolutional layer.
• keras.layers.LSTM(): Adds a long short-term memory layer.
• keras.optimizers.Adam(): Configures the Adam optimizer.
• keras.losses.BinaryCrossentropy(): Configures the binary cross-entropy loss
function.
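
A minimal sketch of defining and compiling a small binary classifier, assuming Keras is imported from TensorFlow 2.x. The layer sizes, the four input features and the learning rate are arbitrary values picked for illustration; no training data is involved.

```python
from tensorflow import keras

# Two dense layers: 4 inputs -> 8 hidden units -> 1 output probability.
model = keras.models.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Adam optimizer with binary cross-entropy loss for a two-class problem.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.01),
    loss=keras.losses.BinaryCrossentropy(),
)

# Weights and biases: (4 * 8 + 8) + (8 * 1 + 1).
n_params = model.count_params()
model.summary()
```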

10. PyTorch

Description: An open-source machine learning library based on the Torch library, used for
applications such as natural language processing.

Commonly Used Functions:

• torch.tensor(): Creates a tensor.
• torch.nn.Module: Base class for all neural network modules.
• torch.optim.SGD(): Implements stochastic gradient descent optimizer.
• torch.nn.functional.relu(): Applies the rectified linear unit function.
• torch.utils.data.DataLoader(): Provides an iterable over a dataset.
• torch.autograd.Variable(): Wraps a tensor and records operations on it. It is deprecated
since PyTorch 0.4; a tensor created with requires_grad=True now does the same job.
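
A minimal sketch that ties the pieces above together. The class name TinyNet, the four data points, the batch size and the learning rate are all made up for illustration. The first part shows autograd on a plain tensor (the modern replacement for Variable); the second runs one pass of SGD over a DataLoader.

```python
import torch

# Autograd on a plain tensor: d(t^2)/dt at t = 2 is 4.
t = torch.tensor(2.0, requires_grad=True)
z = t ** 2
z.backward()
print(t.grad)

torch.manual_seed(0)   # reproducible initial weights


class TinyNet(torch.nn.Module):
    """Hypothetical one-layer network: linear layer followed by ReLU."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(2, 1)

    def forward(self, x):
        return torch.nn.functional.relu(self.linear(x))


# Hypothetical data: the target is the sum of the two inputs.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = torch.tensor([[3.0], [7.0], [11.0], [15.0]])
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x, y), batch_size=2
)

model = TinyNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One pass over the data: forward, loss, backward, update.
for xb, yb in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(xb), yb)
    loss.backward()
    optimizer.step()

predictions = model(x)
```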
