Python Libraries Overview: NumPy, Pandas, Matplotlib, Sklearn, TensorFlow
1. NumPy
- What is NumPy and why is it used?
Answer: NumPy is a Python library for numerical computing, particularly for operations on large,
multi-dimensional arrays and matrices. It offers a range of mathematical functions to operate on
these arrays efficiently.
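To illustrate the efficiency point, here is a minimal sketch of vectorized arithmetic. Whole arrays are combined without an explicit Python loop. The names prices and quantities are hypothetical sample data.

```python
import numpy as np

# Hypothetical sample data
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([3, 1, 2])

# Element-wise arithmetic runs in compiled code, with no Python loop
totals = prices * quantities   # element-wise product
grand_total = totals.sum()     # reduction over all elements

# Broadcasting: a scalar is applied to every element
discounted = prices * 0.9

print(totals)       # [30. 20. 60.]
print(grand_total)  # 110.0
```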
- How do you create a NumPy array?
Answer: Use the numpy.array() function:
import numpy as np
arr = np.array([1, 2, 3, 4])
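NumPy also has several constructor functions besides np.array(). The sketch below shows a few commonly used ones. The variable names are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3, 4])                      # from a Python list
m = np.array([[1, 2], [3, 4]])                  # nested lists give a 2-D array
zeros = np.zeros((2, 3))                        # 2x3 array filled with 0.0
evens = np.arange(0, 10, 2)                     # like range(): 0, 2, 4, 6, 8
pts = np.linspace(0.0, 1.0, 5)                  # 5 evenly spaced points from 0 to 1
floats = np.array([1, 2, 3], dtype=np.float64)  # explicit element type
```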
- What is the shape of an array, and how do you get it?
Answer: The shape of a NumPy array is a tuple giving the number of elements along each dimension.
You can get it using the .shape attribute:
arr.shape  # (4,) for the array created above
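The next sketch uses a 2-D array to show the shape, the related ndim attribute, and reshape(). reshape() changes the shape as long as the total number of elements stays the same. The arrays are made-up examples.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.shape)  # (2, 3): 2 rows, 3 columns
print(m.ndim)   # 2: number of dimensions

# reshape() keeps the same data in a new shape; the sizes must multiply to m.size (6)
flat = m.reshape(6)
col = m.reshape(3, 2)
print(flat.shape, col.shape)  # (6,) (3, 2)
```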
2. Pandas
- What is Pandas and why is it used?
Answer: Pandas is a data manipulation library for Python, designed for working with structured
(tabular) data through its Series and DataFrame objects. It provides tools for reading, cleaning,
transforming, and analyzing data.
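As a rough illustration of the transform-and-analyze workflow, the sketch below builds a small DataFrame in memory. It then derives a new column and aggregates it with groupby(). The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical sales data, built in memory instead of read from a file
df = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 4, 6, 8],
    "price": [2.5, 3.0, 2.5, 3.0],
})

df["revenue"] = df["units"] * df["price"]          # transform: derive a new column
by_region = df.groupby("region")["revenue"].sum()  # analyze: total revenue per region
print(by_region)
```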
- How do you read a CSV file in Pandas?
Answer: Use the pandas.read_csv() function:
import pandas as pd
df = pd.read_csv('file.csv')
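The following sketch runs without a file on disk. It passes an io.StringIO object, which read_csv() accepts in place of a file path. The CSV contents are invented for the example.

```python
import io
import pandas as pd

# StringIO stands in for 'file.csv' so the example needs no file on disk
csv_text = """name,age,city
Alice,30,Paris
Bob,25,Berlin
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)  # (2, 3): 2 rows, 3 columns
print(df.head()) # first rows of the DataFrame
```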
- How do you handle missing data in a Pandas DataFrame?
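Answer: Use fillna() to fill missing values or dropna() to remove rows/columns with missing values.
Both return a new DataFrame by default, so assign the result to keep it:
df_filled = df.fillna(0)  # replace every missing value with 0
df_clean = df.dropna()    # drop rows that contain any missing value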
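Here is a small self-contained sketch of these tools on a hypothetical DataFrame containing NaN values. It also uses isna() to count the gaps. It also fills each column with its own mean, which is one common strategy but not always appropriate.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing values
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})

print(df.isna().sum())            # count of missing values per column

filled = df.fillna(0)             # NaN replaced by 0
dropped = df.dropna()             # only rows with no NaN are kept
mean_filled = df.fillna(df.mean())  # each column filled with its own mean
```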
3. Matplotlib
- What is Matplotlib used for?
Answer: Matplotlib is a plotting library for creating static, animated, and interactive visualizations
in Python. It's commonly used for data visualization tasks.
- How do you create a basic plot?
Answer: Use matplotlib.pyplot.plot() to create a basic line plot:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
plt.show()
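When running without a display, such as on a server, a common approach is to select the non-interactive Agg backend. You then save the figure to a file instead of calling plt.show(). This sketch assumes that setup, and the filename is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless environment, render to a file instead of a window
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [4, 5, 6]
lines = plt.plot(x, y, marker="o", label="y vs x")  # returns a list of Line2D objects
plt.legend()
plt.savefig("basic_plot.png")  # hypothetical output filename
plt.close()
```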
- How do you label axes and add a title to a plot?
Answer: Use the xlabel(), ylabel(), and title() functions:
plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.title('Plot Title')
plt.show()
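The same labels can also be set through Matplotlib's object-oriented interface. Many codebases prefer this interface when a figure has several subplots. The sketch assumes a headless Agg backend and a hypothetical output filename.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless environment
import matplotlib.pyplot as plt

fig, ax = plt.subplots()        # one figure with a single Axes
ax.plot([1, 2, 3], [4, 5, 6])
ax.set_xlabel("X-axis label")
ax.set_ylabel("Y-axis label")
ax.set_title("Plot Title")
fig.savefig("labeled_plot.png")  # hypothetical output filename
```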
4. Sklearn (Scikit-learn)
- What is Scikit-learn?
Answer: Scikit-learn is a machine learning library for Python that provides simple and efficient
tools for data mining and data analysis, including classification, regression, clustering, and more.
- How do you split data into training and testing sets?
Answer: Use the train_test_split() function:
from sklearn.model_selection import train_test_split
# X: feature matrix, y: target labels; test_size=0.2 holds out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
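The answer above does not define X and y, so here is a runnable sketch on toy data, which is an assumption for illustration. It also shows two optional parameters of train_test_split(). random_state makes the split reproducible, and stratify preserves class proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumption): 10 samples, 2 features, balanced binary labels
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,    # 20% of rows held out for testing
    random_state=42,  # fixed seed so the split is reproducible
    stratify=y,       # keep class proportions the same in both splits
)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```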
- What is overfitting in machine learning?
Answer: Overfitting occurs when a model learns not only the underlying pattern in the training
data but also the noise, leading to poor generalization to new, unseen data.
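One way to see overfitting is to compare training accuracy with test accuracy. In the sketch below, synthetic data gets label noise via make_classification(flip_y=...). An unrestricted decision tree is then contrasted with a depth-limited one. The dataset and model choices are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 randomly reassigns the labels of about 20% of samples (noise)
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# An unrestricted tree can memorize the training set, noise included
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Limiting depth constrains how much noise the model can fit
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

for name, model in [("deep", deep), ("shallow", shallow)]:
    print(name, "train:", model.score(X_train, y_train), "test:", model.score(X_test, y_test))
```

The deep tree scores perfectly on its training data, while its test score is noticeably lower. That gap between training and test performance is the practical signature of overfitting.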
5. TensorFlow
- What is TensorFlow and why is it used?
Answer: TensorFlow is an open-source library developed by Google for machine learning and
deep learning tasks. It is widely used for building neural networks and other machine learning
models.
- How do you create a basic neural network in TensorFlow?
Answer: You can use tensorflow.keras to build a neural network:
import tensorflow as tf
from tensorflow.keras import layers
model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax')
])
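Defining the layers is only the first step. The minimal sketch below compiles the model, trains it briefly, and runs a prediction. It uses random stand-in data in place of real inputs, which is an assumption. It also declares the input with tf.keras.Input, which recent Keras versions prefer over passing input_shape to the first layer.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),            # e.g. flattened 28x28 images
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # one probability per class
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # labels are integers 0..9
    metrics=["accuracy"],
)

# Random stand-in data (assumption): 32 samples, integer labels in [0, 10)
x = np.random.rand(32, 784).astype("float32")
y = np.random.randint(0, 10, size=(32,))
history = model.fit(x, y, epochs=1, batch_size=8, verbose=0)

probs = model.predict(x[:2], verbose=0)
print(probs.shape)  # (2, 10)
```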
- What are tensors in TensorFlow?
Answer: Tensors are the core data structure in TensorFlow: typed, multi-dimensional arrays. A tensor's
rank (number of dimensions) determines whether it holds a scalar (rank 0), a vector (rank 1), a
matrix (rank 2), or a higher-dimensional array.
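Here is a short sketch that creates tensors of different ranks and inspects their shape and dtype. The values are arbitrary.

```python
import tensorflow as tf

scalar = tf.constant(3.0)               # rank 0
vector = tf.constant([1.0, 2.0, 3.0])   # rank 1
matrix = tf.constant([[1, 2], [3, 4]])  # rank 2; Python ints default to int32
cube = tf.zeros((2, 3, 4))              # rank 3, filled with 0.0

for t in (scalar, vector, matrix, cube):
    print(t.shape.rank, t.shape, t.dtype)

# Tensors support NumPy-style arithmetic and convert back with .numpy()
doubled = matrix * 2
print(doubled.numpy())
```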