Python Data Analytics Libraries

Uploaded by

yashnikam844
Copyright
© © All Rights Reserved

Python is a powerful tool for data analytics, thanks to its extensive libraries that support data manipulation, analysis, visualization, and machine learning. Here’s a detailed look at some of the most popular Python libraries used in data analytics:

### 1. **Pandas**

- **Purpose**: Data manipulation and analysis.

- **Key Features**: Provides DataFrame and Series objects, powerful tools for reading and writing
data, handling missing data, and more.

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
})

# Display the DataFrame
print(df)

# Give everyone a 10% raise, then show summary statistics
df['Salary'] = df['Salary'] * 1.1
print(df.describe())
```
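
The key features above also mention missing data and reading and writing data, which the example does not exercise. The following is a minimal sketch, assuming a small made-up table (the `people` frame and its values are hypothetical) and an in-memory buffer standing in for a file on disk:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data with one missing age and one missing salary
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 35],
    'Salary': [70000, 80000, np.nan],
})

# Count missing values per column
missing_counts = people.isna().sum()

# Fill missing ages with the column mean; drop rows still missing a salary
people['Age'] = people['Age'].fillna(people['Age'].mean())
cleaned = people.dropna(subset=['Salary'])

# Round-trip through CSV using an in-memory buffer instead of a real file
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(missing_counts)
print(restored)
```

Whether to fill or drop missing values depends on the dataset; the mean fill here is only one possible choice.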

### 2. **NumPy**

- **Purpose**: Numerical computing.

- **Key Features**: Support for large, multi-dimensional arrays and matrices, along with mathematical functions that operate on them.

```python
import numpy as np

# Create an array
arr = np.array([1, 2, 3, 4, 5])

# Element-wise operation: double every value
arr = arr * 2
print(arr)

# Statistical operations
mean = np.mean(arr)
std_dev = np.std(arr)
print(f"Mean: {mean}, Standard Deviation: {std_dev}")
```
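
The example above uses a one-dimensional array, while the key features emphasize multi-dimensional arrays and matrices. As a small illustrative sketch (the `matrix` values are made up), the same ideas extend to two dimensions through the `axis` argument, broadcasting, and the `@` matrix-product operator:

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Reduce along an axis: axis=0 collapses rows, axis=1 collapses columns
col_means = matrix.mean(axis=0)
row_sums = matrix.sum(axis=1)

# Broadcasting: subtract each column's mean from every row
centered = matrix - col_means

# Matrix product of the array with its transpose (a 2x2 result)
gram = matrix @ matrix.T

print(col_means)
print(row_sums)
print(centered)
print(gram)
```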

### 3. **SciPy**

- **Purpose**: Scientific computing.

- **Key Features**: Builds on NumPy, providing additional functionality for optimization, integration, interpolation, eigenvalue problems, and more.

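```python
import numpy as np
from scipy import stats

# Draw 1000 samples from a standard normal distribution
data = np.random.normal(0, 1, 1000)

# One-sample t-test: is the sample mean different from 0?
t_statistic, p_value = stats.ttest_1samp(data, 0)
print(f"T-statistic: {t_statistic}, P-value: {p_value}")
```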
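
The statistical test above covers only one corner of SciPy. The following is a rough sketch of three of the other areas named in the key features (optimization, integration, and interpolation), using simple functions chosen here only because their answers are easy to check by hand:

```python
import numpy as np
from scipy import integrate, interpolate, optimize

# Optimization: minimize (x - 2)^2, whose minimum lies at x = 2
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2)

# Integration: integral of x^2 from 0 to 3 (exact value is 9)
area, abs_error = integrate.quad(lambda x: x ** 2, 0, 3)

# Interpolation: fit a cubic spline through samples of y = 2x
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2 * xs
spline = interpolate.CubicSpline(xs, ys)
y_mid = float(spline(1.5))

print(f"Minimum at x = {result.x}")
print(f"Integral = {area}")
print(f"Interpolated value at 1.5 = {y_mid}")
```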

### 4. **Matplotlib**

- **Purpose**: Data visualization.

- **Key Features**: Comprehensive library for creating static, animated, and interactive
visualizations.

```python
import matplotlib.pyplot as plt

# Plot data
plt.plot([1, 2, 3], [4, 5, 6])
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Plot')
plt.show()
```

### 5. **Seaborn**

- **Purpose**: Statistical data visualization.

- **Key Features**: Based on Matplotlib, provides a high-level interface for drawing attractive and
informative graphics.

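```python
import matplotlib.pyplot as plt
import seaborn as sns

# Load the example "tips" dataset (fetched online on first use)
tips = sns.load_dataset("tips")

# Create a bar plot of the mean total bill per day
sns.barplot(x="day", y="total_bill", data=tips)
plt.show()
```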

### 6. **Plotly**

- **Purpose**: Interactive data visualization.

- **Key Features**: Supports a variety of chart types, interactive plots.

```python
import plotly.express as px

# Create an interactive line plot
fig = px.line(x=[1, 2, 3], y=[4, 5, 6], title='Interactive Line Plot')
fig.show()
```

### 7. **Scikit-learn**

- **Purpose**: Machine learning.

- **Key Features**: Tools for data mining and data analysis, including classification, regression,
clustering, and dimensionality reduction.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data: hold out 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a model (fixed seed so results are reproducible)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Make predictions and evaluate on the held-out data
predictions = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions)}")
```
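
The example above demonstrates classification only. As an illustrative sketch of two other features listed for Scikit-learn, dimensionality reduction and clustering, the block below projects the same iris data onto two principal components and groups the result with k-means. The choice of 2 components and 3 clusters is an assumption made for this example, not a recommendation:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Iris features: 150 samples, 4 measurements each (labels are not used)
X = load_iris().data

# Dimensionality reduction: project onto 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Clustering: group the projected samples into 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_2d)

print(X_2d.shape)
print(f"Variance explained: {pca.explained_variance_ratio_.sum():.3f}")
```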

### 8. **Statsmodels**

- **Purpose**: Statistical modeling and econometrics.

- **Key Features**: Tools for estimating and testing statistical models.

```python
import statsmodels.api as sm

# Load the mtcars dataset (downloaded from the Rdatasets collection)
data = sm.datasets.get_rdataset("mtcars").data

# Fit a linear regression of mpg on horsepower and weight
X = sm.add_constant(data[['hp', 'wt']])
y = data['mpg']
model = sm.OLS(y, X).fit()
print(model.summary())
```

### 9. **Dask**

- **Purpose**: Parallel computing and larger-than-memory computations.

- **Key Features**: Integrates with Pandas and NumPy, allows for scalable data analysis.

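```python
import dask.dataframe as dd

# Load a large dataset lazily ('large_dataset.csv' is a placeholder path)
df = dd.read_csv('large_dataset.csv')

# Group by a column ('column_name' is a placeholder) and compute the means;
# nothing is executed until .compute() is called
result = df.groupby('column_name').mean().compute()
print(result)
```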

### 10. **TensorFlow and PyTorch**

- **Purpose**: Deep learning and machine learning.

- **Key Features**: TensorFlow provides a comprehensive ecosystem for building and deploying ML models; PyTorch offers dynamic computation graphs and is widely used in research.

**TensorFlow Example:**

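```python
import numpy as np
import tensorflow as tf

# Placeholder training data (illustrative): 100 samples, 10 features
X_train = np.random.rand(100, 10).astype('float32')
y_train = np.random.rand(100, 1).astype('float32')

# Define a simple model
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1)
])

# Compile and train the model
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=10)
```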

**PyTorch Example:**

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple model: one linear layer mapping 10 features to 1 output
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc1(x)

model = SimpleModel()

# Placeholder training data (illustrative): 100 samples, 10 features
X_train = torch.rand(100, 10)
y_train = torch.rand(100, 1)

# Define loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Train the model
for epoch in range(10):
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()
```

These libraries form the core of Python's data analytics ecosystem. Mastering them will enable you
to handle a wide variety of data-related tasks efficiently and effectively.
