Module-I Data Wrangling with Python
What Is Data Wrangling?
Data Wrangling is the process of cleaning, organizing, and transforming raw data into a
more usable format to prepare it for analysis. This is crucial because raw data often contains
errors, inconsistencies, or irrelevant information that can skew the results of any analysis.
Data wrangling ensures that the data is in the right structure and format to derive
meaningful insights.
Importance of Data Wrangling:
1. Improves Data Quality: It ensures that data is accurate, complete, and relevant,
minimizing errors in analysis.
2. Increases Efficiency: Properly wrangled data makes analysis faster, leading to
quicker decision-making.
3. Ensures Consistency: Cleaning the data eliminates duplicate or redundant
information.
4. Enhances Data Usability: Raw data is often unstructured and requires wrangling to
make it suitable for processing by algorithms and machine learning models.
How Is Data Wrangling Performed?
1. Data Collection: Gather raw data from various sources such as databases, APIs, or
files.
2. Data Exploration: Understand the structure, format, and potential issues in the data.
3. Data Cleaning: Remove duplicates, handle missing values, correct inconsistencies, and
deal with outliers.
4. Data Transformation: Convert the data into a format suitable for analysis, such as
normalizing, aggregating, or converting data types.
5. Data Integration: Combine multiple data sources into a cohesive dataset.
6. Data Validation: Ensure the final dataset meets the requirements and is error-free.
Tasks of Data Wrangling:
- Handling missing data (filling, interpolation, or deletion).
- Correcting data inconsistencies (format errors, incorrect data types).
- Filtering and selecting relevant data.
- Removing duplicates and irrelevant data.
- Aggregating data (grouping or summarizing).
- Transforming and normalizing data (scaling, converting categorical to numerical).
Data Wrangling Tools:
1. Python Libraries: Pandas, NumPy, Matplotlib, Seaborn.
2. R Programming: dplyr, tidyr.
3. SQL: For database querying and manipulation.
4. Excel/Google Sheets: For smaller data wrangling tasks.
5. OpenRefine: A powerful tool for data cleaning and transformation.
Introduction to Python
Python is a versatile, high-level programming language widely used for data science,
machine learning, web development, and automation. Its simplicity, readability, and rich
library ecosystem make it a popular choice for beginners and professionals alike.
Python Basics:
- Variables: Used to store data values.
```python
x = 5
name = 'John'
```
- Data Types: int, float, str, list, dict, etc.
- Control Structures: if, for, while, etc. (a short combined example follows this list).
- Functions: Used to modularize code.
```python
def greet():
    print('Hello, World!')
```
- Libraries: Python has extensive libraries for data wrangling, including pandas, numpy,
and csv.
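A minimal sketch combining several of these basics (the variable names and values below are illustrative):
```python
# Illustrative list of temperature readings (float data type stored in a list)
readings = [21.5, 23.0, 19.8, 25.1]

# A for loop with an if/else condition and an f-string for output
for value in readings:
    if value > 22:
        print(f"{value} is above the threshold")
    else:
        print(f"{value} is within range")
```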
Data Meant to Be Read by Machines
Machine-readable data refers to structured data that computers can process directly. Some
common formats include:
1. CSV (Comma-Separated Values): A simple file format for tabular data.
2. JSON (JavaScript Object Notation): A lightweight format for storing and exchanging
data, commonly used in APIs.
3. XML (eXtensible Markup Language): A markup language that defines rules for
encoding documents in a format that is both human-readable and machine-readable.
CSV Data
CSV (Comma-Separated Values) files are used to store tabular data, with each line in the
file representing a row, and each field separated by a comma.
Example:
```csv
name,age,city
John,25,New York
Jane,30,Los Angeles
```
JSON Data
JSON is used to represent structured data in a readable text format, often used in web
applications for transmitting data.
Example:
```json
{
    "name": "John",
    "age": 25,
    "city": "New York"
}
```
XML Data
XML is used to describe data in a hierarchical structure, making it useful for representing
complex data models.
Example:
```xml
<person>
  <name>John</name>
  <age>25</age>
  <city>New York</city>
</person>
```
Experiment – 1: Develop a Python Program for Reading and Writing CSV Files
Here’s a basic Python program to read and write CSV files using the csv module.
Program:
```python
import csv

# Writing to a CSV file
data = [['Name', 'Age', 'City'],
        ['John', '25', 'New York'],
        ['Jane', '30', 'Los Angeles']]

with open('people.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)

# Reading from a CSV file
with open('people.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
```
CSV File:
```csv
Name,Age,City
John,25,New York
Jane,30,Los Angeles
```
Output:
Experiment – 2: Develop a Python Program for Reading XML Files
This Python program reads XML data using the xml.etree.ElementTree module.
Program:
```python
import xml.etree.ElementTree as ET

# Parsing an XML file
tree = ET.parse('data.xml')
root = tree.getroot()

# Iterating through the XML
for person in root.findall('person'):
    name = person.find('name').text
    age = person.find('age').text
    city = person.find('city').text
    print(f'Name: {name}, Age: {age}, City: {city}')
```
Xml File:
<?xml version="1.0"?>
<people>
<person> Xml Output:
<name>John</name>
<age>25</age>
<city>New York</city>
</person>
<person>
<name>Jane</name>
<age>30</age>
<city>Los Angeles</city>
</person>
</people>
Output:
Experiment – 3: Develop a Python Program for Reading and Writing JSON to a File
Here’s how to read and write JSON using Python’s json module.
Program:
```python
import json

# Writing multiple rows of JSON data to a file
data = [
    {"name": "John", "age": 25, "city": "New York"},
    {"name": "Jane", "age": 30, "city": "Los Angeles"},
    {"name": "Doe", "age": 40, "city": "Chicago"}
]

with open('data.json', 'w') as json_file:
    json.dump(data, json_file, indent=4)

# Reading multiple rows of JSON data from a file
with open('data.json', 'r') as json_file:
    data = json.load(json_file)
    for person in data:
        print(person)
```
Output:
Module-II: Working with Excel Files and PDFs
This module focuses on working with Excel files and PDFs using Python, key tasks for
automating and processing data efficiently. We’ll cover how to parse and manipulate these
files, how to install the required Python packages, and introduce some basic database
concepts that provide alternative data storage options. The hands-on experiments will give
you experience with real-world scenarios, such as converting files between formats and
parsing data.
1. Installing Python Packages
To work with Excel and PDF files in Python, you’ll need to install specific libraries.
Some common libraries include:
- pandas: For handling data in various formats (CSV, Excel, TSV, etc.).
- openpyxl: For reading and writing Excel files.
- pdfminer.six: For extracting text from PDFs.
Example: Installing the necessary packages
```bash
pip install pandas openpyxl pdfminer.six
```
This command installs all the required libraries. Once installed, you can start using them
in your Python scripts.
2. Parsing Excel Files
Excel files (.xlsx) are commonly used for storing tabular data, and Python offers several
packages for reading and writing Excel files.
2.1 Reading Excel Files
Library: pandas or openpyxl
- Task: You can read an Excel file and manipulate it as a DataFrame using the
pandas library.
2.2 Writing to an Excel File
- You can also write data to an Excel file using pandas.
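A minimal sketch, assuming the data is already in a pandas DataFrame (the column values and file name are illustrative):
```python
import pandas as pd

# Illustrative data; replace with your own dataset
df = pd.DataFrame({
    'Name': ['John', 'Jane'],
    'Age': [25, 30]
})

# Write the DataFrame to an Excel file (openpyxl is used for .xlsx files)
df.to_excel('people.xlsx', index=False)
```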
3. Parsing PDFs
PDF parsing can be more complex than Excel, as PDFs do not have a structured tabular
format. However, Python libraries like pdfminer.six allow for extracting text from PDFs,
which can then be further processed.
3.1 Extracting Text from PDFs
Library: pdfminer.six
Task: Extract raw text from a PDF document.
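A minimal sketch using pdfminer.six's high-level API (the file name is illustrative):
```python
from pdfminer.high_level import extract_text

# Extract all text from the PDF as a single string
text = extract_text('report.pdf')  # Illustrative file name

# Show the first 500 characters of the extracted text
print(text[:500])
```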
3.2 Converting PDF to Text and Processing
4. Converting Between File Formats
4.1 Converting a TSV File to Excel
A Tab-Separated Values (TSV) file is similar to a CSV file, but columns are
separated by tabs instead of commas. Converting TSV to Excel is straightforward
using pandas.
5. Databases: A Brief Introduction
Relational Databases:
Relational databases, like MySQL and PostgreSQL, store data in structured tables
with rows and columns. They are suitable when you need complex queries,
relationships between datasets, and strong consistency.
Non-Relational Databases (NoSQL):
Non-relational databases, such as MongoDB, store data in a flexible, document-oriented format (e.g., JSON). They are preferred when scalability and flexibility are needed, such as in large-scale web apps.
When to Use: Relational databases for structured data and complex queries; NoSQL databases for flexibility and scalability.
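As a small illustration, Python's built-in sqlite3 module provides a file-based relational database with no server setup (the table and file names below are illustrative):
```python
import sqlite3

# Create (or open) a local SQLite database file
conn = sqlite3.connect('people.db')
cur = conn.cursor()

# Create a table and insert a few rows
cur.execute('CREATE TABLE IF NOT EXISTS people (name TEXT, age INTEGER, city TEXT)')
cur.executemany('INSERT INTO people VALUES (?, ?, ?)',
                [('John', 25, 'New York'), ('Jane', 30, 'Los Angeles')])
conn.commit()

# Query the table back
for row in cur.execute('SELECT name, age, city FROM people'):
    print(row)

conn.close()
```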
Experiments:
Experiment 4: Develop a Python Program for Reading an Excel File
Program:
```python
import pandas as pd

# Load the Excel file
df = pd.read_excel('sample_data.xlsx')

# Display the first 5 rows
print(df.head())
```
Excel file:
Name Age City
John Doe 28 New York
Jane Smith 34 Los Angeles
Emily Davis 22 Chicago
Output:
Experiment 5: Develop a Python Program for Converting a TSV File into Excel
Program:
```python
import pandas as pd

# Read the TSV file
df = pd.read_csv('data.tsv', sep='\t')

# Convert to Excel and save
df.to_excel('output_data.xlsx', index=False)
print('TSV file successfully converted to Excel!')
```
TSV File:
Name Age City
Alice 24 Seattle
Bob 30 Portland
Charlie 29 San Francisco
David 35 New York
Output:
Experiment 6: Develop a Python Program for Converting a PDF File into Excel
Program:
```python
from pdfminer.high_level import extract_text
import pandas as pd

# Step 1: Extract text from the PDF
text = extract_text('sample.pdf')

# Step 2: Replace (cid:9) with actual tab characters
text = text.replace('(cid:9)', '\t')

# Step 3: Split the text by lines
lines = text.strip().split('\n')

# Step 4: Inspect and parse the lines into structured data
data = []
for line in lines[1:]:
    columns = line.split('\t')
    print(f"Parsed Line: {columns}")
    data.append(columns)

# Step 5: Ensure all rows have 3 columns before proceeding
clean_data = [row for row in data if len(row) == 3]

# Step 6: Create a DataFrame with appropriate column names
df = pd.DataFrame(clean_data, columns=['Name', 'Age', 'City'])

# Step 7: Save the DataFrame as an Excel file (use a new file name to avoid permission issues)
output_excel_path = 'output_data.xlsx'
df.to_excel(output_excel_path, index=False)
print('PDF data has been successfully converted to Excel!')
```
PDF File:
Excel File:
Name Age City
John 28 New York
Jane 34 Los Angeles
Emily 22 Chicago
Output:
Module-III Data Cleanup
Why Clean Data?
Data cleanup ensures that the dataset is accurate, consistent, and usable for analysis. Dirty
data can cause incorrect models, misleading results, or failed applications. Data cleaning
involves the removal or rectification of missing values, duplicates, formatting errors, and
inconsistencies.
Data Cleanup Basics
Data cleanup involves tasks such as:
• Handling missing values: Removing or imputing empty cells.
• Correcting wrong formats: Ensuring consistency in date formats, string cases,
numerical types, etc.
• Removing outliers: Identifying and addressing extreme data points that can skew
analysis results.
Identifying Values for Data Cleanup
Before cleaning the data, it is essential to identify issues like:
• Missing or null values.
• Incorrect data types or formats.
• Outliers or erroneous data points.
• Duplicated data entries.
Formatting Data
Formatting data involves ensuring consistency in date formats, numerical types (floats or
integers), string casing (lowercase/uppercase), and handling categorical variables.
Finding Outliers and Bad Data
Outliers are extreme values that differ significantly from the majority of the dataset and can negatively affect analysis. Common methods to detect outliers include:
• Using Z-scores or the Interquartile Range (IQR).
• Visualizations like box plots.
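A minimal sketch of the Z-score approach on illustrative data (the cutoff of 2 standard deviations is a loose rule of thumb for this small sample; Experiment 10 below demonstrates the IQR method):
```python
import pandas as pd

# Illustrative data with one extreme value
values = pd.Series([10, 12, 11, 13, 12, 95, 11])

# Z-score: how many standard deviations each point lies from the mean
z_scores = (values - values.mean()) / values.std()

# Flag points more than 2 standard deviations from the mean
print(values[z_scores.abs() > 2])
```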
Finding Duplicates
Duplicates can bias your analysis, and removing them ensures the integrity of the dataset.
Python's pandas library provides methods to identify and remove duplicates.
Fuzzy Matching and RegEx Matching
• Fuzzy Matching: Useful for finding strings that are similar but not exact matches, often
applied when merging datasets.
• Regular Expressions (RegEx): Useful for pattern matching, such as identifying specific
string formats like email addresses or phone numbers.
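A minimal sketch using the standard library: difflib for a simple similarity score and re for pattern matching (the strings and the 0.8 cutoff are illustrative; dedicated packages such as thefuzz provide richer fuzzy matching):
```python
import difflib
import re

# Fuzzy matching: similarity ratio between two nearly identical names
ratio = difflib.SequenceMatcher(None, 'Jon Smith', 'John Smith').ratio()
print(f"Similarity: {ratio:.2f}")  # Could treat ratio > 0.8 as a match

# RegEx matching: check whether a string looks like an email address
pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
print(bool(re.match(pattern, 'john.doe@example.com')))  # True
print(bool(re.match(pattern, 'not-an-email')))          # False
```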
Normalizing and Standardizing Data
• Normalization: Scales data to a specific range, typically [0, 1].
• Standardization: Centers data around the mean and scales it by standard deviation. This
is useful in machine learning algorithms.
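A minimal sketch of standardization with scikit-learn (the data is illustrative; Experiment 9 below covers normalization with MinMaxScaler):
```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative data
df = pd.DataFrame({'Feature1': [10, 20, 30, 40, 50]})

# Standardize: subtract the mean and divide by the standard deviation
scaler = StandardScaler()
df_standardized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print(df_standardized)  # Resulting values have mean 0 and unit variance
```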
Saving the Data
Once data is cleaned, saving the cleaned dataset is essential for further analysis and
modeling without repeating the process.
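A minimal sketch, assuming the cleaned data is held in a DataFrame (the data and file names are illustrative):
```python
import pandas as pd

# Stand-in for a cleaned dataset
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Save the cleaned dataset so later steps can reload it without repeating the cleanup
df.to_csv('cleaned_data.csv', index=False)
df.to_excel('cleaned_data.xlsx', index=False)  # Requires openpyxl
```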
Scripting the Cleanup
Automating the cleanup process with Python scripts ensures the procedure is repeatable,
especially when new data is added to the dataset.
Experiment 7: Develop a Python Program for Cleaning Empty Cells and Correcting Wrong Formats
Program:
```python
import pandas as pd

# Sample DataFrame with missing values and wrong format
data = {
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, None, 30, 'Twenty'],
    'Salary': [50000, 60000, None, 80000]
}
df = pd.DataFrame(data)

# Cleaning empty cells by filling with default values or dropping rows
df['Name'] = df['Name'].fillna('Unknown')              # No inplace=True, just assignment
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')  # Convert 'Age' to numeric, NaN for errors
df['Age'] = df['Age'].fillna(df['Age'].mean())         # Assign back after filling missing values
df = df.dropna(subset=['Salary'])                      # No inplace, assign back to df

print("Cleaned DataFrame:")
print(df)
```
Output:
Experiment 8: Develop a Python Program for Finding Duplicates in a DataFrame
Program:
```python
import pandas as pd

# Sample DataFrame with duplicates
data = {
    'Name': ['Alice', 'Bob', 'Alice', 'David', 'Alice'],
    'Age': [25, 30, 25, 40, 25]
}
df = pd.DataFrame(data)

# Finding duplicates
duplicates = df.duplicated()
print("Duplicated Rows:")
print(df[duplicates])

# Removing duplicates
df_no_duplicates = df.drop_duplicates()
print("\nDataFrame after removing duplicates:")
print(df_no_duplicates)
```
Output:
Experiment 9: Develop a Python Program for Normalizing Data
Program:
```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample DataFrame for normalization
data = {
    'Feature1': [10, 20, 30, 40, 50],
    'Feature2': [1, 2, 3, 4, 5]
}
df = pd.DataFrame(data)

# Normalizing data using MinMaxScaler
scaler = MinMaxScaler()
df_normalized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print("Normalized DataFrame:")
print(df_normalized)
```
Output:
Module IV: Data Exploration and Analysis
1. Exploring Data
Exploring data involves a preliminary examination of the data to understand its characteristics.
This is where you look at basic statistics, identify data types, check for missing values, and get an
overall sense of the dataset. Key Steps include: overview of the dataset, summary statistics,
distribution checks, and identifying missing values.
2. Importing Data
Data import is the process of bringing external data into your Python environment for analysis.
Data can be imported from various file types like CSV, Excel, SQL databases, and web APIs. In
Python, methods to import data include:
• CSV Files: pandas.read_csv("filename.csv")
• Excel Files: pandas.read_excel("filename.xlsx")
• SQL Databases: Using sqlalchemy or sqlite3 for connecting and querying databases.
3. Exploring Table Functions
Table functions allow you to interact with and manipulate datasets for better understanding.
Functions include head() and tail() to inspect rows, info() for data types, and describe() for
summary statistics.
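A minimal sketch of these functions on an illustrative DataFrame:
```python
import pandas as pd

# Illustrative dataset
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Emily', 'Bob'],
    'Age': [28, 34, 22, 30],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Seattle']
})

print(df.head(2))     # First 2 rows
print(df.tail(2))     # Last 2 rows
df.info()             # Column names, data types, non-null counts
print(df.describe())  # Summary statistics for numeric columns
```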
4. Joining Numerous Datasets
Joining datasets is combining multiple datasets to create a unified dataset for analysis. This
includes Inner Join, Outer Join, Left Join, and Right Join. In Python, tools like merge() and
concat() from pandas are used.
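A minimal sketch of merge() and concat() on illustrative data:
```python
import pandas as pd

# Illustrative datasets sharing an 'ID' column
customers = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['John', 'Jane', 'Emily']})
orders = pd.DataFrame({'ID': [1, 2, 4], 'Amount': [250, 120, 90]})

# Inner join: keep only IDs present in both tables
print(pd.merge(customers, orders, on='ID', how='inner'))

# Left join: keep all customers, filling missing order data with NaN
print(pd.merge(customers, orders, on='ID', how='left'))

# Concatenate rows of two tables that share the same columns
more_customers = pd.DataFrame({'ID': [5], 'Name': ['David']})
print(pd.concat([customers, more_customers], ignore_index=True))
```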
5. Identifying Correlations
Correlation measures the statistical relationship between two variables. It can show whether
changes in one variable predict changes in another. A correlation matrix shows the relationship
between all variables. Visualizing correlations using a heatmap is common.
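A minimal sketch of a correlation matrix and heatmap on illustrative data:
```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Illustrative numeric data
df = pd.DataFrame({
    'Hours_Studied': [1, 2, 3, 4, 5, 6],
    'Exam_Score': [52, 58, 63, 70, 74, 81],
    'Sleep_Hours': [8, 7, 7, 6, 6, 5]
})

# Correlation matrix (Pearson by default)
corr = df.corr()
print(corr)

# Heatmap of the correlation matrix
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
```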
6. Identifying Outliers
Outliers are data points that significantly differ from the rest of the dataset. Methods for
identifying outliers include the Standard Deviation method and Interquartile Range (IQR).
Visualization tools like box plots and histograms help in identifying outliers.
7. Creating Groupings
Grouping data involves categorizing data into different segments for analysis. The groupby()
function in pandas allows grouping data based on specific categories, and applying aggregation
methods like mean or sum.
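A minimal sketch of groupby() with aggregation on illustrative data:
```python
import pandas as pd

# Illustrative sales data
df = pd.DataFrame({
    'City': ['New York', 'New York', 'Chicago', 'Chicago', 'Seattle'],
    'Sales': [250, 300, 150, 200, 400]
})

# Group by city and aggregate
print(df.groupby('City')['Sales'].mean())  # Average sales per city
print(df.groupby('City')['Sales'].sum())   # Total sales per city
```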
8. Analyzing Data - Separating and Focusing the Data
Analyzing data involves separating relevant features for focused exploration. Methods include
filtering data using conditions or selecting specific columns for subsetting.
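A minimal sketch of filtering rows by condition and selecting a subset of columns (illustrative data):
```python
import pandas as pd

# Illustrative dataset
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Emily', 'Bob'],
    'Age': [28, 34, 22, 30],
    'City': ['New York', 'Los Angeles', 'Chicago', 'New York']
})

# Filter rows with a boolean condition
print(df[df['Age'] > 25])

# Select specific columns for focused analysis
print(df[['Name', 'City']])
```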
9. Presenting Data
After analysis, presenting data involves summarizing insights and using visuals like charts and
graphs to communicate findings clearly.
10. Visualizing the Data
Visualizations make data easier to interpret by presenting it in graphical formats. Common
visualizations include bar charts, histograms, pie charts, line charts, and scatter plots.
11. Time-Related Data
Time-related data involves handling datasets with temporal components, such as stock prices or
weather trends. Techniques like time series visualization and rolling averages are used for
analysis.
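A minimal sketch of a rolling average on illustrative monthly data (Experiment 12 below shows a full time series plot):
```python
import pandas as pd

# Illustrative monthly sales indexed by month-start dates
dates = pd.date_range(start='2024-01-01', periods=12, freq='MS')
sales = pd.Series([200, 210, 215, 220, 230, 250, 245, 260, 270, 275, 290, 300],
                  index=dates)

# A 3-month rolling average smooths short-term fluctuations
print(sales.rolling(window=3).mean())
```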
12. Maps, Interactives, Words, Images, Video, and Illustrations
Advanced visualizations include geographic maps, word clouds, and interactive charts. Python
libraries like folium, geopandas, plotly, and bokeh are used to create these visualizations.
13. Presentation Tools
Presentation tools like Tableau, Power BI, and Google Data Studio allow creating interactive
reports and dashboards for sharing results. Python libraries like matplotlib and seaborn are also
used for building visuals.
14. Publishing the Data - Open-Source Platforms
Publishing data involves sharing your analysis on platforms like GitHub or Kaggle. Tools like
Tableau Public can be used to share interactive dashboards.
Experiments
Experiment 10: Python Program for Detecting and Removing Outliers
Program:
```python
import pandas as pd
import numpy as np

# Sample data
data = {
    'Value': [10, 12, 14, 18, 90, 13, 15, 14, 300, 17, 13, 12, 16, 10]
}
df = pd.DataFrame(data)

# Detecting outliers using IQR
Q1 = df['Value'].quantile(0.25)
Q3 = df['Value'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Detecting outliers
outliers = df[(df['Value'] < lower_bound) | (df['Value'] > upper_bound)]
print("Outliers detected:\n", outliers)

# Removing outliers
df_no_outliers = df[(df['Value'] >= lower_bound) & (df['Value'] <= upper_bound)]
print("Data after removing outliers:\n", df_no_outliers)
```
Output:
Experiment 11: Python Program for Drawing Bar Chart, Histogram, and Pie Chart
Program:
```python
import matplotlib.pyplot as plt

# Sample data for visualizations
categories = ['Category A', 'Category B', 'Category C', 'Category D']
values = [23, 45, 56, 78]

# Bar chart
plt.figure(figsize=(6, 4))
plt.bar(categories, values, color='blue')
plt.title('Bar Chart')
plt.show()

# Histogram
data = [10, 12, 13, 15, 18, 18, 19, 20, 23, 25, 29, 30, 31, 32, 35]
plt.figure(figsize=(6, 4))
plt.hist(data, bins=5, color='green')
plt.title('Histogram')
plt.show()

# Pie chart
plt.figure(figsize=(6, 4))
plt.pie(values, labels=categories, autopct='%1.1f%%', startangle=140,
        colors=['blue', 'orange', 'green', 'red'])
plt.title('Pie Chart')
plt.show()
```
Output:
Experiment 12: Python Program for Time Series Visualization
Program:
```python
import pandas as pd
import matplotlib.pyplot as plt

# Creating a sample time series DataFrame with 'MS' for month start
date_rng = pd.date_range(start='2024-01-01', end='2024-12-31', freq='MS')
data = {'Sales': [200, 210, 215, 220, 230, 250, 245, 260, 270, 275, 290, 300]}
df = pd.DataFrame(data, index=date_rng)

# Time series plot
plt.figure(figsize=(10, 6))
plt.plot(df.index, df['Sales'], marker='o', linestyle='-', color='blue')
plt.title('Monthly Sales Time Series')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.grid(True)
plt.show()
```
Output:
Module V: Web Scraping
Web scraping is the process of extracting data from websites. It is a critical skill in data analysis
and machine learning, especially when the required data isn't available in structured formats like
CSV or databases. This module covers various aspects of web scraping, including techniques and
tools to interact with web pages and extract meaningful information.
1. What to Scrape and How
What to Scrape: Identify the information you need from a website, such as product details, news
articles, or user reviews. Not all content on a webpage is relevant, so knowing what to scrape helps
to target the exact data needed.
How to Scrape: There are different methods like using requests for simple pages, or more advanced
tools like Selenium or Scrapy for pages that require interaction or load dynamically.
2. Analyzing a Web Page
This involves understanding the structure of a webpage by inspecting its HTML elements.
Tools like Chrome DevTools help in finding the tags (e.g., <div>, <p>, <span>) that contain the
required data. It is important to locate elements correctly before writing a scraper.
3. Network/Timeline
Network Analysis: Using browser dev tools, you can inspect network requests to understand how
data is loaded. This is especially useful for scraping dynamically loaded content, such as AJAX
calls.
Timeline: Helps to track when different elements load on a page, useful when dealing with
JavaScript-heavy websites.
4. Interacting with JavaScript
Some websites load content dynamically using JavaScript. Scrapers like Selenium can automate
interactions with JavaScript-based content, allowing you to scrape data that would otherwise be
invisible with static HTML scraping methods.
5. In-Depth Analysis of a Page
Understanding how different elements are nested and structured is crucial for writing effective
scraping scripts. This involves deep inspection of HTML tags, attributes, and JavaScript functions
that might load data dynamically.
6. Getting Pages
This involves sending HTTP requests to a URL to fetch the HTML content. Libraries like requests
in Python are commonly used for this. The response can then be parsed to extract the needed data.
7. Reading a Web Page with LXML and XPath
LXML: A powerful library for parsing XML and HTML documents. It is faster than BeautifulSoup
and allows more complex parsing.
XPath: A language used for navigating through elements and attributes in an XML/HTML
document. It allows precise selection of elements, making it effective for scraping.
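A minimal sketch of parsing HTML with lxml and selecting elements with XPath (the HTML snippet is illustrative; in practice the page content would come from an HTTP response):
```python
from lxml import html

# Illustrative HTML; in practice this would come from requests.get(url).content
page = """
<html>
  <body>
    <div class="item"><span class="name">Widget</span><span class="price">9.99</span></div>
    <div class="item"><span class="name">Gadget</span><span class="price">19.99</span></div>
  </body>
</html>
"""

tree = html.fromstring(page)

# XPath: select the text of the name and price spans inside each item div
names = tree.xpath('//div[@class="item"]/span[@class="name"]/text()')
prices = tree.xpath('//div[@class="item"]/span[@class="price"]/text()')

for name, price in zip(names, prices):
    print(name, price)
```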
8. Advanced Web Scraping - Browser-Based Parsing
Selenium: A tool that automates browsers, useful for scraping pages that require user interactions
like clicking buttons or filling forms. Selenium can simulate real user actions.
Ghost.py: A headless browser scraping tool, suitable for web scraping without opening a browser
window.
9. Screen Reading with Selenium
Screen Reading: Automating the extraction of visible elements using Selenium.
This is especially useful when you need to scrape data that requires scrolling or clicking to appear.
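A minimal sketch with Selenium (assumes Selenium 4 and a compatible browser driver are installed; the URL is a placeholder):
```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Launch a browser session (requires a matching browser and driver)
driver = webdriver.Chrome()
driver.get('https://example.com')  # Placeholder URL

# Read elements as they are rendered on screen, after any JavaScript has run
print('Title:', driver.title)
for p in driver.find_elements(By.TAG_NAME, 'p'):
    print(p.text)

driver.quit()
```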
10. Spidering the Web - Building a Spider with Scrapy
Scrapy: A powerful and fast web crawling framework in Python that automates the extraction and
storage of data from websites. It is ideal for large-scale scraping projects.
Building a Spider: A spider is a class in Scrapy that defines how to follow links and extract content
from the target website.
11. Crawling Whole Websites with Scrapy
Scrapy spiders can be set to follow links and scrape multiple pages on a website. This process is
called crawling. Scrapy manages the crawling efficiently and allows data storage in formats like
JSON or CSV.
Experiments:
Experiment 13: Develop a Python Program for Reading an HTML Page
Program:
```python
import requests
from bs4 import BeautifulSoup

# URL of the webpage to be scraped
url = 'https://example.com'  # Replace with the URL you want to scrape

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')

    # Extract and print the title of the page
    title = soup.title.string
    print("Title of the page:", title)

    # Extract all paragraphs and print their text content
    paragraphs = soup.find_all('p')
    print("\nParagraphs:")
    for i, paragraph in enumerate(paragraphs, start=1):
        print(f"Paragraph {i}: {paragraph.get_text()}")
else:
    print("Failed to retrieve the web page. Status code:", response.status_code)
```
Web Page:
Output:
Experiment 14: Develop a Python Program for Building a Spider Using Scrapy
Program:
```python
import scrapy

class ExampleSpider(scrapy.Spider):
    name = 'example'
    start_urls = ['https://example.com']  # Replace with the target URL

    def parse(self, response):
        # Extracting the page title
        title = response.xpath('//title/text()').get()
        yield {'Title': title}

        # Extracting all paragraphs' text
        paragraphs = response.xpath('//p/text()').getall()
        for i, paragraph in enumerate(paragraphs, start=1):
            yield {f'Paragraph {i}': paragraph}
```
Web Page:
Output: