DSBDA Manual

This document certifies the practical work of students in 'Data Science & Big Data Analytics' under the guidance of Prof. Gage P.K. It includes a series of experiments involving data wrangling, analytics, visualization, and programming tasks using Python and Java. The document serves as a partial fulfillment of the requirements for the Bachelor of Engineering degree at Savitribai Phule Pune University.

Uploaded by

Varad Gorhe

Certificate

THIS IS TO CERTIFY THAT THE WORK EMBODIED IN THE "DATA
SCIENCE & BIG DATA ANALYTICS" PRACTICAL IS THE BONAFIDE
WORK OF STUDENTS OF THIS INSTITUTE, CARRIED OUT BY THEM
UNDER THE GUIDANCE OF "Prof. Gage P.K.", AND IS APPROVED IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS OF SAVITRIBAI
PHULE PUNE UNIVERSITY FOR THE DEGREE OF BACHELOR OF
ENGINEERING, THIRD YEAR OF COMPUTER ENGINEERING.

DATE:- PLACE:-Belhe

NAME: NAVALE RUPESH PANDHARINATH ROLL.NO : 36

Batch :B

Prof. Gage P.K. Prof. Shegar S.R.


(Dept. of Computer Engineering) (Head of Dept. of Computer Engineering)

Dr. Narawade N.S.


(Principal of SGOI COE)
EXPERIMENT NO:-01
Data Wrangling-1: Perform the following operations using Python on any
open-source dataset (data.csv).
EXPERIMENT NO:02
Data Wrangling-2: Create an "Academic Performance" dataset of students
and perform the following operations using Python.
EXPERIMENT NO:03
Provide summary statistics for a dataset with numeric variables
grouped by one of the qualitative variables.
EXPERIMENT NO:04
Create a linear regression model using Python to predict home
prices using the Boston Housing dataset.
EXPERIMENT NO:05
Data Analytics-2
EXPERIMENT NO:06
Data Analytics-3
EXPERIMENT NO:07
Text Analytics
EXPERIMENT NO:08
Data Visualization-1
EXPERIMENT NO:09
Data Visualization-2
EXPERIMENT NO:10
Download the Iris flower dataset or any other dataset into a
DataFrame.
EXPERIMENT NO:11

Write a code in Java for a simple WordCount application that counts
the number of occurrences of each word in a given input set using the
Hadoop MapReduce framework on a local standalone setup.

Program:-
// Simplified local word count in plain Java (no Hadoop APIs):
// splits the input on whitespace and reports the number of words.
public class WordCount {
    public static void main(String[] args) {
        // Sample input string
        String text = "this is a wordcount example.";
        // Count words in the input string
        int wordCount = countWords(text);
        // Output the result
        System.out.println("Word count: " + wordCount);
    }

    public static int countWords(String text) {
        // Trim any leading or trailing spaces
        text = text.trim();
        // If the string is empty, return 0
        if (text.isEmpty()) {
            return 0;
        }
        // Split the text by one or more whitespace characters
        String[] words = text.split("\\s+");
        // Return the number of words in the array
        return words.length;
    }
}

Output:-
EXPERIMENT NO:12
Design a distributed application using MapReduce which
processes a log file of a system.
Program:-
import java.io.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.regex.*;

public class kalyani {
    // Mapper: splits lines into words and creates (word, 1) pairs
    public static class Mapper {
        public List<Map<String, Integer>> map(String input) {
            List<Map<String, Integer>> wordCountList = new ArrayList<>();
            Map<String, Integer> wordCountMap = new HashMap<>();
            // Split the input into words (based on non-word characters)
            String[] words = input.split("\\W+");
            for (String word : words) {
                if (!word.isEmpty()) {
                    word = word.toLowerCase();
                    wordCountMap.put(word, wordCountMap.getOrDefault(word, 0) + 1);
                }
            }
            // Add the word counts from this line to the result
            wordCountList.add(wordCountMap);
            return wordCountList;
        }
    }

    // Reducer: aggregates word counts from all mappers
    public static class Reducer {
        public Map<String, Integer> reduce(List<Map<String, Integer>> mappedResults) {
            Map<String, Integer> finalCountMap = new HashMap<>();
            // For each map in the list of results
            for (Map<String, Integer> map : mappedResults) {
                // For each word and its count in the map
                for (Map.Entry<String, Integer> entry : map.entrySet()) {
                    finalCountMap.put(entry.getKey(),
                            finalCountMap.getOrDefault(entry.getKey(), 0) + entry.getValue());
                }
            }
            return finalCountMap;
        }
    }

    // Main method
    public static void main(String[] args)
            throws InterruptedException, ExecutionException, IOException {
        String inputText = "Hello world hello mapreduce hello Java world";
        // Step 1: Map phase (split input into words and create word-count pairs)
        Mapper mapper = new Mapper();
        List<Map<String, Integer>> mappedResults = mapper.map(inputText);
        // Step 2: Reduce phase (aggregate word counts)
        Reducer reducer = new Reducer();
        Map<String, Integer> finalWordCount = reducer.reduce(mappedResults);
        // Step 3: Output the result
        System.out.println("Word Count Results:");
        for (Map.Entry<String, Integer> entry : finalWordCount.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}

Output:-
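The program above counts words in a hard-coded string; experiment 12's stated goal is processing a system log file. A minimal sketch of the same map/reduce idea applied to log records is shown below, assuming a log format of "timestamp level message" (the format and the class/method names here are illustrative assumptions, not part of the original manual):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LogLevelCount {
    // "Map" step, applied per line: extract the log level, assumed to be
    // the third whitespace-separated field,
    // e.g. "2025-04-01 12:01:15 ERROR Disk space low!" -> "ERROR".
    public static String mapLine(String line) {
        String[] fields = line.trim().split("\\s+");
        return fields.length >= 3 ? fields[2] : null;
    }

    // "Reduce" step: aggregate the per-line results into counts per level.
    public static Map<String, Integer> countLevels(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String level = mapLine(line);
            if (level != null) {
                counts.merge(level, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
                "2025-04-01 12:01:01 INFO Starting system...",
                "2025-04-01 12:01:15 ERROR Disk space low!",
                "2025-04-01 12:03:00 ERROR Disk space low!");
        // Prints the count for each level (map ordering may vary)
        System.out.println(LogLevelCount.countLevels(logs));
    }
}
```

In a real run, the lines would come from the log file itself, e.g. via Files.readAllLines; on a Hadoop cluster the map and reduce steps would instead be expressed as Mapper and Reducer subclasses.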
EXPERIMENT NO:13
Locate a dataset (e.g. Sample_weather.txt) for working on
weather data: read the text input file and find the averages
of temperature, dew point, and wind speed using Java.
Program:-
# Parsing system log lines into a structured format using
# Python regular expressions.

import re

logs = """
2025-04-01 12:01:01 INFO Starting system...
2025-04-01 12:01:15 ERROR Disk space low!
2025-04-01 12:02:45 INFO System running normally
2025-04-01 12:03:00 ERROR Disk space low!
2025-04-01 12:03:30 WARNING CPU usage high
"""

# Capture the timestamp, log level, and message from each line
pattern = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)"

for line in logs.splitlines():
    if line:
        match = re.match(pattern, line)
        if match:
            timestamp, level, message = match.groups()
            print(f"Timestamp: {timestamp}, Level: {level}, Message: {message}")

Output:-
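The listing above parses log lines rather than weather records. A minimal Java sketch of the averaging task that experiment 13 actually asks for is given below; it assumes each record in Sample_weather.txt is whitespace-separated as "date temperature dewpoint windspeed" (this column layout, and the class name WeatherAverage, are assumptions for illustration):

```java
import java.util.List;

public class WeatherAverage {
    // Computes the averages of temperature, dew point, and wind speed from
    // records of the assumed form: "<date> <temperature> <dewpoint> <windspeed>".
    // Returns {avgTemperature, avgDewPoint, avgWindSpeed}.
    public static double[] averages(List<String> lines) {
        double tempSum = 0, dewSum = 0, windSum = 0;
        int count = 0;
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4) {
                continue; // skip blank or malformed records
            }
            tempSum += Double.parseDouble(fields[1]);
            dewSum  += Double.parseDouble(fields[2]);
            windSum += Double.parseDouble(fields[3]);
            count++;
        }
        return new double[] { tempSum / count, dewSum / count, windSum / count };
    }

    public static void main(String[] args) {
        // Inline sample standing in for Sample_weather.txt
        List<String> sample = List.of(
                "2025-04-01 30.0 18.0 5.0",
                "2025-04-02 32.0 20.0 7.0");
        double[] avg = WeatherAverage.averages(sample);
        System.out.printf("Avg temperature: %.1f, dew point: %.1f, wind speed: %.1f%n",
                avg[0], avg[1], avg[2]);
    }
}
```

In practice the records would be read with Files.readAllLines(Path.of("Sample_weather.txt")) instead of the inline sample, and on Hadoop the sum/count accumulation would move into a Reducer.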
