DSBDA Manual

This document certifies the practical work of students in 'Data Science & Big Data Analytics' under the guidance of Prof. Gage P.K. It includes a series of experiments involving data wrangling, analytics, visualization, and programming tasks using Python and Java. The document serves as a partial fulfillment of the requirements for the Bachelor of Engineering degree at Savitribai Phule Pune University.

Uploaded by

Varad Gorhe

Certificate

THIS IS TO CERTIFY THAT THE WORK EMBODIED IN THE "DATA
SCIENCE & BIG DATA ANALYTICS" PRACTICAL IS THE BONAFIDE
WORK OF STUDENTS OF THIS INSTITUTE, CARRIED OUT BY THEM
UNDER THE GUIDANCE OF "Prof. Gage P.K.", AND IS APPROVED IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS OF SAVITRIBAI
PHULE PUNE UNIVERSITY FOR THE DEGREE OF BACHELOR OF
ENGINEERING, THIRD YEAR OF COMPUTER ENGINEERING.

DATE:- PLACE:-Belhe

NAME: NAVALE RUPESH PANDHARINATH ROLL.NO : 36

Batch :B

Prof. Gage P.K. Prof. Shegar S.R.


(Dept. of Computer Engineering) (Head of Dept. of Computer Engineering)

Dr. Narawade N.S.


(Principal of SGOI COE)
EXPERIMENT NO:-01
Data Wrangling-1: Perform the following operations using Python on any
open-source dataset (data.csv).
EXPERIMENT NO:02
Data Wrangling-2: Create an "Academic Performance" dataset of students
and perform the following operations using Python.
EXPERIMENT NO:03
Provide summary statistics for a dataset with numeric variables
grouped by one of the qualitative variables.
EXPERIMENT NO:04
Create a linear regression model using Python to predict home
prices using the Boston Housing dataset.
EXPERIMENT NO:05
Data Analytics-2
EXPERIMENT NO:06
Data Analytics-3
EXPERIMENT NO:07
Text Analytics
EXPERIMENT NO:08
Data Visualization-1
EXPERIMENT NO:09
Data Visualization-2
EXPERIMENT NO:10
Download the Iris flower dataset or any other dataset into a
DataFrame.
EXPERIMENT NO:11

Write a code in Java for a simple WordCount application that counts
the number of occurrences of each word in a given input set using the
Hadoop MapReduce framework on a local standalone setup.

Program:-
// Simplified local word count in plain Java (no Hadoop APIs):
// splits the input on whitespace and reports the number of words.
public class WordCount {
    public static void main(String[] args) {
        // Sample input string
        String text = "this is a wordcount example.";
        // Count words in the input string
        int wordCount = countWords(text);
        // Output the result
        System.out.println("Word count: " + wordCount);
    }

    public static int countWords(String text) {
        // Trim any leading or trailing spaces
        text = text.trim();
        // If the string is empty, return 0
        if (text.isEmpty()) {
            return 0;
        }
        // Split the text by one or more whitespace characters
        String[] words = text.split("\\s+");
        // Return the number of words in the array
        return words.length;
    }
}

Output:-
EXPERIMENT NO:12
Design a distributed application using MapReduce which
processes a log file of a system.
Program:-
import java.io.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.regex.*;

public class kalyani {
    // Mapper: splits lines into words and creates (word, 1) pairs
    public static class Mapper {
        public List<Map<String, Integer>> map(String input) {
            List<Map<String, Integer>> wordCountList = new ArrayList<>();
            Map<String, Integer> wordCountMap = new HashMap<>();
            // Split the input into words (based on non-word characters)
            String[] words = input.split("\\W+");
            for (String word : words) {
                if (!word.isEmpty()) {
                    word = word.toLowerCase();
                    wordCountMap.put(word, wordCountMap.getOrDefault(word, 0) + 1);
                }
            }
            // Add the word counts from this line to the result
            wordCountList.add(wordCountMap);
            return wordCountList;
        }
    }

    // Reducer: aggregates word counts from all mappers
    public static class Reducer {
        public Map<String, Integer> reduce(List<Map<String, Integer>> mappedResults) {
            Map<String, Integer> finalCountMap = new HashMap<>();
            // For each map in the list of results
            for (Map<String, Integer> map : mappedResults) {
                // For each word and its count in the map
                for (Map.Entry<String, Integer> entry : map.entrySet()) {
                    finalCountMap.put(entry.getKey(),
                            finalCountMap.getOrDefault(entry.getKey(), 0) + entry.getValue());
                }
            }
            return finalCountMap;
        }
    }

    // Main method
    public static void main(String[] args)
            throws InterruptedException, ExecutionException, IOException {
        String inputText = "Hello world hello mapreduce hello Java world";
        // Step 1: Map phase (split input into words and create word-count pairs)
        Mapper mapper = new Mapper();
        List<Map<String, Integer>> mappedResults = mapper.map(inputText);
        // Step 2: Reduce phase (aggregate word counts)
        Reducer reducer = new Reducer();
        Map<String, Integer> finalWordCount = reducer.reduce(mappedResults);
        // Step 3: Output the result
        System.out.println("Word Count Results:");
        for (Map.Entry<String, Integer> entry : finalWordCount.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}

Output:-
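The program above counts words in a hard-coded string; experiment 12's stated goal is processing a system log file. A minimal sketch of the same map/reduce idea applied to log records is shown below, assuming a log format of "timestamp level message" (the format and the class/method names here are illustrative assumptions, not part of the original manual):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LogLevelCount {
    // "Map" step, applied per line: extract the log level, assumed to be
    // the third whitespace-separated field,
    // e.g. "2025-04-01 12:01:15 ERROR Disk space low!" -> "ERROR".
    public static String mapLine(String line) {
        String[] fields = line.trim().split("\\s+");
        return fields.length >= 3 ? fields[2] : null;
    }

    // "Reduce" step: aggregate the per-line results into counts per level.
    public static Map<String, Integer> countLevels(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String level = mapLine(line);
            if (level != null) {
                counts.merge(level, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
                "2025-04-01 12:01:01 INFO Starting system...",
                "2025-04-01 12:01:15 ERROR Disk space low!",
                "2025-04-01 12:03:00 ERROR Disk space low!");
        // Prints the count for each level (map ordering may vary)
        System.out.println(LogLevelCount.countLevels(logs));
    }
}
```

In a real run, the lines would come from the log file itself, e.g. via Files.readAllLines; on a Hadoop cluster the map and reduce steps would instead be expressed as Mapper and Reducer subclasses.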
EXPERIMENT NO:13
Locate a dataset (e.g. Sample_weather.txt) for working on
weather data: read the text input file and find the averages
of temperature, dew point, and wind speed using Java.
Program:-
# Parsing system log lines into a structured format using
# Python regular expressions.

import re

logs = """
2025-04-01 12:01:01 INFO Starting system...
2025-04-01 12:01:15 ERROR Disk space low!
2025-04-01 12:02:45 INFO System running normally
2025-04-01 12:03:00 ERROR Disk space low!
2025-04-01 12:03:30 WARNING CPU usage high
"""

# Capture the timestamp, log level, and message from each line
pattern = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)"

for line in logs.splitlines():
    if line:
        match = re.match(pattern, line)
        if match:
            timestamp, level, message = match.groups()
            print(f"Timestamp: {timestamp}, Level: {level}, Message: {message}")

Output:-
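The listing above parses log lines rather than weather records. A minimal Java sketch of the averaging task that experiment 13 actually asks for is given below; it assumes each record in Sample_weather.txt is whitespace-separated as "date temperature dewpoint windspeed" (this column layout, and the class name WeatherAverage, are assumptions for illustration):

```java
import java.util.List;

public class WeatherAverage {
    // Computes the averages of temperature, dew point, and wind speed from
    // records of the assumed form: "<date> <temperature> <dewpoint> <windspeed>".
    // Returns {avgTemperature, avgDewPoint, avgWindSpeed}.
    public static double[] averages(List<String> lines) {
        double tempSum = 0, dewSum = 0, windSum = 0;
        int count = 0;
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4) {
                continue; // skip blank or malformed records
            }
            tempSum += Double.parseDouble(fields[1]);
            dewSum  += Double.parseDouble(fields[2]);
            windSum += Double.parseDouble(fields[3]);
            count++;
        }
        return new double[] { tempSum / count, dewSum / count, windSum / count };
    }

    public static void main(String[] args) {
        // Inline sample standing in for Sample_weather.txt
        List<String> sample = List.of(
                "2025-04-01 30.0 18.0 5.0",
                "2025-04-02 32.0 20.0 7.0");
        double[] avg = WeatherAverage.averages(sample);
        System.out.printf("Avg temperature: %.1f, dew point: %.1f, wind speed: %.1f%n",
                avg[0], avg[1], avg[2]);
    }
}
```

In practice the records would be read with Files.readAllLines(Path.of("Sample_weather.txt")) instead of the inline sample, and on Hadoop the sum/count accumulation would move into a Reducer.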
