BIG DATA ANALYTICS Lab Manual
Anaconda is an open-source distribution that bundles Jupyter, Spyder, and other tools used for large-scale
data processing, data analytics, and heavy scientific computing. Anaconda supports the R and Python
programming languages. Spyder (an IDE bundled with Anaconda) is used for Python; OpenCV for Python works in Spyder.
Package versions are managed by the package management system called conda.
To begin working with Anaconda, it must first be installed. Follow the instructions below to download
and install Anaconda on your system:
Download and install Anaconda:
Head over to anaconda.com and install the latest version of Anaconda. Make sure to download the
"Python 3.7 Version" for the appropriate architecture.
Getting through the License Agreement:
Getting through the Installation Process:
import pandas
# Load the dataset, which is a CSV file
dataset = pandas.read_csv("crime.csv")
# Display the data
dataset
# The same steps using the conventional pd alias
import pandas as pd
dataset1 = pd.read_csv("crime.csv")
dataset1
type(dataset1)
pandas.core.frame.DataFrame
import numpy
arr = numpy.array([1, 2, 3, 4, 5])
print(arr)
NumPy is usually imported under the np alias.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Checking NumPy Version
The version string is stored under the __version__ attribute.
import numpy as np
print(np.__version__)
Create a NumPy ndarray Object
NumPy is used to work with arrays. The array object in NumPy is called ndarray. We can
create a NumPy ndarray object by using the array() function.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
Resource Manager
Open localhost:8088 in a browser tab to check resource manager details.
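import matplotlib.pyplot as plt
import seaborn as sns
plt.plot(crime.Murder, crime.Assault);  # crime is assumed to be the DataFrame loaded from crime.csv
sns.barplot(x='Robbery', y='Year', data=crime);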
Implement NoSQL Database Operations: CRUD Operations, Arrays Using MongoDB.
AIM: To perform CRUD and array operations on a NoSQL database.
TITLE: Basic CRUD operations in MongoDB.
CRUD operations refer to the basic Insert, Read, Update and Delete operations.
Inserting a document into a collection (Create)
➢ The command db.collection.insert() inserts a document into a collection.
➢ Let us insert a document into the student collection. You must be connected to a database to perform
any insert. It is done as follows:
db.student.insert({
  regNo: "3014",
  name: "Test Student",
  course: {courseName: "MCA", duration: "3 Years"},
  address: {
    city: "Bangalore",
    state: "KA",
    country: "India"}})
An entry has been made into the collection called student.
Querying a document from a collection (Read)
To retrieve (select) the inserted document, run the command below. The find() command will retrieve all the
documents of the given collection.
db.collection_name.find()
➢ If a record is to be retrieved based on some criteria, call the find() method with parameters;
the record is then retrieved based on the attributes specified.
db.collection_name.find({"fieldname":"value"})
➢ For example: let us retrieve the record from the student collection where the attribute regNo is 3014.
The query for this is shown below:
db.student.find({"regNo":"3014"})
Updating a document in a collection (Update)
In order to update specific field values of a document in a MongoDB collection, run the query below.
db.collection_name.update()
The update() method takes a filter that selects the document and a $set document that gives the field name
and its new value.
Let us update the attribute name in the student collection for the document with regNo 3014.
db.student.update({
  "regNo": "3014"
},
{
  $set:
  {
    "name": "Viraj"
  }
})
Removing an entry from the collection (Delete)
➢ Let us now look at deleting an entry from a collection. In order to delete an entry from a
collection, run the command as shown below:
db.collection_name.remove({"fieldname":"value"})
➢ For example: db.student.remove({"regNo":"3014"})
Note that after running the remove() method, the entry has been deleted from the student collection.
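The four operations above can be pictured with a small in-memory sketch in Python. This is only an illustration of the semantics, not MongoDB itself: the collection is a plain list of dictionaries, and the helper names (matches, insert, find, update, remove) are hypothetical and do not belong to any MongoDB driver.

```python
# In-memory stand-in for a MongoDB collection: a list of dicts.
# All helper names below are hypothetical, chosen only for this sketch.

def matches(doc, query):
    """Return True if every field in query has the same value in doc."""
    return all(doc.get(field) == value for field, value in query.items())

def insert(collection, doc):
    """Create: store a copy of the document."""
    collection.append(dict(doc))

def find(collection, query=None):
    """Read: return all documents matching the query (all documents if none)."""
    query = query or {}
    return [d for d in collection if matches(d, query)]

def update(collection, query, changes):
    """Update: apply changes["$set"] to the first matching document."""
    for doc in collection:
        if matches(doc, query):
            doc.update(changes["$set"])
            return 1
    return 0

def remove(collection, query):
    """Delete: drop every matching document and report how many were dropped."""
    before = len(collection)
    collection[:] = [d for d in collection if not matches(d, query)]
    return before - len(collection)

student = []
insert(student, {"regNo": "3014", "name": "Test Student",
                 "course": {"courseName": "MCA", "duration": "3 Years"}})
found = find(student, {"regNo": "3014"})
update(student, {"regNo": "3014"}, {"$set": {"name": "Viraj"}})
name_after_update = find(student, {"regNo": "3014"})[0]["name"]
removed = remove(student, {"regNo": "3014"})
print(len(found), name_after_update, removed, len(student))  # 1 Viraj 1 0
```

The sketch mirrors the order of the lab steps: one document is inserted, found by regNo, renamed through $set, and finally removed, leaving the collection empty.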
2. Create and Query a Document
Let's create a blog post document. We will use a database called blogs and a collection called
posts. The code is written in mongo shell (an interactive JavaScript interface to MongoDB). Mongo shell is started
from the command line and is connected to the MongoDB server. From the shell:
use blogs
NEW_POST =
{
  name: "Working with Arrays",
  user: "Database Rebel",
  desc: "Maintaining an array of objects in a document",
  content: "some content...",
  created: ISODate(),
  updated: ISODate(),
  tags: ["mongodb", "arrays"]
}
db.posts.insertOne(NEW_POST)
Returns a result { "acknowledged" : true, "insertedId" : ObjectId("5ec55af811ac5e2e2aafb2b9") }
indicating that a new document is created. This is a common acknowledgement when you perform a write
operation. When a document is inserted into a collection for the first time, the collection gets created (if it
doesn't exist already). The insertOne method inserts a document into the collection.
Now, let's query the collection:
db.posts.findOne()
{
  "_id" : ObjectId("5ec55af811ac5e2e2aafb2b9"),
  "name" : "Working with Arrays",
  "user" : "Database Rebel",
  "desc" : "Maintaining an array of objects in a document",
  "content" : "some content...",
  "created" : ISODate("2020-05-20T16:28:55.468Z"),
  "updated" : ISODate("2020-05-20T16:28:55.468Z"),
  "tags" : [
    "mongodb",
    "arrays"
  ]
}
The findOne method retrieves one matching document from the collection. Note the scalar fields
name (string type) and created (date type), and the array field tags. In the newly inserted document there are no
comments yet.
EXPERIMENT: 5
Implement Functions: Count – Sort – Limit – Skip – Aggregate Using MongoDB.
2. SORT
Definition
$sort
Sorts all input documents and returns them to the pipeline in sorted order.
The $sort stage has the following prototype form:
{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }
$sort takes a document that specifies the field(s) to sort by and the respective sort order. <sort
order> can have one of the following values:
Value                     Description
1                         Sort ascending.
-1                        Sort descending.
{ $meta: "textScore" }    Sort by the computed textScore metadata in descending order. See Text Score
                          Metadata Sort for an example.
If sorting on multiple fields, sort order is evaluated from left to right. For example, in the
form above, documents are first sorted by <field1>. Then documents with the same <field1>
values are further sorted by <field2>.
Behavior
Limits
You can sort on a maximum of 32 keys.
Sort Consistency
MongoDB does not store documents in a collection in a particular order. When sorting on a
field which contains duplicate values, documents containing those values may be returned in
any order.
If consistent sort order is desired, include at least one field in your sort that contains unique
values. The easiest way to guarantee this is to include the _id field in your sort query.
Consider the following restaurant collection:
db.restaurants.insertMany( [
  { "_id" : 1, "name" : "Central Park Cafe", "borough" : "Manhattan" },
  { "_id" : 2, "name" : "Rock A Feller Bar and Grill", "borough" : "Queens" },
  { "_id" : 3, "name" : "Empire State Pub", "borough" : "Brooklyn" },
  { "_id" : 4, "name" : "Stan's Pizzaria", "borough" : "Manhattan" },
  { "_id" : 5, "name" : "Jane's Deli", "borough" : "Brooklyn" },
] )
db.restaurants.aggregate( [
  { $sort : { borough : 1 } }
]
)
In this example, sort order may be inconsistent, since the borough field contains duplicate
values for both Manhattan and Brooklyn. Documents are returned in alphabetical order by
borough, but the order of those documents with duplicate values for borough might not
be the same across multiple executions of the same sort. For example, here are the results
from two different executions of the above command:
{ "_id" : 3, "name" : "Empire State Pub", "borough" : "Brooklyn" }
{ "_id" : 5, "name" : "Jane's Deli", "borough" : "Brooklyn" }
{ "_id" : 1, "name" : "Central Park Cafe", "borough" : "Manhattan" }
{ "_id" : 4, "name" : "Stan's Pizzaria", "borough" : "Manhattan" }
{ "_id" : 2, "name" : "Rock A Feller Bar and Grill", "borough" : "Queens" }

{ "_id" : 5, "name" : "Jane's Deli", "borough" : "Brooklyn" }
{ "_id" : 3, "name" : "Empire State Pub", "borough" : "Brooklyn" }
{ "_id" : 4, "name" : "Stan's Pizzaria", "borough" : "Manhattan" }
{ "_id" : 1, "name" : "Central Park Cafe", "borough" : "Manhattan" }
{ "_id" : 2, "name" : "Rock A Feller Bar and Grill", "borough" : "Queens" }
While the values for borough are still sorted in alphabetical order, the order of the
documents containing duplicate values for borough (i.e. Manhattan and Brooklyn) is not the
same.
To achieve a consistent sort, add a field which contains exclusively unique values to the sort.
The following command uses the $sort stage to sort on both the borough field and the _id field:
db.restaurants.aggregate( [
  { $sort : { borough : 1, _id : 1 } }
]
)
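The effect of the tie-breaking field can be reproduced outside MongoDB. The Python sketch below is an analogy only: shuffling the list stands in for MongoDB's unspecified storage order, and the helper sort_by is a hypothetical name introduced for this illustration. Sorting on borough alone gives results that depend on the incoming order, while sorting on (borough, _id) always gives the same result.

```python
import random

restaurants = [
    {"_id": 1, "name": "Central Park Cafe", "borough": "Manhattan"},
    {"_id": 2, "name": "Rock A Feller Bar and Grill", "borough": "Queens"},
    {"_id": 3, "name": "Empire State Pub", "borough": "Brooklyn"},
    {"_id": 4, "name": "Stan's Pizzaria", "borough": "Manhattan"},
    {"_id": 5, "name": "Jane's Deli", "borough": "Brooklyn"},
]

def sort_by(docs, *fields):
    """Sort ascending on the given fields, evaluated left to right."""
    return sorted(docs, key=lambda d: tuple(d[f] for f in fields))

random.seed(0)
orders_borough_only = set()
orders_with_id = set()
for _ in range(20):
    # A fresh random arrangement simulates documents arriving in arbitrary order.
    shuffled = random.sample(restaurants, len(restaurants))
    orders_borough_only.add(tuple(d["_id"] for d in sort_by(shuffled, "borough")))
    orders_with_id.add(tuple(d["_id"] for d in sort_by(shuffled, "borough", "_id")))

print("distinct orders, borough only:", len(orders_borough_only))
print("distinct orders, borough + _id:", orders_with_id)
```

Because _id is unique, no two documents compare equal on (borough, _id), so only one ordering is possible.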
Examples
Ascending/Descending Sort
For the field or fields to sort by, set the sort order to 1 or -1 to specify an ascending or descending
sort respectively, as in the following example:
db.users.aggregate( [
  { $sort : { age : -1, posts : 1 } }
]
)
This operation sorts the documents in the users collection in descending order by
the age field and then in ascending order by the value in the posts field.
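The same ordering rule (age descending, then posts ascending) can be tried in plain Python. This is a sketch of the ordering rule only, not of the aggregation pipeline, and the user records are made up for the example.

```python
# Made-up user records, used only to illustrate the ordering.
users = [
    {"name": "a", "age": 30, "posts": 5},
    {"name": "b", "age": 25, "posts": 1},
    {"name": "c", "age": 30, "posts": 2},
    {"name": "d", "age": 25, "posts": 7},
]

# Negating a numeric field turns Python's ascending sort into a descending one,
# so the key (-age, posts) means: age descending, then posts ascending.
ordered = sorted(users, key=lambda u: (-u["age"], u["posts"]))
names = [u["name"] for u in ordered]
print(names)  # ['c', 'a', 'b', 'd']
```

Both 30-year-olds come first, and within each age group the user with fewer posts is listed first.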
2. LIMIT
import sys

# State carried across input lines (initialization assumed; the listing starts mid-file)
current_word = None
current_count = 0
word = None

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    word, count = line.split('\t', 1)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently
        # ignore/discard this line
        continue
    # this IF-switch only works because Hadoop sorts map output
    # by key (here: word) before it is passed to the reducer
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # write result to STDOUT
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# do not forget to output the last word if needed!
if current_word:
    print('%s\t%s' % (current_word, current_count))
hduser@ubuntu:~$ echo "foo foo quux labs foo bar quux" | /home/hduser/mapper.py | sort -k1,1 | /home/hduser/reducer.py
bar     1
foo     3
labs    1
quux    2
hduser@ubuntu:~$ cat /tmp/gutenberg/20417-8.txt | /home/hduser/mapper.py
The     1
Project 1
Gutenberg       1
EBook   1
of      1
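The map, sort, and reduce steps of the first command above can also be run inside a single Python process, which is handy for checking the logic before involving Hadoop. The sketch below is an assumption-laden illustration: mapper.py is not listed in this manual, so the mapper here is a guess at its behaviour (emit each word with a count of 1), and sorted() stands in for the sort -k1,1 step.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Emit (word, 1) for every whitespace-separated word (assumed mapper behaviour)."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(sorted_pairs):
    """Sum the counts of consecutive pairs that share the same word."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)

text = ["foo foo quux labs foo bar quux"]
pairs = sorted(mapper(text), key=itemgetter(0))  # stands in for `sort -k1,1`
counts = list(reducer(pairs))
for word, count in counts:
    print(f"{word}\t{count}")
```

As in the Hadoop version, the reducer relies on its input being sorted by word; groupby only merges neighbouring pairs with equal keys.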
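EXPERIMENT: 7
import pandas as pd
import pickle

data = pd.read_csv('data.csv')

# Slicing data into four contiguous chunks (the iloc end index is exclusive,
# so each slice starts where the previous one stopped and no row is skipped)
slice1 = data.iloc[0:400, :]
slice2 = data.iloc[400:800, :]
slice3 = data.iloc[800:1200, :]
slice4 = data.iloc[1200:, :]

def mapper(data):
    # Emit one (quality, volatile acidity) pair per row.
    # The column name 'volatile acidity' is assumed; adjust it to match data.csv.
    mapped = []
    for index, row in data.iterrows():
        mapped.append((row['quality'], row['volatile acidity']))
    return mapped

map1 = mapper(slice1)
map2 = mapper(slice2)
map3 = mapper(slice3)
map4 = mapper(slice4)

# Shuffle step: group the mapped values by quality class
shuffled = {
    3.0: [],
    4.0: [],
    5.0: [],
    6.0: [],
    7.0: [],
    8.0: [],
}
for i in [map1, map2, map3, map4]:
    for j in i:
        shuffled[j[0]].append(j[1])

# 'wb' overwrites the file on each run; appending ('ab') would leave stale data
# from earlier runs at the front of the file, which is all the reducer reads
with open('shuffled.pkl', 'wb') as file:
    pickle.dump(shuffled, file)
print("Data has been mapped. Now, run reducer.py to reduce the contents in shuffled.pkl file.")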
Reducer Program
import pickle

with open('shuffled.pkl', 'rb') as file:
    shuffled = pickle.load(file)

def reduce(shuffled_dict):
    # Average the values collected for each quality class
    reduced = {}
    for i in shuffled_dict:
        if shuffled_dict[i]:  # skip classes with no rows to avoid dividing by zero
            reduced[i] = sum(shuffled_dict[i]) / len(shuffled_dict[i])
    return reduced

final = reduce(shuffled)
print("Average volatile acidity in different classes of wine:")
for i in final:
    print(i, ':', final[i])
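The map, shuffle, and reduce phases of this experiment can be checked on a handful of rows without pandas or a pickle file. The sketch below uses only the standard library; the five rows are made up for illustration, and the function names are chosen to mirror the programs above rather than taken from any framework.

```python
from collections import defaultdict

# Made-up rows standing in for data.csv (quality class and volatile acidity).
rows = [
    {"quality": 5, "volatile acidity": 0.70},
    {"quality": 5, "volatile acidity": 0.90},
    {"quality": 6, "volatile acidity": 0.30},
    {"quality": 6, "volatile acidity": 0.50},
    {"quality": 7, "volatile acidity": 0.40},
]
slices = [rows[0:2], rows[2:4], rows[4:]]  # contiguous chunks, nothing skipped

def mapper(chunk):
    """Map: emit one (key, value) pair per row."""
    return [(r["quality"], r["volatile acidity"]) for r in chunk]

def shuffle(mapped_chunks):
    """Shuffle: collect all values that share a key."""
    groups = defaultdict(list)
    for mapped in mapped_chunks:
        for key, value in mapped:
            groups[key].append(value)
    return groups

def reducer(groups):
    """Reduce: average the values of each key."""
    return {key: sum(values) / len(values) for key, values in groups.items()}

averages = reducer(shuffle(mapper(s) for s in slices))
for quality, avg in sorted(averages.items()):
    print(quality, ":", round(avg, 3))
```

Using defaultdict means the quality classes do not have to be listed in advance, unlike the fixed shuffled dictionary in the mapper program.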
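EXPERIMENT: 8
Implement Clustering Techniques Using Spark.
AIM: To perform clustering using Spark.
from pyspark.sql import SparkSession
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("KMeansExample").getOrCreate()
# Loads data.
dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")
# Trains a k-means model.
kmeans = KMeans().setK(2).setSeed(1)
model = kmeans.fit(dataset)
# Evaluate clustering by computing Within Set Sum of Squared Errors.
# computeCost() is available in Spark 2.x; Spark 3.x removed it, and
# model.summary.trainingCost gives the cost on the training data instead.
wssse = model.computeCost(dataset)
print("Within Set Sum of Squared Errors = " + str(wssse))
# Shows the result.
centers = model.clusterCenters()
print("Cluster Centers: ")
for center in centers:
    print(center)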
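To see what the Spark program computes, the k-means loop and the Within Set Sum of Squared Errors can be written out in a few lines of plain Python. This is a minimal sketch under stated assumptions, not Spark's implementation: the six points are a toy data set, the initial centers are simply the first k points (Spark uses a smarter initialization), and the function names are hypothetical.

```python
# Toy 3-dimensional points forming two obvious groups.
points = [
    (0.0, 0.0, 0.0), (0.1, 0.1, 0.1), (0.2, 0.2, 0.2),
    (9.0, 9.0, 9.0), (9.1, 9.1, 9.1), (9.2, 9.2, 9.2),
]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(dim) / len(cluster) for dim in zip(*cluster))

def kmeans(data, k, iterations=10):
    """Lloyd's algorithm with a naive start: the first k points are the centers."""
    centers = list(data[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

def wssse(data, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in data)

centers = kmeans(points, k=2)
cost = wssse(points, centers)
print("Within Set Sum of Squared Errors =", round(cost, 4))
print("Cluster Centers:", [tuple(round(x, 2) for x in c) for c in centers])
```

On this toy data the centers settle near (0.1, 0.1, 0.1) and (9.1, 9.1, 9.1), and the cost is about 0.12.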
BIG DATA ANALYTICS LAB 2023-2024
EXPERIMENT: 9
R Shiny Tutorial: How to Make Interactive Web Applications in R
Introduction
In this modern technological era, various apps are available for all of us, from tracking our fitness level and sleep
to giving us the latest information about the stock markets. Apps like Robinhood, Google Fit and Workit seem so
amazingly useful because they use real-time data and statistics. As R is a frontrunner in the field of statistical
computing and programming, developers need a system to use its power to build apps.
This is where R Shiny comes to save the day. In this R Shiny tutorial, you will come to know the basics.
What is R Shiny?
Shiny is an R package that was developed for building interactive web applications in R. Using this, you can
create web applications utilizing native HTML and CSS code along with R Shiny code. You can build
standalone web apps on a website that will make data visualization easy. These applications made through R
Shiny can seamlessly display R objects such as tables and plots.
Let us look at some of the features of R Shiny:
➢ Build web applications with fewer lines of code, without JavaScript.
➢ These applications are live and are accessible to users like spreadsheets. The outputs may alter in real-time
if the users change the input.
➢ Developers with little knowledge of web tools can also build apps using R Shiny.
➢ You get in-built widgets to display tables, outputs of R objects and plots.
➢ You can add live visualizations and reports to the web application using this package.
➢ The user interfaces can be coded in R or can be prepared using HTML, CSS or JavaScript.
➢ The default user interface is built using Bootstrap.
➢ It comes with a WebSocket package that enables fast communication between the web server and R.
Components of an R Shiny app
A Shiny app has two primary components: a user interface object and a server function. These are
the arguments passed on to the shinyApp method. This method creates an application object using
the arguments.
Let us understand the basic parts of an R Shiny app in detail:
User interface function
This function defines the appearance of the web application. It makes the application interactive by
obtaining input from the user and displaying it on the screen. HTML and CSS tags can be used for
making the application look better. So, while building the ui.R file you create an HTML file with R
functions.
If you type fluidPage() in the R console, you will see that the method returns a tag <div
class="container-fluid"></div>.
The different input functions are:
➢ selectInput() – This method is used for creating a dropdown HTML element that has various choices
to select.
➢ numericInput() – This method creates an input area for entering numeric values.
➢ radioButtons() – This provides radio buttons for the user to select an input.
Layout methods
The various layout features available in Bootstrap are implemented by R Shiny. The components are:
Panels
These are methods that group elements together into a single panel. These include:
absolutePanel()
inputPanel()
conditionalPanel()
headerPanel()
fixedPanel()
Layout functions
These organize the panels for a particular layout. These include:
fluidRow()
verticalLayout()
flowLayout()
splitLayout()
sidebarLayout()
Output methods
These methods are used for displaying R output components such as images, tables and plots. They are:
tableOutput() – This method is used for displaying an R table
plotOutput() – This method is used for displaying an R plot object
Server function
After you have created the appearance of the application and the ways to take input values from the user, it
is time to set up the server. The server function helps you to write the server-side code for the Shiny app. You
can create functions that map the user inputs to the corresponding outputs. This function is called when the
application is loaded in the web browser.
It takes an input and an output parameter, and return values are ignored. An optional session parameter is also
taken by this method.
R Shiny tutorial: How to get started with R Shiny?
Steps to start working with the R Shiny package are as follows:
Go to the R console and type in the command – install.packages("shiny")
The package comes with 11 built-in application examples for you to understand how Shiny works