
Lab 3 Data Mining NoSQl - Harshil - Parmar

This document provides a comprehensive guide on using MongoDB with Python for non-relational data mining. It covers the installation of MongoDB, the use of the PyMongo driver, and various data manipulation techniques such as inserting, retrieving, filtering, and sorting documents within a MongoDB collection. Additionally, it includes reflection tasks for further practice with data deletion and updating operations.

Uploaded by Harshil Parmar


Objective

Apply non-relational data mining techniques in Python using MongoDB

Practice NoSQL data mining techniques and reflect

What is MongoDB?
MongoDB is a document database that can be installed on a local machine or hosted in the
cloud. The cloud-hosted flavour of MongoDB is called MongoDB Atlas.
It stores JSON-like documents, which provides flexibility and scalability.
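To make "JSON-like documents" concrete, here is a minimal sketch that uses only Python's standard library and no database. The `grades` and `contact` fields are hypothetical and are not part of the lab's dataset; they only illustrate that a document can hold arrays and nested documents, and that two documents in one collection need not share the same fields.

```python
import json

# A document is a set of field/value pairs, much like a Python dict.
# "grades" and "contact" are hypothetical fields used only for illustration.
student = {
    "name": "Amy",
    "Course": "Info 6150",
    "grades": [50, 65],               # arrays are allowed
    "contact": {"city": "Halifax"},   # so are nested documents
}

# Documents in the same collection do not have to share the same fields.
other = {"name": "Shweta", "Course": "Info 6150", "classsize": 40}

# Serialising to JSON shows the "JSON-like" shape of a document.
as_json = json.dumps(student, sort_keys=True)
print(as_json)
print(sorted(set(student) ^ set(other)))  # fields present in only one document
```

PyMongo accepts dictionaries like these directly, which is why the rest of the lab builds its documents as plain Python dicts.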

For this lab we will download the MongoDB Community Server from
https://www.mongodb.com/try/download/community

PyMongo
PyMongo is the Python driver required to access a MongoDB database from Python

!pip install pymongo

Collecting pymongo
Downloading pymongo-4.11-cp313-cp313-win_amd64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
Downloading dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Downloading pymongo-4.11-cp313-cp313-win_amd64.whl (932 kB)
Downloading dnspython-2.7.0-py3-none-any.whl (313 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.7.0 pymongo-4.11

import pymongo as pm

Create a sample database if it does not already exist

conn = pm.MongoClient("mongodb://localhost:27017/") ### create connection to the database

db=conn["firstmongo"] ### create a database named "firstmongo"

Create a collection "class" with corresponding field values and their data types
Note: A "table" in MongoDB is called a "collection".

1. Create a collection by specifying its name (if it does not already exist).
2. When a collection is accessed this way, MongoDB does not actually create it until the first
document is inserted, so no empty collection (table) appears in the database at this point.
3. Insert a single document (the equivalent of a record in a SQL table) using the insert_one()
method.
mycollection = db["class"]

mydict = { "name": "Shweta", "Course": "Info 6150" , "classsize":40}

x = mycollection.insert_one(mydict)

# print the _id value of the inserted document:

print(x.inserted_id)

67a28a5f5aabe18fcc1bf3e6
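The automatically generated `_id` is an ObjectId, a 12-byte value shown as 24 hexadecimal characters. Its first 4 bytes hold the creation time as a Unix timestamp in seconds. The sketch below decodes that timestamp from the value printed above using only the standard library; it is an illustration of the layout, not something the lab requires. With PyMongo installed, the `generation_time` attribute of a `bson.ObjectId` should give the same information.

```python
from datetime import datetime, timezone

object_id_hex = "67a28a5f5aabe18fcc1bf3e6"  # the _id printed above

# The first 8 hex characters (4 bytes) are a big-endian Unix timestamp in seconds.
seconds = int(object_id_hex[:8], 16)
created_at = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(seconds)
print(created_at.isoformat())
```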

#### Let's add more human-readable unique ids

mylist = [
    { "_id": 100, "name": "John", "Course": "Info 6150", "grade": 30},
    { "_id": 200, "name": "Peter", "Course": "Info 6150", "grade": 40},
    { "_id": 300, "name": "Amy", "Course": "Info 6150", "grade": 50},
    { "_id": 400, "name": "Hannah", "Course": "Info 6150", "grade": 60},
    { "_id": 500, "name": "Michael", "Course": "Info 6150", "grade": 70},
    { "_id": 600, "name": "Sandy", "Course": "Info 6150", "grade": 80},
    { "_id": 700, "name": "Betty", "Course": "Info 6150", "grade": 80},
    { "_id": 800, "name": "Richard", "Course": "Info 6150", "grade": 70},
    { "_id": 900, "name": "Susan", "Course": "Info 6150", "grade": 60},
    { "_id": 1000, "name": "Vicky", "Course": "Info 6150", "grade": 50},
    { "_id": 1100, "name": "Ben", "Course": "Info 6150", "grade": 85},
    { "_id": 1200, "name": "William", "Course": "Info 6150", "grade": 75},
    { "_id": 1300, "name": "Chuck", "Course": "Info 6150", "grade": 65},
    { "_id": 1400, "name": "Viola", "Course": "Info 6150", "grade": 55}
]

x = mycollection.insert_many(mylist)
print(x.inserted_ids)

[100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300,
1400]

You will agree that this is a more easily understood unique id

Retrieve data from the collection you just created
### Find One method
one = mycollection.find_one()
print(one)

### Find All method

for doc in mycollection.find():
    print(doc)

{'_id': ObjectId('67a28a5f5aabe18fcc1bf3e6'), 'name': 'Shweta', 'Course': 'Info 6150', 'classsize': 40}
{'_id': ObjectId('67a28a5f5aabe18fcc1bf3e6'), 'name': 'Shweta', 'Course': 'Info 6150', 'classsize': 40}
{'_id': 100, 'name': 'John', 'Course': 'Info 6150', 'grade': 30}
{'_id': 200, 'name': 'Peter', 'Course': 'Info 6150', 'grade': 40}
{'_id': 300, 'name': 'Amy', 'Course': 'Info 6150', 'grade': 50}
{'_id': 400, 'name': 'Hannah', 'Course': 'Info 6150', 'grade': 60}
{'_id': 500, 'name': 'Michael', 'Course': 'Info 6150', 'grade': 70}
{'_id': 600, 'name': 'Sandy', 'Course': 'Info 6150', 'grade': 80}
{'_id': 700, 'name': 'Betty', 'Course': 'Info 6150', 'grade': 80}
{'_id': 800, 'name': 'Richard', 'Course': 'Info 6150', 'grade': 70}
{'_id': 900, 'name': 'Susan', 'Course': 'Info 6150', 'grade': 60}
{'_id': 1000, 'name': 'Vicky', 'Course': 'Info 6150', 'grade': 50}
{'_id': 1100, 'name': 'Ben', 'Course': 'Info 6150', 'grade': 85}
{'_id': 1200, 'name': 'William', 'Course': 'Info 6150', 'grade': 75}
{'_id': 1300, 'name': 'Chuck', 'Course': 'Info 6150', 'grade': 65}
{'_id': 1400, 'name': 'Viola', 'Course': 'Info 6150', 'grade': 55}

Filter and Select

Return only specific fields of each document rather than every field

for doc in mycollection.find({},{ "_id": 0, "name": 1, "grade": 1 }):
    print(doc)

{'name': 'Shweta'}
{'name': 'John', 'grade': 30}
{'name': 'Peter', 'grade': 40}
{'name': 'Amy', 'grade': 50}
{'name': 'Hannah', 'grade': 60}
{'name': 'Michael', 'grade': 70}
{'name': 'Sandy', 'grade': 80}
{'name': 'Betty', 'grade': 80}
{'name': 'Richard', 'grade': 70}
{'name': 'Susan', 'grade': 60}
{'name': 'Vicky', 'grade': 50}
{'name': 'Ben', 'grade': 85}
{'name': 'William', 'grade': 75}
{'name': 'Chuck', 'grade': 65}
{'name': 'Viola', 'grade': 55}

### Print everything but exclude grades

for all_but in mycollection.find({},{ "grade": 0 }):
    print(all_but)

{'_id': ObjectId('67a28a5f5aabe18fcc1bf3e6'), 'name': 'Shweta', 'Course': 'Info 6150', 'classsize': 40}
{'_id': 100, 'name': 'John', 'Course': 'Info 6150'}
{'_id': 200, 'name': 'Peter', 'Course': 'Info 6150'}
{'_id': 300, 'name': 'Amy', 'Course': 'Info 6150'}
{'_id': 400, 'name': 'Hannah', 'Course': 'Info 6150'}
{'_id': 500, 'name': 'Michael', 'Course': 'Info 6150'}
{'_id': 600, 'name': 'Sandy', 'Course': 'Info 6150'}
{'_id': 700, 'name': 'Betty', 'Course': 'Info 6150'}
{'_id': 800, 'name': 'Richard', 'Course': 'Info 6150'}
{'_id': 900, 'name': 'Susan', 'Course': 'Info 6150'}
{'_id': 1000, 'name': 'Vicky', 'Course': 'Info 6150'}
{'_id': 1100, 'name': 'Ben', 'Course': 'Info 6150'}
{'_id': 1200, 'name': 'William', 'Course': 'Info 6150'}
{'_id': 1300, 'name': 'Chuck', 'Course': 'Info 6150'}
{'_id': 1400, 'name': 'Viola', 'Course': 'Info 6150'}

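A projection cannot mix 0 and 1 values: it either lists the fields to include (1) or the fields
to exclude (0). The only exception is the _id field, which can be set to 0 in an otherwise
inclusive projection.

for x in mycollection.find({},{ "name": 1, "Course": 0 }): ## incorrect: mixes inclusion and exclusion

for x in mycollection.find({},{ "name": 1, "_id": 0 }): ## correct: only _id is excluded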
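The projection rule can be sketched in plain Python without a database. The `project` function below is a hypothetical helper written for this illustration; it imitates the rule for top-level fields only and is not how MongoDB or PyMongo implement projections.

```python
def project(doc, projection):
    """Simplified, top-level-only imitation of a find() projection."""
    # Flags for every field except _id, which is allowed to differ from the rest.
    others = {k: bool(v) for k, v in projection.items() if k != "_id"}
    if len(set(others.values())) > 1:
        raise ValueError("cannot mix inclusion and exclusion (except for _id)")

    show_id = bool(projection.get("_id", True))  # _id is returned unless set to 0
    # Inclusion mode if any field is flagged 1, or if only {"_id": 1} was given.
    including = any(others.values()) or (not others and projection.get("_id") == 1)

    result = {}
    for key, value in doc.items():
        if key == "_id":
            keep = show_id
        elif including:
            keep = key in others
        else:
            keep = key not in others
        if keep:
            result[key] = value
    return result


doc = {"_id": 100, "name": "John", "Course": "Info 6150", "grade": 30}

included = project(doc, {"_id": 0, "name": 1, "grade": 1})
excluded = project(doc, {"grade": 0})
print(included)
print(excluded)

# Mixing 1 and 0 on ordinary fields is rejected.
try:
    project(doc, {"name": 1, "Course": 0})
except ValueError as err:
    error_message = str(err)
    print("rejected:", error_message)
```

A document that lacks a requested field, such as the "Shweta" document with no grade, simply comes back without it, which matches the `{'name': 'Shweta'}` line in the output above.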

# Filter by key value

doc=mycollection.find({"grade":80})

for x in doc:
    print(x)

{'_id': 600, 'name': 'Sandy', 'Course': 'Info 6150', 'grade': 80}

{'_id': 700, 'name': 'Betty', 'Course': 'Info 6150', 'grade': 80}

# Filter by names that sort after "V" (names starting with "V" or a later letter)

doc=mycollection.find({"name":{"$gt":"V"}})

for x in doc:
    print(x)

{'_id': 1000, 'name': 'Vicky', 'Course': 'Info 6150', 'grade': 50}

{'_id': 1200, 'name': 'William', 'Course': 'Info 6150', 'grade': 75}
{'_id': 1400, 'name': 'Viola', 'Course': 'Info 6150', 'grade': 55}
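Why does `$gt` with `"V"` return William as well as Vicky and Viola? Strings are compared in sort order, not by their first letter. The sketch below reproduces the comparison in plain Python, assuming MongoDB's default (simple binary) collation, which for plain ASCII names like these orders strings the same way Python's `>` operator does.

```python
names = ["Shweta", "John", "Peter", "Amy", "Hannah", "Michael", "Sandy", "Betty",
         "Richard", "Susan", "Vicky", "Ben", "William", "Chuck", "Viola"]

# Every name that sorts strictly after the one-character string "V".
after_v = [name for name in names if name > "V"]
print(after_v)

# "V" itself is not greater than "V", and the comparison is case-sensitive:
# lowercase letters sort after all uppercase letters.
print("V" > "V", "amy" > "V")
```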
### Filter by names starting with a given letter

doc=mycollection.find({"name":{"$regex":"^V"}})

for x in doc:
    print(x)

{'_id': 1000, 'name': 'Vicky', 'Course': 'Info 6150', 'grade': 50}

{'_id': 1400, 'name': 'Viola', 'Course': 'Info 6150', 'grade': 55}
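The `^` anchor is what restricts the match to names that start with "V". A plain-Python sketch with the standard `re` module shows the difference between the anchored and unanchored pattern. MongoDB uses its own regular expression engine, so this is only an approximation that holds for simple patterns like this one. The name "McVey" is hypothetical and added only to show the contrast.

```python
import re

names = ["John", "Peter", "Amy", "Susan", "Vicky", "William", "Viola", "McVey"]

# Anchored: "V" must be the first character.
starts_with_v = [name for name in names if re.search("^V", name)]

# Unanchored: "V" may appear anywhere in the name.
contains_v = [name for name in names if re.search("V", name)]

print(starts_with_v)
print(contains_v)
```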

Sorting

mydoc = mycollection.find().sort("name")

for x in mydoc:
    print(x)

{'_id': 300, 'name': 'Amy', 'Course': 'Info 6150', 'grade': 50}

{'_id': 1100, 'name': 'Ben', 'Course': 'Info 6150', 'grade': 85}
{'_id': 700, 'name': 'Betty', 'Course': 'Info 6150', 'grade': 80}
{'_id': 1300, 'name': 'Chuck', 'Course': 'Info 6150', 'grade': 65}
{'_id': 400, 'name': 'Hannah', 'Course': 'Info 6150', 'grade': 60}
{'_id': 100, 'name': 'John', 'Course': 'Info 6150', 'grade': 30}
{'_id': 500, 'name': 'Michael', 'Course': 'Info 6150', 'grade': 70}
{'_id': 200, 'name': 'Peter', 'Course': 'Info 6150', 'grade': 40}
{'_id': 800, 'name': 'Richard', 'Course': 'Info 6150', 'grade': 70}
{'_id': 600, 'name': 'Sandy', 'Course': 'Info 6150', 'grade': 80}
{'_id': ObjectId('67a28a5f5aabe18fcc1bf3e6'), 'name': 'Shweta',
'Course': 'Info 6150', 'classsize': 40}
{'_id': 900, 'name': 'Susan', 'Course': 'Info 6150', 'grade': 60}
{'_id': 1000, 'name': 'Vicky', 'Course': 'Info 6150', 'grade': 50}
{'_id': 1400, 'name': 'Viola', 'Course': 'Info 6150', 'grade': 55}
{'_id': 1200, 'name': 'William', 'Course': 'Info 6150', 'grade': 75}
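Sorting on more than one field follows the same idea. The sketch below uses plain Python on a few of the documents above to order them by grade (highest first) and then by name to break ties. In PyMongo the corresponding call would pass a list of (field, direction) pairs, for example `sort([("grade", -1), ("name", 1)])`; the plain-Python version is only a stand-in so the example runs without a database.

```python
docs = [
    {"_id": 500, "name": "Michael", "grade": 70},
    {"_id": 600, "name": "Sandy", "grade": 80},
    {"_id": 700, "name": "Betty", "grade": 80},
    {"_id": 800, "name": "Richard", "grade": 70},
    {"_id": 1100, "name": "Ben", "grade": 85},
]

# Negating the grade gives descending order; the name breaks ties in ascending order.
by_grade_then_name = sorted(docs, key=lambda d: (-d["grade"], d["name"]))

ordered_names = [d["name"] for d in by_grade_then_name]
print(ordered_names)
```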

To return only a few documents, define a limit

my5 = mycollection.find().limit(5)

# print the result:

for x in my5:
    print(x)

{'_id': ObjectId('67a28a5f5aabe18fcc1bf3e6'), 'name': 'Shweta', 'Course': 'Info 6150', 'classsize': 40}
{'_id': 100, 'name': 'John', 'Course': 'Info 6150', 'grade': 30}
{'_id': 200, 'name': 'Peter', 'Course': 'Info 6150', 'grade': 40}
{'_id': 300, 'name': 'Amy', 'Course': 'Info 6150', 'grade': 50}
{'_id': 400, 'name': 'Hannah', 'Course': 'Info 6150', 'grade': 60}
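A limit is most useful together with a sort, for example to get the top three grades. The sketch below imitates both steps in plain Python on the graded documents only (the "Shweta" document has no grade and is left out here). With PyMongo, the same idea would be written by chaining `sort("grade", -1)` and `limit(3)` on the cursor.

```python
from itertools import islice

docs = [
    {"name": "John", "grade": 30}, {"name": "Peter", "grade": 40},
    {"name": "Amy", "grade": 50}, {"name": "Hannah", "grade": 60},
    {"name": "Michael", "grade": 70}, {"name": "Sandy", "grade": 80},
    {"name": "Betty", "grade": 80}, {"name": "Richard", "grade": 70},
    {"name": "Susan", "grade": 60}, {"name": "Vicky", "grade": 50},
    {"name": "Ben", "grade": 85}, {"name": "William", "grade": 75},
    {"name": "Chuck", "grade": 65}, {"name": "Viola", "grade": 55},
]

# Like limit(5): stop after the first five items of the iteration.
first_five = [d["name"] for d in islice(iter(docs), 5)]

# Sort by grade, highest first, then keep only three documents.
top_three = [d["name"] for d in sorted(docs, key=lambda d: d["grade"], reverse=True)[:3]]

print(first_five)
print(top_three)
```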
Reflection Task (5 points)
Add additional code cells and perform the tasks suggested below.

1. Using the logic above, implement ascending and descending sorts. Hint:
sort("name", 1) # ascending
sort("name", -1) # descending

2. Delete one document where "name":"Ron"

mycollection.delete_one(______)

3. Delete many documents where "name":{"$regex": "^V"} and check how many were deleted

print(x.deleted_count, " documents deleted.")

4. Delete all remaining documents in the collection

x = mycollection.delete_many({})

Additional Tip >> Drop and Update

Deleting every document empties the collection but does not remove the collection itself. If we
were not doing the exercises above, we could instead delete the entire collection (table) in one
step with the drop() method, much like DROP TABLE in SQL.

mycollection.drop()

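Similarly, we can update one document or many documents using the same logic

oldvalues = {"name": "Shweta"}
newvalues = {"$set": {"name": "Shwetz"}} # $set takes a document of field: new value pairs

mycollection.update_one(oldvalues, newvalues)

mycollection.update_many(oldvalues, newvalues)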
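The `$set` operator expects a document that maps field names to their new values, so a bare string after `$set` is not a valid update. The plain-Python sketch below imitates that behaviour for top-level fields; `apply_set` is a hypothetical helper written for this illustration and is not part of PyMongo.

```python
def apply_set(doc, update):
    """Return a copy of doc with a top-level-only `$set` applied."""
    changes = update["$set"]
    if not isinstance(changes, dict):
        raise TypeError("$set expects a document of field: value pairs")
    # Existing fields are overwritten, new fields are added, others are untouched.
    return {**doc, **changes}


original = {"name": "Shweta", "Course": "Info 6150", "classsize": 40}

updated = apply_set(original, {"$set": {"name": "Shwetz"}})
print(updated)

# A bare value instead of a field: value document is rejected.
try:
    apply_set(original, {"$set": "Shwetz"})
except TypeError as err:
    error_message = str(err)
    print("rejected:", error_message)
```

With a real collection, update_one() changes only the first document that matches the filter, while update_many() changes every match.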

import seaborn as sns

import pandas as pd
import matplotlib.pyplot as plt
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["mydatabase"]
mycollection = db["class"]

# Insert sample document

mydict = { "name": "Shweta", "Course": "Info 6150" , "classsize": 40 }
x = mycollection.insert_one(mydict)

# Sorting documents
ascending_sort = mycollection.find().sort("name", 1) # Ascending order
print("Ascending Sort:")
for doc in ascending_sort:
    print(doc)

descending_sort = mycollection.find().sort("name", -1) # Descending order
print("Descending Sort:")
for doc in descending_sort:
    print(doc)

# Delete one document where name is "Ron"

mycollection.delete_one({"name": "Ron"})

# Delete many documents where name starts with "V"

x = mycollection.delete_many({"name": {"$regex": "^V"}})
print(x.deleted_count, "documents deleted.")

# Delete all remaining documents

x = mycollection.delete_many({})
print(x.deleted_count, "documents deleted.")

# Load Titanic dataset

titanic = sns.load_dataset('titanic')

# Explore the dataset and display basic statistics

print("Titanic Dataset Info:")
print(titanic.info())
print("\nBasic Statistics:")
print(titanic.describe(include='all'))

# Box plot for age distribution by class

plt.figure(figsize=(8, 6))
sns.boxplot(x='class', y='age', data=titanic, palette='coolwarm')
plt.title('Age Distribution by Passenger Class')
plt.xlabel('Passenger Class')
plt.ylabel('Age')
plt.show()

# Bar plot for male and female passenger count

plt.figure(figsize=(6, 4))
sns.countplot(x='sex', data=titanic, palette='pastel')
plt.title('Count of Male and Female Passengers')
plt.xlabel('Sex')
plt.ylabel('Count')
plt.show()

# Load Iris dataset

iris = sns.load_dataset('iris')

# Summary statistics for sepal length by species

summary_stats = iris.groupby('species')['sepal_length'].agg(['mean',
'median', 'std'])
print("\nSummary Statistics for Sepal Length by Species:")
print(summary_stats)

# Scatter plot for petal length vs. petal width with size representing
# sepal length/sepal width ratio
plt.figure(figsize=(8, 6))
sns.scatterplot(x='petal_length', y='petal_width', hue='species',
                size=iris['sepal_length'] / iris['sepal_width'], palette='viridis',
                sizes=(20, 200), data=iris)
plt.title('Petal Length vs. Petal Width with Size Representing Sepal Ratio')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.legend(title='Species')
plt.show()

Ascending Sort:
{'_id': ObjectId('67a28b8c5aabe18fcc1bf3e8'), 'name': 'Shweta',
'Course': 'Info 6150', 'classsize': 40}
Descending Sort:
{'_id': ObjectId('67a28b8c5aabe18fcc1bf3e8'), 'name': 'Shweta',
'Course': 'Info 6150', 'classsize': 40}
0 documents deleted.
1 documents deleted.
Titanic Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 survived 891 non-null int64
1 pclass 891 non-null int64
2 sex 891 non-null object
3 age 714 non-null float64
4 sibsp 891 non-null int64
5 parch 891 non-null int64
6 fare 891 non-null float64
7 embarked 889 non-null object
8 class 891 non-null category
9 who 891 non-null object
10 adult_male 891 non-null bool
11 deck 203 non-null category
12 embark_town 889 non-null object
13 alive 891 non-null object
14 alone 891 non-null bool
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.7+ KB
None

Basic Statistics:
          survived      pclass   sex         age       sibsp       parch
count   891.000000  891.000000   891  714.000000  891.000000  891.000000
unique         NaN         NaN     2         NaN         NaN         NaN
top            NaN         NaN  male         NaN         NaN         NaN
freq           NaN         NaN   577         NaN         NaN         NaN
mean      0.383838    2.308642   NaN   29.699118    0.523008    0.381594
std       0.486592    0.836071   NaN   14.526497    1.102743    0.806057
min       0.000000    1.000000   NaN    0.420000    0.000000    0.000000
25%       0.000000    2.000000   NaN   20.125000    0.000000    0.000000
50%       0.000000    3.000000   NaN   28.000000    0.000000    0.000000
75%       1.000000    3.000000   NaN   38.000000    1.000000    0.000000
max       1.000000    3.000000   NaN   80.000000    8.000000    6.000000

              fare embarked  class  who adult_male deck  embark_town alive
count   891.000000      889    891  891        891  203          889   891
unique         NaN        3      3    3          2    7            3     2
top            NaN        S  Third  man       True    C  Southampton    no
freq           NaN      644    491  537        537   59          644   549
mean     32.204208      NaN    NaN  NaN        NaN  NaN          NaN   NaN
std      49.693429      NaN    NaN  NaN        NaN  NaN          NaN   NaN
min       0.000000      NaN    NaN  NaN        NaN  NaN          NaN   NaN
25%       7.910400      NaN    NaN  NaN        NaN  NaN          NaN   NaN
50%      14.454200      NaN    NaN  NaN        NaN  NaN          NaN   NaN
75%      31.000000      NaN    NaN  NaN        NaN  NaN          NaN   NaN
max     512.329200      NaN    NaN  NaN        NaN  NaN          NaN   NaN

       alone
count    891
unique     2
top     True
freq     537
mean     NaN
std      NaN
min      NaN
25%      NaN
50%      NaN
75%      NaN
max      NaN

C:\Users\Hp15d\AppData\Local\Temp\ipykernel_15440\3701347398.py:48: FutureWarning:

Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect.

  sns.boxplot(x='class', y='age', data=titanic, palette='coolwarm')

C:\Users\Hp15d\AppData\Local\Temp\ipykernel_15440\3701347398.py:56: FutureWarning:

Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect.

  sns.countplot(x='sex', data=titanic, palette='pastel')

Summary Statistics for Sepal Length by Species:

             mean  median       std
species
setosa      5.006     5.0  0.352490
versicolor  5.936     5.9  0.516171
virginica   6.588     6.5  0.635880

