Lab 3 Data Mining NoSQL - Harshil - Parmar
What is MongoDB?
MongoDB is a document database that can be installed on a local machine or hosted in the
cloud. The cloud-hosted flavour of MongoDB is called MongoDB Atlas.
It stores JSON-like documents, providing flexibility and scalability.
For this lab we will download the MongoDB Community Server from
https://www.mongodb.com/try/download/community
PyMongo
PyMongo is the Python driver used to access a MongoDB database from Python code.
Collecting pymongo
Downloading pymongo-4.11-cp313-cp313-win_amd64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
Downloading dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Downloading pymongo-4.11-cp313-cp313-win_amd64.whl (932 kB)
Downloading dnspython-2.7.0-py3-none-any.whl (313 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.7.0 pymongo-4.11
import pymongo as pm
1. Create a collection by specifying its name (if it does not already exist).
2. MongoDB does not actually create the collection until it holds at least one document,
so referencing a name alone never leaves an empty collection (table) in the database.
3. Insert a single document (the equivalent of a record in an SQL table) using the
insert_one() method.
mycollection = db["class"]
x = mycollection.insert_one(mydict)
print(x.inserted_id)
67a28a5f5aabe18fcc1bf3e6
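The insert step above can be sketched as a small helper. This is a minimal sketch rather than the lab's original cell: the helper name `insert_class_record` is hypothetical, the document fields are taken from the sorted output shown later in this lab, and the connection string and database name in the usage comment are assumptions.

```python
def insert_class_record(collection, record):
    """Insert one document and return the _id MongoDB assigned to it."""
    # insert_one() returns an InsertOneResult; its inserted_id attribute
    # holds the _id of the new document (an ObjectId when none is supplied).
    result = collection.insert_one(record)
    return result.inserted_id


# Document fields as they appear in the sorted output later in this lab.
mydict = {"name": "Shweta", "Course": "Info 6150", "classsize": 40}

# Usage against a local server (assumed URI and database name):
#   import pymongo as pm
#   client = pm.MongoClient("mongodb://localhost:27017/")
#   mycollection = client["mydatabase"]["class"]  # created on first insert
#   print(insert_class_record(mycollection, mydict))
```

The helper only needs an object with an insert_one() method, so it works with any PyMongo collection.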
x = mycollection.insert_many(mylist)
print(x.inserted_ids)
[100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300,
1400]
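The list passed to insert_many() is not shown above. One plausible reconstruction, assuming the names and grades printed in the next section and the _id values printed here, is sketched below. Pairing each _id with a name in this order is an assumption, and `insert_students` is a hypothetical helper name.

```python
names = ["John", "Peter", "Amy", "Hannah", "Michael", "Sandy", "Betty",
         "Richard", "Susan", "Vicky", "Ben", "William", "Chuck", "Viola"]
grades = [30, 40, 50, 60, 70, 80, 80, 70, 60, 50, 85, 75, 65, 55]

# Explicit _id values 100, 200, ..., 1400 (assumed order of assignment).
mylist = [
    {"_id": 100 * (i + 1), "name": name, "grade": grade}
    for i, (name, grade) in enumerate(zip(names, grades))
]


def insert_students(collection, documents):
    """Insert several documents and return their _id values in order."""
    # insert_many() returns an InsertManyResult; inserted_ids lists the
    # _id of every inserted document.
    return collection.insert_many(documents).inserted_ids
```

Because the _id values are supplied explicitly, running the same insert a second time against the same collection fails with a duplicate-key error.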
Return only specific document information and not all data points
{'name': 'Shweta'}
{'name': 'John', 'grade': 30}
{'name': 'Peter', 'grade': 40}
{'name': 'Amy', 'grade': 50}
{'name': 'Hannah', 'grade': 60}
{'name': 'Michael', 'grade': 70}
{'name': 'Sandy', 'grade': 80}
{'name': 'Betty', 'grade': 80}
{'name': 'Richard', 'grade': 70}
{'name': 'Susan', 'grade': 60}
{'name': 'Vicky', 'grade': 50}
{'name': 'Ben', 'grade': 85}
{'name': 'William', 'grade': 75}
{'name': 'Chuck', 'grade': 65}
{'name': 'Viola', 'grade': 55}
In any of the projections above, 0 (exclude) and 1 (include) cannot be mixed among the return
fields. The only exception is the _id field (the primary key), which may be set to 0 while other
fields are set to 1.
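The projection rule can be illustrated with a small check. This is a simplified sketch: `projection_is_valid` is a hypothetical helper, not part of PyMongo, and it only considers plain 0/1 values, not projection operators.

```python
def projection_is_valid(projection):
    """Return True if a 0/1 projection does not mix inclusion and exclusion.

    The _id field is skipped because it is the one field that may be
    excluded while other fields are included.
    """
    flags = {bool(value) for field, value in projection.items() if field != "_id"}
    return len(flags) <= 1


print(projection_is_valid({"_id": 0, "name": 1, "grade": 1}))  # inclusion only
print(projection_is_valid({"name": 1, "grade": 0}))            # mixed

# Usage with a PyMongo collection (requires a running server):
#   for x in mycollection.find({}, {"_id": 0, "name": 1, "grade": 1}):
#       print(x)
```

The projection in the usage comment is consistent with the output listed above, where each document shows only name and grade.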
for x in doc:
    print(x)
for x in doc:
    print(x)
doc = mycollection.find({"name": {"$regex": "^V"}})
for x in doc:
    print(x)
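To see what the "^V" pattern selects without a database, the same anchor can be tried with Python's re module. This is only an illustration: Python's regex engine stands in for MongoDB's here, which is a safe substitution for a simple start-of-string anchor like this one. The name list is taken from the output earlier in this lab.

```python
import re

names = ["John", "Peter", "Amy", "Hannah", "Michael", "Sandy", "Betty",
         "Richard", "Susan", "Vicky", "Ben", "William", "Chuck", "Viola"]

# "^V" matches any string that starts with an uppercase V.
pattern = re.compile("^V")
matches = [name for name in names if pattern.search(name)]
print(matches)
```

The match is case-sensitive, so a name beginning with a lowercase "v" would not be selected.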
Sorting
mydoc = mycollection.find().sort("name")
for x in mydoc:
    print(x)
1. Using the logic above, implement ascending and descending sorts. Hint:
sort("name", 1)   # ascending
sort("name", -1)  # descending
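One way to approach the exercise is a helper that takes the sort direction as a parameter. This is a minimal sketch: `names_sorted` is a hypothetical helper name, and it assumes it is handed a PyMongo collection such as mycollection.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def names_sorted(collection, direction=ASCENDING):
    """Return the name of every document, ordered by the "name" field."""
    # Project only the name field, then let the server sort the results.
    cursor = collection.find({}, {"_id": 0, "name": 1}).sort("name", direction)
    return [doc.get("name") for doc in cursor]


# Usage (requires a running server):
#   print(names_sorted(mycollection))              # ascending
#   print(names_sorted(mycollection, DESCENDING))  # descending
```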
mycollection.drop()
Similarly, we can update one document or many documents using the same logic:
oldvalues = {"name": "Shweta"}
newvalues = {"$set": {"name": "Shwetz"}}
mycollection.update_one(oldvalues, newvalues)
mycollection.update_many(oldvalues, newvalues)
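The update calls above can be wrapped so that the number of changed documents is reported. This is a sketch under assumptions: `rename_student` is a hypothetical helper, and the filter and new value follow the Shweta example.

```python
def rename_student(collection, old_name, new_name, many=False):
    """Rename matching documents and return how many were modified."""
    query = {"name": old_name}
    # $set takes a sub-document mapping each field to its new value.
    change = {"$set": {"name": new_name}}
    if many:
        result = collection.update_many(query, change)
    else:
        result = collection.update_one(query, change)
    # modified_count is 0 when nothing matched or nothing changed.
    return result.modified_count


# Usage (requires a running server):
#   print(rename_student(mycollection, "Shweta", "Shwetz"))
```

update_one() changes at most the first matching document, while update_many() changes every match.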
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["mydatabase"]
mycollection = db["class"]
# Sorting documents
ascending_sort = mycollection.find().sort("name", 1)  # Ascending order
print("Ascending Sort:")
for doc in ascending_sort:
    print(doc)
Ascending Sort:
{'_id': ObjectId('67a28b8c5aabe18fcc1bf3e8'), 'name': 'Shweta',
'Course': 'Info 6150', 'classsize': 40}
Descending Sort:
{'_id': ObjectId('67a28b8c5aabe18fcc1bf3e8'), 'name': 'Shweta',
'Course': 'Info 6150', 'classsize': 40}
0 documents deleted.
1 documents deleted.
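The code that produced the two delete counts is not shown. A sketch of how such messages could be generated follows: `delete_by_name` is a hypothetical helper, and filtering on the name field is an assumption.

```python
def delete_by_name(collection, name):
    """Delete every document with the given name and report the count."""
    # delete_many() returns a DeleteResult; deleted_count is the number of
    # documents removed (0 when nothing matched the filter).
    result = collection.delete_many({"name": name})
    return f"{result.deleted_count} documents deleted."


# Usage (requires a running server):
#   print(delete_by_name(mycollection, "Shweta"))
```

A count of 0 simply means the filter matched nothing. It is not reported as an error.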
Titanic Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   survived     891 non-null    int64
 1   pclass       891 non-null    int64
 2   sex          891 non-null    object
 3   age          714 non-null    float64
 4   sibsp        891 non-null    int64
 5   parch        891 non-null    int64
 6   fare         891 non-null    float64
 7   embarked     889 non-null    object
 8   class        891 non-null    category
 9   who          891 non-null    object
 10  adult_male   891 non-null    bool
 11  deck         203 non-null    category
 12  embark_town  889 non-null    object
 13  alive        891 non-null    object
 14  alone        891 non-null    bool
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.7+ KB
None
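The non-null counts in the summary above show which columns have missing values. The arithmetic is sketched below, using only figures from that summary (891 rows in total); columns with 891 non-null entries are omitted.

```python
total_rows = 891

# Non-null counts copied from the info() summary for the incomplete columns.
non_null = {"age": 714, "embarked": 889, "deck": 203, "embark_town": 889}

# Missing entries = total rows minus non-null entries.
missing = {column: total_rows - count for column, count in non_null.items()}

for column, count in missing.items():
    print(f"{column}: {count} missing ({count / total_rows:.1%})")
```

With the DataFrame itself in hand, the same counts would normally come from pandas' isna().sum().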
Basic Statistics:
          survived      pclass   sex         age       sibsp       parch  \
count   891.000000  891.000000   891  714.000000  891.000000  891.000000
unique         NaN         NaN     2         NaN         NaN         NaN
top            NaN         NaN  male         NaN         NaN         NaN
freq           NaN         NaN   577         NaN         NaN         NaN
mean      0.383838    2.308642   NaN   29.699118    0.523008    0.381594
std       0.486592    0.836071   NaN   14.526497    1.102743    0.806057
min       0.000000    1.000000   NaN    0.420000    0.000000    0.000000
25%       0.000000    2.000000   NaN   20.125000    0.000000    0.000000
50%       0.000000    3.000000   NaN   28.000000    0.000000    0.000000
75%       1.000000    3.000000   NaN   38.000000    1.000000    0.000000
max       1.000000    3.000000   NaN   80.000000    8.000000    6.000000

        alone
count     891
unique      2
top      True
freq      537
mean      NaN
std       NaN
min       NaN
25%       NaN
50%       NaN
75%       NaN
max       NaN
C:\Users\Hp15d\AppData\Local\Temp\ipykernel_15440\3701347398.py:48:
FutureWarning: