Data Analytics Project

This document contains code to extract data from the Facebook and Twitter APIs and to build predictive models for diabetes classification. It includes code to:

1. Extract comments from a public Facebook page post using the Graph API.
2. Extract the most recent tweets matching a keyword search using the Twitter API.
3. Build logistic regression, SVM, random forest, and decision tree models on a diabetes dataset to classify patients and compare their performance.


Exercise 1: Extract Facebook data available on any public page, such as Amazon.

Code:
import requests
import json

access_token = 'EAADraRiwnasBALJgEL4vbyvv2DTJvAYjBlLfk1iO0xgL56Vf70mE1MYlvdv2A5RupQZBOctpcE8Qdu1COESmobBxTwC6DFTOrbaXCRWcBzsZB6wlZBuzFSx5AgvXZAfLnp9etZBBTHwCL9U5klw4Q9sBFpmfVAEiJCZBFMD2CXCXyS5sPepoEqCDfY32DeUUoZD'

post_id = "1973749942700563"

URL = 'https://graph.facebook.com/v3.2/' + post_id + '/comments'

PARAMS = {'access_token': access_token}

# sending GET request and saving the response as a response object
r = requests.get(url=URL, params=PARAMS)

# extracting data in JSON format
data = r.json()

# print id, creation time, and message for each comment on the post
for comment in data['data']:
    print "----------------------------------------------------------\n\n"
    print "id:", comment['id'], "\n", "created_time:", comment['created_time'], "\n", "message:", comment['message']
    print "----------------------------------------------------------\n\n"
Output:
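
The request above returns only the first page of comments. As a rough extension of the exercise (not part of the original code), the sketch below follows the paging.next link that the Graph API includes in each response to walk through the remaining pages; it assumes the same access_token and post_id as above and that the token is still valid for reading the post.

# Hedged sketch: paginate through all comments by following 'paging.next'.
# Assumes access_token and post_id are defined exactly as in the code above.
import requests

url = 'https://graph.facebook.com/v3.2/' + post_id + '/comments'
params = {'access_token': access_token}

all_comments = []
while url:
    resp = requests.get(url=url, params=params).json()
    all_comments.extend(resp.get('data', []))
    # 'paging.next' already embeds the token and cursor, so drop the params
    url = resp.get('paging', {}).get('next')
    params = {}

print "total comments fetched:", len(all_comments)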
Exercise 2: Extract the 1000 latest tweets from Twitter using any keyword.

Code:
from twitter import Twitter, OAuth, TwitterStream
import json

ACCESS_TOKEN = '1064460892694241281-yHNHebYDMQgaoEjLD8BrcyVpDzIeGf'
ACCESS_SECRET = 'DQFUjh3TklipgH9dN6cGIlCW6KPXok2Q3oiN6HNJARxRM'
CONSUMER_KEY = '4mGaUsqkD2EyHkagZpHKOpBXF'
CONSUMER_SECRET = '5xI9anpq1O0F5CT5Gj5TC9tzz3s4pDfjxmtLIse88clK9E4REy'

oauth = OAuth(ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, CONSUMER_SECRET)
twitter = Twitter(auth=oauth)
#print twitter.GetFriends()

# search the most recent English tweets matching the keyword
twt = twitter.search.tweets(q='machine learning', result_type='recent',
                            lang='en', count=5)

i = 0
for tweet in twt['statuses']:
    print "Tweet_count: ", i
    print "id:", tweet['id'], "\n", "text:", tweet['text'], "\n\n"
    i = i + 1
Output:
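
The exercise asks for 1000 tweets, but the call above requests only 5 per search. As a rough sketch (not part of the original code), the snippet below pages backwards through results with the max_id cursor until up to 1000 tweets are collected; it assumes the same `twitter` client as above and that the v1.1 search/tweets endpoint (at most 100 results per request) is still accessible with these credentials.

# Hedged sketch: collect up to 1000 recent tweets by paging with max_id.
# Assumes the `twitter` client from the code above is already authenticated.
collected = []
max_id = None

while len(collected) < 1000:
    kwargs = {'q': 'machine learning', 'result_type': 'recent',
              'lang': 'en', 'count': 100}
    if max_id is not None:
        kwargs['max_id'] = max_id
    batch = twitter.search.tweets(**kwargs)['statuses']
    if not batch:
        break  # no more tweets available for this query
    collected.extend(batch)
    # next page: everything strictly older than the oldest tweet seen so far
    max_id = min(t['id'] for t in batch) - 1

print "collected tweets:", len(collected)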
Exercise 3: Design a predictive model for diabetes on the given dataset of 535 patients using the following machine learning techniques:
1. Logistic Regression
2. SVM
3. Random Forest
4. Decision Tree

Code:
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

import numpy

# load the dataset: first 9 columns are features, last column is the class label
A = numpy.loadtxt(open("data.csv", "rb"), delimiter=",", skiprows=1)

X_features = A[:, :9]
y_targets = A[:, 9:]

X_train, X_test, y_train, y_test = train_test_split(X_features, y_targets,
                                                    test_size=0.4, random_state=0)

print "Support Vector Machine:"
svm_model = svm.SVC(kernel='linear', C=1).fit(X_train, y_train.ravel())
print "Score: ", svm_model.score(X_test, y_test.ravel())

print "-----------------------------------------------------------------------"
print "Decision Tree:"
max_score = ()
max_val = 0
# try odd max_depth values from 1 to 99 and keep the best test score
for i in range(1, 100, 2):
    dtree_model = DecisionTreeClassifier(max_depth=i).fit(X_train, y_train.ravel())
    curr_score = dtree_model.score(X_test, y_test.ravel())
    if max_val < curr_score:
        max_val = curr_score
        max_score = (i, curr_score)
print "max_score: ", max_score[1], "max_depth: ", max_score[0]

print "-----------------------------------------------------------------------"
print "Random Forest:"
rf_model = RandomForestClassifier(n_estimators=100, n_jobs=-1).fit(X_train, y_train.ravel())
rf_score = rf_model.score(X_test, y_test.ravel())
print "Score: ", rf_score

print "-----------------------------------------------------------------------"
print "Logistic Regression:"
lr_model = LogisticRegression(penalty='l1', C=50).fit(X_train, y_train.ravel())
print "Score: ", lr_model.score(X_test, y_test.ravel())


Output:
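
A single 60/40 train/test split can make the comparison between the four models noisy. As a rough extension of the exercise (not part of the original code), the sketch below scores the same four classifiers with 5-fold cross-validation on the full dataset; it assumes X_features and y_targets are loaded from data.csv exactly as above, and the decision tree's max_depth=5 is only an illustrative choice rather than the depth found by the search loop.

# Hedged sketch: compare the four classifiers with 5-fold cross-validation
# instead of a single train/test split. Assumes X_features and y_targets are
# loaded from data.csv as in the code above.
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

models = [
    # solver='liblinear' is stated explicitly so the l1 penalty is supported
    ("Logistic Regression", LogisticRegression(penalty='l1', C=50, solver='liblinear')),
    ("SVM (linear)", svm.SVC(kernel='linear', C=1)),
    ("Random Forest", RandomForestClassifier(n_estimators=100, n_jobs=-1)),
    # max_depth=5 is an illustrative value, not the tuned depth from above
    ("Decision Tree", DecisionTreeClassifier(max_depth=5)),
]

for name, model in models:
    scores = cross_val_score(model, X_features, y_targets.ravel(), cv=5)
    print name, "mean accuracy:", scores.mean(), "+/-", scores.std()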
