Knowledge Engineering Record
EX.NO: 1
PERFORM OPERATIONS WITH EVIDENCE BASED LEARNING
DATE:
AIM
To perform operations with an evidence-based model using Python.
ALGORITHM
Data Preparation:
• Import the necessary libraries, such as pandas for data handling and Scikit-Learn
for machine learning.
• Load the dataset from a CSV file into a DataFrame (data).
• Split the dataset into features (X) and the target variable (y).
• Further split the data into training and testing sets using the train_test_split
function from Scikit-Learn. Typically, you use about 80% of the data for training
and 20% for testing.
Model Selection:
• Choose a machine learning model suitable for the problem. In this case, a
Random Forest Classifier is selected. Random forests are an ensemble
learning method used for classification tasks.
Model Training:
• Train the selected model (Random Forest Classifier) on the training
data (X_train and y_train) by calling the fit method on the model instance.
Model Evaluation:
• Use the trained model to make predictions (y_pred) on the test data (X_test).
• Calculate the accuracy of the model's predictions by comparing them to the true
labels (y_test). The accuracy score is a common metric for classification tasks and
is calculated using the accuracy_score function from Scikit-Learn.
• Print the accuracy score to evaluate the model's performance.
Inference or Prediction:
• Load new data from a CSV file into a DataFrame (new_data) to make
predictions on unseen data.
• Use the trained model to predict the target variable for the new data and store the
predictions in the predictions variable.
• You can further process or analyze these predictions as needed for your
application.
PROGRAM
import pandas as pd
from sklearn.model_selection import train_test_split
# Load the dataset and separate the features from the target column
data = pd.read_csv("your_data.csv")
X = data.drop("target_column", axis=1)
y = data["target_column"]
# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a Random Forest Classifier on the training data
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Evaluate the model's predictions on the test set
from sklearn.metrics import accuracy_score
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
# Make predictions on new evidence-based data
new_data = pd.read_csv("new_data.csv")  # Load new evidence-based data
predictions = model.predict(new_data)
DATA SET
OUTPUT
Accuracy: 0.85
RESULT
Thus the operations with an evidence-based model have been implemented
successfully.
EX.NO: 2
PERFORM EVIDENCE BASED ANALYSIS
DATE:
AIM
To perform evidence-based analysis using Python.
ALGORITHM
Data Collection:
• Load data from a CSV file (in this case, 'your_data.csv') into a Pandas DataFrame.
The data represents the information you want to analyze.
Data Preprocessing:
• This section is a placeholder for data cleaning, normalization, and transformation.
You would customize this part to suit your specific dataset and analysis needs.
Common preprocessing steps include handling missing values, encoding
categorical data, and scaling numerical features.
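As a minimal sketch (not part of the original record), such preprocessing with pandas and Scikit-Learn could look like the following; 'category_column' and 'numeric_column' are placeholder column names used only for illustration.
import pandas as pd
from sklearn.preprocessing import StandardScaler
data = pd.read_csv("your_data.csv")
# Handle missing values by filling numeric columns with their median
data = data.fillna(data.median(numeric_only=True))
# Encode a categorical column into numeric dummy variables (placeholder name)
data = pd.get_dummies(data, columns=["category_column"])
# Scale a numerical feature to zero mean and unit variance (placeholder name)
scaler = StandardScaler()
data[["numeric_column"]] = scaler.fit_transform(data[["numeric_column"]])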
EDA (Exploratory Data Analysis):
• Visualize your data using a pairplot created with Seaborn. EDA is essential for
understanding the data's characteristics and relationships between variables.
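For example, a Seaborn pairplot of the loaded data can be produced in a few lines (a minimal sketch, reusing 'your_data.csv' from the data collection step):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv("your_data.csv")
# Pairwise scatter plots and histograms for all numeric columns
sns.pairplot(data)
plt.show()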
Hypothesis Testing:
• Conduct a statistical test (t-test in this example) to assess whether there is a
significant difference between two groups (group1 and group2). The result of the
test includes the test statistic and p-value.
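A minimal sketch of such a t-test with SciPy is shown below; 'group_column' and 'value_column' are placeholder column names assumed only for illustration.
import pandas as pd
from scipy import stats
data = pd.read_csv("your_data.csv")
# Two independent samples split on a hypothetical grouping column
group1 = data[data["group_column"] == "A"]["value_column"]
group2 = data[data["group_column"] == "B"]["value_column"]
t_stat, p_value = stats.ttest_ind(group1, group2)
print("t-statistic:", t_stat, "p-value:", p_value)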
Machine Learning:
• Train a linear regression model to predict a target variable using features 'feature1'
and 'feature2'. Evaluate the model's performance on a test set by calculating the
mean squared error (MSE).
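The program later in this record demonstrates linear regression on synthetic data; a minimal sketch that instead follows this step's description is given below, where 'feature1' and 'feature2' come from the step above and 'target' is a placeholder column name.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
data = pd.read_csv("your_data.csv")
# 'feature1' and 'feature2' are named in the algorithm; 'target' is a placeholder
X = data[["feature1", "feature2"]]
y = data["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))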
Statistical Analysis:
• This section is a placeholder for additional statistical analyses you may need based
on your research or analysis objectives. You should insert specific statistical tests
and analyses here.
Data Visualization:
• Create informative plots and charts to present the data and analysis results
visually. This section is a placeholder for adding the appropriate visualizations for
your analysis.
Reporting and Documentation:
• Document your analysis process and results. Effective documentation is crucial for
sharing your findings and insights with others.
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Generate synthetic data: y = 4 + 3x plus Gaussian noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
# Visualize the generated data
plt.scatter(X, y, alpha=0.5)
plt.title('Generated Data for Linear Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.show()
# Fit a linear regression model to the data
model = LinearRegression()
model.fit(X, y)
# Predict at the end points of the input range and plot the fitted line
X_new = np.array([[0], [2]])
y_pred = model.predict(X_new)
plt.scatter(X, y, alpha=0.5)
plt.plot(X_new, y_pred, color='red', linewidth=2, label='Linear Regression')
plt.title('Linear Regression Analysis')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
# Report the learned parameters (the true values are 4 and 3)
print(f'Intercept: {model.intercept_[0]}')
print(f'Coefficient: {model.coef_[0][0]}')
OUTPUT
RESULT
Thus the evidence-based analysis has been implemented successfully.
EX.NO: 3
PERFORM OPERATIONS ON PROBABILITY BASED REASONING
DATE:
AIM
To perform operations on probability-based reasoning using Python.
ALGORITHM
Binomial Distribution
• Define the parameters for a binomial experiment (number of trials n, success
probability p, and number of successes k).
• Use the binom.pmf function from scipy.stats to calculate the probability.
• Display the result.
Normal Distribution
• Define the parameters for a normal distribution operation.
• Use the norm.cdf function from scipy.stats to calculate the cumulative
probability.
• Display the result.
Conditional Probability
• Define the probabilities and apply Bayes' theorem to calculate conditional
probability.
• Display the conditional probability result.
Random Sampling
• Define a population and sample size.
• Use the np.random.choice function from numpy to simulate random
sampling.
• Display the random sample.
PROGRAM
import numpy as np
from scipy.stats import binom, norm
# Binomial distribution: probability of exactly k successes in n trials
n = 3
p = 0.5
k = 2
probability = binom.pmf(k, n, p)
print(f"Probability of getting exactly {k} heads in {n} coin flips: {probability:.4f}")
# Normal distribution: cumulative probability below z
z = 1
cumulative_probability = norm.cdf(z)
print(f"Cumulative Probability (Z < {z}): {cumulative_probability:.4f}")
# Conditional probability via Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
P_A = 0.4
P_B_given_A = 0.3
P_B = 0.5  # example value assumed for P(B), which Bayes' theorem requires
P_A_given_B = (P_B_given_A * P_A) / P_B
print(f"Conditional Probability P(A|B): {P_A_given_B:.4f}")
# Random sampling with replacement from a small population
population = [1, 2, 3, 4, 5]
sample_size = 3
random_sample = np.random.choice(population, size=sample_size, replace=True)
print(f"Random Sample: {random_sample}")
OUTPUT
Random Sample: [3 5 2]
RESULT
Thus the operations on probability-based reasoning have been implemented
successfully.
EX.NO: 4
PERFORM BELIEVABILITY BASED ANALYSIS
DATE:
AIM
To perform believability analysis using Python.
ALGORITHM
1. Initialize the SentimentIntensityAnalyzer from the NLTK library to perform
sentiment analysis.
2. Define a function analyze_believability(text) to analyze the believability of a
given text based on sentiment analysis.
• Input: text (the text to be analyzed)
• Output: believability (a numerical score representing believability)
3. Perform sentiment analysis using the SentimentIntensityAnalyzer:
4. Define a function analyze_source_credibility(url) to analyze the credibility of a
given source URL.
• Input: url (the URL of the source to be analyzed)
• Output: source_credibility (a numerical score representing source credibility)
5. Inside the analyze_source_credibility(url) function:
• Use the requests library to fetch the webpage content from the provided
URL.
• Parse the HTML content using BeautifulSoup to extract relevant
information, such as the author, publication date, and source credibility
indicators. The extraction logic may vary depending on the webpage
structure (a sketch of such an extraction follows this algorithm).
6. In the main part of the code (if __name__ == "__main__":), provide a sample text
and source URL for analysis.
7. Call analyze_believability(text) to calculate the believability score for the given
text.
8. Call analyze_source_credibility(source_url) to calculate the source credibility
score for the provided source URL.
9. If both believability and source credibility scores are successfully calculated,
compute the final believability score as the average of the two scores.
10. Print the final believability score.
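Step 5 describes extracting page metadata with BeautifulSoup, while the program below uses a fixed placeholder credibility score. A minimal sketch of such an extraction is given here; the meta tags looked up and the scoring rule are purely illustrative assumptions.
import requests
from bs4 import BeautifulSoup
def extract_source_signals(url):
    # Fetch the page and parse the HTML
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Look for common metadata tags; many pages may not provide them
    author_tag = soup.find('meta', attrs={'name': 'author'})
    date_tag = soup.find('meta', attrs={'property': 'article:published_time'})
    # Illustrative scoring: start neutral, reward the presence of metadata
    score = 0.5
    if author_tag is not None:
        score += 0.2
    if date_tag is not None:
        score += 0.2
    return min(score, 1.0)
A helper like this could replace the fixed source_credibility = 0.7 inside analyze_source_credibility in the program below.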
PROGRAM
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import requests
from bs4 import BeautifulSoup
nltk.download('vader_lexicon')
sid = SentimentIntensityAnalyzer()
def analyze_believability(text):
    # Higher negative sentiment lowers the believability score
    sentiment_scores = sid.polarity_scores(text)
    believability = 1.0 - sentiment_scores['neg']
    return believability
def analyze_source_credibility(url):
    try:
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        # Placeholder credibility score; replace with real extraction logic
        source_credibility = 0.7
        return source_credibility
    except Exception as e:
        print(f"Error fetching or analyzing the source: {e}")
        return None
if __name__ == "__main__":
    text = "This is a sample text that you want to analyze for believability."
    source_url = "https://www.example.com/sample-article"
    believability_score = analyze_believability(text)
    source_credibility_score = analyze_source_credibility(source_url)
    if believability_score is not None and source_credibility_score is not None:
        final_believability_score = (believability_score + source_credibility_score) / 2
        print(f"Believability Score: {final_believability_score}")
    else:
        print("Unable to calculate believability due to errors.")
OUTPUT
RESULT
Thus the believability analysis has been implemented
successfully.
EX.NO: 5
IMPLEMENT RULE LEARNING AND REFINEMENT
DATE:
AIM
To implement rule learning and refinement using Python.
ALGORITHM
1. Load the Iris dataset and separate it into features (X) and the target variable (y).
2. Split the data into training and testing sets.
3. Train a Decision Tree Classifier on the training data; the learned tree encodes a set of classification rules.
4. Predict the classes of the test set with the trained classifier.
5. Evaluate the predictions with the accuracy score and refine the learned rules (for example, by adjusting the tree depth) if the accuracy is unsatisfactory.
PROGRAM
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Load the Iris dataset and split it into training and testing sets
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Learn classification rules with a decision tree
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
# Evaluate the learned rules on the test set
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
OUTPUT
RESULT
Thus rule learning and refinement have been implemented
successfully.
EX.NO: 6
PERFORM ANALYSIS BASED ON LEARNED PATTERNS
DATE:
AIM
To perform analysis based on learned patterns using Python.
ALGORITHM
a. Data Collection:
i. Gather and collect the dataset relevant to your analysis. Ensure that the
data is clean, well-structured, and contains the necessary information
for pattern discovery.
b. Data Preprocessing:
i. Handle missing data by imputing or removing it as appropriate.
ii. Normalize or scale numerical features to ensure they are on a
similar scale. Encode categorical variables into numerical
values.
c. Data Split:
i. Split the dataset into a training set and a testing/validation set. The training
set is used to learn patterns, and the testing set is used to evaluate the model's
performance.
d. Pattern Learning:
i. Choose an appropriate machine learning algorithm, such as decision trees,
random forests, neural networks, or clustering algorithms, depending on
the type of analysis you want to perform.
ii. Train the selected model on the training data.
e. Model Evaluation:
i. Evaluate the model's performance on the testing/validation dataset.
Common evaluation metrics include accuracy, precision, recall, F1-score,
and ROC-AUC, depending on the nature of your analysis (classification,
regression, clustering, etc.).
f. Iterate:
i. If the initial analysis does not meet your objectives, consider iterating
through the process: adjusting hyperparameters, trying different algorithms, or
collecting more data.
g. Deployment:
i. If the analysis meets your goals, deploy the model to make
predictions or inform decision-making.
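The program below illustrates this workflow with a simple regression on study hours. As a complementary sketch (not part of the original record), steps c, d, and e for a classification task could look like the following, using the built-in Iris dataset purely for illustration.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
# c. Split the data into training and testing sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# d. Learn patterns with a random forest
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
# e. Evaluate the learned patterns on the held-out set
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1-score:", f1_score(y_test, y_pred, average='macro'))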
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Example data (assumed for illustration): hours studied vs. exam scores
study_hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
exam_scores = np.array([35, 45, 52, 60, 66, 72, 81, 88])
# Learn the pattern and plot it against the actual data
model = LinearRegression()
model.fit(study_hours, exam_scores)
predicted_scores = model.predict(study_hours)
plt.scatter(study_hours, exam_scores, label='Actual Scores')
plt.plot(study_hours, predicted_scores, color='red', label='Learned Pattern')
plt.xlabel('Study Hours')
plt.ylabel('Exam Scores')
plt.legend()
plt.show()
OUTPUT
RESULT
Thus the analysis based on learned patterns has been implemented
successfully.
EX.NO: 7
CONSTRUCT AN ONTOLOGY FOR A GIVEN DOMAIN
DATE:
AIM
To construct an ontology for a given domain using Python.
ALGORITHM
a. Create a Graph:
Create an empty RDF graph (for example, using the rdflib library) to hold the ontology.
b. Define Namespace:
Define namespaces for your ontology.
c. Define Classes:
Define classes in your ontology using RDF triples.
d. Define Properties:
Define properties (attributes or relations) for your classes, if necessary.
e. Define Individual:
Define individuals (instances) and specify their types by adding triples.
f. Specify Relationship:
Establish relationships between individuals and properties.
PROGRAM
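The program is left blank in the source record; the following is a minimal sketch that follows the steps above using the rdflib library, with an example university domain whose class, property, and individual names are all illustrative assumptions.
from rdflib import Graph, Namespace, Literal, RDF, RDFS
# a. Create an empty RDF graph to hold the ontology
g = Graph()
# b. Define and bind a namespace for the example domain
EX = Namespace("http://example.org/university#")
g.bind("ex", EX)
# c. Define classes with RDF triples
g.add((EX.Person, RDF.type, RDFS.Class))
g.add((EX.Student, RDF.type, RDFS.Class))
g.add((EX.Student, RDFS.subClassOf, EX.Person))
g.add((EX.Course, RDF.type, RDFS.Class))
# d. Define a property relating students to courses
g.add((EX.enrolledIn, RDF.type, RDF.Property))
g.add((EX.enrolledIn, RDFS.domain, EX.Student))
g.add((EX.enrolledIn, RDFS.range, EX.Course))
# e. Define individuals and specify their types
g.add((EX.Alice, RDF.type, EX.Student))
g.add((EX.KE101, RDF.type, EX.Course))
g.add((EX.KE101, RDFS.label, Literal("Knowledge Engineering")))
# f. Specify the relationship between individuals
g.add((EX.Alice, EX.enrolledIn, EX.KE101))
# Print the resulting ontology in Turtle syntax
print(g.serialize(format="turtle"))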
OUTPUT
RESULT
Thus the construction of an ontology for a given domain has been implemented
successfully.