Generative AI Report
Submitted by,
A NAVEENA 20211ISE0034
NISHA M 20211ISE0021
BHAVANA 20221LIE0002
Of
Ms. POORNIMA
PRESIDENCY UNIVERSITY,
BENGALURU
DECEMBER 2024
TABLE OF CONTENTS:
ABSTRACT
INTRODUCTION
LITERATURE REVIEW
RESEARCH GAPS OF EXISTING METHODOLOGY
PROPOSED METHODOLOGY
OBJECTIVES
SYSTEM DESIGN AND IMPLEMENTATION
OUTCOMES
CODE
RESULTS
CONCLUSION
APPENDICES
REFERENCES
ABSTRACT:
Plagiarism is an unethical act of using someone else's work or
ideas without giving them credit, which is a growing problem in
various fields. However, the current systems for plagiarism
detection require revealing the full content of input documents
and document collections, which can raise procedural and legal
concerns regarding data confidentiality, limiting or prohibiting
the use of plagiarism detection services. To address these issues,
we aim to create a plagiarism detection approach that doesn't
need a centralized provider or expose any content as cleartext.
Our research has produced initial results showing that our
content-protecting method achieves the same detection
effectiveness as the original method while making it practically
impossible to reveal the protected content through common
attacks. Various techniques, such as manual detection, text
similarity analysis, and automated plagiarism detection using
machine learning, have been developed to prevent plagiarism.
This paper focuses on machine learning techniques for
plagiarism detection and discusses different approaches,
algorithms, and datasets used in detecting plagiarism, along
with their advantages and limitations. The paper also presents
some future research directions in this area.
INTRODUCTION:
Plagiarism has become a major issue in academic and other
fields, as it can harm the author's reputation and the credibility
of their research work. Plagiarism is the act of using someone
else's work, ideas, or words without proper credit, and it can
occur intentionally or unintentionally through various forms,
such as copying and pasting, paraphrasing, or using synonyms.
Plagiarism detection systems (PDS) typically require users to
submit input documents, which the systems compare to a large
proprietary database of documents to retrieve similar content
and highlight it for user inspection. There are two types of
plagiarism:
a. Unintentional Plagiarism
• Paraphrasing poorly: changing a few words without changing
the sentence structure of the original, or changing the sentence
structure but not the words.
• Quoting poorly: putting quotation marks around part of a
quotation but not around all of it, or putting quotation marks
around a passage that is partly paraphrased and partly quoted.
• Citing poorly: omitting an occasional citation or citing
inaccurately.
b. Intentional Plagiarism
• Presenting pre-existing papers found on the Internet or
elsewhere as one's own work.
• Reproducing an essay or article from the Internet, an online
resource, or an electronic database without proper citation or
acknowledgment.
• Creating a paper by merging material from various sources
without attribution or citation.
• Taking language or concepts from other sources or classmates
without properly acknowledging the origin of the information.
LITERATURE REVIEW:
2.1 Paper 1 - Plagiarism Detection in Programming Assignments using Machine
Learning
Nishesh Awale, Mitesh Pandey, Anish Dulal, Department of Electronics and
Computer Engineering, Pulchowk Campus, Lalitpur, Nepal.
Plagiarism in programming assignments has risen in recent years, which has a
negative impact on how students are evaluated. This article suggests using a
machine learning technique to detect plagiarism in programming assignments.
Methodology: The work aims to identify and eliminate copied reports and to
highlight the importance of students writing assignments on their own.
Findings: Various characteristics of each pair of programming assignments
were calculated, and an XGBoost model was employed to classify the pairs. The
accuracy score achieved was 92%.
2.2 Paper 2 - Plagiarism Detector Using Machine Learning Algorithms
The easy accessibility of vast information resources has led to an increase in
plagiarism in free text. To address this issue, automated plagiarism detection
systems are used to identify plagiarized content in large databases. However,
this task is complicated by advanced plagiarism methods like paraphrasing
and summarizing that conceal the occurrence of plagiarism.
Methodology: Paraphrase recognition is treated as an NLP task, and the
objective of the study is to propose a unified technique to detect plagiarism.
The approach is compared with that of a sim plagiarism detector.
Findings: Operating the system does not require any complex directions or
training. It is a time-efficient plagiarism detection system.
2.3 Paper 3 - Complex Dynamic Event Participant in an Event-Based Social
Network: A Three-Dimensional Matching
The current methods primarily concentrate on organizing techniques that
involve users and events on an EBSN (event-based social network) platform in
an offline situation, where all data is known in advance.
Methodology: Detection uses feature extraction from the Ultra-Fined Trained
repositories, which are extracted using data mining techniques and NLP.
Findings: A fully connected layer implementation using PyTorch is reported to
reach 100 percent accuracy, which confirms to the user whether someone else
actually wrote the text, and provides immediate feedback. This tool lessens
the dependency on
human
Drawbacks:
PROPOSED METHODOLOGY:
1. Preprocessing the Input Data
Text Extraction:
o Extract plain text from the input document(s). This step handles
various file formats like .txt, .docx, .pdf, etc., using libraries like
docx for Word documents or PyPDF2 for PDF files.
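The extraction step can be sketched as follows. This is a minimal illustration, assuming the python-docx package and PyPDF2 version 2.0 or later are installed; the function name extract_text is hypothetical and not part of the original system.

```python
from pathlib import Path


def extract_text(path):
    """Return the plain text of a .txt, .docx, or .pdf file."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="ignore")
    if suffix == ".docx":
        import docx  # provided by the python-docx package
        document = docx.Document(str(path))
        return "\n".join(p.text for p in document.paragraphs)
    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # assumes PyPDF2 >= 2.0
        reader = PdfReader(str(path))
        # extract_text() may return None or "" for image-only pages
        return "\n".join((page.extract_text() or "") for page in reader.pages)
    raise ValueError(f"Unsupported file type: {suffix}")
```

The third-party imports are placed inside their branches so that plain .txt input works even when the Word and PDF libraries are not installed.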
Normalization:
o Convert the text to lowercase.
o Remove special characters, numbers, and extra spaces.
o Tokenize the text into smaller units (e.g., sentences or words).
o Lemmatize or stem words to reduce them to their base forms (e.g.,
"running" → "run").
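The normalization steps above could look roughly like this. It is a sketch for English text only; simple_stem is a hypothetical toy suffix stripper used so the example runs with the standard library alone, and a real system would more likely rely on an established stemmer or lemmatizer.

```python
import re

# Hypothetical suffix list for the toy stemmer below.
_SUFFIXES = ("ing", "ed", "es", "s")


def simple_stem(word):
    """Strip one common suffix, then undo a doubled final consonant."""
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # e.g. "running" -> "runn" -> "run"
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiouls":
                word = word[:-1]
            break
    return word


def normalize(text):
    """Lowercase, drop non-letters, collapse spaces, tokenize, and stem."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # remove digits and special characters
    tokens = text.split()                   # split() also collapses extra spaces
    return [simple_stem(token) for token in tokens]
```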
3. Similarity Analysis
Exact Matching:
o Direct word-for-word matching between the input text and the
database.
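A minimal sketch of exact matching at the sentence level is shown below. The sentence splitter and the function names are illustrative assumptions; the database is represented here simply as a list of strings.

```python
import re


def split_sentences(text):
    """Split on ., !, or ? and return lowercased, whitespace-normalized sentences."""
    parts = re.split(r"[.!?]+", text)
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def exact_matches(input_text, database_texts):
    """Return the input sentences that occur verbatim in any database document."""
    known = set()
    for doc in database_texts:
        known.update(split_sentences(doc))
    return [s for s in split_sentences(input_text) if s in known]
```

Storing the database sentences in a set keeps each lookup close to constant time, which matters once the collection grows.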
Shingling (N-Gram Matching):
o Break the text into overlapping sequences of N words (e.g., "I love
programming" → ["I love", "love programming"]).
o Compare these N-grams to detect similarities.
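The shingling idea can be illustrated with word-level N-grams compared by Jaccard similarity. Jaccard is one common choice and is an assumption here, since the text does not name a specific comparison measure; unlike the example above, this sketch lowercases the words first.

```python
def shingles(text, n=2):
    """Return the set of overlapping n-word sequences in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


score = jaccard(shingles("I love programming"), shingles("I love coding"))
# The two texts share one of three distinct bigrams, so score is 1/3.
```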
Semantic Similarity:
o Use Natural Language Processing (NLP) models to detect
paraphrased or semantically similar sentences.
o Tools like Word2Vec, BERT, or Sentence Transformers are often
employed.
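Semantic comparison usually reduces to measuring the cosine similarity between sentence vectors. The sketch below shows that measure; toy_embed is a hypothetical word-count stand-in so the example runs without downloading a model. It does not capture meaning, and in practice the vectors would come from a model such as Word2Vec, BERT, or Sentence Transformers.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def toy_embed(sentence, vocabulary):
    """Stand-in embedding: word counts over a fixed vocabulary (not semantic)."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]
```

With real embeddings, a pair of sentences whose cosine similarity exceeds a chosen cutoff would be treated as a candidate paraphrase.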
Citation Checking:
o Determine whether properly cited content appears as a match or
whether citations are missing.
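One simple way to approximate citation checking is to look for a citation marker inside each matched passage. The pattern below covers only two styles, numeric brackets and author-year parentheses; both the pattern and the function names are assumptions for illustration.

```python
import re

CITATION_PATTERN = re.compile(
    r"\[\d+(?:\s*,\s*\d+)*\]"                      # [3] or [3, 7]
    r"|\([A-Z][A-Za-z\-]+(?: et al\.)?,? \d{4}\)"  # (Smith, 2020) or (Smith et al., 2020)
)


def has_citation(passage):
    """True if the passage contains a recognizable citation marker."""
    return CITATION_PATTERN.search(passage) is not None


def split_by_citation(matched_passages):
    """Separate matched passages into cited and uncited lists."""
    cited = [p for p in matched_passages if has_citation(p)]
    uncited = [p for p in matched_passages if not has_citation(p)]
    return cited, uncited
```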
4. Plagiarism Scoring
Percentage Similarity:
o The tool calculates the percentage of the document that matches
content from other sources.
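The percentage can be computed in several ways; the sketch below assumes a word-based measure, that is, the share of the document's words that fall inside matched sentences. The function name and the choice of words as the unit are illustrative.

```python
def percentage_similarity(input_sentences, matched_sentences):
    """Percent of the document's words that belong to matched sentences."""
    total = sum(len(s.split()) for s in input_sentences)
    if total == 0:
        return 0.0
    matched_set = set(matched_sentences)
    matched = sum(len(s.split()) for s in input_sentences if s in matched_set)
    return 100.0 * matched / total
```

For example, a document of 10 words in which one 5-word sentence matches a source yields a similarity of 50%.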
Type of Match:
o Identifies whether the match is:
   • Direct (verbatim copying).
   • Near-verbatim (minor changes in wording).
   • Paraphrased (content rephrased but retains the same meaning).
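The three match types could be told apart as sketched below, using the standard-library difflib for surface similarity. The cutoff values are hypothetical, and semantic_score is assumed to come from a separate semantic similarity step.

```python
from difflib import SequenceMatcher


def classify_match(input_sentence, source_sentence, semantic_score=0.0,
                   near_verbatim_cutoff=0.8, paraphrase_cutoff=0.75):
    """Label a sentence pair as direct, near-verbatim, paraphrased, or no match."""
    a = input_sentence.lower().split()
    b = source_sentence.lower().split()
    if a == b:
        return "direct"
    # Word-level similarity ratio in [0, 1]
    surface = SequenceMatcher(None, a, b).ratio()
    if surface >= near_verbatim_cutoff:
        return "near-verbatim"
    if semantic_score >= paraphrase_cutoff:
        return "paraphrased"
    return "no match"
```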
Threshold:
o Apply a threshold (e.g., 15%) to distinguish between acceptable and
plagiarized content.
5. Report Generation
Highlight Matches:
o Mark plagiarized portions in the text with links to the matched sources.
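Highlighting can be sketched as a plain-text substitution. The marker format (>>...<< followed by the source in brackets) is a hypothetical choice, and this simple version does not handle overlapping matches.

```python
def highlight_matches(text, matches):
    """Wrap each matched passage in >>...<< and append its source in brackets.

    matches is a list of (passage, source) pairs.
    """
    for passage, source in matches:
        text = text.replace(passage, f">>{passage}<< [{source}]")
    return text
```

A report rendered as HTML would typically replace the markers with a styled element and a hyperlink to the source.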
Detailed Report:
o Provide a breakdown of:
   • Matched content.
   • Matched sources (e.g., URLs, document names).
   • Overall similarity score (e.g., 25% plagiarized).
Categorization:
o Separate the matches into properly cited and uncited categories.
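A report with the breakdown described above might be assembled as follows. The dictionary layout and key names are assumptions for illustration, and each match is assumed to already carry a flag saying whether it is properly cited.

```python
def build_report(total_words, matches):
    """Summarize matches into a similarity score, source list, and cited/uncited split.

    matches is a list of dicts with keys 'text', 'source', and 'cited' (bool).
    """
    matched_words = sum(len(m["text"].split()) for m in matches)
    score = 100.0 * matched_words / total_words if total_words else 0.0
    return {
        "similarity_percent": round(score, 1),
        "sources": sorted({m["source"] for m in matches}),
        "cited": [m["text"] for m in matches if m["cited"]],
        "uncited": [m["text"] for m in matches if not m["cited"]],
    }
```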
6. Additional Features
Exclusion Filters:
o Exclude common phrases, citations, or bibliography sections from
plagiarism detection.
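Exclusion filters can be sketched as two small helpers: one drops the bibliography section before detection, the other discards matches that are only common phrases. The phrase list and the heading names searched for are hypothetical.

```python
import re

# Hypothetical list of stock phrases that should not count as plagiarism.
COMMON_PHRASES = {"in this paper", "on the other hand", "as a result"}


def strip_bibliography(text):
    """Drop everything from a 'References' or 'Bibliography' heading line onward."""
    match = re.search(r"^[ \t]*(references|bibliography)[ \t]*:?[ \t]*$", text,
                      flags=re.IGNORECASE | re.MULTILINE)
    return text[:match.start()] if match else text


def filter_matches(matched_phrases):
    """Remove matches that consist only of a common phrase."""
    return [m for m in matched_phrases if m.lower().strip() not in COMMON_PHRASES]
```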
Customization:
o Allow users to define thresholds and match types (e.g., exclude
matches below a certain percentage).
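User-defined thresholds and match types could be held in a small settings object, as in this sketch. The field names and the 1% minimum are assumptions; the 15% default is the example threshold given earlier. Per-source percentages are summed, which assumes the matched passages do not overlap.

```python
from dataclasses import dataclass


@dataclass
class DetectionSettings:
    plagiarism_threshold: float = 15.0  # percent above which a document is flagged
    min_match_percent: float = 1.0      # hypothetical: ignore matches below this
    match_types: tuple = ("direct", "near-verbatim", "paraphrased")


def apply_settings(matches, settings):
    """Filter matches by type and size, then compare the total to the threshold.

    matches is a list of dicts with keys 'source', 'type', and 'percent'.
    Returns (kept_matches, total_percent, flagged).
    """
    kept = [m for m in matches
            if m["type"] in settings.match_types
            and m["percent"] >= settings.min_match_percent]
    total = sum(m["percent"] for m in kept)
    return kept, total, total > settings.plagiarism_threshold
```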
7. Iterative Improvement
• Collect feedback from the users during the testing phase to identify
aspects that need improvement.
• Enhance the prompt templates for a better variety in questions and
clearer feedback.
• Optimize system performance for quicker responses and smooth user
interactions.
OBJECTIVES:
A plagiarism checker is a vital tool designed to ensure originality
and uphold ethical standards in academic, professional, and creative
domains. By detecting instances of unoriginal or copied content, it
promotes academic integrity and fosters a culture of honesty. These
tools ensure that submitted work genuinely reflects the creator’s
effort and knowledge, discouraging unethical practices like copying
or paraphrasing without proper citation. In educational settings,
they encourage students to produce independent and innovative
work while guiding researchers to maintain high standards in their
publications.
CONCLUSION: