Indian Institute of Information Technology, Ranchi
Name – KUMAR SHANU
Branch – CSE
Group –B
Roll No. – 2021UG1079
Subject – Computer Networks
Course Code – CS-3006
Submitted to – Dr. Kirti Kumari
1. Internet Protocol version 6
IPv6 stands for Internet Protocol version 6. It is a network layer protocol that
provides an identification and location system for computers on networks and
routes traffic across the internet. IPv6 was developed as the successor to IPv4 to
address the exhaustion of IPv4 addresses and to provide additional features and
improvements. IPv6 uses a 128-bit address scheme, allowing for a vastly larger
number of unique addresses compared to IPv4's 32-bit address space. This
expansion of address space is essential as the number of devices connected to the
internet continues to grow exponentially. IPv6 also offers features such as
simplified header format, improved security through mandatory IPsec support, and
better support for multicast traffic. Transitioning from IPv4 to IPv6 is a gradual
process, but as the demand for internet-connected devices increases and the
available IPv4 addresses become scarcer, the adoption of IPv6 becomes
increasingly important for the continued growth and stability of the internet.
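As a quick illustration of the 32-bit versus 128-bit address spaces described above, here is a short sketch using Python's standard-library ipaddress module (the addresses shown are reserved documentation examples, not real hosts):

```python
# Inspecting IPv4 vs IPv6 addresses with the stdlib ipaddress module.
import ipaddress

v4 = ipaddress.ip_address("192.0.2.1")      # 32-bit IPv4 address
v6 = ipaddress.ip_address("2001:db8::1")    # 128-bit IPv6 address

print(v4.version, v4.max_prefixlen)   # 4 32
print(v6.version, v6.max_prefixlen)   # 6 128

# The 128-bit space is 2**96 times larger than IPv4's 32-bit space.
print(2**128 // 2**32)
```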
TCP (Transmission Control Protocol) is a connection-oriented transport protocol that provides features such as error checking, sequencing, flow control and congestion control; UDP (User Datagram Protocol) is its lightweight, connectionless counterpart.
TCP is used for applications where reliability and ordered delivery are essential, such
as web browsing, email, file transfer (FTP), and remote access (SSH).
UDP is used for applications where speed and low overhead are prioritized over
reliability, such as online gaming, streaming media (audio/video), DNS (Domain
Name System), and real-time communication (VoIP, video conferencing).
In summary, TCP provides reliability and ensures data delivery, making it suitable for
applications that require guaranteed transmission. On the other hand, UDP offers
lower latency and reduced overhead, making it suitable for real-time applications
where occasional packet loss is acceptable.
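The contrast above shows up directly at the socket level. Below is a minimal sketch of a UDP exchange over the loopback interface (the port is chosen by the OS); TCP would instead use SOCK_STREAM with connect()/accept():

```python
# UDP is connectionless: no handshake, just independent datagrams.
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))      # port 0 lets the OS pick a free port
server.settimeout(5)               # avoid blocking forever if a datagram is lost
addr = server.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"ping", addr)       # fire-and-forget: no delivery guarantee
data, peer = server.recvfrom(1024)
print(data)                        # b'ping'

# TCP would use SOCK_STREAM plus connect()/accept()/send()/recv(),
# giving ordered, reliable, connection-oriented delivery instead.
client.close()
server.close()
```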
SMTP is a standard protocol used for sending email over the Internet.
SMTP operates on port 25 for server-to-server relay (traditionally unencrypted) and port 587 for mail submission, typically upgraded to an encrypted session with STARTTLS.
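A hedged sketch of composing a message for SMTP submission with Python's standard library; the addresses, server name and credentials are placeholders, and the actual smtplib calls are left commented out because they require a live mail server:

```python
# Building an email message suitable for SMTP submission.
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.org"       # placeholder sender
msg["To"] = "bob@example.org"           # placeholder recipient
msg["Subject"] = "Hello over SMTP"
msg.set_content("Sent via SMTP submission on port 587.")

# Actual delivery (not run here) would use smtplib with STARTTLS:
# import smtplib
# with smtplib.SMTP("mail.example.org", 587) as s:   # placeholder server
#     s.starttls()                                   # upgrade to TLS
#     s.login("alice", "password")                   # placeholder credentials
#     s.send_message(msg)

print(msg["Subject"])
```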
HTTP (Hypertext Transfer Protocol) follows a client-server model, where a client (such as a web browser) sends requests
to a server, which then responds with the requested resources (such as web pages,
images, or videos).
HTTP operates on top of the TCP/IP protocol suite and typically uses TCP port 80 for
communication.
It is a stateless protocol, meaning each request from the client to the server is
independent and not affected by previous requests.
HTTP requests consist of methods (such as GET, POST, PUT, DELETE) that define the
action to be performed, headers containing additional information, and an optional
message body.
HTTP responses include status codes indicating the outcome of the request (e.g., 200
for success, 404 for not found) and the requested resource, along with response
headers.
In summary, HTTP is the foundation of data communication on the web, enabling the
exchange of information between clients and servers, and facilitating the retrieval
and presentation of web resources.
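The request and response structure described above can be seen by parsing raw HTTP text by hand; the host and path below are illustrative examples:

```python
# The wire format of an HTTP request: method, path, version, then headers.
raw_request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "\r\n"
)
request_line = raw_request.split("\r\n")[0]
method, path, version = request_line.split(" ")
print(method, path)        # GET /index.html

# The response: status line with a status code, headers, then the body.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>...</html>"
)
status_line = raw_response.split("\r\n")[0]
status_code = int(status_line.split(" ")[1])
print(status_code)         # 200
```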
RIP (Routing Information Protocol) is a distance-vector routing protocol that uses hop count as its metric. RIP has a maximum hop count limit of 15, which limits its scalability in larger networks.
Due to its simplicity, RIP is easy to configure and deploy, making it suitable for small to medium-sized networks.
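A rough sketch of the distance-vector idea behind RIP, including the hop-count limit of 15 (with 16 acting as "infinity"); the four-router chain topology is an invented example, not taken from the text:

```python
# RIP-style distance-vector routing: routers repeatedly exchange
# hop-count vectors with their neighbours until no estimate changes.
INF = 16  # RIP's "infinity": 16 or more hops means unreachable

links = {  # router -> directly connected neighbours (invented topology)
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"],
}
routers = sorted(links)
# dist[r][d] = r's current believed hop count to destination d
dist = {r: {d: (0 if r == d else INF) for d in routers} for r in routers}

changed = True
while changed:                        # iterate until convergence
    changed = False
    for r in routers:
        for n in links[r]:            # learn from each neighbour's vector
            for d in routers:
                via_n = min(dist[n][d] + 1, INF)
                if via_n < dist[r][d]:
                    dist[r][d] = via_n
                    changed = True

print(dist["A"]["D"])   # 3 hops: A-B-C-D
```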
4. LSA (Link-State Advertisement):
An LSA is the routing-information message used by the OSPF (Open Shortest Path First) routing protocol.
OSPF is a link-state routing protocol that uses the Dijkstra algorithm to calculate the
shortest path tree for each router.
Each router uses the link-state database to calculate the shortest path to each
destination network.
OSPF routers exchange link-state advertisements (LSAs) to build and maintain a detailed view of the
network topology.
OSPF supports features such as multiple areas, hierarchical design, and support for different types of
networks (e.g., broadcast, point-to-point).
It provides better support for load balancing, route summarization, and authentication compared to
RIP.
In summary, RIP is a simple distance-vector routing protocol suitable for small to medium-sized
networks, while OSPF is a more advanced link-state routing protocol designed for larger and more
complex networks. OSPF provides faster convergence, better scalability, and more advanced
features compared to RIP. LSA is a key component of OSPF, used for exchanging routing
information between OSPF routers.
Choosing the right routing protocol depends on your network size and complexity.
• For small networks: RIP can be a good choice due to its simplicity.
• For larger or more complex networks: link-state protocols like OSPF offer faster convergence,
better scalability, and more advanced features.
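The shortest-path calculation OSPF performs can be sketched with Dijkstra's algorithm over a link-state database; the router names and link costs below are invented for illustration:

```python
# OSPF's link-state idea: every router holds the full weighted topology
# and runs Dijkstra's algorithm over it to build its shortest-path tree.
import heapq

graph = {  # node -> {neighbour: link cost} (invented example topology)
    "R1": {"R2": 1, "R3": 4},
    "R2": {"R1": 1, "R3": 2, "R4": 5},
    "R3": {"R1": 4, "R2": 2, "R4": 1},
    "R4": {"R2": 5, "R3": 1},
}

def dijkstra(source):
    """Shortest-path costs from source to every reachable router."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry, skip
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

print(dijkstra("R1"))   # R4 is reached at cost 4 via R2-R3, not directly
```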
Subnetting:
• Process: Dividing a large network (subnet mask with fewer bits set to 1) into smaller, more
manageable networks (subnet mask with more bits set to 1).
• Analogy: Imagine cutting a large pizza (network) into smaller slices (subnets) to share with
more people (devices).
• Benefits:
o More efficient use of IP addresses (each subnet gets only the addresses it needs).
o Improved security and smaller broadcast domains, since traffic can be contained within each subnet.
• Drawbacks:
o Adds planning and configuration overhead.
Supernetting (the reverse process, combining smaller networks into one larger network):
• Benefits:
o Simplifies routing tables by summarising many networks as a single route.
• Drawbacks:
o Less efficient use of IP addresses (potentially creates wasted addresses in the larger
network).
o Reduces network security by allowing more traffic flow within the larger network.
In conclusion:
• Subnetting is the more common and practical approach for managing IP addresses in most
network scenarios.
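Continuing the pizza analogy, here is a small sketch of subnetting with Python's standard-library ipaddress module, splitting one /24 into four /26 "slices" (192.0.2.0/24 is a reserved documentation range):

```python
# Dividing a /24 network (256 addresses) into four /26 subnets (64 each).
import ipaddress

network = ipaddress.ip_network("192.0.2.0/24")
subnets = list(network.subnets(new_prefix=26))   # lengthen the mask by 2 bits

for s in subnets:
    print(s, s.num_addresses)
# 192.0.2.0/26 64
# 192.0.2.64/26 64
# 192.0.2.128/26 64
# 192.0.2.192/26 64
```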
Abstract
This research paper argues for a new way to address negativity online, focusing on promoting
positive social interactions instead of just removing negativity. Here are the key points:
• Traditionally, social media platforms focused on identifying and deleting hateful or offensive
content.
• This paper proposes a new approach: identifying and amplifying hopeful messages that
promote Equality, Diversity, and Inclusion (EDI).
• The authors created a unique multilingual dataset (English, Tamil, Malayalam, Kannada)
specifically focused on hope speech related to EDI issues.
• This dataset includes messages supporting marginalized groups like LGBTQIA+, people with
disabilities, and women in STEM fields.
• The dataset is designed to train AI models to recognize and promote hopeful and inclusive
language online.
• This approach aims to create a more positive and inclusive online environment for everyone.
• The paper also highlights the importance of considering code-switching, where users switch
between languages in their communication.
Overall, this research promotes using AI for positivity and social integration, fostering a more
inclusive online world.
1 Introduction
This research focuses on fostering a more positive online environment by identifying and amplifying
hopeful messages. Here are the key points:
• Traditionally, social media platforms concentrated on removing negativity like hate speech.
• This paper proposes a new approach: recognizing and promoting messages that encourage
Equality, Diversity, and Inclusion (EDI).
• The authors built a unique dataset (English, Tamil, Malayalam, Kannada) specifically aimed at
hope speech related to EDI issues.
• This dataset targets messages that support marginalized groups like LGBTQIA+, people with
disabilities, and women in STEM fields.
• The goal is to train AI models to amplify hopeful and inclusive language online.
• This method promotes a more positive and inclusive online space for everyone.
• It also discusses the limitations of focusing solely on negativity detection, which can
introduce bias against marginalized groups.
• It highlights the positive influence of hope speech on social media on mental health,
especially for vulnerable populations.
Overall, this research advocates for using AI to spread positivity and social integration, leading to a
more inclusive online world.
2 Related works
This section of the paper discusses related research on social media data analysis and highlights the
limitations of current approaches:
• Focus on Negativity: Most research focuses on analyzing negative aspects of social media
data, such as sentiment analysis, opinion mining of comments, and detection of abusive or
hateful language.
• Limited Consideration of Bias: These methods often don't consider potential biases in the
datasets used to train the models, which can lead to unfair or discriminatory outcomes,
especially for marginalized groups.
• Gender Bias: Gender bias is a well-studied issue in Natural Language Processing (NLP), but
there's limited research on addressing bias related to other EDI (Equality, Diversity, and
Inclusion) factors.
• Potential for Escalation: Approaches that directly intervene with negative content by
deleting or blocking comments might lead to increased hostility.
• Focus on Positivity: Instead of focusing on negativity, this research aims to identify and
promote hopeful messages that support EDI.
• Multilingual Dataset: The authors introduce a unique dataset specifically designed for hope
speech detection in English, Tamil, and Malayalam, including under-resourced languages.
This approach offers a new perspective on social media analysis, promoting positive online
interactions and inclusivity.
3 Hope speech
Hope is an upbeat state of mind based on a desire for positive outcomes in one’s life or the world at
large, and it is both present and future-oriented [23]. Inspirational talks about how people deal with
and overcome adversity may also provide hope. Hope speech instills optimism and resilience, which
have a beneficial impact on many parts of life, including [51] college [52] and other factors that put
us at risk [53]. For our problem, we defined hope speech as ’YouTube comments/posts that offer
support, reassurance, suggestions, inspiration and insight’.
The notion that one may uncover and become motivated to use routes to their desired goals is
reflected in hope speech. Our approach sought to shift the dominant mindset away from a focus on
discrimination, loneliness or the negative aspects of life and towards a focus on promoting
confidence, offering support and creating positive characteristics based on individual remarks. Thus,
we instructed annotators that if a comment or post meets the following conditions, then it should be
annotated as hope speech.
• The comment contains inspiration provided to participants by their peers and others and/or
offers support, reassurance, suggestions and insight.
• The comment promotes well-being and satisfaction (past), joy, sensual pleasures and
happiness (present).
• The comment triggers constructive cognition about the future—optimism, hope and faith.
• The comment brings out a survival story of gay, lesbian or transgender individuals, women in
science or a COVID-19 survivor.
• The comment talks about fairness in the industry (e.g. [I do not think banning all apps is
right; we should ban only the apps which are not safe]).
• The comment explicitly talks about a hopeful future (e.g. [We will survive these things]).
• The comment explicitly talks about and says no to division in any form.
Non-hope speech includes comments that do not bring positivity, such as the following:
• The comment is highly prejudiced and attacks people without thinking about the
consequences.
Non-hope speech is different from hate speech. Some examples are provided below.
• ’How is that the same thing???’ This is non-hope speech, but it is not hate speech.
• ’Society says don’t assume, but they assume to anyways’ This is non-hope speech, but it is
not hate speech.
No hate speech or offensive language detection dataset is available for code-mixed Tamil or
code-mixed Malayalam, and existing datasets do not take into account LGBTQIA+ people, women in
STEM or other minority or under-represented groups. Thus, we cannot use the existing hate speech or
offensive language detection datasets to detect hope or non-hope for EDI of minorities.
4 Dataset construction
This section details the data collection process for the research on hope speech detection in social
media comments:
• Platform: YouTube comments were chosen due to the platform's popularity for public opinion sharing.
• Focus on Public Content: The research avoided using comments from personal stories of
LGBTQIA+ individuals to protect privacy.
o English: Focused on recent EDI themes like women in STEM, LGBTQIA+ rights,
COVID-19, and international relations. Data was collected from English-speaking
countries like the US, UK, Canada, etc.
o Tamil and Malayalam (India): Focused on themes like LGBTQIA+ rights, COVID-19,
women in STEM, Indo-China conflict, and language diversity issues.
• Data Collection Period: Comments were collected between November 2019 and June 2020.
• Scraper Tool: A YouTube comment scraper was used for data collection.
The paper emphasizes that this multilingual dataset (English, Tamil, Malayalam) aims to support research on hope speech detection and EDI across languages, including under-resourced ones.
4.1 Code-mixing
When a speaker employs two or more languages in a single speech, it is known as code-mixing. It is
prevalent in the social media discourse of multilingual speakers. Code-mixing has long been
connected with a lack of formal or informal linguistic expertise. It is, nevertheless, common in
user-generated social media material according to studies. In a multilingual country like India,
code-mixing is quite a frequent occurrence [54,55,56,57]. Our Tamil and Malayalam datasets are
code-mixed since our data was collected from YouTube. In our corpus, we found all three forms of
code-mixing, including tag, inter-sentential and intra-sentential. Our corpus also includes code-mixing
between Latin and native scripts.
4.2 Ethical concerns
Data collected from social media is extremely sensitive, especially when it concerns minorities such
as the LGBTQIA+ community or women. By eliminating personal information from the dataset, such
as names but not celebrity names, we have taken great care to reduce the danger of the data
revealing an individual’s identity. However, in order to investigate EDI, we needed to keep track of
the information on race, gender, sexual orientation, ethnicity and philosophical views. The
annotators only viewed anonymised postings and promised not to contact the author of a remark.
Only researchers who agree to follow ethical norms will be given access to the dataset for research
purposes. We opted not to ask the annotator for racial information after a lengthy debate with our
local EDI committee members. Due to recent events, the EDI committee was strongly against
the collection of racial information based on the belief that it would split people according to their
racial origin. Thus, we recorded only the nationality of the annotators.
Table 1 Annotators

Category        English   Tamil   Malayalam
Male               4        2        2
Female             5        3        5
Non-binary         2        1        0
Graduate           4        4        5
Postgraduate       6        2        2
Total             11        6        7
After the data collection phase, we cleaned the data using Langdetect to identify the language
of the comments and removed comments that were not in the specified languages. However, owing
to code-mixing at various levels, comments in other languages were unintentionally included in
the cleaned corpus of the Tamil and Malayalam comments. Finally, based on our description from
Sect. 3, we identified three groups, two of which were hope and non-hope; the last group (Other
languages) was introduced to account for comments that were not in the required language. These
classes were chosen since they provided a sufficient amount of generalisation for describing the
remarks in the EDI hope speech dataset.
4.4 Annotators
We created Google Forms to collect annotations from annotators. To maintain annotation quality,
each form was limited to 100 comments and each page to ten comments. We collected
information on the annotator’s gender, educational background and preferred medium of
instruction in order to comprehend the annotator’s diversity and avoid bias. The annotators were
warned that the comments may contain profanity and hostile material. If the annotator deemed the
remarks to be too upsetting or unmanageable, they were offered the choice of ceasing to annotate.
We trained annotators by directing them to YouTube videos on
EDI. Each form was annotated by at least three individuals. After the
annotators marked the first form with 100 comments, the findings were manually validated in the
warm-up phase. This strategy was utilised to help them acquire a better knowledge of EDI and focus
on the project. Following the initial stage of annotating their first form, a few annotators withdrew
from the project and their remarks were deleted. The annotators were told to conduct another
evaluation of the EDI videos and annotation guidelines. From Table 1, we can see the statistics
pertaining to the annotators. The annotators for English language remarks came from Australia,
Ireland, the United Kingdom and the United States of America. We were able to obtain annotations
in Tamil from persons from both India’s Tamil Nadu and Sri Lanka. Graduate and postgraduate
students made up the majority of the annotators.
We used majority voting to aggregate the hope speech annotations from several annotators; the
comments that did not receive a majority in the first round were collected and added to a second Google
form so that more annotators could label them. We calculated the inter-annotator agreement
following the last round of annotation. We quantified the clarity of the annotation and reported the
inter-annotator agreement using Krippendorff’s alpha. Krippendorff’s alpha is a statistical measure
of annotator agreement that indicates how well the resulting data corresponds to actual data [58].
Although Krippendorff’s alpha (α) is computationally demanding, it was more relevant in our instance
since the comments were annotated by more than two annotators and not all sentences were
annotated by the same annotators. It is unaffected by missing data, allows for variation in
sample sizes, categories and the number of raters, and may be used with any measurement level,
including nominal, ordinal, interval and ratio. α is characterised by the following:

α = 1 − (D_o / D_e),

where D_o is the observed disagreement among annotators and D_e is the disagreement expected by
chance; α = 1 indicates perfect agreement, and α = 0 indicates agreement no better than chance.
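As a rough illustration of this agreement measure, below is a minimal sketch of Krippendorff's alpha for nominal labels. It assumes every item has at least two labels and more than one category appears overall; this is a simplified toy, not the authors' implementation:

```python
# Krippendorff's alpha (nominal data): alpha = 1 - D_o / D_e, built from
# a coincidence matrix of ordered label pairs within each item.
from collections import Counter
from itertools import permutations

def krippendorff_alpha(items):
    """items: list of per-item label lists (each with >= 2 labels)."""
    coincidence = Counter()
    for labels in items:
        m = len(labels)
        for c, k in permutations(range(m), 2):        # ordered index pairs
            coincidence[(labels[c], labels[k])] += 1 / (m - 1)
    n_c = Counter()
    for (c, _k), v in coincidence.items():
        n_c[c] += v
    n = sum(n_c.values())
    d_o = sum(v for (c, k), v in coincidence.items() if c != k) / n
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n * (n - 1))
    return 1.0 - d_o / d_e

# Perfect agreement across items gives alpha = 1.
print(krippendorff_alpha([["hope", "hope"], ["non-hope", "non-hope"]]))
```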
4.6 Corpus statistics
Our dataset contains 59,354 YouTube comments in total, with 28,451 comments in English, 20,198
comments in Tamil and 10,705 comments in Malayalam; each split also contains some comments
labelled as being in other languages. The distribution of our dataset is depicted in Table 2. When tokenising words and
phrases in the comments, we used the nltk tool to obtain corpus statistics for use in research. Tamil
and Malayalam have a broad vocabulary as a result of the various types of code-switching that take
place.
Table 3 shows the distribution of the annotated dataset by label. The data was found to be
imbalanced, with the large majority of comments classified as non-hope speech. An automatic
detection system that can handle imbalanced data is essential for real-world success, given the
ever-growing volume of user-generated content on internet platforms. Using the fully annotated
dataset, a train set, a development set and a test set were produced.
A few samples from the dataset, together with their translations and hope speech class annotations,
are shown below.
• kashtam thaan. irundhaalum muyarchi seivom – It is indeed difficult. Let us try it out
though. Hope speech
• uff. China mon vannallo– Phew! Here comes the Chinese guy. Non-hope speech
• paambu kari saappittu namma uyirai vaanguranunga– These guys (Chinese) eat snake meat
and make our lives miserable. Non-hope speech
• ’God gave us a choice.’ This sentence was interpreted by some as hopeful and others as not
hopeful.
• Sri Lankan Tamilar history patti pesunga—Please speak about history of Tamil people in Sri
Lanka. Inter-sentential switch in Tamil corpus written using Latin script. The history of Tamil
people in Sri Lanka is both hopeful and non-hopeful due to the recent civil war.
• Bro helo app ku oru alternate appa solunga.— Bro tell me an alternate app for Helo app.
Intra-sentential and tag switch in Tamil corpus written using Latin script.
Table 6 Precision, recall and F-score for Tamil: support is the number of actual occurrences of the
class in the specified dataset (rows: precision, recall and F-score per classifier; columns: Hope
Speech, Not-Hope Speech, Macro Avg, Weighted Avg)

Table 7 Precision, recall and F-score for Malayalam: support is the number of actual occurrences
of the class in the specified dataset (rows and columns as in Table 6)
5 Benchmark experiments
This section of the paper discusses the technical aspects of the research, including the experiments
conducted to evaluate the hope speech detection models:
• Evaluation Metrics:
o Macro-averaged F1 score: This metric was chosen due to the imbalanced nature of
the dataset (unequal distribution of hope and non-hope speech examples).
o Accuracy, Recall, and Weighted F1 score: These metrics were also reported for
individual classes to provide a more comprehensive picture of model performance.
• Baseline Classifiers: traditional machine-learning models (SVM, Logistic Regression, and Decision Tree) trained on the dataset.
• Advanced Model: XLM-Roberta: This pre-trained transformer model from Facebook AI was
used for comparison with the baseline models.
Key Findings:
• Challenges of Class Imbalance: All models suffered from lower performance due to the
imbalanced dataset (more non-hope speech examples).
• SVM Performance: SVM showed the weakest performance across all languages (English,
Tamil, Malayalam).
• Decision Tree vs. Logistic Regression: Decision Tree performed better for English and
Malayalam, while Logistic Regression was better for Tamil.
• Impact of "Other Languages" Label: The presence of comments in languages other than the
target languages (English, Tamil, Malayalam) affected the overall performance, particularly
for English. This inconsistency could have been avoided for English by removing these
comments, but it was necessary for code-mixed languages like Tamil and Malayalam.
• Tamil Language Balance: The distribution of hope and non-hope speech examples was more
balanced for Tamil compared to English and Malayalam.
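The class-imbalance point above can be made concrete with a toy calculation (the labels below are invented, not drawn from the dataset): a classifier that always predicts the majority class gets high accuracy but a poor macro-averaged F1, which is why macro F1 was chosen as the headline metric.

```python
# Macro-averaged F1 vs accuracy on imbalanced labels: a degenerate
# majority-class classifier on a 90/10 split of non-hope vs hope.
y_true = ["non-hope"] * 9 + ["hope"]
y_pred = ["non-hope"] * 10          # always predicts the majority class

def f1_per_class(y_true, y_pred, label):
    tp = sum(t == p == label for t, p in zip(y_true, y_pred))
    fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_f1 = (f1_per_class(y_true, y_pred, "hope")
            + f1_per_class(y_true, y_pred, "non-hope")) / 2

print(accuracy)   # 0.9 -- looks deceptively good
print(macro_f1)   # ~0.47 -- exposes the ignored "hope" class
```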
Conclusion:
• The paper acknowledges the limitations of the models due to class imbalance but highlights
the potential of the Hope Speech Detection (HopeEDI) dataset.
• The authors believe this dataset can be a valuable resource for future research on positivity
in language technology.
6 Task description
We also organised a shared task to invite more researchers to perform hope speech detection and
benchmark the data. For our problem, we defined hope speech as ’YouTube comments/posts
that offer support, reassurance, suggestions, inspiration and insight’. A comment or post within the
corpus may contain more than one sentence, but on average a comment consists of one sentence.
The annotations in the corpus were made at a comment/post level. The datasets for development,
training and testing were supplied to the participants in English, Tamil and Malayalam.
During the first phase, participants were provided with training, validation and development data in
order to train and develop hope speech detection for one or more of the three languages.
Cross-validation on the training data was an option, as was using the validation dataset for early
evaluations and the development set for hyper-parameter tuning. The objective of this step was to
guarantee that the participants’ systems were ready for review before the test data was released. In
total, 137 people registered and downloaded the data in all three languages.
The test dataset was provided without the gold labels in CodaLab during this phase. Participants
were given Google forms to fill out in order to submit their predictions. They were given the option
of submitting their findings as many times as they wished, with the best entry being picked for
assessment and the creation of the rank list. The outcomes were compared to the gold standard
labels. Across all classes, the classification system’s performance was assessed in terms of the
weighted averaged precision, recall and F-score. The support-weighted mean per label was
calculated using the weighted averaged scores. The metric used for preparing the rank list was the
weighted F1 score. Participants were encouraged to check their systems using the Sklearn
classification report. The final test included 30, 31 and 31 participants for Tamil, Malayalam
and English languages, respectively.
7 Systems
Tables 8, 9 and 10 give the rank lists based on F1-score, along with the other evaluation metrics
(precision and recall), for the Tamil, Malayalam and English languages, respectively (columns:
Team-Name, Precision, Recall, F1 Score, Rank).
Top Performers:
The paper acknowledges several participants who achieved top rankings for different languages;
common techniques and key findings across their submissions include:
• Dominance of XLM-RoBERTa: The most successful submissions across all languages (English,
Malayalam, Tamil) relied on XLM-RoBERTa, a pre-trained transformer model well-suited for
multilingual tasks.
• RNNs with Contextual Embeddings: One top performer for English used a combination of
Recurrent Neural Networks (RNNs) with context-aware string embeddings and pooled
document embeddings.
• Challenges with Tamil: Performance on Tamil was significantly lower compared to English
and Malayalam. This could be due to:
o Class Imbalance: The distribution of hope and non-hope speech examples might
have been uneven in the Tamil dataset.
8 Future Directions
• Investigating Lower Performing Languages: Analyze the reasons behind lower performance
in Tamil and explore methods to improve hope speech detection in code-mixed languages.
• Mitigating Class Imbalance: Develop techniques to address class imbalance issues in hope
speech detection datasets.
• Explainability of Models: Explore methods to understand how the models make predictions,
particularly for identifying hope speech.
• Multilingual Hope Speech Detection: Further research on hope speech detection in more
languages beyond English, Tamil, and Malayalam.
9 Conclusion
As online content increases massively, it is necessary to encourage positivity, such as in the form of
hope speech on online forums, to induce compassion and acceptable social behaviour. In this paper,
we presented the largest manually annotated dataset of hope speech detection in English, Tamil and
Malayalam consisting of 28,451, 20,198 and 10,705 comments, respectively. We believe that this
dataset will facilitate future research on encouraging positivity. We aimed to promote research on
hope speech and encourage positive content on online social media for ensuring EDI. In the future,
we plan to extend the study by introducing a larger dataset with further fine-grained classification
and content analysis.