CS 3308 Learning Journal Unit 7

The seventh unit of the CS 3308 course focused on Information Retrieval, where students created a web crawler using Python, enhancing their understanding of web search engines. The practical experience combined theoretical knowledge with hands-on application, leading to significant personal growth in programming and problem-solving skills. Feedback from peers and the challenges faced during the project reinforced the importance of optimization and ethical practices in web crawling.

Uploaded by Reg

In this learning journal entry, I reflect on the experiences and insights I gained during the seventh unit of my CS 3308 course, which focused on Information Retrieval. Specifically, this unit examined how web search engines work and culminated in a practical exercise in which we built a simple web crawler.

Description of Activities

The unit began with a thorough review of the provided learning guide and relevant literature. Notable works included Schwartz's (1998) discussion of the fundamental principles of web search engines and Bar-Ilan's (2004) insights into their practical use in information science research. These resources laid the groundwork for understanding how these systems evolved and how they operate.

Following this theoretical foundation, the practical component of the unit required us to create a web crawler. The objective of this programming assignment was to write a Python script that takes a URL from the user, traverses the linked pages, extracts relevant content, and builds an index of the collected information. The challenge lay in adhering to specific constraints, such as limiting the number of URLs processed to 500.
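The crawl described above can be sketched as a breadth-first traversal with a hard cap on the number of pages. This is a minimal sketch, not the submitted assignment: the assignment parsed HTML with BeautifulSoup, whereas this version uses Python's standard-library `html.parser` so it runs without extra dependencies. `fetch` is a hypothetical callable standing in for the HTTP request (for example, a wrapper around `urllib.request.urlopen`), and `LinkExtractor` and `crawl` are illustrative names.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL, drop #fragments.
                    absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                    self.links.append(absolute)


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]],
          max_urls: int = 500) -> dict[str, str]:
    """Breadth-first crawl that stops once max_urls pages have been retrieved.

    `fetch` (hypothetical) returns a page's HTML, or None if it cannot be retrieved.
    """
    frontier = deque([start_url])   # URLs waiting to be visited, oldest first
    seen = {start_url}              # URLs already queued, to avoid revisits
    pages: dict[str, str] = {}      # url -> raw HTML of successfully fetched pages
    while frontier and len(pages) < max_urls:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        extractor = LinkExtractor(url)
        extractor.feed(html)
        for link in extractor.links:
            # Follow only web links (skip mailto:, javascript:, etc.).
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages
```

Passing `fetch` in as a parameter keeps the traversal testable offline, for instance with a dictionary of canned pages and `site.get` as the fetcher. A first-in, first-out queue visits pages nearest the starting URL first, which is a reasonable choice when a page budget may cut the crawl short.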

The BeautifulSoup library was instrumental in parsing the HTML content effectively.

This project integrated elements from earlier units, particularly the search engine we had developed in Unit 5. The final deliverable reported key performance metrics such as the number of documents processed and the number of unique terms indexed.
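Building the index and reporting the two metrics mentioned above might look roughly like the following. It is a sketch under stated assumptions: visible text is extracted with the standard-library `html.parser` rather than BeautifulSoup, tokens are lowercase alphanumeric runs, and the index maps each term to per-URL term frequencies. The names `TextExtractor` and `build_index` and the metric keys are illustrative, not taken from the assignment.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Accumulates visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0  # depth of currently open script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def build_index(pages: dict[str, str]):
    """Return (index, metrics), where index maps term -> {url: term frequency}."""
    index: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for url, html in pages.items():
        extractor = TextExtractor()
        extractor.feed(html)
        extractor.close()
        text = " ".join(extractor.parts).lower()
        # Assumed tokenizer: maximal runs of lowercase letters and digits.
        for term in re.findall(r"[a-z0-9]+", text):
            index[term][url] += 1
    metrics = {"documents_processed": len(pages), "unique_terms": len(index)}
    return index, metrics
```

An inverted index like this answers "which documents contain term t?" with a single dictionary lookup, and the number of unique terms is simply the number of keys.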

Reactions to Activities

The task of building a web crawler was initially quite overwhelming, yet it grew increasingly rewarding as I progressed through the development process. The complexity of the task, which encompassed both web scraping and indexing, presented a steep learning curve. However, upon completing the project, I felt a profound sense of achievement. This hands-on project was an excellent vehicle for applying theoretical knowledge in a tangible context, and it reinforced my understanding of the underlying principles of information retrieval systems.

Feedback and Interactions

Peer feedback played a critical role in refining my work. Through the peer assessments, one classmate offered constructive criticism about the efficiency of my code, which motivated me to revisit and optimize my implementation. This interaction underscored the importance of optimization strategies, as highlighted by Manning et al. (2009) in their textbook on information retrieval. Additionally, a colleague's guidance on ethical best practices, such as respecting a site's robots.txt file (the Robots Exclusion Protocol), was invaluable in shaping my approach to web crawling.
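A robots.txt check of the kind recommended above can be done with the standard library's `urllib.robotparser`. The sketch below parses robots.txt text that has already been downloaded, so it runs offline; against a live site one would instead call `set_url()` with the site's `/robots.txt` URL and then `read()`. The helper name `make_robot_checker` and the user-agent string `CS3308-crawler` are made-up placeholders.

```python
from typing import Callable
from urllib.robotparser import RobotFileParser


def make_robot_checker(robots_txt: str,
                       user_agent: str = "CS3308-crawler") -> Callable[[str], bool]:
    """Return a predicate telling whether `user_agent` may fetch a given URL."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)
```

With the rules `User-agent: *` and `Disallow: /private/`, the returned predicate allows `https://example.com/index.html` but rejects `https://example.com/private/data.html`. In a crawler, the check would run before each fetch, with one parser cached per host.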

Feelings and Attitudes

Emotionally, this unit elicited a blend of excitement and apprehension. The excitement stemmed from the prospect of creating a functioning web crawler, while the apprehension arose from the fear of failing to meet the assignment's stringent requirements. Despite these challenges, completing the project instilled in me a sense of pride and significantly bolstered my confidence in my programming abilities.

Lessons Learned

This unit imparted several significant lessons about information retrieval systems and my personal learning preferences. First, I developed a more nuanced understanding of the intricacies involved in indexing and retrieving web content; Risvik and Michelsen's (2002) work on search engines and web dynamics emphasized the importance of these processes. Additionally, I discovered that breaking complex tasks down into smaller, more manageable components makes them far less intimidating and improves my ability to tackle them.
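To make the retrieval side concrete, a query over an index shaped like `{term: {url: frequency}}` can be sketched as a Boolean AND search. This is an illustrative assumption, not necessarily how the Unit 5 search engine ranked results: a document must contain every query term, matches are ordered by the sum of those terms' frequencies, and ties are broken alphabetically. The function name `search` is hypothetical.

```python
import re


def search(index: dict[str, dict[str, int]], query: str) -> list[str]:
    """Return documents containing every query term, highest total frequency first."""
    # Tokenize the query the same way documents are assumed to be tokenized.
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    # Intersect the posting lists; a term missing from the index yields an empty set.
    matches = set.intersection(*(set(index.get(t, {})) for t in terms))
    # Negate the score so higher totals sort first; URL breaks ties.
    return sorted(matches, key=lambda doc: (-sum(index[t][doc] for t in terms), doc))
```

For example, with `{"web": {"u1": 2, "u2": 1}, "crawler": {"u2": 3}}`, the query `web crawler` returns `["u2"]` and `web` alone returns `["u1", "u2"]`.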

Surprises and Challenges

One unexpected aspect was the sheer amount of effort required to index web content accurately. Managing memory usage while navigating large websites was particularly demanding and required careful planning and judicious coding techniques.
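One way to keep a crawler's visited set and frontier from filling up with different spellings of the same page, which wastes both memory and fetches on large sites, is to canonicalize URLs before storing them. The rules below (drop the fragment, lowercase the scheme and host, remove default ports, use `/` for an empty path) are a common minimal subset chosen for illustration, not necessarily the technique used in the assignment; `normalize_url` and `DEFAULT_PORTS` are hypothetical names.

```python
from urllib.parse import urldefrag, urlsplit, urlunsplit

# Port suffixes that add no information for their scheme.
DEFAULT_PORTS = {"http": ":80", "https": ":443"}


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings share one key.

    Assumes URLs carry no user:password part, since the whole netloc is lowercased.
    """
    url, _fragment = urldefrag(url)          # "#section" never changes the page
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()            # host names are case-insensitive
    suffix = DEFAULT_PORTS.get(scheme)
    if suffix and netloc.endswith(suffix):   # e.g. "example.com:80" -> "example.com"
        netloc = netloc[: -len(suffix)]
    path = parts.path or "/"                 # "https://a.com" and "https://a.com/" match
    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

In a crawler, `normalize_url(link)` rather than the raw link would be stored in the visited set and queued in the frontier, so each page occupies a single entry.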

Skills and Knowledge Gained

Throughout this unit, I significantly honed my Python programming skills. I also improved my problem-solving abilities, particularly in troubleshooting code, and gained a deeper understanding of the web's structure, including the distinction between static and dynamic content, as discussed in Chu and Rosenthal's (1996) study.

Realizations About Myself as a Learner

This experience underscored my preference for learning through practical application. Integrating theoretical knowledge with hands-on projects helps me absorb material more deeply and effectively. This approach has not only reinforced my interest in the field but has also highlighted opportunities for growth and specialization within computer science.

Application of Concepts

The knowledge and skills acquired in Unit 7 are directly applicable to my career aspirations as a software engineer and to building my own edtech company. A firm grasp of how search engines function will benefit me when designing and implementing more efficient data retrieval algorithms.

Conclusion

In summary, the seventh unit of CS 3308 has been a transformative experience, expanding my understanding of web search engines and enhancing my technical expertise. The integration of theory and practice has solidified my interest in information retrieval systems. I look forward to applying these concepts in future academic and professional endeavors.

References

Bar-Ilan, J. (2004). The use of Web search engines in information science research. Annual Review of Information Science and Technology, 38, 231-288.

Chu, H., & Rosenthal, M. (1996). Search engines for the World Wide Web: A comparative study and evaluation methodology. In Proceedings of the Annual Meeting-American Society for Information Science (Vol. 33, pp. 127-135).

Manning, C. D., Raghavan, P., & Schütze, H. (2009). An Introduction to Information Retrieval (Online ed.). Cambridge University Press.

Risvik, K. M., & Michelsen, R. (2002). Search engines and web dynamics. Computer Networks, 39(3), 289-302.

Schwartz, C. (1998). Web search engines. Journal of the American Society for Information Science, 49(11), 973-982.