
CS 3308 Learning Journal Unit 7

The seventh unit of the CS 3308 course focused on Information Retrieval, where students created a web crawler using Python, enhancing their understanding of web search engines. The practical experience combined theoretical knowledge with hands-on application, leading to significant personal growth in programming and problem-solving skills. Feedback from peers and the challenges faced during the project reinforced the importance of optimization and ethical practices in web crawling.

Uploaded by

Reg

In this learning journal entry, I would like to express my reflections on the experiences

and insights that I acquired during the seventh unit of my CS 3308 course, which was focused on

Information Retrieval. Specifically, this unit delved into the intricacies of web search engines,

culminating in a practical exercise where we were tasked with constructing a simple web

crawler.

Description of Activities

The unit commenced with a thorough review of the provided learning guide and pertinent

literature. Notable works included Schwartz's (1998) discussion on the fundamental principles of

web search engines and Bar-Ilan's (2004) insights into their practical usage in information

science research. These resources laid the groundwork for understanding the evolution and

operational mechanisms of these systems.

Following this theoretical foundation, the practical component of the unit required the

creation of a web crawler. The objectives of this programming assignment were to write a Python

script that would take a URL input from the user, traverse the linked pages, extract relevant

content, and build an index of the collected information. The challenge was to adhere to specific

constraints, such as limiting the number of URLs processed to 500.
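The traversal described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script submitted for the assignment: the `crawl` function, the `LinkParser` class, and the injected `fetch` callable are hypothetical names, and the standard-library `html.parser` is swapped in for link extraction so the sketch runs without third-party packages. The 500-URL cap is the only detail taken from the assignment description.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

MAX_URLS = 500  # cap on processed URLs, taken from the assignment description


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_urls=MAX_URLS):
    """Breadth-first crawl starting at seed_url.

    `fetch(url)` is an assumed callable that returns the page's HTML as a
    string, or None when the page cannot be retrieved.
    Returns a dict mapping each fetched URL to its HTML.
    """
    queue = deque([seed_url])
    seen = {seed_url}   # URLs already queued, so none is visited twice
    pages = {}
    processed = 0
    while queue and processed < max_urls:
        url = queue.popleft()
        processed += 1
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Passing `fetch` in as a parameter keeps the traversal logic separate from network access, so the same function can be exercised against an in-memory dictionary of pages before it is pointed at a real site.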

The BeautifulSoup library was instrumental in parsing the HTML content effectively.

This project served as an integration of various elements from earlier units, particularly the

search engine that we had developed in Unit 5. The final deliverable included key performance

metrics such as the number of documents processed and the number of unique terms indexed.
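The indexing step and the two reported metrics can be illustrated with a small sketch. It makes several assumptions: `TextExtractor`, `build_index`, and `report` are hypothetical names, the tokenizer is a simple lowercase alphanumeric split, and the standard-library `html.parser` stands in for BeautifulSoup so the example has no external dependencies. The Unit 5 search engine may have structured its index differently.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def build_index(pages):
    """Build an inverted index: term -> set of URLs containing that term.

    `pages` maps URL -> HTML string (for example, the output of a crawler).
    """
    index = defaultdict(set)
    for url, html in pages.items():
        extractor = TextExtractor()
        extractor.feed(html)
        extractor.close()
        text = " ".join(extractor.chunks).lower()
        for term in re.findall(r"[a-z0-9]+", text):
            index[term].add(url)
    return index


def report(pages, index):
    """Return the two metrics mentioned in the journal entry."""
    return {"documents_processed": len(pages), "unique_terms": len(index)}
```

A call such as `report(pages, build_index(pages))` would then produce the document and term counts for a finished crawl.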

Reactions to Activities

The task of building a web crawler was initially quite overwhelming, yet it grew

increasingly rewarding as I progressed through the development process. The complexity of the

task, which encompassed both web scraping and indexing, presented a steep learning curve.

However, upon successful completion of the project, I felt a profound sense of achievement. This

hands-on project served as an excellent vehicle to apply the theoretical knowledge in a tangible

context, thereby reinforcing my comprehension of the underlying principles of information

retrieval systems.

Feedback and Interactions

Peer feedback played a critical role in refining my work. Through the peer assessments,

one of my classmates offered constructive criticism regarding the efficiency of my code, which

motivated me to revisit and optimize my implementation. This interaction underscored the

importance of considering optimization strategies, as highlighted by Manning et al. (2009) in

their textbook on information retrieval. Additionally, a colleague’s guidance on ethical best

practices, such as respecting the robots.txt protocol, was invaluable in shaping my approach to

web crawling.
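One way to honor robots.txt in Python is the standard-library `urllib.robotparser` module. The sketch below is illustrative only: the user-agent string and the `make_robot_checker` helper are hypothetical, and the rules are passed in as lines of text so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "cs3308-crawler"  # hypothetical user-agent name


def make_robot_checker(robots_txt_lines, user_agent=USER_AGENT):
    """Return a function that reports whether a URL may be fetched.

    `robots_txt_lines` is the content of a site's robots.txt, one rule per item.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt_lines)

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


# Example rules: every crawler is asked to stay out of /private/.
rules = ["User-agent: *", "Disallow: /private/"]
allowed = make_robot_checker(rules)
print(allowed("http://example.com/index.html"))         # True
print(allowed("http://example.com/private/data.html"))  # False
```

Against a live site, the parser can instead be given the file's location with `set_url()` and loaded with `read()`, after which a crawler would call `can_fetch()` before requesting each page.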

Feelings and Attitudes

Emotionally, this unit elicited a blend of excitement and apprehension. The excitement

stemmed from the prospect of creating a functioning web crawler, while the apprehension arose from

the fear of failing to meet the assignment's stringent requirements. Despite these challenges, the

successful completion of the project instilled in me a sense of pride and significantly bolstered

my self-assurance in my programming abilities.

Lessons Learned

This unit imparted several significant lessons about information retrieval systems and my

personal learning preferences. Firstly, I developed a more nuanced understanding of the

intricacies involved in indexing and retrieving web content. Risvik and Michelsen's (2002) work

on search engines and web dynamics emphasized the importance of these processes.

Additionally, I discovered that breaking down complex tasks into smaller, more manageable

components makes them less intimidating and improves my ability to tackle

them.

Surprises and Challenges

One unexpected aspect was the sheer amount of effort required to ensure accurate

indexing of web content. The challenge of managing memory usage while navigating large

websites was particularly demanding, necessitating careful planning and the application of

judicious coding techniques.

Skills and Knowledge Gained

Throughout the course of this unit, my skills in Python programming were significantly

honed. Furthermore, I improved my problem-solving abilities, particularly in the realm of

troubleshooting code. I also acquired a deeper comprehension of the web's structure,

distinguishing between static and dynamic content, as discussed in Chu and Rosenthal's (1996)

study.

Realizations About Myself as a Learner

This experience underscored my preference for learning through practical application.

Integrating theoretical knowledge with hands-on projects allows for a more profound and
effective assimilation of material. This approach has not only reinforced my interest in the field

but has also highlighted the potential for growth and specialization within computer science.

Application of Concepts

The knowledge and skills acquired from Unit 7 are directly applicable to my future career

aspirations as a software engineer and in building my own edtech company. Having a firm grasp

on how search engines function will undoubtedly benefit me in designing and implementing

more efficient data retrieval algorithms.

Conclusion

In summary, the seventh unit of CS 3308 has been a transformative experience,

expanding my understanding of web search engines and enhancing my technical expertise. The

integration of theory and practice has solidified my interest in the field of information retrieval

systems. I look forward to applying these concepts in future academic and professional

endeavors.

References

Bar-Ilan, J. (2004). The use of Web search engines in information science research. Annual

Review of Information Science and Technology, 38, 231-288.

Chu, H., & Rosenthal, M. (1996). Search engines for the World Wide Web: A comparative study

and evaluation methodology. In Proceedings of the Annual Meeting-American Society

for Information Science (Vol. 33, pp. 127-135).

Manning, C. D., Raghavan, P., & Schütze, H. (2009). An Introduction to Information Retrieval

(Online ed.). Cambridge University Press.

Risvik, K. M., & Michelsen, R. (2002). Search engines and web dynamics. Computer Networks,

39(3), 289-302.

Schwartz, C. (1998). Web search engines. Journal of the American Society for Information

Science, 49(11), 973-982.
