CS 3308 Learning Journal Unit 7
and insights that I acquired during the seventh unit of my CS 3308 course, which was focused on
Information Retrieval. Specifically, this unit delved into the intricacies of web search engines,
culminating in a practical exercise where we were tasked with constructing a simple web
crawler.
Description of Activities
The unit commenced with a thorough review of the provided learning guide and pertinent
literature. Notable works included Schwartz's (1998) discussion on the fundamental principles of
web search engines and Bar-Ilan's (2004) insights into their practical usage in information
science research. These resources laid the groundwork for understanding the evolution and
Following this theoretical foundation, the practical component of the unit required the
creation of a web crawler. The objectives of this programming assignment were to write a Python
script that would take a URL input from the user, traverse the linked pages, extract relevant
content, and build an index of the collected information. The challenge was to adhere to specific
The BeautifulSoup library was instrumental in parsing the HTML content effectively.
This project served as an integration of various elements from earlier units, particularly the
search engine that we had developed in Unit 5. The final deliverable included key performance
metrics such as the number of documents processed and the number of unique terms indexed.
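The workflow described above (take a start URL, follow links, extract text, build an index, report counts) can be sketched roughly as follows. This is an illustrative sketch rather than the submitted assignment: the names PageParser, fetch, crawl, fetch_page, and max_pages are hypothetical, and the standard-library html.parser module is swapped in for BeautifulSoup so that the example has no third-party dependencies.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects href links and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_parts.append(data)


def fetch(url):
    """Download a page as text (hypothetical helper: no retries, no delay)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl; returns (inverted_index, documents_processed)."""
    index = {}  # term -> set of URLs whose text contains that term
    seen = {start_url}
    queue = deque([start_url])
    processed = 0
    while queue and processed < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to download
        parser = PageParser()
        parser.feed(html)
        processed += 1
        text = " ".join(parser.text_parts).lower()
        for term in re.findall(r"[a-z0-9]+", text):
            index.setdefault(term, set()).add(url)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # absolute URL, no #fragment
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return index, processed
```

Under these assumptions, calling crawl with a URL typed by the user returns the index and the document count, and len(index) gives the number of unique terms. The max_pages cap and the seen set keep the crawl bounded, and fetch_page can be replaced with an in-memory lookup for testing without network access.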
Reactions to Activities
The task of building a web crawler was initially quite overwhelming, yet it grew
increasingly rewarding as I progressed through the development process. The complexity of the
task, which encompassed both web scraping and indexing, presented a steep learning curve.
However, upon successful completion of the project, I felt a profound sense of achievement. This
hands-on project served as an excellent vehicle to apply the theoretical knowledge in a tangible
retrieval systems.
Peer feedback played a critical role in refining my work. Through the peer assessments,
one of my classmates offered constructive criticism regarding the efficiency of my code, which
practices, such as respecting the robots.txt protocol, was invaluable in shaping my approach to
web crawling.
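As an illustration of the robots.txt practice mentioned above, the following minimal sketch uses Python's standard urllib.robotparser module to decide whether a URL may be fetched. The helper name build_robots_checker, the user agent string "JournalCrawler", and the sample rules are assumptions made for this example, not part of the assignment.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt, user_agent="JournalCrawler"):
    """Return a function that reports whether a URL may be crawled."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules already held in memory

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


# Hypothetical rules: every crawler is asked to stay out of /private/.
SAMPLE_RULES = """User-agent: *
Disallow: /private/
"""

allowed = build_robots_checker(SAMPLE_RULES)
```

For a live site, the rules would instead be loaded with the parser's set_url and read methods, and a crawler would call allowed(url) before requesting each page.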
Emotionally, this unit elicited a blend of excitement and apprehension. The excitement
stemmed from the prospect of creating a functioning web crawler, while the anxiety arose from
the fear of failing to meet the assignment's stringent requirements. Despite these challenges, the
successful completion of the project instilled in me a sense of pride and significantly bolstered
Lessons Learned
This unit imparted several significant lessons about information retrieval systems and my
intricacies involved in indexing and retrieving web content. Risvik and Michelsen's (2002) work
on search engines and web dynamics emphasized the importance of these processes.
Additionally, I discovered that breaking down complex tasks into smaller, more manageable
components can significantly reduce their intimidation factor and enhance my ability to tackle
them.
One unexpected aspect was the sheer amount of effort required to ensure accurate
indexing of web content. The challenge of managing memory usage while navigating large
websites was particularly demanding, necessitating careful planning and the application of
Throughout the course of this unit, my skills in Python programming were significantly
distinguishing between static and dynamic content, as discussed in Chu and Rosenthal's (1996)
study.
Integrating theoretical knowledge with hands-on projects allows for a more profound and
effective assimilation of material. This approach has not only reinforced my interest in the field
but has also highlighted the potential for growth and specialization within computer science.
Application of Concepts
The knowledge and skills acquired from Unit 7 are directly applicable to my future career
aspirations as a software engineer and to building my own edtech company. Having a firm grasp
of how search engines function will undoubtedly benefit me in designing and implementing
Conclusion
expanding my understanding of web search engines and enhancing my technical expertise. The
integration of theory and practice has solidified my interest in the field of information retrieval
systems. I look forward to applying these concepts in future academic and professional
endeavors.
References
Bar-Ilan, J. (2004). The use of Web search engines in information science research. Annual
Chu, H., & Rosenthal, M. (1996). Search engines for the World Wide Web: A comparative study
Manning, C. D., Raghavan, P., & Schütze, H. (2009). An Introduction to Information Retrieval
Risvik, K. M., & Michelsen, R. (2002). Search engines and web dynamics. Computer Networks,
39(3), 289-302.
Schwartz, C. (1998). Web search engines. Journal of the American Society for Information