Describe how mirrors-countme interfaces with external components #41

Open
5 tasks
nphilipp opened this issue May 10, 2023 · 0 comments
Labels
contributing (Improvements to contributing), documentation (Improvements or additions to documentation), maintenance (Something outside of a full-scale project)

Comments

@nphilipp
Member

nphilipp commented May 10, 2023

Story

As a contributor to mirrors-countme,
I want to know how it interfaces with external components,
so that I better understand how it works.

Acceptance Criteria

  • A document exists which describes these interactions of mirrors-countme scripts:
    • what feeds into them (e.g. scripts, cron jobs, files)
    • intermediate data like the raw_db
    • where outputs go and how they are used further
  • Nice to have: a diagram for the data flows (possibly break out into its own ticket?)

Background

This repository is only part of the picture; the rest lives in the Fedora Infrastructure Ansible repo on Pagure, in the web-data-analysis role (which also contains unrelated analytical functionality 🥳).

Findings so far (most if not all of this happens on the central log host):

  • Log files from servers in the infrastructure get collected, among them those of the various proxy hosts which let the outside world access mirrors.fedoraproject.org. Each proxy has a corresponding log file for this web service.
  • A script (which? ⇒ Ansible) merges the mirror log files from the various proxies into a combined log file (/mnt/fedora_stats/combined-http/$YEAR/$MONTH/$DAY) with a delay of several days (the reason for the delay still needs to be confirmed).
  • The log lines of hosts that ask to be counted are processed by countme-update-rawdb.sh (⇒ parse_access_log.py) into /var/lib/countme/raw.db.
  • The individual accesses are summarized into totals.db and totals.csv in the same place (by countme-update-totals.sh ⇒ countme-totals.py). The CSV file is tracked in Git.
  • Copies of the totals files are put into /var/www/html/csv-reports/countme.
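
The flow in the findings above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code of the actual scripts: the function names `combined_log_path` and `summarize`, the `hits` table with its columns, the zero-padded month and day, and the three-day delay are all hypothetical. Only the `/mnt/fedora_stats/combined-http/$YEAR/$MONTH/$DAY` layout comes from the findings.

```python
import sqlite3
from datetime import date, timedelta
from pathlib import PurePosixPath

# Path layout taken from the findings above; everything else is illustrative.
COMBINED_ROOT = PurePosixPath("/mnt/fedora_stats/combined-http")


def combined_log_path(day: date) -> PurePosixPath:
    """Return the combined log path for one day ($YEAR/$MONTH/$DAY)."""
    # Assumption: month and day are zero-padded in the real directory tree.
    return COMBINED_ROOT / f"{day.year:04d}" / f"{day.month:02d}" / f"{day.day:02d}"


def summarize(conn: sqlite3.Connection) -> list:
    """Collapse individual accesses into per-day totals."""
    # Hypothetical schema: one row per counted access, not the real raw.db layout.
    return conn.execute(
        "SELECT day, COUNT(*) FROM hits GROUP BY day ORDER BY day"
    ).fetchall()


if __name__ == "__main__":
    # The findings only say "several days"; 3 is a placeholder value.
    delay = timedelta(days=3)
    print(combined_log_path(date(2023, 5, 10) - delay))

    # In-memory stand-in for /var/lib/countme/raw.db.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (day TEXT, host TEXT)")
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?)",
        [("2023-05-07", "a"), ("2023-05-07", "b"), ("2023-05-08", "a")],
    )
    print(summarize(conn))
```

The two functions mirror the two stages described above: locating the combined log for a given day, then reducing individual accesses to totals. The real schema and delay would need to be taken from the scripts and the Ansible role.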
nphilipp added the documentation and contributing labels May 10, 2023
nphilipp added the maintenance label Jul 6, 2023