Legifrance: fix error on import & parse more fields #2995

Open
wants to merge 18 commits into
base: master


@Prometheos2 Prometheos2 commented Mar 15, 2023

Hello,

There was an error while parsing statutes from Légifrance (e.g., this page or this one).
It was also mentioned on the forum (in French): https://forums.zotero.org/discussion/97635/probleme-denregistrement-sur-le-site-legifrance

It was due to a layout(?) change: the title was no longer available at the previous XPath.
I updated the XPath and made a few other modifications, such as adding missing var declarations.
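
To illustrate how a "title is null" failure of this kind can be guarded against, here is a minimal sketch. It is not the code from this PR: it uses CSS selectors through `querySelector` rather than XPath for brevity, and `firstText`, `getTitle`, and the selectors in `TITLE_SELECTORS` are hypothetical names chosen for the example.

```javascript
// Hypothetical helper: try several selectors in order and return the first
// non-empty trimmed text, or null when none of them match.
function firstText(doc, selectors) {
	for (const selector of selectors) {
		const node = doc.querySelector(selector);
		const value = node && node.textContent ? node.textContent.trim() : "";
		if (value) return value;
	}
	return null;
}

// Hypothetical selectors; the real page structure has to be checked by hand.
const TITLE_SELECTORS = ["h1.main-title", "h1"];

// Fail with an explicit message instead of a TypeError further down the line.
function getTitle(doc) {
	const title = firstText(doc, TITLE_SELECTORS);
	if (title === null) {
		throw new Error("No title found; the page layout may have changed");
	}
	return title;
}
```

Listing an older selector after the newer one keeps the lookup working on pages that still use the previous layout, and the explicit error makes a future layout change easier to diagnose.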

I'm not sure I will implement all the changes I intended; it might be better to push the bug fixes early (and uni is more urgent).
I'm opening this PR as a draft mostly to get some feedback and the linter output (the linter doesn't work on my machine), and to let you push as soon as you wish to do so.

I would be glad to have some feedback, as I'm fairly new to JS.

List of changes

Bug fixes

  • Fix "Translation failed: TypeError: title is null" errors on statutes
  • Add dates on articles
  • Add URLs
  • Put law statutes' code number in the proper category

Feature changes

  • Add multiple selection for Codes (e.g., this page)
  • Add more fields?
    • fr-FR language
    • section
    • session
    • history
    • official code number?
    • decrees' data (i.e., NOR, JOR, text number)
    • original date, dependent on the page type
  • Download web snapshots
  • Download article text when available?
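
The multiple-selection item above can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: `linkSelector` is a hypothetical placeholder, and the function only builds the `{ url: title }` map that a translator would typically hand to `Zotero.selectItems`.

```javascript
// Collect { url: title } pairs from link-like nodes.
// `linkSelector` is a hypothetical placeholder for the real selector.
function getSearchResults(doc, linkSelector, checkOnly) {
	const items = {};
	let found = false;
	for (const link of doc.querySelectorAll(linkSelector)) {
		const href = link.href;
		// Collapse runs of whitespace so titles display cleanly.
		const title = (link.textContent || "").replace(/\s+/g, " ").trim();
		if (!href || !title) continue;
		// In check-only mode, one usable link is enough to answer "yes".
		if (checkOnly) return true;
		found = true;
		items[href] = title;
	}
	return found ? items : false;
}
```

Returning `false` when nothing usable is found lets the caller distinguish a single-item page from a listing page without a separate check.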

Code/lint changes

  • Add missing var declarations
  • Replace the deprecated processDocuments
  • Fix missing 'url' in scrapecase's signature / abnormal 'url' input from doWeb
  • Run lint and apply changes

Test changes

@Prometheos2 Prometheos2 marked this pull request as ready for review March 15, 2023 13:55
@Prometheos2
Author

I'm getting a "no header" error on every linter run, regardless of the file.
The regex seems fine, according to regex101.com.

@adam3smith
Collaborator

IIRC, "no header" warnings are typically caused by Windows line breaks (CRLF). The linter only works with LF. As you can tell from the automated lint, you definitely want to run this through lint --fix.
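
As a rough way to check a file for the line-ending problem described above, the following sketch detects CRLF sequences and normalizes them to LF. The function names `hasCRLF` and `toLF` are hypothetical, and this is only an illustration, not part of the linter.

```javascript
// True when the text contains at least one Windows-style line break.
function hasCRLF(text) {
	return text.includes("\r\n");
}

// Convert CRLF (and stray lone CR) line breaks to LF.
function toLF(text) {
	return text.replace(/\r\n?/g, "\n");
}
```

In practice, an editor setting or a Git attribute is usually the more durable fix; the functions only show what the conversion amounts to.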

@adam3smith
Collaborator

No, those latest changes don't make any sense -- we don't want all those empty lines in the code. Are you really not able to get lint --fix to work? Doing this manually is a complete waste of time.

@Prometheos2
Author

> Are you really not able to get lint --fix to work?

No, I tried switching to LF EOL, but it didn't fix the problem with the linter.
I tried making an .eslintrc file with the same rules, minus the blocking Zotero-plugin ones, but it resulted in those blank lines.

(Sorry for not replying to your earlier message; I thought I should try some solutions before coming back to you.)

@adam3smith
Collaborator

OK, let me run this through the linter once here

@adam3smith
Collaborator

OK, I think that should pass the CI. I haven't looked at the actual code changes at all, though, so those still require review.

@Prometheos2
Author

Thank you for the help.
It turns out I had misconfigured brace-style and padded-blocks.
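
For readers unfamiliar with these two ESLint rules, here is a hypothetical `.eslintrc.js` fragment showing how they are set. The option values are illustrative assumptions and are not copied from the repository's configuration.

```javascript
// Hypothetical .eslintrc.js fragment; values are illustrative only.
module.exports = {
	rules: {
		// Opening braces stay on the same line; `else` starts on its own line.
		"brace-style": ["error", "stroustrup", { allowSingleLine: true }],
		// Disallow blank lines at the start and end of blocks.
		"padded-blocks": ["error", "never"],
	},
};
```

Setting `padded-blocks` to `"always"` instead would make an autofix insert blank lines inside every block, which matches the kind of empty-line output discussed earlier in this thread.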

Is there a reference file you would recommend for comments and code style?
I kept the original comments in French, but it may be better to write them in English and follow a proper documentation style.

Prometheos2 and others added 18 commits March 20, 2023 23:11
to revert?
(done by VSC's autoformat)
  • Auto-update of tags for each test
  • Pinpoint missing items:
    • attachments (website's fault)
    • libraryCatalog (unused by case and statute)
    • accessDate (handled by Zotero?)
Update XPath
Update regex
Reconstruct nameOfAct to fit previous naming
Update and add related tests
The origdate XPath depends on the code; requires more study before implementation