Fix some number ranges were wrongly extracted as datetime ranges #1033

Closed
WujiaShi wants to merge 5 commits into microsoft:master from WujiaShi:Fix21385

WujiaShi (Contributor) commented Dec 5, 2018

Fix some number ranges that were wrongly extracted as datetime ranges, such as "one may", "twenty-one", and "two to four". The issue arises because CenturyRegex has too few constraints.

To filter out these cases, a new regex, CenturyPhraseEndRegex, was added. The idea is that a simple number (e.g. "twenty") in the middle of a sentence may still be part of a datetime range.

However, when it appears at the end of a phrase, it must be followed by another number word (e.g. "thousand", "hundred", etc.) and cannot be recognized as a datetime range by itself.
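
The difference between the two patterns can be illustrated with a small sketch. The pattern strings are copied from the diff in this PR; the `matchesWhole` helper is hypothetical, and compiling the patterns with the built-in `RegExp` rather than the library's own regex utilities is an assumption made only for illustration.

```typescript
// Pattern strings copied from the diff in this PR.
const CenturyRegex = `\\b(?<century>((one|two)\\s+thousand(\\s+and)?(\\s+(one|two|three|four|five|six|seven|eight|nine)\\s+hundred(\\s+and)?)?)|((twenty one|twenty two|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen|twenty)(\\s+hundred)?(\\s+and)?))\\b`;
const CenturyPhraseEndRegex = `\\b(?<century>((one|two)\\s+thousand(\\s+and)?(\\s+(one|two|three|four|five|six|seven|eight|nine)\\s+hundred(\\s+and)?)?)|((twenty one|twenty two|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen|twenty)(\\s+hundred)+(\\s+and)?))\\b`;

// Hypothetical helper (not part of the library): true if `pattern`
// matches the whole of `text`, ignoring case.
function matchesWhole(pattern: string, text: string): boolean {
  return new RegExp(`^(?:${pattern})$`, "i").test(text);
}

// Print how each sample fares against both patterns.
for (const sample of ["twenty", "one", "twenty hundred", "two thousand"]) {
  console.log(
    sample,
    matchesWhole(CenturyRegex, sample),
    matchesWhole(CenturyPhraseEndRegex, sample),
  );
}
```

Under these assumptions, a bare "twenty" or "one" matches CenturyRegex but not CenturyPhraseEndRegex, while "twenty hundred" and "two thousand" match both, because the new pattern requires at least one "hundred" after the small number.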

# Conflicts:
#	.NET/Microsoft.Recognizers.Definitions/English/DateTimeDefinitions.cs
#	Patterns/English/English-DateTime.yaml
#	Python/libraries/recognizers-date-time/recognizers_date_time/resources/english_date_time.py
#	Specs/DateTime/English/DateTimeModel.json
#	Specs/DateTime/English/DateTimeModelComplexCalendar.json
#	Specs/DateTime/English/DateTimeModelExperimentalMode.json
WujiaShi changed the title from "Fix21385" to "Fix some number ranges were wrongly extracted as datetime ranges" on Dec 5, 2018
export const ImplicitDayRegex = `(the\\s*)?(?<day>10th|11th|11st|12nd|12th|13rd|13th|14th|15th|16th|17th|18th|19th|1st|20th|21st|21th|22nd|22th|23rd|23th|24th|25th|26th|27th|28th|29th|2nd|30th|31st|3rd|4th|5th|6th|7th|8th|9th)\\b`;
export const MonthNumRegex = `(?<month>01|02|03|04|05|06|07|08|09|10|11|12|1|2|3|4|5|6|7|8|9)\\b`;
export const CenturyRegex = `\\b(?<century>((one|two)\\s+thousand(\\s+and)?(\\s+(one|two|three|four|five|six|seven|eight|nine)\\s+hundred(\\s+and)?)?)|((twenty one|twenty two|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen|twenty)(\\s+hundred)?(\\s+and)?))\\b`;
export const CenturyPhraseEndRegex = `\\b(?<century>((one|two)\\s+thousand(\\s+and)?(\\s+(one|two|three|four|five|six|seven|eight|nine)\\s+hundred(\\s+and)?)?)|((twenty one|twenty two|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen|twenty)(\\s+hundred)+(\\s+and)?))\\b`;
Collaborator

This duplication of regex seems strange. It would be better to re-write or break down CenturyRegex and FullTextYearRegex.
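
One possible way to avoid the duplication, sketched here as an illustration only: build both century patterns from shared fragments so that the "hundred" quantifier is the single point of difference. The names `ThousandPart`, `SmallNumberPart`, and `buildCenturyRegex` are hypothetical and do not appear in the PR. FullTextYearRegex is not shown in this diff, so it is left out of the sketch.

```typescript
// Hypothetical shared fragments, extracted from the two patterns in the diff.
const ThousandPart = `(one|two)\\s+thousand(\\s+and)?(\\s+(one|two|three|four|five|six|seven|eight|nine)\\s+hundred(\\s+and)?)?`;
const SmallNumberPart = `(twenty one|twenty two|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen|twenty)`;

// Hypothetical builder: `hundredQuantifier` is "?" when "hundred" is
// optional and "+" when at least one "hundred" is required.
function buildCenturyRegex(hundredQuantifier: "?" | "+"): string {
  return `\\b(?<century>(${ThousandPart})|(${SmallNumberPart}(\\s+hundred)${hundredQuantifier}(\\s+and)?))\\b`;
}

const CenturyRegex = buildCenturyRegex("?");
const CenturyPhraseEndRegex = buildCenturyRegex("+");

// Hypothetical helper: true if `pattern` matches the whole of `text`.
function matchesWhole(pattern: string, text: string): boolean {
  return new RegExp(`^(?:${pattern})$`, "i").test(text);
}

console.log(matchesWhole(CenturyRegex, "nineteen"));
console.log(matchesWhole(CenturyPhraseEndRegex, "nineteen"));
console.log(matchesWhole(CenturyPhraseEndRegex, "nineteen hundred"));
```

With this layout, a change to the number-word list would only need to be made in one place.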

"ReferenceDateTime": "2018-12-05T12:00:00"
},
"NotSupportedByDesign": "java",
"Results":[
Collaborator

This is incorrect. "May" here should not be considered a date range. If we want to add a spec, it should be for the correct behaviour.
So there are two alternatives: either keep this spec with an empty result ([]) and mark it as NotSupported for now, or have the PR fix the "may" false positive and update the spec's expected result accordingly.

tellarin (Collaborator) commented Dec 7, 2018

I see you've pushed a new version as #1044. I'll close this one then.
In future PRs, please push changes/fixes to the same PR.
Thanks.

tellarin closed this Dec 7, 2018