NIFI-4907: add 'view provenance' component policy#2703
markobean wants to merge 7 commits into apache:master
|
Fixed conflicts with master. |
|
Will review... |
mcgilman
left a comment
Thanks for the PR @markobean! I did a first pass on the code here. I haven't had a chance to run it and evaluate different scenarios functionally yet. I'll be doing that soon.
I know there was some discussion on the JIRA regarding what was part of the event model and whatnot, but since we're introducing these finer grain controls, I think it does also make sense to add the check prior to populating the component details. Thanks again.
| ---- | ||
| nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$ | ||
| nifi.security.identity.mapping.value.dn=$1@$2 | ||
| nifi.security.identity.mapping.transform.dn=NONE |
Did you intend to remove this?
Somehow, there was a bad rebase to master which removed some recently modified lines. Re-rebased to master.
| nifi.security.identity.mapping.transform.dn=NONE | ||
| nifi.security.identity.mapping.pattern.kerb=^(.*?)/instance@(.*?)$ | ||
| nifi.security.identity.mapping.value.kerb=$1@$2 | ||
| nifi.security.identity.mapping.transform.kerb=NONE |
Did you intend to remove this?
Somehow, there was a bad rebase to master which removed some recently modified lines. Re-rebased to master.
|
|
||
| The last segment of each property is an identifier used to associate the pattern with the replacement value. When a user makes a request to NiFi, their identity is checked to see if it matches each of those patterns in lexicographical order. For the first one that matches, the replacement specified in the `nifi.security.identity.mapping.value.xxxx` property is used. So a login with `CN=localhost, OU=Apache NiFi, O=Apache, L=Santa Monica, ST=CA, C=US` matches the DN mapping pattern above and the DN mapping value `$1@$2` is applied. The user is normalized to `localhost@Apache NiFi`. | ||
|
|
||
| In addition to mapping a transform may be applied. The supported versions are NONE (no transform applied), LOWER (identity lowercased), and UPPER (identity uppercased). If not specified, the default value is NONE. |
Did you intend to remove this?
Somehow, there was a bad rebase to master which removed some recently modified lines. Re-rebased to master.
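For readers following the documentation lines quoted in this thread, here is a minimal sketch of how the identity mapping described above could be evaluated. It is not NiFi's implementation: the class `IdentityMappingSketch`, its `Mapping` holder, and the `exampleMappings()` helper are hypothetical names, and returning the identity unchanged when no pattern matches is an assumption. Only the two patterns, the `$1@$2` values, and the NONE/LOWER/UPPER transforms come from the quoted properties.

```java
import java.util.Locale;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not NiFi's actual identity mapping code.
final class IdentityMappingSketch {

    enum Transform { NONE, LOWER, UPPER }

    /** One pattern/value/transform triple, keyed elsewhere by its trailing identifier (e.g. "dn"). */
    static final class Mapping {
        final Pattern pattern;
        final String value;
        final Transform transform;

        Mapping(String pattern, String value, Transform transform) {
            this.pattern = Pattern.compile(pattern);
            this.value = value;
            this.transform = transform;
        }
    }

    /** Applies the first matching mapping; a TreeMap iterates its keys in lexicographical order. */
    static String mapIdentity(String identity, TreeMap<String, Mapping> mappings) {
        for (Mapping mapping : mappings.values()) {
            Matcher matcher = mapping.pattern.matcher(identity);
            if (matcher.matches()) {
                String mapped = matcher.replaceAll(mapping.value);
                switch (mapping.transform) {
                    case LOWER: return mapped.toLowerCase(Locale.ROOT);
                    case UPPER: return mapped.toUpperCase(Locale.ROOT);
                    default:    return mapped;
                }
            }
        }
        // Assumption: an identity that matches no pattern is used as-is.
        return identity;
    }

    /** The two mappings quoted in the diff context of this thread. */
    static TreeMap<String, Mapping> exampleMappings() {
        TreeMap<String, Mapping> mappings = new TreeMap<>();
        mappings.put("dn", new Mapping(
                "^CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$", "$1@$2", Transform.NONE));
        mappings.put("kerb", new Mapping("^(.*?)/instance@(.*?)$", "$1@$2", Transform.NONE));
        return mappings;
    }

    public static void main(String[] args) {
        String dn = "CN=localhost, OU=Apache NiFi, O=Apache, L=Santa Monica, ST=CA, C=US";
        System.out.println(mapIdentity(dn, exampleMappings())); // prints "localhost@Apache NiFi"
    }
}
```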
| provenancePolicies.add(new RoleAccessPolicy(ResourceType.Provenance.getValue(), READ_ACTION)); | ||
| if (rootGroupId != null) { | ||
| provenancePolicies.add(new RoleAccessPolicy(ResourceType.Data.getValue() + ResourceType.ProcessGroup.getValue() + "/" + rootGroupId, READ_ACTION)); | ||
| provenancePolicies.add(new RoleAccessPolicy(ResourceType.ProvenanceData.getValue() + ResourceType.ProcessGroup.getValue() + "/" + rootGroupId, READ_ACTION)); |
In order to be consistent with our 0.x concept of provenance access, this should include both ProvenanceData and Data.
Agree. Added Data back in.
| LoggableComponent<Processor> processor; | ||
|
|
||
| // make sure the first reference to LogRepository happens outside of a NarCloseable so that we use the framework's ClassLoader | ||
| final LogRepository logRepository = LogRepositoryFactory.getRepository(id); |
I don't think we can move this line. This needs to happen outside of the NarCloseable. Please refer to the JIRA under which it was added for additional information: https://issues.apache.org/jira/browse/NIFI-5136
| boolean creationSuccessful = true; | ||
|
|
||
| // make sure the first reference to LogRepository happens outside of a NarCloseable so that we use the framework's ClassLoader | ||
| final LogRepository logRepository = LogRepositoryFactory.getRepository(id); |
I don't think we can move this line. This needs to happen outside of the NarCloseable. Please refer to the JIRA under which it was added for additional information: https://issues.apache.org/jira/browse/NIFI-5136
| @Override | ||
| public ControllerServiceNode createControllerService(final String type, final String id, final BundleCoordinate bundleCoordinate, final Set<URL> additionalUrls, final boolean firstTimeAdded) { | ||
| // make sure the first reference to LogRepository happens outside of a NarCloseable so that we use the framework's ClassLoader | ||
| final LogRepository logRepository = LogRepositoryFactory.getRepository(id); |
I don't think we can move this line. This needs to happen outside of the NarCloseable. Please refer to the JIRA under which it was added for additional information: https://issues.apache.org/jira/browse/NIFI-5136
| dataAuthorizable = flowController.createLocalDataAuthorizable(event.getComponentId()); | ||
| } | ||
| dataAuthorizable.authorize(authorizer, RequestAction.READ, NiFiUserUtils.getNiFiUser(), attributes); | ||
| // If not authorized for 'view the data', create only summarized provenance event |
I believe the event summaries are what's necessary to populate the table. However, even if the user does not have 'view the data' they can still open the event dialog. Shouldn't we be returning more than a summary? The event should include everything but the attributes and content fields. Piggybacking on the summarization concept could inadvertently change this if we ever change what comprises a summary (if we change the table for instance).
The summarized event does seem to exclude other details that do not fall under 'view the data' (i.e. attributes and content.) For example, event duration and parent/child UUIDs. It seems either more event details besides lineageStartDate need to be moved out of the "if (!summarized)" block, or... what else would you suggest? A new method to generate the ProvenanceEventDTO which explicitly excludes all attributes and content?
@markobean The summary concept was introduced for performance reasons [1]. The summary represents the details required to render a row in the table. Some events can contain a lot of details (many children/parents UUIDs, flowfile attributes, etc) which was causing the table to load extremely slowly. The fully populated event (not summary) is returned once a dialog is opened and those details can be rendered.
My suggestion would be to not modify the summary concept. Returning more details in the summary for users with access to the event but not the data will begin to regress NIFI-1135. Artificially withholding event fields they should have access to also doesn't seem right.
Since we're moving to this super granular approach, I would recommend the following.
- createProvenanceEventDto(...) is only invoked once we know the user has permissions to the event.
- Within createProvenanceEventDto(...) I would check if the user is allowed to access that component to populate the component details. If the user does not have access, I would use the ID in place of the name and 'Processor' in place of the fully qualified class name (for Processors).
- Within createProvenanceEventDto(...) I would check if the user is allowed to access the component's data to populate the attributes and content details. If the user does not have access, I would leave those fields unset.
This should retain the summary concept while introducing the granular approach we're looking for. Thoughts?
[1] https://issues.apache.org/jira/browse/NIFI-1135
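The three recommendations above can be sketched roughly as follows. This is an illustration only, not the code in the PR: `GranularEventDtoSketch`, its `Event` and `EventDto` holders, and the two predicates standing in for the component and data policy checks are all hypothetical. It assumes the caller has already authorized access to the event itself.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of the granular population logic; names do not match NiFi's classes.
final class GranularEventDtoSketch {

    /** Simplified stand-in for a provenance event record. */
    static final class Event {
        final String componentId;
        final String componentName;
        final String componentType;
        final Map<String, String> attributes;

        Event(String componentId, String componentName, String componentType, Map<String, String> attributes) {
            this.componentId = componentId;
            this.componentName = componentName;
            this.componentType = componentType;
            this.attributes = attributes;
        }
    }

    /** Simplified stand-in for the DTO; attributes stays null when it is not populated. */
    static final class EventDto {
        String componentId;
        String componentName;
        String componentType;
        Map<String, String> attributes;
    }

    /**
     * Builds the DTO for an event the caller has ALREADY authorized.
     * canViewComponent stands in for the component policy check,
     * canViewData for the data policy check.
     */
    static EventDto createProvenanceEventDto(Event event, boolean summarize,
                                             Predicate<String> canViewComponent,
                                             Predicate<Event> canViewData) {
        EventDto dto = new EventDto();
        dto.componentId = event.componentId;

        if (canViewComponent.test(event.componentId)) {
            dto.componentName = event.componentName;
            dto.componentType = event.componentType;
        } else {
            // Without the component policy: ID in place of the name, generic type in place of the class name.
            dto.componentName = event.componentId;
            dto.componentType = "Processor";
        }

        // The summary never carries attributes, so the data check is skipped when summarizing.
        if (!summarize && canViewData.test(event)) {
            dto.attributes = event.attributes;
        }
        return dto;
    }
}
```

With this shape, a user lacking the data policy still receives a non-summary DTO; only the data-backed fields are left unset.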
My only concern with the approach you outlined is the additional authorization calls needed to determine "if the user is allowed". What you suggest requires up to 2 additional authorizations per provenance event. Already on busy systems, we have observed authorizing the user to each provenance event as a limiting factor (it can result in provenance becoming unusable).
Having said that, unless you think of another approach which would require fewer authorization calls, I'll proceed as you recommend. I suspect there may be a future JIRA ticket to address the provenance query/authorization impact anyhow; if so, this can be addressed at that time. We won't know for sure if this is a problem until we get the current fix into an appropriately loaded test environment.
The original JIRA called to make this more granular because using the data policies was too blunt. In the PR as-is, for each event it appears that we authorize the event and then authorize the data policies twice. We are authorizing the data policy to determine if we should summarize and then again to determine if replay is authorized. The replay portion is not changed/new in this PR but is an area for improvement we could make now.
Since we're taking this more granular approach I agree with your originally filed JIRA to add the additional component based check. This shouldn't introduce too much additional cost. The component checks do not consider flow file attributes and the results should be easily cached.
Another improvement that I didn't call out specifically above, is that we really only need to check the data policies if we are not summarizing. Whether the user is approved for data of a component would only be relevant if we were returning the fully populated event.
In order to return the summary, we only need to check the policies for the event and the component. Like the component policies, I don't think the flow file attributes would need to be considered for the event policies. I believe the attributes would only need to be considered for the data policies where we are actually returning the attributes and content. This should help with some of the performance concerns regarding frequent authorization.
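To illustrate why component checks "should be easily cached": because they do not depend on flowfile attributes, the answer is the same for every event produced by one component. A minimal sketch of memoizing such a check for the duration of a single query follows. The class `CachedComponentCheckSketch` and its delegate predicate are hypothetical, and nothing here is taken from NiFi's authorizer; cache invalidation when policies change is deliberately out of scope.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical per-query memoization of a component-level check; not NiFi's authorizer.
final class CachedComponentCheckSketch {

    private final Predicate<String> delegate;          // the (possibly expensive) real check
    private final Map<String, Boolean> cache = new HashMap<>();
    private int delegateCalls;                         // counts cache misses, for illustration

    CachedComponentCheckSketch(Predicate<String> delegate) {
        this.delegate = delegate;
    }

    /** Returns the cached decision for this component, consulting the delegate only on a miss. */
    boolean isApproved(String componentId) {
        return cache.computeIfAbsent(componentId, id -> {
            delegateCalls++;
            return delegate.test(id);
        });
    }

    int getDelegateCalls() {
        return delegateCalls;
    }
}
```

Under these assumptions, loading a table of 1000 summaries that come from a handful of components costs a handful of component checks rather than 1000. Such a cache is intended to live for one request and one user.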
| } | ||
|
|
||
| // lineage duration | ||
| if (event.getLineageStartDate() > 0) { |
If we don't piggyback off of summarization, I believe this can be moved back.
lineage duration was pulled out specifically because there was a case in which the duration was not properly populated. This was during early testing and may now be corrected by other changes. I will try to replicate.
| eventAuthorizable = resourceFactory.createLocalDataAuthorizable(event.getComponentId()); | ||
| } | ||
| final Authorizable eventAuthorizable = resourceFactory.createProvenanceDataAuthorizable(event.getComponentId()); | ||
| eventAuthorizable.authorize(authorizer, RequestAction.READ, user, event.getAttributes()); |
I don't think the attributes are necessary here. I'm pretty sure the event attributes are only necessary when authorizing access to attributes/content.
|
When calling getEvent() from the provenance repository, the user is authorized for the event (including component level authorization). See ControllerFacade.java:1353. This getEvent() call happens prior to createProvenanceEventDto(), so it would be redundant to authorize the user for the event inside createProvenanceEventDto(): any unauthorized events will have already been filtered out.
The original approach was to exclude all events from a provenance query result for which the user is not authorized (e.g. the user is not in the 'view provenance' component level policy). Therefore, it should not be necessary to perform your point #2 above.
For point #3, as a slight refactor I've renamed authorizeReplay() to authorizeData() and removed the duplicate authorization block from getProvenanceEvent(). Instead, createProvenanceEventDto() will perform the data authorization prior to the if !summarize block. In this way, the event will need to be authorized for data access as well as not summarized in order for the dto to populate the attributes and content.
I also updated some authorization unit tests with more detailed expected results, and rebased to master.
|
@markobean I just ran your PR and I'm not seeing the same behavior you are describing. Even without the component policy, I'm able to view the provenance event. This is the behavior I was expecting to see following the discussion on the JIRA. It's possible that we're using the same language to refer to different things. Let me try to elaborate/clarify a bit here.
/processors/1234 - component policy (controls access to a component and its config)
The line you referenced should only verify access to the component provenance event (and it appears that's how it's working). It should not be checking the component policy. My suggestion was to additionally check the component policy prior to populating the component details (
With your most recent changes, I'm not sure its functionality is different than before. It seems that it would be impossible to get a non-summarized event without permissions to the data of the component. I think we only need to verify permissions to data of a component for the attributes and the content specific fields. Other fields should be ok, allowing for a non-summarized event for folks without access to a component's data. It appears that
Also, it does not seem like you updated or replied to my comment regarding the need to include the flowfile attributes when authorizing access to component's provenance events. I think these are only necessary when authorizing access to a component's data.
Please do not squash additional commits. It makes it difficult to review when I cannot easily see the incremental changes. Thanks!
mcgilman
left a comment
The behavior here is really close. The commented-out code can be cleaned up and I've outlined a few suggestions. Thanks again.
| return new ArrayList<>(provenanceRepository.getEvents(firstEventId, maxRecords)); | ||
| } | ||
|
|
||
| public AuthorizationResult checkConnectableAuthorization(final String componentId) { |
I don't believe this is called.
Correct. This was moved to ControllerFacade.java. I will remove it from FlowController.java.
| }; | ||
| // try { | ||
| // AuthorizationResult result = flowController.checkConnectableAuthorization(event.getComponentId()); | ||
| AuthorizationResult result = checkConnectableAuthorization(event.getComponentId()); |
Why not check the authorization within setComponentDetails? In there you already have the components to authorize and you'll know the corresponding type.
| if (Result.Denied.equals(result.getResult())) { | ||
| dto.setComponentType("Processor"); // is this always a Processor? | ||
| dto.setComponentName(dto.getComponentId()); | ||
| dto.setEventType("UNKNOWN"); |
Do you think that we need to redact the event type when the user does not have permissions to the component policy? I would have considered this field under the new provenance event policy.
If we choose not to redact the event type, that makes life easier. Currently, it displays "UNKNOWN" in the table (when 'view provenance' is enabled and 'view the component' is not). But the event type IS displayed in the lineage graph. We need to get to consistency one way or the other on this. I'm leaning towards allowing the event type info to be visible since this is a characteristic of provenance (i.e. 'view provenance') and not a characteristic of 'view the component'.
Yes, I agree. The event type should be controlled by the new provenance event policy. It is not controlled by the component policy that protects the component name and component type.
|
|
||
| final SortedSet<AttributeDTO> attributes = new TreeSet<>(attributeComparator); | ||
| // authorizeData(event); | ||
| final AuthorizationResult dataResult = checkAuthorizationForData(event); //(authorizer, RequestAction.READ, user, event.getAttributes()); |
We only need to authorize for the data if the event is a non-summary. For instance, when we're pulling back 1000 summaries to load the provenance table we don't need to check any data policies.
Also, it appears that checkAuthorizationForData verifies READ to the data of the corresponding component. This check is already done as part of the checkAuthorizationForReplay method, and it appears that is the only place the replay authorization check is performed. It likely makes sense to refactor some of this so that we only check permissions for READ to the data of the corresponding component once. The remainder of the replay authorization check only needs to be performed when we're populating the data fields (that is, when READ to the data of the corresponding component is approved). See below.
| final Map<String, String> updatedAttrs = event.getUpdatedAttributes(); | ||
| final Map<String, String> previousAttrs = event.getPreviousAttributes(); | ||
| // only include all details if not summarizing and approved | ||
| if (!summarize && Result.Approved.equals(dataResult.getResult())) { |
If the user is not authorized for the data of a component we should still be able to return a non-summary. In this case, we should just leave out any of the data fields in the ProvenanceEventDto. I would consider the following fields data fields, as they are associated with attributes, content, or replay (all of which require data policies to execute).
private Collection<AttributeDTO> attributes;
private Boolean contentEqual;
private Boolean inputContentAvailable;
private String inputContentClaimSection;
private String inputContentClaimContainer;
private String inputContentClaimIdentifier;
private Long inputContentClaimOffset;
private String inputContentClaimFileSize;
private Long inputContentClaimFileSizeBytes;
private Boolean outputContentAvailable;
private String outputContentClaimSection;
private String outputContentClaimContainer;
private String outputContentClaimIdentifier;
private Long outputContentClaimOffset;
private String outputContentClaimFileSize;
private Long outputContentClaimFileSizeBytes;
private Boolean replayAvailable;
private String replayExplanation;
private String sourceConnectionIdentifier;
| return dataAuthorizable.checkAuthorization(authorizer, RequestAction.READ, user, eventAttributes); | ||
| } | ||
|
|
||
| private AuthorizationResult checkAuthorizationForProvenanceData(final ProvenanceEventRecord event) { |
I modified this method and checkConnectableAuthorization() to accommodate a Process Group being the event component. This is the case for DOWNLOAD provenance events.
|
I believe the latest changes are demonstrating the intended functionality. Provenance events are only listed in the query results if the user has 'view provenance' on the corresponding component; flowfile content in the event details is only visible based on 'view the data' policy; component name and type is only visible based on 'view the component' policy (replaced with generic info such as UUID in place of name when policy is lacking.) I'll do some further testing today. |
| if (rootGroup.getIdentifier().equals(componentId)) { | ||
| return rootGroup.checkAuthorization(authorizer, RequestAction.READ, user); | ||
| } | ||
| Connectable connectable = rootGroup.findLocalConnectable(componentId); |
Will findLocalConnectable() versus findProcessor() include connections as well? If so, then this should return to findProcessor() to account for connections and subsequently finding the connection's source component.
|
Things are looking pretty good. I'd like to propose a few additional changes which I've implemented here [1]. Please review them and let me know your thoughts. Thanks! [1] mcgilman@eed1be3 |
|
I like the proposed changes. It makes the authorization process a bit cleaner. I ran through several of the same tests I performed previously and confirmed its functionality. |
|
Thanks for having a look. I'll include these when I merge in your changes. |
- Minor adjustments following PR.
- Avoiding additional find operation when authorizing components when populating component details.
- Requiring access to provenance events when downloading content or submitting a replay as they may provide events details.
- Updating the REST API docs detailing the required permissions.
- Updating the wording in the documentation regarding the provenance and data policies.
- Removed the event attributes from the authorization calls that were verifying access to provenance events.
- Only checking content availability when the user is authorized for the components data.
- Addressing typo in JavaDoc.
This closes apache#2703
|
Thanks @markobean! This has been merged to master. |