NIFI-4907: add 'view provenance' component policy #2703

Closed
markobean wants to merge 7 commits into apache:master from markobean:NIFI-4907

Conversation

@markobean
Contributor

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

@markobean
Contributor Author

Fixed conflicts with master.
Added NIFI-5207 since it is from the same policy refactor and required minimal code change.

@mcgilman
Contributor

Will review...

@mcgilman
Contributor

mcgilman left a comment

Thanks for the PR @markobean! I did a first pass on the code here. I haven't had a chance to run it and evaluate the different scenarios functionally yet. I'll be doing that soon.

I know there was some discussion on the JIRA regarding what was part of the event model and whatnot, but since we're introducing these finer-grained controls, I think it also makes sense to add the check prior to populating the component details. Thanks again.

----
nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$
nifi.security.identity.mapping.value.dn=$1@$2
nifi.security.identity.mapping.transform.dn=NONE
Contributor

Did you intend to remove this?

Contributor Author

Somehow, there was a bad rebase to master which removed some recently modified lines. Re-rebased to master.

nifi.security.identity.mapping.transform.dn=NONE
nifi.security.identity.mapping.pattern.kerb=^(.*?)/instance@(.*?)$
nifi.security.identity.mapping.value.kerb=$1@$2
nifi.security.identity.mapping.transform.kerb=NONE
Contributor

Did you intend to remove this?

Contributor Author

Somehow, there was a bad rebase to master which removed some recently modified lines. Re-rebased to master.


The last segment of each property is an identifier used to associate the pattern with the replacement value. When a user makes a request to NiFi, their identity is checked to see if it matches each of those patterns in lexicographical order. For the first one that matches, the replacement specified in the `nifi.security.identity.mapping.value.xxxx` property is used. So a login with `CN=localhost, OU=Apache NiFi, O=Apache, L=Santa Monica, ST=CA, C=US` matches the DN mapping pattern above and the DN mapping value `$1@$2` is applied. The user is normalized to `localhost@Apache NiFi`.

In addition to mapping a transform may be applied. The supported versions are NONE (no transform applied), LOWER (identity lowercased), and UPPER (identity uppercased). If not specified, the default value is NONE.
Contributor

Did you intend to remove this?

Contributor Author

Somehow, there was a bad rebase to master which removed some recently modified lines. Re-rebased to master.
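
For reference, the identity mapping rule described in the documentation text quoted in this thread can be sketched in Java with `java.util.regex`. This is only an illustration under stated assumptions: the class and method names (`IdentityMappingSketch`, `mapIdentity`) are hypothetical and are not NiFi's implementation; only the pattern, the `$1@$2` value, and the NONE/LOWER/UPPER transforms come from the quoted properties and text.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdentityMappingSketch {

    // Hypothetical helper: applies one pattern/value/transform triple to an identity.
    static String mapIdentity(String identity, String pattern, String value, String transform) {
        Matcher matcher = Pattern.compile(pattern).matcher(identity);
        if (!matcher.matches()) {
            return identity; // no match: the identity is returned unchanged
        }
        // $1, $2, ... in the value refer to the capture groups of the pattern
        String mapped = matcher.replaceAll(value);
        switch (transform) {
            case "LOWER":
                return mapped.toLowerCase(Locale.ROOT);
            case "UPPER":
                return mapped.toUpperCase(Locale.ROOT);
            default:
                return mapped; // NONE: no transform applied
        }
    }

    public static void main(String[] args) {
        String dn = "CN=localhost, OU=Apache NiFi, O=Apache, L=Santa Monica, ST=CA, C=US";
        String pattern = "^CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$";
        // Matches the normalized identity given in the quoted documentation.
        System.out.println(mapIdentity(dn, pattern, "$1@$2", "NONE"));
    }
}
```

The sketch handles a single mapping; trying several patterns in lexicographical order and stopping at the first match, as the documentation describes, would be a loop around this helper.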

provenancePolicies.add(new RoleAccessPolicy(ResourceType.Provenance.getValue(), READ_ACTION));
if (rootGroupId != null) {
provenancePolicies.add(new RoleAccessPolicy(ResourceType.Data.getValue() + ResourceType.ProcessGroup.getValue() + "/" + rootGroupId, READ_ACTION));
provenancePolicies.add(new RoleAccessPolicy(ResourceType.ProvenanceData.getValue() + ResourceType.ProcessGroup.getValue() + "/" + rootGroupId, READ_ACTION));
Contributor

In order to be consistent with our 0.x concept of provenance access, this should include both ProvenanceData and Data.

Contributor Author

Agree. Added Data back in.


LoggableComponent<Processor> processor;

// make sure the first reference to LogRepository happens outside of a NarCloseable so that we use the framework's ClassLoader
final LogRepository logRepository = LogRepositoryFactory.getRepository(id);
Contributor

I don't think we can move this line. This needs to happen outside of the NarCloseable. Please refer to the JIRA under which it was added for additional information: https://issues.apache.org/jira/browse/NIFI-5136

boolean creationSuccessful = true;

// make sure the first reference to LogRepository happens outside of a NarCloseable so that we use the framework's ClassLoader
final LogRepository logRepository = LogRepositoryFactory.getRepository(id);
Contributor

I don't think we can move this line. This needs to happen outside of the NarCloseable. Please refer to the JIRA under which it was added for additional information: https://issues.apache.org/jira/browse/NIFI-5136

@Override
public ControllerServiceNode createControllerService(final String type, final String id, final BundleCoordinate bundleCoordinate, final Set<URL> additionalUrls, final boolean firstTimeAdded) {
// make sure the first reference to LogRepository happens outside of a NarCloseable so that we use the framework's ClassLoader
final LogRepository logRepository = LogRepositoryFactory.getRepository(id);
Contributor

I don't think we can move this line. This needs to happen outside of the NarCloseable. Please refer to the JIRA under which it was added for additional information: https://issues.apache.org/jira/browse/NIFI-5136

dataAuthorizable = flowController.createLocalDataAuthorizable(event.getComponentId());
}
dataAuthorizable.authorize(authorizer, RequestAction.READ, NiFiUserUtils.getNiFiUser(), attributes);
// If not authorized for 'view the data', create only summarized provenance event
Contributor

I believe the event summaries are what's necessary to populate the table. However, even if the user does not have 'view the data' they can still open the event dialog. Shouldn't we be returning more than a summary? The event should include everything but the attributes and content fields. Piggybacking on the summarization concept could inadvertently change this if we ever change what comprises a summary (if we change the table for instance).

Contributor Author

The summarized event does seem to exclude other details that do not fall under 'view the data' (i.e. attributes and content.) For example, event duration and parent/child UUIDs. It seems either more event details besides lineageStartDate need to be moved out of the "if (!summarized)" block, or... what else would you suggest? A new method to generate the ProvenanceEventDTO which explicitly excludes all attributes and content?

Contributor

@markobean The summary concept was introduced for performance reasons [1]. The summary represents the details required to render a row in the table. Some events can contain a lot of details (many children/parents UUIDs, flowfile attributes, etc) which was causing the table to load extremely slowly. The fully populated event (not summary) is returned once a dialog is opened and those details can be rendered.

My suggestion would be to not modify the summary concept. Returning more details in the summary for users with access to the event but not the data will begin to regress NIFI-1135. Artificially withholding event fields they should have access to also doesn't seem right.

Since we're moving to this super granular approach, I would recommend the following.

  1. createProvenanceEventDto(...) is only invoked once we know the user has permissions to the event.
  2. Within createProvenanceEventDto(...) I would check if the user is allowed to access that component to populate the component details. If the user does not have access, I would use the ID in place of the name and 'Processor' in place of the fully qualified class name (for Processors).
  3. Within createProvenanceEventDto(...) I would check if the user is allowed to access the component's data to populate the attributes and content details. If the user does not have access, I would leave those fields unset.

This should retain the summary concept while introducing the granular approach we're looking for. Thoughts?

[1] https://issues.apache.org/jira/browse/NIFI-1135
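
The three steps recommended above can be sketched roughly as follows. This is a simplified illustration, not the actual NiFi code: `Access`, `Event`, `EventDto`, and `createDto` are hypothetical stand-ins, the `summarize` flag models the existing summary concept, and the placeholder type is always "Processor" even though the suggestion only covers Processors.

```java
import java.util.Collections;
import java.util.Map;

public class GranularEventDtoSketch {

    // Hypothetical stand-in for the component and data policy checks.
    interface Access {
        boolean canViewComponent(String componentId);
        boolean canViewData(String componentId);
    }

    // Hypothetical, heavily simplified provenance event.
    static class Event {
        final String componentId;
        final String componentName;
        final String componentType;
        final Map<String, String> attributes;

        Event(String componentId, String componentName, String componentType,
              Map<String, String> attributes) {
            this.componentId = componentId;
            this.componentName = componentName;
            this.componentType = componentType;
            this.attributes = attributes;
        }
    }

    // Hypothetical, heavily simplified DTO; unset fields stay null.
    static class EventDto {
        String componentId;
        String componentName;
        String componentType;
        Map<String, String> attributes;
    }

    // Step 1 is assumed to have happened already: callers only pass events
    // the user is permitted to see.
    static EventDto createDto(Event event, Access access, boolean summarize) {
        EventDto dto = new EventDto();
        dto.componentId = event.componentId;

        // Step 2: component details only with access to the component;
        // otherwise fall back to the ID and a generic type.
        if (access.canViewComponent(event.componentId)) {
            dto.componentName = event.componentName;
            dto.componentType = event.componentType;
        } else {
            dto.componentName = event.componentId;
            dto.componentType = "Processor";
        }

        // Step 3: attributes only for a full (non-summary) event and only
        // with access to the component's data; otherwise left unset.
        if (!summarize && access.canViewData(event.componentId)) {
            dto.attributes = event.attributes;
        }
        return dto;
    }

    public static void main(String[] args) {
        Event event = new Event("1234", "MyProcessor", "org.example.MyProcessor",
                Collections.singletonMap("filename", "a.txt"));
        Access eventOnly = new Access() {
            @Override public boolean canViewComponent(String id) { return false; }
            @Override public boolean canViewData(String id) { return false; }
        };
        EventDto dto = createDto(event, eventOnly, false);
        System.out.println(dto.componentName + " / " + dto.componentType + " / " + dto.attributes);
    }
}
```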

Contributor Author

My only concern with the approach you outlined is the additional authorization calls needed to determine "if the user is allowed". What you suggest requires up to 2 additional authorizations per provenance event. On busy systems, we have already observed that authorizing the user for each provenance event is a limiting factor (it can result in provenance becoming unusable).
Having said that, unless you think of another approach which would require fewer authorization calls, I'll proceed as you recommend. I suspect there may be a future JIRA ticket to address the provenance query/authorization impact anyhow; if so, this can be addressed at that time. We won't know for sure if this is a problem until we get the current fix into an appropriately loaded test environment.

Contributor

The original JIRA called to make this more granular because using the data policies was too blunt. In the PR as-is, for each event it appears that we authorize the event and then authorize the data policies twice. We are authorizing the data policy to determine if we should summarize and then again to determine if replay is authorized. The replay portion is not changed/new in this PR but is an area for improvement we could make now.

Since we're taking this more granular approach I agree with your originally filed JIRA to add the additional component based check. This shouldn't introduce too much additional cost. The component checks do not consider flow file attributes and the results should be easily cached.

Another improvement that I didn't call out specifically above, is that we really only need to check the data policies if we are not summarizing. Whether the user is approved for data of a component would only be relevant if we were returning the fully populated event.

In order to return the summary, we only need to check the policies for the event and the component. Like the component policies, I don't think the flow file attributes would need to be considered for the event policies. I believe the attributes would only need to be considered for the data policies where we are actually returning the attributes and content. This should help with some of the performance concerns regarding frequent authorization.
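
To illustrate why component checks "should be easily cached": because they do not depend on flowfile attributes, the outcome for a given component ID can be reused for every event from that component within one query. The following is a minimal sketch under that assumption; the class name and the `Predicate`-based authorizer are hypothetical, and the cache is meant to be scoped to a single user and request (it is not thread-safe).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

public class ComponentAuthorizationCacheSketch {

    // Hypothetical check: returns true if the current user may view the component.
    private final Predicate<String> authorizer;

    // Results keyed by component ID; valid only for one user and one request.
    private final Map<String, Boolean> cache = new HashMap<>();

    ComponentAuthorizationCacheSketch(Predicate<String> authorizer) {
        this.authorizer = authorizer;
    }

    // Invokes the underlying authorizer at most once per component ID.
    boolean isAuthorized(String componentId) {
        return cache.computeIfAbsent(componentId, authorizer::test);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        ComponentAuthorizationCacheSketch cache = new ComponentAuthorizationCacheSketch(id -> {
            calls[0]++;
            return id.startsWith("allowed-");
        });
        // Three events, two distinct components: the authorizer runs twice.
        cache.isAuthorized("allowed-1");
        cache.isAuthorized("allowed-1");
        cache.isAuthorized("denied-2");
        System.out.println("authorizer calls: " + calls[0]);
    }
}
```

The same approach would not carry over directly to data policies, since those checks take the event's attributes into account.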

}

// lineage duration
if (event.getLineageStartDate() > 0) {
Contributor

If we don't piggyback off of summarization, I believe this can be moved back.

Contributor Author

lineage duration was pulled out specifically because there was a case in which the duration was not properly populated. This was during early testing and may now be corrected by other changes. I will try to replicate.

eventAuthorizable = resourceFactory.createLocalDataAuthorizable(event.getComponentId());
}
final Authorizable eventAuthorizable = resourceFactory.createProvenanceDataAuthorizable(event.getComponentId());
eventAuthorizable.authorize(authorizer, RequestAction.READ, user, event.getAttributes());
Contributor

I don't think the attributes are necessary here. I'm pretty sure the event attributes would only be necessary for authorizing access to attributes/content.

@markobean
Contributor Author

When calling getEvent() from the provenance repository, the user is authorized for the event (including component level authorization). See ControllerFacade.java:1353. This getEvent() method call is prior to createProvenanceEventDto(). So, it would be redundant to authorize the user for the event inside createProvenanceEventDto() as any unauthorized events will have already been filtered out. The original approach was to exclude all events from a provenance query result for which the user is not authorized (e.g. the user is not in the 'view provenance' component level policy). Therefore, it should not be necessary to perform your point #2 above.

For point #3, I did a slight refactor of authorizeReplay(): I renamed it to authorizeData() and removed the duplicate authorization block from getProvenanceEvent(). Instead, createProvenanceEventDto() will perform the data authorization prior to the if (!summarize) block. In this way, the event must be both authorized for data access and not summarized in order for the DTO to populate the attributes and content.

I also updated some authorization unit tests with more detailed expected results. And, rebased to master.

@mcgilman
Contributor

mcgilman commented Jun 7, 2018

@markobean I just ran your PR and I'm not seeing the same behavior you are describing. Even without the component policy, I'm able to view the provenance event. This is the behavior I was expecting to see following the discussion on the JIRA. It's possible that we're using the same language to refer to different things. Let me try to elaborate/clarify a bit here.

/processors/1234 - component policy (controls access to a component and its config)
/provenance-data/processors/1234 - component provenance event policy (controls access to the provenance events from a component)
/data/processors/1234 - component data policy (controls access to the data from a component including flowfile attributes)

The line you referenced should only verify access to the component provenance event (and it appears that's how it's working). It should not be checking the component policy. My suggestion was to additionally check the component policy prior to populating the component details (setComponentDetails). This would be in line with your initial comment on this JIRA.
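
As a small illustration of how the three resources listed above relate to one another, the provenance event and data policies use the component's own resource path with a prefix. The helper names below are hypothetical; only the resulting path strings come from the list above.

```java
public class PolicyResourceSketch {

    // Component policy: access to the component and its configuration.
    static String componentResource(String processorId) {
        return "/processors/" + processorId;
    }

    // Component provenance event policy: access to the component's provenance events.
    static String provenanceDataResource(String processorId) {
        return "/provenance-data" + componentResource(processorId);
    }

    // Component data policy: access to the component's data, including flowfile attributes.
    static String dataResource(String processorId) {
        return "/data" + componentResource(processorId);
    }

    public static void main(String[] args) {
        System.out.println(componentResource("1234"));
        System.out.println(provenanceDataResource("1234"));
        System.out.println(dataResource("1234"));
    }
}
```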

With your most recent changes, I'm not sure its functionality is different than before. It seems that it would be impossible to get a non-summarized event without permissions to the data of the component. I think we only need to verify permissions to data of a component for the attributes and the content specific fields. Other fields should be ok, allowing for a non-summarized event for folks without access to a component's data.

It appears that checkAuthorizationForReplay was also verifying that the connection that would be replayed into still exists. This would affect the availability of the replay action. Also, while a little nit-picky, I would also suggest using the checkAuthorization... methods which return an AuthorizationResult instead of relying on an Exception during a non-exceptional case. The generation of the stack trace is an expensive operation.
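
To make the difference between the two styles concrete, here is a minimal sketch that contrasts a throwing check with one that returns a result. All types here (`Policy`, `Result`, `AccessDeniedException`) are hypothetical simplifications defined for this example and are not the NiFi classes of similar name.

```java
public class AuthorizationResultSketch {

    enum Result { APPROVED, DENIED }

    // Hypothetical exception type for the throwing variant.
    static class AccessDeniedException extends RuntimeException {
        AccessDeniedException(String message) {
            super(message);
        }
    }

    // Hypothetical policy: decides whether a user is allowed.
    interface Policy {
        boolean allows(String user);
    }

    // Throwing style: suited to cases where denial should abort the request.
    // Constructing the exception captures a stack trace, which is costly
    // when denial is a routine outcome.
    static void authorize(Policy policy, String user) {
        if (!policy.allows(user)) {
            throw new AccessDeniedException(user + " is not authorized");
        }
    }

    // Result style: suited to cases where denial is expected and only
    // changes what gets populated (for example, whether replay is available).
    static Result checkAuthorization(Policy policy, String user) {
        return policy.allows(user) ? Result.APPROVED : Result.DENIED;
    }

    public static void main(String[] args) {
        Policy adminsOnly = user -> user.equals("admin");
        boolean replayAvailable = checkAuthorization(adminsOnly, "guest") == Result.APPROVED;
        System.out.println("replay available: " + replayAvailable);
    }
}
```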

Also, it does not seem like you updated or replied to my comment regarding the need to include the flowfile attributes when authorizing access to component's provenance events. I think these are only necessary when authorizing access to a component's data.

Please do not squash additional commits. It makes it difficult to review when I cannot easily see the incremental changes.

Thanks!

@mcgilman
Contributor

mcgilman left a comment

The behavior here is really close. The commented out code can be cleared up and I've outlined a few suggestions. Thanks again.

return new ArrayList<>(provenanceRepository.getEvents(firstEventId, maxRecords));
}

public AuthorizationResult checkConnectableAuthorization(final String componentId) {
Contributor

I don't believe this is called.

Contributor Author

Correct. This was moved to ControllerFacade.java. I will remove it from FlowController.java.

};
// try {
// AuthorizationResult result = flowController.checkConnectableAuthorization(event.getComponentId());
AuthorizationResult result = checkConnectableAuthorization(event.getComponentId());
Contributor

Why not check the authorization within setComponentDetails? In there you already have the components to authorize and you'll know the corresponding type.

if (Result.Denied.equals(result.getResult())) {
dto.setComponentType("Processor"); // is this always a Processor?
dto.setComponentName(dto.getComponentId());
dto.setEventType("UNKNOWN");
Contributor

Do you think that we need to redact the event type when the user does not have permissions to the component policy? I would have considered this field under the new provenance event policy.

Contributor Author

If we choose not to redact the event type, that makes life easier. Currently, it displays "UNKNOWN" in the table (when 'view provenance' is enabled and 'view the component' is not). But the event type IS displayed in the lineage graph. We need to get to consistency one way or the other on this. I'm leaning towards allowing the event type info to be visible since this is a characteristic of provenance (i.e. 'view provenance') and not a characteristic of 'view the component'.

Contributor

Yes, I agree. The event type should be controlled by the new provenance event policy. It is not controlled by the component policy that protects the component name and component type.


final SortedSet<AttributeDTO> attributes = new TreeSet<>(attributeComparator);
// authorizeData(event);
final AuthorizationResult dataResult = checkAuthorizationForData(event); //(authorizer, RequestAction.READ, user, event.getAttributes());
Contributor

We only need to authorize for the data if the event is a non-summary. For instance, when we're pulling back 1000 summaries to load the provenance table we don't need to check any data policies.

Contributor

Also, it appears that checkAuthorizationForData is verifying READ to the data of the corresponding component. This check is already done as part of the checkAuthorizationForReplay method. It appears that is the only place the replay authorization check is performed. It likely makes sense to refactor some of this so that we're only checking permissions for READ to the data of the corresponding component once. The remainder of the replay authorization check only needs to be performed when we're populating the data fields (READ to the data of the corresponding component is approved). See below.

final Map<String, String> updatedAttrs = event.getUpdatedAttributes();
final Map<String, String> previousAttrs = event.getPreviousAttributes();
// only include all details if not summarizing and approved
if (!summarize && Result.Approved.equals(dataResult.getResult())) {
Contributor

If the user is not authorized for the data of a component we should still be able to return a non-summary. In this case, we should just leave out any of the data fields in the ProvenanceEventDto. I would consider the following fields data fields, as they are associated with either attributes, content, or replay (all of which require data policies to execute).

    private Collection<AttributeDTO> attributes;

    private Boolean contentEqual;
    private Boolean inputContentAvailable;
    private String inputContentClaimSection;
    private String inputContentClaimContainer;
    private String inputContentClaimIdentifier;
    private Long inputContentClaimOffset;
    private String inputContentClaimFileSize;
    private Long inputContentClaimFileSizeBytes;
    private Boolean outputContentAvailable;
    private String outputContentClaimSection;
    private String outputContentClaimContainer;
    private String outputContentClaimIdentifier;
    private Long outputContentClaimOffset;
    private String outputContentClaimFileSize;
    private Long outputContentClaimFileSizeBytes;

    private Boolean replayAvailable;
    private String replayExplanation;
    private String sourceConnectionIdentifier;

return dataAuthorizable.checkAuthorization(authorizer, RequestAction.READ, user, eventAttributes);
}

private AuthorizationResult checkAuthorizationForProvenanceData(final ProvenanceEventRecord event) {
Contributor Author

I modified this method and checkConnectableAuthorization() to accommodate a Process Group being the event component. This is the case for DOWNLOAD provenance events.

@markobean
Contributor Author

I believe the latest changes are demonstrating the intended functionality. Provenance events are only listed in the query results if the user has 'view provenance' on the corresponding component; flowfile content in the event details is only visible based on 'view the data' policy; component name and type is only visible based on 'view the component' policy (replaced with generic info such as UUID in place of name when policy is lacking.) I'll do some further testing today.

if (rootGroup.getIdentifier().equals(componentId)) {
return rootGroup.checkAuthorization(authorizer, RequestAction.READ, user);
}
Connectable connectable = rootGroup.findLocalConnectable(componentId);
Contributor Author

Will findLocalConnectable() versus findProcessor() include connections as well? If so, then this should return to findProcessor() to account for connections and subsequently finding the connection's source component.

@mcgilman
Contributor

Things are looking pretty good. I'd like to propose a few additional changes which I've implemented here [1]. Please review them and let me know your thoughts. Thanks!

[1] mcgilman@eed1be3

@markobean
Contributor Author

markobean commented Jun 14, 2018

I like the proposed changes. It makes the authorization process a bit cleaner. I ran through several of the same tests I performed previously and confirmed its functionality.
+1

@mcgilman
Contributor

Thanks for having a look. I'll include these when I merge in your changes.

mcgilman added a commit to mcgilman/nifi that referenced this pull request Jun 14, 2018
- Minor adjustments following PR.
- Avoiding additional find operation when authorizing components when populating component details.
- Requiring access to provenance events when downloading content or submitting a replay as they may provide events details.
- Updating the REST API docs detailing the required permissions.
- Updating the wording in the documentation regarding the provenance and data policies.
- Removed the event attributes from the authorization calls that were verifying access to provenance events.
- Only checking content availability when the user is authorized for the components data.
- Addressing typo in JavaDoc.

This closes apache#2703
@mcgilman
Contributor

Thanks @markobean! This has been merged to master.

@asfgit asfgit closed this in fe31a06 Jun 14, 2018
@markobean markobean deleted the NIFI-4907 branch October 12, 2019 15:47