Workflow outputs (third preview) #5909
Signed-off-by: Ben Sherman <[email protected]>
*/
void onWorkflowPublish(Object value){}
void onWorkflowPublish(String name, Object value){}
@pditommaso regarding this change to onWorkflowPublish
Aside from the fact that it's a preview feature, there is a bigger issue I wanted to raise about the TraceObserver: the overloads that we add for backwards compatibility don't actually work.
Even if we add something like this:
void onWorkflowPublish(Object value) {
onWorkflowPublish(null, value)
}
If I build a plugin with Nextflow 24.10 and try to use it with 25.04, it will fail with an error like this:
ERROR ~ Receiver class nextflow.validation.ValidationObserver does not define or inherit an implementation of the resolved method 'abstract void onWorkflowPublish(java.lang.String,java.lang.Object)' of interface nextflow.trace.TraceObserver.
This is because the custom trace observer gets compiled against the 24.10 version of TraceObserver and run against the 25.04 version at runtime, but it doesn't receive the new method overload. I think it's a limitation of the Java/Groovy runtime.
So there is no point in adding these extra overloads. Plugins built with an older Nextflow will break no matter what we do.
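For readers unfamiliar with the pattern, here is a minimal plain-Java sketch of the delegating overload discussed above. `PublishObserver` and `RecordingObserver` are hypothetical stand-ins, not the Nextflow `TraceObserver`, and the sketch only shows what the overload is meant to do within a single build. It does not reproduce the cross-version failure, which depends on how the real interface is compiled and loaded.

```java
import java.util.ArrayList;
import java.util.List;

public class ObserverOverloadSketch {

    // Hypothetical stand-in for an observer interface (assumption, not the Nextflow API).
    interface PublishObserver {
        // New-style event: carries the output name along with the value.
        default void onWorkflowPublish(String name, Object value) {}

        // Old-style event kept for compatibility: forwards to the new one with no name.
        default void onWorkflowPublish(Object value) {
            onWorkflowPublish(null, value);
        }
    }

    // An observer that only implements the new two-argument event.
    static class RecordingObserver implements PublishObserver {
        final List<String> events = new ArrayList<>();

        @Override
        public void onWorkflowPublish(String name, Object value) {
            events.add(name + "=" + value);
        }
    }

    public static void main(String[] args) {
        RecordingObserver obs = new RecordingObserver();
        obs.onWorkflowPublish("samples", "a.fastq");   // new-style call
        obs.onWorkflowPublish((Object) "b.fastq");     // old-style call, forwarded with a null name
        System.out.println(obs.events);
    }
}
```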
While I find the syntax improved compared to the previous iteration, I think this version dilutes the original idea of decoupling output publishing from the final output structure by having an intermediate model, represented by "publish targets" that were defined at both the process and sub-workflow level.
In this version, this is essentially not possible any more, and the output needs to be wired channel by channel in the main workflow definition, if I understand correctly.
Is there an nf-core pipeline adopting this version as a reference?
}
}
}
```

The inner closure will be applied to each file in the channel value, in this case `sample.fastq_1` and `sample.fastq_2`.
Each `>>` specifies a *source file* and *publish target*. The source file should be a file or collection of files, and the publish target should be a directory or file name. If the publish target ends with a slash, it is treated as the directory in which source files are published. Otherwise, it is treated as the target filename of a source file. Only files that are published with the `>>` operator are saved to the output directory.
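The trailing-slash rule in the paragraph above can be modelled with a short Java sketch. This is only an illustration of the documented rule, not Nextflow's implementation: the `resolve` helper, the `results` output directory, and the example file names are all hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PublishTargetSketch {

    // Hypothetical helper: maps a source file and a publish target to a destination path.
    static Path resolve(Path outputDir, Path source, String target) {
        if (target.endsWith("/")) {
            // Trailing slash: the target is a directory, so the source keeps its file name.
            return outputDir.resolve(target).resolve(source.getFileName()).normalize();
        }
        // No trailing slash: the target is the destination file name itself.
        return outputDir.resolve(target).normalize();
    }

    public static void main(String[] args) {
        Path out = Paths.get("results");                     // assumed output directory
        Path src = Paths.get("work/ab/sample1_1.fastq");     // assumed source file

        // Published into the directory fastq/ under its original name.
        System.out.println(resolve(out, src, "fastq/"));
        // Published under an explicit target file name.
        System.out.println(resolve(out, src, "fastq/sample1_R1.fastq"));
    }
}
```

Under this model, a file that never appears on the left-hand side of a `>>` has no destination at all, which matches the statement that only files published with `>>` are saved.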
Wasn't the publish target the output, e.g. `samples` in this example? This looks to me more like a "publish directory" or just an "output directory".
I have changed the meaning of "publish target" in this iteration to match the overall syntactic/semantic changes.
What was previously a "publish target" like `samples` is now simply an "output" or "output declaration".
"Publish target" now refers to the right-hand side of a publish statement, e.g. `sample.fastq_1 >> 'fastq/'`.
Essentially, I have moved the concept of "publish target" into the `path` directive.
Basically every real-world use case I've seen when talking to users requires this fine-grained level of publishing, because of how they like to organize their output directory
I've gone back and forth on this throughout each iteration. I was initially skeptical about making the user wire the outputs all the way to the top, because I thought it would be a ton of work for little value. Two things made me change my mind:
As a user, I should be able to see all of my outputs in one place and easily trace them to upstream sources, which I can't do if any subcomponent can "contribute" to an output. It is analogous to people using params anywhere in their pipeline, instead of only in the entry workflow. It does require writing a bit more code, but the improved readability is worth it. And I would point out that we spend much more time reading code than writing code, so we should design the language accordingly.
There is still an intermediate model, but instead of "publish target" it is called simply an "output" or "output declaration". It has similar syntax/semantics to a parameter.
nf-core is reluctant to adopt any features as long as we keep them in "preview", which creates a minor chicken and egg problem for us 😅 I have updated my fetchngs PR to use this preview, so you can see how it would look there.
This PR implements the third preview of the workflow output definition for Nextflow 25.04.
Changes are described in the docs, copied here for convenience:
- The `publish:` section can only be specified in the entry workflow.
- Workflow outputs in the `publish:` section are assigned instead of using the `>>` operator. The output name must be a valid identifier.
- By default, output files are published to the base output directory, rather than a subdirectory corresponding to the output name.
- The syntax for dynamic publish paths has changed. Instead of defining a closure that returns a closure with the `path` directive, the outer closure should use the `>>` operator to publish individual files.
- The `mapper` index directive has been removed. Use a `map` operator in the workflow body instead.

Changes not described in the docs:
- The `onWorkflowPublish` trace event has been modified to include the output name. It is emitted once for each output, rather than once for each channel value, and it is emitted regardless of whether the index file is enabled.