Conversation
|
Hey @gty929 ! This looks interesting! I'm still not sure what problem this is intended to solve, though. Perhaps you could give an example? (I presume that you don't actually need to write a script that prints |
|
Hi, @bitfield ! I used the example above just to show that the streaming is working. I think there have been quite a few discussions in issues #34, #59, and #78 about the use cases, and why we should not leave out the streaming functionality. E.g., as @besi1z and @xxxserxxx mentioned, it would be great if the output of
The second reason is time and memory efficiency, as @posener and @kepkin mentioned. By making the pipeline asynchronous, the output from one stage can be consumed by the next stage as soon as it is produced. For example, to find all the TODOs in 10,000 files, it's really not necessary to read all the files into memory.
By the way, I just rewrote
script.Exec("ping -c 100 www.google.com").Stdout()
If the effort is worthwhile, I will write more integration tests for the streaming feature. |
Indeed, and we don't! If you look at the example
script.Stdin().Match(os.Args[1]).Stdout()
The Regarding your other point, if you want to run a command like |
|
I noticed that there is already a "find TODO" example in the library. The program prints the filename and the line number before each line, so the implementation is a bit more complex than the
func main() {
	listPath := "."
	if len(os.Args) > 1 {
		listPath = os.Args[1]
	}
	// filter hidden directories and files
	filterFiles := regexp.MustCompile(`^\..*|/\.`)
	files := script.FindFiles(listPath).RejectRegexp(filterFiles)
	content := files.EachLine(func(filePath string, builderFile *strings.Builder) {
		p := script.File(filePath)
		lineNumber := 1
		p.EachLine(func(str string, build *strings.Builder) {
			if strings.Contains(str, "todo") {
				builderFile.WriteString(fmt.Sprintf("%s:%d %s \n", filePath, lineNumber, strings.TrimSpace(str)))
			}
			lineNumber++
		})
	})
	content.Stdout()
}
Even though the program does not read in the file contents all at once, it still accumulates all the output before sending it to Stdout, which I believe is not ideal. For example, a user may be running a search program like this on a large project or file system, and s/he may lose patience if no result gets printed for a long time. What's worse, if the user makes a mistake, e.g., writing a wrong regular expression in the
I just wrote another example for my proposed stream functionality. For multi-threaded programs, developers usually need to run a test suite many times to detect concurrency bugs. With the
// This program runs tests on the script library 50 times and prints out the progress nicely.
func main() {
	round, step := 10, 5
	progressInfo := make([]string, round)
	for i := 0; i < round; i++ {
		progressInfo[i] = fmt.Sprintf("------ Done %v / %v ------", (i+1)*step, round*step)
	}
	cmd := fmt.Sprintf("bash -c 'go test -count %v github.com/bitfield/script; echo {{.}}'", step)
	// with Stream(), the program can print to stdout in real time
	script.Slice(progressInfo).Stream().ExecForEach(cmd).Stdout()
}
Here's a sample output, which is printed out line by line:
Obviously, we cannot write a program like this in Script without the streaming feature. I agree that we should avoid unnecessary complexity, but I do think that the task is worth the effort. As I mentioned above, at least four other people have requested this feature or have tried to implement it on their own.
To make the program backward compatible, we just need to add two more public functions: a
From the developers' perspective, the cost of this change is almost one-off. One need not worry about streaming when writing source and sink functions. For filter functions, no change is necessary either if it returns |
|
Yes, this sounds good! Why would we need |
|
If we use streaming by default, there's no way to make everything backward compatible.
First, by the definition of streaming, a function cannot perform a full error check on its previous stage before it starts executing. The best assurance one can obtain is that each stage will set the error field before it closes the writer of
Second, there's no way for a filter function to know whether it is the last stage in a pipeline. Ideally, a user should end a pipeline with a sink function (which always synchronizes the pipeline), but it's not always the case. For example, a user may call the
Finally, as @posener mentioned in issue #34, the behavior of
I guess adding these two functions is not a heavy burden to the library. After all, it's better than creating a new project called 'script-async' or whatever. If a user really wants streaming to be set as default, s/he can fork the library and change the initial value of the asynchronous flag in the NewPipe() function. |
|
@gty929 as for the other project you suggest: I already created https://github.com/posener/script |
|
Yes, I see what you mean. Streaming is fundamentally a different model from the 'error-safe reader' which I think you've convinced me that the cost of adding this functionality would just be too high; closing accordingly, with thanks for the work you've put in to prove the concept. @posener, it's quite up to you of course, but it might be worth thinking about giving your library a different name, since it is effectively a different library now, rather than a modified fork of this one. |
This PR is an attempt to tackle issue #34. I used io.Pipe() in the EachLine() to implement streaming, which is simple and efficient.
As an example, the following program outputs 'toc' five times, at one-second intervals:
Unfortunately, the program is not backward compatible. A user must append a Sink method at the end of the pipeline if s/he wants to wait until the program finishes. For this reason, I added a Wait() method, and slightly modified the test for ExecForEach(). (The code passed all other tests.) Another problem is that the strings.Builder in EachLine() cannot be reset, since the stream is append-only.
Currently, Error() and ExitStatus() return the intermediate status of a running pipe, since they are not Sink methods.
One possible solution is to add a bool flag in the Pipe struct to enable/disable streaming. (Disabled by default & use a Stream() source method to enable it). Should I go ahead?