I have a program that does some heavy processing (an ML algorithm) and writes a lot of data (GBs of plain text) to standard output. In some scenarios I only need a tiny portion of that output, but right now I save the whole thing to a (huge) text file and then parse the lines in it to extract my data.
While totally effective, my approach is awfully inefficient. Is there a way to avoid generating such big files (since most of the data will be discarded anyway) and do the parsing on the fly, line by line?
./myProgram model test > myOutput
myOutput content (millions of lines):
0, blah blah blah thousands more blahs -> [ the data I care about is inside the brackets ]
0, blah blah blah thousands more blahs -> [ the data I care about is inside the brackets ]
...
What I think could be the best option is to use a Unix pipeline to chain the results, but I do not know how to send the data line by line to, let's say, a Python or Java app that parses it.
./myProgram model test | <now what>
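To make the question concrete, here is a minimal sketch of what I imagine the consuming side could look like in Python, reading one line at a time from standard input. The regex is just my guess at the bracket format shown above, and extract.py is a name I made up; I have not verified this is the right way to do it:

import sys
import re

# Hypothetical pattern: capture whatever sits between the square brackets.
BRACKET_RE = re.compile(r"\[(.*?)\]")

for line in sys.stdin:  # consumes one line at a time; no intermediate file
    match = BRACKET_RE.search(line)
    if match:
        print(match.group(1).strip())

which I would then hope to invoke as something like:

./myProgram model test | python3 extract.py > smallOutput

Is this the right idea, or is there a better way to wire the two programs together?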