Lazy Pipes
While working on files for our paper, I needed to extract some input data from an archive (not all of it, as there are many files), convert it, and then run everything through some pipelines. There are various tools, such as Nix or Snakemake, which let you work with isolated pipelines systematically.
As an example, you can create an attrset whose names are the files inside a zip archive and whose values are derivations that extract the given file. These can be wrapped so that each value is instead a derivation converting the given file to a target format such as AIG or BLIF. Then you can use these in your other derivations.
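A minimal sketch of what building such an attrset might look like, assuming a hypothetical inputs.zip, a fixed list of member names and a placeholder some-converter for the format conversion:

{ pkgs, lib }:
let
  # Hypothetical archive and file list; the names could also be discovered
  # by listing the archive in a separate derivation.
  archive = ./inputs.zip;
  files = [ "file1" "file2" ];

  # One derivation per file, extracting just that file from the archive.
  extracted = lib.genAttrs files (name:
    pkgs.runCommandLocal "extracted-${name}" { } ''
      ${pkgs.unzip}/bin/unzip -p ${archive} ${name} > $out
    '');
in
  # Wrap each extraction so that zip.<name>.blif is a conversion derivation.
  lib.mapAttrs (name: src: {
    blif = pkgs.runCommandLocal "${name}.blif" { } ''
      some-converter ${src} > $out   # placeholder converter
    '';
  }) extracted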
So another part of the workflow might look like this:
pkgs.runCommandLocal "result" {} ''
  some-tool ${zip.file1.blif} ${zip.file2.blif} > $out
''
Thanks to Nix being lazy, only the extractions and conversions required to generate the final output are ever actually performed. But this isn't ergonomic at all to work with outside of Nix.
This is all fine and dandy, but I'd like something a bit more systematic, something which integrates with the system and isn't as hacky.
Something like regular pipes but with laziness, introspection and editing in mind.
If we ever want to see the results of older pipelines, we need to isolate the inputs into some (probably immutable) form so they don't overwrite each other; we also have to forbid any kind of file access that doesn't go through a random temporary file.
Theoretically, unzipping specifically could be done with file-system translators; conversions, selections and the like, however, definitely need something more sophisticated.
This also requires the output to be more structured than plain text, as simply cutting text off is insufficient to provide the amount of laziness I desire.
The data model
What many utilities return is effectively a table of values: ls -l returns a table consisting of permissions, link count, owning user, owning group, size, last-modified time and name. Occasionally it is useful to also return extra values, but that is probably best handled in a manner similar to Common Lisp's multiple values: the primary value is what you get by default, unless a special operator is used which returns a map of named values.
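A couple of rows of that hypothetical ls -l table, written as Nix attrsets purely to make the shape concrete (all values are made up):

[
  # Each row is a record with one slot per column.
  { permissions = "-rw-r--r--"; links = 1; user = "alice"; group = "users";
    size = 4096; mtime = "2024-05-01 12:00"; name = "paper.tex"; }
  { permissions = "drwxr-xr-x"; links = 2; user = "alice"; group = "users";
    size = 512; mtime = "2024-05-01 12:05"; name = "figures"; }
]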
Perhaps an interface analogous to spreadsheets might be useful: immutable, lazy sheets where the user may explicitly force a row, a column, a cell or the entire table. Values can be changed by taking a subset of that data and handing it over to another program. Rank polymorphism would be extremely useful here: an operation applicable to a cell could be applied to a row instead and simply mapped over it. It might also be useful to have separate functions for true projection and for mapping over a subset.
This all creates a sort of DAG of tables, where the edges are operations that construct one table from another.
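Borrowing Nix for the sketch simply because it is already lazy, one edge of that DAG might look something like this; the table and the cell operation are made up, and nothing in the new table is computed until a cell is actually forced:

let
  foo = [ { A = 1; B = 2; } { A = 3; B = 4; } ];

  # A cell-level operation, broadcast over the whole A column --
  # a crude stand-in for rank polymorphism.
  perCell = x: x + 1;
  step = map (row: row // { A = perCell row.A; }) foo;
in
  # Forcing only column A evaluates only those thunks; B's cells are
  # never forced.
  map (row: row.A) step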
Numeric example
Input Foo:
A | B |
---|---|
1 | 2 |
3 | 4 |
:A + 1
This should take the B column from Foo and divide each cell by 2, then join that new column, now named C, onto Foo, incrementing A in-place by 1.
If forced, the resulting table looks like this:
A | B | C |
---|---|---|
2 | 2 | 1 |
4 | 4 | 2 |
As you may know, truncate also returns the remainder as a secondary value, so there is a separate sort of (value remainders C) in play here as well, which, since truncate has already been called to evaluate the main table, is materialised too.
C.remainders |
---|
0 |
0 |
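To tie the numeric example together, here is a rough Nix model of the whole step, with a made-up truncate that returns its remainder as a named secondary value alongside the quotient (note that in this sketch the quotient and remainder end up as separate thunks, whereas the post imagines both being materialised by the single truncate call):

let
  foo = [ { A = 1; B = 2; } { A = 3; B = 4; } ];

  # Made-up truncate: a primary value plus a named secondary value.
  truncate = n: d: { value = n / d; remainders = n - d * (n / d); };

  result = map (row:
    let t = truncate row.B 2; in {
      A = row.A + 1;
      B = row.B;
      C = t.value;
      "C.remainders" = t.remainders;
    }) foo;
in
  result
  # => [ { A = 2; B = 2; C = 1; "C.remainders" = 0; }
  #      { A = 4; B = 4; C = 2; "C.remainders" = 0; } ]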
The | operator is some black magic I'd like to avoid: joining these lines on just their ordering seems risky and error-prone. It also violates the nice intuition of each row being a sort of struct, where we only expand it and add or remove slots, albeit sometimes broadcasting across rows.
File example