Here are some examples of using foundation models in a UNIX-like environment. A model is called foundational when it has been trained on a very large and diverse dataset and can be used immediately, or fine-tuned, for many downstream tasks. Most examples use classic and more recent UNIX utilities to glue together different tools. A foundation model is then used to attack the part of the problem that goes beyond well-defined solutions. Finally, utilities guardrail and massage the model's output to turn it into something useful.

Util and Model

Creating playlists

Consider a scenario where you have downloaded a number of songs and want to organize them into playlists. Manually selecting tracks so that they transition smoothly from one to the next is time-consuming and requires intimate familiarity with one's music collection. However, by using a model that understands music and sound to translate each song into a point in space, and then interpolating between these points, it is possible to automatically create coherent playlists. This recipe relies on the llm [1] utility and some accompanying plugins [2], but the technique can be implemented with analogous tools.

A model that understands audio can be used to embed our music collection ($MC) into a high-dimensional space, where similar songs are placed closer together. With the llm-clap plugin, we can generate embeddings for our collection using the CLAP model:

llm embed-multi -m clap songs --files $MC '*'

Now, each of our songs and its corresponding embedding is saved in a local embeddings.db database, which we can query. The llm-interpolate plugin then returns interpolated points between a starting and an ending point (song), tracing a path (playlist) between them. For example, this one-liner generates a 5-song .m3u playlist between PacifyHer.wav and redrum.wav:

llm interpolate songs "PacifyHer" "redrum" -n 5 | jq -r '.[]' > playlist.m3u
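To make the recipe reusable, the one-liner can be wrapped in a small shell function. This is a minimal sketch; the mkplaylist name is made up here, and it assumes the songs collection has already been embedded as above:

# Sketch: build an n-song playlist between two songs.
mkplaylist() {
    start="$1"; end="$2"; n="${3:-5}"
    llm interpolate songs "$start" "$end" -n "$n" |
        jq -r '.[]' > playlist.m3u
}

mkplaylist "PacifyHer" "redrum" 7

The resulting playlist.m3u is plain text with one file path per line, so any m3u-aware player (e.g. mpv --playlist=playlist.m3u) can pick it up.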

Taking notes

It can be tricky to take notes while watching videos of talks. A model that summarizes video content can generate those notes for you, to be reviewed and expanded upon later; this also helps grow one's collection of notes quickly. The following two-liner uses the llm utility to summarize a video transcript downloaded with yt-dlp, then pipes the output into zk [3] to create a new note.

# yt-dlp saves the subtitles as $OUT.<lang>.vtt
yt-dlp --no-download --write-subs --output "$OUT" "$URL"
cat "$OUT".*.vtt | llm -s "Create notes" | zk new -i

Generating reports

It is common practice for people working together to hold monthly, weekly, or even daily meetings where every member gives a short update on what they have been working on. These reports can be frustrating to write: they demand the right level of abstraction, neither so detailed that team members lacking context get lost, nor so broad that meaningful feedback becomes impossible. Forgetting the specifics of your own recent work adds to the challenge.

Digital todo-list tools like taskwarrior [4] can be leveraged to generate these reports by querying them smartly and piping their output into an LLM. The following pipeline (1) queries taskwarrior for all tasks completed in the last week, (2) exports them in JSON format, (3) uses jq [5] to extract the .description attribute from each one, and (4) hands the completed-task list to an LLM, asking it to generate the report.

task status:completed end.after:today-7d export |
jq '.[] | .description' |
llm -s 'Generate a report based on these tasks.'
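The same pattern bends to whatever filters taskwarrior understands. As a sketch, a monthly per-project variant might look like this (the project name and prompt are illustrative):

task project:home status:completed end.after:today-30d export |
jq -r '.[].description' |
llm -s 'Generate a monthly report, grouping related tasks.'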

Renaming pictures

Consider the scenario where you have a large collection of saved pictures. Whether you took them yourself or downloaded them from the internet, chances are the image files have vague or useless names.

$ ls 
1672714705640839.png 1689964585834142.png 2.jpg

The laborious process of renaming each one can be automated by leveraging a model with image-understanding capabilities. For this recipe, one can use ollama [6], an easy-to-use tool for fetching and running LLMs that works well out of the box with the moondream vision model, which is small enough to allow quick inference on a modern laptop. The following pipeline finds every .jpg file in the current directory and asks the model to provide a title for it based on the image contents; some light formatting at the end turns whatever the model outputs into a plausible filename.

find . -name "*.jpg" |
xargs -I{} ollama run moondream "Title for this: {}" |
tr ' ' '_' | sed 's/$/.jpg/'

The (slightly truncated) output filenames are: A_green_dragon_with_wings_and_a_tail.jpg, A_painting_of_a_serene_landscape.jpg, urns_of_stone_red_car_in_foreground.jpg.
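The pipeline above only prints candidate names; to actually rename the files, each picture has to be fed to the model and moved individually. A minimal sketch, assuming the model's title fits on one line (the sanitization shown is just one possible choice):

# Sketch: rename every .jpg based on the model's title.
find . -name "*.jpg" | while read -r f; do
    title=$(ollama run moondream "Title for this: $f" |
        tr -cs '[:alnum:]' '_')
    mv -i "$f" "$(dirname "$f")/${title%_}.jpg"  # -i: never clobber silently
done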

Conclusion

This article serves as a starting point and, hopefully, an inspiration for the community to start using these technologies for fun and profit. Please reach out with thoughts and ideas.

This article is part of Paged Out! magazine, issue #6. You can get a free PDF of the magazine on the Paged Out! website.


[1] https://github.com/simonw/llm
[2] llm-clap, llm-interpolate
[3] https://github.com/zk-org/zk
[4] https://taskwarrior.org
[5] https://jqlang.github.io/jq
[6] https://ollama.com