TN008 Auto Insert Cells

Automatically Inserting Suggested Cells

Objective

  • Design to have Foyle automatically and continuously generate suggested cells to insert after the current cell as the user edits a cell in a notebook.

TL;DR

Today Foyle generates a completion when the user explicitly asks for a suggestion by invoking the generate completion command. The completion results in new cells which are then inserted after the current cell. We’d like to extend Foyle to automatically and continuously generate suggestions as the user edits a cell in a notebook. As a user edits the current cell, Foyle will generate one or more suggested cells to insert after the current cell. These cells will be rendered in a manner that makes it clear they are suggestions that haven’t been accepted or rejected yet. When the user finishes editing the current cell the suggested cells will be accepted or rejected.

This feature is very similar to how GitHub Copilot AutoComplete works. The key difference is that we won’t try to autocomplete the current cell.

Not requiring users to explicitly ask for a suggestion should improve the Foyle UX. Right now users have to enter the intent and then wait while Foyle generates a completion. This interrupts the flow state and requires users to explicitly think about asking for help. By auto-inserting suggestions we can mask that latency by generating suggestions as soon as a user starts typing and updating those suggestions as they expand the intent. This should also create a feedback loop which nudges users to improve the expressiveness of their intents to better steer Foyle.

Since Foyle learns by collecting feedback on completions, auto-generating suggestions increases the opportunity for learning and should allow Foyle to get smarter faster.

The UX should also be a big step towards assisting with intents that require multiple steps to achieve, including ones where later steps are conditional on the output of earlier steps. One way to achieve multi-step workflows is to recursively predict the next action given the original intent and all previous actions and their output. By auto-inserting cells we create a seamless UX for recursively generating next action predictions; all a user has to do is accept the suggestion and then execute the code cells; as soon as they do, the next action will be automatically generated.

Motivation: Examples of Multi-Step Workflows

Here are some examples of multi-step workflows.

GitOps Workflows

Suppose we are using GitOps and Flux and want to determine whether the Foo service is up to date. We can do this as follows. First we can use git to get the latest commit of the repository

git ls-remote https://github.com/acme/foo.git   

Next we can get the flux kustomization for the Foo service to see what git commit has been applied

kubectl get kustomization foo -o jsonpath='{.status.lastAppliedRevision}'

Finally we can compare the two commits to see if the service is up to date.

Notably, in this example neither of the commands directly depends on the output of another command. However, this won’t always be the case. For example, if we wanted to check whether a particular K8s deployment was up to date we might do the following

  1. Use kubectl get deploy to get the flux annotation kustomize.toolkit.fluxcd.io/name to identify the kustomization
  2. Use kubectl get kustomization to get the last applied revision of the kustomization and to identify the source controller
  3. Use kubectl get gitrepository to get the source repository and its latest commit

In this case the command at each step depends on the output of the previous step.

Troubleshooting Workload Identity and GCP Permissions

Troubleshooting workflows are often multi-step workflows. For example, suppose we are trying to troubleshoot why a pod on a GKE cluster can’t access a GCS bucket. We might do the following

  1. Use kubectl to get the deployment to identify the Kubernetes service account (KSA)
  2. Use kubectl to get the KSA to identify and get the annotation specifying the GCP service account (GSA)
  3. Use gcloud policy-troubleshoot iam to check whether the GSA has the necessary permissions to access the GCS bucket

Motivation: Verbose Responses and Chat Models

One of the problems with the existing UX is that since we use Chat models to generate completions the completions often contain markdown cells before or after the code cells. These cells correspond to the chat model providing additional exposition or context. For example if we use a cell with the text

Use gcloud to list the buckets

Foyle inserts two cells

A markdown cell

To list the buckets using `gcloud`, you can use the following command:

and then a code cell

gcloud storage buckets list

The markdown cell is redundant with the previous markdown cell and a user would typically delete it. We’d like to create a better UX where the user can more easily accept the suggested code cell and reject the markdown cell.

UX For Continuous Suggestion

As the user edits one cell (either markdown or code), we’d like Foyle to be continuously running in the background to generate one or more cells to be inserted after the current cell. This is similar to GitHub Copilot and [Continue.Dev Autocomplete](https://docs.continue.dev/walkthroughs/tab-autocomplete). An important difference is that we don’t need Foyle to autocomplete the current cell. This should simplify the problem because

  • we have more time to generate completions
    • the completion needs to be ready by the time the user is ready to move onto the next cell rather than before they type the next character
  • we can use larger models and longer context windows since we have a larger latency budget
  • we don’t need to check that the completion is still valid after each character is typed into the current cell because we aren’t trying to autocomplete the word(s) in the current cell.

If we mimicked ghost text, the experience might look as follows

  • User is editing cell N in the notebook
  • Foyle is continuously running in the background to generate suggested cells N+1,…, N+K
  • The suggested cells N+1,…,N+K are rendered in the notebook in a manner that makes it clear that they are suggestions that haven’t been accepted or rejected yet
    • Following Ghost Text we could use a grayed out font and different border for the cells
  • As the user edits cell N, Foyle updates and edits N+1,…,N+K to reflect changes to its suggestions
  • The decision to accept or reject the suggestion is triggered when the user decides to move onto cell N+1
    • If the user switches focus to the suggested N+1 cell that’s considered accepting the suggestion
    • If the user inserts a new cell that’s considered a rejection
  • Each time Foyle generates a new completion it replaces any previous suggestions that haven’t been accepted and inserts the new suggested cells
  • When the notebook is persisted unapproved suggestions are pruned and not saved

Since every notebook cell is just a TextEditor I think we can use the TextEditorDecoration API to change how text is rendered in the suggested cells to indicate they are unaccepted suggestions.

We can use Notebook Cell Metadata to keep track of cells that are associated with a particular suggestion and need to be accepted or rejected. Since metadata is just a key value store we can add a boolean “unaccepted” to indicate a cell is still waiting on acceptance.
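
As a rough illustration of the bookkeeping described above, here is a minimal sketch that models cells as plain objects rather than VSCode `NotebookCell`s. The `Cell` shape, the `unaccepted` key name, and the helper names `replaceSuggestions` and `pruneForSave` are assumptions made for the example, not part of Foyle or the VSCode API.

```typescript
// Simplified stand-in for a notebook cell; not the VSCode NotebookCell type.
interface Cell {
  kind: "markdown" | "code";
  text: string;
  metadata: Record<string, unknown>;
}

// Metadata key suggested in the text; the exact name is an assumption.
const UNACCEPTED = "unaccepted";

function isUnaccepted(cell: Cell): boolean {
  return cell.metadata[UNACCEPTED] === true;
}

// Drop pending suggestions that follow the current cell and insert the new
// ones in their place, each flagged as unaccepted.
function replaceSuggestions(cells: Cell[], current: number, suggested: Cell[]): Cell[] {
  const before = cells.slice(0, current + 1);
  const after = cells.slice(current + 1).filter((c) => !isUnaccepted(c));
  const marked = suggested.map((c): Cell => ({
    ...c,
    metadata: { ...c.metadata, [UNACCEPTED]: true },
  }));
  return before.concat(marked, after);
}

// Applied when the notebook is persisted: unaccepted suggestions are not saved.
function pruneForSave(cells: Cell[]): Cell[] {
  return cells.filter((c) => !isUnaccepted(c));
}
```

Keeping the flag in metadata means the state travels with the cell even when no editor exists for it, which is why the same check can drive both replacement and pruning.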

Completion Protocol

Our existing GenerateService is a unary RPC call. This isn’t ideal for continually generating suggestions as a user updates a block. We can use the connect protocol to define a full streaming RPC call for generating suggestions. This will allow the client to stream updates to the doc to the generate service and the generate service to respond with a stream of completions.

// Generate completions using AI
service GenerateService {
  // StreamGenerate is a bidirectional streaming RPC for generating completions
  rpc StreamGenerate (stream StreamGenerateRequest) returns (stream StreamGenerateResponse) {}
}

message StreamGenerateRequest {
  oneof request {
    FullContext full_context = 1;
    BlockUpdate update = 2;
  }
}

message FullContext {
  Doc doc = 1;
  int32 selected = 2;
}

message BlockUpdate {
  string block_id = 1;
  string block_content = 2;
}

message Finish {
    // Indicates whether the completion was accepted or rejected.
    bool accepted = 1;
}

message StreamGenerateResponse {
  repeated Block blocks = 1;
}

The stream will be initiated by the client sending a FullContext message with the full document and the index of the current cell that is being edited. Subsequent messages will be BlockUpdates containing the full content of the current active cell. If it’s a code cell it will include the outputs if the cell has been executed.

Each StreamGenerateResponse will contain a complete copy of the latest suggestion. This is simpler than only sending and applying deltas relative to the previous response.
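
The message ordering can be sketched on the client side as follows. This is only an illustration of the sequencing, with no networking: the `Block` and `Doc` fields, the camelCase request shapes, and the `SuggestionSession` class are hypothetical stand-ins rather than generated Connect or protobuf code.

```typescript
// Block and Doc are placeholders; their real fields are defined elsewhere in Foyle.
interface Block { id: string; contents: string; }
interface Doc { blocks: Block[]; }

// Mirrors the oneof in StreamGenerateRequest.
type StreamGenerateRequest =
  | { fullContext: { doc: Doc; selected: number } }
  | { update: { blockId: string; blockContent: string } };

interface StreamGenerateResponse { blocks: Block[]; }

class SuggestionSession {
  private started = false;
  private suggestion: Block[] = [];

  // The first message carries the full context; later ones only the active block.
  nextRequest(doc: Doc, selected: number): StreamGenerateRequest {
    const block = doc.blocks[selected];
    if (block === undefined) {
      throw new Error("selected index " + selected + " is out of range");
    }
    if (!this.started) {
      this.started = true;
      return { fullContext: { doc: doc, selected: selected } };
    }
    return { update: { blockId: block.id, blockContent: block.contents } };
  }

  // Each response is a complete copy, so it replaces the previous suggestion
  // wholesale instead of being merged as a delta.
  applyResponse(response: StreamGenerateResponse): void {
    this.suggestion = response.blocks.slice();
  }

  currentSuggestion(): Block[] {
    return this.suggestion;
  }
}
```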

Client Changes

In VSCode we can use the onDidChangeTextDocument event to listen for changes to a notebook cell (for more detail see appendix). We can handle this event by initializing a streaming connection to the backend to generate suggestions. On subsequent changes to the cell we can stream the updated cell contents to the backend to get updated suggestions.

We could use a rate limiting queue (i.e. similar to workqueue but implemented in TypeScript) to ensure we don’t send too many requests to the backend. Each time the cell changes we can enqueue an event and update the current contents of the cell. We can then process the events with rate limiting enforced. Each time we process an event we’d send the most recent version of the cell contents.
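
One possible shape for such a queue is sketched below. It is much simpler than workqueue: it keeps only the newest cell contents and sends at most one request per interval. The `CoalescingLimiter` name, the `send` callback, and the explicit `nowMs` argument (used instead of a real timer so the behavior is deterministic) are all assumptions of the example.

```typescript
class CoalescingLimiter {
  private pending: string | undefined = undefined;
  private lastSentMs = -Infinity;
  private minIntervalMs: number;
  private send: (contents: string) => void;

  constructor(minIntervalMs: number, send: (contents: string) => void) {
    this.minIntervalMs = minIntervalMs;
    this.send = send;
  }

  // Record the newest contents; older contents that were never sent are overwritten.
  enqueue(contents: string): void {
    this.pending = contents;
  }

  // Intended to be called periodically, e.g. from a timer. Sends the most
  // recent contents if the minimum interval has elapsed since the last send.
  process(nowMs: number): boolean {
    if (this.pending === undefined) {
      return false;
    }
    if (nowMs - this.lastSentMs < this.minIntervalMs) {
      return false;
    }
    const contents = this.pending;
    this.pending = undefined;
    this.lastSentMs = nowMs;
    this.send(contents);
    return true;
  }
}
```

In an extension, `enqueue` would be called from the change handler and `process(Date.now())` from a timer; intermediate keystrokes are dropped because only the latest contents matter for the next suggestion.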

Accepting or Rejecting Suggestions

We need a handler to accept or reject the suggestion. If the suggestion is accepted it would

  1. Update the metadata on the cells to indicate they are accepted
  2. Change the font rendering to indicate the cells are no longer suggestions

This could work as follows

  • A suggested cell is accepted when the user moves the focus to the suggested cell
    • e.g. by using the arrow keys or mouse
    • If we find this a noisy signal we could consider requiring additional signals, such as the user executing a code cell or, for a markdown cell, editing or rendering it
  • When the document is persisted any unaccepted suggestions would be pruned and not get persisted
  • Any time a new completion is generated, any unaccepted cells would be deleted and replaced by the new suggested cells

Since each cell is a text editor, we can use onDidChangeActiveTextEditor to detect when the focus changes to a new cell and check if that cell is a suggestion and accept it if it is.

To prune the suggestions when the document is persisted we can update RunMe’s serializer to filter out any unapproved cells.

As noted in the motivation section, we’d like to create a better UX for rejecting cells containing verbose markdown responses. With the proposed UX, if a user skips over a suggested cell we could use that to reject earlier cells. For example, suppose cell N+1 contains verbose markdown the user doesn’t want and N+2 contains useful commands. In this case a user might use the arrow keys to quickly switch the focus to N+2. Since the user didn’t render or execute N+1 we could interpret that as a rejection of N+1 and remove that cell.
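
The accept-on-focus and reject-on-skip rules could be expressed roughly as below. This is a sketch under assumptions: `TrackedCell`, its `touched` flag (set once the user renders or executes a cell), and `onFocusMoved` are hypothetical names, and a real handler would be driven by onDidChangeActiveTextEditor rather than by indices.

```typescript
interface TrackedCell {
  id: string;
  unaccepted: boolean; // still waiting on acceptance
  touched: boolean;    // rendered (markdown) or executed (code) by the user
}

interface FocusOutcome {
  accepted: string[];
  rejected: string[];
}

// Decide what happens to pending suggestions when focus moves from the cell
// at fromIndex to the cell at toIndex.
function onFocusMoved(cells: TrackedCell[], fromIndex: number, toIndex: number): FocusOutcome {
  const outcome: FocusOutcome = { accepted: [], rejected: [] };
  cells.forEach((cell, i) => {
    if (!cell.unaccepted) {
      return;
    }
    if (i === toIndex) {
      // Focusing a suggested cell accepts it.
      outcome.accepted.push(cell.id);
    } else if (i > fromIndex && i < toIndex && !cell.touched) {
      // A suggestion that was skipped without being rendered or executed is rejected.
      outcome.rejected.push(cell.id);
    }
  });
  return outcome;
}
```

For the example in the text, moving focus from N straight to N+2 would accept N+2 and reject the untouched markdown cell N+1.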

The proposal also means the user would almost always see some suggested cells after the current active cell. Ideally, to avoid spamming the user with low quality suggestions Foyle would filter out low confidence suggestions and just return an empty completion. This would trigger the frontend to remove any current suggestions and show nothing.

Ghost Cells In VSCode

We’d like to render cells in VSCode so as to make it apparent they are suggestions. An obvious way to represent a Ghost Cell would be to render its contents using GhostText; i.e. greyed out text.

In VSCode the NotebookDocument and NotebookCell correspond to the in memory data representation of the notebook. Importantly, these APIs don’t represent the visual representation of the notebook. Each NotebookCell contains a TextDocument representing the actual contents of the cell.

A TextEditor is the visual representation of a cell. Each TextEditor has a TextDocument. TextEditor.setDecorations can be used to change how the contents of a TextEditor are rendered.

TextEditors aren’t guaranteed to exist for all cells in a notebook. A TextEditor is created when a cell becomes visible and can be destroyed when the cell is no longer visible.

So we can render Ghost cells as follows

  1. Use metadata in NotebookCell to indicate that a cell is a suggestion and should be rendered as a GhostCell
    • This will persist even if the cell isn’t visible
  2. Use the onDidChangeVisibleTextEditors event to respond whenever a TextEditor is created or becomes visible
  3. From the TextEditor passed to the onDidChangeVisibleTextEditors handler get the URI of the TextDocument
    • This URI should uniquely identify the cell
  4. Use the URI to find the corresponding NotebookCell
  5. Determine if the cell is a GhostCell by looking at the metadata of the cell
  6. If the cell is a GhostCell use vscode.window.createTextEditorDecorationType to create a decoration and TextEditor.setDecorations to render the cell as a GhostCell
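
Steps 3 to 5 above amount to a lookup from visible editors to flagged cells. The sketch below shows that lookup using small structural interfaces in place of the real `vscode` types so it can run outside an extension host; `CellLike`, `EditorLike`, `ghostEditors`, and the `unaccepted` key are assumptions, and the wiring shown in the trailing comment is untested.

```typescript
// Minimal structural stand-ins for the parts of NotebookCell and TextEditor used here.
interface UriLike { toString(): string; }
interface CellLike {
  document: { uri: UriLike };
  metadata: { [key: string]: unknown };
}
interface EditorLike {
  document: { uri: UriLike };
}

// Return the visible editors whose document belongs to a cell flagged as a suggestion.
function ghostEditors<E extends EditorLike>(visible: readonly E[], cells: readonly CellLike[]): E[] {
  const ghostUris = cells
    .filter((c) => c.metadata["unaccepted"] === true)
    .map((c) => c.document.uri.toString());
  return visible.filter((e) => ghostUris.indexOf(e.document.uri.toString()) !== -1);
}

// In an extension this might be wired up roughly as follows (not run here;
// fullRange is a hypothetical helper returning a Range covering the document):
//
//   const ghost = vscode.window.createTextEditorDecorationType({ opacity: "0.5" });
//   vscode.window.onDidChangeVisibleTextEditors((editors) => {
//     for (const editor of ghostEditors(editors, notebook.getCells())) {
//       editor.setDecorations(ghost, [fullRange(editor.document)]);
//     }
//   });
```

Because the flag lives in cell metadata and editors come and go with visibility, the decoration would need to be reapplied each time the event fires.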

LLM Completions

The backend will generate an LLM completion in response to each StreamGenerateRequest and then return the response in a StreamGenerateResponse. This is the simplest thing we can do but could be wasteful as we could end up recomputing the completion even if the cell hasn’t changed sufficiently to alter the completion. In the future, we could explore more sophisticated strategies for deciding when to compute a new completion. One simple thing we could do is

  • Tokenize the cell contents into words
  • Assign each word a score measuring its information content -log(p(w)) where p(w) is the probability of the word in the training data.
  • Generate a new completion each time a word with a score above a threshold is added or removed from the cell.
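
A minimal sketch of this heuristic is shown below. The whitespace tokenizer, the `WordProbs` table, the fallback probability for unseen words, and the function names are all assumptions for illustration; in practice p(w) would be estimated from training data.

```typescript
// Hypothetical unigram probabilities keyed by lowercase word.
type WordProbs = Record<string, number>;

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

// Information content -log(p(w)); unseen words fall back to a small assumed probability.
function score(word: string, probs: WordProbs, unseenProb: number): number {
  const known = Object.prototype.hasOwnProperty.call(probs, word) ? probs[word] : undefined;
  return -Math.log(known ?? unseenProb);
}

// True if any word added to or removed from the cell scores above the threshold.
// Words are compared as sets, so changing how often a word repeats is ignored.
function shouldRegenerate(
  prev: string,
  next: string,
  probs: WordProbs,
  threshold: number,
  unseenProb: number = 1e-6,
): boolean {
  const before = tokenize(prev);
  const after = tokenize(next);
  const added = after.filter((w) => before.indexOf(w) === -1);
  const removed = before.filter((w) => after.indexOf(w) === -1);
  return added.concat(removed).some((w) => score(w, probs, unseenProb) > threshold);
}
```

With this choice of fallback, a partially typed word counts as unseen and therefore as high information, so a real implementation might only score words once they are followed by whitespace.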

Multiple completions

Another future direction we could explore is generating multiple completions and then trying to rank them and select the best one. Alternatively, we could explore a UX that would make it easy to show multiple completions and let the user select the best one.

Collecting Feedback And Learning

As described in TN002 Learning and in the blog post Learning, Foyle relies on implicit human feedback to learn from its mistakes. Currently, Foyle only collects feedback when a user asks for a suggestion. By automatically generating suggestions we create more opportunities for learning and should hopefully increase Foyle’s learning rate.

We’d like to log when users accept or reject suggestions. We can introduce a new protocol to log events

service LoggingService {
  rpc Log (LogRequest) returns (LogResponse) {}
}

message LogRequest{
  repeated CellEvent cell_events = 1;
}

enum EventType {
  UNKNOWN = 0;
  ACCEPTED = 1;
  REJECTED = 2;
  EXECUTED = 3;
}

message CellEvent {
  Cell cell = 1;
  EventType event_type = 2;
}

message LogResponse {}

When a cell is accepted or rejected we can use the Log RPC to log it to Foyle.

One of the strongest signals we can collect is when the user rejects the completion. In this case, we’d like to log and learn from whatever code the user ends up manually entering and executing. Right now Foyle relies on RunMe’s Runner Service to log cell executions. This method only contains the executed cell and not any preceding cells. We need those preceding cells in order to learn.

Using the proposed logging service we can directly log when a cell is executed and include preceding cells as context so we can retrain Foyle to better predict the code cell in the future.

Alternative Designs

Complete the current cell

The current proposal calls for “Auto-Inserting” one or more cells but not autocompleting the current cell. An alternative would be to do both simultaneously

  • Autocomplete the current cell
  • Auto-Insert one or more cells after the current cell

One of the reasons for not autocompleting the current cell is that GitHub Copilot already does that. Below is a screenshot illustrating GitHub Copilot populating the current cell with Ghost Text containing a possible completion.

[Screenshot: GitHub Copilot]

More importantly, the problem we want Foyle to solve is turning high level expressions of intent into concrete low level actions. In this context a user edits a markdown cell to express their intent. The actions are rendered as code cells that would come next. So by focusing on generating the next code cells we are scoping the problem to focus on generating the actions to accomplish the intent rather than helping users express their intent.

By focusing on generating the next cell rather than completing the current cell we can tolerate higher latencies for completion generation than in a traditional autocomplete system. In a traditional autocomplete system you potentially want to update the completion after each character is typed leading to very tight latency requirements. In the proposal our latency bounds are determined by how long it takes the user to complete the current cell and move onto the next cell. We can take advantage of this to use larger models and longer context windows which should lead to better suggestions.

References

VSCode

How does LLM triggering work in Autocomplete

Typically in autocomplete you are completing the stream of characters a user is typing. So as each character is added it could invalidate one or more completions. For example if the user enters the character “n” possible completions are “news…”, “normal…”, etc… When the user enters the next character “ne” we’d like to immediately update the ranking of completions and reject any that are no longer valid.

One way to handle this is to ask the LLM to generate multiple completions each time you generate a completion (e.g. [OpenAI’s n parameter](https://platform.openai.com/docs/api-reference/chat/create)). These completions should represent the most likely completions given the current input. Each additional character can then be used to invalidate any suggestions that don’t match.
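
The invalidation step can be illustrated with a simple prefix filter. This is only a sketch of the idea described above, not how any particular autocomplete product implements it; `stillValid` is a hypothetical helper.

```typescript
// Keep only the candidate completions that still begin with what the user has typed.
function stillValid(completions: string[], typed: string): string[] {
  return completions.filter((c) => c.slice(0, typed.length) === typed);
}

const candidates = ["news", "normal", "next"];
const afterN = stillValid(candidates, "n");   // all three remain
const afterNe = stillValid(afterN, "ne");     // "normal" is dropped
```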

Appendix: VSCode Background

This section provides information about how VSCode extensions and notebooks work. It is relevant for figuring out how to implement the extension changes.

In VSCode notebooks each cell is just a discrete text editor (discord thread). VSCode’s extension API is defined here.

The onDidChangeTextDocument fires when the text in a document changes. This could be used to trigger the AI to generate suggestions.

As described in vscode_apis, notebooks have a data API (NotebookData, NotebookCellData) and an editor API (NotebookDocument, NotebookCell). NotebookCell contains a TextDocument which we should be able to use to listen for onDidChangeTextDocument events.

I think you can register a handler that will fire for changes to any TextDocument and not just for a particular cell. The document URI should use the vscode-notebook-cell scheme and allow us to identify the notebook document and cell that changed (code example).

TextDecorations

TextDecorations are properties of TextEditors. Each NotebookCell is a different TextEditor. TextEditors get created for a NotebookCell when the cell becomes visible and can be destroyed when the cell is deleted.