De novo DDA using Casanovo

This page explains how to run the Nextflow workflow for de novo DDA analysis with Casanovo.

Prerequisites

Note

These are optional, one-time setup steps.

Workflow Steps

  1. Connect to the GSIT (Nexus) system: Follow the instructions to connect via SSH.

  2. Create a directory for the analysis: Create a new directory for your analysis and move into it. For example:

    mkdir casanovo-run
    cd casanovo-run
    
  3. Copy configuration files: Copy the workflow template files into your new directory:

    cp /net/maccoss/vol1/maccoss_shared/nextflow/templates/casanovo/casanovo.yaml .
    cp /net/maccoss/vol1/maccoss_shared/nextflow/templates/casanovo/pipeline.config .
    
  4. Update casanovo.yaml (optional): This file contains the parameters for the Casanovo de novo search. You are unlikely to need to change anything in it; if you do, the file itself contains instructions.

  5. Update pipeline.config: Edit pipeline.config to set the parameters for the Nextflow workflow itself.

  6. Run the workflow: Execute the following script to start the workflow:

    /net/maccoss/vol1/maccoss_shared/nextflow/scripts/run-nextflow-casanovo-maccosslab.sh
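
Steps 2 and 3 can be combined into a small helper. The following is only a sketch: the function name prepare_run_dir is hypothetical, and it assumes the two template files named in step 3 are the only ones you need. It refuses to continue if a template is missing, and it leaves any file you have already edited in the run directory untouched.

```shell
# Sketch: create a run directory and copy the two template files into it.
# prepare_run_dir is a hypothetical helper name, not part of the workflow.
prepare_run_dir() {
    local template_dir="$1" run_dir="$2" f

    # Stop early if a template is missing rather than running with a partial setup.
    for f in casanovo.yaml pipeline.config; do
        if [ ! -f "$template_dir/$f" ]; then
            echo "missing template: $template_dir/$f" >&2
            return 1
        fi
    done

    mkdir -p "$run_dir" || return 1

    for f in casanovo.yaml pipeline.config; do
        if [ -e "$run_dir/$f" ]; then
            # Keep a file that may already contain your edits.
            echo "keeping existing $run_dir/$f"
        else
            cp "$template_dir/$f" "$run_dir/" || return 1
        fi
    done
}
```

With the template location from step 3, a call would look like: prepare_run_dir /net/maccoss/vol1/maccoss_shared/nextflow/templates/casanovo casanovo-run, followed by cd casanovo-run.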
    

Note

To run Casanovo 4 instead of the latest Casanovo 5, copy the casanovo4.yaml template into your analysis directory under the name casanovo.yaml:

cp /net/maccoss/vol1/maccoss_shared/nextflow/templates/casanovo/casanovo4.yaml ./casanovo.yaml

and run the workflow with the -4 flag:

/net/maccoss/vol1/maccoss_shared/nextflow/scripts/run-nextflow-casanovo-maccosslab.sh -4

Editing Files with Nano

A simple way to edit files on the command line is with the nano text editor. To edit a file, run:

nano <filename>

For example, to edit pipeline.config:

nano pipeline.config

To save your changes and exit, press Ctrl + X, then Y to confirm, and finally Enter.

Running with tmux

Note

When you connect to nexus.gs.washington.edu, you are randomly logged into one of nexus1, nexus2, or nexus3. To reattach to a tmux session, you must be logged into the same server where the session is running.

Connect explicitly to a specific server (e.g., ssh your_username@nexus1.gs.washington.edu) so that you can find and reattach to your sessions later.
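
If you do connect through the general address, it can help to write down which server you landed on before starting a session. A minimal sketch, assuming you are happy to keep a small note file in your analysis directory; the function name record_login_host and the file name tmux_host.txt are hypothetical:

```shell
# Sketch: save the name of the current login server to a file so you know
# which host to reconnect to. The default file name is a hypothetical choice.
record_login_host() {
    local out="${1:-tmux_host.txt}"
    # uname -n prints the network node name of the machine you are logged into.
    uname -n > "$out" || return 1
    echo "tmux sessions started from this shell live on: $(cat "$out")"
}
```

Running record_login_host in the analysis directory before starting tmux leaves a tmux_host.txt file that you can read later from any of the servers, provided the directory is on shared storage.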

It is highly recommended to run long processes like this workflow inside a tmux session. tmux is a terminal multiplexer that allows you to create persistent terminal sessions. This means your workflow will continue to run even if you get disconnected from the server.

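  1. Start a new tmux session and run the workflow: This command creates a new session named nextflow_run and executes the workflow script inside it. The trailing exec bash keeps the session open after the script finishes, so you can still read its output.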

    tmux new -s nextflow_run "bash -c '/net/maccoss/vol1/maccoss_shared/nextflow/scripts/run-nextflow-casanovo-maccosslab.sh;exec bash'"
    
  2. Detach from the session: You can safely detach from the session, and your workflow will continue to run. To detach, press Ctrl + b followed by d.

  3. Reattach to the session: To check on your workflow’s progress, you can reattach to the session at any time:

    tmux attach -t nextflow_run
    
  4. Exit the session: Once the workflow is finished, it is important that you exit the tmux session by typing exit and pressing Enter.

    Note

    If you need to re-execute a workflow, be sure to exit the tmux session before starting another one.
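
Before starting a new session, you can check whether one with the same name is still around on the current server. The following is a sketch, not part of the workflow scripts: session_exists is a hypothetical helper built on tmux has-session, and it only sees sessions on the server you are logged into.

```shell
# Sketch: report whether a tmux session with the given name exists on this host.
# session_exists is a hypothetical helper name.
session_exists() {
    local name="$1"
    # Without tmux on PATH there can be no session to find.
    command -v tmux >/dev/null 2>&1 || return 1
    # The "=" prefix asks tmux for an exact name match instead of a prefix match.
    tmux has-session -t "=$name" 2>/dev/null
}

# SESSION is the session name used in the steps above.
SESSION="nextflow_run"
if session_exists "$SESSION"; then
    echo "Session '$SESSION' already exists; attach with: tmux attach -t $SESSION"
else
    echo "No session named '$SESSION' on this host; a new one can be started."
fi
```

If the check reports an existing session, reattach and exit it as described in step 4 before launching the workflow again.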

Specifying a Cluster Queue

By default, the workflow runs on the sage cluster queue, which is the general queue for Genome Sciences users. You can specify a different queue using the -q flag.

Available queues:

  • maccoss: The MacCoss lab cluster queue, available to all users in the MacCoss lab.

  • sage: The general GS cluster queue, available to all users in Genome Sciences.

  • pr: The UW Proteomics Resource queue; may not be available to all users.

For example, to run the workflow on the pr queue:

/net/maccoss/vol1/maccoss_shared/nextflow/scripts/run-nextflow-casanovo-maccosslab.sh -q pr
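
A mistyped queue name is easy to catch before launching. The sketch below checks a name against the three queues listed above and only prints the command it would run; is_known_queue is a hypothetical helper, and the list is an assumption that holds only as long as this page's queue list is current.

```shell
# Sketch: accept only the queue names listed on this page.
# is_known_queue is a hypothetical helper name.
is_known_queue() {
    case "$1" in
        maccoss|sage|pr) return 0 ;;
        *) return 1 ;;
    esac
}

# Path to the run script, as given in this page.
RUN_SCRIPT="/net/maccoss/vol1/maccoss_shared/nextflow/scripts/run-nextflow-casanovo-maccosslab.sh"
queue="pr"

if is_known_queue "$queue"; then
    # Print the command instead of running it, so the check itself has no side effects.
    echo "would run: $RUN_SCRIPT -q $queue"
else
    echo "unknown queue: $queue (expected maccoss, sage, or pr)" >&2
fi
```

Replace the echo with the command itself once the queue name is confirmed.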