De novo DDA using Casanovo
============================

This page provides instructions for running the Nextflow workflow for de novo DDA analysis with Casanovo.

Prerequisites
-------------

.. note::

   These are optional, one-time setup steps.

* If you plan to upload results to Limelight, please :doc:`set up your Limelight credentials `.
* If you will be downloading data from Panorama, please :doc:`set up your Panorama credentials `.

Workflow Steps
--------------

1. **Connect to the GSIT (Nexus) system**

   Follow the instructions to :doc:`connect via SSH `.

2. **Create a directory for the analysis**

   Create and move into a new directory for your analysis. For example:

   .. code-block:: bash

      mkdir casanovo-run
      cd casanovo-run

3. **Copy configuration files**

   Copy the template files for the workflow into your new directory:

   .. code-block:: bash

      cp /net/maccoss/vol1/maccoss_shared/nextflow/templates/casanovo/casanovo.yaml .
      cp /net/maccoss/vol1/maccoss_shared/nextflow/templates/casanovo/pipeline.config .

4. **Update** ``casanovo.yaml`` **(Optional)**

   This file contains the parameters for the Casanovo de novo search. You are unlikely to need to change anything in it. If you do, instructions are in the file itself.

5. **Update** ``pipeline.config``

   Edit the ``pipeline.config`` file to set the parameters for the Nextflow workflow itself.

6. **Run the workflow**

   Execute the following script to start the workflow:

   .. code-block:: bash

      /net/maccoss/vol1/maccoss_shared/nextflow/scripts/run-nextflow-casanovo-maccosslab.sh

.. note::

   If you wish to run **Casanovo 4** instead of the latest Casanovo 5, use the ``casanovo4.yaml`` configuration file:

   .. code-block:: bash

      cp /net/maccoss/vol1/maccoss_shared/nextflow/templates/casanovo/casanovo4.yaml ./casanovo.yaml

   and run the workflow with the ``-4`` flag:

   .. code-block:: bash

      /net/maccoss/vol1/maccoss_shared/nextflow/scripts/run-nextflow-casanovo-maccosslab.sh -4

Editing Files with Nano
~~~~~~~~~~~~~~~~~~~~~~~

A simple way to edit files on the command line is with the ``nano`` text editor. To edit a file, run:

.. code-block:: bash

   nano <filename>

For example, to edit ``pipeline.config``:

.. code-block:: bash

   nano pipeline.config

To save your changes and exit, press ``Ctrl + X``, then ``Y`` to confirm, and finally ``Enter``.

Running with tmux
-----------------

.. note::

   When you connect to ``nexus.gs.washington.edu``, you are randomly logged into one of ``nexus1``, ``nexus2``, or ``nexus3``. To reattach to a ``tmux`` session, you must be logged into the same server where the session is running. Connecting explicitly to a specific server (e.g., ``ssh your_username@nexus1.gs.washington.edu``) makes it easier to find and reattach to your sessions later.

It is highly recommended to run long processes like this workflow inside a ``tmux`` session. ``tmux`` is a terminal multiplexer that allows you to create persistent terminal sessions, so your workflow continues to run even if you are disconnected from the server.

1. **Start a new** ``tmux`` **session and run the workflow:**

   This command creates a new session named ``nextflow_run`` and executes the workflow script inside it.

   .. code-block:: bash

      tmux new -s nextflow_run "bash -c '/net/maccoss/vol1/maccoss_shared/nextflow/scripts/run-nextflow-casanovo-maccosslab.sh;exec bash'"

2. **Detach from the session:**

   You can safely detach from the session, and your workflow will continue to run. To detach, press ``Ctrl + b`` followed by ``d``.

3. **Reattach to the session:**

   To check on your workflow's progress, you can reattach to the session at any time:

   .. code-block:: bash

      tmux attach -t nextflow_run

4. **Exit the session:**

   Once the workflow is finished, exit the ``tmux`` session by typing ``exit`` and pressing ``Enter``.

.. note::

   If you need to re-execute a workflow, be sure to exit the ``tmux`` session before starting another one.

Specifying a Cluster Queue
--------------------------

By default, the workflow runs on the ``sage`` cluster queue, which is the general queue for Genome Sciences users. You can specify a different queue using the ``-q`` flag.

Available queues:

* ``maccoss``: The MacCoss lab cluster queue, available to all users in the MacCoss lab.
* ``sage``: The general GS cluster queue, available to all users in Genome Sciences.
* ``pr``: The UW Proteomics Resource queue; may not be available to all users.

For example, to run the workflow on the ``pr`` queue:

.. code-block:: bash

   /net/maccoss/vol1/maccoss_shared/nextflow/scripts/run-nextflow-casanovo-maccosslab.sh -q pr
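
A mistyped queue name can be caught before anything is submitted. The following is a minimal sketch, not part of the workflow scripts: ``build_casanovo_cmd`` is a hypothetical helper that checks a queue name against the three queues listed above and prints the resulting launch command without running it. It assumes that passing ``-q sage`` explicitly behaves the same as omitting the flag.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper (an illustration only, not provided by the lab scripts).

   RUN_SCRIPT=/net/maccoss/vol1/maccoss_shared/nextflow/scripts/run-nextflow-casanovo-maccosslab.sh

   build_casanovo_cmd() {
       # Fall back to the documented default queue when no argument is given.
       local queue="${1:-sage}"

       # Accept only the queue names listed on this page.
       case "$queue" in
           maccoss|sage|pr) ;;
           *)
               echo "Unknown queue: $queue (expected maccoss, sage, or pr)" >&2
               return 1
               ;;
       esac

       # Print the command for review; nothing is launched here.
       echo "$RUN_SCRIPT -q $queue"
   }

   build_casanovo_cmd pr

The printed command could then be pasted in place of the bare script path in the ``tmux new`` command shown earlier. Combining ``-q`` with ``tmux`` in this way is an assumption; this page only demonstrates the two separately.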