CBCA Analysis

The CBCA analysis system is designed to facilitate the sequential assignment of triple resonance protein spectra. In particular, it is suited to 3D experiments that correlate the CB and CA resonances with the HN and N resonances. It could also be used, with or without customization, to analyze other experiments (for example, HNtocsy and HNnoesy).

The graphical user interface for the CBCA analysis system looks like that shown below. A control bar appears across the top of the window, and a set of spectra is displayed beneath it. Up to four rows of spectra can be displayed. Which controls and spectra appear depends on the current mode and on the types of datasets collected. In some modes a separate control panel with one or more listboxes will appear as well.

The analysis proceeds in a series of steps, with both automated and manual components. The overall procedure is summarized as follows:

  1. Peak pick relevant spectra.
  2. Set up CBCA Analysis parameters.
  3. Clean and verify peaks.
  4. Cluster peaks based on common HN and N frequencies.
  5. Verify and adjust peak clusters.
  6. Link clusters based on common CB and CA frequencies.
  7. Confirm best cluster links.
  8. Form Fragments which are a series of overlapping clusters.
  9. Assign Fragments to specific ranges of residues in sequence.

Peak pick relevant spectra.

First, use the standard NMRView peak picking tools to peak pick your spectra. If you are using deuterated proteins you will probably have four spectra (HNcocaCB, HNcoCA, HNcaCB, HNCA); with non-deuterated proteins you will probably have two (CACBcoNNH, HNCACB).

Set up CBCA Analysis parameters.

The next step is to set up various parameters that determine which spectra and peaklists will be displayed. Since different users label spectra differently, you first need to specify the labels used for your datasets. The proton, nitrogen and carbon dimensions should each have a unique label, and these labels should be the same in every spectrum. Set the CBCA panel to "Label" mode and enter the labels for the three dimensions as illustrated in the example below.

To select the datasets and peaklists for a row, use the mode menu to choose one of the four rows from the "Row" entry. The following control panel will appear.

If you want this row to be displayed, select the "Show row" checkbutton. By default, the y axis (typically the carbon dimension) spans the full spectral range. To limit it to a certain chemical shift range, deselect the "Full" checkbutton and enter the desired upper and lower bounds in the two entry boxes. Then enter the dataset filename and the peaklists to be displayed on this row. Clicking the select buttons lets you choose the datasets and peaklists from a menu (the datasets must already be open).

Clean and verify peaks.

In this mode you can choose a single peak list, typically the most sensitive one, and use it to step through the spectra and verify the picked peaks. Switch to "Peak" mode and the control bar will change to the following:

Select the desired peak list with the "Select" button. You can then step through the peak list with the up/down buttons, or jump to a specific peak by typing its number into the entry box and pressing Enter. As you move to each peak, the spectral displays are updated to show the region corresponding to that peak. The y axis of all spectra is the full carbon axis. The x axis of the outer two spectra is a range of the nitrogen dimension chosen to bracket the peak's 15N chemical shift, and the z axis (the displayed plane) is at the peak's 1H chemical shift. The inner two spectra have the 1H and 15N axes switched. The two spectra on the left correspond to the inter-residue experiments and the two on the right to the intra-residue experiments. Here is an example:

The mouse bindings are set up so that pointing to a peak with the crosshair and double-clicking the left mouse button sets the status flag of that peak to 1. By stepping through all the peaks, you can click on every peak that "looks good" to confirm it. A script (cutncf) can then be used to delete all peaks that have not been confirmed in this way (you will probably want to save a copy of the database before deleting them). This may or may not be the best protocol, but it seems to work reasonably well. The opposite protocol, deleting all peaks that don't look good, would also be possible. "Looking good" means that several peaks in the different spectra line up with each other in both the HN and N views, whereas noise peaks should be uncorrelated across the spectra. A simple script could be written to make this judgment automatically, but it is perhaps therapeutic, or at least educational, to look at actual peaks occasionally.
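The cutncf script itself is not reproduced here. As a rough illustration of the filtering logic only, the following Python sketch assumes a hypothetical `Peak` record with a `status` field and keeps just the peaks whose status was set to 1. The names `Peak` and `cut_unconfirmed` are illustrative assumptions, not NMRView data types or commands.

```python
import copy
from dataclasses import dataclass


@dataclass
class Peak:
    """Hypothetical stand-in for a picked peak; not an NMRView data type."""
    number: int
    hn: float        # 1H shift (ppm)
    n: float         # 15N shift (ppm)
    c: float         # 13C shift (ppm)
    status: int = 0  # set to 1 when the peak is confirmed by double-clicking


def cut_unconfirmed(peaks):
    """Keep only confirmed peaks (status == 1).

    Returns (kept, backup); backup is a deep copy of the input list, standing
    in for the saved copy of the peak database recommended before deleting.
    """
    backup = copy.deepcopy(peaks)
    kept = [p for p in peaks if p.status == 1]
    return kept, backup


peaks = [
    Peak(1, 8.21, 120.4, 56.1, status=1),
    Peak(2, 7.95, 118.0, 30.2),          # never confirmed: treated as noise
    Peak(3, 8.21, 120.5, 39.8, status=1),
]
kept, backup = cut_unconfirmed(peaks)
print([p.number for p in kept])  # [1, 3]
```

Returning a backup alongside the filtered list mirrors the advice to keep a copy before deleting; the opposite protocol would simply flip the test to `p.status != 1` after flagging bad peaks instead.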

Cluster peaks based on common HN and N frequencies.

Next, switch the CBCA mode to "link" to see the following panel. Type in tolerances for the intralist comparisons, then click Link; this forms links within each peak list. Then repeat, usually with a slightly larger tolerance, for the interlist link. (At the moment, doing the intralist links before the interlist link is probably not necessary, but in the future I may take advantage of this.)

The process of linking the peaks is done by executing a two-dimensional cluster algorithm on the 1HN and 15N frequencies of the peaks. The results are stored as so-called links between the peaks. The link attributes of peaks are automatically stored in the database. However, this step of the analysis also creates a Tcl array variable named "Clust". This array contains additional information that will be supplemented at later stages of the analysis. It is not saved to the database, so you should periodically save it to a text file. This can be done by choosing the "Save" entry from the "File" menu of the CBCA panel, which saves the Clust array and several other relevant arrays to files. Selecting the "Restore" entry rereads these arrays.
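To make the idea of a two-dimensional cluster algorithm on HN and N frequencies concrete, here is a minimal Python sketch of single-linkage clustering with separate HN and N tolerances. It is an assumption, not NMRView's actual algorithm: the tuple layout, the function name `cluster_peaks`, the peak-list labels and the tolerance values are all hypothetical, and it clusters every list in one pass rather than doing separate intralist and interlist passes.

```python
from collections import defaultdict


def cluster_peaks(peaks, tol_hn, tol_n):
    """Single-linkage clustering of peaks on (HN, N) shifts.

    peaks: list of (peak_id, hn_ppm, n_ppm) tuples; peak_id is any hashable
    label, e.g. ("hncacb", 12) for peak 12 of a hypothetical list "hncacb".
    Two peaks are linked when |dHN| <= tol_hn and |dN| <= tol_n; a cluster
    is every peak reachable through such links.
    Returns {cluster_number: [peak_id, ...]}, loosely analogous to the Clust
    array (the real array holds more information).
    """
    parent = list(range(len(peaks)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sort by HN so the inner loop can stop once HN differences exceed tol_hn.
    order = sorted(range(len(peaks)), key=lambda i: peaks[i][1])
    for a, i in enumerate(order):
        for j in order[a + 1:]:
            if peaks[j][1] - peaks[i][1] > tol_hn:
                break
            if abs(peaks[j][2] - peaks[i][2]) <= tol_n:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, (peak_id, _, _) in enumerate(peaks):
        groups[find(i)].append(peak_id)
    # Renumber clusters 0, 1, 2, ... in order of first appearance.
    return {k: ids for k, ids in enumerate(groups.values())}


peaks = [
    (("hncacb", 1), 8.210, 120.40),
    (("cbcaconh", 1), 8.215, 120.45),
    (("hncacb", 2), 7.950, 118.00),
]
clust = cluster_peaks(peaks, tol_hn=0.02, tol_n=0.2)
print(clust)
```

Single linkage means two peaks can end up in the same cluster through a chain of neighbors even if they are farther apart than the tolerance themselves, which is one reason clusters may need to be split by hand in the next step.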

Verify and adjust peak clusters.

Next, switch the CBCA mode to "linkedit". A new control panel will appear that looks as follows:

Use the up/down arrows on the control bar to step through the clusters in the Clust array, or type in a cluster number and press Return (for the first cluster, enter 0).

The left listbox shows all peaks linked together in that cluster, and the spectra display the region corresponding to the cluster's HN and N frequencies. Clicking on an entry moves it from the left list to the right list (and vice versa). Use this if you can resolve a set of peaks into two clusters: move all the peaks that should be split out to the right box, then click Relink. This unlinks all the peaks in the two listboxes and then relinks the peaks in the left and right boxes separately. Remember that the Clust array is not saved in the database, so you need to use the Save command manually; do this periodically to preserve your work. In this mode, mouse bindings are set up so that pointing to a peak with the mouse cursor and pressing "a" adds that peak to the left list, while pressing "s" switches the peak to the other list. After switching a peak, you must click Relink to re-form the clusters.
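The effect of Relink on the cluster bookkeeping can be sketched in Python as follows. This is a simplification under stated assumptions: clusters are held in a hypothetical `{number: [peak_id, ...]}` dict, `relink` is an illustrative name rather than an NMRView command, and each listbox simply becomes one cluster instead of being re-clustered.

```python
def relink(clust, cluster_no, right_ids):
    """Split one cluster in a {number: [peak_id, ...]} dict into two.

    Peaks in right_ids (the "right" listbox) move to a newly numbered
    cluster; the remaining peaks (the "left" listbox) stay under cluster_no.
    Returns the new cluster number, or None if either side would be empty.
    """
    members = clust[cluster_no]
    left = [p for p in members if p not in right_ids]
    right = [p for p in members if p in right_ids]
    if not left or not right:
        return None  # nothing to split
    new_no = max(clust) + 1
    clust[cluster_no] = left
    clust[new_no] = right
    return new_no


clust = {0: ["hncacb:1", "hnca:1", "hncacb:7"], 1: ["hncacb:2"]}
new = relink(clust, 0, {"hncacb:7"})
print(new, clust)
```

Assigning the split-off peaks a fresh number at the end keeps existing cluster numbers stable, so earlier references to cluster 0 or 1 remain valid after the split.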

Clicking the "Neighbor" button searches through the clusters to find the one nearest to the current cluster in HN and N frequency. You can then easily move peaks back and forth between the two clusters if the automatic clustering algorithm did not separate them correctly.

Link clusters based on common CB and CA frequencies.

Next, switch to "MatchCluster" mode. Enter a tolerance for the 13C comparison (I use about 0.6 ppm) and click the "Match" button. This can take a while (up to half an hour) as it finds correlations between all the clusters based on overlapping CA and CB frequencies. Progress is indicated, and the quit button can be used to interrupt the process. Note: this version only works properly for separate CB/CA experiments (typically used for deuterated proteins). I will soon put back the matching script for combined CBCA experiments.
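A minimal Python sketch of cluster matching for separate CB/CA experiments is shown below. It assumes, hypothetically, that each cluster has been summarized into intra-residue CA/CB shifts (residue i, from HNCA/HNcaCB) and inter-residue CA/CB shifts (residue i-1, from HNcoCA/HNcocaCB), and it scores a candidate link by the largest carbon deviation. The `ClusterShifts` record, the scoring rule and the function names are illustrative assumptions, not the actual MatchCluster code; only the 0.6 ppm tolerance comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterShifts:
    """Hypothetical per-cluster summary of carbon shifts (ppm)."""
    number: int
    intra_ca: Optional[float]  # residue i, e.g. from HNCA peaks
    intra_cb: Optional[float]  # residue i, e.g. from HNcaCB peaks
    inter_ca: Optional[float]  # residue i-1, e.g. from HNcoCA peaks
    inter_cb: Optional[float]  # residue i-1, e.g. from HNcocaCB peaks


def carbon_deviation(prev, nxt):
    """Largest |delta| between prev's intra shifts and nxt's inter shifts.

    Returns None if CA is missing on either side. CB is compared only when
    present on both sides; a CB missing on one side (e.g. glycine) is
    ignored rather than penalized, which is a simplification.
    """
    if prev.intra_ca is None or nxt.inter_ca is None:
        return None
    devs = [abs(prev.intra_ca - nxt.inter_ca)]
    if prev.intra_cb is not None and nxt.inter_cb is not None:
        devs.append(abs(prev.intra_cb - nxt.inter_cb))
    return max(devs)


def match_clusters(clusters, tol=0.6):
    """Return candidate (prev, next, deviation) links, smallest deviation first."""
    links = []
    for prev in clusters:
        for nxt in clusters:
            if prev.number == nxt.number:
                continue
            dev = carbon_deviation(prev, nxt)
            if dev is not None and dev <= tol:
                links.append((prev.number, nxt.number, dev))
    links.sort(key=lambda link: link[2])
    return links


clusters = [
    ClusterShifts(0, intra_ca=56.1, intra_cb=30.2, inter_ca=None, inter_cb=None),
    ClusterShifts(1, intra_ca=45.3, intra_cb=None, inter_ca=56.3, inter_cb=30.0),
    ClusterShifts(2, intra_ca=60.2, intra_cb=39.8, inter_ca=45.1, inter_cb=None),
]
print(match_clusters(clusters, tol=0.6))
```

The all-pairs comparison grows with the square of the number of clusters, which is consistent with matching being the slow step of the analysis.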

Confirm best cluster links.

Next, change the mode to "clustedit". Here you will see a list of clusters that overlap based on CA and CB. The middle two spectra show the regions corresponding to the inter- and intra-residue peaks of the selected cluster. Click on any cluster in either listbox and that region will be shown in the left or right spectra. If you like the overlap you see, click the Confirm button; whichever cluster is currently highlighted is confirmed. You have to confirm the left and right ones individually. If you change your mind, select the cluster in the listbox and click UnConfirm. Step through all the clusters this way. The information on each line in the listbox is as follows:

Again, use "Save" to save the results periodically.

Form Fragments which are a series of overlapping clusters.

More info coming soon.
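Pending that description, the idea of a fragment as a series of overlapping clusters can be illustrated with a hypothetical Python sketch that chains confirmed (previous, next) cluster links into linear runs. The function name `build_fragments` and the rule of breaking a fragment wherever a cluster has more than one confirmed neighbor on a side are assumptions for illustration, not the procedure NMRView uses.

```python
from collections import Counter


def build_fragments(confirmed_links):
    """Chain confirmed (prev, next) cluster links into linear fragments.

    A link is kept only if prev has exactly one confirmed successor and next
    has exactly one confirmed predecessor; ambiguous links are dropped, so
    fragments break there. Clusters forming a closed cycle of links are not
    reported. Returns a list of fragments, each a list of cluster numbers.
    """
    out_deg = Counter(p for p, _ in confirmed_links)
    in_deg = Counter(n for _, n in confirmed_links)
    succ = {p: n for p, n in confirmed_links
            if out_deg[p] == 1 and in_deg[n] == 1}
    has_pred = set(succ.values())
    nodes = {c for link in confirmed_links for c in link}

    fragments = []
    for start in sorted(nodes - has_pred):
        chain = [start]
        while chain[-1] in succ:
            chain.append(succ[chain[-1]])
        fragments.append(chain)
    return fragments


# Clusters 5 and 7 both claim cluster 6 as successor, so those links are dropped.
print(build_fragments([(0, 1), (1, 2), (5, 6), (7, 6)]))  # [[0, 1, 2], [5], [6], [7]]
```

Dropping ambiguous links rather than guessing keeps each fragment unambiguous; such fragments would then be candidates for assignment to specific ranges of residues in the sequence.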