Separate ABC-scoring

If you’re only interested in the ABC-scored interactions, you can also call that part independently. The flags are the same as for whole STARE, although the long options are not available. Here is an example of how to run it (with bioconda, call STARE_ABCpp directly, i.e. remove the leading ./Code/):

./Code/STARE_ABCpp -b <path_to_bed_file> -n <activity_column(s)> -a <gtf_annotation> -o <output_path> -w <window_size> -f <contact_data_dir> -k <bin_size> -t <score_cut_off>
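To make the placeholders concrete, here is a minimal sketch of a filled-in call. All file names and the activity column (4) are hypothetical; -k 5000 and -t 0.02 are taken from the flag descriptions, and -w is left out so that the default window applies. The command is only assembled and printed here, so it can be checked before it is run. Prefix the binary with ./Code/ when not using bioconda.

```shell
#!/bin/sh
# Hypothetical inputs: replace every value with your own.
BED="candidate_regions.bed"     # candidate regions (-b)
GTF="gencode_annotation.gtf"    # full gene annotation (-a)
OUT="results/my_sample"         # output prefix (-o)
CONTACTS="contact_data/"        # directory with per-chromosome contact files (-f)

# -n 4    : use the 4th bed column as activity (hypothetical choice)
# -k 5000 : contact data at 5kb resolution
# -t 0.02 : the documented default cut-off, written out explicitly
ABC_CMD="STARE_ABCpp -b $BED -n 4 -a $GTF -o $OUT -f $CONTACTS -k 5000 -t 0.02"

# Print the assembled call for inspection instead of executing it.
echo "$ABC_CMD"
```

Once the printed line looks right, it can be pasted into the terminal as is.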

For convenience, here are the flags that apply when running the ABC-scoring on its own:

required

| Flag | Description |
| --- | --- |
| -b | Bed-file containing your candidate regions. Headers are allowed if they start with #. Non-overlapping regions usually make the most sense. |
| -a | Gene annotation file in gtf-format (for example from Gencode). It is advised to give the full annotation and not only a subset, in particular when running the generalised ABC-scoring approach, as the information of all genes is required. |
| -o | Output-prefix with which all file names will start. Unlike whole STARE, no separate folder will be created. |
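Since non-overlapping candidate regions are recommended for -b, a quick pre-flight check can be useful. The following is only a sketch under stated assumptions: the bed-file is tab-separated with half-open intervals (the usual bed convention), the sample data is made up, and count_overlaps is a hypothetical helper, not part of STARE.

```shell
#!/bin/sh
# Build a small hypothetical bed-file; the header line starts with '#'.
TMP_BED=$(mktemp)
printf '#chr\tstart\tend\tactivity\n'  > "$TMP_BED"
printf 'chr1\t100\t200\t3.5\n'        >> "$TMP_BED"
printf 'chr1\t150\t300\t1.2\n'        >> "$TMP_BED"
printf 'chr1\t400\t500\t0.8\n'        >> "$TMP_BED"
printf 'chr2\t100\t200\t2.0\n'        >> "$TMP_BED"

count_overlaps() {
    # Drop header lines, sort by chromosome and start, then compare each
    # region with the furthest end seen so far on the same chromosome.
    # With half-open intervals, start < previous end means >= 1 bp overlap.
    grep -v '^#' "$1" | sort -k1,1 -k2,2n | awk -F'\t' '
        $1 == chr && $2 < max_end { n++ }
        $1 != chr { chr = $1; max_end = 0 }
        $3 > max_end { max_end = $3 }
        END { print n + 0 }'
}

OVERLAPS=$(count_overlaps "$TMP_BED")
echo "regions overlapping an earlier region: $OVERLAPS"
rm -f "$TMP_BED"
```

A result of 0 means the regions are disjoint; anything above that suggests merging or filtering the regions before scoring.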

optional

| Flag | Description |
| --- | --- |
| -w | Window size centred at the 5’ TSS in which regions from the bed-file (-b) will be considered for a gene (Default 5MB). E.g. 5MB means ±2.5MB around the TSS. |
| -n | Column(s) in the bed-file (-b) representing the activity of the region. You will get one set of output files for each column. Start counting at 1. Allowed formats are individual columns, column ranges, comma-separated columns, and a start column followed by all consecutive columns. |
| -c | Number of cores to provide for parallel computing. Note that the processing is also heavy on memory. For >100 columns the parallelization switches from chromosomes to gene batches. |
| -x | Bed-file with regions to exclude. All regions in the bed-file (-b) with ≥ 1 bp overlap will be discarded from all further analyses. |
| -u | File with rows of gene IDs/symbols to limit the output to. The respective IDs/symbols must be present in the gtf-file (-a), including all potential version suffixes like ENSG00000164458.5. If you don’t give any gene list, you will get the result for all genes in the gtf-file. |
| -i | Set to “all_tss” to average across all annotated TSS for ABC-scoring or “5_tss” to use only the 5’ TSS (Default “all_tss”). |
| -q | Whether to use the adapted activity for ABC-scoring instead of the ‘original’ one (Default True). |
| -f | Path to directory containing normalized chromatin contact files in coordinate format (bin\|bin\|contact), one gzipped file for each chromosome. Alternatively, set to “false” to instead use a contact estimate based on distance. |
| -k | Resolution of the chromatin contact data. E.g. 5000 for a 5kb resolution. |
| -t | Cut-off for the ABC-score. Only interactions surpassing it are written to the output (Default 0.02). Set to 0 if you would like to get all scored interactions. Interactions with 0 activity/adapted activity are skipped either way. |
| -d | Whether to use a pseudocount for the contact frequency in the ABC-score (Default True). |
| -m | Size of the window around your candidate regions in which genes are considered for the adapted activity adjustment (Default 5MB; it is set to at least the value of -w). |
| -h | Print the flag options. |
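Because entries in the -u gene list have to match the gtf-file exactly, including version suffixes, it can help to compare the list against the annotation before starting a run. This sketch assumes a standard GTF attribute column (gene_id "...";) and only checks gene IDs, not symbols. The file names and the second gene ID are made up for illustration.

```shell
#!/bin/sh
WORK=$(mktemp -d)

# Hypothetical two-gene annotation in GTF layout (9 tab-separated columns).
printf 'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "ENSG00000164458.5"; gene_name "GENE_A";\n' > "$WORK/annotation.gtf"
printf 'chr1\tHAVANA\tgene\t2000\t2900\t.\t-\t.\tgene_id "ENSG00000999999.2"; gene_name "GENE_B";\n' >> "$WORK/annotation.gtf"

# Hypothetical gene list for -u; the second entry lacks its version suffix.
printf 'ENSG00000164458.5\nENSG00000999999\n' > "$WORK/gene_list.txt"

# Collect every gene_id value found in the annotation.
sed -n 's/.*gene_id "\([^"]*\)".*/\1/p' "$WORK/annotation.gtf" | sort -u > "$WORK/gtf_ids.txt"

# List entries without an exact match in the annotation.
MISSING=$(sort -u "$WORK/gene_list.txt" | comm -23 - "$WORK/gtf_ids.txt")
echo "not found in gtf: $MISSING"

rm -rf "$WORK"
```

An empty result means every listed ID is present in the annotation. To check gene symbols instead, the same pattern could be applied to the gene_name attribute.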