Potentially useful bash profile functions for HPC users

badonyi/bash_profile

bash_profile

This repository contains a collection of Bash functions that I have accumulated over the years. They are small convenience tools intended to shorten repetitive or cumbersome command-line workflows and to improve the interactive user experience.

The functions target tasks commonly encountered during day-to-day work on HPC systems, such as copying, renaming, or inspecting large numbers of files, often with lightweight progress or summary feedback. Each function is documented below with a brief description and an example use case.

Assumptions

  • These functions are intended for interactive use in Bash on Linux-based HPC systems.
  • They rely on GNU core utilities and may not behave identically on macOS or BSD systems.
  • As with any bulk file operation, it is recommended to test on a small subset first.
  • Note: commands that touch large numbers of files may generate substantial metadata traffic on shared filesystems.

Usage

Nothing to install. Copy any function you find useful into your Bash profile (e.g. ~/.bash_profile), then start a new shell or source the file.
These functions are provided as-is and are meant to be copied, adapted, or modified to suit local environments.

Functions

pfuns

Lists all functions detected in the bash profile along with the line they start on.
Works for all functions, not just the ones here.

--- Functions found in /home/user/.bash_profile ---
concatenate                    (line 140)
cpp                            (line 188)
dirdiff                        (line 233)
filetree                       (line 292)
glimpse                        (line 314)
histgrep                       (line 396)
lines                          (line 449)
math                           (line 528)
max_width                      (line 561)
note                           (line 596)
pfuns                          (line 610)
rename_pattern                 (line 661)
rm_but                         (line 709)
rm_top                         (line 777)
sizesum                        (line 803)
treesize                       (line 826)
update_timestamps              (line 877)
----------------------------------------------------
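
For readers curious how such a listing can be produced, here is a minimal sketch. It is not the repository's implementation: the name pfuns_sketch is hypothetical, and it assumes that functions are defined as `name() {` or `function name() {` at the start of a line (definitions without parentheses are not detected).

```shell
# Hypothetical sketch: list function definitions in a file with their line numbers.
# Assumes GNU grep and sed; only "name()" style definitions are recognised.
pfuns_sketch() {
    local file="${1:-$HOME/.bash_profile}" name line
    grep -nE '^[[:space:]]*(function[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*[[:space:]]*\(\)' "$file" |
        sed -E \
            -e 's/^([0-9]+):[[:space:]]*/\1 /' \
            -e 's/^([0-9]+) function[[:space:]]+/\1 /' \
            -e 's/^([0-9]+) ([A-Za-z_][A-Za-z0-9_]*).*/\2 \1/' |
        sort |
        while read -r name line; do
            # Left-align the name in a 30-character column, as in the listing above.
            printf '%-30s (line %d)\n' "$name" "$line"
        done
}
```

Called without arguments it reads ~/.bash_profile; pass a path to inspect another file.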

concatenate

Concatenate multiple text files, inserting a separator between them.

Usage:    concatenate [-s <separator>] <file> [file ...]
Options:
          -s   Separator characters 
Example:  concatenate -s "\n\n" *.sh > all.sh

The separator defaults to a single newline. For example, to paste all the Bash scripts in this repo together and append them to a copy of your profile:

source concatenate.sh
concatenate *.sh > all_funs.sh
concatenate ~/.bash_profile all_funs.sh > ~/.bash_profile_new
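
The separator logic can be sketched in a few lines. This is an illustration under assumptions, not the code from the repository: concatenate_sketch is a hypothetical name, and the separator is interpreted with printf '%b' so that escapes such as "\n\n" expand to real newlines.

```shell
# Hypothetical sketch: join files with a separator between them (not after the last one).
concatenate_sketch() {
    local sep='\n' first=1 f
    if [ "$1" = "-s" ]; then
        sep="$2"
        shift 2
    fi
    for f in "$@"; do
        # Emit the separator before every file except the first.
        if [ "$first" -eq 0 ]; then
            printf '%b' "$sep"
        fi
        cat -- "$f"
        first=0
    done
}
```

With the default separator, two files that each end in a newline are joined with one blank line between them.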

cpp

Copy a large number of files with a progress bar.

Usage:    cpp <source> <destination>
Example:  cpp ../*.fa tmp/

Progress is counted per file, not per byte, so a single large file shows no intermediate progress.
The bar simply tracks how many files have been copied so far.

cpp ../*.fa tmp/
[#........................................] 73/20339
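
A per-file progress bar of this kind can be sketched as below. The function name cpp_sketch, the 40-character bar width, and the choice to draw the bar on stderr are assumptions made for illustration; the actual cpp may compute and draw its bar differently.

```shell
# Hypothetical sketch: copy files one by one and redraw a progress bar after each.
cpp_sketch() {
    local dest="${!#}"            # the last argument is the destination
    local total=$(( $# - 1 ))     # every other argument is a source file
    local width=40 copied=0 filled bar src
    for src in "${@:1:total}"; do
        cp -- "$src" "$dest" || return 1
        copied=$(( copied + 1 ))
        filled=$(( copied * width / total ))
        # Build the bar: "filled" hashes followed by dots up to the full width.
        bar=$(printf '%*s' "$filled" '' | tr ' ' '#')$(printf '%*s' "$(( width - filled ))" '' | tr ' ' '.')
        # \r returns to the start of the line so the bar is overwritten in place.
        printf '\r[%s] %d/%d' "$bar" "$copied" "$total" >&2
    done
    printf '\n' >&2
}
```

Spawning one cp per file is slower than a single bulk cp, which is the price of per-file feedback.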

dirdiff

Compare the contents of two directories.

Usage:    dirdiff [-f] <dir1> <dir2>

Options:
          -f    Ignore file extensions; compare only basenames

Output semantics:
          - <file>  present only in <dir1>
          + <file>  present only in <dir2>

The first argument is always the reference directory.
For example, if tmp1 contains file1 and file2, and tmp2 contains file2 and file3, then:

dirdiff tmp1 tmp2
- file1
+ file3

i.e., compared to tmp1, tmp2 lacks file1 but has file3.

The -f flag ignores file extensions, which helps when the two directories hold the same items in different formats (e.g. file1.fa vs. file1.aln).
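
The -/+ semantics map naturally onto comm, which compares two sorted lists. The sketch below is one possible way to get this behaviour, not the repository's code: dirdiff_sketch and its helper _dirdiff_names are hypothetical names, and the listing relies on ls, so file names containing newlines are not handled.

```shell
# Hypothetical helper: sorted entry names of a directory, optionally without extensions.
_dirdiff_names() {
    local dir="$1" strip="$2"
    if [ "$strip" -eq 1 ]; then
        ls -1 "$dir" | sed -E 's/\.[^.]*$//' | sort -u
    else
        ls -1 "$dir" | sort -u
    fi
}

# Hypothetical sketch: report names present in only one of two directories.
dirdiff_sketch() {
    local strip=0
    if [ "$1" = "-f" ]; then strip=1; shift; fi
    # comm -23 keeps lines unique to the first list, comm -13 those unique to the second.
    comm -23 <(_dirdiff_names "$1" "$strip") <(_dirdiff_names "$2" "$strip") | sed 's/^/- /'
    comm -13 <(_dirdiff_names "$1" "$strip") <(_dirdiff_names "$2" "$strip") | sed 's/^/+ /'
}
```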


filetree

Print a simple tree-like view of a directory.

Usage:    filetree <directory>
Example:  filetree .
          filetree src
filetree tmp1
tmp1
|____file1
|____file2

glimpse

Preview the structure of a delimited text file via dplyr::glimpse().
gz files are supported.

This function is only relevant if you work in R, where dplyr::glimpse() is a very natural way of inspecting rectangular data with headers. The Bash function glimpse calls data.table::fread() on the file and passes the result to dplyr::glimpse() for display.

Usage:    glimpse [-h] [-n <rows>] <file>
Options:
          -h    Pass this flag for headerless files
          -n    Number of rows to parse through to get
                to the data (should be > comments + any header)
Example:  glimpse data.txt
          glimpse -h data_no_header.csv
          glimpse -n 5 data_with_comments.tsv.gz
glimpse gencc_2025-11-06.tsv
Rows: 1
Columns: 15
$ uuid                                <chr> "GENCC_000101-HGNC_10896-OMIM_182212-HP_0000006-GENCC_100001"
$ gene_curie                          <chr> "HGNC:10896"
$ gene_symbol                         <chr> "SKI"
$ disease_title                       <chr> "Shprintzen-Goldberg syndrome"
$ disease_original_curie              <chr> "OMIM:182212"
$ classification_title                <chr> "Definitive"
$ moi_curie                           <chr> "HP:0000006"
$ moi_title                           <chr> "Autosomal dominant"
$ submitted_as_date                   <dttm> 2018-03-30 13:31:56
$ submitted_as_public_report_url      <lgl> NA
$ submitted_as_notes                  <lgl> NA
$ submitted_as_pmids                  <lgl> NA
$ submitted_as_assertion_criteria_url <chr> "PMID:\302\24028106320"
$ submitted_as_submission_id          <int> 1034
$ submitted_run_date                  <IDate> 2020-12-24

histgrep

Search shell history with optional negative filtering.

Usage:    histgrep <pattern> [OPTIONS]
Options:
          -v, --invert-match   Invert match
          --not <pattern>      Exclude matches
Example:  histgrep foldx
          histgrep slurm --not sbatch

For timestamps to appear in the output, add this line to ~/.bashrc:

export HISTTIMEFORMAT="%Y-%m-%d %T "

histgrep singularity --not module
2025-12-18 12:57:10 singularity --version
2025-12-18 12:57:19 which singularity
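
The include/exclude filtering amounts to chaining grep with grep -v. The sketch below is written as a filter on standard input so that it can be tried on any text; the name histgrep_sketch is hypothetical, only the --not option is handled, and feeding it from the history builtin is an assumption about how the real function obtains its input.

```shell
# Hypothetical sketch: keep lines matching a pattern, optionally dropping a second pattern.
# Reads from standard input, e.g.: history | histgrep_sketch slurm --not sbatch
histgrep_sketch() {
    local pattern="$1" exclude=""
    shift
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --not)
                exclude="${2:-}"
                shift
                if [ "$#" -gt 0 ]; then shift; fi
                ;;
            *)
                shift
                ;;
        esac
    done
    if [ -n "$exclude" ]; then
        grep -e "$pattern" | grep -v -e "$exclude"
    else
        grep -e "$pattern"
    fi
}
```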

lines

Print ranges or lists of lines from a file.

Usage:    lines [-l] <range|list> <file>
Options:
          -l    Show line numbers
Examples: lines 5-10 file.txt
          lines 1:3 file.txt
          lines -l 2,4,7,9,32 file.txt
lines -l 3-5 ~/.bash_profile
     3  if [ -f ~/.bashrc ]; then
     4          . ~/.bashrc
     5  fi
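
Both the range and the list form translate directly into sed addresses. The following is a minimal sketch assuming GNU sed; lines_sketch is a hypothetical name, and the specification is passed to sed without validation, so it should only be given plain numbers, ranges, or comma-separated lists.

```shell
# Hypothetical sketch: print a range ("5-10" or "5:10") or a list ("2,4,7") of lines.
lines_sketch() {
    local number=0 spec file script
    if [ "$1" = "-l" ]; then number=1; shift; fi
    spec="$1"
    file="$2"
    case "$spec" in
        # A range becomes a single sed address pair, e.g. "5,10p".
        *-*|*:*) script="$(printf '%s' "$spec" | sed 's/[-:]/,/')p" ;;
        # A list becomes one print command per line, e.g. "2p;4p;7p".
        *)       script="$(printf '%s' "$spec" | sed 's/,/p;/g')p" ;;
    esac
    if [ "$number" -eq 1 ]; then
        # cat -n prefixes each line with its number before sed selects lines.
        cat -n -- "$file" | sed -n "$script"
    else
        sed -n "$script" -- "$file"
    fi
}
```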

math

Perform basic arithmetic in one line.

Usage:     math <expression>
Example:   math 3+2+1^4.5
           math "(3+2-1)^4.5"  # use quotes when expression has brackets
Operators: +  -  * or x  / or :  ^ or **

This is a primitive wrapper around bc and awk.
I am very used to using R as a calculator and often miss that functionality in the shell.

math 3+2+2^5+1^42*12/3.14
40.8217
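
A one-line calculator of this kind can be approximated with awk alone, which accepts fractional exponents. Note that this sketch swaps in an awk-only approach and is not the bc/awk wrapper described above; math_sketch is a hypothetical name, and the expression is passed to awk unvalidated, so it should only be used with trusted input.

```shell
# Hypothetical sketch: evaluate an arithmetic expression with awk.
math_sketch() {
    local expr
    # Normalise the alternative operators: ** -> ^, x -> *, : -> /
    expr=$(printf '%s' "$*" | sed -e 's/\*\*/^/g' -e 's/x/*/g' -e 's/:/\//g')
    # awk prints non-integer results with its default format of 6 significant digits.
    awk "BEGIN { print ($expr) }"
}
```

As with the original, quote expressions that contain brackets or `*` so the shell does not interpret them first.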

max_width

Report the maximum line width in a text file and where it occurs.

Usage:    max_width <file>
Example:  max_width query.fasta
max_width A0A024R1R8.fa
max width: 228 on line: 1297
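
The same report can be produced with a single awk pass. This is a sketch under the assumption that the first longest line should be reported; max_width_sketch is a hypothetical name, and whether length counts bytes or characters depends on the awk implementation and locale.

```shell
# Hypothetical sketch: find the longest line in a file and its line number.
max_width_sketch() {
    awk '{ if (length($0) > max) { max = length($0); line = NR } }
         END { printf "max width: %d on line: %d\n", max, line }' "$1"
}
```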

note

Create a note file by typing its contents verbatim into the terminal.

Usage: note README.txt
Exit:  Ctrl-D

I find this useful when I am working in a directory and want to quickly jot down notes.
Not substantially faster than using an editor, but occasionally convenient (and fun).


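rename_pattern

Rename files by replacing a pattern in their names, or by removing it when no replacement is given.

Usage:    rename_pattern <pattern> [replacement]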
Example:
          rename_pattern ".clustal" ".fa"  # replace .clustal with .fa
          rename_pattern _draft            # remove _draft from file names
rename_pattern ".clustal" ".fa" 
[##.......................................] 230/20339

rm_but

Remove all files in the current directory except those matching one or more patterns.

Usage:    rm_but [-d] <pattern1> [pattern2 ...]
Options:
          -d   Include directories
Example:  rm_but *.txt
          rm_but -d README 

E.g., with file.aln, file.fa, file.muscle in a directory:

rm_but ".aln" 
Keeping: file.aln
removed 'file.fa'
removed 'file.muscle'

rm_top

Remove top N lines of a text file.

Usage:    rm_top <n> <file>
Example:  rm_top 1 file.txt > file_no_header.txt

Really only useful for removing headers...
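
Dropping the first N lines is what tail -n +K does, where K is the first line to keep. A minimal sketch, with rm_top_sketch as a hypothetical name; like the example above, it writes to standard output and leaves the input file untouched.

```shell
# Hypothetical sketch: print a file without its first n lines.
rm_top_sketch() {
    local n="$1" file="$2"
    # tail -n +K starts output at line K, so skipping n lines means K = n + 1.
    tail -n "+$(( n + 1 ))" -- "$file"
}
```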


sizesum

Calculate the total size of regular files in a directory (excluding subdirectories).

Usage:    sizesum <path>
Example:  sizesum .

It is useful for tracking the size of result files being written to a directory that also contains subdirectories you do not want counted.

sizesum .
52.27 GB
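
Restricting the sum to regular files at the top level is a job for find with -maxdepth 1 -type f. The sketch below assumes GNU find (for -printf) and 1024-based units; sizesum_sketch is a hypothetical name and the real function may round or label units differently.

```shell
# Hypothetical sketch: total size of regular files directly inside a directory.
sizesum_sketch() {
    find "${1:-.}" -maxdepth 1 -type f -printf '%s\n' |
        awk '{ total += $1 }
             END {
                 split("B KB MB GB TB PB", unit, " ")
                 i = 1
                 # Divide by 1024 until the value fits the current unit.
                 while (total >= 1024 && i < 6) { total /= 1024; i++ }
                 printf "%.2f %s\n", total, unit[i]
             }'
}
```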

treesize

Display directory sizes (human-readable) for specified paths.

Usage:    treesize [-d <max_depth>] <path> [<path> ...]

Options:  -d <max_depth>    Limit depth of recursion (default 1)
Example:  treesize -d 2 .
treesize -d 2 .
  13.5 GB    .
   6.0 GB    ./a_subset
   4.0 GB    ./b_subset
 860.3 MB    ./c_subset
  91.7 MB    ./d_subset
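
A depth-limited size listing, largest first, can be had from GNU du and sort. In this sketch treesize_sketch is a hypothetical name, and the output keeps du's own format (e.g. 14G) rather than the formatting shown above.

```shell
# Hypothetical sketch: directory sizes down to a given depth, largest first.
treesize_sketch() {
    local depth=1
    if [ "$1" = "-d" ]; then
        depth="$2"
        shift 2
    fi
    # sort -h understands human-readable suffixes such as K, M and G.
    du -h --max-depth="$depth" -- "$@" | sort -rh
}
```

On shared filesystems du has to stat every file under the given paths, so it is worth running on the narrowest path that answers the question.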

update_timestamps

Update timestamps for all files in a directory and its subdirectories.

Usage:    update_timestamps <directory>
Example:  update_timestamps /tmp/scratch/

I found this useful when working in a scratch directory that gets wiped every few weeks and an immediate backup is not possible. Running this command gives all the files a fresh timestamp to buy you time. (Note that scratch spaces with such rules exist for a reason, so this is only for emergency scenarios.)

update_timestamps /tmp/scratch/
Timestamps updated for files in /tmp/scratch/ and its subdirectories.
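
Refreshing timestamps recursively is essentially find combined with touch. A minimal sketch, with update_timestamps_sketch as a hypothetical name; it touches regular files only, which is an assumption about the original's behaviour.

```shell
# Hypothetical sketch: set the timestamps of all regular files under a directory to now.
update_timestamps_sketch() {
    local dir="$1"
    # "-exec ... {} +" passes many files to each touch call instead of one call per file.
    find "$dir" -type f -exec touch -- {} + &&
        printf 'Timestamps updated for files in %s and its subdirectories.\n' "$dir"
}
```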
