List of helpful Linux commands to process FASTQ files from NGS experiments

Here I’ll summarize some Linux commands that can help us work with millions of DNA sequences from Next Generation Sequencing (NGS).

A file storing biological sequences with the extension ‘.fastq’ or ‘.fq’ is in FASTQ format; if it is also compressed with GZIP, the suffix will be ‘.fastq.gz’ or ‘.fq.gz’. A FASTQ file usually contains millions of sequences and takes up dozens of gigabytes on disk. When these files are compressed with GZIP, their size is reduced more than 10-fold (the ZIP format is less efficient).

In the next lines I’ll show you some commands to deal with compressed FASTQ files; with minor changes they can also be used with uncompressed files and with FASTA format files.

To start, let’s compress a FASTQ file in GZIP format:

> gzip reads.fq

The resulting file will be named ‘reads.fq.gz’ by default.
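Note that ‘gzip reads.fq’ replaces the original file with the compressed one. If we prefer to keep the uncompressed file as well, a minimal sketch could look like the one below. The toy two-read file and the variable names (‘workdir’, ‘gzip_ok’, ‘roundtrip’) are made up for illustration only.

```shell
# Toy two-read FASTQ file (hypothetical content, for illustration only)
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$workdir/reads.fq"

# -c writes the compressed stream to standard output,
# so the original reads.fq is left untouched
gzip -c "$workdir/reads.fq" > "$workdir/reads.fq.gz"

# -t tests the integrity of the compressed file without extracting it
gzip -t "$workdir/reads.fq.gz" && gzip_ok=yes

# Decompress to standard output and compare with the original file
if gzip -dc "$workdir/reads.fq.gz" | cmp -s - "$workdir/reads.fq"; then
    roundtrip=identical
fi

echo "integrity check: $gzip_ok, round trip: $roundtrip"
```

Checking the integrity right after compressing is cheap compared with discovering a truncated file in the middle of an analysis.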

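If we want to check the contents of the file we can use the command ‘less’ or ‘zless’ (‘less’ only shows the decompressed contents on systems where an input preprocessor such as lesspipe is configured; ‘zless’ decompresses the file itself):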

> less reads.fq.gz
> zless reads.fq.gz
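When we only need a quick look at the first records rather than an interactive pager, something like the following sketch may be enough. It assumes the usual layout of 4 lines per sequence; the three-read toy file and the names ‘workdir’ and ‘first_two’ are hypothetical.

```shell
# Toy three-read compressed FASTQ file (hypothetical content)
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n@read3\nGGCC\n+\nIIII\n' \
    | gzip > "$workdir/reads.fq.gz"

# gzip -dc decompresses to standard output (same idea as zcat);
# head keeps the first 8 lines, i.e. the first 2 records of 4 lines each
first_two=$(gzip -dc "$workdir/reads.fq.gz" | head -n 8)

echo "$first_two"
```

Because ‘head’ stops reading after 8 lines, this stays fast even when the compressed file holds millions of sequences.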

And to count the number of sequences stored in the file, we can count the number of lines and divide by 4, because each sequence takes up 4 lines:

> zcat reads.fq.gz | echo $((`wc -l`/4))
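The same count can also be written with ‘awk’, which makes it easy to check that the number of lines is really a multiple of 4. This is only a sketch under the assumption that every record spans exactly 4 lines (sequences not wrapped over several lines); the toy file and the names ‘n_reads’ and ‘leftover’ are invented for the example.

```shell
# Toy three-read compressed FASTQ file (hypothetical content)
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n@read3\nGGCC\n+\nIIII\n' \
    | gzip > "$workdir/reads.fq.gz"

# NR is the total number of lines once awk reaches the END block
n_reads=$(gzip -dc "$workdir/reads.fq.gz" | awk 'END { print int(NR / 4) }')

# A non-zero remainder hints at a truncated file or wrapped sequences
leftover=$(gzip -dc "$workdir/reads.fq.gz" | awk 'END { print NR % 4 }')

echo "sequences: $n_reads (leftover lines: $leftover)"
```

Counting lines is safer here than counting lines that start with ‘@’, because ‘@’ is also a valid quality character and can appear at the start of a quality line.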

If the file is in FASTA format, we will count the number of sequences like this:

> grep -c "^>" reads.fa
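If the FASTA file is compressed too, the same ‘grep’ can read the decompressed stream from a pipe. A small sketch, using a made-up two-sequence file and the hypothetical names ‘workdir’ and ‘n_seqs’:

```shell
# Toy compressed FASTA file with two sequences (hypothetical content);
# the first sequence is wrapped over two lines on purpose
workdir=$(mktemp -d)
printf '>seq1\nACGT\nACGT\n>seq2\nTTGA\n' | gzip > "$workdir/reads.fa.gz"

# Count header lines (those starting with '>') in the decompressed stream
n_seqs=$(gzip -dc "$workdir/reads.fa.gz" | grep -c '^>')

echo "$n_seqs"
```

Counting headers instead of lines matters for FASTA, where a single sequence is often split over several lines.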

A list of R commands helpful for science and research


Here is a compilation of R commands that most scientists may find useful for analyzing research data.


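  • Check working directory:
getwd()
  • Change working directory:
setwd("path/to/directory") # placeholder path
  • Show help about a command (quotes are important):
help("command") # placeholder command name
  • Show the objects and variables of the working space:
ls()
  • Remove an object from the working space:
rm(object) # Removes the object named 'object' (placeholder name)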
rm(list=ls()) # Removes all the objects
  • Save the working space:
save.image() # in the file .RData at the working directory
save(object1, object2, file="myfile.RData") # saves the listed objects into the chosen file
  • Reload a working space:
load("myfile.RData") # restores the objects saved in the chosen file
  • Quit R:
q()

