sam(5) Bioinformatics formats sam(5)


sam - Sequence Alignment/Map file format


Sequence Alignment/Map (SAM) format is TAB-delimited. Apart from the header lines, which start with the `@' symbol, each alignment line consists of the following fields:

1  QNAME Query template/pair NAME
2  FLAG bitwise FLAG
3  RNAME Reference sequence NAME
4  POS 1-based leftmost POSition/coordinate of clipped sequence
5  MAPQ MAPping Quality (Phred-scaled)
6  CIGAR extended CIGAR string
7  MRNM Mate Reference sequence NaMe (`=' if same as RNAME)
8  MPOS 1-based Mate POSition
9  TLEN inferred Template LENgth (insert size)
10  SEQ query SEQuence on the same strand as the reference
11  QUAL query QUALity (ASCII-33 gives the Phred base quality)
12+  OPT variable OPTional fields in the format TAG:VTYPE:VALUE
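
The field layout above can be illustrated with a short Python sketch that splits one alignment line on TABs. The names SamRecord, parse_alignment_line and phred_qualities are hypothetical and are not part of htslib; the sketch assumes a well-formed line with all eleven mandatory fields and a QUAL string that is actually present.

```python
from typing import Dict, List, NamedTuple, Tuple


class SamRecord(NamedTuple):
    """One alignment line; attribute names follow the column names above."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    mrnm: str
    mpos: int
    tlen: int
    seq: str
    qual: str
    opt: Dict[str, Tuple[str, str]]  # TAG -> (VTYPE, VALUE), values kept as text


def parse_alignment_line(line: str) -> SamRecord:
    """Split a TAB-delimited alignment line into its fields (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("expected at least 11 TAB-delimited fields")
    opt = {}
    for field in cols[11:]:
        # Split on the first two colons only, since VALUE may itself contain ':'.
        tag, vtype, value = field.split(":", 2)
        opt[tag] = (vtype, value)
    return SamRecord(
        cols[0], int(cols[1]), cols[2], int(cols[3]), int(cols[4]), cols[5],
        cols[6], int(cols[7]), int(cols[8]), cols[9], cols[10], opt,
    )


def phred_qualities(qual: str) -> List[int]:
    """Convert a QUAL string to Phred scores: ASCII code minus 33 per base."""
    return [ord(ch) - 33 for ch in qual]
```

For example, a line whose QUAL field is `II' yields the Phred scores 40 and 40, because the ASCII code of `I' is 73.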

Each bit in the FLAG field is defined as:

0x0001 p the read is paired in sequencing
0x0002 P the read is mapped in a proper pair
0x0004 u the query sequence itself is unmapped
0x0008 U the mate is unmapped
0x0010 r strand of the query (1 for reverse)
0x0020 R strand of the mate
0x0040 1 the read is the first read in a pair
0x0080 2 the read is the second read in a pair
0x0100 s the alignment is not primary
0x0200 f the read fails platform/vendor quality checks
0x0400 d the read is either a PCR or an optical duplicate
0x0800 S the alignment is supplementary

where the second column gives the string representation of the FLAG field.
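
As a rough illustration of the table above, the following Python sketch tests each bit of a FLAG value and builds its string representation. The names FLAG_BITS and flag_to_string are hypothetical, and concatenating the characters in ascending bit order is an assumption made for this example rather than something stated here.

```python
# (bit, character) pairs copied from the FLAG table above.
FLAG_BITS = [
    (0x0001, "p"),  # paired in sequencing
    (0x0002, "P"),  # mapped in a proper pair
    (0x0004, "u"),  # query itself is unmapped
    (0x0008, "U"),  # mate is unmapped
    (0x0010, "r"),  # query on the reverse strand
    (0x0020, "R"),  # strand of the mate
    (0x0040, "1"),  # first read in a pair
    (0x0080, "2"),  # second read in a pair
    (0x0100, "s"),  # alignment is not primary
    (0x0200, "f"),  # fails platform/vendor quality checks
    (0x0400, "d"),  # PCR or optical duplicate
    (0x0800, "S"),  # supplementary alignment
]


def flag_to_string(flag: int) -> str:
    """Return the characters of all set bits, lowest bit first (assumed order)."""
    return "".join(ch for bit, ch in FLAG_BITS if flag & bit)


def has_flag(flag: int, bit: int) -> bool:
    """True if the given bit is set in FLAG."""
    return (flag & bit) != 0
```

With this sketch, a FLAG of 99 (0x0063) sets bits 0x0001, 0x0002, 0x0020 and 0x0040, so flag_to_string(99) gives `pPR1'.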

The full SAM/BAM file format specification
August 2013 htslib