Mapping Quality
Mapping Quality in Genetics
Mapping quality (often abbreviated MAPQ) is a measure used in genomics to quantify the confidence that a DNA sequence read has been aligned to its correct position in a reference genome. It encodes an estimate of the probability that the reported alignment position is wrong: the higher the mapping quality score, the lower that probability.
What is Mapping Quality?
When aligning millions of short DNA sequence reads to a reference genome, there can be ambiguity in where certain reads should map, especially in repetitive regions of the genome. Mapping quality helps differentiate between reads that map uniquely to one location versus those that could potentially map equally well to multiple locations.
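The mapping quality score is typically represented on a Phred scale, which is logarithmic: MAPQ = -10 × log10(p), where p is the estimated probability that the read is mapped incorrectly. Every 10 points therefore correspond to a tenfold drop in that probability. A mapping quality of 30 means there is a 1 in 1000 chance that the read is mapped incorrectly, while a score of 60 means there is only a 1 in 1,000,000 chance of an incorrect mapping.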
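The Phred relationship described above can be sketched in a few lines of Python. This is a minimal illustration rather than any aligner's actual implementation, and the function names mapq_to_error_prob and error_prob_to_mapq are made up for this example.

```python
import math


def mapq_to_error_prob(mapq: float) -> float:
    """Return the estimated probability that a read is mapped incorrectly,
    given its Phred-scaled mapping quality."""
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p_wrong: float) -> float:
    """Return the Phred-scaled mapping quality for a given probability
    of incorrect mapping (p_wrong must be greater than 0)."""
    return -10 * math.log10(p_wrong)


# Print the error probability for a few common scores.
for score in (10, 20, 30, 40, 60):
    print(score, mapq_to_error_prob(score))
```

Note that aligners report mapping quality as an integer, so a score computed from a probability would normally be rounded and capped before being written out.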
Examples
- Mapping quality 0: The mapping is completely ambiguous or of very low quality. Such reads are often filtered out in downstream analysis.
- Mapping quality 10: There is a 1 in 10 chance (10%) that the read is mapped incorrectly. This is a relatively low mapping quality, indicating ambiguity in the mapping position.
- Mapping quality 20: There is a 1 in 100 chance (1%) that the read is mapped incorrectly. This is a moderate mapping quality.
- Mapping quality 30: There is a 1 in 1000 chance (0.1%) that the read is mapped incorrectly. This is considered a high mapping quality score, with low probability of incorrect mapping.
- Mapping quality 40: There is a 1 in 10,000 chance (0.01%) that the read is mapped incorrectly. This is an extremely high mapping quality score, indicating the read is most likely mapped accurately to a unique genomic location.
- Mapping quality 50 or higher: These scores sit at the top of the range that aligners typically report. A score of 50 indicates there is a 1 in 100,000 chance (0.001%) that the read is mapped incorrectly, while a score of 60 means a 1 in 1,000,000 chance (0.0001%) of incorrect mapping.
For variant calling, reads with low mapping quality (e.g. below 20) are often discarded, because reads placed at the wrong position are more likely to produce false positive variant calls.
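As a rough sketch of such a filter, the snippet below keeps only alignment records whose mapping quality meets a threshold. It assumes the reads are stored as SAM text, where fields are tab-separated, header lines start with "@", and the mapping quality is the fifth column. The function name filter_by_mapq, the default threshold of 20, and the example records are illustrative only.

```python
def filter_by_mapq(sam_lines, min_mapq=20):
    """Yield SAM header lines unchanged, plus alignment records whose
    mapping quality (5th tab-separated field) is at least min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no mapping quality; pass them through.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        mapq = int(fields[4])
        if mapq >= min_mapq:
            yield line


# Made-up records for illustration: read2 has mapping quality 0.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t0\tchr1\t200\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read3\t16\tchr1\t300\t20\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
kept = list(filter_by_mapq(example))
for line in kept:
    print(line.split("\t")[0])
```

In practice this step is usually done with an existing tool rather than hand-written parsing; for example, samtools view accepts a -q option that skips alignments with a mapping quality below the given value.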
The maximum reported mapping quality varies between alignment tools. For example, Bowtie 2 caps mapping quality at 42, the original BWA backtrack algorithm (bwa aln) caps it at 37, and BWA-MEM caps it at 60, even though the Phred scale theoretically allows higher values.
Mapping quality scores from different aligners may not be directly comparable, because each tool calculates them differently.
Additionally, a very high mapping quality does not guarantee that the read is mapped correctly, as the scores are only estimates of mapping accuracy.