Genome Project Standards (Quality Score)

Scoring scheme implemented by Lukjancenko et al. (2010), based on Chain et al. (2009).

Standard Draft (score 6)

- Minimally filtered or unfiltered data, from any number of different sequencing platforms, that are assembled into contigs
- May harbor many regions of poor quality and can be relatively incomplete
- It may not always be possible to remove contaminating sequence data

High-Quality Draft (score 5)

- Overall coverage representing at least 90% of the genome or target region
- Efforts should be made to exclude contaminating sequences
- This is still a draft assembly with little or no manual review of the product
- Sequence errors and misassemblies are possible, with no implied order and orientation to contigs
- This is appropriate for general assessment of gene content

Improved High-Quality Draft (score 4)

- Additional work has been performed beyond the initial shotgun sequencing and High-Quality Draft assembly, using either manual or automated methods
- This should contain no discernible misassemblies and should have undergone some form of gap resolution to reduce the number of contigs and supercontigs (or scaffolds)
- Undetectable misassemblies are still possible, particularly in repetitive regions
- Low-quality regions and potential base errors may also be present
- This standard is normally adequate for comparison with other genomes

Annotation-Directed Improvement (score 3)

- Emphasizes the verification and correction of anomalies within coding regions, such as frameshifts and stop codons
- May overlap with the previous standard
- Repeat regions at this level are not resolved, so errors in those regions are much more likely
- This standard is useful for gene comparisons, alternative splicing analysis, and pathway reconstruction

Noncontiguous Finished (score 2)

- Describes high-quality assemblies that have been subject to automated and manual improvement
- Closure approaches have been successful for almost all gaps, misassemblies, and low-quality regions
- Attempts have been made to resolve all gaps and sequence uncertainties, and only those recalcitrant to resolution remain
- This product is thus of "Finished" quality with the only exception being repetitive or intractable gaps
- It is appropriate for most analyses

Finished (score 1)

- Refers to the current gold standard
- Genome sequences with fewer than 1 error per 100,000 base pairs, where each replicon is assembled into a single contiguous sequence, with a minimal number of possible exceptions commented in the submission record
- All sequences are complete and have been reviewed and edited
- All known misassemblies have been resolved, and repetitive sequences have been ordered and correctly assembled
- The finished product is appropriate for all types of detailed analyses and acts as a high-quality reference genome for comparative purposes
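
The six categories and their scores can be encoded for use in scripts. The following is a minimal sketch in Python, not the implementation used by Lukjancenko et al.; the names `GenomeStandard`, `meets_finished_error_rate`, and `at_least` are hypothetical and introduced only for illustration. It records the score of each standard (a lower score means a more complete genome) and checks the "fewer than 1 error per 100,000 base pairs" threshold given for the Finished category.

```python
from enum import IntEnum


class GenomeStandard(IntEnum):
    """Quality scores as listed above (hypothetical encoding; lower = more finished)."""
    FINISHED = 1
    NONCONTIGUOUS_FINISHED = 2
    ANNOTATION_DIRECTED_IMPROVEMENT = 3
    IMPROVED_HIGH_QUALITY_DRAFT = 4
    HIGH_QUALITY_DRAFT = 5
    STANDARD_DRAFT = 6


def meets_finished_error_rate(errors: int, genome_length_bp: int) -> bool:
    """Return True if the error rate is below 1 error per 100,000 bp.

    Uses integer arithmetic to avoid floating-point rounding at the boundary.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    return errors * 100_000 < genome_length_bp


def at_least(standard: GenomeStandard, required: GenomeStandard) -> bool:
    """Return True if `standard` is at least as complete as `required`."""
    return standard <= required


if __name__ == "__main__":
    # Illustrative values only: a 4.6 Mbp genome with 40 known errors.
    print(meets_finished_error_rate(40, 4_600_000))
    print(at_least(GenomeStandard.IMPROVED_HIGH_QUALITY_DRAFT,
                   GenomeStandard.HIGH_QUALITY_DRAFT))
```

Because lower scores denote higher quality, filtering a genome collection for "High-Quality Draft or better" amounts to keeping entries with a score of 5 or less.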


1. Chain, P.S.G. et al. Genome Project Standards in a New Era of Sequencing. Science, v. 326, p. 236-237, 2009.
2. Lukjancenko, O.; Wassenaar, T.M.; Ussery, D.W. Comparison of 61 Sequenced Escherichia coli Genomes. Microb. Ecol., v. 60, n. 4, p. 708-720, 2010.