"Clint" - Pan troglodytes
Photo: Yerkes National Primate Research Center, Emory University

The March 2006 chimpanzee (Pan troglodytes) browser displays data from the 6X whole genome shotgun draft assembly (Build 2 Version 1, Oct. 2005) produced by the Chimpanzee Sequencing and Analysis Consortium. This assembly contains sequence from the initial 4X chimpanzee assembly described and analyzed in Nature (The Chimpanzee Sequencing and Analysis Consortium, 2005), with additional 2X sequence generated, assembled, and assigned to chromosomes by the Genome Sequencing Center of Washington University School of Medicine, St. Louis, MO, USA. For more information about this assembly, see Pan_troglodytes-2.1 in the NCBI Assembly database.

This assembly uses a new chromosomal numbering scheme that reflects orthology between the human and chimpanzee chromosomes. For details, see the Assembly details section below and the Genome Browser FAQ. To read more about the chimpanzee assembly, see the Washington University in St. Louis School of Medicine Pan troglodytes web page and the National Institutes of Health NIH News summary of the chimpanzee analysis paper.

The chimpanzee is among the species most closely related to humans. It is also endangered and is the focus of multiple conservation efforts.

Sample position queries

A genome position can be specified by the HUGO Gene Nomenclature Committee gene name of a human RefSeq, the accession number of an EST or mRNA, a chromosomal coordinate range, or keywords from the GenBank description of an mRNA. The following list shows examples of valid position queries for the chimpanzee genome. See the User's Guide for more information.

Request:		Genome Browser Response:

chr22		Displays all of chromosome 22
chr2a:11,250,001-12,250,000		Displays a million bases of chromosome 2a, beginning at base 11,250,001. Note that chromosome 2 in this assembly has been split into two parts: 2a and 2b.
chr2a:11,250,001+2000		Displays a region of chr 2a that spans 2000 bases, starting at position 11,250,001

BRCA1		Displays a list of genomic regions where human RefSeq gene BRCA1 (or features associated with BRCA1) aligns
AF115459		Displays region of genome with mRNA with GenBank accession number AF115459
348		Displays the region of genome with Entrez Gene identifier 348

pseudogene mRNA		Lists transcribed pseudogenes, but not cDNAs
sialic acid		Lists mRNAs and RefSeqs with GenBank keywords sialic acid
huntington		Lists mRNAs associated with Huntington's disease
Paabo,S.		Lists mRNAs deposited by co-author S. Paabo

Use this last format for author queries. Although GenBank requires the search format Paabo S, internally it uses the format Paabo,S. (including the trailing period).
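The two coordinate formats in the list above (an explicit range, and a start position plus a span) can be sketched in code. The following is a minimal illustration only, not the Genome Browser's own parser: the names `Region` and `parse_position` are hypothetical, and the sketch assumes 1-based inclusive coordinates and chromosome names that begin with `chr`. Gene names, accessions, and keyword queries are outside its scope and return `None`.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """Hypothetical container for a parsed coordinate query."""
    chrom: str
    start: Optional[int]  # 1-based, inclusive; None means the whole chromosome
    end: Optional[int]    # 1-based, inclusive; None means the whole chromosome


# chrom, optionally followed by ":start-end" or ":start+span".
_POSITION = re.compile(r"^(chr\w+)(?::(\d[\d,]*)([-+])(\d[\d,]*))?$")


def parse_position(query: str) -> Optional[Region]:
    """Parse 'chr22', 'chr2a:11,250,001-12,250,000', or 'chr2a:11,250,001+2000'.

    Returns None for anything that is not a coordinate query.
    """
    match = _POSITION.match(query.strip())
    if match is None:
        return None
    chrom, first, separator, second = match.groups()
    if first is None:
        return Region(chrom, None, None)
    start = int(first.replace(",", ""))
    value = int(second.replace(",", ""))
    # "-" gives an explicit end; "+" gives the number of bases to span.
    end = value if separator == "-" else start + value - 1
    if start < 1 or end < start:
        return None
    return Region(chrom, start, end)
```

Under these assumptions, `parse_position("chr2a:11,250,001+2000")` yields a region ending at base 11,252,000, which spans 2000 bases as described in the list.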

Assembly details

This assembly covers about 97 percent of the genome and is based on 6X sequence coverage. It is composed of 265,882 contigs with an N50 length of 29 kb and 44,460 supercontigs with an N50 length of 9.7 Mb. The total contig length, not including estimated gap sizes, is 2.97 Gb. Of that total, 2.82 Gb of sequence have been ordered and oriented along specific chimpanzee chromosomes, 107 Mb have been placed in chr*_random, and 50 Mb remain in chrUn.
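For readers unfamiliar with the N50 statistic quoted above, the sketch below shows one common definition: the length L such that sequences of length L or longer account for at least half of the total assembled length. The function name `n50` and the toy lengths are illustrative assumptions; the assembly's own statistics were not necessarily computed with this exact convention.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or supercontig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one length")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


# Toy example with made-up lengths (not taken from the assembly).
toy_lengths = [10, 9, 8, 7, 6, 5, 4, 3, 2]
toy_n50 = n50(toy_lengths)
```

In the toy example the total is 54, and the three longest sequences (10 + 9 + 8 = 27) reach half of it, so the N50 is 8.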

The whole genome shotgun data were derived primarily from the donor Clint, a captive-born male chimpanzee from the Yerkes Primate Research Center in Atlanta, GA, USA. The reads were assembled with the whole-genome assembly program PCAP (Huang, 2006), using stringent parameters derived by eliminating detectable global misassemblies larger than 50 kb, that is, interchromosomal cross-overs determined by alignment of the chimpanzee genome against the human genome.

The assembly data were aligned against the human genome at UCSC utilizing BLASTZ (Schwartz, 2003) to align and score non-repetitive chimpanzee regions against repeat-masked human sequence. The alignment chains differentiated between orthologous and paralogous alignments (Kent, 2003); only "reciprocal best" alignments were retained in the alignment set. The chimpanzee AGP files were generated from these alignments in a manner similar to that described in The Chimpanzee Sequencing and Analysis Consortium (2005). Centromeres were introduced into the chimp sequence at the positions of the centromeres in the human chromosomes. Ten documented/known human inversions supported by the assembly were introduced into the ordering, as was the separation of alignments to human chromosome 2 into chimpanzee chromosomes 2a and 2b. The regions in the WGS assembly corresponding to the finished sequences for chromosomes 21 and Y and a 5-Mb finished region from chimpanzee chromosome 7 were replaced with the corresponding finished AGPs/sequences. See the Credits page for acknowledgements for these chromosomal regions.
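The "reciprocal best" idea can be illustrated with a deliberately simplified sketch. It treats each alignment as a scored pair of region labels and keeps a pair only when it is the top-scoring alignment for both its chimpanzee region and its human region. The `Alignment` record, the region labels, and the scores are all hypothetical; the actual pipeline operates on alignment chains rather than on simple pairs like these.

```python
from typing import Dict, Iterable, List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical scored alignment between two labeled regions."""
    chimp_region: str
    human_region: str
    score: int


def reciprocal_best(alignments: Iterable[Alignment]) -> List[Alignment]:
    """Keep alignments that are the best hit in both directions.

    Ties are broken in favor of the alignment seen first.
    """
    records = list(alignments)
    best_for_chimp: Dict[str, Alignment] = {}
    best_for_human: Dict[str, Alignment] = {}
    for aln in records:
        current = best_for_chimp.get(aln.chimp_region)
        if current is None or aln.score > current.score:
            best_for_chimp[aln.chimp_region] = aln
        current = best_for_human.get(aln.human_region)
        if current is None or aln.score > current.score:
            best_for_human[aln.human_region] = aln
    return [
        aln
        for aln in records
        if best_for_chimp[aln.chimp_region] is aln
        and best_for_human[aln.human_region] is aln
    ]


# Made-up example: c1/h1 and c3/h3 are mutual best hits; the others are not.
example = [
    Alignment("c1", "h1", 100),
    Alignment("c1", "h2", 80),
    Alignment("c2", "h1", 90),
    Alignment("c3", "h3", 50),
]
kept = reciprocal_best(example)
```

Here the second alignment is dropped because c1 aligns better to h1, and the third is dropped because h1 aligns better to c1, which is the kind of one-directional hit the filter is meant to exclude.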

A major difference between this assembly and the previous Nov. 2003 version is the chromosomal numbering scheme, which has been changed to reflect a new standard that preserves orthology with human chromosomes. Proposed by E.H. McConkey in 2004, the new numbering convention was subsequently endorsed by the International Chimpanzee Sequencing and Analysis Consortium. This standard assigns the identifiers "2a" and "2b" to the two chimp chromosomes that fused in the human genome to form chromosome 2. Note that the genome assembly shown in the Nov. 2003 panTro1 Genome Browser retains the older numbering scheme in which these chromosomes are numbered 12 and 13. To view a table showing the correspondence between human and chimp chromosomes, see the FAQ.

Bulk downloads of the sequence and annotation data are available via the Genome Browser FTP server or the Downloads page. The complete set of sequence reads is available at the NCBI trace archive. These data have specific conditions for use.

The chimpanzee browser annotation tracks were generated by UCSC and collaborators worldwide. See the Credits page for a detailed list of the organizations and individuals who contributed to this release.

References

Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005 Sep 1;437(7055):69-87. PMID: 16136131

Huang X, Yang SP, Chinwalla AT, Hillier LW, Minx P, Mardis ER, Wilson RK. Application of a superword array in genome assembly. Nucleic Acids Res. 2006;34(1):201-5. PMID: 16397298; PMC: PMC1325203

Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784

McConkey EH. Orthologous numbering of great ape and human chromosomes is essential for comparative genomics. Cytogenet Genome Res. 2004;105(1):157-8. PMID: 15218271

Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961

GenBank Pipeline Details

For the purposes of the GenBank alignment pipeline, this assembly is considered to be: well-ordered.
