This directory contains FASTA files containing a modified version of
the Build 36 finished human genome assembly (hg18, Mar. 2006).
The chromosomal sequences were assembled by the International Human
Genome Project sequencing centers.

The hg18/NCBI36 assembly was changed to use IUPAC ambiguous nucleotide
characters at each base covered by a stringently filtered subset of
single-base substitutions annotated by dbSNP build 130.  For example, if
the assembly has an 'A' at a position where dbSNP has annotated an A/C/T
substitution SNP, the 'A' is replaced by 'H' in the FASTA file here.

dbSNP single-base substitutions were excluded from masking in the
following cases:
- UCSC tagged the dbSNP item with any of these exceptions (see also
  hg18.snp130Exceptions and hg18.snp130ExceptionDesc database tables):
  - MultipleAlignments: dbSNP mapped item to multiple locations
  - ObservedMismatch: the reference allele does not appear in the
    item's observed alleles.
  - ObservedWrongFormat: the observed sequence has an unexpected format
    (no instances of this exception were found in snp130)
- dbSNP item class is not "single".
- dbSNP item length is not exactly one base.
- dbSNP item weight is greater than 1.  (lower weight = higher confidence)

The remaining single-base substitutions were used to mask the genomic
sequence.

Files included in this directory:

chr*.subst.fa.gz - FASTA files with IUPAC characters for substitution SNPs

md5sum.txt - checksums of files in this directory

------------------------------------------------------------------
If you plan to download a large file or multiple files from this
directory, we recommend that you use ftp rather than downloading the
files via our website. To do so, ftp to hgdownload.cse.ucsc.edu
[username: anonymous, password: your email address], then cd to the
directory goldenPath/hg18/snp130Mask. To download multiple files, use
the "mget" command:

    mget <filename1> <filename2> ...
    - or -
    mget -a (to download all the files in the directory)

Alternate methods to ftp access.

Using an rsync command to download the entire directory:
    rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/hg18/snp130Mask/ .
For a single file, e.g. chr1.subst.fa.gz
    rsync -avzP \
        rsync://hgdownload.cse.ucsc.edu/goldenPath/hg18/snp130Mask/chr1.subst.fa.gz .

Or with wget, all files:
    wget --timestamping \
        'ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/snp130Mask/*'
With wget, a single file:
    wget --timestamping \
        'ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/snp130Mask/chr1.subst.fa.gz' \
        -O chr1.subst.fa.gz

To uncompress the fa.gz files:
    gunzip <file>.fa.gz
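
The substitution rule described in this README (an 'A' covered by an
A/C/T SNP becomes 'H') follows the standard IUPAC nucleotide ambiguity
codes. The sketch below shows that lookup for a dbSNP-style observed-allele
string. The function name iupac_code is hypothetical and is not part of
any UCSC tool. It illustrates the code table only, not the pipeline that
produced these files.

```shell
#!/bin/sh
# Hypothetical helper: map an observed-allele string such as "A/C/T"
# to the IUPAC ambiguity code that covers exactly those bases.
iupac_code() {
    # Upper-case, split on "/", sort, de-duplicate, re-join: "T/a/C" -> "ACT"
    key=$(printf '%s\n' "$1" | tr 'a-z' 'A-Z' | tr '/' '\n' | sort -u | tr -d '\n')
    case "$key" in
        A|C|G|T) printf '%s\n' "$key" ;;   # single base: no ambiguity
        AG)   echo R ;;
        CT)   echo Y ;;
        CG)   echo S ;;
        AT)   echo W ;;
        GT)   echo K ;;
        AC)   echo M ;;
        CGT)  echo B ;;
        AGT)  echo D ;;
        ACT)  echo H ;;
        ACG)  echo V ;;
        ACGT) echo N ;;
        *)    echo "unrecognized allele set: $1" >&2; return 1 ;;
    esac
}

# The example from the text: an A/C/T substitution SNP is written as H.
iupac_code "A/C/T"
```

Sorting and de-duplicating the alleles first means the lookup does not
depend on the order or case in which the alleles are listed.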
Name                        Last modified      Size  Description
md5sum.txt                  2009-07-10 14:01   1.5K
chrY.subst.fa.gz            2009-07-10 11:11   7.9M
chrX.subst.fa.gz            2009-07-10 11:11    48M
chrM.subst.fa.gz            2009-07-10 11:11   6.0K
chr22_h2_hap1.subst.fa.gz   2009-07-10 11:10    21K
chr22.subst.fa.gz           2009-07-10 11:10    11M
chr21.subst.fa.gz           2009-07-10 11:10    11M
chr20.subst.fa.gz           2009-07-10 11:10    19M
chr19.subst.fa.gz           2009-07-10 11:09    17M
chr18.subst.fa.gz           2009-07-10 11:09    24M
chr17.subst.fa.gz           2009-07-10 11:09    24M
chr16.subst.fa.gz           2009-07-10 11:09    25M
chr15.subst.fa.gz           2009-07-10 11:09    26M
chr14.subst.fa.gz           2009-07-10 11:09    28M
chr13.subst.fa.gz           2009-07-10 11:09    31M
chr12.subst.fa.gz           2009-07-10 11:09    41M
chr11.subst.fa.gz           2009-07-10 11:09    42M
chr10.subst.fa.gz           2009-07-10 11:08    42M
chr9.subst.fa.gz            2009-07-10 11:11    38M
chr8.subst.fa.gz            2009-07-10 11:11    46M
chr7.subst.fa.gz            2009-07-10 11:11    49M
chr6_qbl_hap2.subst.fa.gz   2009-07-10 11:11   1.3M
chr6_cox_hap1.subst.fa.gz   2009-07-10 11:10   1.5M
chr6.subst.fa.gz            2009-07-10 11:10    54M
chr5_h2_hap1.subst.fa.gz    2009-07-10 11:10   550K
chr5.subst.fa.gz            2009-07-10 11:10    57M
chr4.subst.fa.gz            2009-07-10 11:10    60M
chr3.subst.fa.gz            2009-07-10 11:10    62M
chr2.subst.fa.gz            2009-07-10 11:09    76M
chr1.subst.fa.gz            2009-07-10 11:08    72M
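
Because md5sum.txt holds checksums for the files in this directory, a
download can be checked against it before uncompressing. The sketch
below assumes that md5sum.txt is in the standard output format of the
GNU md5sum utility (checksum, whitespace, file name) and that it sits in
the same local directory as the downloaded files. The function name
verify_downloads is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check every file listed in <dir>/md5sum.txt.
# Assumes md5sum.txt uses the standard GNU md5sum output format.
verify_downloads() {
    # Use a subshell so the caller's working directory is left unchanged.
    (
        cd "$1" || exit 1
        # Prints "<file>: OK" or "<file>: FAILED" per entry and returns
        # non-zero if any listed file is missing or does not match.
        md5sum -c md5sum.txt
    )
}

# Example use, after downloading the whole directory into ./snp130Mask:
#     verify_downloads ./snp130Mask
```

If only some of the files were downloaded, entries for the absent files
are reported as failures. Recent versions of GNU md5sum accept
--ignore-missing together with -c to skip them.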