This directory contains applications for stand-alone use,
built specifically for a Linux 64-bit machine.

For help on the bigBed and bigWig applications see:
http://genome.ucsc.edu/goldenPath/help/bigBed.html
http://genome.ucsc.edu/goldenPath/help/bigWig.html

View the file 'FOOTER.txt' to see the usage statement for
each of the applications.

##############################################################################
Thank you to Bob Harris for permission to distribute a binary
version of the lastz and lastz_D programs, from:

   https://github.com/lastz/lastz

Version 1.04.00 as of April 2018:

-rwxrwxr-x 1 625283 Apr  6 11:15 lastz-1.04.00
-rwxrwxr-x 1 628835 Apr  6 11:15 lastz_D-1.04.00

$ md5sum lastz*
429e61ffdf1612b7f0f0c8c2095609a7  lastz-1.04.00
4f9a558a65c3a07d0f992cd39b3a27e1  lastz_D-1.04.00

##############################################################################
This entire directory can be copied with the rsync command
into the local directory ./

rsync -aP rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/ ./

Or from our mirror site:

rsync -aP rsync://hgdownload-sd.soe.ucsc.edu/genome/admin/exe/linux.x86_64/ ./

Individual programs can be copied by adding their name, for example:

rsync -aP \
   rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/faSize ./
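
To copy several individual programs at once, the per-program rsync command above can be wrapped in a small shell function. The following is only a sketch: the names UCSC_RSYNC, fetch_tools and lastz.md5 are hypothetical, and the function prints the rsync commands for review rather than running them. The checksum file reuses the md5sum values quoted above.

```shell
# Hypothetical helper names; only the rsync URL and the md5 sums come from this README.
UCSC_RSYNC="rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64"

# Print one rsync command per requested program (first argument is the destination).
fetch_tools() {
    dest="$1"
    shift
    for tool in "$@"; do
        echo rsync -aP "$UCSC_RSYNC/$tool" "$dest/"
    done
}

# Show the commands for two of the programs listed in this directory.
fetch_tools ./bin faSize bedToBigBed

# Write the published lastz checksums in the format expected by 'md5sum -c'.
md5file="$(mktemp)"
cat > "$md5file" <<'EOF'
429e61ffdf1612b7f0f0c8c2095609a7  lastz-1.04.00
4f9a558a65c3a07d0f992cd39b3a27e1  lastz_D-1.04.00
EOF
echo "checksums written to $md5file"
```

To actually download, pipe the printed commands to sh (for example: fetch_tools ./bin faSize | sh). After copying both lastz binaries, running md5sum -c on the checksum file from the directory that holds them should report OK for each.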
Name Last modified Size Description
Parent Directory - FOOTER 2018-03-13 15:36 256K FOOTER.txt 2024-09-24 16:01 330K addCols 2024-09-24 16:00 5.2M ameme 2024-09-24 16:01 8.9M autoDtd 2024-09-24 16:00 5.3M autoSql 2024-09-24 16:00 5.3M autoXml 2024-09-24 16:00 5.3M ave 2024-09-24 16:00 5.2M aveCols 2024-09-24 16:00 5.2M axtChain 2024-09-24 16:01 5.7M axtSort 2024-09-24 16:01 5.3M axtSwap 2024-09-24 16:01 5.3M axtToMaf 2024-09-24 16:01 5.3M axtToPsl 2024-09-24 16:01 5.5M bamToPsl 2024-09-24 16:00 5.6M barChartMaxLimit 2024-09-24 15:59 1.0K bedClip 2024-09-24 16:00 9.1M bedCommonRegions 2024-09-24 16:00 5.2M bedCoverage 2024-09-24 16:01 35M bedExtendRanges 2024-09-24 16:01 34M bedGeneParts 2024-09-24 16:00 8.9M bedGraphPack 2024-09-24 16:00 5.2M bedGraphToBigWig 2024-09-24 16:00 9.1M bedIntersect 2024-09-24 16:00 5.2M bedItemOverlapCount 2024-09-24 16:00 35M bedJoinTabOffset 2024-09-24 16:00 5.2M bedJoinTabOffset.py 2024-09-24 15:59 4.3K bedMergeAdjacent 2024-09-24 16:01 8.9M bedPartition 2024-09-24 16:01 8.9M bedPileUps 2024-09-24 16:00 5.2M bedRemoveOverlap 2024-09-24 16:00 5.2M bedRestrictToPositions 2024-09-24 16:00 5.2M bedSort 2024-09-24 16:00 8.9M bedToBigBed 2024-09-24 16:00 9.2M bedToExons 2024-09-24 16:01 8.9M bedToGenePred 2024-09-24 16:00 35M bedToPsl 2024-09-24 16:01 8.9M bedWeedOverlapping 2024-09-24 16:01 8.9M bigBedInfo 2024-09-24 16:00 9.0M bigBedNamedItems 2024-09-24 16:00 9.0M bigBedSummary 2024-09-24 16:00 9.0M bigBedToBed 2024-09-24 16:00 9.1M bigChainBreaks 2024-09-24 16:01 9.1M bigChainToChain 2024-09-24 16:01 35M bigGenePredToGenePred 2024-09-24 16:01 35M bigGuessDb 2024-09-24 15:59 9.1K bigHeat 2024-09-24 15:59 17K bigMafToMaf 2024-09-24 16:01 9.0M bigPslToPsl 2024-09-24 16:01 9.1M bigWigAverageOverBed 2024-09-24 16:00 9.1M bigWigCat 2024-09-24 16:00 9.1M bigWigCluster 2024-09-24 16:00 5.3M bigWigCorrelate 2024-09-24 16:00 9.1M bigWigInfo 2024-09-24 16:00 5.3M bigWigMerge 2024-09-24 16:00 5.4M bigWigSummary 2024-09-24 16:00 5.3M bigWigToBedGraph 2024-09-24 16:00 5.3M bigWigToWig 
2024-09-24 16:00 5.3M binFromRange 2024-09-24 16:01 35M blastToPsl 2024-09-24 16:00 5.5M blastXmlToPsl 2024-09-24 16:00 5.7M blat/ 2024-09-24 16:29 - blatHuge 2024-09-24 16:01 6.3M calc 2024-09-24 16:00 5.2M catDir 2024-09-24 16:00 5.2M catUncomment 2024-09-24 16:00 5.2M chainAntiRepeat 2024-09-24 16:01 5.4M chainBridge 2024-09-24 16:01 5.5M chainCleaner 2024-09-24 16:01 5.6M chainFilter 2024-09-24 16:01 5.2M chainMergeSort 2024-09-24 16:01 5.3M chainNet 2024-09-24 16:01 5.3M chainPreNet 2024-09-24 16:01 5.2M chainScore 2024-09-24 16:01 5.5M chainSort 2024-09-24 16:01 5.2M chainSplit 2024-09-24 16:01 5.2M chainStitchId 2024-09-24 16:01 5.2M chainSwap 2024-09-24 16:01 5.2M chainToAxt 2024-09-24 16:01 5.4M chainToBigChain 2024-09-24 16:01 5.3M chainToPsl 2024-09-24 16:01 5.4M chainToPslBasic 2024-09-24 16:01 5.5M checkAgpAndFa 2024-09-24 16:00 5.4M checkCoverageGaps 2024-09-24 16:00 35M checkHgFindSpec 2024-09-24 16:00 35M checkTableCoords 2024-09-24 16:00 35M chopFaLines 2024-09-24 16:00 5.2M chromGraphFromBin 2024-09-24 16:01 35M chromGraphToBin 2024-09-24 16:01 35M chromToUcsc 2024-09-24 15:59 11K clusterGenes 2024-09-24 16:00 35M clusterMatrixToBarChartBed 2024-09-24 16:00 5.2M colTransform 2024-09-24 16:00 5.2M countChars 2024-09-24 16:00 5.2M cpg_lh 2024-09-24 16:00 27K crTreeIndexBed 2024-09-24 16:01 5.3M crTreeSearchBed 2024-09-24 16:01 5.3M dbDbToHubTxt 2024-09-24 16:01 35M dbSnoop 2024-09-24 16:01 5.8M dbTrash 2024-09-24 16:00 35M endsInLf 2024-09-24 16:00 5.2M estOrient 2024-09-24 16:00 35M expMatrixToBarchartBed 2024-09-24 15:59 16K faAlign 2024-09-24 16:00 5.3M faCmp 2024-09-24 16:00 5.3M faCount 2024-09-24 16:00 5.3M faFilter 2024-09-24 16:00 5.3M faFilterN 2024-09-24 16:00 5.5M faFrag 2024-09-24 16:00 5.3M faNoise 2024-09-24 16:00 5.3M faOneRecord 2024-09-24 16:00 5.2M faPolyASizes 2024-09-24 16:00 5.3M faRandomize 2024-09-24 16:00 5.3M faRc 2024-09-24 16:00 5.2M faSize 2024-09-24 16:00 5.3M faSomeRecords 2024-09-24 16:00 5.2M faSplit 2024-09-24 16:00 
5.3M faToFastq 2024-09-24 16:00 5.3M faToTab 2024-09-24 16:00 5.3M faToTwoBit 2024-09-24 16:00 5.3M faToVcf 2024-09-24 16:01 5.3M faTrans 2024-09-24 16:00 5.3M fastqStatsAndSubsample 2024-09-24 16:00 5.3M fastqToFa 2024-09-24 16:00 5.2M featureBits 2024-09-24 16:00 35M fetchChromSizes 2024-09-24 16:01 3.0K findMotif 2024-09-24 16:00 5.4M fixStepToBedGraph.pl 2024-09-24 16:01 1.4K fixTrackDb 2024-09-24 16:01 35M gapToLift 2024-09-24 16:01 35M gencodeVersionForGenes 2024-09-24 16:01 5.3M genePredCheck 2024-09-24 16:01 35M genePredFilter 2024-09-24 16:01 35M genePredHisto 2024-09-24 16:00 35M genePredSingleCover 2024-09-24 16:00 35M genePredToBed 2024-09-24 16:00 35M genePredToBigGenePred 2024-09-24 16:01 35M genePredToFakePsl 2024-09-24 16:00 35M genePredToGtf 2024-09-24 16:01 35M genePredToMafFrames 2024-09-24 16:00 35M genePredToProt 2024-09-24 16:01 35M gensub2 2024-09-24 16:00 5.2M getRna 2024-09-24 16:00 35M getRnaPred 2024-09-24 16:00 35M gfServerHuge 2024-09-24 16:01 5.6M gff3ToGenePred 2024-09-24 16:01 34M gff3ToPsl 2024-09-24 16:01 5.5M gmtime 2024-09-24 16:00 12K gtfToGenePred 2024-09-24 16:01 34M headRest 2024-09-24 16:00 5.2M hgBbiDbLink 2024-09-24 16:01 5.7M hgFakeAgp 2024-09-24 16:01 5.3M hgFindSpec 2024-09-24 16:01 35M hgGcPercent 2024-09-24 16:01 34M hgGoldGapGl 2024-09-24 16:01 34M hgLoadBed 2024-09-24 16:01 34M hgLoadChain 2024-09-24 16:01 35M hgLoadGap 2024-09-24 16:01 34M hgLoadMaf 2024-09-24 16:01 34M hgLoadMafSummary 2024-09-24 16:01 34M hgLoadNet 2024-09-24 16:01 35M hgLoadOut 2024-09-24 16:01 35M hgLoadOutJoined 2024-09-24 16:01 35M hgLoadSqlTab 2024-09-24 16:01 5.8M hgLoadWiggle 2024-09-24 16:01 34M hgSpeciesRna 2024-09-24 16:00 35M hgTrackDb 2024-09-24 16:01 35M hgWiggle 2024-09-24 16:01 34M hgsql 2024-09-24 16:00 5.7M hgsqldump 2024-09-24 16:00 5.7M hgvsToVcf 2024-09-24 16:01 35M hicInfo 2024-09-24 16:01 35M htmlCheck 2024-09-24 16:00 5.2M hubCheck 2024-09-24 16:01 34M hubClone 2024-09-24 16:01 35M hubPublicCheck 2024-09-24 16:01 35M ixIxx 
2024-09-24 16:01 5.3M lastz-1.04.00 2018-04-06 11:15 611K lastz_D-1.04.00 2018-04-06 11:15 614K lavToAxt 2024-09-24 16:01 5.4M lavToPsl 2024-09-24 16:01 9.0M ldHgGene 2024-09-24 16:01 35M liftOver 2024-09-24 16:00 34M liftOverMerge 2024-09-24 16:00 8.9M liftUp 2024-09-24 16:00 35M linesToRa 2024-09-24 16:00 5.2M localtime 2024-09-24 16:00 12K mafAddIRows 2024-09-24 16:01 5.4M mafAddQRows 2024-09-24 16:01 5.3M mafCoverage 2024-09-24 16:01 35M mafFetch 2024-09-24 16:01 35M mafFilter 2024-09-24 16:01 5.3M mafFrag 2024-09-24 16:01 35M mafFrags 2024-09-24 16:01 35M mafGene 2024-09-24 16:01 35M mafMeFirst 2024-09-24 16:01 5.3M mafNoAlign 2024-09-24 16:01 5.3M mafOrder 2024-09-24 16:01 5.3M mafRanges 2024-09-24 16:01 5.3M mafSpeciesList 2024-09-24 16:01 5.3M mafSpeciesSubset 2024-09-24 16:00 5.3M mafSplit 2024-09-24 16:01 8.9M mafSplitPos 2024-09-24 16:01 35M mafToAxt 2024-09-24 16:01 5.3M mafToBigMaf 2024-09-24 16:01 5.3M mafToPsl 2024-09-24 16:01 5.5M mafToSnpBed 2024-09-24 16:00 35M mafsInRegion 2024-09-24 16:00 8.9M makeTableList 2024-09-24 16:01 34M maskOutFa 2024-09-24 16:00 8.9M matrixClusterColumns 2024-09-24 16:00 5.3M matrixMarketToTsv 2024-09-24 16:00 5.2M matrixNormalize 2024-09-24 16:00 5.3M matrixToBarChartBed 2024-09-24 16:00 5.2M mktime 2024-09-24 16:00 12K mrnaToGene 2024-09-24 16:00 35M multiz/ 2024-09-30 09:44 - netChainSubset 2024-09-24 16:01 5.3M netClass 2024-09-24 16:01 35M netFilter 2024-09-24 16:01 5.2M netSplit 2024-09-24 16:01 5.2M netSyntenic 2024-09-24 16:01 5.3M netToAxt 2024-09-24 16:01 5.4M netToBed 2024-09-24 16:01 5.2M newProg 2024-09-24 16:00 5.2M newPythonProg 2024-09-24 16:00 5.2M nibFrag 2024-09-24 16:00 5.3M nibSize 2024-09-24 16:00 5.2M oligoMatch 2024-09-24 16:01 9.0M overlapSelect 2024-09-24 16:01 34M para 2024-09-24 16:00 5.5M paraFetch 2024-09-24 16:00 5.2M paraHub 2024-09-24 16:00 5.5M paraHubStop 2024-09-24 16:00 5.3M paraNode 2024-09-24 16:00 5.3M paraNodeStart 2024-09-24 16:00 5.2M paraNodeStatus 2024-09-24 16:00 5.3M 
paraNodeStop 2024-09-24 16:00 5.3M paraSync 2024-09-24 16:00 5.2M paraTestJob 2024-09-24 16:00 5.2M parasol 2024-09-24 16:00 5.4M positionalTblCheck 2024-09-24 16:01 35M pslCDnaFilter 2024-09-24 16:01 34M pslCat 2024-09-24 16:01 5.4M pslCheck 2024-09-24 16:01 35M pslDropOverlap 2024-09-24 16:01 5.4M pslFilter 2024-09-24 16:01 5.4M pslHisto 2024-09-24 16:01 5.5M pslLiftSubrangeBlat 2024-09-24 16:01 35M pslMap 2024-09-24 16:00 5.5M pslMapPostChain 2024-09-24 16:00 5.4M pslMrnaCover 2024-09-24 16:01 5.5M pslPairs 2024-09-24 16:01 5.5M pslPartition 2024-09-24 16:01 5.4M pslPosTarget 2024-09-24 16:00 5.4M pslPretty 2024-09-24 16:01 5.7M pslProtToRnaCoords 2024-09-24 16:00 5.5M pslRc 2024-09-24 16:00 5.4M pslRecalcMatch 2024-09-24 16:01 5.6M pslRemoveFrameShifts 2024-09-24 16:00 5.4M pslReps 2024-09-24 16:01 5.5M pslScore 2024-09-24 16:00 5.4M pslSelect 2024-09-24 16:01 5.4M pslSomeRecords 2024-09-24 16:01 5.4M pslSort 2024-09-24 16:01 5.5M pslSortAcc 2024-09-24 16:01 5.4M pslSpliceJunctions 2024-09-24 16:00 5.5M pslSplitOnTarget 2024-09-24 16:01 5.4M pslStats 2024-09-24 16:01 5.5M pslSwap 2024-09-24 16:00 5.4M pslToBed 2024-09-24 16:01 8.9M pslToBigPsl 2024-09-24 16:01 35M pslToChain 2024-09-24 16:01 5.5M pslToPslx 2024-09-24 16:00 5.6M pslxToFa 2024-09-24 16:01 5.4M qaToQac 2024-09-24 16:01 5.3M qacAgpLift 2024-09-24 16:01 5.4M qacToQa 2024-09-24 16:01 5.3M qacToWig 2024-09-24 16:01 5.3M raSqlQuery 2024-09-24 16:01 35M raToLines 2024-09-24 16:00 5.2M raToTab 2024-09-24 16:00 5.2M randomLines 2024-09-24 16:00 5.2M rmFaDups 2024-09-24 16:00 5.2M rmskAlignToPsl 2024-09-24 16:01 5.5M rowsToCols 2024-09-24 16:00 5.2M sizeof 2024-09-24 16:00 12K spacedToTab 2024-09-24 16:00 5.2M splitFile 2024-09-24 16:00 5.2M splitFileByColumn 2024-09-24 16:00 5.2M sqlToXml 2024-09-24 16:01 5.8M strexCalc 2024-09-24 16:00 5.3M stringify 2024-09-24 16:00 5.2M subChar 2024-09-24 16:00 5.2M subColumn 2024-09-24 16:00 5.2M tabQuery 2017-05-23 15:11 4.1M tabToTabDir 2024-09-24 16:01 5.4M 
tailLines 2024-09-24 16:00 5.2M tdbQuery 2024-09-24 16:01 34M tdbRename 2024-09-24 15:59 3.1K tdbSort 2024-09-24 15:59 4.2K textHistogram 2024-09-24 16:00 5.2M tickToDate 2024-09-24 16:00 5.2M toLower 2024-09-24 16:00 5.2M toUpper 2024-09-24 16:00 5.2M trackDbIndexBb 2024-09-24 15:59 19K transMapPslToGenePred 2024-09-24 16:01 35M trfBig 2024-09-24 16:01 5.3M twoBitDup 2024-09-24 16:00 5.3M twoBitInfo 2024-09-24 16:00 5.3M twoBitMask 2024-09-24 16:01 9.0M twoBitToFa 2024-09-24 16:00 9.0M ucscApiClient 2024-09-24 15:59 4.8K udr 2016-05-18 16:21 2.7M vai.pl 2024-09-24 15:59 12K validateFiles 2024-09-24 16:00 35M validateManifest 2024-09-24 16:00 5.3M varStepToBedGraph.pl 2024-09-24 16:01 1.7K vcfToBed 2024-09-24 16:01 5.4M webSync 2024-09-24 15:59 10K wigCorrelate 2024-09-24 16:00 9.2M wigEncode 2024-09-24 16:01 5.2M wigToBigWig 2024-09-24 16:00 9.2M wordLine 2024-09-24 16:00 5.2M xmlCat 2024-09-24 16:01 5.2M xmlToSql 2024-09-24 16:01 5.3M
================================================================
to download all of the files from one of these admin/exe/ directories,
for example: admin/exe/linux.x86_64/
using the rsync command to your current directory:

  rsync -aP rsync://hgdownload.cse.ucsc.edu/genome/admin/exe/linux.x86_64/ ./

================================================================
======== addCols ====================================
================================================================
### kent source version 362 ###
addCols - Sum columns in a text file.
usage:
   addCols <fileName>
adds all columns in the given file,
outputs the sum of each column.  <fileName> can be the
name: stdin to accept input from stdin.
Options:
    -maxCols=N - maximum number of columns (defaults to 16)

================================================================
======== ameme ====================================
================================================================
ameme - find common patterns in DNA
usage
    ameme good=goodIn.fa [bad=badIn.fa] [numMotifs=2] [background=m1] [maxOcc=2] [motifOutput=fileName] [html=output.html] [gif=output.gif] [rcToo=on] [controlRun=on] [startScanLimit=20] [outputLogo] [constrainer=1]
where goodIn.fa is a multi-sequence fa file containing instances
of the motif you want to find, badIn.fa is a file containing similar
sequences but lacking the motif, numMotifs is the number of motifs
to scan for, background is m0,m1, or m2 for various levels of Markov
models, maxOcc is the maximum occurrences of the motif you
expect to find in a single sequence and motifOutput is the name
of a file to store just the motifs in. rcToo=on searches both strands.
If you include controlRun=on in the command line, a random set of
sequences will be generated that match your foreground data set in size,
and your background data set in nucleotide probabilities. The program
will then look for motifs in this random set.
If the scores you get in a real run are about the same as those you get in a control run, then the motifs Improbizer has found are probably not significant. ================================================================ ======== autoDtd ==================================== ================================================================ ### kent source version 362 ### autoDtd - Give this a XML document to look at and it will come up with a DTD to describe it. usage: autoDtd in.xml out.dtd out.stats options: -tree=out.tree - Output tag tree. -atree=out.atree - Output attributed tag tree. ================================================================ ======== autoSql ==================================== ================================================================ ### kent source version 362 ### autoSql - create SQL and C code for permanently storing a structure in database and loading it back into memory based on a specification file usage: autoSql specFile outRoot {optional: -dbLink -withNull -json} This will create outRoot.sql outRoot.c and outRoot.h based on the contents of specFile. options: -dbLink - optionally generates code to execute queries and updates of the table. -addBin - Add an initial bin field and index it as (chrom,bin) -withNull - optionally generates code and .sql to enable applications to accept and load data into objects with potential 'missing data' (NULL in SQL) situations. -defaultZeros - will put zero and or empty string as default value -django - generate method to output object as django model Python code -json - generate method to output the object in JSON (JavaScript) format. 
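
As an illustration of the autoSql usage above, the sketch below writes a small specification file and, if autoSql is available on the PATH, runs it with the -json option. The table name simpleBed and its three fields are hypothetical and chosen only for this example; the .as layout follows the usual autoSql convention of a table name, a quoted comment, and a parenthesised field list.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir="$(mktemp -d)"

# Hypothetical three-field record; not a table that ships with these tools.
cat > "$workdir/simpleBed.as" <<'EOF'
table simpleBed
"Hypothetical example: a three-field BED-like record"
    (
    string chrom;      "Chromosome name"
    uint   chromStart; "Start position"
    uint   chromEnd;   "End position"
    )
EOF

# Run autoSql only if it has been downloaded and is on the PATH.
if command -v autoSql >/dev/null 2>&1; then
    # Per the usage above this should create simpleBed.sql, simpleBed.c and simpleBed.h.
    (cd "$workdir" && autoSql simpleBed.as simpleBed -json)
    ls "$workdir"
else
    echo "autoSql not found on PATH; spec written to $workdir/simpleBed.as"
fi
```

The same .as file can later be passed to bedToBigBed with -as=fields.as when the BED file carries non-standard "bedPlus" fields.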
================================================================ ======== autoXml ==================================== ================================================================ autoXml - Generate structures code and parser for XML file from DTD-like spec usage: autoXml file.dtdx root This will generate root.c, root.h options: -textField=xxx what to name text between start/end tags. Default 'text' -comment=xxx Comment to appear at top of generated code files -picky Generate parser that rejects stuff it doesn't understand -main Put in a main routine that's a test harness -prefix=xxx Prefix to add to structure names. By default same as root -positive Don't write out optional attributes with negative values ================================================================ ======== ave ==================================== ================================================================ ave - Compute average and basic stats usage: ave file options: -col=N Which column to use. Default 1 -tableOut - output by columns (default output in rows) -noQuartiles - only calculate min,max,mean,standard deviation - for large data sets that will not fit in memory. ================================================================ ======== aveCols ==================================== ================================================================ aveCols - average together columns usage: aveCols file adds all columns (up to 16 columns) in the given file, outputs the average (sum/#ofRows) of each column. <fileName> can be the name: stdin to accept input from stdin. ================================================================ ======== axtChain ==================================== ================================================================ axtChain - Chain together axt alignments. 
usage: axtChain [options] -linearGap=loose in.axt tNibDir qNibDir out.chain Where tNibDir/qNibDir are either directories full of nib files, the name of a .2bit file, or a single fasta file with additional -faQ or -faT options. options: -psl Use psl instead of axt format for input -faQ The specified qNibDir is a fasta file with multiple sequences for query -faT The specified tNibDir is a fasta file with multiple sequences for target NOTE: will not work with gzipped fasta files -minScore=N Minimum score for chain, default 1000 -details=fileName Output some additional chain details -scoreScheme=fileName Read the scoring matrix from a blastz-format file -linearGap=<medium|loose|filename> Specify type of linearGap to use. *Must* specify this argument to one of these choices. loose is chicken/human linear gap costs. medium is mouse/human linear gap costs. Or specify a piecewise linearGap tab delimited file. sample linearGap file (loose) tablesize 11 smallSize 111 position 1 2 3 11 111 2111 12111 32111 72111 152111 252111 qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600 tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600 bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000 ================================================================ ======== axtSort ==================================== ================================================================ axtSort - Sort axt files usage: axtSort in.axt out.axt options: -query - Sort by query position, not target -byScore - Sort by score ================================================================ ======== axtSwap ==================================== ================================================================ axtSwap - Swap source and query in an axt file usage: axtSwap source.axt target.sizes query.sizes dest.axt options: -xxx=XXX ================================================================ ======== axtToMaf ==================================== 
================================================================
### kent source version 362 ###
axtToMaf - Convert from axt to maf format
usage:
   axtToMaf in.axt tSizes qSizes out.maf
Where tSizes and qSizes are files that contain the sizes of the target
and query sequences. Very often these will be chrom.sizes files.
Options:
    -qPrefix=XX. - add XX. to start of query sequence name in maf
    -tPrefex=YY. - add YY. to start of target sequence name in maf
    -tSplit        Create a separate maf file for each target sequence.
                   In this case output is a dir rather than a file
                   In this case in.axt must be sorted by target.
    -score       - recalculate score
    -scoreZero   - recalculate score if zero

================================================================
======== axtToPsl ====================================
================================================================
axtToPsl - Convert axt to psl format
usage:
   axtToPsl in.axt tSizes qSizes out.psl
Where tSizes and qSizes are tab-delimited files with
<seqName><size> columns.
options:
    -xxx=XXX

================================================================
======== bamToPsl ====================================
================================================================
### kent source version 362 ###
bamToPsl - Convert a bam file to a psl and optionally also a fasta file that contains the reads.
usage: bamToPsl [options] in.bam out.psl options: -fasta=output.fa - output query sequences to specified file -chromAlias=file - specify a two-column file: 1: alias, 2: other name for target name translation from column 1 name to column 2 name names not found are passed through intact -nohead - do not output the PSL header, default has header output -allowDups - for fasta output, allow duplicate query sequences output - default is to eliminate duplicate sequences - runs much faster without the duplicate check -noSequenceVerify - when checking for dups, do not verify each sequence - when the same name is identical, assume they are - helps speed up the dup check but not thorough -dots=N - output progress dot(.) every N alignments processed note: a chromAlias file can be obtained from a UCSC database, e.g.: hgsql -N -e 'select alias,chrom from chromAlias;' hg38 > hg38.chromAlias.tab ================================================================ ======== bedClip ==================================== ================================================================ ### kent source version 362 ### bedClip - Remove lines from bed file that refer to off-chromosome locations. usage: bedClip [options] input.bed chrom.sizes output.bed chrom.sizes is a two-column file/URL: <chromosome name> <size in bases> If the assembly <db> is hosted by UCSC, chrom.sizes can be a URL like http://hgdownload.cse.ucsc.edu/goldenPath/<db>/bigZips/<db>.chrom.sizes or you may use the script fetchChromSizes to download the chrom.sizes file. If not hosted by UCSC, a chrom.sizes file can be generated by running twoBitInfo on the assembly .2bit file. 
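
The idea behind bedClip's default behaviour (dropping items that refer to off-chromosome locations) can be approximated with awk. This is a rough sketch, not bedClip's actual implementation: it assumes an item is off-chromosome when its chromosome is missing from chrom.sizes, its start is negative, or its end exceeds the chromosome size. The file names and sample coordinates are made up for the example.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir="$(mktemp -d)"

# chrom.sizes: <chromosome name> <size in bases>, as described above.
printf 'chr1\t1000\nchr2\t500\n' > "$workdir/chrom.sizes"

# Three BED3 items; the second one runs past the end of chr2.
printf 'chr1\t0\t100\nchr2\t450\t600\nchr2\t10\t20\n' > "$workdir/input.bed"

# First file: remember each chromosome size. Second file: keep only items
# that lie entirely within a known chromosome (assumed definition, see above).
clipped="$(awk 'NR == FNR { size[$1] = $2; next }
    ($1 in size) && $2 + 0 >= 0 && $3 + 0 <= size[$1] + 0' \
    "$workdir/chrom.sizes" "$workdir/input.bed")"

printf '%s\n' "$clipped"
```

With the sample data only the chr2 item ending at 600 is dropped, because chr2 is listed as 500 bases long. For real work use bedClip itself, which also offers the -truncate and -verbose=2 options.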
options: -truncate - truncate items that span ends of chrom instead of the default of dropping the items -verbose=2 - set to get list of lines clipped and why ================================================================ ======== bedCommonRegions ==================================== ================================================================ ### kent source version 362 ### bedCommonRegions - Create a bed file (just bed3) that contains the regions common to all inputs. Regions are common only if exactly the same chromosome, starts, and end. Overlap is not enough. Each region must be in each input at most once. Output is stdout. usage: bedCommonRegions file1 file2 file3 ... fileN ================================================================ ======== bedCoverage ==================================== ================================================================ bedCoverage - Analyse coverage by bed files - chromosome by chromosome and genome-wide. usage: bedCoverage database bedFile Note bed file must be sorted by chromosome -restrict=restrict.bed Restrict to parts in restrict.bed ================================================================ ======== bedExtendRanges ==================================== ================================================================ ### kent source version 362 ### bedExtendRanges - extend length of entries in bed 6+ data to be at least the given length, taking strand directionality into account. usage: bedExtendRanges database length files(s) options: -host mysql host -user mysql user -password mysql password -tab Separate by tabs rather than space -verbose=N - verbose level for extra information to STDERR example: bedExtendRanges hg18 250 stdin bedExtendRanges -user=genome -host=genome-mysql.cse.ucsc.edu hg18 250 stdin will transform: chr1 500 525 . 100 + chr1 1000 1025 . 100 - to: chr1 500 750 . 100 + chr1 775 1025 . 
100 - ================================================================ ======== bedGeneParts ==================================== ================================================================ ### kent source version 362 ### bedGeneParts - Given a bed, spit out promoter, first exon, or all introns. usage: bedGeneParts part in.bed out.bed Where part is either 'exons' or 'firstExon' or 'introns' or 'promoter' or 'firstCodingSplice' or 'secondCodingSplice' options: -proStart=NN - start of promoter relative to txStart, default -100 -proEnd=NN - end of promoter relative to txStart, default 50 ================================================================ ======== bedGraphPack ==================================== ================================================================ ### kent source version 362 ### bedGraphPack v1 - Pack together adjacent records representing same value. usage: bedGraphPack in.bedGraph out.bedGraph The input needs to be sorted by chrom and this is checked. To put in a pipe use stdin and stdout in the command line in place of file names. ================================================================ ======== bedGraphToBigWig ==================================== ================================================================ ### kent source version 362 ### bedGraphToBigWig v 4 - Convert a bedGraph file to bigWig format. usage: bedGraphToBigWig in.bedGraph chrom.sizes out.bw where in.bedGraph is a four column file in the format: <chrom> <start> <end> <value> and chrom.sizes is a two-column file/URL: <chromosome name> <size in bases> and out.bw is the output indexed big wig file. If the assembly <db> is hosted by UCSC, chrom.sizes can be a URL like http://hgdownload.cse.ucsc.edu/goldenPath/<db>/bigZips/<db>.chrom.sizes or you may use the script fetchChromSizes to download the chrom.sizes file. If not hosted by UCSC, a chrom.sizes file can be generated by running twoBitInfo on the assembly .2bit file. 
The input bedGraph file must be sorted, use the unix sort command: sort -k1,1 -k2,2n unsorted.bedGraph > sorted.bedGraph options: -blockSize=N - Number of items to bundle in r-tree. Default 256 -itemsPerSlot=N - Number of data points bundled at lowest level. Default 1024 -unc - If set, do not use compression. ================================================================ ======== bedIntersect ==================================== ================================================================ ### kent source version 362 ### bedIntersect - Intersect two bed files usage: bed columns four(name) and five(score) are optional bedIntersect a.bed b.bed output.bed options: -aHitAny output all of a if any of it is hit by b -minCoverage=0.N min coverage of b to output match (or if -aHitAny, of a). Not applied to 0-length items. Default 0.000010 -bScore output score from b.bed (must be at least 5 field bed) -tab chop input at tabs not spaces -allowStartEqualEnd Don't discard 0-length items of a or b (e.g. point insertions) ================================================================ ======== bedItemOverlapCount ==================================== ================================================================ ### kent source version 362 ### bedItemOverlapCount - count number of times a base is overlapped by the items in a bed file. Output is bedGraph 4 to stdout. 
usage:
 sort bedFile.bed | bedItemOverlapCount [options] <database> stdin
To create a bigWig file from this data to use in a custom track:
 sort -k1,1 bedFile.bed | bedItemOverlapCount [options] <database> stdin \
         > bedFile.bedGraph
 bedGraphToBigWig bedFile.bedGraph chrom.sizes bedFile.bw
   where the chrom.sizes is obtained with the script: fetchChromSizes
   See also:
 http://genome-test.cse.ucsc.edu/~kent/src/unzipped/utils/userApps/fetchChromSizes
options:
   -zero      add blocks with zero count, normally these are omitted
   -bed12     expect bed12 and count based on blocks
              Without this option, only the first three fields are used.
   -max       if counts per base overflow, set to max (4294967295) instead
              of exiting
   -outBounds output min/max to stderr
   -chromSize=sizefile  Read chrom sizes from file instead of database
              sizefile contains two white space separated fields per line:
              chrom name and size
   -host=hostname  mysql host used to get chrom sizes
   -user=username  mysql user
   -password=password  mysql password
Notes:
 * You may want to separate your + and - strand
   items before sending into this program as it only looks at
   the chrom, start and end columns of the bed file.
 * Program requires a <database> connection to lookup chrom sizes for a sanity
   check of the incoming data.  Even when the -chromSize argument is used
   the <database> must be present, but it will not be used.
 * The bed file *must* be sorted by chrom
 * Maximum count per base is 4294967295. Recompile with new unitSize to increase this

================================================================
======== bedJoinTabOffset ====================================
================================================================
Usage: bedJoinTabOffset [options] inTabFile inBedFile outBedFile - given a bed file and tab file where each have a column with matching values: first get the value of column0, the offset and line length from inTabFile.
Then go over the bed file, use the name field and append its offset and length to the bed file as two separate fields. Write the new bed file to outBedFile.

Options:
  -h, --help            show this help message and exit
  -d, --debug           show debug messages
  -t TABKEYFIELD, --tabKeyField=TABKEYFIELD
                        the index of the key field in the tab file that
                        matches the key field in the bed file. default 0
  -b BEDKEYFIELD, --bedKeyField=BEDKEYFIELD
                        the index of the key field in the bed file that
                        matches the key field in the tab file. default 3

================================================================
======== bedPileUps ====================================
================================================================
### kent source version 362 ###
bedPileUps - Find (exact) overlaps if any in bed input
usage:
   bedPileUps in.bed
Where in.bed is in one of the ascii bed formats.
The in.bed file must be sorted by chromosome,start,
  to sort a bed file, use the unix sort command:
     sort -k1,1 -k2,2n unsorted.bed > sorted.bed

Options:
  -name - include BED name field 4 when evaluating uniqueness
  -tab  - use tabs to parse fields
  -verbose=2 - show the location and size of each pileUp

================================================================
======== bedRemoveOverlap ====================================
================================================================
### kent source version 362 ###
bedRemoveOverlap - Remove overlapping records from a (sorted) bed file.  Gets rid of
the smaller of overlapping records.
usage:
   bedRemoveOverlap in.bed out.bed
options:
   -xxx=XXX

================================================================
======== bedRestrictToPositions ====================================
================================================================
### kent source version 362 ###
bedRestrictToPositions - Filter bed file, restricting to only ones that match chrom/start/ends specified in restrict.bed file.
usage: bedRestrictToPositions in.bed restrict.bed out.bed options: -xxx=XXX ================================================================ ======== bedSort ==================================== ================================================================ bedSort - Sort a .bed file by chrom,chromStart usage: bedSort in.bed out.bed in.bed and out.bed may be the same. ================================================================ ======== bedToBigBed ==================================== ================================================================ ### kent source version 362 ### bedToBigBed v. 2.7 - Convert bed file to bigBed. (BigBed version: 4) usage: bedToBigBed in.bed chrom.sizes out.bb Where in.bed is in one of the ascii bed formats, but not including track lines and chrom.sizes is a two-column file/URL: <chromosome name> <size in bases> and out.bb is the output indexed big bed file. If the assembly <db> is hosted by UCSC, chrom.sizes can be a URL like http://hgdownload.cse.ucsc.edu/goldenPath/<db>/bigZips/<db>.chrom.sizes or you may use the script fetchChromSizes to download the chrom.sizes file. If not hosted by UCSC, a chrom.sizes file can be generated by running twoBitInfo on the assembly .2bit file. The in.bed file must be sorted by chromosome,start, to sort a bed file, use the unix sort command: sort -k1,1 -k2,2n unsorted.bed > sorted.bed Sorting must be set to skip Unicode mapping (LC_COLLATE=C). options: -type=bedN[+[P]] : N is between 3 and 15, optional (+) if extra "bedPlus" fields, optional P specifies the number of extra fields. Not required, but preferred. Examples: -type=bed6 or -type=bed6+ or -type=bed6+3 (see http://genome.ucsc.edu/FAQ/FAQformat.html#format1) -as=fields.as - If you have non-standard "bedPlus" fields, it's great to put a definition of each field in a row in AutoSql format here. -blockSize=N - Number of items to bundle in r-tree. Default 256 -itemsPerSlot=N - Number of data points bundled at lowest level. 
Default 512 -unc - If set, do not use compression. -tab - If set, expect fields to be tab separated, normally expects white space separator. -extraIndex=fieldList - If set, make an index on each field in a comma separated list extraIndex=name and extraIndex=name,id are commonly used. -sizesIs2Bit -- If set, the chrom.sizes file is assumed to be a 2bit file. -udcDir=/path/to/udcCacheDir -- sets the UDC cache dir for caching of remote files. ================================================================ ======== bedToExons ==================================== ================================================================ ### kent source version 362 ### bedToExons - Split a bed up into individual beds. One for each internal exon. usage: bedToExons originalBeds.bed splitBeds.bed options: -cdsOnly - Only output the coding portions of exons. ================================================================ ======== bedToGenePred ==================================== ================================================================ ### kent source version 362 ### bedToGenePred - convert bed format files to genePred format usage: bedToGenePred bedFile genePredFile Convert a bed file to a genePred file. If BED has at least 12 columns, then a genePred with blocks is created. Otherwise single-exon genePreds are created. ================================================================ ======== bedToPsl ==================================== ================================================================ ### kent source version 362 ### bedToPsl - convert bed format files to psl format usage: bedToPsl chromSizes bedFile pslFile Convert a BED file to a PSL file. The result is an alignment. It is intended to allow processing by tools that operate on PSL. If the BED has at least 12 columns, then a PSL with blocks is created. Otherwise single-exon PSLs are created.
Options: -keepQuery - instead of creating a fake query, create PSL with identical query and target specs. Useful if bed features are to be lifted with pslMap and one wants to keep the source location in the lift result. ================================================================ ======== bedWeedOverlapping ==================================== ================================================================ ### kent source version 362 ### bedWeedOverlapping - Filter out beds that overlap a 'weed.bed' file. usage: bedWeedOverlapping weeds.bed input.bed output.bed options: -maxOverlap=0.N - maximum overlapping ratio, default 0 (any overlap) -invert - keep the overlapping and get rid of everything else ================================================================ ======== bigBedInfo ==================================== ================================================================ ### kent source version 362 ### bigBedInfo - Show information about a bigBed file. usage: bigBedInfo file.bb options: -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs -chroms - list all chromosomes and their sizes -zooms - list all zoom levels and their sizes -as - get autoSql spec -extraIndex - list all the extra indexes ================================================================ ======== bigBedNamedItems ==================================== ================================================================ ### kent source version 362 ### bigBedNamedItems - Extract item of given name from bigBed usage: bigBedNamedItems file.bb name output.bed options: -nameFile - if set, treat name parameter as file full of space delimited names -field=fieldName - use index on field name, default is "name" ================================================================ ======== bigBedSummary ==================================== ================================================================ ### kent source version 362 ### bigBedSummary - Extract summary information from a 
bigBed file. usage: bigBedSummary file.bb chrom start end dataPoints Get summary data from bigBed for indicated region, broken into dataPoints equal parts. (Use dataPoints=1 for simple summary.) options: -type=X where X is one of: coverage - % of region that is covered (default) mean - average depth of covered regions min - minimum depth of covered regions max - maximum depth of covered regions -fields - print out information on fields in file. If fields option is used, the chrom, start, end, dataPoints parameters may be omitted -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs ================================================================ ======== bigBedToBed ==================================== ================================================================ ### kent source version 362 ### bigBedToBed v1 - Convert from bigBed to ascii bed format. usage: bigBedToBed input.bb output.bed options: -chrom=chr1 - if set restrict output to given chromosome -start=N - if set, restrict output to only that over start -end=N - if set, restrict output to only that under end -maxItems=N - if set, restrict output to first N items -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs ================================================================ ======== bigMafToMaf ==================================== ================================================================ ### kent source version 362 ### bigMafToMaf - convert bigMaf to maf file usage: bigMafToMaf bigMaf.bb file.maf options: -xxx=XXX ================================================================ ======== bigPslToPsl ==================================== ================================================================ ### kent source version 362 ### bigPslToPsl - convert bigPsl file to psl usage: bigPslToPsl bigPsl.bb output.psl options: -collapseStrand if target strand is '+', don't output it ================================================================ ======== 
bigWigAverageOverBed ==================================== ================================================================ ### kent source version 362 ### bigWigAverageOverBed v2 - Compute average score of big wig over each bed, which may have introns. usage: bigWigAverageOverBed in.bw in.bed out.tab The output columns are: name - name field from bed, which should be unique size - size of bed (sum of exon sizes) covered - # bases within exons covered by bigWig sum - sum of values over all bases covered mean0 - average over bases with non-covered bases counting as zeroes mean - average over just covered bases Options: -stats=stats.ra - Output a collection of overall statistics to stats.ra file -bedOut=out.bed - Make output bed that is echo of input bed but with mean column appended -sampleAroundCenter=N - Take sample at region N bases wide centered around bed item, rather than the usual sample in the bed item. -minMax - include two additional columns containing the min and max observed in the area. ================================================================ ======== bigWigCat ==================================== ================================================================ ### kent source version 362 ### bigWigCat v 4 - merge non-overlapping bigWig files directly into bigWig format usage: bigWigCat out.bw in1.bw in2.bw ... Where in*.bw is in big wig format and out.bw is the output indexed big wig file. options: -itemsPerSlot=N - Number of data points bundled at lowest level. Default 1024 Note: must use wigToBigWig -fixedSummaries -keepAllChromosomes (perhaps in parallel cluster jobs) to create the input files. Note: By non-overlapping we mean the entire span of each file, from first data point to last data point, must not overlap with that of other files.
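The non-overlap condition in the bigWigCat note can be checked before running the merge. The sketch below is not part of bigWigCat. It assumes each input file has already been summarized, by some other means, as a span running from its first data point to its last, with each genome position written as a hypothetical (chromosome order index, base offset) pair. The names Pos, Span and spans_overlap are illustrative only.

```python
from typing import List, Tuple

# A genome position as (chromosome order index, base offset). This
# representation is an assumption made for the sketch; bigWigCat does not
# expose anything like it.
Pos = Tuple[int, int]
# (start of first data point, end of last data point); the end is exclusive.
Span = Tuple[Pos, Pos]


def spans_overlap(spans: List[Span]) -> bool:
    """Return True if any two file spans overlap.

    After sorting by start position, an overlap anywhere implies an overlap
    between some pair of neighbours, so only adjacent spans are compared.
    """
    ordered = sorted(spans)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            return True
    return False


if __name__ == "__main__":
    # Three files that tile the genome end to start: acceptable input.
    tiled = [((0, 0), (0, 5000)), ((0, 5000), (1, 200)), ((1, 200), (2, 10))]
    # The second file starts before the first one ends: not acceptable.
    clashing = [((0, 0), (1, 300)), ((1, 100), (2, 10))]
    print(spans_overlap(tiled))
    print(spans_overlap(clashing))
```

Comparing whole spans rather than individual data points follows the wording of the note: two files whose data points interleave without touching would still be rejected.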
================================================================ ======== bigWigCluster ==================================== ================================================================ ### kent source version 362 ### bigWigCluster - Cluster bigWigs using a hacTree usage: bigWigCluster input.list chrom.sizes output.json output.tab where: input.list is a list of bigWig file names chrom.sizes is tab separated <chrom><size> for assembly for bigWigs output.json is json formatted output suitable for graphing with D3 output.tab is tab-separated file of items ordered by tree with the fields label - label from -labels option or from file name with no dir or extension pos - number from 0-1 representing position according to tree and distance red - number from 0-255 representing recommended red component of color green - number from 0-255 representing recommended green component of color blue - number from 0-255 representing recommended blue component of color path - file name from input.list including directory and extension options: -labels=fileName - label files from tabSeparated file with fields path - path to bigWig file label - a string with no tabs -precalc=precalc.tab - tab separated file with <file1> <file2> <distance> columns. -threads=N - number of threads to use, default 10 -tmpDir=/tmp/path - place to put temp files, default current dir ================================================================ ======== bigWigCorrelate ==================================== ================================================================ ### kent source version 362 ### bigWigCorrelate - Correlate bigWig files, optionally only on target regions.
usage: bigWigCorrelate a.bigWig b.bigWig or bigWigCorrelate listOfFiles options: -restrict=restrict.bigBed - restrict correlation to parts covered by this file -threshold=N.N - clip values to this threshold -rootNames - if set just report the root (minus directory and suffix) of file names when using listOfFiles -ignoreMissing - if set do not correlate where either side is missing data Normally missing data is treated as zeros ================================================================ ======== bigWigInfo ==================================== ================================================================ ### kent source version 362 ### bigWigInfo - Print out information about bigWig file. usage: bigWigInfo file.bw options: -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs -chroms - list all chromosomes and their sizes -zooms - list all zoom levels and their sizes -minMax - list the min and max on a single line ================================================================ ======== bigWigMerge ==================================== ================================================================ ### kent source version 362 ### bigWigMerge v2 - Merge together multiple bigWigs into a single output bedGraph. You'll have to run bedGraphToBigWig to make the output bigWig. The signal values are just added together to merge them usage: bigWigMerge in1.bw in2.bw .. inN.bw out.bedGraph options: -threshold=0.N - don't output values at or below this threshold. 
Default is 0.0 -adjust=0.N - add adjustment to each value -clip=NNN.N - values higher than this are clipped to this value -inList - input files are lists of file names of bigWigs -max - merged value is maximum from input files rather than sum ================================================================ ======== bigWigSummary ==================================== ================================================================ ### kent source version 362 ### bigWigSummary - Extract summary information from a bigWig file. usage: bigWigSummary file.bigWig chrom start end dataPoints Get summary data from bigWig for indicated region, broken into dataPoints equal parts. (Use dataPoints=1 for simple summary.) NOTE: start and end coordinates are in BED format (0-based) options: -type=X where X is one of: mean - average value in region (default) min - minimum value in region max - maximum value in region std - standard deviation in region coverage - % of region that is covered -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs ================================================================ ======== bigWigToBedGraph ==================================== ================================================================ ### kent source version 362 ### bigWigToBedGraph - Convert from bigWig to bedGraph format. usage: bigWigToBedGraph in.bigWig out.bedGraph options: -chrom=chr1 - if set restrict output to given chromosome -start=N - if set, restrict output to only that over start -end=N - if set, restrict output to only that under end -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs ================================================================ ======== bigWigToWig ==================================== ================================================================ ### kent source version 362 ### bigWigToWig - Convert bigWig to wig.
This will keep more of the same structure of the original wig than bigWigToBedGraph does, but still will break up large stepped sections into smaller ones. usage: bigWigToWig in.bigWig out.wig options: -chrom=chr1 - if set restrict output to given chromosome -start=N - if set, restrict output to only that over start -end=N - if set, restrict output to only that under end -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs ================================================================ ======== blastToPsl ==================================== ================================================================ ### kent source version 362 ### blastToPsl - Convert blast alignments to PSLs. usage: blastToPsl [options] blastOutput psl Options: -scores=file - Write score information to this file. Format is: strands qName qStart qEnd tName tStart tEnd bitscore eVal -verbose=n - n >= 3 prints each line of file after parsing. n >= 4 dumps the result of each query -eVal=n n is e-value threshold to filter results. Format can be either an integer, double or 1e-10. Default is no filter. -pslx - create PSLX output (includes sequences for blocks) Output only results of last round from PSI BLAST ================================================================ ======== blastXmlToPsl ==================================== ================================================================ ### kent source version 362 ### blastXmlToPsl - convert blast XML output to PSLs usage: blastXmlToPsl [options] blastXml psl options: -scores=file - Write score information to this file. Format is: strands qName qStart qEnd tName tStart tEnd bitscore eVal qDef tDef -verbose=n - n >= 3 prints each line of file after parsing. n >= 4 dumps the result of each query -eVal=n n is e-value threshold to filter results. Format can be either an integer, double or 1e-10. Default is no filter.
-pslx - create PSLX output (includes sequences for blocks) -convertToNucCoords - convert protein to nucleic alignments to nucleic to nucleic coordinates -qName=src - define element used to obtain the qName. The following values are supported: o query-ID - use contents of the <Iteration_query-ID> element if it exists, otherwise use <BlastOutput_query-ID> o query-def0 - use the first white-space separated word of the <Iteration_query-def> element if it exists, otherwise the first word of <BlastOutput_query-def>. Default is query-def0. -tName=src - define element used to obtain the tName. The following values are supported: o Hit_id - use contents of the <Hit_id> element. o Hit_def0 - use the first white-space separated word of the <Hit_def> element. o Hit_accession - contents of the <Hit_accession> element. Default is Hit_def0. -forcePsiBlast - treat as output of PSI-BLAST. blast-2.2.16 and maybe others identify psiblast as blastp. Output only results of last round from PSI BLAST ================================================================ ======== blat ==================================== ================================================================ ### kent source version 362 ### blat - Standalone BLAT v. 36x2 fast sequence search command line tool usage: blat database query [-ooc=11.ooc] output.psl where: database and query are each either a .fa, .nib or .2bit file, or a list of these files with one file name per line. -ooc=11.ooc tells the program to load over-occurring 11-mers from an external file. This will increase the speed by a factor of 40 in many cases, but is not required. output.psl is the name of the output file. Subranges of .nib and .2bit files may be specified using the syntax: /path/file.nib:seqid:start-end or /path/file.2bit:seqid:start-end or /path/file.nib:start-end With the second form, a sequence id of file:start-end will be used. options: -t=type Database type.
Type is one of: dna - DNA sequence prot - protein sequence dnax - DNA sequence translated in six frames to protein The default is dna. -q=type Query type. Type is one of: dna - DNA sequence rna - RNA sequence prot - protein sequence dnax - DNA sequence translated in six frames to protein rnax - DNA sequence translated in three frames to protein The default is dna. -prot Synonymous with -t=prot -q=prot. -ooc=N.ooc Use overused tile file N.ooc. N should correspond to the tileSize. -tileSize=N Sets the size of match that triggers an alignment. Usually between 8 and 12. Default is 11 for DNA and 5 for protein. -stepSize=N Spacing between tiles. Default is tileSize. -oneOff=N If set to 1, this allows one mismatch in tile and still triggers an alignment. Default is 0. -minMatch=N Sets the number of tile matches. Usually set from 2 to 4. Default is 2 for nucleotide, 1 for protein. -minScore=N Sets minimum score. This is the matches minus the mismatches minus some sort of gap penalty. Default is 30. -minIdentity=N Sets minimum sequence identity (in percent). Default is 90 for nucleotide searches, 25 for protein or translated protein searches. -maxGap=N Sets the size of maximum gap between tiles in a clump. Usually set from 0 to 3. Default is 2. Only relevant for minMatch > 1. -noHead Suppresses .psl header (so it's just a tab-separated file). -makeOoc=N.ooc Make overused tile file. Target needs to be complete genome. -repMatch=N Sets the number of repetitions of a tile allowed before it is marked as overused. Typically this is 256 for tileSize 12, 1024 for tile size 11, 4096 for tile size 10. Default is 1024. Typically comes into play only with makeOoc. Also affected by stepSize: when stepSize is halved, repMatch is doubled to compensate. -mask=type Mask out repeats. Alignments won't be started in masked region but may extend through it in nucleotide searches. Masked areas are ignored entirely in protein or translated searches.
Types are: lower - mask out lower-cased sequence upper - mask out upper-cased sequence out - mask according to database.out RepeatMasker .out file file.out - mask database according to RepeatMasker file.out -qMask=type Mask out repeats in query sequence. Similar to -mask above, but for query rather than target sequence. -repeats=type Type is same as mask types above. Repeat bases will not be masked in any way, but matches in repeat areas will be reported separately from matches in other areas in the psl output. -minRepDivergence=NN Minimum percent divergence of repeats to allow them to be unmasked. Default is 15. Only relevant for masking using RepeatMasker .out files. -dots=N Output dot every N sequences to show program's progress. -trimT Trim leading poly-T. -noTrimA Don't trim trailing poly-A. -trimHardA Remove poly-A tail from qSize as well as alignments in psl output. -fastMap Run for fast DNA/DNA remapping - not allowing introns, requiring high %ID. Query sizes must not exceed 5000. -out=type Controls output file format. Type is one of: psl - Default. Tab-separated format, no sequence pslx - Tab-separated format with sequence axt - blastz-associated axt format maf - multiz-associated maf format sim4 - similar to sim4 format wublast - similar to wublast format blast - similar to NCBI blast format blast8 - NCBI blast tabular format blast9 - NCBI blast tabular format with comments -fine For high-quality mRNAs, look harder for small initial and terminal exons. Not recommended for ESTs. -maxIntron=N Sets maximum intron size. Default is 750000. -extendThroughN Allows extension of alignment through large blocks of Ns.
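The blat usage line and the subrange syntax above can be assembled programmatically when many jobs are scripted. The following is a minimal sketch, not part of the kent tools: the helper names blat_command and subrange, and the file names in the example, are hypothetical. It only builds an argument list and does not run blat. It uses only options listed in the usage text (-t, -q, -out, -minIdentity, -ooc) and places them where the usage line shows -ooc, between the query and the output file.

```python
from typing import List, Optional

# Allowed values copied from the blat usage text above.
DB_TYPES = {"dna", "prot", "dnax"}
QUERY_TYPES = {"dna", "rna", "prot", "dnax", "rnax"}
OUT_TYPES = {"psl", "pslx", "axt", "maf", "sim4",
             "wublast", "blast", "blast8", "blast9"}


def subrange(path: str, seqid: str, start: int, end: int) -> str:
    """Format a .nib/.2bit subrange as path:seqid:start-end."""
    return f"{path}:{seqid}:{start}-{end}"


def blat_command(database: str, query: str, output: str,
                 db_type: str = "dna", query_type: str = "dna",
                 out_type: str = "psl",
                 min_identity: Optional[int] = None,
                 ooc: Optional[str] = None) -> List[str]:
    """Build an argv list shaped like: blat database query [options] output."""
    if db_type not in DB_TYPES:
        raise ValueError(f"unknown database type: {db_type}")
    if query_type not in QUERY_TYPES:
        raise ValueError(f"unknown query type: {query_type}")
    if out_type not in OUT_TYPES:
        raise ValueError(f"unknown output type: {out_type}")
    args = ["blat", database, query,
            f"-t={db_type}", f"-q={query_type}", f"-out={out_type}"]
    if min_identity is not None:
        args.append(f"-minIdentity={min_identity}")
    if ooc is not None:
        args.append(f"-ooc={ooc}")
    args.append(output)
    return args


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    target = subrange("/path/file.2bit", "chr1", 1000, 2000)
    print(" ".join(blat_command(target, "query.fa", "output.psl",
                                query_type="rna", min_identity=95)))
```

Returning a list rather than a single string lets the result be passed to subprocess.run without shell quoting concerns. Validating the type values up front catches typos before a long job is queued.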
================================================================ ======== calc ==================================== ================================================================ ### kent source version 362 ### calc - Little command line calculator usage: calc this + that * theOther / (a + b) Options: -h - output result as a human-readable integer, with k/m/g/t suffix ================================================================ ======== catDir ==================================== ================================================================ catDir - concatenate files in directory to stdout. For those times when there are too many files for cat to handle. usage: catDir dir(s) options: -r Recurse into subdirectories -suffix=.suf This will restrict things to files ending in .suf '-wild=*.???' This will match wildcards. -nonz Prints file name of non-zero length files ================================================================ ======== catUncomment ==================================== ================================================================ catUncomment - Concatenate input removing lines that start with '#' Output goes to stdout usage: catUncomment file(s) ================================================================ ======== chainAntiRepeat ==================================== ================================================================ ### kent source version 362 ### chainAntiRepeat - Get rid of chains that are primarily the results of repeats and degenerate DNA usage: chainAntiRepeat tNibDir qNibDir inChain outChain options: -minScore=N - minimum score (after repeat stuff) to pass -noCheckScore=N - score that will pass without checks (speed tweak) ================================================================ ======== chainFilter ==================================== ================================================================ ### kent source version 362 ### chainFilter - Filter chain files. Output goes to standard out.
usage: chainFilter file(s) options: -q=chr1,chr2 - restrict query side sequence to those named -notQ=chr1,chr2 - restrict query side sequence to those not named -t=chr1,chr2 - restrict target side sequence to those named -notT=chr1,chr2 - restrict target side sequence to those not named -id=N - only get one with ID number matching N -minScore=N - restrict to those scoring at least N -maxScore=N - restrict to those scoring less than N -qStartMin=N - restrict to those with qStart at least N -qStartMax=N - restrict to those with qStart less than N -qEndMin=N - restrict to those with qEnd at least N -qEndMax=N - restrict to those with qEnd less than N -tStartMin=N - restrict to those with tStart at least N -tStartMax=N - restrict to those with tStart less than N -tEndMin=N - restrict to those with tEnd at least N -tEndMax=N - restrict to those with tEnd less than N -qOverlapStart=N - restrict to those where the query overlaps a region starting here -qOverlapEnd=N - restrict to those where the query overlaps a region ending here -tOverlapStart=N - restrict to those where the target overlaps a region starting here -tOverlapEnd=N - restrict to those where the target overlaps a region ending here -strand=? 
- restrict strand (to + or -) -long - output in long format -zeroGap - get rid of gaps of length zero -minGapless=N - pass those with minimum gapless block of at least N -qMinGap=N - pass those with minimum gap size of at least N -tMinGap=N - pass those with minimum gap size of at least N -qMaxGap=N - pass those with maximum gap size no larger than N -tMaxGap=N - pass those with maximum gap size no larger than N -qMinSize=N - minimum size of spanned query region -qMaxSize=N - maximum size of spanned query region -tMinSize=N - minimum size of spanned target region -tMaxSize=N - maximum size of spanned target region -noRandom - suppress chains involving '_random' chromosomes -noHap - suppress chains involving '_hap|_alt' chromosomes ================================================================ ======== chainMergeSort ==================================== ================================================================ ### kent source version 362 ### chainMergeSort - Combine sorted files into larger sorted file usage: chainMergeSort file(s) Output goes to standard output options: -saveId - keep the existing chain ids. -inputList=somefile - somefile contains list of input chain files.
-tempDir=somedir/ - somedir has space for temporary sorting data, default ./ ================================================================ ======== chainNet ==================================== ================================================================ ### kent source version 362 ### chainNet - Make alignment nets out of chains usage: chainNet in.chain target.sizes query.sizes target.net query.net where: in.chain is the chain file sorted by score target.sizes contains the size of the target sequences query.sizes contains the size of the query sequences target.net is the output over the target genome query.net is the output over the query genome options: -minSpace=N - minimum gap size to fill, default 25 -minFill=N - default half of minSpace -minScore=N - minimum chain score to consider, default 2000.0 -verbose=N - Alter verbosity (default 1) -inclHap - include query sequences name in the form *_hap*|*_alt*. Normally these are excluded from nets as being haplotype pseudochromosomes ================================================================ ======== chainPreNet ==================================== ================================================================ ### kent source version 362 ### chainPreNet - Remove chains that don't have a chance of being netted usage: chainPreNet in.chain target.sizes query.sizes out.chain options: -dots=N - output a dot every so often -pad=N - extra to pad around blocks to decrease trash (default 1) -inclHap - include query sequences name in the form *_hap*|*_alt*. Normally these are excluded from nets as being haplotype pseudochromosomes ================================================================ ======== chainSort ==================================== ================================================================ ### kent source version 362 ### chainSort - Sort chains. By default sorts by score. Note this loads all chains into memory, so it is not suitable for large sets. 
Instead, run chainSort on multiple small files, followed by chainMergeSort. usage: chainSort inFile outFile Note that inFile and outFile can be the same options: -target sort on target start rather than score -query sort on query start rather than score -index=out.tab build simple two column index file <out file position> <value> where <value> is score, target, or query depending on the sort. ================================================================ ======== chainSplit ==================================== ================================================================ ### kent source version 362 ### chainSplit - Split chains up by target or query sequence usage: chainSplit outDir inChain(s) options: -q - Split on query (default is on target) -lump=N Lump together so have only N split files. ================================================================ ======== chainStitchId ==================================== ================================================================ ### kent source version 362 ### chainStitchId - Join chain fragments with the same chain ID into a single chain per ID. Chain fragments must be from same original chain but must not overlap. Chain fragment scores are summed. 
usage: chainStitchId in.chain out.chain ================================================================ ======== chainSwap ==================================== ================================================================ chainSwap - Swap target and query in chain usage: chainSwap in.chain out.chain ================================================================ ======== chainToAxt ==================================== ================================================================ ### kent source version 362 ### chainToAxt - Convert from chain to axt file usage: chainToAxt in.chain tNibDirOr2bit qNibDirOr2bit out.axt options: -maxGap=maximum gap size allowed without breaking, default 100 -maxChain=maximum chain size allowed without breaking, default 1073741823 -minScore=minimum score of chain -minId=minimum percentage ID within blocks -bed Output bed instead of axt ================================================================ ======== chainToPsl ==================================== ================================================================ chainToPsl - Convert chain file to psl format usage: chainToPsl in.chain tSizes qSizes target.lst query.lst out.psl Where tSizes and qSizes are tab-delimited files with <seqName><size> columns.
The target and query lists can either be fasta files, nib files, 2bit files or a list of fasta, 2bit and/or nib files one per line ================================================================ ======== chainToPslBasic ==================================== ================================================================ ### kent source version 362 ### chainToPslBasic - Basic conversion chain file to psl format usage: chainToPslBasic in.chain out.psl If you need match and mismatch stats updated, pipe output through pslRecalcMatch ================================================================ ======== checkAgpAndFa ==================================== ================================================================ ### kent source version 362 ### checkAgpAndFa - takes a .agp file and .fa file and ensures that they are in synch usage: checkAgpAndFa in.agp in.fa options: -exclude=seq - Ignore seq (e.g. chrM for which we usually get sequence from GenBank but don't have AGP) in.fa can be a .2bit file. If it is .fa then sequences must appear in the same order in .agp and .fa. ================================================================ ======== checkCoverageGaps ==================================== ================================================================ ### kent source version 362 ### checkCoverageGaps - Check for biggest gap in coverage for a list of tracks. For most tracks coverage of 10,000,000 or more will indicate that there was a mistake in generating the track. usage: checkCoverageGaps database track1 ... 
trackN Note: for bigWig and bigBeds, the biggest gap is rounded to the nearest 10,000 or so options: -allParts If set then include _hap and _random and other weird chroms -female If set then don't check chrY -noComma - Don't put commas in biggest gap output ================================================================ ======== checkHgFindSpec ==================================== ================================================================ ### kent source version 362 ### checkHgFindSpec - test and describe search specs in hgFindSpec tables. usage: checkHgFindSpec database [options | termToSearch] If given a termToSearch, displays the list of tables that will be searched and how long it took to figure that out; then performs the search and the time it took. options: -showSearches Show the order in which tables will be searched in general. [This will be done anyway if no termToSearch or options are specified.] -checkTermRegex For each search spec that includes a regular expression for terms, make sure that all values of the table field to be searched match the regex. (If not, some of them could be excluded from searches.) -checkIndexes Make sure that an index is defined on each field to be searched. ================================================================ ======== checkTableCoords ==================================== ================================================================ ### kent source version 362 ### checkTableCoords - check invariants on genomic coords in table(s). usage: checkTableCoords database [tableName] Searches for illegal genomic coordinates in all tables in database unless narrowed down using options. Uses ~/.hg.conf to determine genome database connection info. For psl/alignment tables, checks target coords only. options: -table=tableName Check this table only. (Default: all tables) -daysOld=N Check tables that have been modified at most N days ago. -hoursOld=N Check tables that have been modified at most N hours ago.
(days and hours are additive) -exclude=patList Exclude tables matching any pattern in comma-separated patList. patList can contain wildcards (*?) but should be escaped or single-quoted if it does. patList can contain "genbank" which will be expanded to all tables generated by the automated genbank build process. -ignoreBlocks To save time (but lose coverage), skip block coord checks. -verboseBlocks Print out more details about illegal block coords, since they can't be found by simple SQL queries. ================================================================ ======== chopFaLines ==================================== ================================================================ chopFaLines - Read in FA file with long lines and rewrite it with shorter lines usage: chopFaLines in.fa out.fa ================================================================ ======== chromGraphFromBin ==================================== ================================================================ ### kent source version 362 ### chromGraphFromBin - Convert chromGraph binary to ascii format. usage: chromGraphFromBin in.chromGraph out.tab options: -chrom=chrX - restrict output to single chromosome ================================================================ ======== chromGraphToBin ==================================== ================================================================ ### kent source version 362 ### chromGraphToBin - Make binary version of chromGraph. usage: chromGraphToBin in.tab out.chromGraph options: -xxx=XXX ================================================================ ======== colTransform ==================================== ================================================================ colTransform - Add and/or multiply column by constant. usage: colTransform column input.tab addFactor mulFactor output.tab where: column is the column to transform, starting with 1 input.tab is the tab delimited input file addFactor is what to add. 
Use 0 here to not change anything mulFactor is what to multiply by. Use 1 here not to change anything output.tab is the tab delimited output file ================================================================ ======== countChars ==================================== ================================================================ countChars - Count the number of occurrences of a particular char usage: countChars char file(s) Char can either be a two digit hexadecimal value or a single letter literal character ================================================================ ======== crTreeIndexBed ==================================== ================================================================ ### kent source version 362 ### crTreeIndexBed - Create an index for a bed file. usage: crTreeIndexBed in.bed out.cr options: -blockSize=N - number of children per node in index tree. Default 1024 -itemsPerSlot=N - number of items per index slot. Default is half block size -noCheckSort - Don't check sorting order of in.tab ================================================================ ======== crTreeSearchBed ==================================== ================================================================ ### kent source version 362 ### crTreeSearchBed - Search a crTree indexed bed file and print all items that overlap query. usage: crTreeSearchBed file.bed index.cr chrom start end ================================================================ ======== dbSnoop ==================================== ================================================================ ### kent source version 362 ### dbSnoop - Produce an overview of a database. 
usage: dbSnoop database output options: -unsplit - if set will merge together tables split by chromosome -noNumberCommas - if set will leave out commas in big numbers -justSchema - only schema parts, no contents -skipTable=tableName - if set skip a given table name -profile=profileName - use profile for connection settings, default = 'db' ================================================================ ======== dbTrash ==================================== ================================================================ ### kent source version 362 ### dbTrash - drop tables from a database older than specified N hours usage: dbTrash -age=N [-drop] [-historyToo] [-db=<DB>] [-verbose=N] options: -age=N - number of hours old to qualify for drop. N can be a float. -drop - actually drop the tables, default is merely to display tables. -db=<DB> - Specify a database to work with, default is customTrash. -historyToo - also consider the table called 'history' for deletion. - default is to leave 'history' alone no matter how old. - this applies to the table 'metaInfo' also. -extFile - check extFile for lines that reference files - no longer in trash -extDel - delete lines in extFile that fail file check - otherwise just verbose(2) lines that would be deleted -topDir - directory name to prepend to file names in extFile - default is /usr/local/apache/trash - file names in extFile are typically: "../trash/ct/..." -tableStatus - use 'show table status' to get size data, very inefficient -delLostTable - delete tables that exist but are missing from metaInfo - this operation can be even slower than -tableStatus - if there are many tables to check. -verbose=N - 2 == show arguments, dates, and dropped tables, - 3 == show date information for all tables. 
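
The dbTrash selection rule described above (age threshold in hours, 'history' and 'metaInfo' left alone unless -historyToo is given) can be sketched in Python. This is only an illustration of the documented rule, not the tool's implementation: the function name, the dictionary of last-use times, and the `now` argument are hypothetical, and the real program reads its data from the database.

```python
from datetime import datetime, timedelta

# Tables that are left alone unless -historyToo is given, per the usage text.
PROTECTED_TABLES = {"history", "metaInfo"}


def tables_to_drop(last_used, age_hours, now, history_too=False):
    """Return the sorted names of tables older than age_hours.

    last_used maps table name -> datetime of last use. This mapping is a
    hypothetical stand-in for what the real tool looks up in the database.
    """
    cutoff = now - timedelta(hours=age_hours)  # age_hours may be a float
    expired = []
    for name, when in last_used.items():
        if name in PROTECTED_TABLES and not history_too:
            continue
        if when < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    now = datetime(2024, 9, 24, 16, 0)
    demo = {
        "ct_old": now - timedelta(hours=80),
        "ct_new": now - timedelta(hours=1.5),
        "history": now - timedelta(hours=500),
    }
    # Like running without -drop: only report what would qualify.
    print(tables_to_drop(demo, age_hours=72.5, now=now))
```

Returning the list without deleting anything mirrors the documented default, where tables are merely displayed unless -drop is specified.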
================================================================ ======== estOrient ==================================== ================================================================ ### kent source version 362 ### usage: estOrient [options] db estTable outPsl Read ESTs from a database and determine orientation based on estOrientInfo table or direction in gbCdnaInfo table. Update PSLs so that the strand reflects the direction of transcription. By default, PSLs where the direction can't be determined are dropped. Options: -chrom=chr - process this chromosome, may be repeated -keepDisoriented - don't drop ESTs where orientation can't be determined. -disoriented=psl - output ESTs where orientation can't be determined to this file. -inclVer - add NCBI version number to accession if not already present. -fileInput - estTable is a psl file -estOrientInfo=file - instead of getting the orientation information from the estOrientInfo table, load it from this file. This data is the output of polyInfo command. If this option is specified, the direction will not be looked up in the gbCdnaInfo table and db can be `no'. -info=infoFile - write information about each EST to this tab separated file qName tName tStart tEnd origStrand newStrand orient where orient is < 0 if PSL was reverse, > 0 if it was left unchanged and 0 if the orientation couldn't be determined (and was left unchanged). ================================================================ ======== expMatrixToBarchartBed ==================================== ================================================================ usage: expMatrixToBarchartBed [-h] [--groupOrderFile GROUPORDERFILE] [--useMean] [--verbose] sampleFile matrixFile bedFile outputFile Generate a barChart bed6+5 file from a matrix, meta data, and coordinates. 
positional arguments: sampleFile Two column no header, the first column is the samples which should match the matrix, the second is the grouping (cell type, tissue, etc) matrixFile The input matrix file. The samples in the first row should exactly match the ones in the sampleFile. The labels (ex ENST*****) in the first column should exactly match the ones in the bed file. bedFile Bed6+1 format. File that maps the column labels from the matrix to coordinates. Tab separated; chr, start coord, end coord, label, score, strand, gene name. The score column is ignored. outputFile The output file, bed 6+5 format. See the schema in kent/src/hg/lib/barChartBed.as. optional arguments: -h, --help show this help message and exit --groupOrderFile GROUPORDERFILE Optional file to define the group order, list the groups in a single column in the order desired. The default ordering is alphabetical. --useMean Calculate the group values using mean rather than median. --verbose Show runtime messages. ================================================================ ======== faAlign ==================================== ================================================================ ### kent source version 362 ### faAlign - Align two fasta files usage: faAlign target.fa query.fa output.axt options: -dna - use DNA scoring scheme ================================================================ ======== faCmp ==================================== ================================================================ ### kent source version 362 ### faCmp - Compare two .fa files usage: faCmp [options] a.fa b.fa options: -softMask - use the soft masking information during the compare Differences will be noted if the masking is different. -sortName - sort input files by name before comparing -peptide - read as peptide sequences default: no masking information is used during compare. It is as if both sequences were not masked. 
Exit codes: - 0 if files are the same - 1 if files differ - 255 on an error ================================================================ ======== faCount ==================================== ================================================================ ### kent source version 362 ### faCount - count base statistics and CpGs in FA files. usage: faCount file(s).fa -summary show only summary statistics -dinuc include statistics on dinucleotide frequencies -strands count bases on both strands ================================================================ ======== faFilter ==================================== ================================================================ ### kent source version 362 ### faFilter - Filter fa records, selecting ones that match the specified conditions usage: faFilter [options] in.fa out.fa Options: -name=wildCard - Only pass records where name matches wildcard * matches any string or no character. ? matches any single character. anything else must match the character exactly (these will need to be quoted for the shell) -namePatList=filename - A list of regular expressions, one per line, that will be applied to the fasta name the same as -name -v - invert match, select non-matching records. -minSize=N - Only pass sequences at least this big. -maxSize=N - Only pass sequences this size or smaller. -maxN=N Only pass sequences with fewer than this number of N's -uniq - Removes duplicate sequence ids, keeping the first. -i - make -uniq ignore case so sequence IDs ABC and abc count as dupes. All specified conditions must pass to pass a sequence. If no conditions are specified, all records will be passed. 
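
The way the faFilter conditions combine can be illustrated with a short Python sketch. It follows the option descriptions above and is not the program's own code. Several details are assumptions: -v is taken to invert only the name match, N's are counted in either case, and -uniq tracks every id read rather than only the ids that passed. The function and parameter names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a * / ? wildcard into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")      # any string, including the empty one
        elif ch == "?":
            parts.append(".")       # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def filter_records(records, name=None, invert=False, min_size=None,
                   max_size=None, max_n=None, uniq=False, ignore_case=False):
    """Return the (name, sequence) pairs that pass every given condition."""
    name_re = wildcard_to_regex(name) if name is not None else None
    seen = set()
    kept = []
    for rec_name, seq in records:
        key = rec_name.lower() if ignore_case else rec_name
        is_dup = key in seen
        seen.add(key)
        if uniq and is_dup:
            continue                # keep only the first record per id
        if name_re is not None:
            matched = name_re.fullmatch(rec_name) is not None
            if matched == invert:   # assumption: -v flips the name test only
                continue
        if min_size is not None and len(seq) < min_size:
            continue
        if max_size is not None and len(seq) > max_size:
            continue
        if max_n is not None and seq.upper().count("N") >= max_n:
            continue                # "fewer than" means strictly below max_n
        kept.append((rec_name, seq))
    return kept


if __name__ == "__main__":
    demo = [("chr1", "ACGTNN"), ("chr2", "AC"), ("chrUn_1", "ACGTACGT")]
    print(filter_records(demo, name="chr?", min_size=3))
```

With no conditions given the function returns every record, matching the last sentence of the usage text.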
================================================================ ======== faFilterN ==================================== ================================================================ faFilterN - Get rid of sequences with too many N's usage: faFilterN in.fa out.fa maxPercentN options: -out=in.fa.out -uniq=self.psl ================================================================ ======== faFrag ==================================== ================================================================ faFrag - Extract a piece of DNA from a .fa file. usage: faFrag in.fa start end out.fa options: -mixed - preserve mixed-case in FASTA file ================================================================ ======== faNoise ==================================== ================================================================ faNoise - Add noise to .fa file usage: faNoise inName outName transitionPpt transversionPpt insertPpt deletePpt chimeraPpt options: -upper - output in upper case ================================================================ ======== faOneRecord ==================================== ================================================================ faOneRecord - Extract a single record from a .FA file usage: faOneRecord in.fa recordName ================================================================ ======== faPolyASizes ==================================== ================================================================ ### kent source version 362 ### faPolyASizes - get poly A sizes usage: faPolyASizes in.fa out.tab output file has four columns: id seqSize tailPolyASize headPolyTSize options: ================================================================ ======== faRandomize ==================================== ================================================================ ### kent source version 362 ### faRandomize - Program to create random fasta records usage: faRandomize [-seed=N] in.fa randomized.fa Use optional -seed argument to specify seed 
(integer) for random number generator (rand). Generated sequence has the same base frequency as seen in original fasta records. ================================================================ ======== faRc ==================================== ================================================================ faRc - Reverse complement a FA file usage: faRc in.fa out.fa In.fa and out.fa may be the same file. options: -keepName - keep name identical (don't prepend RC) -keepCase - works well for ACGTUN in either case. bizarre for other letters. without it bases are turned to lower, all else to n's -justReverse - prepends R unless asked to keep name -justComplement - prepends C unless asked to keep name (cannot appear together with -justReverse) ================================================================ ======== faSize ==================================== ================================================================ ### kent source version 362 ### faSize - print total base count in fa files. usage: faSize file(s).fa Command flags -detailed outputs name and size of each record has the side effect of printing nothing else -tab output statistics in a tab separated format ================================================================ ======== faSomeRecords ==================================== ================================================================ ### kent source version 362 ### faSomeRecords - Extract multiple fa records usage: faSomeRecords in.fa listFile out.fa options: -exclude - output sequences not in the list file. ================================================================ ======== faSplit ==================================== ================================================================ ### kent source version 362 ### faSplit - Split an fa file into several files. usage: faSplit how input.fa count outRoot where how is either 'about' 'byname' 'base' 'gap' 'sequence' or 'size'. 
Files split by sequence will be broken at the nearest fa record boundary. Files split by base will be broken at any base. Files broken by size will be broken every count bases. Examples: faSplit sequence estAll.fa 100 est This will break up estAll.fa into 100 files (numbered est001.fa, est002.fa, ... est100.fa). Files will only be broken at fa record boundaries. faSplit base chr1.fa 10 1_ This will break up chr1.fa into 10 files faSplit size input.fa 2000 outRoot This breaks up input.fa into 2000 base chunks faSplit about est.fa 20000 outRoot This will break up est.fa into files of about 20000 bytes each by record. faSplit byname scaffolds.fa outRoot/ This breaks up scaffolds.fa using sequence names as file names. Use the terminating / on the outRoot to get it to work correctly. faSplit gap chrN.fa 20000 outRoot This breaks up chrN.fa into files of at most 20000 bases each, at gap boundaries if possible. If the sequence ends in N's, the last piece, if larger than 20000, will be all one piece. Options: -verbose=2 - Write names of each file created (=3 more details) -maxN=N - Suppress pieces with more than maxN n's. Only used with size. default is size-1 (only suppresses pieces that are all N). -oneFile - Put output in one file. Only used with size -extra=N - Add N extra bytes at the end to form overlapping pieces. Only used with size. -out=outFile Get masking from outfile. Only used with size. -lift=file.lft Put info on how to reconstruct sequence from pieces in file.lft. Only used with size and gap. -minGapSize=X Consider a block of Ns to be a gap if block size >= X. Default value 1000. Only used with gap. -noGapDrops - include all N's when splitting by gap. -outDirDepth=N Create N levels of output directory under current dir. This helps prevent NFS problems with a large number of files in a directory. Using -outDirDepth=3 would produce ./1/2/3/outRoot123.fa. -prefixLength=N - used with byname option. 
create a separate output file for each group of sequences names with same prefix of length N. ================================================================ ======== faToFastq ==================================== ================================================================ ### kent source version 362 ### faToFastq - Convert fa to fastq format, just faking quality values. usage: faToFastq in.fa out.fastq options: -qual=X quality letter to use. Default is '<' which is good I think.... ================================================================ ======== faToTab ==================================== ================================================================ faToTab - convert fa file to tab separated file usage: faToTab infileName outFileName options: -type=seqType sequence type, dna or protein, default is dna -keepAccSuffix - don't strip dot version off of sequence id, keep as is ================================================================ ======== faToTwoBit ==================================== ================================================================ ### kent source version 362 ### faToTwoBit - Convert DNA from fasta to 2bit format usage: faToTwoBit in.fa [in2.fa in3.fa ...] out.2bit options: -long use 64-bit offsets for index. Allow for twoBit to contain more than 4Gb of sequence. NOT COMPATIBLE WITH OLDER CODE. -noMask Ignore lower-case masking in fa file. -stripVersion Strip off version number after '.' for GenBank accessions. -ignoreDups Convert first sequence only if there are duplicate sequence names. Use 'twoBitDup' to find duplicate sequences. 
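
The effect of -stripVersion and -ignoreDups on sequence names can be sketched as follows. This is a hedged illustration rather than faToTwoBit's own logic. It assumes that the version is everything from the first '.' onward, that duplicates are detected after stripping, and that a duplicate without -ignoreDups is an error. The function name is hypothetical.

```python
def plan_sequence_names(names, strip_version=False, ignore_dups=False):
    """Return the sequence names that would be stored, in input order.

    Raises ValueError on a duplicate name unless ignore_dups is set, in
    which case only the first occurrence is kept.
    """
    stored = []
    seen = set()
    for name in names:
        if strip_version:
            # Assumption: "NM_000546.5" becomes "NM_000546".
            name = name.partition(".")[0]
        if name in seen:
            if ignore_dups:
                continue            # convert the first sequence only
            raise ValueError("duplicate sequence name: %s" % name)
        seen.add(name)
        stored.append(name)
    return stored


if __name__ == "__main__":
    demo = ["NM_000546.5", "NM_000546.6", "chr1"]
    print(plan_sequence_names(demo, strip_version=True, ignore_dups=True))
```

The sketch shows that stripping versions can create duplicate names that did not exist before, which is one situation where -ignoreDups or a prior check with twoBitDup may matter.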
================================================================ ======== faTrans ==================================== ================================================================ ### kent source version 362 ### faTrans - Translate DNA .fa file to peptide usage: faTrans in.fa out.fa options: -stop stop at first stop codon (otherwise puts in Z for stop codons) -offset=N start at a particular offset. -cdsUpper - cds is in upper case ================================================================ ======== fastqStatsAndSubsample ==================================== ================================================================ ### kent source version 362 ### fastqStatsAndSubsample v2 - Go through a fastq file doing sanity checks and collecting stats and also producing a smaller fastq out of a sample of the data. The fastq input may be compressed with gzip or bzip2. usage: fastqStatsAndSubsample in.fastq out.stats out.fastq options: -sampleSize=N - default 100000 -seed=N - Use given seed for random number generator. Default 0. -smallOk - Not an error if less than sampleSize reads. out.fastq will be entire in.fastq -json - out.stats will be in json rather than text format Use /dev/null for out.fastq and/or out.stats if not interested in these outputs ================================================================ ======== fastqToFa ==================================== ================================================================ ### kent source version 362 ### # no name checks will be made on lines beginning with @ # ignore quality scores # using default Phred quality score algorithm # all errors will cause exit fastqToFa - Convert from fastq to fasta format. 
usage: fastqToFa [options] in.fastq out.fa options: -nameVerify='string' - for multi-line fastq files, 'string' must match somewhere in the sequence names in order to correctly identify the next sequence block (e.g.: -nameVerify='Supercontig_') -qual=file.qual.fa - output quality scores to specified file (default: quality scores are ignored) -qualSizes=qual.sizes - write sizes file for the quality scores -noErrors - warn only on problems, do not error out (specify -verbose=3 to see warnings) -solexa - use Solexa/Illumina quality score algorithm (instead of Phred quality) -verbose=2 - set warning level to get some stats output during processing ================================================================ ======== featureBits ==================================== ================================================================ ### kent source version 362 ### featureBits - Correlate tables via bitmap projections. usage: featureBits database table(s) This will return the number of bits in all the tables anded together Pipe warning: output goes to stderr. Options: -bed=output.bed Put intersection into bed format. Can use stdout. -fa=output.fa Put sequence in intersection into .fa file -faMerge For fa output merge overlapping features. -minSize=N Minimum size to output (default 1) -chrom=chrN Restrict to one chromosome -chromSize=sizefile Read chrom sizes from file instead of database. (chromInfo three column format) -or Or tables together instead of anding them -not Output negation of resulting bit set. -countGaps Count gaps in denominator -noRandom Don't include _random (or Un) chromosomes -noHap Don't include _hap|_alt chromosomes -dots=N Output dot every N chroms (scaffolds) processed -minFeatureSize=n Don't include bits of the track that are smaller than minFeatureSize, useful for differentiating between alignment gaps and introns. 
-bin=output.bin Put bin counts in output file -binSize=N Bin size for generating counts in bin file (default 500000) -binOverlap=N Bin overlap for generating counts in bin file (default 250000) -bedRegionIn=input.bed Read in a bed file for bin counts in specific regions and write to bedRegionsOut -bedRegionOut=output.bed Write a bed file of bin counts in specific regions from bedRegionIn -enrichment Calculates coverage and enrichment assuming first table is reference gene track and second track something else Enrichment is the amount of table1 that covers table2 vs. the amount of table1 that covers the genome. It's how much denser table1 is in table2 than it is genome-wide. '-where=some sql pattern' Restrict to features matching some sql pattern You can include a '!' before a table name to negate it. To prevent your shell from interpreting the '!' you will need to use the backslash \!, for example the gap table: \!gap Some table names can be followed by modifiers such as: :exon:N Break into exons and add N to each end of each exon :cds Break into coding exons :intron:N Break into introns, remove N from each end :utr5, :utr3 Break into 5' or 3' UTRs :upstream:N Consider the region of N bases before region :end:N Consider the region of N bases after region :score:N Consider records with score >= N :upstreamAll:N Like upstream, but doesn't filter out genes that have txStart==cdsStart or txEnd==cdsEnd :endAll:N Like end, but doesn't filter out genes that have txStart==cdsStart or txEnd==cdsEnd The tables can be bed, psl, or chain files, or a directory full of such files as well as actual database tables. To count the bits used in dir/chrN_something*.bed you'd do: featureBits database dir/_something.bed NB: by default, featureBits omits gap regions from its calculation of the total number of bases. This requires connecting to a database server using credentials from a .hg.conf file (or similar). 
If such a connection is not available, you will need to specify -countGaps (which skips the database connection) in addition to providing all tables as files or directories. ================================================================ ======== fetchChromSizes ==================================== ================================================================ usage: fetchChromSizes <db> > <db>.chrom.sizes used to fetch chrom.sizes information from UCSC for the given <db> <db> - name of UCSC database, e.g.: hg38, hg18, mm9, etc ... This script expects to find one of the following commands: wget, mysql, or ftp in order to fetch information from UCSC. Route the output to the file <db>.chrom.sizes as indicated above. This data is available at the URL: http://hgdownload.cse.ucsc.edu/goldenPath/<db>/bigZips/<db>.chrom.sizes Example: fetchChromSizes hg38 > hg38.chrom.sizes ================================================================ ======== findMotif ==================================== ================================================================ ### kent source version 362 ### findMotif - find specified motif in sequence usage: findMotif [options] -motif=<acgt...> sequence where: sequence is a .fa , .nib or .2bit file or a file which is a list of sequence files. options: -motif=<acgt...> - search for this specified motif (case ignored, [acgt] only) -chr=<chrN> - process only this one chrN from the sequence -strand=<+|-> - limit to only one strand. Default is both. 
-bedOutput - output bed format (this is the default) -wigOutput - output wiggle data format instead of bed file -verbose=N - set information level [1-4] NOTE: motif must be longer than 4 characters, less than 17 -verbose=4 - will display gaps as bed file data lines to stderr ================================================================ ======== gapToLift ==================================== ================================================================ ### kent source version 362 ### gapToLift - create lift file from gap table(s) usage: gapToLift [options] db liftFile.lft uses gap table(s) from specified db. Writes to liftFile.lft generates lift file segments separated by non-bridged gaps. options: -chr=chrN - work only on given chrom -minGap=M - examine only gaps >= M -insane - do *not* perform coordinate sanity checks on gaps -bedFile=fileName.bed - output segments to fileName.bed -verbose=N - N > 1 see more information about procedure ================================================================ ======== genePredCheck ==================================== ================================================================ ### kent source version 362 ### genePredCheck - validate genePred files or tables usage: genePredCheck [options] fileTbl .. If fileTbl is an existing file, then it is checked. Otherwise, if -db is provided, then a table by this name in db is checked. options: -db=db - If specified, then this database is used to get chromosome sizes, and perhaps the table to check. -chromSizes=file.chrom.sizes - use chrom sizes from tab separated file (name,size) instead of from chromInfo table in specified db. 
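
To make the role of the chrom sizes file concrete, here is a minimal Python sketch that loads a tab separated (name,size) file and applies a few coordinate checks to one genePred line. The usage text does not list the checks genePredCheck performs, so the specific tests below are illustrative assumptions based on the standard 10-column genePred layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds). The function names are hypothetical.

```python
def parse_chrom_sizes(lines):
    """Read tab separated (name, size) lines into a dict."""
    sizes = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, size = line.split("\t")[:2]
        sizes[name] = int(size)
    return sizes


def check_gene_pred(line, chrom_sizes):
    """Return a list of problems found in one genePred line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return ["expected at least 10 fields, found %d" % len(fields)]
    chrom = fields[1]
    tx_start, tx_end, cds_start, cds_end, exon_count = (
        int(x) for x in fields[3:8])
    # exonStarts and exonEnds are comma separated with a trailing comma.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    problems = []
    if chrom not in chrom_sizes:
        problems.append("unknown chrom %s" % chrom)
    elif tx_end > chrom_sizes[chrom]:
        problems.append("txEnd %d past end of %s" % (tx_end, chrom))
    if not 0 <= tx_start < tx_end:
        problems.append("txStart must be >= 0 and < txEnd")
    if not tx_start <= cds_start <= cds_end <= tx_end:
        problems.append("CDS not inside transcript")
    if not len(starts) == len(ends) == exon_count:
        problems.append("exonCount does not match exon lists")
    prev_end = tx_start
    for start, end in zip(starts, ends):
        # Exons must be ordered, non-overlapping and inside the transcript.
        if not prev_end <= start < end <= tx_end:
            problems.append("bad exon %d-%d" % (start, end))
        prev_end = end
    return problems


if __name__ == "__main__":
    sizes = parse_chrom_sizes(["chr1\t1000\n"])
    row = "g1\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,"
    print(check_gene_pred(row, sizes))
```

A file produced by fetchChromSizes has the two-column layout that parse_chrom_sizes expects.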
================================================================ ======== genePredFilter ==================================== ================================================================ ### kent source version 362 ### genePredFilter - filter a genePred file usage: genePredFilter [options] genePredIn genePredOut Filter a genePredFile, dropping invalid entries options: -db=db - If specified, then this database is used to get chromosome sizes. -verbose=2 - level >= 2 prints out errors for each problem found. ================================================================ ======== genePredHisto ==================================== ================================================================ ### kent source version 362 ### genePredHisto - get data for generating histograms from a genePred file. usage: genePredHisto [options] what genePredFile histoOut Options: -ids - a second column with the gene name, useful for finding outliers. The what argument indicates the type of output. The output file is a list of numbers suitable for input to textHistogram or similar The following values are currently implemented exonLen- length of exons 5utrExonLen- length of 5'UTR regions of exons cdsExonLen- length of CDS regions of exons 3utrExonLen- length of 3'UTR regions of exons exonCnt- count of exons 5utrExonCnt- count of exons containing 5'UTR cdsExonCnt- count of exons containing CDS 3utrExonCnt- count of exons containing 3'UTR ================================================================ ======== genePredSingleCover ==================================== ================================================================ ### kent source version 362 ### genePredSingleCover - create single-coverage genePred files usage: genePredSingleCover [options] inGenePred outGenePred Create a genePred file that has single CDS coverage of the genome. UTR is allowed to overlap. The default is to keep the gene with the largest number of CDS bases. 
Options: -scores=file - read scores used in selecting genes from this file. It consists of tab separated lines of name chrom txStart score where score is a real or integer number. Higher scoring genes will be chosen over lower scoring ones. Equally scoring genes are chosen by number of CDS bases. If this option is supplied, all genes must be in the file ================================================================ ======== genePredToBed ==================================== ================================================================ ### kent source version 362 ### genePredToBed - Convert from genePred to bed format. Does not yet handle genePredExt usage: genePredToBed in.genePred out.bed options: -xxx=XXX ================================================================ ======== genePredToBigGenePred ==================================== ================================================================ ### kent source version 362 ### genePredToBigGenePred - converts genePred or genePredExt to bigGenePred input (bed format with extra fields) usage: genePredToBigGenePred [-known] [-score=scores] [-geneNames=geneNames] [-colors=colors] file.gp stdout | sort -k1,1 -k2,2n > file.bgpInput NOTE: to build bigBed: bedToBigBed -type=bed12+8 -tab -as=bigGenePred.as file.bgpInput chrom.sizes output.bb options: -known input file is a genePred in knownGene format -score=scores scores is two column file with id's mapping to scores -geneNames=geneNames geneNames is a three column file with id's mapping to two gene names -colors=colors colors is a four column file with id's mapping to r,g,b -cds=cds cds is a five column file with id's mapping to cds status codes and exonFrames (see knownCds.as) ================================================================ ======== genePredToFakePsl ==================================== ================================================================ ### kent source version 362 ### genePredToFakePsl - Create a psl of fake-mRNA aligned to 
gene-preds from a file or table. usage: genePredToFakePsl [options] db fileTbl pslOut cdsOut If fileTbl is an existing file, then it is used. Otherwise, the table by this name is used. pslOut specifies the fake-mRNA output psl filename. cdsOut specifies the output cds tab-separated file which contains genbank-style CDS records showing cdsStart..cdsEnd e.g. NM_123456 34..305 options: -chromSize=sizefile Read chrom sizes from file instead of database sizefile contains two white space separated fields per line: chrom name and size -qSizes=qSizesFile Read in query sizes to fixup qSize and qStarts ================================================================ ======== genePredToGtf ==================================== ================================================================ ### kent source version 362 ### genePredToGtf - Convert genePred table or file to gtf. usage: genePredToGtf database genePredTable output.gtf If database is 'file' then track is interpreted as a file rather than a table in database. options: -utr - Add 5UTR and 3UTR features -honorCdsStat - use cdsStartStat/cdsEndStat when defining start/end codon records -source=src set source name to use -addComments - Add comments before each set of transcript records. allows for easier visual inspection Note: use a refFlat table or extended genePred table or file to include the gene_name attribute in the output. This will not work with a refFlat table dump file. 
If you are using a genePred file that starts with a numeric bin column, drop it using the UNIX cut command: cut -f 2- in.gp | genePredToGtf file stdin out.gtf ================================================================ ======== genePredToMafFrames ==================================== ================================================================ ### kent source version 362 ### wrong # args genePredToMafFrames - create mafFrames tables from genePreds genePredToMafFrames [options] targetDb maf mafFrames geneDb1 genePred1 [geneDb2 genePred2...] Create frame annotations for one or more components of a MAF. It is significantly faster to process multiple gene sets in the same run, as 95% of the CPU time is spent reading the MAF Arguments: o targetDb - db of target genome o maf - input MAF file o mafFrames - output file o geneDb1 - db in MAF that corresponds to genePred's organism. o genePred1 - genePred file. Overlapping annotations should have been removed. This file may optionally include frame annotations Options: -bed=file - output a bed for each mafFrame region, useful for debugging. -verbose=level - enable verbose tracing, the following levels are implemented: 3 - print information about data used to compute each record. 4 - dump information about the gene mappings that were constructed 5 - dump information about the gene mappings after split processing 6 - dump information about the gene mappings after frame linking ================================================================ ======== genePredToProt ==================================== ================================================================ ### kent source version 362 ### genePredToProt - create protein sequences by translating gene annotations usage: genePredToProt genePredFile genomeSeqs protFa This honors frame if genePred has frames, dropping partial codons. genomeSeqs is a 2bit or directory of nib files. options: -cdsFa=fasta - output FASTA with CDS that was used to generate protein.
This will not include dropped partial codons. -protIdSuffix=str - add this string to the end of the name for protein FASTA -cdsIdSuffix=str - add this string to the end of the name for CDS FASTA -translateSeleno - assume internal TGA code for selenocysteine and translate to `U'. -includeStop - If the CDS ends with a stop codon, represent it as a `*' -starForInframeStops - use `*' instead of `X' for in-frame stop codons. This will result in selenocysteine's being `*', with only codons containing `N' being translated to `X'. This doesn't include terminal stop ================================================================ ======== gensub2 ==================================== ================================================================ gensub2 - version 12.18 Generate condor submission file from template and two file lists. Usage: gensub2 <file list 1> <file list 2> <template file> <output file> This will substitute each file in the file lists for $(path1) and $(path2) in the template between #LOOP and #ENDLOOP, and write the results to the output. Other substitution variables are: $(path1) - Full path name of first file. $(path2) - Full path name of second file. $(dir1) - First directory. Includes trailing slash if any. $(dir2) - Second directory. $(lastDir1) - The last directory in the first path. Includes trailing slash if any. $(lastDir2) - The last directory in the second path. Includes trailing slash if any. $(lastDirs1=<n>) - The last n directories in the first path. $(lastDirs2=<n>) - The last n directories in the second path. $(root1) - First file name without directory or extension. $(root2) - Second file name without directory or extension. $(ext1) - First file extension. $(ext2) - Second file extension. $(file1) - Name without dir of first file. $(file2) - Name without dir of second file. $(num1) - Index of first file in list. $(num2) - Index of second file in list. 
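The substitution variables listed above can be illustrated with a short Python sketch. It is not gensub2's implementation: the helper names `path_vars` and `expand`, the sample paths, and the `myAligner` command are hypothetical. Details the usage text leaves open (whether `$(ext1)` carries the leading dot, whether `$(num1)` counts from 0) are assumptions marked in the comments, and `$(lastDirs1=<n>)` is omitted.

```python
import os


def path_vars(path, index, suffix):
    """Build gensub2-style variables for one file path.

    suffix is "1" or "2" (which file list the path came from);
    index is the file's position in that list (0-based is an assumption).
    """
    directory, file_name = os.path.split(path)
    root, ext = os.path.splitext(file_name)  # ext keeps the dot (assumption)
    # Directory with a trailing slash, or "" when the path has no directory.
    if directory and not directory.endswith("/"):
        directory += "/"
    # Last directory component, again with a trailing slash if present.
    last_dir = os.path.basename(directory.rstrip("/"))
    if last_dir:
        last_dir += "/"
    values = {
        "path": path, "dir": directory, "lastDir": last_dir,
        "root": root, "ext": ext, "file": file_name, "num": str(index),
    }
    return {"$(%s%s)" % (name, suffix): value for name, value in values.items()}


def expand(template_line, path1, num1, path2, num2):
    """Substitute both sets of variables into one template line."""
    variables = path_vars(path1, num1, "1")
    variables.update(path_vars(path2, num2, "2"))
    for name, value in variables.items():
        template_line = template_line.replace(name, value)
    return template_line


# Hypothetical template line and input paths, for illustration only.
line = "myAligner $(path1) $(path2) out/$(lastDir1)$(root1)_$(root2).psl"
print(expand(line, "/data/target/chr1.fa", 0, "/data/query/reads.fa", 0))
```

In the real tool the substitution applies to the template lines between #LOOP and #ENDLOOP, once per pair of files; the sketch expands a single line for a single pair.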
The <file list 2> parameter can be 'single' if there is only one file list and 'selfPair' if there is a single list, but you want all pairs of the single list with itself. By default the order is diagonal, meaning if the first list is ABC and the second list is abc the combined order is Aa Ba Ab Ca Bb Ac Cb Bc Cc. This tends to put the largest jobs first if the file lists are both sorted by size. The following options can change this: -group1 - write elements in order Aa Ab Ac Ba Bb Bc Ca Cb Cc -group2 - write elements in order Aa Ba Ca Ab Bb Cb Ac Bc Cc template file syntax help for check statement: {check 'when' 'what' <file>} where 'when' is either 'in' or 'out' and 'what' is one of: 'exists' 'exists+' 'line' 'line+' 'exists' means file exists, may be zero size 'exists+' means file exists and is non-zero size 'line' means file may have 0 or more lines of ascii data and is properly line-feed terminated 'line+' means file is 1 or more lines of ascii data and is properly line-feed terminated ================================================================ ======== getRna ==================================== ================================================================ ### kent source version 362 ### getRna - Get mrna for GenBank or RefSeq sequences found in a database usage: getRna [options] database accFile outfa Get mrna for all accessions in accFile, writing to a fasta file. If accession has a version, that version is returned or an error is generated Options: -cdsUpper - lookup CDS and output it as upper case. If CDS annotation can't be obtained, the sequence is skipped with a warning. -cdsUpperAll - like -cdsUpper, except keep sequences without CDS -inclVer - include version with sequence id.
-peptides - translate mRNAs to peptides ================================================================ ======== getRnaPred ==================================== ================================================================ ### kent source version 362 ### getRnaPred - Get virtual RNA for gene predictions usage: getRnaPred [options] database table chromosome output.fa table can be a table or a file. Specify chromosome of 'all' to process all chromosomes options: -weird - only get ones with weird splice sites -cdsUpper - output CDS in upper case -cdsOnly - only output CDS -cdsOut=file - write CDS to this tab-separated file, in the form acc start end where start..end are genbank style, one-based coordinates -keepMasking - un/masked in upper/lower case. -pslOut=psl - output PSLs for the virtual mRNAs. Allows virtual mRNA to be analyzed by tools that work on PSLs -suffix=suf - append suffix to each id to avoid confusion with mRNAs used to define the genes. -peptides - output the translation of the CDS to a peptide sequence. The newer program genePredToProt may produce better results in cases where there are frame-shifting indels in the CDS. -exonIndices - output indices of exon boundaries after sequence name, e.g., "103 243 290" says positions 1-103 are from the first exon, positions 104-243 are from the second exon, etc. -maxSize=size - output a maximum of size characters. Useful when testing gene predictions by RT-PCR. -genomeSeqs=spec - get genome sequences from the specified nib directory or 2bit file instead of going through the path found in chromInfo. -includeCoords - include the genomic coordinates as a comment in the fasta header. This is necessary when there are multiple genePreds with the same name.
-genePredExt - (for use with -peptides) use extended genePred format, and consider frame information when translating (Warning: only considers offset at 5' end, not frameshifts between blocks) ================================================================ ======== gfClient ==================================== ================================================================ ### kent source version 362 ### gfClient v. 36x2 - A client for the genomic finding program that produces a .psl file usage: gfClient host port seqDir in.fa out.psl where host is the name of the machine running the gfServer port is the same port that you started the gfServer with seqDir is the path of the .nib or .2bit files relative to the current dir (note these are needed by the client as well as the server) in.fa is a fasta format file. May contain multiple records out.psl is where to put the output options: -t=type Database type. Type is one of: dna - DNA sequence prot - protein sequence dnax - DNA sequence translated in six frames to protein The default is dna. -q=type Query type. Type is one of: dna - DNA sequence rna - RNA sequence prot - protein sequence dnax - DNA sequence translated in six frames to protein rnax - DNA sequence translated in three frames to protein -prot Synonymous with -t=prot -q=prot. -dots=N Output a dot every N query sequences. -nohead Suppresses 5-line psl header. -minScore=N Sets minimum score. This is twice the matches minus the mismatches minus some sort of gap penalty. Default is 30. -minIdentity=N Sets minimum sequence identity (in percent). Default is 90 for nucleotide searches, 25 for protein or translated protein searches. -out=type Controls output file format. Type is one of: psl - Default. 
Tab-separated format without actual sequence pslx - Tab-separated format with sequence axt - blastz-associated axt format maf - multiz-associated maf format sim4 - similar to sim4 format wublast - similar to wublast format blast - similar to NCBI blast format blast8- NCBI blast tabular format blast9 - NCBI blast tabular format with comments -maxIntron=N Sets maximum intron size. Default is 750000. ================================================================ ======== gfServer ==================================== ================================================================ ### kent source version 362 ### gfServer v 36x2 - Make a server to quickly find where DNA occurs in genome To set up a server: gfServer start host port file(s) where the files are .nib or .2bit format files specified relative to the current directory To remove a server: gfServer stop host port To query a server with DNA sequence: gfServer query host port probe.fa To query a server with protein sequence: gfServer protQuery host port probe.fa To query a server with translated DNA sequence: gfServer transQuery host port probe.fa To query server with PCR primers: gfServer pcr host port fPrimer rPrimer maxDistance To process one probe fa file against a .nib format genome (not starting server): gfServer direct probe.fa file(s).nib To test PCR without starting server: gfServer pcrDirect fPrimer rPrimer file(s).nib To figure out usage level: gfServer status host port To get input file list: gfServer files host port options: -tileSize=N Size of n-mers to index. Default is 11 for nucleotides, 4 for proteins (or translated nucleotides). -stepSize=N Spacing between tiles. Default is tileSize. -minMatch=N Number of n-mer matches that trigger detailed alignment. Default is 2 for nucleotides, 3 for proteins. -maxGap=N Number of insertions or deletions allowed between n-mers. Default is 2 for nucleotides, 0 for proteins. -trans Translate database to protein in 6 frames. 
Note: it is best to run this on RepeatMasked data in this case. -log=logFile Keep a log file that records server requests. -seqLog Include sequences in log file (not logged with -syslog). -ipLog Include user's IP in log file (not logged with -syslog). -debugLog Include debugging info in log file. -syslog Log to syslog. -logFacility=facility Log to the specified syslog facility - default local0. -mask Use masking from nib file. -repMatch=N Number of occurrences of a tile (n-mer) that triggers repeat masking the tile. Default is 1024. -maxDnaHits=N Maximum number of hits for a DNA query that are sent from the server. Default is 100. -maxTransHits=N Maximum number of hits for a translated query that are sent from the server. Default is 200. -maxNtSize=N Maximum size of untranslated DNA query sequence. Default is 40000. -maxAaSize=N Maximum size of protein or translated DNA queries. Default is 8000. -canStop If set, a quit message will actually take down the server. ================================================================ ======== gff3ToGenePred ==================================== ================================================================ ### kent source version 362 ### gff3ToGenePred - convert a GFF3 file to a genePred file usage: gff3ToGenePred inGff3 outGp options: -warnAndContinue - on bad genePreds being created, put out warning but continue -useName - rather than using 'id' as name, use the 'name' tag -rnaNameAttr=attr - If this attribute exists on an RNA record, use it as the genePred name column -geneNameAttr=attr - If this attribute exists on a gene record, use it as the genePred name2 column -attrsOut=file - output attributes of mRNA record to file. These are per-genePred row, not per-GFF3 record. They are derived from GFF3 attributes, not the attributes themselves. -processAllGeneChildren - output genePred for all children of a gene regardless of feature -unprocessedRootsOut=file - output GFF3 root records that were not used.
This will not be a valid GFF3 file. It's expected that many non-root records will not be used and they are not reported. -bad=file - output genePreds that fail checks to file -maxParseErrors=50 - Maximum number of parsing errors before aborting. A negative value will allow an unlimited number of errors. Default is 50. -maxConvertErrors=50 - Maximum number of conversion errors before aborting. A negative value will allow an unlimited number of errors. Default is 50. -honorStartStopCodons - only set CDS start/stop status to complete if there are corresponding start_stop codon records -defaultCdsStatusToUnknown - default the CDS status to unknown rather than complete. -allowMinimalGenes - normally this program assumes that genes contain transcripts which contain exons. If this option is specified, genes with exons as direct children of genes and stand alone genes with no exon or transcript children will be converted. This converts: - top-level gene records with RNA records - top-level RNA records - RNA records that contain: - exon and CDS - CDS, five_prime_UTR, three_prime_UTR - only exon for non-coding - top-level gene records with transcript records - top-level transcript records - transcript records that contain: - exon where RNA can be mRNA, ncRNA, or rRNA, and transcript can be either transcript or primary_transcript The first step is to parse the GFF3 file; up to 50 errors are reported before aborting. If the GFF3 file is successfully parsed, it is converted to gene annotation. Up to 50 conversion errors are reported before aborting.
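The two-stage flow described above (parse, then convert, each stage stopping after a bounded number of errors) might be sketched in Python as follows. This only illustrates the documented behaviour and is not code from gff3ToGenePred: the `ErrorBudget` class, the `two_stage` and `_run` helpers, and the toy records are all hypothetical. Whether the real tool stops on the 50th or the 51st error is not stated in the usage text; this sketch stops once the count reaches the limit.

```python
class TooManyErrors(Exception):
    """Raised once a stage has used up its error allowance."""


class ErrorBudget:
    """Collect error messages and abort when a limit is reached."""

    def __init__(self, max_errors=50):
        self.max_errors = max_errors  # a negative value means unlimited
        self.messages = []

    def report(self, message):
        self.messages.append(message)
        if 0 <= self.max_errors <= len(self.messages):
            raise TooManyErrors("%d errors, aborting" % len(self.messages))


def _run(items, func, budget):
    """Apply func to each item, recording ValueErrors in the budget."""
    results = []
    for item in items:
        try:
            results.append(func(item))
        except ValueError as err:
            budget.report(str(err))
    # Any reported error means this stage did not succeed.
    return None if budget.messages else results


def two_stage(records, parse, convert,
              max_parse_errors=50, max_convert_errors=50):
    """Parse every record, then convert only if parsing was error-free.

    parse and convert are hypothetical callables that return a result
    or raise ValueError for a bad record.
    """
    parsed = _run(records, parse, ErrorBudget(max_parse_errors))
    if parsed is None:
        return None
    return _run(parsed, convert, ErrorBudget(max_convert_errors))


# Toy records: strings parsed as integers, then doubled.
print(two_stage(["1", "2"], int, lambda n: n * 2))
print(two_stage(["1", "x"], int, lambda n: n * 2))
```

Keeping a separate budget per stage mirrors the separate -maxParseErrors and -maxConvertErrors options.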
Input file must conform to the GFF3 specification: http://www.sequenceontology.org/gff3.shtml ================================================================ ======== gff3ToPsl ==================================== ================================================================ ### kent source version 362 ### gff3ToPsl - convert a GFF3 CIGAR file to a PSL file usage: gff3ToPsl [options] queryChromSizes targetChromSizes inGff3 out.psl arguments: queryChromSizes file with query (main coordinates) chromosome sizes. File formatted: chromName<tab>chromSize targetChromSizes file with target (Target attribute) chromosome sizes. inGff3 GFF3 formatted file with Gap attribute in match records out.psl PSL formatted output options: -dropQ drop record when query not found in queryChromSizes -dropT drop record when target not found in targetChromSizes The first step is to parse the GFF3 file; up to 50 errors are reported before aborting. If the GFF3 file is successfully parsed, it is converted to PSL. Input file must conform to the GFF3 specification: http://www.sequenceontology.org/gff3.shtml ================================================================ ======== gmtime ==================================== ================================================================ gmtime - convert unix timestamp to date string usage: gmtime <time stamp> <time stamp> - integer 0 to 2147483647 ================================================================ ======== gtfToGenePred ==================================== ================================================================ ### kent source version 362 ### gtfToGenePred - convert a GTF file to a genePred usage: gtfToGenePred gtf genePred options: -genePredExt - create an extended genePred, including frame information and gene name -allErrors - skip groups with errors rather than aborting. Useful for getting information about as many errors as possible.
-ignoreGroupsWithoutExons - skip groups that contain no exons rather than generate an error. -infoOut=file - write a file with information on each transcript -sourcePrefix=pre - only process entries where the source name has the specified prefix. May be repeated. -impliedStopAfterCds - implied stop codon is after CDS -simple - just check column validity, not hierarchy, resulting genePred may be damaged -geneNameAsName2 - if specified, use gene_name for the name2 field instead of gene_id. -includeVersion - if gene_version and/or transcript_version attributes exist, include the version in the corresponding identifiers. ================================================================ ======== headRest ==================================== ================================================================ ### kent source version 362 ### headRest - Return all *but* the first N lines of a file. usage: headRest count fileName You can use stdin for fileName options: -xxx=XXX ================================================================ ======== hgBbiDbLink ==================================== ================================================================ ### kent source version 362 ### hgBbiDbLink - Add table that just contains a pointer to a bbiFile to database. This program is used to add bigWigs and bigBeds. usage: hgBbiDbLink database trackName fileName ================================================================ ======== hgFakeAgp ==================================== ================================================================ ### kent source version 362 ### hgFakeAgp - Create fake AGP file by looking at N's usage: hgFakeAgp input.fa output.agp options: -minContigGap=N Minimum size for a gap between contigs. Default 25 -minScaffoldGap=N Min size for a gap between scaffolds.
Default 50000 ================================================================ ======== hgFindSpec ==================================== ================================================================ ### kent source version 362 ### hgFindSpec - Create hgFindSpec table from trackDb.ra files. usage: hgFindSpec [options] orgDir database hgFindSpec hgFindSpec.sql hgRoot Options: -strict Add spec to hgFindSpec only if its table(s) exist. -raName=trackDb.ra - Specify a file name to use other than trackDb.ra for the ra files. -release=alpha|beta|public - Include trackDb entries with this release tag only. ================================================================ ======== hgGcPercent ==================================== ================================================================ ### kent source version 362 ### hgGcPercent - Calculate GC Percentage in 20kb windows usage: hgGcPercent [options] database nibDir nibDir can be a .2bit file, a directory that contains a database.2bit file, or a directory that contains *.nib files. Loads gcPercent table with counts from sequence. options: -win=<size> - change window size (default 20000) -noLoad - do not load mysql table - create bed file -file=<filename> - output to <filename> (stdout OK) (implies -noLoad) -chr=<chrN> - process only chrN from the nibDir -noRandom - ignore random chromosomes from the nibDir -noDots - do not display ...
progress during processing -doGaps - process gaps correctly (default: gaps are not counted as GC) -wigOut - output wiggle ascii data ready to pipe to wigEncode -overlap=N - overlap windows by N bases (default 0) -verbose=N - display details to stderr during processing -bedRegionIn=input.bed Read in a bed file for GC content in specific regions and write to bedRegionsOut -bedRegionOut=output.bed Write a bed file of GC content in specific regions from bedRegionIn example: calculate GC percent in 5 base windows using a 2bit assembly (dp2): hgGcPercent -wigOut -doGaps -win=5 -file=stdout -verbose=0 \ dp2 /cluster/data/dp2 \ | wigEncode stdin gc5Base.wig gc5Base.wib ================================================================ ======== hgLoadBed ==================================== ================================================================ ### kent source version 362 ### hgLoadBed - Load a generic bed file into database usage: hgLoadBed database track files(s).bed options: -noSort don't sort (you better be sorting before this) -noBin suppress bin field -oldTable add to existing table -onServer This will speed things up if you're running in a directory that the mysql server can access. -sqlTable=table.sql Create table from .sql file -renameSqlTable Rename table created with -sqlTable to match track -trimSqlTable If sqlTable has n fields, and input has m fields, only load m fields, meaning the last n-m fields in the sqlTable are optional -type=bedN[+[P]] : N is between 3 and 15, optional (+) if extra "bedPlus" fields, optional P specifies the number of extra fields. Not required, but preferred. Examples: -type=bed6 or -type=bed6+ or -type=bed6+3 (see http://genome.ucsc.edu/FAQ/FAQformat.html#format1) Recommended to use with -as option for better bedPlus validation. -as=fields.as If you have extra "bedPlus" fields, it's great to put a definition of each field in a row in AutoSql format here. 
-chromInfo=file.txt Specify chromInfo file to validate chrom names and sizes. -tab Separate by tabs rather than space -hasBin Input bed file starts with a bin field. -noLoad - Do not load database and do not clean up tab files -noHistory - Do not add history table comments (for custom tracks) -notItemRgb - Do not parse column nine as r,g,b when commas seen (bacEnds) -bedGraph=N - wiggle graph column N of the input file as float dataValue - bedGraph N is typically 4: -bedGraph=4 -bedDetail - bedDetail format with id and text for hgc clicks - requires tab and sqlTable options -maxChromNameLength=N - specify max chromName length to avoid - reference to chromInfo table -tmpDir=<path> - path to directory for creation of temporary .tab file - which will be removed after loading -noNameIx - no index for the name column (default creates index) -ignoreEmpty - no error on empty input file -noStrict - don't perform coord sanity checks - by default we abort when: chromStart >= chromEnd -allowStartEqualEnd - even when doing strict checks, allow chromStart==chromEnd (zero-length e.g. insertion) -allowNegativeScores - sql definition of score column is int, not unsigned -customTrackLoader - turns on: -noNameIx, -noHistory, -ignoreEmpty, -allowStartEqualEnd, -allowNegativeScores, -verbose=0 Plus, this turns on a 20 minute time-out exit. -fillInScore=colName - if every score value is zero, then use column 'colName' to fill in the score column (from minScore-1000) -minScore=N - minimum value for score field for -fillInScore option (default 100) -verbose=N - verbose level for extra information to STDERR -dotIsNull=N - if the specified field is a '.' 
then replace it with -1 -lineLimit=N - limit input file to this number of lines ================================================================ ======== hgLoadChain ==================================== ================================================================ ### kent source version 362 ### hgLoadChain - Load a generic Chain file into database usage: hgLoadChain database chrN_track chrN.chain options: -tIndex Include tName in indexes (for non-split chain tables) -noBin suppress bin field, default: bin field is added -noSort Don't sort by target (memory-intensive) -- input *must* be sorted by target already if this option is used. -oldTable add to existing table, default: create new table -sqlTable=table.sql Create table from .sql file -normScore add normalized score column to table, default: not added -qPrefix=xxx prepend "xxx" to query name -test suppress loading to database ================================================================ ======== hgLoadMaf ==================================== ================================================================ ### kent source version 362 ### hgLoadMaf - Load a maf file index into the database usage: hgLoadMaf database table options: -warn warn instead of error upon empty/incomplete alignments -WARN warn instead of error, with detail for the warning -test=infile use infile as input, and suppress loading the database. Just create .tab file in current dir. -pathPrefix=dir load files from specified directory (default /gbdb/database/table)
-tmpDir=<path> path to directory for creation of temporary .tab file which will be removed after loading -loadFile=file use file as input -maxNameLen=N specify max chromosome name length to avoid reference to chromInfo table -defPos=file file to put default position in default position is first block -custom loading a custom track, don't use history or extFile tables NOTE: The maf files need to be in chromosome coordinates, the reference species must be the first component, and the blocks must be correctly ordered and be on the '+' strand ================================================================ ======== hgLoadMafSummary ==================================== ================================================================ ### kent source version 362 ### hgLoadMafSummary - Load a summary table of pairs in a maf into a database usage: hgLoadMafSummary database table file.maf options: -mergeGap=N max size of gap to merge regions (default 500) -minSize=N merge blocks smaller than N (default 10000) -maxSize=N break up blocks larger than N (default 50000) -minSeqSize=N skip alignments when reference sequence is less than N (default 1000000 -- match with hgTracks min window size for using summary table) -test suppress loading the database. Just create .tab file(s) in current dir. 
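The usage text for hgLoadMafSummary does not spell out how -mergeGap and -maxSize are applied, so the following Python sketch shows only one common reading of "merge regions separated by a small gap" and "break up blocks larger than N". It is not the tool's algorithm: the function names, the half-open (start, end) tuples, and the sample coordinates are assumptions, and -minSize and -minSeqSize are deliberately left out.

```python
def merge_regions(regions, merge_gap=500):
    """Merge (start, end) regions whose separating gap is at most merge_gap."""
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= merge_gap:
            # Close enough to the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def break_up(region, max_size=50000):
    """Split one (start, end) region into pieces no longer than max_size."""
    start, end = region
    return [(s, min(s + max_size, end)) for s in range(start, end, max_size)]


# Hypothetical coordinates, using the default thresholds from the usage text.
print(merge_regions([(0, 100), (400, 900), (2000, 2500)], merge_gap=500))
print(break_up((0, 120000), max_size=50000))
```

With the defaults, the first two sample regions merge because their gap of 300 bases is under 500, while the 120000-base region is cut into two full 50000-base pieces and a 20000-base remainder.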
================================================================ ======== hgLoadNet ==================================== ================================================================ ### kent source version 362 ### hgLoadNet - Load a generic net file into database usage: hgLoadNet database track files(s).net options: -noBin suppress bin field -oldTable add to existing table -sqlTable=table.sql Create table from .sql file -qPrefix=xxx prepend "xxx-" to query name -warn load even with missing fields -test suppress loading table ================================================================ ======== hgLoadOut ==================================== ================================================================ ### kent source version 362 ### hgLoadOut - load RepeatMasker .out files into database usage: hgLoadOut database file(s).out For multiple files chrN.out this will create the single table 'rmsk' in the database, use the -split argument to obtain separate chrN_rmsk tables. options: -tabFile=text.tab - don't actually load database, just create tab file -split - load chrN_rmsk separate tables even if a single file is given -table=name - use a different suffix other than the default (rmsk) ================================================================ ======== hgLoadOutJoined ==================================== ================================================================ ### kent source version 362 ### hgLoadOutJoined - load new style (2014) RepeatMasker .out files into database usage: hgLoadOutJoined database file(s).out For multiple files chrN.out this will create the single table 'rmskOutBaseline' in the database. 
options: -tabFile=text.tab - don't actually load database, just create tab file -table=name - use a different suffix other than the default (rmskOutBaseline) ================================================================ ======== hgLoadSqlTab ==================================== ================================================================ ### kent source version 362 ### hgLoadSqlTab - Load table into database from SQL and text files. usage: hgLoadSqlTab database table file.sql file(s).tab file.sql contains a SQL create statement for table file.tab contains tab-separated text (rows of table) The actual table name will come from the command line, not the sql file. options: -warn - warn instead of abort on mysql errors or warnings -notOnServer - file is *not* in a directory that the mysql server can see -oldTable|-append - add to existing table To load bed 3+ sorted tab files as hgLoadBed would do automatically sort the input file: sort -k1,1 -k2,2n file(s).tab | hgLoadSqlTab database table file.sql stdin ================================================================ ======== hgLoadWiggle ==================================== ================================================================ ### kent source version 362 ### hgLoadWiggle - Load a wiggle track definition into database usage: hgLoadWiggle [options] database track files(s).wig options: -noBin suppress bin field -noLoad do not load table, only create .tab file -noHistory do not add history table comments (for custom tracks) -oldTable add to existing table -tab Separate by tabs rather than space -pathPrefix=<path> .wib file path prefix to use (default /gbdb/<DB>/wib) -chromInfoDb=<DB> database to extract chromInfo size information -maxChromNameLength=N - specify max chromName length to avoid - reference to chromInfo table -tmpDir=<path> - path to directory for creation of temporary .tab file - which will be removed after loading -verbose=N N=2 see # of lines input and SQL create statement, N=3 see chrom 
size info, N=4 see details on chrom size info ================================================================ ======== hgSpeciesRna ==================================== ================================================================ ### kent source version 362 ### hgSpeciesRna - Create fasta file with RNA from one species usage: hgSpeciesRna database Genus species output.fa options: -est - If set will get ESTs rather than mRNAs -filter=file - only read accessions listed in file ================================================================ ======== hgTrackDb ==================================== ================================================================ ### kent source version 362 ### hgTrackDb - Create trackDb table from text files. Note that the browser supports multiple trackDb tables, usually in the form: trackDb_YourUserName. Which particular trackDb table the browser uses is specified in the hg.conf file found either in your home directory file '.hg.conf' or in the web server's cgi-bin/hg.conf configuration file with the setting: db.trackDb=trackDb see also: src/product/ex.hg.conf discussion of this setting. usage: hgTrackDb [options] org database trackDb trackDb.sql hgRoot Options: org - a directory name with a hierarchy of trackDb.ra files to examine - in the case of a single directory with a single trackDb.ra file use . database - name of database to create the trackDb table in trackDb - name of table to create, usually trackDb, or trackDb_${USER} trackDb.sql - SQL definition of the table to create, typically from - the source tree file: src/hg/lib/trackDb.sql - the table name in the CREATE statement is replaced by the - table name specified on this command line. hgRoot - a directory name to prepend to org to locate the hierarchy: hgRoot/trackDb.ra - top level trackDb.ra file processed first hgRoot/org/trackDb.ra - second level file processed second hgRoot/org/database/trackDb.ra - third level file processed last - for no directory hierarchy use . 
-strict - only include tables that exist (and complain about missing html files). -raName=trackDb.ra - Specify a file name to use other than trackDb.ra for the ra files. -release=alpha|beta|public - Include trackDb entries with this release tag only. -settings - for trackDb scanning, output table name, type line, - and settings hash to stderr while loading everything. ================================================================ ======== hgWiggle ==================================== ================================================================ ### kent source version 362 ### # no database specified, using .wig files # doAscii option on, perform the default ascii output hgWiggle - fetch wiggle data from data base or file usage: hgWiggle [options] <track names ...> options: -db=<database> - use specified database -chr=chrN - examine data only on chrN -chrom=chrN - same as -chr option above -position=[chrN:]start-end - examine data in window start-end (1-relative) (the chrN: is optional) -chromLst=<file> - file with list of chroms to examine -doAscii - perform the default ascii output, in addition to other outputs - Any of the other -do outputs turn off the default ascii output - ***WARNING*** this ascii output is 0-relative offset which - *** is *not* the normal wiggle input format. 
Use the -lift - *** argument -lift=1 to get 1-relative offset: -lift=<D> - lift ascii output positions by D (0 default) -rawDataOut - output just the data values, nothing else -htmlOut - output stats or histogram in HTML instead of plain text -doStats - perform stats measurement, default output text, see -htmlOut -doBed - output bed format -bedFile=<file> - constrain output to ranges specified in bed <file> -dataConstraint='DC' - where DC is one of < = >= <= == != 'in range' -ll=<F> - lowerLimit compare data values to F (float) (all but 'in range') -ul=<F> - upperLimit compare data values to F (float) (need both ll and ul when 'in range') -help - display more examples and extra options (to stderr) When no database is specified, track names will refer to .wig files example using the file chrM.wig: hgWiggle chrM example using the database table hg17.gc5Base: hgWiggle -chr=chrM -db=hg17 gc5Base ================================================================ ======== hgsqldump ==================================== ================================================================ hgsqldump - Execute mysqldump using passwords from .hg.conf usage: hgsqldump [OPTIONS] database [tables] or: hgsqldump [OPTIONS] --databases [OPTIONS] DB1 [DB2 DB3 ...] or: hgsqldump [OPTIONS] --all-databases [OPTIONS] Generally anything in command line is passed to mysqldump after an implicit '-u user -ppassword See also: mysqldump Note: directory for results must be writable by mysql. i.e. 'chmod 777 .' Which is a security risk, so remember to change permissions back after use. e.g.: hgsqldump --all -c --tab=. cb1 ================================================================ ======== htmlCheck ==================================== ================================================================ ### kent source version 362 ### htmlCheck - Do a little reading and verification of html file usage: htmlCheck how url where how is: ok - just check for 200 return. 
Print error message and exit -1 if no 200 getAll - read the url (header and html) and print to stdout getHeader - read the header and print to stdout getCookies - print list of cookies getHtml - print the html, but not the header to stdout getForms - print the form structure to stdout getVars - print the form variables to stdout getLinks - print links getTags - print out just the tags checkLinks - check links in page checkLinks2 - check links in page and all subpages in same host (Just one level of recursion) checkLocalLinks - check local links in page checkLocalLinks2 - check local links in page and connected local pages (Just one level of recursion) submit - submit first form in page if any using 'GET' method validate - do some basic validations including TABLE/TR/TD nesting strictTagNestCheck - check tags are correctly nested options: cookies=cookie.txt - Cookies is a two column file containing <cookieName><space><value><newLine> note: url will need to be in quotes if it contains an ampersand or question mark. ================================================================ ======== hubCheck ==================================== ================================================================ ### kent source version 362 ### hubCheck - Check a track data hub for integrity. usage: hubCheck http://yourHost/yourDir/hub.txt options: -noTracks - don't check remote files for tracks, just trackDb (faster) -checkSettings - check trackDb settings to spec -version=[v?|url] - version to validate settings against (defaults to version in hub.txt, or current standard) -extra=[file|url] - accept settings in this file (or url) -level=base|required - reject settings below this support level -settings - just list settings with support level -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs. 
                          Will create this directory if it does not exist
   -printMeta             - print the metadata for each track
   -cacheTime=N           - set cache refresh time in seconds, default 1
   -verbose=2             - output verbosely

================================================================
======== hubPublicCheck ====================================
================================================================
### kent source version 362 ###
hubPublicCheck - checks that the labels in hubPublic match what is in the hub labels
   outputs SQL statements to put the table into compliance
usage:
   hubPublicCheck tableName
options:
   -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
   -addHub=url           - output statements to add url to table

================================================================
======== ixIxx ====================================
================================================================
### kent source version 362 ###
ixIxx - Create indices for simple line-oriented file of format
   <symbol> <free text>
usage:
   ixIxx in.text out.ix out.ixx
Where out.ix is a word index, and out.ixx is an index into the index.
options:
   -prefixSize=N Size of prefix to index on in ixx. Default is 5.
   -binSize=N    Size of bins in ixx. Default is 64k.

================================================================
======== lavToAxt ====================================
================================================================
lavToAxt - Convert blastz lav file to an axt file (which includes sequence)
usage:
   lavToAxt in.lav tNibDir qNibDir out.axt
Where tNibDir/qNibDir are either directories full of nib files, or a single twoBit file
options:
   -fa       qNibDir is interpreted as a fasta file of multiple dna seq instead of directory of nibs
   -tfa      tNibDir is interpreted as a fasta file of multiple dna seq instead of directory of nibs
   -dropSelf drops alignment blocks on the diagonal for self alignments
   -scoreScheme=fileName Read the scoring matrix from a blastz-format file.
(only used in conjunction with -dropSelf, to rescore alignments when blocks are dropped) ================================================================ ======== lavToPsl ==================================== ================================================================ lavToPsl - Convert blastz lav to psl format usage: lavToPsl in.lav out.psl options: -target-strand=c set the target strand to c (default is no strand) -bed output bed instead of psl -scoreFile=filename output lav scores to side file, such that each psl line in out.psl is matched by a score line. ================================================================ ======== ldHgGene ==================================== ================================================================ ### kent source version 362 ### ldHgGene - load database with gene predictions from a gff file. usage: ldHgGene database table file(s).gff options: -bin Add bin column (now the default) -nobin don't add binning (you probably don't want this) -exon=type Sets type field for exons to specific value -oldTable Don't overwrite what's already in table -noncoding Forces whole prediction to be UTR -gtf input is GTF, stop codon is not in CDS -predTab input is already in genePredTab format -requireCDS discard genes that don't have CDS annotation -out=gpfile write output, in genePred format, instead of loading table. Database is ignored. -genePredExt create a extended genePred, including frame information and gene name -impliedStopAfterCds - implied stop codon in GFF/GTF after CDS ================================================================ ======== liftOver ==================================== ================================================================ ### kent source version 362 ### liftOver - Move annotations from one assembly to another usage: liftOver oldFile map.chain newFile unMapped oldFile and newFile are in bed format by default, but can be in GFF and maybe eventually others with the appropriate flags below. 
The map.chain file has the old genome as the target and the new genome as the query. *********************************************************************** WARNING: liftOver was only designed to work between different assemblies of the same organism. It may not do what you want if you are lifting between different organisms. If there has been a rearrangement in one of the species, the size of the region being mapped may change dramatically after mapping. *********************************************************************** options: -minMatch=0.N Minimum ratio of bases that must remap. Default 0.95 -gff File is in gff/gtf format. Note that the gff lines are converted separately. It would be good to have a separate check after this that the lines that make up a gene model still make a plausible gene after liftOver -genePred - File is in genePred format -sample - File is in sample format -bedPlus=N - File is bed N+ format -positions - File is in browser "position" format -hasBin - File has bin value (used only with -bedPlus) -tab - Separate by tabs rather than space (used only with -bedPlus) -pslT - File is in psl format, map target side only -ends=N - Lift the first and last N bases of each record and combine the result. This is useful for lifting large regions like BAC end pairs. -minBlocks=0.N Minimum ratio of alignment blocks or exons that must map (default 1.00) -fudgeThick (bed 12 or 12+ only) If thickStart/thickEnd is not mapped, use the closest mapped base. Recommended if using -minBlocks. -multiple Allow multiple output regions -minChainT, -minChainQ Minimum chain size in target/query, when mapping to multiple output regions (default 0, 0) -minSizeT deprecated synonym for -minChainT (ENCODE compat.) -minSizeQ Min matching region size in query with -multiple. 
-chainTable Used with -multiple, format is db.tablename, to extend chains from net (preserves dups) -errorHelp Explain error messages ================================================================ ======== liftOverMerge ==================================== ================================================================ ### kent source version 362 ### liftOverMerge - Merge multiple regions in BED 5 files generated by liftOver -multiple usage: liftOverMerge oldFile newFile options: -mergeGap=N Max size of gap to merge regions (default 0) ================================================================ ======== liftUp ==================================== ================================================================ ### kent source version 362 ### liftUp - change coordinates of .psl, .agp, .gap, .gl, .out, .align, .gff, .gtf .bscore .tab .gdup .axt .chain .net, .gp, .genepred, .wab, .bed, .bed3, or .bed8 files to parent coordinate system. usage: liftUp [-type=.xxx] destFile liftSpec how sourceFile(s) The optional -type parameter tells what type of files to lift If omitted the type is inferred from the suffix of destFile Type is one of the suffixes described above. DestFile will contain the merged and lifted source files, with the coordinates translated as per liftSpec. LiftSpec is tab-delimited with each line of the form: offset oldName oldSize newName newSize LiftSpec may optionally have a sixth column specifying + or - strand, but strand is not supported for all input types. The 'how' parameter controls what the program will do with items which are not in the liftSpec. It must be one of: carry - Items not in liftSpec are carried to dest without translation drop - Items not in liftSpec are silently dropped from dest warn - Items not in liftSpec are dropped. 
           A warning is issued
   error - Items not in liftSpec generate an error
If the destination is a .agp file then a 'large inserts' file also needs to be
included in the command line:
   liftUp dest.agp liftSpec how inserts sourceFile(s)
This file describes where large inserts due to heterochromatin should be added.
Use /dev/null and set -gapsize if there is no inserts file.
options:
   -nohead   No header written for .psl files
   -dots=N   Output a dot every N lines processed
   -pslQ     Lift query (rather than target) side of psl
   -axtQ     Lift query (rather than target) side of axt
   -chainQ   Lift query (rather than target) side of chain
   -netQ     Lift query (rather than target) side of net
   -wabaQ    Lift query (rather than target) side of waba alignment
             (waba lifts only work with query side at this time)
   -nosort   Don't sort bed, gff, or gdup files, to save memory
   -gapsize  change contig gapsize from default
   -ignoreVersions - Ignore NCBI-style version number in sequence ids of input files
   -extGenePred    lift extended genePred

================================================================
======== linesToRa ====================================
================================================================
### kent source version 362 ###
linesToRa - generate .ra format from lines with pipe-separated fields
usage:
   linesToRa in.txt out.ra

================================================================
======== localtime ====================================
================================================================
localtime - convert unix timestamp to date string
usage:
   localtime <time stamp>
   <time stamp> - integer 0 to 2147483647

================================================================
======== mafAddIRows ====================================
================================================================
### kent source version 362 ###
mafAddIRows - add 'i' rows to a maf
usage:
   mafAddIRows mafIn twoBitFile mafOut
WARNING: requires a maf with only a single target sequence
options:
   -nBeds=listOfBedFiles reads in list of bed files, one per species, with N locations
   -addN     adds rows of N's into maf blocks (rather than just annotating them)
   -addDash  adds rows of -'s into maf blocks (rather than just annotating them)

================================================================
======== mafAddQRows ====================================
================================================================
### kent source version 362 ###
mafAddQRows - Add quality data to a maf
usage:
   mafAddQRows species.lst in.maf out.maf
where each species.lst line contains two fields
   1) species name
   2) directory where the .qac and .qdx files are located
options:
   -divisor=n is value to divide Q value by. Default is 5.

================================================================
======== mafCoverage ====================================
================================================================
### kent source version 362 ###
mafCoverage - Analyse coverage by maf files - chromosome by chromosome and genome-wide.
usage:
   mafCoverage database mafFile
Note maf file must be sorted by chromosome,tStart
   -restrict=restrict.bed Restrict to parts in restrict.bed
   -count=N Number of matching species to count coverage. Default = 3

================================================================
======== mafFetch ====================================
================================================================
mafFetch - get overlapping records from an MAF using an index table
usage:
   mafFetch db table overBed mafOut
Select MAF records overlapping records in the BED using the database table
to look up the file and record offset.
Only the first 3 columns are required in the bed.
Options:

================================================================
======== mafFilter ====================================
================================================================
### kent source version 362 ###
mafFilter - Filter out maf files.
Output goes to standard out
usage:
   mafFilter file(s).maf
options:
   -tolerate - Just ignore bad input rather than aborting.
   -minCol=N - Filter out blocks with fewer than N columns (default 1)
   -minRow=N - Filter out blocks with fewer than N rows (default 2)
   -maxRow=N - Filter out blocks with >= N rows (default 100)
   -factor - Filter out scores below -minFactor * (ncol**2) * nrow
   -minFactor=N - Factor to use with -factor (default 5)
   -minScore=N - Minimum allowed score (alternative to -minFactor)
   -reject=filename - Save rejected blocks in filename
   -needComp=species - all alignments must have species as one of the components
   -overlap - Reject overlapping blocks in reference (assumes ordered blocks)
   -componentFilter=filename - Filter out blocks without a component listed in filename
   -speciesFilter=filename - Filter out blocks without a species listed in filename

================================================================
======== mafFrag ====================================
================================================================
### kent source version 362 ###
mafFrag - Extract maf sequences for a region from database
usage:
   mafFrag database mafTrack chrom start end strand out.maf
options:
   -outName=XXX Use XXX instead of database.chrom for the name

================================================================
======== mafFrags ====================================
================================================================
### kent source version 362 ###
mafFrags - Collect MAFs from regions specified in a 6 column bed file
usage:
   mafFrags database track in.bed out.maf
options:
   -orgs=org.txt - File with list of databases/organisms in order
   -bed12 - If set, in.bed is a bed 12 file, including exons
   -thickOnly - Only extract subset between thickStart/thickEnd
   -meFirst - Put native sequence first in maf
   -txStarts - Add MAF txstart region definitions ('r' lines) using BED name
               and output actual reference genome coordinates in MAF.
-refCoords - output actual reference genome coordinates in MAF. ================================================================ ======== mafGene ==================================== ================================================================ ### kent source version 362 ### mafGene - output protein alignments using maf and genePred usage: mafGene dbName mafTable genePredTable species.lst output arguments: dbName name of SQL database mafTable name of maf file table genePredTable name of the genePred table species.lst list of species names output put output here options: -useFile genePredTable argument is a genePred file name -geneName=foobar name of gene as it appears in genePred -geneList=foolst name of file with list of genes -geneBeds=foo.bed name of bed file with genes and positions -chrom=chr1 name of chromosome from which to grab genes -exons output exons -noTrans don't translate output into amino acids -uniqAA put out unique pseudo-AA for every different codon -includeUtr include the UTRs, use only with -noTrans -delay=N delay N seconds between genes (default 0) -noDash don't output lines with all dashes ================================================================ ======== mafMeFirst ==================================== ================================================================ ### kent source version 362 ### mafMeFirst - Move component to top if it is one of the named ones. Useful in conjunction with mafFrags when you don't want the one with the gene name to be in the middle. 
usage: mafMeFirst in.maf me.list out.maf options: -xxx=XXX ================================================================ ======== mafOrder ==================================== ================================================================ ### kent source version 362 ### mafOrder - order components within a maf file usage: mafOrder mafIn order.lst mafOut where order.lst has one species per line options: ================================================================ ======== mafRanges ==================================== ================================================================ ### kent source version 362 ### mafRanges - Extract ranges of target (or query) coverage from maf and output as BED 3 (e.g. for processing by featureBits). usage: mafRanges in.maf db out.bed db should appear in in.maf alignments as the first part of "db.seqName"-style sequence names. The seqName part will be used as the chrom field in the range printed to out.bed. options: -otherDb=oDb Output ranges only for alignments that include oDb. oDB can be comma-separated list. -notAllOGap Don't include bases for which all other species have a gap. ================================================================ ======== mafSpeciesList ==================================== ================================================================ ### kent source version 362 ### mafSpeciesList - Scan maf and output all species used in it. usage: mafSpeciesList in.maf out.lst options: -ignoreFirst - If true ignore first species in each maf, useful when this is a mafFrags result that puts gene id there. ================================================================ ======== mafSpeciesSubset ==================================== ================================================================ ### kent source version 362 ### mafSpeciesSubset - Extract a maf that just has a subset of species. 
usage: mafSpeciesSubset in.maf species.lst out.maf Where: in.maf is a file where the sequence source are either simple species names, or species.something. Usually actually it's a genome database name rather than a species before the dot to tell the truth. species.lst is a file with a list of species to keep out.maf is the output. It will have columns that are all - or . in the reduced species set removed, as well as the lines representing species not in species.lst removed. options: -keepFirst - If set, keep the first 'a' line in a maf no matter what Useful for mafFrag results where we use this for the gene name ================================================================ ======== mafSplit ==================================== ================================================================ ### kent source version 362 ### mafSplit - Split multiple alignment files usage: mafSplit splits.bed outRoot file(s).maf options: -byTarget Make one file per target sequence. (splits.bed input is ignored). -outDirDepth=N For use only with -byTarget. Create N levels of output directory under current dir. This helps prevent NFS problems with a large number of file in a directory. Using -outDirDepth=3 would produce ./1/2/3/outRoot123.maf. -useSequenceName For use only with -byTarget. Instead of auto-incrementing an integer to determine output filename, expect each target sequence name to end with a unique number and use that number as the integer to tack onto outRoot. -useFullSequenceName For use only with -byTarget. Instead of auto-incrementing an integer to determine output filename, use the target sequence name to tack onto outRoot. -useHashedName=N For use only with -byTarget. Instead of auto-incrementing an integer or requiring a unique number in the sequence name, use a hash function on the sequence name to compute an N-bit number. 
This limits the max #filenames to 2^N and ensures that even if different subsets of sequences appear in different pairwise mafs, the split file names will be consistent (due to hash function). This option is useful when a "scaffold-based" assembly has more than one sequence name pattern, e.g. both chroms and scaffolds. ================================================================ ======== mafSplitPos ==================================== ================================================================ ### kent source version 362 ### mafSplitPos - Pick positions to split multiple alignment input files usage: mafSplitPos database size(Mbp) out.bed options: -chrom=chrN Restrict to one chromosome -minGap=N Split only on gaps >N bp, defaults to 100, specify -1 to disable -minRepeat=N Split only on repeats >N bp, defaults to 100, specify -1 to disable ================================================================ ======== mafToAxt ==================================== ================================================================ ### kent source version 362 ### mafToAxt - Convert from maf to axt format usage: mafToAxt in.maf tName qName output Where tName and qName are the names for the target and query sequences respectively. tName should be maf target since it must always be oriented in "+" direction. Use 'first' for tName to always use first sequence Options: -stripDb - Strip names from start to first period. 
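
The -stripDb option of mafToAxt above ("Strip names from start to first period") can be pictured with a short Python sketch. This is only an illustration of the described renaming, not the tool's code: the function name strip_db is hypothetical, and leaving names without a period unchanged is an assumption.

```python
def strip_db(name: str) -> str:
    """Drop everything up to and including the first period.

    Assumption: a name with no period is returned unchanged.
    """
    _, sep, rest = name.partition(".")
    return rest if sep else name


if __name__ == "__main__":
    for src in ("hg17.chr11", "chrM", "mm10.chrUn.scaffold1"):
        print(src, "->", strip_db(src))
```

Only the first period is treated as the separator, so a sequence name that itself contains periods keeps them.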
================================================================
======== mafToBigMaf ====================================
================================================================
### kent source version 362 ###
mafToBigMaf - Put ucsc standard maf file into bigMaf format
usage:
   mafToBigMaf referenceDb input.maf out.bed
options:
   -xxx=XXX

================================================================
======== mafToPsl ====================================
================================================================
### kent source version 362 ###
mafToPsl - Convert maf to psl format
usage:
   mafToPsl querySrc targetSrc in.maf out.psl
The query and target src can be either an organism prefix (hg17),
or a full src sequence name (hg17.chr11), or just the sequence name
if the MAF does not contain organism prefixes.

================================================================
======== mafToSnpBed ====================================
================================================================
### kent source version 362 ###
mafToSnpBed - finds SNPs in MAF and builds a bed with their functional consequence
usage:
   mafToSnpBed database input.maf input.gp output.bed
options:
   -xxx=XXX

================================================================
======== mafsInRegion ====================================
================================================================
### kent source version 362 ###
mafsInRegion - Extract MAFS in a genomic region
usage:
   mafsInRegion regions.bed out.maf|outDir in.maf(s)
options:
   -outDir - output separate files named by bed name field to outDir
   -keepInitialGaps - keep alignment columns at the beginning and end of a block
                      that are gapped in all species

================================================================
======== makeTableList ====================================
================================================================
### kent source version 362 ###
makeTableList - create/recreate tableList tables (cache of
SHOW TABLES and DESCRIBE) usage: makeTableList [assemblies] options: -host show tables: mysql host -user show tables: mysql user -password show tables: mysql password -toProf optional: mysql profile to write table list to (target server) -toHost alternative to toProf: mysql target host -toUser alternative to toProf: mysql target user -toPassword alternative to toProf: mysql target pwd -hgcentral specify an alternative hgcentral db name when using -all -all recreate tableList for all active assemblies in hg.conf's hgcentral -bigFiles create table with tuples (track, name of bigfile) ================================================================ ======== maskOutFa ==================================== ================================================================ ### kent source version 362 ### maskOutFa - Produce a masked .fa file given an unmasked .fa and a RepeatMasker .out file, or a .bed file to mask on. usage: maskOutFa in.fa maskFile out.fa.masked where in.fa and out.fa.masked can be the same file, and maskFile can end in .out (for RepeatMasker) or .bed. MaskFile parameter can also be the word 'hard' in which case lower case letters are converted to N's. options: -soft - puts masked parts in lower case other in upper. -softAdd - lower cases masked bits, leaves others unchanged -clip - clip out of bounds mask records rather than dying. -maskFormat=fmt - "out" or "bed" for when input does not have required extension. 
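
The masking modes described for maskOutFa above can be sketched in Python. This is a minimal illustration and not the tool's implementation: the function names are hypothetical, intervals are assumed to be BED-style (0-based, half-open), and out-of-bounds intervals are clipped here, loosely in the spirit of -clip.

```python
def _clipped(intervals, length):
    """Yield (start, end) pairs clipped to [0, length)."""
    for start, end in intervals:
        yield max(start, 0), min(end, length)


def soft_mask(seq, intervals):
    """Like -soft: masked parts lower case, everything else upper case."""
    out = list(seq.upper())
    for start, end in _clipped(intervals, len(out)):
        for i in range(start, end):
            out[i] = out[i].lower()
    return "".join(out)


def soft_add(seq, intervals):
    """Like -softAdd: lower-case masked parts, leave the rest unchanged."""
    out = list(seq)
    for start, end in _clipped(intervals, len(out)):
        for i in range(start, end):
            out[i] = out[i].lower()
    return "".join(out)


def hard_mask(seq):
    """Like maskFile 'hard': convert lower-case letters to N."""
    return "".join("N" if c.islower() else c for c in seq)


if __name__ == "__main__":
    masked = soft_mask("acgtACGT", [(2, 5)])
    print(masked)
    print(hard_mask(masked))
```

Chaining soft_mask and hard_mask mirrors a two-step workflow in which a soft-masked file is later hard-masked.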
================================================================
======== mktime ====================================
================================================================
mktime - convert date string to unix timestamp
usage:
   mktime YYYY-MM-DD HH:MM:SS
valid dates: 1970-01-01 00:00:00 to 2038-01-19 03:14:07

================================================================
======== mrnaToGene ====================================
================================================================
### kent source version 362 ###
mrnaToGene - convert PSL alignments of mRNAs to gene annotations
usage:
   mrnaToGene [options] psl genePredFile

Convert PSL alignments with CDS annotation from genbank to gene
annotations in genePred format. Accessions without valid CDS are
optionally dropped. A best attempt is made to convert incomplete CDS
annotations.

The psl argument may either be a PSL file or a table in a database,
depending on options. CDS may be obtained from the database or file.
Accessions in PSL files are tried with and without genbank versions.

Options:
   -db=db - get PSLs and CDS from this database, psl specifies the table.
   -cdsDb=db - get CDS from this database, psl is a file.
   -cdsFile=file - get CDS from this file, psl is a file.
    File is tab-separated with name as the first column and NCBI CDS
    the second
   -insertMergeSize=8 - Merge inserts (gaps) no larger than this many bases.
    A negative size disables merging of blocks. This differs from specifying
    zero in that adjacent blocks will not be merged, allowing tracking of
    frame for each block. Defaults to 8 unless -cdsMergeSize or
    -utrMergeSize are specified; if either of these is specified, this
    option is ignored.
   -smallInsertSize=n - alias for -insertMergeSize
   -cdsMergeSize=-1 - merge gaps in CDS no larger than this size.
    A negative value disables.
   -cdsMergeMod3 - only merge CDS gaps whose size is a multiple of 3
   -utrMergeSize=-1 - merge gaps in UTR no larger than this size.
    A negative value disables.
   -requireUtr - Drop sequences that don't have both 5' and 3' UTR annotated.
   -genePredExt - create an extended genePred, including frame information.
   -allCds - consider PSL to be all CDS.
   -noCds - consider PSL to not contain any CDS.
   -keepInvalid - Keep sequences with invalid CDS.
   -quiet - Don't print info about dropped sequences.
   -ignoreUniqSuffix - ignore all characters after last `-' in qName
    when looking up CDS. Used when a suffix has been added to make qName
    unique. It is not removed from the name in the genePred.

================================================================
======== netChainSubset ====================================
================================================================
### kent source version 362 ###
netChainSubset - Create chain file with subset of chains that appear in the net
usage:
   netChainSubset in.net in.chain out.chain
options:
   -gapOut=gap.tab - Output gap sizes to file
   -type=XXX - Restrict output to particular type in net file
   -splitOnInsert - Split chain when get an insertion of another chain
   -wholeChains - Write entire chain referenced by net, don't split
    when a high-level net is encountered. This is useful when nets
    have been filtered.
   -skipMissing - skip chains that are not found instead of generating
    an error. Useful if chains have been filtered.
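
The idea behind netChainSubset above, keeping only chains that appear in the net, can be sketched in Python. This is a simplified illustration, not the tool's algorithm. It assumes that net "fill" lines carry seven fixed fields followed by key/value pairs that include "id", and that the chain id is the last of the 13 words on a "chain" header line. It always writes whole chains (roughly what -wholeChains describes) and does no splitting or trimming. The function names are hypothetical.

```python
def chain_ids_in_net(net_lines):
    """Collect chain ids referenced by 'fill' lines of a net file."""
    ids = set()
    for line in net_lines:
        words = line.split()
        if not words or words[0] != "fill":
            continue
        # Assumed layout: fill tStart tSize qName strand qStart qSize key value ...
        pairs = dict(zip(words[7::2], words[8::2]))
        if "id" in pairs:
            ids.add(int(pairs["id"]))
    return ids


def subset_chains(chain_lines, keep_ids):
    """Return the lines of every chain whose id is in keep_ids."""
    out = []
    keeping = False
    for line in chain_lines:
        if line.startswith("chain "):
            # Assumed header: chain score tName tSize tStrand tStart tEnd
            #                 qName qSize qStrand qStart qEnd id
            keeping = int(line.split()[12]) in keep_ids
        if keeping:
            out.append(line)
    return out


if __name__ == "__main__":
    net = [
        "net chr1 1000",
        " fill 0 500 chr2 + 100 500 id 7 score 9000 ali 450 type top",
        "  gap 100 50 chr2 + 200 50",
        "   fill 110 30 chr3 - 10 30 id 9 score 500 ali 30 type nonSyn",
    ]
    chains = [
        "chain 9000 chr1 1000 + 0 500 chr2 2000 + 100 600 7", "500", "",
        "chain 800 chr1 1000 + 600 700 chr4 3000 + 0 100 8", "100", "",
        "chain 500 chr1 1000 + 110 140 chr3 1500 - 10 40 9", "30", "",
    ]
    for line in subset_chains(chains, chain_ids_in_net(net)):
        print(line)
```

A chain id that is referenced in the net but absent from the chain file is simply not emitted in this sketch, which resembles the -skipMissing behaviour rather than the default error.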
================================================================ ======== netClass ==================================== ================================================================ ### kent source version 362 ### netClass - Add classification info to net usage: netClass [options] in.net tDb qDb out.net tDb - database to fetch target repeat masker table information qDb - database to fetch query repeat masker table information options: -tNewR=dir - Dir of chrN.out.spec files, with RepeatMasker .out format lines describing lineage specific repeats in target -qNewR=dir - Dir of chrN.out.spec files for query -noAr - Don't look for ancient repeats -qRepeats=table - table name for query repeats in place of rmsk -tRepeats=table - table name for target repeats in place of rmsk - for example: -tRepeats=windowmaskerSdust -liftQ=file.lft - Lift in.net's query coords to chrom-level using file.lft (for accessing chrom-level coords in qDb) -liftT=file.lft - Lift in.net's target coords to chrom-level using file.lft (for accessing chrom-level coords in tDb) ================================================================ ======== netFilter ==================================== ================================================================ ### kent source version 362 ### netFilter - Filter out parts of net. What passes filter goes to standard output. Note a net is a recursive data structure. If a parent fails to pass the filter, the children are not even considered. 
usage:
   netFilter in.net(s)
options:
   -q=chr1,chr2 - restrict query side sequence to those named
   -notQ=chr1,chr2 - restrict query side sequence to those not named
   -t=chr1,chr2 - restrict target side sequence to those named
   -notT=chr1,chr2 - restrict target side sequence to those not named
   -minScore=N - restrict to those scoring at least N
   -maxScore=N - restrict to those scoring less than N
   -minGap=N - restrict to those with gap size (tSize) >= N
   -minAli=N - restrict to those with at least given bases aligning
   -maxAli=N - restrict to those with at most given bases aligning
   -minSizeT=N - restrict to those at least this big on target
   -minSizeQ=N - restrict to those at least this big on query
   -qStartMin=N - restrict to those with qStart at least N
   -qStartMax=N - restrict to those with qStart less than N
   -qEndMin=N - restrict to those with qEnd at least N
   -qEndMax=N - restrict to those with qEnd less than N
   -tStartMin=N - restrict to those with tStart at least N
   -tStartMax=N - restrict to those with tStart less than N
   -tEndMin=N - restrict to those with tEnd at least N
   -tEndMax=N - restrict to those with tEnd less than N
   -qOverlapStart=N - restrict to those where the query overlaps a region
      starting here
   -qOverlapEnd=N - restrict to those where the query overlaps a region
      ending here
   -tOverlapStart=N - restrict to those where the target overlaps a region
      starting here
   -tOverlapEnd=N - restrict to those where the target overlaps a region
      ending here
   -type=XXX - restrict to given type, may be repeated to allow several types
   -syn - do filtering based on synteny (tuned for human/mouse).
   -minTopScore=N - Minimum score for top level alignments. default 300000
   -minSynScore=N - Min syntenic block score (def=200,000).
      Default covers 27,000 bases including 9,000 aligning, a very
      stringent requirement.
   -minSynSize=N - Min syntenic block size (def=20,000).
   -minSynAli=N - Min syntenic alignment size (def=10,000).
   -maxFar=N - Max distance to allow synteny (def=200,000).
   -nonsyn - do inverse filtering based on synteny (tuned for human/mouse).
   -chimpSyn - do filtering based on synteny (tuned for human/chimp).
   -fill - Only pass fills, not gaps. Only useful with -line.
   -gap - Only pass gaps, not fills. Only useful with -line.
   -line - Do this a line at a time, not recursing
   -noRandom - suppress chains involving 'random' chromosomes
   -noHap - suppress chains involving chromosome names that include
      '_hap' or '_alt'

================================================================
======== netSplit ====================================
================================================================
netSplit - Split a genome net file into chromosome net files
usage:
   netSplit in.net outDir
options:
   -xxx=XXX

================================================================
======== netSyntenic ====================================
================================================================
### kent source version 362 ###
netSyntenic - Add synteny info to net.
usage:
   netSyntenic in.net out.net
options:
   -xxx=XXX

================================================================
======== netToAxt ====================================
================================================================
### kent source version 362 ###
netToAxt - Convert net (and chain) to axt.
usage:
   netToAxt in.net in.chain target.2bit query.2bit out.axt
note:
   directories full of .nib files (an older format)
   may also be used in place of target.2bit and query.2bit.
options:
   -qChain - net is with respect to the q side of chains.
   -maxGap=N - maximum size of gap before breaking. Default 100
   -gapOut=gap.tab - Output gap sizes to file
   -noSplit - Don't split chain when there is an insertion of another chain

================================================================
======== netToBed ====================================
================================================================
### kent source version 362 ###
netToBed - Convert target coverage of net to a bed file.
usage:
   netToBed in.net out.bed
options:
   -maxGap=N - break up at gaps of given size or more
   -minFill=N - only include fill of given size or above.

================================================================
======== newProg ====================================
================================================================
### kent source version 362 ###
newProg - make a new C source skeleton.
usage:
   newProg progName description words
This will make a directory 'progName' and a file in it 'progName.c'
with a standard skeleton
Options:
   -jkhgap - include jkhgap.a and mysql libraries as well as jkweb.a archives
   -cgi - create shell of a CGI script for web

================================================================
======== newPythonProg ====================================
================================================================
### kent source version 362 ###
newPythonProg - Make a skeleton for a new python program
usage:
   newPythonProg programName "The usage statement"
options:
   -xxx=XXX

================================================================
======== nibFrag ====================================
================================================================
### kent source version 362 ###
nibFrag - Extract part of a nib file as .fa (all bases/gaps lower case by default)
usage:
   nibFrag [options] file.nib start end strand out.fa
where strand is + (plus) or m (minus)
options:
   -masked       Use lower-case characters for bases meant to be masked out.
   -hardMasked   Use upper-case for not masked-out, and 'N' characters for
                 masked-out bases.
   -upper        Use upper-case characters for all bases.
   -name=name    Use given name after '>' in output sequence.
   -dbHeader=db  Add full database info to the header, with or without
                 -name option.
   -tbaHeader=db Format header for compatibility with tba, takes database
                 name as argument.
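
How the case-related nibFrag options above differ can be sketched in Python. This is an illustration only: the render function and the per-base mask list are hypothetical, the way nibFrag stores masking internally is not stated here, and the upper-casing of unmasked bases under -masked is an assumption read from the option text.

```python
from typing import List

MODES = ("default", "masked", "hardMasked", "upper")


def render(bases: str, masked: List[bool], mode: str = "default") -> str:
    """Render bases the way the option text describes each mode.

    The mode names mirror the nibFrag option names, but this function
    is a hypothetical illustration, not the tool's implementation.
    """
    if mode not in MODES:
        raise ValueError("unknown mode: " + mode)
    if len(bases) != len(masked):
        raise ValueError("bases and masked must have the same length")
    if mode == "default":
        return bases.lower()   # all bases lower case by default
    if mode == "upper":
        return bases.upper()   # -upper: upper case for all bases
    out = []
    for base, is_masked in zip(bases, masked):
        if mode == "masked":
            # -masked: lower case marks bases meant to be masked out
            out.append(base.lower() if is_masked else base.upper())
        else:
            # -hardMasked: 'N' replaces masked-out bases
            out.append("N" if is_masked else base.upper())
    return "".join(out)


seq = "acgtac"
mask = [False, False, True, True, False, False]
for mode in MODES:
    print(mode, render(seq, mask, mode))
```

With the toy input, the two middle bases are the masked ones, so the -masked rendering keeps them in lower case while the -hardMasked rendering replaces them with N.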

================================================================
======== nibSize ====================================
================================================================
### kent source version 362 ###
nibSize - print size of nibs
usage:
   nibSize nib1 [...]

================================================================
======== oligoMatch ====================================
================================================================
oligoMatch - find perfect matches in sequence.
usage:
   oligoMatch oligos sequence output.bed
where "oligos" and "sequence" can be .fa, .nib, or .2bit files.
The oligos may contain IUPAC codes.

================================================================
======== overlapSelect ====================================
================================================================
### kent source version 362 ###
wrong # args: overlapSelect [options] selectFile inFile outFile

Select records based on overlapping chromosome ranges. The ranges are
specified in the selectFile, with each block specifying a range.
Records are copied from the inFile to outFile based on the selection
criteria. Selection is based on blocks or exons rather than entire
range.

Options starting with -select* apply to selectFile and those starting
with -in* apply to inFile.

Options:
  -selectFmt=fmt - specify selectFile format:
      psl - PSL format (default for *.psl files).
      pslq - PSL format, using query instead of target
      genePred - genePred format (default for *.gp or *.genePred files).
      bed - BED format (default for *.bed files).
            If BED doesn't have blocks, the bed range is used.
      chain - chain file format (default for *.chain files)
      chainq - chain file format, using query instead of target
  -selectCoordCols=spec - selectFile is tab-separated with coordinates
      as described by spec, which is one of:
        o chromCol - chrom in this column followed by start and end.
        o chromCol,startCol,endCol,strandCol,name - chrom, start, end,
          and strand in specified columns.
          Columns can be omitted from the end or left empty to not specify.
      NOTE: column numbers are zero-based
  -selectCds - Use only CDS in the selectFile
  -selectRange - Use entire range instead of blocks from records in
      the selectFile.
  -inFmt=fmt - specify inFile format, same values as -selectFmt.
  -inCoordCols=spec - inFile is tab-separated with coordinates specified by
      spec, in format described above.
  -inCds - Use only CDS in the inFile
  -inRange - Use entire range instead of blocks of records in the inFile.
  -nonOverlapping - select non-overlapping instead of overlapping records
  -strand - must be on the same strand to be considered overlapping
  -oppositeStrand - must be on the opposite strand to be considered overlapping
  -excludeSelf - don't compare records with the same coordinates and name.
      Warning: using only one of -inCds or -selectCds will result in
      different coordinates for the same record.
  -idMatch - only select overlapping records if they have the same id
  -aggregate - instead of computing overlap based on individual select
      entries, compute it based on the total number of inFile bases
      overlapped by selectFile records. -overlapSimilarity and -mergeOutput
      will not work with this option.
  -overlapThreshold=0.0 - minimum fraction of an inFile record that
      must be overlapped by a single select record to be considered
      overlapping. Note that this is only coverage by a single select
      record, not total coverage.
  -overlapThresholdCeil=1.1 - select only inFile records with less than
      this amount of overlap with a single record, provided they are
      selected by other criteria.
  -overlapSimilarity=0.0 - minimum fraction of bases in inFile and
      selectFile records that overlap the same genomic locations. This is
      computed by (2*overlapBase)/(inFileBase+selectFileBases).
      Note that this is only coverage by a single select record and this
      is bidirectional: inFile and selectFile must overlap by this
      amount. A value of 1.0 will select identical records (or CDS if
      both CDS options are specified).
      Not currently supported with -aggregate.
  -overlapSimilarityCeil=1.1 - select only inFile records with less than
      this amount of similarity with a single record, provided they are
      selected by other criteria.
  -overlapBases=-1 - minimum number of bases of overlap, < 0 disables.
  -statsOutput - output overlap statistics instead of selected records.
      If no overlap criteria are specified, all overlapping entries are
      reported; otherwise only the pairs passing the criteria are
      reported. This results in a tab-separated file with the columns:
         inId selectId inOverlap selectOverlap overBases
      Where inOverlap is the fraction of the inFile record overlapped by
      the selectFile record and selectOverlap is the fraction of the
      select record overlapped by inFile records. With -aggregate, output
      is:
         inId inOverlap inOverBases inBases
  -statsOutputAll - like -statsOutput, however output all inFile records,
      including those that are not overlapped.
  -statsOutputBoth - like -statsOutput, however output all selectFile and
      inFile records, including those that are not overlapped.
  -mergeOutput - output file will be a merge of the input file with the
      selectFile records that selected it. The format is
         inRec<tab>selectRec.
      If multiple select records hit, inRec is repeated. This will increase
      the memory required. Not supported with -nonOverlapping or -aggregate.
  -idOutput - output a tab-separated file of pairs of
         inId selectId
      with -aggregate, only a single column of inId is written
  -dropped=file - output rows that were dropped to this file.
  -verbose=n - verbose > 1 prints some details.

================================================================
======== para ====================================
================================================================
### kent source version 362 ###
para - version 12.18
Manage a batch of jobs in parallel on a compute cluster.
Normal usage is to do a 'para create' followed by 'para push' until
job is done. Use 'para check' to check status.
usage:

   para [options] command [command-specific arguments]

The commands are:

para create jobList
   This makes the job-tracking database from a text file with the
   command line for each job on a separate line.
   options:
      -cpu=N  Number of CPUs used by the jobs, default 1.
      -ram=N  Number of bytes of RAM used by the jobs.
         Default is RAM on node divided by number of cpus on node.
         Shorthand expressions allow t,g,m,k for tera, giga, mega, kilo.
         e.g. 4g = 4 Gigabytes.
      -batch=batchDir - specify the directory path that is used to store the
         batch control files. The batchDir can be an absolute path or a path
         relative to the current directory. The resulting path is used as the
         batch name. The directory is created if it doesn't exist. When
         creating a new batch, batchDir should not have been previously used
         as a batch name. The batchDir must be writable by the paraHub
         process. This does not affect the working directory assigned to
         jobs. It defaults to the directory where para is run. If used, this
         option must be specified on all para commands for the batch. For
         example to run two batches in the same directory:
            para -batch=b1 make jobs1
            para -batch=b2 make jobs2
para push
   This pushes forward the batch of jobs by submitting jobs to parasol.
   It will limit parasol queue size to something not too big and
   retry failed jobs.
   options:
      -retries=N  Number of retries per job - default 4.
      -maxQueue=N  Number of jobs to allow on parasol queue.
         Default 2000000.
      -minPush=N  Minimum number of jobs to queue.
         Default 1. Overrides maxQueue.
      -maxPush=N  Maximum number of jobs to queue - default 100000.
      -warnTime=N  Number of minutes job runs before hang warning.
         Default 4320 (3 days).
      -killTime=N  Number of minutes hung job runs before push kills it.
         By default kill off for backwards compatibility.
      -delayTime=N  Number of seconds to delay before submitting next job
         to minimize i/o load at startup - default 0.
      -priority=x  Set batch priority to high, medium, or low.
         Default medium (use high only with approval).
         If needed, use with make, push, create, shove, or try.
         Or, set batch priority to a specific numeric value - default 10.
         1 is emergency high priority,
         10 is normal medium,
         100 is low for bottomfeeders.
         Setting priority higher than normal (1-9) will be logged.
         Please keep low priority jobs short, they won't be pre-empted.
      -maxJob=x  Limit the number of jobs the batch can run.
         Specify number of jobs, for example 10 or 'unlimited'.
         Default unlimited displays as -1.
      -jobCwd=dir - specify the directory path to use as the current working
         directory for each job. The dir can be an absolute path or a path
         relative to the current directory. It defaults to the directory
         where para is run.
para try
   This is like para push, but only submits up to 10 jobs.
para shove
   Push jobs in this database until all are done or one fails after N retries.
para make jobList
   Create database and run all jobs in it if possible. If one job
   fails repeatedly this will fail. Suitable for inclusion in makefiles.
   Same as a 'create' followed by a 'shove'.
para check
   This checks on the progress of the jobs.
para stop
   This stops all the jobs in the batch.
para chill
   Tells system to not launch more jobs in this batch, but
   does not stop jobs that are already running.
para finished
   List jobs that have finished.
para hung
   List hung jobs in the batch (running > killTime).
para slow
   List slow jobs in the batch (running > warnTime).
para crashed
   List jobs that crashed or failed output checks the last time they were run.
para failed
   List jobs that crashed after repeated restarts.
para status
   List individual job status, including times.
para problems
   List jobs that had problems (even if successfully rerun).
   Includes host info.
para running
   Print info on currently running jobs.
para hippos time
   Print info on currently running jobs taking > 'time' (minutes) to run.
para time
   List timing information.
para recover jobList newJobList
   Generate a job list by selecting jobs from an existing list where
   the `check out' tests fail.
para priority 999
   Set batch priority. Values explained under 'push' options above.
para maxJob 999
   Set batch maxJob. Values explained under 'push' options above.
para ram 999
   Set batch ram usage. Values explained under 'push' options above.
para cpu 999
   Set batch cpu usage. Values explained under 'push' options above.
para resetCounts
   Set batch done and crash counters to 0.
para flushResults
   Flush results file. Warns if batch has jobs queued or running.
para freeBatch
   Free all batch info on hub. Works only if batch has nothing queued
   or running.
para showSickNodes
   Show sick nodes which have failed when running this batch.
para clearSickNodes
   Clear sick nodes statistics and consecutive crash counts of batch.

Common options
   -verbose=1 - set verbosity level.

================================================================
======== paraFetch ====================================
================================================================
### kent source version 362 ###
paraFetch - try to fetch url with multiple connections
usage:
   paraFetch N R URL {outPath}
where N is the number of connections to use
      R is the number of retries
   outPath is optional. If not specified, it will attempt to parse URL
   to discover output filename.
options:
   -newer  only download a file if it is newer than the version we already have.
   -progress  Show progress of download.
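
The optional outPath rule above can be sketched in Python. This is only an illustration under assumptions: default_out_path and build_command are hypothetical helpers, the example URL is made up, and paraFetch's real URL parsing rules are not given in the usage text, so taking the last path component is a guess at the idea rather than the tool's behavior.

```python
import posixpath
from typing import List
from urllib.parse import urlsplit


def default_out_path(url: str) -> str:
    """Guess an output filename from the last path component of a URL.

    Hypothetical helper; the actual rule used by paraFetch may differ.
    """
    name = posixpath.basename(urlsplit(url).path)
    if not name:
        raise ValueError("cannot derive a file name from " + url)
    return name


def build_command(connections: int, retries: int, url: str,
                  out_path: str = "") -> List[str]:
    """Assemble the argument list for 'paraFetch N R URL {outPath}'."""
    args = ["paraFetch", str(connections), str(retries), url]
    if out_path:
        args.append(out_path)  # outPath is optional
    return args


url = "http://example.org/data/genome.2bit"  # made-up URL for illustration
print(default_out_path(url))
print(build_command(4, 2, url))
```

Passing outPath explicitly avoids depending on whatever name the tool derives from the URL, which matters for URLs that end in a slash or carry a query string.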

================================================================
======== paraHub ====================================
================================================================
### kent source version 362 ###
paraHub - parasol hub server version 12.18
usage:
   paraHub machineList
Where machine list is a file with the following columns:
   name - Network name
   cpus - Number of CPUs we can use
   ramSize - Megabytes of memory
   tempDir - Location of (local) temp dir
   localDir - Location of local data dir
   localSize - Megabytes of local disk
   switchName - Name of switch this is on

options:
   -spokes=N  Number of processes that feed jobs to nodes - default 30.
   -jobCheckPeriod=N  Minutes between checking on job - default 10.
   -machineCheckPeriod=N  Minutes between checking on machine - default 20.
   -subnet=XXX.YYY.ZZZ  Only accept connections from subnet (example 192.168).
   -nextJobId=N  Starting job ID number.
   -logFacility=facility  Log to the specified syslog facility - default local0.
   -logMinPriority=pri  minimum syslog priority to log, also filters file
      logging. defaults to "warn"
   -log=file  Log to file instead of syslog.
   -debug  Don't daemonize
   -noResume  Don't try to reconnect with jobs running on nodes.
   -ramUnit=N  Number of bytes of RAM in the base unit used by the jobs.
      Default is RAM on node divided by number of cpus on node.
      Shorthand expressions allow t,g,m,k for tera, giga, mega, kilo.
      e.g. 4g = 4 Gigabytes.
   -defaultJobRam=N  Number of ram units used if a job has no specified
      ram usage. Defaults to 1.

================================================================
======== paraHubStop ====================================
================================================================
paraHubStop - version 12.18
Shut down paraHub daemon.
usage:
   paraHubStop now

================================================================
======== paraNode ====================================
================================================================
### kent source version 362 ###
paraNode - version 12.18
Parasol node server.
usage:
   paraNode start
options:
   -logFacility=facility  Log to the specified syslog facility - default local0.
   -logMinPriority=pri  minimum syslog priority to log, also filters file
      logging. defaults to "warn"
   -log=file  Log to file instead of syslog.
   -debug  Don't daemonize
   -hub=host  Restrict access to connections from hub.
   -umask=000  Set umask to run under - default 002.
   -userPath=bin:bin/i386  User dirs to add to path.
   -sysPath=/sbin:/local/bin  System dirs to add to path.
   -env=name=value - add environment variable to jobs. May be repeated.
   -randomDelay=N  Up to this many milliseconds of random delay before
      starting a job. This is mostly to avoid swamping NFS with
      file opens when loading up an idle cluster. Also it limits
      the impact on the hub of very short jobs. Default 5000.
   -cpu=N  Number of CPUs to use - default 1.

================================================================
======== paraNodeStart ====================================
================================================================
### kent source version 362 ###
paraNodeStart - version 12.18
Start up parasol node daemons on a list of machines.
usage:
   paraNodeStart machineList
where machineList is a file containing a list of hosts.
Machine list contains the following columns:
   <name> <number of cpus>
It may have other columns as well.
options:
   -exe=/path/to/paraNode
   -logFacility=facility  Log to the specified syslog facility - default local0.
   -logMinPriority=pri  minimum syslog priority to log, also filters file
      logging. defaults to "warn"
   -log=file  Log to file instead of syslog.
   -umask=000  Set umask to run under - default 002.
   -randomDelay=N  Set random start delay in milliseconds - default 5000.
   -userPath=bin:bin/i386  User dirs to add to path.
   -sysPath=/sbin:/local/bin  System dirs to add to path.
   -env=name=value - add environment variable to jobs. May be repeated.
   -hub=machineHostingParaHub  Nodes will ignore messages from elsewhere.
   -rsh=/path/to/rsh/like/command.

================================================================
======== paraNodeStatus ====================================
================================================================
paraNodeStatus - version 12.18
Check status of paraNode on a list of machines.
usage:
   paraNodeStatus machineList
options:
   -retries=N  Number of retries to get in touch with machine.
      The first retry is after 1/100th of a second.
      Each retry after that takes twice as long up to a maximum
      of 1 second per retry. Default is 7 retries and takes
      about a second.
   -long  List details of current and recent jobs.

================================================================
======== paraNodeStop ====================================
================================================================
Couldn't open -verbose=2 , No such file or directory

================================================================
======== paraSync ====================================
================================================================
### kent source version 362 ###
paraSync 1.0
paraSync - uses paraFetch to recursively mirror url to given path
usage:
   paraSync {options} N R URL outPath
where
   N is the number of connections to use
   R is the number of retries
options:
   -A='ext1,ext2'  means accept only files with ext1 or ext2
   -newer  only download a file if it is newer than the version we already have.
   -progress  Show progress of download.

================================================================
======== paraTestJob ====================================
================================================================
paraTestJob - version 12.18
A good test job to run on Parasol. Can be configured to take a long time
or crash.
usage:
   paraTestJob count
Run a relatively time consuming algorithm count times.
This algorithm takes about 1/10 of a second each time.
options:
   -crash  Try to write to NULL when done.
   -err  Return -1 error code when done.
   -output=file  Make some output in file as well.
   -heavy=n  Make output heavy: n extra lumberjack lines.
   -input=file  Make it read in a file too.
   -sleep=n  Sleep for N seconds.

================================================================
======== parasol ====================================
================================================================
Parasol version 12.18
Parasol is the name given to the overall system for managing jobs on
a computer cluster and to this specific command. This command is
intended primarily for system administrators. The 'para' command
is the primary command for users.
Usage in brief:
   parasol add machine machineFullHostName localTempDir - Add new machine to pool.
    or
   parasol add machine machineFullHostName cpus ramSizeMB localTempDir localDir localSizeMB switchName
   parasol remove machine machineFullHostName "reason why" - Remove machine from pool.
   parasol check dead - Check machines marked dead ASAP, some have been fixed.
   parasol add spoke - Add a new spoke daemon.
   parasol [options] add job command-line - Add job to list.
         options:
            -in=in - Where to get stdin, default /dev/null
            -out=out - Where to put stdout, default /dev/null
            -wait - If set wait for job to finish to return and return
               with job status code
            -err=outFile - set stderr to out file - only works with wait flag
            -verbose=N - set verbosity level, default level is 1
            -printId - prints jobId to stdout
            -dir=dir - set output results dir, default is current dir
            -results=resultFile  fully qualified path to the results file,
               or `results' in the current directory if not specified.
            -cpu=N  Number of CPUs used by the jobs, default 1.
            -ram=N  Number of bytes of RAM used by the jobs.
               Default is RAM on node divided by number of cpus on node.
               Shorthand expressions allow t,g,m,k for tera, giga, mega, kilo.
               e.g. 4g = 4 Gigabytes.
   parasol [options] clear sick - Clear sick stats on a batch.
         options:
            -results=resultFile  fully qualified path to the results file,
               or `results' in the current directory if not specified.
   parasol remove job id - Remove job of given ID.
   parasol ping [count] - Ping hub server to make sure it's alive.
   parasol remove jobs userName [jobPattern] - Remove jobs submitted by user
      that match jobPattern (which may include ? and * escaped for shell).
   parasol list machines - List machines in pool.
   parasol [-extended] list jobs - List jobs one per line.
   parasol list users - List users one per line.
   parasol [options] list batches - List batches one per line.
         option - 'all' if set include inactive
   parasol list sick - List nodes considered sick by all running batches,
      one per line.
   parasol status - Summarize status of machines, jobs, and spoke daemons.
   parasol [options] pstat2 - Get status of jobs queued and running.
         options:
            -results=resultFile  fully qualified path to the results file,
               or `results' in the current directory if not specified.
   parasol flushResults
         Flush results file. Warns if batch has jobs queued or running.
         options:
            -results=resultFile  fully qualified path to the results file,
               or `results' in the current directory if not specified.
options:
   -host=hostname - connect to a paraHub process on a remote host instead
      of localhost.
Important note:
   Options must precede positional arguments

================================================================
======== positionalTblCheck ====================================
================================================================
### kent source version 362 ###
positionalTblCheck - check that positional tables are sorted
usage:
   positionalTblCheck db table
options:
   -verbose=n  n>=2, print tables as checked
This will check sorting of a table in a variety of formats.
It looks for commonly used names for chrom and chrom start columns.
It also handles split tables

================================================================
======== pslCDnaFilter ====================================
================================================================
### kent source version 362 ###
wrong # of args:
   pslCDnaFilter [options] inPsl outPsl

Filter cDNA alignments in psl format. Filtering criteria are comparative,
selecting near best in genome alignments for each given cDNA and
non-comparative, based only on the quality of an individual alignment.

WARNING: comparative filters require that the input is sorted by
query name. The command: 'sort -k 10,10' will do the trick.

Each alignment is assigned a score that is based on identity and
weighted towards longer alignments and those with introns. This
can do either global or local best-in-genome selection. Local
near best in genome keeps fragments of an mRNA that align in
discontinuous locations from other fragments. It is useful for
unfinished genomes. Global near best in genome keeps alignments
based on overall score.

Options:
   -algoHelp - print message describing the filtering algorithm.
   -localNearBest=-1.0 - local near best in genome filtering,
      keeping alignments within this fraction of the top score for
      each aligned portion of the mRNA. A value of zero keeps only
      the best for each fragment. A value of -1.0 disables (default).
   -globalNearBest=-1.0 - global near best in genome filtering,
      keeping alignments within this fraction of the top score. A value
      of zero keeps only the best alignment. A value of -1.0 disables
      (default).
   -ignoreNs - don't include Ns (repeat masked) while calculating the
      score and coverage. That is, treat them as unaligned rather than
      mismatches. Ns are still counted as mismatches when calculating
      the identity.
   -ignoreIntrons - don't favor apparent introns when scoring.
   -minId=0.0 - only keep alignments with at least this fraction
      identity.
   -minCover=0.0 - minimum fraction of query that must be aligned.
      If -polyASizes is specified and the query is in the file, the
      poly-A is not included in coverage calculation.
   -decayMinCover - the minimum coverage is calculated per alignment
      from the query size using the formula:
         minCoverage = 1.0 - qSize / 250.0
      and minCoverage is bounded between 0.25 and 0.9.
   -minSpan=0.0 - keep only alignments whose target length is at least
      this fraction of the longest alignment passing the other filters.
      This can be useful for removing possible retroposed genes.
   -minQSize=0 - drop queries shorter than this size
   -minAlnSize=0 - minimum number of aligned bases. This includes
      repeats, but excludes poly-A/poly-T bases if available.
   -minNonRepSize=0 - Minimum number of matching bases that are not
      repeats. This does not include mismatches.
      Must use -repeats on BLAT if doing unmasked alignments.
   -maxRepMatch=1.0 - Maximum fraction of matching bases that are
      repeats. Must use -repeats on BLAT if doing unmasked alignments.
   -repsAsMatch - treat matches in repeats just like other matches
   -maxAlignsDrop=-1 - maximum number of alignments for a given query. If
      exceeded, then all alignments of this query are dropped.
      A value of -1 disables (default)
   -maxAligns=-1 - maximum number of alignments for a given query. If
      exceeded, then alignments are sorted by score and only this number
      will be saved. A value of -1 disables (default)
   -polyASizes=file - tab separated file with information about poly-A
      tails and poly-T heads. Format is outputted by faPolyASizes:
         id seqSize tailPolyASize headPolyTSize
   -usePolyTHead - if a poly-T head was detected and is longer than the
      poly-A tail, it is used when calculating coverage instead of the
      poly-A tail.
   -bestOverlap - filter overlapping alignments, keeping the best of
      alignments that are similar. This is designed to be used with
      overlapping, windowed alignments, where one alignment might be
      truncated. Does not discard ones with weird overlap unless
      -filterWeirdOverlapped is specified.
   -hapRegions=psl - PSL format alignments of each haplotype
      pseudo-chromosome to the corresponding reference chromosome region.
      This is used to map alignments between regions.
   -dropped=psl - save psls that were dropped to this file.
   -weirdOverlapped=psl - output weirdly overlapping PSLs to
      this file.
   -filterWeirdOverlapped - Filter weirdly overlapped alignments, keeping
      the single highest scoring one or an arbitrary one if multiple with
      the same high score.
   -alignStats=file - output the per-alignment statistics to this file
   -uniqueMapped - keep only cDNAs that are uniquely aligned after all
      other filters have been applied.
   -noValidate - don't run pslCheck validation.
   -statsOut=file - write filtering stats to this file, overrides -verbose=1
   -verbose=1 - 0: quiet
                1: output stats, unless -statsOut is specified
                2: list problem alignment (weird or invalid)
                3: list dropped alignments and reason for dropping
                4: list kept psl and info
                5: info about all PSLs
   -hapRefMapped=psl - output PSLs of haplotype to reference chromosome
      cDNA alignments mappings (for debugging purposes).
   -hapRefCDnaAlns=psl - output PSLs of haplotype cDNA to reference cDNA
      alignments (for debugging purposes).
   -hapLociAlns=outfile - output grouping of final alignments created by
      haplotype mapping process. Each row will start with an integer
      haplotype group id number followed by a PSL record. All rows with
      the same id are alignments of a given cDNA that were determined
      to be haplotypes of the same locus. Alignments that are not part of
      a haplotype locus are not included.
   -alnIdQNameMode - add internal assigned alignment numbers to cDNA names
      on output. Useful for debugging, as they are included in the verbose
      tracing as [#1], etc. Will make a mess of normal production usage.
   -blackList=file.txt - adds a list of accession ranges to a black list.
      Any accession on this list is dropped.
Black list file is two columns where the first column is the beginning of the range, and the second column is the end of the range, inclusive. The default options don't do any filtering. If no filtering criteria are specified, all PSLs will be passed through, except those that are internally inconsistent. THE INPUT MUST BE SORTED BY QUERY for the comparative filters. ================================================================ ======== pslCat ==================================== ================================================================ pslCat - concatenate psl files usage: pslCat file(s) options: -check parses input. Detects more errors but slower -nohead omit psl header -dir files are directories (concatenate all in dirs) -out=file put output to file rather than stdout -ext=.xxx limit files in directories to those with extension ================================================================ ======== pslCheck ==================================== ================================================================ ### kent source version 362 ### pslCheck - validate PSL files usage: pslCheck fileTbl(s) options: -db=db - get targetSizes from this database, and if file doesn't exist, look for a table in this database. -prot - confirm psls are protein psls -noCountCheck - don't validate that match/mismatch counts match the total size of the alignment blocks -pass=pslFile - write PSLs without errors to this file -fail=pslFile - write PSLs with errors to this file -targetSizes=sizesFile - tab file with columns of target and size. If specified, psl is checked to have a valid target and target coordinates. -skipInsertCounts - Don't validate insert counts. Useful for BLAT protein PSLs where these are not computed consistently. -querySizes=sizesFile - file with query sizes. 
-ignoreQUniq - ignore everything after the last `-' in the qName field, that is sometimes used to generate a unique identifier -quiet - don't write error messages, just filter ================================================================ ======== pslDropOverlap ==================================== ================================================================ pslDropOverlap - deletes all overlapping self alignments. usage: pslDropOverlap in.psl out.psl ================================================================ ======== pslFilter ==================================== ================================================================ pslFilter - filter out psl file pslFilter in.psl out.psl options -dir Input files are directories rather than single files -reward=N (default 1) Bonus to score for match -cost=N (default 1) Penalty to score for mismatch -gapOpenCost=N (default 4) Penalty for gap opening -gapSizeLogMod=N (default 1.00) Penalty for gap sizes -minScore=N (default 15) Minimum score to pass filter -minMatch=N (default 30) Min match (including repeats) to pass -minUniqueMatch (default 20) Min non-repeats to pass -maxBadPpt (default 700) Maximum divergence in parts per thousand -minAli (default 600) Minimum ratio query in alignment in ppt -noHead Don't output psl header -minAliT (default 0) Like minAli for target ================================================================ ======== pslHisto ==================================== ================================================================ ### kent source version 362 ### wrong # of args: pslHisto [options] what inPsl outHisto Collect counts on PSL alignments for making histograms. These can then be analyzed with R, textHistogram, etc. The 'what' argument determines what data to collect, the following are currently supported: o alignsPerQuery - number of alignments per query. Output is one line per query with the number of alignments. 
o coverSpread - difference between the highest and lowest coverage for alignments of a query. Output line per query, with the difference. Only includes queries with multiple alignments o idSpread - difference between the highest and lowest fraction identity for alignments of a query. Output line per query, with the difference. Options: -multiOnly - omit queries with only one alignment from output. -nonZero - omit queries with zero values. ================================================================ ======== pslLiftSubrangeBlat ==================================== ================================================================ ### kent source version 362 ### pslLiftSubrangeBlat - lift PSLs from blat subrange alignments usage: pslLiftSubrangeBlat inPsl outPsl Lift a PSL with target coordinates from a blat subrange query (e.g. blah/hg18.2bit:chr1:1000-20000) which has subrange coordinates as the target name (e.g. chr1:1000-200000) to actual target coordinates. options: -tSizes=szfile - lift target side based on tName, using target sizes from this tab separated file. -qSizes=szfile - lift query side based on qName, using query sizes from this tab separated file. Must specify at least one of -tSizes or -qSizes, or both. ================================================================ ======== pslMap ==================================== ================================================================ ### kent source version 362 ### Error: wrong number of arguments pslMap - map PSLs alignments to new targets using alignments of the old target to the new target. Given inPsl and mapPsl, where the target of inPsl is the query of mapPsl, create a new PSL with the query of inPsl aligned to all the targets of mapPsl. If inPsl is a protein to nucleotide alignment and mapPsl is a nucleotide to nucleotide alignment, the resulting alignment is nucleotide to nucleotide alignment of a hypothetical mRNA that would code for the protein. 
This is useful as it gives base alignments of spliced codons. A chain file may be used instead of mapPsl. usage: pslMap [options] inPsl mapFile outPsl Options: -chainMapFile - mapFile is a chain file instead of a psl file -swapMap - swap query and target sides of map file. -swapIn - swap query and target sides of inPsl file. -suffix=str - append str to the query ids in the output alignment. Useful with protein alignments, where the result is not actually an alignment of the protein. -keepTranslated - if either psl is translated, the output psl will be translated (both strands explicit). Normally an untranslated psl will always be created -mapFileWithInQName - The first column of the mapFile PSL records are a qName, the remainder is a standard PSL. When an inPsl record is mapped, only mapping records are used with the corresponding qName. -mapInfo=file - output a file with information about each mapping. The file has the following columns: o srcQName, srcQStart, srcQEnd, srcQSize - qName, etc of psl being mapped (source alignment) o srcTName, srcTStart, srcTEnd - tName, etc of psl being mapped o srcStrand - strand of psl being mapped o srcAligned - number of aligned bases in psl being mapped o mappingQName, mappingQStart, mappingQEnd - qName, etc of mapping psl used to map alignment o mappingTName, mappingTStart, mappingTEnd - tName, etc of mapping psl o mappingStrand - strand of mapping psl o mappingId - chain id, or psl file row o mappedQName, mappedQStart, mappedQEnd - qName, etc of mapped psl o mappedTName, mappedTStart, mappedTEnd - tName, etc of mapped psl o mappedStrand - strand of mapped psl o mappedAligned - number of aligned bases that were mapped o qStartTrunc - aligned bases at qStart not mapped due to mapping psl/chain not covering the entire source psl. This is from the start of the query in the positive direction. o qEndTrunc - similarly for qEnd If the psl could not be mapped, the mapping* and mapped* columns are empty. 
-mappingPsls=pslFile - write mapping alignments that were used in PSL format to this file. Transformations that were done, such as -swapMap, will be reflected in this file. There will be a one-to-one correspondence of rows of this file to rows of the outPsl file. -simplifyMappingIds - simplify mapping ids (inPsl target name and mapFile query name) before matching them. This first drops everything after the last `-', and then drops everything after the last remaining `.'. -verbose=n - verbose output 2 - show each overlap and the mapping ================================================================ ======== pslMapPostChain ==================================== ================================================================ ### kent source version 362 ### wrong # of args: postTransMapChain [options] inPsl outPsl Post genomic pslMap (TransMap) chaining. This takes transcripts that have been mapped via genomic chains and adds back in blocks that didn't get included in genomic chains due to complex rearrangements or other issues. This program has not seen much use and may not do what you want ================================================================ ======== pslMrnaCover ==================================== ================================================================ pslMrnaCover - Make histogram of coverage percentage of mRNA in psl. usage: pslMrnaCover mrna.psl mrna.fa options: -minSize=N - default 100. 
Minimum size of mRNA considered -listZero=zero.tab - List accessions that don't align in zero.tab ================================================================ ======== pslPairs ==================================== ================================================================ pslPairs - join paired ends in psl alignments usage: pslPairs <pslFile> <pairFile> <pslTableName> <outFilePrefix> creates: <outFilePrefix>.pairs file pslFile - filtered psl alignments of ends from kluster run pairFile - three column tab separated: forward reverse cloneId - forward and reverse columns can be comma separated end ids pslTableName - table name the psl alignments have been loaded into outFilePrefix - prefix used for each output file name Options: -max=N - maximum length of clone sequence (default=47000) -min=N - minimum length of clone sequence (default=32000) -slopval=N - deviation from max/min clone lengths allowed for slop report - (default=5000) -nearTop=N - maximum deviation from best match allowed (default=0.001) -minId=N - minimum pct ID of at least one end (default=0.96) -minOrphanId=N - minimum pct ID for orphan alignment (default=0.96) -tInsert=N - maximum insert bases allowed in sequence alignment - (default=500) -hardMax=N - absolute maximum clone length for long report (default=75000) -verbose - display all informational messages -noBin - do not include bin column in output file -noRandom - do not include placements on random portions - {length(chr name) < 7} -slop - create <outFilePrefix>.slop file of pairs that fall within - slop length -short - create <outFilePrefix>.short file of pairs shorter than - min size -long - create <outFilePrefix>.long file of pairs longer than - max size, but less than hardMax size -mismatch - create <outFilePrefix>.mismatch file of pairs with - bad orientation of ends -orphan - create <outFilePrefix>.orphan file of unmatched end sequences ================================================================ ======== pslPartition 
==================================== ================================================================ ### kent source version 362 ### Error: wrong # args pslPartition - split PSL files into non-overlapping sets usage: pslPartition [options] pslFile outDir Split psl files into non-overlapping sets for use in cluster jobs, limiting memory usage, etc. Multiple levels of directories are created under outDir to prevent slow access to huge directories. The pslFile may be compressed and no ordering is assumed. options: -outLevels=0 - number of output subdirectory levels. 0 puts all files directly in outDir, 2 will create files in the form outDir/0/0/00.psl -partSize=20000 - will combine non-overlapping partitions, while attempting to keep them under this number of PSLs. This reduces the number of files that are created while ensuring that there are no overlaps between any two PSL files. A value of 0 creates a PSL file per set of overlapping PSLs. -dropContained - drop PSLs that are completely contained in a block of another PSL. ================================================================ ======== pslPosTarget ==================================== ================================================================ ### kent source version 362 ### pslPosTarget - flip psl strands so target is positive and implicit usage: pslPosTarget inPsl outPsl ================================================================ ======== pslPretty ==================================== ================================================================ pslPretty - Convert PSL to human-readable output usage: pslPretty in.psl target.lst query.lst pretty.out options: -axt Save in format like Scott Schwartz's axt format. Note gaps in both sequences are still allowed in the output, which not all axt readers will expect. -dot=N Output a dot every N records. -long Don't abbreviate long inserts. -check=fileName Output alignment checks to filename. 
It's recommended that the psl file be sorted by target if it contains multiple targets; otherwise, this will be extremely slow. The target and query lists can be fasta, 2bit or nib files, or a list of these files, one per line. ================================================================ ======== pslRc ==================================== ================================================================ ### kent source version 362 ### wrong # args: pslRc [options] inPsl outPsl reverse-complement psl Options: ================================================================ ======== pslRecalcMatch ==================================== ================================================================ ### kent source version 362 ### pslRecalcMatch - Recalculate match,mismatch,repMatch columns in psl file. This can be useful if the psl went through pslMap, or if you've added lower-case repeat masking after the fact usage: pslRecalcMatch in.psl targetSeq querySeq out.psl where targetSeq is either a nib directory or a two bit file and querySeq is a fasta file, nib file, two bit file, or list of such files. The psl's should be simple non-translated ones. This will work faster if the in.psl is sorted on target. options: -ignoreQUniq - ignore everything after the last `-' in the qName field, that is sometimes used to generate a unique identifier -ignoreQMissing - pass through the record if querySeq doesn't include qName ================================================================ ======== pslReps ==================================== ================================================================ ### kent source version 362 ### pslReps - Analyze repeats and generate genome-wide best alignments from a sorted set of local alignments usage: pslReps in.psl out.psl out.psr where: in.psl is an alignment file generated by psLayout and sorted by pslSort out.psl is the best alignment output out.psr contains repeat info options: -nohead Don't add PSL header. 
-ignoreSize Will not weigh as much in favor of larger alignments. -noIntrons Will not penalize for not having introns when calculating size factor. -singleHit Takes single best hit, not splitting into parts. -minCover=0.N Minimum coverage to output. Default is 0. -ignoreNs Ignore Ns when calculating minCover. -minAli=0.N Minimum alignment ratio. Default is 0.93. -nearTop=0.N How much can deviate from top and be taken. Default is 0.01. -minNearTopSize=N Minimum size of alignment that is near top for alignment to be kept. Default 30. -coverQSizes=file Tab-separated file with effective query sizes. When used with -minCover, this allows polyAs to be excluded from the coverage calculation. ================================================================ ======== pslScore ==================================== ================================================================ ### kent source version 362 ### pslScore - calculate web blat score from psl files usage: pslScore <file.psl> [moreFiles.psl] options: none at this time columns in output: #tName tStart tEnd qName:qStart-qEnd score percentIdentity ================================================================ ======== pslSelect ==================================== ================================================================ ### kent source version 362 ### pslSelect - select records from a PSL file. 
usage: pslSelect [options] inPsl outPsl Must specify a selection option Options: -qtPairs=file - file is tab-separated qName and tName pairs to select -qPass - pass all PSLs with queries that do not appear in qtPairs file at all (default is to remove all PSLs for queries that are not in file) -queries=file - file has qNames to select -queryPairs=file - file is tab-separated pairs of qNames to select with new qName to substitute in output file -qtStart=file - file is tab-separated rows of qName,tName,tStart -qDelim=char - use only the part of the query name before this character ================================================================ ======== pslSomeRecords ==================================== ================================================================ ### kent source version 362 ### pslSomeRecords - Extract multiple psl records usage: pslSomeRecords pslIn listFile pslOut where: pslIn is the input psl file listFile is a file with a qName (rna accession usually) on each line pslOut is the output psl file options: -not - include psl if name is NOT in list -tToo - if set, the list file is two column, qName and tName. In this case only records matching on both q and t are output ================================================================ ======== pslSort ==================================== ================================================================ pslSort - Merge and sort psCluster .psl output files usage: pslSort dirs[1|2] outFile tempDir inDir(s)OrFile(s) This will sort all of the .psl input files or those in the directories inDirs in two stages - first into temporary files in tempDir and second into outFile. The device on tempDir must have enough space (typically 15-20 gigabytes if processing whole genome). pslSort g2g[1|2] outFile tempDir inDir(s) This will sort a genome-to-genome alignment, reflecting the alignments across the diagonal. 
Adding 1 or 2 to the dirs or g2g option will limit the program to only the first or second pass respectively of the sort. options: -nohead Do not write psl header. -verbose=N Set verbosity level, higher for more output. Default is 1. ================================================================ ======== pslStats ==================================== ================================================================ ### kent source version 362 ### pslStats - collect statistics from a psl file. usage: pslStats [options] psl statsOut Options: -queryStats - output per-query statistics, the default is per-alignment stats -overallStats - output overall statistics. -queries=querySizeFile - tab-separated file of expected qNames and sizes. If specified, statistics will include queries that didn't align. ================================================================ ======== pslSwap ==================================== ================================================================ ### kent source version 362 ### wrong # args: pslSwap [options] inPsl outPsl Swap target and query in psls Options: -noRc - don't reverse complement untranslated alignments to keep target positive strand. This will make the target strand explicit. ================================================================ ======== pslToBed ==================================== ================================================================ ### kent source version 362 ### pslToBed: transform a psl format file to a bed format file. usage: pslToBed psl bed options: -cds=cdsFile cdsFile specifies an input cds tab-separated file which contains genbank-style CDS records showing cdsStart..cdsEnd e.g. 
NM_123456 34..305 These coordinates are assumed to be in the query coordinate system of the psl, like those that are created from genePredToFakePsl -posName changes the qName field to qName:qStart-qEnd (can be used to create links to query position on details page) ================================================================ ======== pslToBigPsl ==================================== ================================================================ ### kent source version 362 ### pslToBigPsl - converts psl to bigPsl input (bed format with extra fields) usage: pslToBigPsl file.psl stdout | sort -k1,1 -k2,2n > file.bigPslInput options: -cds=file.cds -fa=file.fasta NOTE: to build bigBed: bedToBigBed -type=bed12+13 -tab -as=bigPsl.as file.bigPslInput chrom.sizes output.bb ================================================================ ======== pslToChain ==================================== ================================================================ ### kent source version 362 ### pslToChain - Convert psl records to chain records usage: pslToChain pslIn chainOut options: -xxx=XXX ================================================================ ======== pslToPslx ==================================== ================================================================ ### kent source version 362 ### pslToPslx - Convert from psl to pslx format, which includes sequences usage: pslToPslx [options] in.psl qSeqSpec tSeqSpec out.pslx qSeqSpec and tSeqSpec can be nib directory, a 2bit file, or a FASTA file. FASTA files should end in .fa, .fa.gz, .fa.Z, or .fa.bz2 and are read into memory. Options: -masked - if specified, repeats are in lower case, otherwise the entire sequence is lower case. 
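The pslToBed usage above describes two small text conventions: the cdsFile record (a name, a tab, then cdsStart..cdsEnd, e.g. NM_123456 34..305) and the qName:qStart-qEnd label produced by -posName. The following Python sketch only illustrates those two formats; it is not the pslToBed implementation. The function names parse_cds_line and pos_name are hypothetical, and the sketch assumes the simple start..end form only (no partial markers or joined ranges).

```python
# Illustrative sketch only; not how pslToBed is implemented.
# parse_cds_line and pos_name are hypothetical helper names.

def parse_cds_line(line):
    """Split a 'name<TAB>start..end' CDS record into (name, start, end).

    Assumes the simple form shown in the usage text, e.g. 'NM_123456\t34..305'.
    The coordinates are returned as given, with no conversion applied.
    """
    name, span = line.rstrip("\n").split("\t")
    start, end = span.split("..")
    return name, int(start), int(end)


def pos_name(q_name, q_start, q_end):
    """Build the qName:qStart-qEnd label described for the -posName option."""
    return f"{q_name}:{q_start}-{q_end}"


if __name__ == "__main__":
    print(parse_cds_line("NM_123456\t34..305"))
    print(pos_name("NM_123456", 0, 1200))
```

A helper like this can be handy for checking a cdsFile before passing it to -cds, since a malformed line raises a ValueError in the sketch instead of being silently skipped.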
================================================================ ======== pslxToFa ==================================== ================================================================ ### kent source version 362 ### pslxToFa - convert pslx (with sequence) to fasta file usage: pslxToFa in.psl out.fa options: -liftTarget=liftTarget.lft -liftQuery=liftQuery.lft ================================================================ ======== qaToQac ==================================== ================================================================ qaToQac - convert from uncompressed to compressed quality score format. usage: qaToQac in.qa out.qac ================================================================ ======== qacAgpLift ==================================== ================================================================ ### kent source version 362 ### qacAgpLift - Use AGP to combine per-scaffold qac into per-chrom qac. usage: qacAgpLift scaffoldToChrom.agp scaffolds.qac chrom.qac options: -mScore=N - score to use for missing data (otherwise fail) range: 0-99, recommended values are 98 (low qual) or 99 (high) ================================================================ ======== qacToQa ==================================== ================================================================ ### kent source version 362 ### qacToQa - convert from compressed to uncompressed quality score format. usage: qacToQa in.qac out.qa -name=name restrict output to just this sequence name ================================================================ ======== qacToWig ==================================== ================================================================ ### kent source version 362 ### qacToWig - convert from compressed quality score format to wiggle format. 
usage: qacToWig in.qac outFileOrDir -name=name restrict output to just this sequence name -fixed output single file with wig headers and fixed step size If neither -name nor -fixed is used, outFileOrDir is a directory which will be created if it does not already exist. If -name and/or -fixed is used, outFileOrDir is a file (or "stdout"). ================================================================ ======== raSqlQuery ==================================== ================================================================ ### kent source version 362 ### raSqlQuery - Do a SQL-like query on a RA file. raSqlQuery raFile(s) query-options or raSqlQuery -db=dbName query-options Where dbName is a UCSC Genome database like hg18, sacCer1, etc. One of the following query-options must be specified -queryFile=fileName "-query=select list,of,fields from file where field='this'" The queryFile just has a query in it in the same form as the query option. The syntax of a query statement is very SQL-like. The most common commands are: select tag1,tag2,tag3 where tag1 like 'prefix%' where the % is a SQL wildcard. Sorry to mix wildcards. Another common query is select count(*) from * where tag = 'val' The from list is optional. If it exists it is a list of raFile names select track,type from *Encode* where type like 'bigWig%' Other command line options: -addFile - Add 'file' field to say where record is defined -addDb - Add 'db' field to say where record is defined -strict - Used only with db option. Only report tracks that exist in db -key=keyField - Use keyField as the key field for merges and parenting. Default name -parent - Merge together inheriting on parentField -parentField=field - Use field as the one that tells us who is our parent. Default subTrack -overrideNeeded - If set records are only overridden field-by-field by later records if 'override' follows the track name. Otherwise later record replaces earlier record completely. 
If not set, all records are overridden field by field -noInheritField=field - If field is present don't inherit fields from parent -merge - If there are multiple raFiles, records with the same keyField will be merged together with fields in later files overriding fields in earlier files -restrict=keyListFile - restrict output to only ones with keys in file. -db=hg19 - Acts on trackDb files for the given database. Sets up list of files appropriately and sets parent, merge, and override all. Use db=all for all databases ================================================================ ======== raToLines ==================================== ================================================================ ### kent source version 362 ### raToLines - Output .ra file stanzas as single lines, with pipe-separated fields. usage: raToLines in.ra out.txt ================================================================ ======== raToTab ==================================== ================================================================ ### kent source version 362 ### raToTab - Convert ra file to table. usage: raToTab in.ra out.tab options: -cols=a,b,c - List columns in order to output in table Only these columns will be output. If you don't give this option, all columns are output in alphabetical order -head - Put column names in header ================================================================ ======== randomLines ==================================== ================================================================ ### kent source version 362 ### randomLines - Pick out random lines from file usage: randomLines inFile count outFile options: -seed=N - Set seed used for randomizing, useful for debugging. 
-decomment - remove blank lines and those starting with # ================================================================ ======== rmFaDups ==================================== ================================================================ rmFaDup - remove duplicate records in FA file usage rmFaDup oldName.fa newName.fa ================================================================ ======== rowsToCols ==================================== ================================================================ ### kent source version 362 ### rowsToCols - Convert rows to columns and vice versa in a text file. usage: rowsToCols in.txt out.txt By default all columns are space-separated, and all rows must have the same number of columns. options: -varCol - rows may have various numbers of columns. -tab - fields are separated by tab -fs=X - fields are separated by given character -fixed - fields are of fixed width with space padding -offsets=X,Y,Z - fields are of fixed width at given offsets ================================================================ ======== sizeof ==================================== ================================================================ type bytes bits char 1 8 unsigned char 1 8 short int 2 16 u short int 2 16 int 4 32 unsigned 4 32 long 8 64 unsigned long 8 64 long long 8 64 u long long 8 64 size_t 8 64 void * 8 64 float 4 32 double 8 64 long double 16 128 LITTLE ENDIAN machine detected byte order: normal order: 0x12345678 in memory: 0x78563412 ================================================================ ======== spacedToTab ==================================== ================================================================ ### kent source version 362 ### spacedToTab - Convert fixed width space separated fields to tab separated Note this requires two passes, so it can't be done on a pipe usage: spacedToTab in.txt out.tab options: -sizes=X,Y,Z - Force it to have columns of the given widths. 
The final char in each column should be space or newline ================================================================ ======== splitFile ==================================== ================================================================ splitFile - Split up a file usage: splitFile source linesPerFile outBaseName options: -head=file - put head in front of each output -tail=file - put tail at end of each output ================================================================ ======== splitFileByColumn ==================================== ================================================================ ### kent source version 362 ### splitFileByColumn - Split text input into files named by column value usage: splitFileByColumn source outDir options: -col=N - Use the Nth column value (default: N=1, first column) -head=file - Put head in front of each output -tail=file - Put tail at end of each output -chromDirs - Split into subdirs of outDir that are distilled from chrom names, e.g. chr3_random -> outDir/3/chr3_random.XXX . -ending=XXX - Use XXX as the dot-suffix of split files (default: taken from source). -tab - Split by tab characters instead of whitespace. Split source into multiple files in outDir, with each filename determined by values from a column of whitespace-separated input in source. If source begins with a header, you should pipe "tail +N source" to this program where N is number of header lines plus 1, or use some similar method to strip the header from the input. ================================================================ ======== sqlToXml ==================================== ================================================================ ### kent source version 362 ### sqlToXml - dump out all or part of a relational database to XML, guided by a dump specification. See sqlToXml.doc for additional information. usage: sqlToXml database dumpSpec.od output.xml options: -topTag=name - Give the top level XML tag the given name. 
By default it will be the same as the database name. -query=file.sql - Instead of dumping whole database, just dump those records matching SQL select statement in file.sql. This statement should be of the form: select * from table where ... or select table.* from table,otherTables where ... Where the table is the same as the table in the first line of dumpSpec. -tab=N - number of spaces between tabs in xml.dumpSpec - by default it's 8. (It may be best just to avoid tabs in that file though.) -maxList=N - This will limit any lists in the output to no more than size N. This is mostly just for testing. ================================================================ ======== stringify ==================================== ================================================================ ### kent source version 362 ### stringify - Convert file to C strings usage: stringify [options] in.txt A stringified version of in.txt will be printed to standard output. Options: -var=varname - create a variable with the specified name containing the string. -static - create the variable but put static in front of it. -array - create an array of strings, one for each line ================================================================ ======== subChar ==================================== ================================================================ subChar - Substitute one character for another throughout a file. usage: subChar oldChar newChar file(s) oldChar and newChar can either be single letter literal characters, or two digit hexadecimal ascii codes ================================================================ ======== subColumn ==================================== ================================================================ ### kent source version 362 ### subColumn - Substitute one column in a tab-separated file. 
usage:
   subColumn column in.tab sub.tab out.tab
Where:
   column is the column number (starting with 1)
   in.tab is a tab-separated file
   sub.tab is a file where the first column is old values, the second new
   out.tab is the substituted output
options:
   -list - Column is a comma-separated list.  Substitute all elements in list
   -miss=fileName - Print misses to this file instead of aborting

================================================================
======== tailLines ====================================
================================================================
tailLines - add tail to each line of file
usage:
   tailLines file tail
This will add tail to each line of file and print to stdout.

================================================================
======== tdbQuery ====================================
================================================================
### kent source version 362 ###
tdbQuery - Query the trackDb system using SQL syntax.
Usage:
    tdbQuery sqlStatement
Where the SQL statement is enclosed in quotations to avoid the shell
interpreting it. Only a very restricted subset of a single SQL statement
(select) is supported.
Examples:
    tdbQuery "select count(*) from hg18"
counts all of the tracks in hg18 and prints the results to stdout
    tdbQuery "select count(*) from *"
counts all tracks in all databases.
    tdbQuery "select track,shortLabel from hg18 where type like 'bigWig%'"
prints to stdout a two field .ra file containing just the track and
shortLabels of bigWig type tracks in the hg18 version of trackDb.
    tdbQuery "select * from hg18 where track='knownGene' or track='ensGene'"
prints the hg18 knownGene and ensGene tracks' information to stdout.
    tdbQuery "select *Label from mm9"
prints all fields that end in 'Label' from the mm9 trackDb.
OPTIONS:
   -root=/path/to/trackDb/root/dir
      Sets the root directory of the trackDb.ra directory hierarchy to be
      given path. By default this is ~/kent/src/hg/makeDb/trackDb.
   -check
      Check that trackDb is internally consistent.
      Prints diagnostic output to stderr and aborts if there are problems.
   -strict
      Mimic -strict option on hgTrackDb. Suppresses tracks where
      corresponding table does not exist.
   -release=alpha|beta|public
      Include trackDb entries with this release tag only. Default is alpha.
   -noBlank
      Don't print out blank lines separating records
   -oneLine
      Print single ('|') pipe-separated line per record
   -noCompSub
      Subtracks don't inherit fields from parents
   -shortLabelLength=N
      Complain if shortLabels are over N characters
   -longLabelLength=N
      Complain if longLabels are over N characters

================================================================
======== textHistogram ====================================
================================================================
### kent source version 362 ###
textHistogram - Make a histogram in ascii
usage:
   textHistogram [options] inFile
Where inFile contains one number per line.
options:
   -binSize=N - Size of bins, default 1
   -maxBinCount=N - Maximum # of bins, default 25
   -minVal=N - Minimum value to put in histogram, default 0
   -log - Do log transformation before plotting
   -noStar - Don't draw asterisks
   -col=N - Which column to use. Default 1
   -aveCol=N - A second column to average over. The averages
             will be output in place of counts of primary column.
   -real - Data input are real values (default is integer)
   -autoScale=N - autoscale to N # of bins
   -probValues - show prob-Values (density and cum.distr.)
             (sets -noStar too)
   -freq - show frequencies instead of counts
   -skip=N - skip N lines before starting, default 0

================================================================
======== tickToDate ====================================
================================================================
tickToDate - Convert seconds since 1970 to time and date
usage:
   tickToDate ticks
Use 'now' for current ticks and date

================================================================
======== toLower ====================================
================================================================
toLower - Convert upper case to lower case in file. Leave other chars alone
usage:
   toLower inFile outFile
equivalent to the unix commands:
   cat inFile | tr '[A-Z]' '[a-z]' > outFile

================================================================
======== toUpper ====================================
================================================================
toUpper - Convert lower case to upper case in file. Leave other chars alone
usage:
   toUpper inFile outFile
equivalent to the unix commands:
   cat inFile | tr '[a-z]' '[A-Z]' > outFile

================================================================
======== transMapPslToGenePred ====================================
================================================================
### kent source version 362 ###
transMapPslToGenePred - convert PSL alignments of mRNAs to gene annotations.

usage:
   transMapPslToGenePred [options] sourceGenePred mappedPsl mappedGenePred

Convert PSL alignments from transmap to genePred.  It specifically handles
alignments where the source genes are genomic annotations in genePred format,
that are converted to PSL for mapping and using this program to create a new
genePred.

This is an alternative to mrnaToGene which determines CDS and frame from
the original annotation, which may have been imported from GFF/GTF.
This was created because the genbankCds structure used by mrnaToGene
doesn't handle partial start/stop codon or programmed frame shifts.
This requires handling the list of CDS regions and the /codon_start
attribute. At some point, this program may be extended to handle
genbank alignments correctly.

Options:
  -nonCodingGapFillMax=0 - fill gaps in non-coding regions up to this many
   bases in length.
  -codingGapFillMax=0 - fill gaps in coding regions up to this many bases
   in length.  Only coding gaps that are a multiple of three will be
   filled, with the max rounded down.
  -noBlockMerge - don't do any block merging of genePred, even of
   adjacent blocks.  This is mainly for debugging.

================================================================
======== trfBig ====================================
================================================================
### kent source version 362 ###
trfBig - Mask tandem repeats on a big sequence file.
usage:
   trfBig inFile outFile
This will repeatedly run trf to mask tandem repeats in inFile
and put masked results in outFile.  inFile and outFile can be .fa
or .nib format. Outfile can be .bed as well. Sequence output is hard
masked, lowercase.

   -bed creates a bed file in current dir
   -bedAt=path.bed - create a bed file at explicit location
   -tempDir=dir Where to put temp files.
   -trf=trfExe explicitly specifies trf executable name
   -maxPeriod=N  Maximum period size of repeat (default 2000)
   -keep  don't delete tmp files
   -l=<n> when used here, for new trf v4.09 option:
          maximum TR length expected (in millions)
          (e.g., -l=3 for 3 million), Human genome hg38 would need -l=6

================================================================
======== twoBitDup ====================================
================================================================
### kent source version 362 ###
twoBitDup - check to see if a twobit file has any identical sequences in it
usage:
   twoBitDup file.2bit
options:
  -keyList=file - file to write a key list, two columns: md5sum and sequenceName
                  NOTE: use of keyList is very time expensive for 2bit files
                  with a large number of sequences (> 5,000).  Better to
                  use a cluster run with the doIdKeys.pl automation script.
  -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs

example: twoBitDup -keyList=stdout db.2bit \
          | grep -v 'are identical' | sort > db.idKeys.txt

================================================================
======== twoBitInfo ====================================
================================================================
### kent source version 362 ###
twoBitInfo - get information about sequences in a .2bit file
usage:
   twoBitInfo input.2bit output.tab
options:
   -maskBed  instead of seq sizes, output BED records that define
             areas with masked sequence
   -nBed     instead of seq sizes, output BED records that define
             areas with N's in sequence
   -noNs     outputs the length of each sequence, but does not count Ns
   -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
Output file has the columns:
   seqName size

The 2bit file may be specified in the form path:seq or path:seq1,seq2,seqN...
so that information is returned only on the requested sequence(s).
If the form path:seq:start-end is used, start-end is ignored.
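
The two-column output described above (seqName, size) can be read back with a few lines of Python. This is only a minimal sketch and not part of the kent tools: it assumes the columns are tab-separated with one sequence per line, and read_seq_sizes and the sample data are hypothetical names chosen for illustration.

```python
import io


def read_seq_sizes(handle):
    """Return {seqName: size} from two-column, tab-separated lines.

    Assumption: each non-blank line is 'seqName<TAB>size'.
    """
    sizes = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        name, size = line.split("\t")
        sizes[name] = int(size)
    return sizes


# Hypothetical sample standing in for a real output.tab file.
sample = io.StringIO("chr1\t1000\nchr2\t500\n")
sizes = read_seq_sizes(sample)
total_bases = sum(sizes.values())
print(sizes, total_bases)
```

For a real file, open output.tab and pass the file object instead of the in-memory sample.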
================================================================
======== twoBitMask ====================================
================================================================
### kent source version 362 ###
twoBitMask - apply masking to a .2bit file, creating a new .2bit file
usage:
   twoBitMask input.2bit maskFile output.2bit
options:
   -add   Don't remove pre-existing masking before applying maskFile.
   -type=.XXX   Type of maskFile is XXX (bed or out).
maskFile can be a RepeatMasker .out file or a .bed file.  It must not
contain rows for sequences which are not in input.2bit.

================================================================
======== twoBitToFa ====================================
================================================================
### kent source version 362 ###
twoBitToFa - Convert all or part of .2bit file to fasta
usage:
   twoBitToFa input.2bit output.fa
options:
   -seq=name       Restrict this to just one sequence.
   -start=X        Start at given position in sequence (zero-based).
   -end=X          End at given position in sequence (non-inclusive).
   -seqList=file   File containing list of the desired sequence names
                   in the format seqSpec[:start-end], e.g. chr1 or chr1:0-189
                   where coordinates are half-open zero-based, i.e. [start,end).
   -noMask         Convert sequence to all upper case.
   -bpt=index.bpt  Use bpt index instead of built-in one.
   -bed=input.bed  Grab sequences specified by input.bed. Will exclude introns.
   -bedPos         With -bed, use chrom:start-end as the fasta ID in output.fa.
   -udcDir=/dir/to/cache  Place to put cache for remote bigBed/bigWigs.

Sequence and range may also be specified as part of the input
file name using the syntax:
      /path/input.2bit:name
   or
      /path/input.2bit:name:start-end

================================================================
======== vai.pl ====================================
================================================================
usage: vai.pl [options] db input.(vcf|pgsnp|pgSnp|txt)[.gz] > output.tab

Invokes hgVai (Variant Annotation Integrator) on a set of variant calls to add
functional effect predictions (e.g. does the variant fall within a regulatory
region or part of a gene) and other data relevant to function.

input.(...) must be a file or URL containing either variants formatted as VCF
or pgSnp, or a sequence of dbSNP rs# IDs, optionally compressed by gzip.
Output is printed to stdout.

options:
  --hgVai=/path/to/hgVai      Path to hgVai executable
                              (default: /usr/local/apache/cgi-bin/hgVai)
  --position=chrX:N-M         Sequence name, start and end of range to query
                              (default: genome-wide query)
  --rsId                      Attempt to match dbSNP rs# ID with variant
                              position at the expense of performance.
                              (default: don't attempt to match dbSNP rs# ID)
  --udcCache=/path/to/udcCache  Path to udc cache, overriding hg.conf setting
                              (default: use value in hg.conf file)
  --geneTrack=track           Genome Browser track with transcript predictions
                              (default: refGene)
  --hgvsBreakDelIns=on|off    HGVS delins: show "delAGinsTT" instead of "delinsTT"
                              (default: off)
  --hgvsCN=on|off             Include HGVS c./n. (coding/noncoding) terms in output (RefSeq transcripts only)
                              (default: on)
  --hgvsG=on|off              Include HGVS g. (genomic) terms in output (RefSeq transcripts only)
                              (default: on)
  --hgvsP=on|off              Include HGVS p. (protein) terms in output (RefSeq transcripts only)
                              (default: on)
  --hgvsPAddParens=on|off     Add parentheses around HGVS p.
                              predicted changes
                              (default: off)
  --include_cdsNonSyn=on|off  Include CDS non-synonymous variants in output
                              (default: on)
  --include_cdsSyn=on|off     Include CDS synonymous variants in output
                              (default: on)
  --include_exonLoss=on|off   Include exon loss variants in output
                              (default: on)
  --include_intergenic=on|off Include intergenic variants in output
                              (default: on)
  --include_intron=on|off     Include intron variants in output
                              (default: on)
  --include_nmdTranscript=on|off  Include variants in NMD transcripts in output
                              (default: on)
  --include_noVariation=on|off  Include "variants" with no observed variation in output
                              (default: on)
  --include_nonCodingExon=on|off  Include non-coding exon variants in output
                              (default: on)
  --include_splice=on|off     Include splice site and splice region variants in output
                              (default: on)
  --include_upDownstream=on|off  Include upstream and downstream variants in output
                              (default: on)
  --include_utr=on|off        Include 3' and 5' UTR variants in output
                              (default: on)
  --variantLimit=N            Maximum number of variants to process
                              (default: 10000)
  -n, --dry-run               Display hgVai command, but don't execute it
  -h, --help                  Display this message

================================================================
======== validateFiles ====================================
================================================================
### kent source version 362 ###
validateFiles - Validates the format of different genomic files.
                Exits with a zero status for no errors detected and non-zero for errors.
                Uses filename 'stdin' to read from stdin.
                Automatically decompresses files in .gz, .bz2, .zip, .Z format.
                Accepts multiple input files of the same type.
                Writes error messages to stderr
usage:
   validateFiles -chromInfo=FILE -options -type=FILE_TYPE file1 [file2 [...]]

   -type=
       fasta        : Fasta files (only one line of sequence, and no quality scores)
       fastq        : Fasta with quality scores (see http://maq.sourceforge.net/fastq.shtml)
       csfasta      : Colorspace fasta (implies -colorSpace)
       csqual       : Colorspace quality (see link below)
                      See http://marketing.appliedbiosystems.com/mk/submit/SOLID_KNOWLEDGE_RD?_JS=T&rd=dm
       bam          : Binary Alignment/Map
                      See http://samtools.sourceforge.net/SAM1.pdf
       bigWig       : Big Wig
                      See http://genome.ucsc.edu/goldenPath/help/bigWig.html
       bedN[+P]     : BED N or BED N+ or BED N+P
                      where N is a number between 3 and 15 of standard BED columns,
                      optional + indicates the presence of additional columns
                      and P is the number of additional columns
                      Examples: -type=bed6 or -type=bed6+ or -type=bed6+3
                      See http://genome.ucsc.edu/FAQ/FAQformat.html#format1
       bigBedN[+P]  : bigBED N or bigBED N+ or bigBED N+P, similar to BED
                      See http://genome.ucsc.edu/goldenPath/help/bigBed.html
       tagAlign     : Alignment files, replaced with BAM
       pairedTagAlign
       broadPeak    : ENCODE Peak formats
       narrowPeak     These are specialized bedN+P formats.
       gappedPeak     See http://genomewiki.cse.ucsc.edu/EncodeDCC/index.php/File_Formats
       bedGraph     : BED Graph
       rcc          : NanoString RCC
       idat         : Illumina IDAT

   -as=fields.as                If you have extra "bedPlus" fields, it's great to put a definition
                                of each field in a row in AutoSql format here. Applies to bed-related types.
   -tab                         If set, expect fields to be tab separated, normally
                                expects white space separator. Applies to bed-related types.
   -chromDb=db                  Specify DB containing chromInfo table to validate chrom names
                                and sizes
   -chromInfo=file.txt          Specify chromInfo file to validate chrom names and sizes
   -colorSpace                  Sequences include colorspace values [0-3] (can be used
                                with formats such as tagAlign and pairedTagAlign)
   -isSorted                    Input is sorted by chrom, only affects types tagAlign and pairedTagAlign
   -doReport                    Output report in filename.report
   -version                     Print version

For Alignment validations
   -genome=path/to/hg18.2bit    REQUIRED to validate sequence mappings match the genome
                                specified in the .2bit file. (BAM, tagAlign, pairedTagAlign)
   -nMatch                      N's do not count as a mismatch
   -matchFirst=n                Only check the first N bases of the sequence
   -mismatches=n                Maximum number of mismatches in sequence (or read pair)
   -mismatchTotalQuality=n      Maximum total quality score at mismatching positions
   -mmPerPair                   Check either pair don't exceed mismatch count if validating
                                pairedTagAlign files (default is the total for the pair)
   -mmCheckOneInN=n             Check mismatches in only one in 'n' lines (default=1, all)
   -allowOther                  Allow chromosomes that aren't native in BAM's
   -allowBadLength              Allow chromosomes that have the wrong length in BAM
   -complementMinus             Complement the query sequence on the minus strand (for testing BAM)
   -bamPercent=N.N              Percentage of BAM alignments that must be compliant
   -privateData                 Private data so empty sequence is tolerated

================================================================
======== validateManifest ====================================
================================================================
### kent source version 362 ###
validateManifest v1.9 - Validates the ENCODE3 manifest.txt file.
                Calls validateFiles on each file in the manifest.
                Exits with a zero status for no errors detected and non-zero for errors.
                Writes error messages to stderr
usage:
   validateManifest

   -dir=workingDir, defaults to the current directory.
   -encValData=encValDataDir, relative to workingDir, defaults to encValData.

   Input files in the working directory:
     manifest.txt - current input manifest file
     validated.txt - input from previous run of validateManifest

   Output file in the working directory:
     validated.txt - results of validated input

================================================================
======== webSync ====================================
================================================================
Usage: webSync [options] <url> - download from https server, using files.txt on their end to get the list of files

To create files.txt on the remote end, run this command:
  du -ab > files.txt
Or preferably this command (otherwise empty directories will lead to "transmit" errors):
  find . -type f -exec du -ab {} + > files.txt
Or this one if you have symlinks:
  find -L . -type f -exec du -Lab {} + > files.txt

Then run this in the download directory:
  webSync https://there.org/

This will create a "webSyncLog" directory in the current directory, compare
https://there.org/files.txt with the files in the current directory,
transfer the missing files and write the changes to webSyncLog/transfer.log.

The URL will be saved after the first run and is not necessary from then on. You can add
cd xxx && webSync to your crontab. It will not start if it's already running (flagfile).

Status files after a run:
- webSyncLog/biggerHere.txt - list of files that are bigger here. These could be errors or OK.
- webSyncLog/files.here.txt - the list of files here
- webSyncLog/files.there.txt - the list of files there, current copy of https://there.org/files.txt
- webSyncLog/missingThere.txt - the list of files not on https://there.org anymore but here
- webSyncLog/transfer.log - big transfer log, each run, date and size of transferred file is noted here.
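
The comparison of the remote files.txt against the local files, as described above, can be pictured with a short Python sketch. This is not webSync's actual code: it assumes each listing line has the `du -ab` shape of size, a tab, then the path, and the names parse_listing and compare are hypothetical. How webSync treats files whose sizes merely differ is not stated above, so the sketch only models the three cases the status files name.

```python
def parse_listing(text):
    """Parse 'size<TAB>path' lines (the shape `du -ab` prints) into {path: size}."""
    listing = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        size, path = line.split("\t", 1)
        listing[path] = int(size)
    return listing


def compare(here, there):
    """Return (to_fetch, bigger_here, missing_there) as sorted path lists."""
    to_fetch = sorted(p for p in there if p not in here)
    bigger_here = sorted(p for p in there if p in here and here[p] > there[p])
    missing_there = sorted(p for p in here if p not in there)
    return to_fetch, bigger_here, missing_there


# Hypothetical sample listings for illustration only.
here = parse_listing("100\t./a.txt\n300\t./b.txt\n50\t./old.txt\n")
there = parse_listing("100\t./a.txt\n200\t./b.txt\n70\t./c.txt\n")
to_fetch, bigger_here, missing_there = compare(here, there)
print(to_fetch, bigger_here, missing_there)
```

In this sketch, to_fetch corresponds to the missing files that would be transferred, bigger_here to biggerHere.txt, and missing_there to missingThere.txt.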

Options:
  -h, --help            show this help message and exit
  -d, --debug           show debug messages
  -x CONNECTIONS, --connections=CONNECTIONS
                        Maximum number of parallel connections to the server,
                        default 10
  -s, --skipScan        Do not scan local file sizes again, in case you know
                        it is up to date

================================================================
======== wigCorrelate ====================================
================================================================
### kent source version 362 ###
wigCorrelate - Produce a table that correlates all pairs of wigs.
usage:
   wigCorrelate one.wig two.wig ... n.wig
This works on bigWig as well as wig files.
The output is to stdout
options:
   -clampMax=N - values larger than this are clipped to this value

================================================================
======== wigEncode ====================================
================================================================
### kent source version 362 ###
wigEncode - convert Wiggle ascii data to binary format

usage:
    wigEncode [options] wigInput wigFile wibFile
    wigInput - wiggle ascii data input file (stdin OK)
    wigFile - .wig output file to be used with hgLoadWiggle
    wibFile - .wib output file to be symlinked into /gbdb/<db>/wib/

This processes the three data input format types described at:
    http://genome.ucsc.edu/encode/submission.html#WIG
    (track and browser lines are tolerated, i.e.
    ignored)
options:
    -lift=<D> - lift all input coordinates by D amount, default 0
              - can be negative as well as positive
    -allowOverlap - allow overlapping data, default: overlap not allowed
              - only effective for fixedStep and if fixedStep declarations
              - are in order by chromName,chromStart
    -noOverlapSpanData - check for overlapping span data
    -wibSizeLimit=<N> - ignore rest of input when wib size is >= N

Example:
    hgGcPercent -wigOut -doGaps -file=stdout -win=5 xenTro1 \
        /cluster/data/xenTro1 | wigEncode stdin gc5Base.wig gc5Base.wib
load the resulting .wig file with hgLoadWiggle:
    hgLoadWiggle -pathPrefix=/gbdb/xenTro1/wib xenTro1 gc5Base gc5Base.wig
    ln -s `pwd`/gc5Base.wib /gbdb/xenTro1/wib

================================================================
======== wigToBigWig ====================================
================================================================
### kent source version 362 ###
wigToBigWig v 4 - Convert ascii format wig file (in fixedStep, variableStep
or bedGraph format) to binary big wig format.
usage:
   wigToBigWig in.wig chrom.sizes out.bw
Where in.wig is in one of the ascii wiggle formats, but not including track lines
and chrom.sizes is a two-column file/URL: <chromosome name> <size in bases>
and out.bw is the output indexed big wig file.
If the assembly <db> is hosted by UCSC, chrom.sizes can be a URL like
  http://hgdownload.cse.ucsc.edu/goldenPath/<db>/bigZips/<db>.chrom.sizes
or you may use the script fetchChromSizes to download the chrom.sizes file.
If not hosted by UCSC, a chrom.sizes file can be generated by running
twoBitInfo on the assembly .2bit file.
options:
   -blockSize=N - Number of items to bundle in r-tree.  Default 256
   -itemsPerSlot=N - Number of data points bundled at lowest level. Default 1024
   -clip - If set just issue warning messages rather than dying if wig
           file contains items off end of chromosome.
   -unc - If set, do not use compression.
   -fixedSummaries - If set, use a predefined sequence of summary levels.
   -keepAllChromosomes - If set, store all chromosomes in b-tree.

================================================================
======== wordLine ====================================
================================================================
### kent source version 362 ###
wordLine - chop up words by white space and output them with one
word to each line.
usage:
   wordLine inFile(s)
Output will go to stdout.
Options:
   -csym - Break up words based on C symbol rules rather than white space

================================================================
======== xmlCat ====================================
================================================================
### kent source version 362 ###
xmlCat - Concatenate xml files together, stuffing all records inside a single outer tag.
usage:
   xmlCat XXX
options:
   -xxx=XXX

================================================================
======== xmlToSql ====================================
================================================================
### kent source version 362 ###
xmlToSql - Convert XML dump into a fairly normalized relational database
   in the form of a directory full of tab-separated files and table
   creation SQL.  You'll need to run autoDtd on the XML file first to
   get the dtd and stats files.
usage:
   xmlToSql in.xml in.dtd in.stats outDir
options:
   -prefix=name - A name to prefix all tables with
   -textField=name - Name to use for text field (default 'text')
   -maxPromoteSize=N - Maximum size (default 32) for an element that
                       just defines a string to be promoted to a field
                       in parent table

================================================================