pbh5tools is a collection of tools that can manipulate the content of, or extract data from, two types of h5 files: cmp.h5 and bas.h5.
pbh5tools consists of two executables: cmph5tools.py and bash5tools.py. At present, cmph5tools.py provides a rich set of tools to manipulate and analyze the data in a cmp.h5 file, while bash5tools.py provides mechanisms to extract basecall information from bas.h5 files.
To install pbh5tools, run the following command from the pbh5tools root directory:
python setup.py install
bash5tools.py can extract read sequences and quality values for both Raw and circular consensus sequencing (CCS) read types and use them to create FASTQ and FASTA files.
usage: bash5tools.py [-h] [--verbose] [--version] [--profile] [--debug]
[--outFilePrefix OUTFILEPREFIX]
[--readType {ccs,subreads,unrolled}] [--outType OUTTYPE]
[--minLength MINLENGTH] [--minReadScore MINREADSCORE]
[--minPasses MINPASSES]
input.bas.h5
Tool for extracting data from .bas.h5 files
positional arguments:
input.bas.h5 input .bas.h5 filename
optional arguments:
-h, --help show this help message and exit
--verbose, -v Set the verbosity level (default: None)
--version show program's version number and exit
--profile Print runtime profile at exit (default: False)
--debug Run within a debugger session (default: False)
--outFilePrefix OUTFILEPREFIX
output filename prefix [None]
--readType {ccs,subreads,unrolled}
read type (ccs, subreads, or unrolled) []
--outType OUTTYPE output file type (fasta, fastq) [fasta]
Read filtering arguments:
--minLength MINLENGTH
min read length [0]
--minReadScore MINREADSCORE
min read score, valid only with
--readType={unrolled,subreads} [0]
--minPasses MINPASSES
min number of CCS passes, valid only with
--readType=ccs [0]
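The help text above states that --minReadScore is valid only with the unrolled and subreads read types, and --minPasses only with ccs. A minimal sketch of a pre-flight check for those two constraints follows. The function name check_filter_options is hypothetical and is not part of pbh5tools; it encodes only what the help text says and never calls bash5tools.py.

```python
# Read types accepted by --readType, as listed in the usage text.
READ_TYPES = ("ccs", "subreads", "unrolled")


def check_filter_options(read_type, min_read_score=None, min_passes=None):
    """Return a list of problems with a proposed option combination.

    Hypothetical helper: it mirrors the constraints printed by
    `bash5tools.py --help` and does not run the tool itself.
    """
    problems = []
    if read_type not in READ_TYPES:
        problems.append("unknown read type: %r" % (read_type,))
    if min_read_score is not None and read_type not in ("unrolled", "subreads"):
        problems.append("--minReadScore is valid only with unrolled or subreads")
    if min_passes is not None and read_type != "ccs":
        problems.append("--minPasses is valid only with ccs")
    return problems


# An empty list means the combination is consistent with the help text.
print(check_filter_options("ccs", min_passes=2))
print(check_filter_options("subreads", min_passes=2))
```

Passing None for a filter means the option is simply left off the command line, so the tool's own default applies.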
Extracting all raw (unrolled) reads from input.bas.h5 without any filtering and exporting to FASTA (myreads.fasta):
python bash5tools.py input.bas.h5 --outFilePrefix myreads --outType fasta --readType unrolled
Extracting all CCS reads from input.bas.h5 that have read lengths larger than 100 and exporting to FASTQ (myreads.fastq):
python bash5tools.py input.bas.h5 --outFilePrefix myreads --outType fastq --readType ccs --minLength 100
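When these extractions are scripted, the command line can be assembled as an argument list. The sketch below uses only flags that appear in the usage text; the function name build_bash5tools_cmd is hypothetical, and it assumes bash5tools.py is on the PATH after installation.

```python
def build_bash5tools_cmd(input_path, out_prefix, read_type,
                         out_type="fasta", min_length=None):
    """Assemble a bash5tools.py argument list (hypothetical helper).

    The input file is passed as the positional argument, followed by the
    options shown in the usage text.
    """
    cmd = ["bash5tools.py", input_path,
           "--outFilePrefix", out_prefix,
           "--outType", out_type,
           "--readType", read_type]
    if min_length is not None:
        # Only add the filter when the caller asked for one.
        cmd += ["--minLength", str(min_length)]
    return cmd


cmd = build_bash5tools_cmd("input.bas.h5", "myreads", "ccs",
                           out_type="fastq", min_length=100)
print(" ".join(cmd))
# To execute (requires pbh5tools to be installed):
#   import subprocess; subprocess.check_call(cmd)
```

Building a list rather than a single string avoids shell quoting problems when file names contain spaces.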
cmph5tools.py is a multi-command tool that provides access to the following subtools:
3. select: Create a new file from a cmp.h5 file by specifying which reads to include.
4. equal: Compare the contents of two cmp.h5 files for equivalence.
5. summarize: Summarize the contents of a cmp.h5 file in a verbose, human-readable format.
6. stats: Extract summary metrics from a cmp.h5 file into a CSV file.
8. listMetrics: Emit the available metrics and statistics for use in the select and stats subcommands.
To list all available subtools provided by cmph5tools.py simply run:
cmph5tools.py --help
Each subtool has its own usage information which can be generated by running:
cmph5tools.py <toolname> --help
When running any subtool, the --info command-line argument is recommended, because it prints progress information to stdout while the script is running:
cmph5tools.py <toolname> --info <other arguments>
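The invocation pattern above can be sketched as a small wrapper that places --info directly after the subtool name. The function name build_cmph5tools_cmd and the file name aligned_reads.cmp.h5 are hypothetical; the argument layout is assumed from the template shown above, so check each subtool's --help output for the arguments it actually accepts.

```python
def build_cmph5tools_cmd(toolname, args=(), info=True):
    """Assemble a cmph5tools.py argument list (hypothetical helper).

    Follows the pattern `cmph5tools.py <toolname> --info <other arguments>`.
    """
    cmd = ["cmph5tools.py", toolname]
    if info:
        # Request progress output while the subtool runs.
        cmd.append("--info")
    cmd.extend(args)
    return cmd


# Illustrative file name; replace it with a real cmp.h5 file.
print(build_cmph5tools_cmd("summarize", ["aligned_reads.cmp.h5"]))
# To execute (requires pbh5tools to be installed):
#   import subprocess; subprocess.check_call(build_cmph5tools_cmd(...))
```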
More examples are available in the examples.t file.