Ash O'Farrell aofarrel

FastQC is a program that performs some basic quality checks on fastq files. It makes nice HTML reports for a given file, but (as far as I can tell) it doesn't provide a straightforward way to compare results across files, which might represent different library preps, sequencing lanes or samples.

Here is the (really pretty hacky) solution I came up with for aggregating these stats. It assumes you have a directory in which the report for each fastq file sits in its own subdirectory, so that the data files have paths like ./library_name.L001.R1.fastqc/fastqc_data.txt. We then use regular expressions to match just the parts of each file we care about.
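
To compare reports by library, lane or read, it can help to recover those fields from each report's directory name. The sketch below assumes every directory follows the library_name.L001.R1.fastqc pattern exactly, with a three-digit lane and an R1/R2 read. The helper parse_report_name and the pattern DIR_PATTERN are hypothetical names introduced here for illustration and are not part of FastQC.

```python
import os
import re

# Hypothetical pattern: assumes report directories are named
# <library_name>.<lane>.<read>.fastqc, e.g. library_name.L001.R1.fastqc.
# The greedy library group lets library names themselves contain dots.
DIR_PATTERN = re.compile(r'^(?P<library>.+)\.(?P<lane>L\d{3})\.(?P<read>R[12])\.fastqc$')


def parse_report_name(path):
    """Return (library, lane, read) for a path like .../<dir>/fastqc_data.txt,
    or None if the directory name does not follow the assumed pattern."""
    dirname = os.path.basename(os.path.dirname(os.path.normpath(path)))
    match = DIR_PATTERN.match(dirname)
    if match is None:
        return None
    return match.group('library'), match.group('lane'), match.group('read')


print(parse_report_name('./library_name.L001.R1.fastqc/fastqc_data.txt'))
# ('library_name', 'L001', 'R1')
```

Directories that don't fit the pattern return None rather than raising, so a stray folder in the report directory can simply be skipped.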

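import os
import re

# Percent of sequences left after deduplication
# (FastQC writes this as "#Total Deduplicated Percentage<TAB>value").
# A raw string avoids invalid-escape warnings, and the pattern accepts any
# number of digits so values such as 8.3 or 100.0 are not missed.
re_dup = re.compile(r'Total Deduplicated Percentage\t(\d+(?:\.\d+)?)')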
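
Putting the pieces together, here is a minimal sketch that walks the report subdirectories, applies a deduplication regex to each fastqc_data.txt, and prints one tab-separated line per report. It assumes the directory layout described above and that each data file contains a line of the form "#Total Deduplicated Percentage<TAB>value". The function name dedup_percentages is hypothetical, and its regex accepts any number of digits so that values below 10 or equal to 100 still match.

```python
import glob
import os
import re

# Matches e.g. "#Total Deduplicated Percentage\t67.32" in fastqc_data.txt.
RE_DUP = re.compile(r'Total Deduplicated Percentage\t(\d+(?:\.\d+)?)')


def dedup_percentages(root='.'):
    """Map each report directory under `root` to its Total Deduplicated Percentage.

    Reports whose data file has no matching line are left out of the result.
    """
    results = {}
    pattern = os.path.join(root, '*', 'fastqc_data.txt')
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            match = RE_DUP.search(handle.read())
        if match:
            # Key by the report directory name, e.g. library_name.L001.R1.fastqc
            results[os.path.basename(os.path.dirname(path))] = float(match.group(1))
    return results


if __name__ == '__main__':
    # One line per report: directory name, then percentage to one decimal place
    for name, pct in dedup_percentages().items():
        print(f'{name}\t{pct:.1f}')
```

Keying the results by directory name keeps this step independent of any particular naming scheme. The same loop can be extended with more compiled patterns to pull other values out of fastqc_data.txt.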

Kill all running containers: docker kill $(docker ps -q)
Delete all stopped containers: docker rm $(docker ps -a -q)
Delete all images: docker rmi $(docker images -q)