Ash O'Farrell aofarrel

@huddlej
huddlej / LICENSE.txt
Last active November 5, 2025 18:22
Command line tool to convert annotated phylogenetic trees from nextstrain.org's JSON format to a tidy data frame of tree attributes
GNU AFFERO GENERAL PUBLIC LICENSE
Version 3, 19 November 2007
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU Affero General Public License is a free, copyleft license for
@SirSerje
SirSerje / docker-kill.txt
Last active March 18, 2021 22:11
Remove all docker's shit
kill all running containers with docker kill $(docker ps -q)
delete all stopped containers with docker rm $(docker ps -a -q)
delete all images with docker rmi $(docker images -q)
@dwinter
dwinter / parse_fq.md
Last active June 6, 2024 18:22
Parse fastqc outputs

Fastqc is a program that performs some basic quality checks on fastq files. It makes nice HTML reports for a given file, but (as far as I can tell) doesn't provide a straightforward way to compare the results across files (which might represent different library preps, sequencing lanes or samples).

Here is the (really pretty hacky) solution I came up with for aggregating these stats. This all assumes that you have a directory in which the report for each fastq file lives in its own subdirectory, at a path like ./library_name.L001.R1.fastqc/fastqc_data.txt. We will then use regular expressions to match just those parts of the file we care about.

import os
import re

# percentage of sequences left after deduplication
# raw string so \d and \. reach the regex engine unchanged (\t still matches a tab)
re_dup = re.compile(r'Total Deduplicated Percentage\t(\d\d\.\d)')
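To show how a pattern like this might be applied across the directory layout described above, here is a minimal sketch. The function name `collect_dedup_percentages` and the returned dictionary are assumptions made for illustration, not part of the original gist. The sketch also assumes every report subdirectory name ends in `.fastqc`, as in the example path.

```python
import os
import re

# Same pattern as above, written as a raw string.
re_dup = re.compile(r'Total Deduplicated Percentage\t(\d\d\.\d)')


def collect_dedup_percentages(root):
    """Map each report subdirectory name to its deduplicated percentage.

    Hypothetical helper: assumes each subdirectory of `root` whose name
    ends in '.fastqc' contains a fastqc_data.txt file.
    """
    results = {}
    for entry in sorted(os.listdir(root)):
        report = os.path.join(root, entry, 'fastqc_data.txt')
        # Skip anything that is not a report directory with a data file.
        if not entry.endswith('.fastqc') or not os.path.isfile(report):
            continue
        with open(report) as handle:
            match = re_dup.search(handle.read())
        # None marks reports where the line is missing or formatted differently.
        results[entry] = float(match.group(1)) if match else None
    return results


if __name__ == '__main__':
    # Print one tab-separated row per report found under the current directory.
    for library, percent in collect_dedup_percentages('.').items():
        print(library, percent, sep='\t')
```

Because the pattern captures exactly two digits, a dot, and one digit, values such as 100.0 or 5.3 will not match and are reported as `None`. Widening the capture group would be a reasonable next step if your reports contain such values.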