Jin Zhe jin-zhe

PDF Compression on Mac using command line

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dBATCH -dQUIET -sOutputFile=output.pdf input.pdf
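
The same Ghostscript invocation can be assembled from Python when compressing several files. This is a minimal sketch, not part of the original gist: the function name `build_gs_compress_command` and its `quality` parameter are hypothetical, and it assumes `gs` is on the `PATH`.

```python
import shlex


def build_gs_compress_command(input_pdf, output_pdf, quality="/screen"):
    """Return the argument list for the gs command shown above.

    `quality` is passed to -dPDFSETTINGS; "/screen" gives the smallest output,
    and Ghostscript also accepts presets such as "/ebook" and "/printer".
    """
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={quality}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={output_pdf}",
        str(input_pdf),
    ]


if __name__ == "__main__":
    cmd = build_gs_compress_command("input.pdf", "output.pdf")
    # Print the shell-equivalent command; to actually run it, one could call
    # subprocess.run(cmd, check=True) with Ghostscript installed.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.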

	#!/usr/bin/env bash
	set -euo pipefail

	# --- Default Values ---
	TARGET_DIR="."
	FILE_EXT=""
	SIZE_LIMIT_MB=10
	OUTPUT_FILE="large_files_report.txt"

	# --- Help Function ---

	'''
	DESCRIPTION:
	This simple convenience function parallelizes pandas `.apply()`.
	Adapted from: https://proinsias.github.io/tips/How-to-use-multiprocessing-with-pandas/

	REQUIREMENTS:
	`multiprocess` and `dill` packages are required.
	```
	python -m pip install multiprocess dill
	```

	import csv
	from pathlib import Path


	def load_csv(csv_path: Path, ignore_first_row=True, ignore_empty_rows=True, delimiter=','):
	    '''
	    Returns all the rows of a CSV file
	    '''
	    rows = []
	    with csv_path.open() as csvfile:
	        csv_reader = csv.reader(csvfile, delimiter=delimiter)
	        if ignore_first_row:
	            next(csv_reader)
	        for row in csv_reader:
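
The preview above is cut off inside the loop. The following is one plausible completion, offered as a sketch only: the treatment of `ignore_empty_rows` (skipping rows whose cells are all blank) and the final `return rows` are assumptions inferred from the parameter name and the docstring, not the author's code.

```python
import csv
from pathlib import Path


def load_csv(csv_path: Path, ignore_first_row=True, ignore_empty_rows=True, delimiter=','):
    '''
    Returns all the rows of a CSV file
    '''
    rows = []
    # newline='' is what the csv module documentation recommends for file objects
    with csv_path.open(newline='') as csvfile:
        csv_reader = csv.reader(csvfile, delimiter=delimiter)
        if ignore_first_row:
            # The default of None avoids StopIteration on a completely empty file
            next(csv_reader, None)
        for row in csv_reader:
            # Assumed behavior: a row counts as empty if every cell is blank
            if ignore_empty_rows and not any(cell.strip() for cell in row):
                continue
            rows.append(row)
    return rows
```

Each returned row is a list of strings, so callers still need to convert numeric columns themselves.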

	import pandas as pd


	def jsonl_to_df(jsonl_filepath):
	    return pd.read_json(jsonl_filepath, lines=True)


	def df_to_jsonl(df, jsonl_filepath):
	    payload = df.to_json(orient='records', lines=True)
	    with open(jsonl_filepath, 'w') as writer:

	'''
	Simple script to split a PDF using the PyPDF2 package in Python.
	We often need to split an academic paper into the main paper and the
	supplementary material before submission.
	To do that, simply run the script as:

	`python split_pdf.py -in CVPR.pdf -s 15 -o`

	This produces 2 files: 'CVPR.01-14.pdf' and 'CVPR.15-20.pdf', where the starting
	page numbers for each split file are 1 and 15 respectively.
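
The naming scheme in the example above can be reproduced with a few lines of standard-library Python. This is a sketch under stated assumptions: the helper name `split_output_names` is hypothetical, the total of 20 pages is inferred from the `CVPR.15-20.pdf` file name, and zero-padding page numbers to the width of the last page number is a guess based on the `01` in the first file name. It only computes names and page ranges and does not call PyPDF2.

```python
from pathlib import Path


def split_output_names(pdf_path, split_page, total_pages):
    """Return the two output file names for a split starting at `split_page`.

    Pages are 1-indexed; the first file covers 1..split_page-1 and the
    second covers split_page..total_pages.
    """
    if not 1 < split_page <= total_pages:
        raise ValueError("split_page must be between 2 and total_pages")
    stem = Path(pdf_path).stem
    # Assumed convention: pad to the number of digits in the last page number
    width = len(str(total_pages))
    ranges = [(1, split_page - 1), (split_page, total_pages)]
    return [f"{stem}.{start:0{width}d}-{end:0{width}d}.pdf" for start, end in ranges]


if __name__ == "__main__":
    print(split_output_names("CVPR.pdf", 15, 20))
```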

	'''
	Workaround for logging a simple table that supports step sliding (see issue https://github.com/wandb/wandb/issues/6286).
	Unfortunately, wandb currently does not support this with `wandb.Table`, which is also overkill for a simple table.

	The `wandb_htmltable` function follows the same signature as `wandb.Table` and takes input parameters of the same types.
	It currently supports only text and image data. Image data is embedded as a byte string declared in the <img /> tag.

	Example:
	```
	my_data = [
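
To illustrate the idea of embedding image bytes in an `<img />` tag, here is a minimal standard-library sketch. It is not the author's `wandb_htmltable`: the function name `html_table_sketch` is hypothetical, it assumes image cells arrive as raw `bytes`, and it assumes PNG as the MIME type. It does not import or call wandb.

```python
import base64
import html


def html_table_sketch(columns, data):
    """Render column names and rows as an HTML table string.

    Cells that are bytes are treated as image data and embedded as a
    base64 data URI; everything else is rendered as escaped text.
    """
    def render_cell(value):
        if isinstance(value, (bytes, bytearray)):
            encoded = base64.b64encode(bytes(value)).decode("ascii")
            # Assumption: the bytes are a PNG image
            return f'<img src="data:image/png;base64,{encoded}" />'
        return html.escape(str(value))

    header = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{render_cell(v)}</td>" for v in row) + "</tr>"
        for row in data
    )
    return f"<table><tr>{header}</tr>{body}</table>"
```

The resulting string could then be logged as an HTML object at each step, for example through `wandb.Html`; whether the original function does this is an assumption.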

	'''
	Resizes images in the source image directory to fit within the given size
	bounds (keeping aspect ratio) and writes them to the target directory with an
	identical directory tree structure. Uses Magick for image resizing.
	'''
	import os
	import argparse
	import subprocess
	from pathlib import Path
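
The two ideas in the docstring, mirroring the directory tree and resizing within bounds, can be sketched as follows. The helper names `mirrored_target` and `build_resize_command` are hypothetical, and the sketch assumes the bounds are a maximum width and height and that the ImageMagick `magick` executable is installed. It only builds paths and commands and does not run anything.

```python
from pathlib import Path


def mirrored_target(source_root, target_root, image_path):
    """Map a file under source_root to the same relative path under target_root."""
    return Path(target_root) / Path(image_path).relative_to(source_root)


def build_resize_command(src, dst, max_width, max_height):
    """Build an ImageMagick command that fits an image within the given bounds.

    The "WxH>" geometry keeps the aspect ratio and only shrinks images that
    exceed the bounds; smaller images are left at their original size.
    """
    return ["magick", str(src), "-resize", f"{max_width}x{max_height}>", str(dst)]


if __name__ == "__main__":
    dst = mirrored_target("photos", "resized", "photos/trip/day1/a.jpg")
    # The target's parent directory would need to exist before running, e.g.
    # dst.parent.mkdir(parents=True, exist_ok=True)
    print(build_resize_command("photos/trip/day1/a.jpg", dst, 1024, 768))
```

Passing the command as a list to `subprocess.run` keeps the `>` character from being interpreted as a shell redirect.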

	# STEP 1: `$ mkdir ~/bin`
	# STEP 2: `$ touch ~/bin/sshfr`
	# STEP 3: `$ chmod +x ~/bin/sshfr`
	# STEP 4: Copy the following contents into `~/bin/sshfr`
	# STEP 5: Update .profile or .bash_profile: `$ export PATH="$PATH:$HOME/bin"`
	# STEP 6: Reload .profile or .bash_profile, e.g. `$ . ~/.bash_profile`

	# The contents of sshfr is as follows
	ADDRESS=$1
	PORT_START=${2-49151}

	from datetime import datetime
	import os

	import pandas as pd
	import argparse

	'''
	Note:
	- Entries start on row 3 of EduRec Excel exports
	- 'Student Number' column is mandatory!