@upidea
upidea / gist:4e036f3749bff630574743981dd7fa84
Created August 16, 2022 03:32
Bash snippet: assemble multiple strings into a single arguments variable for clickhouse-client.
#!/bin/bash
export PATH=/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
# Build the clickhouse-client connection arguments from environment variables,
# falling back to a local default when nothing is configured.
if [[ $ENGINE_CORE_CLICKHOUSE_USER != '' ]]; then
    CLARG="--host ${ENGINE_CORE_CLICKHOUSE_HOST} --port ${ENGINE_CORE_CLICKHOUSE_PORT} --user ${ENGINE_CORE_CLICKHOUSE_USER} --password ${ENGINE_CORE_CLICKHOUSE_PASSWORD}"
elif [[ $ENGINE_CORE_CLICKHOUSE_HOST != '' ]]; then
    CLARG="--host ${ENGINE_CORE_CLICKHOUSE_HOST} --port ${ENGINE_CORE_CLICKHOUSE_PORT}"
else
    CLARG="--host 127.0.0.1 --port 9000"
fi
@upidea
upidea / tfidf
Created March 3, 2020 01:22
tfidf
# tf-idf (term frequency - inverse document frequency)
# Commonly used to mine keywords from a document.
# Within one document, a high value means the term is highly discriminative there:
# it recurs in that document but appears rarely across the whole corpus (inverse document frequency).
# A value that is high across the whole corpus has no special meaning and is not suited to cross-document comparison.
# Term-frequency vectorization
from sklearn.feature_extraction.text import CountVectorizer
# The token_pattern parameter controls how text is split into tokens,
# e.g. r"(?u)\b\w+\b" to keep single-character tokens (the sklearn default is r"(?u)\b\w\w+\b").
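A minimal sketch of the idea above using sklearn's TfidfVectorizer; the toy corpus and the token_pattern choice are illustrative assumptions, not part of the original gist.
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, assumed for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# Keep single-character tokens too; the sklearn default pattern drops them.
vectorizer = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b")
tfidf = vectorizer.fit_transform(corpus)

# Highest-weighted term per document: frequent locally, rare corpus-wide.
terms = vectorizer.get_feature_names_out()  # sklearn >= 1.0
for row in tfidf.toarray():
    print(terms[row.argmax()], round(row.max(), 3))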
# Format data: render `data` (a bytes-like object of `length` bytes) as text,
# one little-endian 16-bit word sampled every 16 bytes, 256 bytes per output line.
out = "\n".join([
    " ".join([
        f"{data[i+1]:02x}{data[i]:02x}"
        for i in range(line, min(line + 256, length), 16)
    ])
    for line in range(0, length, 256)
])
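A runnable version of the same expression; the random `data` buffer below is an assumption, since the gist defines `data` and `length` elsewhere.
import os

data = os.urandom(1024)   # assumed sample input
length = len(data)

out = "\n".join(
    " ".join(
        f"{data[i + 1]:02x}{data[i]:02x}"
        for i in range(line, min(line + 256, length), 16)
    )
    for line in range(0, length, 256)
)
print(out)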
import mmap
import struct

# struct.unpack always returns a tuple, so this prints (1148112800,)
print(struct.unpack('<i', b'\xa0\xcf\x6e\x44'))  # little-endian: 1148112800
print(struct.unpack('>i', b'\x95\x6b\x31\x93'))  # big-endian: -1788137069

# Memory-map the file 'tmp' read-only and search it for a byte pattern.
with open('tmp', 'rb', 0) as file, \
        mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
    pos = s.find(b'\x64\x65')
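Continuing the snippet, a sketch that reads a value at the found offset; the file name, the marker bytes, and the assumption that a little-endian 32-bit int follows the marker are all illustrative.
import mmap
import struct

with open('tmp', 'rb', 0) as file, \
        mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
    pos = s.find(b'\x64\x65')              # offset of the marker, or -1
    if pos != -1 and pos + 6 <= len(s):
        # Unpack the little-endian 32-bit int right after the 2-byte marker.
        (value,) = struct.unpack_from('<i', s, pos + 2)
        print(hex(pos), value)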
@upidea
upidea / to_category.py
Created January 29, 2019 03:20
One-hot encoding function in numpy
import numpy

def dense_to_one_hot(labels_dense, num_classes):
    """Convert class labels from scalars to one-hot vectors."""
    num_labels = labels_dense.shape[0]
    index_offset = numpy.arange(num_labels) * num_classes
    labels_one_hot = numpy.zeros((num_labels, num_classes))
    labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
    return labels_one_hot
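A quick usage check; the labels below are made up for illustration.
labels = numpy.array([0, 2, 1])
print(dense_to_one_hot(labels, num_classes=3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]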
Looks like this is a piece of code.
# Shuffle and split the data directly with numpy
# (imports assumed: these helpers come from Keras; `sequences`,
# MAX_SEQUENCE_LENGTH and `labels` are defined earlier in the gist)
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
import numpy as np

data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
labels = to_categorical(np.asarray(labels))
print('Shape of Data Tensor:', data.shape)
print('Shape of Label Tensor:', labels.shape)
indices = np.arange(data.shape[0])
np.random.shuffle(indices)
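The listing cuts off after the shuffle; a sketch of the usual continuation, with VALIDATION_SPLIT as an assumed hold-out fraction.
VALIDATION_SPLIT = 0.2   # assumed

data = data[indices]
labels = labels[indices]
num_validation = int(VALIDATION_SPLIT * data.shape[0])

x_train, y_train = data[:-num_validation], labels[:-num_validation]
x_val, y_val = data[-num_validation:], labels[-num_validation:]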
@upidea
upidea / keras_precision_recall.py
Created January 4, 2019 02:19
Callback for Keras fit that calculates precision and recall.
import tensorflow as tf

class Metrics(tf.keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.confusion = []
        self.precision = []
        self.recall = []
        self.f1s = []
        self.kappa = []
        self.auc = []

    def on_epoch_end(self, epoch, logs={}):
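The listing truncates at on_epoch_end; a sketch of a typical body follows. Passing the validation set in explicitly is an assumption here, since newer tf.keras no longer exposes self.validation_data to callbacks.
import numpy as np
import tensorflow as tf
from sklearn import metrics

class Metrics(tf.keras.callbacks.Callback):
    def __init__(self, x_val, y_val):
        super().__init__()
        self.x_val, self.y_val = x_val, y_val

    def on_train_begin(self, logs=None):
        self.precision, self.recall = [], []

    def on_epoch_end(self, epoch, logs=None):
        # Predicted and true class indices on the validation set.
        y_pred = np.argmax(self.model.predict(self.x_val), axis=1)
        y_true = np.argmax(self.y_val, axis=1)
        self.precision.append(metrics.precision_score(y_true, y_pred, average='macro'))
        self.recall.append(metrics.recall_score(y_true, y_pred, average='macro'))
        print(f" - val_precision: {self.precision[-1]:.4f}"
              f" - val_recall: {self.recall[-1]:.4f}")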
@upidea
upidea / everydayenglish.py
Created January 3, 2019 07:39
Generate Anki flashcards from scraped web content.
import os
import re
import requests
import json
import time
import datetime
import genanki
my_model = genanki.Model(
    201901021920,
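The model definition is cut off in the listing; a self-contained sketch of how it might be completed with genanki follows. The model name, fields, templates, deck ID, and note contents are illustrative assumptions; only the model ID 201901021920 comes from the gist.
import genanki

my_model = genanki.Model(
    201901021920,
    'Everyday English',              # model name assumed
    fields=[{'name': 'Front'}, {'name': 'Back'}],
    templates=[{
        'name': 'Card 1',
        'qfmt': '{{Front}}',
        'afmt': '{{FrontSide}}<hr id="answer">{{Back}}',
    }])

my_deck = genanki.Deck(2019010219, 'Everyday English')   # deck ID assumed
my_deck.add_note(genanki.Note(model=my_model,
                              fields=['phrase of the day', 'its meaning']))
genanki.Package(my_deck).write_to_file('everydayenglish.apkg')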
-- Oracle to_char vs. Hive/Spark date_format (Java SimpleDateFormat patterns,
-- where MM is month and mm is minutes):
select to_char('2018-04-26 22:23:40', 'yyyyMMdd');
select date_format('2018-04-26 22:23:40', 'yyyyMMdd');
select date_format('2018-04-26 22:23:40', 'yyyy-MM-dd HH:mm:ss');
select to_char('2018-04-26 22:23:40', 'yyyy-MM-dd hh24:mi:ss');
select date_format(to_unix_timestamp(nvl('2018-04-26 22:23:40', '')), 'yyyyMMdd');
-- Pattern case fixed below: mm would mean minutes here, MM is month.
select from_unixtime(unix_timestamp('20171205 22:23:40', 'yyyyMMdd HH:mm:ss'), 'yyyy-MM-dd HH:mm:ss');
@upidea
upidea / SparkGibbsLDA.scala
Created July 1, 2018 02:17 — forked from waleking/SparkGibbsLDA.scala
We implement Gibbs sampling for LDA in Spark. This version performs much better than the alpha version and can now handle 3,196,204 words, 100 topics, and 1,000 sampling iterations on a server in 161.7 minutes. To fix the long-running collect() step of the alpha version, we use cache() at lines 261 and 262. We also solve a pile o…
package topic
import spark.broadcast._
import spark.SparkContext
import spark.SparkContext._
import spark.RDD
import spark.storage.StorageLevel
import scala.util.Random
import scala.math.{ sqrt, log, pow, abs, exp, min, max }
import scala.collection.mutable.HashMap