@Hegghammer
Last active January 27, 2023 17:38
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Programmatic interaction with Zotero \n",
"\n",
"It's useful to know how to interact programmatically with Zotero libraries in case you need to extract, modify, or upload things in bulk. \n",
"\n",
"The key to this is the PyZotero package: https://pyzotero.readthedocs.io/en/latest/."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Installation\n",
"\n",
"Install the necessary packages:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!pip install pyzotero\n",
"#!pip install python-dotenv\n",
"#!pip install pandas"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Authentication\n",
"\n",
"Get an API key from Zotero.org (Settings > Feeds/API) and store it in a `.env` file like so:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"ZOTERO_API_KEY = \"xxxxxxxxxxxxxxxxxx\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the packages and the `.env` file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from pyzotero import zotero\n",
"\n",
"# Read ZOTERO_API_KEY from the .env file into the environment\n",
"load_dotenv()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Authenticate with Zotero. The code is slightly different depending on whether you're logging into a personal or a group library. Here I'm logging into a group library. "
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"group_id = XXXXX # Your group library id\n",
"api_key = os.environ.get('ZOTERO_API_KEY')\n",
"zot = zotero.Zotero(group_id, \"group\", api_key)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Logging into a personal library would look something like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#personal_id = XXXXXX # Your personal library id\n",
"#api_key = os.environ.get('ZOTERO_API_KEY')\n",
"#zot = zotero.Zotero(personal_id, \"user\", api_key)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"You can find the **user id** (usually a seven-digit integer) on the [API keys settings page](https://www.zotero.org/settings/keys): look for the phrase \"Your userID for use in API calls is XXXXXX\" in the top left of the page.\n",
"\n",
"The **group id** (usually a six-digit integer) can be found by opening the group’s page (https://www.zotero.org/groups/groupname) and hovering over the group settings link. The ID is the number after `/groups/`.\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Inspect existing contents"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"all_items = zot.everything(zot.items())\n",
"len(all_items)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The items are nested dictionaries. Most of the useful fields sit under the `data` key. The available keys vary by item type; here are the ones for the first item:"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['key', 'version', 'parentItem', 'itemType', 'linkMode', 'title', 'accessDate', 'url', 'note', 'contentType', 'charset', 'filename', 'md5', 'mtime', 'tags', 'relations', 'dateAdded', 'dateModified'])"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_items[0]['data'].keys()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This opens the door to all kinds of extraction jobs, for example generating a spreadsheet of all the items that meet certain criteria."
]
},
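{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of such an extraction job, the cell below collects a few fields from every journal article into a pandas DataFrame and writes it to a CSV file. The choice of fields, the `journalArticle` filter, and the output filename `journal_articles.csv` are illustrative assumptions; `.get()` is used because not every item type carries every field."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Keep only journal articles and pull out a few fields (illustrative choice)\n",
"rows = [\n",
"    {\n",
"        'key': item['data']['key'],\n",
"        'title': item['data'].get('title', ''),\n",
"        'date': item['data'].get('date', ''),\n",
"        'tags': '; '.join(tag['tag'] for tag in item['data'].get('tags', [])),\n",
"    }\n",
"    for item in all_items\n",
"    if item['data'].get('itemType') == 'journalArticle'\n",
"]\n",
"df = pd.DataFrame(rows)\n",
"# Hypothetical output filename\n",
"df.to_csv('journal_articles.csv', index=False)\n",
"df.head()"
]
},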
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Modify existing contents"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"You can modify items in your library with the `update_item()` method. Say you want to lowercase every tag in the library to clean up and consolidate them. You could iterate over the items like this:"
]
},
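{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for index, item in enumerate(all_items, start=1):\n",
"    print(f'Processing item {index} of {len(all_items)} ..')\n",
"    tags = item['data'].get('tags', [])\n",
"    # Lowercase each tag name, keeping any other keys on the tag dict\n",
"    lowered = [{**tag, 'tag': tag['tag'].lower()} for tag in tags]\n",
"    # Only send an update if something actually changed\n",
"    if lowered != tags:\n",
"        item['data']['tags'] = lowered\n",
"        zot.update_item(item)"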
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Upload new content\n",
"\n",
"Let's say we have a CSV file titled `example.csv` structured like this:\n",
"\n",
"```\n",
"Authors,Title,Publication,Year,Abstract\n",
"\"Smith, John; Li, Kim\",\"New Study\",\"Nature\",\"2023\",\"We discover new things.\"\n",
"\"Wu, Deb; Wye, Or\",\"Old Study\",\"Science\",\"1989\",\"We discover new things\"\n",
"```\n",
"\n",
"First we load it with pandas:"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [],
"source": [
"csv = pd.read_csv('example.csv')"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we loop over the rows, each time creating a dictionary item (`entry`) which we send to Zotero with the `create_items()` method.\n",
"\n",
"Note that the precise script will depend on how the CSV is structured, notably on how the cells containing the author names are formatted. "
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [],
"source": [
"for i in range(len(csv)):\n",
"    # Getting the basics is easy:\n",
"    title = csv.iloc[i].Title\n",
"    publication = csv.iloc[i].Publication\n",
"    year = str(csv.iloc[i].Year)\n",
"    abstract = csv.iloc[i].Abstract\n",
"    # Creating the author dictionary is harder:\n",
"    authors = csv.iloc[i].Authors\n",
"    authors = authors.split(';')\n",
"    authors = [author.strip() for author in authors]\n",
"    authors = list(filter(None, authors))\n",
"    creators = []\n",
"    for fullname in authors:\n",
"        # Names are formatted as 'Last, First'; without a comma, first stays empty\n",
"        last, _, first = fullname.partition(',')\n",
"        name = {'creatorType': 'author', 'firstName': first.strip(), 'lastName': last.strip()}\n",
"        creators.append(name)\n",
"    # Once we have it, we can build the entry item:\n",
"    entry = zot.item_template('journalArticle')\n",
"    entry['creators'] = creators\n",
"    entry['title'] = title\n",
"    entry['publicationTitle'] = publication\n",
"    entry['date'] = year\n",
"    entry['abstractNote'] = abstract\n",
"    # Now we upload:\n",
"    zot.create_items([entry])\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Also note that the precise composition of the entry dictionary depends on the *type* of item being uploaded. You can see which fields each type offers by calling the `item_template()` method with the name of the item type:"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [],
"source": [
"sample_entry = zot.item_template('journalArticle')\n",
"sample_entry"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"You can view all the available item types like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"types = zot.item_types()\n",
"types"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In the above example I have assumed that all the entries in the CSV were journal articles. If you have entries of different types, you may want to add a column with the entry type and have `if` statements handle each type differently."
]
},
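{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that branching, assuming a hypothetical `Type` column in the CSV that holds Zotero item-type names such as `journalArticle` or `book`. Treating the `Publication` column as the publisher for books is also an assumption made only for illustration, and the creators are left out to keep the example short."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for _, row in csv.iterrows():\n",
"    item_type = row['Type']  # hypothetical column, not in example.csv above\n",
"    entry = zot.item_template(item_type)\n",
"    entry['title'] = row['Title']\n",
"    entry['date'] = str(row['Year'])\n",
"    # Fill in the fields that differ between item types\n",
"    if item_type == 'journalArticle':\n",
"        entry['publicationTitle'] = row['Publication']\n",
"    elif item_type == 'book':\n",
"        entry['publisher'] = row['Publication']\n",
"    zot.create_items([entry])"
]
},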
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "e7370f93d1d0cde622a1f8e1c04877d8463912d04d973331ad4851f04de6915a"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}