Last active
December 31, 2024 15:52
agu_python_workshop_2024.ipynb
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "view-in-github", | |
| "colab_type": "text" | |
| }, | |
| "source": [ | |
| "<a href=\"https://colab.research.google.com/gist/akhuff157/ab8daeadf316feeb8c92062990d1adc5/agu_python_workshop_2024.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "Snrb6KjKajti" | |
| }, | |
| "source": [ | |
| "# **Python for Satellite Remote Sensing: Analysis and Visualization for Earth Scientists**\n", | |
| "\n", | |
| "Authors: Dr. Rebekah Esmaili (rebekah.esmaili@gmail.com) and Dr. Amy Huff (amy.huff@noaa.gov)\n", | |
| "\n", | |
| "This tutorial was written in December 2024 for the AGU24 Fall Meeting \"Pre-Conference Workshop\" PREWS3.\n", | |
| "\n", | |
| "<font color='red'>**If you use any of the Python code in Sections 1 or 4 in your research, please credit the NOAA/NESDIS/STAR Aerosols & Atmospheric Composition Science Team.**</font>\n", | |
| "\n", | |
| "---\n", | |
| "\n", | |
| "Topics covered in **Section 1 (Search for & Download Satellite Files)**:\n", | |
| "- Access online NOAA and NASA data archives\n", | |
| "- Search for available files for a given satellite/sensor/product & date/time\n", | |
| " - NASA Earthdata:\n", | |
| " - [TEMPO Level 3 (gridded) Nitrogen Dioxide (NO2)](https://www.star.nesdis.noaa.gov/atmospheric-composition-training/images/2024/tempo_tropospheric_no2_20241113_scan009_1822.png)\n", | |
| " - [GPM IMERG rain rate](https://www.star.nesdis.noaa.gov/atmospheric-composition-training/images/2024/gpm_imerg_20240401.png)\n", | |
| " - NOAA Open Data Dissemination (NODD) on AWS:\n", | |
| " - [AVHRR Optimum Interpolation Sea Surface Temperature (OISST)](https://www.star.nesdis.noaa.gov/atmospheric-composition-training/images/2024/avhrr_oisst_2020926.png)\n", | |
| " - [NOAA-20 VIIRS Level 3 (gridded) Aerosol Optical Depth (AOD)](https://www.star.nesdis.noaa.gov/atmospheric-composition-training/images/2024/noaa20_viirs_aod_gridded_20200821.png)\n", | |
| "- Select & download satellite files\n", | |
| " - [Link to zip file with data files](https://www.star.nesdis.noaa.gov/atmospheric-composition-training/documents/2024a/agu2024_python_workshop_data_files.zip) (back-up only, in case you encounter trouble with downloading files in Section 1)\n", | |
| "\n", | |
| "Topics covered in **Section 2 (Open & Understand Satellite Files)**:\n", | |
| "- Understand satellite data filenames\n", | |
| "- Open satellite files that are/are not organized using \"groups\"\n", | |
| "- Identify data variables and read their metadata\n", | |
| "- Extract data variables\n", | |
| "- Process satellite data variables using quality/diagnostic flags\n", | |
| "\n", | |
| "Topics covered in **Section 3 (Simple Plots of Satellite Data)**:\n", | |
| "- Check satellite variable & coordinate dimensions and ensure they are consistent\n", | |
| "- Make a simple plot of satellite data\n", | |
| "\n", | |
| "Topics covered in **Section 4 (Plotting Satellite Data on a Map)**:\n", | |
| "- Set up a figure with a map projection\n", | |
| "- Set the geographic domain of the map\n", | |
| "- Change the appearance (colors) of borderlines, land & water polygons\n", | |
| "- Add latitude/longitude gridlines & labels\n", | |
| "- Pull information out of the data file name and use it to automatically make a plot title & name for saved image file\n", | |
| "\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## **More Information on Satellite Data Terminology**\n", | |
| "\n", | |
| "In this tutorial, we will be working with Level 3 (L3 or gridded) satellite data, and we refer to Level 2 (L2 or granule) satellite data. In addition, the VIIRS gridded AOD data file we will use is generated from reprocessed L2 data, while the other products are generated from operational L2 data. And the TEMPO NO2 data are beta-maturity, while the other products are full/validated maturity.\n", | |
| "\n", | |
| "If you aren't familiar with these terms, the NOAA Aerosols and Atmospheric Composition Science Team's [satellite training website](https://www.star.nesdis.noaa.gov/atmospheric-composition-training/index.php) has definitions and examples of:\n", | |
| "- [Level 1b, Level 2, and Level 3 satellite data processing levels](https://www.star.nesdis.noaa.gov/atmospheric-composition-training/satellite_data_processing_levels.php)\n", | |
| "- [Operational vs. reprocessed data latency](https://www.star.nesdis.noaa.gov/atmospheric-composition-training/satellite_data_operational_reprocessed.php)\n", | |
| "- [Beta, Provisional, and Full/Validated product maturity levels](https://www.star.nesdis.noaa.gov/atmospheric-composition-training/satellite_data_maturity_levels.php)" | |
| ], | |
| "metadata": { | |
| "id": "BtToSuEqU_QJ" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## **Read the Satellite Data Documentation!**\n", | |
| "\n", | |
| "Before working with a satellite product, users should read the product documentation from the science team, such as the users' guide, \"readme\", or algorithm theoretical basis document (ATBD). These documents provide important information about satellite products, including valid data ranges, data screening using data quality/diagnostic flags, and known issues.\n", | |
| "\n", | |
| "Here are links to documentation for the satellite products used in today's tutorial:\n", | |
| "\n", | |
| "- [TEMPO Trace Gas and Cloud Level 2 and Level 3 Data Products: User's Guide](https://asdc.larc.nasa.gov/documents/tempo/guide/TEMPO_Level-2-3_trace_gas_clouds_user_guide_V1.1.pdf) (September 18, 2024)\n", | |
| "- [GPM IMERG V07 Release Notes](https://gpm.nasa.gov/sites/default/files/2024-02/IMERG_V07_ReleaseNotes_240221.pdf) (February 21, 2023)\n", | |
| "- [VIIRS Aerosol Optical Depth EDR (granules) User's Guide](https://www.star.nesdis.noaa.gov/atmospheric-composition-training/documents/VIIRS_AOD_Users_Guide.pdf) (February 2020)\n", | |
| " - VIIRS gridded AOD files are generated by gridding high quality AOD EDR (granules) data on a regular 0.25° x 0.25° equal-angle grid (~28 km x 28 km at the equator), for global coverage\n", | |
| "- [OISST ATBD](https://www.ncei.noaa.gov/pub/data/sds/cdr/CDRs/Sea_Surface_Temperature_Optimum_Interpolation/AlgorithmDescription_01B-09.pdf) (September 13, 2013)\n" | |
| ], | |
| "metadata": { | |
| "id": "MggXxnrWgyTt" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## **Section 0: Set up Google Colab**" | |
| ], | |
| "metadata": { | |
| "id": "x_amqa-K5S87" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "[Google Colab](https://colab.google/) is a hosted Jupyter Notebook service that requires no setup to use and provides free access to computing resources.\n", | |
| "\n", | |
| "[Jupyter Notebook](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html) is an open-source web application that supports >40 programming languages, including Python. It allows code to be broken into “blocks” that run independently, which makes it ideal for learning. Any output from the code in a \"block\" will appear underneath it.\n", | |
| "\n", | |
| "**The Python code demonstrated in this training is universal**. Specific lines of code or functions will run in any Python IDE (e.g., Spyder, Visual Studio Code), Jupyter Notebook, or the Python interpreter." | |
| ], | |
| "metadata": { | |
| "id": "xWSzp7qCARPl" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **Example of how to run Jupyter Notebook code blocks**\n", | |
| "\n", | |
| "To see how Jupyter Notebook works, let's run the Python code to print \"Hello world!\"\n", | |
| "\n", | |
| "Place your cursor over the grey code block below, then click the little black circle with the white arrow inside, located on the far left side of the block." | |
| ], | |
| "metadata": { | |
| "id": "8RQO-i-i4TM2" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "print('Hello world!')" | |
| ], | |
| "metadata": { | |
| "id": "kKCQc4nl2vuN" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **Limitations of using Google Colab**\n", | |
| "\n", | |
| "Colab is free, powerful, and easy to use, but it has limitations:\n", | |
| "\n", | |
| "\n", | |
| "1. **Colab sessions are temporary.** A session will expire after 12 hours of continuous use or after 90 minutes of idle time. <font color='red'>**All output, including downloaded or generated files, will be lost after the session expires.**</font> Therefore, any files users want to save must be downloaded to the user's local computer or Google Drive account.\n", | |
| "2. **Colab cannot be configured to use a virtual or conda environment.** Therefore, in general, users must work with the existing, current Colab configuration.\n", | |
| "3. **The Colab configuration changes frequently, with the addition of new packages and updates to existing packages.** So code that runs today may give an error in the future after an update to the Colab configuration." | |
| ], | |
| "metadata": { | |
| "id": "NnfX517t1yxB" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "d3404f3a" | |
| }, | |
| "source": [ | |
| "### **Import Python modules and packages**\n", | |
| "\n", | |
| "- [pathlib](https://docs.python.org/3/library/pathlib.html): module to set file system paths\n", | |
| "- [datetime](https://docs.python.org/3/library/datetime.html): module to manipulate dates and times\n", | |
| "- [S3Fs](https://s3fs.readthedocs.io/en/latest/): library to set up a file system interface with AWS Simple Storage Service (S3) buckets\n", | |
| "- [earthaccess](https://earthaccess.readthedocs.io/en/stable/): library to search for, download, or stream NASA Earth science data\n", | |
| "- [Xarray](https://docs.xarray.dev/en/stable/index.html): library to work with labeled multi-dimensional arrays\n", | |
| "- [NumPy](https://numpy.org/doc/stable/user/index.html): library to perform array operations\n", | |
| "- [Matplotlib](https://matplotlib.org/stable/index.html): library to make plots\n", | |
| "- [cartopy](https://scitools.org.uk/cartopy/docs/latest/index.html): library to make maps\n", | |
| "\n", | |
| "Used by `xarray` but not imported:\n", | |
| "- [netCDF4](https://unidata.github.io/netcdf4-python/): library to read and write netCDF files\n", | |
| "\n", | |
| "---\n", | |
| "---\n", | |
| "The Colab configuration does not include the `earthaccess`, `s3fs`, `netcdf4`, or `cartopy` packages, so they need to be installed.\n", | |
| "\n", | |
| "**Ignore any error messages about package dependency conflicts; they will not impact the training.**" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Install missing packages in Colab quietly (no progress notifications)\n", | |
| "!pip install --quiet earthaccess s3fs netcdf4 cartopy" | |
| ], | |
| "metadata": { | |
| "id": "RWkGnKW1cGIy" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Import modules and packages\n", | |
| "\n", | |
| "from pathlib import Path\n", | |
| "\n", | |
| "import datetime\n", | |
| "from datetime import date\n", | |
| "\n", | |
| "import earthaccess\n", | |
| "\n", | |
| "import s3fs\n", | |
| "\n", | |
| "import xarray as xr\n", | |
| "\n", | |
| "import numpy as np\n", | |
| "\n", | |
| "import matplotlib as mpl\n", | |
| "from matplotlib import pyplot as plt\n", | |
| "import matplotlib.ticker as ticker\n", | |
| "\n", | |
| "import cartopy.feature as cfeature\n", | |
| "from cartopy import crs as ccrs" | |
| ], | |
| "metadata": { | |
| "id": "FvJF3LSTdVbh" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **Save a copy of today's Colab Python configuration**\n", | |
| "\n", | |
| "Google updates the Colab package configuration on a monthly basis. If you plan to run today's notebook again in the future, or to use any of the provided code in a script or notebook of your own, you will want to have a record of the packages & their versions used today.\n", | |
| "\n", | |
| "You can create a [Conda environment](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) on your local computer with the key packages and their versions to run Python code with the same configuration as today's Colab.\n", | |
| "\n", | |
| "Run the blocks below to:\n", | |
| "\n", | |
| "1. Print the current version of Python installed in Colab & save it as a text file (`colab_version.txt`)\n", | |
| "2. Print the list of packages and their versions in the current Colab configuration & save them in a text file (`colab_packages.txt`)\n" | |
| ], | |
| "metadata": { | |
| "id": "S3UG28bVjAA-" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "3iXLZ493VOij" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Find version of Python installed in Colab\n", | |
| "!python --version" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Save installed version of Python to a text file\n", | |
| "!python --version > colab_version.txt" | |
| ], | |
| "metadata": { | |
| "id": "6gqPdgPXXIEO" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# List all packages & their versions in the Colab configuration\n", | |
| "!pip list -v" | |
| ], | |
| "metadata": { | |
| "id": "AbfYeIs_WzYr" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Save list of all installed packages & their versions to a text file\n", | |
| "!pip freeze > colab_packages.txt" | |
| ], | |
| "metadata": { | |
| "id": "RzBNBBxXTLOr" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
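| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "To rebuild a matching environment later, only a handful of entries in `colab_packages.txt` usually matter. As a minimal sketch (not part of the original workshop code), the hypothetical helper `pinned_versions()` below pulls the pinned versions of selected packages out of lines in `pip freeze` format. The sample lines and version numbers are made up for illustration; to use the real file, pass `Path('colab_packages.txt').read_text().splitlines()` instead.\n", | |
| "\n", | |
| "```python\n", | |
| "def pinned_versions(freeze_lines, wanted):\n", | |
| "    # Return {package: version} for the wanted packages (hypothetical helper)\n", | |
| "    wanted_lower = {name.lower() for name in wanted}\n", | |
| "    pins = {}\n", | |
| "    for line in freeze_lines:\n", | |
| "        line = line.strip()\n", | |
| "        # Skip blank lines, comments, and entries without an exact name==version pin\n", | |
| "        if not line or line.startswith('#') or '==' not in line:\n", | |
| "            continue\n", | |
| "        name, version = line.split('==', 1)\n", | |
| "        if name.lower() in wanted_lower:\n", | |
| "            pins[name.lower()] = version\n", | |
| "    return pins\n", | |
| "\n", | |
| "# Made-up sample lines in pip freeze format (not today's actual Colab versions)\n", | |
| "sample_lines = [\n", | |
| "    'numpy==1.26.4',\n", | |
| "    'some-package @ file:///tmp/some-package',\n", | |
| "    'xarray==2024.10.0',\n", | |
| "    'Cartopy==0.24.0',\n", | |
| "]\n", | |
| "\n", | |
| "key_packages = ['numpy', 'xarray', 'cartopy', 's3fs']\n", | |
| "pins = pinned_versions(sample_lines, key_packages)\n", | |
| "for name in key_packages:\n", | |
| "    print(name, pins.get(name, 'not found'))\n", | |
| "```\n", | |
| "\n", | |
| "The versions found this way could then be pinned (as `name==version`) when setting up a local environment." | |
| ], | |
| "metadata": {} | |
| }, | |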
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## **Section 1: Search for & Download Satellite Files**" | |
| ], | |
| "metadata": { | |
| "id": "nf6-3lhb6jTY" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **1.1 NASA Earthdata: TEMPO and GPM IMERG Files**" | |
| ], | |
| "metadata": { | |
| "id": "KMkENZN9mz0O" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "#### **1.1.1 Connect to Earthdata Login**\n", | |
| "\n", | |
| "The data archive for TEMPO and GPM IMERG files is [NASA Earthdata](https://search.earthdata.nasa.gov/).\n", | |
| "\n", | |
| "We can connect to [NASA Earthdata Login](https://urs.earthdata.nasa.gov/) using the `earthaccess` package. In this tutorial, we are using the `strategy='interactive'` option so you can manually enter your username and password for your Earthdata Login.\n", | |
| "\n", | |
| "If you don't already have an account with Earthdata, register [here](https://urs.earthdata.nasa.gov/users/new) (it's free!).\n", | |
| "\n", | |
| "Useful `earthaccess` functions covered in this tutorial include:\n", | |
| "- [earthaccess.login()](https://earthaccess.readthedocs.io/en/stable/user-reference/api/api/#earthaccess.api.login): authenticates with Earthdata Login credentials\n", | |
| "- [earthaccess.search_data()](https://earthaccess.readthedocs.io/en/stable/user-reference/api/api/#earthaccess.api.search_data): queries for dataset granules on NASA Earthdata\n", | |
| "- [DataGranule.data_links()](https://earthaccess.readthedocs.io/en/stable/user-reference/granules/granules/#earthaccess.results.DataGranule.data_links): returns the URL link to a granule file\n", | |
| "- [earthaccess.download()](https://earthaccess.readthedocs.io/en/stable/user-reference/api/api/#earthaccess.api.download): downloads granule files to a local directory" | |
| ], | |
| "metadata": { | |
| "id": "UkAhHni0h35z" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Connect to NASA Earthdata Login with user credentials\n", | |
| "auth = earthaccess.login(strategy='interactive')" | |
| ], | |
| "metadata": { | |
| "id": "rln4q4_ncQt4" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "#### **1.1.2 Search NASA Earthdata for TEMPO NO2 Level 3 (gridded) files**\n", | |
| "\n", | |
| "The `earthaccess.search_data()` function has a number of arguments that can be defined to query Earthdata, including `short_name`, `temporal`, `bounding_box`, and `granule_name`. You can use some or all of these parameters to search for satellite data files.\n", | |
| "\n", | |
| "Let's use these search parameters to find the available TEMPO NO2 L3 files spanning the full TEMPO field of regard (FOR) for November 13, 2024. The FOR extends from Mexico City and the Yucatan Peninsula to the Canadian oil sands in the north-south direction, and from the Atlantic Ocean to the Pacific Ocean in the east-west direction. TEMPO scans its FOR from east to west; a full scan takes about 1 hour, with shorter scan times in the morning and evening (~40 minutes)." | |
| ], | |
| "metadata": { | |
| "id": "Oh7t1_VvixKq" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Earthdata search settings\n", | |
| "\n", | |
| "short_name = 'TEMPO_NO2_L3' # Product abbreviation (string)\n", | |
| "observation_start = '2024-11-13 12:00:00' # Observation start date/time (string): 'YYYY-MM-DD HH:MM:SS'\n", | |
| "observation_end = '2024-11-13 23:59:59' # Observation end date/time (string): 'YYYY-MM-DD HH:MM:SS'\n", | |
| "domain = (-130, 10, -50, 60) # Geographic bounding box (integers: W_lon, S_lat, E_lon, N_lat)\n", | |
| "scan_wildcard = '*S*' # Wildcard string for scan number (use '*S*' to find all scans)" | |
| ], | |
| "metadata": { | |
| "id": "lCKEtyRQXbU2" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
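| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "The observation start and end times above are typed by hand as strings. As an optional sketch (an alternative, not the workshop's method), the same `'YYYY-MM-DD HH:MM:SS'` strings can be built from `datetime` objects, which helps when looping over many dates. The variable names `obs_date`, `start_dt`, and `end_dt` are illustrative only.\n", | |
| "\n", | |
| "```python\n", | |
| "import datetime\n", | |
| "\n", | |
| "# Build the 'YYYY-MM-DD HH:MM:SS' search strings from datetime objects\n", | |
| "obs_date = datetime.date(2024, 11, 13)\n", | |
| "start_dt = datetime.datetime.combine(obs_date, datetime.time(12, 0, 0))\n", | |
| "end_dt = datetime.datetime.combine(obs_date, datetime.time(23, 59, 59))\n", | |
| "\n", | |
| "observation_start = start_dt.strftime('%Y-%m-%d %H:%M:%S')\n", | |
| "observation_end = end_dt.strftime('%Y-%m-%d %H:%M:%S')\n", | |
| "\n", | |
| "print(observation_start)  # 2024-11-13 12:00:00\n", | |
| "print(observation_end)    # 2024-11-13 23:59:59\n", | |
| "```" | |
| ], | |
| "metadata": {} | |
| }, | |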
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Search NASA Earthdata\n", | |
| "\n", | |
| "results = earthaccess.search_data(short_name=short_name,\n", | |
| " temporal=(observation_start, observation_end),\n", | |
| " bounding_box=domain,\n", | |
| " granule_name=scan_wildcard)" | |
| ], | |
| "metadata": { | |
| "id": "g06qjMmWXZBg" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "#### **1.1.3 Review NASA Earthdata search results**\n", | |
| "\n", | |
| "Using [list comprehensions](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions) in conjunction with the `earthaccess` `DataGranule.data_links()` function, we can print the list of satellite file names returned from the NASA Earthdata search.\n", | |
| "\n", | |
| "---\n", | |
| "---\n", | |
| "\n", | |
| "**Pro tip:** When printing directory paths, for simplicity I prefer to print only the final path component using the `str.split` [method](https://docs.python.org/3/library/stdtypes.html#str.split). To see why this is a simpler option, we can also print the full directory path for the first product, for comparison." | |
| ], | |
| "metadata": { | |
| "id": "O_SybN7al5yg" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Print list of available file names\n", | |
| "\n", | |
| "available_files = [(granule.data_links(access=\"external\")) for granule in results]\n", | |
| "for available_file in available_files:\n", | |
| " print(available_file[0].split('/')[-1])" | |
| ], | |
| "metadata": { | |
| "id": "1GxHR-C3Ybwx" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "16fba075" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Print the full directory path for the first product\n", | |
| "# For NASA Earthdata, this is a web link\n", | |
| "\n", | |
| "print(available_files[0])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "#### **1.1.4 Modify search settings to find file corresponding to Scan 9**\n", | |
| "\n", | |
| "Our initial search returned 13 files for November 13, 2024, corresponding to TEMPO Scans 2 (`S002`) through 14 (`S014`). \n", | |
| "\n", | |
| "Let's modify the `Earthdata search settings` to search for only Scan 9 (`S009`), and then re-run the code block to `Search NASA Earthdata`." | |
| ], | |
| "metadata": { | |
| "id": "fcLP9smkXuSx" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Earthdata search settings\n", | |
| "\n", | |
| "short_name = 'TEMPO_NO2_L3' # Product abbreviation (string)\n", | |
| "observation_start = '2024-11-13 12:00:00' # Observation start date/time (string): 'YYYY-MM-DD HH:MM:SS'\n", | |
| "observation_end = '2024-11-13 23:59:59' # Observation end date/time (string): 'YYYY-MM-DD HH:MM:SS'\n", | |
| "domain = (-130, 10, -50, 60) # Geographic bounding box (integers: W_lon, S_lat, E_lon, N_lat)\n", | |
| "scan_wildcard = '*S009*' # Wildcard string for scan number (use '*S*' to find all scans)" | |
| ], | |
| "metadata": { | |
| "id": "DHDwt4HNWspi" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Search NASA Earthdata\n", | |
| "\n", | |
| "results = earthaccess.search_data(short_name=short_name,\n", | |
| " temporal=(observation_start, observation_end),\n", | |
| " bounding_box=domain,\n", | |
| " granule_name=scan_wildcard)" | |
| ], | |
| "metadata": { | |
| "id": "3VHFb1S9Wspk" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Print list of available file names\n", | |
| "# Should be only one file, corresponding to Scan 9 (S009)\n", | |
| "\n", | |
| "available_files = [(granule.data_links(access=\"external\")) for granule in results]\n", | |
| "for available_file in available_files:\n", | |
| " print(available_file[0].split('/')[-1])" | |
| ], | |
| "metadata": { | |
| "id": "lscCU8z--Y9Y" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "#### **1.1.5 Download the TEMPO NO2 file for Scan 9 on November 13, 2024**\n", | |
| "\n", | |
| "Use the `earthaccess.download()` function to download the one TEMPO NO2 L3 (gridded) data file. Downloaded files are saved to the Colab instance; to see them, click on the `Files` icon in the menu panel on the left side of the Colab window.\n", | |
| "\n", | |
| "---\n", | |
| "---\n", | |
| "\n", | |
| "\n", | |
| "**Pro tip:** I recommend using the `pathlib` [module](https://docs.python.org/3/library/pathlib.html) to set directory paths because it has a lot of very handy features, including automatically using the correct format for the user's operating system. This helps avoid errors in situations when more than one person is using the same code file, because Windows uses backslashes in directory paths, while macOS and Linux use forward slashes.\n", | |
| "\n", | |
| "The `pathlib` syntax to set a directory path is `Path('directory_name')`, for example `Path('C:/Users/Jane.Smith/Desktop')`. To set the current working directory, use `Path.cwd()`." | |
| ], | |
| "metadata": { | |
| "id": "y00x4CCsbdGR" | |
| } | |
| }, | |
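| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "As a small illustration of the `pathlib` features mentioned in the pro tip, the sketch below joins path components with the `/` operator and reads back parts of the resulting path. The `downloads` folder and `example_file.nc` file name are hypothetical; nothing is created on disk.\n", | |
| "\n", | |
| "```python\n", | |
| "from pathlib import Path\n", | |
| "\n", | |
| "# Current working directory\n", | |
| "work_dir = Path.cwd()\n", | |
| "\n", | |
| "# Join path components with the / operator; pathlib uses the correct separator for the OS\n", | |
| "# The folder and file names are hypothetical, for illustration only\n", | |
| "file_path = work_dir / 'downloads' / 'example_file.nc'\n", | |
| "\n", | |
| "print(file_path.name)         # final path component: example_file.nc\n", | |
| "print(file_path.suffix)       # file extension: .nc\n", | |
| "print(file_path.parent.name)  # containing folder: downloads\n", | |
| "```\n", | |
| "\n", | |
| "For local paths, `Path.name` is an alternative to `str.split` for getting only the final path component." | |
| ], | |
| "metadata": {} | |
| }, | |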
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Download file from NASA Earthdata\n", | |
| "\n", | |
| "earthaccess.download(results, local_path=Path.cwd())" | |
| ], | |
| "metadata": { | |
| "id": "v0wygewEaaBy" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "##### <b><font color=\"blue\" size=\"5\">Exercise 1-1: Search for GPM IMERG rain rate files for April 1, 2024 & download file for the 12:00 UTC scan</font></b>\n", | |
| "\n", | |
| "Fill in the code blocks below, following the same procedure used for the TEMPO NO2 data:\n", | |
| "\n", | |
| "1. Change the `Earthdata search settings` to search for all of the available IMERG rain rate files for April 1, 2024, as follows:\n", | |
| "\n", | |
| "```\n", | |
| "short_name = 'GPM_3IMERGHH' # Product abbreviation (string)\n", | |
| "observation_start = '2024-04-01 00:00:00' # Observation start date/time (string): 'YYYY-MM-DD HH:MM:SS'\n", | |
| "observation_end = '2024-04-01 23:59:59' # Observation end date/time (string): 'YYYY-MM-DD HH:MM:SS'\n", | |
| "domain = (-180, -90, 180, 90) # Geographic bounding box (integers: W_lon, S_lat, E_lon, N_lat)\n", | |
| "scan_wildcard = '*S*' # Wildcard string for scan number (use '*S*' to find all scans)\n", | |
| "```\n", | |
| "\n", | |
| "2. Run the code block to `Search NASA Earthdata`.\n", | |
| " - Your search should return 48 files.\n", | |
| " - Check the list of file names with the answers (below).\n", | |
| "\n", | |
| "3. Modify the `Earthdata search settings` to search for **only the 12:00 UTC scan** & re-run the code block to `Search NASA Earthdata`.\n", | |
| " - Your search should return 1 file.\n", | |
| " - Check the file name with the answers (below).\n", | |
| "\n", | |
| "4. Run the code block to `Download file from NASA Earthdata`.\n", | |
| " - **You should download 1 file.**\n", | |
| " - Check that the downloaded file appears in the `Files` menu panel.\n" | |
| ], | |
| "metadata": { | |
| "id": "yQYcjXYfnn-0" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Earthdata search settings\n", | |
| "\n", | |
| "short_name = '' # Product abbreviation (string)\n", | |
| "observation_start = 'YYYY-MM-DD 00:00:00' # Observation start date/time (string): 'YYYY-MM-DD HH:MM:SS'\n", | |
| "observation_end = 'YYYY-MM-DD 23:59:59' # Observation end date/time (string): 'YYYY-MM-DD HH:MM:SS'\n", | |
| "domain = () # Geographic bounding box (integers: W_lon, S_lat, E_lon, N_lat)\n", | |
| "scan_wildcard = '*S*' # Wildcard string for scan number (use '*S*' to find all scans)" | |
| ], | |
| "metadata": { | |
| "id": "bNH3D-FjW2Xn" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Search NASA Earthdata\n", | |
| "\n", | |
| "results =" | |
| ], | |
| "metadata": { | |
| "id": "Dfm4c6OJW2Xo" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Print list of available file names\n", | |
| "\n" | |
| ], | |
| "metadata": { | |
| "id": "sZ3hSaI6XSj-" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Download file from NASA Earthdata for 12:00 UTC scan on April 1, 2024\n", | |
| "\n" | |
| ], | |
| "metadata": { | |
| "id": "uzZ_HAzaXjJv" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "<details><summary><b><font color=\"blue\" size=5>Answers to Exercise 1-1</font></b></summary>\n", | |
| "<p></p>\n", | |
| "\n", | |
| "```\n", | |
| "# Earthdata search settings for all files on April 1, 2024\n", | |
| "\n", | |
| "short_name = 'GPM_3IMERGHH' # Product abbreviation (string)\n", | |
| "observation_start = '2024-04-01 00:00:00' # Observation start date/time (string): 'YYYY-MM-DD HH:MM:SS'\n", | |
| "observation_end = '2024-04-01 23:59:59' # Observation end date/time (string): 'YYYY-MM-DD HH:MM:SS'\n", | |
| "domain = (-180, -90, 180, 90) # Geographic bounding box (integers: W_lon, S_lat, E_lon, N_lat)\n", | |
| "scan_wildcard = '*S*' # Wildcard string for scan number (use '*S*' to find all scans)\n", | |
| "```\n", | |
| "\n", | |
| "```\n", | |
| "# Search NASA Earthdata\n", | |
| "\n", | |
| "results = earthaccess.search_data(short_name=short_name,\n", | |
| " temporal=(observation_start, observation_end),\n", | |
| " bounding_box=domain,\n", | |
| " granule_name=scan_wildcard)\n", | |
| "```\n", | |
| "\n", | |
| "\n", | |
| "\n", | |
| "\n", | |
| "---\n", | |
| "\n", | |
| "**List of all of the available IMERG rain rate files for April 1, 2024**\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S000000-E002959.0000.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S003000-E005959.0030.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S010000-E012959.0060.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S013000-E015959.0090.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S020000-E022959.0120.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S023000-E025959.0150.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S030000-E032959.0180.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S033000-E035959.0210.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S040000-E042959.0240.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S043000-E045959.0270.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S050000-E052959.0300.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S053000-E055959.0330.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S060000-E062959.0360.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S063000-E065959.0390.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S070000-E072959.0420.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S073000-E075959.0450.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S080000-E082959.0480.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S083000-E085959.0510.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S090000-E092959.0540.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S093000-E095959.0570.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S100000-E102959.0600.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S103000-E105959.0630.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S110000-E112959.0660.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S113000-E115959.0690.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S120000-E122959.0720.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S123000-E125959.0750.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S130000-E132959.0780.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S133000-E135959.0810.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S140000-E142959.0840.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S143000-E145959.0870.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S150000-E152959.0900.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S153000-E155959.0930.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S160000-E162959.0960.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S163000-E165959.0990.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S170000-E172959.1020.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S173000-E175959.1050.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S180000-E182959.1080.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S183000-E185959.1110.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S190000-E192959.1140.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S193000-E195959.1170.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S200000-E202959.1200.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S203000-E205959.1230.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S210000-E212959.1260.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S213000-E215959.1290.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S220000-E222959.1320.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S223000-E225959.1350.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S230000-E232959.1380.V07B.HDF5\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S233000-E235959.1410.V07B.HDF5\n", | |
| "\n", | |
| "```\n", | |
| "# Modified Earthdata search settings for the 12:00 UTC scan\n", | |
| "\n", | |
| "short_name = 'GPM_3IMERGHH' # Product abbreviation (string)\n", | |
| "observation_start = '2024-04-01 00:00:00' # Observation start date/time (string): 'YYYY-MM-DD HH:MM:SS'\n", | |
| "observation_end = '2024-04-01 23:59:59' # Observation end date/time (string): 'YYYY-MM-DD HH:MM:SS'\n", | |
| "domain = (-180, -90, 180, 90) # Geographic bounding box (integers: W_lon, S_lat, E_lon, N_lat)\n", | |
| "scan_wildcard = '*S1200*' # Wildcard string for scan number (use '*S*' to find all scans)\n", | |
| "```\n", | |
| "\n", | |
| "---\n", | |
| "\n", | |
| "**IMERG file for the 12:00 UTC scan on April 1, 2024**\n", | |
| "\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S120000-E122959.0720.V07B.HDF5\n", | |
| "\n", | |
| "---\n" | |
| ], | |
| "metadata": { | |
| "id": "6AyYSX39mU9v" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **1.2 NOAA Open Data Dissemination (NODD): OISST and VIIRS AOD Files**" | |
| ], | |
| "metadata": { | |
| "id": "zg7BsB4XnDl5" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "#### **1.2.1 Connect to NODD on Amazon Web Services (AWS)**\n", | |
| "The [NODD Program](https://www.noaa.gov/information-technology/open-data-dissemination) provides public access to NOAA's open data via commercial cloud platforms.\n", | |
| "\n", | |
| "The NODD platform for VIIRS gridded AOD and AVHRR OISST files is Amazon Web Services (AWS). We can connect to AWS using the `s3fs` package with an anonymous connection (`anon=True`). Files on AWS are stored in directories called Simple Storage Service (S3) buckets. The `s3fs` package allows users to access AWS S3 buckets as if they are file system (`fs`) directories.\n", | |
| "\n", | |
| "Useful `s3fs` functions covered in this training include:\n", | |
| "- [fs.get()](https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3FileSystem.get): downloads files to local directory\n", | |
| "- [fs.ls()](https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3FileSystem.ls): lists file paths (directories & files) at source\n", | |
| "- [fs.size()](https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3FileSystem.size): returns the approximate size (in bytes) of the file\n", | |
| "\n", | |
| "**You do not need an AWS cloud computing account to access NOAA data!** Think of the NODD on AWS as a data archive that just happens to be in the cloud instead of hosted on a physical server." | |
| ], | |
| "metadata": { | |
| "id": "VvjEmgss8anY" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Connect to AWS S3 anonymously\n", | |
| "fs = s3fs.S3FileSystem(anon=True)" | |
| ], | |
| "metadata": { | |
| "id": "-gsEKs-goIRJ" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "#### **1.2.2 Navigating the NODD S3 buckets on AWS**\n", | |
| "\n", | |
| "Each NOAA program has its own S3 buckets for NODD data. The NODD also has a web interface, which allows users to easily see the organizational structure for the various S3 buckets. Data files can be downloaded manually via the web interface.\n", | |
| "\n", | |
| "The NOAA Oceanic Climate Data Records (CDRs) program includes the AVHRR OISST data:\n", | |
| "- [NODD on AWS Oceanic CDRs homepage](https://registry.opendata.aws/noaa-cdr-oceanic/)\n", | |
| "- OISST S3 bucket name: `noaa-cdr-sea-surface-temp-optimum-interpolation-pds`\n", | |
| "- OISST S3 bucket [web interface](https://noaa-cdr-sea-surface-temp-optimum-interpolation-pds.s3.amazonaws.com/index.html)\n", | |
| "\n", | |
| "The NOAA Joint Polar Satellite System (JPSS) program includes the VIIRS gridded AOD data:\n", | |
| "- [NODD on AWS JPSS homepage](https://registry.opendata.aws/noaa-jpss/)\n", | |
| "- JPSS Development Data S3 bucket name: `noaa-jpss`\n", | |
| "- JPSS Development Data S3 bucket [web interface](https://noaa-jpss.s3.amazonaws.com/index.html)\n", | |
| "\n", | |
| "\n", | |
| "---\n", | |
| "\n", | |
| "Find a specific data file on the NODD by setting the full S3 bucket directory path to the file; for example:\n", | |
| "\n", | |
| "`noaa-cdr-sea-surface-temp-optimum-interpolation-pds/data/v2.1/avhrr/202409/oisst-avhrr-v02r01.20240926.nc`\n", | |
| "\n", | |
| "`noaa-jpss/NOAA20/VIIRS/NOAA20_VIIRS_Aerosol_Optical_Depth_Gridded_Reprocessed/0.25_Degrees_Daily/2020/viirs_eps_noaa20_aod_0.250_deg_20200821.nc`" | |
| ], | |
| "metadata": { | |
| "id": "MbzrzIKnTjvB" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "7acd94bd" | |
| }, | |
| "source": [ | |
| "#### **1.2.3 Browse the AVHRR OISST S3 bucket**\n", | |
| "\n", | |
| "To find the full S3 bucket directory paths to OISST files, we start by browsing the top organizational level of the AVHRR OISST S3 bucket; these are subdirectories labeled with the 6-digit year and month of the observations, in the format `YYYYMM`.\n", | |
| "\n", | |
| "Use the `fs.ls()` function to find the available product subdirectory year-month paths in the OISST S3 bucket." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Find & print the subdirectory paths in the S3 bucket\n", | |
| "\n", | |
| "year_months = fs.ls('noaa-cdr-sea-surface-temp-optimum-interpolation-pds/data/v2.1/avhrr/')\n", | |
| "\n", | |
| "for year_month in year_months:\n", | |
| "    print(year_month.split('/')[-1])" | |
| ], | |
| "metadata": { | |
| "id": "r-KDC3ki34-j" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "e15a0b14" | |
| }, | |
| "source": [ | |
| "#### **1.2.4 Find all of the OISST files for September 2024**\n", | |
| "\n", | |
| "Let's define a `data_path` variable to set the directory path for AVHRR OISST files for September 2024, and then use the `fs.ls()` function to list the full path names for the individual data files.\n", | |
| "\n", | |
| "**Coding Note:** The `fs.ls()` function takes the source directory path argument `data_path` as a string, which is why the `year` and `month` variables, entered as integers, are converted to strings. The Python `str.zfill(width)` [method](https://docs.python.org/3/library/stdtypes.html#str.zfill) ensures the `month` string in the `data_path` is 2 digits; `str.zfill(width)` returns a copy of the string left-filled with ASCII '0' digits to make a string of length `width`. This way, the `data_path` syntax is correct for `month` variable integers <10." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Find all the OISST files for September 2024\n", | |
| "# Print total number of files in directory & first 10 file names\n", | |
| "\n", | |
| "bucket = 'noaa-cdr-sea-surface-temp-optimum-interpolation-pds/data/v2.1/avhrr/'\n", | |
| "year = 2024\n", | |
| "month = 9\n", | |
| "\n", | |
| "data_path = (bucket + str(year) + str(month).zfill(2) + '/')\n", | |
| "\n", | |
| "files = fs.ls(data_path)\n", | |
| "\n", | |
| "print('Total number of files:', len(files), '\\n')\n", | |
| "\n", | |
| "for file in files[:10]:\n", | |
| "    print(file.split('/')[-1])" | |
| ], | |
| "metadata": { | |
| "id": "lAOGiJ9v34D1" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
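| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "**Coding Note (alternative sketch):** The zero-padding step in the path construction above can also be written with an f-string format specification. This is a minimal sketch and not part of the original workshop code: the helper name `build_oisst_path` is hypothetical, and the bucket prefix is the one used above.\n", | |
| "\n", | |
| "```python\n", | |
| "def build_oisst_path(bucket, year, month):\n", | |
| "    # Hypothetical helper: ':02d' pads the month to 2 digits, like str(month).zfill(2)\n", | |
| "    return f'{bucket}{year}{month:02d}/'\n", | |
| "\n", | |
| "bucket = 'noaa-cdr-sea-surface-temp-optimum-interpolation-pds/data/v2.1/avhrr/'\n", | |
| "data_path = build_oisst_path(bucket, 2024, 9)\n", | |
| "print(data_path)\n", | |
| "```\n", | |
| "\n", | |
| "Either form gives the same string; the f-string version just keeps the conversion and padding in one place." | |
| ], | |
| "metadata": {} | |
| }, | |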
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Print the full directory path for the first data file\n", | |
| "\n", | |
| "files[0]" | |
| ], | |
| "metadata": { | |
| "id": "o3u8gvZE6dQZ" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "78c34997" | |
| }, | |
| "source": [ | |
| "#### **1.2.5 Select the OISST file for September 26, 2024**\n", | |
| "\n", | |
| "Most satellite file names contain information about the observation date and, if relevant, observation times. You will learn more about filename conventions in Section 2.\n", | |
| "\n", | |
| "Using Python [slicing](https://stackoverflow.com/questions/509211/how-slicing-in-python-works) and [list comprehension](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions), we can select OISST file paths on the S3 bucket using the observation date in `YYYYMMDD` format in the file names.\n", | |
| "\n", | |
| "Let's select the file corresponding to September 26.\n", | |
| "\n", | |
| "---\n", | |
| "---\n", | |
| "\n", | |
| "**Pro tip**: Before downloading, I recommend printing the selected satellite file names to confirm they are the files you want, and checking the approximate size of each file using the `fs.size()` function.\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Select the file for September 26, 2024\n", | |
| "\n", | |
| "matches = [file for file in files if (file.split('/')[-1].split('.')[1][6:8] == '26')]\n", | |
| "\n", | |
| "# Print file name and approximate file size\n", | |
| "for match in matches:\n", | |
| "    print(match.split('/')[-1])\n", | |
| "    print('Approximate file size (MB):', round((fs.size(match)/1.0E6), 2))" | |
| ], | |
| "metadata": { | |
| "id": "x6fp4N096lPs" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
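| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "A variation on the slicing approach above is to format the target date with the `datetime` module and compare the full `YYYYMMDD` string, which avoids hard-coding slice positions. This is only a sketch, assuming the `oisst-avhrr-v02r01.YYYYMMDD.nc` naming pattern; the `example_files` list is made-up sample input standing in for the `files` list returned by `fs.ls()`.\n", | |
| "\n", | |
| "```python\n", | |
| "from datetime import date\n", | |
| "\n", | |
| "# Made-up sample paths that mimic the fs.ls() output (for illustration only)\n", | |
| "example_files = [\n", | |
| "    'bucket/data/v2.1/avhrr/202409/oisst-avhrr-v02r01.20240925.nc',\n", | |
| "    'bucket/data/v2.1/avhrr/202409/oisst-avhrr-v02r01.20240926.nc',\n", | |
| "]\n", | |
| "\n", | |
| "target = date(2024, 9, 26).strftime('%Y%m%d')\n", | |
| "\n", | |
| "# Keep paths whose file name has the target date as its second dot-separated field\n", | |
| "example_matches = [f for f in example_files if f.split('/')[-1].split('.')[1] == target]\n", | |
| "print(example_matches)\n", | |
| "```" | |
| ], | |
| "metadata": {} | |
| }, | |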
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "#### **1.2.6 Download the selected OISST file**\n", | |
| "\n", | |
| "Use the `fs.get()` function to download the one OISST file corresponding to the selected file directory path (`matches`). Downloaded files are saved to the Colab instance; click on the `Files` icon in the menu panel on the left side of the Colab window.\n", | |
| "\n", | |
| "**Coding Notes:** The `pathlib` module uses `/` to join the directory path with the file name to get the full path for the file, e.g., `Path.cwd() / match.split('/')[-1]`. The `fs.get()` [function](https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3FileSystem.get) takes the downloaded file directory argument as a string, so the `pathlib` `PurePath.as_posix()` [method](https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.as_posix) is used to return a string representation of the full path (with forward slashes) for the downloaded file." | |
| ], | |
| "metadata": { | |
| "id": "B3hFIcqJu1O7" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Download file from AWS NODD\n", | |
| "\n", | |
| "for match in matches:\n", | |
| "    fs.get(match, (Path.cwd() / match.split('/')[-1]).as_posix())" | |
| ], | |
| "metadata": { | |
| "id": "P6sM2McG8lM-" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
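| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "The `pathlib` pattern used in the download step can be tried on its own, without connecting to AWS. The sketch below builds the local destination path for the OISST example file path shown in Section 1.2.2; the variable names are chosen here for illustration.\n", | |
| "\n", | |
| "```python\n", | |
| "from pathlib import Path\n", | |
| "\n", | |
| "s3_path = 'noaa-cdr-sea-surface-temp-optimum-interpolation-pds/data/v2.1/avhrr/202409/oisst-avhrr-v02r01.20240926.nc'\n", | |
| "\n", | |
| "file_name = s3_path.split('/')[-1]  # Last path component is the file name\n", | |
| "local_path = Path.cwd() / file_name  # '/' joins the directory and the file name\n", | |
| "local_path_str = local_path.as_posix()  # String representation with forward slashes\n", | |
| "\n", | |
| "print(local_path_str)\n", | |
| "```" | |
| ], | |
| "metadata": {} | |
| }, | |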
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "##### <b><font color=\"blue\" size=\"5\">Exercise 1-2: Search for NOAA-20 VIIRS gridded AOD files (daily at 0.250° resolution) for 2020</font></b>\n", | |
| "\n", | |
| "\n", | |
| "1. Fill in the missing code in the block below, to find all of the VIIRS gridded AOD files (daily at 0.250° resolution) for 2020.\n", | |
| "\n", | |
| "2. Run the code block to print the total number of files in the directory and the first 10 file names.\n", | |
| " - Your search should return 366 files.\n", | |
| " - Check the list of the first 10 file names with the answers (below).\n", | |
| "\n", | |
| "\n" | |
| ], | |
| "metadata": { | |
| "id": "oavqh-AVh4-g" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Find all the NOAA-20 VIIRS gridded AOD files (daily at 0.250° resolution) for 2020\n", | |
| "# Print total number of files in directory & first 10 file names\n", | |
| "\n", | |
| "bucket = 'noaa-jpss/NOAA20/VIIRS/NOAA20_VIIRS_Aerosol_Optical_Depth_Gridded_Reprocessed/'\n", | |
| "resolution = '0.25_Degrees_Daily/'\n", | |
| "year =\n", | |
| "\n", | |
| "data_path = ()\n", | |
| "\n", | |
| "files = fs.ls()\n", | |
| "\n", | |
| "print('Total number of files:', len(files), '\\n')\n", | |
| "\n", | |
| "for\n" | |
| ], | |
| "metadata": { | |
| "id": "pY7YHeTPhkF3" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "<details><summary><b><font color=\"blue\" size=5>Answers to Exercise 1-2</font></b></summary>\n", | |
| "<p></p>\n", | |
| "\n", | |
| "```\n", | |
| "# Find all the VIIRS gridded AOD files (daily at 0.250° resolution) for 2020\n", | |
| "# Print total number of files in directory & first 10 file names\n", | |
| "\n", | |
| "bucket = 'noaa-jpss/NOAA20/VIIRS/NOAA20_VIIRS_Aerosol_Optical_Depth_Gridded_Reprocessed/'\n", | |
| "resolution = '0.25_Degrees_Daily/'\n", | |
| "year = 2020\n", | |
| "\n", | |
| "data_path = (bucket + resolution + str(year) + '/')\n", | |
| "\n", | |
| "files = fs.ls(data_path)\n", | |
| "\n", | |
| "print('Total number of files:', len(files), '\\n')\n", | |
| "\n", | |
| "for file in files[:10]:\n", | |
| "    print(file.split('/')[-1])\n", | |
| "```\n", | |
| "---\n", | |
| "\n", | |
| "Total number of files: 366\n", | |
| "\n", | |
| "viirs_eps_noaa20_aod_0.250_deg_20200101.nc\n", | |
| "\n", | |
| "viirs_eps_noaa20_aod_0.250_deg_20200102.nc\n", | |
| "\n", | |
| "viirs_eps_noaa20_aod_0.250_deg_20200103.nc\n", | |
| "\n", | |
| "viirs_eps_noaa20_aod_0.250_deg_20200104.nc\n", | |
| "\n", | |
| "viirs_eps_noaa20_aod_0.250_deg_20200105.nc\n", | |
| "\n", | |
| "viirs_eps_noaa20_aod_0.250_deg_20200106.nc\n", | |
| "\n", | |
| "viirs_eps_noaa20_aod_0.250_deg_20200107.nc\n", | |
| "\n", | |
| "viirs_eps_noaa20_aod_0.250_deg_20200108.nc\n", | |
| "\n", | |
| "viirs_eps_noaa20_aod_0.250_deg_20200109.nc\n", | |
| "\n", | |
| "viirs_eps_noaa20_aod_0.250_deg_20200110.nc\n" | |
| ], | |
| "metadata": { | |
| "id": "E2s6b3W3h4-h" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "kUpIyMK6qQrh" | |
| }, | |
| "source": [ | |
| "#### **1.2.7 Select the NOAA-20 VIIRS gridded AOD file for August 21, 2020**\n", | |
| "\n", | |
| "Like we did with the OISST files, we can select the VIIRS gridded AOD file paths on the S3 bucket using the observation date in `YYYYMMDD` format in the file names.\n", | |
| "\n", | |
| "Let's select the file corresponding to August 21, and print the file name and the approximate size of the file.\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Select the file for August 21, 2020\n", | |
| "\n", | |
| "matches = [file for file in files if (file.split('/')[-1].split('_')[6][4:8] == '0821')]\n", | |
| "\n", | |
| "# Print file name and approximate file size\n", | |
| "for match in matches:\n", | |
| "    print(match.split('/')[-1])\n", | |
| "    print('Approximate file size (MB):', round((fs.size(match)/1.0E6), 2))" | |
| ], | |
| "metadata": { | |
| "id": "CGVs-mprqQri" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "#### **1.2.8 Download the selected AOD file**\n", | |
| "\n", | |
| "Like we did with the OISST file, use the `fs.get()` function to download the one AOD file corresponding to the selected file directory path (`matches`). Downloaded files are saved to the Colab instance; click on the `Files` icon in the menu panel on the left side of the Colab window." | |
| ], | |
| "metadata": { | |
| "id": "qA-PcMMyqQri" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Download file from AWS NODD\n", | |
| "\n", | |
| "for match in matches:\n", | |
| "    fs.get(match, (Path.cwd() / match.split('/')[-1]).as_posix())" | |
| ], | |
| "metadata": { | |
| "id": "uoEi8OJIqQri" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## **Section 2: Open & Understand Satellite Files**" | |
| ], | |
| "metadata": { | |
| "id": "ItJikCNIrTh8" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "WGFW2OPjajti" | |
| }, | |
| "source": [ | |
| "### **2.1 Filename Conventions**\n", | |
| "\n", | |
| "NetCDF and HDF are [self-describing formats](https://ntrs.nasa.gov/api/citations/20120008476/downloads/20120008476.pdf), which are structured binary data files. These formats are community standards for Earth Science because they:\n", | |
| "\n", | |
| "1. Are faster to read than text-based formats, because the data are stored in binary form\n", | |
| "2. Are compact files and thus cost-effective for long-term data storage\n", | |
| "3. Include metadata that provides information about the data source and the available variables\n", | |
| "\n", | |
| "NetCDF4 files are a type of HDF5 file, so the two formats share many of the same tools and workflows for reading, extracting, and writing data.\n", | |
| "\n", | |
| "In Section 1, we downloaded four files that we will work with:\n", | |
| "\n", | |
| "* TEMPO gridded NO2\n", | |
| "* GPM IMERG rain rate\n", | |
| "* VIIRS gridded Aerosol Optical Depth (AOD)\n", | |
| "* AVHRR Optimum Interpolation Sea Surface Temperature (OISST)\n", | |
| "\n", | |
| "Many environmental dataset names are quite long and can vary in structure when produced by different organizations. However, the filenames follow similar conventions and contain useful information about their contents. For instance, the following filename is from the VIIRS Level 2 (granule) AOD product from the NOAA JPSS program:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "" | |
| ], | |
| "metadata": { | |
| "id": "ZUhhl02qAi_4" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "\n", | |
| "```\n", | |
| "JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150.nc\n", | |
| "```\n", | |
| "\n", | |
| "* Prefix indicates the mission (`JRR` for JPSS Risk Reduction)\n", | |
| "* Product (`AOD` for Aerosol Optical Depth)\n", | |
| "* Algorithm version and revision number (`v2r3` for version 2 revision 3)\n", | |
| "* Satellite source (`j01` for JPSS-1/NOAA-20)\n", | |
| "* Start (`s`), end (`e`), and creation (`c`) time, `YYYYMMDDHHMMSSS` (seconds are to one decimal place) in UTC\n", | |
| "* The extension `.nc` means that it's a NetCDF file\n", | |
| "\n", | |
| "According to the start and end times, the file begins at `20:44` and ends at `20:45` UTC, so it's roughly a one-and-a-half-minute scan (about 85 seconds).\n", | |
| "\n", | |
| "Let's explore another file. This file was generated by NASA and has a different filename system, but follows similar conventions:" | |
| ], | |
| "metadata": { | |
| "id": "oBajNTydAjoa" | |
| } | |
| }, | |
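| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "The filename fields listed above can also be pulled apart with basic string methods. The sketch below is illustrative only: the helper `parse_stamp` is hypothetical, and it assumes the timestamps follow the `YYYYMMDDHHMMSSS` layout described above (14 digits plus tenths of a second).\n", | |
| "\n", | |
| "```python\n", | |
| "from datetime import datetime, timedelta\n", | |
| "\n", | |
| "fname = 'JRR-AOD_v2r3_j01_s202009152044026_e202009152045271_c202009152113150.nc'\n", | |
| "\n", | |
| "def parse_stamp(stamp):\n", | |
| "    # Hypothetical helper: 'YYYYMMDDHHMMSS' followed by tenths of a second\n", | |
| "    whole = datetime.strptime(stamp[:14], '%Y%m%d%H%M%S')\n", | |
| "    return whole + timedelta(seconds=int(stamp[14]) / 10)\n", | |
| "\n", | |
| "fields = fname[:-3].split('_')  # Drop '.nc', then split on underscores\n", | |
| "start = parse_stamp(fields[3][1:])  # Strip the leading 's'\n", | |
| "end = parse_stamp(fields[4][1:])  # Strip the leading 'e'\n", | |
| "\n", | |
| "print('Product:', fields[0], '| Version:', fields[1], '| Satellite:', fields[2])\n", | |
| "print('Start:', start, '| End:', end)\n", | |
| "print('Duration (s):', (end - start).total_seconds())\n", | |
| "```" | |
| ], | |
| "metadata": {} | |
| }, | |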
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "" | |
| ], | |
| "metadata": { | |
| "id": "k4mQx9YFAjIi" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "\n", | |
| "```\n", | |
| "3B-HHR.MS.MRG.3IMERG.20240401-S120000-E122959.0720.V07B.HDF5\n", | |
| "```\n", | |
| "\n", | |
| "* Prefix indicates that this is a Level 3B (`3B`) and half-hourly dataset (`HHR`). `MS` is for multi-satellite, `MRG` is for multi-instrument\n", | |
| "* `3IMERG` is for Level 3 IMERG, which is the algorithm name\n", | |
| "* The calendar date is in `YYYYMMDD` format (`20240401`)\n", | |
| "* The start time (`S`) is `120000` for 12:00:00 in UTC\n", | |
| "* The end time (`E`) is `122959` for 12:29:59 in UTC\n", | |
| "* The version number is `.V07B` for version 7b\n", | |
| "* The extension `.HDF5` indicates an HDF5 file\n", | |
| "\n", | |
| "You can learn more in the [Precipitation Product File Naming Convention Document](https://gpm.nasa.gov/sites/default/files/2020-02/FileNamingConventionForPrecipitationProductsForGPMMission.pdf)." | |
| ], | |
| "metadata": { | |
| "id": "R-lj-pFEFKxb" | |
| } | |
| }, | |
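| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "As a quick check of the IMERG naming convention described above, the filename can be split on `.` and `-` to recover the same fields. This is a minimal sketch with variable names chosen for illustration; it assumes the start and end times fall on the calendar date given in the filename.\n", | |
| "\n", | |
| "```python\n", | |
| "from datetime import datetime\n", | |
| "\n", | |
| "fname = '3B-HHR.MS.MRG.3IMERG.20240401-S120000-E122959.0720.V07B.HDF5'\n", | |
| "\n", | |
| "fields = fname.split('.')\n", | |
| "date_str, start_str, end_str = fields[4].split('-')  # Date, 'S' start time, 'E' end time\n", | |
| "\n", | |
| "# Strip the leading 'S' or 'E' before parsing the times\n", | |
| "start = datetime.strptime(date_str + start_str[1:], '%Y%m%d%H%M%S')\n", | |
| "end = datetime.strptime(date_str + end_str[1:], '%Y%m%d%H%M%S')\n", | |
| "\n", | |
| "print('Data type:', fields[0])\n", | |
| "print('Algorithm:', fields[3])\n", | |
| "print('Start:', start, '| End:', end)\n", | |
| "print('Version:', fields[6], '| Extension:', fields[7])\n", | |
| "```" | |
| ], | |
| "metadata": {} | |
| }, | |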
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "#### <b><font color=\"blue\" size=\"5\">Exercise 2-1: Understanding filenames</font></b>\n", | |
| "\n", | |
| "For the file:\n", | |
| "\n", | |
| "```\n", | |
| "oisst-avhrr-v02r01.20240901.nc\n", | |
| "```\n", | |
| "\n", | |
| "1. What does `v02r01` refer to?\n", | |
| "2. What is the algorithm name?\n", | |
| "3. What sensor does this dataset use?\n", | |
| "\n", | |
| "For the file:\n", | |
| "\n", | |
| "```\n", | |
| "TEMPO_NO2_L3_V03_20241113T182249Z_S009.nc\n", | |
| "```\n", | |
| "\n", | |
| "4. What sensor does this dataset use?\n", | |
| "5. What is the format of the date in the filename?\n", | |
| "6. If you don't know the filename structure, where can you go to learn more?\n", | |
| "---\n", | |
| "\n", | |
| "Your Answer:\n" | |
| ], | |
| "metadata": { | |
| "id": "dtGUmrxqD4G6" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [], | |
| "metadata": { | |
| "id": "sgu4w9F9IijX" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "<details><summary><b><font color=\"blue\" size=5>Answers to Exercise 2-1</font></b></summary>\n", | |
| "<p></p>\n", | |
| "\n", | |
| "\n", | |
| "```\n", | |
| "oisst-avhrr-v02r01.20240901.nc\n", | |
| "```\n", | |
| "\n", | |
| "1. What does `v02r01` refer to?\n", | |
| "\n", | |
| "<font color=\"blue\">Algorithm version 2, revision 1</font>\n", | |
| "\n", | |
| "2. What is the algorithm name?\n", | |
| "\n", | |
| "<font color=\"blue\">OISST</font>\n", | |
| "\n", | |
| "3. What sensor does this dataset use?\n", | |
| "\n", | |
| "<font color=\"blue\">AVHRR</font>\n", | |
| "\n", | |
| "For the file:\n", | |
| "\n", | |
| "```\n", | |
| "TEMPO_NO2_L3_V03_20241113T182249Z_S009.nc\n", | |
| "```\n", | |
| "\n", | |
| "4. What sensor does this dataset use?\n", | |
| "\n", | |
| "<font color=\"blue\">TEMPO</font>\n", | |
| "\n", | |
| "5. What is the format of the date in the filename?\n", | |
| "\n", | |
| "<font color=\"blue\">`YYYYMMDD`, followed by a `T` separator and then `HHMMSS`. The `Z` indicates \"Zulu\" time, or UTC.</font>\n", | |
| "\n", | |
| "6. If you don't know the filename structure, where can you go to learn more?\n", | |
| "\n", | |
| "<font color=\"blue\">Google! You can find \"Level 2\" and \"Level 3\" product user guides, READMEs, Algorithm Theoretical Basis Documents (ATBDs) at NASA, NOAA, EUMETSAT, JAXA, etc. for technical information about the data. </font>\n", | |
| "\n", | |
| "[TEMPO Trace Gas and Cloud Level 2 and Level 3 Data Products: User's Guide](https://asdc.larc.nasa.gov/documents/tempo/guide/TEMPO_Level-2-3_trace_gas_clouds_user_guide_V1.1.pdf)" | |
| ], | |
| "metadata": { | |
| "id": "oZOc9z_iEPxj" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **2.2 Opening NetCDF files**" | |
| ], | |
| "metadata": { | |
| "id": "FYgVJLYeBzRr" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "Xrpbrvdeajtj" | |
| }, | |
| "source": [ | |
| "We can use the [xarray](http://xarray.pydata.org/en/stable/io.html) package to open self-describing formats and work with large, nested arrays. The [h5netcdf](https://github.com/h5netcdf/h5netcdf) reader engine is useful because it can open `netcdf4` and `HDF` files. Other useful support packages are the [netcdf4](https://unidata.github.io/netcdf4-python/) and [h5py](https://www.h5py.org/) packages." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "Nhg0I-nEajtj" | |
| }, | |
| "source": [ | |
| "\n", | |
| "Use the `xr.open_dataset()` function to open the VIIRS gridded AOD file downloaded in Section 1. The engine option (`engine=<name>`) selects the backend used to read the file. Some possible file readers are `netcdf4`, `scipy`, `pydap`, `h5netcdf`, `pynio`, `cfgrib`, `pseudonetcdf`, and `zarr`, but the corresponding package must also be installed.\n", | |
| "\n", | |
| "Note: If you need to get the path of the file in Google Colab, you can click on the `Files` folder icon on the left, and right click on the file of interest, and then select \"Copy Path.\"" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "" | |
| ], | |
| "metadata": { | |
| "id": "gjGfOVseXFDI" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "4hoPDQ4Bajtj" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "fname='/content/viirs_eps_noaa20_aod_0.250_deg_20200821.nc'\n", | |
| "noaa20_viirs_aod_file_id = xr.open_dataset(fname, engine='h5netcdf')" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "8UjwQn3Oajtj" | |
| }, | |
| "source": [ | |
| "Printing `noaa20_viirs_aod_file_id` will reveal a long list of the variables, dimensions, indices, and global attributes:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "iuLTYbQXajtk" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "noaa20_viirs_aod_file_id" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "l9HpClKQajtk" | |
| }, | |
| "source": [ | |
| "The output above is interactive - click on the arrows to show the metadata for:\n", | |
| "\n", | |
| "* __Dimensions__: The dimensions are named `lon` and `lat`, which are respectively 1440 and 720.\n", | |
| "\n", | |
| "* __Coordinates__: The coordinates are `lon` and `lat`. These have the same name as the dimensions, but this is not always true.\n", | |
| "\n", | |
| "* __Variables__: This file contains several variables; we'll use `AOD550`. Its dimensions are also `lat` and `lon`.\n", | |
| "\n", | |
| "* __Attributes__: netCDF4 [CF-1.5 conventions](https://cfconventions.org/). Some of the information that we saw in the file name is also present: this product is the \"NOAA Enterprise L3 Aerosol Optical Depth\" (`title`), it's a NOAA Level 3 product (`processing_level`), and the data were collected from the NOAA-20 (`satellite_name`) VIIRS instrument (`instrument_name`).\n", | |
| "\n", | |
| "Always inspect netCDF file header contents when working with new data." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **2.3 Opening NetCDF files that have 'Groups'**" | |
| ], | |
| "metadata": { | |
| "id": "hitaK79S8NQm" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "_XKEU6lnajtk" | |
| }, | |
| "source": [ | |
| "Let's look at a dataset that has column totals of nitrogen dioxide (NO2) in the troposphere. NO2 is a harmful air pollutant.\n", | |
| "\n", | |
| "We'll open the `TEMPO_NO2_L3_V03_20241113T182249Z_S009.nc`, inspect the header, and find the `vertical_column_troposphere` variable." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "fname='/content/TEMPO_NO2_L3_V03_20241113T182249Z_S009.nc'\n", | |
| "tempo_no2_file_id = xr.open_dataset(fname, engine='h5netcdf')\n", | |
| "tempo_no2_file_id" | |
| ], | |
| "metadata": { | |
| "id": "qkKIjJ52M1M7" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "In the file above, we can find the dimensions (`longitude` and `latitude`) and coordinates, but only one data variable (`weight`) is listed; `vertical_column_troposphere` is not. That does not mean the data are missing from the file; they may be inside a [group](https://docs.xarray.dev/en/stable/user-guide/terminology.html#term-Group). Groups are a commonly used organizational structure in HDF, NetCDF, and Zarr formats.\n", | |
| "\n", | |
| "Some files use a nested [group structure](https://docs.h5py.org/en/stable/high/group.html) to organize their variables. In `xarray`, this hierarchy is represented as a [DataTree](https://docs.xarray.dev/en/stable/user-guide/terminology.html#term-DataTree), and the groups within it are also called [subtrees](https://xarray-datatree.readthedocs.io/en/latest/terminology.html#term-Subtree)." | |
| ], | |
| "metadata": { | |
| "id": "8HD7O5Ts6wEb" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "" | |
| ], | |
| "metadata": { | |
| "id": "ihQWTE00PwGm" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "We can open the groups, but we will need to open the file as a [data tree](https://docs.xarray.dev/en/stable/generated/xarray.open_datatree.html#xarray-open-datatree) and use the syntax `xr.open_datatree()`. The syntax looks similar to `xr.open_dataset()`:" | |
| ], | |
| "metadata": { | |
| "id": "W_zJdcmq6G5x" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "fname='/content/TEMPO_NO2_L3_V03_20241113T182249Z_S009.nc'\n", | |
| "tempo_no2_file_id = xr.open_datatree(fname, engine='h5netcdf')\n", | |
| "tempo_no2_file_id" | |
| ], | |
| "metadata": { | |
| "id": "Ke62XdKyPsvh" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "In the output above, you will see there are four groups (`product`, `qa_statistics`, `geolocation`, and `support_data`). You can also print them as a list using `.groups`:" | |
| ], | |
| "metadata": { | |
| "id": "eEKnfxtYRX-e" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "2_BCIXeRajtk" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "tempo_no2_file_id.groups" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "The `vertical_column_troposphere` is contained in the `product` group. You can confirm this by inspecting the contents of the group:" | |
| ], | |
| "metadata": { | |
| "id": "2ejlDlm2-WyL" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "tempo_no2_file_id['/product']" | |
| ], | |
| "metadata": { | |
| "id": "pZIKtphK-jJr" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "Now that you can see the `vertical_column_troposphere` variable is in the file, we can begin extracting the data we need to make a plot." | |
| ], | |
| "metadata": { | |
| "id": "um0FUbh7AFq-" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **2.4 Extracting Data**" | |
| ], | |
| "metadata": { | |
| "id": "M2SPFPKb8njD" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "Now that we have inspected the file, we can begin extracting the `latitude`, `longitude`, and `vertical_column_troposphere` variables. Once extracted, we can perform analysis and make plots with the data.\n", | |
| "\n", | |
| "The syntax for extracting variables that aren't in a group is just the variable name:\n", | |
| "\n", | |
| "```\n", | |
| "\"<variable name>\"\n", | |
| "```\n", | |
| "\n", | |
| "The `latitude` and `longitude` are outside of the group, so we use the following syntax:" | |
| ], | |
| "metadata": { | |
| "id": "SK2tVLnVZ9pY" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "4B-JTdJ0ajtk" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "tempo_no2_lat = tempo_no2_file_id['latitude']\n", | |
| "tempo_no2_lon = tempo_no2_file_id['longitude']" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "To extract the tropospheric column total NO2, we reference the full path of the group and variable using the following syntax:\n", | |
| "\n", | |
| "```\n", | |
| "\"<group name>/<variable name>\"\n", | |
| "```\n", | |
| "\n", | |
| "Below we extract the variable `vertical_column_troposphere` from the `product` group:" | |
| ], | |
| "metadata": { | |
| "id": "Ac1N1k8EaSon" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "jSRuQOOtajtk" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "tempo_no2_vc_trop = tempo_no2_file_id['/product/vertical_column_troposphere']" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "Af-1RMK5ajtk" | |
| }, | |
| "source": [ | |
| "Let's print the tropospheric column NO2 (`tempo_no2_vc_trop`) below:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "Udo7g6UJajtk" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "tempo_no2_vc_trop" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "Sm9Kh_f0ajtk" | |
| }, | |
| "source": [ | |
| "The `xarray` package is built on `numpy`, so we can use many of the `numpy` functions, like `.mean()`. You can confirm this by checking the type of `tempo_no2_vc_trop.values`. Below, you can see the type is `numpy.ndarray`." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "69z0HVCFajtk" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "type(tempo_no2_vc_trop.values)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "QObwJqj1ajtl" | |
| }, | |
| "source": [ | |
| "Xarray handles missing data automatically: when we compute statistics on arrays that contain `nan` values, the `nan` values are skipped by default:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "Vgsi49Rfajtl" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Check if there are any nan values\n", | |
| "print(np.isnan(tempo_no2_vc_trop).any())\n", | |
| "\n", | |
| "# Compute the average\n", | |
| "avgNO2 = np.mean(tempo_no2_vc_trop)\n", | |
| "print(avgNO2)" | |
| ] | |
| }, | |
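| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "The `nan` handling described above can be sketched on a small made-up array (the name `toy` and its values are hypothetical, not TEMPO data). This is only an illustration, assuming float data, of how an `xarray` mean differs from a plain `numpy` mean on the raw values:\n", | |
| "\n", | |
| "```python\n", | |
| "import numpy as np\n", | |
| "import xarray as xr\n", | |
| "\n", | |
| "# Toy 2x2 array with one missing value (illustrative values only)\n", | |
| "toy = xr.DataArray(np.array([[1.0, 2.0], [np.nan, 3.0]]), dims=['y', 'x'])\n", | |
| "\n", | |
| "# xarray skips nan by default for float data\n", | |
| "xarray_mean = toy.mean().item()\n", | |
| "\n", | |
| "# A plain numpy mean on the raw values propagates nan\n", | |
| "numpy_mean = np.mean(toy.values)\n", | |
| "\n", | |
| "# numpy's nan-aware function gives the same answer as xarray\n", | |
| "numpy_nanmean = np.nanmean(toy.values)\n", | |
| "\n", | |
| "print(xarray_mean, numpy_mean, numpy_nanmean)\n", | |
| "```\n", | |
| "\n", | |
| "If you ever work with the `.values` array directly, use the `nan`-aware `numpy` functions (such as `np.nanmean()`) to get the same behavior." | |
| ], | |
| "metadata": {} | |
| }, | |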
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "Also, let's check the dimensions. Are they the same as the latitude and longitude size? If not, we have to address this when we make plots." | |
| ], | |
| "metadata": { | |
| "id": "75j7MyQmbVVh" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "tempo_no2_vc_trop.shape, tempo_no2_lat.shape, tempo_no2_lon.shape" | |
| ], | |
| "metadata": { | |
| "id": "YrYLdQARbcX9" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "ixrEF0TDajtl" | |
| }, | |
| "source": [ | |
| "#### <b><font color=\"blue\" size=\"5\">Exercise 2-2: Importing an HDF5 file</font></b>\n", | |
| "\n", | |
| "1. Open the IMERG file with the xarray library using `xr.open_datatree()` and save it as `gpm_imerg_file_id`\n", | |
| "2. Are there any groups? If yes, what are their name(s)?\n", | |
| "3. Print the variable names - what are the coordinate names?\n", | |
| "4. What are the dimensions of the `latitude` and `longitude` coordinates?\n", | |
| "---\n", | |
| "\n", | |
| "Your Answer:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [], | |
| "metadata": { | |
| "id": "GuZ9bQIeZvmw" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [], | |
| "metadata": { | |
| "id": "iyC34qdRhKmc" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "<details><summary><b><font color=\"blue\" size=5>Answers to Exercise 2-2</font></b></summary>\n", | |
| "<p></p>\n", | |
| "\n", | |
| "1. Open the file\n", | |
| "```\n", | |
| "fname='/content/3B-HHR.MS.MRG.3IMERG.20240401-S120000-E122959.0720.V07B.HDF5'\n", | |
| "gpm_imerg_file_id = xr.open_datatree(fname, engine='h5netcdf')\n", | |
| "gpm_imerg_file_id\n", | |
| "```\n", | |
| "2. Yes, there is one group (`Grid`).\n", | |
| "\n", | |
| "3. The coordinates are `lat`, `lon`, and `time`. You can find them in the `'/Grid'` group.\n", | |
| "```\n", | |
| "gpm_imerg_group_id = gpm_imerg_file_id['/Grid']\n", | |
| "gpm_imerg_group_id\n", | |
| "```\n", | |
| "\n", | |
| "4. The `lat` and `lon` coordinates have sizes of 1800 and 3600, respectively.\n" | |
| ], | |
| "metadata": { | |
| "id": "FJzkumpKhMcF" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **2.5 Processing Data using Quality/Diagnostic Flags**\n", | |
| "\n", | |
| "Many satellite data files - especially L2 files - include data quality flags or diagnostic flags. These flags are used to screen out satellite observations with known errors or uncertainty (e.g., clouds, sunglint) or select pixels with specific characteristics (e.g., over land, over water).\n", | |
| "\n", | |
| "**The onus is on you as the satellite data user to correctly process satellite variables using quality/diagnostic flags.** How do you know what processing is required, and what quality/diagnostic flags to use? **You must read the satellite product documentation provided by the science team!!**\n", | |
| "\n", | |
| "In this tutorial, the only satellite product that requires processing is the TEMPO gridded (L3) NO2 data." | |
| ], | |
| "metadata": { | |
| "id": "IYCGzZj4eDNh" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "#### **2.5.1 Required processing of TEMPO gridded NO2 data**\n", | |
| "\n", | |
| "From reading the [TEMPO Trace Gas and Cloud Level 2 and Level 3 Data Products: User's Guide](https://asdc.larc.nasa.gov/documents/tempo/guide/TEMPO_Level-2-3_trace_gas_clouds_user_guide_V1.1.pdf) (p. 17-18), we know that we need to apply the following processing to the TEMPO NO2 L3 data variables:\n", | |
| "- Keep only pixels with `product/main_data_quality_flag` == 0\n", | |
| "- Exclude pixels with `support_data/eff_cloud_fraction` > 0.2\n", | |
| "\n", | |
| "Let's read in the two flag variables we need." | |
| ], | |
| "metadata": { | |
| "id": "NvD3Arzcm7vV" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "bSAhtvVdiwSx" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "tempo_main_data_qf = tempo_no2_file_id['/product/main_data_quality_flag']\n", | |
| "tempo_eff_cloud_frac = tempo_no2_file_id['/support_data/eff_cloud_fraction']" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "ueSf9igbiwSy" | |
| }, | |
| "source": [ | |
| "Let's print the flag variables below:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "zaprNQlliwSz" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "tempo_main_data_qf" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "4iGpCOMHjotL" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "tempo_eff_cloud_frac" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "We can use the `xarray.DataArray.where()` [function](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.where.html) to select the tropospheric column NO2 (`tempo_no2_vc_trop`) pixels that have `tempo_main_data_qf == 0` and `tempo_eff_cloud_frac < 0.2`." | |
| ], | |
| "metadata": { | |
| "id": "yJcX2b4fj9jl" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "processed_no2 = tempo_no2_vc_trop.where((tempo_main_data_qf == 0) & (tempo_eff_cloud_frac < 0.2))" | |
| ], | |
| "metadata": { | |
| "id": "K8zpgvWplNTu" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
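| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "To see what the `.where()` call above does, here is a minimal sketch on made-up 1D arrays (the `toy_*` names and values are hypothetical stand-ins, not TEMPO data). A pixel is kept only if both conditions are true; every other pixel is set to `nan`:\n", | |
| "\n", | |
| "```python\n", | |
| "import numpy as np\n", | |
| "import xarray as xr\n", | |
| "\n", | |
| "# Toy stand-ins for the NO2 column, quality flag, and cloud fraction\n", | |
| "toy_no2 = xr.DataArray(np.array([10.0, 20.0, 30.0, 40.0]), dims=['pixel'])\n", | |
| "toy_qf = xr.DataArray(np.array([0, 0, 1, 0]), dims=['pixel'])\n", | |
| "toy_cloud = xr.DataArray(np.array([0.05, 0.50, 0.10, 0.15]), dims=['pixel'])\n", | |
| "\n", | |
| "# Keep a pixel only if BOTH conditions are true; the rest become nan\n", | |
| "toy_processed = toy_no2.where((toy_qf == 0) & (toy_cloud < 0.2))\n", | |
| "\n", | |
| "print(toy_processed.values)\n", | |
| "print(toy_processed.mean().item())\n", | |
| "```\n", | |
| "\n", | |
| "The second pixel fails the cloud test and the third fails the quality flag test, so only the first and last pixels contribute to the mean. The array keeps its original shape, which is convenient for plotting later." | |
| ], | |
| "metadata": {} | |
| }, | |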
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "Notice that the average of the processed NO2 (below) is less than that of the original, unprocessed NO2 you calculated previously. This is because the pixels in the original NO2 DataArray that have `tempo_main_data_qf != 0` or `tempo_eff_cloud_frac >= 0.2` were screened out (set to `nan`).\n", | |
| "\n", | |
| "**You should use the `processed_no2` variable for analysis and plotting!!!**" | |
| ], | |
| "metadata": { | |
| "id": "9ft0w9nFmqV_" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Compute the average of the processed NO2\n", | |
| "\n", | |
| "processed_avgNO2 = np.mean(processed_no2)\n", | |
| "print(processed_avgNO2)" | |
| ], | |
| "metadata": { | |
| "id": "IYZdC_2ylNMB" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "CnsUOciTajtl" | |
| }, | |
| "source": [ | |
| "Now that you know how to import & process multidimensional data, you will make some plots in the next section." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## **Section 3: Simple Plots of Satellite Data**" | |
| ], | |
| "metadata": { | |
| "id": "MPjiVr46rpy9" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "0qhu6T0Zajtl" | |
| }, | |
| "source": [ | |
| "In the homework, we made line and scatter plots from tabular data. The `Matplotlib` package also supports plotting spatial datasets. However, we often have to perform several array operations to ensure the `x`, `y`, and `z` coordinates are the same shape. Let's work with the GPM IMERG dataset in the next example and make a plot of 3D (`x`, `y`, `z`) data." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **3.1 Ensuring Consistent Variable Dimensions**\n", | |
| "\n" | |
| ], | |
| "metadata": { | |
| "id": "BI2YCBMN0Sn3" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "1LEdowACajtl" | |
| }, | |
| "source": [ | |
| "From Exercise 2-2, we learned that this file only contains one group (`Grid`). We can open the group directly so that we don't have to add `'/Grid'` every time we want to access a variable inside that group." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "zSgbdyPYajtl" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "gpm_imerg_group_id = gpm_imerg_file_id['/Grid']\n", | |
| "gpm_imerg_group_id" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "1LgKW5Beajtl" | |
| }, | |
| "source": [ | |
| "From the printed information above, we can see the following:\n", | |
| "\n", | |
| "* __Dimensions__: The dimensions are named `time`, `lon`, and `lat`, which have sizes of 1, 3600, and 1800, respectively.\n", | |
| "\n", | |
| "* __Coordinates__: The coordinates are also `time`, `lon`, and `lat`.\n", | |
| "\n", | |
| "* __Variables__: The group has seven variables; we'll examine `precipitation` in this lesson." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "3oLamTAYajtl" | |
| }, | |
| "source": [ | |
| "Let's import the rain rate (`precipitation`). We also need `lat` and `lon` to know where the data are." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "6bp2VCfMajtl" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "gpm_imerg_rr = gpm_imerg_group_id['precipitation']\n", | |
| "gpm_imerg_lat = gpm_imerg_group_id['lat']\n", | |
| "gpm_imerg_lon = gpm_imerg_group_id['lon']" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "Let's inspect the shape and see if the data are already formatted for plotting:" | |
| ], | |
| "metadata": { | |
| "id": "m9vdnH3kcMZj" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "gpm_imerg_lat.shape, gpm_imerg_lon.shape, gpm_imerg_rr.shape" | |
| ], | |
| "metadata": { | |
| "id": "dmtCNswZcLbr" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "At this point, we can make a simple plot of the rain rates using [imshow](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html#matplotlib.pyplot.imshow) from matplotlib. Because the dimensions are `(time, lon, lat)`, we can remove time by subsetting the data using the syntax `[0,:,:]`. We will use the first (and only) time index and keep all the `lon` and `lat` data. We'll set the min and max values to 0 and 2 using `vmin` and `vmax`." | |
| ], | |
| "metadata": { | |
| "id": "psR7KyuYzAeu" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "plt.figure()\n", | |
| "plt.imshow(gpm_imerg_rr[0,:,:], vmin=0, vmax=2)\n", | |
| "plt.show()" | |
| ], | |
| "metadata": { | |
| "id": "CYdleQ3pzD8V" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "tN4ujMvuajtl" | |
| }, | |
| "source": [ | |
| "A problem with the above plot is that we do not know where these data are located - for that we need to plot the data with the `lat` and `lon` coordinates." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "yMntbeAuajtl" | |
| }, | |
| "source": [ | |
| "Mesh plots, such as [pcolormesh](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.pcolormesh.html), are useful for plotting 3D data. Creating a mesh requires that the `X`, `Y`, and `Z` variables have the same 2D shape.\n", | |
| "\n", | |
| "The `.shape` output above shows that the variables all have different shapes. Here are the steps to address this:\n", | |
| "1. `gpm_imerg_rr` has a time dimension, which the latitude and longitude don't. We'll need to reduce this to 2D data.\n", | |
| "2. `gpm_imerg_lat` and `gpm_imerg_lon` are 1D. We can use `np.meshgrid()` to project the 1-dimensional x and y coordinates into two dimensions.\n", | |
| "\n", | |
| "The first problem can be solved by using the xarray `.isel` [function](https://docs.xarray.dev/en/latest/generated/xarray.DataArray.isel.html#xarray.DataArray.isel) to select the only available timestep. The index for this is `0`." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "lOViOoQ6ajtl" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "gpm_imerg_rr = gpm_imerg_rr.isel(time=0)\n", | |
| "gpm_imerg_rr" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "ppPKZjXVajtl" | |
| }, | |
| "source": [ | |
| "The `np.meshgrid` function will help with problem #2 above. The function is a little confusing at first, so let's start with a simple example. Suppose you have two simple, 1D arrays:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "PBMQ0Ostajtl" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "old_x = [1,2]\n", | |
| "old_y = [3,4,5]" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "lnvK_Hc4ajtl" | |
| }, | |
| "source": [ | |
| "`old_x` has two elements and `old_y` has three. If you create a mesh of the two variables, the result is two arrays, each with 2 rows and 3 columns:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "Nap4SyMIajtl" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "new_x, new_y = np.meshgrid(old_x, old_y, indexing='ij')\n", | |
| "print(new_x)\n", | |
| "print(new_y)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "" | |
| ], | |
| "metadata": { | |
| "id": "hjdjdya4zOaG" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "The image above depicts what the resulting mesh looks like.\n", | |
| "\n", | |
| "We strongly recommend using the `indexing='ij'` option. This will yield arrays with dimensions in the same order that the indexes are passed into `np.meshgrid()`.\n", | |
| "\n", | |
| "For example:\n", | |
| "* Dim `old_x` --> `2`\n", | |
| "* Dim `old_y` --> `3`\n", | |
| "\n", | |
| "Dim `new_x`, `new_y` --> `(2,3)`\n", | |
| "\n", | |
| "If you use the default (`indexing='xy'`), the resulting mesh will have shape `(3, 2)`." | |
| ], | |
| "metadata": { | |
| "id": "nzOUOqjaw3Qd" | |
| } | |
| }, | |
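| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "The difference between the two `indexing` options can be checked directly. The sketch below reuses `old_x` and `old_y` from above; the `ij_*` and `xy_*` names are introduced here only for illustration:\n", | |
| "\n", | |
| "```python\n", | |
| "import numpy as np\n", | |
| "\n", | |
| "old_x = [1, 2]\n", | |
| "old_y = [3, 4, 5]\n", | |
| "\n", | |
| "# 'ij' indexing: output dimensions follow the order of the arguments\n", | |
| "ij_x, ij_y = np.meshgrid(old_x, old_y, indexing='ij')\n", | |
| "\n", | |
| "# 'xy' indexing (the default): the first two dimensions are swapped\n", | |
| "xy_x, xy_y = np.meshgrid(old_x, old_y, indexing='xy')\n", | |
| "\n", | |
| "print(ij_x.shape, ij_y.shape)\n", | |
| "print(xy_x.shape, xy_y.shape)\n", | |
| "\n", | |
| "# The two results are transposes of each other\n", | |
| "print(np.array_equal(ij_x, xy_x.T), np.array_equal(ij_y, xy_y.T))\n", | |
| "```\n", | |
| "\n", | |
| "Either option can work as long as the mesh shape matches the shape of the data array you plan to plot." | |
| ], | |
| "metadata": {} | |
| }, | |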
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **3.2 Creating a Plot**" | |
| ], | |
| "metadata": { | |
| "id": "RgmJdmPO0dKK" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "cW3GgW8iajtl" | |
| }, | |
| "source": [ | |
| "\n", | |
| "\n", | |
| "Returning to the GPM IMERG dataset, below is the meshgrid of the 1-dimensional latitude and longitude coordinates:\n", | |
| "\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "X_gpm_imerg, Y_gpm_imerg = np.meshgrid(gpm_imerg_lon, gpm_imerg_lat, indexing='ij')" | |
| ], | |
| "metadata": { | |
| "id": "ldXd0sOnprn9" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "TCXfOrxIajtm" | |
| }, | |
| "source": [ | |
| "We want the dimensions of the output mesh to match the `gpm_imerg_rr` dimensions, which are `(3600, 1800)`. Note the order:\n", | |
| "\n", | |
| "* Dim `gpm_imerg_lon` --> `3600`\n", | |
| "* Dim `gpm_imerg_lat` --> `1800`\n", | |
| "\n", | |
| "\n", | |
| "To ensure the dimensions of `X_gpm_imerg`, `Y_gpm_imerg` --> `(3600,1800)`, the arguments must be passed in the order lon then lat, following the syntax:\n", | |
| "\n", | |
| "```\n", | |
| "new_x, new_y = np.meshgrid(old_x, old_y, indexing='ij')\n", | |
| "```\n", | |
| "\n", | |
| "We'll see in the next exercise that the NO2 dimensions are ordered lat, lon!" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "0H4_kGMOajtm" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "gpm_imerg_rr.shape, X_gpm_imerg.shape, Y_gpm_imerg.shape" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "dRwFSRGyajtm" | |
| }, | |
| "source": [ | |
| "\n", | |
| "Let's plot the data!\n", | |
| "\n", | |
| "The basic syntax is:\n", | |
| "\n", | |
| "```\n", | |
| "fig = plt.figure()\n", | |
| "ax = plt.subplot(111)\n", | |
| "rr_plot = ax.pcolormesh(X_gpm_imerg, Y_gpm_imerg, gpm_imerg_rr)\n", | |
| "plt.show()\n", | |
| "```\n", | |
| "\n", | |
| "We'll add a few enhancements to make the plot look nicer:\n", | |
| "\n", | |
| "1. Rain rates have a large range (sometimes 50 mm/hr!) but are usually less than 2 mm/hr. We can set the data range using the keyword options `vmin=0` and `vmax=2` in `ax.pcolormesh()`.\n", | |
| "2. It would be helpful to have a colorbar! To add a colorbar, we need to save the plotted data object as a variable (`rr_plot`) so that we can tell the colorbar command what we're making a colorbar for (`fig.colorbar(rr_plot)`). The `orientation='horizontal'` keyword option sets the colorbar orientation (here, a horizontal bar below the plot)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "PfCKfVN4ajtm" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "fig = plt.figure()\n", | |
| "ax = plt.subplot(111)\n", | |
| "rr_plot = ax.pcolormesh(X_gpm_imerg, Y_gpm_imerg, gpm_imerg_rr, vmin=0, vmax=2)\n", | |
| "fig.colorbar(rr_plot, orientation='horizontal')\n", | |
| "plt.show()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "xkNwRFSgajtm" | |
| }, | |
| "source": [ | |
| "---\n", | |
| "\n", | |
| "#### <b><font color=\"blue\" size=\"5\">Exercise 3-1: Manipulate variable dimensions and make a plot</font></b>\n", | |
| "\n", | |
| "Plot `tempo_no2_lon`, `tempo_no2_lat`, and `processed_no2` (which we imported & processed in Section 2):\n", | |
| "\n", | |
| "1. Check the dimensions for all variables using `.shape`.\n", | |
| "2. Do you need to use `.isel()` or `np.meshgrid()`?\n", | |
| "3. Create a pcolormesh plot. Some useful aesthetics:\n", | |
| " * `processed_no2/1e14`\n", | |
| " * set the `vmin=0` and `vmax=50`\n", | |
| " * add a colorbar\n", | |
| "\n", | |
| "---\n", | |
| "Your answer:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [], | |
| "metadata": { | |
| "id": "93AKVUDVrvmH" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "<details><summary><b><font color=\"blue\" size=5>Answers to Exercise 3-1</font></b></summary>\n", | |
| "<p></p>\n", | |
| "\n", | |
| "Plot `tempo_no2_lon`, `tempo_no2_lat`, and `processed_no2` (which we imported & processed in Section 2):\n", | |
| "\n", | |
| "1. Check the dimensions for all variables using `.shape`.\n", | |
| "\n", | |
| "```\n", | |
| "tempo_no2_lat.shape, tempo_no2_lon.shape, processed_no2.shape\n", | |
| "```\n", | |
| "*((2950,), (7750,), (1, 2950, 7750))*\n", | |
| "\n", | |
| "2. Do you need to use `.isel()` or `np.meshgrid()`?\n", | |
| "\n", | |
| "Yes to both! `processed_no2` has a time dimension and `tempo_no2_lat`, `tempo_no2_lon` are one dimensional.\n", | |
| "\n", | |
| "Reducing `processed_no2` is straightforward:\n", | |
| "\n", | |
| "```\n", | |
| "processed_no2=processed_no2.isel(time=0)\n", | |
| "```\n", | |
| "\n", | |
| "We want the dimensions of the output mesh to match the `processed_no2` dimensions, which are `(2950, 7750)`. Note the order is lat then lon:\n", | |
| "\n", | |
| "* Dim `tempo_no2_lat` --> `2950`\n", | |
| "* Dim `tempo_no2_lon` --> `7750`\n", | |
| "\n", | |
| "To ensure the dimensions of `tempo_no2_Y`, `tempo_no2_X` --> `(2950, 7750)`, the order needs to be lat then lon:\n", | |
| "\n", | |
| "```\n", | |
| "tempo_no2_Y, tempo_no2_X = np.meshgrid(tempo_no2_lat, tempo_no2_lon, indexing='ij')\n", | |
| "```\n", | |
| "\n", | |
| "Print the shape to check:\n", | |
| "\n", | |
| "```\n", | |
| "tempo_no2_X.shape, tempo_no2_Y.shape\n", | |
| "\n", | |
| "```\n", | |
| "*((2950, 7750), (2950, 7750))*\n", | |
| "\n", | |
| "\n", | |
| "This matches the shape of `processed_no2` after it was reduced using `.isel()`.\n", | |
| "\n", | |
| "3. Create a pcolormesh plot. Some useful aesthetics:\n", | |
| " * `processed_no2/1e14`\n", | |
| " * set the `vmin=0` and `vmax=50`\n", | |
| " * add a colorbar\n", | |
| "\n", | |
| "```\n", | |
| "fig = plt.figure()\n", | |
| "ax = plt.subplot(111)\n", | |
| "no2_plot = ax.pcolormesh(tempo_no2_X, tempo_no2_Y, processed_no2/1e14, vmin=0, vmax=50)\n", | |
| "fig.colorbar(no2_plot, orientation='horizontal')\n", | |
| "plt.show()\n", | |
| "```\n", | |
| "\n", | |
| "" | |
| ], | |
| "metadata": { | |
| "id": "FxRD67GHuAxc" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## **Section 4: Plotting Satellite Data on Maps**" | |
| ], | |
| "metadata": { | |
| "id": "SCIix0nlsAcD" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "The `xarray` package has some [basic built-in plotting functions](https://docs.xarray.dev/en/stable/user-guide/plotting.html) that can be useful for making a quick plot, especially if you are exploring a new satellite dataset. And in Section 3, you learned how to make a simple plot using the `Matplotlib` package.\n", | |
| "\n", | |
| "To make a figure suitable for a presentation or report, however, you will want to plot satellite data on a map using the [cartopy](https://scitools.org.uk/cartopy/docs/latest/index.html) package." | |
| ], | |
| "metadata": { | |
| "id": "W2pFFNYPCCW9" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **4.1 Select a map projection using `cartopy`**\n", | |
| "\n", | |
| "The `cartopy` package has many [map projection options](https://scitools.org.uk/cartopy/docs/latest/reference/projections.html), each with its own strengths and limitations.\n", | |
| "\n", | |
| "Choose a map projection that highlights/emphasizes the satellite data with which you are working. This tutorial demonstrates two different map projections: `Plate Carree` and `Geostationary`." | |
| ], | |
| "metadata": { | |
| "id": "VSaR6rQmA9Vv" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "#### **4.1.1 : Plate Carree map projection**\n", | |
| "\n", | |
| "To make a map, we set up a figure in `Matplotlib` and add `geoaxes` with a map projection using `cartopy`.\n", | |
| "\n", | |
| "The first example of a map projection is the [Plate Carree](https://scitools.org.uk/cartopy/docs/latest/reference/projections.html#platecarree) equidistant cylindrical (equirectangular) projection: `projection=ccrs.PlateCarree()`.\n", | |
| "\n", | |
| "We add [coastlines](https://scitools.org.uk/cartopy/docs/latest/reference/generated/cartopy.mpl.geoaxes.GeoAxes.html#cartopy.mpl.geoaxes.GeoAxes.coastlines) to the map, so we can see how the Earth is represented by this projection. You can ignore the `DownloadWarning` message; the first time you access a [Natural Earth Feature shapefile](https://scitools.org.uk/cartopy/docs/latest/reference/generated/cartopy.io.shapereader.natural_earth.html#cartopy.io.shapereader.natural_earth) via `cartopy`, you will see this warning message." | |
| ], | |
| "metadata": { | |
| "id": "8JQY6jxSDefH" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Set up figure in matplotlib\n", | |
| "fig = plt.figure(figsize=(8, 10))\n", | |
| "\n", | |
| "# Set Plate Carree map projection using cartopy\n", | |
| "ax = plt.axes(projection=ccrs.PlateCarree())\n", | |
| "\n", | |
| "# Add coastlines\n", | |
| "ax.coastlines()\n", | |
| "\n", | |
| "# Show plot\n", | |
| "plt.show()" | |
| ], | |
| "metadata": { | |
| "id": "cLhoZEyvDd8B" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "1d8958d6" | |
| }, | |
| "source": [ | |
| "#### **4.1.2 : Geostationary map projection**\n", | |
| "\n", | |
| "When plotting geostationary satellite data, like TEMPO data, many end users prefer to use the native [Geostationary](https://scitools.org.uk/cartopy/docs/latest/reference/projections.html#geostationary) projection.\n", | |
| "\n", | |
| "A number of constants are used to define the `geo_projection`:\n", | |
| "- TEMPO flies on the Intelsat-40e satellite:\n", | |
| " - `satellite_height`: height of satellite above ellipsoid (Earth)\n", | |
| " - `central_longitude`: longitude where satellite is centered; 91° W\n", | |
| "- Ellipsoid (Earth):\n", | |
| " - `semi_major_axis`: semi-major axis of Earth using Geodetic Reference System of 1980 (GRS80); same as for GOES-R ABI data\n", | |
| " - `semi_minor_axis`: semi-minor axis of Earth using Geodetic Reference System of 1980 (GRS80); same as for GOES-R ABI data\n", | |
| " - `inverse_flattening`: reciprocal of Geodetic Reference System of 1980 (GRS80) flattening factor; same as for GOES-R ABI data" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Define constants for Intelsat-40e geostationary orbit & Earth reference system\n", | |
| "satellite_height = 35786023.0 # height of satellite in meters\n", | |
| "central_longitude = -91.0 # longitude of satellite in degrees\n", | |
| "semi_major_axis = 6378137.0 # GRS80 semi-major axis of earth in meters\n", | |
| "semi_minor_axis = 6356752.31414 # GRS80 semi-minor axis of earth in meters\n", | |
| "inverse_flattening = 298.257222096 # Reciprocal of GRS80 flattening factor\n", | |
| "\n", | |
| "# Define geostationary map projection using cartopy\n", | |
| "globe = ccrs.Globe(semimajor_axis=semi_major_axis, semiminor_axis=semi_minor_axis,\n", | |
| " inverse_flattening=inverse_flattening)\n", | |
| "geo_projection = ccrs.Geostationary(central_longitude=central_longitude,\n", | |
| " satellite_height=satellite_height, globe=globe)" | |
| ], | |
| "metadata": { | |
| "id": "iPVxDKi6atRo" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Set up figure in matplotlib\n", | |
| "fig = plt.figure(figsize=(8, 10))\n", | |
| "\n", | |
| "# Set geostationary map projection using cartopy\n", | |
| "ax = plt.axes(projection=geo_projection)\n", | |
| "\n", | |
| "# Add coastlines\n", | |
| "ax.coastlines()\n", | |
| "\n", | |
| "# Show plot\n", | |
| "plt.show()" | |
| ], | |
| "metadata": { | |
| "id": "wcPEvAKqFWms" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **4.2 Set the geographic domain of the map**\n", | |
| "\n", | |
| "Use the `set_extent([bounding_box_corners])` [function](https://scitools.org.uk/cartopy/docs/latest/reference/generated/cartopy.mpl.geoaxes.GeoAxes.html#cartopy.mpl.geoaxes.GeoAxes.set_extent) to set the geographic domain of the map, where the `bounding_box_corners` are the `[western_longitude, eastern_longitude, southern_latitude, northern_latitude]` of the zoomed-in domain in degrees; negative values indicate °S latitude or °W longitude.\n", | |
| "\n", | |
| "The argument `crs=ccrs.PlateCarree()` must be used with the `set_extent()` function because we are entering the `bounding_box_corners` in geographic coordinates (latitude and longitude).\n", | |
| "\n", | |
| "Let's zoom into the TEMPO field of regard (FOR) with the `geostationary` map projection. As before, you can ignore the `DownloadWarning` message related to downloading the Natural Earth Feature shapefile via `cartopy`." | |
| ], | |
| "metadata": { | |
| "id": "CHU_sPJeDrx8" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Set up figure in matplotlib\n", | |
| "fig = plt.figure(figsize=(8, 10))\n", | |
| "\n", | |
| "# Set geostationary map projection using cartopy\n", | |
| "ax = plt.axes(projection=geo_projection)\n", | |
| "\n", | |
| "# Set geographic domain of map: [W_lon, E_lon, S_lat, N_lat]\n", | |
| "# °E longitude > 0 > °W longitude, °N latitude > 0 > °S latitude\n", | |
| "ax.set_extent([-121, -64, 17, 59], crs=ccrs.PlateCarree())\n", | |
| "\n", | |
| "# Add coastlines\n", | |
| "ax.coastlines()\n", | |
| "\n", | |
| "# Show plot\n", | |
| "plt.show()" | |
| ], | |
| "metadata": { | |
| "id": "3T1l1u6UCMmM" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **4.3 Add borders and shade land & water polygons**\n", | |
| "\n", | |
| "The `cartopy` package has a number of options for adding borders to a map. We've already seen how to add `coastlines()`. For more control over borders as well as land & water polygons, use the `cartopy` [Feature interface](https://scitools.org.uk/cartopy/docs/latest/matplotlib/feature_interface.html), which defines seven common features.\n", | |
| "\n", | |
| "The Feature interface borders plot as black lines with `linewidth=1` by default. You can modify the default Feature interface settings to change the color and/or width of the coastlines and borderlines, and to shade land, ocean, and lakes polygons using `Matplotlib` [built-in named colors](https://matplotlib.org/stable/gallery/color/named_colors.html).\n", | |
| "\n", | |
| "Working with our TEMPO FOR `geostationary` projection map, let's add coastlines, international borders, US state & Canada province borders, and large lakes from the Feature interface. And let's color the ocean and lakes `facecolor='lightblue'` and the land `facecolor='wheat'`, and change the widths of the coast/border lines.\n", | |
| "\n", | |
| "As before, you can ignore the `DownloadWarning` messages related to downloading Natural Earth Feature shapefiles via `cartopy`." | |
| ], | |
| "metadata": { | |
| "id": "3bc3Wz5nHo-W" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Set up figure in matplotlib\n", | |
| "fig = plt.figure(figsize=(8, 10))\n", | |
| "\n", | |
| "# Set geostationary map projection using cartopy\n", | |
| "ax = plt.axes(projection=geo_projection)\n", | |
| "\n", | |
| "# Set geographic domain of map: [W_lon, E_lon, S_lat, N_lat]\n", | |
| "# °E longitude > 0 > °W longitude, °N latitude > 0 > °S latitude\n", | |
| "ax.set_extent([-121, -64, 17, 59], crs=ccrs.PlateCarree())\n", | |
| "\n", | |
| "# Add coastlines & borders, shade land & water polygons\n", | |
| "ax.add_feature(cfeature.COASTLINE, linewidth=0.5)\n", | |
| "ax.add_feature(cfeature.BORDERS, linewidth=0.5)\n", | |
| "ax.add_feature(cfeature.LAKES, facecolor='lightblue')\n", | |
| "ax.add_feature(cfeature.STATES, linewidth=0.25)\n", | |
| "ax.add_feature(cfeature.LAND, facecolor='wheat')\n", | |
| "ax.add_feature(cfeature.OCEAN, facecolor='lightblue')\n", | |
| "\n", | |
| "# Show plot\n", | |
| "plt.show()" | |
| ], | |
| "metadata": { | |
| "id": "m7rXeDrxK2cp" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **4.4 Add latitude/longitude gridlines & labels using `cartopy`**\n", | |
| "\n", | |
| "Latitude and longitude gridlines and labels can be added using the `cartopy` [gridlines function](https://scitools.org.uk/cartopy/docs/latest/reference/generated/cartopy.mpl.geoaxes.GeoAxes.html#cartopy.mpl.geoaxes.GeoAxes.gridlines). The object it returns is conventionally assigned to a variable named `gl`. Examples of how to apply the gridlines settings are shown in [cartopy's gridlines tutorial](https://scitools.org.uk/cartopy/docs/latest/matplotlib/gridliner.html).\n", | |
| "\n", | |
| "The appearance of gridlines and labels will be different for different map projections. The easiest approach is to begin with the default settings to add gridlines and labels `gl=ax.gridlines(draw_labels=True)`, and then use parameters in the `gridlines()` function to customize the grid labels and the gridlines.\n", | |
| "\n", | |
| "Let's see how to:\n", | |
| "- Add `'navy'` colored dashed gridlines with `linewidth=0.5`\n", | |
| "- Set tick mark intervals `lon_ticks` and `lat_ticks` for the latitude/longitude grid\n", | |
| "- Assign the locations of the gridlines using the `matplotlib` `ticker.FixedLocator()` [function](https://matplotlib.org/stable/api/ticker_api.html#matplotlib.ticker.FixedLocator)\n", | |
| "- Remove the labels along the bottom and right sides of map\n", | |
| "- Make the `'size'` of the labels smaller (8-point font)." | |
| ], | |
| "metadata": { | |
| "id": "uoozbd48Nm71" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Set up figure in matplotlib\n", | |
| "fig = plt.figure(figsize=(8, 10))\n", | |
| "\n", | |
| "# Set geostationary map projection using cartopy\n", | |
| "ax = plt.axes(projection=geo_projection)\n", | |
| "\n", | |
| "# Set geographic domain of map: [W_lon, E_lon, S_lat, N_lat]\n", | |
| "# °E longitude > 0 > °W longitude, °N latitude > 0 > °S latitude\n", | |
| "ax.set_extent([-121, -64, 17, 59], crs=ccrs.PlateCarree())\n", | |
| "\n", | |
| "# Format lat/lon gridlines using cartopy\n", | |
| "lon_ticks = [-120, -100, -80, -60]\n", | |
| "lat_ticks = [20, 30, 40, 50]\n", | |
| "gl = ax.gridlines(draw_labels=True, linewidth=0.5, color='navy', ls='--')\n", | |
| "gl.xlocator = ticker.FixedLocator(lon_ticks)\n", | |
| "gl.ylocator = ticker.FixedLocator(lat_ticks)\n", | |
| "gl.right_labels = False\n", | |
| "gl.bottom_labels = False\n", | |
| "gl.xlabel_style = {'size': 8}\n", | |
| "gl.ylabel_style = {'size': 8}\n", | |
| "\n", | |
| "# Add coastlines & borders, shade land & water polygons\n", | |
| "ax.add_feature(cfeature.COASTLINE, linewidth=0.5)\n", | |
| "ax.add_feature(cfeature.BORDERS, linewidth=0.5)\n", | |
| "ax.add_feature(cfeature.LAKES, facecolor='lightblue')\n", | |
| "ax.add_feature(cfeature.STATES, linewidth=0.25)\n", | |
| "ax.add_feature(cfeature.LAND, facecolor='wheat')\n", | |
| "ax.add_feature(cfeature.OCEAN, facecolor='lightblue')\n", | |
| "\n", | |
| "# Show plot\n", | |
| "plt.show()" | |
| ], | |
| "metadata": { | |
| "id": "N7mP-7PNOI6I" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **4.5 Pull information from the satellite file name for plot title & name of saved image file**\n", | |
| "\n", | |
| "Next, we're going to plot TEMPO NO2 and VIIRS AOD data on maps. For each map file we make, we could manually assign the text of the plot title and the saved image file name, but that is tedious and prone to error. A better approach is to use the information in a satellite data file name to generate the plot title and file save name automatically.\n", | |
| "\n", | |
| "Let's see how to do this by making a title for a plot of the TEMPO gridded NO2 data. We set the full directory path to the file on the Colab instance using the `pathlib` module.\n", | |
| "- Recall from Section 1 that the `pathlib` module uses `/` to join the directory path with the file name to get the full path for the file, e.g., `Path.cwd() / <file_name>`.\n", | |
| "- Appending `.name` to a `pathlib` object (e.g., `no2_file.name`) extracts a string representing the final path component, in this case, the file name.\n", | |
| "\n", | |
| "We can extract parts of the satellite data file name using the Python `str.split()` [method](https://docs.python.org/3/library/stdtypes.html#str.split) to split the file name at the `_` delimiter, in conjunction with Python [slicing](https://stackoverflow.com/questions/509211/how-slicing-in-python-works).\n", | |
| "\n", | |
| "For satellite data plot titles, I like to specify the observation date and time. I recommend reformatting the observation date using the `datetime` module's [datetime format codes](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes). This way, we can reformat the date in the satellite file name, which is in **YYYYMMDD** format, to something more readable for a plot title, such as **DD Mon YYYY** format (e.g., \"13 Nov 2024\"); this format avoids any confusion between the date orders used in the US (MM/DD) and in Europe (DD/MM).\n", | |
| "\n", | |
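| "As a quick check of the indexing, the sketch below shows each intermediate step on the TEMPO file name used in this tutorial. It uses a plain string (`file_name`, a variable name chosen here for illustration) in place of `no2_file.name`, so it runs without the data file.\n", | |
| "\n", | |
| "```python\n", | |
| "import datetime\n", | |
| "\n", | |
| "# File name used in this tutorial (a plain string here, not a pathlib object)\n", | |
| "file_name = 'TEMPO_NO2_L3_V03_20241113T182249Z_S009.nc'\n", | |
| "\n", | |
| "# Split on '_' to get a list of the file name components\n", | |
| "parts = file_name.split('_')\n", | |
| "print(parts)\n", | |
| "# ['TEMPO', 'NO2', 'L3', 'V03', '20241113T182249Z', 'S009.nc']\n", | |
| "\n", | |
| "# Slice the components to isolate the date, time, and scan number\n", | |
| "file_date = parts[4][:8]     # '20241113'\n", | |
| "file_time = parts[4][9:13]   # '1822'\n", | |
| "scan = parts[5][1:4]         # '009'\n", | |
| "\n", | |
| "# Reformat the date from YYYYMMDD to DD Mon YYYY\n", | |
| "title_date = datetime.datetime.strptime(file_date, '%Y%m%d').strftime('%d %b %Y')\n", | |
| "print(title_date)  # 13 Nov 2024\n", | |
| "```\n", | |
| "\n", | |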
| "**Coding Notes:** The text formatting `'$_{2}$'` will appear as a subscripted `2` when we add the plot title with `matplotlib`. Python [f-strings](https://docs.python.org/3/tutorial/inputoutput.html#tut-f-strings) are used to join the defined variables and strings to make the `plot_title`." | |
| ], | |
| "metadata": { | |
| "id": "1dC3poi3Uk8w" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Set full directory path to file\n", | |
| "no2_file = Path.cwd() / 'TEMPO_NO2_L3_V03_20241113T182249Z_S009.nc'\n", | |
| "\n", | |
| "# Print the string representing the final path component\n", | |
| "print(no2_file.name)\n", | |
| "print(type(no2_file.name))" | |
| ], | |
| "metadata": { | |
| "id": "MF-kmAFguQ2v" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Pull info out of TEMPO file name string, reformat & make plot title automatically\n", | |
| "file_date = no2_file.name.split('_')[4][:8]\n", | |
| "title_date = datetime.datetime.strptime(file_date, '%Y%m%d').date().strftime('%d %b %Y')\n", | |
| "file_time = no2_file.name.split('_')[4][9:13]\n", | |
| "scan = no2_file.name.split('_')[5][1:4]\n", | |
| "plot_title = (f'TEMPO Tropospheric Column NO$_{2}$ {title_date} Scan {scan}'\n", | |
| " f' {file_time} UTC')\n", | |
| "\n", | |
| "# Print the automatically generated plot title (to check)\n", | |
| "print(plot_title)" | |
| ], | |
| "metadata": { | |
| "id": "jmXLbuO8CMr4" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
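| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "One detail about the title string above: inside an f-string, Python evaluates anything in single curly braces, so the `{2}` in `NO$_{2}$` is replaced by the number `2` and the stored text is `NO$_2$`. `matplotlib` still renders that as a subscripted 2, because a subscript only needs braces when it is longer than one character. The short sketch below (variable names are illustrative, not part of the tutorial code) compares the two forms; doubling the braces keeps them literal.\n", | |
| "\n", | |
| "```python\n", | |
| "# Single braces in an f-string are evaluated as a Python expression\n", | |
| "evaluated = f'NO$_{2}$'\n", | |
| "print(evaluated)   # NO$_2$\n", | |
| "\n", | |
| "# Doubled braces are kept as literal braces\n", | |
| "literal = f'NO$_{{2}}$'\n", | |
| "print(literal)     # NO$_{2}$\n", | |
| "\n", | |
| "# Multi-character subscripts or superscripts need the literal braces\n", | |
| "unit_label = f'10$^{{15}}$ cm$^{{-2}}$'\n", | |
| "print(unit_label)  # 10$^{15}$ cm$^{-2}$\n", | |
| "```\n", | |
| "\n", | |
| "If a label needs no substituted variables, a regular (non-f) string avoids the issue entirely." | |
| ], | |
| "metadata": {} | |
| }, | |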
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### **4.6 Putting it all together: Plot processed TEMPO NO2 on a map**\n", | |
| "\n", | |
| "The code below builds on concepts presented previously in the tutorial to make a professional-looking plot of processed TEMPO gridded (L3) NO2 data on a map.\n", | |
| "\n", | |
| "**There is always more than one way to do something with Python!** The code below presents a different approach than Section 3 for three aspects of the data analysis/plotting:\n", | |
| "\n", | |
| "1. The extra `time` dimension for variables in the `product` and `support_data` groups is removed using the `xarray.DataArray.squeeze` [function](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.squeeze.html) instead of the `xarray.DataArray.isel` [function](https://docs.xarray.dev/en/latest/generated/xarray.DataArray.isel.html#xarray.DataArray.isel).\n", | |
| "2. The processed TEMPO NO2 data (`processed_no2`) are multiplied by 1 x 10<sup>-15</sup> (`1.0E-15` in Python) and plotted on a scale of 0-10 to conform with the convention for displaying satellite NO2 data.\n", | |
| "3. The `latitude` and `longitude` 1-D arrays are used directly in the `matplotlib` `pyplot.pcolormesh()` [plotting function](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.pcolormesh.html) without converting them to 2-D arrays using the `numpy.meshgrid` [function](https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html)\n", | |
| "   - `pcolormesh` can handle its `X` and `Y` input arrays (i.e., `longitude` and `latitude`) as 1-D arrays\n", | |
| "   - If `X` and/or `Y` are 1-D arrays or column vectors, they will be expanded as needed into the appropriate 2-D arrays, making a rectangular grid\n", | |
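| "\n", | |
| "A minimal sketch of this behavior, using small made-up coordinate arrays in place of the TEMPO `longitude` and `latitude` variables (all names and values below are illustrative):\n", | |
| "\n", | |
| "```python\n", | |
| "import numpy as np\n", | |
| "import matplotlib.pyplot as plt\n", | |
| "\n", | |
| "# Made-up 1-D coordinates: 4 longitudes (X) and 3 latitudes (Y)\n", | |
| "lon_1d = np.array([-120.0, -100.0, -80.0, -60.0])\n", | |
| "lat_1d = np.array([20.0, 35.0, 50.0])\n", | |
| "\n", | |
| "# Data array must have shape (len(Y), len(X)) = (3, 4)\n", | |
| "data = np.arange(12.0).reshape(3, 4)\n", | |
| "\n", | |
| "# Pass the 1-D arrays directly; pcolormesh builds the rectangular grid\n", | |
| "fig, ax = plt.subplots()\n", | |
| "mesh_1d = ax.pcolormesh(lon_1d, lat_1d, data)\n", | |
| "\n", | |
| "# Equivalent call with explicit 2-D coordinate arrays from numpy.meshgrid\n", | |
| "lon_2d, lat_2d = np.meshgrid(lon_1d, lat_1d)\n", | |
| "mesh_2d = ax.pcolormesh(lon_2d, lat_2d, data)\n", | |
| "\n", | |
| "print(lon_2d.shape, lat_2d.shape, data.shape)\n", | |
| "plt.close(fig)\n", | |
| "```\n", | |
| "\n", | |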
| "---\n", | |
| "**Additional notes on the map plotting code:**\n", | |
| "\n", | |
| "We open the data file utilizing the Python `with` [statement](https://www.geeksforgeeks.org/with-statement-in-python/), which automatically closes the file when we're done with it.\n", | |
| "\n", | |
| "I changed the colors of the gridlines and land & water polygons to provide a neutral background for the plotted satellite data.\n", | |
| "\n", | |
| "For plotting NO2 data, select any of the sequential colormaps from the `matplotlib` [built-in colormaps](https://matplotlib.org/stable/tutorials/colors/colormaps.html). Note that adding `_r` to the end of any colormap name reverses it (e.g., `'plasma_r'`). We [set a unique color for data > vmax](https://matplotlib.org/stable/api/_as_gen/matplotlib.colors.Colormap.html#matplotlib.colors.Colormap.with_extremes) using the `matplotlib` [built-in named colors](https://matplotlib.org/stable/gallery/color/named_colors.html). This highlights areas on the map with high NO2.\n", | |
| "\n", | |
| "The `matplotlib` `pyplot.pcolormesh()` [plotting function](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.pcolormesh.html) argument `transform=ccrs.PlateCarree()` tells `cartopy` that the plotted satellite data are in geographic coordinates (lat/lon). This argument **must** be included when plotting satellite data that are in geographic coordinates, or the data will not plot correctly on the map projection.\n", | |
| "\n", | |
| "We add a colorbar and format it using the `matplotlib` `fig.colorbar()` [function](https://matplotlib.org/stable/api/colorbar_api.html).\n", | |
| "\n", | |
| "We save the map image to the Colab instance using the `matplotlib` `fig.savefig()` [function](https://matplotlib.org/stable/api/figure_api.html#matplotlib.figure.Figure.savefig).\n", | |
| "\n", | |
| "- Change the resolution of the saved image file using the `dpi=` argument in `fig.savefig()`. The higher the dpi, the higher the figure resolution, but the larger the file size and the longer it will take to save the file. The default is `dpi=100`. I typically use `dpi=300` for general use (e.g., posting images on social media). Try `dpi=600` for presentations and `dpi=1000` for journal articles.\n", | |
| "- I recommend setting `bbox_inches='tight'` to minimize empty space around the figure (to zoom in \"tight\" on the plot)." | |
| ], | |
| "metadata": { | |
| "id": "lXLuj6WBLfw1" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Process & plot tropospheric TEMPO L3 NO2 on a geostationary map projection\n", | |
| "\n", | |
| "# Set full directory path to file\n", | |
| "no2_file = Path.cwd() / 'TEMPO_NO2_L3_V03_20241113T182249Z_S009.nc'\n", | |
| "\n", | |
| "# Open file using xarray datatree (automatically closes file when done)\n", | |
| "with xr.open_datatree(no2_file, engine='netcdf4') as dt:\n", | |
| "\n", | |
| " # Read in & process tropospheric NO2 using quality/diagnostic flags\n", | |
| " # \"dataarray.squeeze()\" removes \"time\" dimension & converts 3D arrays to 2D\n", | |
| " trop_no2 = dt['/product/vertical_column_troposphere'].squeeze(dim='time')\n", | |
| " main_data_qf = dt['/product/main_data_quality_flag'].squeeze(dim='time')\n", | |
| " eff_cloud_frac = dt['/support_data/eff_cloud_fraction'].squeeze(dim='time')\n", | |
| " processed_no2 = trop_no2.where((main_data_qf == 0) & (eff_cloud_frac < 0.2))\n", | |
| "\n", | |
| " # Scale NO2 by a factor of 1.0E-15 (for plotting)\n", | |
| " processed_no2 = processed_no2*1.0E-15\n", | |
| "\n", | |
| " # Read in latitude and longitude\n", | |
| " longitude = dt['longitude']\n", | |
| " latitude = dt['latitude']\n", | |
| "\n", | |
| " # Set up figure in matplotlib\n", | |
| " fig = plt.figure(figsize=(8, 10))\n", | |
| "\n", | |
| " # Set map projection using cartopy\n", | |
| " # Use TEMPO's geostationary projection\n", | |
| " ax = plt.axes(projection=geo_projection)\n", | |
| "\n", | |
| " # Set geographic domain of map: [W_lon, E_lon, S_lat, N_lat]\n", | |
| " # °E longitude > 0 > °W longitude, °N latitude > 0 > °S latitude\n", | |
| " ax.set_extent([-121, -64, 17, 59], crs=ccrs.PlateCarree())\n", | |
| "\n", | |
| " # Format lat/lon gridlines using cartopy\n", | |
| " lon_ticks = [-120, -100, -80, -60]\n", | |
| " lat_ticks = [20, 30, 40, 50]\n", | |
| " gl = ax.gridlines(draw_labels=True, linewidth=0.3, color='silver')\n", | |
| " gl.xlocator = ticker.FixedLocator(lon_ticks)\n", | |
| " gl.ylocator = ticker.FixedLocator(lat_ticks)\n", | |
| " gl.right_labels = False\n", | |
| " gl.bottom_labels = False\n", | |
| " gl.xlabel_style = {'size': 8}\n", | |
| " gl.ylabel_style = {'size': 8}\n", | |
| "\n", | |
| " # Add coastlines & borders, shade land & water polygons\n", | |
| " ax.add_feature(cfeature.COASTLINE, linewidth=0.5)\n", | |
| " ax.add_feature(cfeature.BORDERS, linewidth=0.5)\n", | |
| " ax.add_feature(cfeature.LAKES, facecolor='lightgrey')\n", | |
| " ax.add_feature(cfeature.STATES, linewidth=0.25)\n", | |
| " ax.add_feature(cfeature.LAND, facecolor='grey')\n", | |
| " ax.add_feature(cfeature.OCEAN, facecolor='lightgrey')\n", | |
| "\n", | |
| " # Format & add plot title\n", | |
| " # Pull information from file name string & reformat\n", | |
| " file_date = no2_file.name.split('_')[4][:8]\n", | |
| " title_date = datetime.datetime.strptime(file_date, '%Y%m%d').date().strftime('%d %b %Y')\n", | |
| " file_time = no2_file.name.split('_')[4][9:13]\n", | |
| " scan = no2_file.name.split('_')[5][1:4]\n", | |
| " plot_title = (f'TEMPO Tropospheric Column NO$_{2}$ {title_date} Scan {scan}'\n", | |
| " f' {file_time} UTC')\n", | |
| " plt.title(plot_title, pad=10, size=8, weight='bold')\n", | |
| "\n", | |
| " # Set colormap with unique color for data > vmax\n", | |
| " cmap = plt.get_cmap('rainbow').with_extremes(over='darkred')\n", | |
| "\n", | |
| " # Plot tropospheric NO2 data\n", | |
| " # \"transform=ccrs.PlateCarree()\" argument needed b/c data are in geographic coordinates (lat/lon)\n", | |
| " plot = ax.pcolormesh(longitude, latitude, processed_no2, cmap=cmap, vmin=0,\n", | |
| " vmax=10, transform=ccrs.PlateCarree())\n", | |
| "\n", | |
| " # Add colorbar\n", | |
| " cb = fig.colorbar(plot, orientation='horizontal', fraction=0.2, pad=0.02,\n", | |
| " shrink=0.5, ticks=[0, 2, 4, 6, 8, 10], extend='max')\n", | |
| " cb.set_label(label=r'Tropospheric Column NO$_{2}$ (10$\\mathregular{^{15}}$ cm$\\mathregular{^{-2}}$)',\n", | |
| " size=8, weight='bold')\n", | |
| "\n", | |
| " # Show plot\n", | |
| " plt.show()\n", | |
| "\n", | |
| " # Save image file\n", | |
| " save_name = f'tempo_tropospheric_no2_{file_date}_scan{scan}_{file_time}'\n", | |
| " fig.savefig(Path.cwd() / save_name, dpi=300, bbox_inches='tight')\n", | |
| "\n", | |
| " # Close plot\n", | |
| " plt.close()" | |
| ], | |
| "metadata": { | |
| "id": "WuP-sonW4kwY" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "##### <b><font color=\"blue\" size=\"5\">Exercise 4-1: Plot NOAA-20 VIIRS gridded AOD data on global map</b></font>\n", | |
| "\n", | |
| "In the empty block below, write the code to plot NOAA-20 VIIRS gridded AOD data on a global map with a Plate Carree map projection. Use the code that plots processed TEMPO gridded NO2 data (Section 4.6) as a starting point.\n", | |
| "\n", | |
| "**Hints:**\n", | |
| "- Review Section 2 to see how you opened the AOD file, and what the contents are.\n", | |
| " - Are there any groups in this file?\n", | |
| "- You will need to extract the `AOD550`, `lon`, and `lat` variables.\n", | |
| "- No processing of the `AOD550` data is required! The file was generated using high quality AOD data.\n", | |
| "- For a Plate Carree map projection, set `projection=ccrs.PlateCarree()`\n", | |
| "- For a global domain, use:\n", | |
| " - `ax.set_extent([-180, 180, -90, 90], crs=ccrs.PlateCarree())`\n", | |
| " - `lon_ticks = [0, 60, 120, 180, -180, -120, -60]`\n", | |
| " - `lat_ticks = [-60, -30, 0, 30, 60]`\n", | |
| "- Plot AOD data with a range of `vmin=0` and `vmax=1`.\n", | |
| "- Select a sequential colormap and set a unique color for AOD >1.\n", | |
| "\n", | |
| "\n", | |
| "\n" | |
| ], | |
| "metadata": { | |
| "id": "WZUgEzQXoMCX" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "# Plot VIIRS gridded AOD on a Plate Carree map projection\n", | |
| "\n" | |
| ], | |
| "metadata": { | |
| "id": "m6Qm6iberUvR" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "<details><summary><b><font color=\"blue\" size=5>An answer for Exercise 4-1</font></b></summary>\n", | |
| "<p></p>\n", | |
| "\n", | |
| "```\n", | |
| "# Plot VIIRS gridded AOD on a Plate Carree map projection\n", | |
| "\n", | |
| "# Set full directory path to file\n", | |
| "aod_file = Path.cwd() / 'viirs_eps_noaa20_aod_0.250_deg_20200821.nc'\n", | |
| "\n", | |
| "# Open file using xarray dataset (automatically closes file when done)\n", | |
| "with xr.open_dataset(aod_file, engine='netcdf4') as ds:\n", | |
| "\n", | |
| " # Read in AOD550, latitude, and longitude\n", | |
| " aod = ds['AOD550']\n", | |
| " lon = ds['lon']\n", | |
| " lat = ds['lat']\n", | |
| "\n", | |
| " # Set up figure in matplotlib\n", | |
| " fig = plt.figure(figsize=(8, 10))\n", | |
| "\n", | |
| " # Set map projection using cartopy\n", | |
| " # Use Plate Carree projection\n", | |
| " ax = plt.axes(projection=ccrs.PlateCarree())\n", | |
| "\n", | |
| " # Set geographic domain of map: [W_lon, E_lon, S_lat, N_lat]\n", | |
| " # °E longitude > 0 > °W longitude, °N latitude > 0 > °S latitude\n", | |
| " ax.set_extent([-180, 180, -90, 90], crs=ccrs.PlateCarree())\n", | |
| "\n", | |
| " # Format lat/lon gridlines using cartopy\n", | |
| " lon_ticks = [0, 60, 120, 180, -180, -120, -60]\n", | |
| " lat_ticks = [-60, -30, 0, 30, 60]\n", | |
| " gl = ax.gridlines(draw_labels=True, linewidth=0.3, color='silver')\n", | |
| " gl.xlocator = ticker.FixedLocator(lon_ticks)\n", | |
| " gl.ylocator = ticker.FixedLocator(lat_ticks)\n", | |
| " gl.right_labels = False\n", | |
| " gl.top_labels = False\n", | |
| " gl.xlabel_style = {'size': 8}\n", | |
| " gl.ylabel_style = {'size': 8}\n", | |
| "\n", | |
| " # Add coastlines & borders, shade land & water polygons\n", | |
| " ax.add_feature(cfeature.COASTLINE, linewidth=0.75)\n", | |
| " ax.add_feature(cfeature.BORDERS, linewidth=0.75)\n", | |
| " ax.add_feature(cfeature.LAKES, facecolor='lightgrey')\n", | |
| " ax.add_feature(cfeature.STATES, linewidth=0.5)\n", | |
| " ax.add_feature(cfeature.LAND, facecolor='grey')\n", | |
| " ax.add_feature(cfeature.OCEAN, facecolor='lightgrey')\n", | |
| "\n", | |
| " # Format & add plot title\n", | |
| " # Pull information from file name & reformat\n", | |
| " satellite = aod_file.name.split('_')[2]\n", | |
| " if satellite == 'npp':\n", | |
| "     satellite_name = 'SNPP'\n", | |
| " elif satellite == 'noaa20':\n", | |
| "     satellite_name = 'NOAA-20'\n", | |
| " resolution = aod_file.name.split('_')[4][:4]\n", | |
| " file_date = aod_file.name.split('_')[6][:8]\n", | |
| " title_date = datetime.datetime.strptime(file_date, '%Y%m%d').date().strftime('%d %b %Y')\n", | |
| " plot_title = (f'{satellite_name}/VIIRS Aerosol Optical Depth ({resolution}\\N{DEGREE SIGN} resolution)'\n", | |
| " f' {title_date}')\n", | |
| " plt.title(plot_title, pad=10, size=8, weight='bold')\n", | |
| "\n", | |
| " # Set colormap with unique color for data > vmax\n", | |
| " cmap = plt.get_cmap('rainbow').with_extremes(over='darkred')\n", | |
| "\n", | |
| " # Plot VIIRS AOD data\n", | |
| " # \"transform=ccrs.PlateCarree()\" argument needed b/c data are in geographic coordinates (lat/lon)\n", | |
| " plot = ax.pcolormesh(lon, lat, aod, cmap=cmap, vmin=0, vmax=1,\n", | |
| " transform=ccrs.PlateCarree())\n", | |
| "\n", | |
| " # Add colorbar\n", | |
| " cb = fig.colorbar(plot, orientation='horizontal', fraction=0.2, pad=0.05,\n", | |
| " shrink=0.5, ticks=[0, 0.25, 0.5, 0.75, 1], extend='max')\n", | |
| " cb.set_label(label='Aerosol Optical Depth at 550nm', size=8, weight='bold')\n", | |
| "\n", | |
| " # Show plot\n", | |
| " plt.show()\n", | |
| "\n", | |
| " # Save image file\n", | |
| " save_name = f'{satellite}_viirs_aod_gridded_{file_date}'\n", | |
| " fig.savefig(Path.cwd() / save_name, dpi=300, bbox_inches='tight')\n", | |
| "\n", | |
| " # Close plot\n", | |
| " plt.close()\n", | |
| "```\n", | |
| "\n", | |
| "![noaa20_viirs_aod_gridded_20200821.png]( |