Nushell using Group-By and Histogram together

I saw this in the Nushell Discord channel, where someone showed how to use group-by and histogram together.

Data

The data came from Retraction Watch.
This describes the data.

Split Records

The "Subject" field in retraction_watch.csv can hold more than one "field of study", separated by semicolons. The file must be split so that there is one record per "field of study". The split also updates the new field, area, with the full name of the area instead of just its three-character code.

Before the split, there are 68,513 records.
After the split, there are 186,442 records.
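
To make the split concrete, here is a minimal sketch on a single made-up Subject value. The sample string, the variable names, and the simplified pattern (which captures only the area code and puts everything after it in a hypothetical `rest` column) are assumptions for illustration, not the full regex run against the real file.

```nushell
# Hypothetical Subject value in the "(AREA) Subject - Specialization;" shape
let subject = "(BLS) Biology - Cellular;(HSC) Medicine - Oncology;"

# One string per field of study; the trailing ";" leaves an empty string,
# which compact --empty drops
let rows = ($subject | split row ";" | compact --empty)

# Simplified pattern (an assumption, not the full regex):
# the area code in parentheses, then the remaining text
let parsed = ($rows | parse --regex '^\((?<area>\S+)\)\s(?<rest>.+)$')
$parsed
```

Each field of study becomes its own row, which is why the record count grows after the split.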

Nushell Code to Split

open d:/work/visidata/retraction_watch.csv
  | get  Subject
  | split row ";"
  | compact -e
  #| parse --regex  '^\((?P<area>\S+)\)\s(?P<subject>\w+(?:(?:\s\w+)+(?:\/\w+)?|\/\w+(?:\s\w+)?)?)?(?:\s\-\s?)?(?P<specialization>\w+(?:\s\w+)?(?:\/\w+)?)?'
  | parse --regex r#'(?x)^\( (?<area>\S+) \)\s(?<subject>\w+(?:(?:\s\w+)+(?:\/\w+)?|\/\w+(?:\s\w+)?)?)?(?:\s\-\s?)?(?<specialization>\w+(?:\s\w+)?(?:\/\w+)?)?'#
  | par-each {
      update area {
            str replace --all 'B/T' 'B/T: Business and Technology'
          | str replace --all 'BLS' 'BLS: Basic Life Sciences'
          | str replace --all 'ENV' 'ENV: Environmental Sciences'
          | str replace --all 'HSC' 'HSC: Health Sciences'
          | str replace --all 'HUM' 'HUM: Humanities'
          | str replace --all 'PHY' 'PHY: Physical Sciences'
          | str replace --all 'SOC' 'SOC: Social Sciences'
      }
  }
  | save -f d:/work/Visidata/retraction_watch_split.csv

Nushell Code to get a histogram by group-by

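# Read the split file, group the rows by area, then replace each group's
# rows with a histogram of its subject values.
open d:/work/Visidata/retraction_watch_split.csv
  | group-by area --to-table
  | par-each { update items { histogram subject } }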
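
The same pattern can be tried on a small inline table before running it on the full file. This is a sketch under assumptions: the four rows are made up, `each` is used instead of `par-each` so the group order stays stable, and the closures name their row parameter explicitly rather than relying on the implicit input.

```nushell
# Made-up rows with the same area and subject columns as the split file
let sample = [
  [area subject];
  [BLS Biology]
  [BLS Biology]
  [BLS Genetics]
  [HSC Medicine]
]

# group-by --to-table yields one row per area, with that area's rows in the
# items column; each items table is then replaced by a histogram of subject
let result = (
  $sample
    | group-by area --to-table
    | each {|group| $group | update items {|row| $row.items | histogram subject } }
)
$result
```

The result should have one row per area, with `items` holding that area's subject counts and frequency bars.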

Output screenshot

paul-d-ray/Nushell_Group-by_and_Histogram.md
