Commit d23fe45 added facet matches to 4 storage services. However, the old system used numeric facet IDs while the new refactored system uses semantic facet IDs. Additionally, some old facets no longer exist in the new system.
Based on the old facet-tree.json:
| Old ID | Facet Category | Description |
|---|---|---|
| 5 | Risk Classification | Public / Low Risk |
| 6 | Risk Classification | Sensitive / Moderate Risk |
| 7 | Risk Classification | Confidential or Restricted / High Risk |
| 8 | Risk Classification | HIPAA-Regulated |
| 14 | Data Size | Greater than 1TB or likely to exceed in 2 years |
| 23 | Who Needs Access | Non-specific collaborators via a shared link |
| 25 | Who Needs Access | Specific collaborators external to NYU without NetIDs |
| 33 | Purpose | Backups |
| 39 | Access Location | Off campus without a VPN |
| 55 | Additional Capabilities | Version control / snapshots |
| 58 | Additional Capabilities | Data replication (copy exists in other locations) |
| 60 | Additional Capabilities | Automated workflows |
Change (NYU Box): Added facet "8" (HIPAA-Regulated)
Current CSV data:
- Storable Files: "High"
Required CSV change:
- Storable Files: "High, HIPAA" or "HIPAA" (to trigger the HIPAA matcher at config.ts line 101)
New system equivalent:
- Maps to `risk-classification.hipaa` facet
- Matcher: `/\bhipaa\b/i` in "Storable Files" column
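A minimal sketch of why the current cell value fails and the proposed one succeeds. Only the regular expression is taken from the text above; the helper name `matchesHipaa` is hypothetical and does not reflect the actual contents of config.ts.

```typescript
// Pattern quoted in this document. Everything else here is an illustrative assumption.
const hipaaMatcher = /\bhipaa\b/i;

// Hypothetical helper: true when a "Storable Files" cell would trigger the HIPAA facet.
function matchesHipaa(storableFiles: string): boolean {
  return hipaaMatcher.test(storableFiles);
}

const currentCellMatches = matchesHipaa("High");         // false: no "hipaa" token
const proposedCellMatches = matchesHipaa("High, HIPAA"); // true
const hipaaOnlyMatches = matchesHipaa("HIPAA");          // true
```

Because the pattern is case-insensitive and bounded by word boundaries, either proposed value works, while "High" alone never can.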
Changes (Qualtrics): Added facets:
- "60" (Automated workflows)
- "55" (Version control / snapshots)
- "23" (Non-specific collaborators via shared link)
- "25" (Specific collaborators external to NYU)
Current CSV data:
- Permission Settings: "One owner, adjustable collaboration permissions for teamwork"
- Other columns don't mention workflows, versioning, or external access
Problem: The new facet system does NOT include "Additional Capabilities" facets (automated workflows, version control). These facets no longer exist.
Possible CSV changes (for facets that still exist):
For facet "23" (shared link):
- Maps to `access-needs.shared-link` in new system
- Permission Settings: Add "shared link" → "One owner, shared link collaboration permissions for teamwork"
- Matcher: `/\bshared link\b/i` (config.ts line 191)
For facet "25" (external collaborators):
- Maps to `access-needs.external-collaborators` in new system
- Permission Settings: Add "external" → "One owner, adjustable collaboration permissions for teamwork, external collaborators supported"
- Matcher: `/\bexternal\b/i` (config.ts line 195)
Note: Automated workflows (60) and version control (55) cannot be replicated in the new system as these facet options were removed during refactoring.
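The two access matchers can be checked against the current and proposed Permission Settings text with a sketch like the following. The facet IDs and patterns are the ones quoted above; the rule-table shape and the function name `matchAccessFacets` are assumptions made for illustration.

```typescript
interface AccessRule {
  facetId: string;
  pattern: RegExp;
}

// Facet IDs and patterns as quoted in this document; the table layout is hypothetical.
const accessRules: AccessRule[] = [
  { facetId: "access-needs.shared-link", pattern: /\bshared link\b/i },
  { facetId: "access-needs.external-collaborators", pattern: /\bexternal\b/i },
];

// Hypothetical helper: returns every access facet whose pattern matches the cell text.
function matchAccessFacets(permissionSettings: string): string[] {
  const matched: string[] = [];
  for (const rule of accessRules) {
    if (rule.pattern.test(permissionSettings)) {
      matched.push(rule.facetId);
    }
  }
  return matched;
}

const currentText = "One owner, adjustable collaboration permissions for teamwork";
const proposedText =
  "One owner, adjustable collaboration permissions for teamwork, " +
  "shared link access, external collaborators supported";

const currentFacets = matchAccessFacets(currentText);   // no access facets
const proposedFacets = matchAccessFacets(proposedText); // both access facets
```

A single Permission Settings value containing both phrases is enough to restore facets "23" and "25"; nothing comparable exists for "55" and "60".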
Changes (Spatial Data Repository):
- Removed facet "33" (Backups purpose)
- Added facet "39" (Off campus without VPN access)
- Added facet "58" (Data replication capability)
Current CSV data:
- Use Case: "A collection of Geospatial Data from Open Data and proprietary resources. Users can download layers and work with them in many GIS platforms"
Problem: These facet categories no longer exist in the new system:
- "Purpose" facet (which included "Backups" option) → Removed
- "Access Location" facet (which included "Off campus without VPN") → Removed
- "Additional Capabilities" facet (which included "Data replication") → Removed
No equivalent changes possible - these facet dimensions were removed in the refactoring.
Change (Ultraviolet Repository): Added facet "14" (Greater than 1TB or likely to exceed in 2 years)
Current CSV data:
- Limitations: "Recommended to 100 files or 50GB (per file), contact repository team uv@nyu.edu if exceeds this limit"
Problem: The current text states a 50GB per-file limit, which would NOT trigger the large storage facet.
Required CSV change: To indicate storage > 1TB or unlimited capacity, modify the Limitations column. However, the new system's storage capacity matchers (config.ts lines 436-462) look for:
- "5TB", "no limit", or "unlimited" → All three capacity options (small, medium, large)
- "50 GB" or "2TB" → small and medium
- "20 GB" → small only
The old facet "14" meant "> 1TB", which in the NEW system would map to having storage-capacity.medium and/or storage-capacity.large facets.
Possible change:
- Limitations: Update to wording that the large-capacity matchers listed above recognize, such as "no limit" or "unlimited". Phrases like "No storage limit" or "Supports large datasets" may not match, depending on the exact patterns
- However, this may not be factually accurate if the service actually has a 50GB per file limit
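The capacity rules described above can be sketched as an ordered rule table. This is an illustration only: the exact regular expressions at config.ts lines 436-462 are not shown in this document, so the patterns below (including the optional space between number and unit) and the first-match-wins ordering are assumptions, as are the names `capacityRules` and `matchCapacityTiers`.

```typescript
type CapacityTier = "small" | "medium" | "large";

interface CapacityRule {
  pattern: RegExp;
  tiers: CapacityTier[];
}

// Assumed patterns reconstructed from the prose description, not copied from config.ts.
const capacityRules: CapacityRule[] = [
  { pattern: /\b5\s?TB\b|\bno limit\b|\bunlimited\b/i, tiers: ["small", "medium", "large"] },
  { pattern: /\b50\s?GB\b|\b2\s?TB\b/i, tiers: ["small", "medium"] },
  { pattern: /\b20\s?GB\b/i, tiers: ["small"] },
];

// Hypothetical helper: the first rule that matches decides the tiers (assumed ordering).
function matchCapacityTiers(limitations: string): CapacityTier[] {
  for (const rule of capacityRules) {
    if (rule.pattern.test(limitations)) {
      return rule.tiers;
    }
  }
  return [];
}

const ultravioletLimitations =
  "Recommended to 100 files or 50GB (per file), contact repository team " +
  "uv@nyu.edu if exceeds this limit";

// Under these assumed patterns the current text yields small and medium, never large.
const ultravioletTiers = matchCapacityTiers(ultravioletLimitations);
```

Under these assumptions the existing Limitations text already covers the medium tier, so only the large tier depends on whether the service can truthfully claim more capacity.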
The old system had these facet categories that NO LONGER EXIST in the new refactored system:
- "For what purpose will you be using this storage?" (included Backups, Project storage, Hosting, etc.)
- "From where will the data be accessed?" (included VPN, off-campus, browser, workstation, cloud)
- "What additional capabilities do you require?" (included automated workflows, version control, data replication, DOIs, etc.)
- "How large is any individual file?" (size ranges)
The new system simplified to these facet categories:
- Risk classification
- University affiliation
- Who needs access (simplified)
- Backup availability (yes/no only)
- Synchronous access (yes/no only)
- Alumni access (yes/no only)
- Storage duration (long-term vs temporary)
- Primary purpose (archive, research, collaboration, surveys, media, repository)
- Budget (free vs paid)
- Storage capacity (small, medium, large)
- Special requirements (none, active directory, faculty sponsorship, data stewardship, project approval)
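The per-service findings above amount to a partial translation table from old numeric IDs to new semantic IDs. The sketch below records only the mappings stated in this document; the type and function names are hypothetical, and IDs this document does not map (for example 5, 6, and 7) are deliberately reported as unknown instead of guessed.

```typescript
type Translation =
  | { status: "mapped"; newIds: string[] }
  | { status: "removed" }
  | { status: "unknown" };

// Old IDs with a stated new-system equivalent.
const mappedFacets: { [oldId: string]: string[] } = {
  "8": ["risk-classification.hipaa"],
  // The document says "medium and/or large" for the old "> 1TB" facet.
  "14": ["storage-capacity.medium", "storage-capacity.large"],
  "23": ["access-needs.shared-link"],
  "25": ["access-needs.external-collaborators"],
};

// Old IDs whose facet category was removed in the refactoring.
const removedFacets: string[] = ["33", "39", "55", "58", "60"];

// Hypothetical helper: classify an old facet ID using only what this document states.
function translateFacet(oldId: string): Translation {
  if (Object.prototype.hasOwnProperty.call(mappedFacets, oldId)) {
    return { status: "mapped", newIds: mappedFacets[oldId] };
  }
  if (removedFacets.indexOf(oldId) !== -1) {
    return { status: "removed" };
  }
  return { status: "unknown" };
}
```

For example, `translateFacet("23")` reports a mapping to `access-needs.shared-link`, while `translateFacet("58")` reports that the facet was removed.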
Changes that CAN be made to the CSV:
- NYU Box (row 14): Storable Files: "High, HIPAA"
- Qualtrics (row 20): Permission Settings: "One owner, adjustable collaboration permissions for teamwork, shared link access, external collaborators supported"
- Ultraviolet Repository (row 27):
  - If the service actually supports > 1TB total storage, update Limitations to mention this
  - Otherwise, no change needed as 50GB limit is accurate
Changes that CANNOT be made:
- Spatial Data Repository changes (facets no longer exist)
- Qualtrics automated workflows and version control facets (no longer exist)
The refactoring removed several facet dimensions, making it impossible to fully replicate the granular filtering that commit d23fe45 provided.