| import requests | |
| import sys | |
| import os | |
| from urllib.parse import quote, unquote | |
| def cauta_in_wayback(url_pattern): | |
| """Search for snapshots in the Wayback Machine""" | |
| print(f"\nSearching: {url_pattern}") |
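The preview above stops before the search itself. As a rough, network-free sketch of how such a lookup might be prepared against the Wayback Machine CDX endpoint: `build_cdx_query` and `parse_cdx_rows` are hypothetical helper names that do not come from the original file, and the choice of query parameters is an assumption about what the search needs.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url_pattern, limit=None):
    """Build a CDX query URL for a URL pattern (hypothetical helper)."""
    params = {
        "url": url_pattern,
        "output": "json",            # JSON array of rows, header row first
        "fl": "timestamp,original",  # request only the fields used below
        "filter": "statuscode:200",  # skip redirects and error captures
        "collapse": "urlkey",        # keep one row per distinct URL
    }
    if limit is not None:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(rows):
    """Turn a CDX JSON payload (header row plus data rows) into dicts."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]
```

The URL returned by `build_cdx_query` could then be fetched with `requests.get(...).json()` and the decoded payload handed to `parse_cdx_rows`.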
| import cv2 | |
| import numpy as np | |
| from pathlib import Path | |
| def detect_underline_by_vertical_context(img, low=150, high=220, bg_thresh=220, check_dist=3): | |
| """ | |
| Detect underlines based on vertical context. | |
| A pixel is part of an underline if: |
| import cv2 | |
| import numpy as np | |
| from pathlib import Path | |
| def process_v1_frequency_zones(img_gray, low=150, high=220): | |
| """Remove pixels in low-frequency zones""" | |
| result = img_gray.copy() | |
| mask = (img_gray >= low) & (img_gray < high) | |
| result[mask] = 245 |
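A small worked example may help show what the masking step does. The function body below repeats the lines from the preview; the `return result` line is an assumption, since the preview is cut off before any return statement, and the sample array is made up for illustration.

```python
import numpy as np


def process_v1_frequency_zones(img_gray, low=150, high=220):
    """Whiten pixels whose gray level falls in the half-open band [low, high)."""
    result = img_gray.copy()
    mask = (img_gray >= low) & (img_gray < high)
    result[mask] = 245
    return result  # assumed: the preview ends before any return statement


# One row of gray levels straddling both band edges.
sample = np.array([[100, 150, 219, 220, 255]], dtype=np.uint8)
cleaned = process_v1_frequency_zones(sample)
print(cleaned.tolist())  # [[100, 245, 245, 220, 255]]
```

Note that `low` is inclusive and `high` is exclusive, so a pixel at exactly 220 is left untouched.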
| import cv2 | |
| import numpy as np | |
| from pathlib import Path | |
| def remove_underlines_statistical(image_path, output_path=None): | |
| """ | |
| Remove underlines using a statistical approach: | |
| - Frequency of pixels in the underline zone, per row | |
| - Median + k*std as the detection threshold |
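The two bullets in the docstring can be sketched roughly as follows. This is an illustration under stated assumptions, not the original implementation: the gray band `low=150, high=220` is borrowed from the sibling scripts, `k=2.0` and the fill value are guesses, and `find_underline_rows` and `whiten_rows` are hypothetical names.

```python
import numpy as np


def find_underline_rows(img_gray, low=150, high=220, k=2.0):
    """Flag rows whose count of band pixels exceeds median + k * std."""
    in_zone = (img_gray >= low) & (img_gray < high)
    counts = in_zone.sum(axis=1)                      # band pixels per row
    threshold = np.median(counts) + k * counts.std()
    return np.flatnonzero(counts > threshold), threshold


def whiten_rows(img_gray, rows, low=150, high=220, fill=245):
    """Whiten only the band pixels on the flagged rows (assumed fill value)."""
    result = img_gray.copy()
    for y in rows:
        band = (result[y] >= low) & (result[y] < high)
        result[y, band] = fill
    return result
```

Using the median rather than the mean as the baseline keeps a handful of heavily underlined rows from raising the threshold too far, which is presumably why the docstring pairs the median with the standard deviation.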
| # PowerShell script for ALL the PDFs on biblioteca-digitala.ro | |
| # Uses the Wayback Machine CDX API to find ALL of the ~100,000 PDFs | |
| $OutputDir = "G:\biblioteca-digitala-COMPLET" | |
| # Create the directory if it does not exist | |
| if (!(Test-Path $OutputDir)) { | |
| New-Item -ItemType Directory -Path $OutputDir | Out-Null | |
| } |
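The PowerShell preview ends after creating the output directory. The step that would plausibly follow, turning CDX captures into download targets, is sketched here in Python (the language of the other scripts in this listing) rather than PowerShell. All three helper names are hypothetical, and the `id_` replay modifier is used on the assumption that the raw archived bytes are wanted rather than the rewritten replay page.

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote, urlsplit


def wayback_raw_url(timestamp, original):
    """Replay URL for one capture; the id_ suffix requests unmodified content."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def local_pdf_path(output_dir, original):
    """Map an archived URL to a file name inside output_dir."""
    name = unquote(urlsplit(original).path.rsplit("/", 1)[-1]) or "index.pdf"
    return PureWindowsPath(output_dir) / name


def plan_downloads(captures, output_dir):
    """Keep the first capture of each distinct PDF URL, skipping non-PDFs."""
    seen = set()
    plan = []
    for timestamp, original in captures:
        if not urlsplit(original).path.lower().endswith(".pdf"):
            continue
        key = original.lower()
        if key in seen:
            continue
        seen.add(key)
        plan.append((wayback_raw_url(timestamp, original),
                     local_pdf_path(output_dir, original)))
    return plan
```

This sketch flattens every PDF into a single directory, so two files with the same name under different site paths would collide; a real run would need to handle that case.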
| # PowerShell script for biblioteca-digitala.ro | |
| # The PDFs will be downloaded to G:\yyyy | |
| $OutputDir = "G:\yyyy" | |
| # Create the directory if it does not exist | |
| if (!(Test-Path $OutputDir)) { | |
| New-Item -ItemType Directory -Path $OutputDir | Out-Null | |
| } |
| #!/bin/bash | |
| # Script for downloading the PDFs from biblioteca-digitala.ro/reviste/carte/ | |
| # Uses the Wayback Machine CDX API + wget/curl | |
| OUTPUT_DIR="${1:-biblioteca_digitala_pdfs}" | |
| URLS_FILE="$OUTPUT_DIR/pdf_urls.txt" | |
| MAX_PARALLEL=3 | |
| echo "============================================================" | |
| echo "Scraper for biblioteca-digitala.ro/reviste/carte/" |
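The `MAX_PARALLEL=3` setting suggests the script caps the number of simultaneous downloads. The shell preview does not show how, so the following is only a Python sketch of the same idea using a bounded thread pool. `download_all` is a hypothetical name, and `fetch` is an injected callable standing in for whatever wget or curl invocation the script actually uses.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_PARALLEL = 3  # same limit as in the shell script above


def download_all(urls, fetch, max_parallel=MAX_PARALLEL):
    """Call fetch(url) for every URL, running at most max_parallel at once.

    Returns (succeeded, failed), where failed maps each URL to its error text.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()          # re-raises any exception from fetch
                succeeded.append(url)
            except Exception as exc:
                failed[url] = str(exc)
    return succeeded, failed
```

Collecting failures instead of aborting lets a long run finish and leaves a list of URLs to retry afterwards.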
| #!/usr/bin/env python3 | |
| """ | |
| Scraper for downloading the PDFs from biblioteca-digitala.ro/reviste/carte/ | |
| Improved version with pre-discovered URLs and Wayback Machine support. | |
| RECOMMENDED METHOD: | |
| 1. First run: python3 scraper.py --wayback | |
| This extracts all URLs from the Wayback Machine CDX API | |
| 2. Then download: python3 scraper.py --download |
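The two-step workflow in the docstring might map onto a command-line interface along these lines. Only the `--wayback` and `--download` flags come from the docstring; making them mutually exclusive, the `--urls-file` option, and its default are assumptions, and `main` returns a description of the phase instead of doing any real work.

```python
import argparse


def build_parser():
    """Parser for the two-phase workflow: discover URLs first, then download."""
    parser = argparse.ArgumentParser(
        prog="scraper.py",
        description="Two-phase PDF scraper (sketch).",
    )
    phase = parser.add_mutually_exclusive_group(required=True)
    phase.add_argument("--wayback", action="store_true",
                       help="phase 1: collect URLs from the Wayback Machine CDX API")
    phase.add_argument("--download", action="store_true",
                       help="phase 2: download the previously collected URLs")
    # Hypothetical option: where phase 1 writes and phase 2 reads the URL list.
    parser.add_argument("--urls-file", default="pdf_urls.txt")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.wayback:
        return f"phase 1: write discovered URLs to {args.urls_file}"
    return f"phase 2: download every URL listed in {args.urls_file}"
```

Keeping discovery and download as separate phases means the slow CDX query runs once, and an interrupted download can be restarted from the saved URL list.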
| Alternative ideas for SCRIBD SCENARIOS (how that result could be reached) | |
| Give me 10 alternatives for how that result could have been reached. For example, given the text: | |
| And in that nameless desert, the scribe found an old parchment, gnawed by winds and by time, on which three words were written in three different languages: *Ego*, *Mundus*, *Deus*. |
| import cv2 | |
| import numpy as np | |
| from pathlib import Path | |
| def remove_thin_horizontal_lines(binary_img, line_thickness=5, kernel_length=40): | |
| """ | |
| Remove thin horizontal lines from the binarized image. | |
| """ | |
| inv = cv2.bitwise_not(binary_img) |
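The preview ends right after the inversion, so the rest of the function is unknown. A common way to isolate long horizontal strokes is a morphological opening with a wide, one-pixel-tall kernel. The sketch below performs that opening in plain NumPy rather than with OpenCV calls, so it is a swapped-in technique and not the original code. It assumes black ink (0) on a white page (255), uses hypothetical function names, and does not model the `line_thickness` parameter.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def find_horizontal_runs(ink, kernel_length=40):
    """Mark ink pixels lying in a horizontal run of at least kernel_length.

    Equivalent to a binary opening with a 1 x kernel_length structuring element.
    """
    lines = np.zeros(ink.shape, dtype=bool)
    if ink.shape[1] < kernel_length:
        return lines
    # starts[y, x] is True when ink[y, x : x + kernel_length] is all ink.
    windows = sliding_window_view(ink, kernel_length, axis=1)
    starts = windows.all(axis=2)
    # Spread every qualifying window back over the pixels it covers.
    for offset in range(kernel_length):
        lines[:, offset:offset + starts.shape[1]] |= starts
    return lines


def remove_long_horizontal_runs(binary_img, kernel_length=40):
    """Paint long horizontal ink runs white; shorter strokes are kept."""
    ink = binary_img == 0          # assumed polarity: black ink on white
    cleaned = binary_img.copy()
    cleaned[find_horizontal_runs(ink, kernel_length)] = 255
    return cleaned
```

Any glyph stroke wider than `kernel_length` pixels on a single row would also be erased, so the kernel length has to exceed the widest character stroke in the scan.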