mirror of
https://github.com/vondas-network/videobeaux.git
synced 2026-01-25 08:11:11 +01:00
hash_fingerprint
Description
Creates unique fingerprints of video files using checksums, perceptual hashing, or frame-level hashing.
Useful for deduplication, archival identification, similarity matching, and database cataloging.
Purpose
hash_fingerprint allows Videobeaux users to generate machine-identifiable signatures from media files.
This supports:
- duplicate detection,
- content-based indexing,
- frame-level comparison,
- large-scale archival workflows,
- catalog metadata generation,
- cross-system verification of assets.
How It Works
- File System Scanning
recursive allows walking entire folder trees. exts restricts scanning to specific file types.
- Hash Types
The tool can generate several forms of fingerprints:
- file_hashes → whole-file digests (MD5, SHA1, etc.)
- stream_hash → stream-level checksums from container metadata
- framemd5 → per-frame MD5s for high-precision comparison
- phash → perceptual hash used for similarity matching rather than byte-exact comparison
- Perceptual Hashing Controls
phash_fps determines how many frames per second are sampled. phash_size sets the resolution of the perceptual hash grid.
- Catalog Output
A fingerprint catalog can be generated for long-term storage, search systems, or dataset builds.
- Stream Selection
stream_kind allows selecting video/audio/subtitle streams depending on the hashing method.
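The scanning and whole-file hashing steps above can be sketched in a few lines of Python. This is an illustrative sketch, not Videobeaux's actual implementation: the function names iter_media_files and file_digest are hypothetical, and the extension list simply mirrors the exts examples in this document.

```python
import hashlib
from pathlib import Path


def iter_media_files(root, exts=("mp4", "mov", "mkv"), recursive=True):
    """Yield files under root whose extension is in exts (case-insensitive).

    Hypothetical helper: mimics what `recursive` and `exts` describe.
    """
    wanted = {"." + ext.lower().lstrip(".") for ext in exts}
    root = Path(root)
    # rglob walks the whole tree; glob stays in the top-level folder.
    candidates = root.rglob("*") if recursive else root.glob("*")
    for path in sorted(candidates):
        if path.is_file() and path.suffix.lower() in wanted:
            yield path


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large videos are never fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which matters for multi-gigabyte video files.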
Program Template
videobeaux -P hash_fingerprint \
-i input.mp4 \
-o output.mp4 \
--recursive VALUE \
--exts VALUE \
--file_hashes VALUE \
--stream_hash VALUE \
--framemd5 VALUE \
--phash VALUE \
--phash_fps VALUE \
--phash_size VALUE \
--catalog VALUE \
--stream_kind VALUE
Arguments
- recursive — Enables recursive folder scanning for batch fingerprinting.
- exts — Comma-separated extensions to include (e.g., mp4,mov,mkv).
- file_hashes — Generates byte-level file digests for exact-match identification.
- stream_hash — Computes hash digests for individual media streams.
- framemd5 — Produces an MD5 hash for every decoded frame; extremely precise but large.
- phash — Enables perceptual hashing for similarity comparisons.
- phash_fps — Number of frames per second to sample for phash generation.
- phash_size — Resolution of the phash grid (larger = more detail).
- catalog — Outputs results into a catalog file for later lookup or indexing.
- stream_kind — Specifies which stream type to fingerprint (e.g., v,a,s).
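To make the role of phash_size more concrete, here is a minimal sketch of a grid-based perceptual hash. It uses the simple "average hash" technique, which is an assumption chosen for brevity; this document does not say which perceptual hash algorithm the program uses. The function name average_hash and the plain list-of-rows frame format are hypothetical.

```python
def average_hash(frame, hash_size=8):
    """Return a (hash_size * hash_size)-bit integer for a grayscale frame.

    frame is a list of rows of 0-255 luma values; its height and width
    must each be at least hash_size. This is an average hash, used here
    only to illustrate how a hash grid works.
    """
    height, width = len(frame), len(frame[0])
    cells = []
    for gy in range(hash_size):
        y0, y1 = gy * height // hash_size, (gy + 1) * height // hash_size
        for gx in range(hash_size):
            x0, x1 = gx * width // hash_size, (gx + 1) * width // hash_size
            # Average the pixels that fall into this grid cell.
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if the cell is brighter than the grid mean.
        bits = (bits << 1) | int(value > mean)
    return bits
```

Because each bit only records whether a cell is brighter than average, rescaling or mildly recompressing a frame tends to leave most bits unchanged, while a larger grid captures finer spatial detail at the cost of a longer hash.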
Real World Example
videobeaux -P hash_fingerprint \
-i myvideo.mp4 \
-o hash_fingerprint_styled.mp4 \
--recursive false \
--exts mp4,mov \
--file_hashes true \
--stream_hash true \
--framemd5 false \
--phash true \
--phash_fps 1 \
--phash_size 32 \
--catalog true \
--stream_kind v
Technical Notes
- File hashes reliably identify byte-identical copies, but cannot detect visually similar variants such as re-encodes.
- Perceptual hashing (phash) is ideal for detecting duplicates that differ by transcoding, compression, or scaling.
- framemd5 is extremely accurate but produces large files; best for forensic comparison.
- Catalog files allow large-scale search across thousands of items.
- Adjust phash_fps and phash_size to balance between accuracy and performance.
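The difference between exact and perceptual matching comes down to how two fingerprints are compared. Cryptographic digests are compared for equality, while perceptual hashes are usually compared by Hamming distance (the number of differing bits). The sketch below assumes perceptual hashes are held as Python integers; the helper names and any similarity threshold are illustrative, not part of Videobeaux.

```python
import hashlib


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")


def similarity(hash_a, hash_b, n_bits):
    """Return the fraction of matching bits, from 0.0 to 1.0."""
    return 1.0 - hamming_distance(hash_a, hash_b) / n_bits


def same_bytes(data_a, data_b):
    """Exact match: any single-byte change yields a different digest."""
    return hashlib.sha1(data_a).digest() == hashlib.sha1(data_b).digest()
```

A pair of re-encoded files would normally fail same_bytes yet score a high similarity on their perceptual hashes; where to draw the cut-off depends on the hash size and the collection.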
Recommended Usage
- Archival fingerprinting for media libraries.
- Deduplication of large video collections.
- Detecting alternate encodes of the same content.
- Forensic verification and tamper detection.
- Preparing similarity datasets for AI/ML workflows.
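For forensic verification, per-frame hashes make it possible to locate where two files diverge, not just whether they do. The following sketch assumes decoded frames are already available as raw byte strings; the decoding step and the function names are hypothetical and stand in for whatever framemd5 output the tool produces.

```python
import hashlib


def frame_md5s(frames):
    """Return one MD5 hex digest per raw frame (bytes)."""
    return [hashlib.md5(frame).hexdigest() for frame in frames]


def first_mismatch(hashes_a, hashes_b):
    """Return the index of the first differing frame, or None if identical.

    If one list is a strict prefix of the other, the mismatch is reported
    at the first missing frame.
    """
    for index, (a, b) in enumerate(zip(hashes_a, hashes_b)):
        if a != b:
            return index
    if len(hashes_a) != len(hashes_b):
        return min(len(hashes_a), len(hashes_b))
    return None
```

Comparing the two hash lists rather than the decoded frames themselves means the reference copy only needs its stored fingerprint, not the original media.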
Quality Tips
- Use phash_fps=1–3 for good coverage without heavy overhead.
- Higher phash_size (e.g., 32–64) improves discrimination of similar videos.
- Use framemd5 only when exact frame-level matching is required.
- Always include catalog=true for batch processing or long-term reference.
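A rough way to reason about the phash_fps and phash_size trade-off is to estimate how many hashes a file produces and how much raw hash data that is. The sketch below assumes each sampled frame yields a phash_size × phash_size grid with one bit per cell; this document implies such a grid but does not specify the storage format, so treat the byte counts as a back-of-the-envelope estimate.

```python
def phash_cost(duration_s, phash_fps, phash_size):
    """Estimate sample count and raw hash size for one video.

    Assumes one bit per grid cell (an assumption, not documented behavior).
    Returns (samples, bits_per_hash, total_bytes).
    """
    samples = int(duration_s * phash_fps)
    bits_per_hash = phash_size * phash_size
    total_bytes = samples * bits_per_hash // 8
    return samples, bits_per_hash, total_bytes


# A 10-minute clip at the low and high ends of the suggested ranges.
low = phash_cost(600, 1, 32)   # (600, 1024, 76800)
high = phash_cost(600, 3, 64)  # (1800, 4096, 921600)
```

Under these assumptions, moving from the low end to the high end of both ranges multiplies the raw hash data by twelve, while the decode work grows with phash_fps alone.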