<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- Begin Jekyll SEO tag v2.8.0 -->
<title>hash_fingerprint</title>
<meta name="generator" content="Jekyll v3.10.0" />
<meta property="og:title" content="hash_fingerprint" />
<meta property="og:locale" content="en_US" />
<meta name="description" content="The friendly multilateral video toolkit built for artists by artists." />
<meta property="og:description" content="The friendly multilateral video toolkit built for artists by artists." />
<link rel="canonical" href="http://localhost:4000/videobeaux/programs/utilities/hash_fingerprint.html" />
<meta property="og:url" content="http://localhost:4000/videobeaux/programs/utilities/hash_fingerprint.html" />
<meta property="og:type" content="website" />
<meta name="twitter:card" content="summary" />
<meta property="twitter:title" content="hash_fingerprint" />
<script type="application/ld+json">
{"@context":"https://schema.org","@type":"WebPage","description":"The friendly multilateral video toolkit built for artists by artists.","headline":"hash_fingerprint","publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"http://localhost:4000/videobeaux/assets/img/videobeaux.png"}},"url":"http://localhost:4000/videobeaux/programs/utilities/hash_fingerprint.html"}</script>
<!-- End Jekyll SEO tag -->
<link rel="stylesheet" href="/videobeaux/assets/css/style.css?v=5e23701ed3967d38bab12937d79f95fae74b2a53">
<!--[if lt IE 9]>
<script src="https://cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv.min.js"></script>
<![endif]-->
<!-- start custom head snippets, customize with your own _includes/head-custom.html file -->
<!-- Setup Google Analytics -->
<!-- You can set your favicon here -->
<!-- link rel="shortcut icon" type="image/x-icon" href="/videobeaux/favicon.ico" -->
<!-- end custom head snippets -->
</head>
<body>
<div class="wrapper">
<header>
<h1><a href="http://localhost:4000/videobeaux/">videobeaux</a></h1>
<img src="/videobeaux/assets/img/videobeaux.png" alt="Logo" />
<p>The friendly multilateral video toolkit built for artists by artists.</p>
<p class="view"><a href="https://github.com/schwwaaa/videobeaux">View the Project on GitHub <small>schwwaaa/videobeaux</small></a></p>
</header>
<section>
<h1 id="hash_fingerprint">hash_fingerprint</h1>
<h2 id="description">Description</h2>
<p>Creates unique fingerprints of video files using checksums, perceptual hashing, or frame-level hashing.<br />
Useful for deduplication, archival identification, similarity matching, and database cataloging.</p>
<h2 id="purpose">Purpose</h2>
<p><code class="language-plaintext highlighter-rouge">hash_fingerprint</code> allows Videobeaux users to generate machine-identifiable signatures from media files.<br />
This supports:</p>
<ul>
<li>duplicate detection,</li>
<li>content-based indexing,</li>
<li>frame-level comparison,</li>
<li>large-scale archival workflows,</li>
<li>catalog metadata generation,</li>
<li>cross-system verification of assets.</li>
</ul>
<h2 id="how-it-works">How It Works</h2>
<ol>
<li><strong>File System Scanning</strong>
<ul>
<li><code class="language-plaintext highlighter-rouge">recursive</code> allows walking entire folder trees.</li>
<li><code class="language-plaintext highlighter-rouge">exts</code> restricts scanning to specific file types.</li>
</ul>
</li>
<li><strong>Hash Types</strong><br />
The tool can generate several forms of fingerprints:
<ul>
<li><strong>file_hashes</strong> → whole-file digests (MD5, SHA1, etc.)</li>
<li><strong>stream_hash</strong> → checksums computed separately for each media stream in the file</li>
<li><strong>framemd5</strong> → per-frame MD5s for high-precision comparison</li>
<li><strong>phash</strong> → perceptual hash used for similarity matching rather than byte-exact comparison</li>
</ul>
</li>
<li><strong>Perceptual Hashing Controls</strong>
<ul>
<li><code class="language-plaintext highlighter-rouge">phash_fps</code> determines how many frames per second are sampled.</li>
<li><code class="language-plaintext highlighter-rouge">phash_size</code> sets the resolution of the perceptual hash grid.</li>
</ul>
</li>
<li><strong>Catalog Output</strong><br />
A fingerprint catalog can be generated for long-term storage, search systems, or dataset builds.</li>
<li><strong>Stream Selection</strong><br />
<code class="language-plaintext highlighter-rouge">stream_kind</code> allows selecting video/audio/subtitle streams depending on the hashing method.</li>
</ol>
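<p>The scanning and whole-file hashing steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the videobeaux implementation: the helper names <code class="language-plaintext highlighter-rouge">iter_media_files</code> and <code class="language-plaintext highlighter-rouge">file_digest</code> are hypothetical, and SHA-256 is chosen here only as an example digest.</p>

```python
import hashlib
from pathlib import Path


def iter_media_files(root, exts, recursive=False):
    """Yield files under root whose extension appears in exts (e.g. "mp4,mov")."""
    # Normalize "mp4, .MOV" style input into {".mp4", ".mov"}.
    wanted = {"." + e.strip().lower().lstrip(".") for e in exts.split(",") if e.strip()}
    pattern = "**/*" if recursive else "*"  # "**/*" walks the whole folder tree
    for path in sorted(Path(root).glob(pattern)):
        if path.is_file() and path.suffix.lower() in wanted:
            yield path


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large videos are never loaded into memory whole."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()
```

<p>Reading in fixed-size chunks keeps memory use flat regardless of file size, which matters when a folder tree holds multi-gigabyte masters.</p>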
<h2 id="program-template">Program Template</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>videobeaux -P hash_fingerprint \
-i input.mp4 \
-o output.mp4 \
--recursive VALUE \
--exts VALUE \
--file_hashes VALUE \
--stream_hash VALUE \
--framemd5 VALUE \
--phash VALUE \
--phash_fps VALUE \
--phash_size VALUE \
--catalog VALUE \
--stream_kind VALUE
</code></pre></div></div>
<h2 id="arguments">Arguments</h2>
<ul>
<li><strong>recursive</strong> — Enables recursive folder scanning for batch fingerprinting.</li>
<li><strong>exts</strong> — Comma-separated extensions to include (e.g., <code class="language-plaintext highlighter-rouge">mp4,mov,mkv</code>).</li>
<li><strong>file_hashes</strong> — Generates byte-level file digests for exact-match identification.</li>
<li><strong>stream_hash</strong> — Computes hash digests for individual media streams.</li>
<li><strong>framemd5</strong> — Produces an MD5 hash for every decoded frame; extremely precise but large.</li>
<li><strong>phash</strong> — Enables perceptual hashing for similarity comparisons.</li>
<li><strong>phash_fps</strong> — Number of frames per second to sample for phash generation.</li>
<li><strong>phash_size</strong> — Resolution of the phash grid (larger = more detail).</li>
<li><strong>catalog</strong> — Outputs results into a catalog file for later lookup or indexing.</li>
<li><strong>stream_kind</strong> — Specifies which stream type to fingerprint (e.g., <code class="language-plaintext highlighter-rouge">v</code>, <code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">s</code>).</li>
</ul>
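<p>To make the <code class="language-plaintext highlighter-rouge">framemd5</code> idea concrete, the sketch below hashes each frame of an already decoded, fixed-size raw video buffer and locates the first frame where two encodes diverge. It assumes raw frames laid out back to back; the function names are hypothetical and the code does not reflect how videobeaux decodes or hashes frames internally.</p>

```python
import hashlib


def frame_md5s(raw_video, width, height, bytes_per_pixel=1):
    """Return one MD5 hex digest per fixed-size raw frame in raw_video."""
    frame_size = width * height * bytes_per_pixel
    if frame_size == 0 or len(raw_video) % frame_size != 0:
        raise ValueError("buffer length is not a whole number of frames")
    return [
        hashlib.md5(raw_video[start:start + frame_size]).hexdigest()
        for start in range(0, len(raw_video), frame_size)
    ]


def first_mismatch(a, b):
    """Index of the first differing frame digest, or None if both lists match."""
    for index, (left, right) in enumerate(zip(a, b)):
        if left != right:
            return index
    # Same prefix: either identical, or one list simply ends earlier.
    return None if len(a) == len(b) else min(len(a), len(b))
```

<p>One digest per frame is why the output grows with clip length: a one-hour clip at 30 frames per second yields 108,000 digests.</p>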
<h2 id="real-world-example">Real World Example</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>videobeaux -P hash_fingerprint \
-i myvideo.mp4 \
-o hash_fingerprint_styled.mp4 \
--recursive false \
--exts mp4,mov \
--file_hashes true \
--stream_hash true \
--framemd5 false \
--phash true \
--phash_fps 1 \
--phash_size 32 \
--catalog true \
--stream_kind v
</code></pre></div></div>
<h2 id="technical-notes">Technical Notes</h2>
<ul>
<li>File hashes identify byte-identical copies, but cannot detect visually similar variants such as re-encodes.</li>
<li>Perceptual hashing (<code class="language-plaintext highlighter-rouge">phash</code>) is ideal for detecting duplicates that differ by transcoding, compression, or scaling.</li>
<li><code class="language-plaintext highlighter-rouge">framemd5</code> is extremely accurate but produces large files; best for forensic comparison.</li>
<li>Catalog files allow large-scale search across thousands of items.</li>
<li>Adjust <code class="language-plaintext highlighter-rouge">phash_fps</code> and <code class="language-plaintext highlighter-rouge">phash_size</code> to balance accuracy against performance: higher values sample more frames and capture more detail, at the cost of speed.</li>
</ul>
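<p>The contrast between byte-exact and perceptual hashing can be illustrated with a simple average hash, one well-known perceptual hashing scheme. Whether videobeaux uses this scheme or another is not stated here, so treat the code as a sketch; <code class="language-plaintext highlighter-rouge">average_hash</code> and <code class="language-plaintext highlighter-rouge">hamming_distance</code> are hypothetical names, and the frame is assumed to be a 2D list of grayscale values at least <code class="language-plaintext highlighter-rouge">hash_size</code> pixels on each side.</p>

```python
def average_hash(gray, hash_size=8):
    """Average-hash a 2D list of grayscale values into hash_size * hash_size bits."""
    rows, cols = len(gray), len(gray[0])
    cells = []
    for gy in range(hash_size):
        y0, y1 = gy * rows // hash_size, (gy + 1) * rows // hash_size
        for gx in range(hash_size):
            x0, x1 = gx * cols // hash_size, (gx + 1) * cols // hash_size
            # Mean brightness of the block of pixels that maps to this grid cell.
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # Each bit records whether a cell is brighter than the frame-wide mean.
    return [1 if value > mean else 0 for value in cells]


def hamming_distance(a, b):
    """Count differing bits between two equal-length hashes."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))
```

<p>A uniformly brightened copy of a frame changes every byte, so its file hash differs completely, yet its average hash stays the same because each cell keeps its position relative to the mean. A small Hamming distance therefore signals likely visual similarity.</p>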
<h2 id="recommended-usage">Recommended Usage</h2>
<ul>
<li>Archival fingerprinting for media libraries.</li>
<li>Deduplication of large video collections.</li>
<li>Detecting alternate encodes of the same content.</li>
<li>Forensic verification and tamper detection.</li>
<li>Preparing similarity datasets for AI/ML workflows.</li>
</ul>
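<p>For the deduplication use case, one plausible approach is to group paths by their whole-file digest and store the result alongside the individual entries. The catalog layout below is purely hypothetical (the format videobeaux writes is not documented on this page), as are the function names.</p>

```python
import json
from collections import defaultdict


def group_duplicates(digests):
    """Map each digest shared by two or more paths to its sorted path list."""
    by_digest = defaultdict(list)
    for path, digest in digests.items():
        by_digest[digest].append(path)
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}


def build_catalog(digests):
    """Serialize path/digest pairs plus duplicate groups as a JSON string."""
    entries = [{"path": p, "sha256": d} for p, d in sorted(digests.items())]
    catalog = {"entries": entries, "duplicates": group_duplicates(digests)}
    return json.dumps(catalog, indent=2, sort_keys=True)
```

<p>Sorting entries and keys keeps the catalog stable between runs, so it can be diffed or checked into version control.</p>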
<h2 id="quality-tips">Quality Tips</h2>
<ul>
<li>Use <code class="language-plaintext highlighter-rouge">phash_fps=1-3</code> for good coverage without heavy overhead.</li>
<li>Higher <code class="language-plaintext highlighter-rouge">phash_size</code> (e.g., 32-64) improves discrimination of similar videos.</li>
<li>Use <code class="language-plaintext highlighter-rouge">framemd5</code> only when exact frame-level matching is required.</li>
<li>Always include <code class="language-plaintext highlighter-rouge">catalog=true</code> for batch processing or long-term reference.</li>
</ul>
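<p>The trade-off behind these tips can be estimated with simple arithmetic. The sketch below assumes one hash of <code class="language-plaintext highlighter-rouge">phash_size</code> × <code class="language-plaintext highlighter-rouge">phash_size</code> bits per sampled frame, which is an assumption for illustration rather than a documented property of videobeaux; <code class="language-plaintext highlighter-rouge">phash_cost</code> is a hypothetical helper.</p>

```python
def phash_cost(duration_seconds, phash_fps, phash_size):
    """Estimate sampled frame count and raw hash bytes for one video."""
    frames = int(duration_seconds * phash_fps)
    bits_per_frame = phash_size * phash_size  # assumed: one bit per grid cell
    return {"frames": frames, "bytes": frames * bits_per_frame // 8}
```

<p>Under that assumption, a 10-minute clip at <code class="language-plaintext highlighter-rouge">phash_fps=1</code> and <code class="language-plaintext highlighter-rouge">phash_size=32</code> needs 600 hashes totalling 76,800 bytes, while <code class="language-plaintext highlighter-rouge">phash_fps=3</code> and <code class="language-plaintext highlighter-rouge">phash_size=64</code> needs 1,800 hashes totalling 921,600 bytes.</p>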
</section>
<footer>
<p>This project is maintained by <a href="https://github.com/schwwaaa">schwwaaa</a></p>
<p><small>Hosted on GitHub Pages &mdash; Theme by <a href="https://github.com/orderedlist">orderedlist</a></small></p>
</footer>
</div>
<script src="/videobeaux/assets/js/scale.fix.js"></script>
</body>
</html>