<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>HNSW | Jiyuan Liu</title><link>https://jiyuanliu.netlify.app/tags/hnsw/</link><atom:link href="https://jiyuanliu.netlify.app/tags/hnsw/index.xml" rel="self" type="application/rss+xml"/><description>HNSW</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Sun, 26 Jan 2025 00:00:00 +0000</lastBuildDate><image><url>https://jiyuanliu.netlify.app/media/icon_hu6671943710606299486.png</url><title>HNSW</title><link>https://jiyuanliu.netlify.app/tags/hnsw/</link></image><item><title>Neo4j Vector Indexing: A Technical Deep Dive with Concrete Examples</title><link>https://jiyuanliu.netlify.app/post/25.kg_llm/neo4j_vector_index_blog/</link><pubDate>Sun, 26 Jan 2025 00:00:00 +0000</pubDate><guid>https://jiyuanliu.netlify.app/post/25.kg_llm/neo4j_vector_index_blog/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Neo4j&amp;rsquo;s vector indexing system bridges graph databases and semantic search through the HNSW (Hierarchical Navigable Small World) algorithm. This post walks through the technical execution of vector indexing operations using a concrete example: 10 movies embedded into 7 dimensions.&lt;/p>
&lt;p>Instead of abstract theory, we&amp;rsquo;ll use real numbers and track exactly what happens in memory as your data flows through the system.&lt;/p>
&lt;hr>
&lt;h2 id="the-dataset-our-7-dimensional-vector-space">The Dataset: Our 7-Dimensional Vector Space&lt;/h2>
&lt;p>Imagine each movie in your database has a vector representing seven semantic traits:&lt;/p>
&lt;p>&lt;strong>Dimensions:&lt;/strong> &lt;code>[Action, Sci-Fi, Romance, Comedy, Horror, Drama, Mystery]&lt;/code>&lt;/p>
&lt;p>&lt;strong>Sample Movies:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Node A (Star Wars):&lt;/strong> &lt;code>[0.9, 0.8, 0.0, 0.1, 0.0, 0.3, 0.2]&lt;/code>&lt;/li>
&lt;li>&lt;strong>Node B (The Notebook):&lt;/strong> &lt;code>[0.1, 0.0, 0.9, 0.2, 0.0, 0.8, 0.1]&lt;/code>&lt;/li>
&lt;li>&lt;strong>Nodes C–J:&lt;/strong> (8 additional movies with their own 7D vectors)&lt;/li>
&lt;/ul>
&lt;p>Each coordinate scores how &amp;ldquo;Action-heavy,&amp;rdquo; &amp;ldquo;Sci-Fi focused,&amp;rdquo; or &amp;ldquo;Romantic&amp;rdquo; a movie is. Star Wars scores highest on the Action and Sci-Fi axes, so it sits in the &amp;ldquo;Sci-Fi/Action&amp;rdquo; region of our 7D space, while The Notebook occupies the &amp;ldquo;Romance/Drama&amp;rdquo; region.&lt;/p>
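&lt;p>To make the idea of a &amp;ldquo;region&amp;rdquo; concrete, here is a minimal sketch that names each axis and reports the two strongest traits per movie. The vectors are the two sample movies from above; the helper &lt;code>dominant_traits&lt;/code> is a hypothetical name used only for illustration.&lt;/p>

```python
# Axis names and vectors are taken from the sample dataset in this post.
DIMENSIONS = ["Action", "Sci-Fi", "Romance", "Comedy", "Horror", "Drama", "Mystery"]

MOVIES = {
    "Star Wars": [0.9, 0.8, 0.0, 0.1, 0.0, 0.3, 0.2],
    "The Notebook": [0.1, 0.0, 0.9, 0.2, 0.0, 0.8, 0.1],
}


def dominant_traits(vector, top_n=2):
    """Return the names of the top_n highest-scoring dimensions."""
    ranked = sorted(zip(DIMENSIONS, vector), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]


for title, vector in MOVIES.items():
    print(title, "->", dominant_traits(vector))
# Star Wars -> ['Action', 'Sci-Fi']
# The Notebook -> ['Romance', 'Drama']
```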
&lt;hr>
&lt;h2 id="step-1-create-vector-index--the-initialization">Step 1: CREATE VECTOR INDEX – The Initialization&lt;/h2>
&lt;h3 id="technical-action">Technical Action&lt;/h3>
&lt;p>Your database claims a metadata segment in &lt;strong>off-heap memory&lt;/strong> (OS Memory) and pre-registers a specialized Apache Lucene HNSW structure.&lt;/p>
&lt;h3 id="what-happens-numerically">What Happens Numerically&lt;/h3>
&lt;p>You tell Neo4j:&lt;/p>
&lt;blockquote>
&lt;p>&amp;ldquo;I am going to provide 7-dimensional LIST objects. Build the HNSW hierarchy to support semantic search across this space.&amp;rdquo;&lt;/p>
&lt;/blockquote>
&lt;h3 id="the-hnsw-scaffold">The HNSW Scaffold&lt;/h3>
&lt;p>Neo4j creates an empty &lt;strong>layered graph&lt;/strong> structure (skip-list-like in its levels, but each level is a proximity graph rather than a linked list) with configured parameters:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Dimensions:&lt;/strong> 7&lt;/li>
&lt;li>&lt;strong>Similarity Metric:&lt;/strong> Cosine Similarity&lt;/li>
&lt;li>&lt;strong>HNSW Parameters:&lt;/strong> &lt;code>m=16, ef_construction=200&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>Think of this as defining the &lt;em>grid&lt;/em> of a map without yet drawing the roads or placing any cities.&lt;/p>
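&lt;p>As a rough sketch, the index definition could be generated like this. The label &lt;code>Movie&lt;/code> and the helper name are assumptions for illustration, and the &lt;code>vector.hnsw.*&lt;/code> options are only accepted by more recent Neo4j 5 releases, so check them against your version. Index names and labels cannot be passed as query parameters, which is why the statement is built as a string (only do this with trusted values).&lt;/p>

```python
def create_vector_index_cypher(index_name, label, prop, dimensions,
                               similarity="cosine", m=16, ef_construction=200):
    """Build a CREATE VECTOR INDEX statement (Neo4j 5.x syntax, assumed)."""
    return (
        f"CREATE VECTOR INDEX {index_name} IF NOT EXISTS\n"
        f"FOR (n:{label}) ON n.{prop}\n"
        "OPTIONS { indexConfig: {\n"
        f"  `vector.dimensions`: {dimensions},\n"
        f"  `vector.similarity_function`: '{similarity}',\n"
        f"  `vector.hnsw.m`: {m},\n"
        f"  `vector.hnsw.ef_construction`: {ef_construction}\n"
        "}}"
    )


# "Movie" and "taglineEmbedding" are assumed names matching the examples in this post.
statement = create_vector_index_cypher(
    "movie_tagline_embeddings", "Movie", "taglineEmbedding", dimensions=7
)
print(statement)
```

&lt;p>The resulting string can be passed to &lt;code>session.run(statement)&lt;/code> with the official Python driver.&lt;/p>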
&lt;h3 id="memory-footprint">Memory Footprint&lt;/h3>
&lt;p>The index claims only a &lt;strong>small metadata allocation&lt;/strong> (~1–5 MB) in off-heap memory. This is &lt;em>not&lt;/em> pre-allocating space for all 10 movies—that grows dynamically as you add data.&lt;/p>
&lt;h3 id="result">Result&lt;/h3>
&lt;p>A new entry appears in &lt;code>SHOW VECTOR INDEXES&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">Index: movie_tagline_embeddings | State: ONLINE | PopulationPercent: 0%
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="step-2-genaivectorencode--the-translation-layer">Step 2: genai.vector.encode – The Translation Layer&lt;/h2>
&lt;h3 id="technical-action-1">Technical Action&lt;/h3>
&lt;p>Your data flows through an external REST API that extracts semantic meaning and converts text into numerical coordinates.&lt;/p>
&lt;h3 id="the-transformation-in-action">The Transformation in Action&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Input (Human Language)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">movie_tagline&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;In a galaxy far, far away...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># OpenAI Embedding API processes this&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Output (7-dimensional Vector)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">vector&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mf">0.92&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.81&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.02&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.11&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.01&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.28&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.19&lt;/span>&lt;span class="p">]&lt;/span>
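&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Inside Neo4j, this vector is a Cypher &lt;code>LIST&amp;lt;FLOAT&amp;gt;&lt;/code> value held in memory, ready to be written to the database. (Real OpenAI embedding models return 1,536 or more dimensions by default; 7 keeps the example readable.)&lt;/p>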
&lt;h3 id="resource-cost">Resource Cost&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Network I/O:&lt;/strong> HTTP payload to OpenAI&lt;/li>
&lt;li>&lt;strong>API Tokens:&lt;/strong> Consumed per vector (tracked against your OpenAI quota)&lt;/li>
&lt;li>&lt;strong>Neo4j Resources:&lt;/strong> Minimal—the list is held temporarily in transaction state RAM&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Complexity:&lt;/strong> O(1) per API call, with latency dependent on OpenAI&amp;rsquo;s response time.&lt;/p>
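&lt;p>A minimal sketch of calling the encoder from Python, assuming the Neo4j GenAI plugin is installed and that &lt;code>session&lt;/code> is an open session from the official driver. The function name &lt;code>encode_text&lt;/code> is hypothetical; the token is passed as a query parameter so it never appears in the query text.&lt;/p>

```python
# Cypher that delegates the embedding call to the GenAI plugin (assumed installed).
ENCODE_QUERY = (
    "RETURN genai.vector.encode($text, 'OpenAI', { token: $token }) AS vector"
)


def encode_text(session, text, token):
    """Ask Neo4j to embed one string and return the result as a Python list."""
    record = session.run(ENCODE_QUERY, text=text, token=token).single()
    return list(record["vector"])
```

&lt;p>Each call is one round trip to the provider, so for many taglines it is usually worth batching rather than encoding row by row.&lt;/p>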
&lt;hr>
&lt;h2 id="step-3-dbcreatesetnodevectorproperty--the-registration--indexing">Step 3: db.create.setNodeVectorProperty – The Registration &amp;amp; Indexing&lt;/h2>
&lt;h3 id="technical-action-2">Technical Action&lt;/h3>
&lt;p>This is where the &amp;ldquo;heavy lifting&amp;rdquo; occurs. Your vector is:&lt;/p>
&lt;ol>
&lt;li>Written to disk (ACID-compliant node property)&lt;/li>
&lt;li>Handed to the HNSW Worker for intelligent neighborhood linking&lt;/li>
&lt;/ol>
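&lt;p>The write itself might look like the sketch below. The &lt;code>Movie&lt;/code> label, the &lt;code>name&lt;/code> lookup key, and the helper &lt;code>store_embedding&lt;/code> are assumptions for illustration; the client-side length check is an extra safeguard added here, not something the procedure requires.&lt;/p>

```python
# Assumed schema: (:Movie {name}) nodes with a taglineEmbedding vector property.
STORE_QUERY = """
MATCH (m:Movie {name: $name})
CALL db.create.setNodeVectorProperty(m, 'taglineEmbedding', $vector)
RETURN m.name AS name
"""

EXPECTED_DIMENSIONS = 7  # must match the dimensions declared on the index


def store_embedding(session, name, vector):
    """Check the vector length, then write it through the vector procedure."""
    if len(vector) != EXPECTED_DIMENSIONS:
        raise ValueError(
            f"expected {EXPECTED_DIMENSIONS} dimensions, got {len(vector)}"
        )
    result = session.run(STORE_QUERY, name=name, vector=[float(x) for x in vector])
    # single() is None when no movie matched the given name.
    return result.single() is not None
```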
&lt;h3 id="dual-storage-mechanism">Dual Storage Mechanism&lt;/h3>
&lt;h4 id="storage-layer-1-the-node-store">Storage Layer 1: The Node Store&lt;/h4>
&lt;p>The 7D vector is saved as a persistent property on the movie node:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">Node A Property Block:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">{
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> name: &amp;#34;Star Wars&amp;#34;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> tagline: &amp;#34;In a galaxy far, far away...&amp;#34;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> taglineEmbedding: [0.92, 0.81, 0.02, 0.11, 0.01, 0.28, 0.19]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="storage-layer-2-the-lucene-index-hnsw">Storage Layer 2: The Lucene Index (HNSW)&lt;/h4>
&lt;p>The HNSW Worker receives the vector and:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Placement:&lt;/strong> Positions the new vector [0.92, 0.81&amp;hellip;] in the 7D coordinate space&lt;/li>
&lt;li>&lt;strong>Neighbor Search:&lt;/strong> Searches existing movie vectors (up to 9 others) to find the mathematically closest neighbors
&lt;ul>
&lt;li>Calculates cosine similarity with each existing movie&lt;/li>
&lt;li>Keeps the nearest ones as neighbors: ~4 in this small example, capped by the &lt;code>m=16&lt;/code> parameter (the maximum number of links per node in a layer)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Edge Creation:&lt;/strong> Creates &amp;ldquo;friendship links&amp;rdquo; between this movie and its closest neighbors&lt;/li>
&lt;li>&lt;strong>Hierarchy Updates:&lt;/strong> The HNSW algorithm draws a random top layer for the node from an exponentially decaying distribution (a weighted &amp;ldquo;coin flip&amp;rdquo;), so only a small fraction of nodes are promoted to the &amp;ldquo;Highway Layer&amp;rdquo; (upper layers) that speeds up future searches&lt;/li>
&lt;/ol>
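&lt;p>The steps above can be sketched in plain Python. This is a simplified illustration, not Lucene&amp;rsquo;s implementation: the neighbor search is brute force instead of an &lt;code>ef_construction&lt;/code>-bounded graph walk, and the level formula is the one from the HNSW paper with the common choice of 1/ln(m) as the scaling factor. The Star Trek and Interstellar vectors are the ones used later in this post.&lt;/p>

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def random_level(m, rng):
    """Draw the top layer for a new node: floor(-ln(U) / ln(m))."""
    u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
    return int(-math.log(u) / math.log(m))


def nearest_neighbors(new_vector, existing, max_links):
    """Rank existing nodes by similarity and keep at most max_links of them."""
    ranked = sorted(
        existing, key=lambda name: cosine(new_vector, existing[name]), reverse=True
    )
    return ranked[:max_links]


existing = {
    "Star Trek": [0.88, 0.79, 0.01, 0.09, 0.02, 0.25, 0.21],
    "Interstellar": [0.85, 0.75, 0.05, 0.08, 0.03, 0.30, 0.25],
    "The Notebook": [0.1, 0.0, 0.9, 0.2, 0.0, 0.8, 0.1],
}
star_wars = [0.92, 0.81, 0.02, 0.11, 0.01, 0.28, 0.19]

rng = random.Random(42)  # seeded only to make the sketch repeatable
print("top layer:", random_level(16, rng))
print("links:", nearest_neighbors(star_wars, existing, max_links=2))
```

&lt;p>With &lt;code>m=16&lt;/code>, roughly one node in sixteen lands above the base layer, which is what keeps the upper layers sparse.&lt;/p>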
&lt;h3 id="the-neighbor-linkage-example">The Neighbor Linkage Example&lt;/h3>
&lt;p>When Star Wars is added:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">Star Wars [0.92, 0.81, 0.02, 0.11, 0.01, 0.28, 0.19]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ↓ (Cosine Similarity ≈ 0.999)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Star Trek [0.88, 0.79, 0.01, 0.09, 0.02, 0.25, 0.21]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ↓ (Cosine Similarity ≈ 0.997)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Interstellar [0.85, 0.75, 0.05, 0.08, 0.03, 0.30, 0.25]
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>These links allow future searches to &amp;ldquo;jump&amp;rdquo; directly to the Sci-Fi cluster instead of scanning the Drama cluster.&lt;/p>
&lt;h3 id="memory-growth">Memory Growth&lt;/h3>
&lt;p>With each new movie added, the index grows:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>After 1 movie:&lt;/strong> ~100 KB (minimal structure + 1 vector + metadata)&lt;/li>
&lt;li>&lt;strong>After 5 movies:&lt;/strong> ~300 KB (vectors + edges between neighbors)&lt;/li>
&lt;li>&lt;strong>After 10 movies:&lt;/strong> ~500 KB (complete 7D neighborhood graph)&lt;/li>
&lt;/ul>
&lt;p>The index lives in &lt;strong>OS RAM (off-heap)&lt;/strong>, not in Neo4j&amp;rsquo;s Page Cache.&lt;/p>
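&lt;p>For a sense of scale, here is a back-of-the-envelope estimate of the raw payload alone. It assumes 4-byte floats, 4-byte neighbor ids, and up to &lt;code>2 * m&lt;/code> links per node on the base layer; it is not Lucene&amp;rsquo;s actual file layout, and it ignores upper layers and all fixed overhead. The point is that for 10 tiny vectors, the kilobyte-level figures above are dominated by fixed index overhead rather than by the vectors themselves.&lt;/p>

```python
def raw_payload_bytes(num_vectors, dimensions, m, bytes_per_float=4, bytes_per_link=4):
    """Estimate vector data plus an assumed upper bound on base-layer links."""
    vector_bytes = num_vectors * dimensions * bytes_per_float
    link_bytes = num_vectors * 2 * m * bytes_per_link
    return vector_bytes + link_bytes


print(raw_payload_bytes(num_vectors=10, dimensions=7, m=16))  # 1560
```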
&lt;hr>
&lt;h2 id="step-4-semantic-search--dbindexvectorquerynodes">Step 4: Semantic Search – db.index.vector.queryNodes&lt;/h2>
&lt;h3 id="when-you-search">When You Search&lt;/h3>
&lt;p>You execute a query: &lt;em>&amp;ldquo;Fast-paced space adventure&amp;rdquo;&lt;/em>&lt;/p>
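&lt;p>Once the query text has been encoded, the search call itself could be sketched as follows. &lt;code>session&lt;/code> is assumed to be an open driver session, and &lt;code>search_movies&lt;/code> and the &lt;code>name&lt;/code> property are hypothetical names for illustration.&lt;/p>

```python
# Ask the vector index for the k nearest nodes to an already-encoded query vector.
SEARCH_QUERY = """
CALL db.index.vector.queryNodes($index_name, $k, $query_vector)
YIELD node, score
RETURN node.name AS name, score
ORDER BY score DESC
"""


def search_movies(session, index_name, query_vector, k=3):
    """Return (name, score) pairs for the k nearest movies."""
    result = session.run(
        SEARCH_QUERY, index_name=index_name, k=k, query_vector=query_vector
    )
    return [(record["name"], record["score"]) for record in result]
```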
&lt;h3 id="the-query-encoding">The Query Encoding&lt;/h3>
&lt;p>OpenAI encodes your search query into 7 dimensions:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">query&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;Fast-paced space adventure&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">query_vector&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mf">0.89&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.82&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.03&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.10&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.02&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.26&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mf">0.18&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="the-hnsw-navigation">The HNSW Navigation&lt;/h3>
&lt;p>Instead of checking all 10 movies, the index uses a &lt;strong>greedy traversal&lt;/strong> algorithm:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>High-Level Layer:&lt;/strong> Start at the &amp;ldquo;Highway&amp;rdquo; level—a sparse layer containing only the most central nodes&lt;/li>
&lt;li>&lt;strong>Skip the Drama Cluster:&lt;/strong> The query vector [0.89, 0.82&amp;hellip;] is clearly Sci-Fi heavy, so the algorithm bypasses The Notebook and other Drama nodes&lt;/li>
&lt;li>&lt;strong>Jump to Sci-Fi Cluster:&lt;/strong> Follow edges to Star Wars, Star Trek, and Interstellar (all high Sci-Fi, high Action scores)&lt;/li>
&lt;li>&lt;strong>Local Search:&lt;/strong> Check these neighbors&amp;rsquo; neighbors for the closest match&lt;/li>
&lt;li>&lt;strong>Return Results:&lt;/strong> Sorted by cosine similarity scores&lt;/li>
&lt;/ol>
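&lt;p>The greedy part of that walk can be sketched on a single layer. This is a deliberate simplification: real HNSW repeats the walk layer by layer and keeps a candidate list rather than a single current node. The tiny graph below is hand-built for illustration and starts from the worst possible entry point, The Notebook.&lt;/p>

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def greedy_search(graph, vectors, entry, query):
    """Hop to the most similar neighbor until no neighbor beats the current node."""
    current = entry
    best = cosine(vectors[current], query)
    path = [current]
    while True:
        scored = [(cosine(vectors[n], query), n) for n in graph[current]]
        top_score, top_node = max(scored, default=(best, current))
        if top_score > best:
            current, best = top_node, top_score
            path.append(current)
        else:
            return path, best


vectors = {
    "The Notebook": [0.1, 0.0, 0.9, 0.2, 0.0, 0.8, 0.1],
    "Interstellar": [0.85, 0.75, 0.05, 0.08, 0.03, 0.30, 0.25],
    "Star Trek": [0.88, 0.79, 0.01, 0.09, 0.02, 0.25, 0.21],
    "Star Wars": [0.92, 0.81, 0.02, 0.11, 0.01, 0.28, 0.19],
}
# Hand-built links for the sketch; a real index derives these during insertion.
graph = {
    "The Notebook": ["Interstellar"],
    "Interstellar": ["The Notebook", "Star Trek"],
    "Star Trek": ["Interstellar", "Star Wars"],
    "Star Wars": ["Star Trek"],
}
query_vector = [0.89, 0.82, 0.03, 0.10, 0.02, 0.26, 0.18]

path, score = greedy_search(graph, vectors, "The Notebook", query_vector)
print(" -> ".join(path))
```

&lt;p>Each hop strictly improves the similarity, so the walk always terminates; the price is that it can stop at a local optimum, which is why HNSW results are approximate.&lt;/p>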
&lt;h3 id="cosine-similarity-calculation">Cosine Similarity Calculation&lt;/h3>
&lt;p>For query vector and Star Wars:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">Similarity = (0.89×0.92 + 0.82×0.81 + 0.03×0.02 + ... + 0.18×0.19)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ÷ (√(0.89² + 0.82² + ...) × √(0.92² + 0.81² + ...))
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ≈ 0.9996
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Result:&lt;/strong> Star Wars comes back as the top match, with a raw cosine similarity of about &lt;strong>0.9996&lt;/strong> (Neo4j reports its score normalized to the 0–1 range).&lt;/p>
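&lt;p>The arithmetic above is easy to check in a few lines of Python. The second function maps the raw cosine from [-1, 1] onto [0, 1]; that mapping is my understanding of how Neo4j normalizes its cosine score, so treat it as an assumption and compare against the score your own database returns.&lt;/p>

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalized_cosine_score(a, b):
    """Map cosine similarity onto [0, 1] (assumed to mirror Neo4j's score)."""
    return (1.0 + cosine_similarity(a, b)) / 2.0


query_vector = [0.89, 0.82, 0.03, 0.10, 0.02, 0.26, 0.18]
star_wars = [0.92, 0.81, 0.02, 0.11, 0.01, 0.28, 0.19]

print(round(cosine_similarity(query_vector, star_wars), 4))
print(round(normalized_cosine_score(query_vector, star_wars), 4))
```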
&lt;hr>
&lt;h2 id="technical-summary-table">Technical Summary Table&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Script Step&lt;/th>
&lt;th>Numeric Context (10 Movies, 7D)&lt;/th>
&lt;th>Complexity&lt;/th>
&lt;th>Memory Impact&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>CREATE INDEX&lt;/strong>&lt;/td>
&lt;td>Defines 7D coordinate system map&lt;/td>
&lt;td>O(1)&lt;/td>
&lt;td>~5 MB metadata&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>genai.vector.encode&lt;/strong>&lt;/td>
&lt;td>Returns 7-element float array per movie&lt;/td>
&lt;td>O(1) network&lt;/td>
&lt;td>Temporary RAM&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>setNodeVectorProperty&lt;/strong>&lt;/td>
&lt;td>Writes vector to disk; connects to ~4 closest neighbors&lt;/td>
&lt;td>O(log N) indexing&lt;/td>
&lt;td>~50 KB per vector&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>queryNodes&lt;/strong>&lt;/td>
&lt;td>Greedy HNSW traversal to find closest matches&lt;/td>
&lt;td>O(log N) search&lt;/td>
&lt;td>No additional memory&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;hr>
&lt;h2 id="python-script-monitoring-index-population-progress">Python Script: Monitoring Index Population Progress&lt;/h2>
&lt;p>Since HNSW indexing happens &lt;strong>asynchronously&lt;/strong> in the background, use this script to verify all 10 movies are fully indexed before running semantic searches:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">neo4j&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">GraphDatabase&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">check_index_status&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">uri&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">user&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">password&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">index_name&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> Monitor the population progress of a vector index.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> uri: Neo4j connection string (e.g., &amp;#34;bolt://localhost:7687&amp;#34;)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> user: Database username
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> password: Database password
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> index_name: Name of the vector index to monitor
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">driver&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">GraphDatabase&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">driver&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">uri&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">auth&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">user&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">password&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">try&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="n">driver&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">session&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">session&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Query index metadata&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">session&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">run&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;SHOW INDEXES YIELD name, state, populationPercent WHERE name = $index_name&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">index_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">index_name&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">record&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">result&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Index: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">record&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;name&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Status: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">record&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;state&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Population Progress: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">record&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;populationPercent&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">%&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Wait for 100% population before querying&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">record&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;populationPercent&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="mi">100&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;⚠️ Index still building. Wait before querying.&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">else&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;✅ Index fully populated. Ready for semantic search.&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">finally&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">driver&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">close&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Example usage&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">check_index_status&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;bolt://localhost:7687&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;neo4j&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;your_password&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;movie_tagline_embeddings&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="expected-output">Expected Output&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">Index: movie_tagline_embeddings
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Status: ONLINE
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Population Progress: 100%
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">✅ Index fully populated. Ready for semantic search.
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="key-insights-what-you-must-understand">Key Insights: What You Must Understand&lt;/h2>
&lt;h3 id="1-create-vector-index-claims-memory-not-data">1. &lt;strong>CREATE VECTOR INDEX Claims Memory, Not Data&lt;/strong>&lt;/h3>
&lt;p>It does not pre-allocate space for 1 million vectors. It merely registers the &lt;em>structure&lt;/em> and reserves a small off-heap footprint.&lt;/p>
&lt;h3 id="2-genaivectorencode-is-the-semantic-brain">2. &lt;strong>genai.vector.encode Is the Semantic Brain&lt;/strong>&lt;/h3>
&lt;p>This step is the &lt;em>only&lt;/em> part that understands meaning. It converts human language (&amp;ldquo;galaxy far away&amp;rdquo;) into mathematical coordinates that computers can compare.&lt;/p>
&lt;h3 id="3-setnodevectorproperty-is-the-critical-registration-step">3. &lt;strong>setNodeVectorProperty Is the Critical Registration Step&lt;/strong>&lt;/h3>
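&lt;p>&lt;code>setNodeVectorProperty&lt;/code> stores the vector in a more space-efficient form than a plain &lt;code>SET n.embedding = vector&lt;/code>. In current Neo4j 5 releases a list of floats written with &lt;code>SET&lt;/code> is also indexed, provided it sits on the indexed label and property and has the right number of dimensions, but the procedure remains the recommended way to write embeddings you want to search.&lt;/p>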
&lt;h3 id="4-hnsw-builds-a-navigable-hierarchy">4. &lt;strong>HNSW Builds a Navigable Hierarchy&lt;/strong>&lt;/h3>
&lt;p>Instead of scanning all 10 movies on every search, HNSW creates &amp;ldquo;highways&amp;rdquo; (upper layers) that allow greedy algorithms to skip unrelated data clusters. This reduces typical search cost from O(N) to roughly O(log N), in exchange for approximate rather than exact nearest neighbors.&lt;/p>
&lt;h3 id="5-indexing-is-asynchronous">5. &lt;strong>Indexing Is Asynchronous&lt;/strong>&lt;/h3>
&lt;p>After you call &lt;code>setNodeVectorProperty&lt;/code>, the HNSW worker updates the index in the background. Always check &lt;code>populationPercent&lt;/code> before running production semantic searches.&lt;/p>
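&lt;p>If you would rather block until the index is ready than check once, a small polling helper is enough. This is a sketch: &lt;code>fetch_percent&lt;/code> is a hypothetical callable that you supply (for example, one that runs the &lt;code>SHOW INDEXES&lt;/code> query from the monitoring script and returns &lt;code>populationPercent&lt;/code>). Neo4j also ships a built-in &lt;code>db.awaitIndex&lt;/code> procedure that serves a similar purpose.&lt;/p>

```python
import time


def wait_until_populated(fetch_percent, timeout_s=60.0, interval_s=1.0, sleep=time.sleep):
    """Poll fetch_percent() until it reports 100 or the timeout elapses."""
    waited = 0.0
    while True:
        if fetch_percent() >= 100.0:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Stand-in for a real database check, used here only to show the call shape.
readings = iter([0.0, 40.0, 100.0])
ready = wait_until_populated(lambda: next(readings), sleep=lambda seconds: None)
print("ready:", ready)
```

&lt;p>Passing &lt;code>sleep&lt;/code> as a parameter keeps the helper easy to test without real delays.&lt;/p>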
&lt;hr>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>Neo4j&amp;rsquo;s vector indexing system is a three-stage pipeline:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Index Creation:&lt;/strong> Define the 7D space and prepare the HNSW infrastructure&lt;/li>
&lt;li>&lt;strong>Vector Encoding:&lt;/strong> Convert semantic text into mathematical coordinates&lt;/li>
&lt;li>&lt;strong>Intelligent Registration:&lt;/strong> Place vectors in the index and link them to neighbors for fast traversal&lt;/li>
&lt;/ol>
&lt;p>Using HNSW, your semantic search moves from O(N) brute force to roughly O(log N) navigation through a layered graph, which keeps approximate nearest-neighbor searches fast even across millions of high-dimensional vectors.&lt;/p>
&lt;hr>
&lt;h2 id="further-reading">Further Reading&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://neo4j.com/docs/" target="_blank" rel="noopener">Neo4j Vector Indexes Documentation&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://arxiv.org/abs/1603.09320" target="_blank" rel="noopener">HNSW Algorithm Paper&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://platform.openai.com/docs/guides/embeddings" target="_blank" rel="noopener">OpenAI Embeddings API&lt;/a>&lt;/li>
&lt;/ul></description></item></channel></rss>