Ingres vs Postgres MVCC Explained With Neo4j's LLM Knowledge Graph Builder
Monday, 14 April 2025

LLM Knowledge Graph Builder is an application designed to turn
unstructured data such as pdfs, text documents, YouTube videos, and web pages, into a knowledge graph stored in Neo4j, promising much better accuracy than simple RAG (Retrieval-Augmented Generation).

The key term here is the "knowledge graph" part of GraphRAG, a powerful retrieval mechanism going beyond simple RAG that improves GenAI applications by taking advantage of the rich context in graph data structures.

GraphRAG which like RAG is based on vectors, it's far superior in that it also manages to capture the context/relationships between the vectors, resulting to more accurate answers when querying that data.  

Let's find out if that promise holds true by putting the Builder to test.

Thus, I set to answer a question that's been bugging me for some time. Both the Ingres and Postgres dbms's do MVCC but they do it differently; what is that difference?

To find out I uploaded four documents in pdf format. "Serializable Snapshot Isolation in PostgreSQL" which included info on Postgres MVCC was formatted as a research paper, while the other three were Ingres MVCC related, one being a Powerpoint presentation and the rest being pdf dumps from official Wiki articles. And when I say "uploaded", I mean onto the cloud instance of the Builder under the free tier.

 

The tool managed to ingest and parse all of the unstructured documents and subsequently deconstructed them into semantic chunks and ultimately into related nodes.

So what can you do after that? You can preview the finished Graph to observe the connections and discover how things are related, and ultimately ask the pressing questions:

Q:Tell me the difference between Postgres MVCC and Ingres MVCC

A:PostgreSQL and Ingres both implement Multi-Version Concurrency Control (MVCC), but there are notable differences in their approaches:

1. Implementation History:
- PostgreSQL replaced its conventional lock manager with an MVCC system in 1999, making it a core part of its architecture. It uses versioned tuples to implement snapshots and ensures that "readers don’t block writers, and writers don’t block readers."

- Ingres is introducing MVCC as a major architectural change, and it is designed to be optional. Existing applications can continue to function without MVCC, while new applications can leverage it.

2. Concurrency Behavior:
- PostgreSQL uses tuple-level write locks to prevent concurrent transactions from modifying the same data, ensuring snapshot isolation. It supports features like "readers don’t block writers" as a fundamental characteristic.

- Ingres aims to implement MVCC in phases, starting with a 2VCC (two-version concurrency control) model where readers don’t block writers. However, it is not yet fully integrated into the system.

3. Isolation Levels:
- PostgreSQL supports snapshot isolation and other isolation levels like READ COMMITTED, ensuring that anomalies like dirty reads, non-repeatable reads, and phantom reads are avoided.

- Ingres plans to allow MVCC usage at the program level using isolation levels, but architectural constraints may restrict it to the database or instance level.

4. Compatibility:
- PostgreSQL has fully embraced MVCC, making it a default and integral part of its system.

- Ingres is introducing MVCC in a way that maintains compatibility with existing applications, making it optional to avoid disrupting current workflows.

In summary, PostgreSQL has a mature and deeply integrated MVCC system, while Ingres is in the process of adopting MVCC as an optional feature to modernize its architecture while maintaining backward compatibility.

Q:What are the implications of using both MVCC and ANSI isolation semantics in Ingres?

A:Using both MVCC and ANSI isolation semantics in Ingres allows applications to benefit from enhanced concurrency while adhering to standard isolation levels defined by ANSI. MVCC ensures that readers do not block writers and vice versa, providing consistent data views without locking. When combined with ANSI isolation levels (Serializable, Repeatable Read, Read Committed, and Read Uncommitted), it enables developers to choose the appropriate balance between consistency and concurrency for their applications.

However, there are implications to consider. MVCC introduces overhead due to maintaining multiple versions of rows, which can increase resource usage, such as memory and storage. Additionally, while MVCC reduces contention, it may require careful tuning to avoid performance degradation in high-update scenarios. The integration of ANSI isolation semantics ensures compatibility with existing applications but may require developers to understand how MVCC interacts with these isolation levels to avoid unexpected behavior.

Q:Tell me some cases of unexpected behavior

A:Unexpected behavior can occur in systems using MVCC or snapshot isolation due to anomalies or specific implementation details. For example:

1. Snapshot Isolation Anomalies: In PostgreSQL, snapshot isolation can lead to silent data corruption, such as lost updates, because it does not guarantee serializable behavior. These anomalies are difficult to detect and reproduce, as they often manifest as inconsistencies much later.

2. Application Consistency Issues in Ingres: When using statement-level read consistency in Ingres, subtle differences in behavior can arise. For instance, if a transaction reads data that is later updated by another transaction, the read may not reflect the most current state, potentially causing application logic errors.

3. Concurrency Conflicts: In both PostgreSQL and Ingres, conflicts between transactions (e.g., one updating while another reads) can lead to locking or delays, especially if explicit locking mechanisms are not used to prevent anomalies.

These cases highlight the importance of understanding the limitations and behavior of MVCC and isolation levels to avoid unexpected outcomes.

So in a snap, I could discover the relationships between the documents, represented as nodes, as well as the internal topic relationships and while there was just a faint relationship between the Postgres nodes (up-left) to the Ingres nodes extracted from the three Ingres documents (low right) , still the Q&A did return back solid results, despite as seen by the answer below reafirms the faint relationship:

Q:In the end, which dbms's mvcc is better?

A:The context does not provide a direct comparison of which DBMS's MVCC is definitively better. However, PostgreSQL's MVCC implementation appears more mature and optimized, offering features like Serializable Snapshot Isolation (SSI) and better performance for read-intensive workloads. It also integrates well with other PostgreSQL features and provides a consistent view of the database with snapshot isolation.

Ingres, on the other hand, has introduced MVCC as an optional feature, which is a significant architectural change for the system. While it provides concurrency without locking, it may require additional tuning and has challenges like increased learning curves for developers and potential compatibility issues with existing applications.

Ultimately, the "better" MVCC depends on the specific use case, workload, and requirements of the application. PostgreSQL's MVCC might be more suitable for advanced and high-concurrency scenarios, while Ingres's MVCC could be a good fit for environments already using Ingres with a need for optional MVCC.

It makes senses that feeding it more related papers it will also come up with more relevant answers. Then you have also the option to use more advanced features to enhance your Graph by either uploading a Schema, giving it additional instructions, identifying and removing entities that are not connected to any other information, merge duplicate entities or even use post-processing jobs.

In order for the LLM Knowledge Graph Builder to work its magic, it requires a Neo4j Graph Database backend and a connection to it. You can opt for the cloud offering of the fully managed AuraDB which comes with a free tier and the cloud version of the Builder so that you can start experimenting right now, or since it's an open source project, you can go fully local by getting the Docker image and Ollama to hook up the LLM of your choosing.

Those tests I run were performed with the cloud version, which I must say was very easy to use, with a clean interface and no setup necessary; just spin a free instance of Neoj4 Aura and upload your documents.

Ultimately, the combination of the Neo4j Graph Database and the GraphRAG based Builder, manage to leverage the full potential of Graph Databases to meet the demands of the AI era.

 

More Information

LLM Graph Builder - Cloud
LLM Graph Builder - Github
LLM Knowledge Graph Builder-First Release of 2025  

Related Articles

An Introduction to Neo4j 

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Banner


Kafka 4 Adds Queue Semantics Support
17/04/2025

Kafka 4.0 has been released, with major changes. This is the first version to operate entirely in KRaft mode by default. It also adds a new consumer group protocol designed to dramatically improve reb [ ... ]



World Quantum Day
16/04/2025

April 14th was World Quantum day and the most noticeable thing that happened was that Google presented us with a nice doodle. Should you get excited?


More News

espbook

 

Comments




or email your comment to: comments@i-programmer.info

Last Updated ( Monday, 14 April 2025 )