Experience with Graph-Oriented Database Neo4j

Since I had no prior experience with graph databases, I wanted to record what I learned from actually working with Neo4j, a widely used graph-oriented database.

Overall

I had worked with relational databases before, but I had never touched a graph database and struggled to grasp the concept, so this time I aimed to deepen my understanding through hands-on experience. My only prior exposure to graphs was in my algorithms studies, where I learned graph algorithms such as Dijkstra’s algorithm. I was particularly interested to learn in my Computer Networks class that Dijkstra’s algorithm is used in OSPF.

CS 6250 Computer Networks Exam 1

Graphs are also relevant to the currently popular field of generative AI. Retrieval-Augmented Generation (RAG), which retrieves available information and uses it to produce answers, has been gaining ground, and a newer approach called Graph RAG has been proposed as the next step. It has been reported that, by taking the relationships between pieces of information into account, generative AI can answer questions that earlier RAG could not. I therefore thought this would be a good opportunity to learn what a graph is.

Note on using Microsoft GraphRAG to answer questions that previous RAG could not

What is a Graph?

A graph is a data model specialized in representing the relationships between pieces of information. It consists of nodes, which represent the units of data, and edges, which represent the relationships between nodes. Both nodes and edges can carry their own properties, and labels can be assigned to both (Neo4j calls edges relationships and calls an edge's label its type). In a social network, for example, nodes labeled as users represent the users themselves, and the connections between users are represented by edges. Properties for each user could include name, age, and gender. As social networking services (SNS) have expanded, more data has appeared that is difficult to store in traditional relational databases, and graph-oriented databases have come into wider use as one kind of NoSQL database. Given that Neo4j is a highly popular open-source graph database, I decided to try it out.
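The model described above can be sketched in a few lines of Python. This is only an illustration of nodes, edges, labels, and properties, not how Neo4j represents them internally; the class names, the users Alice and Bob, and the FOLLOWS label are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A unit of data: one label plus arbitrary key/value properties."""
    node_id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled relationship between two nodes."""
    source: int  # node_id of the start node
    target: int  # node_id of the end node
    label: str
    properties: dict = field(default_factory=dict)


# Two hypothetical users and one relationship, as in a small social network.
alice = Node(1, "User", {"name": "Alice", "age": 28, "gender": "female"})
bob = Node(2, "User", {"name": "Bob", "age": 35, "gender": "male"})
follows = Edge(alice.node_id, bob.node_id, "FOLLOWS", {"since": 2020})

nodes = {n.node_id: n for n in (alice, bob)}
edges = [follows]

# Who does Alice follow? Walk her outgoing edges and look up the target nodes.
followed = [
    nodes[e.target].properties["name"]
    for e in edges
    if e.source == alice.node_id and e.label == "FOLLOWS"
]
print(followed)  # ['Bob']
```

The point of the sketch is that the relationship is stored as data in its own right, with its own label and properties, rather than being implied by a foreign key.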

Steps to Set Up Neo4j

To create an environment for working with a graph database, I started by setting up Neo4j. There is an official Docker image available, and since building it with Docker was the easiest method, I proceeded with that.

docker pull neo4j  
docker run -d -p 7474:7474 -p 7687:7687 --name neo4j-container neo4j  

After running the commands above, opening http://localhost:7474 in a browser shows the Neo4j login screen. For the first login, both the username and password are “neo4j”, and you are then asked to set a new password.

Verification

Now that I could connect to the database, I ran a few queries to see how it behaves. Graph databases are queried with their own query languages rather than SQL, and Neo4j uses Cypher. Queries can be run from the browser UI at http://localhost:7474, which also draws the results as a graph so the relationships are easy to see.

To create a node, the CREATE statement is used. For example, to create a node with the label “User”:

CREATE (n:User {name: 'Mike', age: 30})

To search for a node, the MATCH statement is used. For instance, to find the user named ‘Mike’:

MATCH (n:User {name: 'Mike'})
RETURN n

[Figure: node_match]

To try creating an edge, I first added a second node.

CREATE (n:User {name: 'Tom', age: 40})

After that, to create an edge representing the “FRIEND” relationship from ‘Mike’ to ‘Tom’, the two nodes are matched first and the edge is then created between them:

MATCH (a:User {name: 'Mike'}), (b:User {name: 'Tom'})
CREATE (a)-[:FRIEND]->(b)

To search for the “FRIEND” relationship of the user named ‘Mike’, you can use the following:

MATCH (a:User {name: 'Mike'})-[r:FRIEND]->(b)
RETURN a, r, b
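To make the pattern in this query concrete, here is a minimal Python sketch of what matching `(a:User {name: 'Mike'})-[r:FRIEND]->(b)` amounts to conceptually: find the anchor node, then follow only its outgoing edges of the requested type. The dictionaries and the `match_outgoing` helper are hypothetical stand-ins, not Neo4j's query engine.

```python
# Hypothetical in-memory stand-in for the two User nodes and the FRIEND edge.
nodes = {
    1: {"label": "User", "name": "Mike", "age": 30},
    2: {"label": "User", "name": "Tom", "age": 40},
}
# Outgoing edges per node: node_id -> list of (edge_type, target_node_id).
outgoing = {1: [("FRIEND", 2)], 2: []}


def match_outgoing(label, name, edge_type):
    """Yield (a, r, b) for the pattern (a:label {name})-[r:edge_type]->(b)."""
    for node_id, props in nodes.items():
        # Step 1: find the anchor node by label and property.
        if props["label"] != label or props["name"] != name:
            continue
        # Step 2: follow only outgoing edges of the requested type.
        for r_type, target_id in outgoing[node_id]:
            if r_type == edge_type:
                yield props["name"], r_type, nodes[target_id]["name"]


mike_friends = list(match_outgoing("User", "Mike", "FRIEND"))
tom_friends = list(match_outgoing("User", "Tom", "FRIEND"))
print(mike_friends)  # [('Mike', 'FRIEND', 'Tom')]
print(tom_friends)   # []
```

The empty result for Tom shows that the arrow in the pattern matters: the edge was created from Mike to Tom, so a directed pattern starting at Tom finds nothing. In Cypher, writing the pattern without an arrowhead, as in `(a)-[r:FRIEND]-(b)`, matches the edge in either direction.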

[Figure: edge_match]

To delete an edge, you combine the MATCH and DELETE statements. To remove the friendship between the user named ‘Mike’ and the user named ‘Tom’:

MATCH (a:User {name: 'Mike'})-[r:FRIEND]->(b:User {name: 'Tom'})
DELETE r

To delete a node, you likewise combine the MATCH and DELETE statements. To delete the user named ‘Mike’:

MATCH (n:User {name: 'Mike'})
DELETE n
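This DELETE succeeds here because the FRIEND edge had already been removed. Neo4j rejects a plain DELETE on a node that still has relationships, and Cypher provides DETACH DELETE to remove a node together with its relationships. The rule can be sketched with a toy Python class; `TinyGraph` and its `detach` flag are hypothetical and only mimic the behavior.

```python
class TinyGraph:
    """Hypothetical in-memory graph mimicking the rule that a node
    cannot be plainly deleted while it still has relationships."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()  # (source, edge_type, target) tuples

    def delete(self, node, detach=False):
        attached = {e for e in self.edges if node in (e[0], e[2])}
        if attached and not detach:
            # Plain delete: refuse, so that no edge is left dangling.
            raise ValueError(f"{node} still has {len(attached)} relationship(s)")
        # Detach delete (or no edges attached): drop the edges, then the node.
        self.edges -= attached
        self.nodes.discard(node)


g = TinyGraph()
g.nodes = {"Mike", "Tom"}
g.edges = {("Mike", "FRIEND", "Tom")}

try:
    g.delete("Mike")  # plain delete while the FRIEND edge exists
    blocked = False
except ValueError:
    blocked = True

g.delete("Mike", detach=True)  # counterpart of DETACH DELETE
print(blocked, g.nodes, g.edges)
```

Refusing the plain delete keeps the graph consistent: an edge always has a node at both ends.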

Features of Neo4j

Neo4j supports transactions and satisfies the ACID properties. However, because it stores the connections between data as a graph, it is difficult to split the records across machines without cutting through relationships, which makes distributed management challenging.
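Why splitting is hard can be seen by counting the edges that end up crossing machine boundaries. The sketch below uses a made-up friendship graph and two hypothetical placements of users onto two shards; it says nothing about how Neo4j itself places data.

```python
# Hypothetical friendship graph: each pair is one edge between two users.
edges = [
    ("Mike", "Tom"),
    ("Tom", "Ann"),
    ("Ann", "Mike"),
    ("Ann", "Bob"),
    ("Bob", "Joe"),
]


def cut_edges(edges, shard_of):
    """Return the edges whose two endpoints live on different shards."""
    return [(a, b) for a, b in edges if shard_of[a] != shard_of[b]]


# Placement 1 (assumed): keep tightly connected users on the same shard.
by_community = {"Mike": 0, "Tom": 0, "Ann": 0, "Bob": 1, "Joe": 1}
# Placement 2 (assumed): assign users to shards alternately, ignoring edges.
alternating = {"Mike": 0, "Tom": 1, "Ann": 0, "Bob": 1, "Joe": 0}

print(len(cut_edges(edges, by_community)))  # 1
print(len(cut_edges(edges, alternating)))   # 4
```

Rows of a table can be split by key without regard to other rows, but every cut edge here turns a traversal step into a hop between machines, and finding a placement with few cut edges becomes harder as the graph grows and changes.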

Performance

Neo4j writes and reads data efficiently by combining persistent logs on disk with caches in memory. All write operations are logged to disk, which provides the durability part of the ACID properties. Indexes can also be created on a specific label and property. For example, to create an index on the name property of the User label, you would use the following CREATE INDEX statement:

CREATE INDEX FOR (n:User) ON (n.name)

To check the created indexes, you can use the SHOW INDEXES statement:

SHOW INDEXES
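What an index on `name` buys can be illustrated with a plain Python dictionary. This is a conceptual sketch only: the data and function names are hypothetical, and Neo4j's real index structures are more sophisticated than a dictionary.

```python
from collections import defaultdict

# Hypothetical User nodes keyed by an internal id.
users = {
    1: {"name": "Mike", "age": 30},
    2: {"name": "Tom", "age": 40},
    3: {"name": "Mike", "age": 52},
}


def find_by_scan(name):
    """Without an index: inspect every User node."""
    return [uid for uid, props in users.items() if props["name"] == name]


# With an index: build name -> node ids once, then look names up directly.
name_index = defaultdict(list)
for uid, props in users.items():
    name_index[props["name"]].append(uid)


def find_by_index(name):
    """With an index: one lookup, independent of the number of nodes."""
    return name_index.get(name, [])


scan_result = find_by_scan("Mike")
index_result = find_by_index("Mike")
print(scan_result, index_result)  # [1, 3] [1, 3]
```

Both functions return the same nodes; the difference is that the scan touches every node on each query, while the index pays that cost once when it is built and must then be kept up to date on writes.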

Availability

Although Neo4j does not excel at distributing data across machines, its transactions make replication possible: changes committed on a primary can be applied to secondaries to keep them in sync.

Transactions

Operations that form one meaningful unit are grouped into a transaction, and Neo4j implements this by holding locks at the node and edge level. Once a transaction is committed, its changes are written to disk. The default isolation level is Read Committed, so non-repeatable reads are possible: if the same data is read twice within one transaction, the second read can return a different value because another transaction committed a change in between.
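A non-repeatable read under Read Committed can be reproduced with a toy key/value store in Python. The classes below are hypothetical and deliberately leave out locking; they only model the visibility rule that a reader sees its own writes plus whatever has been committed at the moment of each read.

```python
class ReadCommittedStore:
    """Toy store: readers always see the latest committed value."""

    def __init__(self):
        self.committed = {}

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.pending = {}  # uncommitted writes, invisible to other transactions

    def read(self, key):
        # Own writes first, otherwise whatever is committed right now.
        if key in self.pending:
            return self.pending[key]
        return self.store.committed.get(key)

    def write(self, key, value):
        self.pending[key] = value

    def commit(self):
        self.store.committed.update(self.pending)
        self.pending = {}


store = ReadCommittedStore()
setup = store.begin()
setup.write("Mike.age", 30)
setup.commit()

t1 = store.begin()
t2 = store.begin()

first_read = t1.read("Mike.age")    # 30
t2.write("Mike.age", 31)
before_commit = t1.read("Mike.age")  # still 30: t2 has not committed yet
t2.commit()
second_read = t1.read("Mike.age")   # 31: same transaction, different value

print(first_read, before_commit, second_read)  # 30 30 31
```

The uncommitted write is never visible to `t1`, so there is no dirty read, but once `t2` commits, `t1` sees the new value even though it is still inside the same transaction.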

Reflection

I previously had no clear picture of graph databases, but by setting up Neo4j and working with it, I gained an understanding of how to use one. What I am still curious about is how graph data, which can grow in any direction, is actually stored. In a relational database, data is stored in pages and rows are inserted into them, which makes data retrieval easy to visualize. With graph data, I cannot yet picture how this works, so I would like to investigate it further in the future.

Licensed under CC BY-NC-SA 4.0