📜 ⬆️ ⬇️

Neo4j VS MySQL

Foreword


As a third year student, I chose a topic for coursework: “Graph databases using the example of Neo4j”. Since until that time I had studied and used exclusively relational databases, I was wondering why the graph database would be at all, and when should its capabilities be better applied? After viewing many sites on the Internet, I found only a theory and no more. Since the theory is not always convincing and I would like to see some kind of statistics, I was curious and I decided that in my coursework I would do it, and I chose MySQL as the enemy of Neo4j.

So who will win this standoff?

Briefly about Neo4j and graph databases


Neo4j is an open-source graph database, the story of which began with the investments of Neo Technology in 2003. Since 2007, it has become publicly available. Neo4j contains all database features, including adhering to ACID, maintaining clustering, and recovering from a system crash. Today is the leader among database graphs.

ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that ensure reliable operation of transactions.
')
The components of a graph database are nodes and edges. They can be supplemented with their own set of fields. A model of such a database is schematically shown in the figure.



Saving data in Neo4j


The nodestore.db file contains a certain size of records containing information about the Nodes:
1. A label that indicates the record is active;
2. Pointer to the first relation that this node contains;
3. Pointer to the first property that this node contains.

The node does not contain its own identifier. Since each entry in nodestore.db occupies the same amount of space, you can calculate a pointer to a node.

The relationshipstore.db file also contains records of the same size that describe relationships, but they consist of the following elements:

1. A label that indicates the record is active;
2. Pointer to the node that contains this relation;
3. A pointer to the node to which this relationship is directed;
4. Type of relationship;
5. A pointer to the relationship that is ahead (within the limits of this node);
6. A pointer to the relationship that is behind (within this node);
7. A pointer to the relationship that is ahead (within the limits of the node in which this relationship is directed);
8. A pointer to the relation that stands behind (within the limits of the node in which this relation is directed);
9. A pointer to the first property of this relationship.

Filling data


In order to compare for efficiency, it is necessary to fill the database with the same content. It is worth noting that this data should be large. On small volumes, we will not see the difference. Therefore, I set myself the goal to generate for the relational model no less than 300 megabytes.

I have chosen a subject area - a social network. I built an ER diagram, on the basis of which I created a relational and graph model.



After that, I filled MySQL with data, creating a class for generating this data and methods that contributed new data to the database.
The amount of data that was entered in MySQL:

addUsers (100,000); - number of users.
addGroups (200,000); - number of groups.
addPhotos (300,000); - number of photos.
addAudio (250,000); - The number of audio recordings.
addFriendships (1,000,000); - number of friends.
addMessages (4,000,000); - number of messages.
addUserAudio (350,000); - The number of audio recordings.
addUserGroups (400,000); - the number of user groups.
addUserPhoto (400,000); - the number of photos of users.

Then I converted this data to Neo4j. It is worth noting that filling the database took me a lot of time. If you decide to do something similar on your own, get ready for the fact that you either spend a lot of time or do not use text fields (I think it took a lot of time in my case). After that, I compared the place that information occupies in Neo4j and MySQL. There was a small shock; for a relational database, 351.5 MB was required, and for a graph database, 3.45 GB. Now we will start experiments.

Experiment 1


Experiment Description: measuring user search time by user ID.

In this experiment, I searched for users by their ID a certain number of times. For this, I wrote methods whose code is presented below.

public static void main(String[] args){ // Database connection initialisation init(); int [] counts = {10, 100, 1000, 5000, 10000, 30000, 60000, 90000, 120000}; for(int i = 0; i < counts.length; i++){ findUserByIDTest(counts[i]); } } static void findUserByIDTest(int count){ System.out.println("__________________________________"); System.out.println("STATISTIC FOR COUNT: " + count); { Random r = new Random(count); long time = - System.currentTimeMillis(); for (int i = 0; i < count; i++) { int id = r.nextInt(100000); User u = MySQLDAO.getUserByID(id); } time += System.currentTimeMillis(); System.out.println("Time for MySQL:" + time); } { Random r = new Random(count); long time = - System.currentTimeMillis(); for (int i = 0; i < count; i++) { int id = r.nextInt(100000); User u = Neo4jDAO.getUserByID(id);; } time += System.currentTimeMillis(); System.out.println("Time for Neo4j:" + time); } System.out.println("__________________________________"); } 

After executing the program code, I made a table and drew a diagram for it. The results pleased me. Neo4j did a great job.
User count = 100000

Number of queries

Time for MySQL, ms

Time for Neo4j, ms

ten

four

9

100

34

76

1000

286

510

5000

1034

1103

10,000

1340

1187

30,000

3390

1384

60,000

6876

2102

90000

10537

3175

120,000

14033

3858




Experiment 2


Experiment Description: measurement of the time spent by the user's friends with a change in the value of the interval of occurrence of identifiers for the search.

In this experiment, I selected the range of valid identifier values ​​and searched for the user's friends with this identifier. Then I changed this range and conducted the experiment again. After that, I came out the corresponding table with the data on which I built the dependency graphs.



In any case, the time of the graph base for searching for big data was less than that which is relative.

Experiment 3


Experiment Description: measuring the time spent by the total number of photos for users who administer at least one group, depending on the range of values ​​of user IDs.

This experiment is very similar to the previous one, but is more complex. If the time values ​​in previous experiments with repetitions differed slightly, in this jumps much stronger. Therefore, I conducted this experiment three times and compiled an appropriate table.
User count = 100000, Photo count = 300000, User photo count = 400000

experiment 1

experiment 2

experiment 3

id range <

Time for MySQL, ms

Time for Neo4j, ms

Time for MySQL, ms

Time for Neo4j, ms

Time for MySQL, ms

Time for Neo4j, ms

ten

299

11339

164

92594

456

56575

100

10652

2748

674

2400

663

2826

1000

7243

4931

5433

2481

5649

1942

5000

22538

5521

23747

2408

23522

3514

10,000

47755

5627

44917

2650

44992

1844

30,000

137572

8161

33856

3108

136592

3707

60,000

64002

13577

300814

5004

280672

6029

90000

482329

13475

438875

5102

429304

5631







After discussions on Habré, I tried changing the approach to the conditions of the experiment: corrected the query, for a simpler one, raised MySQL again before each experiment and maintained conditions that MySQL and Neo4j should not work at the same time. The following happened: the MySQL graph became more stable, and the graph database began to lose a lot of relational data on small amounts of data.

experiment 1
id range <
Time for MySQL, ms
Time for Neo4j, ms
ten
120
10767
100
690
10706
1000
5879
10884
5000
24668
12245
10,000
52462
12280
30,000
154534
13352
60,000
296369
14545
90000
489830
18058



Modest conclusions


I can safely say that the graph database requires much less time in searching on large amounts of data, but it takes up much more space on the hard disk. Therefore, for small systems, it is better to use a relational database or look for an alternative among other NoSQL databases.

Thanks for attention.

Source: https://habr.com/ru/post/258179/


All Articles