Neo4j & Cypher #1 Installation and basic query

In one of my previous posts, we got familiar with the concept of graph databases as the opposing approach to aggregate-oriented NoSQL databases. As I promised back then, there’s going to be a series of posts dedicated to Cypher, the query language designed for querying that kind of data structure. Today’s post starts the series: it presents how to install Neo4j and then discusses creating simple queries.


Installing Neo4j

To start our journey, we need to install Neo4j first. You can download it from the official site here. Notice that it’s the free Community edition, since we won’t care much about scaling and the other features delivered by the Enterprise edition. When the installation process completes, you should see the following window:



Notice that we could run the server with the default (empty) database, but instead I suggest you download an example dataset from the official site. This will allow us to build our first queries immediately instead of inserting the data first (which could take way too much time). The dataset I’m going to use throughout the whole series is this one. Once you have it, simply unzip the file and select it before running the server. One more thing before we move forward: an error might occur during the selection:


Failed to start Neo4j with an older data store version. To enable automatic upgrade, please set configuration parameter “dbms.allow_format_migration=true”


Well, the error is self-explanatory, so all you need to do is locate the configuration file and add the displayed line at the end. If you’re not sure where the configuration file is, you can check it here.


Exploring dataset

Assuming that you’ve got the correct dataset and the server is running without any trouble, we can finally move on to querying Neo4j. Obviously, before that, we need to know what kind of data we have. To explore it, we need to log in to the running server. By default, the address and credentials are:

server: http://localhost:7474
login: neo4j
password: neo4j
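If you prefer typing to clicking, a rough equivalent of the database overview can be requested with Cypher itself. The sketch below assumes a Neo4j 3.x (or newer) server, where the built-in procedures db.labels() and db.relationshipTypes() are available; run each statement separately in the browser's query editor.

```cypher
// List every node label present in the database (e.g. Movie, Person)
CALL db.labels();

// List every relationship type (e.g. ACTS_IN, DIRECTED)
CALL db.relationshipTypes();
```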

Now, let’s explore the data by clicking the database icon on the side menu and then clicking once on the “Movie” label. The following screen presents the result of these actions:



Notice that Neo4j already generated a query to present the data. What we see here are nodes that have the “Movie” label (the label itself is not displayed on the node). As I described in the introduction, each node may have a label and a set of properties. So, where are the properties of each movie? If you take a closer look, you can see that when you click on a node, some of them appear below the graph. There’s also a little triangle icon that shows the entire set of properties:



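The properties panel has a query counterpart as well. A minimal sketch, assuming the movie nodes store their title in a property called title (as the filters later in this post do): the built-in keys() function returns the names of all properties set on a node.

```cypher
// Show the title of "Avatar" together with the names of all its properties
MATCH (m:Movie {title: "Avatar"})
RETURN m.title, keys(m)
```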
So we know how to explore the nodes, but the whole beauty of graph databases lies in the relationships. Where are they? It turns out that clicking on a node reveals its connections with other nodes. Here’s an example of the graph generated after clicking on the “Avatar” movie:



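Expanding a node by clicking it can also be expressed as a query. This sketch matches every relationship attached to “Avatar” in either direction (the pattern -[r]- has no arrow) and returns both ends together with the relationship, so the browser can draw a similar neighborhood:

```cypher
// Every node directly connected to "Avatar", regardless of direction or type
MATCH (m:Movie {title: "Avatar"})-[r]-(other)
RETURN m, r, other
```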
As shown, “Avatar” has two types of relationships with “Person” nodes: ACTS_IN and DIRECTED. If you remember, like nodes, each relationship can also have a set of properties. Let’s see whether this example has some:



Yes, the ACTS_IN relationship has a name property which describes the character in the movie. Therefore, we can deduce that the actor Sam Worthington played the Jake Sully character in the “Avatar” movie. Of course, the database creator could also have created a dedicated “Character” node, but that’s just another approach. So far, so good! I guess we can now create our first query to retrieve some data!


Creating the first Cypher query

We’ve already explored the “Avatar” movie so let’s stick to that one. Let’s say that we’d like to get all the actors from that movie. The query looks as follows:

MATCH (a:Actor)-[:ACTS_IN]->(:Movie {title: "Avatar"})
RETURN a

Before running it, let’s discuss the code. The query starts with the MATCH keyword, which says that we want the data matching the pattern that follows. Then this fancy ASCII art begins. In Cypher, nodes are represented by “()” and relationships by “-[]->”. Let’s take a look at the first part of the query:

(a:Actor)

We can spot three things here. The first is that this “being” is a node, since it’s surrounded by parentheses. Inside them, we have two pieces of text separated by a colon. The letter “a” is a variable that represents the whole node. The second part is the name of the node’s label. So, up to this point, our query says:

Give me all actors
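Each partial pattern is already a valid query. For instance, the first part alone, with a RETURN clause added, returns all nodes labeled Actor. The LIMIT is my addition, only there to keep the browser responsive on a large dataset:

```cypher
// All nodes carrying the Actor label, capped at 25 rows
MATCH (a:Actor)
RETURN a
LIMIT 25
```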

Moving on to the second part, we have this code:

-[:ACTS_IN]->

This one represents a relationship, and it also contains text separated by a colon, but there’s nothing on the left side. Why? Well, it’s pretty simple: neither a variable nor a label (for relationships, a type) is required inside a node or relationship declaration. Variables give us access to the matched object later in the query, while labels and types are used for filtering. So, if you don’t need one, just don’t write it; simple as that. Our job is to retrieve actors, so we don’t need access to the relationship later. That’s why we didn’t declare a variable and left the left side empty. The right side is once again a name, this time the relationship type ACTS_IN. Up to this point, our query says:

Give me all actors who act in
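Likewise, “all actors who act in” something can be run on its own by closing the pattern with an empty node (). Since one actor usually plays in many movies, DISTINCT removes the duplicate rows; again, the LIMIT is just a convenience I added:

```cypher
// Actors with at least one outgoing ACTS_IN relationship to any node
MATCH (a:Actor)-[:ACTS_IN]->()
RETURN DISTINCT a
LIMIT 25
```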

Here comes the last part of the pattern:

(:Movie {title: "Avatar"})

Once again, we didn’t declare a variable, since we don’t need data about the movie. Notice that besides the “Movie” label, we added a JSON-like map inside. It acts as a filter telling Cypher that we are only interested in movies whose title is “Avatar”. Of course, you can filter on as many properties as you want. One more thing: if you don’t like this syntax, you can use a more SQL-like WHERE clause, which in our case would look like this:


MATCH (a:Actor)-[:ACTS_IN]->(m:Movie)
WHERE m.title = "Avatar"
RETURN a


This one might be clearer for some of you, but it requires declaring an additional variable (m). So the final pattern is:

Give me all actors who act in the movie titled Avatar

The last line specifies what data we want to retrieve. We wanted the whole “Actor” node, so we return the variable a. Let’s run the query:



We have the correct result, nice! If we’d like to return only the names of the actors, the only thing to change is the RETURN statement:


MATCH (a:Actor)-[:ACTS_IN]->(:Movie {title: "Avatar"})
RETURN a.name AS name


Notice that, as in SQL, we use AS to change the name of the column.
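When we do need the relationship later, we simply give it a variable. A sketch building on the property we saw while exploring: binding ACTS_IN to r lets us return the character name stored on the relationship next to the actor's name. It assumes actor names are stored in a name property, and the column names actor and character are arbitrary:

```cypher
// Actor names and the characters they play in "Avatar"
MATCH (a:Actor)-[r:ACTS_IN]->(:Movie {title: "Avatar"})
RETURN a.name AS actor, r.name AS character
ORDER BY actor
```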


Presenting the optionality of labels/variables

In the previous section, I wrote that the text inside nodes and relationships is optional. Before you leave me today, I want to make that clear. If you go back to the “Exploring dataset” section, you can spot that “Avatar” also has a DIRECTED connection. So let’s modify our job: we want to retrieve every person who has something to do with this movie. It might be acting, directing or anything else. We don’t want to specify it, and we also won’t specify what kind of nodes we are interested in, so it could be an actor, a director, staff and so on. Let’s see the query:


MATCH (p)-[]->(m:Movie {title: "Avatar"})
RETURN p


I hope it’s clear now. The pattern specifies neither the relationship type nor the label of the p node. Here’s the result:


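To find out what we actually matched, we can bind the relationship to a variable and ask Cypher about it. The sketch below uses the built-in labels() and type() functions to show each person's labels and the type of their connection to “Avatar”; it assumes person nodes have a name property, and the aliases are arbitrary:

```cypher
// Who is connected to "Avatar", what kind of node they are, and how they are connected
MATCH (p)-[r]->(:Movie {title: "Avatar"})
RETURN p.name AS name, labels(p) AS nodeLabels, type(r) AS relType
```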
That’s all for today! We know how to install Neo4j and explore its data, and we also have some basic knowledge about querying the database. In the next part, we’ll introduce more advanced examples together with other Cypher features.
