
Neo4j & Cypher #2: Creating a complex query

In the previous post, we explored a Neo4j dataset containing actors and movies. We also got familiar with the basics of Cypher, a declarative query language created for graph databases. In today’s post, we’ll find out how to create a more complex query consisting of several relationships and nodes.

 

What are we going to retrieve?

Let’s start by defining our task. Imagine that someone asked us to build a query that answers the following question:

I want to know whether actors Wes Studi and Matt Gerald played together in some movie. If so, I’d like to know its title, as well as the name of the director together with his other productions.

This one might seem complicated but soon you’ll see that for Cypher it’s not such a big deal.

 

Creating a query

To make the whole process easier to understand, I’m going to split the command into two parts. The first part answers whether the mentioned actors have ever played together in some movie. The code below presents that part:

 


MATCH
(:Actor {name: "Wes Studi"})-[:ACTS_IN]->(m:Movie)<-[:ACTS_IN]-(:Actor {name: "Matt Gerald"})
RETURN m.title
LIMIT 1

 

This one is already more complicated than the example from the previous post, since it contains two relationships instead of one, but it’s not that hard to understand. Let’s start reading from the left side. We defined a node with the Actor label but with no variable. That’s because we weren’t asked for any information about the actor. The node contains a filter which says that we’re looking for the actor named Wes Studi. Next, we’ve got a relationship of the ACTS_IN type, which also has no variable defined. Notice the direction of the relationship, which goes from the actor to the movie. Then comes the Movie node, this time with the m variable, which gives us access to its data in the RETURN clause.

Now, take a look at what comes next. The rest of the pattern is almost identical to the part we’ve just discussed. The only difference is the filter, which this time restricts actors to the one named Matt Gerald. What’s also really important is, once again, the direction of the relationship: this time it is written from the right (Actor) to the left (Movie), so keep in mind that it matters.

In the RETURN clause, we retrieve only the movie title instead of the whole node. A new part comes right after that. Since our job was to check whether the mentioned actors have ever played in the same movie, we simply limit the result to just one row. For this purpose, the LIMIT keyword was added at the end of the query. Let’s run the above code on the Neo4j server:
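
Before looking at the result, a side note on the filter syntax. The {name: ...} map inside a node pattern is shorthand for an equality filter, so the same question can also be asked with a WHERE clause. This is a minimal sketch, assuming the same dataset; the variables a and b are introduced here only so that WHERE can refer to the actor nodes:

```cypher
// Same pattern as above, with the name filters moved into WHERE.
// a and b are illustrative variable names, not part of the original query.
MATCH (a:Actor)-[:ACTS_IN]->(m:Movie)<-[:ACTS_IN]-(b:Actor)
WHERE a.name = "Wes Studi" AND b.name = "Matt Gerald"
RETURN m.title
LIMIT 1
```

Both forms should return the same title; the inline map is just more compact when filtering on exact values.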

 

 

And we’ve got it! Wes Studi and Matt Gerald played together in Avatar (surprised? :D). Now we come to the second part. Having the title, we’d like to add information about the director together with his other productions. But here comes the question: how do we extend the query with new restrictions if we’ve already defined two relationships pointing to the “Movie” node? Our code is not a whiteboard on which we can draw another line from above or below. Cypher has a very simple answer to that question: use a comma. Let’s see what it looks like:

 


MATCH
(:Actor {name: "Wes Studi"})-[:ACTS_IN]->(m:Movie)<-[:ACTS_IN]-(:Actor {name: "Matt Gerald"}),
(m)<-[:DIRECTED]-(d)-[:DIRECTED]->(others)
RETURN m.title, d.name, collect(others.title) AS productions
LIMIT 1

 

All right, what actually happened? As you can see, after the comma we defined the other part of the pattern, which starts from the node with the m variable. Does it mean that we can define multiple variables with the same name? No, by this we told Cypher that this is the same Movie node from the first part and we just want to add some more restrictions. Next, we have a DIRECTED relationship which comes from the node described by the d variable. Why didn’t we specify the label explicitly? Because we didn’t have to. As you remember, labels, properties and variables are optional, so if you don’t have to specify them, just don’t. In this dataset a movie is always directed by a director, so there won’t be any confusion for Cypher. In other words, we didn’t add the Director label because we were certain that the node at the start of a DIRECTED relationship must be a Director node. It’s the same story with the rest of the pattern. Remember that besides the name of the director we also need to retrieve his other productions, so we added another DIRECTED relationship which points to the node with the others variable. Once again there’s no label because this node must be a Movie. We cannot direct an actor or a person. So, to make it clear, here’s how we should read the entire query:

Give me a movie in which actors Wes Studi and Matt Gerald played together. Having this, give me also the name of the director and titles of other movies that he directed.
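
For comparison, the same reading can be written with every label spelled out. This is only a sketch: it assumes that director nodes in this dataset really carry the Director label mentioned above and that the other productions are labeled Movie. If the dataset labels them differently, the version without labels is the safer one.

```cypher
// Same query with explicit labels on d and others.
// The Director label is an assumption taken from the prose above.
MATCH
(:Actor {name: "Wes Studi"})-[:ACTS_IN]->(m:Movie)<-[:ACTS_IN]-(:Actor {name: "Matt Gerald"}),
(m)<-[:DIRECTED]-(d:Director)-[:DIRECTED]->(others:Movie)
RETURN m.title, d.name, collect(others.title) AS productions
LIMIT 1
```

Under those assumptions, adding the labels should not change the result; it only makes the intent of each node explicit to the reader.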

The last thing we are going to discuss is the collect function in the RETURN clause. It creates a list from the given set of values. In our case, it takes the titles of all the director’s other movies and puts them into one list. Let’s run the query to see the result:

 

 

So we have an answer: the movie in which Wes Studi and Matt Gerald played together was Avatar, directed by James Cameron, who also directed The Abyss, Aliens, Titanic, Terminator 2: Judgment Day and The Terminator.
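
To see what collect changes, it may help to compare with a version that returns others.title directly. This is a sketch, assuming the same dataset; LIMIT is dropped here, because otherwise only the first row would come back:

```cypher
// Without collect(): one result row per other movie of the director.
// For a single shared movie with one director, every row repeats
// the same m.title and d.name.
MATCH
(:Actor {name: "Wes Studi"})-[:ACTS_IN]->(m:Movie)<-[:ACTS_IN]-(:Actor {name: "Matt Gerald"}),
(m)<-[:DIRECTED]-(d)-[:DIRECTED]->(others)
RETURN m.title, d.name, others.title
```

With collect, the non-aggregated items m.title and d.name act as grouping keys, and the titles within each group are gathered into a single list. That is why the query with collect returns one row per movie and director instead of one row per title.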
