Where does Apache Cassandra store JanusGraph vertexes?
This is by no means an expert question: I'm a newbie and I know it, and I'm rusty in some areas, too.
I'm a beginner with JanusGraph, with Cassandra, and at asking questions on Stack Overflow.
I have nowhere near as much experience as expert engineers and developers, or as experienced, consistent Stack Overflow authors.
Reproduction Steps:
- Create and start a Cassandra Docker container:
docker run --name jg-cassandra -d -e CASSANDRA_START_RPC=true -p 9160:9160 -p 9042:9042 -p 7199:7199 -p 7001:7001 -p 7000:7000 cassandra:3.11
- Create and run this Java + Maven project (code below)
- Results
  - Expected: something like OrientDB or Neo4j, with a table of vertexes to read from
  - Actual: no table with a name like V or vertex was found
Cassandra Docker container terminal
cqlsh> desc keyspaces;
system_schema system system_distributed
system_auth janusgraph system_traces
cqlsh> use janusgraph;
cqlsh:janusgraph> desc tables
edgestore_lock_ graphindex_lock_ janusgraph_ids
txlog systemlog graphindex
edgestore system_properties_lock_ system_properties
Log4j2 STDOUT logs
2023-05-09 11:01:53,970 [INFO] [c.d.o.d.i.c.ContactPoints.main] :: Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 11:01:54,072 [INFO] [c.d.o.d.i.c.DefaultMavenCoordinates.main] :: DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.15.0
2023-05-09 11:01:54,652 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] :: Using native clock for microsecond precision
2023-05-09 11:01:54,967 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] :: [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/[0:0:0:0:0:0:0:1]:9042, hostId=null, hashCode=7b9f753a)=null; please provide the correct local DC, or check your contact points
2023-05-09 11:01:55,209 [INFO] [o.j.g.i.UniqueInstanceIdRetriever.main] :: Generated unique-instance-id=c0a8563c1700-rmt-lap-win201
2023-05-09 11:01:55,231 [INFO] [c.d.o.d.i.c.ContactPoints.main] :: Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 11:01:55,265 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] :: Using native clock for microsecond precision
2023-05-09 11:01:55,322 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] :: [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=1dc6cab9)=null; please provide the correct local DC, or check your contact points
2023-05-09 11:01:55,341 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] :: Initiated fixed thread pool of size 40
2023-05-09 11:01:55,447 [INFO] [o.j.g.d.StandardJanusGraph.main] :: Gremlin script evaluation is disabled
2023-05-09 11:01:55,473 [INFO] [o.j.d.l.k.KCVSLog.main] :: Loaded unidentified ReadMarker start time 2023-05-09T16:01:55.473310Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@ff23ae7
Process finished with exit code 0
Code
Main.java (simplified)
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphVertex;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;

public class Main {
    public static void main(String[] args) {
        // Connect to the Cassandra container through the CQL storage backend
        JanusGraph janusGraph = JanusGraphFactory.build()
                .set("storage.backend", "cql")
                .set("storage.hostname", "localhost:9042")
                .open();

        // Define (or reuse) the "_id" property key in the schema
        JanusGraphManagement janusGraphManagement = janusGraph.openManagement();
        PropertyKey propertyKey = janusGraphManagement.getOrCreatePropertyKey("_id");
        janusGraphManagement.commit();

        // Add two vertices, each with a "test" property, committing after each one
        JanusGraphVertex janusGraphVertex = janusGraph.addVertex();
        janusGraphVertex.property("test", "test");
        janusGraph.tx().commit();

        janusGraphVertex = janusGraph.addVertex();
        janusGraphVertex.property("test", "test2");
        janusGraph.tx().commit();

        janusGraph.close();
    }
}
pom.xml snippet
<dependencies>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-slf4j2-impl</artifactId>
        <version>2.20.0</version>
    </dependency>
    <dependency>
        <groupId>org.janusgraph</groupId>
        <artifactId>janusgraph-cql</artifactId>
        <version>1.0.0-20230504-014643.988c094</version>
    </dependency>
</dependencies>
log4j2.xml (relevant part)
<Configuration>
    <Appenders>
        <Console name="STDOUT" target="SYSTEM_OUT">
            <PatternLayout>
                <Pattern>%d [%p] [%c{1.}.%t] ::	 %m%n</Pattern>
            </PatternLayout>
        </Console>
    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="STDOUT"/>
        </Root>
    </Loggers>
</Configuration>
Solution:
As you noticed, JanusGraph creates several tables when it starts up. All of the primary graph data (vertices, their properties, and their edges) is stored as wide rows in the edgestore table. However, the keys and values in these tables are serialized binary blobs, so you will not be able to query them meaningfully from CQL.
The way that the edgestore table is constructed is discussed here
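To make the "wide rows" idea concrete, here is a plain-Java sketch of the concept. It is a conceptual model only, not JanusGraph's actual serialization: the class name EdgestoreModel, the "p:", "e:out:" and "e:in:" column names, the edge label, and the vertex ids are all hypothetical, and real JanusGraph keys and columns use its own compact binary encodings. What it tries to show is that there is no separate vertex table: each vertex is one row keyed by its id, its properties and edges are columns in that row, and an edge is recorded in the rows of both endpoints (as JanusGraph does for ordinary edges). It also renders row keys the way cqlsh prints a blob, which is why the data looks opaque.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Conceptual model of an edgestore-like wide-row table.
// NOT JanusGraph's real on-disk format; column names and ids are hypothetical.
public class EdgestoreModel {

    // Row key (vertex id) -> sorted columns (relation -> value).
    private final SortedMap<Long, SortedMap<String, String>> rows = new TreeMap<>();

    private SortedMap<String, String> row(long vertexId) {
        return rows.computeIfAbsent(vertexId, id -> new TreeMap<>());
    }

    // A property becomes one column in its vertex's row.
    public void addProperty(long vertexId, String key, String value) {
        row(vertexId).put("p:" + key, value);
    }

    // An edge is recorded in the rows of both endpoints, so either vertex
    // can read its adjacency list with a single row lookup.
    public void addEdge(long outId, String label, long inId) {
        row(outId).put("e:out:" + label + ":" + inId, "");
        row(inId).put("e:in:" + label + ":" + outId, "");
    }

    // All relations of one vertex come from one row; no "vertex table" is needed.
    public SortedMap<String, String> relationsOf(long vertexId) {
        return rows.getOrDefault(vertexId, new TreeMap<>());
    }

    public int rowCount() {
        return rows.size();
    }

    // Render a row key the way cqlsh prints a blob column ("0x" + hex).
    // Simplification: a plain 8-byte big-endian long, not JanusGraph's id encoding.
    public static String asCqlBlob(long vertexId) {
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : ByteBuffer.allocate(Long.BYTES).putLong(vertexId).array()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        EdgestoreModel store = new EdgestoreModel();
        // Mirrors the question: two vertices with a "test" property (ids are made up).
        store.addProperty(4096L, "test", "test");
        store.addProperty(8192L, "test", "test2");
        // Hypothetical edge, not part of the question's code.
        store.addEdge(4096L, "knows", 8192L);
        for (Map.Entry<Long, SortedMap<String, String>> e : store.rows.entrySet()) {
            System.out.println(asCqlBlob(e.getKey()) + " -> " + e.getValue());
        }
    }
}
```

Running main prints one line per row, each keyed by a hex string. In the real edgestore, cqlsh likewise displays binary keys, columns, and values as 0x… hex, so reading vertices back is done through JanusGraph (for example, a Gremlin traversal) rather than through CQL.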