First of all, please forgive me, if this question is no fit for stack overflow- I honestly wouldn't know who or where else to ask.
I have the following situation: in the transactional system we have a table with business processes that get executed by users. These process might be something like "decommission car with ID xzy". The process itself involves arbitrary tasks, that might or might not be needed to execute to fulfill the process.
Now, in the Cassandra DB we want to setup a monitoring of the processes. That is, number of processes per day, number of open processes, number of canceled processes and most important: in what step was the process stopped, canceled or put on hold. Also the amount of time it took for a step until it was finished is of interest.
Please be aware, that all these processes and steps are manually done and reported back in the system.
Now I'm wondering how to best model this.
I'm thinking about a supercolumn with the process id (stored as uuid) and information about the job/process itself plus rows for the steps each with the step id (UUID as well) as column key.
Would that be the right model?
I'm also unsure about the time values. I have a start and end time for the overall process plus start and end times for each step. Given the fact, that most queries will be time based, as in "how many decommission jobs were started today?", the time values are of vital importance . Should I use a secondary index or is a table with the time as column key a better fit?
Maybe someone can guide me in the right direction as I am still quite new to Cassandra