# Data & AI 6 Cloud Advice
### What is the source of your streamed data?
Sensor data: a reactor can have many sensors (pressure, temperature, flow meter, etc.); this data is generated by the acid operations.
Time series data.
It needs to be viewable in real time, with at most a 4-5 second delay.
There is also laboratory data used for analysis; this data is stored in a local database in the lab.
Also records from the maintenance team, who track downtime and the operational status of equipment to keep the availability of the production line as high as possible.
**Note**: this answer covers more than just the streamed data.
### What does your company do?
An oil and gas company that processes raw materials into chemicals.
Something with reactors; downtime is very costly, so they need to be up all the time.
**Note**: hard to follow what he said here. Acid or assets?
Lab Information ... System.
It stores all the information about the inputs of the process, and based on this you can adjust the parameters in the process to get the most out of the reactor performance.
With this you can also check the quality of the outputs.
The goal is to achieve the production target for the week, month, and year.
We need to monitor the acids in the reactors; if the maintenance team sees that something is not right, they can make adjustments.
### Clarification on what type of data we plan on working with.
We want to store the data in the cloud for future analysis.
SEP system.
Based on the number of operating hours we can schedule...
Time series data that comes from the sensors.
And analytical data that comes from the laboratories.
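The two data types above could be modelled as simple records; a minimal sketch, where the field names, sensor IDs, and units are illustrative assumptions rather than anything the company specified:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class SensorEvent:
    """One time-series reading from a reactor sensor (~1 event per second)."""
    sensor_id: str    # e.g. "R1-TEMP-01" -- naming scheme is an assumption
    reactor_id: str
    timestamp: datetime
    value: float      # sensor values are floats per the interview

@dataclass
class LabResult:
    """One analytical result entered by the lab (~40 samples per day)."""
    sample_id: str
    analysed_at: datetime
    measurements: dict[str, float]  # ~20 variables per the interview

event = SensorEvent("R1-TEMP-01", "R1",
                    datetime(2024, 1, 1, tzinfo=timezone.utc), 301.5)
print(event.reactor_id)  # R1
```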
### Would we be provided with the data, or do we have to mimic it?
We need to mimic the data.
### Is the central information in the system the reaction?
Yes, we can say that.
The main process is the reaction, which involves the reactors and the real-time data generated during the process.
All of the other systems are support systems.
### Who will be using this data and how?
Rephrased to: who is the end user that will be using this data?
3 groups.
- Production managers
Mostly interested in achieving the production goals.
A specific amount of cubic liters per month.
They are mainly interested in making the production process as efficient as possible, not only in terms of output but also in the use of chemical substances, energy, cost of people, etc.
- Maintenance team
They are more interested in the availability of the assets; they want to monitor the reactor as closely as possible to make sure its uptime is maximized.
- Operators in the plant
Literally in the plant, next to the reactor.
Checking if the operation is going as expected, in real time.
Checking, adjusting parameters.
Real time decisions with real time data.
### More insight into the lab system: how does it work, how do scientists get data, and what kind of insights does it give them?
Not an expert answer.
Inside the Lab, there is some equipment.
The operation team provides samples, and they need to be analysed at a rate of e.g. 40 samples per day.
The samples need to be analysed and the results returned by 7:00 A.M. the next day.
The results are entered through a user interface by the lab workers, with the sample ID and results.
Manually stored.
### So to be clear: to adjust the parameters of the reactors, you look at this sample data and the reactors?
Yes.
### What is the solution; what do you expect us to help you with?
Our focus should be on the architecture and how it could be implemented, more than on the actual implementation.
We can propose how to store this data in the cloud.
### Do the end users need a live dashboard, or email alerts for the data?
Up to us.
They would prefer that the end users have something they can access directly through the browser.
Quickly accessible.
E.g. Tableau or Power BI.
### Reporting part.
We don't need anyone to be notified daily with the reports.
But at the same time, people do need to be notified at the time of an event.
If the maintenance team is observing 3 specific parameters and certain events happen with specific sensors, an event should be triggered and the workers should be notified.
Email would be sufficient.
### Should the values that trigger the notifications be manually selected or decided by a machine learning algorithm?
Both?
He misunderstood the question a bit, but if we can predict that an event will happen with machine learning, that would be perfect.
Currently, they only get notified when a certain parameter reaches a specific value.
### About moving data to the cloud, have they worked with anything like this before?
### How big is the streamed data?
1 event per second.
Float data.
~86,400 events per day (one per second).
30 thousand variables?
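A back-of-envelope volume estimate from the figures above; the 30,000-variable figure was noted with a question mark, so it is treated here as an assumption to be confirmed, as is the raw 16-bytes-per-event encoding:

```python
# Back-of-envelope data volume estimate from the interview figures.
# 30,000 variables is an unconfirmed assumption; so is the raw encoding.

EVENTS_PER_DAY = 24 * 60 * 60   # 1 event/second -> 86,400 per day
VARIABLES = 30_000
BYTES_PER_EVENT = 8 + 8         # float64 value + 64-bit timestamp, uncompressed

daily_events = EVENTS_PER_DAY * VARIABLES
daily_bytes = daily_events * BYTES_PER_EVENT

print(f"{daily_events:,} events/day")          # 2,592,000,000 events/day
print(f"~{daily_bytes / 1e9:.1f} GB/day raw")  # ~41.5 GB/day before compression
```

Even at this rough scale, the numbers suggest a purpose-built time-series store with compression rather than a generic relational database.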
### How many variables are within the lab system?
20 variables.
### How do they currently store data in the legacy system?
Currently there is one on-premise system, which collects most of the data from the sensors, but the problem is that there are different ways of collection.
Different communication protocols, which means a single way of combining the data can't be used.
He would like to have a sensor database.
### How is the data currently set up?
Currently in Excel; not maintainable, not scalable, and integrity is in question.
**Note**: I missed the question, not sure what he was talking about exactly.
I think it was about the visualizations; not entirely sure, but it doesn't really matter much.
### What kind of prediction are you interested in?
Prediction, or adjustment of parameters?
Currently interested in anomaly detection.
Also the root cause of the anomaly.
Which variables triggered this anomaly.
### How do you define an anomaly?
Currently there are 3-4 variables for which they define boundaries (limits).
If a variable goes outside of its specified boundaries, it gets flagged as an anomaly.
But instead of relying purely on these boundaries, they would like a system that, based on certain algorithms, can detect that something is not going well in the process.
If the process becomes less efficient or fails, that is an anomaly.
They are mostly interested in the anomalies that lead to failure of the system/equipment, so that action can be taken.
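One simple algorithmic alternative to fixed boundaries is to flag a reading that deviates strongly from its own recent history. This is only a sketch; the window size and the 3-sigma threshold are illustrative choices, not something the company specified:

```python
# Sketch: flag a reading as anomalous when it deviates from the rolling
# mean by more than `threshold` standard deviations.
# Window size and threshold are illustrative assumptions.
from collections import deque
from statistics import mean, stdev

class RollingAnomalyDetector:
    def __init__(self, window: int = 60, threshold: float = 3.0):
        self.values: deque[float] = deque(maxlen=window)
        self.threshold = threshold

    def is_anomaly(self, value: float) -> bool:
        flagged = False
        if len(self.values) >= 2:
            mu, sigma = mean(self.values), stdev(self.values)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                flagged = True
        self.values.append(value)
        return flagged

det = RollingAnomalyDetector(window=30)
readings = [300.0 + (i % 3) * 0.1 for i in range(30)] + [350.0]
flags = [det.is_anomaly(v) for v in readings]
print(flags[-1])  # only the 350.0 spike is flagged
```

A production approach would likely be multivariate (to support the root-cause question above), but the same idea of "deviation from learned normal behaviour" applies.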
### Should the prediction be made on sensor data, or with lab data from the previous day?
The lab data can also be an important contributor: if we analyse the input materials, we could predict their effect on the outcome.
### Currently the data visualizations, made in Excel, use raw data.
So the data needs to be pre-processed before it is sent to the cloud, so that certain calculations can be made.
Averaging, filling missing values, etc.
Once we have the raw data, we need to transform it in a way which is easily usable for the end user.
The variables should be linked to the system that they belong to.
So that the end user could easily check the variables of a specific reactor for example.
Maybe a diagram, clear structure, something like that.
**Note**: I think he's talking about something that we saw with Lanxess, the diagram of the entire process with clickable/observable variables. Not sure.
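The contextualization idea above amounts to linking each variable to the asset it belongs to, so an end user can drill down per reactor. A minimal sketch, where the hierarchy levels and names are illustrative assumptions:

```python
# Sketch of contextualization: an asset hierarchy that maps each variable
# to the system it belongs to. Names and levels are illustrative assumptions.

HIERARCHY = {
    "plant": {
        "reactor_1": ["R1-PRESSURE", "R1-TEMP", "R1-FLOW"],
        "reactor_2": ["R2-PRESSURE", "R2-TEMP", "R2-FLOW"],
    }
}

def variables_for(reactor: str) -> list[str]:
    """Return all variables attached to one reactor."""
    return HIERARCHY["plant"][reactor]

print(variables_for("reactor_1"))  # ['R1-PRESSURE', 'R1-TEMP', 'R1-FLOW']
```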
### Will the anomalies between reactors be the same?
They will be different.
Reactors have different components, different operation times.
The variables and parameters are the same, but they should be treated differently when modelling.
### Overview of what solution they expect.
Flow:
Data is generated on site (in the plant, on the on-premise systems).
It's sent to the cloud through an interface that is already present; we don't have to think about that.
We have to propose which applications and which services are best to serve the real-time data.
Architecture, solution, and why.
After we get the raw data into the cloud in the correct form, it should be presented in an understandable way for the end user.
Dashboard reporting, depending on the audience.
They just need an explanation on what kind of technology we suggest them to use.
### When can we have access to the data, or a sample?
We have to come up with our own data.
For oil and gas there are sample datasets on Kaggle.
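Since the data has to be mocked, a minimal generator for plausible sensor readings: a baseline with small noise and an occasional spike. The baseline, noise level, and spike probability are all illustrative assumptions:

```python
# Sketch of a mock sensor stream: baseline + Gaussian noise + rare spikes.
# Baseline, noise level, and spike probability are illustrative assumptions.
import random

def mock_stream(n: int, baseline: float = 300.0, seed: int = 42) -> list[float]:
    """Generate n one-per-second readings around a baseline."""
    rng = random.Random(seed)   # seeded for reproducible mock data
    readings = []
    for _ in range(n):
        value = baseline + rng.gauss(0, 0.5)   # normal sensor noise
        if rng.random() < 0.01:                # ~1% chance of an anomaly spike
            value += 50.0
        readings.append(value)
    return readings

stream = mock_stream(100)
print(len(stream))  # 100
```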
Due to NDA reasons they can't explain the processes.
### Closing notes
We assume that the interface between the on premise systems and the cloud exists.
Once we have the data in the cloud, some contextualization is needed, to structure the data in a sort of hierarchy for the end user.
Once it's organized in a good way, find a way to provide the reports/dashboards to the 3 groups.
Some groups need real-time data, some don't.
From us they expect:
- Services, applications, architecture.
- A defence of why these options are the most viable.
- Advantages / disadvantages.
### After meeting discussion
We didn't ask about the context and the situation within the company, or why they are seeking change.
We didn't ask about James himself: his title, position, expertise, etc.
In a real-life scenario, the person we're talking to will most likely have no technological knowledge at all.