:sparkles::mag::sparkles:**Data Study Group Feedback Presentations**:sparkles::mag::sparkles:

:clock1::clock9:**Agenda**:clock9::clock1:
10:05am - Opening remarks
10:10am - CRUK Cambridge Institute
10:25am - The University of Strathclyde & Supergen Energy Networks Hub
10:40am - CatsAi
10:55am - Greenvest Solutions

**Questions for discussion later today**
There will be a Q&A session (for event delegates only) from 11:45 - 12:45 today. Please record any questions or comments you may have here, to be discussed during this session. Please also record your name so we can invite you to ask your question during the Q&A.

:microscope::hospital:**CRUK**:hospital::microscope:
* Congratulations on the fantastic talk, everyone!
* Really nice slides! If you had more time and resources, what would you have liked to try?
* Ultimately, should these networks inform new experiments? Which experiment would you recommend?
* Thanks for a nice talk! I missed the part about how you incorporated expert knowledge in your BN. Is it in the topology or some other aspect? Thanks!
* Ultimately we were going for a hybrid approach that combines the data-driven and knowledge-based approaches. We tried to elicit knowledge from domain experts and also checked the GeneCards ontology to find the semantic connections between genes, e.g. activation, inactivation. We then added the connections by whitelisting them. This improved the loss function in some models but deteriorated it in others (Sam and Ali).
* Really interesting approach with the CNN; would you mind explaining a bit more about the transformation from count data to images?
* If you had another week to work on the challenge, what would you do? Oh, and amazing presentation!
* How have you evaluated the performance of your models? To what extent are you able to explain/predict the gene interactions?
* (Sami) Loved the slides and the work! Really cool approach with the Bayesian network. Do you mind sharing how you chose the distributions for each node in the network? How did you encode the interactions? Thanks!
* Great question, Sami! :) We chose to go with a discrete BN, in which the "response" of each gene is discretised into 3 classes: no response, suppressed or enhanced. The global and local probability distributions of a discrete BN are taken to be multinomial. (Heckerman et al., 1995 and https://arxiv.org/pdf/1906.06513.pdf) (Sam and Ali)
* (Sami): Thanks for answering. Is this domain knowledge? Can a gene "suppress" or "enhance" on a continuous scale?
* We calculated the relative differential expression of the genes. It was continuous originally, but we discretised it to simplify the structure learning process (Ali).
* (Sami): You mentioned you ranked the BNs based on their loss; how did you calculate the loss? Was this part of the BN "structure search" (i.e. the BIC score)?
* It was part of the cross-validation result after learning the BN structure. You can find more explanation [here](https://www.bnlearn.com/documentation/man/bn.cv.html) and in [this video](https://www.youtube.com/watch?v=ZMmWeB2Tndg&list=PLaBRsf7CT_e29M39lsHHdvn4H2ebNWUwk&index=3&ab_channel=AaronSmith) (Ali and Sam).

:battery::bulb:**The University of Strathclyde & Supergen Energy Networks Hub**:bulb::battery:
* Brilliant talk, everyone!
* Thanks! :smiley:
* Is the power failure cascade similar to other 'critical failing' phenomena, e.g. sand/landslides or stock market crashes? If so, could we import some models or learn from those cases? If not, why not? Thanks!
* This is correct; power grid cascades indeed have a similarity with the 'critical failing' phenomenon seen in other areas. There are methods proposed in academia that characterise this critical failing using techniques from non-linear dynamical systems, and these have been applied to power systems successfully. However, they cannot predict the loss of a single component or a few components; they can only predict entire-system instability, which is very rare. The methods presented today predict both scenarios, making them more practical.
* What kind of action could be taken to avoid a cascading effect upon detecting that it would occur? And how fast should intervention be to be effective?
* The identification of the control measures that could be taken to avoid the failure is a hard question, as it depends on the 'controllability' of the system. This is an entirely different ball game from the 'observability' of the grid, which is what we worked on. We cannot identify the 'controllability' of a system from datasets generated using non-controlled simulation, which is what we had in our case.
* The intervention depends on the severity of the event. In some cases, the cascading may be initiated < 1 second after the disturbance, and control for these scenarios cannot really be triggered in time. In other cases, the cascading could start around 30s after the disturbance, and this allows us to perform controls.
* When you normalised the features, did you normalise across all simulations (train+test) or just the training-set simulations?
* We normalised across the entire dataset (train+test).
* How transferable are your models to other power network topologies?
* This is a great question and an active area of research. At present, the methods did not exploit the network nature of the underlying system. We hope that graph-based techniques can help with generalising.
* Is there any 'spatial'/connectivity information that could be useful? E.g. nearby (geographically or electrically connected) buses or devices might be the first to trigger/fail following the first failure.
* Yes! This information will be useful to predict the actual sequence of the failed devices. This is a future direction that the PIs can explore further.
* Does the binary classification allow identifying where the failure will happen? Or where (at which node) should measures be taken to avoid the cascading effect? [Eduardo]
* The binary classification we did was for classifying fail/non-fail cases. Since we selected only the data before the first failure occurred, we were able to predict failures 0.5 seconds before they occurred (Maryleen).
* The identification of the node where the control measures could be taken to avoid the failure is a hard question, as it depends on the 'controllability' of the system. This is an entirely different ball game from the 'observability' of the grid, which is what we worked on. We cannot identify the 'controllability' of a system from datasets generated using non-controlled simulation, which is what we had in our case.
* Interesting presentation! Very nicely presented. It seems that the simpler model (LR) has better performance than the more complex models (VAE); is that what is actually happening?
* This is indeed what we are seeing in the dataset that we used. However, we feel these results point to further research directions to drill down into with more computing power and data.
* Great evaluation; it was important to have all four metrics (accuracy, precision, recall and F1) considering the unbalanced dataset! (A short illustrative sketch of these metrics follows below.)
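As an aside on the evaluation point above: a minimal, self-contained sketch (synthetic data and parameters only, not the team's dataset or pipeline) of why all four metrics matter when fail/non-fail classes are heavily imbalanced, using a logistic regression baseline like the one discussed in this thread.

```python
# Illustrative sketch only: synthetic imbalanced fail/non-fail data,
# a logistic-regression baseline, and all four evaluation metrics.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# ~5% "fail" cases, mimicking an unbalanced dataset (numbers are made up).
X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Scaling is fitted on the training split only (inside the pipeline),
# the usual precaution against leaking test statistics into normalisation.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
pred = model.predict(X_test)

print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))  # of predicted failures, how many were real
print("recall   :", recall_score(y_test, pred))     # of real failures, how many were caught
print("f1       :", f1_score(y_test, pred))

# A trivial "never fails" predictor already scores ~95% accuracy here,
# which is why precision/recall/F1 matter on unbalanced data.
print("majority-class accuracy:", accuracy_score(y_test, np.zeros_like(y_test)))
```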
* Thanks! :thumbsup:
* Is there a baseline or benchmark from current state-of-the-art methods? How does your performance compare to them?
* There are at present very few methods that aim to identify the loss of a component just using data, as the system dynamics are very complex. This is one of the key motivations for using ML techniques for this problem. Regarding machine learning models in general, we used a logistic regression model as a baseline and compared it with others such as LSTM and autoencoder models (Maryleen).
* Good job, everyone! Impressive results.
* Thank you! :thumbsup: :slightly_smiling_face:
* Any insights/suggestions on the multi-class classification problem?
* We hope that the low-dimensional embeddings from the auto-encoder can be used for the multi-class classification.

:cake::birthday:**CatsAi**:birthday::cake:
* Have the patterns changed during Covid? How do pre- and post-Covid compare, and how can the data be used for post-Covid predictions?
* (Sami): The data we have runs from March 2018 to October 2019, so luckily 2020 didn't make it into this data. However, the 2019 data wasn't very balanced (January to July 2019 had unexplained behaviour).
* (Sami): These models explored the impact of time-series seasonality and weather conditions on sales; even after Covid these features are still part of the real world, so we can incorporate the findings as priors for any post-Covid model.
* Quite surprising that 80% of the entries don't have sales... don't bakeries rely on selling goods every day to survive in the business??
* (Sami): The client of CatsAi is a wholesaler, a large one as well; they sell to bakeries in large quantities (for example, one order of donuts has 36 units, and the median donut order is 20-ish). Additionally, the data is captured as a row for every product and every site (bakery), regardless of whether there was an order or not, including bakeries that order only once a year (of which there are a few).
* (Gordon): There are other wholesalers as well, but they are not available in our dataset. It is possible that bakeries order different products from different wholesalers. (Prakhar): Additionally, there are more products as well; however, we were presented with only a few of the highest-selling products. So it is possible that the bakeries ordered other products on that particular day.
* Do you have any baselines/benchmarks from current practice/state-of-the-art methods? And how good are they in terms of explainability vs accuracy? Is your performance on par with them or better?
* (Sami): SOTA is LIME and DiCE for explaining black-box models, or using white-box models. It is hard to benchmark on this data because of externalities - the data only covers sales and weather conditions, with not much on the agents' behaviour or any other details about the specific bakery.
* (Sami): We didn't try to do too much performance engineering (Bayesian optimisation of parameters, pruning, PCA, etc.); we wanted to focus on explainability and on trying to understand and communicate our understanding of the data.
* What are the assumptions for counterfactual explanations?
* (Torty): Whenever you start talking about counterfactuals you immediately have to start thinking through a causal lens - counterfactual explanations assume a temporal causal ordering between cause A and event B; they assume that A and B are correlated; and thirdly, that the correlation between A and B is not influenced by an external variable C (however, most current counterfactual explanation methods do not properly account for this third assumption!).
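As an illustrative aside on the counterfactual discussion (not part of the team's work): a toy sketch of the basic idea behind a counterfactual explanation - search for the smallest change to an input that flips a trained model's prediction. The data, feature names and single-feature search below are invented for illustration; packaged methods such as DiCE, mentioned above, do this search properly across many features and constraints.

```python
# Toy counterfactual search: how much would "temperature" need to change
# for a trained model to flip its prediction for one bakery-day row?
# All data and feature names below are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical features: [temperature_C, is_sunday, is_holiday]
X = np.column_stack([rng.normal(12, 6, 2000),
                     rng.integers(0, 2, 2000),
                     rng.integers(0, 2, 2000)])
y = ((X[:, 0] > 15) & (X[:, 1] == 0)).astype(int)   # synthetic "high sales" label

clf = RandomForestClassifier(random_state=0).fit(X, y)

query = np.array([[8.0, 0, 0]])            # a day predicted as "low sales"
original = clf.predict(query)[0]

# Scan candidate temperature changes and keep the smallest one that flips the label.
for delta in np.arange(0.5, 20.5, 0.5):
    candidate = query.copy()
    candidate[0, 0] += delta
    if clf.predict(candidate)[0] != original:
        print(f"Counterfactual: raise temperature by {delta:.1f} C "
              f"-> prediction flips from {original} to {clf.predict(candidate)[0]}")
        break
```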
* (Ridda): Counterfactual explanations are based on 3 key causal inference assumptions: 1) exchangeability; 2) positivity; 3) consistency. If you are interested in learning more about causal inference, I would recommend the following book: https://cdn1.sph.harvard.edu/wp-content/uploads/sites/1268/2019/11/ci_hernanrobins_10nov19.pdf. If we had more time, we would have explored some causal inference methods such as Directed Acyclic Graphs (DAGs).
* What are the units/scales on the y-axis of slide 24? E.g. for temperature, is it the residual above/below the mean, in Celsius? Is the sales number normalised to be a percentage of the daily total sales?
* (Sami): Temperature is in Celsius, yes. The sales are normalised as well; the STS model does anomaly detection and (because of priors) it ignores the "fat-finger" orders. The tree models did p999 scaling of the fat-finger orders.
* (Sami): Regarding fat-finger orders: we had quite a few of these in the dataset; at random you would get an order x500 the usual order of magnitude.
* Great presentation, it made me very hungry.
* (Sami): Pls send donuts :+1: :+1:
* Did you look at performance vs data size? E.g. instead of doing a yearly prediction, doing a few-months prediction and then plotting it against dataset size. That would inform whether more data will help or not.
* (Sami): The STS model looked at December 2018 after training from March to November 2018. The other models did similar splits too; we did notice that if we train on the whole dataset the variance actually increases (because of the unexplained behaviour in Jan-Aug 2019). But ya, we looked at month-on-month and week-on-week stuff.
* (Divya): We did try to filter it down to the category of item (like ready-to-bake bread, which is a broad category) and make predictions based on that. We noticed this improved the performance for some categories, but we don't have the same amount of data for all categories, which makes them not comparable.
* What is the complexity scaling with data size for the STS models? E.g. other, more explainable methods such as Gaussian processes really suffer from O(N^3) complexity. Also, is STS kernel-based, and if so, how did you choose the kernel? Some kernels can help with learning periodic trends, and kernel engineering (adding different kernels to learn different trends in the data) can be interesting. Did you try anything like that? Also, thanks for the interesting presentation!
* (Sami): GPs would struggle with this data because of the Cholesky decomposition (btw, you can apply sparse GPs); the time-series model uses Gaussian random walks and X-step lookaheads rather than storing the whole dataset in a matrix. They are complex though; we did use variational inference to approximate the result, as HMC/NUTS wouldn't have finished in time for the presentation.
* (Sami): Regarding the kernel comment: do you mean something like the Automatic Statistician approach (I know they are used in airline predictions)?
* (Josh): I basically mean something like the Mauna Loa atmospheric carbon dioxide example from Rasmussen and Williams, Gaussian Processes for Machine Learning, section 5.4.3. I just find that interesting!
* (Sami): It is a very interesting method. One issue I had with GPs before is that it is sort of harder to inject prior knowledge outside of the kernel selection; there are methods that combine GPs with linear methods (e.g. BOAT semi-parametric methods), but honestly not something that can be done easily in a DSG.
* (Sami): Also: so we have 5 different categorical levels: bakery site, product, city, country, county.
With a mix of boolean values (holidays, Sunday), etc. GPs assume your data follows a Gaussian distribution, which our data doesn't: there are discontinuities, unexplained jumps and anomalies, depending on which category you are conditioning on. The GP will really struggle (I am tempted to try fitting a GP on one type of pastry in one city just to confirm this). STS doesn't make such an assumption; you can use a mixture of distributions for your final layer. In this data I experimented with using a Poisson distribution for the output (since it is sales data); you can mix various exponential-family distributions as long as they are linked. Maybe GPs can do that, but I am not aware of a straightforward way of doing that tbh.
* (Josh): Ah yeah, fair enough. I guess it's not a system based on physics, so it's less likely that the central limit theorem will get you out of trouble. Thanks for the detailed reply!
* (Sami): I think the CLT will still apply if we looked at all warehouses (we didn't get this data) and all bakeries and all products; I suspect the consumer behaviour follows the CLT, as shown in various game theory books. But ya, very good points, love the question!
* Awesome talk and nice slides. :+1: :+1:
* (Sam Ip): Great stuff! In case this is helpful/interesting, on the perturbation nature of LIME and SHAP: https://arxiv.org/pdf/1911.02508.pdf

:recycle::leaves:**Greenvest Solutions**:leaves::recycle:
* What could be the reason that wind speed and direction aren't varying much over the years, and how do you justify that this stability will hold for future years?
* [Jiaxin] Thank you for your question. Wind speed and direction do change over time; they fluctuate in the short term. However, we are looking more at the long term and use the daily, weekly and monthly average distributions. That the yearly average distribution doesn't change much might not be surprising, much like yearly average temperature or annual rainfall. We initially wanted to target this problem, but after data exploration we found it hard to capture the yearly change. We think that if instability exists, it might be caused by climate change or extreme weather events at a larger spatial scale, which might be another, broader problem. On the other hand, we only have around 8 years of time series, which makes it a bit hard to predict the yearly change.
* Would it be possible to add a relation between stations? Predicting the wind speed between stations in real time?
* [Ivan] Our resident Kriging expert Tom might want to come in here, but the Regression-Kriging approach would allow us to interpolate between stations across the UK. We would expect our predictions to be more accurate for locations closer to ones for which we have ground data, but we would be able to provide estimates at any chosen location, at a resolution defined by the model. It really is an exciting approach that could offer a lot.
* Can we predict how changing the ground morphology might affect the wind speed? Using a combination of A/B testing and feature analysis?
* [Ivan] Ah, great question. Everyone loves a bit of terraforming :female-construction-worker:. We could certainly mutate the elevation maps fed into our models and evaluate the effect of these terrain modifications on our predictions for wind speed and direction. We haven't considered A/B testing explicitly, but our existing feature analysis could show the effects of these proposed changes.
* [Luke] XGBoost should give an indication of which terrain features are good predictors.
With CNN approaches you could enter any arbitrary terrain information (even synthetic/hypothetical) and see how the output changes. (A short illustrative sketch of the XGBoost feature-importance idea is at the end of this note.)
* I understand you're using wind speed and direction prediction to answer the question of where to locate the wind turbines. How exactly is the location decision determined from the predictions?
* [Luke] That's broadly for Greenvest to decide. The expected yield for a location depends primarily on having reliably high wind speed. With these tools, we can take the satellite data that is available and query what the expected wind speed (and, in some cases, its variability) should be at that location. With the data available, we have models that capture how satellite data corresponds to wind statistics at specific locations within the UK, and so our models should hopefully generalize to the rest of the UK. In future, a landowner might come to Greenvest and ask them to assess the viability of their land as a wind farm, and they could use these models to give an answer.
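To make the XGBoost remark above concrete, here is a minimal, hypothetical sketch: train a gradient-boosted regressor on made-up terrain features and read off feature importances to see which variables the model relies on when predicting mean wind speed. Feature names, data and hyperparameters are illustrative assumptions, not the tools built during the challenge.

```python
# Illustrative sketch: which (synthetic) terrain features best predict mean wind speed?
# Feature names and data are invented; they stand in for real elevation/roughness layers.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
n = 5000
features = ["elevation_m", "slope_deg", "surface_roughness", "dist_to_coast_km"]
X = np.column_stack([
    rng.uniform(0, 600, n),      # elevation_m
    rng.uniform(0, 30, n),       # slope_deg
    rng.uniform(0, 1, n),        # surface_roughness
    rng.uniform(0, 150, n),      # dist_to_coast_km
])
# Synthetic "mean wind speed": rises with elevation, falls with roughness and distance to coast.
y = 6 + 0.005 * X[:, 0] - 2.0 * X[:, 2] - 0.01 * X[:, 3] + rng.normal(0, 0.5, n)

model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X, y)

# Feature importances give a rough ranking of which terrain variables the model relies on.
for name, importance in sorted(zip(features, model.feature_importances_),
                               key=lambda pair: -pair[1]):
    print(f"{name:>20s}: {importance:.3f}")
```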