# ML in Finance project

- Solution references:
    - https://www.kaggle.com/code/tommy1028/lightgbm-starter-with-feature-engineering-idea/notebook
    - https://www.kaggle.com/code/nyanpn/1st-place-public-2nd-place-solution
- Data snapshot (of a certain stock id)
    - book: ![image](https://hackmd.io/_uploads/r1L90LMJyx.png)
    - trade: ![image](https://hackmd.io/_uploads/Hy06A8fJ1g.png)
- Feature Engineering
    - Book
        - $wap1 = \frac{BidPrice1 \cdot AskSize1 + AskPrice1 \cdot BidSize1}{BidSize1 + AskSize1}$
        - $wap2 = \frac{BidPrice2 \cdot AskSize2 + AskPrice2 \cdot BidSize2}{BidSize2 + AskSize2}$
        - $logReturnWap1 = \Delta \log(wap1)$
        - $logReturnWap2 = \Delta \log(wap2)$
        - $logReturnAsk1 = \Delta \log(AskPrice1)$
        - $logReturnAsk2 = \Delta \log(AskPrice2)$
        - $logReturnBid1 = \Delta \log(BidPrice1)$
        - $logReturnBid2 = \Delta \log(BidPrice2)$
        - $wapBalance = |wap1 - wap2|$
        - $priceSpread = \frac{AskPrice1 - BidPrice1}{\frac{AskPrice1 + BidPrice1}{2}}$
        - $BidSpread = BidPrice1 - BidPrice2$
        - $AskSpread = AskPrice1 - AskPrice2$
        - $TotalVolume = AskSize1 + BidSize1 + AskSize2 + BidSize2$
        - $VolumeImbalance = |AskSize1 + AskSize2 - BidSize1 - BidSize2|$
    - Trade
        - $logReturn1 = \Delta \log(Price)$
        - $numberOfDeals = \mathrm{len}(data)$
        - $tradeSizeSum = \mathrm{sum}(Size)$
        - $orderCountMean = \mathrm{mean}(OrderCount)$
    - All of the above features, also computed over only the past N seconds
    - Target (this is weird in my opinion)
        - $stockIdTargetEnc$: for each stock id, the mean of the target over the other time slots.
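
The book and trade features listed above can be sketched in pandas as follows. This is a minimal illustration, not the code of either referenced notebook. It assumes the DataFrame columns are named exactly like the symbols in the formulas (`BidPrice1`, `AskSize1`, `Price`, `Size`, `OrderCount`, ...). The `SecondsInBucket` column, the `bucket_length` parameter, and the helper names `add_book_features`, `trade_summary`, and `last_n_seconds` are hypothetical, as is the toy data. The ask-side spread is stored as `AskSpread`.

```python
import numpy as np
import pandas as pd


def add_book_features(book: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `book` with the book features added as columns."""
    out = book.copy()
    for level in (1, 2):
        bid_p, ask_p = out[f"BidPrice{level}"], out[f"AskPrice{level}"]
        bid_s, ask_s = out[f"BidSize{level}"], out[f"AskSize{level}"]
        # Weighted average price: each price is weighted by the opposite side's size.
        out[f"wap{level}"] = (bid_p * ask_s + ask_p * bid_s) / (bid_s + ask_s)
        # Log return = first difference of the log series; the first row is NaN.
        out[f"logReturnWap{level}"] = np.log(out[f"wap{level}"]).diff()
        out[f"logReturnAsk{level}"] = np.log(ask_p).diff()
        out[f"logReturnBid{level}"] = np.log(bid_p).diff()
    out["wapBalance"] = (out["wap1"] - out["wap2"]).abs()
    mid_price = (out["AskPrice1"] + out["BidPrice1"]) / 2
    out["priceSpread"] = (out["AskPrice1"] - out["BidPrice1"]) / mid_price
    out["BidSpread"] = out["BidPrice1"] - out["BidPrice2"]
    out["AskSpread"] = out["AskPrice1"] - out["AskPrice2"]
    out["TotalVolume"] = out[["AskSize1", "BidSize1", "AskSize2", "BidSize2"]].sum(axis=1)
    out["VolumeImbalance"] = (
        out["AskSize1"] + out["AskSize2"] - out["BidSize1"] - out["BidSize2"]
    ).abs()
    return out


def trade_summary(trade: pd.DataFrame) -> dict:
    """Aggregate trade features for one window of trades."""
    return {
        "numberOfDeals": len(trade),
        "tradeSizeSum": trade["Size"].sum(),
        "orderCountMean": trade["OrderCount"].mean(),
    }


def last_n_seconds(df: pd.DataFrame, n: int, bucket_length: int) -> pd.DataFrame:
    """Keep only rows from the final `n` seconds of a window of `bucket_length` seconds.

    Assumes a hypothetical `SecondsInBucket` column holding the offset within the window.
    """
    return df[df["SecondsInBucket"] >= bucket_length - n]


# Toy data for a single stock id and a single 10-second window (made up for illustration).
book = pd.DataFrame({
    "SecondsInBucket": [0, 5, 10],
    "BidPrice1": [1.00, 1.00, 1.01], "AskPrice1": [1.02, 1.02, 1.03],
    "BidSize1": [100, 300, 100], "AskSize1": [100, 100, 100],
    "BidPrice2": [0.99, 0.99, 1.00], "AskPrice2": [1.03, 1.03, 1.04],
    "BidSize2": [50, 50, 50], "AskSize2": [150, 50, 50],
})
trade = pd.DataFrame({
    "SecondsInBucket": [1, 4, 9],
    "Price": [1.01, 1.02, 1.02],
    "Size": [10, 20, 30],
    "OrderCount": [1, 2, 3],
})

features = add_book_features(book)
# Per-row trade log return; how it is aggregated per window is left open here.
trade["logReturn1"] = np.log(trade["Price"]).diff()
full_window = trade_summary(trade)
recent_window = trade_summary(last_n_seconds(trade, n=5, bucket_length=10))
print(features[["wap1", "wap2", "wapBalance", "priceSpread"]])
print(full_window, recent_window)
```

Computing the "past N seconds" variants by filtering rows first and then reusing the same aggregation keeps the windowed features consistent with the full-window ones. In this sketch the log returns are differenced over the whole window before any filtering, which is one possible choice among several.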