Application of Nature-Inspired Swarm Optimization Algorithms in Artificial Neural Networks
===

## Introduction

Nature-inspired swarm intelligence takes inspiration from natural agents such as birds, ants and wolves that interact with each other and their environment to optimize processes such as food foraging. Nature-inspired optimization techniques are not limited to organism-based swarm optimization, but that is what this paper focuses on. Particle Swarm Optimization and Water Drop Optimization are other examples, along with algorithms inspired by human behaviour, such as the Political Optimizer, which models five phases of a political party or organization. These are interesting, but out of scope for our discussion. While we discuss the social behaviour of the organisms, the organisms themselves are represented by agents in programs.

Q. Why do we need such algorithms?
> These algorithms achieve a degree of decentralization of intelligence while optimizing a foraging-style objective. Such metaheuristics are effective at attaining near-optimal solutions and reducing the search space. For example, simple linear regression for network routing may not be as effective as an Ant Colony Optimizer. Rule-based models using metaheuristics have also been developed to predict bankruptcy - something that would not be efficient with Support Vector Machines. Compared with other optimizers such as Stochastic Gradient Descent, each of these algorithms addresses specific problems with SGD: its frequent updates can cause noisy update directions, high computational resource usage, and so on.

Q. Why did we choose these three algorithms?
> These algorithms provide diversity because the organisms differ in complexity and social order. While all three optimize resource collection, the three organisms display different levels of social complexity, and this affects their foraging behaviour. Bacteria maximise food intake with little attention to hierarchy (one could argue there is indirect community behaviour, such as the cholera bacterium changing shape in response to environmental changes to protect the larger community, but this is not a repeated, coordinated community behaviour). Ants collaborate in large colonies, and wolves hunt and live in small, highly local packs.

## Algorithm

### Ant Colony Optimization

This algorithm is inspired by the foraging behaviour of ant colonies and was introduced by Dorigo in the early 1990s. Ants are highly community-centric organisms. They communicate with each other using sound, touch and pheromones. Pheromones are chemicals that let members of a species communicate with one another; they act like hormones outside the body of the secreting agent and trigger a social response in the receiving individual. Ants leave a trail of pheromones on the soil as they move, and these pheromones evaporate quickly. Other ants can smell (or sense) the trail and follow it. The movement of an individual ant resembles a random walk.

For initiating the algorithm, certain parameters are required. Programmatically, we declare these as variables:

1. c (initial number of trails)
2. alpha (pheromone importance)
3. beta (distance priority)
4. evaporation (pheromone evaporation rate)
5. Q (total amount of pheromone left on the trail by each ant)
6. antFactor (ants per node)
7. randomFactor (randomness in the simulation)
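For reference, these parameters enter the classic Ant System equations; the formulation below is the standard one due to Dorigo, stated here for context rather than taken from any specific implementation in this paper. The probability that ant $k$ moves from node $i$ to node $j$, and the pheromone update after all ants complete their tours, are:

$$
p_{ij}^{k} = \frac{\tau_{ij}^{\alpha}\,\eta_{ij}^{\beta}}{\sum_{l \in N_i^{k}} \tau_{il}^{\alpha}\,\eta_{il}^{\beta}},
\qquad
\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij} + \sum_{k} \frac{Q}{L_k}
$$

where $\tau_{ij}$ is the pheromone on edge $(i, j)$, $\eta_{ij}$ is the heuristic desirability (typically the inverse of the edge length, weighted by beta as the "distance priority"), $N_i^{k}$ is the set of nodes ant $k$ may still visit from node $i$, $\rho$ is the evaporation rate, and $L_k$ is the length of the tour built by ant $k$, so shorter tours deposit more pheromone.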
Each ant should be able to visit a specific node, remember the nodes it has already visited, and keep track of its trail length.

```
while not termination do:
    Generate ant population;
    Calculate fitness values associated with each ant;
    Find best solution through selection methods;
    Update pheromone trail;
end while
```

____

### Bacterial Foraging Algorithm

Bacterial Foraging Optimization (BFO), proposed by Passino, is inspired by the social foraging behavior of Escherichia coli (hereafter E. coli). During foraging, a real bacterium moves by means of a set of tensile flagella. The flagella let an E. coli bacterium tumble or swim, the two basic operations it performs while foraging. When the flagella rotate clockwise, each flagellum pulls on the cell, the flagella act independently, and the bacterium tumbles; in a friendly, nutrient-rich place it tumbles rarely, whereas in a harmful place it tumbles frequently to search for a nutrient gradient. Rotating the flagella counterclockwise lets the bacterium swim in one direction at a high speed.

In the algorithm, the bacteria undergo chemotaxis: they tend to move towards a nutrient gradient and away from noxious environments, and they generally travel longer distances in a friendly environment. When a bacterium gets sufficient food, it increases in length and, at a suitable temperature, breaks in the middle to form an exact replica of itself. This phenomenon inspired Passino to introduce a reproduction event in BFO. Sudden environmental changes or attacks can destroy the chemotactic progress: a group of bacteria may move elsewhere, or new bacteria may be introduced into the swarm of concern. This constitutes the elimination-dispersal event in a real bacterial population, where all the bacteria in a region are killed or a group is dispersed into a new part of the environment.

Bacterial Foraging Optimization has three main steps:

1. Chemotaxis
2. Reproduction
3. Elimination and Dispersal

**Chemotaxis:** This process simulates the movement of an E. coli cell through swimming and tumbling via flagella. Biologically, an E. coli bacterium can move in two different ways: it can swim for a period of time in the same direction, or it can tumble, and it alternates between these two modes of operation for its entire lifetime.

**Reproduction:** The least healthy bacteria eventually die, while each of the healthier bacteria (those yielding lower values of the objective function) asexually splits into two bacteria, which are placed in the same location. This keeps the swarm size constant.

**Elimination and Dispersal:** Gradual or sudden changes in the local environment of a bacterial population may occur for various reasons; for example, a significant local rise in temperature may kill a group of bacteria currently sitting on a region with a high concentration of nutrients. Events can take place in such a fashion that all the bacteria in a region are killed or a group is dispersed into a new location. To simulate this in BFO, some bacteria are liquidated at random with a very small probability, while their replacements are randomly initialized over the search space.
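For concreteness, here is a minimal sketch of the chemotactic (tumble-and-swim) step described above, written for a continuous search space. The names `fitness`, `step_size` and `swim_length` are illustrative placeholders rather than anything from Passino's paper; the full BFO loop is summarized in the pseudocode that follows.

```python
import numpy as np

def chemotaxis_step(position, fitness, step_size=0.1, swim_length=4):
    """One tumble followed by up to `swim_length` swims in the same direction."""
    # Tumble: pick a random unit direction.
    direction = np.random.uniform(-1, 1, size=position.shape)
    direction /= np.linalg.norm(direction)

    best_pos = position
    best_fit = fitness(position)

    # Swim: keep moving while the objective keeps improving (lower is better).
    current = position
    for _ in range(swim_length):
        candidate = current + step_size * direction
        cand_fit = fitness(candidate)
        if cand_fit < best_fit:
            best_pos, best_fit = candidate, cand_fit
            current = candidate
        else:
            break
    return best_pos, best_fit
```

Each bacterium would call this once per chemotactic step, with `fitness` being the objective to minimise (for an ANN, typically the training error); this corresponds to the "move in a random direction" and swimming-length parts of the pseudocode below.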
```
BEGIN
    Initialize the bacterial population randomly
    Calculate the fitness of each agent
    Set the global best agent to the best agent
    FOR number of iterations
        FOR number of chemotactic steps
            FOR each search agent
                Move the agent in a random direction
                Calculate the fitness of the moved agent
                FOR swimming length
                    IF current fitness is better than previous
                        Move the agent in the same direction
                    ELSE
                        Move the agent in a random direction
                    END IF
                END FOR
            END FOR
            Calculate the fitness of each agent
        END FOR
        Compute and sort the sum of fitness over all chemotactic steps (the health of each agent)
        Let only the healthier half of the population live and split
        IF not the last iteration
            FOR each search agent
                With a small probability, replace the agent with a new randomly generated one
            END FOR
        END IF
        Update the best search agent
        Calculate the fitness of each agent
    END FOR
END
```

___

### Grey Wolf Optimization

The Grey Wolf Optimization (GWO) algorithm mimics the leadership hierarchy and hunting mechanism of grey wolves in nature. Wolves live in packs; the average pack consists of a family of 5-12 animals. Wolves have a strict social hierarchy, represented by the division of a pack into four levels: alpha, beta, delta, and omega.

Alpha wolves are the leaders of the pack. They are responsible for making decisions, although an alpha sometimes follows other wolves of the pack. Beta wolves help the alphas make decisions, and every beta is a candidate to become an alpha if an alpha dies or grows old. A beta respects the alpha, relays commands to the pack, enforces discipline among inferior wolves, and provides feedback from the pack to the alpha. Delta wolves must submit to the alphas and betas but dominate the omegas. Finally, omega wolves must obey all other wolves; they sometimes play the role of caretakers.

Grey wolf hunting has three main phases:

1. Tracking, chasing, and approaching the prey
2. Pursuing, encircling, and harassing the prey until it stops moving
3. Attacking the prey

In the mathematical model, the social hierarchy of wolves is mapped onto solution fitness. The fittest solution is considered the alpha; beta and delta are the second and third best solutions, respectively. The remaining candidate solutions are assumed to be omegas. Alpha, beta and delta lead the hunting (the optimization), and the omega wolves follow these three.

```
BEGIN
    Initialize the grey wolf population randomly
    Calculate the fitness of each search agent
    Find the 1st, 2nd and 3rd best agents (α, β, δ)
    Set the global best agent to the 1st best agent
    WHILE count < max number of iterations
        FOR each search agent
            Update the position of the current search agent
        END FOR
        Calculate the fitness of all search agents
        Update α, β and δ (the best, 2nd best, and 3rd best search agents)
        ADD 1 to count
    END WHILE
    RETURN the best search agent
END
```
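The position update inside the loop above follows the standard GWO equations. Below is a minimal sketch of that update for a single wolf on a continuous search space; `alpha_pos`, `beta_pos` and `delta_pos` are the current three best positions, and `a` is the coefficient that decreases over the iterations. This is an illustration of the standard update, not code from a specific paper.

```python
import numpy as np

def gwo_position_update(wolf_pos, alpha_pos, beta_pos, delta_pos, a):
    """Move one wolf towards the alpha, beta and delta wolves (standard GWO update)."""
    def pull_towards(leader_pos):
        # A and C are the stochastic coefficient vectors of GWO.
        A = 2 * a * np.random.rand(*wolf_pos.shape) - a
        C = 2 * np.random.rand(*wolf_pos.shape)
        D = np.abs(C * leader_pos - wolf_pos)   # estimated distance to the leader
        return leader_pos - A * D               # candidate position w.r.t. this leader

    X1 = pull_towards(alpha_pos)
    X2 = pull_towards(beta_pos)
    X3 = pull_towards(delta_pos)
    return (X1 + X2 + X3) / 3.0                 # average of the three pulls
```

In the standard formulation, `a` decreases linearly from 2 to 0 over the run (e.g. `a = 2 * (1 - count / max_iterations)`), so exploration (|A| > 1) gradually gives way to exploitation (|A| < 1).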
___

### Application

<i>ACO Trained ANN-Based Intelligent Fault Diagnosis</i>

Fault diagnosis requires non-linear optimization. ANNs can be used here because of their strong learning ability. Backpropagation propagates the output error backwards to adjust the weights, but it has a low rate of convergence and gets stuck in local minima easily. ACO also finds local minima with ease, but because its agents are distributed, collective global convergence occurs faster. The paper uses an ensemble of an ANN and ACO to find the source of a fault based on equipment status information. The failure-mode database for the component is fed to the model, resulting in a diagnosis.

The network has to differentiate between the different operating conditions of rolling-element bearing failure. The model is also fed the results of an online monitoring data acquisition system. After the required training, the model was tested. The paper concludes that, for the given context, the model provides 100% accuracy. This is an example of using ACO to achieve a faster rate of convergence on equipment data, with the loss settling within a small error range of the final result.

___

<i>An ACO–ANN based feature selection algorithm for big data</i>

The paper implements a hybrid ANN and ACO algorithm for feature selection in text datasets, used to categorize text. The ANN-ACO ensemble outperforms ACO, the Genetic Algorithm, IG and CHI in terms of both macro-F1 and micro-F1 values. The paper concludes that the model converges promptly because of its efficient search of the feature space and its ability to find a minimal feature subset optimally.

---

<i>Training neural networks with ant colony optimization algorithms for pattern classification</i>

> Feed-forward neural networks are commonly used for pattern classification. The classification accuracy of feed-forward neural networks depends on the configuration selected and the training process. Once the architecture of the network is decided, training algorithms, usually gradient descent techniques, are used to determine the connection weights of the feed-forward neural network. However, gradient descent techniques often get trapped in local optima of the search landscape. To address this issue, an ant colony optimization (ACO) algorithm is applied to train feed-forward neural networks for pattern classification in this paper. In addition, the ACO training algorithm is hybridized with gradient descent training. Both standalone and hybrid ACO training algorithms are evaluated on several benchmark pattern classification problems, and compared with other swarm intelligence, evolutionary and traditional training algorithms. The experimental results show the efficiency of the proposed ACO training algorithms for feed-forward neural networks for pattern classification.

> To evaluate the performance of the proposed ACO training algorithm, it is compared with other algorithms in training feed-forward neural networks for solving different benchmark classification problems (Prechelt 1994). The details regarding the configuration of the networks and datasets used for different problems are shown in Table 1. The experimental study is divided into three parts. The algorithms implemented and compared in the first part of the experimental study (see Subsect. 4.2) include the following:
>
> 1. Back-propagation (BP) (Rumelhart et al. 1986): the basic back-propagation training without any acceleration or other tweaking techniques such as momentum and so on.
>
> 2. Levenberg–Marquardt (LM) (Hagan and Menhaj 1994): the standard Levenberg–Marquardt training, which is another gradient descent algorithm.
>
> 3. ACO-BP: the proposed ACO training algorithm in combination with back-propagation, where each solution constructed by ACO undergoes a local search improvement by back-propagation.
>
> 4. Random constructive heuristic (RCH): this algorithm constructs solutions at each iteration in the same way as ACO but without any pheromone trail reinforcement.
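The common pattern across these hybrids is to flatten the network's weights into a single real-valued vector and let the swarm algorithm minimise the training error over that vector. The sketch below illustrates this idea with a tiny one-hidden-layer network and a generic `optimize(fitness, dim)` swarm routine as a placeholder for any of the ACO/BFO/GWO variants above; it illustrates the hybridisation idea and is not the implementation used in any of the cited papers.

```python
import numpy as np

def make_fitness(X, y, n_hidden=5):
    """Return (fitness, dim): mean squared error of a 1-hidden-layer net
    whose weights are packed into a flat vector of length dim."""
    n_in, n_out = X.shape[1], y.shape[1]
    n_w1, n_b1, n_w2 = n_in * n_hidden, n_hidden, n_hidden * n_out
    dim = n_w1 + n_b1 + n_w2 + n_out

    def fitness(vec):
        # Unpack the flat vector into the network's weight matrices and biases.
        w1 = vec[:n_w1].reshape(n_in, n_hidden)
        b1 = vec[n_w1:n_w1 + n_b1]
        w2 = vec[n_w1 + n_b1:n_w1 + n_b1 + n_w2].reshape(n_hidden, n_out)
        b2 = vec[n_w1 + n_b1 + n_w2:]
        hidden = np.tanh(X @ w1 + b1)        # forward pass
        output = hidden @ w2 + b2
        return np.mean((output - y) ** 2)    # training error to minimise

    return fitness, dim

# Usage (with any swarm optimizer exposing optimize(fitness, dim) -> best vector):
# fitness, dim = make_fitness(X_train, y_train)
# best_weights = optimize(fitness, dim)
```

In the papers above, the swarm search either replaces gradient descent entirely or, in the hybrid variants, provides good starting weights that backpropagation then refines.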
___

<i>Analysis of energy management in micro-grid - A hybrid BFOA and ANN approach</i>

![](https://i.imgur.com/97ZLdYr.png)![](https://i.imgur.com/J2P033o.png)

___

<center>Plant Leaf Segmentation using Bacterial Foraging Optimization algorithm</center>

The paper introduces a hybrid method combining BFOA and an ANN. A simple ANN is considered whose initial weights and training procedure are optimized with the help of BFOA. Plant leaf images are randomly selected to evaluate the performance of this framework against K-means and a plain ANN, on the basis of the Dice Similarity Coefficient (DSC) and Jaccard Coefficient (JC). While K-means provides an average DSC of 0.75985 and JC of 0.7567, and the ANN provides an average DSC of 0.80745 and an average JC of 0.8128, the BFO-ANN hybrid provides an average DSC of 0.86005 and an average JC of 0.85675.

![](https://i.imgur.com/bpNqXnw.png)

![](https://i.imgur.com/xuAip0b.png)

**FAQ**

Q. How does the rule-based model mentioned above work in finance?
> In the paper, the authors propose a hybrid system to predict corporate bankruptcy. The procedure consists of four stages: first, sequential forward selection is used to extract the most important features; second, a rule-based model is chosen to fit the given dataset, since it can present physical meaning; third, a genetic ant colony algorithm (GACA) is introduced, and a fitness-scaling strategy and a chaotic operator are incorporated into GACA, forming a new algorithm, fitness-scaling chaotic GACA (FSCGACA), which is used to seek the optimal parameters of the rule-based model; finally, stratified K-fold cross-validation is used to enhance the generalization of the model. Simulation experiments on data from 1000 corporations collected from 2006 to 2009 demonstrated that the proposed model is effective. It selected the 5 most important factors as "net income to stockholder's equity," "quick ratio," "retained earnings to total assets," "stockholders' equity to total assets," and "financial expenses to sales." The total misclassification error of the proposed FSCGACA was only 7.9%, better than the results of the genetic algorithm (GA), ant colony algorithm (ACA), and GACA. The average computation time of the model is 2.02 s.

Q. How does ACO compare with the models in "Training neural networks with ant colony optimization algorithms for pattern classification"?
> It is interesting to compare the performance of the stand-alone ACO and hybrid ACO-BP training algorithms for neural networks with the corresponding continuous ACO variations, denoted ACOR and ACOR-BP (Socha and Blum 2007), respectively. The two existing training algorithms were applied with the same stopping criteria (i.e., 1000 function evaluations) and performed the same fourfold cross-validation as the proposed training algorithms.