## The different data items collected in IT Operations
- System and application logs
These record the operations and performance of various systems and applications, such as error messages, performance metrics, and user activity.
- Network traffic and usage data
This data includes information about the amount and type of network traffic, as well as the performance and availability of network resources. It may include metrics collected by monitoring tools such as Prometheus.
- Resource utilization metrics
This data includes information about the usage of various system resources, such as CPU, memory, disk, and network bandwidth.
- Error and event reports
These include information about any errors or other events that occur within the IT environment, such as system crashes or security breaches.
- Security-related data
This data includes information about the status and performance of various security-related processes and systems, such as intrusion detection and firewall logs.
## More Specific Examples of the data collected
- System uptime and downtime data
- Application error codes and messages
- Network latency and packet loss data
- Memory and disk usage data
- CPU utilization and load averages
- Service request and incident data
- Data on system and application performance over time
- Security incident and vulnerability data
- User login and authentication data
## Issues with the current approach of handling IT operations
- Scalability
As the volume and complexity of data generated by IT operations continue to grow, it can be challenging to scale IT operations management tools and processes to keep pace.
- Reliability
Ensuring the reliability of IT operations can be a challenge, as it requires constant monitoring and maintenance to prevent downtime and data loss.
- Integration
IT operations often involve a wide variety of systems and applications, which can be difficult to integrate and automate.
- Lack of skilled professionals
The field of IT operations is constantly evolving and requires skilled professionals who are well-versed in the latest technologies, which can be difficult to find and retain.
- Compliance
IT operations must meet various regulatory compliance requirements, such as HIPAA, PCI DSS, and GDPR, which can be difficult and costly to implement and maintain.
## How can AI help?
- It can help to scale IT operations management by automating repetitive tasks and providing insights that would be difficult or impossible to obtain manually. For example, AI-based predictive analytics can help to forecast future resource usage and identify potential bottlenecks before they occur.
- It can help to improve the reliability of IT operations by identifying and resolving issues more quickly and automatically. For example, AI-based monitoring systems can identify and respond to issues with systems and applications before they can cause downtime or data loss.
- It can automatically monitor systems and applications and identify and resolve routine issues, reducing the reliance on skilled professionals who are hard to find and retain.
- It can automatically monitor changes to systems and applications and alert teams when a change could affect compliance.
- Hence, AIOps.
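
The forecasting idea mentioned above can be illustrated with a very small sketch. It assumes one disk-usage sample per day and a simple linear trend; the function names, the 90% threshold, and the sample values are hypothetical, and a real AIOps tool would likely use a more robust model.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(
    usage_pct: Sequence[float], threshold: float = 90.0
) -> Optional[float]:
    """Estimate days from the last sample until usage crosses `threshold`.

    Returns 0.0 if the last sample is already at or above the threshold,
    and None when the fitted trend is flat or decreasing.
    """
    days = list(range(len(usage_pct)))
    slope, intercept = fit_line(days, usage_pct)
    if usage_pct[-1] >= threshold:
        return 0.0
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical daily disk usage (%), one sample per day.
disk_usage = [60.0, 62.0, 64.0, 66.0, 68.0]
print(days_until_threshold(disk_usage))  # prints 11.0
```

With usage growing 2 points per day from 68%, the fitted line reaches 90% eleven days after the last sample, which is the kind of early warning the bullet on predictive analytics describes.
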
## Issues that AI has to tackle to be successful in managing IT Ops
- Data quality and availability
AI systems need access to accurate and complete data from systems and applications in order to be effective. However, data in IT operations can be inconsistent, incomplete, or difficult to access, which can make it challenging for AI to make accurate predictions or identify issues.
- Integration with existing systems
AI systems used in IT operations need to be able to integrate seamlessly with existing systems and processes in order to be effective. This can be challenging, particularly when dealing with legacy systems that are not designed to integrate with new technologies.
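
One common way to approach the integration problem is to map each tool's output onto a shared record type before any analysis. The sketch below assumes two hypothetical monitoring tools ("tool A" and "tool B") with invented field names and formats; it is only an illustration of the adapter idea, not the API of any real product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Metric:
    """Common schema that every source is normalized into."""
    host: str
    name: str
    value: float
    timestamp: datetime  # always timezone-aware UTC


def from_tool_a(raw: dict) -> Metric:
    # Hypothetical tool A: epoch seconds, CPU as a 0-1 fraction.
    return Metric(
        host=raw["hostname"].lower(),
        name="cpu_percent",
        value=raw["cpu"] * 100.0,
        timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    )


def from_tool_b(raw: dict) -> Metric:
    # Hypothetical tool B: ISO 8601 string with offset, CPU already in percent.
    return Metric(
        host=raw["node"].lower(),
        name="cpu_percent",
        value=float(raw["cpu_pct"]),
        timestamp=datetime.fromisoformat(raw["time"]).astimezone(timezone.utc),
    )


# The same measurement as reported by the two hypothetical tools.
a = from_tool_a({"hostname": "WEB-01", "cpu": 0.5, "ts": 1704067200})
b = from_tool_b(
    {"node": "web-01", "cpu_pct": "50.0", "time": "2024-01-01T00:00:00+00:00"}
)
print(a == b)  # prints True
```

Once both sources produce the same `Metric` type, downstream analysis no longer needs to know which tool a record came from.
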
## Data Quality Issues in IT Ops
- Data duplication
It occurs when the same data is stored in multiple locations, leading to inconsistencies and inaccuracies in the analysis. This can happen when data is collected from multiple sources, such as different monitoring tools, and not properly consolidated.
- Data bias
It occurs when the data is skewed towards certain systems or components, leading to an incomplete or inaccurate understanding of the overall IT environment. For example, if a monitoring tool is only collecting data from a subset of systems, the analysis will only be based on that subset and not the entire environment.
- Data noise
It refers to unimportant or irrelevant data that can make it difficult to identify meaningful patterns and insights in the data. This can happen when there is a high volume of data or when data is collected from multiple sources with different levels of granularity.
- Data timeliness
Data timeliness refers to the currency of the data. Data that is not up-to-date can lead to inaccurate conclusions and poor decision-making. This can happen when data collection is not automated or when there are delays in data processing.
- Data completeness
Data completeness refers to the presence of all relevant data. Data may be missing, incomplete or inconsistent which can lead to inaccurate analysis. This can happen when data is not properly collected or when there are errors in data processing.
- Data integrity
Data integrity refers to the accuracy and consistency of the data. Data may be corrupted or tampered with, leading to inaccurate conclusions. This can happen when data is not properly secured or when there are errors in data processing.
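
Three of the issues above (duplication, completeness, and timeliness) lend themselves to simple automated checks. The following is a minimal sketch, assuming records are dictionaries with `host`, `metric`, `value`, and `timestamp` fields and that data older than five minutes counts as stale; the schema, the threshold, and the sample records are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_FIELDS = ("host", "metric", "value", "timestamp")  # assumed schema


def deduplicate(records):
    """Keep the first record seen for each (host, metric, timestamp) key."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec.get("host"), rec.get("metric"), rec.get("timestamp"))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def is_complete(rec):
    """A record is complete when every required field is present and not None."""
    return all(rec.get(field) is not None for field in REQUIRED_FIELDS)


def is_fresh(rec, now, max_age=timedelta(minutes=5)):
    """A record is fresh when it is at most `max_age` older than `now`."""
    return now - rec["timestamp"] <= max_age


def quality_report(records, now):
    """Count how many records fail each check, applied in order."""
    unique = deduplicate(records)
    complete = [r for r in unique if is_complete(r)]
    fresh = [r for r in complete if is_fresh(r, now)]
    return {
        "total": len(records),
        "duplicates": len(records) - len(unique),
        "incomplete": len(unique) - len(complete),
        "stale": len(complete) - len(fresh),
        "usable": len(fresh),
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
good = {"host": "web-01", "metric": "cpu_percent", "value": 41.0,
        "timestamp": now - timedelta(minutes=1)}
records = [
    good,
    dict(good),  # duplicate reported by a second tool
    {"host": "web-02", "metric": "cpu_percent", "value": None,
     "timestamp": now - timedelta(minutes=2)},  # incomplete: no value
    {"host": "db-01", "metric": "cpu_percent", "value": 73.5,
     "timestamp": now - timedelta(hours=1)},  # stale: one hour old
]
report = quality_report(records, now)
print(report)
# prints {'total': 4, 'duplicates': 1, 'incomplete': 1, 'stale': 1, 'usable': 1}
```

Bias, noise, and integrity are harder to detect with rules this simple and usually need knowledge of the environment (for example, an inventory of systems that should be reporting).
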
## Consequences of applying AIOps to data that has the issues discussed above
- Inaccurate analysis
Data duplication, bias, noise, and incomplete data can lead to inaccurate conclusions and poor decision-making.
- Missed opportunities
Data bias and noise can make it difficult to identify meaningful patterns and insights in the data, leading to missed opportunities for optimization and automation.
- Poor performance
Stale or incomplete data can lead to poor performance and suboptimal outcomes.
- Unreliable predictions
Data integrity issues can lead to unreliable predictions and inaccurate modeling.
- Lack of trust
If the data is not accurate, complete and consistent, stakeholders may not trust the insights and conclusions generated by AIOps.
- Inefficiency
If the data is not properly cleaned and preprocessed, it can lead to inefficiency in the AI-based systems and increase the cost of running AIOps.
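
A tiny worked example makes the "inaccurate analysis" consequence concrete. It assumes a hypothetical set of four events, one of which is an error that two additional tools also report; the event structure and the helper names are invented for this illustration.

```python
events = [
    {"id": "e1", "status": "ok"},
    {"id": "e2", "status": "ok"},
    {"id": "e3", "status": "ok"},
    {"id": "e4", "status": "error"},
]
# The same error event (e4) is also reported by two more tools.
duplicated = events + [events[3], events[3]]


def error_rate(evts):
    """Fraction of events whose status is 'error'."""
    return sum(e["status"] == "error" for e in evts) / len(evts)


def unique_by_id(evts):
    """Keep one event per id."""
    return list({e["id"]: e for e in evts}.values())


print(error_rate(duplicated))                # prints 0.5
print(error_rate(unique_by_id(duplicated)))  # prints 0.25
```

Without consolidation, the error rate appears to be 50% instead of 25%, so any model or alert threshold built on it would be working from a distorted picture.
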
# One of the solutions
- Robotic data automation (RDA) is an emerging methodology that addresses the data quality issues in IT operations data that hamper the results of AIOps. As described in a Forbes piece by Shailesh Manjrekar, [RDA](https://www.forbes.com/sites/forbestechcouncil/2021/08/03/how-robotic-data-automation-could-automate-data-pipelines/?sh=48f0bd984e58) is closely related to robotic process automation (RPA), which automates business processes, data workflows, and user tasks. RDA, however, focuses specifically on automating data pipelines with software bots.
- With RDA, software bots are deployed within data pipelines to simplify and abstract many data operations and machine learning operations. Running these bots inside pipelines and automated workflows improves data quality, which is an essential prerequisite for AIOps.
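
The idea of bots inside a pipeline can be sketched as a chain of small functions, each taking a list of records and returning a cleaned list. This is only a conceptual illustration under that assumption: the bot names, the record fields, and `run_pipeline` are hypothetical and do not represent the API of any actual RDA product.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Bot = Callable[[List[Record]], List[Record]]


def normalize_host_bot(records: List[Record]) -> List[Record]:
    """Lowercase host names so the same system is not counted twice."""
    return [{**r, "host": str(r["host"]).lower()} for r in records]


def drop_incomplete_bot(records: List[Record]) -> List[Record]:
    """Discard records that carry no measured value."""
    return [r for r in records if r.get("value") is not None]


def tag_source_bot(source: str) -> Bot:
    """Bot factory: returns a bot that labels every record with its source."""
    def bot(records: List[Record]) -> List[Record]:
        return [{**r, "source": source} for r in records]
    return bot


def run_pipeline(records: List[Record], bots: List[Bot]) -> List[Record]:
    """Apply each bot in order, feeding its output to the next one."""
    for bot in bots:
        records = bot(records)
    return records


raw = [
    {"host": "WEB-01", "metric": "cpu_percent", "value": 41.0},
    {"host": "db-01", "metric": "cpu_percent", "value": None},
]
clean = run_pipeline(
    raw, [normalize_host_bot, drop_incomplete_bot, tag_source_bot("tool-a")]
)
print(clean)
# prints [{'host': 'web-01', 'metric': 'cpu_percent', 'value': 41.0, 'source': 'tool-a'}]
```

Because every bot has the same signature, steps can be added, removed, or reordered without touching the others, which is the kind of abstraction the bullets above attribute to RDA.
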
# Videos
- [How to fasttrack AIops using RDA](https://www.youtube.com/watch?v=cXfP-PkV_E0)
# References
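- [ZDNet: Data quality can make or break efforts to bring artificial intelligence to IT operations](https://www.zdnet.com/article/data-quality-can-make-or-break-efforts-to-bring-artificial-intelligence-to-it-operations/)
- [CloudFabrix blog: Biggest challenges in enterprise IT, data quality gaps and data dispersion](https://cloudfabrix.com/blog/aiops/biggest-challenges-in-enterprise-it-data-quality-gaps-data-dispersion/)
- [BMC Blogs: Why AIOps needs big data and what that means for you](https://www.bmc.com/blogs/why-aiops-needs-big-data-and-what-that-means-for-you/)
- ChatGPT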