# Azure AI Fundamentals
###### tags: `Other stuff`
## Responsible AI Key Principles
1. **Fairness** :arrow_right: prevent unfair allocation of resources, remove unfair advantage
2. **Reliability & Safety** :arrow_right: eliminate threat of harm to human life - autonomous vehicles, healthcare diagnosis
3. **Privacy & Security** :arrow_right: do not disclose personal data and respect privacy - detect changes in the data
4. **Inclusiveness** :arrow_right: engage all communities in the world (empower everyone)
5. **Transparency** :arrow_right: AI-based solutions should be understandable - open about how and why
6. **Accountability** :arrow_right: governance framework, ethical policies, legal standards
## AI Use Cases
- Anomaly Detection
    - analyses data over time (a stream of data) to detect problems or unusual changes (dips, spikes, ...)
    - *Use Case*: IoT, monitoring, fault detection, computer network traffic
- Vision
    - describing/categorizing an image or video
    - *Use Case*: reading text and barcodes, detecting abnormalities in health scans
- Natural Language Processing
    - all about understanding text (detect language, key phrases etc.)
    - *Use Case*: bots, extracting meaning and intent, monitoring news
- Knowledge
    - recommendations, discovery & search
**Practical Questions:**
*common AI workloads?*
:heavy_check_mark: machine learning
:-1: blockchain
:heavy_check_mark: computer vision
:-1: batch processing
*risks of AI solutions?*
:heavy_check_mark: bias can affect results
:-1: AI algorithms are always correct
:-1: humans are not responsible for AI-driven decisions
:-1: AI solutions are always more reliable than humans
*one of the six principles?*
:-1:fast performance
:-1:flexible
:heavy_check_mark: inclusiveness
:-1:open source
## ML in Azure
- Automated machine learning
This feature enables non-experts to quickly create an effective machine learning model from data.
- Azure Machine Learning designer
A graphical interface enabling no-code development of machine learning solutions.
- Data and compute management
Cloud-based data storage and compute resources that professional data scientists can use to run data experiment code at scale.
- Pipelines
Data scientists, software engineers, and IT operations professionals can define pipelines to orchestrate model training, deployment, and management tasks.
---
## Cognitive Services
- pre-built Artificial Intelligence
> **Service Areas**
> Language
> Speech
> Vision
> Knowledge
> Decision
REST API - easy to use (call via URL, response in JSON or XML) + authentication key
send specific parameters in the URL or as JSON in the body
[https://portal.azure.com/](https://portal.azure.com/)
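For example, a minimal sketch in Python with the `requests` package (the endpoint, key and image URL below are placeholders; the path shown is the Computer Vision v3.2 image-analysis endpoint):

```python
# Minimal sketch of a Cognitive Services REST call (endpoint and key are placeholders).
import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com"  # from the Azure portal
key = "<your-authentication-key>"                                 # from the Azure portal

headers = {
    "Ocp-Apim-Subscription-Key": key,      # authentication key goes in a header
    "Content-Type": "application/json",
}
params = {"visualFeatures": "Description,Tags"}        # parameters in the URL
body = {"url": "https://example.com/some-image.jpg"}   # parameters as JSON in the body

response = requests.post(
    f"{endpoint}/vision/v3.2/analyze",
    headers=headers,
    params=params,
    json=body,
)
print(response.json())  # response comes back as JSON
```

The same pattern (key header, query parameters, JSON body, JSON response) applies to the other Cognitive Services as well.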
---
## Computer Vision
- analysis of digital images
- the computer sees an array of numeric values representing colour and intensity
- by analysing these values it can detect and interpret the content of an image
**IMAGE CLASSIFICATION**
- looking at the full image
- categorize image
- describe image
- detect colour scheme
- identify image type
**OBJECT DETECTION**
- looking at a specific object in an image
- detect common objects
**SEMANTIC SEGMENTATION** IMPORTANT FOR CERT
- pixel-level classification
- e.g. used for self-driving cars, robots
- important for models to understand the context in which they are operating
**OPTICAL CHARACTER RECOGNITION**
- extract text from an image
**FACIAL DETECTION**
- detect faces
- recognize faces
- tag people in photos (bounding boxes over image, where face is detected)
*computer vision service* -> pre-trained computer vision model.
- Object detection
- detect domain-specific content
- image classification
- tagging
- describing an image
- text and OCR
- detect and analyze human faces!

**CUSTOM VISION SERVICE**
*build, deploy and improve your own image identifiers*

**Computer Vision vs Custom Vision**

**Form Recognizer**
:+1: extract information from scanned forms in image or PDF document
---
## Machine Learning
- creating predictive models by finding relationships in data
**Supervised** - make predictions based on a set of labelled examples in the provided data
**Unsupervised** - data isn't labelled; the algorithm organizes the data and finds structure itself. OUTCOME -> groupings based on similarities
**Reinforcement** - learns from outcomes
**5 key areas of machine learning:**
1. *Classification* - which category?
2. *Regression* - predict how much/many?
3. *Anomaly* - is it weird?
4. *Reinforcement* Learning - What next?
5. *Clustering*, *Recommender* - Data Structure
### Machine Learning Model Types:
> Anomaly Detection (find unusual occurrences)
- **Classification** (classify images or predict between several categories)
- supervised
- predict a category(class)
- discrete

- **Clustering** (discover structure)
- unsupervised
- algorithm groups data into clusters
- predict a category (class)
- discrete

- **Regression** (predict values)
- supervised
- predict a numeric value
- continuous

*Cheatsheet*: [https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-cheat-sheet](https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-cheat-sheet)
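To make the three model types concrete, here is a small illustrative sketch with scikit-learn (not an Azure service; the iris dataset and these particular algorithms are arbitrary choices for illustration):

```python
# Illustrative sketch of the three model types on scikit-learn toy data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Classification (supervised): predict a discrete category/class.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("predicted class:", clf.predict(X[:1]))

# Regression (supervised): predict a continuous numeric value
# (here, petal width from the other three measurements).
reg = LinearRegression().fit(X[:, :3], X[:, 3])
print("predicted value:", reg.predict(X[:1, :3]))

# Clustering (unsupervised): group the data without using the labels at all.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignment:", km.labels_[:1])
```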

### Machine learning Concepts
- Select model
- Which machine learning algorithms to use to train the model
- Split data into
- Training
- Validation
- Validate the model
    - metrics for classification and regression (see the sketch below)
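A minimal sketch of the split-and-validate steps, assuming scikit-learn and an 80/20 split (the dataset and metric are arbitrary choices for illustration):

```python
# Sketch: split data into training and validation sets, then validate the model.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Split the data: 80% for training, 20% held back for validation.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Train on the training split only.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Validate: score the held-back data and compute a metric.
print("validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
```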
---
## Machine Learning Workflow
Can be a linear or iterative process

1. **Define Problem**
- translate problem into a machine learning problem statement
2. **Create a Dataset**
- [ ] Collect Data
- [ ] Identify labels
- [ ] Identify features
- [ ] perform feature engineering - lets us improve existing columns or add new features that can be more predictive than the raw columns

3. **Sample Dataset**
- split data into training and testing sets
4. **Build Model(s)**
- train a model that predicts the label from the features in the training data
- tune model (optional)
5. **Evaluate Model**
- score test data
- use results to produce evaluation metrics
- explore results
**METRICS:**

**ERRORS**
- Regression Error
- Classification Error

## ML Platforms

**Compute**
- Compute instances - workstations for creating data and models
- Compute clusters - scalable compute for processing experiments
- Inference clusters - deployment targets for predictions
**Deployment and management**
**Automated Machine Learning**
- provide data and the desired model type; Azure ML finds the best model (rough sketch below)
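As a rough sketch only (names below come from the v1 `azureml` Python SDK; the workspace config, dataset name and label column are placeholders, and the exact API surface varies by SDK version), an automated ML run can be submitted like this:

```python
# Rough sketch of an Automated ML run with the Azure ML SDK v1 (placeholders throughout).
from azureml.core import Workspace, Experiment
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()               # assumes a config.json for your workspace
training_data = ws.datasets["my-dataset"]  # placeholder: a registered tabular dataset

automl_config = AutoMLConfig(
    task="classification",            # desired model type
    training_data=training_data,      # the data you provide
    label_column_name="label",        # placeholder label column
    primary_metric="accuracy",        # metric Azure ML optimises for
    experiment_timeout_minutes=30,
)

# Azure ML tries many algorithms/settings and returns the best model it finds.
run = Experiment(ws, "automl-sketch").submit(automl_config)
run.wait_for_completion()
```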
## Language
- Analysis of text
- Statistical
- Frequency
- Phrases
- Structure
- Analysis of speech
- Convert audio into text
- Convert text into audio
- Translation
- Text and Speech
- Modelling
- Understand
### Types of NLP workloads:
**Key Phrase Extraction**
- works on identifying semantic structure
- Key talking points
- 15 languages supported
**Named Entity Recognition**
- People, Places, Organisations, Quantities *(Age, Percentages, Currencies*), Dates and times
- 23 languages supported

**Sentiment Analysis**
- pre-trained model
- positive or negative (score between 0 and 1)
- identifies emotions in a sentence (whether the person is happy, angry, etc.)
- 19 languages supported
**Language Modelling**
- Interpreting the meaning of text
- discovering the meaning of text
- turning text into requests and actions
**Speech Recognition and Synthesis**
- turns soundwaves into text
- detect and interpret spoken input
- acoustic model
- language model
- generate spoken output
- synthesize speech
- 60+ languages supported
**Translation**
- Text
- Speech
- 60+ languages supported
### USE CASES:
- monitor social media for sentiment
- analyse phone conversations in call centre
- search for product information from documents
- prioritise emails in customer service

---
***SERVICES***

**Language Understanding Intelligence Service (LUIS)**
--> semantic modelling
--> custom language modelling
A sentence is split up into three categories:
- Utterance
- Entities
- Intents

>LUIS App
> - model of intents and entities.
> - use of utterance to train model.
> - pre-built models
> - authoring
> - prediction
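As a purely hypothetical illustration (not a real LUIS response), one utterance might break down like this:

```python
# Hypothetical breakdown of one utterance into intent and entities (illustrative only).
utterance = "Book a flight to Paris next Friday"  # the utterance: what the user actually says

prediction = {
    "topIntent": "BookFlight",       # the intent: what the user wants to do
    "entities": {                    # the entities: the details inside the utterance
        "Destination": "Paris",
        "TravelDate": "next Friday",
    },
}
```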
**Speech**
- recognize and synthesize speech, and to translate spoken languages
- Text to Speech
- Speech to text
- Real-time
- batch
- Speech translation
- Speech recognition
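A minimal sketch with the Speech SDK for Python (assumes the `azure-cognitiveservices-speech` package; key and region are placeholders; uses the default microphone and speaker):

```python
# Sketch of one-shot speech-to-text and text-to-speech (key and region are placeholders).
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="<your-region>")

# Recognize a single utterance from the default microphone (real-time speech to text).
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()
print("recognised:", result.text)

# Synthesize speech (text to speech) to the default speaker.
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
synthesizer.speak_text_async("Hello from Azure Speech").get()
```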
**Text Translator**
- add a multi-language user experience in 90 languages and dialects
- Text translation
- Machine translation service
- Neural Machine Translation(NMT)
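A sketch of a Translator Text API v3 call with `requests` (key and region are placeholders; the text and target languages are arbitrary examples):

```python
# Sketch of a Translator Text API v3 call (key/region are placeholders).
import requests

endpoint = "https://api.cognitive.microsofttranslator.com"
headers = {
    "Ocp-Apim-Subscription-Key": "<your-key>",
    "Ocp-Apim-Subscription-Region": "<your-region>",  # required for regional resources
    "Content-Type": "application/json",
}
params = {"api-version": "3.0", "to": ["de", "fr"]}   # translate into German and French
body = [{"text": "Hello, how are you?"}]

response = requests.post(f"{endpoint}/translate", headers=headers, params=params, json=body)
print(response.json())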
**Text Analytics**
- demo: [aidemos.microsoft.com/text-analytics](https://aidemos.microsoft.com/text-analytics)
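A sketch of sentiment analysis against the Text Analytics v3.0 REST endpoint (resource endpoint, key and example sentences are placeholders):

```python
# Sketch of sentiment analysis with the Text Analytics v3.0 REST API (placeholders).
import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com"
headers = {
    "Ocp-Apim-Subscription-Key": "<your-key>",
    "Content-Type": "application/json",
}
body = {
    "documents": [
        {"id": "1", "language": "en", "text": "The hotel was wonderful and the staff were friendly."},
        {"id": "2", "language": "en", "text": "The food was cold and the service was terrible."},
    ]
}

response = requests.post(f"{endpoint}/text/analytics/v3.0/sentiment", headers=headers, json=body)
for doc in response.json()["documents"]:
    print(doc["id"], doc["sentiment"], doc["confidenceScores"])
```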
## Bots/Conversational AI
- conversational dialog between an AI agent app and a human
- over channels
- Web
- Social media
- Email
- Voice
- build on top of other AI services
- LUIS
- Speech
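A minimal sketch of a bot's message handler with the Bot Framework SDK for Python (`botbuilder-core`); hosting it needs an adapter and a web app, which are omitted here, and the echo behaviour is just a stand-in for calling LUIS/Speech:

```python
# Minimal echo-bot message handler using the Bot Framework SDK for Python.
from botbuilder.core import ActivityHandler, TurnContext


class EchoBot(ActivityHandler):
    async def on_message_activity(self, turn_context: TurnContext):
        # A real bot would call LUIS/Speech here to work out the user's intent.
        await turn_context.send_activity(f"You said: {turn_context.activity.text}")
```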
**BOTS**
- Web Chat
- IVR
- Personal Digital Assistants
**Use Cases**
- Customer Support
- FAQs
- online ordering
- travel reservation and booking
**Responsible AI for Bots:**
*Transparency* (purpose, limitations should be clear. customers should know that they are talking to a bot)
*Escalation to human* (possibility to seamlessly transfer to a human agent)
*Limit the bot* (reduce the scope of the bot to its purpose)
*Treat people fairly* (be aware of bias)
**Azure Conversational AI Services**


---
# TEST





### Metrics
#### Classification models
- **Accuracy** measures the goodness of a classification model as the proportion of true results to total cases.
- **Precision** is the proportion of true results over all positive results.
- **Recall** is the fraction of all correct results returned by the model.
- **F-score** is computed as the weighted average of precision and recall between 0 and 1, where the ideal F-score value is 1.
- **AUC** measures the area under the curve plotted with true positives on the y axis and false positives on the x axis. This metric is useful because it provides a single number that lets you compare models of different types.
- **Average log loss** is a single score used to express the penalty for wrong results. It is calculated as the difference between two probability distributions – the true one, and the one in the model.
- **Training log loss** is a single score that represents the advantage of the classifier over a random prediction. The log loss measures the uncertainty of your model by comparing the probabilities it outputs to the known values (ground truth) in the labels. You want to minimize log loss for the model as a whole.
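A sketch computing these classification metrics with scikit-learn (the labels and probabilities are made-up toy values):

```python
# Illustrative computation of the classification metrics with scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, log_loss)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]                   # ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                   # predicted labels
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]   # predicted probability of class 1

print("accuracy :", accuracy_score(y_true, y_pred))   # true results / total cases
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F-score  :", f1_score(y_true, y_pred))         # combines precision and recall
print("AUC      :", roc_auc_score(y_true, y_prob))    # area under the ROC curve
print("log loss :", log_loss(y_true, y_prob))         # penalty for wrong/uncertain predictions
```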
#### Regression models
- **Negative log likelihood** measures the loss function; a lower score is better. Note that this metric is only calculated for Bayesian Linear Regression and Decision Forest Regression; for other algorithms the value is Infinity and can be ignored.
- **Mean absolute error (MAE)** measures how close the predictions are to the actual outcomes; thus, a lower score is better.
- **Root mean squared error (RMSE)** creates a single value that summarizes the error in the model. By squaring the difference, the metric disregards the difference between over-prediction and under-prediction.
- **Relative absolute error (RAE)** is the relative absolute difference between expected and actual values; relative because the mean difference is divided by the arithmetic mean.
- **Relative squared error (RSE)** similarly normalizes the total squared error of the predicted values by dividing by the total squared error of the actual values.
- **Mean Zero One Error (MZOE)** indicates whether the prediction was correct or not. In other words: ZeroOneLoss(x,y) = 1 when x!=y; otherwise 0.
- **Coefficient of determination**, often referred to as R2, represents the predictive power of the model as a value between 0 and 1. Zero means the model is random (explains nothing); 1 means there is a perfect fit. However, caution should be used in interpreting R2 values, as low values can be entirely normal and high values can be suspect
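A sketch of the regression metrics (RAE and RSE are computed by hand, since scikit-learn has no direct helpers for them; the values are toy data):

```python
# Illustrative computation of regression metrics (RAE/RSE computed manually).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.5])   # actual outcomes
y_pred = np.array([2.5, 5.0, 3.0, 8.0, 4.0])   # model predictions

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
# Relative errors: normalize by the error of always predicting the mean.
rae = np.abs(y_true - y_pred).sum() / np.abs(y_true - y_true.mean()).sum()
rse = ((y_true - y_pred) ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()
r2 = r2_score(y_true, y_pred)

print(f"MAE={mae:.3f} RMSE={rmse:.3f} RAE={rae:.3f} RSE={rse:.3f} R2={r2:.3f}")
```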
#### Clustering models
- The **Sweep Clustering** module creates multiple clustering models, listed in order of accuracy. For simplicity, we've shown only the best-ranked model here. Models are measured using all possible metrics, but the models are ranked by using the metric that you specified. If you changed the metric, a different model might be ranked higher.
- The **Combined Evaluation** score at the top of each section of results lists the averaged scores for the clusters created in that particular model.
This top-ranked model happened to create three clusters; other models might create two clusters, or four clusters. Therefore, this combined evaluation score helps you compare models with different number of clusters.
- The scores in the column, **Average Distance to Cluster Center**, represent the closeness of all points in a cluster to the centroid of that cluster.
- The scores in the column, **Average Distance to Other Center**, represent how close, on average, each point in the cluster is to the centroids of all other clusters.
You can choose any one of four metrics to measure this distance, but all measurements must use the same metric.
- The **Number of Points** column shows how many data points were assigned to each cluster, along with the total overall number of data points in any cluster.
If the number of data points assigned to clusters is less than the total number of data points available, it means that the data points could not be assigned to a cluster.
- The scores in the column, **Maximal Distance to Cluster Center**, represent the sum of the distances between each point and the centroid of that point’s cluster.
- If this number is high, it can mean that the cluster is widely dispersed. You should review this statistic together with the **Average Distance to Cluster Center** to determine the cluster’s spread.
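A sketch computing the distance-based clustering scores by hand with NumPy and K-Means (iris data is an arbitrary choice; this mirrors the average-distance columns and point counts described above, not the Azure module itself):

```python
# Illustrative computation of distance-based clustering scores with K-Means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Distance from every point to every cluster centroid: shape (n_points, n_clusters).
dists = np.linalg.norm(X[:, None, :] - km.cluster_centers_[None, :, :], axis=2)

for c in range(km.n_clusters):
    in_cluster = km.labels_ == c
    own = dists[in_cluster, c]                        # distance of each member to its own centroid
    others = np.delete(dists[in_cluster], c, axis=1)  # distances to all the other centroids
    print(f"cluster {c}: number of points={in_cluster.sum()}, "
          f"avg distance to cluster center={own.mean():.3f}, "
          f"avg distance to other centers={others.mean():.3f}")
```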
