--- title: Summary all basic concepts of DAX tags: dax --- Before start: - Practice file: [Advanture Work 2020](https://github.com/microsoft/powerbi-desktop-samples) - Explain data model and case study [link](https://docs.microsoft.com/en-us/power-bi/guidance/dax-sample-model) - [DAX Power BI Models](/tavkoukYTbaaruknrMMBSg) # Part I: Introduction to DAX **CONTENT** **1. Review everything had learned in the last 5 days (Power BI concept)** - Power BI parts: power query, reports (includes visuals, static elements) - Understanding step in preprocessing data (Power Query) + Profiling data (data type, column names, queries name, missing value, etc.) + Transform data (aggregation, pivot/unpivot data, filter necessary rows/columns, etc.) + Merge/Append data (consolidate multiple queries into one queries) 2. Elements of data model in Power BI: Tables, Columns, Hierarchies, Measures, Relationships 3. Calculated tables and some related calculated columns for Date table - DAX expression structure - DAX data types - Some common operators in DAX expression - Logical - Alrithmatic - Concatenate several text values - Comparison Practice: 4. Calculate basic measures and create measure-holder table # Part 2: Introduction to DAX (cont) Content: ## 1. Basic measures in Power BI Two types of measures: - Implicit measures are automatic behaviors that allow visuals to summarize model column data - Explicit measures, also known simply as measures, are calculations that you can add to your model Numeric columns support the greatest range of aggregation functions: * Sum * Average * Minimum * Maximum * Count (Distinct) * Count * Standard deviation * Variance * Median Text columns allow the following aggregations: * First (alphabetically) * Last (alphabetically) * Count (Distinct) * Count Date columns allow the following aggregations: * Earliest * Latest * Count (Distinct) * Count * Boolean columns allow the following aggregations: * Count (Distinct) * Count **Practice**: Create simple measures as follows ```python Revenue = SUM(Sales[Sales Amount]) Cost = SUM(Sales[Total Product Cost]) Profit = SUM(Sales[Profit Amount]) Quantity = SUM(Sales[Order Quantity]) Minimum Price = MIN(Sales[Unit Price]) Maximum Price = MAX(Sales[Unit Price]) Average Price = AVERAGE(Sales[Unit Price]) # Count Order line and Order Order Line Count = COUNT(Sales[SalesOrderLineKey]) # --> should use COUNTROW for better performance Order Line Count = COUNTROWS(Sales) Order Count = DISTINCTCOUNT('Sales Order'[Sales Order]) ``` **Compare calculated columns and measures:** - SIMILARITIES: - Calculations that you can add to your data model. - Defined by using a DAX formula. - Referenced in DAX formulas by enclosing their names within square brackets. - DIFFERENCES - Purpose - Calculated columns extend a table with a new column, while measures define how to summarize model data. - Evaluation - Calculated columns are evaluated by using row context at data refresh time, while measures are evaluated by using filter context at query time. Filter context is introduced in a later module; it's an important topic to understand and master so that you can achieve sophisticated summarizations. - Storage - Calculated columns (in Import storage mode tables) store a value for each row in the table, but a measure never stores values in the model. - Visual use - Calculated columns (like any column) can be used to filter, group, or summarize (as an implicit measure), whereas measures are designed to summarize. ## 2. Overview of context in Power BI - Review filter, slicer (visualization) - Types of context: row context, filter context ## 3. Row context Row context: means the current row, often used in calculated column. **Practice: Row context for calculated columns and Row context for iterator DAX function (in measure)** ```python # Create the following measure Revenue = SUM(Sales[Sales Amount]) Revenue (MC) = // Manual Calculation Revenue, equivalent to SUM(Sales[Sales Amount]) SUMX( Sales, Sales[Order Quantity] * Sales[Unit Price] * (1 - Sales[Unit Price Discount Pct]) ) Discount = SUMX( Sales, Sales[Order Quantity] * (RELATED('Product'[List Price]) - Sales[Unit Price]) ) # Create the following columns (in Sales table) Discount Amount = (Sales[Order Quantity] * RELATED('Product'[List Price])) - Sales[Sales Amount] ``` Kéo bảng gồm: - Month - Các trường vừa tạo ở trên (nhớ format các trường đó về dạng Currency with 2 digits after comma) Takeaway: - Notice that the formula uses the `RELATED` function. Remember, row context does not extend beyond the table. If your formula needs to reference columns in other tables, and model relationships exist between the tables, use the `RELATED` function for the one-side relationship or the `RELATEDTABLE` function for the many-side relationship. **Practice: Calculate average revenue** ```python Revenue Avg Order Line = AVERAGEX( Sales, Sales[Order Quantity] * Sales[Unit Price] * (1 - Sales[Unit Price Discount Pct]) ) # Increate the granuality to the sales order level: Revenue Avg Order = AVERAGEX( VALUES('Sales Order'[Sales Order]), [Revenue] ) ``` Takeaway: - Use `VALUES` DAX function, this function lets your formulas determine what values are in filter context. In this case, this AVERAGEX function iterates over each sales order in filter context. In other words, it iterates over each sales order for the month. Filter context and the VALUES function are introduced in the filter context module. - Iterator functions are used in 2 scenarios: Complex summarizations; Higher grain summarization **Practice: Create ranking measures, also create a slicer with Category** ```python "Rank without dense" Product Quantity Rank = RANKX( ALL('Product'[Product]), [Quantity] ) "Rank with dense: No missing rank number should exist" Product Quantity Rank (dense) = RANKX( ALL('Product'[Product]), [Quantity], , , DENSE ) "Rank measure which return BLANK unless a single product is filtered" Product Quantity Rank = IF( HASONEVALUE('Product'[Product]), RANKX( ALL('Product'[Product]), [Quantity], , , DENSE ) ) ``` ## 4. Filter context Filter context: describes the filters that are applied during the evaluation of a measure or measure expression. For example: - Filter applied directly in a column - Filter applied indirectly through a column from related table - Filter applied during report design time: through filter pane or slicer (3 levels of filters: visual/page/report) - Filter applied when report users interact with reports (cross-filter, cross-highlight visuals) **The key to writing complex measures is mastering these concepts:** - Understanding how filter context works - Understanding when and how to modify filter context to achieve a required result - Composing a formula to accurately and efficiently modify filter context **Practice**: Create the following measures - Revenue - Revenue % Total Region ### **Modify Filter context** ```python CALCULATE(<expression>, [[<filter1>], <filter2>]…) ``` - Expression: measure /or any expression that can be used in filter context - Filters: Boolean expression or Table filter expression (BUT: All filter expressions that are passed in to the CALCULATE function are table filter expressions.) - Multiple filters mean: AND Practice: Create the following measure ```python # Apply Boolean expression filter in Calculate function Revenue Red = CALCULATE([Revenue], 'Product'[Color] = "Red") Revenue Red or Green = CALCULATE([Revenue], 'Product'[Color] IN {"Red", "Green"}) Revenue Expensive Products = CALCULATE([Revenue], 'Product'[List Price] > 1000) # Apply table expression filter in Calculate function Revenue High Margin Products = CALCULATE( [Revenue], FILTER( 'Product', 'Product'[List Price] > 'Product'[Standard Cost] * 2 ) ) "Comment: The filter table of above expression is the subset of its original table" "Similar expression as follows:" Revenue Red = CALCULATE( [Revenue], FILTER( 'Product', 'Product'[Color] = "Red" ) ) ``` Create a table visual to show how filter context work: - Region/Revenue/Revenue Red - Product Color/Revenue/Revenue Red ### **Use filter modifier functions** Filter functions can: - Add filter conditions to expression - Modify/replace current filter condition in expression Example: `REMOVEFILTER`, similar to `ALL`, `ALLEXCEPT`, `ALLNOBLANKROW` (used to remove filter in the current expression) ```python Revenue Total Region = CALCULATE([Revenue], REMOVEFILTERS('Sales Territory')) Revenue % Total Region = VAR CurrentRegionRevenue = [Revenue] VAR TotalRegionRevenue = CALCULATE( [Revenue], REMOVEFILTERS('Sales Territory') ) RETURN DIVIDE( CurrentRegionRevenue, TotalRegionRevenue ) Revenue % Total Country = VAR CurrentRegionRevenue = [Revenue] VAR TotalCountryRevenue = CALCULATE( [Revenue], REMOVEFILTERS('Sales Territory'[Region]) ) RETURN DIVIDE( CurrentRegionRevenue, TotalCountryRevenue ) Revenue % Total Group = VAR CurrentRegionRevenue = [Revenue] VAR TotalGroupRevenue = CALCULATE( [Revenue], REMOVEFILTERS( 'Sales Territory'[Region], 'Sales Territory'[Country] ) ) RETURN DIVIDE( CurrentRegionRevenue, TotalGroupRevenue ) ``` Create a matrix to show how `REMOVEFILTE` remove filter context in expression: - Group/Country/Region - Revenue - Revenue Total Region - Revenue % Total Region - Revenue % Total Country - Revenue % Total Group Example: Reserves Filter using `KEEPFILTERS` ```python "Compare the following measures:" Revenue Red (boolean) = CALCULATE([Revenue], 'Product'[Color] = "Red") Revenue Red = CALCULATE( [Revenue], KEEPFILTERS('Product'[Color] = "Red") ) ``` Create a matrix to show the following fields: - Color, Revenue Red, Revenue Red (boolean) - Comment: The reason why colors other than red are BLANK is because the filter contexts and the filter expressions are combined for these two filters. ### **Use Inactive Relationship** ```python Revenue Shipped = CALCULATE ( [Revenue], USERELATIONSHIP('Date'[DateKey], Sales[ShipDateKey]) ) ``` - `CROSSFILTER`: used to modify filter directions (from both to single or from single to both) and even disable a relationship. ### **Examine filter context** `VALUES` function: - Passed in table reference: return a table object with the same columns that contain rows for what's in filter context - Passed in column reference: return a single-column table of unique values that are in filter context The function always returns a table object and it's possible for a table to contain multiple rows. Therefore, to test whether a specific value is in filter context, your formula must first test that the `VALUES` function returns a single row. Two functions can help you accomplish this task: the `HASONEVALUE` and the `SELECTEDVALUE` DAX functions. The `HASONEVALUE` function returns TRUE when a given column reference has been filtered down to a single value. The `SELECTEDVALUE` function simplifies the task of determining what a single value could be. When the function is passed a column reference, it'll return a single value, or when more than one value is in filter context, it'll return `BLANK` (or an alternate value that you pass to the function). ```python Sales Commission = [Revenue] * IF( HASONEVALUE('Sales Territory'[Country]), IF( VALUES('Sales Territory'[Country]) = "United States", 0.15, 0.1 ) ) ``` Three other functions that you can use to test filter state are: - `ISFITLERED` DAX function - Returns `TRUE` when a passed-in column reference is directly filtered. - `ISCROSSFILTERED` DAX function - Returns `TRUE` when a passed-in column reference is indirectly filtered. A column is cross-filtered when a filter that is applied to another column in the same table, or in a related table, affects the reference column by filtering it. - `ISINSCOPE` DAX function - Returns `TRUE` when a passed-in column reference is the level in a hierarchy of levels. ```python Revenue % Total Country (filter values) = VAR CurrentRegionRevenue = [Revenue] VAR TotalCountryRevenue = CALCULATE( [Revenue], REMOVEFILTERS('Sales Territory'[Region]) ) RETURN IF( ISINSCOPE('Sales Territory'[Region]), DIVIDE( CurrentRegionRevenue, TotalCountryRevenue ) ) ``` ### **Perform Context Transition** Example: Create a calculated column in Customer table to classify customers into loyalty class, according to the following rule: when the revenue that is produced by the customer is less than USD 2500, the customer is classified as Low; otherwise they're classified as High. ```python # First try: Customer Segment = VAR CustomerRevenue = SUM(Sales[Sales Amount]) RETURN IF(CustomerRevenue < 2500, "Low", "High") # Second try: Customer Segment = VAR CustomerRevenue = CALCULATE (SUM(Sales[Sales Amount])) RETURN IF(CustomerRevenue < 2500, "Low", "High") # Thirt try: Customer Segment = VAR CustomerRevenue = [Revenue] RETURN IF(CustomerRevenue < 2500, "Low", "High") ``` Comment: - `CALCULATE` function applies row context values as filters, known as context transition. ( Power BI applies a filter on the CustomerKey column for the value in row context) - If you reference measures in an expression that's evaluated in row context, context transition is automatic (`[Revenue]`) ```python "Back to Sale Commission, to display Total" Sales Commission = SUMX( VALUES('Sales Territory'[Region]), CALCULATE( [Revenue] * IF( VALUES('Sales Territory'[Country]) = "United States", 0.15, 0.1 ) ) ) ``` Comment: - Notice that it no longer needs to test whether a single Country column value in the Sales Territory table is in filter context because it's known to be filtering by a single country (because it's iterating over the regions in filter context and a region belongs to only one country). ## 5. Time Intelligence Functions **Common questions:** * What's the accumulation of revenue for the year, quarter, or month? * What revenue was produced for the same period last year? * What growth in revenue has been achieved over the same period last year? * How many new customers made their first order in each month? * What's the inventory stock on-hand value for the company's products? **Date table requirements:** * It must have a column of data type Date (or date/time), known as the date column. * The date column must contain unique values. * The date column must not contain BLANKs. * The date column must not have any missing dates. * The date column must span full years. A year isn't necessarily a calendar year (January-December). * The date table must be indicated as a date table. [How to create datetime table in Power BI Desktop](https://docs.microsoft.com/en-us/power-bi/guidance/model-date-tables/) ### **Scenario 1: Summarizations over time** Common functions: - Group 1: DATESYTD, DATESQTD, DATESMTD (passed in CALCULATE function) - Group 2: TOTALYTD, TOTALQTD, TOTALMTD - Group 3: DATESBETWEEN, DATESINPERIOD > While the TOTALYTD function is simple to use, you are limited to passing in one filter expression. If you need to apply multiple filter expressions, use the CALCULATE function and then pass the DATESYTD function in as one of the filter expressions. ```python Revenue YTD = TOTALYTD([Revenue], 'Date'[Date], "6-30") # Similarly, create Revenue YTD with other functions in group 1, 3 ``` ### **Scenario 2: Comparisons over time** (Shifting time periods) - DATEADD, PARALLELPERIOD, SAMEPERIODLASTYEAR - NEXTDAY , NEXTMONTH , NEXTQUARTER , NEXTYEAR , and PREVIOUSDAY , PREVIOUSMONTH , PREVIOUSQUARTER , and PREVIOUSYEAR ```python Revenue PY = VAR RevenuePriorYear = CALCULATE([Revenue], SAMEPERIODLASTYEAR('Date'[Date])) RETURN RevenuePriorYear Revenue YoY % = VAR RevenuePriorYear = CALCULATE([Revenue], SAMEPERIODLASTYEAR('Date'[Date])) RETURN DIVIDE( [Revenue] - RevenuePriorYear, RevenuePriorYear ) ``` ### **Advanced Usecase of Time Intelligence Function** The `FIRSTDATE` and the `LASTDATE` DAX functions return the first and last date in the current filter context for the specified column of dates. **Scenario 1: Calculate new occurrences** Usecase: Calculate the number of new customers for a time period. A new customer is counted in the time period in which they made their first purchase. ```python # LTD: life-to-date, from the beginning of time until the last date in filter context Customers LTD = VAR CustomersLTD = CALCULATE( DISTINCTCOUNT(Sales[CustomerKey]), DATESBETWEEN( 'Date'[Date], BLANK(), MAX('Date'[Date]) ), 'Sales Order'[Channel] = "Internet" ) RETURN CustomersLTD New Customers = VAR CustomersLTD = CALCULATE( DISTINCTCOUNT(Sales[CustomerKey]), DATESBETWEEN( 'Date'[Date], BLANK(), MAX('Date'[Date]) ), 'Sales Order'[Channel] = "Internet" ) VAR CustomersPrior = CALCULATE( DISTINCTCOUNT(Sales[CustomerKey]), DATESBETWEEN( 'Date'[Date], BLANK(), MIN('Date'[Date]) - 1 ), 'Sales Order'[Channel] = "Internet" ) RETURN CustomersLTD - CustomersPrior ``` **Scenario 2: Snapshot Calculations** ```python Stock on Hand = CALCULATE( SUM(Inventory[UnitsBalance]), LASTDATE('Date'[Date]) ) Stock on Hand = CALCULATE( SUM(Inventory[UnitsBalance]), LASTNONBLANK( 'Date'[Date], CALCULATE(SUM(Inventory[UnitsBalance])) ) ) ```