Excel課小筆記 === [Toc] --- ## W0 syllabus ### Course grades 1. **Homework assignments 40%** > Numerical answers are provided, grading will be focused on methods/functions used. 2. **Discussion 10%** > Each student will be required to participate at least twice in the discussion section on NTU COOL course website; > It may be asking any course content-related question, or answering any course content-related questions; > Need to submit discussion assignments #1 and #2 to receive credit. 3. **Midterm exam 25%** 4. **Final exam 25%** > Paper exam, closed book; > Basic concepts, usage, and exceptions of EXCEL functions. ## W1 Basic Usage - home tab **format painter**: copy format of cells **wrap text**: use for extra-long content **conditional formatting**: great tool ! - formulas tab **trace reference**: show formula relationship between cells - data tab **sort** **text to columns**: split text into multiple columns **remove duplicates** - view tab **freeze panes**: lock some rows or columns, and enable split view - **selecting cells** hold down **Ctrl** - select all chosen cells, rows, and columns select cell A, press **Shift**, select cell B - select all cells between cell A and cell B - **file format** standard excel file **.xlsx** (**.xls** for old version) comma separated values **.csv** tab separated values **.txt** (or **.tsv**) text file **.txt** - within excel, we could open **.csv** or **.txt**; otherwise, within file explorer, we could also select "open with" ## W2 Cell References - four main types of values in excel: **ISTEXT(), ISNUMBER(), ISLOGICAL(), ISERROR()** - **convert** between them ![image](https://hackmd.io/_uploads/BJ6SGUjp6.png) - input value can be keyed in directly or with equal sign | ISTEXT() | ISNUMBER() | | -------- | ---------- | | ="123" | 123 | | 2\*3 | =2\*3 | - **find and replace** only searches for input values but not return values - Usually, cell reference range will be auto-adjusted through **copy and paste**; however, we can place **\$** to avoid this. furthermore, we can choose to fix either rows or columns, such as "**=\$C\$35**", "**=\$C4+D\$3**" - **LEFT()**: return the specified number of characters from the start of text or number - cell reference targeting cells of other worksheets, such as "**=Sheet1!B2**", "**=SUM(Sheet1!B2:B5)**", "**='halo word'!A1**", "**=AVERAGE(Sheet1:Sheet2!A19:A21)**" - hard-coding, such as "**=SUM(B2:B5)**"; to increase flexibility, we could apply the formula as the term of "**=SUM(INDIRECT(C7&":"&C8))**" - **ADDRESS()**: return cell reference in text format - **INDIRECT()**: return cell reference by text - must know the difference between "**=INDIRECT("D4")**" and "**=INDIRECT(D4)**" - place **&** to join text strings into a single text string - **COLUMN()**: return the column number of the cell reference - **INDEX()**: return value of specified relative cell in specified cell range ## W3 Conditional Logical Functions - **IF()**: check for condition, then return specified value - the logistic of true and false ![image](https://hackmd.io/_uploads/B1ZDNTAp6.png) - note that, >`""` **ISTEXT()**, not **ISBLANK()** > "**=IF(1,\,)**" will return 0 (**ISNUMBER()**) > "**=IF("TRUE", "T", "F")**" will return T, but "TRUE" `not` **ISLOGICAL()** > **TRUE** > **FALSE** > all text > all numbers - **AND()**: return true if all the literals are true - **OR()**: return true if at least one literal is true - outer layer of **IF()** will be processed first, for example, ![image](https://hackmd.io/_uploads/BkR_PaCaa.png) - **<>** is the operator to indicate `not equal to`, such as "**=3<>2**" - **NOT()**: return the opposite logical value - **IFERROR()**: return specified value if error happens - • Possible error messages: > #N/A → Search related functions could not find intended target > #VALUE → Wrong type of input used for a formula or function, such as 5 + abc > #REF! → Could not find targeted cell reference, usually occurs after deleting cells or data > #DIV/0! → Can not divide by 0 > #NUM! → Invalid value in a formula or function, such as log0 > #NAME? → Misused or misspelled keywords in a formula or function > #NULL! → Cell reference or cell reference range refers to a null collection of cells - **IFNA()**: return specified value if #N/A happens - **IFS()**: return the specified value of the first literal that is true - **SWITCH()**: compare the target value with the list of the matched value, then replace it - **XOR()**: return true if there is add number of literals are true ## W4 Statistical Functions I - **COUNT()**: return the number of cells that contain `number`, such as "**=COUNT(A1:A6)**", "**=COUNT(5,2,6,9)**" ![image](https://hackmd.io/_uploads/HJFs8xh0p.png) - **COUNTA()**: return the number of cells that are **not** empty - **SUM()**: return the sum of the specified cells that are `number` ![image](https://hackmd.io/_uploads/By-Bux2Ra.png) - **PRODUCT()**: return the product of the specified cells that are `number` - **AVERAGE()**: return the mean of the specified cells that are `number` - **AVERAGEA()**: similar to **AVERAGE()**, but `text` and `FALSE` are treated as `0` - **STDEVA()** series: - **STDEVP()**: return the **population** standard deviation of the specified cells that are `number` - **STDEV()**: return the **sample** standard deviation of the specified cells that are `number` - **STDEVA()**, **STDEVPA()** ![image](https://hackmd.io/_uploads/Bk0nKx2RT.png) - **MEDIAN()**: return the median of the specified cells - **MODE()**: return the most common number that first appears of the specified cells - **MODE.MULT()**: return all the multiple modes, note that array function is needed - **MAX()**, **MIN()**, **MAXA**, **MINA** - **LARGE()**: return the k-th largest number within the certain range - **AVERAGE(LARGE(range, {1,2, ... ,k}))**: return the avarage of the top k numbers - **SMALL()**, **AVERAGE(LARGE(range, {1,2, ... ,k}))** - **ROUND()**: rounds the input value to the k-th digit after the decimal point, 「四捨五入」 - **ROUNDUP()**「一到九都入」 - **ROUNDDOWN()**/**TRUNC()**「一到九都捨」 - `IMPORTANT` When processing negative numbers, treats them first as positive numbers, performs rounding, then add the negative sign back - **INT()**: rounds down the input value to the nearest integer - **ABS()** - **POWER()** can also directly input in the term of `=a^k` - **SQRT()** - **EXP()** means `e^k`, cannot be directlt inputted - **LOG()**, **LOG10()**, **LN()** - **RAND()**: returns the random value in the interval `[0,1)` - **RANDBETWEEN(x,y)**: returns the random value in the interval `[x,y]`, where x, y is treated into integer - Priority order for processing multiple operators: > If in the same priority order, then **from left to right**; otherwise, > 1) Reference operators “:” “ ” “,” (colon, space, comma) • Union operator : SUM(A1:A3), SUM(A1, A2, A3) • Intersection operator : SUM(D1:D4 D2:D5), equivalent to SUM(D2:D4) > 2) Negative sign “-” > 3) Percentage “%” > 4) Exponent “^” > 5) Multiplication/Division “*”, “/” > 6) Addition/Subtraction “+”, “-” > 7) Join “&” > 8) Comparison “=“, “>”, “<“, “>=“, “<=“, “<>” ## W6 Statistical Functions II - **COUNTIF()**: counts the number of cells that meet the criteria, such as "**=COUNTIF(A1:A4, "<"&B1)**", "**=COUNTIF(G8:G19, SUM(4))**" -countif will detect and convert the format of values in both data range and criteria, and values of different format are incomparable - **wildcard**: - `*` represents one or more text character - `?` represents a single text character - `~*`, `?` represents the corresponding character but not wildcard - only applicable to conditional criteria - "**=IF("GOLD">"GAS", "y", "n")**" acts like strcmp() - **COUNTIFS()**: counts the number of cells that meet all the criteria - the return value of countif is determined by comparison of combination of cells among cell reference ranges to all criteria ![image](https://hackmd.io/_uploads/HJLTPrFJA.png) - **AND()**, **OR()** can be achieved in countif - **SUMPRODUCT()** - **SUMIF()**: return the sum of specified range that meets the criteria - **SUMIFS()**, **AVERAGEIF()**, **AVERAGEIFS()**, **MAXIFS()**, **MINIFS()** - **SEARCH()**: return the position of the text within the specified cell reference ## W5 Lookup and References - **VLOOKUP()**: search for the matched lookup value - matching method: - `FALSE`: Exact match, returns the first matched one - `TRUE`: Approximate match, returns the largest one that is not greater than the lookup value, pre-sorting data in ascending order is a prerequisite - return `#N/A` if no match found - `{=func}` indicates executed as an array function, such as "**{=SUM(VLOOKUP(H1, A1:F7, {3,4}, FALSE))**}" aquivalent to "**=SUM(VLOOKUP(H1, A1:F7, 3, FALSE), VLOOKUP(H1, A1:F7, 4, FALSE))**" - **HLOOKUP()** similar but is in horizontal version - **LOOKUP()** simplified version with only approximate match - don't need the same dimensions between provided lookup range and return range - automatically extend return range if needed - use lookup range as return range if not provided - **XLOOKUP()** pro version with more features, such as return value if no matches, more match methods, more search methods - **MATCH()**: returns the row/col number of the lookup value that is matched - **MATCH+INDEX** can do a X-Y search - **CHOOSE()**: returns the k-th provided value/ cell reference/ cell reference range ## W7 Date and Time - customise date format: > yyyy, yy, m, mm, mmm, mmmm, mmmm, d, dd, ddd, dddd, aaa, aaaa - **DATEVALUE()**: return the date `integer` part of the serial number for the provided date in text format since `1900/1/1` - it will return `#VALUE!` if date DNE - **WEEKDAY()**: return the day of week for the provided date - `1900` is mistaken as leap year by Excel - will return wrong value prior to `1900/3/1` - **YEAR()**, **MONTH()**, **DAY()** - **DATE()**: return the full date, and will automatically rollover and rollback if needed - customise time format: > h, hh, m, mm, s, ss, AM/PM - **TIMEVALUE()**: return the time `decimal` part of the serial number - **HOUR()**, **MINUTE()**, **SECOND()** - **TIME()**: return the full time, and will also automatically rollover and rollback if needed - **VALUE()**: return the full serial number including `date` and `time` part ![image](https://hackmd.io/_uploads/Hkxryu8gA.png) - **TEXT()**: return the input value in specified format - `ADD` and `SUB` of date and time is normally resolved in serial numbers, must use **TEXT()** to perform; On the order hand, we can extract into **YEAR()**, **MONTH()**, and so on to perform to avoid bypass the serial numbers, such as "**=TIME(HOUR()+x, MINUTE()+y, SECOND()+z)**" - **EDATE()**: return the date after the number of month - **EOMONTH()**: return the last date of month for the provided date after the number of month - **NOW()**: return the current date and time - **TODAY()**: return the current date - **WEEKNUM()**: return the number of week since the first day of the year - **DATEDIF()**: return the number of years/months/days between the two dates - **NETWORKDAYS()**: return the number of workdays between the two dates - use **NETWORKDAYS.INTL()** to customise the weekends - **WORKDAY()**: return the date after the number of workdays ## W8 Figure Plots - charts in excel: - pie chart - bar chart, histogram - line graph - scatter plot (散點圖) - useful operations: - directly merge charts by copy and paste - Switch Row/Column - combo with secondary axis - sparkline (show within single cell) - mini-bar graph - **REPT()**: repeats provided text for a given number of times - conditional formatting - **CHAR(10)**: newline character - Box Plot <img src="https://hackmd.io/_uploads/Hy9285IZR.png" style="width: 50%; height: 50%"> - **PERCENTILE.INC()**: return the k-th percentile of the specified data for $0\leq k\leq1$ - **PERCENTILE.EXC()** for $0\lt k\lt1$ - **QUARTILE.INC()**: return the specified quartile of the specified data for $0\leq k\leq4\ where\ k \in \mathbb{Z}$ - **QUARTILE.EXC()** for $0\lt k\lt4\ where\ k \in \mathbb{Z}$ - ==HOW?== modify cell value, stacked column, switch row/col, no fill, error bars - Gantt Chart <img src="https://hackmd.io/_uploads/S1M-2qLZR.png" style="width: 60%; height: 60%; border: 1px solid gray;"> - ==HOW?== stacked bar, axis reversed, no fill, set bounds and units - Step Chart <img src="https://hackmd.io/_uploads/SyM8p9LbR.png" style="width: 60%; height: 60%; border: 1px solid gray;"> - ==HOW?== line graph, extend data into $(x_n, y_n)\ and\ (x_{n+1}, y_n)$ ## W9 Dynamic Table - could automatically adjust data range and formulas - ==HOW?== select the date range you want, then select "insert" and "table" - by **directly type in the formula** or **manually select the data range** - even could auto-complete the formula for entire row/col - Pivot Table - ==HOW?== select "insert" and "pivot table" - ==OP?== drag field, display hierarchy order, change display setting, display tabular format, filter, merge labels - Pivot Chart - ==HOW?== select "PivotTableAnalyse" and "PivotChart" - could insert slicer to filter data - Timeline - ==HOW?== select "PivotTableAnalyse" and "InsertTimeline" - this is a filter based on time - ==OP?== - link slicer of multiple pivot tables from same data - add field directly in pivot table - *prerequisities->* incorrect if there are multiple data entries for a single name/unit/type and other than basic addition/subtraction calculations are involved - split pivot table into different worksheets - ==HOW?== select "PivotTableAnalyse", "Options", "ShowReportFilterPages", and "SelectFilterField" - double click to obtain raw data ## W10 Array Formula - an array in excel means `a list of values represented as {...}` - values separated by `,` in horizontal array, `;` in vertical array - ==HOW?== select m x n cell range, enter "**={1,2,3,4}**", press **CTRL+SHIFT+ENTER** - eg. "**{=A3:A6+B3:B6}**", "**{=A1:A2+B2:B3}**" - potential errors :( <img src="https://hackmd.io/_uploads/HyVjvzKGR.png" style="width: 80%; height: 80%; border: 1px solid gray;"> - picture to comprehend Array Formula <img src="https://hackmd.io/_uploads/S1WAdzKM0.png" style="width: 80%; height: 80%; border: 1px solid gray;"> <img src="https://hackmd.io/_uploads/rJ0tuGKGR.png" style="width: 80%; height: 80%; border: 1px solid gray;"> - "**{=B6:B9*C6:C9}**" + "**=SUM(D6:D9)**" = "**{SUM(B6:B9*C6:C9)}**" - "**{=IF(A8:A14="gg", B8:B14, "")}**" + "**{=LARGE(F2:F5, {1;4})}**" + "**=AVERAGE(G8:H8)**" = "**{=AVERAGE(LARGE(IF(A8:A14="gg", B8:B14, ""), {1;4}))}**" - **Copy & Paste** also works for Array Formula, no need to **CTRL+SHIFT+ENTER** when copying - Pairwise Table (there are several approaches) - **AND & OR** not works with Array Formula, we could use **multiplication & addition** with combination of **TRUE & FALSE** instead - `logical` will convert into `number` ## W11 Manipulating Text - **LEFT(), RIGHT(), MID()**: return the specified number of characters - **LEN()**: return the length for the given string - `space, newline, decimal point` will be counted also - adjustment allows up to 15 digits/ decimal places, and then performs **LEN()** - Date: date format will be converted into serial number then **LEN()**; text format will directly **LEN()** - "**0.33**", "**0.3333333333333330000**" will have **LEN()** of 17 regardless of how it displays - error messages will not be processed - **TRIM()**: remove leading and trailing `space` (not `newline`) and compress `spaces` between non-space chars into single one - **CODE()**: return the ASCII code for the given char, while **CHAR()** is the reversed version - **FIND()**: return the position of pattern within the given string - to obtain multiple result, we could use Array Formula, or "**C4+1**" - **SEARCH()** similar but case sensitive - **CONCATENATE()**, **CONCAT()**: combines all texts and returns it - **TEXTJOIN()**: combines all texts with delimiter and returns it - **UPPER()**, **LOWER()**, **PROPER()** - **REPLACE()**, **SUBSTITUTE()** - **EXAXT()**: performs text comparison - **TEXT()**: return the specified text format - ==OP?== decimals, integers, thousand separator, thousand unit, fractions, percentage, scientific notation, date and time - `0` vs `#` vs `?` - `#` only displays significant digits, `0` displays 0 for non-significant digits, `?` displays space instead - extract n-th word in text, by **Text to Columns** or Array Formula ## W12 Conditional Formatting - Formats not changeable: - font, size, superscript, subscript, width of border - Properties can be used in rules: - value, text, date, blanks, errors - `blank` is not defined as how **ISBLANK()** does, while its def is "does not exist any non-space char" - perform **applying order**, which is not the same as setting order - ==OP?== stop if true, format painter - format will automatically change when opening the file next time - Rule Type: <img src="https://hackmd.io/_uploads/SkGbN_gER.png" style="width: 80%; height: 80%; border: 1px solid gray;"><img src="https://hackmd.io/_uploads/ryxSV_eNA.png" style="width: 80%; height: 80%; border: 1px solid gray;"> ## W13 Statistical Tests - haha ## W14 Examples and Applications - haha