# DataCamp _ Introduction to R

### class()

```r
# Check a variable's type
my_character <- "universe"
my_logical <- FALSE
class(my_character)  # "character"
class(my_logical)    # "logical"
```

---

## :penguin: Vector

:::info
Vectors are **one-dimensional arrays** that can hold numeric data, character data, or logical data.
:::

### Naming a vector -- names() function

```r
# Poker winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
# The variable days_vector
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
# Use names() to label each element of poker_vector with a day
names(poker_vector) <- days_vector
```

### Vector selection -- use []

:::info
To select elements of a vector **(and later matrices, data frames, …)**, you can use square brackets. Besides indexing with a single integer, you **can also use a vector as the index**, as in the example below:
:::

```r
poker_vector <- c(140, -50, 20, -120, 240)
poker_midweek <- poker_vector[c(2, 3, 4)]  # can be abbreviated to [2:4] (not repeated below)
poker_midweek
# [1]  -50   20 -120
```

:::info
If the vector is named, **you can also select a subvector by name**, as in the example below:
:::

```r
poker_vector <- c(140, -50, 20, -120, 240)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
poker_start <- poker_vector[c("Monday", "Tuesday", "Wednesday")]
poker_start
#    Monday   Tuesday Wednesday
#       140       -50        20
```

:::info
If an operation gives you a logical vector, you can also use it to select a subvector.
R knows what to do when you pass a logical vector in square brackets: **it selects only the elements that correspond to TRUE**, as in the example below:
:::

```r
poker_vector <- c(140, -50, 20, -120, 240)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
selection_vector <- poker_vector > 0
#    Monday   Tuesday Wednesday  Thursday    Friday
#      TRUE     FALSE      TRUE     FALSE      TRUE
poker_winning_days <- poker_vector[selection_vector]
poker_winning_days
#    Monday Wednesday    Friday
#       140        20       240
```

---

## :penguin: Matrix

### Create a matrix -- matrix() function

```r
matrix(1:9, byrow = TRUE, nrow = 3)

# or build one from vectors
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)
# Create box_office
box_office <- c(new_hope, empire_strikes, return_jedi)
# Construct star_wars_matrix: one row per movie
star_wars_matrix <- matrix(box_office, byrow = TRUE, nrow = 3)
```

### Naming a matrix -- colnames() & rownames()

```r
# Vectors region and titles, used for naming
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
# Name the columns with region
colnames(star_wars_matrix) <- region
# Name the rows with titles
rownames(star_wars_matrix) <- titles
```

### Matrix operation functions

:::info
Some useful matrix functions:
- **rowSums():** calculates the sum of each row.
- **colSums():** calculates the sum of each column.
- **cbind():** binds columns onto a matrix, e.g. adds a vector as a new column.
  :warning: By the way, you can use ls() to list the variables that exist in the environment.
- **rbind():** binds rows onto a matrix, e.g. stacks one matrix below another.
:::

---

## :penguin: Factor

:::info
### What's a factor
Factors are used to store categorical variables, e.g. limiting the sex categories to "Male" or "Female".
:::

### Create a factor -- factor() function

```r
# First we create a vector
sex_vector <- c("Male", "Female", "Female", "Male", "Male")
# There are two categories, or in R terms 'factor levels', here: "Male" and "Female".
# Convert sex_vector to a factor
factor_sex_vector <- factor(sex_vector)
# Print out factor_sex_vector
print(factor_sex_vector)
# [1] Male   Female Female Male   Male
# Levels: Female Male
```

### Categorical type

:::info
- **Nominal categorical variable:** a categorical variable without an implied order. Comparing the elements of a nominal factor with operators such as > is not meaningful; R returns NA with a warning.
- **Ordinal categorical variable:** a categorical variable with a natural ordering.

:warning: Hint: if the elements have an order, set the **ordered = TRUE & levels** arguments in the factor() function, listing the levels from lowest to highest.
:::

### Summarizing a factor

```r
# While it is still a character vector
survey_vector <- c("M", "F", "F", "M", "M")
summary(survey_vector)
#    Length     Class      Mode
#         5 character character

# After converting the vector to a factor, summary() counts each level
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector) <- c("Female", "Male")
summary(factor_survey_vector)
# Female   Male
#      2      3
```

### Comparing elements in ordinal type factor

:::info
After extracting elements of an ordered factor into variables, you can compare them with operators such as >, <, and ==.
:::

---

## :penguin: DataFrame

:::info
Slicing a DataFrame works much like slicing a vector or matrix, and functions such as **head()** and **str()** give a quick overview of its contents. One difference when slicing is that you can use a column name as the column index (example below).
:::

```r
# planets_df: the course's data frame of the eight planets (name, type, diameter, rotation, rings)
# If we don't know the position of the column we want, we can pick it by name instead.
# Select the first 5 values of the diameter column
planets_df[1:5, "diameter"]
# [1]  0.382  0.949  1.000  0.532 11.209
```

```r
# We can also select an entire column with the "$" sign.
rings_vector <- planets_df$rings
rings_vector
# [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
```

### Select part of the DataFrame (1)

:::info
planets_df contains a logical "rings" column, which we extracted above as rings_vector. When a logical vector is used as the row index, R keeps only the rows where it is **TRUE**.
:::

```r
# Select all columns for planets with rings
planets_df[rings_vector, ]
#      name      type diameter rotation rings
# 5 Jupiter Gas giant   11.209     0.41  TRUE
# 6  Saturn Gas giant    9.449     0.43  TRUE
# 7  Uranus Gas giant    4.007    -0.72  TRUE
# 8 Neptune Gas giant    3.883     0.67  TRUE
```

### Select part of the DataFrame (2)

```r
# subset() achieves the same result as above
subset(planets_df, subset = rings)
# subset() also accepts any condition on the columns
subset(planets_df, subset = diameter < 1)
#      name               type diameter rotation rings
# 1 Mercury Terrestrial planet    0.382    58.64 FALSE
# 2   Venus Terrestrial planet    0.949  -243.02 FALSE
# 4    Mars Terrestrial planet    0.532     1.03 FALSE
```

### Sorting

```r
a <- c(100, 10, 1000)
order(a)
# [1] 2 1 3
# order() returns the indices that would sort the vector in ascending order.
a[order(a)]  # reorder a using those indices
# [1]   10  100 1000
```

```r
# Use order() to create positions: a vector of row indices
positions <- order(planets_df$diameter)
# Use positions to sort planets_df by diameter
planets_df[positions, ]
```

---

## :penguin: List

:::info
A list in R lets you gather a variety of objects under one name (the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc.
:::

### Creating a list -- list() function

```r
# Vector with numerics from 1 up to 10
my_vector <- 1:10
# Matrix with numerics from 1 up to 9
my_matrix <- matrix(1:9, ncol = 3)
# First 10 rows of the built-in data frame mtcars
my_df <- mtcars[1:10, ]
# Construct a list with these different elements
my_list <- list(my_vector, my_matrix, my_df)
# my_list now holds:
# [[1]] the vector
# [[2]] the matrix
# [[3]] the data frame
```

### Naming your list

:::info
:warning: Name the components of your list so you don't lose track of what each one holds.
:::

```r
# your_comp1 and your_comp2 stand for any objects you already have
# First way: name the components inside list()
my_list <- list(name1 = your_comp1, name2 = your_comp2)

# Second way (the same as naming a vector: a character vector plus names())
my_list <- list(your_comp1, your_comp2)  # Create your list
names(my_list) <- c("name1", "name2")    # Name your list
```

### Selecting elements from a list

```r
# shining_list already contains a vector, a matrix, and a data frame
# Select a component of the list, by position or by name
shining_list[[2]]
shining_list[["actors"]]
shining_list$actors
# Select specific elements out of that component
shining_list[["actors"]][2]
```
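To see how these selectors differ, here is a minimal sketch with a hypothetical stand-in for shining_list. The component names `title`, `actors`, `scores` and their values are placeholders, not the course's data. Double brackets `[[ ]]` and `$` return the component itself, while single brackets `[ ]` return a smaller list that still wraps it.

```r
# Hypothetical stand-in for shining_list: names and values are placeholders.
shining_list <- list(
  title  = "example title",
  actors = c("actor A", "actor B", "actor C"),
  scores = matrix(1:4, nrow = 2)
)

# [[ ]] and $ return the component itself (here a character vector)
actors <- shining_list[["actors"]]
class(actors)          # "character"

# [ ] returns a sub-list of length 1 that still wraps the component
actors_sublist <- shining_list["actors"]
class(actors_sublist)  # "list"

# Chain selections: unwrap the component first, then index inside it
second_actor <- shining_list$actors[2]   # "actor B"
```

This is why `shining_list[["actors"]][2]` works: the double brackets unwrap the vector first, and the single brackets then index into it.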