dplyr for linking and merging related tables
The primary objective of this lecture is to figure out how to work with multiple dataframes and join them together.
Often, dataframes will share values for certain variables. Remember Soltoff's example of the superheroes and publishers dataframes: both contain information on the publisher, and we want to be able to join the two on that shared information.
Inner Join: Returns only the rows of the original (left) dataframe that have matching values in the new (right) dataframe, together with the matching columns from the right.
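Left Join: Returns all rows in the left dataframe and merges in the shared information from the right dataframe. Left rows without a match get NA in the columns that come from the right.
Right Join: Returns all rows in the right dataframe and merges in the shared information from the left dataframe. Right rows without a match get NA in the columns that come from the left.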
Full Join: Keeps all rows from left and all rows from right, filling in NA wherever one side has no match.
Semi Join: Keeps all rows in left that have a match in right. Does not add any columns from the right.
Anti Join: Keeps all rows in left that DO NOT have a match in right.
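The six joins above can be sketched on a pair of small tables. This is a minimal illustration, not the lecture's original code: the `superheroes` and `publishers` tibbles below are toy data made up for this example (loosely following the superheroes/publishers idea), and the column name `yr_founded` is an assumption.

```r
library(dplyr)

# Toy data, assumed for illustration only
superheroes <- tibble(
  name      = c("Magneto", "Batman", "Hellboy"),
  publisher = c("Marvel", "DC", "Dark Horse Comics")
)
publishers <- tibble(
  publisher  = c("DC", "Marvel", "Image"),
  yr_founded = c(1934L, 1939L, 1992L)
)

# Mutating joins: add columns from the right table
inner <- inner_join(superheroes, publishers, by = "publisher")  # only heroes with a matching publisher
left  <- left_join(superheroes, publishers, by = "publisher")   # every hero; NA yr_founded if no match
right <- right_join(superheroes, publishers, by = "publisher")  # every publisher; NA name if no match
full  <- full_join(superheroes, publishers, by = "publisher")   # every hero and every publisher

# Filtering joins: keep or drop rows of the left table, no new columns
semi <- semi_join(superheroes, publishers, by = "publisher")    # heroes whose publisher is in publishers
anti <- anti_join(superheroes, publishers, by = "publisher")    # heroes whose publisher is not in publishers
```

Comparing `nrow()` and `names()` across the results is a quick way to see the difference: the mutating joins gain the `yr_founded` column, while `semi` and `anti` keep only the columns of `superheroes`.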
Note: you can use the by argument to match key columns that hold the same values but do not share the same name. Syntax is roughly:
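As a rough sketch of that syntax (the tables `heroes` and `pubs` and the column name `pub_name` are hypothetical, chosen only to show keys with different names), a named character vector in `by` maps the left column name to the right column name:

```r
library(dplyr)

# Hypothetical tables whose key columns have different names
heroes <- tibble(
  name      = c("Magneto", "Batman"),
  publisher = c("Marvel", "DC")
)
pubs <- tibble(
  pub_name   = c("DC", "Marvel"),
  yr_founded = c(1934L, 1939L)
)

# by = c("left_column" = "right_column")
joined <- left_join(heroes, pubs, by = c("publisher" = "pub_name"))
```

The result keeps the left table's key name (`publisher`) and drops `pub_name`. In dplyr 1.1.0 and later, the same join can also be written as `by = join_by(publisher == pub_name)`.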
Used to organize a string column into shared buckets.
The syntax for this is: