###### tags: `RSelenium`

# Scrape web tables from QuickFS using RSelenium in Windows 7

The aim is to scrape all the tables at

---

## Connect Chrome driver, RSelenium

[Data Analysis and Extraction - RSelenium tutorial](https://www.youtube.com/watch?v=kXgjU9njV9U)

* Install R packages

```r!
dir.R.packages <- "C:/Program Files/R/R-4.0.3/library"
#install.packages("tidyquant", lib = dir.R.packages)
#install.packages("RSelenium", lib = dir.R.packages)
library(RSelenium, lib.loc = dir.R.packages)
```

* Download [ChromeDriver](https://chromedriver.chromium.org/). Its version must match the installed Chrome browser; for example, a newer ChromeDriver does not support an older version of Chrome. The version that worked here is ChromeDriver 89.0.4389.23 (used with Chrome 89).
* Download [Selenium Server](https://www.selenium.dev/downloads/) (here `selenium-server-standalone-3.141.59.jar`).
* Launch cmd.exe, change to `D:\My Software`, where `selenium-server-standalone-3.141.59.jar` is located, and start the server with the path to ChromeDriver:

```shell!
D:
cd "D:\My Software"
REM Start the Selenium server and tell it where chromedriver.exe is
java -Dwebdriver.chrome.driver="C:\drivers\chromedriver_win32\chromedriver.exe" -jar selenium-server-standalone-3.141.59.jar
```

* Open http://localhost:4444/ in a new Google Chrome tab to check that the server is running.
* Create a new session at http://localhost:4444/wd/hub/static/resource/hub.html: Create a new session > select chrome as the browser.

![](https://i.imgur.com/rIjhPls.png)

* Connect to the server from R and open the QuickFS page:

```r!
# Connect to the Selenium server started above (port 4444)
con <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
                               port = 4444L,
                               browserName = "chrome")
# Open the connection (starts a new Chrome session)
con$open()
# Send a URL to the new session
con$navigate("https://quickfs.net/company/CKF:AU")
```

* The website should open in the new session. If an error occurs, run `con$navigate()` again.

![](https://i.imgur.com/WOcde5X.png)

---

## Scrape all tables of the webpage where the dropdown option is "Overview"

```r!
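# Hypothetical helper, not part of RSelenium: QuickFS appears to be an
# Angular app (note the _ngcontent attributes in its HTML), so the tables
# may be rendered a moment after con$navigate() returns. This sketch polls
# the page source until a <table> tag appears or `timeout` seconds pass.
wait_for_tables <- function(con, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    src <- con$getPageSource()[[1]]
    # Stop polling as soon as at least one table is in the DOM
    if (grepl("<table", src, fixed = TRUE)) return(invisible(src))
    if (Sys.time() > deadline) stop("No <table> found within ", timeout, " s")
    Sys.sleep(interval)
  }
}
# Usage (assumed): call wait_for_tables(con) before reading the page source below
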
#-------------------------------------------------------------
# Dropdown = Overview
# With "Overview" selected in the dropdown list, get the content of all tables
#-------------------------------------------------------------
# htmlParse() and readHTMLTable() come from the XML package
library(XML, lib.loc = dir.R.packages)
tables <- htmlParse(con$getPageSource()[[1]])
# class(tables)
readHTMLTable(tables)

# Extract the tables with rvest (rvest also provides the %>% pipe)
library(rvest, lib.loc = dir.R.packages)
x <- con$getPageSource()[[1]] %>%
  read_html() %>%
  html_table()
table.1 <- x[[1]] # class(table.1) "data.frame"
table.2 <- x[[2]] # class(table.2) "data.frame"

# Extract sub tables
names(table.1)
str(table.1)

library(tidyr, lib.loc = dir.R.packages)
# Turn a two-column block (label, value) into a one-row wide table.
# Columns are renamed by position, so this does not depend on how
# html_table() named the repeated "Key Statistics" headers.
to_wide <- function(block) {
  block %>%
    stats::setNames(c("name", "value")) %>%
    tidyr::pivot_wider(names_from = name, values_from = value)
}

# Reshape the table Valuation Ratios
table.1.1 <- to_wide(table.1[2:9, 1:2])
# Clean column names
colnames(table.1.1) <- sub(x = colnames(table.1.1), pattern = "/", replacement = ".")
# Reshape the table 10-Yr Median Returns
table.1.2 <- to_wide(table.1[2:4, 3:4])
# Reshape the table 10-Year CAGR
table.1.3 <- to_wide(table.1[6:9, 3:4])
# Reshape the table 10-Yr Median Margins
table.1.4 <- to_wide(table.1[2:5, 5:6])
# Reshape the table Capital Structure
table.1.5 <- to_wide(table.1[7:9, 5:6])

#----------------------------------------
# Reshape the table with 10 year overview
#----------------------------------------
str(table.2)
# Make the row labels usable as column names
table.2$X1 <- table.2$X1 %>%
  gsub(x = ., pattern = " ", replacement = ".") %>%
  gsub(x = ., pattern = "%", replacement = "percent")

base.table.2 <- data.frame()
# Row 1 holds the years (columns 2:11); rows 2:14 hold the items
years <- unlist(table.2[1, 2:11], use.names = FALSE) # length 10
item.names <- table.2[[1]][2:14]

for (i in seq_along(years)) {
  year <- years[i] # e.g. "2011"
  # Reshape a single year of data to long format
  .year.long <- data.frame(name = item.names,
                           value = table.2[[i + 1]][2:14],
                           stringsAsFactors = FALSE)
  .year.wide <- .year.long %>%
    tidyr::pivot_wider(names_from = name, values_from = value) %>%
    # Add year
    dplyr::mutate(year = year) %>%
    dplyr::select(year, dplyr::everything())
  # Vertically add the current year of data to the base data set
  base.table.2 <- dplyr::bind_rows(base.table.2, .year.wide)
}
```

---

## Scrape tables on the webpage where the dropdown option is "Income Statement" (not working)

Inspect the web elements of the dropdown list:

![](https://i.imgur.com/9U7miQu.png)

```htmlmixed!
<div _ngcontent-c1="" class="col-xs-offset-3 col-xs-2">
  <select-fs-dropdown _ngcontent-c1="" _nghost-c4="">
    <div _ngcontent-c4="" class="btn-group open" dropdown="">
      <button _ngcontent-c4="" class="selectDropdown dropdown-toggle" dropdowntoggle="" type="button" aria-haspopup="true" aria-expanded="true">
        <div _ngcontent-c4="" class="dropdownLabel">Overview</div>
      </button>
      <!---->
      <ul _ngcontent-c4="" class="dropdown-menu" id="select-fs-dropdown" role="menu">
        <!---->
        <li _ngcontent-c4=""><a _ngcontent-c4="" id="ovr">Overview</a></li>
        <li _ngcontent-c4=""><a _ngcontent-c4="" id="is">Income Statement</a></li>
        <li _ngcontent-c4=""><a _ngcontent-c4="" id="bs">Balance Sheet</a></li>
        <li _ngcontent-c4=""><a _ngcontent-c4="" id="cf">Cash Flow Statement</a></li>
        <li _ngcontent-c4=""><a _ngcontent-c4="" id="ratios">Key Ratios</a></li>
      </ul>
    </div>
  </select-fs-dropdown>
</div>
```

Locating a dropdown item by its `id` fails:

```r!
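# Diagnostic sketch (hypothetical helper menu_ids(), not part of RSelenium):
# the <li> items follow Angular placeholder comments (<!---->), which suggests
# the menu may only be added to the DOM while the dropdown is open.
# findElements() returns an empty list instead of throwing, so this lists
# which menu ids exist right now without raising NoSuchElement.
menu_ids <- function(con) {
  items <- con$findElements(using = "css selector", value = "#select-fs-dropdown a")
  vapply(items, function(el) el$getElementAttribute("id")[[1]], character(1))
}
# menu_ids(con)  # character(0) would mean the menu items are not rendered yet
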
id.ovr <- con$findElement(using = 'id', value = "ovr")
```

This returns a `NoSuchElement` error (RSelenium sends `using = 'id'` as the CSS selector `#ovr`):

```
Selenium message:no such element: Unable to locate element: {"method":"css selector","selector":"#ovr"}
  (Session info: chrome=89.0.4389.90)
For documentation on this error, please visit: https://www.seleniumhq.org/exceptions/no_such_element.html
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:25:53'
System info: host: 'CHANG-PC', ip: '192.168.0.167', os.name: 'Windows 7', os.arch: 'amd64', os.version: '6.1', java.version: '1.8.0_231'
Driver info: driver.version: unknown

Error:   Summary: NoSuchElement
   Detail: An element could not be located on the page using the given search parameters.
   class: org.openqa.selenium.NoSuchElementException
   Further Details: run errorDetails method
```

See also: [R - Rselenium - navigate drop down menu / list / box using = 'id'](https://stackoverflow.com/questions/39713466/r-rselenium-navigate-drop-down-menu-list-box-using-id)
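One possible cause, not confirmed here: the inspected HTML was captured with the menu open (`class="btn-group open"`, `aria-expanded="true"`), and the `<!---->` placeholders suggest Angular adds the `<li>` items only while the menu is open. If so, `findElement()` cannot see `#ovr` until the dropdown toggle has been clicked. The sketch below clicks the toggle first, then the item, and returns the page's tables with rvest. `open_statement()` is a hypothetical helper, the CSS selector `button.selectDropdown` is taken from the inspected `<button>`, and the one-second pause is an arbitrary assumption.

```r!
# Sketch, assuming the menu items only exist after the toggle is clicked.
open_statement <- function(con, id, pause = 1) {
  # Open the dropdown so the <li> items are rendered
  toggle <- con$findElement(using = "css selector", value = "button.selectDropdown")
  toggle$clickElement()
  Sys.sleep(pause)
  # Click the requested option, e.g. "is" for Income Statement
  item <- con$findElement(using = "id", value = id)
  item$clickElement()
  Sys.sleep(pause)
  # Parse all tables currently on the page (list of data frames)
  rvest::html_table(xml2::read_html(con$getPageSource()[[1]]))
}
```

For example, `is.tables <- open_statement(con, "is")` would return the Income Statement tables; `"bs"`, `"cf"` and `"ratios"` would select the other options. If the click still fails, a longer `pause` may help.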