###### tags: `RSelenium`
# Scrape web tables from QuickFS using RSelenium in Windows 7
The aim is to scrape all the tables at
---
## Connect Chrome driver, RSelenium
[Data Analysis and Extraction - RSelenium tutorial](https://www.youtube.com/watch?v=kXgjU9njV9U)
* Install R packages
```r!
dir.R.packages <- "C:/Program Files/R/R-4.0.3/library"
#install.packages("tidyquant", lib = dir.R.packages)
#install.packages("RSelenium", lib = dir.R.packages)
library(RSelenium,lib.loc = dir.R.packages)
```
* Download the [Chrome driver](https://chromedriver.chromium.org/). The driver version must be compatible with the installed Chrome browser; a mismatch (e.g., a newly downloaded driver that does not support an older browser) can prevent sessions from starting. The version that worked here is ChromeDriver 89.0.4389.23
* Download [Selenium Server](https://www.selenium.dev/downloads/)
* Launch cmd.exe and change directory to D:/My Software, where selenium-server-standalone-3.141.59.jar is located
```shell!
D:
cd D:\My Software
REM Start the Selenium server and point it at the Chrome driver
java -Dwebdriver.chrome.driver="C:\drivers\chromedriver_win32\chromedriver.exe" -jar selenium-server-standalone-3.141.59.jar
```
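The version caveat above can be checked before starting the server. This is a minimal sketch, assuming that a ChromeDriver release targets the Chrome release with the same major version number. `major_version()` and `versions_match()` are hypothetical helper names, and the browser version string is the one reported in the Selenium session info further down this note (chrome=89.0.4389.90).

```r
# Hypothetical helpers: compare the major versions of Chrome and ChromeDriver
major_version <- function(version) {
  # "89.0.4389.23" -> 89
  as.integer(strsplit(version, ".", fixed = TRUE)[[1]][1])
}

versions_match <- function(browser, driver) {
  major_version(browser) == major_version(driver)
}

versions_match(browser = "89.0.4389.90", driver = "89.0.4389.23") # TRUE
```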
* Open http://localhost:4444/ in a new Google Chrome tab to check that the server is running
* Create a new session at http://localhost:4444/wd/hub/static/resource/hub.html, selecting chrome as the browser
![](https://i.imgur.com/rIjhPls.png)
* Make a connection to the server at http://localhost:4444/wd/hub/static/resource/hub.html
```r!
con <- RSelenium::remoteDriver(remoteServerAddr="localhost"
,port=4444
,browserName="chrome")
# Open the connection
con$open()
# Send a URL to the new session
con$navigate("https://quickfs.net/company/CKF:AU")
```
* The website should open in the new session. If an error occurs, navigate to the URL again
![](https://i.imgur.com/WOcde5X.png)
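Retrying by hand can be replaced with a small wrapper. The sketch below is an assumption about how one might automate it, not part of RSelenium: `with_retry()` is a hypothetical helper that calls a function until it succeeds or the attempts run out. The last comment shows how it might wrap `con$navigate()`.

```r
# Hypothetical helper: call `action` until it succeeds or `attempts` is reached
with_retry <- function(action, attempts = 3, pause.seconds = 2) {
  for (i in seq_len(attempts)) {
    # Capture the error object instead of aborting
    result <- tryCatch(action(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < attempts) Sys.sleep(pause.seconds)
  }
  stop("All ", attempts, " attempts failed: ", conditionMessage(result))
}

# Usage with a live session (not run here):
# with_retry(function() con$navigate("https://quickfs.net/company/CKF:AU"))
```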
---
## Scrape all tables of the webpage where dropdown option is "Overview"
```r!
#-------------------------------------------------------------
# Dropdown= Overview
# Select "overview" from the dropdown list and then get the content of all tables
#-------------------------------------------------------------
# htmlParse() and readHTMLTable() come from the XML package
library(XML, lib.loc = dir.R.packages)
tables <- htmlParse(con$getPageSource()[[1]]) # class(tables)
readHTMLTable(tables)
# Extracting tables
library(rvest, lib.loc = dir.R.packages)
x <- con$getPageSource()[[1]] %>%
read_html() %>%
html_table()
table.1 <- x[[1]] # class(table.1) "data.frame"
table.2 <- x[[2]] # class(table.2) "data.frame"
# Extract sub tables
names(table.1)
str(table.1)
# Reshape the table Valuation Ratios
library(tidyr, lib.loc = dir.R.packages)
table.1.1 <- table.1[c(2:9),c(1:2)] %>%
dplyr::rename(name=`Key Statistics`
,value=`Key Statistics.1`) %>%
tidyr::pivot_wider(names_from = name, values_from=value)
# Clean column names
colnames(table.1.1) <- sub(x=colnames(table.1.1), pattern = "/", replacement = ".")
# Reshape the table 10-Yr Median Returns
table.1.2 <- table.1[c(2:4),c(3:4)] %>%
dplyr::rename(name=`Key Statistics`
,value=`Key Statistics.1`) %>%
tidyr::pivot_wider(names_from = name, values_from=value)
# Reshape the table 10-Year CAGR
table.1.3 <- table.1[c(6:9),c(3:4)] %>%
dplyr::rename(name=`Key Statistics`
,value=`Key Statistics.1`) %>%
tidyr::pivot_wider(names_from = name, values_from=value)
# Reshape the table 10-Yr Median Margins
table.1.4 <- table.1[c(2:5),c(5:6)] %>%
dplyr::rename(name=`Key Statistics`
,value=`Key Statistics.1`) %>%
tidyr::pivot_wider(names_from = name, values_from=value)
# Reshape the table Capital Structure
table.1.5 <- table.1[c(7:9),c(5:6)] %>%
dplyr::rename(name=`Key Statistics`
,value=`Key Statistics.1`) %>%
tidyr::pivot_wider(names_from = name, values_from=value)
#----------------------------------------
# Reshape the table with 10 year overview
#----------------------------------------
str(table.2)
table.2$X1 <- table.2$X1 %>%
gsub(x=., pattern = " ", replacement = ".") %>%
gsub(x=., pattern = "%", replacement = "percent")
base.table.2 <- data.frame()
iterators <- colnames(table.2)[2:11]
years <- table.2[1,c(2:11)] # dim(years) 1 10
item.names <- table.2[c(2:14),1]
for(i in 1:ncol(years)){
  # Get the year and its column name by position
  year <- years[1,i] # "2011"
  name <- colnames(years)[i]
  # Reshape a single year of data to long format
  .year.long <- data.frame( name=table.2[c(2:14), 1]
                           ,value=table.2[c(2:14),i+1]
                           ,stringsAsFactors = FALSE)
  # Spread the long format into a single wide row
  .year.wide <- .year.long %>%
    tidyr::pivot_wider(names_from = name, values_from=value) %>%
    # Add year
    dplyr::mutate(year=year) %>%
    dplyr::select(year,dplyr::everything())
  # Vertically add the current year of data to the base data set
  base.table.2 <- dplyr::bind_rows(base.table.2, .year.wide)
}
```
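The year-by-year loop above amounts to transposing the table: the first row holds the years and the first column holds the item names. A loop-free alternative in base R might look like the sketch below. `wide_by_year()` is a hypothetical helper, and `toy` is a made-up stand-in for `table.2` with invented values, used only to show the shape of the result.

```r
# Hypothetical helper: turn an items-by-years table into one row per year
wide_by_year <- function(tbl) {
  years  <- unlist(tbl[1, -1], use.names = FALSE)   # first row: year labels
  items  <- tbl[[1]][-1]                            # first column: item names
  values <- t(as.matrix(tbl[-1, -1, drop = FALSE])) # rows become years
  out <- data.frame(year = years, values,
                    stringsAsFactors = FALSE, check.names = FALSE)
  colnames(out) <- c("year", items)
  rownames(out) <- NULL
  out
}

# Made-up data in the same layout as table.2 (values are invented)
toy <- data.frame(X1 = c("", "Revenue", "Net.Income"),
                  X2 = c("2011", "10", "1"),
                  X3 = c("2012", "12", "2"),
                  stringsAsFactors = FALSE)
toy.wide <- wide_by_year(toy)
str(toy.wide)
```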
---
## Scrape tables on the webpage where dropdown option is "Income Statement" (not working)
Inspect the web elements of the dropdown
![](https://i.imgur.com/9U7miQu.png)
```htmlmixed!
<div _ngcontent-c1="" class="col-xs-offset-3 col-xs-2"><select-fs-dropdown _ngcontent-c1="" _nghost-c4="">
<div _ngcontent-c4="" class="btn-group open" dropdown="">
<button _ngcontent-c4="" class="selectDropdown dropdown-toggle" dropdowntoggle="" type="button" aria-haspopup="true" aria-expanded="true">
<div _ngcontent-c4="" class="dropdownLabel">Overview</div>
</button>
<!----><ul _ngcontent-c4="" class="dropdown-menu" id="select-fs-dropdown" role="menu">
<!----><li _ngcontent-c4="">
<a _ngcontent-c4="" id="ovr">Overview</a>
</li><li _ngcontent-c4="">
<a _ngcontent-c4="" id="is">Income Statement</a>
</li><li _ngcontent-c4="">
<a _ngcontent-c4="" id="bs">Balance Sheet</a>
</li><li _ngcontent-c4="">
<a _ngcontent-c4="" id="cf">Cash Flow Statement</a>
</li><li _ngcontent-c4="">
<a _ngcontent-c4="" id="ratios">Key Ratios</a>
</li>
</ul>
</div>
</select-fs-dropdown></div>
```
```r!
id.ovr <- con$findElement(using = 'id', value = "ovr")
# Selenium message:no such element: Unable to locate element: {"method":"css selector","selector":"#ovr"}
# (Session info: chrome=89.0.4389.90)
# For documentation on this error, please visit: https://www.seleniumhq.org/exceptions/no_such_element.html
# Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:25:53'
# System info: host: 'CHANG-PC', ip: '192.168.0.167', os.name: 'Windows 7', os.arch: 'amd64', os.version: '6.1', java.version: '1.8.0_231'
# Driver info: driver.version: unknown
# Error: Summary: NoSuchElement
# Detail: An element could not be located on the page using the given search parameters.
# class: org.openqa.selenium.NoSuchElementException
# Further Details: run errorDetails method
```
[R - Rselenium - navigate drop down menu / list / box using = 'id'](https://stackoverflow.com/questions/39713466/r-rselenium-navigate-drop-down-menu-list-box-using-id)
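One possible cause, which is an assumption and has not been verified against QuickFS, is that the `<a id="ovr">` items exist in the page only while the dropdown is open: the inspected HTML was captured with `class="btn-group open"` and `aria-expanded="true"`. Under that assumption, clicking the toggle button first and then looking up the option by id may work. The sketch below uses the RSelenium methods `findElement()`, `clickElement()` and `getPageSource()`. `select_statement()` is a hypothetical helper, and the CSS selector is taken from the button's class in the HTML above.

```r
# Hypothetical helper: open the dropdown, click one option, return the page source
# option.id is one of the ids seen in the inspected HTML: "ovr", "is", "bs", "cf", "ratios"
select_statement <- function(con, option.id, wait.seconds = 1) {
  # Open the dropdown so that its menu items are rendered (assumption)
  toggle <- con$findElement(using = "css selector", value = "button.selectDropdown")
  toggle$clickElement()
  Sys.sleep(wait.seconds)
  # Click the requested option
  option <- con$findElement(using = "id", value = option.id)
  option$clickElement()
  # Give the page time to redraw its tables
  Sys.sleep(wait.seconds)
  invisible(con$getPageSource()[[1]])
}

# Usage with a live session (not run here):
# page <- select_statement(con, "is")
# income.tables <- rvest::html_table(rvest::read_html(page))
```

If the element is still not found, a longer wait or a check for an enclosing iframe would be the next things to try.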