###### tags: `C#` `Web Scraping`
# T-Cat (黑貓) - Shipment Tracking
* Main technology: AngleSharp
## T-Cat shipment tracking URL
> URL:
> https://www.t-cat.com.tw/Inquire/TraceDetail.aspx?BillID={waybillId}
> Example:
> https://www.t-cat.com.tw/Inquire/TraceDetail.aspx?BillID=906999479515
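The URL pattern can be wrapped in a small helper so the waybill number is always trimmed and escaped before it goes into the query string. This is only a sketch: `TCatUrl` and `BuildTraceDetailUrl` are hypothetical names that do not appear elsewhere in this note.

```csharp
using System;

public static class TCatUrl
{
    // Address of the tracking page, taken from the URL pattern in this section.
    private const string TraceDetailUrl = "https://www.t-cat.com.tw/Inquire/TraceDetail.aspx";

    // Hypothetical helper: builds the tracking URL for a single waybill number.
    public static string BuildTraceDetailUrl(string waybillId)
    {
        if (string.IsNullOrWhiteSpace(waybillId))
            throw new ArgumentException("A waybill number is required.", nameof(waybillId));

        // Trim the input and escape it so stray characters cannot alter the query string.
        return $"{TraceDetailUrl}?BillID={Uri.EscapeDataString(waybillId.Trim())}";
    }
}
```

Calling `TCatUrl.BuildTraceDetailUrl(" 906999479515 ")` yields the example URL shown in this section.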

## Features to look for in the HTML
* The table has the class `tablelist` (CSS selector `.tablelist`)
* The data `td` cells have the class `style1` (CSS selector `td.style1`)
```htmlembedded=
<!-------------------------------------contentContainer----------------------------------------->
<div id="contentContainer">
<!-------------------------------------contentContainer aside--------------------------------->
<div id="aside">
<!-------------------------------------contentContainer main---------------------------------->
<div id="main">
<!--contentsArea-------------------------------------------------->
<div class="contentsArea">
<div class="contentsOne">
<h2 class="typeA">一般包裹查詢</h2>
<div class="contentsBtm">
<div class="contentsInner">
<div class="articleTypeA">
<p class="paddingR18L10">您所輸入的包裹查詢號碼以及查詢結果如下: </p>
<!--<p class="paddingR18L10">點選包裹查詢號碼,可以查詢包裹的歷史狀態;點選營業所可查詢營業所聯絡方式 </p>-->
<p class="paddingR18L10"> </p>
<table cellpadding="0" cellspacing="0" class="tablelist">
<tbody>
<tr class="top">
<td height="38">包裹查詢號碼</td>
<td>目前狀態</td>
<td>資料登入時間</td>
<td>負責營業所</td>
<!--
<td>配送人員</td>
-->
</tr>
<tr valign="center" align="middle" bgcolor="#ffffff">
<td height="44" rowspan="5"><span class="bl12">906999479515</span></td>
<td class="style1" bgcolor="yellow" title="包裹已經送達收件人"> <span class="r2"><strong>順利送達</strong></span> </td>
<td class="style1" bgcolor="yellow"> <div align="center"> <span class="bl12">2021/09/07 <br>12:30</span></div> </td>
<td class="style1" bgcolor="yellow"> <span class="bl12">逢甲營業所</span></td>
</tr>
<tr valign="center" align="middle" bgcolor="#cef4f5">
<td class="style1" title="SD正在將包裹配送到收件人途中"> <span class="bl12">配送中</span> </td>
<td class="style1"> <div align="center"> <span class="bl12">2021/09/07 <br>12:24</span></div> </td>
<td class="style1"> <span class="bl12">逢甲營業所</span></td>
</tr>
<tr valign="center" align="middle" bgcolor="#ffffff">
<td class="style1" title="SD正在將包裹配送到收件人途中"> <span class="bl12">配送中</span> </td>
<td class="style1"> <div align="center"> <span class="bl12">2021/09/07 <br>05:57</span></div> </td>
<td class="style1"> <span class="bl12">逢甲營業所</span></td>
</tr>
<tr valign="center" align="middle" bgcolor="#cef4f5">
<td class="style1" title="包裹正從營業所送到轉運中心,或從轉運中心送到營業所"> <span class="bl12">轉運中</span> </td>
<td class="style1"> <div align="center"> <span class="bl12">2021/09/07 <br>00:02</span></div> </td>
<td class="style1"> <span class="bl12"><a class="text4" href="Foothold_Detail.aspx?ID=500">中區轉運 中心(區)</a></span></td>
</tr>
<tr valign="center" align="middle" bgcolor="#ffffff">
<td class="style1" title="SD已經至寄件人指定地點收到包裹"> <span class="bl12">已集貨</span> </td>
<td class="style1"> <div align="center"> <span class="bl12">2021/09/06 <br>18:54</span></div> </td>
<td class="style1"> <span class="bl12">北二特販二所</span></td>
</tr>
</tbody></table>
... ... (remainder omitted)
</div>
```
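To check that these two selectors are enough to pull out one row, they can be tried against a trimmed-down copy of the markup. The sketch below assumes AngleSharp 0.10 or later (where `HtmlParser.ParseDocument` lives in `AngleSharp.Html.Parser`); `SelectorDemo` and `ReadFirstDataRow` are hypothetical names, and the sample HTML is abbreviated from the excerpt above.

```csharp
using System;
using System.Linq;
using AngleSharp.Html.Parser;

public static class SelectorDemo
{
    // Abbreviated copy of the page markup: one header row and one data row.
    private const string SampleHtml = @"
<table class=""tablelist"">
  <tr class=""top""><td>包裹查詢號碼</td><td>目前狀態</td><td>資料登入時間</td><td>負責營業所</td></tr>
  <tr>
    <td rowspan=""5""><span class=""bl12"">906999479515</span></td>
    <td class=""style1""><span class=""r2""><strong>順利送達</strong></span></td>
    <td class=""style1""><span class=""bl12"">2021/09/07 <br>12:30</span></td>
    <td class=""style1""><span class=""bl12"">逢甲營業所</span></td>
  </tr>
</table>";

    // Returns the trimmed text of the three td.style1 cells in the first data row.
    public static string[] ReadFirstDataRow()
    {
        var document = new HtmlParser().ParseDocument(SampleHtml);

        // All rows of the table; the first one is the header, so skip it.
        var row = document.QuerySelectorAll("table.tablelist tr").Skip(1).First();

        return row.QuerySelectorAll("td.style1")
                  .Select(td => td.TextContent.Trim())
                  .ToArray();
    }
}
```

The waybill number cell has no `style1` class, which is why it is read separately from the first `td` of the first data row in the code later in this note.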
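## About AngleSharp
AngleSharp parses HTML into a DOM that you can query and manipulate, with an API that feels much like working with the DOM in JavaScript.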
### GitHub
* [AngleSharp - GitHub](https://github.com/AngleSharp/AngleSharp)
### Documents
* [AngleSharpDocs](https://anglesharp.github.io/)
## How do we get the data from the page?
### Method 1 - Roll your own HttpClient and get an HttpResponseMessage
```csharp=
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

/// <summary>
/// T-Cat service
/// </summary>
public class TCatService
{
    /// <summary>
    /// HttpClient should be a single shared instance. Creating and disposing one per request
    /// opens new sockets every time and wastes resources.
    /// </summary>
    private static HttpClient _client;

    /// <summary>
    /// Static constructor: runs once, before TCatService is first used
    /// </summary>
    static TCatService()
    {
        // Create the shared HttpClient instance
        CreateInstance();
    }

    /// <summary>
    /// Gets the shared HttpClient
    /// </summary>
    public HttpClient HttpClient => _client;

    /// <summary>
    /// T-Cat root address
    /// </summary>
    private static string RootUri => "https://www.t-cat.com.tw/";

    /// <summary>
    /// Gets the response of the shipment tracking page
    /// </summary>
    /// <param name="waybillID">Waybill number</param>
    /// <returns>The raw HTTP response</returns>
    public async Task<HttpResponseMessage> GetTraceDetail_Response(string waybillID)
    {
        // Strip surrounding whitespace
        waybillID = waybillID?.Trim();
        // Send the request
        //=========================================
        // API (shipment tracking) => Inquire/TraceDetail.aspx
        // -
        // Parameter: BillID
        // ========================================
        var requestUri = $"Inquire/TraceDetail.aspx?BillID={waybillID}";
        return await _client.GetAsync(requestUri);
    }

    /// <summary>
    /// Creates the HttpClient instance
    /// </summary>
    private static void CreateInstance()
    {
        // Root address
        var baseUri = new Uri(RootUri);
        // Create the instance
        _client = new HttpClient();
        // Set the base address
        _client.BaseAddress = baseUri;
        // Close a connection after it has been open for 1 minute; the default is -1 (never close)
        var sp = ServicePointManager.FindServicePoint(baseUri);
        sp.ConnectionLeaseTimeout = (int)TimeSpan.FromMinutes(1).TotalMilliseconds;
        // Refresh DNS every 2 minutes. The value is in milliseconds; the default is 120000 (2 minutes)
        ServicePointManager.DnsRefreshTimeout = (int)TimeSpan.FromMinutes(2).TotalMilliseconds;
    }
}
```
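The `ServicePointManager` settings above are a .NET Framework mechanism. As far as I know, on .NET Core 2.1+ and .NET 5+ they do not influence `HttpClient`, and the usual replacement is `SocketsHttpHandler.PooledConnectionLifetime`. The following is a minimal sketch under that assumption; `TCatHttp` is a hypothetical name and is not part of the class above.

```csharp
using System;
using System.Net.Http;

public static class TCatHttp
{
    // One shared client for the whole process, same idea as the static field in TCatService.
    public static readonly HttpClient Client = Create();

    private static HttpClient Create()
    {
        var handler = new SocketsHttpHandler
        {
            // Recycle pooled connections after 1 minute so that DNS changes are picked up.
            PooledConnectionLifetime = TimeSpan.FromMinutes(1)
        };

        return new HttpClient(handler)
        {
            // Root address used by the relative request URIs.
            BaseAddress = new Uri("https://www.t-cat.com.tw/")
        };
    }
}
```

With this in place, a request would look like `await TCatHttp.Client.GetAsync("Inquire/TraceDetail.aspx?BillID=906999479515")`.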
### Method 2 - Use AngleSharp's Context.OpenAsync() to get an IDocument
```csharp=
/// <summary>
/// Gets the shipment status as text
/// </summary>
/// <param name="waybillId">Waybill number</param>
/// <returns>The tracking records as text</returns>
public async Task<string> GetTraceDetail_Result(string waybillId)
{
    // Text to return
    var result = new StringBuilder();
    // Trim surrounding whitespace
    waybillId = waybillId.Trim();
    // Request address
    var address = $"https://www.t-cat.com.tw/Inquire/TraceDetail.aspx?BillID={waybillId}";
    #region AngleSharp setup
    // Use the default configuration for AngleSharp (with the default loader, needed to load by URL)
    var config = Configuration.Default.WithDefaultLoader();
    // Create a new context for evaluating web pages with the given config
    var context = BrowsingContext.New(config);
    // Get an HTML parser
    var htmlParser = context.GetService<IHtmlParser>();
    #endregion AngleSharp setup
    // Request the page at the address and parse the response into a document
    var document = await context.OpenAsync(address);
    ... ... (remainder omitted)
}
```
## Getting the TraceDetails objects
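The methods in this section return `List<Cat_TraceDetails>`, but the class itself is not shown in this note. The sketch below is an assumed shape, inferred from the properties the code assigns and from the JSON output at the end of the note; the real class may contain more than this.

```csharp
using System;

// Assumed shape of one tracking record; property names follow the JSON output of this note.
public class Cat_TraceDetails
{
    // Waybill number, e.g. "906999479515"
    public string WaybillId { get; set; }

    // Current status text as shown on the page
    public string GoodStatus { get; set; }

    // Time the record was entered
    public DateTime ReceiveTime { get; set; }

    // Responsible office
    public string Station { get; set; }
}
```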
### Method 1 - From a `System.Net.Http.HttpResponseMessage`
```csharp=
/// <summary>
/// Gets the shipment status as a list of objects
/// </summary>
/// <param name="response">Response returned by GetTraceDetail_Response</param>
/// <returns>One Cat_TraceDetails per tracking record</returns>
public async Task<List<Cat_TraceDetails>> GetTraceDetail(HttpResponseMessage response)
{
    // Return value
    var result = new List<Cat_TraceDetails>();
    // Throw if the response does not have a success status code
    response.EnsureSuccessStatusCode();
    // Recover the waybillId from the Uri of the original RequestMessage
    // by splitting on the query parameter (BillID)
    var separator = new string[] { "BillID=" };
    var waybillId = response.RequestMessage
        .RequestUri
        .AbsoluteUri.Split(separator, StringSplitOptions.RemoveEmptyEntries)[1];
    // Read the response body
    var res_Content = await response.Content.ReadAsStringAsync();
    #region AngleSharp setup
    // Use the default configuration for AngleSharp (with the default loader)
    var config = Configuration.Default.WithDefaultLoader();
    // Create a new context for evaluating web pages with the given config
    var context = BrowsingContext.New(config);
    #endregion AngleSharp setup
    // Create a virtual request whose content is the HTML string we already downloaded
    var document = await context.OpenAsync(x => x.Content(res_Content));
    // Get the tablelist table under ContentsArea
    var table = document.QuerySelector(".tablelist");
    // Guard against a page without the table, which would otherwise cause a NullReferenceException
    if (table == null)
    {
        throw new Exception($"Waybill ({waybillId}): tracking table not found in the page!!");
    }
    // Get the list of tr rows
    var tr_list = table.QuerySelectorAll("tr");
    // Only the header row came back => data not found
    if (tr_list.Length <= 1)
    {
        throw new Exception($"No tracking records found for waybill ({waybillId})!!");
    }
    // Get the data rows (skip the header row)
    var tr_body = tr_list.Skip(1);
    // Get the waybillId => tr > td > .bl12
    var id = tr_body
        .First()
        .QuerySelector("td")
        ?.QuerySelector(".bl12")
        ?.TextContent
        .Trim();
    // The returned data does not match the request
    if (!waybillId.Equals(id))
    {
        // Throw an error
        throw new Exception($"Requested waybill ({waybillId}) does not match the returned data ({id})!!");
    }
    // Read the data out of tr_body
    result = tr_body
        .Select(tr =>
        {
            // Get the td cells whose class is style1
            var td_list = tr.QuerySelectorAll("td.style1")
                .ToList();
            // Map each row to a Cat_TraceDetails
            return new Cat_TraceDetails
            {
                WaybillId = id,
                GoodStatus = td_list[0].TextContent.Trim(),
                // Convert.ToDateTime parses with the current culture
                ReceiveTime = Convert.ToDateTime(td_list[1].TextContent),
                Station = td_list[2].TextContent.Trim()
            };
        }).ToList();
    // Return the data
    return result;
}
```
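`Convert.ToDateTime` depends on the current culture of the machine, and the timestamp cell contains a line break between the date and the time. A more explicit alternative is to normalize the whitespace and parse with a fixed format. This is a sketch, not what the code above does: `TraceTime` is a hypothetical helper, and the `yyyy/MM/dd HH:mm` format is an assumption based on the sample page shown earlier.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class TraceTime
{
    // Hypothetical helper: parses the text of the timestamp cell, e.g. "2021/09/07 \n12:30".
    public static DateTime Parse(string cellText)
    {
        // Collapse every run of whitespace (spaces, tabs, line breaks) into one space.
        var normalized = Regex.Replace(cellText ?? string.Empty, @"\s+", " ").Trim();

        // Parse with a fixed format and the invariant culture, independent of machine settings.
        return DateTime.ParseExact(normalized, "yyyy/MM/dd HH:mm", CultureInfo.InvariantCulture);
    }
}
```

In the mapping above this would replace the `Convert.ToDateTime(...)` call with `TraceTime.Parse(td_list[1].TextContent)`. It throws a `FormatException` if the page ever changes its date format, which makes such a change easy to notice.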
### Method 2 - From an `AngleSharp.Dom.IDocument`
```csharp=
/// <summary>
/// Gets the shipment status as a list of objects
/// </summary>
/// <param name="waybillId">Waybill number</param>
/// <returns>One Cat_TraceDetails per tracking record</returns>
public async Task<List<Cat_TraceDetails>> GetTraceDetail(string waybillId)
{
    // Return value
    var result = new List<Cat_TraceDetails>();
    // Trim surrounding whitespace
    waybillId = waybillId.Trim();
    // Request address
    var address = $"https://www.t-cat.com.tw/Inquire/TraceDetail.aspx?BillID={waybillId}";
    #region AngleSharp setup
    // Use the default configuration for AngleSharp (with the default loader, needed to load by URL)
    var config = Configuration.Default.WithDefaultLoader();
    // Create a new context for evaluating web pages with the given config
    var context = BrowsingContext.New(config);
    #endregion AngleSharp setup
    // Request the page at the address and parse the response into a document
    var document = await context.OpenAsync(address);
    // Get the tablelist table under ContentsArea
    var table = document.QuerySelector(".tablelist");
    // Guard against a page without the table, which would otherwise cause a NullReferenceException
    if (table == null)
    {
        throw new Exception($"Waybill ({waybillId}): tracking table not found in the page!!");
    }
    // Get the list of tr rows
    var tr_list = table.QuerySelectorAll("tr");
    // Only the header row came back => data not found
    if (tr_list.Length <= 1)
    {
        throw new Exception($"No tracking records found for waybill ({waybillId})!!");
    }
    // Get the data rows (skip the header row)
    var tr_body = tr_list.Skip(1);
    // Get the waybillId => tr > td > .bl12
    var id = tr_body
        .First()
        .QuerySelector("td")
        ?.QuerySelector(".bl12")
        ?.TextContent
        .Trim();
    // The returned data does not match the request
    if (!waybillId.Equals(id))
    {
        // Throw an error
        throw new Exception($"Requested waybill ({waybillId}) does not match the returned data ({id})!!");
    }
    // Read the data out of tr_body
    result = tr_body
        .Select(tr =>
        {
            // Get the td cells whose class is style1
            var td_list = tr.QuerySelectorAll("td.style1")
                .ToList();
            // Map each row to a Cat_TraceDetails
            return new Cat_TraceDetails
            {
                WaybillId = id,
                GoodStatus = td_list[0].TextContent.Trim(),
                // Convert.ToDateTime parses with the current culture
                ReceiveTime = Convert.ToDateTime(td_list[1].TextContent),
                Station = td_list[2].TextContent.Trim()
            };
        }).ToList();
    // Return the data
    return result;
}
```
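Both methods run the same parsing steps once they have a document, so that part could be moved into one shared function that takes an `IDocument`. The sketch below shows the idea with plain tuples instead of `Cat_TraceDetails` so that it stays self-contained; `TraceTableReader` and `ReadRows` are hypothetical names, and AngleSharp 0.10 or later is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AngleSharp.Dom;
using AngleSharp.Html.Parser;

public static class TraceTableReader
{
    // Hypothetical shared helper: returns (status, time text, station) for every data row.
    public static List<(string Status, string Time, string Station)> ReadRows(IDocument document)
    {
        var table = document.QuerySelector(".tablelist");
        if (table == null)
            throw new InvalidOperationException("No .tablelist table in the document.");

        return table.QuerySelectorAll("tr")
            .Skip(1)                                        // skip the header row
            .Select(tr => tr.QuerySelectorAll("td.style1")) // the three data cells of each row
            .Where(cells => cells.Length >= 3)              // ignore rows that lack those cells
            .Select(cells => (
                Status: cells[0].TextContent.Trim(),
                Time: cells[1].TextContent.Trim(),
                Station: cells[2].TextContent.Trim()))
            .ToList();
    }
}
```

Method 1 would pass in the document built from the response body and Method 2 the document returned by `OpenAsync(address)`. For a quick local check, a document can also be built from a string with `new HtmlParser().ParseDocument(html)`.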
## How to use it
```csharp=
static void Main(string[] args)
{
    // Create the T-Cat service
    var cat = new TCatService();
    // Waybill number
    var waybillId = "906999479515";
    try
    {
        #region Method 1 => roll your own HttpClient
        // Get the response.
        // GetAwaiter().GetResult() rethrows the original exception,
        // whereas .Result wraps it in an AggregateException and hides the real message
        var response = cat.GetTraceDetail_Response(waybillId).GetAwaiter().GetResult();
        // Get the tracking records from the response
        var traceDetails = cat.GetTraceDetail(response).GetAwaiter().GetResult();
        #endregion
        #region Method 2 => use OpenAsync on the AngleSharp context to get the document (IDocument)
        //// Get the tracking records (as text)
        //var text = cat.GetTraceDetail_Result(waybillId).GetAwaiter().GetResult();
        //Console.WriteLine(text);
        //// Get the tracking records
        //var traceDetails = cat.GetTraceDetail(waybillId).GetAwaiter().GetResult();
        #endregion
        // Serialize the objects
        var json = JsonConvert.SerializeObject(traceDetails, Formatting.Indented);
        // Print
        Console.WriteLine(json);
    }
    catch (Exception ex)
    {
        Console.WriteLine("An error occurred:");
        Console.WriteLine(ex.Message);
    }
    finally
    {
        Console.ReadKey();
    }
}
```
## Result

```json=
URL: https://www.t-cat.com.tw/Inquire/TraceDetail.aspx?BillID=906999479515
------------------------------------------------------------
包裹查詢號碼 目前狀態 資料登入時間 負責營業所
------------------------------------------------------------
906999479515 順利送達 2021/09/07 12:30 逢甲營業所
906999479515 配送中 2021/09/07 12:24 逢甲營業所
906999479515 配送中 2021/09/07 05:57 逢甲營業所
906999479515 轉運中 2021/09/07 00:02 中區轉運中心(區)
906999479515 已集貨 2021/09/06 18:54 北二特販二所
JSON
------------------------------------------------------------
[
{
"WaybillId": "906999479515",
"GoodStatus": "順利送達",
"ReceiveTime": "2021-09-07T12:30:00",
"Station": "逢甲營業所"
},
{
"WaybillId": "906999479515",
"GoodStatus": "配送中",
"ReceiveTime": "2021-09-07T12:24:00",
"Station": "逢甲營業所"
},
{
"WaybillId": "906999479515",
"GoodStatus": "配送中",
"ReceiveTime": "2021-09-07T05:57:00",
"Station": "逢甲營業所"
},
{
"WaybillId": "906999479515",
"GoodStatus": "轉運中",
"ReceiveTime": "2021-09-07T00:02:00",
"Station": "中區轉運中心(區)"
},
{
"WaybillId": "906999479515",
"GoodStatus": "已集貨",
"ReceiveTime": "2021-09-06T18:54:00",
"Station": "北二特販二所"
}
]
```
## References
* [AngleSharp - GitHub](https://github.com/AngleSharp/AngleSharp)
* [AngleSharpDocs](https://anglesharp.github.io/)
* [輕量高效.NET Core開源Blog引擎:Miniblog.Core - 17.抓取用戶IT鐵人賽文章 - IT邦幫忙](https://ithelp.ithome.com.tw/articles/10202512)
* [【C#】爬蟲抓IT邦問題 Part1 : 爬網頁並篩選資料 - Youtube](https://www.youtube.com/watch?v=Nu1obfoj5j8)
* [用.NET Core做網頁爬蟲抓取資料-使用HttpClicent與AngleSharp - 長庚的作業簿](https://dannyliu.me/%E7%94%A8-net-core%E5%81%9A%E7%B6%B2%E9%A0%81%E7%88%AC%E8%9F%B2%E6%8A%93%E5%8F%96%E8%B3%87%E6%96%99-%E4%BD%BF%E7%94%A8httpclicent%E8%88%87anglesharp/)
* [C# NetCore使用AngleSharp爬取周公解夢資料 - ITRead01](https://www.itread01.com/content/1546204876.html)
* [CSS選擇器 - MDN Web Docs](https://developer.mozilla.org/zh-TW/docs/Glossary/CSS_Selector)