<div style="display:flex; flex-direction:column;">
<div>
![](https://upload.wikimedia.org/wikipedia/commons/b/b0/Openstreetmap_logo.svg =200x) ![](https://wiki.openstreetmap.org/w/images/7/74/OSM-tw.svg =200x)
</div>
<div style="font-size:55%;padding:32px">
# The Journal of Importing Open Data Address in Taiwan into OpenStreetMap
</div>
<!-- Put the link to this slide here so people can follow -->
<div style="font-size:16px;display:flex;background-color:rgb(157 195 145/0.5);padding:32px;justify-content:flex-end;">
<div style="flex-direction:column; text-align:left">
<div>slide: https://hackmd.io/@osm-tw/HkNyi84oR</div>
<div>CC-BY-4.0 OpenStreetMap Taiwan Community</div>
<div>OpenStreetMap Taiwan</div>
<div>9/7</div>
</div>
</div>
</div>
Note:
Ta̍k-ke hó. Hello everyone, I am Dennis Raylin Chen from Taiwan. I want to talk about cleaning and managing datasets. The title of my talk is "The Journal of Importing Open Data Address in Taiwan into OpenStreetMap", and I will focus on importing address datasets.
---
## Who am I?
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
- [Supaplex](https://www.openstreetmap.org/user/Supaplex)
- [OpenStreetMap](https://www.openstreetmap.org) :heart: [Wikidata](https://www.wikidata.org) :heart:
- [Wikimedia Taiwan](https://meta.wikimedia.org/wiki/Wikimedia_Taiwan/) :cat:
Note:
My online ID is Supaplex. I am a community member of OpenStreetMap Taiwan and Wikidata Taiwan, and I currently serve on the board of directors of Wikimedia Taiwan.
---
## [OpenStreetMap Taiwan(Q104641278)](https://www.wikidata.org/wiki/Q104641278) & [Wikidata Taiwan(Q65555605)](https://www.wikidata.org/wiki/Q65555605)
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
* Monthly meetup in Taipei, co-hosted by OpenStreetMap Taiwan and Wikidata Taiwan
* Members overlap between the two communities
* Monitoring and mapping of major changes, tagging scheme discussion
Note:
I am one of the co-hosts of the monthly meetup in Taiwan, which is co-hosted with the Wikidata Taiwan community. There is a huge overlap of community members between Wikidata and OpenStreetMap in Taiwan. The OpenStreetMap Taiwan community keeps track of major development sites, and sometimes discusses tagging schemes for mapping in Taiwan.
---
## Keeping track of vandalism in Taiwan and recovery
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
**Easily spotted by QA tools like OSMCha**
![](https://i.imgur.com/yLDU1uP.jpg =560x)
Note:
It is quite easy to spot vandalism with various QA tools.
---
## Keeping track of vandalism in Taiwan and recovery
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
![](https://i.imgur.com/H4baf3c.png =560x)
Note:
The most annoying cases are unhappy Chinese people adding notes or claiming that China owns Taiwan.
---
## Cross Taiwan Strait Railway planned only by China
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
![](https://i.imgur.com/k8tRdw9.png =560x)
Note:
Sometimes Chinese people make unrealistic edits, for example a cross-Taiwan Strait railway. It might be possible to add it with a proposed status, but most editors are not aware of that. It can be hard to distinguish real features from imaginary ones.
---
## Visualization of Villages in Taiwan
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
[![](https://i.imgur.com/fs9Ds83.png)](https://overpass-turbo.eu/s/1kR3)
Note:
Here is a visualization of all of the nearly 8,000 villages in Taiwan.
---
## First Attempt at Using HOT Tasking Manager during a Mapping Party
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
<div style="display:inline-flex;align-items:left;">
<div left>
![HOT Tasking Manager Project](https://hackmd.io/_uploads/SJ_TAE6C6.png =1600x)
</div>
<div style="font-size:85%;display:flex;background-color:rgb(127 195 140/0.5);padding:1px;justify-content:flex-end;">
<div style="flex-direction:column; text-align:left">
* Date: 2024 03/26
* Venue: NCKU
* Map a Local Project in Taiwan
* OSM Diary: [OpenStreetMap Taiwan x TomTom NCKU Mapping Workshop](https://www.openstreetmap.org/user/Supaplex/diary/403835)
</div>
</div>
</div>
---
## 04/03 07:58:11 UTC+8 Hualien Earthquake
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
![](https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/04.03_%E5%89%AF%E7%B8%BD%E7%B5%B1%E5%89%8D%E5%BE%80%E8%8A%B1%E8%93%AE%E7%9E%AD%E8%A7%A3%E7%81%BD%E5%AE%B3%E6%83%85%E5%BD%A2%E5%8F%8A%E6%95%91%E6%8F%B4%E9%80%B2%E5%BA%A6_-_53629407644_%28cropped%29.jpg/905px-04.03_%E5%89%AF%E7%B8%BD%E7%B5%B1%E5%89%8D%E5%BE%80%E8%8A%B1%E8%93%AE%E7%9E%AD%E8%A7%A3%E7%81%BD%E5%AE%B3%E6%83%85%E5%BD%A2%E5%8F%8A%E6%95%91%E6%8F%B4%E9%80%B2%E5%BA%A6_-_53629407644_%28cropped%29.jpg =800x)
<a href="https://commons.wikimedia.org/wiki/File:04.03_%E5%89%AF%E7%B8%BD%E7%B5%B1%E5%89%8D%E5%BE%80%E8%8A%B1%E8%93%AE%E7%9E%AD%E8%A7%A3%E7%81%BD%E5%AE%B3%E6%83%85%E5%BD%A2%E5%8F%8A%E6%95%91%E6%8F%B4%E9%80%B2%E5%BA%A6_-_53629407644_(cropped).jpg">Presidential Office</a>, <a href="https://creativecommons.org/licenses/by/2.0">CC BY 2.0</a>, via Wikimedia Commons
---
## Stuff that We Imported
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
<div style="display:inline-flex;align-items:center;gap:2rem;">
<div style="flex:1;text-align:left;font-size:80%;left">
* AED
* Emergency Shelter - accuracy
* ETC toll
* iBox
* Address node
</div>
<div style="flex:1;text-align:left;font-size:80%;left">
* Village boundary - semi-imported
* Drinking Water
* Place names in local languages
* Fire hydrant
<div style="flex-direction:column; text-align:left">
</div>
</div>
</div>
Note:
We have dealt with AEDs, emergency shelters, ETC tolls, addresses, village boundaries, drinking water, place names in national languages, fire hydrants, and so on.
---
## Past (bad) Experience: Emergency Shelters went to Null Island
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
[![Image](https://hackmd.io/_uploads/SJHttma-0.png)](https://www.openstreetmap.org/changeset/49185168#map=2/23.2/78.0)
Note:
The Taiwanese government released a dataset of emergency shelters. But the dataset is quite large, and some records have a 0,0 location, the place nicknamed Null Island.
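
A check like the one below, run before upload, can catch such records. This is a minimal sketch and not the script used for the import: the record layout (`lat` and `lon` keys) and the rough bounding box are assumptions made for illustration, and the box would need widening for remote islets.

```python
# Rough bounding box around Taiwan and its nearer outlying islands.
# These limits are an illustrative assumption, not an official extent.
MIN_LON, MAX_LON = 118.0, 123.0
MIN_LAT, MAX_LAT = 21.5, 26.5


def is_plausible(record):
    """Return True if the record's coordinates fall inside the rough box."""
    lat, lon = record["lat"], record["lon"]
    return MIN_LAT <= lat <= MAX_LAT and MIN_LON <= lon <= MAX_LON


def split_records(records):
    """Separate records into (kept, rejected) lists for manual review."""
    kept = [r for r in records if is_plausible(r)]
    rejected = [r for r in records if not is_plausible(r)]
    return kept, rejected
```

Rejected records are returned rather than silently dropped, so they can be reviewed and reported back to the data provider.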
---
## Past (bad) Experience: Removing nodes outside Taiwan
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
[![Image](https://hackmd.io/_uploads/r1yhtQ6b0.png)](https://www.openstreetmap.org/changeset/49183572#map=3/13.92/60.82)
Note:
So the international community, also known as the German mappers, decided to make an adjustment, deleting the wrong and low-quality shelter data. In the early days of the Taiwan community, we were not able to clean up our mess by ourselves.
---
## Village boundary import
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
* Change of Code of Household Registration and Conscription Information System
* Directorate-General of Budget, Accounting and Statistics->Department of Household Registration
* Merging and splitting of villages
* e.g. split due to large population
* Program to monitor changes in the source
Note:
We found out that villages sometimes change. What made things worse is that we didn't use the newest village dataset for the import. The codes used to be handled by the Directorate-General of Budget, Accounting and Statistics, and were then handed over to the Department of Household Registration. And sometimes local governments merge or split villages.
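
Monitoring the source for changes can be as simple as comparing two snapshots of the village list. The sketch below is illustrative only, not the community's monitoring program; it assumes each snapshot has already been loaded into a dictionary mapping a village code to a village name, and the codes in the test data are hypothetical.

```python
def diff_village_codes(old, new):
    """Compare two {code: name} snapshots of a village list.

    Returns the codes that were added, removed, or kept with a new name.
    """
    old_codes, new_codes = set(old), set(new)
    return {
        "added": sorted(new_codes - old_codes),
        "removed": sorted(old_codes - new_codes),
        "renamed": sorted(
            code for code in old_codes & new_codes if old[code] != new[code]
        ),
    }
```

A merge typically shows up as one or more removed codes, and a split as added codes, so any non-empty result is a signal to review the affected boundaries by hand.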
---
## Network Analysis of administrative units
<div style="display:inline-flex;align-items:center;gap:2rem;">
<div style="flex:1;text-align:left;font-size:100%;left" >
![Image](https://hackmd.io/_uploads/r1e-qUabR.png =600x)
</div>
<div style="flex:1;text-align:left;font-size:100%;left">
<div style="flex-direction:column; text-align:left">
![Image](https://hackmd.io/_uploads/r1S4qLabC.png =350x)
</div>
</div>
</div>
Note:
This is an analysis by a Chinese mapper. There are some strange parent-child relations in Taiwan. For example, there are empty township relations with no villages, and single villages with multiple parent township relations.
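
Both anomalies can be found with a simple pass over the membership list. This is a sketch under stated assumptions, not the mapper's actual analysis: it assumes the hierarchy has already been extracted into a list of township identifiers and a list of `(township, village)` pairs, and the function name is hypothetical.

```python
from collections import defaultdict


def check_hierarchy(townships, memberships):
    """Find empty townships and villages with more than one parent.

    townships: iterable of township identifiers
    memberships: iterable of (township, village) pairs
    """
    children = defaultdict(set)  # township -> its villages
    parents = defaultdict(set)   # village -> its townships
    for township, village in memberships:
        children[township].add(village)
        parents[village].add(township)

    empty_townships = sorted(t for t in townships if not children.get(t))
    multi_parent_villages = sorted(
        v for v, p in parents.items() if len(p) > 1
    )
    return empty_townships, multi_parent_villages
```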
---
## Ceb Wiki Lsjbot: mass river import
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
[![](https://i.imgur.com/ZU69q1R.png)](https://www.vice.com/en/article/4agamm/the-worlds-second-largest-wikipedia-is-written-almost-entirely-by-one-bot)
Note:
The Cebuano Wikipedia is largely bot-generated. It uses a bot to mass-create articles, and the bot also created many river items from the GNS dataset.
---
## River Dataset
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
<div style="display:inline-flex;align-items:center;gap:2rem;">
<div style="flex:1;text-align:left;font-size:60%;left" >
* Not every river has a river code
* Wikidata (ceb) items imported from GNS
* The national map from NLSC
* Community matching of rivers and creeks with Wikidata and river codes
</div>
<div style="flex:1;text-align:left;font-size:100%;left">
<div style="flex-direction:column; text-align:left">
![JOSM River Relation](https://i.imgur.com/L3o35Z9.png =600x)
</div>
</div>
</div>
Note:
The Taiwanese government spends quite a lot on rivers in Taiwan, and it assigns river codes to rivers. The list of river codes is quite small compared to the actual number of rivers in Taiwan. We have to add more rivers to both Wikidata and OpenStreetMap, even though these rivers are not on the river code list.
---
## Address-Taichung, the First Project
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
![Taichung GitHub](https://hackmd.io/_uploads/ry1fFsdKA.png)
[GitHub Taichung Address Import Process](https://gist.github.com/typebrook/c03326c77541733045331183c46032c3)
Note:
The first case in Taiwan was importing the Taichung address nodes. typebrook is quite familiar with Linux tools, so he used a bunch of Linux command-line tools.
---
## Address-housenumber(號) Suffix
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
![Facebook-discuss](https://hackmd.io/_uploads/SyMmci_KR.png)
Note:
The community asked what the best way to handle the 號 suffix is, keeping it or removing it, and we decided to keep the suffix.
---
## Address-Taichung WGS84
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
[![Taichung](https://hackmd.io/_uploads/BywrXV6bA.png)](https://gist.github.com/typebrook/c03326c77541733045331183c46032c3?permalink_comment_id=3675141)
Note:
Taichung City was the first to release an address dataset. We had to convert it to an OpenStreetMap-compatible format. Originally the city and district are given as household registration system codes, and the numbers in the lane, alley, and housenumber are full-width characters. All of this has to be converted.
---
## Address-Text processor
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
<div style="display:inline-flex;align-items:center;gap:2rem;">
<div style="flex:1;text-align:left;font-size:70%;left" >
1. Regex
2. Household registration system code into normal administrative name
3. Full-width characters into half-width characters
4. Split road name address system and non-road name address system
5. Combine road, lane, alley
6. Split housenumber, floor, and unit
7. addr:full (not in Taichung)
</div>
<div style="flex:1;text-align:left;font-size:100%;left">
<div style="flex-direction:column; text-align:left">
![Image](https://hackmd.io/_uploads/BJiVi4TbC.png =500x)
</div>
</div>
</div>
Note:
As I mentioned, you have to convert the administrative codes into administrative names, and full-width digits into half-width digits. The road, lane, and alley fields are combined into one field. In the dataset released by the government, the housenumber, floor, and unit are in the same field, so they also have to be split into different fields. To make the whole process much smoother, the road address system and the non-road address system are split into different workflows. And for human readability, I compose the full address as addr:full.
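
Two of these steps, splitting the housenumber field and combining road, lane, and alley, can be sketched as follows. This is an illustration under assumptions, not the actual import script: the input format (for example `7號3樓之1`), the regular expression, and the function names are hypothetical, and digits are assumed to be half-width already.

```python
import re

# Assumed layout: "<number>號", optionally "<floor>樓", optionally "之<unit>".
HOUSENUMBER_RE = re.compile(
    r"^(?P<housenumber>.+?號)"
    r"(?:(?P<floor>\d+)樓)?"
    r"(?:之(?P<unit>\d+))?$"
)


def split_housenumber(raw):
    """Split a combined field into housenumber, floor, and unit.

    Returns None for values that do not fit the assumed layout,
    so they can be reviewed by hand instead of being guessed at.
    """
    match = HOUSENUMBER_RE.match(raw)
    return match.groupdict() if match else None


def compose_street(road, lane="", alley=""):
    """Join road, lane, and alley into a single street string."""
    return f"{road}{lane}{alley}"
```

For example, `split_housenumber("7號3樓之1")` separates `7號`, floor `3`, and unit `1`, while a value such as `12之3號` stays whole as the housenumber, keeping the 號 suffix.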
---
## Address-Cities and Counties
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
<div style="display:inline-flex;align-items:center;gap:2rem;">
<div style="flex:1;text-align:left;font-size:80%;left">
* Six Cities: Taipei, Taichung, Taoyuan, Tainan, Kaohsiung
* Provincial City: Hsinchu
* Counties: Taitung, Miaoli, Yunlin
* Most Recent Case: New Taipei
</div>
<div style="flex:1;text-align:left;font-size:100%;left">
<div style="flex-direction:column; text-align:left">
![Image](https://hackmd.io/_uploads/SkXcyrTW0.png)
</div>
</div>
</div>
Note:
There is also something strange about the Taipei City address nodes: you have to apply an offset to the whole dataset. All of the cities and counties in this list have been processed.
---
## Full-Width Digits
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
![Image](https://hackmd.io/_uploads/Bk-hMSpbA.png)
Note:
Let's look at the full-width digit problem. You have to convert those full-width characters into half-width characters.
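
The conversion itself is a character-for-character mapping. Here is a minimal sketch that handles only the ten full-width digits (U+FF10 to U+FF19) and leaves every other character untouched; the function name is hypothetical and this is not necessarily how the import scripts did it.

```python
# Map each full-width digit (U+FF10..U+FF19) to its ASCII counterpart.
FULLWIDTH_DIGITS = str.maketrans(
    {chr(0xFF10 + i): str(i) for i in range(10)}
)


def to_half_width_digits(text):
    """Replace full-width digits with half-width digits."""
    return text.translate(FULLWIDTH_DIGITS)
```

`unicodedata.normalize("NFKC", text)` from the standard library would also convert full-width digits, but it rewrites other compatibility characters as well, which may not be wanted for place names.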
---
## Place Names with Rarely Used Characters
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
![Image](https://hackmd.io/_uploads/S1vv3NpW0.png)
Note:
Not all hanji (Han characters) are well supported in computer systems, so we have to find the right character to replace each question mark.
---
## Special Housenumber
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
![Image](https://hackmd.io/_uploads/BJ_jn4TWA.png)
Note:
There are some special address formats. We have to keep them in mind when we process these address nodes.
---
## Filling Street Names by Address Nodes
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
![missing road](https://hackmd.io/_uploads/SyGVeH6-R.png)
Note:
Some external tools like Osmose can detect nameless streets, and they found a great many of them.
---
## OSM-Fr Comments
![Image](https://hackmd.io/_uploads/BJjYv3C_C.png)
Note:
Osmose is a tool maintained by OSM-Fr. Because of Taiwan's recent address import tasks, they came to the Taiwan forum page to ask us what the situation was. And Taiwanese mappers have to spend a large amount of time adding names to nameless streets based on the imported address dataset.
---
## Address Nodes of Taipei 101
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
![Taipei 101](https://hackmd.io/_uploads/By-Q-STW0.png)
Note:
Taipei 101 is one of the tallest buildings in the world. Its address is No. 7, Section 5, Xinyi Road, Xinyi District, Taipei City (臺北市信義區信義路五段7號), and each of the 101 floors has at least one node.
---
## 99 housenumber nodes of Dàqún guǎn/kǎixuán yuàn Building
![kǎixuán yuàn](https://hackmd.io/_uploads/rJNZBkWY0.png)
Note:
The KMT presidential candidate owned property covering 99 housenumbers, which caused controversy during the election. We found out that the dataset included all 99 housenumber entries, and of course all of them were imported into OpenStreetMap.
---
## Housenumber Rearrangement
![Tainan High Speed Railway Station](https://hackmd.io/_uploads/S1RjMyZY0.png)
Note:
Here is an example of a non-road name address converted to the road name address system: a place near the Tainan High Speed Railway Station.
---
## August 8 Flood and the Destruction of Xiaolin Village
![Xiaolin Village](https://hackmd.io/_uploads/SJQi-yZtA.png)
Note:
The 88 Flood destroyed Xiaolin Village, but its address nodes are still in the dataset and have not been removed.
---
## Other Imported Datasets
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
<div style="background-color: purple;
color: white;
padding: 10px;
border: solid 3px #0F7391;
margin: 10px;">
* AED
* ETC Toll
* Fire Hydrant
* iBox
* In Progress: Power Cabin, Street Lamp, Power Pole
</div>
Note:
AED, ETC toll, fire hydrant, and iBox. Power cabins, street lamps, and power poles are still in progress.
---
## [To-siā!](https://en.wiktionary.org/wiki/%E5%A4%9A%E8%AC%9D#Chinese) [sṳ̀n-mùng-ǹ!](https://en.wiktionary.org/wiki/%E6%89%BF%E8%92%99%E4%BD%A0) Thank you!
<!-- .slide: data-background="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Wikidatacon_2023_Banner_-_04.png/1024px-Wikidatacon_2023_Banner_-_04.png" data-background-opacity="0.5"-->
- :sheep: [GitHub](https://github.com/Supaplextw/)
- Supaplex: [Wikidata](https://wikidata.org/wiki/User:Supaplex), [OpenStreetMap](https://www.openstreetmap.org/user/Supaplex)
- Or [email](mailto:dennis@wikimedia.tw)
- Facebook groups: [Wikidata Taiwan](https://www.facebook.com/groups/2212207218990971/), [OpenStreetMap Taiwan](https://www.facebook.com/groups/OpenStreetMap.TW/)
Note:
Here is my contact information. To-siā, sṳ̀n-mùng-ǹ! Thank you!