The Journal of Importing Open Data Address in Taiwan into OpenStreetMap
slide: https://hackmd.io/@osm-tw/HkNyi84oR
CC-BY-4.0 OpenStreetMap Taiwan Community
OpenStreetMap Taiwan
9/7
Ta̍k-ke hó, hello everyone! This is Dennis Raylin Chen from Taiwan. I want to talk about cleaning and managing datasets. The title of my talk is "The Journal of Importing Open Data Address in Taiwan into OpenStreetMap", and I will focus on importing address datasets.
Who am I?
My online ID is Supaplex. I am a member of the OpenStreetMap Taiwan and Wikidata Taiwan communities, and I currently serve on the board of directors of Wikimedia Taiwan.
Monthly Meetup in Taipei, co-hosted by OpenStreetMap Taiwan and Wikidata Taiwan
Members of the two communities largely overlap
Monitoring and mapping major changes, tagging scheme discussions
I am one of the co-hosts of the monthly meetup in Taipei, which we run together with the Wikidata Taiwan community. There is a huge overlap in membership between the Wikidata and OpenStreetMap communities in Taiwan. The OpenStreetMap Taiwan community keeps track of major development sites and sometimes discusses tagging schemes for mapping in Taiwan.
Tracking vandalism in Taiwan and recovering from it
Easily spotted with QA tools such as OSMCha
Tracking vandalism in Taiwan and recovering from it
Example: a cross-Taiwan Strait railway, planned only by China
Sometimes Chinese mappers make unrealistic edits, for example a cross-Taiwan Strait railway. It might be acceptable to add it with a proposed status, but most editors are not aware of that, and it can be hard to distinguish real features from imaginary ones.
Visualization of Villages in Taiwan
First Attempt at Using the HOT Tasking Manager during a Mapping Party
Stuff that We Imported
Village boundary - semi-imported
Drinking Water
Place names in local languages
Hydrant
We have dealt with AEDs, emergency shelters, ETC tolls, addresses, village boundaries, drinking water, place names in national languages, fire hydrants, etc.
Past (Bad) Experience: Emergency Shelters Went to Null Island
The Taiwanese government released an emergency shelter dataset. The dataset is quite large, and some of its records are located at 0,0, a place nicknamed Null Island.
Past (Bad) Experience: Removing Nodes outside Taiwan
So the international community, in this case a German mapper, decided to step in and delete the wrong, low-quality shelter data. In the early days of the Taiwan community, we were not able to clean up our own mess.
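A pre-import check would have caught both problems above: records at 0,0 and records far outside Taiwan. The following is a minimal sketch, not the community's actual tooling. The bounding box is a rough assumption chosen for illustration, not an official extent. The record layout (`lon`, `lat` keys) is also hypothetical.

```python
# Rough bounding box around Taiwan and its outlying islands (Penghu, Kinmen,
# Matsu, Pratas). These bounds are an ASSUMPTION for illustration only,
# not an official extent.
TAIWAN_BBOX = (116.0, 20.0, 123.0, 27.0)  # (min_lon, min_lat, max_lon, max_lat)


def is_null_island(lon: float, lat: float, tol: float = 1e-6) -> bool:
    """True if the point sits at (or extremely close to) 0,0."""
    return abs(lon) < tol and abs(lat) < tol


def inside_bbox(lon: float, lat: float, bbox=TAIWAN_BBOX) -> bool:
    """True if the point falls inside the (min_lon, min_lat, max_lon, max_lat) box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def triage(records):
    """Split records into importable ones and (record, reason) pairs for review."""
    keep, review = [], []
    for rec in records:
        lon, lat = rec["lon"], rec["lat"]
        if is_null_island(lon, lat):
            review.append((rec, "null_island"))
        elif not inside_bbox(lon, lat):
            review.append((rec, "outside_taiwan"))
        else:
            keep.append(rec)
    return keep, review
```

Records in `review` are meant to be inspected by a person rather than silently dropped. A wrong coordinate may still belong to a real shelter.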
Village boundary import
Change of codes in the Household Registration and Conscription Information System
Directorate-General of Budget, Accounting and Statistics->Department of Household Registration
Merging of Villages
e.g., splitting due to a large population
A program to monitor changes in the source data
We found out that villages sometimes change. What made things worse is that we didn't use the newest village dataset for the import. This data was originally handled by the Directorate-General of Budget, Accounting and Statistics and was later handed over to the Department of Household Registration. In addition, local governments sometimes merge or split villages.
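One simple way to monitor the source is to keep a snapshot of each release and diff it against the next one. The sketch below assumes both snapshots are already parsed into dictionaries that map a village code to a village name. The codes in the test are hypothetical placeholders, and downloading and parsing the real dataset is not shown.

```python
from typing import Dict


def diff_villages(old: Dict[str, str], new: Dict[str, str]) -> dict:
    """Compare two snapshots that map a village code to a village name.

    A merge typically shows up as removed codes, a split as added codes,
    and a renamed village as a code whose name changed.
    """
    old_codes, new_codes = set(old), set(new)
    return {
        "added": sorted(new_codes - old_codes),
        "removed": sorted(old_codes - new_codes),
        "renamed": sorted(
            (code, old[code], new[code])
            for code in old_codes & new_codes
            if old[code] != new[code]
        ),
    }
```

A non-empty diff tells you which village boundaries in OpenStreetMap need another look before any further import.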
Network Analysis of Administrative Units
This is an analysis by a Chinese mapper. There are some strange parent-child relations in Taiwan's administrative hierarchy, for example a township relation with no villages, and a single village belonging to multiple township relations.
Cebuano Wikipedia Lsjbot: Mass River Import
The Cebuano Wikipedia is largely bot-generated: a bot mass-created its articles, and many river items were also created from the GNS dataset.
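Both anomalies described in the analysis above can be detected with a parent/child index. The sketch assumes the township-village memberships have already been extracted from OSM relations, for example with a separate Overpass query that is not shown. The IDs in the test are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def find_anomalies(
    townships: Iterable[str], memberships: Iterable[Tuple[str, str]]
) -> Tuple[List[str], List[str]]:
    """Return (townships with no villages, villages with several parent townships).

    `memberships` holds (village_id, township_id) pairs, one for each time a
    village appears as a member of a township relation.
    """
    children = defaultdict(set)
    parents = defaultdict(set)
    for village, township in memberships:
        children[township].add(village)
        parents[village].add(township)
    empty = sorted(t for t in townships if not children.get(t))
    multi_parent = sorted(v for v, ps in parents.items() if len(ps) > 1)
    return empty, multi_parent
```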
River Dataset
Not every River has River Code
Wikidata(ceb) items import from GNS
The National map from NLSC
Community matching rivers and creeks with Wikidata and River Code
The Taiwanese government spends a great deal on rivers and assigns a river code to each one. However, the river code list is quite small compared to the actual number of rivers in Taiwan. We have to add more rivers to both Wikidata and OpenStreetMap, even though these rivers are not in the river code list.
Address-Taichung, the First Project
GitHub Taichung Address Import Process
The first case in Taiwan was importing the Taichung address nodes. typebrook is quite familiar with Linux, so he used a bunch of Linux command-line tools.
Address-Housenumber (號) Suffix
The community discussed the best way to handle the 號 ("number") suffix: keep it or remove it. We decided to keep the suffix.
Address-Taichung WGS84
Taichung City was the first to release an address dataset. We had to convert it into an OpenStreetMap-compatible format. Originally, the city and district were given as household registration system codes, and the numbers in lane, alley, and housenumber were full-width characters. All of these had to be converted.
Address-Text processor
Regex
Household registration system codes into normal administrative names
Full-width characters into half-width characters
Split road-name and non-road-name address systems
Combine road, lane, and alley
Split housenumber, floor, and unit
addr:full (not in Taichung)
As mentioned, you have to convert the administrative codes into administrative names, convert full-width digits into half-width digits, and combine the road, lane, and alley fields into one field. In the government-released dataset, the floor and unit are in the same field, so they also have to be split into separate fields. To make the whole process smoother, we also split the road-name and non-road-name address systems into different workflows. For human readability, I also composed the full address in addr:full.
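The conversion steps above can be sketched as one function per road-name record. This is a minimal Python sketch, not the community's actual tooling; the Taichung import used Linux command-line tools. The following are all assumptions for illustration:

- the input field names (`admin_code`, `road`, `lane`, `alley`, `number`, `floor`);
- the code table entry;
- the `3樓之1` floor format;
- the choice of `addr:street` for the combined road/lane/alley value.

```python
import re

# HYPOTHETICAL code table: the real household registration codes are not
# reproduced here, only the shape of the lookup.
ADMIN_CODES = {"H0001": ("臺中市", "西屯區")}

# Full-width digits U+FF10..U+FF19 -> ASCII digits "0".."9".
DIGITS = {0xFF10 + i: 0x30 + i for i in range(10)}

# ASSUMED floor/unit format such as "3樓之1" (floor 3, unit 1).
FLOOR_RE = re.compile(r"([0-9]+)樓(?:之([0-9]+))?")


def is_road_address(rec: dict) -> bool:
    """Road-name records go through this workflow; others get their own."""
    return bool(rec.get("road"))


def build_tags(rec: dict) -> dict:
    city, district = ADMIN_CODES[rec["admin_code"]]
    # Combine road, lane and alley into a single street value.
    street = "".join(rec.get(k, "") for k in ("road", "lane", "alley")).translate(DIGITS)
    number = rec["number"].translate(DIGITS)  # keeps the 號 suffix
    floor_text = rec.get("floor", "").translate(DIGITS)
    tags = {
        "addr:city": city,
        "addr:district": district,
        "addr:street": street,
        "addr:housenumber": number,
    }
    # Split the combined floor/unit field when it matches the assumed format.
    m = FLOOR_RE.fullmatch(floor_text)
    if m:
        tags["addr:floor"] = m.group(1)
        if m.group(2):
            tags["addr:unit"] = m.group(2)
    # Human-readable full address; unparsed floor text is kept verbatim here.
    tags["addr:full"] = city + district + street + number + floor_text
    return tags
```

Records for which `is_road_address` returns False would be routed to a separate non-road-name workflow, which is not shown.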
Address-Cities and Counties
Six Cities: Taipei, Taichung, Taoyuan, Tainan, Kaohsiung
Provincial City: Hsinchu
Counties: Taitung, Miaoli, Yunlin
Most Recent Case: New Taipei
There is also something strange about the Taipei City address nodes: you have to add an offset to the whole dataset. The cities and counties in this list have all been processed.
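The talk does not say how large the Taipei offset is or how it was determined. The sketch below simply assumes the offset is a constant shift in longitude and latitude, estimated as the mean difference between a few dataset points and matching reference points (for example, positions checked against imagery). The helper names and the test values are hypothetical. A real datum problem may not be a uniform shift.

```python
from statistics import mean
from typing import List, Tuple

Point = Tuple[float, float]  # (lon, lat)


def estimate_offset(dataset_pts: List[Point], reference_pts: List[Point]) -> Point:
    """Mean (dlon, dlat) that moves dataset points onto matching reference points."""
    dlon = mean(r[0] - d[0] for d, r in zip(dataset_pts, reference_pts))
    dlat = mean(r[1] - d[1] for d, r in zip(dataset_pts, reference_pts))
    return dlon, dlat


def apply_offset(points: List[Point], offset: Point) -> List[Point]:
    """Shift every point by the same (dlon, dlat)."""
    dlon, dlat = offset
    return [(lon + dlon, lat + dlat) for lon, lat in points]
```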
Full-Width Character Figures
Let's look at the full-width figure problem. Those full-width characters have to be converted into half-width characters.
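The full-width ASCII block in Unicode sits at a fixed distance from ordinary ASCII, so the conversion can be done with a translation table. This is a sketch of one way to do it, not necessarily the approach the community used.

```python
# Full-width ASCII forms U+FF01..U+FF5E sit exactly 0xFEE0 above their
# half-width counterparts U+0021..U+007E; the ideographic space U+3000
# maps to an ordinary space.
FULLWIDTH_TO_HALFWIDTH = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
FULLWIDTH_TO_HALFWIDTH[0x3000] = 0x20


def to_halfwidth(text: str) -> str:
    """Convert full-width digits, letters and punctuation; leave Han characters alone."""
    return text.translate(FULLWIDTH_TO_HALFWIDTH)
```

Unicode NFKC normalization would also convert these characters. However, it also folds CJK compatibility ideographs into other code points, which could matter for the rarely used characters in place names, so a narrow table is the more conservative choice.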
Rarely Used Characters in Place Names
Not all Han characters (hanji) are well supported in computer systems, so we have to find the right character to replace each question-mark placeholder.
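Finding the right character is manual work, but detecting which records still need it can be automated. The sketch below flags names that contain question marks or U+FFFD. It also flags Private Use Area code points, on the assumption that some source systems store rare characters there. It then applies a hand-curated table of fixes keyed by a hypothetical record ID.

```python
# Characters that usually mean "this glyph was lost": ASCII and full-width
# question marks and U+FFFD. Private Use Area code points are also flagged,
# ASSUMING some source systems store rare characters there.
PLACEHOLDERS = {"?", "\uff1f", "\ufffd"}


def needs_fix(name: str) -> bool:
    return any(ch in PLACEHOLDERS or 0xE000 <= ord(ch) <= 0xF8FF for ch in name)


def apply_fixes(records: list, fixes: dict) -> list:
    """Apply a hand-curated {record_id: corrected_name} table.

    Returns ids of records whose names still contain placeholders.
    """
    unresolved = []
    for rec in records:
        if rec["id"] in fixes:
            rec["name"] = fixes[rec["id"]]
        if needs_fix(rec["name"]):
            unresolved.append(rec["id"])
    return unresolved
```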
Special Housenumber
There are some special address formats, and we have to keep them in mind when processing these address nodes.
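The talk does not list which special formats occur. One cautious approach is to define the regular pattern and send everything else to manual review. In this sketch the regular pattern is digits, then an optional 之 sub-number, then the 號 suffix (the suffix the community chose to keep), as in `7號` or `7之1號`. Digits are assumed to be half-width already.

```python
import re
from typing import Optional, Tuple

# Regular housenumber: digits, optional 之 sub-number, 號 suffix,
# e.g. "7號" or "7之1號". Digits are assumed to be half-width already.
REGULAR_HOUSENUMBER = re.compile(r"([0-9]+)(?:之([0-9]+))?號")


def parse_housenumber(value: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (base, sub) for a regular housenumber, or None if it is special."""
    m = REGULAR_HOUSENUMBER.fullmatch(value)
    if m is None:
        return None
    sub = int(m.group(2)) if m.group(2) else None
    return int(m.group(1)), sub


def partition(values):
    """Split values into regular housenumbers and special ones for review."""
    regular, special = [], []
    for v in values:
        (regular if parse_housenumber(v) else special).append(v)
    return regular, special
```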
Filling Street Names by Address Nodes
External tools such as Osmose can detect nameless streets, and they found a great many of them.
OSM-Fr Comments
Osmose is a tool maintained by OSM-Fr. Because of Taiwan's recent address imports, they came to the Taiwan forum page to ask us what was going on. We Taiwanese mappers have to spend a large amount of time adding names to nameless streets based on the imported address dataset.
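The naming work described above was done by mappers. A helper can still propose a candidate name for each nameless street, as long as a person reviews the result before any edit. The sketch below is a hypothetical helper of that kind, not an existing tool. It takes a majority vote over the `addr:street` values of address nodes within an arbitrary 30 m of the way's vertices. Measuring to vertices only, not to segments, is a simplification.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

EARTH_RADIUS_M = 6371000.0


def approx_distance_m(lon1, lat1, lon2, lat2) -> float:
    """Equirectangular approximation; adequate over tens of meters."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)


def suggest_street_name(
    way_coords: List[Tuple[float, float]],
    address_nodes: List[dict],
    radius_m: float = 30.0,
) -> Optional[str]:
    """Majority vote over addr:street of address nodes near the way's vertices."""
    votes = Counter()
    for node in address_nodes:
        street = node.get("addr:street")
        if not street:
            continue
        if any(
            approx_distance_m(node["lon"], node["lat"], lon, lat) <= radius_m
            for lon, lat in way_coords
        ):
            votes[street] += 1
    return votes.most_common(1)[0][0] if votes else None
```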
Address Nodes of Taipei 101
Taipei 101 is one of the tallest buildings in the world. Its address is No. 7, Section 5, Xinyi Road, Xinyi District, Taipei City (臺北市信義區信義路五段7號), and each of its 101 floors has at least one address node.
99 housenumber nodes of Dàqún guǎn/kǎixuán yuàn Building
The KMT presidential candidate owns a property with 99 housenumbers, which caused controversy during the election. We found that the dataset included all 99 housenumber entries, and of course all of them were imported into OpenStreetMap.
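Cases like Taipei 101 and the 99-housenumber property are legitimate, but worth reviewing. A coarse way to surface them is to count address nodes that share nearly the same position. This sketch assumes such nodes were placed at (nearly) identical coordinates, which the talk does not state. The threshold of 10 nodes is an arbitrary choice.

```python
from collections import defaultdict


def stacked_groups(nodes, decimals: int = 5, threshold: int = 10) -> dict:
    """Group address nodes by rounded (lat, lon) and keep the large groups.

    Five decimals is roughly one meter. Points that straddle a rounding
    boundary can land in neighboring cells, so this is a coarse filter.
    """
    groups = defaultdict(list)
    for node in nodes:
        key = (round(node["lat"], decimals), round(node["lon"], decimals))
        groups[key].append(node["id"])
    return {key: ids for key, ids in groups.items() if len(ids) >= threshold}
```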
Housenumber Rearrange
Here is an example of a non-road-name address converted to the road-name address system: a place near Tainan High Speed Rail Station.
August 8 Flood and the Destruction of Xiaolin Village
The 88 Flood destroyed Xiaolin Village, but its address nodes are still in the dataset and have not been removed.
Other Imported Datasets
AED
ETC Toll
Fire Hydrant
iBox
Under Process: Power Cabin, Street Lamp, Power Pole
AEDs, ETC tolls, fire hydrants, and iBox have been imported. Power cabins, street lamps, and power poles are still in process.
Here is my contact information. To-siā, sṳ̀n-mùng-ǹ, thank you!