Dianbo Liu

Remote Sensing Project
===

###### tags:

## Overview

The goal is to build a model that estimates poverty for a given region from a satellite image of Nigeria.

### Dataset

- **Images**: 1,252,984 daytime satellite images covering all of Nigeria, taken in 2020, each 400x400 pixels at 2.5 m spatial resolution.
  ![](https://i.imgur.com/aXuELF8.jpg)
- **Targets**:
  - **Survey data (~22k labels at household level)**:
    - **rwi_Subset**: computed over electricity, numasset_smartphone, numasset_regmobilephone, numasset_car, numasset_motorbike, numasset_fridge, numasset_tv, numasset_radio, main_water, improved_water, cookstovetype, refusetype, sanitation_type, floortype, wallstype, rooftype, numsleepingrooms.
    - **rwi_Full**: computed over every variable except hhid, interview__key, sector, zone, state, lga, ea, cluster, month_of_interview, year_of_interview, hh_gps_latitude, hh_gps_longitude, hh_gps_accuracy, hh_gps_altitude, popw, weight, totcons_pc, totcons_adj, pl.
    - **log(adj_consumption)**: the logarithm of consumption per capita, adjusted for regional price differences and for inflation between the beginning and end of the survey: log(totcons_adj).
    - **Fraction of households below \$1.90 per capita/day**: a binary label indicating whether adjusted consumption per capita (totcons_adj) is below or above \$1.90 per capita/day, which is equivalent to 247,679 Naira per capita.
  - **Poverty indices from the DHS Survey**: ~9,000 labels (with 5 km jittering from the true location).
  - **Inferred wealth indices from another model**: ~1.8M labels. Although there are 1.2M satellite images, this dataset has 1.8M labels because inference was done on images of a different size. That model was trained on labels from the DHS Survey.
  - **Building dataset**: covers the entire country. 215,766 of the ~1.2M satellite images of Nigeria contain buildings.

---

## Feature

### Regression

- Currently, the poverty index is represented by the rwi_Subset feature.
  This means that it was calculated using household assets, the …

### Classification

```python
# Integer codes for the categorical survey answers (keys must match the raw survey strings exactly).
refuse_type_map = {'DISPOSAL IN A RIVER/STREAM': 3, 'DISPOSAL IN THE BUSH': 5,
                   'DISPOSAL WITHIN COMPOUND (INCL BURNING)': 7, 'GOVT BIN OR SHED': 1,
                   'HH BIN COLLECTED BY GOV': 4, 'HH BIN COLLECTED BY PRIVATE FIRM OR INDIVIDUAL': 2,
                   'OTHER (SPECIFY)': 0, 'UNAUTHORIZED REFUSE HEAP': 6}
roof_type_map = {'CORRUGATED IRON SHEETS': 3, 'ASBESTOS SHEET': 0, 'ZINC SHEET': 10,
                 'LONG/SHORT SPAN SHEETS': 4, 'CONCRETE/CEMENT': 2, 'THATCH (GRASS OR STRAW)': 9,
                 'STEP TILES': 8, 'OTHER (SPECIFY)': 6, 'MUD': 5, 'PLASTIC SHEET': 7, 'CLAY TILES': 1}
# Binary yes/no answers.
electricity_map = {'yes': 1, 'no': 0}
slum_ea_map = {'yes': 1, 'no': 0}
slum_hh_map = {'yes': 1, 'no': 0}
sanitation_type_map = {'FLUSH TO SEPTIC TANK': 5, 'FLUSH TO PIT LATRINE': 4, 'FLUSH TO SOMEWHERE ELSE': 6,
                       'PIT LATRINE WITH SLAB': 11, 'PIT LATRINE W/O SLAB/OPEN PIT': 10,
                       'NO FACILITIES,BUSH, OR FIELD': 8, 'HANGING TOILET/HANGING LATRINE': 7,
                       'VENTILIATED IMPROVED LATRINE': 12, 'COMPOSTING TOILET': 1,
                       'FLUSH TO OPEN DRAIN': 2, 'FLUSH TO PIPED SEWAGE SYSTEM': 3,
                       'BUCKET': 0, 'OTHER (SPECIFY)': 9}
job_map = {'03 nfe': 2, '01 wage': 0, '04 not working': 3, '02 agriculture': 1}
# Ordered wealth classes for rwi_Full.
rwi_full_map = {'upper': 3, 'mid_upper': 2, 'mid_lower': 1, 'lower': 0}
# Continuous variables and their types.
continuous_variables = {"lon": float, "lat": float, "num_houses": float}
```

---

### Methodology

#### Data preparation

---

## Models

### Using the 22k Dataset

- Auxiliary labels: 11 attributes
- Regression is on the continuous poverty index (PI) value

| Reg_Model | MSE | R² |
| --- | --- | --- |
| M1_PI_only | 0.516 | 0.483 |
| Pretrained model on satellite images_PI_only | 0.39 | 0.591 |

| Class_Model | Acc |
| --- | --- |
| M1_(PI_only) | 0.53 |
| Pretrained model on satellite images(PI_only) | 0.55 |
| M1_(PI_rooftypes) | 0.56 |
| M1_(PI_rooftypes_refuse_type_sanitation_type) | 0.54 |

- **M1-reg**: uses a simple CNN trained on the 22k dataset.
  Input: satellite images (400x400); output: poverty index (float). [[1]](#M1-model)
- **M1-class**: uses a simple CNN trained on the 22k dataset. Input: satellite images (400x400); output: poverty index (classes). [[1]](#M1-model)

<!---
## Methodology

- Use a **clustering technique** to group the images into x groups so that we can create models better suited to each group, since we do not have enough labels to represent the entire country.
  - ***YB: I am not at all convinced this is a good idea. Deep learning likes more data, and we deal with the issue of small amounts of labels of interest with multi-task learning here. Clustering loses too much information. At the very least, let us start by using standard deep learning approaches as baseline. Then try the clustering idea and compare (but personally I would not even do it).***
- Images that have buildings and road features are expected to be grouped together, while those whose pixels are mostly water bodies and open fields should be in a different group. This way, we can reduce the variety of image types seen by the model trained on images of where people live (which is what we are interested in). This could be beneficial and more realistic, since the ground-truth data almost exclusively covers images where people live.
  - ***YB: if you have a way to cluster images in the way you say, just use those cluster categories as extra input (near the top level of the deep net hierarchy). Sharing parameters across data is most of the time winner. Not having a separate model for each category.***

---

![](https://i.imgur.com/KSUY1R3.png)

**1. Create a baseline model**:
- Get a pretrained model and fine-tune it using the 22k dataset. This will predict the poverty index using just the mean.
- y = f(x), MSE/CE loss ***YB: what is x here? an image? Why do you say it is just the mean?***

**2. Add the other attributes to the prediction**:
- Get a pretrained model and fine-tune it using the 22k dataset.
  This will make predictions using the mean squared error and the covariance matrix of the other attributes.
  - ***YB: sorry but I don't follow your idea above. Please explain with math and clear specifications.***
- Since there can be multiple households per image, we will treat each household as an individual data point (i.e., the same satellite image can have multiple target values). In addition, apart from the wealth index, when we went back to the original source of the data, we realized we have access to detailed survey data such as monthly household income, occupation categories, number of bikes owned, etc. These data could be fully utilized. ***YB: yes! they should, and the best way according to me is to treat all those labels as auxiliary tasks, all sharing the same image-to-features pipeline but each having a different feature-to-prediction head branch.***
- 1) Add random noise to the loss ***YB: why? It will probably just slow the training. If you want to regularize, there are better ways, like L1 and L2 weight decay, dropout, early stopping, etc.***
- 2) Add more realistic noise to the MSE by introducing a prior in the form of a Cholesky decomposition of the covariance matrix of our attributes. ***YB: I am completely lost as to why you are doing this. It does make sense to learn the joint distribution over all the labels, given the image, though. But I am not sure this is the intention here.***
  - y = f(h(x) + n(x, Lu))
- The attributes used at first will be real-valued ones that correlate with features in an image, e.g., number of buildings in an image, number of assets, whether there is electricity, number of sleeping rooms, etc.
-->
<!---
### Implementation

- #### Using the pretrained network as a feature extractor.
  :::info
  **Model:** VGG16 pretrained on BigEarthNet
  - **Model Info:** To construct BigEarthNet with Sentinel-2 image patches (now called BigEarthNet-S2, previously BigEarthNet), 125 Sentinel-2 tiles acquired between June 2017 and May 2018 over 10 European countries (Austria, Belgium, Finland, Ireland, Kosovo, Lithuania, Luxembourg, Portugal, Serbia, Switzerland) were initially selected. All tiles were atmospherically corrected by the Sentinel-2 Level 2A product generation and formatting tool (sen2cor). They were then divided into 590,326 non-overlapping image patches. Each patch was annotated with multiple land-cover classes (i.e., multi-labels) provided by the CORINE Land Cover database of 2018 (CLC 2018).
  - **Preprocessing:** We removed the last layer of the model.
  - We passed each image through the pretrained VGG16 CNN to extract 4096 features from the daytime satellite images.
  :::
- #### Fine-tuning a pretrained network.
  :::info
  **Models:** VGG16, ResNet101 pretrained on BigEarthNet
  - **Model Info:** same BigEarthNet-S2 dataset as described in the previous box.
  - **Preprocessing:** Removed the last 6 layers of the model.
  :::
- #### Cluster the 1.2M images and then train a weakly supervised model.
  :::info
  **Clustering pipeline:**
  - Load data
    - Resize the images to 224x224 for VGG16
    - Reshape the input array into batches
  - Load the feature extractor (pretrained model)
    - Use the VGG16 model: the choice should not matter much, since we are only interested in grouping similar-looking images together.
    - Remove the last layer so that the output array has size (4096, 1)
  - Use the model to extract features
    - Dump the features into a pickle file
  - Reduce the dimensionality
    - Use PCA to reduce the feature vectors from (4096, 1) to (100, 1)
  - Cluster images
    - Decide on K: the number of clusters
    - Cluster the images and store the list of labels on disk
  :::
-->

---

## Evaluation

:closed_book: Tasks
---

==Importance== (1 - 5) / Name

### TODO:

- [x] ==5== Divide images into clusters/groups
  - [x] ==3== Find a suitable clustering technique and create a pipeline
  - [x] ==3== Divide the images into clusters
  - [x] ==3== Store the clusters on disk
- [ ] ==5== Configure pretrained network
  - Configure the BigEarthNet model trained using TensorFlow 1 (**current task**)
- [ ] ==4== Add model to existing pipeline
- [ ] ==4== Fine-tune model
- [ ] ==3== Evaluate the model
- [ ] ==2== Train a weakly supervised model using the clusters

---

## Supplementary Materials

<!-- Other important details discussed during the meeting can be entered here. -->

### M1 model

![](https://i.imgur.com/HXz92o5.png)

### M1_all_labels

![](https://i.imgur.com/r1niI6p.png)

### M1_trained from scratch_PI

![](https://i.imgur.com/naBNCtH.png)

### M2_pretrained_PI

![](https://i.imgur.com/GPE545i.png)
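
The categorical code maps in the Classification section and the consumption-based targets in the Dataset section could be turned into numeric training columns roughly as below. This is a minimal sketch under assumptions: the household survey is assumed to be a pandas DataFrame with columns named `rooftype`, `electricity` and `totcons_adj` (names taken from the variable lists above), the `image_id` column linking households to satellite tiles is hypothetical, and the per-image "fraction below the line" is assumed to be the mean of the household-level binary label. The values in the demo are toy numbers, not survey data.

```python
import numpy as np
import pandas as pd

# Copied from the Classification section (only the maps used here).
roof_type_map = {'CORRUGATED IRON SHEETS': 3, 'ASBESTOS SHEET': 0, 'ZINC SHEET': 10,
                 'LONG/SHORT SPAN SHEETS': 4, 'CONCRETE/CEMENT': 2, 'THATCH (GRASS OR STRAW)': 9,
                 'STEP TILES': 8, 'OTHER (SPECIFY)': 6, 'MUD': 5, 'PLASTIC SHEET': 7, 'CLAY TILES': 1}
electricity_map = {'yes': 1, 'no': 0}

# Threshold quoted in the Dataset section (equivalent to $1.90 per capita/day).
POVERTY_LINE_NAIRA = 247_679


def encode_households(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric columns derived from the (assumed) survey columns."""
    out = df.copy()
    # Categorical answers -> integer codes; unseen categories become NaN.
    out["rooftype_code"] = out["rooftype"].map(roof_type_map)
    out["electricity_code"] = out["electricity"].map(electricity_map)
    # log(adj_consumption) regression target.
    out["log_adj_consumption"] = np.log(out["totcons_adj"])
    # 1 if adjusted consumption per capita is below the poverty line, else 0.
    out["below_poverty_line"] = (out["totcons_adj"] < POVERTY_LINE_NAIRA).astype(int)
    return out


def fraction_below_line(encoded: pd.DataFrame, image_col: str = "image_id") -> pd.Series:
    """Share of households below the line for each image (image_col is a hypothetical key)."""
    return encoded.groupby(image_col)["below_poverty_line"].mean()


if __name__ == "__main__":
    households = pd.DataFrame({
        "image_id": [0, 0, 1],  # toy values for illustration only
        "rooftype": ["ZINC SHEET", "MUD", "CLAY TILES"],
        "electricity": ["yes", "no", "yes"],
        "totcons_adj": [300_000.0, 150_000.0, 500_000.0],
    })
    encoded = encode_households(households)
    print(encoded[["rooftype_code", "below_poverty_line"]])
    print(fraction_below_line(encoded))
```

Keeping the maps as plain dicts and applying them with `Series.map` means any survey string missing from a map shows up as NaN instead of silently receiving a code.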
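
The clustering tasks in the TODO list (reduce the 4096-dimensional VGG16 features to 100 dimensions with PCA, then group them into K clusters) can be sketched with NumPy alone. This assumes the features are already extracted into an `(n_images, 4096)` array. The note does not name the clustering algorithm, so k-means (Lloyd's algorithm with farthest-point initialization) is an assumption here, and synthetic features stand in for real VGG16 outputs.

```python
import numpy as np


def pca_reduce(features: np.ndarray, n_components: int = 100) -> np.ndarray:
    """Project each row of `features` onto the top `n_components` principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def farthest_point_init(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Pick a random first centroid, then repeatedly add the row farthest from all chosen ones."""
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        chosen = np.array(centroids)
        d2 = ((x[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(x[d2.argmax()])
    return np.array(centroids)


def kmeans(x: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Lloyd's k-means; returns one cluster label per row of `x`."""
    rng = np.random.default_rng(seed)
    centroids = farthest_point_init(x, k, rng)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points; keep empty clusters where they are.
        new_centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for VGG16 features: 3 groups of 100 images, 4096 dims each.
    offsets = rng.normal(scale=10.0, size=(3, 4096))
    features = np.concatenate([o + rng.normal(size=(100, 4096)) for o in offsets])
    reduced = pca_reduce(features, n_components=100)  # (300, 4096) -> (300, 100)
    labels = kmeans(reduced, k=3)
    print(reduced.shape, np.bincount(labels))
    # np.save(...) could then store `labels` on disk, as in the TODO list.
```

At the full 1.2M-image scale, the dense distance computation above would need far too much memory; incremental variants such as scikit-learn's `IncrementalPCA` and `MiniBatchKMeans` are designed for that case.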
