
The Software Engineering Part of Data Science - Niño R. Eclarin

Welcome to the PyCon TW 2019 shared notes


Shared notes index: https://hackmd.io/@pycontw/2019
On mobile, tap the button at the top to expand the session list.

PyCon Taiwan 2019 PyNight event planning and BoF sign-in / discussion topic creation
https://hackmd.io/NaI9ymzRQ5urCvwOQ31WAA

Start here

  • The difference between a data scientist and a software engineer

Development challenges

  • no linters
  • catch errors
  • code quality
  • no language features

Architecture

  • Algorithm
  • Models that will be used

Deployment

  • Cloud or Local
  • Mobile
  • Micro Controllers

Data Driven Development

  1. Data Source
  2. Data input -> preprocessing -> model -> output
  3. Data Storage
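
The flow in item 2 can be sketched as a chain of small functions. This is only an illustration: the function names (`load_records`, `preprocess`, `model`, `run_pipeline`), the record layout, and the "doubling" model are all hypothetical placeholders, not something shown in the talk.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical record type: one optional numeric feature per record.
Record = Dict[str, Optional[float]]


def load_records() -> List[Record]:
    """Data source: stands in for a file, database, or sensor feed."""
    return [{"x": 1.0}, {"x": None}, {"x": 3.0}]


def preprocess(records: Iterable[Record]) -> List[float]:
    """Preprocessing: drop records whose feature is missing."""
    return [r["x"] for r in records if r["x"] is not None]


def model(features: List[float]) -> List[float]:
    """Model: a placeholder that doubles each feature."""
    return [2.0 * x for x in features]


def run_pipeline() -> List[float]:
    """Data input -> preprocessing -> model -> output."""
    return model(preprocess(load_records()))


# Data storage: an in-memory dict stands in for a real store.
storage = {"predictions": run_pipeline()}
```

Keeping each stage a separate function makes it possible to test and replace the stages independently.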

Coding Standards

  • Code Quality

    • Data Scientists are not software developers
    • Use linters and formatting tools
      • Type hints
    • Code review
  • Error Handling

    • Standardize Errors
      • Meaningful errors
      • Warnings vs. errors vs. fatal errors
    • Specific Errors
      • Define specific exceptions (e.g. for training failures or overfitting) to improve readability
  • Data Integrity

    • Class Definition
      • Explain what each parameter means and its possible values
    • Atomic Operations
    • Granularity of Data Results
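
A minimal sketch of how the error-handling and class-definition points above might look in Python. Every name here (`ModelError`, `TrainingError`, `OverfittingWarning`, `TrainConfig`, `check_fit`) and the thresholds are assumptions made for illustration; the talk did not prescribe a specific hierarchy.

```python
import math
import warnings
from dataclasses import dataclass


class ModelError(Exception):
    """Base class for errors raised by the (hypothetical) modelling code."""


class TrainingError(ModelError):
    """Training could not complete, e.g. the loss became NaN."""


class OverfittingWarning(UserWarning):
    """Validation loss is much worse than training loss (not fatal)."""


@dataclass(frozen=True)
class TrainConfig:
    """Training parameters, documented with their meaning and valid values.

    learning_rate: step size of each update; must be in (0, 1].
    epochs: number of passes over the training data; must be >= 1.
    """

    learning_rate: float = 0.01
    epochs: int = 10

    def __post_init__(self) -> None:
        # Reject invalid values at construction time instead of mid-training.
        if not 0 < self.learning_rate <= 1:
            raise ValueError(
                f"learning_rate must be in (0, 1], got {self.learning_rate}"
            )
        if self.epochs < 1:
            raise ValueError(f"epochs must be >= 1, got {self.epochs}")


def check_fit(train_loss: float, val_loss: float, max_gap: float = 0.5) -> None:
    """Raise a specific error or emit a specific warning for a finished fit."""
    if math.isnan(train_loss):
        raise TrainingError("training loss is NaN")
    if val_loss - train_loss > max_gap:
        # A warning, not an error: the caller may still use the model.
        warnings.warn(
            f"validation loss exceeds training loss by more than {max_gap}",
            OverfittingWarning,
        )
```

Callers can then write `except TrainingError:` instead of catching a bare `Exception`, and can filter `OverfittingWarning` separately from other warnings.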

Architectural Challenges

  • Memory vs. CPU
    • CPU for training
    • Memory for model storage
  • Multithreading
    • Good for memory-bound algorithms (Decision Trees)
    • Easy to implement
  • Multiprocessing
    • Has better throughput than multithreading
    • Better for algorithms with high CPU consumption (NN)
    • More difficult to implement than multithreading
  • Provide "Glue Code" for data scientists (through ABC)
    • A unified interface saves lives

Deployment

  • Scaling up
    • Take note of algorithm runtime
    • O(n^2) algorithms won't work when you scale up
  • Scaling down
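
To make the runtime point concrete, here is an illustrative comparison (not from the talk) of two ways to detect duplicates. The pairwise version does on the order of n^2 comparisons, so ten times more data means roughly a hundred times more work; the set-based version does expected O(n) work. Both function names are hypothetical.

```python
from typing import Hashable, Sequence


def has_duplicates_quadratic(items: Sequence[Hashable]) -> bool:
    """Compare every pair of items: O(n^2) comparisons in the worst case."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items: Sequence[Hashable]) -> bool:
    """Track seen items in a set: expected O(n) time, O(n) extra memory."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

The linear version trades extra memory for speed, which ties back to the memory vs. CPU trade-off noted earlier.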


tags: PyConTW2019