
World-Ending Python Module Imports

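One day last week I suddenly noticed that all the data in Production had turned into RC-environment data, and the project settings were all RC parameters. This was a serious bug hitting Production, and at first I had no idea what was going on.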

First, here is what the directory layout inside the VM looks like:
(Don't give me grief yet about why three environments live on the same VM.)

```
project-admin/
    ├── project-prod/
    |     ├─ app.py
    |     └─ ...
    ├── project-rc/
    |     ├─ app.py
    |     └─ ...
    └── project-test/
          ├─ app.py
          └─ ...
```

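At first glance nothing looks wrong. Each environment's code lives in its own directory, each one runs gunicorn from its own virtual environment, and during the incident I confirmed that Production's working directory really was project-prod. So how could this still happen?!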

The cause turned out to be the order in which Python looks up modules.
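To see the lookup order in isolation, here is a minimal stdlib-only sketch (no Flask or gunicorn involved). The module name `demo_app` and the helper `which_copy_wins` are hypothetical, made up for illustration. It creates two same-named modules in temporary `repo1`/`repo2` directories, puts both on `sys.path` in a given order, and reports which copy the import system actually loads.

```python
import importlib
import os
import sys
import tempfile


def which_copy_wins(first, second):
    """Hypothetical helper: create a module named demo_app under two temporary
    directories called `first` and `second`, prepend them to sys.path in that
    order, import demo_app, and return the LABEL of the copy that was loaded."""
    with tempfile.TemporaryDirectory() as root:
        dirs = []
        for label in (first, second):
            d = os.path.join(root, label)
            os.makedirs(d)
            with open(os.path.join(d, "demo_app.py"), "w") as f:
                f.write(f"LABEL = {label!r}\n")
            dirs.append(d)

        saved_path = list(sys.path)
        sys.path[:0] = dirs               # earlier entries are searched first
        sys.modules.pop("demo_app", None)
        importlib.invalidate_caches()
        try:
            mod = importlib.import_module("demo_app")
            return mod.LABEL
        finally:
            sys.path[:] = saved_path      # restore the original search path
            sys.modules.pop("demo_app", None)


print(which_copy_wins("repo2", "repo1"))  # repo2
print(which_copy_wins("repo1", "repo2"))  # repo1
```

Python walks `sys.path` from front to back and stops at the first match, so whichever directory comes first silently shadows the other one with the same module name.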

Our project is built with Flask, and app.py simply instantiates a Flask object:

```python
# app.py
from flask import Flask

app = Flask(__name__)
```

Now let's look at Flask's source code:

```python
# https://github.com/pallets/flask/blob/1.1.x/src/flask/app.py
class Flask(_PackageBoundObject):
    ...
    def __init__(
        self,
        import_name,
        static_url_path=None,
        static_folder="static",
        static_host=None,
        host_matching=False,
        subdomain_matching=False,
        template_folder="templates",
        instance_path=None,
        instance_relative_config=False,
        root_path=None,
    ):
        _PackageBoundObject.__init__(
            self, import_name, template_folder=template_folder, root_path=root_path
        )

        self.static_url_path = static_url_path
        self.static_folder = static_folder

        if instance_path is None:
            instance_path = self.auto_find_instance_path()
        elif not os.path.isabs(instance_path):
            raise ValueError(
                "If an instance path is provided it must be absolute."
                " A relative path was given instead."
            )
        ...
```

The constructor takes an optional root_path parameter, which is passed into _PackageBoundObject.__init__(). Let's look at that next:

```python
# https://github.com/pallets/flask/blob/1.1.x/src/flask/helpers.py
class _PackageBoundObject(object):
    #: The name of the package or module that this app belongs to. Do not
    #: change this once it is set by the constructor.
    import_name = None

    #: Location of the template files to be added to the template lookup.
    #: ``None`` if templates should not be added.
    template_folder = None

    #: Absolute path to the package on the filesystem. Used to look up
    #: resources contained in the package.
    root_path = None

    def __init__(self, import_name, template_folder=None, root_path=None):
        self.import_name = import_name
        self.template_folder = template_folder

        if root_path is None:
            root_path = get_root_path(self.import_name)

        self.root_path = root_path
        self._static_folder = None
        self._static_url_path = None

        # circular import
        from .cli import AppGroup

        #: The Click command group for registration of CLI commands
        #: on the application and associated blueprints. These commands
        #: are accessible via the :command:`flask` command once the
        #: application has been discovered and blueprints registered.
        self.cli = AppGroup()

    ...


def get_root_path(import_name):
    """Returns the path to a package or cwd if that cannot be found. This
    returns the path of a package or the folder that contains a module.

    Not to be confused with the package path returned by :func:`find_package`.
    """
    # Module already imported and has a file attribute. Use that first.
    mod = sys.modules.get(import_name)
    if mod is not None and hasattr(mod, "__file__"):
        return os.path.dirname(os.path.abspath(mod.__file__))

    # Next attempt: check the loader.
    loader = pkgutil.get_loader(import_name)

    # Loader does not exist or we're referring to an unloaded main module
    # or a main module without path (interactive sessions), go with the
    # current working directory.
    if loader is None or import_name == "__main__":
        return os.getcwd()
    ...
```

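If no root_path is given, Flask calls get_root_path to find the root directory, and one line there jumps right out: sys.modules.get(import_name). So even for our app = Flask(__name__), the root is not taken from the current working directory. It comes from the __file__ of whatever module is already registered in sys.modules under that name, and that module was located by searching sys.path.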
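The first branch of that lookup fits in a few lines. The sketch below is a simplified version of `get_root_path`, not Flask's own code: `simplified_get_root_path` is a hypothetical name, the `pkgutil` loader branch is deliberately left out, and the `/home/kevin/repo2/app.py` path is only stored on a fake module object, so nothing has to exist on disk.

```python
import os
import sys
import types


def simplified_get_root_path(import_name):
    """Hypothetical, trimmed-down take on Flask 1.1's get_root_path: use the
    __file__ of the module already in sys.modules, otherwise fall back to the
    current working directory. (The real function also asks pkgutil for a
    loader before falling back; that step is omitted here.)"""
    mod = sys.modules.get(import_name)
    if mod is not None and hasattr(mod, "__file__"):
        # The file that was actually imported decides the root, not os.getcwd().
        return os.path.dirname(os.path.abspath(mod.__file__))
    return os.getcwd()


# Pretend the module imported as "app" came from repo2 (illustrative path only).
fake_app = types.ModuleType("app")
fake_app.__file__ = "/home/kevin/repo2/app.py"
previous = sys.modules.get("app")
sys.modules["app"] = fake_app
try:
    print(simplified_get_root_path("app"))  # /home/kevin/repo2 (on POSIX)
finally:
    # Put sys.modules back the way we found it.
    if previous is None:
        del sys.modules["app"]
    else:
        sys.modules["app"] = previous

# A module that is neither imported nor importable falls back to the cwd.
print(simplified_get_root_path("no_such_module_xyz") == os.getcwd())  # True
```

The root path follows whichever `app` module was loaded, regardless of which directory the process was started from.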

Let's run an experiment by creating two Flask projects:

```
/home/kevin/
    ├── repo1/
    |     └─ app.py
    └── repo2/
          └─ app.py
```

```python
# repo1/app.py
from flask import Flask

app = Flask(__name__)
print("repo1 app!")
```

```python
# repo2/app.py
from flask import Flask

app = Flask(__name__)
print("repo2 app!")
```

Then use a deliberately wrong environment variable to change the order in which Python searches for modules:

```shell
export PYTHONPATH="/home/kevin/repo2:/home/kevin/repo1"
```

Now run it with gunicorn and see what happens:

```shell
$ gunicorn app:app
# repo2 app!
```

If we also print the mod variable from the line mod = sys.modules.get(import_name), we get:

```
<module 'app' from '/home/kevin/repo2/app.py'>
```

That reproduces the bug. Fixing it is straightforward, and there are several options:

  1. Run gunicorn with the correct PYTHONPATH environment variable.
  2. Package the Production, RC, and Test environments into separate containers.
  3. Pass root_path to Flask() explicitly.
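The effect of option 1 can be checked without Flask or gunicorn. This sketch assumes that `python -c "import app"` in a fresh interpreter is a fair stand-in for how `gunicorn app:app` resolves its module; `imported_from` is a hypothetical helper, and the `repo1`/`repo2` layout of the experiment is recreated in a temporary directory. The child process runs from the temporary root (which has no `app.py`), so only `PYTHONPATH` decides which `app` is found.

```python
import os
import subprocess
import sys
import tempfile


def imported_from(pythonpath, workdir):
    """Hypothetical helper: start a fresh interpreter with the given PYTHONPATH
    entries and return the directory of the module it imports as `app`."""
    env = dict(os.environ, PYTHONPATH=os.pathsep.join(pythonpath))
    result = subprocess.run(
        [sys.executable, "-c",
         "import os, app; print(os.path.dirname(app.__file__))"],
        cwd=workdir, env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as root:
    repo1 = os.path.join(root, "repo1")
    repo2 = os.path.join(root, "repo2")
    for repo in (repo1, repo2):
        os.makedirs(repo)
        with open(os.path.join(repo, "app.py"), "w") as f:
            f.write("# stand-in for a Flask app.py\n")

    # Misordered path, as in the experiment: repo2 shadows repo1.
    wrong = imported_from([repo2, repo1], workdir=root)
    # Option 1: give the process only its own project directory.
    right = imported_from([repo1], workdir=root)

    # Compare while the directories still exist.
    wrong_ok = os.path.samefile(wrong, repo2)
    right_ok = os.path.samefile(right, repo1)

print(wrong_ok, right_ok)  # True True
```

Scoping each process's `PYTHONPATH` to its own project removes the shadowing entirely, which is also what separate containers (option 2) achieve by construction.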