# CUHK-STAT3009: Kaggle Competition Instructions (Tentative)

- **Date**: Oct 23
- **Time**: 12:30PM - 3:00PM
- **Location**: LSB G25 (Floor G)

> [!IMPORTANT]
> To ensure fairness, you may only use the laboratory computers; personal devices are not permitted.

## **Rules and Guidelines:**

* You are allowed to use lecture notes, Jupyter notebooks, and online resources to complete the quiz.
* You must work independently; NO communication with others is allowed during the quiz.
* The use of AI-powered tools, such as ChatGPT, Claude, Gemini, and similar platforms, is strictly prohibited.
* Social media and messaging platforms, such as WhatsApp, WeChat, and others, are also not permitted.

## **Academic Integrity:**

Violating any of these rules constitutes academic dishonesty and will result in a score of zero on the quiz.

<!--
## Example Kaggle Notebook (Item mean submission)

```python=
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

train = pd.read_csv('/kaggle/input/cuhk-stat-3009-svd-models-competition/train.csv')
test = pd.read_csv('/kaggle/input/cuhk-stat-3009-svd-models-competition/test.csv')
sub = pd.read_csv('/kaggle/input/cuhk-stat-3009-svd-models-competition/sample_submission.csv')

## Casting data from pd.df -> np.array
X_train, y_train = train[['user_id', 'item_id']].values, train['rating'].values
X_test = test[['user_id', 'item_id']].values

## Download the course helper module TabRS.py from GitHub
user = "statmlben"
repo = "CUHK-STAT3009"
src = "src"
pyfile = "TabRS.py"
url = f"https://raw.githubusercontent.com/{user}/{repo}/main/{src}/{pyfile}"
!wget --no-cache --backups=1 {url}

from sklearn.preprocessing import LabelEncoder

## Map raw user/item ids to contiguous integers 0..n-1:
## fit le_user and le_item on all ids in X (train + test),
## then transform X_train and X_test
X = np.concatenate([X_train, X_test], axis=0)

## user label encoder
le_user = LabelEncoder()
le_user.fit(X[:,0])
X_train[:,0] = le_user.transform(X_train[:,0])
X_test[:,0] = le_user.transform(X_test[:,0])

## item label encoder
le_item = LabelEncoder()
le_item.fit(X[:,1])
X_train[:,1] = le_item.transform(X_train[:,1])
X_test[:,1] = le_item.transform(X_test[:,1])

## num of users
n_user = len(le_user.classes_)
## num of items
n_item = len(le_item.classes_)

## Fit the item-mean baseline and predict the test ratings
from TabRS import rmse, GlobalMeanRS, UserMeanRS, ItemMeanRS, SVD

itemRS = ItemMeanRS(n_item)
itemRS.fit(X_train, y_train)
y_pred = itemRS.predict(X_test)
sub['rating'] = y_pred
sub.to_csv('submission.csv', index=False)
```
-->
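<!--
The example notebook relies on `ItemMeanRS` from the course's `TabRS.py`, whose internals are not shown here. Below is a minimal NumPy sketch of an item-mean predictor, assuming user/item ids are already label-encoded to `0..n_item-1` as in the notebook. The function name `item_mean_predict` is hypothetical, and the global-mean fallback for items with no training ratings is an assumption that may differ from what `ItemMeanRS` actually does. Such a fallback matters here because the encoders are fit on train + test ids, so some encoded test items may never appear in `X_train`.

```python
import numpy as np

def item_mean_predict(X_train, y_train, X_test, n_item):
    """Hypothetical item-mean baseline: predict each (user, item) pair's rating
    as that item's mean training rating.

    Assumes column 1 holds label-encoded item ids in 0..n_item-1.
    Items with no training ratings fall back to the global mean (an assumption;
    TabRS.ItemMeanRS may handle unseen items differently).
    """
    items = np.asarray(X_train)[:, 1].astype(int)
    y = np.asarray(y_train, dtype=float)
    # Per-item sum of ratings and number of ratings
    sums = np.bincount(items, weights=y, minlength=n_item)
    counts = np.bincount(items, minlength=n_item)
    # Start every item at the global mean, then overwrite observed items
    item_means = np.full(n_item, y.mean())
    seen = counts > 0
    item_means[seen] = sums[seen] / counts[seen]
    # Look up the mean for each test item
    return item_means[np.asarray(X_test)[:, 1].astype(int)]

# Toy usage with hypothetical, already-encoded (user, item) pairs
X_train = np.array([[0, 0], [1, 0], [0, 1]])
y_train = np.array([4.0, 2.0, 5.0])
X_test = np.array([[1, 1], [2, 0], [2, 2]])
# item 1 -> 5.0, item 0 -> 3.0, unseen item 2 -> global mean 11/3
pred = item_mean_predict(X_train, y_train, X_test, n_item=3)
print(pred)
```

Using `np.bincount` computes all item sums and counts in one vectorized pass instead of looping over items.
-->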