# CUHK-STAT3009: Kaggle Competition Instructions (Tentative)
- **Date**: Oct 23
- **Time**: 12:30PM - 3:00PM
- **Location**: LSB G25 (Floor G)
> [!IMPORTANT]
> To ensure fairness, you may only use the laboratory computers; personal devices are not permitted.
## **Rules and Guidelines:**
* You are allowed to use lecture notes, Jupyter notebooks, and online resources to complete the quiz.
* You must work independently; communicating with others during the quiz is NOT permitted.
* The use of AI-powered tools, such as ChatGPT, Claude, Gemini, and similar platforms, is strictly prohibited.
* Social media and messaging platforms, such as WhatsApp and WeChat, are also prohibited.
## **Academic Integrity:**
Violating any of these rules constitutes academic dishonesty and will result in a score of zero on the quiz.
<!--
## Example Kaggle Notebook (Item mean submission)
```python=
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
train = pd.read_csv('/kaggle/input/cuhk-stat-3009-svd-models-competition/train.csv')
test = pd.read_csv('/kaggle/input/cuhk-stat-3009-svd-models-competition/test.csv')
sub = pd.read_csv('/kaggle/input/cuhk-stat-3009-svd-models-competition/sample_submission.csv')
## Convert the pandas DataFrames to NumPy arrays
X_train, y_train = train[['user_id', 'item_id']].values, train['rating'].values
X_test = test[['user_id', 'item_id']].values
user = "statmlben"
repo = "CUHK-STAT3009"
src = "src"
pyfile = "TabRS.py"
url = f"https://raw.githubusercontent.com/{user}/{repo}/main/{src}/{pyfile}"
!wget --no-cache --backups=1 {url}
from sklearn.preprocessing import LabelEncoder
## Fit le_user and le_item on all (train + test) ids so that every id gets a
## contiguous index, then transform X_train and X_test
X = np.concatenate([X_train, X_test], axis=0)
## user label encoder
le_user = LabelEncoder()
le_user.fit(X[:,0])
X_train[:,0] = le_user.transform(X_train[:,0])
X_test[:,0] = le_user.transform(X_test[:,0])
## item label encoder
le_item = LabelEncoder()
le_item.fit(X[:,1])
X_train[:,1] = le_item.transform(X_train[:,1])
X_test[:,1] = le_item.transform(X_test[:,1])
## num of users
n_user = len(le_user.classes_)
## num of items
n_item = len(le_item.classes_)
from TabRS import rmse, GlobalMeanRS, UserMeanRS, ItemMeanRS, SVD
itemRS = ItemMeanRS(n_item)
itemRS.fit(X_train, y_train)
y_pred = itemRS.predict(X_test)
sub['rating'] = y_pred
sub.to_csv('submission.csv', index=False)
```
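To see what the label-encoding and item-mean steps compute without depending on `TabRS`, here is a minimal NumPy sketch. It assumes integer user/item ids. `encode_ids`, `ItemMeanSketch`, `rmse`, and the toy data are hypothetical illustrations, not the `TabRS` API, and falling back to the global mean for items with no training ratings is an assumed design choice rather than documented `ItemMeanRS` behavior.

```python
import numpy as np


def encode_ids(train_ids, test_ids):
    # Fit on train + test together (like LabelEncoder.fit on X above):
    # classes are the sorted unique ids, inverse gives each id's index in classes
    all_ids = np.concatenate([train_ids, test_ids])
    classes, inverse = np.unique(all_ids, return_inverse=True)
    n_train = len(train_ids)
    return inverse[:n_train], inverse[n_train:], len(classes)


class ItemMeanSketch:
    # Hypothetical item-mean model, NOT the TabRS.ItemMeanRS API.
    # Predicts each item's mean training rating; items with no training
    # ratings fall back to the global mean (an assumed design choice).
    def __init__(self, n_item):
        self.n_item = n_item

    def fit(self, X, y):
        items = X[:, 1]
        y = np.asarray(y, dtype=float)
        self.global_mean_ = y.mean()
        # Per-item rating sums and counts, indexed by encoded item id
        sums = np.bincount(items, weights=y, minlength=self.n_item)
        counts = np.bincount(items, minlength=self.n_item)
        # np.maximum(counts, 1) avoids dividing by zero for unseen items
        self.item_mean_ = np.where(counts > 0, sums / np.maximum(counts, 1), self.global_mean_)
        return self

    def predict(self, X):
        return self.item_mean_[X[:, 1]]


def rmse(y_true, y_pred):
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


# Toy (user_id, item_id) pairs with raw, non-contiguous ids (made-up data)
X_train_raw = np.array([[10, 100], [10, 200], [20, 100], [30, 300]])
y_train = np.array([4.0, 2.0, 5.0, 3.0])
X_test_raw = np.array([[20, 200], [30, 400]])  # item 400 never appears in training

u_tr, u_te, n_user = encode_ids(X_train_raw[:, 0], X_test_raw[:, 0])
i_tr, i_te, n_item = encode_ids(X_train_raw[:, 1], X_test_raw[:, 1])
X_train = np.column_stack([u_tr, i_tr])
X_test = np.column_stack([u_te, i_te])

model = ItemMeanSketch(n_item).fit(X_train, y_train)
y_pred = model.predict(X_test)
train_rmse = rmse(y_train, model.predict(X_train))
print(y_pred)  # [2.  3.5]
print(train_rmse)
```

Using `np.bincount` keeps the fit vectorized over all ratings. For the actual competition data, the `LabelEncoder` + `TabRS` pipeline in the notebook remains the reference workflow.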
-->