HW1_report - HackMD

# HW1_report Code for car brand classification. To read the detailed solution, please, refer to [the kaggle post](https://www.kaggle.com/c/cs-t0828-2020-hw1/overview) - [Installaton](#First) - [Create Class File and Download Official Image](#second) - [Define The Class](#third) - [Buiding Model](#fourth) - [Train The Model](#fifth) - [Predict Testing Data](#sixth) - [Make Submission](#seventh) <h2 id="First">Installation</h2> > using google colab with google drive <pre><code>auth.authenticate_user() gauth = GoogleAuth() gauth.credentials = GoogleCredentials.get_application_default() drive = GoogleDrive(gauth) download = drive.CreateFile({'id': '1sRUp_jnLMUTBNDiWZYy_qKjjiorDcOt9'}) download.GetContentFile('data.zip') !unzip data.zip</code></pre> <h2 id="second">Create Class File and Download Official Image</h2> > read traing_label.csv and move the image to the correct class file <pre><code>target_path = './data_test/' original_path = 'C:/Users/88697/hw1/data/raw/cs-t0828-2020-hw1/training_data/training_data/' with open('training_labels.csv',"rt", encoding="utf-8") as csvfile: reader = csv.reader(csvfile) rows= [row for row in reader] for row in rows: if row[0]!='id': if os.path.exists(target_path+row[1]) : full_path = original_path+row[0]+'.jpg' shutil.move(full_path,target_path + row[1] +'/') else : os.makedirs(target_path+row[1]) full_path = original_path+row[0]+'.jpg' shutil.move(full_path,target_path + row[1] +'/')</code></pre> <h2 id="third">Define The Class</h2> > Use the folder name to define the class and the class order <pre><code>def find_classes(dir): classes = os.listdir(dir) classes.sort() class_to_idx = {classes[i]: i for i in range(len(classes))} return classes, class_to_idx classes, c_to_idx = find_classes(data_dir+"/train") </code></pre> <h2 id="Fourth">Buiding Model</h2> > I use the model resnet34 in pytorch <pre><code>model = models.resnet18(pretrained=True) num_ftrs = model.fc.in_features model.fc = nn.Linear(num_ftrs, 196)</code></pre> <h2 id="fifth">Train The Model</h2> > I use 10 epochs and print the tain result every fourty steps <pre><code>epochs = 10 steps = 0 print_every = 40 model.to('cuda') model.train() for e in range(epochs): running_loss = 0 for ii, (inputs, labels) in enumerate(trainloader): steps += 1 inputs, labels = inputs.to('cuda'), labels.to('cuda') optimizer.zero_grad() outputs = model.forward(inputs) loss = criterion(outputs, labels) loss.backward() optimizer.step() running_loss += loss.item() if steps % print_every == 0: model.eval() with torch.no_grad(): valid_loss, accuracy = validation(model, validloader, criterion) model.train() lrscheduler.step(accuracy * 100)</code></pre> <h2 id="sixth">Predict Testing Data</h2> > predict the test data <pre><code>a = [] te = [] for i in range(0,16182): print(i) if os.path.exists('test/My Drive/testing_data/'+str(i).zfill(6)+'.jpg'): te.append(i) img = image.load_img('test/My Drive/testing_data/'+str(i).zfill(6)+'.jpg', grayscale=False) img_preprocessed = preprocess(img) batch_img_cat_tensor = torch.unsqueeze(img_preprocessed, 0) device = torch.device("cuda" if torch.cuda.is_available() else "cpu") batch_img_cat_tensor = batch_img_cat_tensor.to(device) model.eval() out = model(batch_img_cat_tensor) _, index = torch.max(out, 1) percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100 a.append(classes[index[0]])</code></pre> <h2 id="seventh">Make Submission</h2> > write the prediction into the csv <pre><code>sample = pd.read_csv('answer.csv') sample['id'] = te sample['label'] = a sample.to_csv('sample_cnn.csv', header=True, index=False) </code></pre>