ashwinassysh
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights New
    • Engagement control
    • Make a copy
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Note Insights Versions and GitHub Sync Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Make a copy Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       Owned this note    Owned this note      
    Published Linked with GitHub
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    --- title: "Homework 2" author: "Hamed Khoojinian, Yaassh Dhamani, Yun Deng, Xuan Ji, Samikshya Mishra, Ashwin Assysh Sharma" date: "9/29/2019" output: pdf_document: toc: T toc_depth: 3 --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` [toc] ```{r message = FALSE, warning=FALSE} library(dplyr) library(data.table) library(lubridate) library(tidyverse) library(dplyr) library(naniar) library(cluster) # for Similarity and PAM library(Rtsne) # for t-SNE plot library(ggplot2) # for Visualizations library(clustMixType) # for k-Prototypes library(airportr) ``` # Business ## Our Task There is intense competition in the airline carrier industry, but Sun Country, a small local player, managed to survive a series of threats in the past few decades. To improve its competitiveness, Sun Country came to our data science team hoping to improve its understanding of customer profile, and thus increase its revenue in the long run. The challenge is that this is a broad topic. To increase the efficacy of our analysis and make our recommendations specific and actionable, we decided to focus on our potential high value customers. Our goal is to increase the quantity of sales to high value customers so that we can gain their loyalty over time. We plan to increase sales to these customers by sending them targetted emails containing offers and prices that are of interest to them. `transform people who booked discount-first-class and first-class to Ufly members and increase revenues and customer loyalty.` ## Description of Business Problem Our business problem is what insights could we gain from customer segmentations to help Sun Country promote customer loyalty and increase its profit margin among first-class customers. ## Logic Flow and Proper Metrics Customers who can afford a first class ticket without a discount are those with higher spending power, and with all other things equal, should be of higher priority in our marketing efforts. Our data population was reduced to only those customers who at some point in the past 2 years bought an undiscounted first class ticket from us. If these customers could be segmented meaningfully, we could try to find interests or similarities among them that could help our marketing efforts increase sales to these customers. Staff could be alerted at various points along the customer experience to encourage a positive experience for these VIPs. ____#____ `` We chose to focus on customers who booked first-class or discount-first-class because the profit margin and the customer lifetime value are higher for Ufly members who book discount-first-class and first-class than other subsets of customers. Therefore, we will run an initial clustering analysis of customers who booked at least one discount-first-class or first-class flight. __ Customer segments can tell us about their interests, which will go into the targetted promotional emails, as well as tell us about their shopping habits which __` # Data Preparation ## Data Overview ```{r} data <- fread('SunCountry.csv', header = T, stringsAsFactors = T) data <- as.data.frame(data) summary(data) ``` There are a few irregularities in our dataset. For example, there are some missing values in `gender`, `age` and `birthdate` column, and noisy data in the `age` column where values are below 0 or above 100. Additionally, we had duplicate trips, and trips where customers did not fly with Sun Country. Detailed data processing procedures are provided below. ## Missing ```{r, eval=FALSE} # dealing with missing values and outliers clean1 <- data %>% drop_na(c("GenderCode","Age","birthdateid")) %>% filter(Age >=0, Age <= 100) %>% filter(GenderCode %in% c("F","M")) # replace UflyReward NAs with -1 clean2$UFlyRewardsNumber <- replace_na(clean4$UFlyRewardsNumber,-1) ``` * We dropped NA value in `gender`, `age`, and `birthdate` because we will not able to create unique passenger IDs with NA’s in these columns. * We replaced UflyRewards NA values with -1 since we wanted to include these passengers in our clustering analysis, but NAs are not allowed in the k-Prototypes analysis. ## Outliers * We dropped rows with `age` above 100 or below 0, since the possibility that a passenger’s age is above 100 is very low and it is impossible to have negative ages. * We removed rows with `gender` other than Female and Male, since no clear definition of ‘U’ gender is given in the data dictionary. ## Data Filtering ```{r, eval=FALSE} # use only SY flights clean3 <- clean2 %>% filter(MarketingAirlineCode=='SY') # remove duplicates clean4 <- distinct(clean3, PNRLocatorID, CouponSeqNbr, ServiceStartCity, ServiceEndCity, PaxName, GenderCode, birthdateid, ServiceStartDate, .keep_all= TRUE) # filter out PNR's whose SeqNum does not start with 1 df_MinCouponSeqNbr <- clean4 %>% group_by(PNRLocatorID) %>% summarize(MinCouponSeqNbr = min(CouponSeqNbr)) clean5 <- left_join(clean4,df_MinCouponSeqNbr,by="PNRLocatorID") ``` * We kept trips flying with Sun Country because we lack email information for customers that did not fly with SCA. Without contact information customer identificaiton for marketing purposes would be difficult. * We removed duplicate flights by retaining only those rows which had a single occurrence of PNRLocatorID, CouponSeqNbr, PaxName, ServiceStartCity, ServiceEndCity, ServiceStartDate combination. * We removed PNR locationID whose CouponSeqNbr did not start with 1 because PNR should always start with 1 and they were invalid data. ## Attribute Creation ```{r, eval=FALSE} # add MaxSeqNumber df_MaxCouponSeqNbr <- clean5 %>% group_by(PNRLocatorID) %>% summarize(MaxCouponSeqNbr = max(CouponSeqNbr)) clean6 <- left_join(clean5,df_MaxCouponSeqNbr,by="PNRLocatorID") # calculate the fare amount clean7 <- clean6 %>% mutate(Amt = case_when((is.na(BaseFareAmt) & is.na(TotalDocAmt)) ~ 0, TotalDocAmt >= BaseFareAmt ~ TotalDocAmt, TotalDocAmt < BaseFareAmt ~ BaseFareAmt + TotalDocAmt, TRUE ~ 0), AmtPerFlight = Amt/MaxCouponSeqNbr) # create unique customerID clean7$PaxID <- paste(clean7$EncryptedName, clean7$birthdateid, clean7$GenderCode) clean7$PNRCreateDate <- ymd(clean7$PNRCreateDate) clean7$ServiceStartDate <- ymd(clean7$ServiceStartDate) # days prebook clean7$PlannedDays <- clean7$ServiceStartDate - clean7$PNRCreateDate clean7$BookingChannel <- as.character(clean7$BookingChannel) # true origin true_origin <- clean7 %>% group_by(PNRLocatorID, PaxID) %>% filter(CouponSeqNbr == min(CouponSeqNbr)) %>% select(PNRLocatorID, PaxID, CouponSeqNbr, ServiceStartCity) true_origin$ServiceStartCity <- as.character(true_origin$ServiceStartCity) # true destination true_destination <- clean7 %>% group_by(PNRLocatorID, PaxID) %>% filter(CouponSeqNbr == max(CouponSeqNbr)) %>% select(PNRLocatorID, PaxID, CouponSeqNbr, ServiceEndCity) true_destination$ServiceEndCity <- as.character(true_destination$ServiceEndCity) clean7 <- clean7 %>% left_join(true_origin, by = c("PNRLocatorID" = "PNRLocatorID", "PaxID" = "PaxID")) %>% left_join(true_destination, by = c("PNRLocatorID" = "PNRLocatorID", "PaxID" = "PaxID")) clean7 <- distinct(clean7) # south or north # south and north preparation # add latitude and flight distance (flight distance may be too computationally expensive) distinct_start = data.frame(distinct(data, ServiceStartCity, .keep_all = FALSE)) distinct_end = data.frame(distinct(data, ServiceEndCity, .keep_all = FALSE)) # function to calculate latitude cal_latitude <- function(airport) { tryCatch( {latitude = as.numeric(airport_location(airport, input_type = "IATA")[1])}, error=function(x) { return(NA) } ) } distinct_start$start_latitude = apply(distinct_start, 1, cal_latitude) distinct_end$end_latitude = apply(distinct_end, 1, cal_latitude) clean7 <- clean7 %>% left_join(distinct_start, by = c("ServiceStartCity.x" = "ServiceStartCity")) %>% left_join(distinct_end, by = c("ServiceEndCity.x" = "ServiceEndCity")) %>% drop_na(c("start_latitude","end_latitude")) %>% mutate(StartCitySouth=ifelse(start_latitude<=38,1,0), EndCitySouth = ifelse(end_latitude<=38,1,0)) # round trip clean7 <- mutate(clean7, round_trip_flag = ifelse(clean7$ServiceStartCity.y == clean7$ServiceEndCity.y, 1, 0)) # one-way trip clean7 <- mutate(clean7, one_way_trip_flag = ifelse(clean7$round_trip_flag == 0, 1, 0)) # seasonality clean7$ServiceStartQuarter <- quarters(clean7$ServiceStartDate) # Booking channel -> airport clean7$BookingChannel <- ifelse(clean7$BookingChannel %in% c("XTM", "FCM", "MSP", "LAN", "UFO", "JFK", "DFW", "LAS", "MCO", "PSP", "MDW", "LAX", "MKE", "HRL", "ANC", "SFO", "RSW", "DCA", "BOS", "SEA", "PHX", "MSN", "GJT", "MIA"), "Airports", clean7$BookingChannel) #dummy_cols(clean7, select_columns = c("BookingChannel")) # group size group_size <- clean7 %>% group_by(PNRLocatorID) %>% summarise(GroupSize = n_distinct(PaxID)) clean7 <- clean7 %>% left_join(group_size, by = "PNRLocatorID") test <- data.frame(model.matrix(~ clean7$BookingChannel - 1, data = clean7)) clean7 <- cbind(clean7, test) # add vacation labels vacation_labels <- read.csv('airport_vacation_label.csv') vacation_labels <- airport_vacation_label clean7 <- clean7 %>% inner_join(vacation_labels,by=c('ServiceEndCity.x'='distinct_destinations')) %>% rename(vaca_label = vaca, EndCityName = city) # keep trips with customers who have ever flying with first class clean7$binaryfirst <- ifelse(clean7$BkdClassOfService =='First Class'| clean7$BkdClassOfService =='Discount First Class', 1,0) first_class_data <- clean7 %>% group_by(PaxID) %>% summarise(firstclassflag=max(binaryfirst)) %>% filter(firstclassflag == 1) clean7 <- clean7 %>% inner_join(first_class_data, by = "PaxID") clean7 <- clean7 %>% filter(Age > 18) sample_data <- clean7 ``` * We created new variables to help us understand customer characteristics. * `PaxId` - We created unique customerID (as PaxID) using `EncryptedName`, `Birthdate`, `Gender`. * `AmtPerFlight` - Since the total amount of a PNR is shown for each sequence number, we calculated the Amt per flight as the total amount divided by the maximum sequence number per PNR. * `Start and end city south/north` - Since part of Sun Country’s business is leisure flights, we added the latitude of the start and end cities and categorized as south or north. * `Days pre-booked` - Days pre-booked is defined as the difference between the service start date and booking date * `TrueOrigin` (group by PNR & customerID) and `TrueDestination` (for each customer within one PNR) - True origin and destination is the start and end city of the interlining tickets * `RoundTrip` (for each customer within one PNR) and `One-wayTrip` - It is a round trip if true origin and destination are the same, otherwise one-way trip * `Seasonality` (service was fulfilled in which quarter) - Seasonality (Q1-Q4) is generated from the service start date * `BookingChannels` - We created dummy variables for booking channels, and different airport names are replaced by ‘Airport’. * `GroupSize` - Group size is the number of customers traveling together, which is derived from the number of customers in each PNR number. * After calculating the group size, we **selected only passengers who flew first-class or discount first-class and excluded all passengers under 18 years old**, since it is very likely that they are dependent on their parents and they cannot be the direct marketing targets. ## Data Aggregation ```{r, eval=FALSE} customer_df <- sample_data %>% select(PNRLocatorID,TicketNum,CouponSeqNbr.x,PaxName,EncryptedName,GenderCode,Age, UflyMemberStatus,CardHolder,StopoverCode, MaxCouponSeqNbr,AmtPerFlight,MinCouponSeqNbr,PaxID,PlannedDays,round_trip_flag, one_way_trip_flag,GroupSize, sample_data.BookingChannelAirports, sample_data.BookingChannelOutside.Booking, sample_data.BookingChannelReservations.Booking, sample_data.BookingChannelSCA.Website.Booking, sample_data.BookingChannelSY.Vacation, sample_data.BookingChannelTour.Operator.Portal, sample_data.BkdClassOfServiceCoach, sample_data.BkdClassOfServiceDiscount.First.Class, sample_data.BkdClassOfServiceFirst.Class, sample_data.ServiceStartQuarterQ1, sample_data.ServiceStartQuarterQ2, sample_data.ServiceStartQuarterQ3, sample_data.ServiceStartQuarterQ4, StartCitySouth, EndCitySouth, vaca_label, binaryfirst) %>% rename(CouponSeqNbr = CouponSeqNbr.x, BookingChannel_Airports = sample_data.BookingChannelAirports, BookingChannel_Outside_Booking=sample_data.BookingChannelOutside.Booking, BookingChannel_Reservations_Booking = sample_data.BookingChannelReservations.Booking, BookingChannel_SCA_Website_Booking = sample_data.BookingChannelSCA.Website.Booking, BookingChannel_SY_Vacation=sample_data.BookingChannelSY.Vacation, BookingChannel_TourOperator_Portal=sample_data.BookingChannelTour.Operator.Portal, BkdClassOfService_Coach=sample_data.BkdClassOfServiceCoach, BkdClassOfService_Discount_First_Class=sample_data.BkdClassOfServiceDiscount.First.Class, BkdClassOfService_First_Class=sample_data.BkdClassOfServiceFirst.Class, ServiceStartQ1 = sample_data.ServiceStartQuarterQ1, ServiceStartQ2 = sample_data.ServiceStartQuarterQ2, ServiceStartQ3 = sample_data.ServiceStartQuarterQ3, ServiceStartQ4 = sample_data.ServiceStartQuarterQ4) %>% group_by(PaxID) %>% summarize(GenderCode = first(GenderCode), Age = first(Age), CardHolder = first(CardHolder), UflyMemberStatus = first(UflyMemberStatus), NumOfLayover = sum(StopoverCode=='O', na.rm=T), NumOfStopover = sum(StopoverCode=='X', na.rm=T), SumAmt = sum(AmtPerFlight, na.rm=T), AvgPlannedDays = mean(PlannedDays, na.rm=T), CntRoundTrip = sum(round_trip_flag, na.rm=T), CntOneWayTrip = sum(one_way_trip_flag, na.rm=T), Cnt_BC_Airport = sum(BookingChannel_Airports, na.rm=T), Cnt_BC_OutSideBooking = sum(BookingChannel_Outside_Booking, na.rm=T), Cnt_BC_Reservations_Booking = sum(BookingChannel_Reservations_Booking, na.rm=T), Cnt_BC_SCA_Website_Booking = sum(BookingChannel_SCA_Website_Booking, na.rm=T), Cnt_BC_SY_Vacation = sum(BookingChannel_SY_Vacation, na.rm=T), Cnt_BC_TourOperator_Portal = sum(BookingChannel_TourOperator_Portal, na.rm=T), CntBkdClass_Coach = sum(BkdClassOfService_Coach, na.rm=T), CntBkdClass_DiscountFirstClass = sum(BkdClassOfService_Discount_First_Class, na.rm=T), CntBkdClass_FirstClass = sum(BkdClassOfService_First_Class, na.rm=T), CntQ1 = sum(ServiceStartQ1, na.rm=T), CntQ2 = sum(ServiceStartQ2, na.rm=T), CntQ3 = sum(ServiceStartQ3, na.rm=T), CntQ4 = sum(ServiceStartQ4, na.rm=T), AvgGroupSize = mean(GroupSize, na.rm=T), SumStartCitySouth = sum(StartCitySouth, na.rm=T), SumEndCitySouth = sum(EndCitySouth,na.rm=T), Numofvacation = sum(vaca_label,na.rm=T), numfirclasstra = sum(binaryfirst,na.rm=T)) ``` * We aggregated the transactional-level data to customer-level data. We took the first instance of gender, age, and Ufly member status of each customer. We counted the number of layovers, stopovers, amount per flight, round-trips, one-way trips. Also, we counted how many times each passenger booked from each booking channel, in which quarter, and in which classes, from and to a southern city. We averaged the planned days and the group size per passenger. ```{r, eval=FALSE} normalize <- function(x){ return ((x - min(x))/(max(x) - min(x)))} customer_df2 <- customer_df1 %>% select(-PaxID) %>% mutate(Age = normalize(Age), NumOfLayover = normalize(NumOfLayover), NumOfStopover = normalize(NumOfStopover), SumAmt = normalize(SumAmt), AvgPlannedDays = normalize(AvgPlannedDays), CntRoundTrip = normalize(CntRoundTrip), CntOneWayTrip = normalize(CntOneWayTrip), Cnt_BC_Airport = normalize(Cnt_BC_Airport), Cnt_BC_OutSideBooking = normalize(Cnt_BC_OutSideBooking), Cnt_BC_Reservations_Booking = normalize(Cnt_BC_Reservations_Booking), Cnt_BC_SCA_Website_Booking = normalize(Cnt_BC_SCA_Website_Booking), Cnt_BC_SY_Vacation = normalize(Cnt_BC_SY_Vacation), Cnt_BC_TourOperator_Portal = normalize(Cnt_BC_TourOperator_Portal), CntBkdClass_Coach = normalize(CntBkdClass_Coach), CntBkdClass_DiscountFirstClass = normalize(CntBkdClass_DiscountFirstClass), CntBkdClass_FirstClass = normalize(CntBkdClass_FirstClass), CntQ1 = normalize(CntQ1), CntQ2 = normalize(CntQ2), CntQ3 = normalize(CntQ3), CntQ4 = normalize(CntQ4), AvgGroupSize = normalize(AvgGroupSize), SumStartCitySouth = normalize(SumStartCitySouth), SumEndCitySouth = normalize(SumEndCitySouth), Numofvacation = normalize(Numofvacation) ) ``` * After the aggregation, we normalized the numeric data using min-max to avoid the situation where attributes with large scales swamping attributes with small scales. ## Clean Data Summary ```{r} summary(customer_df2) ``` ## Analysis of Customers Clusters ### Description and Rationale for the Chosen Analysis Given our main focus is on customers taking first-class flight, we want to segment these customers into heterogeneous groups, and look into potential attributes that lead to the diversity in consumer behaviors. Therefore, we performed clustering analysis among all customers who have booked first-class tickets, so that we can develop different marketing strategies tailored towards customers in different clusters. ### Execution and results We found first-class customers can be classified into 4 groups. Based on our analysis, cluster 4 has the least amount of customers, but customers in this cluster spend more and fly more frequently with Sun Country. A large proportion of customers in cluster 4 are ufly members and they are prone to book their tickets from website of Sun Country. ```{r, echo=False} library(tidyverse) setwd("") clusteredData <- read.csv("ClusteredData1_denormalized.csv") clusteredData2 <- clusteredData # replace(clusteredData$UflyMemberStatus,c(""),c("NotAMember")) # count of membership status by cluster clusteredData3 <- clusteredData2 %>% group_by(cluster, UflyMemberStatus) %>% summarise(n=n()) # replace "" by NotAMember in UflyMemberStatus clusteredData3$UflyMemberStatus <- as.character(clusteredData3$UflyMemberStatus) clusteredData3$UflyMemberStatus <- ifelse(clusteredData3$UflyMemberStatus=="","NotAMember",clusteredData3$UflyMemberStatus) # head(clusteredData3) # number of people per cluster clustersum <- clusteredData2 %>% group_by(cluster) %>% summarise(total=n()) clusteredData4 <- clusteredData3 %>% left_join(clustersum, by="cluster") %>% mutate(percet=n/total) clusteredData4$cluster <- as.character(clusteredData4$cluster) ggplot(clusteredData4, aes(x=UflyMemberStatus,y=percet,fill=cluster))+ geom_bar(position="dodge",stat="identity")+ ylab("percentage")+ ggtitle("UflyMember Status across clusters (%)")+ theme(plot.title=element_text(hjust = 0.5,size=15,face="bold")) # book from sc website bookings <- clusteredData %>% select(PaxID,Cnt_BC_Airport,Cnt_BC_OutSideBooking, Cnt_BC_Reservations_Booking, Cnt_BC_SCA_Website_Booking, Cnt_BC_SY_Vacation, Cnt_BC_TourOperator_Portal) %>% group_by(PaxID) %>% mutate(totalBooking=Cnt_BC_Airport+Cnt_BC_OutSideBooking+ Cnt_BC_Reservations_Booking+ Cnt_BC_SCA_Website_Booking+ Cnt_BC_SY_Vacation+ Cnt_BC_TourOperator_Portal) %>% select(PaxID,totalBooking) clusteredData5 <- clusteredData %>% left_join(bookings, by="PaxID") %>% select(cluster,Cnt_BC_SCA_Website_Booking,totalBooking) clusteredData6 <- clusteredData5 %>% group_by(cluster) %>% mutate(totalWebsiteBooking=sum(Cnt_BC_SCA_Website_Booking))%>% mutate(totalBooking2=sum(totalBooking))%>% select(cluster,totalWebsiteBooking,totalBooking2)%>% distinct()%>% arrange(cluster)%>% mutate(perct_webBooking=totalWebsiteBooking/totalBooking2) ggplot(clusteredData6,aes(x=cluster,y=perct_webBooking,fill=cluster))+ geom_bar(position="dodge",stat="identity")+ ggtitle("Booking From Sun Country Website (%)")+ ylab("Percentage")+ theme(plot.title=element_text(hjust = 0.5,size=15,face="bold")) ``` ### Interpretation To understand what factor(s) drive the ‘abnormal’ behavior of customers in cluster 4, we looked into each customer attribute, and found two attributes that distinguish cluster 4 from the rest clusters. The majority of customers in cluster 4 are Ufly members, whereas most customers in cluster 123 did not enroll in UflyRewards program. In addition, a significantly higher percentage (around 80%) of customers in cluster 4 booked their tickets from Sun Country website, as compared to cluster 1,2 and 3 (around 70%). ### Conclusion It is reasonable for Sun Country to promote enrollment in Ufly rewards Program to encourage first-class customers to fly with Sun Country. We could introduce Ufly-member-only travel package to stimulate the subscription. Details about travel packages will be provided in the next section. In addition, being Ufly members might encourage customers to book their tickets from Sun Country’s website, rather than third parties. ## Targeting the Potential Customer Segment *Description and Rationale for the Chosen Analysis* Given that all clusters are lucrative for Sun Country, we could send them emails to persuade them to join the membership. However, since they did not join before, we need to differentiate our membership program and provided benefits that they want. We could differentiate from two perspectives, low price and differentiation in packages, campaigns, and services. Since Sun Country did not follow the low-cost strategy, we chose to recommend what align with their strategy. Thus, we investigated the destinations and seasonality of each cluster to see whether we could target them using campaigns tailored to their traveling patterns. *Execution and Results (including code)* We investigated all seasons for all clusters. Since the destinations of cluster 1 and 2 do not vary across seasons, we showed the destinations of the entire span of the data (i.e., two years). For cluster 3 and 4, their destinations are different in summer and winter, and the distribution of destination is more uniform in the spring and the fall. Therefore, we only included the plots for summer and winter. ```{r, echo=False} # read data data <- fread('first_class_trip.csv', header = T, stringsAsFactors = T) customer <- read.csv('ClusteredData1_denormalized.csv', header = T) # plot where cluster 1, 2, 3, 4's destinations customer_destination <- customer %>% inner_join(data, by=c('PaxID'='PaxID')) ``` ```{r} # cluster1 # end city end1 <- customer_destination %>% filter(cluster.y==1 & ServiceEndCity.y!='MSP') %>% select(ServiceEndCity.y) %>% rename(ServiceEndCity = ServiceEndCity.y) %>% group_by(ServiceEndCity) %>% summarize(cnt = n()) %>% top_n(10, cnt) ggplot(end1, aes(x=reorder(ServiceEndCity, -cnt), y=cnt)) + geom_bar(position="dodge", stat="identity") + ggtitle('End Cities of Cluster1') + xlab('Service End City') + ylab('Number of Flights') + theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.line = element_line(colour = "black")) # cluster2 end2 <- customer_destination %>% filter(cluster.y==2 & ServiceEndCity.y!='MSP') %>% select(ServiceEndCity.y) %>% rename(ServiceEndCity = ServiceEndCity.y) %>% group_by(ServiceEndCity) %>% summarize(cnt = n()) %>% top_n(10, cnt) ggplot(end2, aes(x=reorder(ServiceEndCity, -cnt), y=cnt)) + geom_bar(position="dodge", stat="identity") + ggtitle('End Cities of Cluster2') + xlab('Service End City') + ylab('Number of Flights') + theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.line = element_line(colour = "black")) # cluster3 # Winter end3_winter <- customer_destination %>% filter(cluster.y==3 & ServiceEndCity.y!='MSP') %>% mutate(ServiceStartMonth = month(ServiceStartDate)) %>% filter(ServiceStartMonth %in% c(11,12,1,2, 3)) %>% select(ServiceEndCity.y) %>% rename(ServiceEndCity = ServiceEndCity.y) %>% group_by(ServiceEndCity) %>% summarize(cnt = n()) %>% top_n(10, cnt) ggplot(end3_winter, aes(x=reorder(ServiceEndCity, -cnt), y=cnt)) + geom_bar(position="dodge", stat="identity") + ggtitle('End Cities of Cluster3 from November to March') + xlab('Service End City') + ylab('Number of Flights') + theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.line = element_line(colour = "black")) # Summer end3_summer <- customer_destination %>% filter(cluster.y==3 & ServiceEndCity.y!='MSP') %>% mutate(ServiceStartMonth = month(ServiceStartDate)) %>% filter(ServiceStartMonth %in% c(6, 7, 8)) %>% select(ServiceEndCity.y) %>% rename(ServiceEndCity = ServiceEndCity.y) %>% group_by(ServiceEndCity) %>% summarize(cnt = n()) %>% top_n(10, cnt) ggplot(end3_summer, aes(x=reorder(ServiceEndCity, -cnt), y=cnt)) + geom_bar(position="dodge", stat="identity") + ggtitle('End Cities of Cluster3 from June to August') + xlab('Service End City') + ylab('Number of Flights') + theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.line = element_line(colour = "black")) # cluster4 # Winter end4_winter <- customer_destination %>% filter(cluster.y==4 & ServiceEndCity.y!='MSP') %>% mutate(ServiceStartMonth = month(ServiceStartDate)) %>% filter(ServiceStartMonth %in% c(11,12,1,2,3)) %>% select(ServiceEndCity.y) %>% rename(ServiceEndCity = ServiceEndCity.y) %>% group_by(ServiceEndCity) %>% summarize(cnt = n()) %>% top_n(10, cnt) ggplot(end4_winter, aes(x=reorder(ServiceEndCity, -cnt), y=cnt)) + geom_bar(position="dodge", stat="identity") + ggtitle('End Cities of Cluster4 from November to March') + xlab('Service End City') + ylab('Number of Flights') + theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.line = element_line(colour = "black")) # Summer end4_summer <- customer_destination %>% filter(cluster.y==4 & ServiceEndCity.y!='MSP') %>% mutate(ServiceStartMonth = month(ServiceStartDate)) %>% filter(ServiceStartMonth %in% c(6, 7, 8)) %>% select(ServiceEndCity.y) %>% rename(ServiceEndCity = ServiceEndCity.y) %>% group_by(ServiceEndCity) %>% summarize(cnt = n()) %>% top_n(10, cnt) ggplot(end4_summer, aes(x=reorder(ServiceEndCity, -cnt), y=cnt)) + geom_bar(position="dodge", stat="identity") + ggtitle('End Cities of Cluster4 from June to August') + xlab('Service End City') + ylab('Number of Flights') + theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.line = element_line(colour = "black")) ``` *Interpretation* On one hand, cluster 1 and 2 travelled to metropolitan regions (e.g., New York, Los Angeles, San Francisco) most frequently in all seasons. On the other hand, cluster 3 and 4 travelled to different types of destinations in different seasons. They travelled to metropolitan areas in the summer and leisure-and-warm destinations in the winter. *Conclusion* Given their travel pattern, we propose that Sun Country could collaborate with hotels and agents in different areas in different seasons and design Ufly-member-only travel packages to target different clusters. The choice of city and the number of travel agents and hotels in each city will be subject to the budget of Sun Country. Cluster 1 and 2 can be targeted by collaboration with hotels and agents in metropolitan areas such as New York, Los Angeles, San Francisco, Washington D.C., Dallas, Lancing, and Boston. For cluster 3 and 4, Sun Country could collaborate with hotels and travel agents in Orlando, Palm Springs, Las Vegas, Los Angeles, Phoenix, and the City of Harlingen in the winter (from November to March). In summer (from June to August), Sun Country could target cluster 4 by collaborating with hotels and agents in Los Angeles, San Francisco, New York, Dallas and target cluster 3 by collaborating with hotels and agents in Los Angeles, Seattle, New York, San Francisco, Boston. # Final Takeaways ## Conclusion Customers who travel with Sun Country more often are those who enroll in UflyRewards Program Frequent fliers are prone to book their tickets from Sun Country’s website Sun Country could launch different travel packages tailored towards customers in each cluster based on season. Package types are shown as below: Clusters and Season | Clusters and Season | Destination of Travel Packages | | -------- | -------- | |Cluster 1 & 2 (all year round)|Metropolitan areas| |Cluster 3 & 4 (summer)|Orlando, Palm Springs, Las Vegas, Los Angeles, Phoenix, and the City of Harlingen| |Cluster 3 (winter)| Los Angeles, Seattle, New York, San Francisco, Boston| |Cluster 4 (winter)| Los Angeles, San Francisco, New York, Dallas| ## Final Recommendation We believe Sun Country could greatly increase its profit margin by promoting Ufly membership among (infrequent) first-class customers through the Ufly-member-only travel package. In this way, customers are encouraged to subscribe as Ufly member to enjoy the benefits of the travel packages (e.g. discount on flight tickets, hotels and local guide). Once they are part of the membership, they are likely to travel more frequently with Sun Country, and book their tickets from Sun Country’s website. As a result, Sun Country could increase revenue and customer lifetime value because of the increase in customer loyalty and switching costs, and decrease its costs on commission fee paid to third party websites. # Appendix

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully