---
title: 'K-Means Clustering and High Confidence Leads using DeepLearning'
disqus: hackmd
---

K-Means Clustering and High Confidence Leads using DeepLearning
===

## Table of Contents

[TOC]

## Rationale

Many real estate firms follow one core rule to succeed in the business: market constantly to generate leads. However, the probability of gaining clientele roughly follows the Pareto principle, where 20% of the leads generate 80% of the income. By this rule, one can expect 100 leads to yield 20 successful transactions.

The average home value is estimated at $300,000 (add source). At an average commission rate of 3%, those 20 transactions bring in around $180,000. At 80%, this would make a net amount of $144,000.

Each Sunday open house yields around 7 contacts, of which 14% become leads. Thus, we need around 714 contacts to reach 100 leads and make this income. At 7 contacts per week, this is equivalent to 102 weeks, close to 2 years.

The purpose of this document is to record ways to make the process more efficient: reducing the amount of work and increasing the yield.

I hypothesize that leads ending in successful transactions come from homes that draw higher interest based on popular features (price, bedrooms, size, look of the home), and that this interest is directly related to days on market (DOM). DOM is assumed to reflect interest in the home, the speed of offers, and buyer qualification. This in turn is assumed to correlate with higher-quality buyers, and thus with higher lead generation and transaction efficiency.

The plan is to use exploratory techniques on the data. We will then use **principal component analysis** to find which features are most important for a property's days on market. Lastly, we will isolate the one-standard-deviation lower tail of the DOM sample (the shortest days on market) and compare it against the correlated/important features extracted.

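
The funnel arithmetic in the rationale can be checked with a short script. This is only a sketch: the function and parameter names are made up for illustration, the defaults are the figures quoted in the text (including the still-unsourced $300,000 home value), and the 80% factor is applied exactly as the text applies it, without assuming what it represents.

```python
def open_house_funnel(
    leads=100,                  # target number of leads
    close_rate=0.20,            # Pareto-style assumption: 20% of leads close
    avg_home_value=300_000,     # estimated average home value (source pending)
    commission_rate=0.03,       # average commission rate
    net_share=0.80,             # the 80% factor applied to gross commission
    contacts_per_week=7,        # contacts from one Sunday open house
    lead_rate=0.14,             # fraction of contacts that become leads
):
    """Reproduce the lead-to-income estimate from the rationale."""
    transactions = round(leads * close_rate)
    gross = round(transactions * avg_home_value * commission_rate)
    net = round(gross * net_share)
    # Contacts needed to collect the target number of leads.
    contacts = round(leads / lead_rate)
    weeks = round(contacts / contacts_per_week)
    return {
        "transactions": transactions,
        "gross_commission": gross,
        "net_income": net,
        "contacts_needed": contacts,
        "weeks_of_open_houses": weeks,
        "years_of_open_houses": weeks / 52,
    }


if __name__ == "__main__":
    for name, value in open_house_funnel().items():
        print(f"{name}: {value}")
```

Changing a single input, for example `lead_rate`, shows how sensitive the two-year estimate is to lead quality, which is the quantity the rest of this note tries to improve.
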
## Methodology

```flow
st=>start: Extract 10 years of data
step1=>operation: Perform exploratory analysis of time-series data
step2=>operation: Predict features for the 1 STD DOM samples (lesser tail)
step3=>operation: Describe central tendency (mean, median, mode) of the 1 STD DOM samples
e=>end: Plot everything

st->step1->step2->step3->e
```

### Step 1: Extract Data

1. Contact CoreLogic to get API information.
2. Download 10 years' worth of real estate data in Las Vegas.
3.

###### tags: `realestate` `realestatestrategy`