Alex Zavalny
# Previously on Drexel AI:

https://arxiv.org/pdf/2001.09764.pdf

Features:
* crime location $(x,y)$
* time $t$

$f((x,y), t)=\text{crime type}$

More practically, we can also tell the user how confident (in %) the model is in its classification of the crime by outputting class probabilities.

Techniques used (for reference): data cleaning, feature selection, outlier detection, and component reduction and transformation.

Engineered features:
* Cluster centers of high-crime areas, using the distance to each center as a feature

Preprocessed features:
1) Hour
2) Month
3) Year
4) DayOfWeek
5) Is_Weekend
6) X
7) Y
8) Is_Intersection
9) Is_Block
10) Police District
11) Street_Type (St, Blv, Ave, etc.)

Previously, researchers studied time (regression of crime points over steps in a time series) and location (predicting neighbors of crime-dense locations). They looked at specific crime incidents and aggregated them over hours, months, and years to find patterns in the data.

The ML models do not depend on the city, because clusters of **crime hotspots** are generated automatically. We can also stack crime points from all time periods on top of each other to remove the temporal dimension (time). This means a clustering technique could set this up for any city or town while reducing data preprocessing work.
* They only got 20-30% accuracy, which is very low. We should be able to beat it with more features and possibly different models. This is why they reported log loss instead of accuracy in the abstract, which suggests that this problem is likely more difficult than we thought.
* We also have better services available for model training (services other than Colab). Also, Colab has recently been providing better GPUs for free, so we might not face session timeouts (meaning better data reporting).

## General ideas/things after reading the research

The time used was the dispatch time that the 911 call operator recorded.
* Not sure if this matters a lot, but just wanted to mention it
* We too will likely use the dispatch time given

This research used Euclidean distance, but we can also try other ways to calculate distance, such as city-block distance (mentioned in Future Works).

It would also be interesting to see how Covid-19 affected crime rates (just like the effects of the 2009 recession seen in this research).

Note about data: In the Crime Incidents dataset visualization (https://data.phila.gov/visualizations/crime-incidents), "Homicide - Criminal" has two different entries (essentially the same name but different categories). This research treated them as one, and we might need to do the same, so just something to keep in mind.

If the missing data is only lat-lng values, then we can potentially use the address given to find the lat-lng values and fill in the missing data.
* Might be a better option than deleting entire entries

# Literature Review:

## Crime Analysis Through Machine Learning (2018)

https://ieeexplore.ieee.org/document/8614828

Summary: This paper analyzes Vancouver's crime data and tries to build a crime prediction model using K-nearest neighbors and a boosted decision tree. The former achieves an accuracy of 39% while the latter achieves 44%. Overall, the paper provides good insight into how crime analysis and prediction could be performed. It also reviews various other research papers related to crime analysis and prediction, most of which used different techniques than the one presented here.
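The summary above compares KNN with a boosted decision tree. The earlier notes suggest trying city-block distance as well as Euclidean distance, and reporting how confident the model is. The block below is a minimal sketch, using only the Python standard library, of a KNN classifier over $(x, y)$ crime locations. It lets you swap the distance metric and returns the share of neighbor votes per crime type as a rough confidence score. The function names and toy points are hypothetical and are not taken from either paper. In practice, a library implementation on projected (planar) coordinates would be the more likely choice.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Sample = Tuple[Point, str]  # ((x, y), crime_type)


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def city_block(a: Point, b: Point) -> float:
    """City-block (Manhattan) distance: sum of absolute coordinate differences."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def knn_predict_proba(
    train: List[Sample],
    query: Point,
    k: int = 3,
    metric: Callable[[Point, Point], float] = euclidean,
) -> Dict[str, float]:
    """Share of the k nearest training points that carry each crime type."""
    # Sort training samples by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda sample: metric(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # Labels appear in order of their closest neighbor (Counter keeps insertion order).
    return {label: count / len(nearest) for label, count in votes.items()}


def knn_predict(
    train: List[Sample],
    query: Point,
    k: int = 3,
    metric: Callable[[Point, Point], float] = euclidean,
) -> str:
    """Most common crime type among the k nearest neighbors.

    Ties go to the label whose nearest neighbor is closest to the query.
    """
    proba = knn_predict_proba(train, query, k, metric)
    return max(proba, key=proba.get)


if __name__ == "__main__":
    # Hypothetical toy data in arbitrary planar units.
    train = [((3.0, 3.0), "THEFT"), ((0.0, 5.0), "ASSAULT"), ((10.0, 10.0), "THEFT")]
    print(knn_predict(train, (0.0, 0.0), k=1, metric=euclidean))   # THEFT
    print(knn_predict(train, (0.0, 0.0), k=1, metric=city_block))  # ASSAULT
    print(knn_predict_proba(train, (0.0, 0.0), k=3))
```

With `k=1`, the same query is assigned a different crime type under each metric. That sensitivity is worth checking before settling on Euclidean distance.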

Introduction:
* This research focuses on machine-learning-based crime prediction
* The Vancouver Police Department (VPD) maintains a crime database that is updated every Sunday morning with the crimes that took place in the city that week
* VPD introduced a crime-prediction model and saw a 27% decrease in residential break-ins
* The main objective is to use VPD's crime dataset + Vancouver's neighborhood dataset to create an accurate crime prediction model (the target is crime type)
* Techniques used: KNN and boosted decision tree

Background:
* A lot of background study was done on past research in crime prediction, analysis, and control
* Some interesting studies and findings:
    * One study used KNN, Naïve Bayes, and Decision Tree to study road accident patterns in Ethiopia and achieved accuracies from 79% to 81%
    * Most research in crime prediction is focused on identifying crime hotspots
    * One study in Vancouver tried to model known offenders' activities using probabilistic modeling of these offenders' known spatial behavior
        * Something that could be done for Philly as well
    * One study analyzed various crime-prediction methods and concluded: "Knowledge Discovery in Databases (KDD) techniques, which combine statistical modelling, machine learning, database storage, and AI technologies, was suggested as an effective tool for crime prediction"

Techniques Used in the Research:
* Like the Philly crime research, time and location were used in the data. They also used Vancouver neighborhood data to distinguish crime among its 22 neighborhoods/areas
* Two approaches were used for data preprocessing:
    * Approach 1: All categorical variables were converted into binary (0/1) variables. For each data point, there were 21 zeros and a single 1 marking the neighborhood in which the crime took place. Similarly, each day of the week was made into a feature, and a 1 marked the day on which the crime took place
        * Benefits: Gave more variables to train the model on, and prevented the data from skewing to one side
    * Approach 2: Categorical variables were converted into numerical values with unique IDs. All crime types and neighborhoods had different IDs, and these values were used in each data point

![](https://hackmd.io/_uploads/ryHir0902.png)

Results:
* Boosted decision tree performed better than KNN
* KNN results
    * Approach 1: Accuracy - 40.1%, Training time - 2209 seconds
    * Approach 2: Accuracy - 39.9%, Training time - 102 seconds
* Boosted Decision Tree
    * Approach 1: Accuracy - 41.9%, Training time - 904 seconds
    * Approach 2: Accuracy - 43.2%, Training time - 459 seconds

Observations:
* Used choropleth maps to describe the geographic information about crime incidents
* GIS has been used for crime mapping (showing the locations of crime series across various geographic areas)
* The addresses were converted into latitude and longitude data (WGS84)
* Python libraries used for plotting graphs: PySAL, GeoPandas, Folium, Shapely
* 0s and NA were used to fill missing values
* The overall crime pattern is similar to Philly's
    * Crime increases in summer, peaking around June to August, and decreases in winter, with the lowest levels in December and February
    * Like Philly, crime is at its lowest around 5 to 6 am, starts to increase around lunchtime (~12pm), and continues to increase until midnight

## Predicting and Preventing Crime: A Crime Prediction Model Using San Francisco Crime Data by Classification Techniques

https://www.hindawi.com/journals/complexity/2022/4830411/
https://www.kaggle.com/competitions/sf-crime
https://datasf.org/opendata/

Summary: A study that compared and proposed crime prediction models based on Naive Bayes, Random Forest, and Gradient Boosting Decision Tree.
The model analyzed the top ten crimes in the San Francisco area and achieved accuracies of 65.82%, 63.43%, and 98.5%, respectively.

Introduction:
* This study proposes a model that predicts crime in San Francisco based on historical data
* Uses the SF Crime Classification dataset hosted on Kaggle (also used in competitions)
* Naive Bayes, Random Forest, and Gradient Boosting Decision Tree are used to predict crimes and to classify them into two types: violent and nonviolent

Background:
* The researchers summarized previous research related to crime prediction and analysis, especially work focusing on SF
* One study comparing Naive Bayes and Decision Tree classifiers found Naive Bayes to perform better
* Other studies disagreed, with one proposing Gradient Tree Boosting and another showing the Decision Tree classifier to be better suited to the crime classification problem
    * The Decision Tree classifier achieved 83.95% accuracy. The main focus was predicting crime categories for different US states

Summary of Data:
* 9 selected features in total: Date, Category, Description, DayOfWeek, PdDistrict, Resolution, Address, X, Y
* Description and Resolution are short descriptions of crimes and their outcomes, so they were dropped from the data
* Data transformation:
    * Date broken down into Year (2003-2015), Month (1-12), Day (1-31), Hour (0-23)
    * DayOfWeek and PdDistrict indexed and replaced by numbers in (1-7) and (1-10), respectively
* 878,049 total records with an 80/20 validation-test split (after shuffling)
* For prediction, the dependent variable is Category (i.e., the type of crime).
  The remaining features are used as independent variables.
* For classification, the main objective is to classify each crime as either violent or nonviolent

Data Analysis Results:
* Like other studies, this study used graphs over different time scales (hour, week, month, year) to find patterns in the data
* Commonalities with Philly:
    * \>30 unique crime types were measured in the study (although only the top 10 were used for analysis)
    * Crime increased and decreased with the seasons
    * Thefts, Narcotics/Drug Law Violations, Vandalism, Vehicle Thefts, etc. are among the most common crime types in both cities
    * Interestingly, when viewing total crimes per hour, both cities see less crime between 3am and 6am and start to see a peak around 5pm to 6pm, with crime increasing until midnight
        * This might show that daily crime patterns do not change across cities, so the model could be broadly applicable
* Differences:
    * In SF, crime peaks in winter and fall, while in Philly, crime peaks in spring and summer
        * This suggests that season can be an important factor when studying crime rate and density, and that the peak crime season varies with geography

Summary of Prediction and Classification Models:
* Metrics used: accuracy, precision, and recall for the prediction models, and ROC and lift for the classification models
* Equations based on the confusion matrix, where $t$, $n$, and $p$ are the numbers of actual positives, actual negatives, and predicted positives:
    * $\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$
    * $\text{Recall} = \frac{TP}{t} = \frac{TP}{TP + FN}$
    * $\text{Specificity} = \frac{TN}{n} = \frac{TN}{TN + FP}$
    * $\text{Precision} = \frac{TP}{p} = \frac{TP}{TP + FP}$
* Classification results (testing data only):
    * Naive Bayes ![](https://hackmd.io/_uploads/SkNSmTjCn.png)
    * Random Forest ![](https://hackmd.io/_uploads/SJAvQas0h.png)
    * Gradient Boosting Decision Tree ![](https://hackmd.io/_uploads/B1Uc7TjRn.png)
* Prediction results (testing data only):

| Method                          | Accuracy | Precision | Recall |
|:------------------------------- | -------- | --------- |:------ |
| Naive Bayes                     | 64.33%   | 64.67%    | 63.88% |
| Random Forest                   | 63.43%   | 63.29%    | 62.80% |
| Gradient Boosting Decision Tree | 99.75%   | 100%      | 99.50% |

## Aoristic Crime Analysis

Introduction:
* In crime analysis, crime hotspots are often used to find areas with a higher density of criminal activity
* This research outlines a different approach to finding these hotspots and presents a framework for the temporal analysis of aoristic crime data (aoristic = without a defined occurrence in time)
* At the time, crime analysis placed more emphasis on spatial data than on temporal data (as evidenced by algorithms like Openshaw's GAM)
* Focusing on temporal analysis can help us identify patterns in crime and focus on lower-density crime areas where increasing crime may not be evident from spatial analysis alone

----

Notes for this research are halted for now because it does not apply to our current research data.

----

## A Time Series Analysis of Associations between Daily Temperature and Crime Events in Philadelphia, Pennsylvania

**Introduction**
* Temperature and its effects on several factors have been studied in the past
    * Examples: temperature and aggressive behavior (the hottest and coldest temperatures are highly correlated with increased aggressive behavior), temperature and mortality and morbidity, etc.
* Likewise, it would be of interest to study how temperature and temperature fluctuations are associated with crime, and whether they have any impact on crime at all
* Study findings: "There was a positive, linear relationship between deviations of the daily mean heat index from the seasonal mean and rates of violent crime and disorderly conduct, especially in cold months"
    * NOTE: Only specific categories of crime (disorderly conduct and violent crimes) were studied, so the findings might not generalize to all crimes
* Theories explaining the relationship between temperature and aggressive behavior:

| Theory                          | Summary                                                                                       |
| ------------------------------- |:--------------------------------------------------------------------------------------------- |
| Negative affect escape model    | Aggressive behavior highest at moderate temperatures (lower at the highest and lowest temperatures) |
| Simple negative affect model    | Aggressive behavior highest at the coldest and hottest temperatures                           |
| General affect aggression model | Linear relationship between temperature and aggression                                        |

***Routine Activity Theory***: "Treats crimes as events that occur as a result of spatial and temporal meeting of motivated offenders with suitable targets, and during times when individuals who would prevent crimes from occurring are absent"
* Conducted a time-series analysis to find associations between temperature and crime

**Data + Methodologies used**
* Used crime data from January 1, 2006 through December 31, 2015 from OpenDataPhilly
* Categorized crimes into Part 1 crimes (40% of total crime) and Part 2 crimes (60%)
    * Part 1 crimes: homicide, rape, robbery, aggravated assault, burglary, and thefts
    * Part 2 crimes: assaults, arson, forgery and counterfeiting, fraud, embezzlement, receiving stolen property, vandalism/criminal mischief, weapon violations, prostitution and commercialized vice, other sex offenses, narcotic/drug law violations, gambling violations, offenses against family and children, driving under the influence, liquor law violations, public drunkenness, disorderly conduct, and vagrancy/loitering
* Mostly focused on **three** groups of crime: violent crimes, robberies, and disorderly conduct
* Measured the association between temperature and crime in two ways: first using all data points over the 2006-2015 period, and second by season (fall, winter, spring, summer)
    * Later, they also evaluated patterns by *warm months* (May-September) and *cold months* (October-April)
* Used R to derive the *heat index*, *daily heat index values*, and the *seasonal mean heat index value*
    * The heat index is derived from temperature and dew point and represents thermal comfort
    * *Seasonal mean heat index value*: $\frac{\sum_{i=0}^{n} HI_i}{N}$, where $i=0$ is the presumed first day of the season, $n$ is the presumed last day of the season, $HI_i$ is the daily mean heat index value for the $i$th day, and $N$ is the total number of days in the season
    * *Deviation of $HI_i$ from the seasonal mean heat index value*: $HI_i - \text{seasonal mean}$
* Used all these values to derive **relative rates (RR)** and **95% confidence intervals (CI)** of the association between daily heat index and crime
    * Analyzed associations for all calendar months + warm and cold months
    * Used the median of the mean daily heat index as the reference temperature for RR and CI
    * RR values were calculated for the 0.1, 5th, 75th, 90th, and 99th percentiles of the distribution of each temperature metric

**Results**

*Associations with daily mean heat index*
* Daily mean heat index by season:

| Season | Mean (°C) | SD (°C) |
|:------ | --------- |:------- |
| Spring | 15.7      | 6.8     |
| Summer | 25.1      | 3.8     |
| Fall   | 10.6      | 6.6     |
| Winter | 2.3       | 5.1     |

* Changes in crime at the 75th and 99th temperature percentiles
    * "% higher" is how much higher the crime rate was relative to the rate at the median of the distribution

| Type of crime      | % higher (75th)        | % higher (99th)        |
| ------------------ | ---------------------- | ---------------------- |
| Violent crimes     | 8% (95% CI 6% to 10%)  | 9% (95% CI 6% to 12%)  |
| Disorderly conduct | 13% (95% CI 6% to 21%) | 7% (95% CI -4% to 19%) |
| Robberies          | Not reported           | Not reported           |

* Note: Robberies increased with temperature only up to the median
* *Cold Months*: There was a nearly complete linear relationship between the daily mean heat index and rates of disorderly conduct and violent crime

| Type of crime      | % higher (5th)             | % higher (75th)      | % higher (99th)         |
| ------------------ |:-------------------------- | -------------------- | ----------------------- |
| Violent crimes     | -12% (95% CI -14% to -10%) | 5% (95% CI 3% to 7%) | 16% (95% CI 12% to 21%) |
| Disorderly conduct | -19% (95% CI -26% to -12%) | 8% (95% CI 2% to 13%)| 23% (95% CI 10% to 39%) |
| Robberies          | Not reported               | Not reported         | Not reported            |

*Stats for cold months only*

* *Warm Months*: RR estimates were close to null for all 3 crime groups. For all crimes, Part 1 crimes, and Part 2 crimes, crime rates were highest at the median of the distribution of mean heat index values

*Associations with seasonal mean heat index deviations*
* Reminder: the deviation on the $i$th day is calculated as $HI_i - \text{seasonal mean HI}$
* *Violent Crimes*: Linear relationship between heat index deviation values and violent crimes
* *Disorderly Conduct*: Like violent crimes, it has a linear relationship with heat index deviation values
* *Robberies*: Association with heat index deviation values is close to null

Again, all values/rates are relative to days whose daily mean HI equaled the seasonal mean (i.e., deviation = 0).

| Type of crime      | % higher (99th percentile, i.e., +13°C above the seasonal mean heat index) |
| ------------------ | ----------------------- |
| Violent crimes     | 5% (95% CI 3% to 8%)    |
| Disorderly conduct | 7% (95% CI -1% to 15%)  |
| Robberies          | Not reported            |

* *Cold Months*: Linear relationship between the deviation and the RR of violent crime and disorderly conduct
    * For the remaining groups (all crimes, Part 1 crimes, Part 2 crimes, and robbery), the RR estimates were close to null
* *Warm Months*: The overall relationship between seasonal mean heat index deviation and crime was close to null

**Takeaways**
* The rate of crime, especially disorderly conduct and violent crime, was highest when temperatures were comfortable (above the median). The highest crime rates occurred when temperatures were warm (i.e., at higher percentiles)

# Datasets:
* Crime Incidents
    * https://data.phila.gov/visualizations/crime-incidents
    * features:
        * district
        * psa (police service area)
        * dispatch date and time
        * address of crime
        * ucr (Uniform Crime Reporting code)
        * type of crime
        * x,y location of crime
* Arrests
    * https://opendataphilly.org/datasets/arrests/
    * features:
        * offense category
        * datetime
        * defendant race
        * count
* Charges
    * By district
        * https://github.com/phillydao/phillydao-public-data/blob/main/docs/data/charges_data_daily_by_district.csv
    * Citywide
        * https://github.com/phillydao/phillydao-public-data/blob/main/docs/data/charges_data_daily_citywide.csv
    * features:
        * date
        * dc district
        * crime category as one-hot encoded feature columns
* Case Length
    * By district
        * https://github.com/phillydao/phillydao-public-data/blob/main/docs/data/summary_charges_data_daily_by_district.csv
    * Citywide
        * https://github.com/phillydao/phillydao-public-data/blob/main/docs/data/summary_case_outcomes_data_daily_citywide.csv
    * features:
        * date
        * case outcome
        * crime category as one-hot encoded feature columns
* Case Outcomes
    * By district
        * https://github.com/phillydao/phillydao-public-data/blob/main/docs/data/case_outcomes_data_daily_by_district.csv
    * Citywide
        * https://github.com/phillydao/phillydao-public-data/blob/main/docs/data/case_outcomes_data_daily_citywide.csv
    * features:
        * date
        * dc district
        * case outcome
        * crime category as one-hot encoded feature columns
* Other related datasets can be found at https://github.com/phillydao/phillydao-public-data
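The Crime Incidents dataset provides a dispatch date and time for each incident. That field is where the time-based features from the "Previously on Drexel AI" notes (Hour, Month, Year, DayOfWeek, Is_Weekend) would come from. Below is a minimal standard-library sketch of that derivation. It also includes a one-hot day-of-week encoding in the spirit of Approach 1 from the Vancouver paper. The timestamp format `%Y-%m-%d %H:%M:%S` and the helper names `time_features` and `one_hot_day` are assumptions for illustration only. Check the actual column names and formats against the downloaded data before relying on this.

```python
from datetime import datetime
from typing import Dict, List

# Assumed timestamp format for the "dispatch date and time" field; check the real export.
DISPATCH_FORMAT = "%Y-%m-%d %H:%M:%S"


def time_features(dispatch: str) -> Dict[str, int]:
    """Derive Hour, Month, Year, DayOfWeek, and Is_Weekend from one dispatch timestamp."""
    ts = datetime.strptime(dispatch, DISPATCH_FORMAT)
    day_of_week = ts.weekday()  # Monday = 0 ... Sunday = 6 (the SF paper used 1-7)
    return {
        "Hour": ts.hour,
        "Month": ts.month,
        "Year": ts.year,
        "DayOfWeek": day_of_week,
        "Is_Weekend": int(day_of_week >= 5),  # Saturday or Sunday
    }


def one_hot_day(day_of_week: int) -> List[int]:
    """Approach 1 style encoding: seven 0/1 columns with a single 1 for the crime's day."""
    return [1 if day == day_of_week else 0 for day in range(7)]


if __name__ == "__main__":
    features = time_features("2021-07-04 23:15:00")
    print(features)  # {'Hour': 23, 'Month': 7, 'Year': 2021, 'DayOfWeek': 6, 'Is_Weekend': 1}
    print(one_hot_day(features["DayOfWeek"]))  # [0, 0, 0, 0, 0, 0, 1]
```

Keeping DayOfWeek as a single integer corresponds to Approach 2 (unique IDs). `one_hot_day` expands it into Approach 1's binary columns. In the Vancouver results, Approach 2 trained noticeably faster for both KNN and the boosted decision tree.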
