# Final project
## Cleaning
- Select the columns (variables) of interest and clean them.
- Divide the weight and height variables by 10 (any others?).
- A cell is considered empty if its value is 0.
- Check whether any value is negative, and whether a negative value makes sense for that variable.
- Age in months.
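
The cleaning rules above can be sketched in a few lines of plain Python. This is only a minimal sketch: the column names (`weight`, `height`, `age_months`) and the sample records are assumptions made up for illustration, not taken from the real dataset.

```python
# Hypothetical names of the columns that the notes say must be divided by 10.
SCALED_COLUMNS = ("weight", "height")


def clean_row(row):
    """Return a cleaned copy of one record (a dict of column -> number)."""
    cleaned = {}
    for name, value in row.items():
        if value == 0:
            cleaned[name] = None        # a 0 is treated as an empty cell
        elif name in SCALED_COLUMNS:
            cleaned[name] = value / 10  # rescale as the notes require
        else:
            cleaned[name] = value
    return cleaned


def negative_columns(row):
    """List the columns holding a negative value, so they can be reviewed."""
    return [name for name, value in row.items()
            if value is not None and value < 0]


# Made-up records, only to exercise the functions.
raw = [
    {"weight": 712, "height": 1755, "age_months": 300},
    {"weight": 0, "height": 1620, "age_months": 252},
    {"weight": 655, "height": -1, "age_months": 288},
]
cleaned = [clean_row(row) for row in raw]
flagged = [negative_columns(row) for row in cleaned]
print(cleaned)
print(flagged)
```

Negative values are only flagged, not removed, because the notes leave open whether a negative value can be legitimate for some variable.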
## Exploratory Data Analysis
### Descriptive analysis
- Common to all variables
  - Data types
  - Missing values
- Numeric variables
  - Quantile statistics
    - Q1, Q2, Q3, min, max, range, interquartile range
  - Descriptive statistics
    - Mean, standard deviation, median
  - Distribution histogram
- Categorical variables
  - Unique values
  - Number of occurrences of each unique value
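
As a rough illustration of the checklist above, the following sketch computes the listed statistics with Python's standard `statistics` module and `collections.Counter`. The sample values and the variable names (`heights`, `sexes`) are invented, and the `"inclusive"` quantile method is one possible choice among several.

```python
import statistics
from collections import Counter

# Made-up sample; None marks a missing value.
heights = [150.0, 160.0, None, 165.0, 170.0, 180.0, 190.0]
sexes = ["F", "M", "F", "F", "M", "M", "F"]

# Missing values are counted first and then excluded from the statistics.
missing = sum(value is None for value in heights)
present = [value for value in heights if value is not None]

# Quartiles; "inclusive" treats the sample minimum and maximum as the 0% and 100% points.
q1, q2, q3 = statistics.quantiles(present, n=4, method="inclusive")

summary = {
    "min": min(present),
    "q1": q1,
    "q2": q2,
    "q3": q3,
    "max": max(present),
    "range": max(present) - min(present),
    "iqr": q3 - q1,
    "mean": statistics.mean(present),
    "sd": statistics.stdev(present),   # sample standard deviation
    "median": statistics.median(present),
}

# Categorical variable: unique values and how often each one occurs.
counts = Counter(sexes)

print(missing, summary, counts)
```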
### Visualization
- Plots
  - Plot types
    - Scatterplots
    - Histograms
    - Barplots
    - Boxplots
  - Constants (lines and points)
  - Logarithmic and natural scales
  - Pch (point symbols)
  - Colors
  - Alpha
  - Gradients
  - Labs and mains (axis labels and titles)
  - Legends
- Plot variables and relationships between variables (qualitative correlation analysis)
- Add variables to consider (in the dataset or not).
- Assorted tables or matrices (comparative, etc.)
- Clusters
- Categorical vs. continuous variables
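
Without committing to a plotting library, the numbers that a boxplot of a continuous variable split by a categorical one would display can be sketched as follows. The data, the category labels, and the helper name `boxplot_stats` are assumptions for illustration only.

```python
import statistics
from collections import defaultdict


def boxplot_stats(values):
    """Five-number summary: the values a boxplot draws (ignoring outlier rules)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


# Made-up (category, continuous value) pairs, e.g. sex and height in cm.
rows = [
    ("F", 160.0), ("F", 165.0), ("F", 170.0),
    ("M", 170.0), ("M", 178.0), ("M", 186.0),
]

# Group the continuous values by category.
groups = defaultdict(list)
for category, value in rows:
    groups[category].append(value)

# One summary per category; comparing them side by side is the
# tabular counterpart of a grouped boxplot.
table = {category: boxplot_stats(values) for category, values in groups.items()}
for category, stats in table.items():
    print(category, stats)
```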
## Modeling
- Linear models
  - `lm(x ~ y + z)`
  - `lm(x ~ y * z)`, equivalent to `lm(x ~ y + z + y:z)` -> $x = k + a \cdot y + b \cdot z + c \cdot (y \cdot z)$
  - `rlm(x ~ y + z)` or `rlm(x ~ y * z)` (robust linear model)
  - `lm(x ~ poly)`
  - `lm(x ~ log)`
- Predicted vs. actual values
- Fit error and prediction error
  - Residuals
  - MAE and PMAE
  - $R^2$
- Clusters
- Comparison of linear models
  - Tables or matrices
    - Residuals, MAE, PMAE, $R^2$, number of variables, etc.
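
To make the interaction model and the error measures concrete, here is a minimal pure-Python sketch of an ordinary least squares fit of x on y, z, and their product, followed by residuals, MAE, PMAE, and $R^2$. It is not the R `lm` implementation: the helper names (`solve`, `fit_ols`), the synthetic data, and the definition of PMAE as MAE divided by the mean of the response are all assumptions.

```python
def solve(a, b):
    """Solve the square system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef


def fit_ols(design, target):
    """Least squares coefficients from the normal equations (X'X) coef = X'x."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(design, target)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data: x depends on y, z and their interaction, plus small fixed noise.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3]
x = [2.0 + 3.0 * yi + 0.5 * zi + 0.1 * yi * zi + e for yi, zi, e in zip(y, z, noise)]

# Columns: intercept, y, z, y:z (the terms that x ~ y * z expands to).
design = [[1.0, yi, zi, yi * zi] for yi, zi in zip(y, z)]
coef = fit_ols(design, x)

predicted = [sum(c * v for c, v in zip(coef, row)) for row in design]
residuals = [xi - pi for xi, pi in zip(x, predicted)]

mean_x = sum(x) / len(x)
mae = sum(abs(r) for r in residuals) / len(residuals)
pmae = mae / mean_x  # assumed definition: MAE relative to the mean response
ss_res = sum(r * r for r in residuals)
ss_tot = sum((xi - mean_x) ** 2 for xi in x)
r2 = 1.0 - ss_res / ss_tot

print(coef, mae, pmae, r2)
```

These errors are computed on the same data used for the fit, so they measure fit error; prediction error would require evaluating the same quantities on held-out rows.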
### Questions about the modeling
#### Reliability and validity
- Does the model make intuitive sense? Is the model easy to understand and interpret?
- Are all coefficients statistically significant? (p-values less than .05)
- Are the signs associated with the coefficients as expected?
- Does the model predict values that are reasonably close to the actual values?
- Is the model sufficiently sound? (High R-square, low standard error, etc.)
---
# Papers / References
## A demonstration of correlation graphs to human body dimensions [1]
- Seventeen of the 24 variables correlated strongly with at least one other variable in the dataset (r ≥ 0.80).
- the F Group compared to 0.59 in the M group suggests that body shapes of persons in the F group are more proportional and predictable.
- In both cases the centrality and degree of the Weight variable proposes that a large number of body measurements are strongly regulated by weight.
- Weight plays a vital role in shaping body measurements and height is more influential in males compared to females. Older males and females often become more alike and this is often associated with changes in the lower torso. Males have an increased ability to change their upper torso and upper limbs disproportionally to the rest of their bodies.
- The diameter of joints in a person's limbs (e.g. elbows, knees and ankles) are genetically determined and are less dependent on lifestyle, weight and age.
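
The idea of a correlation graph, as used in the excerpts above, can be sketched as follows: variables are nodes, and two variables are linked when the absolute value of their Pearson correlation reaches a threshold. This is a minimal sketch under assumptions: the measurements are invented, the 0.80 threshold is borrowed from the excerpt, and node degree stands in for the paper's centrality measures.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up measurements for six people (not real data).
measurements = {
    "weight": [55.0, 62.0, 70.0, 78.0, 85.0, 93.0],
    "waist": [66.0, 71.0, 78.0, 84.0, 90.0, 97.0],
    "hip": [88.0, 92.0, 97.0, 103.0, 108.0, 114.0],
    "height": [160.0, 175.0, 165.0, 182.0, 170.0, 178.0],
    "ankle": [22.0, 21.0, 23.5, 21.5, 23.0, 22.5],
}

THRESHOLD = 0.80

# An edge links every pair of variables that correlates strongly.
edges = [
    (a, b)
    for a, b in combinations(measurements, 2)
    if abs(pearson(measurements[a], measurements[b])) >= THRESHOLD
]

# Degree: number of strong correlations each variable takes part in.
degree = {name: sum(name in edge for edge in edges) for name in measurements}

print(edges)
print(degree)
```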
## Correlation Analysis on the Main and Basic Body Dimension for Chinese Adults [2]
- There are significant linear correlations between weight and circumference/width/thickness related measurements (correlation coefficients are basically above 0.6).
- There are comparatively closer correlations between stature and height related items, with the majority of the correlation coefficients being above 0.8. There are also comparatively significant linear correlations between stature and length related items, with the correlation coefficients being basically between 0.5 and 0.7.
- There are comparatively closer correlations between chest circumference/waist circumference/hip circumference and weight/circumference/thickness/width related measurements. What's more, the correlations between chest circumference, waist circumference and hip circumference are also relatively closer. The correlation coefficients are all above 0.8.
- There are comparatively weaker correlations between head/face measurements and the five independent variables, with the correlation coefficients all being below 0.5.
- Among the hand and foot related measurements, only hand length, palm length and foot length have closer correlations with stature (their correlation coefficients are above 0.7).
## Relationship between Human Body Anthropometric Measurements and Basal Metabolic Rate [3]
## Correlation Between Body Measurements of Different Genders and Races [4]
As expected, height and arm span have high correlation, and in most cases, the two measurements have very little difference when sampled on the same subject. However, the ratio between head width and head height and the ratio between forehead height and lower face length was closer to 0.7. It was also concluded that African Americans tend to have larger heights, arm spans, head heights, and lower faces. Based on the data from this experiment, people of Oriental descent had larger foreheads and shorter arms in comparison to their height, and Caucasians had the shortest arm spans. South Asians had almost the same ratio for head width and head height as forehead height and lower face. There are several ways this experiment could be improved or more accurate. Several of our subjects were younger than 18, and some of them may not have fully finished growing. Also, asking subjects to remove shoes or testing subjects at the same time each day would have also improved the accuracy of the height measurements. In addition to this, while ImageJ can give a very accurate estimate of measurements, there are still some errors. Further research could be done by sampling a larger group of young adults randomly, instead of a smaller concentrated group of teenagers from just one camp.
[1]:https://academicjournals.org/article/article1380795025_Bezuidenhout%20and%20Domleo.pdf "A demonstration of correlation graphs to human body dimensions"
[2]:https://link.springer.com/chapter/10.1007/978-3-319-21070-4_4 "Correlation Analysis on the Main and Basic Body Dimension for Chinese Adults"
[3]:https://www.intechopen.com/chapters/63234 "Relationship between Human Body Anthropometric Measurements and Basal Metabolic Rate"
[4]:https://www.semanticscholar.org/paper/Correlation-Between-Body-Measurements-of-Different-Goel-Tashakkori/dab02a09f645fff906d1a2979d0dc3fa8c0cd744#paper-header "Correlation Between Body Measurements of Different Genders and Races"
## Supplementary / Miscellaneous
[Visualizing correctly](https://seaborn.pydata.org/tutorial.html)
[Correlation and Regression Analysis](https://www.oicstatcom.org/file/TEXTBOOK-CORRELATION-AND-REGRESSION-ANALYSIS-EGYPT-EN.pdf)
Body measurements correlation (Google Images)
# Sample project
[Exploring Relationships in Body Dimensions](http://jse.amstat.org/v11n2/datasets.heinz.html)
# Notes
- [x] Partition the dataframe.
- [ ] Check the final results against other datasets.
- [ ] ACC in a subsection?
- [ ] Are there independent and dependent variables?
- [ ] Form hypotheses from the papers, then go to the data to see whether they hold.
- [ ] Proportions
- [ ] If some measurements or magnitudes are determined by genetics, is it possible to predict whether the person is male or female?
  - Build a model from certain genetic variables, and split the dataset between men and women. After training the model, check on the final dataframe how accurate it is.
- [ ] Categorical variables as a function of age
- [ ] Some genetic variables help predict categorical variables while others do not
- [ ] Growth in children distorts the correlations.
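
The idea in the notes of predicting sex from genetically determined measurements could be prototyped roughly as below. This is only a sketch under assumptions: the two features (elbow and knee diameter), the data, the 75/25 partition, and the nearest-centroid classifier are illustrative choices, not the method the project settled on.

```python
import math
import random


def centroids_by_label(samples):
    """Mean feature vector of the training samples, per label."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(features, centroids):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Made-up (elbow diameter, knee diameter) pairs in cm, labeled by sex.
data = [
    ((12.1, 17.8), "F"), ((12.4, 18.0), "F"), ((12.0, 17.5), "F"),
    ((12.6, 18.3), "F"), ((12.3, 17.9), "F"), ((11.9, 17.6), "F"),
    ((14.2, 19.6), "M"), ((14.5, 19.9), "M"), ((14.0, 19.4), "M"),
    ((14.7, 20.1), "M"), ((14.3, 19.7), "M"), ((13.9, 19.5), "M"),
]

# Partition the data: shuffle with a fixed seed, then hold out 25% for testing.
shuffled = data[:]
random.Random(0).shuffle(shuffled)
split = int(0.75 * len(shuffled))
train, test = shuffled[:split], shuffled[split:]

centroids = centroids_by_label(train)
hits = sum(predict(features, centroids) == label for features, label in test)
accuracy = hits / len(test)

print(accuracy)
```

With real measurements the two groups would overlap, so accuracy on the held-out rows would indicate how much information about sex the chosen variables actually carry.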
# References