# Quansight Data Science Residency Jam Session
# 2019-11-13
### Attendees
* Tony Fast - @tonyfast
* Adam Lewis - @balast
### Magic of the day
* https://github.com/jmcarpenter2/swifter
* https://gist.github.com/tonyfast/410e7a5d42942eb0a750cddf31c504c9
* snakeviz, py-spy
# 2019-11-13
### Attendees
* Tyler Potts - @t-potts
* Adam Lewis - @balast
* Dillon Roach - @dillonroach
* Abraham Maxfield - @utabe
### Magic of the day
`%pushd` changes the working directory of your current notebook and pushes the previous directory onto a stack; `%popd` takes you back.
```python
%pushd <path>
```
# 2019-10-30
### Attendees
* Tony Fast - @tonyfast
* Tyler Potts - @t-potts
* Abraham Maxfield - @utabe
* Adam Lewis - @balast
### Magic dir()
`dir()` lists the attribute names of whatever object you pass to it.
Caveat: a class can customize `dir()` by defining `__dir__`, so it might not always return what you want.
`[x for x in dir(pd.Series) if not x.startswith('_')]` returns only the public attributes (methods and properties), skipping every name that starts with an underscore.
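
A small sketch of both points, using made-up classes (`Sensor` and `Customized` are hypothetical, not from the session) so that it runs without pandas:

```python
class Sensor:
    """Hypothetical class used only for illustration."""

    unit = "C"

    def __init__(self):
        self._raw = 0

    def read(self):
        return self._raw


class Customized(Sensor):
    def __dir__(self):
        # A class can report whatever names it likes from __dir__.
        return ["only_this"]


# Public attribute names: everything that does not start with an underscore.
public = [name for name in dir(Sensor) if not name.startswith("_")]

# Narrow further to callables to get just the public methods.
methods = [name for name in public if callable(getattr(Sensor, name))]

# dir() on an instance of the customized class returns the overridden listing.
customized = dir(Customized())

print(public)
print(methods)
print(customized)
```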
* https://en.wikipedia.org/wiki/Schr%C3%B6dinger%27s_cat
* https://en.wikipedia.org/wiki/Uncertainty_quantification#Aleatoric_and_epistemic_uncertainty
* http://www.ce.memphis.edu/7137/PDFs/Abrahamson/C05.pdf
* https://gist.github.com/tonyfast/a858087a657bb8fbcb9a6b8771e3d025
# 2019-10-23
* https://lifelines.readthedocs.io/en/latest/Survival%20analysis%20with%20lifelines.html
* https://anaconda.org/tonyfast/lifelines_next_try/notebook
* https://gist.github.com/tonyfast/2f69f4003768c57fbf2675bfcee58397
### Attendees
* Tyler Potts - @t-potts
* Adam Lewis - @balast
* Pam Wadhwa - @ppwadhwa
# 2019-10-16
### Issues vs. Slack from data science communications (Saul)
### Attendees
* Tony Fast - @tonyfast
* Tyler Potts - @t-potts
* Pam Wadhwa - @ppwadhwa
* Adam Lewis - @balast
* Abraham Maxfield - @utabe
* Dillon Roach - @dillonroach
# 2019-10-09
https://zoom.us/j/133471615
https://gist.github.com/sloria/7001839
https://github.com/simonw/datasette
https://github.com/simonw/sqlite-utils
https://gist.github.com/tonyfast/e638af10424de0284b36c0bf77fcd42a
https://github.com/ibis-project/ibis
https://datasette.readthedocs.io/en/stable/publish.html
https://cloud.google.com/community/tutorials/bigquery-ibis
https://docs.ibis-project.org/udf.html
### text, music, visuals
* @tonyfast - visual
* @adam - audio books
* @trent - music
* @t-potts - music
* @pam - visual learner
* @fatma - visual & text
* @abe - visual & text
* @dillon - visual learner / music til the afterlife
### Attendees
* Tony Fast - @tonyfast
* Dillon Roach - @dillonroach
* Abraham Maxfield - @utabe
* Adam Lewis - @balast
* Tyler Potts - @t-potts
* Trent Oliphant - @trentoliphant
* Pam Wadhwa - @ppwadhwa
audioeye.com
listing of image recognition services
https://www.g2.com/categories/image-recognition
#### Audioeye is currently using the following two endpoints
* Image recognition: https://api.projectoxford.ai/vision/v1.0/analyze?visualFeatures=Description,Tag
* OCR: https://api.projectoxford.ai/vision/v1.0/ocr?language=en
### Pandas formatting trick
Wrap the method chain in parentheses and put each method call on its own line for easy readability
```python
df = (
    df.merge(other_df)
    .groupby(['thing1', 'thing2'])
    .sum()
)
```
### Expand a Series of dicts to individual columns for each key
`df.column_name.apply(pd.Series)`
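
A minimal sketch of the expansion on a toy frame. The column names `id` and `meta` and the data are assumptions made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical): one column holds a dict per row.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "meta": [{"a": 1, "b": 2}, {"a": 3, "b": 4}],
    }
)

# Each dict becomes a row of a new DataFrame, with one column per key.
expanded = df["meta"].apply(pd.Series)

# Swap the original dict column for the expanded columns.
result = df.drop(columns="meta").join(expanded)
print(result)
```

`apply(pd.Series)` builds one Series per row, so it can be slow on large frames; `pd.DataFrame(df["meta"].tolist(), index=df.index)` should give the same columns more quickly when every value is a dict.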
Kernel Shell
https://en.wikipedia.org/wiki/Shell_(computing)
# 2019-10-02
https://zoom.us/j/133471615
### Attendees
* Tony Fast - @tonyfast
* Pam Wadhwa - @ppwadhwa
* Adam Lewis - @balast
* Abraham Maxfield - @utabe
* Tyler Potts - @t-potts
## Pam - Presentations in Jupyter Notebooks
[Using RISE for presentations](https://github.com/damianavila/RISE)
## Abe - GraphQL from GitHub
https://github.com/willingc/pyquery-ql
https://gist.github.com/tonyfast/00ae53f59c9340f71b9605eca1d07019
import requests, pandas
__import__('requests_cache').install_cache('gh')
df = pandas.concat([pandas.DataFrame(requests.get(f"https://api.github.com/users?page={i}").json()) for i in range(10)])
## Tyler - Aggravations from the Rock
https://github.com/xonsh/xonsh
Connect to EC2:
`ssh -i ~/.ssh/north_ca_analytics.pem ubuntu@ec2-18-144-54-246.us-west-1.compute.amazonaws.com`
Start Jupyter Lab on EC2:
`jupyter lab --no-browser --port=8800`
Forward port 8800 from EC2 to the local machine, then open localhost:8800 in a web browser:
`ssh -i ~/.ssh/north_ca_analytics.pem -N -f -L localhost:8800:localhost:8800 ubuntu@ec2-18-144-54-246.us-west-1.compute.amazonaws.com`
Copy a file from EC2 to the local machine:
`scp -i ~/.ssh/north_ca_analytics.pem ubuntu@ec2-18-144-54-246.us-west-1.compute.amazonaws.com:~/insert_test.ipynb /home/tyler/`
# 2019-09-25
https://zoom.us/j/133471615
### Attendees
* Tony Fast - @tonyfast
* Dillon Roach - @dillonroach
* Adam Lewis - @balast
## Dask Intro Presentation - @balast
* https://docs.google.com/presentation/d/1lRHM9Tdb_u1t8oiPmPdtWtkjSndHx2ic49oeV9pCOTQ/edit?usp=sharing
## Sphinx - @ppwadhwa
* http://www.sphinx-doc.org/en/master/usage/quickstart.html#setting-up-the-documentation-sources
* The following is for setting up travis to run doctr to publish to gh-pages
* https://drdoctr.github.io/
* https://github.com/ammaraskar/sphinx-action
## A Tale of Two 🐼s Dataframes.
* Names are often informal, so exact one-to-one matches between the two DataFrames happen less often than one would like.
* [FuzzyWuzzy](https://github.com/seatgeek/fuzzywuzzy) wuz 🐻
* [Edit distance definition](https://en.wikipedia.org/wiki/Edit_distance)
* [Jaro](https://rosettacode.org/wiki/Jaro_distance)
* [Jaro-Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance)
* [Levenshtein](https://en.wikipedia.org/wiki/Levenshtein_distance)
* [dedupe](https://github.com/dedupeio/dedupe)
* [Record Linkage](https://recordlinkage.readthedocs.io/en/latest/about.html)
* [CMU string search methods compare](https://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf)
* Jellyfish
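
To make the edit-distance idea concrete, here is a minimal pure-Python sketch of Levenshtein distance and a naive best-match lookup. It does not use any of the libraries listed above; `similarity` and `best_match` are hypothetical helpers, and the names being matched are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute, or keep a match
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def best_match(name: str, candidates: list) -> str:
    """Pick the candidate most similar to `name`, ignoring case."""
    return max(candidates, key=lambda c: similarity(name.lower(), c.lower()))


print(levenshtein("kitten", "sitting"))
print(best_match("Jon Smith", ["John Smith", "Jane Doe"]))
```

The libraries in the list cover the same ground with faster implementations and more metrics (Jaro, Jaro-Winkler), so a hand-rolled version like this is mainly useful for understanding what the score means.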
## [Notebook Documents](https://gist.github.com/tonyfast/7b66a5d9b09c4754c4a7577242975cb1)
# 2019-09-18
https://zoom.us/j/133471615
### Attendees
* Tony Fast - @tonyfast
* Dillon Roach - @dillonroach
* Pam Wadhwa - @ppwadhwa
* Adam Lewis - @balast
## Notes
https://gist.github.com/tonyfast/87562b3e76e20855aa076bf793c8208f
https://www.wired.com/story/artificial-intelligence-confronts-reproducibility-crisis/
https://pypi.org/project/requests-cache/
https://en.wikipedia.org/wiki/VisiCalc
When linking a PDF in notebooks, it can be helpful to link straight to the relevant page by appending `#page=2` to the URL.
Duck typing - style of dynamic typing in which an object's current set of methods and properties determines the valid semantics, rather than its inheritance from a particular class or implementation of a specific interface. If it quacks like a duck, assume it's a duck.
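
A short sketch of duck typing in Python. The classes and the `make_it_quack` function are invented for illustration; the point is that the function never checks the type, only whether the method call works.

```python
class Duck:
    def quack(self):
        return "Quack!"


class Robot:
    # Not related to Duck by inheritance, but it has the same method.
    def quack(self):
        return "Beep-quack."


class Rock:
    pass


def make_it_quack(thing):
    """Call .quack() on anything that has it, regardless of its class."""
    try:
        return thing.quack()
    except AttributeError:
        return f"{type(thing).__name__} cannot quack"


results = [make_it_quack(x) for x in (Duck(), Robot(), Rock())]
print(results)
```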
`globals().update(df)` pushes the DataFrame's columns into the global namespace, one variable per column name.
Going to try to include a 3-5 minute per-person 'lightning round' where we each discuss or present something we learned during the week
# 2019-09-10
https://zoom.us/j/725442060
### Attendees
* Tony Fast - @tonyfast
* Tyler Potts - @t-potts
* Pam Wadhwa - @ppwadhwa
* Dillon Roach - @dillonroach
* Adam Lewis - @balast
* Plotting dataframes with pandas, hvplot, holoviews.
* https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html
* https://hvplot.pyviz.org
* https://holoviews.org/
* https://hvplot.pyviz.org/user_guide/Subplots.html
* Deploying a panel application.
## Notes
Error bars for holoviews:
http://holoviews.org/reference/elements/matplotlib/ErrorBars.html
HoloViews does fuzzy matching on option names. When trying to discover options, try something that sounds right and see whether the error message suggests the correct name.
Use tab completion when exploring options for objects (e.g. self.params)
DataFrames/hvplot allows linking of added views
hvplot is a high-level API built on holoviews: https://mybinder.org/v2/gh/pyviz/holoviews/master?filepath=examples
Anything that contains Panel objects can be served as a Panel app with `panel serve`
http://holoviews.org/reference/containers/bokeh/DynamicMap.html
https://panel.pyviz.org
### pyviz ecosystem
panel - widgets
holoviews - for plots
datashader - lotta point plots
hvplot - plotting dataframes
Make dataframes, not plots
Altair provides a higher-level plotting syntax for dataframes that is partially consistent with HoloViews. Instead of drawing with Bokeh or Matplotlib, it emits JSON that follows the Vega-Lite schema.
### Tyler Lambda presentation
https://aws.amazon.com/premiumsupport/knowledge-center/build-python-lambda-deployment-package/
https://blog.shikisoft.com/access-mongodb-instance-from-aws-lambda-python/
* `jupyter lab --no-browser --port=8800`
* `ssh -i ~/.ssh/north_ca_analytics.pem ubuntu@ec2-18-144-54-246.us-west-1.compute.amazonaws.com`
* `ssh -i ~/.ssh/north_ca_analytics.pem -N -f -L localhost:8800:localhost:8800 ubuntu@ec2-18-144-54-246.us-west-1.compute.amazonaws.com`
* `scp -i ~/.ssh/north_ca_analytics.pem ubuntu@ec2-18-144-54-246.us-west-1.compute.amazonaws.com:~/insert_test.ipynb /home/tyler/quansight/saveday/lambda/`