# Quansight Data Science Residency Jam Session

# 2019-11-13

### Attendees

* Tony Fast - @tonyfast
* Adam Lewis - @balast

### Magic of the day

* https://github.com/jmcarpenter2/swifter
* https://gist.github.com/tonyfast/410e7a5d42942eb0a750cddf31c504c9
* snakeviz, py-spy

# 2019-11-13

### Attendees

* Tyler Potts - @t-potts
* Adam Lewis - @balast
* Dillon Roach - @dillonroach
* Abraham Maxfield - @utabe

### Magic of the day

Changes the working directory of your current notebook

```python
%pushd <path>
```

# 2019-10-30

### Attendees

* Tony Fast - @tonyfast
* Tyler Potts - @t-potts
* Abraham Maxfield - @utabe
* Adam Lewis - @balast

### Magic: `dir()`

`dir()` will list all of the attributes of whatever object you pass to it. Caveat: `dir` can be customized in the class (via `__dir__`), so it might not always return what you want.

`[x for x in dir(pd.Series) if not x.startswith('_')]` will only return the public attributes (methods and properties).

* https://en.wikipedia.org/wiki/Schr%C3%B6dinger%27s_cat
* https://en.wikipedia.org/wiki/Uncertainty_quantification#Aleatoric_and_epistemic_uncertainty
* http://www.ce.memphis.edu/7137/PDFs/Abrahamson/C05.pdf
* https://gist.github.com/tonyfast/a858087a657bb8fbcb9a6b8771e3d025

# 2019-10-23

* https://lifelines.readthedocs.io/en/latest/Survival%20analysis%20with%20lifelines.html
* https://anaconda.org/tonyfast/lifelines_next_try/notebook
* https://gist.github.com/tonyfast/2f69f4003768c57fbf2675bfcee58397

### Attendees

* Tyler Potts - @t-potts
* Adam Lewis - @balast
* Pam Wadhwa - @ppwadhwa

# 2019-10-16

### Issues vs. Slack from data science communications (Saul)

### Attendees

* Tony Fast - @tonyfast
* Tyler Potts - @t-potts
* Pam Wadhwa - @ppwadhwa
* Adam Lewis - @balast
* Abraham Maxfield - @utabe
* Dillon Roach - @dillonroach

# 2019-10-09

https://zoom.us/j/133471615

* https://gist.github.com/sloria/7001839
* https://github.com/simonw/datasette
* https://github.com/simonw/sqlite-utils
* https://gist.github.com/tonyfast/e638af10424de0284b36c0bf77fcd42a
* https://github.com/ibis-project/ibis
* https://datasette.readthedocs.io/en/stable/publish.html
* https://cloud.google.com/community/tutorials/bigquery-ibis
* https://docs.ibis-project.org/udf.html

### text, music, visuals

* @tonyfast - visual
* @adam - audio books
* @trent - music
* @t-potts - music
* @pam - visual learner
* @fatma - visual & text
* @abe - visual & text
* @dillon - visual learner / music til the afterlife

### Attendees

* Tony Fast - @tonyfast
* Dillon Roach - @dillonroach
* Abraham Maxfield - @utabe
* Adam Lewis - @balast
* Tyler Potts - @t-potts
* Trent Oliphant - @trentoliphant
* Pam Wadhwa - @ppwadhwa

audioeye.com

Listing of image recognition services: https://www.g2.com/categories/image-recognition

#### Audioeye is currently using the following two endpoints

* https://api.projectoxford.ai/vision/v1.0/analyze?visualFeatures=Description,Tag
  We’re using that API for image recognition
* https://api.projectoxford.ai/vision/v1.0/ocr?language=en
  We’re using that for OCR

### Pandas formatting trick

Line up all of your methods for easy readability

```
df = (
    df.merge(other_df)
    .groupby(['thing1', 'thing2'])
    .sum()
)
```

### Expand a Series of dicts to individual columns for each key

`df.column_name.apply(pd.Series)`

Kernel, Shell: https://en.wikipedia.org/wiki/Shell_(computing)

# 2019-10-02

https://zoom.us/j/133471615

### Attendees

* Tony Fast - @tonyfast
* Pam Wadhwa - @ppwadhwa
* Adam Lewis - @balast
* Abraham Maxfield - @utabe
* Tyler Potts - @t-potts

## Pam - Presentations in Jupyter Notebooks

[Using RISE for presentations](https://github.com/damianavila/RISE)

## Abe - GraphQL from GitHub

* https://github.com/willingc/pyquery-ql
* https://gist.github.com/tonyfast/00ae53f59c9340f71b9605eca1d07019

    import requests, pandas
    # Cache responses locally so repeated runs do not hit the API again
    __import__('requests_cache').install_cache('gh')
    # f-string so that {i} is actually substituted into the URL
    df = pandas.concat([pandas.DataFrame(requests.get(f"https://api.github.com/users?page={i}").json()) for i in range(10)])

## Tyler - Aggravations from the Rock

https://github.com/xonsh/xonsh

Connect to EC2:

    ssh -i ~/.ssh/north_ca_analytics.pem ubuntu@ec2-18-144-54-246.us-west-1.compute.amazonaws.com

Start Jupyter Lab from EC2:

    jupyter lab --no-browser --port=8800

Forward port 8800 from EC2 to the local machine, so that Jupyter Lab can be reached at localhost:8800 in a local web browser:

    ssh -i ~/.ssh/north_ca_analytics.pem -N -f -L localhost:8800:localhost:8800 ubuntu@ec2-18-144-54-246.us-west-1.compute.amazonaws.com

Copy a file from EC2 to the local machine:

    scp -i ~/.ssh/north_ca_analytics.pem ubuntu@ec2-18-144-54-246.us-west-1.compute.amazonaws.com:~/insert_test.ipynb /home/tyler/

# 2019-09-25

https://zoom.us/j/133471615

### Attendees

* Tony Fast - @tonyfast
* Dillon Roach - @dillonroach
* Adam Lewis - @balast

## Dask Intro Presentation - @balast

* https://docs.google.com/presentation/d/1lRHM9Tdb_u1t8oiPmPdtWtkjSndHx2ic49oeV9pCOTQ/edit?usp=sharing

## Sphinx - @ppwadhwa

* http://www.sphinx-doc.org/en/master/usage/quickstart.html#setting-up-the-documentation-sources
* The following is for setting up Travis to run doctr to publish to gh-pages:
* https://drdoctr.github.io/
* https://github.com/ammaraskar/sphinx-action

## A Tale of Two 🐼s Dataframes.

* The names are informal, so exact 1-1 matches are rarer than one would like.
* [FuzzyWuzzy](https://github.com/seatgeek/fuzzywuzzy) wuz 🐻
* [Edit distance definition](https://en.wikipedia.org/wiki/Edit_distance)
* [Jaro](https://rosettacode.org/wiki/Jaro_distance)
* [Jaro-Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance)
* [Levenshtein](https://en.wikipedia.org/wiki/Levenshtein_distance)
* [dedupe](https://github.com/dedupeio/dedupe)
* [Record Linkage](https://recordlinkage.readthedocs.io/en/latest/about.html)
* [CMU string search methods compare](https://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf)
* Jellyfish

## [Notebook Documents](https://gist.github.com/tonyfast/7b66a5d9b09c4754c4a7577242975cb1)

# 2019-09-18

https://zoom.us/j/133471615

### Attendees

* Tony Fast - @tonyfast
* Dillon Roach - @dillonroach
* Pam Wadhwa - @ppwadhwa
* Adam Lewis - @balast

## Notes

* https://gist.github.com/tonyfast/87562b3e76e20855aa076bf793c8208f
* https://www.wired.com/story/artificial-intelligence-confronts-reproducibility-crisis/
* https://pypi.org/project/requests-cache/
* https://en.wikipedia.org/wiki/VisiCalc

When linking a PDF in notebooks, it can be helpful to link straight to the requested page with `#page=2`.

Duck typing - style of dynamic typing in which an object's current set of methods and properties determines the valid semantics, rather than its inheritance from a particular class or implementation of a specific interface. If it quacks like a duck, assume it's a duck.

`globals().update(df)` pushes the DataFrame's column names into the global namespace.

Going to try to include a 3-5 min per-person 'lightning round' where we all discuss/present something we learned during the week.

# 2019-09-10

https://zoom.us/j/725442060

### Attendees

* Tony Fast - @tonyfast
* Tyler Potts - @t-potts
* Pam Wadhwa - @ppwadhwa
* Dillon Roach - @dillonroach
* Adam Lewis - @balast

* Plotting dataframes with pandas, hvplot, holoviews.
* https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html
* https://hvplot.pyviz.org
* https://holoviews.org/
* https://hvplot.pyviz.org/user_guide/Subplots.html
* Deploying a panel application.

## Notes

Error bars for holoviews: http://holoviews.org/reference/elements/matplotlib/ErrorBars.html

Holoviews does fuzzy matching on option names. When trying to discover options, try something that sounds right and see if there is a suggestion in the error message.

Use tab completion when exploring options for objects (e.g. `self.params`).

DataFrames/hvplot allows linking of added views.

hvplot is a high-level API built on holoviews: https://mybinder.org/v2/gh/pyviz/holoviews/master?filepath=examples

Anything with panel objects will serve as a panel object (`panel serve`).

* http://holoviews.org/reference/containers/bokeh/DynamicMap.html
* https://panel.pyviz.org

### pyviz ecosystem

* panel - widgets
* holoviews - for plots
* datashader - lotta point plots
* hvplot - plotting dataframes

Make dataframes, not plots

Altair provides a higher-level plotting syntax for dataframes that is partially consistent with holoviews. It renders JSON following the Vega-Lite schema instead of drawing with bokeh or matplotlib.

### Tyler Lambda presentation

* https://aws.amazon.com/premiumsupport/knowledge-center/build-python-lambda-deployment-package/
* https://blog.shikisoft.com/access-mongodb-instance-from-aws-lambda-python/

    jupyter lab --no-browser --port=8800
    ssh -i ~/.ssh/north_ca_analytics.pem ubuntu@ec2-18-144-54-246.us-west-1.compute.amazonaws.com
    ssh -i ~/.ssh/north_ca_analytics.pem -N -f -L localhost:8800:localhost:8800 ubuntu@ec2-18-144-54-246.us-west-1.compute.amazonaws.com
    scp -i ~/.ssh/north_ca_analytics.pem ubuntu@ec2-18-144-54-246.us-west-1.compute.amazonaws.com:~/insert_test.ipynb /home/tyler/quansight/saveday/lambda/
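The port-forwarding `ssh` command above is easy to mistype by hand. Here is a minimal sketch, assuming Python 3.8+ for `shlex.join`, of a hypothetical helper that assembles the same command as an argument list. `tunnel_command` is not part of any library, and the key path and host are placeholders rather than the real instance.

```python
import os
import shlex


def tunnel_command(key_path, host, port=8800, user="ubuntu"):
    """Build the ssh argv that forwards a remote Jupyter port to localhost.

    Hypothetical helper for illustration. It mirrors the flags in the notes:
    -i identity file, -N no remote command, -f go to background,
    -L bind_address:local_port:destination_host:destination_port
    (the destination is resolved on the EC2 side).
    """
    forward = f"localhost:{port}:localhost:{port}"
    return [
        "ssh",
        "-i", os.path.expanduser(key_path),  # expand ~ here, since no shell does it for us
        "-N", "-f",
        "-L", forward,
        f"{user}@{host}",
    ]


# Placeholder key path and host, not the real instance from the notes.
cmd = tunnel_command("/tmp/example.pem", "ec2-host.example.com")
print(shlex.join(cmd))  # copy-pasteable form of the command
```

Passing the list to `subprocess.run(cmd, check=True)` would open the tunnel. Jupyter Lab started on the instance with `--port=8800` should then be reachable at localhost:8800.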