python
bash
PBS
array.array()
len()
DataFrame.fillna()
DataFrame.replace()
DataFrame.rename()
DataFrame.isin()
DataFrame.drop_duplicates()
DataFrame.sort_values()
pd.DataFrame()
DataFrame.info()
DataFrame.isnull()
DataFrame.notna()
DataFrame.to_csv()
DataFrame['column'].unique()
numpy.c_
numpy.r_
from
import
DataFrame.values
flatten()
tuple()
reduce()
DataFrame.iloc[]
__init__()
__name__
+
lambda
sum()
What is an API? An API is a way of retrieving data: you send an HTTP request (for example, a GET request) to a website or server, and it sends back a response containing the data.
Dash and python 2: Dash core components
Call a Python script from a PBS job
Adrian Campos's example of calling a python script from a PBS job
pip
is the Python package installer; it installs packages together with their dependencies.
Syntax pip freeze
shows python packages that have been installed
Syntax pip install -r D:\googleDrive\DataCamp_Dash-for-beginners\python-getting-started\requirements.txt
will install packages and versions specified in the requirements.txt file
Syntax pip list
lists all the installed Python packages. Note that if multiple Python script folders (e.g., D:\Anaconda3\Scripts
and D:\python_3.8.1\Scripts
) have been added to the Path environment variable, pip list reports the packages of the installation whose folder appears first in Path. The Path was later changed to D:\python_3.8.1\Scripts;D:\Anaconda3;D:\Anaconda3\Scripts;D:\Anaconda3\Library\bin
A module is a .py file (or a set of files) that is imported with a single import statement and then used.
import aModuleName
Here 'aModuleName' is just a regular .py file.
Python3 importlib.util.spec_from_file_location with relative path?
import A
imports a Python module called A
import A as a
imports a Python module called A and abbreviates this module as 'a'
from B import c
imports a function 'c' from the module 'B'
Use 'import module' or 'from module import'?
Creating and Importing Modules in Python
A package is a collection of modules in directories that gives a package hierarchy. A package contains a distinct __init__.py
file.
from aPackageName import aModuleName
Here 'aPackageName' is a folder with an __init__.py
file and 'aModuleName' is just a regular .py file. Therefore, the correct version of your proj-dir would be something like this:
proj-dir
  __init__.py
  package1
    __init__.py
    module1.py
  package2
    __init__.py
    module2.py
Python3 importlib.util.spec_from_file_location with relative path?
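The module and package imports described above can be tried out with a throwaway package. This is only a sketch: the package is written to a temporary folder, and the names aPackageName, aModuleName and c are the placeholder names from these notes, not a real library.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (hypothetical names from the notes):
#   aPackageName/__init__.py
#   aPackageName/aModuleName.py
root = Path(tempfile.mkdtemp())
pkg = root / "aPackageName"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # marks the folder as a package
(pkg / "aModuleName.py").write_text("def c():\n    return 'hello from c'\n")

# Make the parent folder importable
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import aPackageName.aModuleName as m      # import a module under an alias
from aPackageName import aModuleName      # import a module from a package
from aPackageName.aModuleName import c    # import one function from a module

result = c()
print(result)
```

All three import forms refer to the same module object; they only differ in which name ends up in the current namespace.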
Dash is a web application framework that provides pure Python abstraction around HTML, CSS, and JavaScript. Instead of writing HTML or using an HTML templating engine, you compose your layout using Python structures with the dash-html-components library. The source for this library is on GitHub.
Here is an example of a simple HTML structure:
which gets converted (behind the scenes) into the following HTML in your web-app:
<div> </div>
tags are meant to divide your elements and give the page its structure. Other tags such as section, article and header are effectively aliases of the familiar div, with names that make the markup easier to read.
The first few attempts to install the gunicorn package in cmd.exe with the following commands
resulted in the following error:
Could not fetch URL [need more reputation to post link]: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pip/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")) - skipping
Following the steps at How do I set or change the PATH system variable? and Requests (Caused by SSLError(“Can't connect to HTTPS URL because the SSL module is not available.”) Error in PyCharm requesting website (https://stackoverflow.com/questions/54135206/requests-caused-by-sslerrorcant-connect-to-https-url-because-the-ssl-module), the issue was resolved by adding the following to the Path environment variable:
;D:\Anaconda3;D:\Anaconda3\Scripts;D:\Anaconda3\Library\bin;D:\python_3.8.1;D:\python_3.8.1\Scripts
__file__
__file__
is the pathname of the file from which the module was loaded, if it was loaded from a file. This means __file__
is available when the code runs as a script or is imported as a module (including a module imported from the interpreter), but it is not defined for code typed directly into the interactive interpreter.
What does the __file__ variable mean/do?
__name__
__name__
is a special Python variable. Its value depends on how the containing script is executed. When the script is run directly, the value of __name__
is __main__
. When the script is imported as a module by another script, the value of __name__
is the name of the imported module (the file name without the .py extension).
What’s in a (Python’s) __name__?
What does if __name__ == "__main__": do?
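A small sketch of the usual __name__ check. The helper describe_run_mode is a made-up name for illustration; only the final if block is the standard idiom.

```python
def describe_run_mode(module_name):
    """Return how a module with this __name__ value is being executed."""
    if module_name == "__main__":
        return "run directly as a script"
    return f"imported as module '{module_name}'"


# The guarded block runs only when this file is executed directly,
# not when it is imported by another script.
if __name__ == "__main__":
    print(describe_run_mode(__name__))
```

Saving this as a file and running it prints the first message; importing the same file from another script skips the guarded block.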
An array is a collection of elements of the same type.
Syntax array.array(data_type, value_list)
creates an array whose elements all have the type given by the type code data_type (for example 'i' for signed int or 'd' for double). Note array.array() takes no keyword arguments.
Python Arrays
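A minimal sketch of array.array() from the standard library. The variable names are arbitrary; 'i' and 'd' are the standard type codes for signed int and double.

```python
from array import array

ints = array("i", [1, 2, 3])    # 'i' = signed int; arguments are positional
ints.append(4)
floats = array("d", [1.0, 2.5])  # 'd' = double-precision float

# Every element must match the type code, so a float is rejected here
try:
    ints.append(1.5)
    accepted_float = True
except TypeError:
    accepted_float = False

print(ints.tolist(), floats.tolist(), accepted_float)
```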
Syntax numpy.r_[array1,array2,array3]
concatenates 3 arrays row-wise. The merged array grows in rows with column dimension unchanged
np.c_ and np.r_ in numpy
Syntax numpy.c_[array1,array2,array3]
concatenates 3 arrays column-wise. The merged array grows in columns with row dimension unchanged
Syntax array.shape
shows the dimension of the array as (rows, columns)
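A short sketch of numpy.r_, numpy.c_ and .shape on small 2-D arrays. The arrays a, b and c are invented for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])  # shape (2, 2)
b = np.array([[5, 6]])          # shape (1, 2): same number of columns as a
c = np.array([[7], [8]])        # shape (2, 1): same number of rows as a

rows = np.r_[a, b]  # stack row-wise: more rows, same columns
cols = np.c_[a, c]  # stack column-wise: more columns, same rows

print(rows.shape, cols.shape)
```

The arrays must agree on the dimension that stays unchanged, otherwise NumPy raises a ValueError.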
dictionary.keys()
returns a view of all the keys
dictionary.values()
returns a view of all the values
dictionary.items()
returns a view of (key, value) tuple pairs; wrap it in list() to get a list
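A quick sketch of the three dictionary methods. The prices dictionary is made up; list() is used to turn each view into a plain list.

```python
prices = {"apple": 1.2, "pear": 0.8}

keys = list(prices.keys())      # all keys, in insertion order
values = list(prices.values())  # all values, in the same order
items = list(prices.items())    # (key, value) tuple pairs

# items() is convenient for looping over keys and values together
for name, price in prices.items():
    print(name, price)
```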
Syntax list[0:10]
or list[:10]
will give you the first 10 elements of this list using slicing
Python: Fetch first 10 results from a list [duplicate]
Syntax list[-10:]
will give you the last 10 elements of this list using slicing
Syntax len(list)
counts the number of items in the list
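The slicing rules above, sketched on an invented list of 25 numbers:

```python
numbers = list(range(25))  # 0, 1, ..., 24

first_ten = numbers[:10]   # same as numbers[0:10]
last_ten = numbers[-10:]   # the last 10 elements
count = len(numbers)       # number of items in the list

print(first_ten, last_ten, count)
```

Slicing never raises an error when the list is shorter than the requested range; it simply returns what is there.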
A list of dictionaries is graphically like a table, where each key is a column.
Each row in the list is an instance (i.e. dictionary)
Book - 5.4) Python- Combining Lists and Dictionaries
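A sketch of the table analogy, assuming pandas is installed. The records list is invented; pd.DataFrame() accepts a list of dictionaries and uses the keys as column names.

```python
import pandas as pd

# Each dictionary is one row; each key is one column
records = [
    {"name": "Ann", "age": 31},
    {"name": "Bob", "age": 27},
]

df = pd.DataFrame(records)
ages = [row["age"] for row in records]  # the same column, read from the plain list

print(df)
```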
Change the sharing settings of the CSV file to public. Who has access : Anyone who has the link can view
The Link to share is like https://drive.google.com/file/d/1oNrjNmIF42SfIUzki-iyXnR04FJ2e4Jy/view?usp=sharing
, which ends with usp=sharing
Use the link to share as the value of orig_url
in the following code
Pandas: How to read CSV file from google drive public?
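A possible sketch of the conversion, not necessarily the exact code the linked answer uses. It assumes the share link has the .../file/d/<id>/view?usp=sharing form shown above and relies on the commonly used uc?export=download address; the function names are made up, and read_drive_csv needs network access and a publicly shared file, so it is defined but not called here.

```python
def drive_share_link_to_download_url(orig_url):
    """Turn a Google Drive share link into a direct-download URL."""
    # The file id is the path segment just before 'view?usp=sharing'
    file_id = orig_url.split("/")[-2]
    return "https://drive.google.com/uc?export=download&id=" + file_id


def read_drive_csv(share_url):
    """Read a publicly shared CSV into a DataFrame (requires network access)."""
    import pandas as pd  # imported here so the URL helper works without pandas

    return pd.read_csv(drive_share_link_to_download_url(share_url))


orig_url = "https://drive.google.com/file/d/1oNrjNmIF42SfIUzki-iyXnR04FJ2e4Jy/view?usp=sharing"
download_url = drive_share_link_to_download_url(orig_url)
print(download_url)
```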
Syntax DataFrame.columns.values
returns the column names as a NumPy array
Syntax DataFrame.columns= ['NewColumn1','NewColumn2','NewColumn3']
sets the column names (the header row) of a 3-column DataFrame
How to add header row to a pandas DataFrame
Syntax DataFrame.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2',...})
returns a copy of the DataFrame with the listed columns renamed
Renaming columns in pandas
DataFrame["A"]= pd.to_numeric(DataFrame["A"])
converts the A column to a numeric dtype
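A sketch that strings together the column operations above, assuming pandas is installed. The data and the column names A, B, C and label are invented.

```python
import pandas as pd

# A DataFrame without a header: columns are numbered 0, 1, 2
df = pd.DataFrame([["1", "x", 3.0], ["2", "y", 4.0]])

df.columns = ["A", "B", "C"]              # set all column names at once
df = df.rename(columns={"B": "label"})    # rename selected columns; returns a new DataFrame
df["A"] = pd.to_numeric(df["A"])          # strings "1", "2" become integers

names = list(df.columns.values)           # column names as a list
print(names, df["A"].tolist())
```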
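Syntax DataFrame[['column_1','column_2','column_3']]
selects the three named columns (note the double brackets: a list of names inside the indexing brackets)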
Selecting multiple columns in a pandas dataframe
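Syntax DataFrame.iloc[:,np.r_[0,n1:n2]]
selects all rows and the columns at position 0 and positions n1 through n2-1 (zero-based; the end n2 is excluded). Use .iloc for positions; .loc selects by label and only behaves the same when the column labels are the integers 0, 1, 2, ...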
Selecting multiple dataframe columns by position in pandas [duplicate]
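A sketch of selecting columns by name and by position, assuming pandas and NumPy are installed. The DataFrame is invented; .iloc is used for the position-based selection because it always works with integer positions, whatever the column labels are.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=list("abcdef"))

by_name = df[["a", "c"]]                  # list of column names
by_position = df.iloc[:, np.r_[0, 2:4]]   # position 0, then positions 2 and 3

print(list(by_name.columns), list(by_position.columns))
```

np.r_[0, 2:4] expands to the integer array [0, 2, 3], which is why a single position and a range can be combined in one selection.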
DataFrame['NewColumn']= DataFrame['string_column1'] + DataFrame['string_column2']
DataFrame['NewColumn']= DataFrame['string_column1'] + " " +DataFrame['string_column2']
DataFrame['NewColumn']= DataFrame['numeric_column1'].apply(str) + DataFrame['numeric_column2'].apply(str)
convert the column to string using .apply(str)
before the concatenation
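A sketch of the three concatenation patterns, assuming pandas is installed. The column names and values are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Ada", "Alan"],
    "last": ["Lovelace", "Turing"],
    "year": [1815, 1912],
    "month": [12, 6],
})

df["joined"] = df["first"] + df["last"]                    # plain concatenation
df["full_name"] = df["first"] + " " + df["last"]           # with a separator
df["year_month"] = df["year"].apply(str) + df["month"].apply(str)  # numbers converted to text first

print(df[["joined", "full_name", "year_month"]])
```

Adding two numeric columns without .apply(str) would sum the numbers instead of joining their digits.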
Syntax DataFrame.drop_duplicates()
returns the DataFrame with duplicate rows removed, i.e. the distinct rows
How to “select distinct” across multiple data frame columns in pandas?
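Syntax DataFrame_sorted= DataFrame.sort_values(['column1','column2'], ascending=[True, False])
returns a new DataFrame sorted by column1 ascending, then by column2 descending
Syntax DataFrame.sort_values(['column1','column2'], ascending=[True, False], inplace=True)
sorts the DataFrame in place and returns None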
How to sort a dataFrame in python pandas by two or more columns?
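A sketch combining drop_duplicates() and sort_values(), assuming pandas is installed. The data is invented.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": ["b", "a", "a", "b", "a"],
    "column2": [1, 2, 2, 3, 5],
})

distinct = df.drop_duplicates()  # the second ("a", 2) row is dropped

# column1 ascending, and within equal column1 values column2 descending
sorted_df = distinct.sort_values(["column1", "column2"], ascending=[True, False])

pairs = list(zip(sorted_df["column1"], sorted_df["column2"]))
print(pairs)
```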
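Syntax DataFrame[(condition1) & (condition2) | (condition3) & ~(condition4)]
& means and, | means or, ~ means not. Wrap each condition in parentheses, because these operators bind more tightly than comparisons such as ==.
A condition can be specified as DataFrame['column'] == value
or DataFrame['column'] != value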
Filtering Pandas Dataframe using OR statement
Syntax DataFrame[(condition1) & ~(condition2)]
filters the DataFrame with 1 positive condition and 1 negative condition.
condition1 can be written as DataFrame['column1'].isin(list1), which is already a boolean mask
condition2 can be DataFrame['column2'].isin(list2)
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
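A sketch of boolean filtering with &, | and ~, assuming pandas is installed. The data, list1 and list2 are invented; the negated condition is joined with & because ~ only negates the mask that follows it.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": ["x", "y", "z", "x"],
    "column2": [1, 2, 3, 4],
})
list1 = ["x", "y"]
list2 = [1]

# 'in' list1 AND 'not in' list2
in_and_not_in = df[df["column1"].isin(list1) & ~df["column2"].isin(list2)]

# OR of two simple comparisons
either = df[(df["column1"] == "z") | (df["column2"] != 1)]

print(in_and_not_in.index.tolist(), either.index.tolist())
```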
df[df['column1']=='condition']['column2']
selects rows based on column1 and selects column2
Syntax DataFrame['column'].isnull().sum()
counts number of missing values in the column
Syntax DataFrame['column'].notna().sum()
counts number of non-missing values in the column
Syntax DataFrame.isnull().sum()
counts the NaN values in each column of a DataFrame
Python Pandas : Count NaN or missing values in DataFrame ( also row & column wise)
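A sketch of the three counting patterns, assuming pandas and NumPy are installed. The data is invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [None, None, "x"],
})

missing_a = int(df["a"].isnull().sum())    # missing values in one column
present_a = int(df["a"].notna().sum())     # non-missing values in one column
missing_per_column = df.isnull().sum().to_dict()  # missing values in every column

print(missing_a, present_a, missing_per_column)
```

isnull() returns True/False per cell, and sum() counts the True values because True is treated as 1.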
Syntax DataFrame.fillna(value={'column1': value1, 'column2': value2},...)
replaces missing values in column1 with value1 and missing values in column2 with value2
HOW TO USE THE PANDAS FILLNA METHOD
The replacement values can be DataFrame['column'].mean()
, DataFrame['column'].median()
, or DataFrame['column'].mode()[0]
(mode() returns a Series, so take its first element)
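A sketch of fillna() with a per-column dictionary, assuming pandas and NumPy are installed. The data is invented; the mean is used for the numeric column and the first mode for the text column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "column1": [1.0, np.nan, 3.0],
    "column2": ["u", None, "u"],
})

fill_values = {
    "column1": df["column1"].mean(),     # mean of the non-missing values: 2.0
    "column2": df["column2"].mode()[0],  # mode() returns a Series; take its first entry
}
filled = df.fillna(value=fill_values)    # returns a new DataFrame

print(filled)
```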
DataFrame.replace(to_replace = np.nan, value =-99999)
replaces NaN values with -99999
DataFrame.replace(to_replace="A", value="a")
replaces "A" with "a" across all columns
DataFrame.replace(to_replace=["A","B"], value="ab")
replaces "A" and "B" with "ab" across all columns
Python | Pandas dataframe.replace()
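A sketch of the replace() calls above, assuming pandas and NumPy are installed. The data is invented; each call returns a new DataFrame and leaves df unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": ["A", "B", "C"],
    "y": [1.0, np.nan, 3.0],
})

no_nan = df.replace(to_replace=np.nan, value=-99999)    # NaN -> -99999
lower_a = df.replace(to_replace="A", value="a")         # one value
merged = df.replace(to_replace=["A", "B"], value="ab")  # several values -> one value

print(no_nan["y"].tolist(), lower_a["x"].tolist(), merged["x"].tolist())
```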
DataFrame[condition]
DataFrame[(condition 1) | (condition 2)]
DataFrame[(condition 1) & (condition 2)]
pd.melt(DataFrame, id_vars=['g1','g2','g3'], value_vars=['m1','m2','m3'])
reshapes the data from wide to long format. The m1, m2, m3 columns are grouped into 2 new columns variable and value. The variable contains the name of m1, m2, m3 and the value column contains their values.
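A sketch of pd.melt() on a tiny wide table, assuming pandas is installed. The data is invented and uses one id column (g1) and two measure columns (m1, m2) to keep the output short.

```python
import pandas as pd

wide = pd.DataFrame({
    "g1": ["a", "b"],
    "m1": [1, 2],
    "m2": [3, 4],
})

# One output row per (id, measure) pair
long = pd.melt(wide, id_vars=["g1"], value_vars=["m1", "m2"])

print(long)
```

A wide table with 2 rows and 2 measure columns becomes a long table with 4 rows.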
mydict= dict(zip(DataFrame['columnA'], DataFrame['columnB']))
creates a new dictionary using columnA as key and columnB as value.
python pandas dataframe columns convert to dict key and value
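A sketch of the dict(zip(...)) pattern, assuming pandas is installed. The data is invented.

```python
import pandas as pd

df = pd.DataFrame({
    "columnA": ["k1", "k2"],
    "columnB": [10, 20],
})

# zip pairs the two columns row by row; dict turns the pairs into key: value
mydict = dict(zip(df["columnA"], df["columnB"]))

print(mydict)
```

If columnA contains repeated values, later rows overwrite earlier ones, since dictionary keys are unique.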
DataFrame.to_csv(path_or_buf='filepath',sep=',',na_rep='',header=True)
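writes the DataFrame to a CSV file at filepath, using sep as the delimiter and na_rep as the text for missing values; header=True includes the column names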
pandas.DataFrame.to_csv
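A sketch of to_csv(), assuming pandas is installed. The data is invented. Passing path_or_buf=None makes to_csv() return the CSV text instead of writing a file, which is handy for checking the output; index=False is an extra option used here to leave out the row index.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, None],
    "b": ["x", "y"],
})

csv_text = df.to_csv(path_or_buf=None, sep=",", na_rep="", header=True, index=False)
lines = csv_text.splitlines()

print(lines)
```

Replacing None with a file path writes the same content to disk.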