Python Panda (Page: 7)
This will be very long list of most useful commands with panda, usually accompanied with some examples, if it is necessary for clear understanding. At he beginning I will introduce example sets and will continue to work only with them.
Pandas cheat sheet starting
For all your python code you need to load numpy and pandas libraries first. Also, defined here data sets we will use later for examples. Aslo we will use name df for our data-frame variable
import pandas as pd
import numpy as np
df1 = pd.DataFrame(
[[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]],
index = ['r1', 'r2', 'r3'],
columns = ['c1', 'c2', 'c3', 'c4'])
Pandas importing data
df = pd.read_csv('file_name.csv') – read CSV file
Read CSV file. Important parameters are:
- sep=',' - separator between columns
- header=0 - select the row for column names. Use header=None for ignoring
df = pd.read_excel('file_name.xlsx') – read EXEL file
df = read_sql(query, connection_object) – read SQL table-specific
This function can read directly from many different SQL databases. This is an example how to read from MySQL
import mysql.connector as sql
import pandas as pd
db_connection = sql.connect(host='hostname', database='db_name',
user='username', password='password')
df = pd.read_sql('SELECT * FROM table_name', con=db_connection)
df = pd.read_json(json_string) – read from JSON file
df = pd.read_html(url) – read table from URL given
pd.read_html(url) can read tables directly from HTM, but usually a lot of further cleaning required.
df = pd.read_table(filename) – read table from text file
This function is very similar to read_csv fucntion
df = pd.read_clipboard() - read content of clipboard
Read content of clipboard and send to read_table() function
Pandas exporting data
df = df.to_csv(filename) Write to a CSV file
df = df.to_excel(filename) Write to an Excel file
Before writing to excel, you need to specify the engine
writer = pd.ExcelWriter('./dataset/numbers.xlsx', engine='xlsxwriter')
numbers_df.to_excel(writer, sheet_name='Sheet1')
df = df.to_sql(table_name, connection_object) - Write to a SQL table
df = df.to_json(filename) - Write to a file in JSON format
View and inspect
df.head(n) – first n lines
df.tail(n) – last n lines
nrow, ncol = df.shape – number of rows and columns
nrow, ncol = df1.shape
print(nrow, ncol) # 3 4
df.info() - Index. DataType and Memory information
Statistical information about DataFrame
df.describe() - Summary statistic for numerical columns
Give basic statistic for all numerical columns, like number of elements, mean, standard deviation, minimal, maximal values and 25% 50% 75% quantilies
print(df1.describe())
# c1 c2 c3 c4
#count 3.0 3.0 3.0 3.0
#mean 5.0 6.0 7.0 8.0
#std 4.0 4.0 4.0 4.0
#min 1.0 2.0 3.0 4.0
#25% 3.0 4.0 5.0 6.0
#50% 5.0 6.0 7.0 8.0
#75% 7.0 8.0 9.0 10.0
#max 9.0 10.0 11.0 12.0
df.mean() - mean of all numerical columns
This function is similar to df.describe() but produce differently oriented dataframe
print(df1.mean())
#c1 5.0
#c2 6.0
#c3 7.0
#c4 8.0
#dtype: float64
df.corr() - correlation between columns in a DataFrame
This example is not very demonstrative, because all columns are “parallel” with 100% correlation.
print(df1.corr())
# c1 c2 c3 c4
#c1 1.0 1.0 1.0 1.0
#c2 1.0 1.0 1.0 1.0
#c3 1.0 1.0 1.0 1.0
#c4 1.0 1.0 1.0 1.0
df.count() - number of non-null values in each DataFrame column
Count any present numbers. 0 is counted also, this is not NULL
df.max() -highest value in each column
df.min() - lowest value in each column
df.median() - median of each column
df.std() - Returns the standard deviation of each column
Go to Page: 1; 2; 3; 4; 5; 6; 7; 8;
Published: 2021-11-05 09:11:16
Updated: 2021-12-17 02:48:39