Python Pandas (Page: 1)
Pandas install and upgrade
To work with pandas, the pandas library must be installed, together with the NumPy library (pip and conda install NumPy automatically as a dependency). Pandas is an abstraction on top of NumPy that provides a simple and powerful tool for manipulating NumPy arrays.
Pandas installation
To install pandas, use either the pip or the conda command: pip install pandas or conda install pandas
Some methods and functions in pandas are version-specific, so you should always know which version you have installed
import pandas as pd
import sys
import numpy as np
print('Python version :' + sys.version) #Python version :3.6.3 ...
print('Pandas version :' +pd.__version__) # Pandas version :0.20.3
print('NumPy version :' +np.__version__) # NumPy version :1.19.5
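Since some features depend on the version, it can be handy to compare the installed version in code. The following is only a minimal sketch: the helper name major_minor is hypothetical (it is not part of pandas), and it assumes the version string starts with two numeric parts such as '1.1.5'.

```python
import pandas as pd

def major_minor(version):
    # Hypothetical helper: '1.1.5' -> (1, 1). Assumes the first two parts are numeric.
    parts = version.split('.')[:2]
    return tuple(int(p) for p in parts)

installed = major_minor(pd.__version__)
if installed >= (1, 0):
    print('pandas 1.0 or newer')
else:
    print('pandas older than 1.0')
```

Tuples are compared element by element, so (0, 20) is correctly treated as older than (1, 0).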
Pandas upgrade
To install the latest version, run pip install --upgrade pandas from the command line
~$ pip install --upgrade pandas
Requirement already satisfied: pandas in /home/luxs/anaconda3/lib/python3.6/site-packages (0.20.3)
Collecting pandas
Downloading pandas-1.1.5-cp36-cp36m-manylinux1_x86_64.whl (9.5 MB)
|████████████████████████████████| 9.5 MB 7.9 kB/s
Collecting python-dateutil>=2.7.3
Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
|████████████████████████████████| 247 kB 12.3 MB/s
. . . .
~$ python
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__ #'1.1.5'
In all further code we assume that all necessary libraries have already been imported. Also, if some code uses variables or data structures without initializing them, we assume that they were initialized in earlier snippets.
Pandas series
A pandas Series is a one-dimensional data structure.
Create a pandas Series from a list
The easiest way is to convert a Python list into a pandas Series
import pandas as pd
data = ['A', 'B', 'C', 'D']
let = pd.Series(data) # This will create Pandas Series
print(let)
#0 A
#1 B
#2 C
#3 D
# dtype: object <- pay attention to the type of elements
let1 = pd.Series(['A', 'B', 'C']) # will give the same result
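The dtype shown at the bottom of the output depends on the data in the list. A small sketch with made-up lists (the variable names are only for illustration) shows how pandas picks the element type:

```python
import pandas as pd

letters = pd.Series(['A', 'B', 'C'])   # strings
numbers = pd.Series([1, 2, 3])         # integers
floats = pd.Series([1.5, 2.0, 3.25])   # floating point numbers

print(letters.dtype)   # string data; shown as object in the pandas versions used on this page
print(numbers.dtype)   # an integer dtype
print(floats.dtype)    # float64
```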
Accessing elements of a pandas Series
It is possible to access elements of a pandas Series with a few different techniques
Access via index
Use an index or a slice, as with ordinary lists. Always remember that selecting several elements returns a Series. To convert it to a list, use the list() function
print(let[2]) # C
print(let[1:3])
#1 B
#2 C
#dtype: object
print(list(let[1:3])) # ['B', 'C']
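Besides plain square brackets, pandas also has the explicit accessors iloc (by position) and loc (by index label). With the default index both give the same single element, but slices behave differently. A minimal sketch:

```python
import pandas as pd

let = pd.Series(['A', 'B', 'C', 'D'])

print(let.iloc[2])              # C  (third position)
print(let.loc[2])               # C  (label 2)

# Positional slices exclude the end, label slices include it
print(let.iloc[1:3].tolist())   # ['B', 'C']
print(let.loc[1:3].tolist())    # ['B', 'C', 'D']
```

Using iloc and loc makes it clear whether a number means a position or a label, which matters as soon as the index is not the default 0, 1, 2, ...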
Access via list of indexes
It is possible to use a list of indexes to access the data in a Series
idx = [0, 2, 3]
print(let[idx])
#0 A
#2 C
#3 D
#dtype: object
Access via list of booleans
A list of booleans can be used to select data in a Series: elements at positions marked True are returned.
idx = [True, False, True, False]
print(let[idx])
#0 A
#2 C
#dtype: object
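In practice the boolean list is rarely typed by hand; usually it comes from a comparison. A short sketch with made-up numbers:

```python
import pandas as pd

nums = pd.Series([5, 12, 7, 20])

# The comparison is applied to every element and gives a boolean Series
mask = nums > 6
print(mask.tolist())         # [False, True, True, True]

# The mask selects only the elements where it is True
print(nums[mask].tolist())   # [12, 7, 20]
```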
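So, once again, a pandas Series can be used much like an ordinary list, although the types of the returned objects differ. Use the tolist() method to convert a Series to a list. When a Series has a custom index, access an element by putting its index label in quotes. The examples here use country_cap_series, which is defined in the section "Initialization with dictionary" further down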
print(type(country_cap_series.values))
# <class 'numpy.ndarray'>
print(country_cap_series.values)
# ['Kabul' 'Canberra' 'Paris' 'London']
print(country_cap_series.index)
# Index(['Afganistan', 'Austalia', 'France', 'UK'], dtype='object')
print(country_cap_series[1:3])
#Austalia Canberra
#France Paris
#dtype: object
print(country_cap_series['UK']) # London
print(country_cap_series[['UK','France']])
#UK London
#France Paris
#dtype: object
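The tolist() method mentioned above works for the values and for the index alike. A minimal sketch with a small made-up Series (caps is just an illustrative name):

```python
import pandas as pd

caps = pd.Series({'France': 'Paris', 'UK': 'London'})

values_list = caps.tolist()          # values as a Python list
labels_list = caps.index.tolist()    # index labels as a Python list
as_dict = caps.to_dict()             # back to a dictionary

print(values_list)   # ['Paris', 'London']
print(labels_list)   # ['France', 'UK']
print(as_dict)       # {'France': 'Paris', 'UK': 'London'}
```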
Index manipulations for Series
It is possible to create a custom index for a pandas Series. It is important to remember that index labels do not have to be unique!
e_f = ['F_name', 'L_name', 'Dept']
e_d = ['John', 'Smith', 'ICQA']
e_series = pd.Series(index = e_f, data = e_d)
print(e_series)
#F_name John
#L_name Smith
#Dept ICQA
#dtype: object
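To see what a non-unique index means in practice, here is a small sketch with made-up data. A label that occurs once gives a single value, while a repeated label gives a Series with all matching elements:

```python
import pandas as pd

phones = pd.Series(data=['555-0100', '555-0101', '555-0199'],
                   index=['John', 'John', 'Ann'])

print(phones.index.is_unique)    # False
print(phones['Ann'])             # 555-0199
print(phones['John'].tolist())   # ['555-0100', '555-0101']
```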
Initialization with dictionary
A pandas Series can be initialized with a dictionary. The keys will be used as the index
country_cap = {'Afganistan' : 'Kabul',
'Austalia' : 'Canberra',
'France' : 'Paris',
'UK': 'London'
}
country_cap_series = pd.Series(country_cap)
print(country_cap_series)
#Afganistan Kabul
#Austalia Canberra
#France Paris
#UK London
#dtype: object
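It is also possible to pass an index together with the dictionary. In that case only the listed keys are taken, in the given order, and a label that is missing in the dictionary gets NaN. A minimal sketch with a shortened, made-up dictionary:

```python
import pandas as pd

country_cap = {'France': 'Paris', 'UK': 'London'}

# 'Spain' is not a key of the dictionary, so its value will be NaN
selected = pd.Series(country_cap, index=['UK', 'Spain', 'France'])

print(selected.index.tolist())   # ['UK', 'Spain', 'France']
print(selected.isna().tolist())  # [False, True, False]
print(selected['UK'])            # London
```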
Extracting Index from Series
To extract the index from a Series, use the index attribute.
print(let.index) # RangeIndex(start=0, stop=4, step=1)
print(let.index.tolist()) # [0, 1, 2, 3]
print(country_cap_series.index) # See before for this structure definition
# Index(['Afganistan', 'Austalia', 'France', 'UK'], dtype='object')
Name for the Series
A pandas Series is a complex object, and this object has a name property. It is possible to set it and use it
let.name = '+++Four letters+++'
print(let)
#0 A
#1 B
#2 C
#3 D
#Name: +++Four letters+++, dtype: object
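The name can also be given when the Series is created. One place where it is used: when a Series is converted to a table with to_frame(), the name becomes the column name. A short sketch with an illustrative name:

```python
import pandas as pd

let = pd.Series(['A', 'B'], name='letters')
print(let.name)                 # letters

# The Series name becomes the column name of the resulting DataFrame
frame = let.to_frame()
print(frame.columns.tolist())   # ['letters']
```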
Compare Series
To check which elements of one Series also appear in another, use the isin() method. The result is a boolean mask in Series format. It is possible to use this mask to extract the data itself, and to use ~ to invert the mask
s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([1, 3, 5, 7])
print(s1.isin(s2))
#0 True
#1 False
#2 True
#3 False
#4 True
#dtype: bool
print(s1[s1.isin(s2)])
#0 1
#2 3
#4 5
#dtype: int64
print(s1[~s1.isin(s2)])
#1 2
#3 4
#dtype: int64
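The argument of isin() does not have to be a Series; a plain list works as well. Because True counts as 1, summing the mask gives the number of common elements. A minimal sketch:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])

mask = s1.isin([1, 3, 5, 7])   # a plain list instead of a Series
common_count = int(mask.sum())

print(mask.tolist())    # [True, False, True, False, True]
print(common_count)     # 3
```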
Operations with every element of Series
Combining map() with a lambda function lets you perform an operation on every element of a Series. In the following example we square the elements
s1 = pd.Series([1, 2, 3, 4, 5])
s1sq = s1.map(lambda x: x*x)
print(s1sq)
#0 1
#1 4
#2 9
#3 16
#4 25
#dtype: int64
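For simple arithmetic like this, the usual Python operators can be applied to the whole Series at once, without map(). The sketch below gives the same squares three ways:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])

by_map = s1.map(lambda x: x * x)   # element by element through a function
by_mul = s1 * s1                   # operator applied to the whole Series
by_pow = s1 ** 2

print(by_map.tolist())   # [1, 4, 9, 16, 25]
print(by_mul.tolist())   # [1, 4, 9, 16, 25]
print(by_pow.tolist())   # [1, 4, 9, 16, 25]
```

map() stays useful when the operation cannot be written as a simple expression, for example when it calls your own function.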
Element-wise operations between Series
It is possible to do basic mathematical operations between the elements of two Series, for example add() for addition, mul() for multiplication, etc. See dir(pd.Series) for the full list of methods. Please note that after applying some of these methods the type of the Series can change! Elements are matched by index label, and if an element is missing in one of the Series, the result is NaN
s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([10, 20, 30])
print(s1.add(s2))
#0 11.0
#1 22.0
#2 33.0
#3 NaN
#4 NaN
#dtype: float64
print(s1.mul(s2))
#0 10.0
#1 40.0
#2 90.0
#3 NaN
#4 NaN
#dtype: float64
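If NaN is not what you want, methods such as add() accept the fill_value parameter, which is used in place of the missing element. A small sketch with the same two Series:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([10, 20, 30])

# Missing elements of s2 are treated as 0 instead of producing NaN
total = s1.add(s2, fill_value=0)
print(total.tolist())   # [11.0, 22.0, 33.0, 4.0, 5.0]
```

Note that the result is still of float type, as in the examples above.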
Add or remove elements to Series
Append one series to another
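It is possible to append one Series to another, but you need to watch what will happen with the indexes. Compare the two examples: preserving the indexes (default) and generating a new index. Note that Series.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so this code works only with older versions such as the 1.1.5 used here; in newer versions use pd.concat() instead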
s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([10, 20, 30])
si = s1.append(s2)
#0 1
#1 2
#2 3
#3 4
#0 10
#1 20
#2 30
#dtype: int64
snoi = s1.append(s2, ignore_index = True)
#0 1
#1 2
#2 3
#3 4
#4 10
#5 20
#6 30
#dtype: int64
print(si[2])
#2 3
#2 30
#dtype: int64
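In pandas versions where append() is no longer available, the same two results can be obtained with pd.concat(). A minimal sketch reusing the Series from above:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([10, 20, 30])

si = pd.concat([s1, s2])                        # original indexes are kept
snoi = pd.concat([s1, s2], ignore_index=True)   # a new index is generated

print(si.index.tolist())     # [0, 1, 2, 3, 0, 1, 2]
print(snoi.index.tolist())   # [0, 1, 2, 3, 4, 5, 6]
print(si.loc[2].tolist())    # [3, 30]  (label 2 occurs twice)
```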
Removing elements from Series
It is possible to remove an element by its index label with the drop() method. This method does not change the original Series, but returns a new Series. If a label occurs more than once, all elements with this label are removed
sm = si.drop(0)
#1 2
#2 3
#3 4
#1 20
#2 30
#dtype: int64
smm = si.drop(labels = [1, 2])
#0 1
#3 4
#0 10
#dtype: int64
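Two details of drop() are worth checking on a small example: the original Series stays untouched, and a label that does not exist raises an error unless errors='ignore' is passed. A minimal sketch with a made-up Series:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

shorter = s.drop(0)
print(len(s), len(shorter))   # 4 3  (the original is not changed)

# Label 99 does not exist; errors='ignore' skips it silently
safe = s.drop(labels=[99], errors='ignore')
print(safe.tolist())          # [1, 2, 3, 4]
```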
Other useful Series methods
Here are a few very useful Series methods, listed without examples.
- count() - Returns the number of non-NaN elements
- nunique() - Returns the number of unique elements
- unique() - Returns an array of unique elements
- sort_values() - Returns a sorted Series. Pass inplace = True to sort the Series itself
- reset_index() - Returns a table (DataFrame) with the old index as a new column. Pass drop = True to discard the old index and get a Series back
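For a quick check of what these methods return, here is a short sketch on a made-up Series with one repeated value:

```python
import pandas as pd

s = pd.Series([3, 1, 3, 2], index=['a', 'b', 'c', 'd'])

print(s.count())                  # 4
print(s.nunique())                # 3
print(s.unique().tolist())        # [3, 1, 2]  (order of first appearance)
print(s.sort_values().tolist())   # [1, 2, 3, 3]

# Without drop the result is a DataFrame, with drop = True it is a Series
print(type(s.reset_index()).__name__)            # DataFrame
print(s.reset_index(drop=True).index.tolist())   # [0, 1, 2, 3]
```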
Published: 2021-11-05 09:11:16
Updated: 2021-12-17 02:48:39