
Python Pandas (Page 1)

Go to Page:

  1. Pandas Series;
  2. Pandas DataFrame: Creation;
  3. Pandas: Create test DataFrame;
  4. Pandas DataFrame: Add/Remove;
  5. Pandas DataFrame: Export/Import;
  6. Pandas search and select;
  7. Pandas Cheat Sheet;
  8. Pandas: MultiIndex DataFrame;

Pandas install and upgrade

To work with pandas, the pandas library must be installed, together with NumPy, which pandas depends on (pip and conda install NumPy automatically as a dependency). Pandas is an abstraction on top of NumPy. This abstraction adds labels (indexes) to NumPy arrays and makes them simple and powerful to manipulate.

Pandas installation

To install pandas, use either the pip or the conda command:

pip install pandas
conda install pandas

Some methods and functions in pandas are version-specific, so you should always know which version you have installed:

import pandas as pd
import sys
import numpy as np 

print('Python version :' + sys.version)   #Python version :3.6.3 ...
print('Pandas version :' +pd.__version__) # Pandas version :0.20.3
print('NumPy version  :' +np.__version__) # NumPy version  :1.19.5

Pandas upgrade

To upgrade to the latest version available for your Python interpreter, run pip install --upgrade pandas from the command line. In the log below, pandas 0.20.3 is upgraded to 1.1.5, the newest release that still supports Python 3.6:

~$ pip install --upgrade pandas
Requirement already satisfied: pandas in /home/luxs/anaconda3/lib/python3.6/site-packages (0.20.3)
Collecting pandas
  Downloading pandas-1.1.5-cp36-cp36m-manylinux1_x86_64.whl (9.5 MB)
     |████████████████████████████████| 9.5 MB 7.9 kB/s            
Collecting python-dateutil>=2.7.3
  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
     |████████████████████████████████| 247 kB 12.3 MB/s            
. . . .
~$ python
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'1.1.5'

In all further code, we assume that the necessary libraries are already imported. Likewise, if a piece of code uses a variable or data structure without initializing it, we assume it was initialized in an earlier snippet.

Pandas series

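A pandas Series is a one-dimensional labelled array: each value is stored together with an index label, and the Series as a whole has a single data type (dtype).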
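To see what "one-dimensional" means in practice, the sketch below inspects a few attributes of a small Series. It is a minimal illustration with a made-up variable s; to_numpy() assumes pandas 0.24 or newer, and it exposes the NumPy array the Series is built on.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 40])

print(s.ndim)    # 1: a Series always has exactly one dimension
print(s.shape)   # (4,)
print(s.size)    # 4

# The values are held in a NumPy array underneath the labels
arr = s.to_numpy()
print(type(arr))  # <class 'numpy.ndarray'>
```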

Create a pandas Series from a list

The easiest way to create a Series is to convert a Python list:

import pandas as pd
data = ['A', 'B', 'C', 'D']
let = pd.Series(data) # This will create a pandas Series
print(let)
#0    A
#1    B
#2    C
#3    D
#dtype: object <- pay attention to the type of elements
let1 = pd.Series(['A', 'B', 'C', 'D']) # gives the same result
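The note about the type of elements matters because pandas infers a dtype from the data you pass in. A minimal sketch of a few common cases (the variable names are only for illustration):

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
floats = pd.Series([1.5, 2, 3])
with_missing = pd.Series([1, None, 3])  # None is stored as NaN

print(ints.dtype)          # int64
print(floats.dtype)        # float64
print(with_missing.dtype)  # float64: NaN is a float, so the integers are upcast
print(with_missing.isna().tolist())  # [False, True, False]

# An explicit dtype can be requested instead of relying on inference
explicit = pd.Series([1, 2, 3], dtype='float64')
print(explicit.dtype)      # float64
```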

Access to elements of a pandas Series

Elements of a pandas Series can be accessed in several different ways.

Access via index

Use an index or a slice, as with an ordinary list. Remember that when several elements are selected, they are returned as a Series; to convert them to a list, use the function list().

print(let[2]) # C
print(let[1:3])
#1    B
#2    C
#dtype: object

print(list(let[1:3])) # ['B', 'C']
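With the default index, a position and a label are the same number, so let[2] is unambiguous. With a custom integer index they differ, and the explicit accessors iloc (by position) and loc (by label) avoid confusion. A small sketch, assuming a made-up index of 10, 20, 30, 40; note that a label slice with loc includes its end point:

```python
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'D'], index=[10, 20, 30, 40])

print(s.iloc[2])               # C: third element by position
print(s.loc[20])               # B: element whose label is 20
print(s.iloc[1:3].tolist())    # ['B', 'C']: position slice, end excluded
print(s.loc[20:40].tolist())   # ['B', 'C', 'D']: label slice, end included
```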

Access via list of indexes

A list of index labels can be used to select several elements at once:

idx = [0, 2, 3]
print(let[idx])
#0    A
#2    C
#3    D
#dtype: object

Access via list of boolean

A list of booleans of the same length as the Series can also be used to select data: only the elements marked True are returned.

idx = [True, False, True, False]
print(let[idx])
#0    A
#2    C
#dtype: object
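In practice the boolean list is rarely typed by hand; it is usually produced by a comparison on the Series itself. A minimal sketch with hypothetical numeric data:

```python
import pandas as pd

prices = pd.Series([5, 12, 7, 20])

mask = prices > 6              # element-wise comparison gives a boolean Series
print(mask.tolist())           # [False, True, True, True]
print(prices[mask].tolist())   # [12, 7, 20]

# Conditions are combined with & (and) or | (or); the parentheses are required
print(prices[(prices > 6) & (prices < 15)].tolist())  # [12, 7]
```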

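So, once again, a pandas Series can be used much like an ordinary list, but the type of the result can differ: a selection of several elements is itself a Series. Use the method tolist() to convert it to a list. To access an element of a Series with a string index, put the index label in quotes, as with a dictionary key. The example below uses country_cap_series, which is defined in the section "Initialization with dictionary" below.

print(country_cap_series.values)
#   ['Kabul' 'Canberra' 'Paris' 'London']
print(country_cap_series.index)
#   Index(['Afganistan', 'Austalia', 'France', 'UK'], dtype='object')
print(country_cap_series[1:3])
#Austalia    Canberra
#France         Paris
#dtype: object
print(country_cap_series['UK']) # London
print(country_cap_series[['UK', 'France']])
#UK        London
#France     Paris
#dtype: object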
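Because selecting several labels returns a Series, tolist() is the usual way to get a plain Python list back. The sketch below uses a small hypothetical capitals Series (not the country_cap_series from the text) and also shows Series.get(), which returns a default value instead of raising KeyError when a label is missing:

```python
import pandas as pd

capitals = pd.Series({'France': 'Paris', 'UK': 'London', 'Japan': 'Tokyo'})

subset = capitals[['UK', 'France']]   # a Series, not a list
print(type(subset).__name__)          # Series
print(subset.tolist())                # ['London', 'Paris']

# capitals['Spain'] would raise KeyError; get() returns the default instead
print(capitals.get('Spain', 'unknown'))  # unknown
```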

Index manipulations for Series

A pandas Series can have a custom index. Keep in mind that the index labels do not have to be unique!

e_f = ['F_name', 'L_name', 'Dept']
e_d = ['John', 'Smith', 'ICQA']

e_series = pd.Series(index=e_f, data=e_d)
print(e_series)
#F_name     John
#L_name    Smith
#Dept       ICQA
#dtype: object
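To illustrate the warning that index labels are not unique, here is a minimal sketch with a made-up Series whose label 'a' appears twice. Selecting a duplicated label returns a Series rather than a single value:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'a', 'b'])

print(s.index.is_unique)   # False: pandas accepts duplicate labels
print(s['b'])              # 3: a unique label returns a single value
print(s['a'].tolist())     # [1, 2]: a duplicated label returns a Series
```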

Initialization with dictionary

A Series can also be initialized with a dictionary: the keys become the index and the values become the data.

country_cap = {'Afganistan' : 'Kabul',
               'Austalia' : 'Canberra',
               'France' : 'Paris',
               'UK': 'London'}
country_cap_series = pd.Series(country_cap)
print(country_cap_series)
#Afganistan       Kabul
#Austalia      Canberra
#France           Paris
#UK              London
#dtype: object
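When a dictionary is combined with an explicit index argument, pandas takes only the listed keys, in the listed order, and fills labels that are not in the dictionary with NaN. A minimal sketch with a hypothetical scores dictionary:

```python
import pandas as pd

scores = {'a': 1, 'b': 2, 'c': 3}

# Only 'c' and 'a' are taken from the dict; 'z' is not a key, so it gets NaN
s = pd.Series(scores, index=['c', 'a', 'z'])
print(s.index.tolist())   # ['c', 'a', 'z']
print(s.dtype)            # float64, because of the NaN
print(s.isna().tolist())  # [False, False, True]
```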

Extracting Index from Series

To extract the index of a Series, use its index attribute (an attribute, not a method, so no parentheses are needed):

print(let.index)          # RangeIndex(start=0, stop=4, step=1)
print(let.index.tolist()) # [0, 1, 2, 3]

print(country_cap_series.index) # country_cap_series is defined in the previous section
# Index(['Afganistan', 'Austalia', 'France', 'UK'], dtype='object')
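The index can also be replaced after the Series is created, by assigning a list of the same length to the index attribute, and the index itself can be given a name. A minimal sketch (the labels 'w' to 'z' and the name 'letter_id' are arbitrary):

```python
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'D'])

# Replace the default RangeIndex with custom labels of the same length
s.index = ['w', 'x', 'y', 'z']
print(s.index.tolist())  # ['w', 'x', 'y', 'z']
print(s['y'])            # C

# Give the index itself a name
s.index.name = 'letter_id'
print(s.index.name)      # letter_id
```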

Name for the Series

A pandas Series is a complex object that has a name attribute. You can set the name, and it is then shown when the Series is printed:

let.name = '+++Four letters+++'
print(let)
#0    A
#1    B
#2    C
#3    D
#Name: +++Four letters+++, dtype: object
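Setting name modifies the Series in place. As an alternative, rename() with a single string returns a new Series with that name, and the name becomes the column label when the Series is turned into a one-column DataFrame with to_frame(). A small sketch with illustrative names:

```python
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'D'], name='letters')

renamed = s.rename('symbols')   # new Series; s keeps its original name
print(s.name, renamed.name)     # letters symbols

# The Series name becomes the column label of the DataFrame
df = renamed.to_frame()
print(df.columns.tolist())      # ['symbols']
```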

Compare Series

To check which elements of one Series also appear in another, use the method isin(). The result is a boolean mask in the form of a Series, which can then be used to extract the matching data. Use ~ to invert the mask and get the elements that are not in common.

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([1, 3, 5, 7])
mask = s1.isin(s2)
print(mask)
#0     True
#1    False
#2     True
#3    False
#4     True
#dtype: bool
print(s1[mask])
#0    1
#2    3
#4    5
#dtype: int64
print(s1[~mask])
#1    2
#3    4
#dtype: int64

Operations with every element of Series

A combination of map() and a lambda function applies an operation to every element of a Series. The following example squares each element:

s1 = pd.Series([1, 2, 3, 4, 5])
s1sq = s1.map(lambda x: x*x)
print(s1sq)
#0     1
#1     4
#2     9
#3    16
#4    25
#dtype: int64
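For simple arithmetic, the same result can usually be obtained without map() and lambda, because arithmetic operators are applied to every element of a Series (and are generally faster, since the work is done by NumPy). map() is still handy for lookups, as it also accepts a dictionary. A minimal sketch with hypothetical data:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])

# The ** operator squares every element, no lambda needed
print((s1 ** 2).tolist())   # [1, 4, 9, 16, 25]

# map() with a dict: values that are not keys in the dict become NaN
grades = pd.Series(['a', 'b', 'x'])
labels = grades.map({'a': 'good', 'b': 'fair'})
print(labels.tolist())      # ['good', 'fair', nan]
```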

Element-wise operations between Series

Basic mathematical operations can be performed between the elements of two Series, for example add() for addition, mul() for multiplication, and so on. See dir(pd.Series) for the full list of methods. Elements are matched by index label, and if a label exists in only one of the Series, the result for that label is NaN. Note that this can change the type of the result: because NaN is a float, integer values become float64.

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([10, 20, 30])
print(s1.add(s2))
#0    11.0
#1    22.0
#2    33.0
#3     NaN
#4     NaN
#dtype: float64
print(s1.mul(s2))
#0    10.0
#1    40.0
#2    90.0
#3     NaN
#4     NaN
#dtype: float64
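The arithmetic operators (+, * and so on) align the two Series by index label in the same way as add() and mul(). If you would rather treat a missing element as a neutral value than get NaN, the fill_value parameter of these methods does that. A short sketch with the same two Series as above:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([10, 20, 30])

# + aligns on labels exactly like add(): labels 3 and 4 give NaN
print((s1 + s2).isna().sum())              # 2

# fill_value replaces a missing element before the operation is applied
print(s1.add(s2, fill_value=0).tolist())   # [11.0, 22.0, 33.0, 4.0, 5.0]
print(s1.mul(s2, fill_value=1).tolist())   # [10.0, 40.0, 90.0, 4.0, 5.0]
```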

Add elements to or remove elements from a Series

Append one series to another

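It is possible to append one Series to another, but you need to watch what happens to the indexes. Compare the two examples below: the first keeps the original indexes (the default), the second generates a new index. Older pandas versions also offered s1.append(s2), but Series.append() was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() gives the same result in all versions.

s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([10, 20, 30])
si = pd.concat([s1, s2])
print(si)
#0     1
#1     2
#2     3
#3     4
#0    10
#1    20
#2    30
#dtype: int64
snoi = pd.concat([s1, s2], ignore_index=True)
print(snoi)
#0     1
#1     2
#2     3
#3     4
#4    10
#5    20
#6    30
#dtype: int64
print(si.loc[2]) # label 2 appears twice in si, so two elements are returned
#2     3
#2    30
#dtype: int64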
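pd.concat() offers two more ways to deal with the index problem. verify_integrity=True raises a ValueError instead of creating duplicate labels, and keys= adds an outer index level that records which Series each element came from. A minimal sketch:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([10, 20, 30])

# Refuse to build a Series with duplicate index labels
rejected = False
try:
    pd.concat([s1, s2], verify_integrity=True)
except ValueError as err:
    rejected = True
    print('Rejected:', err)

# keys= creates a two-level index: (source key, original label)
combined = pd.concat([s1, s2], keys=['first', 'second'])
print(combined.loc['second'].tolist())  # [10, 20, 30]
```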

Removing elements from Series

An element can be removed by its index label with the method drop(). This method does not change the original Series but returns a new one. Because si contains duplicate labels, every element with the given label is removed.

sm = si.drop(0)
print(sm)
#1     2
#2     3
#3     4
#1    20
#2    30
#dtype: int64
smm = si.drop(labels=[1, 2])
print(smm)
#0     1
#3     4
#0    10
#dtype: int64
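A few more details about drop(), shown on a small made-up Series: the original is left unchanged, a label that does not exist raises KeyError unless errors='ignore' is passed, and to remove an element by position you can look up its label in the index first.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

smaller = s.drop('b')
print(smaller.index.tolist())  # ['a', 'c']
print(s.index.tolist())        # ['a', 'b', 'c']: the original is unchanged

# 'z' does not exist; errors='ignore' skips it instead of raising KeyError
print(s.drop(['b', 'z'], errors='ignore').index.tolist())  # ['a', 'c']

# Drop by position: translate position 0 into its label first
print(s.drop(s.index[0]).index.tolist())  # ['b', 'c']
```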

Other useful Series methods

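Here are a few more very useful Series methods, listed without examples.

  • count() - Return the number of non-missing (non-NaN) elements
  • nunique() - Return the number of unique elements (NaN is excluded by default)
  • unique() - Return an array of the unique elements, in order of appearance
  • sort_values() - Return a sorted Series. Use the keyword argument inplace=True to sort the Series itself
  • reset_index() - Return a DataFrame in which the old index becomes a column. Use drop=True to discard the old index and get a Series with a new default index instead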
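Although the list above is given without examples, a short sketch may help to see the difference between these methods. The data below is made up and includes one NaN and one repeated value on purpose:

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 1, np.nan, 3, 2], index=['e', 'd', 'c', 'b', 'a'])

print(s.count())     # 4: NaN is not counted
print(s.nunique())   # 3: distinct non-NaN values are 3, 1 and 2
print(s.unique())    # NumPy array of 3, 1, NaN, 2 in order of first appearance

print(s.sort_values().tolist())  # [1.0, 2.0, 3.0, 3.0, nan]: NaN goes last

df = s.reset_index()             # old index becomes a column named 'index'
print(df.columns.tolist())       # ['index', 0]
flat = s.reset_index(drop=True)  # old index discarded, still a Series
print(flat.index.tolist())       # [0, 1, 2, 3, 4]
```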


Published: 2021-11-05 09:11:16
Updated: 2021-12-17 02:48:39


© 2020 - My blog about coding and further learning. This blog was written with pure Perl, and the front-end output was generated with TemplateToolkit.