
Python: How to create Pandas column based on conditions

Creating a pandas column based on logical conditions over other columns is a common task. Let’s look at how to solve it.

Creating test DataFrame

We will create a DataFrame with two columns, each filled with random letters ‘a’ and ‘b’.


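import random
import numpy as np
import pandas as pd
# Two columns of five random letters, each either 'a' or 'b'
a = [random.choice(['a', 'b']) for _ in range(5)]
b = [random.choice(['a', 'b']) for _ in range(5)]
df = pd.DataFrame({'c1': a, 'c2': b})
# Sample output (the letters are random, so your values may differ):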
#  c1 c2
#0  a  a
#1  a  b
#2  a  a
#3  b  a
#4  b  b
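Because the letters are random, every run produces a different frame. To reproduce the exact results shown in this article, the same DataFrame can be built explicitly. This is a minimal sketch; the values are simply copied from the sample output above.

```python
import pandas as pd

# Fixed copy of the sample frame, so later results are reproducible
df = pd.DataFrame({
    'c1': ['a', 'a', 'a', 'b', 'b'],
    'c2': ['a', 'b', 'a', 'a', 'b'],
})
print(df)
```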

Create Pandas column based on conditions over another column

Our task is to create a column with the value ‘Yes’ if c1 contains ‘a’ and ‘No’ if c1 contains ‘b’. There are three basic solutions:

Create column with NumPy.where()

np.where() builds the column from a condition: it returns the first value where the condition is true and the second value elsewhere.


df['c1_1'] = np.where(df['c1'] == 'a', 'Yes', 'No')
#  c1 c2 c1_1
#0  a  a  Yes
#1  a  b  Yes
#2  a  a  Yes
#3  b  a   No
#4  b  b   No
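It may help to look at what np.where() actually receives. The comparison produces a boolean mask with one value per row, and np.where() then chooses between the two values element by element. A small sketch on a fixed copy of the sample column (the variable names mask and result are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': ['a', 'a', 'a', 'b', 'b']})

# The comparison yields a boolean Series, one value per row
mask = df['c1'] == 'a'
print(mask.tolist())    # [True, True, True, False, False]

# np.where picks 'Yes' where the mask is True and 'No' elsewhere
result = np.where(mask, 'Yes', 'No')
print(result.tolist())  # ['Yes', 'Yes', 'Yes', 'No', 'No']
```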

Create column with list comprehension

A pandas column can also be created with a list comprehension.


df['c1_2'] = ['Yes' if x == 'a' else 'No' for x in df['c1']]
#  c1 c2 c1_1 c1_2
#0  a  a  Yes  Yes
#1  a  b  Yes  Yes
#2  a  a  Yes  Yes
#3  b  a   No   No
#4  b  b   No   No

Using apply with lambda function

apply() with a lambda function evaluates the condition value by value to create the new column.


df['c1_3'] = df['c1'].apply(lambda x: 'Yes' if x == 'a' else 'No')
#  c1 c2 c1_1 c1_2 c1_3
#0  a  a  Yes  Yes  Yes
#1  a  b  Yes  Yes  Yes
#2  a  a  Yes  Yes  Yes
#3  b  a   No   No   No
#4  b  b   No   No   No
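All three approaches should give the same column. A quick way to confirm this is to compare the results directly; the sketch below uses a fixed copy of the sample column, and the variable name same is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': ['a', 'a', 'a', 'b', 'b']})

# The three solutions from this section
df['c1_1'] = np.where(df['c1'] == 'a', 'Yes', 'No')
df['c1_2'] = ['Yes' if x == 'a' else 'No' for x in df['c1']]
df['c1_3'] = df['c1'].apply(lambda x: 'Yes' if x == 'a' else 'No')

# Element-wise comparison of the new columns
same = bool((df['c1_1'] == df['c1_2']).all() and (df['c1_2'] == df['c1_3']).all())
print(same)  # True
```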

Create Pandas column based on conditions over several columns

Creating a pandas column based on conditions over several columns is a bit more complicated. For example, we will create a column with the value ‘Yes’ if column ‘c1’ equals ‘c2’ and ‘No’ otherwise.

Create column with NumPy.where()

np.where() is a very powerful tool for creating new columns based on conditions. The condition can compare two columns directly.


df['c2_1'] = np.where(df['c1'] == df['c2'], 'Yes', 'No')
#  c1 c2 c1_1 c1_2 c1_3 c2_1
#0  a  a  Yes  Yes  Yes  Yes
#1  a  b  Yes  Yes  Yes   No
#2  a  a  Yes  Yes  Yes  Yes
#3  b  a   No   No   No   No
#4  b  b   No   No   No  Yes
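The list comprehension and apply() approaches can be adapted to several columns as well. The sketch below is one possible way to do it, again on a fixed copy of the sample frame; the column names c2_2, c2_3 and both_a are hypothetical and chosen only for this example. It also shows how two conditions can be combined inside np.where().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'c1': ['a', 'a', 'a', 'b', 'b'],
    'c2': ['a', 'b', 'a', 'a', 'b'],
})

# List comprehension: zip walks both columns in parallel
df['c2_2'] = ['Yes' if x == y else 'No' for x, y in zip(df['c1'], df['c2'])]

# apply with axis=1 passes each row to the lambda as a Series
df['c2_3'] = df.apply(lambda row: 'Yes' if row['c1'] == row['c2'] else 'No', axis=1)

# Conditions can be combined with & (and) or | (or);
# each condition needs its own parentheses
df['both_a'] = np.where((df['c1'] == 'a') & (df['c2'] == 'a'), 'Yes', 'No')

print(df['c2_2'].tolist())    # ['Yes', 'No', 'Yes', 'No', 'Yes']
print(df['c2_3'].tolist())    # ['Yes', 'No', 'Yes', 'No', 'Yes']
print(df['both_a'].tolist())  # ['Yes', 'No', 'Yes', 'No', 'No']
```

On large frames, apply() with axis=1 is usually the slowest of these options because it calls the Python function once per row, so np.where() tends to be the better default.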


Published: 2021-12-09 09:19:32
Updated: 2021-12-09 09:20:13


© 2020 MyCoding.uk - My blog about coding and further learning. This blog was written with pure Perl and front-end output was performed with TemplateToolkit.