Exercise 1: Create a series from a list, a NumPy array and a dict
In [1]:
import pandas as pd
import numpy as np
In [4]:
# List
l1=[1,2,3,4,5]
l1
Out[4]:
[1, 2, 3, 4, 5]
In [5]:
# Array
arr = np.array(['a','b','c','d','e'])
arr
Out[5]:
array(['a', 'b', 'c', 'd', 'e'], dtype='<U1')
In [13]:
# Dictionary
d1=dict(zip(l1,arr))
d1
Out[13]:
{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
In [11]:
# Series from list
s1=pd.Series(l1)
s1
Out[11]:
0    1
1    2
2    3
3    4
4    5
dtype: int64
In [12]:
# Series from array
s2=pd.Series(arr)
s2
Out[12]:
0    a
1    b
2    c
3    d
4    e
dtype: object
In [14]:
# Series from dictionary
s3=pd.Series(d1)
s3
Out[14]:
1    a
2    b
3    c
4    d
5    e
dtype: object
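Passing a dict makes its keys the index. As a sketch with the same data, the same Series can be built by passing the values together with an explicit `index=`; the names `keys`, `values`, `s_from_dict` and `s_explicit` are illustrative only.

```python
import numpy as np
import pandas as pd

keys = [1, 2, 3, 4, 5]
values = np.array(['a', 'b', 'c', 'd', 'e'])

# A dict's keys become the index and its values become the data
s_from_dict = pd.Series(dict(zip(keys, values)))

# Same labels and data, passed separately through index=
s_explicit = pd.Series(values, index=keys)

print(s_from_dict.index.tolist())         # [1, 2, 3, 4, 5]
print((s_from_dict == s_explicit).all())  # True
```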
Exercise 2: Convert the index of a series into a column of a dataframe
In [20]:
# On a Series, reset_index() moves the index into a column named 'index' and already returns a DataFrame
df=s3.reset_index()
df
Out[20]:
| | index | 0 |
|---|---|---|
| 0 | 1 | a |
| 1 | 2 | b |
| 2 | 3 | c |
| 3 | 4 | d |
| 4 | 5 | e |
In [21]:
# Rename the column whose header is 0
df.columns=['index','Col1']
df
Out[21]:
| | index | Col1 |
|---|---|---|
| 0 | 1 | a |
| 1 | 2 | b |
| 2 | 3 | c |
| 3 | 4 | d |
| 4 | 5 | e |
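`Series.reset_index()` also accepts a `name=` argument for the values column, so the rename step can be folded into the call. A minimal sketch; the index label `'Key'` set with `rename_axis()` is a hypothetical name chosen for illustration.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd', 'e'], index=[1, 2, 3, 4, 5])

# name= sets the header of the values column, so no separate rename is needed
df_named = s.reset_index(name='Col1')

# rename_axis() names the index first; reset_index() then reuses that name as the column header
df_both = s.rename_axis('Key').reset_index(name='Col1')

print(df_named.columns.tolist())  # ['index', 'Col1']
print(df_both.columns.tolist())   # ['Key', 'Col1']
```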
Exercise 3: Combine many series to form a dataframe
In [26]:
s1=pd.Series(['a','b','c','d','e'])
s2=pd.Series([1,2,3,4,5])
s3=pd.Series(np.random.rand(5))
In [29]:
df=pd.DataFrame([s1,s2,s3]).T
df
Out[29]:
| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | a | 1 | 0.960486 |
| 1 | b | 2 | 0.07414 |
| 2 | c | 3 | 0.586193 |
| 3 | d | 4 | 0.54194 |
| 4 | e | 5 | 0.603479 |
In [30]:
df.columns=['Name','ID','Salary']
df
Out[30]:
| | Name | ID | Salary |
|---|---|---|---|
| 0 | a | 1 | 0.960486 |
| 1 | b | 2 | 0.07414 |
| 2 | c | 3 | 0.586193 |
| 3 | d | 4 | 0.54194 |
| 4 | e | 5 | 0.603479 |
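`pd.DataFrame([s1, s2, s3])` treats each Series as a row, so every column of the transposed result holds mixed types and gets `object` dtype. A sketch that places the Series side by side instead, so each column keeps its own dtype; the seeded generator is an assumption added only for repeatability.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(['a', 'b', 'c', 'd', 'e'])
s2 = pd.Series([1, 2, 3, 4, 5])
s3 = pd.Series(np.random.default_rng(0).random(5))  # seeded only for repeatability

# Concatenate along columns: one column per Series, keys become the column names
df = pd.concat([s1, s2, s3], axis=1, keys=['Name', 'ID', 'Salary'])

# Equivalent construction from a dict of Series
df_dict = pd.DataFrame({'Name': s1, 'ID': s2, 'Salary': s3})

print(df.dtypes)
```

Either form avoids the transpose and the separate `df.columns = ...` assignment.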
Exercise 4: Assign a name to the series’ index
In [31]:
s1=pd.Series([1,2,3])
s1
Out[31]:
0    1
1    2
2    3
dtype: int64
In [42]:
s1.index.name='Index_Name'
s1
Out[42]:
Index_Name
0    1
1    2
2    3
dtype: int64
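`rename_axis()` is a non-mutating alternative: it returns a new Series with a named index and leaves the original untouched. A minimal sketch; `s_named` is an illustrative name.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# rename_axis returns a copy whose index carries the given name
s_named = s.rename_axis('Index_Name')

print(s.index.name)        # None
print(s_named.index.name)  # Index_Name
```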
Exercise 5: Compare items between two series
In [43]:
A = pd.Series([1,2,3,4])
A
Out[43]:
0    1
1    2
2    3
3    4
dtype: int64
In [130]:
B = pd.Series([3,4,5,6,7])
B
Out[130]:
0    3
1    4
2    5
3    6
4    7
dtype: int64
In [104]:
# Converting series into list
A_list=list(A)
A_list
Out[104]:
[1, 2, 3, 4]
In [131]:
B_list=list(B)
B_list
Out[131]:
[3, 4, 5, 6, 7]
Common elements
In [132]:
common_elements=set(A_list) & set(B_list)
common_elements_ls=list(common_elements)
common_elements_ls
Out[132]:
[3, 4]
Using a list comprehension
In [133]:
[i for i in A_list for j in B_list if i==j]
Out[133]:
[3, 4]
Elements in A not present in B
In [134]:
A_not_in_B=list(set(A_list)-common_elements)
A_not_in_B
Out[134]:
[1, 2]
Elements in B not present in A
In [135]:
B_not_in_A=list(set(B_list)-common_elements)
B_not_in_A
Out[135]:
[5, 6, 7]
Using the Series approach
Common elements
In [138]:
A[A.isin(B)]
Out[138]:
2    3
3    4
dtype: int64
Elements in A not present in B
In [139]:
A[~A.isin(B)]
Out[139]:
0    1
1    2
dtype: int64
Elements in B not present in A
In [141]:
B[~B.isin(A)]
Out[141]:
2    5
3    6
4    7
dtype: int64
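NumPy's set routines give the same three answers in one call each, plus the items that are not common to both series (the symmetric difference). A sketch, assuming sorted, de-duplicated arrays are an acceptable result; unlike the boolean masks above, these discard the original index.

```python
import numpy as np
import pandas as pd

A = pd.Series([1, 2, 3, 4])
B = pd.Series([3, 4, 5, 6, 7])
a, b = A.to_numpy(), B.to_numpy()

# Each routine returns a sorted array of unique values
common = np.intersect1d(a, b)   # in both A and B
only_in_a = np.setdiff1d(a, b)  # in A but not in B
only_in_b = np.setdiff1d(b, a)  # in B but not in A
not_common = np.setxor1d(a, b)  # in exactly one of the two

print(common.tolist())      # [3, 4]
print(only_in_a.tolist())   # [1, 2]
print(only_in_b.tolist())   # [5, 6, 7]
print(not_common.tolist())  # [1, 2, 5, 6, 7]
```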
Exercise 6: Get the minimum, 25th percentile, median, 75th percentile and maximum of a numeric series
In [142]:
# Creating a random series
s1=pd.Series(np.random.rand(10))
s1
Out[142]:
0    0.484274
1    0.990742
2    0.580644
3    0.161801
4    0.816207
5    0.640640
6    0.494005
7    0.562894
8    0.339194
9    0.988645
dtype: float64
In [143]:
s1.min()
Out[143]:
0.1618008141423638
In [144]:
s1.max()
Out[144]:
0.9907420622435107
In [146]:
s1.median()
Out[146]:
0.5717688611141396
In [147]:
s1.quantile(0.75)
Out[147]:
0.7723152702700348
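The cells above compute the minimum, median, 75th percentile and maximum but not the 25th percentile named in the exercise title. A sketch that returns all five in one `quantile` call, with `describe()` as a cross-check; the seeded generator is an assumption added for repeatability.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.default_rng(0).random(10))  # seeded only for repeatability

# Quantile 0 is the minimum, 0.5 the median and 1 the maximum
summary = s.quantile([0, 0.25, 0.5, 0.75, 1])
print(summary)

# describe() reports the same five numbers alongside count, mean and std
print(s.describe())
```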
Exercise 7: Get the frequency counts of the unique items of a series
In [148]:
s1=pd.Series([1,2,3,3,3,4,4,4,4,5])
s1
Out[148]:
0    1
1    2
2    3
3    3
4    3
5    4
6    4
7    4
8    4
9    5
dtype: int64
In [151]:
s1.value_counts()
Out[151]:
4    4
3    3
1    1
2    1
5    1
dtype: int64
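`value_counts()` sorts by frequency by default. A short sketch of two common variations: relative frequencies via `normalize=True`, and ordering by the values themselves via `sort_index()`.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 3, 3, 4, 4, 4, 4, 5])

# Absolute counts, most frequent first
counts = s.value_counts()

# normalize=True divides by the total, giving shares that sum to 1
shares = s.value_counts(normalize=True)

# sort_index() orders the counts by value instead of by frequency
counts_by_value = counts.sort_index()

print(counts_by_value.tolist())  # [1, 1, 3, 4, 1]
print(shares.loc[4])             # 0.4
```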
Exercise 8: Keep only the top 2 most frequent values as they are and replace everything else with ‘Others’
In [158]:
s1=pd.Series([1,2,3,3,3,4,4,4,4,5])
s1
Out[158]:
0    1
1    2
2    3
3    3
4    3
5    4
6    4
7    4
8    4
9    5
dtype: int64
In [159]:
# Getting the count of values
s2=s1.value_counts()
s2
Out[159]:
4    4
3    3
1    1
2    1
5    1
dtype: int64
In [156]:
# value_counts() holds the values in its index and their counts in its data,
# so the top two values are the first two index labels
top_2=list(s2.index[:2])
top_2
Out[156]:
[4, 3]
In [162]:
# Using numpy where function to create conditional column
arr1=np.where(s1.isin(top_2),s1,"Others")
arr1
Out[162]:
array(['Others', 'Others', '3', '3', '3', '4', '4', '4', '4', 'Others'], dtype='<U21')
In [163]:
s1_recoded=pd.Series(arr1)
s1_recoded
Out[163]:
0    Others
1    Others
2         3
3         3
4         3
5         4
6         4
7         4
8         4
9    Others
dtype: object
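The recoding can also stay inside pandas: `Series.where` keeps entries where the condition is true and substitutes the given value elsewhere, so the kept values remain integers instead of being turned into strings as with `np.where` above. A minimal sketch, taking the top values from the index of `value_counts()`.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 3, 3, 4, 4, 4, 4, 5])

# The most frequent values are the index labels of value_counts(); its data are the counts
top_2 = s.value_counts().index[:2]

# Keep values found in top_2, replace everything else with 'Others'
s_recoded = s.where(s.isin(top_2), 'Others')

print(s_recoded)
```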
Exercise 9: Bin a numeric series into 10 groups of equal size
In [164]:
s1=pd.Series(np.random.rand(20))
s1
Out[164]:
0     0.614530
1     0.020887
2     0.264154
3     0.342449
4     0.117960
5     0.081493
6     0.864229
7     0.642582
8     0.241652
9     0.656198
10    0.889487
11    0.545045
12    0.334097
13    0.820589
14    0.100319
15    0.890976
16    0.907038
17    0.258617
18    0.086489
19    0.289996
dtype: float64
In [168]:
gp_values=list(s1.quantile([0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]))
gp_values
Out[168]:
[0.08598908915755857, 0.11443163727729971, 0.25352740749430075, 0.27965892220551697, 0.3382729724163345, 0.5728387110301085, 0.646666712157136, 0.8293170005262019, 0.8896355153009737]
In [186]:
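# G1 for values up to the 10th percentile, G2 to G9 for each following decile, G10 above the 90th percentile
s2=np.where(s1 <= gp_values[0],"G1",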
np.where((s1 > gp_values[0]) & (s1 <= gp_values[1]),"G2",
np.where((s1 > gp_values[1]) & (s1 <= gp_values[2]),"G3",
np.where((s1 > gp_values[2]) & (s1 <= gp_values[3]),"G4",
np.where((s1 > gp_values[3]) & (s1 <= gp_values[4]),"G5",
np.where((s1 > gp_values[4]) & (s1 <= gp_values[5]),"G6",
np.where((s1 > gp_values[5]) & (s1 <= gp_values[6]),"G7",
np.where((s1 > gp_values[6]) & (s1 <= gp_values[7]),"G8",
np.where((s1 > gp_values[7]) & (s1 <= gp_values[8]),"G9",
"G10")))))))))
s2
Out[186]:
array(['G7', 'G1', 'G4', 'G6', 'G3', 'G1', 'G9', 'G7', 'G3', 'G8', 'G9', 'G6', 'G5', 'G8', 'G2', 'G10', 'G10', 'G4', 'G2', 'G5'], dtype='<U3')
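The nested `np.where` can be flattened with `np.select`, which takes a list of conditions and a matching list of labels and falls back to `default` where none match. A sketch using the same decile cut points; the seeded generator and the comprehension that builds the interval conditions are illustrative choices, not the notebook's code.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.random.default_rng(1).random(20))  # seeded only for repeatability
gp_values = list(s1.quantile([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]))
x = s1.to_numpy()

# First condition: up to the 10th percentile; then one (lower, upper] interval per decile
conditions = [x <= gp_values[0]] + [
    (x > lower) & (x <= upper) for lower, upper in zip(gp_values[:-1], gp_values[1:])
]
choices = [f"G{i}" for i in range(1, 10)]

# Values above the 90th percentile match no condition and get the default label
groups = np.select(conditions, choices, default="G10")
print(groups)
```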
In [190]:
df=pd.DataFrame([s1,s2]).T
df.columns=['Values','Groups']
df
Out[190]:
| | Values | Groups |
|---|---|---|
| 0 | 0.6145298225899369 | G7 |
| 1 | 0.020886734419647057 | G1 |
| 2 | 0.2641535431059717 | G4 |
| 3 | 0.3424493159747698 | G6 |
| 4 | 0.11795974066079085 | G3 |
| 5 | 0.0814925546914802 | G1 |
| 6 | 0.8642288997565742 | G9 |
| 7 | 0.6425819946348839 | G7 |
| 8 | 0.2416520629589226 | G3 |
| 9 | 0.6561977197090577 | G8 |
| 10 | 0.8894865607576895 | G9 |
| 11 | 0.5450446366568895 | G6 |
| 12 | 0.33409662885789926 | G5 |
| 13 | 0.8205890257186088 | G8 |
| 14 | 0.10031922374333513 | G2 |
| 15 | 0.8909761061905312 | G10 |
| 16 | 0.9070383169324525 | G10 |
| 17 | 0.25861684086660564 | G4 |
| 18 | 0.08648870409823395 | G2 |
| 19 | 0.28999584160521374 | G5 |
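pandas also has a dedicated routine for equal-frequency binning, `pd.qcut`, which computes the decile cut points and assigns the labels in one step; like the `np.where` chain, its bins are right-closed and the lowest value goes into the first bin. A sketch with a seeded generator assumed for repeatability; with 20 distinct values, each group receives exactly two.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.random.default_rng(1).random(20))  # seeded only for repeatability
labels = [f"G{i}" for i in range(1, 11)]

# q=10 cuts at the 0.1, 0.2, ..., 0.9 quantiles and returns a categorical of labels
groups = pd.qcut(s1, q=10, labels=labels)

df = pd.DataFrame({'Values': s1, 'Groups': groups})
print(df['Groups'].value_counts().sort_index())
```

Building the frame from a dict keeps `Values` as floats and `Groups` as a categorical, rather than combining them row-wise and transposing.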