Thursday, June 4, 2020

Python Blog 3: Arrays

Array in Python using numpy library


Introduction

In this blog we will look at arrays and how to store data in them. Broadly, we will cover the following points:

  • Initialization of Arrays
  • Indexing in Array
  • Multidimensional Array
  • Matrix and associated functions
  • Other functions like ones, ones_like etc

Importing the Library

To use arrays, we need to import the numpy library as shown below. We will use the alias np to refer to it.

import numpy as np 


Creating Array using list

x = [1,2,3,4,5]
y = np.array(x)
print(y)
[1 2 3 4 5]
print(type(y))
<class 'numpy.ndarray'>

We can see that an ndarray y has been created from the list x. Printing the type of y confirms that it is a numpy.ndarray.


Creating Array using tuple

x = (1,2,3,4,5)
y = np.array(x)
print(y)
[1 2 3 4 5]
print(type(y))
<class 'numpy.ndarray'>
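
As a quick check of the two examples above, the following minimal sketch compares an array built from a list with one built from a tuple. The variable names are illustrative and not from the original post.

```python
import numpy as np

from_list = np.array([1, 2, 3, 4, 5])
from_tuple = np.array((1, 2, 3, 4, 5))

# Both inputs yield the same one-dimensional ndarray
same = np.array_equal(from_list, from_tuple)
print(same)
print(type(from_list) is type(from_tuple))
```

Either container works as input; what matters is the values and how they are nested.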


Getting the dimensions of the array using shape

l2 = [[1,2,3],[4,5,6],[6,7,8]]
l3 = np.array(l2)
print(l3) #data is stored row wise
[[1 2 3]
 [4 5 6]
 [6 7 8]]
print(l3.shape)
(3, 3)

The array holds 3 rows with 3 elements each, so its shape is (3, 3).


Getting the dimensions of the 1 Dimensional array using shape

x = (1,2,3,4,5)
y = np.array(x)
y.shape
(5,)

It is a single dimensional array. It is worth noting that arrays built from lists of varying lengths are also treated as 1 dimensional. Let's look at an example.


l2 = [[1,2],[4,5,6],[6,7,8,4]]
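l3 = np.array(l2, dtype=object) # dtype=object is needed for lists of unequal length in recent NumPy versions
l3 # each inner list is stored as one element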
array([list([1, 2]), list([4, 5, 6]), list([6, 7, 8, 4])], dtype=object)
l3.shape
(3,)

There are 3 elements in the array. Since they hold different numbers of items, each list is stored as a single object and the array is single dimensional.


Arrays using nested lists

Two or higher dimensional arrays are initialised using nested lists

y = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])  # [[1,2],[3,4]] as one row

# Let's access the first element and the items within it
y[0],y[0][0],y[0][1]
(array([[1, 2],
       [3, 4]]), array([1, 2]), array([3, 4]))

# Let's access the second element and the items within it
y[1],y[1][0],y[1][1]
(array([[5, 6],
       [7, 8]]), array([5, 6]), array([7, 8]))

# Shape of y
y.shape
(2, 2, 2)
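
The chained indexing used above can also be written with comma-separated indices. The sketch below, using the same y, shows that both forms select the same item; ndim and size are standard ndarray attributes, and the names chained and direct are illustrative.

```python
import numpy as np

y = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# Chained indexing and comma-separated indexing select the same item
chained = y[1][0][1]
direct = y[1, 0, 1]
print(chained, direct)

# ndim gives the number of dimensions, size the total number of elements
print(y.ndim, y.size)
```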


Array data types: Data types of elements stored in it

Basically there are 4 data types that are typically used

  • int32 (the default integer is int64 on most Linux and macOS installations)
  • float64
  • <U1 for a single character; <U5 if the maximum length of a string is 5
  • complex

Let's look at some examples to understand this. As complex is rarely used, we won't be discussing it.


int32 and float64

x = [0, 1, 2, 3, 4]
x1 = np.array(x)
x1.dtype
dtype('int32')
y = [0, 1.3, 2.5, 3, 4]
y1 = np.array(y)
y1.dtype
dtype('float64')


string

y = ['data','string','science']
y1 = np.array(y)
y1.dtype
dtype('<U7')

It gives U7 because the longest string element (science) has 7 characters.
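
The sketch below gathers the dtype rules from this section in one place. It passes dtype=np.int32 explicitly because the default integer width depends on the platform; astype is the standard ndarray method for conversion, and the variable names are illustrative.

```python
import numpy as np

# Request the integer width explicitly instead of relying on the platform default
ints = np.array([0, 1, 2, 3, 4], dtype=np.int32)

# A single float in the input promotes the whole array to float64
floats = np.array([0, 1.3, 2.5, 3, 4])

# For strings, the number after U is the length of the longest element
items = ['data', 'string', 'science']
words = np.array(items)
longest = max(len(w) for w in items)

print(ints.dtype, floats.dtype, words.dtype, longest)

# astype returns a converted copy
as_float = ints.astype(np.float64)
print(as_float.dtype)
```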


Array Operation

Arrays follow element by element operations

a = np.array([1,2,3,4])
a
array([1, 2, 3, 4])
a*a
array([ 1,  4,  9, 16])
a.shape # shows 4 elements: (4,)
(4,)
type(a.shape)
<class 'tuple'>
b = np.array([[1,2,3],[4,5,6]])
b.shape
(2, 3)
b*b
array([[ 1,  4,  9],
       [16, 25, 36]])


Matrix using the numpy library

b = np.array([[1,2,3],[4,5,6]])
c = np.asmatrix(b)
c
matrix([[1, 2, 3],
        [4, 5, 6]])
type(c)
<class 'numpy.matrixlib.defmatrix.matrix'>
b.shape # Read as number of rows, number of columns
(2, 3)


Matrix with uneven values

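When the inner lists have different lengths, the array is single dimensional and holds list objects. Converting it with asmatrix gives a matrix with 1 row and 3 columns.

c = np.array([[1,2,3],[4,5],[4,6,7,8]], dtype=object) # dtype=object is needed for uneven lists in recent NumPy versions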
c
array([list([1, 2, 3]), list([4, 5]), list([4, 6, 7, 8])], dtype=object)
c.shape # Considered as single dimension
(3,)
d = np.asmatrix(c)
d
matrix([[list([1, 2, 3]), list([4, 5]), list([4, 6, 7, 8])]], dtype=object)
d.shape # we can see the difference between shape of c and d
(1, 3)


Matrix can only be 2 Dimensional

y = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
y
array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])
y.shape
(2, 2, 2)

# Creating matrix using y will give an error
# np.asmatrix(y) # Matrix can only be 2 Dimensional
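
To see the restriction in action without stopping the script, the call can be wrapped in try/except. This is a sketch that assumes np.asmatrix raises ValueError for a (2, 2, 2) array; the flag name failed is illustrative.

```python
import numpy as np

y = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

try:
    np.asmatrix(y)
    failed = False
except ValueError as err:
    failed = True
    print("asmatrix rejected the 3-D array:", err)

# Each 2-D slice of y converts without trouble
m = np.asmatrix(y[0])
print(m.shape)
```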


Array Concatenation: using axis argument

x = np.array([[1.0,2.0],[3.0,4.0]])
y = np.array([[5.0,6.0],[7.0,8.0]])
x
array([[1., 2.],
       [3., 4.]])
y
array([[5., 6.],
       [7., 8.]])

# axis = 0 :Concatenating along the rows /indexes
np.concatenate((x,y),axis = 0)
array([[1., 2.],
       [3., 4.],
       [5., 6.],
       [7., 8.]])

# axis = 1 :Concatenating along the columns
np.concatenate((x,y),axis = 1)
array([[1., 2., 5., 6.],
       [3., 4., 7., 8.]])


Array Concatenation: using vstack and hstack

x = np.array([[1.0,2.0],[3.0,4.0]])
y = np.array([[5.0,6.0],[7.0,8.0]])
x
array([[1., 2.],
       [3., 4.]])
y
array([[5., 6.],
       [7., 8.]])
np.vstack((x,y)) # Same as np.concatenate with axis = 0 ; vertical stacking
array([[1., 2.],
       [3., 4.],
       [5., 6.],
       [7., 8.]])
np.hstack((x,y)) # Same as axis = 1 ; Horizontal Stacking
array([[1., 2., 5., 6.],
       [3., 4., 7., 8.]])
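
The equivalence between the two approaches can be checked directly. In this minimal sketch, np.array_equal compares the results; rows and cols are illustrative names.

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0, 6.0], [7.0, 8.0]])

rows = np.concatenate((x, y), axis=0)
cols = np.concatenate((x, y), axis=1)

# vstack matches axis=0 and hstack matches axis=1 for 2-D inputs
print(rows.shape, np.array_equal(rows, np.vstack((x, y))))
print(cols.shape, np.array_equal(cols, np.hstack((x, y))))
```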


Array Slicing

Arrays, like lists and tuples, can be sliced. In this section we will look in detail at the different methods that can be used to subset an array.


1 - Dimensional Slicing

x = [1,2,3,4,5,6,7,8]
x[:] # Prints all the elements
[1, 2, 3, 4, 5, 6, 7, 8]
x[2:5] # Prints elements from 2 to 4 index position
[3, 4, 5]
x[2::2] # Prints elements starting from index position 2 with a step of 2 positions
[3, 5, 7]
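
The example above slices a plain list. The same syntax works on a NumPy array, with one difference worth knowing: a slice of an array is a view onto the original data, not a copy. The sketch below illustrates this; the variable names are illustrative.

```python
import numpy as np

x = np.arange(1, 9)        # array([1, 2, 3, 4, 5, 6, 7, 8])

part = x[2:5]              # index positions 2, 3 and 4
stepped = x[2::2]          # every second element from position 2

# A slice of an array is a view, so writing to it changes x as well
part[0] = 99
print(x)

# Use copy() when the original must stay untouched
safe = x[2:5].copy()
safe[0] = -1
print(x[2])
```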


2- Dimensional Slicing

In 2-dimensional arrays,

  • the first dimension specifies the row or rows of the slice
  • the second dimension specifies the column or columns.

y = np.array([[1, 2, 3, 4, 5],[5, 6, 7, 8, 9]])
y
array([[1, 2, 3, 4, 5],
       [5, 6, 7, 8, 9]])
y.shape
(2, 5)


First Method of selecting 2 - Dimensional records ; shortest mode and preferred method

y[0:1,:3] # Selecting Rows and Columns (Easy Method of Slicing)
array([[1, 2, 3]])


Second Method of selecting 2 - Dimensional records

y[0:1,:][:,:3]
array([[1, 2, 3]])


Third Method of selecting 2 - Dimensional records

y[0:1][:,:3]
array([[1, 2, 3]])
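
The three methods return the same result, which the following sketch confirms. It also shows that using an integer index in place of the slice 0:1 drops the row dimension. The variable names are illustrative.

```python
import numpy as np

y = np.array([[1, 2, 3, 4, 5], [5, 6, 7, 8, 9]])

first = y[0:1, :3]
second = y[0:1, :][:, :3]
third = y[0:1][:, :3]
print(np.array_equal(first, second), np.array_equal(first, third))

# An integer index instead of the slice 0:1 drops the row dimension
row = y[0, :3]
print(first.shape, row.shape)

# A column slice across all rows
cols = y[:, 1:3]
print(cols)
```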


Special Array

  • ones
  • ones_like
  • zeros
  • zeros_like


ones

ones generates an array of 1s and is generally called with one argument, a tuple, containing the size of each dimension

x = np.ones((2,3,4)) # Pass tuple as an argument
x
array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])


ones_like

ones_like : Creates an array with the same dimensions as the argument and fills it with 1s. It is used when we want to initialize an array.

a = np.array([[1,2,3],[4,5,6]]) # Defined using Lists of List
a
array([[1, 2, 3],
       [4, 5, 6]])
y=np.ones_like(a)
y
array([[1, 1, 1],
       [1, 1, 1]])


zeros

Same as ones. The only difference is that it fills the array with zeros

x = np.zeros((2,3,4)) # Pass tuple as an argument
x
array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])


zeros_like

Same as ones_like. The only difference is that it fills the array with zeros

a = np.array([[1,2,3],[4,5,6]]) # Defined using Lists of List
a
array([[1, 2, 3],
       [4, 5, 6]])
y=np.zeros_like(a)
y
array([[0, 0, 0],
       [0, 0, 0]])
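
Note that ones and zeros produced floats in the outputs above, while the _like variants kept the integer type of a. The sketch below makes that explicit using the dtype argument; np.full is a related NumPy function not covered above, and the variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# ones and zeros default to float64; dtype overrides that
float_ones = np.ones((2, 3))
int_zeros = np.zeros((2, 3), dtype=int)

# The _like variants copy both shape and dtype from the argument
like = np.ones_like(a)

# np.full fills with any constant
sevens = np.full((2, 3), 7)

print(float_ones.dtype, int_zeros.dtype, like.dtype == a.dtype)
print(sevens)
```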


Reshape

  • Reshape transforms an array with one set of dimensions into one with a different set, preserving the number of elements.
  • Arrays with dimensions M by N can be reshaped into an array with dimensions K by L as long as M x N = K x L.
  • The most useful call to reshape switches an array into a vector or vice versa.
  • Can be used to flatten an array (basically convert it into a wide format)


x = np.array([[1,2],[3,4]])
x
array([[1, 2],
       [3, 4]])
y=np.reshape(x,(1,4)) # 1 row and 4 columns ; used to flatten the array
y
array([[1, 2, 3, 4]])
z=np.reshape(x,(4,1)) # 4 rows and 1 columns
z
array([[1],
       [2],
       [3],
       [4]])
x.shape
(2, 2)
y.shape
(1, 4)
z.shape
(4, 1)
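
Two details follow from the rules above. Passing -1 for one dimension lets NumPy infer it from the element count, and a target shape with a different element count is rejected. This is a minimal sketch; the variable names are illustrative.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

flat = x.reshape(-1)        # one-dimensional, shape (4,)
column = x.reshape(-1, 1)   # -1 is inferred as 4
print(flat.shape, column.shape)

# The element count must match, otherwise reshape raises ValueError
try:
    x.reshape(3, 2)
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)
```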


Ravel

  • It is also used to flatten the array
  • ravel avoids copying the underlying data when possible, so it is usually faster than flatten, which always returns a copy


Let's randomly populate values in an array

c = np.random.rand(2,3)
c
array([[0.53764445, 0.21785175, 0.82587626],
       [0.11052173, 0.48502006, 0.93388508]])

# Let's transpose
c.T.ravel()
array([0.53764445, 0.11052173, 0.21785175, 0.48502006, 0.82587626,
       0.93388508])
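
The difference between a view and a copy can be observed with np.shares_memory. The sketch below uses a small integer array instead of random values so the results are reproducible; the variable names are illustrative.

```python
import numpy as np

c = np.arange(6).reshape(2, 3)

r = c.ravel()      # view when the memory layout allows it
f = c.flatten()    # always a copy
print(np.shares_memory(c, r), np.shares_memory(c, f))

# Writing through the view changes c; the copy stays independent
r[0] = 100
print(c[0, 0], f[0])

# The transpose is not contiguous in row order, so ravel has to copy here
t = c.T.ravel()
print(np.shares_memory(c, t))
```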


arange

It is used to create an array of evenly spaced values over a range


np.arange(5)
array([0, 1, 2, 3, 4])
list(np.arange(5))
[0, 1, 2, 3, 4]
x = np.reshape(np.arange(6),(2,3)) # np.arange is an array
x
array([[0, 1, 2],
       [3, 4, 5]])
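
Like the built-in range, arange also accepts a start, a stop and a step. A short sketch with illustrative variable names:

```python
import numpy as np

from_two = np.arange(2, 10)         # start and stop (stop is excluded)
by_three = np.arange(2, 10, 3)      # start, stop and step
quarters = np.arange(0, 1, 0.25)    # float steps are allowed

print(from_two)
print(by_three)
print(quarters)

# Combined with reshape, as in the example above
evens = np.arange(0, 12, 2).reshape(2, 3)
print(evens)
```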


split, vsplit and hsplit

x = np.reshape(np.arange(20),(4,5))
x
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
a = np.split(x,2,axis = 0) # axis = 0 splits along the rows (the reverse of vertical stacking)
a
[array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]), array([[10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])]
a[0]
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

split with axis = 0 divides the 4 rows into 2 equal parts and returns them as a list of arrays; a[0] is the first part

x = np.reshape(np.arange(20),(4,5))
x
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
y = np.vsplit(x,2) # Output is a list of arrays
y
[array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]), array([[10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])]

Same as the split with axis=0


x = np.reshape(np.arange(20),(4,5))
x
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
z = np.hsplit(x,[1,3]) # Will break into 3 parts [0:1], [1:3] and [3:]
z
[array([[ 0],
       [ 5],
       [10],
       [15]]), array([[ 1,  2],
       [ 6,  7],
       [11, 12],
       [16, 17]]), array([[ 3,  4],
       [ 8,  9],
       [13, 14],
       [18, 19]])]
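
As a sketch of how the split functions relate to stacking: hstack reverses hsplit, and np.split requires an equal division, whereas np.array_split (a related NumPy function not covered above) allows unequal parts. The variable names are illustrative.

```python
import numpy as np

x = np.arange(20).reshape(4, 5)

left, middle, right = np.hsplit(x, [1, 3])
print(left.shape, middle.shape, right.shape)

# hstack reverses hsplit
restored = np.hstack((left, middle, right))
print(np.array_equal(restored, x))

# split needs an equal division; 5 columns cannot form 2 equal parts
try:
    np.split(x, 2, axis=1)
    uneven = False
except ValueError:
    uneven = True

# array_split allows unequal parts
parts = np.array_split(x, 2, axis=1)
print(uneven, [p.shape for p in parts])
```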


Sorting an array

a = np.random.random((5,4))
a
array([[0.50773984, 0.46492856, 0.25738772, 0.60823973],
       [0.25928653, 0.52348181, 0.8193148 , 0.45929712],
       [0.60380532, 0.55430883, 0.48055292, 0.19994516],
       [0.84830118, 0.94296168, 0.80052804, 0.67806645],
       [0.81694176, 0.47756603, 0.91126107, 0.85124623]])
a.sort()
a
array([[0.25738772, 0.46492856, 0.50773984, 0.60823973],
       [0.25928653, 0.45929712, 0.52348181, 0.8193148 ],
       [0.19994516, 0.48055292, 0.55430883, 0.60380532],
       [0.67806645, 0.80052804, 0.84830118, 0.94296168],
       [0.47756603, 0.81694176, 0.85124623, 0.91126107]])


By default, sort works along the last axis, so the values within each row are sorted


a = np.random.random((5,4))
a
array([[0.41756356, 0.67744361, 0.77485213, 0.10424094],
       [0.11203566, 0.55193574, 0.76895605, 0.37702381],
       [0.83575451, 0.75336271, 0.18544628, 0.32940647],
       [0.88554859, 0.10787863, 0.56780058, 0.21357942],
       [0.7273673 , 0.80553625, 0.22282339, 0.82791999]])
a.sort(axis = 0)
a
array([[0.11203566, 0.10787863, 0.18544628, 0.10424094],
       [0.41756356, 0.55193574, 0.22282339, 0.21357942],
       [0.7273673 , 0.67744361, 0.56780058, 0.32940647],
       [0.83575451, 0.75336271, 0.76895605, 0.37702381],
       [0.88554859, 0.80553625, 0.77485213, 0.82791999]])


axis=0 sorts the values within each column


a = np.random.random((5,4))
a
array([[0.95475834, 0.93488635, 0.56371189, 0.90189626],
       [0.82792212, 0.4956901 , 0.06061518, 0.61593093],
       [0.13823703, 0.09479956, 0.81690228, 0.87854127],
       [0.05530786, 0.08058938, 0.44211145, 0.19927479],
       [0.68592515, 0.2390315 , 0.79272115, 0.61011319]])
a.sort(axis = 1)
a
array([[0.56371189, 0.90189626, 0.93488635, 0.95475834],
       [0.06061518, 0.4956901 , 0.61593093, 0.82792212],
       [0.09479956, 0.13823703, 0.81690228, 0.87854127],
       [0.05530786, 0.08058938, 0.19927479, 0.44211145],
       [0.2390315 , 0.61011319, 0.68592515, 0.79272115]])


a = np.random.random((5,4))
a
array([[0.13042987, 0.69811332, 0.21264715, 0.79817816],
       [0.62537929, 0.79332926, 0.55932878, 0.26451115],
       [0.846183  , 0.67627071, 0.3874921 , 0.28636519],
       [0.16006374, 0.98871903, 0.48173074, 0.02783769],
       [0.24465369, 0.29171198, 0.21245599, 0.59930933]])
np.sort(a) # returns a sorted copy; a itself is left unchanged
array([[0.13042987, 0.21264715, 0.69811332, 0.79817816],
       [0.26451115, 0.55932878, 0.62537929, 0.79332926],
       [0.28636519, 0.3874921 , 0.67627071, 0.846183  ],
       [0.02783769, 0.16006374, 0.48173074, 0.98871903],
       [0.21245599, 0.24465369, 0.29171198, 0.59930933]])
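
Random data makes the sorting results hard to verify by eye, so the sketch below repeats the same calls on a small fixed array. np.argsort and axis=None are additional NumPy options not used above, and the variable names are illustrative.

```python
import numpy as np

a = np.array([[3, 1, 2], [0, 7, 8]])

by_row = np.sort(a)                 # default axis=-1: each row sorted
by_col = np.sort(a, axis=0)         # each column sorted
flat = np.sort(a, axis=None)        # flattened, then sorted
descending = np.sort(a)[:, ::-1]    # reverse each sorted row
order = np.argsort(a[0])            # indices that would sort the first row

print(by_row)
print(by_col)
print(flat)
print(descending)
print(order)

# np.sort returned copies, so a still holds its original values;
# the sort method works in place and returns None
result = a.sort()
print(result, a)
```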

Final Comments

In this blog we saw how to create arrays using the numpy library. We also looked at various aspects such as indexing, slicing, matrix creation, reshaping and sorting. It is very important to understand how to use and manipulate arrays, as this will be required while working with data frames.

