Array in Python using numpy library
Parag Verma
Introduction
In this blog we will look at arrays and how to store data in them. Broadly, we will cover the following points
- Initialization of Arrays
- Indexing in Array
- Multidimensional Array
- Matrix and associated functions
- Other functions like ones, ones_like etc
Importing the Library
To use arrays, we need to import the numpy library as shown below. We will use the alias np to refer to it
import numpy as np
Creating Array using list
x = [1,2,3,4,5]
y =np.array(x)
print(y)
[1 2 3 4 5]
print(type(y))
<class 'numpy.ndarray'>
We can see that an ndarray y has been created from the list ‘x’. Printing the type of y shows numpy.ndarray
Creating Array using tuple
x = (1,2,3,4,5)
y = np.array(x)
print(y)
[1 2 3 4 5]
print(type(y))
<class 'numpy.ndarray'>
Getting the dimensions of the array using shape
l2 = [[1,2,3],[4,5,6],[6,7,8]]
l3 = np.array(l2)
print(l3) #data is stored row wise
[[1 2 3]
[4 5 6]
[6 7 8]]
print(l3.shape)
(3, 3)
The array has 3 rows, each containing 3 elements, so its shape is (3, 3)
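As a small illustrative sketch (the variable name grid is ours, not from the original example), shape can be read together with ndim and size, two other standard ndarray attributes:

```python
import numpy as np

# Hypothetical 3 x 3 array with the same values as l3 above
grid = np.array([[1, 2, 3], [4, 5, 6], [6, 7, 8]])

print(grid.shape)  # (rows, columns)
print(grid.ndim)   # number of dimensions
print(grid.size)   # total number of elements
```

The product of the entries in shape always equals size.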
Getting the dimensions of the 1 Dimensional array using shape
x = (1,2,3,4,5)
y = np.array(x)
y.shape
(5,)
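It is a single dimensional array. It is worth noting that an array built from nested lists of varying lengths is treated as 1 dimensional. NumPy 1.24 and later raise a ValueError for such input unless dtype=object is passed. Let's look at an example
l2 = [[1,2],[4,5,6],[6,7,8,4]]
l3 = np.array(l2, dtype=object)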
l3 #data is stored row wise
array([list([1, 2]), list([4, 5, 6]), list([6, 7, 8, 4])], dtype=object)
l3.shape
(3,)
There are 3 elements in the array, and since they have different lengths, each is stored as a list object, which makes the array single dimensional
Arrays using nested lists
Two or higher dimensional arrays are initialised using nested lists
y = np.array([[[1,2],[3,4]],[[5,6],[7,8]]]) # [[1,2],[3,4]] as one row
#Lets access the First element and items within it
y[0],y[0][0],y[0][1]
(array([[1, 2],
[3, 4]]), array([1, 2]), array([3, 4]))
#Lets access the Second element and items within it
y[1],y[1][0],y[1][1]
(array([[5, 6],
[7, 8]]), array([5, 6]), array([7, 8]))
# Shape of y
y.shape
(2, 2, 2)
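The chained form y[1][0] used above has a shorter equivalent in which the indices are separated by commas. A minimal sketch, reusing the same y (the names chained and direct are only for illustration):

```python
import numpy as np

y = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# Chained indexing and comma-separated indexing select the same items
chained = y[1][0]
direct = y[1, 0]
print(chained, direct)

# A single scalar needs one index per dimension
print(y[1, 0, 1])
```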
Array data types: Data types of elements stored in it
There are 4 data types that are typically used
- int32 (the default integer type is platform dependent and is often int64)
- float64
- <U1 for single characters; <U5 if the maximum length of a string is 5
- complex
Let's look at some examples to understand this. As complex is rarely used, we won't discuss it
int32 and float64
x = [0, 1, 2, 3, 4]
x1 = np.array(x)
x1.dtype
dtype('int32')
y = [0, 1.3, 2.5, 3, 4]
y1 = np.array(y)
y1.dtype
dtype('float64')
string
y = ['data','string','science']
y1 = np.array(y)
y1.dtype
dtype('<U7')
It gives <U7 because the longest string element (science) has 7 characters
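The default integer dtype depends on the platform and the NumPy version, so the int32 shown above may appear as int64 elsewhere. The sketch below, with illustrative variable names, shows how a dtype can be requested explicitly and how astype converts an existing array:

```python
import numpy as np

# Request a dtype explicitly instead of relying on the platform default
ints = np.array([0, 1, 2, 3, 4], dtype=np.int64)
print(ints.dtype)

# Mixing ints and floats upcasts everything to float64
mixed = np.array([0, 1.3, 2.5, 3, 4])
print(mixed.dtype)

# astype returns a converted copy; the fractional part is dropped
truncated = mixed.astype(np.int64)
print(truncated)

# The string width follows the longest element
words = np.array(['data', 'string', 'science'])
print(words.dtype)
```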
Array Operation
Arrays follow element by element operations
a = np.array([1,2,3,4])
a
array([1, 2, 3, 4])
a*a
array([ 1, 4, 9, 16])
a.shape # 4 elements, so the shape is (4,)
(4,)
type(a.shape)
<class 'tuple'>
b = np.array([[1,2,3],[4,5,6]])
b.shape
(2, 3)
b*b
array([[ 1, 4, 9],
[16, 25, 36]])
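Because * multiplies position by position, it should not be confused with the matrix product. A short sketch, assuming the same a and b as above, contrasts the element-wise operators with the @ operator:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([[1, 2, 3], [4, 5, 6]])

# Arithmetic operators act on matching positions
print(a + a)
print(a * 2)      # a scalar is applied to every element

# The @ operator is the matrix product, not element-wise multiplication
print(b @ b.T)    # (2, 3) @ (3, 2) -> (2, 2)
```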
Matrix using the numpy library
b = np.array([[1,2,3],[4,5,6]])
c = np.asmatrix(b)
c
matrix([[1, 2, 3],
[4, 5, 6]])
type(c)
<class 'numpy.matrixlib.defmatrix.matrix'>
b.shape # Read as number of rows, number of columns
(2, 3)
Matrix with uneven values
c = np.array([[1,2,3],[4,5],[4,6,7,8]], dtype=object) # dtype=object is required for uneven rows in recent NumPy
c
array([list([1, 2, 3]), list([4, 5]), list([4, 6, 7, 8])], dtype=object)
c.shape # Considered as single dimension
(3,)
d = np.asmatrix(c)
d
matrix([[list([1, 2, 3]), list([4, 5]), list([4, 6, 7, 8])]], dtype=object)
d.shape # we can see the difference between shape of c and d
(1, 3)
Matrix can only be 2 Dimensional
y = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
y
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
y.shape
(2, 2, 2)
# Creating matrix using y will give an error
# np.asmatrix(y) # Matrix can only be 2 Dimensional
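The failure mentioned in the comment can be observed safely by catching the exception. This is a sketch that assumes NumPy raises a ValueError here; the exact message may differ between versions, and the flag name converted is only illustrative:

```python
import numpy as np

y = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

try:
    np.asmatrix(y)            # 3-dimensional input
    converted = True
except ValueError as err:
    converted = False
    print("Could not convert:", err)

# Each 2-dimensional slice of y converts without problems
m = np.asmatrix(y[0])
print(m.shape)
```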
Array Concatenation: using axis argument
x = np.array([[1.0,2.0],[3.0,4.0]])
y = np.array([[5.0,6.0],[7.0,8.0]])
x
array([[1., 2.],
[3., 4.]])
y
array([[5., 6.],
[7., 8.]])
# axis = 0 :Concatenating along the rows /indexes
np.concatenate((x,y),axis = 0)
array([[1., 2.],
[3., 4.],
[5., 6.],
[7., 8.]])
# axis = 1 :Concatenating along the columns
np.concatenate((x,y),axis = 1)
array([[1., 2., 5., 6.],
[3., 4., 7., 8.]])
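One way to keep the two axes apart is to look at the resulting shapes. In this sketch (rows and cols are illustrative names), only the dimension along the chosen axis grows:

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0, 6.0], [7.0, 8.0]])

rows = np.concatenate((x, y), axis=0)
cols = np.concatenate((x, y), axis=1)

# Only the dimension along the chosen axis grows
print(rows.shape)   # rows are appended
print(cols.shape)   # columns are appended
```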
Array Concatenation: using vstack and hstack
x = np.array([[1.0,2.0],[3.0,4.0]])
y = np.array([[5.0,6.0],[7.0,8.0]])
x
array([[1., 2.],
[3., 4.]])
y
array([[5., 6.],
[7., 8.]])
np.vstack((x,y)) # Same as np.concatenate with axis = 0 ; vertical stacking
array([[1., 2.],
[3., 4.],
[5., 6.],
[7., 8.]])
np.hstack((x,y)) # Same as np.concatenate with axis = 1 ; horizontal stacking
array([[1., 2., 5., 6.],
[3., 4., 7., 8.]])
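The equivalence with concatenate can be checked directly. The sketch below also covers 1-dimensional inputs, where the two helpers behave differently (p, q and the other new names are only for illustration):

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0, 6.0], [7.0, 8.0]])

# For 2-dimensional inputs the stack helpers match concatenate
same_v = np.array_equal(np.vstack((x, y)), np.concatenate((x, y), axis=0))
same_h = np.array_equal(np.hstack((x, y)), np.concatenate((x, y), axis=1))
print(same_v, same_h)

# For 1-dimensional inputs the results differ in shape
p = np.array([1, 2])
q = np.array([3, 4])
v1 = np.vstack((p, q))   # each input becomes a row
h1 = np.hstack((p, q))   # inputs are joined end to end
print(v1.shape, h1.shape)
```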
Array Slicing
Arrays, like lists and tuples, can be sliced. In this section we will take a detailed look at the different methods that can be used to subset an array
1 - Dimensional Slicing
x = [1,2,3,4,5,6,7,8]
x[:] # Prints all the elements
[1, 2, 3, 4, 5, 6, 7, 8]
x[2:5] # Prints elements from 2 to 4 index position
[3, 4, 5]
x[2::2] # Prints elements starting from 2 index position with a leap of 2 positions
[3, 5, 7]
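The example above slices a plain Python list. The same syntax works on a NumPy array, with one difference worth knowing: a basic slice of an array is a view on the original data rather than a copy. A minimal sketch with illustrative names:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

print(arr[2:5])    # index positions 2 to 4
print(arr[2::2])   # every second element from position 2
print(arr[::-1])   # negative step reverses the order

# Unlike a list slice, an array slice is a view on the same data
part = arr[2:5]
part[0] = 99
print(arr)         # position 2 of the original now holds 99
```

If an independent copy is needed, arr[2:5].copy() can be used instead.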
2- Dimensional Slicing
In 2-dimensional arrays,
- the first dimension specifies the row or rows of the slice
- the second dimension specifies the column or columns.
y = np.array([[1, 2, 3, 4, 5],[5, 6, 7, 8, 9]])
y
array([[1, 2, 3, 4, 5],
[5, 6, 7, 8, 9]])
y.shape
(2, 5)
First Method of selecting 2 - Dimensional records ; shortest mode and preferred method
y[0:1,:3] # Selecting Rows and Columns (Easy Method of Slicing)
array([[1, 2, 3]])
Second Method of selecting 2 - Dimensional records
y[0:1,:][:,:3]
array([[1, 2, 3]])
Third Method of selecting 2 - Dimensional records
y[0:1][:,:3]
array([[1, 2, 3]])
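The three methods can be confirmed to select the same data. The sketch below (first, second and third are illustrative names) also shows how an integer index differs from a one-element slice:

```python
import numpy as np

y = np.array([[1, 2, 3, 4, 5], [5, 6, 7, 8, 9]])

first = y[0:1, :3]
second = y[0:1, :][:, :3]
third = y[0:1][:, :3]
print(np.array_equal(first, second), np.array_equal(first, third))

# An integer index drops that dimension, a slice keeps it
print(y[0, :3].shape)    # 1-dimensional result
print(y[0:1, :3].shape)  # 2-dimensional result with one row
```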
Special Array
- ones
- ones_like
- zeros
- zeros_like
ones
ones generates an array of 1s and is generally called with one argument, a tuple, containing the size of each dimension
x = np.ones((2,3,4)) # Pass tuple as an argument
x
array([[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]],
[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]]])
ones_like
ones_like : Creates an array with the same dimensions as the array given in the argument and fills it with 1's. Used when we want to initialize an array
a = np.array([[1,2,3],[4,5,6]]) # Defined using a list of lists
a
array([[1, 2, 3],
[4, 5, 6]])
y=np.ones_like(a)
y
array([[1, 1, 1],
[1, 1, 1]])
zeros
Same as ones. The only difference is that it populates the array with zeros
x = np.zeros((2,3,4)) # Pass tuple as an argument
x
array([[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]])
zeros_like
Same as ones_like. The only difference is that it populates the array with zeros
a = np.array([[1,2,3],[4,5,6]]) # Defined using a list of lists
a
array([[1, 2, 3],
[4, 5, 6]])
y=np.zeros_like(a)
y
array([[0, 0, 0],
[0, 0, 0]])
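The outputs above hint at a detail: ones and zeros produce floats, while ones_like and zeros_like follow the dtype of their argument. A short sketch, with illustrative names, makes this explicit:

```python
import numpy as np

# ones and zeros default to float64; dtype overrides this
float_ones = np.ones((2, 3))
int_ones = np.ones((2, 3), dtype=int)
print(float_ones.dtype)
print(int_ones.dtype.kind)   # 'i' for a signed integer type

# The *_like functions copy both shape and dtype from their argument
template = np.array([[1.5, 2.5], [3.5, 4.5]])
z = np.zeros_like(template)
print(z.shape, z.dtype)
```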
Reshape
- Reshape transforms an array with one set of dimensions into one with a different set, preserving the number of elements.
- Arrays with dimensions M by N can be reshaped into an array with dimensions K by L as long as M*N = K*L.
- The most useful call to reshape switches an array into a vector or vice versa.
- Can be used to flatten an array(basically convert into a wide format)
x = np.array([[1,2],[3,4]])
x
array([[1, 2],
[3, 4]])
y=np.reshape(x,(1,4)) # 1 row and 4 columns ; used to flatten the array
y
array([[1, 2, 3, 4]])
z=np.reshape(x,(4,1)) # 4 rows and 1 columns
z
array([[1],
[2],
[3],
[4]])
x.shape
(2, 2)
y.shape
(1, 4)
z.shape
(4, 1)
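Two practical details follow from the rule that the element count must be preserved. The sketch below, using an illustrative 12-element array, shows that -1 lets NumPy infer one dimension and that an incompatible shape is rejected:

```python
import numpy as np

x = np.arange(12)

# -1 asks NumPy to infer that dimension from the element count
m = x.reshape(3, -1)
print(m.shape)

flat = m.reshape(-1)      # back to a 1-dimensional vector
print(flat.shape)

# The element count must be preserved, otherwise reshape fails
try:
    x.reshape(5, 3)       # 15 slots for 12 elements
    ok = True
except ValueError:
    ok = False
print(ok)
```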
Ravel
- It is also used to flatten the array
- ravel does not copy the underlying data if possible; therefore faster than flatten
Let's randomly populate values in an array
c = np.random.rand(2,3)
c
array([[0.53764445, 0.21785175, 0.82587626],
[0.11052173, 0.48502006, 0.93388508]])
# Let's transpose
c.T.ravel()
array([0.53764445, 0.11052173, 0.21785175, 0.48502006, 0.82587626,
0.93388508])
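The difference between ravel and flatten can be made visible with np.shares_memory. This is a sketch on a small contiguous array (names are illustrative); for other memory layouts, such as the transposed array above, ravel may have to copy:

```python
import numpy as np

c = np.arange(6).reshape(2, 3)

r = c.ravel()     # a view when the memory layout allows it
f = c.flatten()   # always a copy

print(np.shares_memory(c, r))
print(np.shares_memory(c, f))

# Writing through the view changes the original array
r[0] = 100
print(c[0, 0])
```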
arange
It is used to create an array of evenly spaced values over a range
np.arange(5)
array([0, 1, 2, 3, 4])
list(np.arange(5))
[0, 1, 2, 3, 4]
x = np.reshape(np.arange(6),(2,3)) # np.arange returns an array
x
array([[0, 1, 2],
[3, 4, 5]])
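Like Python's built-in range, arange also accepts a start, a stop and a step. A brief sketch with illustrative names:

```python
import numpy as np

up = np.arange(2, 10)         # start is included, stop is excluded
stepped = np.arange(2, 10, 3) # step of 3
down = np.arange(5, 0, -1)    # a negative step counts down

print(up)
print(stepped)
print(down)
```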
split, vsplit and hsplit
x = np.reshape(np.arange(20),(4,5))
x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
a = np.split(x,2,axis = 0) # axis = 0 splits along the rows into 2 equal blocks
a
[array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]]), array([[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])]
a[0]
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
x = np.reshape(np.arange(20),(4,5))
x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
y = np.vsplit(x,2) # Output is a list of arrays
y
[array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]]), array([[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])]
Same as the split with axis=0
x = np.reshape(np.arange(20),(4,5))
x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
z = np.hsplit(x,[1,3]) # Will break into 3 parts [0:1], [1:3] and [3:]
z
[array([[ 0],
[ 5],
[10],
[15]]), array([[ 1, 2],
[ 6, 7],
[11, 12],
[16, 17]]), array([[ 3, 4],
[ 8, 9],
[13, 14],
[18, 19]])]
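Splitting is the reverse of stacking, which the sketch below checks. It also shows what happens when an integer count does not divide the axis evenly, and np.array_split as one way around that (names such as parts and even are illustrative):

```python
import numpy as np

x = np.reshape(np.arange(20), (4, 5))

# Joining the pieces again recovers the original array
parts = np.hsplit(x, [1, 3])
print([p.shape for p in parts])
print(np.array_equal(np.hstack(parts), x))

# An integer count must divide the axis evenly: 5 columns cannot give 2 equal parts
try:
    np.hsplit(x, 2)
    even = True
except ValueError:
    even = False
print(even)

# array_split allows unequal pieces
uneven = np.array_split(x, 2, axis=1)
print([p.shape for p in uneven])
```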
Sorting an array
a = np.random.random((5,4))
a
array([[0.50773984, 0.46492856, 0.25738772, 0.60823973],
[0.25928653, 0.52348181, 0.8193148 , 0.45929712],
[0.60380532, 0.55430883, 0.48055292, 0.19994516],
[0.84830118, 0.94296168, 0.80052804, 0.67806645],
[0.81694176, 0.47756603, 0.91126107, 0.85124623]])
a.sort()
a
array([[0.25738772, 0.46492856, 0.50773984, 0.60823973],
[0.25928653, 0.45929712, 0.52348181, 0.8193148 ],
[0.19994516, 0.48055292, 0.55430883, 0.60380532],
[0.67806645, 0.80052804, 0.84830118, 0.94296168],
[0.47756603, 0.81694176, 0.85124623, 0.91126107]])
By default it sorts along the last axis, so the values within each row are sorted
a = np.random.random((5,4))
a
array([[0.41756356, 0.67744361, 0.77485213, 0.10424094],
[0.11203566, 0.55193574, 0.76895605, 0.37702381],
[0.83575451, 0.75336271, 0.18544628, 0.32940647],
[0.88554859, 0.10787863, 0.56780058, 0.21357942],
[0.7273673 , 0.80553625, 0.22282339, 0.82791999]])
a.sort(axis = 0)
a
array([[0.11203566, 0.10787863, 0.18544628, 0.10424094],
[0.41756356, 0.55193574, 0.22282339, 0.21357942],
[0.7273673 , 0.67744361, 0.56780058, 0.32940647],
[0.83575451, 0.75336271, 0.76895605, 0.37702381],
[0.88554859, 0.80553625, 0.77485213, 0.82791999]])
axis=0 sorts the values within each column
a = np.random.random((5,4))
a
array([[0.95475834, 0.93488635, 0.56371189, 0.90189626],
[0.82792212, 0.4956901 , 0.06061518, 0.61593093],
[0.13823703, 0.09479956, 0.81690228, 0.87854127],
[0.05530786, 0.08058938, 0.44211145, 0.19927479],
[0.68592515, 0.2390315 , 0.79272115, 0.61011319]])
a.sort(axis = 1)
a
array([[0.56371189, 0.90189626, 0.93488635, 0.95475834],
[0.06061518, 0.4956901 , 0.61593093, 0.82792212],
[0.09479956, 0.13823703, 0.81690228, 0.87854127],
[0.05530786, 0.08058938, 0.19927479, 0.44211145],
[0.2390315 , 0.61011319, 0.68592515, 0.79272115]])
a = np.random.random((5,4))
a
array([[0.13042987, 0.69811332, 0.21264715, 0.79817816],
[0.62537929, 0.79332926, 0.55932878, 0.26451115],
[0.846183 , 0.67627071, 0.3874921 , 0.28636519],
[0.16006374, 0.98871903, 0.48173074, 0.02783769],
[0.24465369, 0.29171198, 0.21245599, 0.59930933]])
np.sort(a)
array([[0.13042987, 0.21264715, 0.69811332, 0.79817816],
[0.26451115, 0.55932878, 0.62537929, 0.79332926],
[0.28636519, 0.3874921 , 0.67627071, 0.846183 ],
[0.02783769, 0.16006374, 0.48173074, 0.98871903],
[0.21245599, 0.24465369, 0.29171198, 0.59930933]])
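The last example uses np.sort instead of the sort method. The difference is that np.sort returns a sorted copy, while a.sort() changes the array in place. A sketch on a small fixed array (chosen here only so the result is reproducible):

```python
import numpy as np

a = np.array([[3, 1, 2], [9, 7, 8]])

s = np.sort(a)        # returns a sorted copy
print(s)
print(a)              # the original is unchanged

result = a.sort()     # sorts in place and returns None
print(result)
print(a)
```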
Final Comments
In this blog we saw how to create arrays using the numpy library. We also looked at various aspects such as indexing, slicing, matrix creation, reshaping, sorting etc. It is very important to understand how to use and manipulate arrays, as this will be required while working with data frames