Tuesday, May 19, 2020

Python Blog 2: Handling String

Handling String in Python


Introduction

In this blog we will look at some of the most important functions for analysing a string. We will cover the following functionality:

  • Dividing string into chunks
  • Replacing a given sequence with another
  • Removing leading and trailing spaces
  • Finding a string pattern

Creating a string

Let's create a string and check its length


str1 = "Sample string"
print("Text in str1 is '",str1,"' and the length ",len(str1))
 
Text in str1 is ' Sample string ' and the length  13


Formatting a string using the % operator with print

weight = 75
name = 'Parag'
print(" My Name is : %s \t weight is :%i" %(name,weight))
 My Name is : Parag      weight is :75

We can see that the % operator lets us format values into a string before print displays it. The following format specifiers can be used to control how each value is represented in the output:

  • %d : Integer
  • %s : string
  • %f : float
  • %i : integer
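
As a small sketch of the specifiers listed above, %f also accepts an optional precision such as %.2f. The height variable below is a made-up value used only for illustration. The same line can also be written with an f-string (Python 3.6 and later), which is shown for comparison.

```python
name = 'Parag'
weight = 75
height = 1.7526  # hypothetical value, used only to illustrate %f

# %-formatting: %.2f rounds the float to two decimal places
old_style = "Name: %s, weight: %d, height: %.2f" % (name, weight, height)
print(old_style)

# The same output built with an f-string
new_style = f"Name: {name}, weight: {weight:d}, height: {height:.2f}"
print(new_style)
```

Both lines print the same text, so the choice between the two styles is mostly a matter of readability.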


Concatenation Operation using “+”

str1 = "Sample string"
str2 = " Second string"
print(str1+str2)
Sample string Second string


Repetition using asterisk

str1 = "Sample string"
str2 = " Second string"
print(str1*2 + str2*3)
Sample stringSample string Second string Second string Second string


Using “in” & “not in” membership

str1 = "Sample string"
flag1='a' in str1
print(flag1)
True
flag2='x' not in str1
print(flag2)
True
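
The membership test is not limited to single characters. A minimal sketch, reusing the same str1, showing that it also works for substrings and that it is case sensitive:

```python
str1 = "Sample string"

# Membership works for whole substrings, not just single characters
has_word = 'string' in str1
print(has_word)        # True

# The check is case sensitive: "sample" does not match "Sample"
has_lower = 'sample' in str1
print(has_lower)       # False

# Lowercasing the text first gives a case-insensitive check
has_any_case = 'sample' in str1.lower()
print(has_any_case)    # True
```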


Using double quotes and backslash in strings

If the string itself contains double quotes, we can escape them with a backslash


print(" Use backslash when string needs to have quotes \"Special\"")
 Use backslash when string needs to have quotes "Special"
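
An alternative to the backslash, sketched below, is to wrap the string in single quotes so that the double quotes inside need no escaping. The sample text is made up for illustration.

```python
# Single quotes outside, double quotes inside: no backslash needed
s1 = 'Use single quotes outside to keep "Special" as is'

# The backslash version of the same text, for comparison
s2 = "Use single quotes outside to keep \"Special\" as is"

print(s1)
print(s1 == s2)  # both spellings produce the same string
```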


Using “for loop” to iterate through the strings

str1 = "Sample string"
for i in str1:
    print(i,end =" ")
S a m p l e   s t r i n g 
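
If the position of each character is needed as well, the built-in enumerate function can be used in the same loop. A minimal sketch with a shorter sample string:

```python
str1 = "Sample"

positions = []
for index, char in enumerate(str1):
    # enumerate yields (index, character) pairs, starting at index 0
    positions.append((index, char))
    print(index, char)
```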


String Functions

We will look at the string manipulation functions below. These are among the most widely used in day-to-day programming tasks

  • lower
  • upper
  • capitalize
  • join
  • strip
  • split
  • rsplit
  • lstrip
  • rstrip
  • find
  • replace


lower


x = "this IS A strINg "
print(x.lower())
this is a string 


upper

x = "this IS A strINg "
print(x.upper())
THIS IS A STRING 
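
capitalize

capitalize appears in the list of functions but has no example of its own, so here is a minimal sketch using the same sample string. It upper-cases the first character and lower-cases the rest.

```python
x = "this IS A strINg "

# First character becomes upper case, everything else lower case
capitalized = x.capitalize()
print(capitalized)
```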


join method to concatenate two strings


a = "Python join method"
b = "is a powerful method"
print(';'.join([a,b]))
Python join method;is a powerful method

a = "Python join method"
b = "is a powerful method"
print(','.join([a,b]))
Python join method,is a powerful method


using list with join function


words = ['Python','is','suitable','for','ML']
print(' '.join(words))
Python is suitable for ML
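
join only accepts strings. As a hedged sketch, if the list holds other types (the mixed list below is a made-up example), each element can be converted with str first:

```python
values = ['Python', 3, 8.5]  # hypothetical mixed list for illustration

# ' '.join(values) would raise a TypeError because 3 and 8.5 are not strings,
# so each element is converted with str() before joining
joined = ' '.join(str(v) for v in values)
print(joined)
```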


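Using ‘rstrip’ to remove trailing spaces from a string

The loop below appends a space after every word, and the last element of the words list already ends with a space, so the result has extra trailing spaces. We can use the rstrip function to remove them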

str_t = ""
words = ['Python','is','suitable','for','ML ']
for i in words:
    str_t =str_t+i+" "
    
print(str_t)
Python is suitable for ML  
print(str_t.rstrip(' '))
Python is suitable for ML

Running a similar example on a tuple


words_t = ('Python','is','suitable','for','ML ')
print(', '.join(words_t))
Python, is, suitable, for, ML 
ls=', '.join(words_t)
print(ls.rstrip(' '))
Python, is, suitable, for, ML
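
Another option, shown here only as a sketch, is to clean each element before joining, so that no stray spaces reach the final string in the first place:

```python
words = ['Python', 'is', 'suitable', 'for', 'ML ']

# strip() each word first, then join with a single space
cleaned = ' '.join(w.strip() for w in words)
print(repr(cleaned))  # repr makes any leftover spaces visible
```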


split and rsplit

The output of split is a list

a = "Python join method with split and rsplit"
print(a.split(' '))
['Python', 'join', 'method', 'with', 'split', 'and', 'rsplit']

We can restrict the number of splits


a = "Python join method with split and rsplit"
print(a.split(' ',2))
['Python', 'join', 'method with split and rsplit']

As seen above, passing 2 as the second parameter performs at most two splits, counted from the left. The result is a list with 3 elements: the first two words, followed by the rest of the string left unsplit
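
Two related behaviours are sketched below: a limited split can be unpacked directly into variables, and calling split with no arguments treats any run of whitespace as one separator. The messy string is a made-up example.

```python
a = "Python join method with split and rsplit"

# At most two splits from the left gives exactly three pieces
first, second, rest = a.split(' ', 2)
print(first, '|', second, '|', rest)

# With repeated spaces, split(' ') produces empty strings...
messy = "Python   join  method"
with_empties = messy.split(' ')
print(with_empties)

# ...while split() with no arguments collapses runs of whitespace
without_empties = messy.split()
print(without_empties)
```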


split on multiple delimiters

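Here we will use the split function from the ‘re’ library. In the pattern, the delimiters are separated by the pipe character (|), which means "or" in a regular expression. The asterisk has a special meaning in regular expressions, so it is escaped with a backslash. Note that /n in the sample string is a literal slash followed by n, not a newline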

c = " This is ; a string: with * multiple /n delimiters"

# Importing the re library
import re
# A raw string (r'...') passes the backslash in \* to the regex engine unchanged
print(re.split(r';|:|\*|/n|/',c))
[' This is ', ' a string', ' with ', ' multiple ', ' delimiters']
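
The pieces above still carry the spaces that surrounded each delimiter. One possible way to drop them, sketched here as an assumption rather than the only approach, is to let the pattern consume optional whitespace (\s*) on both sides of each delimiter:

```python
import re

c = " This is ; a string: with * multiple /n delimiters"

# \s* on both sides absorbs the spaces around each delimiter;
# (?:...) groups the alternatives without adding them to the result
pattern = r'\s*(?:;|:|\*|/n)\s*'
parts = re.split(pattern, c.strip())
print(parts)
```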


rsplit : scans the string from the end

The r in rsplit stands for right: it scans the string from the right end and performs its splits from there

a = "Python join method with split and rsplit"
print(a.rsplit(' '))
['Python', 'join', 'method', 'with', 'split', 'and', 'rsplit']

From this output you might think there is no difference between split and rsplit. Let's look at the example below to understand the difference



a = "Python join method with split and rsplit"
print(a.rsplit(' ',2))
['Python join method with split', 'and', 'rsplit']

The difference becomes clear once we specify the number of splits: rsplit performs its splits from the right, so the unsplit remainder ends up as the first element of the list
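
A common use of this behaviour is separating the last field from everything before it. The sketch below uses a made-up file name to pull off the extension with a single split from the right:

```python
filename = "report.final.csv"  # hypothetical file name for illustration

# One split from the right separates the extension from the rest
base, extension = filename.rsplit('.', 1)
print(base, '|', extension)

# One split from the left cuts at the first dot instead
from_left = filename.split('.', 1)
print(from_left)
```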


strip, lstrip and rstrip

These functions are used during text mining and data cleaning, for example to remove leading and trailing spaces or any other set of characters.

  • strip - removes leading and trailing whitespace
  • lstrip - removes leading whitespace
  • rstrip - removes trailing whitespace

a = "    Python join method with split and rsplit   "
print(a.strip())
Python join method with split and rsplit
a = "    Python join method with split and rsplit   "
print(a.lstrip())
Python join method with split and rsplit   
a = "    Python join method with split and rsplit   "
print(a.rstrip())
    Python join method with split and rsplit
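
strip, lstrip and rstrip also accept an optional argument listing the characters to remove. The sketch below uses made-up strings. Note that the argument is treated as a set of characters, not as a prefix or suffix, and that characters in the middle of the string are never touched.

```python
b = "***Python join***"

# Remove asterisks from both ends
no_stars = b.strip('*')
print(no_stars)

c = "xyxPythonyx"

# Any mix of 'x' and 'y' is removed from both ends;
# the 'y' inside "Python" stays because it is not at an end
no_xy = c.strip('xy')
print(no_xy)
```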


find and rfind function

  • find returns the index of the first occurrence of the substring
  • rfind returns the index of the last occurrence of the substring
  • Both return -1 if the substring is not found

a = "Python join method with split and with rsplit"
print(a.find("join"))
7

We got 7 as the result because "join" starts at index position 7 (counting from 0)

a = "Python join method with split and with rsplit"
print(a.find("joi"))
7

We still get 7 because find returns the index where the match begins, and "joi" begins at the same position as "join"

Let's retrieve the rest of the string, starting from the index where the match is found


a = "Python join method with split and with rsplit"
print(a[a.find("join"):])
join method with split and with rsplit


Let's create a function to find a string in the text


def find_str(source_string, search_string):
    # Look up the position once and reuse it below
    index = source_string.find(search_string)
    if index != -1:
        print("string found at index: %d \t String is :%s" % (index, source_string[index:]))
    else:
        print("String not found")
        
a = "Python join method with split and with rsplit"
find_str(a,"method")
string found at index: 12    String is :method with split and with rsplit
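
rfind has not been shown yet, so here is a minimal sketch on the same string. The word "with" occurs twice, which makes the difference between find and rfind visible. The search term "xyz" is made up to show the not-found case.

```python
a = "Python join method with split and with rsplit"

# find reports the first occurrence, rfind the last one
first_with = a.find("with")
last_with = a.rfind("with")
print(first_with, last_with)

# Both return -1 when the substring is absent
missing = a.rfind("xyz")
print(missing)
```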


count : counts the number of occurrences of a substring


a = "Python join method with split and with rsplit"
print(a.count('with',1,30))
1

We got 1 because the second occurrence of ‘with’ starts at index 34, which is beyond the end position of 30


a = "Python join method with split and with rsplit"
print(a.count('with',1,50))
2

Once we increased the index range, we were able to capture the other ‘with’ as well
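
The start and end positions are optional. A short sketch of two further details: without a range, count searches the whole string, and it only counts non-overlapping matches. The "aaaa" string is a made-up example.

```python
a = "Python join method with split and with rsplit"

# No start/end given: the whole string is searched
total = a.count('with')
print(total)

# Matches are not allowed to overlap: "aaaa" holds "aa" twice, not three times
overlap_check = "aaaa".count("aa")
print(overlap_check)
```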


replace

a = "Python join method with split and with rsplit"
print(a.replace(" ",""))
Pythonjoinmethodwithsplitandwithrsplit
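
replace also takes an optional third argument that limits how many occurrences are changed. In the sketch below, the replacement word "using" is chosen only for illustration. Because strings are immutable, replace returns a new string and leaves the original untouched.

```python
a = "Python join method with split and with rsplit"

# Replace only the first occurrence of "with"
first_only = a.replace("with", "using", 1)
print(first_only)

# The original string is unchanged
print(a)
```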


Final Comments

In this blog we saw how to work with text strings and how to use the different functions associated with them. This should improve our understanding of how to manipulate a string, convert it into a list, use the join function, and use the split function from the re (regular expression) library
