Occupation Health and Safety

Explain the importance of regular expressions in data analytics.

Regular expressions, commonly abbreviated as “regex” or “regexp”, are sometimes described as fancy wildcards. They identify and define a search pattern that string-searching algorithms use for “find” and “find and replace” operations, for input validation, and for other string-oriented tasks. A regular expression can also be described as a specific sequence of characters that can match patterns either broadly or narrowly during data analysis (Zaki Rizvi, 2020).
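As a minimal sketch of broad versus narrow matching and of input validation, the Python snippet below uses the standard re module. The sample text, the five-digit rule, and the helper name is_valid_zip are assumptions made up for illustration and do not come from the cited sources.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Order 12345 shipped to ZIP 90210 on 2020-01-27."

# Broad pattern: any run of digits anywhere in the text ("find").
all_numbers = re.findall(r"\d+", text)
# all_numbers is ['12345', '90210', '2020', '01', '27']

# Narrow pattern: exactly five digits standing alone as a word.
five_digit = re.findall(r"\b\d{5}\b", text)
# five_digit is ['12345', '90210']


def is_valid_zip(value: str) -> bool:
    """Input validation: the whole input must be exactly five digits."""
    return re.fullmatch(r"\d{5}", value) is not None
```

The broad pattern picks up every number, including the parts of the date, while the narrow pattern keeps only the five-digit values. For validation, fullmatch is used so that extra characters before or after the digits cause a rejection.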

The essence of a search pattern is that it lets one use characters to describe the type of data in a text, which makes that data easy to search for. Regular expressions also help remove extraneous text from a data set, which in turn helps one thread emails properly and identify textual near-duplicates more accurately. To manipulate a data set, a regular expression is normally applied in several steps: identify and specify a pattern, compile the pattern string into a regular expression object, and use that object to replace any matches in the string (Kamath, 2020).
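The three steps above can be sketched in Python as follows. The task (stripping reply and forward prefixes from email subjects so that messages in one thread share a subject), the pattern, and the function name strip_reply_prefixes are illustrative assumptions, not taken from the cited sources.

```python
import re

# Step 1: identify and specify a pattern.
# Here: one or more leading "Re:" / "Fw:" / "Fwd:" prefixes.
prefix_pattern = r"^(?:(?:re|fwd?)\s*:\s*)+"

# Step 2: compile the pattern string into a regular expression object.
prefix_regex = re.compile(prefix_pattern, re.IGNORECASE)


# Step 3: use the compiled object to replace every match in the string.
def strip_reply_prefixes(subject: str) -> str:
    """Remove leading reply/forward prefixes from an email subject."""
    return prefix_regex.sub("", subject)


subjects = ["RE: Fwd: Budget", "re: re:  Budget", "Budget"]
threaded = [strip_reply_prefixes(s) for s in subjects]
# threaded is ['Budget', 'Budget', 'Budget']
```

Compiling once and reusing the object is convenient when the same pattern is applied to many rows of a data set. A subject such as "Regarding budget" is left unchanged, because the pattern requires a colon after the prefix.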

Also, describe the differences among the types of regular expressions. Choose two types of regular expressions and discuss the differences between the two. Please include two or three differences for each. Include how they help manipulate data.

The main types of regular expressions are:

  1. Extended regular expression (ERE): this syntax is supported by the POSIX C regular expression API as well as by the awk and egrep commands (Kamath, 2020). It is the greatly expanded notation that these utilities apply when they match patterns.
  2. Basic regular expression (BRE): this type uses a backslash to give certain metacharacters their special meaning. It does not support any form of alternation (InfoGuides, 2020).

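Differences

Extended regular expressions:

  1. Most characters match themselves; only a few special characters (metacharacters) are treated differently. The characters ( ) { } + ? and | are special without a backslash.
  2. The commands that use them are awk and egrep.
  3. The syntax is the one used by awk. Because it supports alternation with |, a single pattern can select the records of a data set that match any of several alternatives.

Basic regular expressions:

  1. The commands that use them are grep and sed.
  2. Characters match themselves unless a backslash gives them a special meaning, so grouping and repetition counts are written as \( \) and \{ \}.
  3. The syntax is the one used by grep and sed, without alternation. With sed, a pattern drives “find and replace” edits across every line of a data set.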
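The backslash difference between the two types can be illustrated with a small Python sketch. Python's re module implements neither POSIX BRE nor POSIX ERE, but its grouping, repetition, and alternation characters behave like ERE's for the simple cases shown, so the ERE-style patterns here are passed to re directly. The helper bre_to_python is hypothetical and deliberately simplified: it does not handle POSIX bracket classes such as [[:digit:]] or other corner cases.

```python
import re

# Characters that are operators in ERE when unescaped, but ordinary
# characters in BRE unless a backslash precedes them (simplified view).
ERE_ONLY_META = set("(){}+?|")


def bre_to_python(pattern: str) -> str:
    """Translate a simplified BRE-style pattern into Python re syntax.

    Hypothetical helper for illustration only, not a full POSIX translator.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in "(){}":
                out.append(nxt)         # \( \) \{ \} are operators in BRE
            else:
                out.append("\\" + nxt)  # keep any other escape unchanged
            i += 2
        elif ch in ERE_ONLY_META:
            out.append("\\" + ch)       # unescaped: a literal character in BRE
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Grouping: BRE writes \( \), ERE writes ( ).
bre_group = bre_to_python(r"\(ab\)*c")   # becomes "(ab)*c"
ere_group = r"(ab)*c"

# Alternation: ERE treats | as "or"; in BRE an unescaped | is a literal bar.
ere_alt_matches_b = re.fullmatch(r"a|b", "b") is not None
bre_alt_matches_b = re.fullmatch(bre_to_python("a|b"), "b") is not None
```

Under these assumptions, both grouping patterns accept "ababc", while the alternation pattern a|b accepts "b" only when read as ERE; read as BRE it matches just the literal text "a|b".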

References

InfoGuides. (2020, October 30). Regular Expressions – Working with Data – InfoGuides at George Mason University. Retrieved from https://infoguides.gmu.edu/data-work/regex

Kamath, K. U. (2020, June 5). Regular Expression: A very important tool for data science. Analytics Vidhya, Medium. Retrieved from https://medium.com/analytics-vidhya/regular-expression-a-very-important-tool-for-data-science-6806110b5e43

Kang, J., Caffo, B., & Liu, H. (2017). Recent Advances and Challenges on Big Data Analysis in Neuroimaging. Lausanne, Switzerland: Frontiers Media SA.

Zaki Rizvi, M. S. (2020, January 27). Applications Of Regular Expressions. Retrieved from https://www.analyticsvidhya.com/blog/2020/01/4-applications-of-regular-expressions-that-every-data-scientist-should-know-with-python-code/