Interesting commands in Ubuntu - Part I

1. ^foo^bar
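This Bash history shortcut, available in Ubuntu's default shell, modifies the previously run command and runs it again. It saves you from retyping the whole command and is especially helpful when a typo occurs in a long command. The syntax is ^typo_code^actual_code: the first occurrence of typo_code in the previous command is replaced with actual_code, and the corrected command is run.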
Example:
pint localhost -c 3

greetz@ubuntu:~$ pint localhost -c 3
No command 'pint' found, did you mean:
 Command 'print' from package 'mime-support' (main)
 Command 'pinot' from package 'pinot' (universe)
 Command 'pinq' from package 'qmail' (universe)
 Command 'tint' from package 'tint' (universe)
 Command 'pinto' from package 'pinto' (universe)
 Command 'ping' from package 'iputils-ping' (main)
 Command 'ping' from package 'inetutils-ping' (universe)
 Command 'pinta' from package 'pinta' (universe)
pint: command not found

^pint^ping

greetz@ubuntu:~$ ^pint^ping
ping localhost -c 3
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.023 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.019 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.022 ms
--- localhost ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.019/0.021/0.023/0.004 ms

Note: By default, the corrected command runs immediately without asking for confirmation, so carelessly substituting system commands may be risky.
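To make the rule concrete, here is a small Python model of what ^old^new does to the previous command string. It is a sketch that assumes Bash's default behaviour: only the first occurrence is replaced, and nothing runs if the text is not found. quick_substitute is a hypothetical helper, not part of any shell API.

```python
def quick_substitute(previous_command: str, old: str, new: str) -> str:
    """Model Bash's ^old^new quick substitution on a command string."""
    if old not in previous_command:
        # Bash reports "substitution failed" and runs nothing
        raise ValueError(f"substitution failed: {old!r} not found")
    # Only the first occurrence of old is replaced
    return previous_command.replace(old, new, 1)


print(quick_substitute("pint localhost -c 3", "pint", "ping"))  # ping localhost -c 3
print(quick_substitute("cp a a.bak", "a", "b"))                 # cp b a.bak
```

The second call shows why care is needed. ^a^b changes only the first a in the command, which may not be the one you meant.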


2. sudo !!
Use this command when you forgot to put sudo in front of a command that needs root privileges. The !! part recalls the last command, so sudo !! runs it again with sudo permissions.
Example:
apt-get update

greetz@ubuntu:~$ apt-get update
E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
E: Unable to lock directory /var/lib/apt/lists/
E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?

sudo !!

greetz@ubuntu:~$ sudo !!
sudo apt-get update
[sudo] password for greetz: 
Get:1 http://security.ubuntu.com trusty-security InRelease [65.9 kB]
Ign http://extras.ubuntu.com trusty InRelease 
......

3. look
The look command is handy when you have forgotten how to spell a word. It lists the words in the system dictionary (by default, /usr/share/dict/words) that begin with the given string.
Example:
look geek

 
greetz@ubuntu:~$ look geek
geek
geek's
geekier
geekiest
geeks
geeky
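look expects its input to be sorted, which lets it locate the first candidate with a binary search instead of scanning the whole file. The Python sketch below shows that idea. It assumes an in-memory, case-sensitive sorted word list. The real command, when it uses the default dictionary, also ignores case and compares only alphanumeric characters, and this sketch leaves that out. The sample word list is made up for illustration.

```python
import bisect


def look(prefix, sorted_words):
    """Return the words in sorted_words that begin with prefix."""
    # Binary search for the first word that is >= prefix
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    # In a sorted list, all words sharing the prefix are contiguous
    for index in range(start, len(sorted_words)):
        word = sorted_words[index]
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches


words = sorted(["gecko", "geek", "geek's", "geekier", "geekiest",
                "geeks", "geeky", "geese", "general"])
print(look("geek", words))
# ['geek', "geek's", 'geekier', 'geekiest', 'geeks', 'geeky']
```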

4. tree
The tree command lists the contents of the current directory recursively, in a tree format. You first have to install the ‘tree’ package (sudo apt-get install tree) to use this command.
Example:
tree

greetz@ubuntu:~$ tree
.
├── Desktop
├── Documents
│   ├── greetz to geeks.docx
│   └── New User Manual.docx
├── Downloads
├── examples.desktop
├── Music
├── Pictures
├── Public
├── Templates
└── Videos
8 directories, 3 files
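The layout tree draws comes from a recursive walk. Each entry gets a ├── or └── connector. Its children are indented with │ or with blank padding, depending on whether the parent was the last entry at its level. Here is a minimal Python sketch of that idea. It assumes hidden entries are skipped and entries are sorted case-insensitively by name. The real tree command has many more options and its own sorting and counting rules.

```python
from pathlib import Path


def walk(directory, prefix, lines, counts):
    """Append tree-style lines for directory's contents, recursing into subdirectories."""
    # Skip hidden entries and sort case-insensitively by name
    entries = sorted(
        (p for p in directory.iterdir() if not p.name.startswith(".")),
        key=lambda p: p.name.lower(),
    )
    for index, entry in enumerate(entries):
        is_last = index == len(entries) - 1
        lines.append(prefix + ("└── " if is_last else "├── ") + entry.name)
        if entry.is_dir() and not entry.is_symlink():
            counts["directories"] += 1
            # Children of the last entry are padded with spaces instead of a bar
            walk(entry, prefix + ("    " if is_last else "│   "), lines, counts)
        else:
            counts["files"] += 1


def tree(root="."):
    """Return a tree-style listing of root followed by a summary line."""
    lines = [root]
    counts = {"directories": 0, "files": 0}
    walk(Path(root), "", lines, counts)
    lines.append(f"{counts['directories']} directories, {counts['files']} files")
    return "\n".join(lines)
```

Calling print(tree(".")) from a home directory like the one above prints a listing of the same shape.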

5. ss
The ss command is a useful tool that displays socket statistics. It is a faster replacement for the netstat command.
Example:
ss

greetz@ubuntu:~$ ss
Netid State Recv-Q Send-Q          Local Address:Port    Peer Address:Port 
u_str ESTAB 0      0         @/tmp/.X11-unix/X0   17472          *   17471 
u_str ESTAB 0      0                         *    16670          *   16671 
u_str ESTAB 0      0      @/tmp/dbus-VQpvce49KE   17336          *   17335 
..........

Which function in R

The which() function returns the positions of the elements of a logical vector or array that are TRUE. For arrays, it can instead return the row/column (array) indices. Its argument must be of type logical (typeof(x) is "logical"); passing any other type, such as a numeric vector, gives an error.

Syntax: which(x, arr.ind = FALSE, useNames = TRUE)
                 arrayInd(ind, .dim, .dimnames = NULL, useNames = FALSE)

x: a logical vector or array. NAs are allowed and treated as FALSE, so their positions are omitted from the result
arr.ind: logical; should array indices be returned when x is an array?
ind: integer-valued index vector, as resulting from which(x)
.dim: integer vector giving the dimensions of the array, as returned by dim()
.dimnames: optional list of character dimnames(.), of which only .dimnames[[1]] is used
useNames: logical indicating if the value of arrayInd() should have (non-null) dimnames at all

Examples:

> which(letters == "g")
[1] 7
> x <- c(1,5,8,4,6)
> x
[1] 1 5 8 4 6
> which(x == 5)
[1] 2
> which(x != 5)
[1] 1 3 4 5
> datasets::npk
block N P K yield
1   1  0 1 1 49.5
2   1  1 1 0 62.8
3   1  0 0 0 46.8
4   1  1 0 1 57.0
5   2  1 0 0 59.8
6   2  1 1 1 58.5
7   2  0 0 1 55.5
8   2  0 1 0 56.0
9   3  0 1 0 62.8
10  3  1 1 1 55.8
11  3  1 0 0 69.5
12  3  0 0 1 55.0
13  4  1 0 0 62.0
14  4  1 1 1 48.8
15  4  0 0 1 45.5
16  4  0 1 0 44.2
17  5  1 1 0 52.0
18  5  0 0 0 51.5
19  5  1 0 1 49.8
20  5  0 1 1 48.8
21  6  1 0 1 57.2
22  6  1 1 0 59.0
23  6  0 1 1 53.2
24  6  0 0 0 56.0
> which(npk$yield == 62)
[1] 13
> which((npk$yield == 48.8) & (npk$N == 1))
[1] 14
> which(npk$yield == max(npk$yield))
[1] 11
> x <- matrix(1:16,4,4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
> which.min(x)
[1] 1
> which.max(x)
[1] 16
> which(x %% 2 == 0)
[1] 2 4 6 8 10 12 14 16
> which(x %% 2 == 0, arr.ind=TRUE)
     row col
[1,]   2   1
[2,]   4   1
[3,]   2   2
[4,]   4   2
[5,]   2   3
[6,]   4   3
[7,]   2   4
[8,]   4   4
> which(x %% 2 == 0, arr.ind=TRUE, useNames = FALSE)
     [,1] [,2]
[1,]    2    1
[2,]    4    1
[3,]    2    2
[4,]    4    2
[5,]    2    3
[6,]    4    3
[7,]    2    4
[8,]    4    4
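To make the index arithmetic concrete, here is a small Python sketch that mimics which() and arrayInd() on plain lists. r_which and array_ind are hypothetical helpers, and None stands in for NA. The sketch assumes the matrix is stored in R's column-major order, as matrix(1:16, 4, 4) is, so that linear index i maps to row (i - 1) %% nrow + 1 and column (i - 1) %/% nrow + 1.

```python
def r_which(logical):
    """1-based positions of the TRUE values, like R's which(); None plays NA."""
    positions = []
    for position, value in enumerate(logical, start=1):
        if value is not None and not isinstance(value, bool):
            # which() only accepts logical input
            raise TypeError("argument to 'which' is not logical")
        if value:  # False and None (NA) are both skipped
            positions.append(position)
    return positions


def array_ind(indices, nrow):
    """Turn 1-based linear indices into (row, col) pairs for column-major storage."""
    pairs = []
    for ind in indices:
        col, row = divmod(ind - 1, nrow)  # zero-based column and row offsets
        pairs.append((row + 1, col + 1))
    return pairs


x = [1, 5, 8, 4, 6]
print(r_which([v == 5 for v in x]))      # [2]
print(r_which([v != 5 for v in x]))      # [1, 3, 4, 5]

# matrix(1:16, 4, 4) stores its values column by column as 1, 2, ..., 16
values = list(range(1, 17))
even = r_which([v % 2 == 0 for v in values])
print(even)                              # [2, 4, 6, 8, 10, 12, 14, 16]
print(array_ind(even, nrow=4))
# [(2, 1), (4, 1), (2, 2), (4, 2), (2, 3), (4, 3), (2, 4), (4, 4)]
```

The (row, col) pairs match the arr.ind = TRUE output above.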

 

Match function in R

The match() function returns a vector giving, for each element of vector1, the position of its first occurrence in vector2. If an element of vector1 does not occur in vector2, NA is returned for that element.

Syntax: match(vector1, vector2, nomatch = NA_integer_, incomparables = NULL)

vector1: vector, the values to be matched
vector2: vector, the values to be matched against
nomatch: the value which should be returned when no match is found
incomparables: a vector of values that cannot be matched. Any value of vector1 that appears in this vector is given the nomatch value.

By default, nomatch is NA, so NA is returned for every element that has no match in vector2.

According to the R documentation, the %in% operator is a more intuitive interface to match(). It is defined as match(vector1, vector2, nomatch = 0) > 0 and returns a logical vector indicating, for each element of vector1, whether a match was found in vector2. The result is always TRUE or FALSE, never NA, so the %in% operator is useful in if conditions.

Syntax: vector1 %in% vector2

Examples:

> print(match(5, c(1,2,9,5,3,6,7,4,5)))
[1] 4
> 5 %in% c(1,2,9,5,3,6,7,4,5)
[1] TRUE
> 8 %in% c(1,2,9,5,3,6,7,4,5)
[1] FALSE
> v1 <- c("a1","b2","c1","d2")
> v2 <- c("g1","x2","d2","e2","f1","a1","c2","b2","a2")
> x <- match(v1,v2)
> x
[1] 6 8 NA 3
> v1 <- c("a1","b2","c1","d2")
> v2 <- c("g1","x2","d2","e2","f1","a1","c2","b2","a2")
> v1 %in% v2
[1] TRUE TRUE FALSE TRUE
> v1 <- c("a1","b2","c1","d2")
> v2 <- c("g1","x2","d2","e2","f1","a1","c2","b2","a2")
> x <- match(v1,v2, nomatch = 0)
> x
[1] 6 8 0 3
> v1 <- c("a1","b2","c1","d2")
> v2 <- c("g1","x2","d2","e2","f1","a1","c2","b2","a2")
> x <- match(v1,v2, nomatch = 0, incomparables = "a1")
> x
[1] 0 8 0 3
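The rules above can be mirrored in Python to show the logic: only the first occurrence counts, nomatch supplies the fallback, incomparables never match, and %in% is built on match(). This is an illustrative sketch. r_match and r_in are hypothetical helper names, None stands in for NA, and a dictionary records the first position of each value in the table.

```python
def r_match(x, table, nomatch=None, incomparables=()):
    """Position (1-based) of the first occurrence of each element of x in table."""
    first_position = {}
    for position, value in enumerate(table, start=1):
        first_position.setdefault(value, position)  # later duplicates are ignored
    result = []
    for value in x:
        if value in incomparables:
            result.append(nomatch)  # incomparable values never match
        else:
            result.append(first_position.get(value, nomatch))
    return result


def r_in(x, table):
    """Like x %in% table, i.e. match(x, table, nomatch = 0) > 0; never NA."""
    return [position > 0 for position in r_match(x, table, nomatch=0)]


v1 = ["a1", "b2", "c1", "d2"]
v2 = ["g1", "x2", "d2", "e2", "f1", "a1", "c2", "b2", "a2"]
print(r_match([5], [1, 2, 9, 5, 3, 6, 7, 4, 5]))            # [4]
print(r_match(v1, v2))                                      # [6, 8, None, 3]
print(r_match(v1, v2, nomatch=0))                           # [6, 8, 0, 3]
print(r_match(v1, v2, nomatch=0, incomparables=["a1"]))     # [0, 8, 0, 3]
print(r_in(v1, v2))                                         # [True, True, False, True]
```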


First Practical Collision Attack on SHA-1

SHA-1 (Secure Hash Algorithm 1) was designed by the National Security Agency (NSA) in 1995 as part of the Digital Signature Standard and is mainly used to verify the authenticity of digital content. SHA-1 is a cryptographic hash function that produces a 160-bit hash value known as a message digest. It has been used in several security applications and protocols, including TLS, SSL, PGP, SSH, S/MIME, and IPsec.

Theoretical weaknesses that could let collision attacks break SHA-1 were first identified in 2005. Despite these warnings, SHA-1 remained widely used as a trusted cryptographic algorithm over the last decade. On 23 February 2017, Google researchers announced the first practical collision attack on SHA-1.

The SHA-1 hash function converts an input message into a hash value, a digital fingerprint of the message, which is rendered as hexadecimal digits. Checking the hash makes it easy to verify the authenticity and integrity of a message. A secure hash function is designed so that finding two different messages with the same hash is computationally infeasible, so in practice matching hashes are taken to mean matching messages. This makes it possible to verify messages without exposing the messages themselves.
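As a quick illustration of the fingerprint idea, Python's standard hashlib module can compute SHA-1 digests. The 160-bit digest is shown as 40 hexadecimal characters. Changing even one character of the message gives a completely different digest. The input "abc" is a standard SHA-1 test vector.

```python
import hashlib


def sha1_hex(message: bytes) -> str:
    """SHA-1 digest of message as a hexadecimal string."""
    return hashlib.sha1(message).hexdigest()


digest = sha1_hex(b"abc")
print(digest)              # a9993e364706816aba3e25717850c26c9cd0d89d
print(len(digest) * 4)     # 160 bits: each hex character encodes 4 bits

# Changing a single character produces a completely different digest
print(sha1_hex(b"abd") == digest)  # False
```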

A collision attack succeeds when two different messages, for example a legitimate message and a malicious one, produce the same digital fingerprint. This could allow attackers to forge digital signatures and break communications protected with SHA-1. Given these collision attacks, SHA-1 is no longer secure for digital signatures, file integrity checks, or file identification.

As a proof of concept, the researchers produced two PDF files with completely different content but the same SHA-1 hash. As one of the researchers pointed out, the attack required over 9,223,372,036,854,775,808 SHA-1 computations. That is the equivalent of 6,500 years of single-CPU computation and 110 years of single-GPU computation. Google is planning to release the proof-of-concept code within 90 days, so it is high time to move to safer standards before real-world attacks happen.
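The figure of 9,223,372,036,854,775,808 computations is exactly 2^63. For context, a generic birthday-style collision search on an n-bit hash needs on the order of 2^(n/2) evaluations, which is 2^80 for SHA-1's 160 bits. The back-of-the-envelope arithmetic below shows how far the attack cut that cost. It is a rough comparison, not a reproduction of the researchers' own analysis.

```python
attack_cost = 9_223_372_036_854_775_808   # SHA-1 computations reported for the attack
digest_bits = 160
# A generic (birthday) collision search needs about 2**(n/2) hashes for an n-bit digest
birthday_cost = 2 ** (digest_bits // 2)

print(attack_cost == 2 ** 63)             # True
print(birthday_cost // attack_cost)       # 131072, i.e. 2**17
```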

SHA-1 collision attacks have not affected HTTPS-protected websites, because certificate authorities are no longer allowed to use SHA-1 to sign TLS certificates.

WHAT IS MACHINE LEARNING?

Arthur Samuel defined machine learning (ML) as “the field of study that gives computers the ability to learn without being explicitly programmed”. ML evolved from pattern recognition and computational learning theory in artificial intelligence. It mainly focuses on developing models that can adapt independently when exposed to new data.

The requirements for creating a good machine learning system include data preprocessing capabilities, choice of the correct algorithms, automation and iterative processes, scalability, and ensemble modelling to improve the performance of the models.

Based on the nature of learning, ML tasks are classified into three categories:

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning

Supervised learning algorithms use a given set of predictors (independent variables) to predict a target variable (dependent variable). The algorithm compares its predicted outputs with the actual outputs to find the errors, then adjusts the model to reduce them. This process is called training. Regression and classification are the two major subcategories of supervised ML. Supervised learning is used in applications that predict future events from historical data.
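The training cycle described above (predict, compare with the actual output, adjust) can be shown with a tiny regression example. This is a minimal sketch. It assumes a one-variable linear model fitted by plain gradient descent on the mean squared error, using made-up data generated from y = 2x + 1. The learning rate and number of epochs are illustrative choices, and real supervised learning libraries offer many other models and optimizers.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ≈ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Compare the predictions with the actual outputs to get the errors
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Adjust the model in the direction that reduces the error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0, 1, 2, 3, 4]
ys = [2 * x + 1 for x in xs]      # toy labelled data generated from y = 2x + 1
w, b = train_linear_model(xs, ys)
print(round(w, 3), round(b, 3))   # 2.0 1.0
```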

Unsupervised learning has no target variable (dependent variable) to predict. It is used to explore the data and find relationships within it. Because the data carry no labels (known correct outputs), there is nothing to compare predictions against. Instead, the system uses the given data to find correlations and discover patterns.

Reinforcement learning algorithms use trial and error to find out which actions yield the maximum reward. The machine learns from its past experience and applies that learning to make accurate decisions. These algorithms are used in gaming, robotics and navigation.

R or Python for Data Science?

The answer to the question ‘R or Python for data science?’ depends mainly on the problem to be solved, the tools required to solve it, and your personal preference.

Python is a general-purpose programming language created by Guido van Rossum in 1991, while R was created four years later by Ross Ihaka and Robert Gentleman with statisticians in mind.

R has a steep learning curve, which makes it a bit difficult for beginners, but once the basics are clear, the advanced material becomes easier to learn. Python, on the other hand, has a relatively gentle learning curve thanks to its simplicity and readability, which makes it a good choice for beginners.

In R, the same functionality can often be written in several different ways, whereas Python's design philosophy favours one obvious way of doing it.

RStudio is the most popular IDE for R. Spyder, IPython, Notebook, Eric, etc. are some of the IDEs for Python. Both R and Python have a huge number of reliable libraries. CRAN is the biggest repository of R packages, while PyPI is the Python package repository.

Popular libraries in R include caret, dplyr, data.table, zoo, ggplot2, ggvis, stringr, lattice, etc. Libraries like pandas, scikit-learn, SciPy, NumPy and matplotlib make Python attractive. Both R and Python have good support and documentation.

When it comes to data visualization, R has the upper hand over Python: ggplot2 and ggvis are two excellent visualization packages in R.

Here are a few examples of code in both languages that produce the same results.

To import a .csv dataset,
R:
dataset_name <- read.csv("dataset_name.csv")

Python:
import pandas
dataset_name = pandas.read_csv("dataset_name.csv")

To find the dimension of the dataset,
R:
dim(dataset_name)

Python:
dataset_name.shape

To obtain the first n observations in a data frame (n defaults to 6 in R and to 5 in pandas),
R:
head(dataset_name)

Python:
dataset_name.head()

For splitting the dataset into training and test sets,
R:
RowCount <- floor(0.75 * nrow(dataset_name))
set.seed(123)
trainIndex <- sample(1:nrow(dataset_name), RowCount)
train <- dataset_name[trainIndex,]
test <- dataset_name[-trainIndex,]

Python:
train = dataset_name.sample(frac=0.75, random_state=1)
test = dataset_name.loc[~dataset_name.index.isin(train.index)]

R is more functional in nature and has many built-in data analysis features, whereas Python is an object-oriented language that relies mostly on packages for data analysis. When it comes to data science, both languages are important, and the choice between them is up to the data analyst. If you know both, you are definitely ahead of many others in this field.

How to reset a forgotten Ubuntu password

You might have been in a situation where you forgot your Ubuntu login password; I have come across one myself. This post explains how to reset a forgotten login password in Ubuntu.


Resetting a forgotten password in Ubuntu is easy with the help of the root shell in recovery mode. Just follow these simple steps.

First, reboot your system and press the Esc key when GRUB starts loading. From the menu, select ‘Advanced options for Ubuntu’.

 


Now select the ‘recovery mode’ option from the menu.


From the recovery menu, select the option ‘root’, which opens a root shell prompt.


Now type the following commands, entering your new password when prompted:

mount -rw -o remount /
passwd <username>
sync
reboot -f

For example, to reset the password for the username ‘greetz’, the commands are:

mount -rw -o remount /
passwd greetz
sync
reboot -f


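Note: If you skip ‘mount -rw -o remount /’, you might get an ‘Authentication token manipulation error’. This happens because recovery mode mounts the root filesystem read-only, so passwd cannot save the new password.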


That’s it: you can now log in with your new password.

LISTING ALL THE FILES WITH A SPECIFIED EXTENSION USING R

To list all the files in the current working directory, we can use the following function:

 list.files()


Note: By default, the current working directory will be your home directory. To manage the working directory (choosing the folder or directory), we can use setwd() and getwd(). Refer to my previous post on Managing workspaces in R.

What if we need to list only the files with a specified extension? In that case, we can use the function Sys.glob() or list.files() with some arguments.

To list only the files with a given extension (say, .csv) in our working directory:

list.files(pattern = "\\.csv$")


The pattern argument is a regular expression. The ‘$’ symbol anchors the match to the end of the file name, and ‘\\’ escapes the dot. In a regular expression a bare ‘.’ matches any character, so ‘\\.’ is needed to match a literal dot. Together they make sure that only files whose names end exactly in .csv are listed.

Note: The above command is case-sensitive, so it lists only files whose extension is in lower case. If your file extensions contain upper case, or a mix of upper and lower case, add ignore.case = TRUE:

list.files(pattern = "\\.csv$", ignore.case=TRUE)


Note: The command also works without the ‘\\’, but then the ‘.’ matches any character. In that case, files whose names merely end in ‘csv’ after any character, such as mycsv or report_csv, are listed too.

list.files(pattern = ".csv$")


Sys.glob() can also be used to list the files that have a specified extension:

Sys.glob("*.csv")
Here the wildcard ‘*’ matches zero or more characters.


Note: This function is also case-sensitive on case-sensitive file systems such as those used on Linux. list.files(pattern = "\\.csv$") and Sys.glob("*.csv") give similar results.
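The difference between these patterns can also be checked outside R. The Python sketch below applies the same regular expressions with the re module, and the same wildcard with fnmatch, to a hypothetical list of file names. R's string "\\.csv$" is the regular expression \.csv$, which is written in Python as the raw string r"\.csv$". R's and Python's regex engines differ in some details, but these simple patterns behave the same way.

```python
import fnmatch
import re

# Hypothetical file names, chosen to show where the patterns differ
names = ["data.csv", "DATA.CSV", "report_csv", "mycsv", "notes.csv.bak", "sales.txt"]

# Like pattern = "\\.csv$": a literal dot, then csv, at the end of the name
escaped = [n for n in names if re.search(r"\.csv$", n)]
# Like pattern = ".csv$": the unescaped dot matches any single character
unescaped = [n for n in names if re.search(r".csv$", n)]
# Like adding ignore.case = TRUE
any_case = [n for n in names if re.search(r"\.csv$", n, flags=re.IGNORECASE)]
# Shell-style wildcard, like Sys.glob("*.csv"); fnmatchcase never folds case
globbed = [n for n in names if fnmatch.fnmatchcase(n, "*.csv")]

print(escaped)    # ['data.csv']
print(unescaped)  # ['data.csv', 'report_csv', 'mycsv']
print(any_case)   # ['data.csv', 'DATA.CSV']
print(globbed)    # ['data.csv']
```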

TRACEBACK FUNCTION IN R

We often come across errors while coding. In R, when a function throws an error, the traceback() function offers an easy way to find out which call triggered it.

It helps a developer find out where and why the error occurred. Once the error has been traced, it is easier to debug and fix. The syntax is as follows:

traceback()

By default, traceback() prints the call stack of the last uncaught error. Apart from tracing errors, it can also print the current call stack when it is given a number (the count of stack entries to skip) as its argument.

Note: Errors caught via try() or tryCatch() do not generate a traceback.
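For readers who also use Python, the closest analogue is the traceback attached to an exception. The sketch below reads it after catching the error and lists the functions on the call stack, from the outermost call down to the function where the error was raised. parse_value, load_record and error_call_stack are hypothetical example functions. This only illustrates the idea of a call stack and is not how R's traceback() works internally.

```python
import traceback


def parse_value(text):
    return int(text)  # raises ValueError for non-numeric text


def load_record(fields):
    values = []
    for field in fields:
        values.append(parse_value(field))
    return values


def error_call_stack(func, *args):
    """Run func and return the function names on the stack of the error it raises."""
    try:
        func(*args)
    except Exception as exc:
        # extract_tb lists frames from the outermost call to the innermost one
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


print(error_call_stack(load_record, ["1", "x"]))
# ['error_call_stack', 'load_record', 'parse_value']
```

As with R's traceback(), the innermost entry points to where the error occurred, here the int() call inside parse_value.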

DIFFERENCE BETWEEN library() AND require() IN R

The library() and require() functions load and attach add-on packages that are already installed. An installed package is identified by its ‘DESCRIPTION’ file, which contains a ‘Built:’ field. The name of the package to be loaded with library() or require() must match the package name given in its ‘DESCRIPTION’ file.

Note: The package names are case sensitive.

Both functions update the list of attached packages without reloading namespaces that are already loaded.

The main difference between these functions is:

By default, library() throws an error if the requested package does not exist.

example:
> library(xyz)
Error in library(xyz) : there is no package called ‘xyz’

require() is designed to be used inside functions. Instead of an error, it gives a warning message and (invisibly) returns a logical value: FALSE if the requested package is not found and TRUE if the package is loaded.

example:
> require(xyz)
Loading required package: xyz
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called ‘xyz’

It is generally better to use library(), because it gives an error message as soon as a package is missing, at package loading time. This avoids the unnecessary headache of later tracking down errors caused by trying to use functions from packages that are not installed.
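The difference comes down to failing loudly versus reporting and carrying on. Python's import system supports the same two styles, which makes the trade-off easy to see. In this sketch, library and require are hypothetical wrapper functions around importlib.import_module, and xyz_missing_package is a made-up module name. It is only an analogy, not how R implements either function.

```python
import importlib
import warnings


def require(name):
    """require()-style loading: warn and return False if the module is missing."""
    try:
        importlib.import_module(name)
    except ModuleNotFoundError:
        warnings.warn(f"there is no package called '{name}'")
        return False
    return True


def library(name):
    """library()-style loading: a missing module raises ModuleNotFoundError at once."""
    return importlib.import_module(name)


print(require("json"))                  # True
print(require("xyz_missing_package"))   # False, after a UserWarning
```

With the library() style, the failure surfaces at the import line instead of later, when code first tries to use the missing module.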