Review Article

Phishing Detection: Analysis of Visual Similarity Based Approaches

Table 5

Summary of visual similarity based phishing detection techniques.

S. number | Proposed scheme | Description | Advantages | Limitations

1 | A layout similarity based approach for detecting phishing pages [48] | HTML DOM based detection | (i) Useful in online banking transactions; (ii) High true positive rate (almost 100%) | (i) It fails if attackers create a different DOM for a similar webpage; (ii) It fails if phishing websites contain only images

2 | An antiphishing strategy based on visual similarity assessment [66] | Visual feature matching: font size, font colour, font family, background and foreground colour, border, and so forth | (i) Detection accuracy is good using visual features | (i) Small dataset used to check the performance of the approach; (ii) No online detection

3 | Mitigate web phishing using site signatures [67] | Text and image based comparison | (i) It can detect a phishing attack even if the website contains only images (embedded objects) | (i) It needs to maintain a large database to store images; (ii) It cannot detect zero-hour phishing attacks

4 | PhishZoo: detecting phishing websites by looking at them [68] | Text, SSL certificate, and images | (i) It gives 96% accuracy; (ii) It can detect zero-hour phishing attacks | (i) It cannot detect a phishing attack if the logo is rotated on the phishing website

5 | BaitAlarm: detecting phishing sites using similarity in fundamental visual features [71] | CSS based comparison | (i) It can detect embedded objects present in a webpage; (ii) Good TP rate (more than 99%); (iii) Large dataset used for testing (7764 webpages) | (i) It compares the CSS of a new page with previously visited webpages; therefore it cannot detect zero-hour attacks

6 | Fighting phishing with discriminative keypoint features [72] | Image processing based approach that uses the Contrast Context Histogram to compare similarity between pages | (i) It can detect phishing even if text content is replaced by images or other embedded objects | (i) Passive monitoring of websites; (ii) It needs to maintain a large database to store images

7 | Goldphish: using images for content-based phishing analysis [73] | Converts the logo into text and then uses a search engine to verify | (i) It can identify the logos of well-known companies; (ii) It can detect zero-hour attacks | (i) If the image background is dark, it cannot extract text from the image; (ii) If sentences are included in the image, it cannot find the relevant site through the search engine

8 | Detection and prevention of phishing attack using dynamic watermarking [74] | Dynamic watermarking | (i) It can protect against man-in-the-middle attacks | (i) Highly complex; each user must register on every unique website

9 | Counteracting phishing page polymorphism: an image layout analysis approach [75] | Compares two images (snapshots of webpages) | (i) It can detect dynamic components such as embedded objects and Unicode homograph attacks; (ii) High detection rate, 99.6% | (i) It cannot detect new phishing webpages

10 | Detecting phishing web pages with visual similarity assessment based on Earth Mover’s Distance (EMD) [77] | Earth Mover’s Distance is used to compare webpages | (i) It can dynamically adjust the threshold by supervised learning; (ii) Good precision rate, 99.87% | (i) The system fails to detect a phishing attack if the suspicious site copies only part of the legitimate site

11 | Visual similarity based phishing detection without victim site information [70] | Image processing based | (i) It does not require a dataset of legitimate websites | (i) When the database is empty, a new site is treated as legitimate; (ii) High false positive rate, 17.5%

12 | Textual and visual content-based antiphishing: a Bayesian approach [79] | Hybrid model, text and image based | (i) The threshold is adjusted by a probabilistic model derived from Bayesian theory | (i) It cannot detect zero-hour attacks

13 | Visual similarity based phishing detection [81] | Hybrid approach using image, text, and style similarity | (i) It can detect embedded objects in a webpage | (i) Comparing text and images is time-consuming; (ii) The signature is compared with the expected legitimate page; therefore it is difficult to find the expected target

14 | Detecting visually similar webpages: application to phishing detection [84] | Based on Gestalt theory of visual perception | (i) It can detect embedded objects; (ii) Very low false positive rate, 0.8% | (i) To detect a phishing webpage, the corresponding legitimate page must be present in the database

15 | Automatic detection of phishing target from phishing webpage [82] | Hybrid features, using hyperlinks and keywords from the webpage | (i) It can detect zero-hour phishing attacks | (i) Accuracy of the system depends on the TF-IDF algorithm and the search engine

16 | Utilisation of website logo for phishing detection [83] | Uses machine learning to extract the logo and utilises Google image search | (i) It can detect zero-hour phishing attacks | (i) High false negative rate, 13%
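
Several schemes in Table 5 reduce a webpage to a visual signature and compare it against a protected page using a distance and a threshold; scheme 10 uses the Earth Mover's Distance for this. The Python sketch below illustrates the idea only and is not the method of [77]. It assumes each page has already been reduced to a one-dimensional histogram (for example, grey-level counts from a screenshot) and that adjacent bins are one unit apart. The function names and the 0.9 threshold are hypothetical placeholders; in scheme 10 the threshold is learned, not fixed.

```python
from itertools import accumulate


def normalize(hist):
    """Scale a histogram of non-negative counts so that it sums to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must contain at least one count")
    return [count / total for count in hist]


def emd_1d(hist_a, hist_b):
    """Earth Mover's Distance between two equal-length 1-D histograms.

    With a ground distance of 1 between adjacent bins, the EMD equals the
    sum of absolute differences between the two cumulative distributions.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    cdf_a = accumulate(normalize(hist_a))
    cdf_b = accumulate(normalize(hist_b))
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def visual_similarity(hist_a, hist_b):
    """Map the EMD to a score in [0, 1], where 1 means identical histograms."""
    # Largest possible EMD: all mass moved from the first bin to the last.
    max_distance = len(hist_a) - 1
    if max_distance == 0:
        return 1.0
    return 1.0 - emd_1d(hist_a, hist_b) / max_distance


def is_suspicious(hist_page, hist_legit, threshold=0.9):
    """Flag a page that looks very similar to a protected legitimate page.

    The default threshold is an assumption made for this illustration.
    """
    return visual_similarity(hist_page, hist_legit) >= threshold
```

For instance, `is_suspicious([10, 20, 30, 40], [11, 19, 31, 39])` compares a page whose histogram nearly matches the protected page. A page that copies only part of a legitimate page shifts a large share of the histogram mass and lowers the score, which is consistent with the limitation listed for scheme 10.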