Invented by William Parke Bowditch and Raul Garcia Calvo; assigned to SecureWorks Corp.
The SecureWorks Corp invention works as follows. The present disclosure describes systems and methods for determining whether a request for information about a user is malicious or safe/legitimate. Information about a request for a user's data can be received, and one or more screenshots can be provided to a machine-learning model, which produces a probability or confidence level that the request is malicious.
Background for Systems and Methods Using Computer Vision and Machine Learning for Detection of Malicious Actions
Malicious actors can use phishing and other tactics to try to steal the login credentials of unsuspecting Internet users. Threat actors might try to steal the details of an individual's bank account or other personal accounts, such as an email account or corporate account, which would give them access to confidential information about a person and/or company. Users are often tricked into providing valid login credentials to a website that appears legitimate but is controlled by a threat actor. Threat actors can then use the stolen credentials to gain access to the user's account and steal money or sensitive information that was thought to be protected by access controls. Countermeasures against phishing attacks include educating users on the signs and indicators of an attack, creating blacklists of webpages reported as phishing websites, etc. According to some estimates, however, millions of new phishing websites are registered every month, and phishing attacks are responsible for the majority of cybersecurity incidents. Threat actors continue to pursue phishing despite existing protection solutions and countermeasures.
Recently, some attempts have been made to automate phishing detection, such as by using social graphs within a corporate environment to build a network of correspondence among users and identify abnormal connections to external websites. These systems can generate false positives for new connections and/or overlook the initial connection between a company network and a phishing website. Other systems have attempted to use deep belief networks trained on ISP data flows, but this approach requires large quantities of labelled raw logs and a model that must be continually retrained to keep up with the changing threat landscape.
The present disclosure applies computer vision and machine learning to phishing detection in order to address these and other related and unrelated issues/problems.
As briefly described, the present disclosure is directed to systems and methods that utilize computer vision and machine-learning components and processes to detect malicious behavior such as potential phishing. For example, the systems/methods of the present disclosure can implement a number of processes/components that detect that a user has been directed to a page whose interface elements indicate that the site impersonates a reputable website and that asks the user to enter login credentials. At that point, the user can be warned/alerted and/or prevented from entering login credentials on the site.
For instance, the systems/methods are able to identify that the page the user navigates to is a login screen but is not on a whitelist. They can also use computer vision and machine-learning components and processes to detect interface elements such as logos and trademarks in order to identify websites that impersonate reputable companies. The systems/methods may also alert the user via pop-ups, alarms, or notifications, and the domain of a webpage can be labeled and stored in a database for future use (e.g., in a blacklist).
In one embodiment, a system for detecting or classifying security threats or malicious actions can include at least one processor and a memory containing a plurality of instructions that, when executed by the one or more processors, implement one or more components that facilitate the detection or classification of security threats/malicious actions, such as phishing attacks.
The one or more components can be configured to receive data or information related to a request for information from a user. Example requests can include an email or a webpage requesting a user's information/credentials, a webpage with a login form for entry of a user's personal credentials/login information, or other similar webpages or requests for credentialing information. Information or data related to the request may include iterated URLs (requests), POST requests, email data stored in a datacenter, emails forwarded by users, webpages that have a login form, etc., or combinations thereof.
The one or more components of the system can include a detection and extraction processor configured to make an initial determination of whether the request is malicious or safe. If the request does not fall into either category, the collected information is sent to the classification engine.
In some variations, the initial detection and extraction processor may, as part of the initial review/determination, compare the data or information associated with a website request to data or information in a blacklist or a whitelist to determine if the request matches or is indicative of a known malicious request/site. The detection and extraction processor can also compute or extract features from information related to the site request, such as domain reputation, IP analysis, or keywords in an email, among others, to determine if the request/site matches or is indicative of a known malicious or known safe request.
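The initial blacklist/whitelist triage described above can be sketched as a small lookup step. This is an illustrative sketch only; the domain sets and function names are assumptions, not taken from the patent.

```python
from urllib.parse import urlparse

BLACKLIST = {"evil-login.example"}   # hypothetical known-malicious domains
WHITELIST = {"bank.example"}         # hypothetical known-safe domains

def triage(url: str) -> str:
    """Return 'malicious', 'safe', or 'unknown' (needs the classifier)."""
    domain = urlparse(url).netloc.lower()
    if domain in BLACKLIST:
        return "malicious"
    if domain in WHITELIST:
        return "safe"
    return "unknown"
```

Only requests that come back "unknown" would be forwarded to the classification engine, which keeps the heavier computer-vision analysis off the common path.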
The classification engine may include a computer-vision model and a machine-learning model. The computer-vision model can be configured to obtain at least one screenshot of the request and then provide that screenshot, as well as any additional data or information related to it, to the machine-learning model. The machine-learning model can extract or identify screenshot information and determine or generate a probability or confidence level that the request is malicious. This extracted and/or identified screenshot information may include user interface elements such as logos, slogans, or trademarks, and can also contain phrases, keywords, images, or other indicia.
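As a rough illustration of how screenshot-derived signals (logo matches, login forms, brand keywords) could be combined into a confidence level, here is a minimal logistic-scoring sketch. The weights, bias, and signal names are hypothetical stand-ins for a trained machine-learning model, not the patent's actual model.

```python
import math

# Hypothetical weights standing in for trained model parameters.
WEIGHTS = {"logo_match": 2.0, "login_form": 1.5, "brand_keyword": 1.0}
BIAS = -3.0

def phishing_confidence(signals: dict) -> float:
    """Map binary screenshot-derived signals to a probability in (0, 1)
    via a logistic function."""
    z = BIAS + sum(WEIGHTS[k] for k, v in signals.items() if v)
    return 1.0 / (1.0 + math.exp(-z))
```

A real system would learn these weights (or use a deep image model) rather than hand-set them; the sketch only shows the shape of the signal-to-probability step.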
If the probability or confidence level determined by the machine-learning model indicates that the request is malicious (e.g., the determined probability or confidence level exceeds a prescribed threshold), one or more system components, such as the logic/action processor, can be configured to classify the request as malicious and/or to generate an alert, alarm, or notification to notify the user. The logic/action processor further can be configured to generate and/or update a blacklist of known malicious requests based on the outputs/classifications of the machine-learning model.
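The threshold check, blacklist update, and alert generation might be structured as in the sketch below; the threshold value, names, and alert format are illustrative assumptions.

```python
def act_on_score(domain: str, confidence: float, blacklist: set,
                 threshold: float = 0.8):
    """If confidence exceeds the threshold, record the domain in the
    blacklist and return an alert message; otherwise return None."""
    if confidence > threshold:
        blacklist.add(domain)  # future requests are caught at triage
        return f"ALERT: {domain} classified as malicious ({confidence:.2f})"
    return None
```

Feeding the classification back into the blacklist means the expensive computer-vision path only has to flag a given domain once.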
In one embodiment, the disclosure provides a method or process for detecting and classifying malicious actions or activities by threat actors, such as phishing. The method may include receiving information about a request made to a user, such as an email or a request to access a site, and obtaining one or more screenshots of the website/request, e.g., using a computer-vision model. The method can also include providing the screenshots to a classification engine that includes a machine-learning model, which identifies or extracts screenshot information from the submitted screenshots and analyzes that information, and/or other information, to generate a confidence or probability level that the website/request is malicious. If the probability or confidence level that the website/request is malicious exceeds a prescribed threshold, the method can take further action, such as classifying the domain or actor(s) associated with the request as malicious, or generating an alarm, notification, or alert to notify the user.
The following detailed description, in conjunction with the accompanying illustrations, will make various objects, features, and advantages of this disclosure apparent to those skilled in the art.
The following description is provided in combination with the Figures to aid understanding of the teachings disclosed in this document. The description focuses on specific implementations and embodiments and is intended to help describe the teachings; it should not be taken as a restriction on their scope or applicability.
As shown in the Figures, the present disclosure describes systems and methods to detect and/or act on security threats, including actions taken by threat actors, such as requests for user information or credentials made as part of phishing. For example, the systems and methods can utilize computer vision and machine-learning components and processes to determine whether a request (such as a webpage requesting information accessed by a user, a webpage with a login form for entry of a user's login credentials/information, a link in an email, or another action directing a user to a webpage) is an attempt by a malicious party to pose as a legitimate request in order to steal the user's login credentials or information. If so, the systems and methods can initiate or direct one or more protective measures, such as sending out an alert or alarm. The system can notify the user, their employer, or another appropriate entity, such as a Managed Security Service Provider ("MSSP"), security researcher, etc., that the request was malicious, and/or block further interaction with the requestor/threat actors, such as a website, domain, or server.
In some embodiments, as shown in FIG. 1, the system 10 can be composed of a plurality of modules, components, etc. 12. The plurality 12 includes a detection and extraction processor 14 that performs an initial search/analysis to determine whether such requests are a threat or legitimate. The plurality 12 can also include a classification engine or classifier 16 configured to analyze the webpage or email data and determine a probability that the request is malicious, for example, by using machine-learning analyses of images, screenshots, or other information. The plurality 12 can also include a logic/action processor 18 that is configured to control logic and data flow, to store the results of machine-learning analyses for use in future determinations, and to take one or more actions to alert/warn users about malicious activities.
As shown in FIG. 1, the detection and extraction processor 14 can be configured to receive and analyze information/data 20 relating to one or more requests for information from the user, such as a request via email to provide credentials or personal data, or a prompt on a website requesting login credentials. The information/data 20 relating to the request can include iterated URLs (requests), POST requests, email data stored in a datacenter, emails forwarded by clients, webpages that have a login form, or other related data and information.
In one embodiment, the detection and extraction processor 14 can include a feature extractor, element, etc. 22 that, based on/from the received request information, is configured to extract domains or URLs associated with the requests, keywords from an email accompanying a request, IP analyses, or other features indicative of phishing or malicious action, such as domain registration age, domain registrars, and a domain's SSL certificate details (e.g., whether an SSL certificate is present), etc. The detection and extraction processor can then analyze and compare the extracted, identified, or computed features from the request/webpage with known features, as well as other information such as a whitelist, blacklist, and/or any other repository of known malicious and/or legitimate/safe requestors, to determine whether the request is malicious, safe/trusted, or requires further analysis.
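A feature extractor of the kind described could compute simple request features like the following. The keyword list, the HTTPS check as a proxy for SSL-certificate presence, and the subdomain heuristic are all illustrative assumptions, not the patent's actual feature set.

```python
from urllib.parse import urlparse

# Hypothetical keywords often seen in credential-phishing lures.
SUSPICIOUS_KEYWORDS = {"verify", "suspended", "urgent", "password"}

def extract_features(url: str, email_body: str = "") -> dict:
    """Compute simple features from a request URL and any accompanying
    email text (domain, HTTPS use, keyword hits, subdomain depth)."""
    parsed = urlparse(url)
    words = set(email_body.lower().split())
    return {
        "domain": parsed.netloc.lower(),
        "uses_https": parsed.scheme == "https",       # proxy for SSL presence
        "keyword_hits": len(SUSPICIOUS_KEYWORDS & words),
        "many_subdomains": parsed.netloc.count(".") >= 3,
    }
```

Features like domain registration age or registrar identity would require a WHOIS lookup and are omitted here for brevity.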
As shown in FIG. 1, if the system 10 determines that the requested site or action is known (as determined at 24), it can make a preliminary decision to either block or allow access to the site, minimizing the use of computing resources. As shown at 26, for example, if the request has been determined to be legitimate or safe, the system may stop its process and permit communication with the requesting entity; if the request is malicious or attributed to a known threat actor, further communication with the requesting entity (e.g., email, email server, webpage, domain, etc.) may be blocked or prevented.
The detection and extraction processor 14 may also evaluate the URL's HTML DOM to determine whether it includes a login page. If the login page is determined to be an unknown login page (e.g., a login that is not on a whitelist or a blacklist), the processor can send the URL, or information related to the request, to the classifier 16 for a complete analysis.
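Detecting a login form in a page's HTML can be sketched with a simple parser that looks for a password input field. This is an illustrative simplification of the DOM evaluation described above, using only the standard library.

```python
from html.parser import HTMLParser

class LoginFormDetector(HTMLParser):
    """Scan HTML for an <input type="password"> element, a simple
    signal that the page is a login form."""
    def __init__(self):
        super().__init__()
        self.has_password_field = False

    def handle_starttag(self, tag, attrs):
        if tag == "input" and dict(attrs).get("type") == "password":
            self.has_password_field = True

def is_login_page(html: str) -> bool:
    detector = LoginFormDetector()
    detector.feed(html)
    return detector.has_password_field
```

A production system would likely combine this with form-action and field-name analysis, but a password field alone is already a strong trigger for escalating to the classifier.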
As shown in FIG. 1, the classifier 16 may include a computer-vision model 28 that reviews the login site/website received from the detection and extraction processor 14, allowing for the extraction of screenshots and images (e.g., screenshots of an email, website, login page, etc.). In some variations, a web automation framework can be used to retrieve/obtain images or screenshots related to the request, such as screenshots of an email sent by the user or of the URL's webpage. Web automation frameworks can include tools used for front-end testing of webpages that allow screenshots of emails or URLs to be taken in a secure manner. The automation framework can be configured to open, execute, and "detonate" URLs, page links in emails, etc. in isolation, without opening these links or pages on the user's computer or network, to avoid or minimize the risk of infection or other adverse effects.
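Delegating screenshot capture to a web-automation driver might be structured as below. The driver interface here is hypothetical, though real frameworks such as Selenium expose similar `get()`/`save_screenshot()` methods; running the driver in an isolated, headless environment is what keeps the "detonation" away from the user's machine.

```python
def capture_screenshot(driver, url: str, path: str) -> str:
    """Open the URL in an isolated browser session and save a
    screenshot of the rendered page to `path`."""
    driver.get(url)               # navigate inside the sandboxed session
    driver.save_screenshot(path)  # persist the rendered page image
    return path
```

Because the function only depends on the two driver methods, it can be exercised with a stub driver in tests and backed by a real headless browser in production.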