Antivirus Scanning: A Useful First Step
When first analyzing prospective malware, a good first step is to run it
through multiple antivirus programs, which may already have identified it.
But antivirus tools are certainly not perfect. They rely mainly on a database
of identifiable pieces of known suspicious code (file signatures), as well as
behavioral and pattern-matching analysis (heuristics) to identify suspect
files. One problem is that malware writers can easily modify their code,
thereby changing their program’s signature and evading virus scanners.
Also, rare malware often goes undetected by antivirus software because it’s
simply not in the database. Finally, heuristics, while often successful in
identifying unknown malicious code, can be bypassed by new and unique
malware.
Because the various antivirus programs use different signatures and
heuristics, it’s useful to run several different antivirus programs against the
same piece of suspected malware. Websites such as VirusTotal
(http://www.virustotal.com/) allow you to upload a file for scanning by
multiple antivirus engines. VirusTotal generates a report that provides the
total number of engines that marked the file as malicious, the malware name,
and, if available, additional information about the malware.
Hashing: A Fingerprint for Malware
Hashing is a common method used to uniquely identify malware. The malicious
software is run through a hashing program that produces a hash that, for
practical purposes, uniquely identifies that malware (a sort of fingerprint).
The Message-Digest Algorithm 5 (MD5) hash function is the one most commonly
used for malware analysis, though the Secure Hash Algorithm 1 (SHA-1) is also
popular.
For example, using the freely available md5deep program to calculate the
hash of the Solitaire program that comes with Windows would generate the
following output:
C:\>md5deep c:\WINDOWS\system32\sol.exe
373e7a863a1a345c60edb9e20ec32311 c:\WINDOWS\system32\sol.exe
The hash is 373e7a863a1a345c60edb9e20ec32311.
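If md5deep is not at hand, the same fingerprint can be computed with a few lines of Python. The following is a minimal sketch using the standard hashlib module; the function name file_hashes and the 64 KB chunk size are choices made for this illustration, not part of any tool mentioned above.

```python
import hashlib
import sys


def file_hashes(path, chunk_size=65536):
    """Return (md5_hex, sha1_hex) for the file at `path`.

    The file is read in chunks so that large samples do not need
    to fit in memory all at once.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


if __name__ == "__main__":
    # Usage: python hashes.py <file> [<file> ...]
    for path in sys.argv[1:]:
        md5_hex, sha1_hex = file_hashes(path)
        print(f"{md5_hex}  {sha1_hex}  {path}")
```

Computing both digests in one pass is convenient because some sources index samples by MD5 and others by SHA-1.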
The GUI-based WinMD5 calculator, shown in Figure 1-1, can calculate
and display hashes for several files at a time.
Once you have a unique hash for a piece of malware, you can use it as
follows:
Use the hash as a label.
Share that hash with other analysts to help them to identify malware.
Search for that hash online to see if the file has already been identified.
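The first and third uses listed above can be sketched in Python. This is only an illustration: the KNOWN_HASHES table, the "sample_" label prefix, and the function names are hypothetical, and the single table entry is the MD5 of empty input, used purely as a placeholder rather than as real malware data.

```python
import hashlib

# Hypothetical local table mapping MD5 hashes to analyst notes.
# The only entry is the MD5 of zero bytes of input (a placeholder).
KNOWN_HASHES = {
    "d41d8cd98f00b204e9800998ecf8427e": "empty file (placeholder entry)",
}


def label_for(data: bytes) -> str:
    """Build a label of the form 'sample_<md5>' for the sample bytes."""
    return "sample_" + hashlib.md5(data).hexdigest()


def lookup(data: bytes, known=KNOWN_HASHES):
    """Return the note recorded for this sample's MD5, or None if unseen."""
    return known.get(hashlib.md5(data).hexdigest())
```

In practice the lookup would be made against an online service or a shared database rather than a local dictionary; the dictionary simply stands in for that source here.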
Finding Strings
A string in a program is a sequence of characters such as “the.” A program
contains strings if it prints a message, connects to a URL, or copies a file to a
specific location.
Searching through the strings can be a simple way to get hints about
the functionality of a program. For example, if the program accesses a URL,
then you will see that URL stored as a string in the program. You can use
the Strings program (http://bit.ly/ic4plL) to search an executable for
strings, which are typically stored in either ASCII or Unicode format.
NOTE Microsoft uses the term wide character string to describe its implementation of
Unicode strings, which varies slightly from the Unicode standards. Throughout this book,
when we refer to Unicode, we are referring to the Microsoft implementation.
Both ASCII and Unicode formats store characters in sequences that end
with a NULL terminator to indicate that the string is complete. ASCII strings
use 1 byte per character, and Unicode uses 2 bytes per character.
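Given the two storage formats just described, a simple string scanner can be sketched in Python. This is not how the Strings program itself is implemented; it is an approximation that looks for runs of printable characters, with the function name find_strings and the default minimum length of 4 chosen arbitrarily for the illustration. Unlike the description above, it does not insist that a run be followed by a NULL terminator.

```python
import re


def find_strings(data: bytes, min_len: int = 4):
    """Return (ascii_strings, wide_strings) found in `data`.

    ASCII: runs of printable bytes (0x20-0x7E), 1 byte per character.
    Wide:  runs of printable bytes each followed by 0x00, which is how
           2-byte little-endian characters look for plain English text.
    """
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    ascii_found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    wide_found = [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return ascii_found, wide_found


if __name__ == "__main__":
    import sys

    # Usage: python find_strings.py <file>
    with open(sys.argv[1], "rb") as f:
        ascii_found, wide_found = find_strings(f.read())
    for s in ascii_found + wide_found:
        print(s)
```

Raising min_len reduces noise from random byte sequences that happen to look printable, at the cost of missing short strings.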
Figure 1-2 shows the string BAD stored as ASCII. The ASCII string is stored
as the bytes 0x42, 0x41, 0x44, and 0x00, where 0x42 is the ASCII
representation of a capital letter B, 0x41 represents the letter A, and so
on. The 0x00 at the end is the NULL terminator.
Figure 1-2: ASCII representation of the string BAD
Figure 1-3 shows the string BAD stored as Unicode. The Unicode string is
stored as the bytes 0x42, 0x00, 0x41, and so on. A capital B is represented by
the bytes 0x42 and 0x00, and the NULL terminator is two 0x00 bytes in a row.
ASCII    B    A    D    NULL terminator
         42   41   44   00
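The two byte layouts for BAD can be reproduced with Python's built-in encoders. This sketch assumes that the 2-byte format discussed here corresponds to Python's "utf-16-le" codec, which holds for plain ASCII-range characters; the helper name hex_bytes is made up for the example, and the NULL terminators are appended by hand because Python strings do not carry them.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


text = "BAD"

# 1 byte per character, plus a single 0x00 terminator.
ascii_bytes = text.encode("ascii") + b"\x00"

# 2 bytes per character (low byte first), plus a two-byte terminator.
wide_bytes = text.encode("utf-16-le") + b"\x00\x00"

print("ASCII:", hex_bytes(ascii_bytes))  # ASCII: 42 41 44 00
print("Wide: ", hex_bytes(wide_bytes))   # Wide:  42 00 41 00 44 00 00 00
```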