Domain generation algorithm

Domain generation algorithms (DGAs) are algorithms seen in various families of malware that periodically generate a large number of domain names that can be used as rendezvous points with their command and control servers. The large number of potential rendezvous points makes it difficult for law enforcement to effectively shut down botnets, since infected computers will attempt to contact some of these domain names every day to receive updates or commands. The use of public-key cryptography in malware code makes it infeasible for law enforcement and other actors to mimic commands from the malware controllers, as some worms will automatically reject any updates not signed by the malware controllers.

For example, an infected computer could create thousands of domain names such as www.<gibberish>.com and attempt to contact a portion of them in order to receive an update or commands.

Embedding the DGA in the unobfuscated malware binary, instead of a list of domains previously generated by the command and control servers, protects against a strings dump: such a list could be extracted from the binary and fed preemptively into a network blacklisting appliance to restrict outbound communication from infected hosts within an enterprise.

The technique was popularized by the worm family Conficker.a and .b, which at first generated 250 domain names per day. Starting with Conficker.C, the malware would generate 50,000 domain names every day, of which it would attempt to contact 500, giving an infected machine a 1% chance (500 out of 50,000) of being updated every day if the malware controllers registered only one domain per day. To prevent infected computers from updating their malware, law enforcement would have needed to pre-register 50,000 new domain names every day. From the botnet owner's point of view, it is enough to register one or a few of the domains that each bot queries every day.
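The 1% figure above can be reproduced with a short calculation. The following is a minimal sketch, assuming the bot contacts its domains by drawing them uniformly at random, without repetition, from the generated set; the function name and parameters are illustrative and are not taken from Conficker itself.

```python
def daily_update_probability(generated: int, contacted: int, registered: int) -> float:
    """Chance that at least one contacted domain is a registered one.

    Assumption of this sketch: the bot contacts `contacted` distinct domains
    chosen uniformly at random from the `generated` candidates.
    """
    if not 0 <= contacted <= generated or not 0 <= registered <= generated:
        raise ValueError("counts must lie between 0 and `generated`")
    miss = 1.0
    # Probability that every registered domain falls outside the contacted set.
    for i in range(registered):
        miss *= max(generated - contacted - i, 0) / (generated - i)
    return 1.0 - miss


# Conficker.C figures from the text: 50,000 generated, 500 contacted, 1 registered.
print(f"{daily_update_probability(50_000, 500, 1):.2%}")  # 1.00%
```

Under the same assumption, registering more domains per day raises the chance roughly in proportion, which is why a controller needs only a handful of registrations while a defender would need to cover the whole set.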

The technique has since been adopted by other malware authors. According to network security firm Damballa, the five most prevalent DGA-based crimeware families as of 2011 were Conficker, Murofet, BankPatch, Bonnana and Bobax.[1]

A DGA can also combine words from a dictionary to generate domains. These dictionaries can be hard-coded in the malware or taken from a publicly accessible source.[2] Domains generated by a dictionary DGA tend to be more difficult to detect because they resemble legitimate domains.
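To illustrate why dictionary-based output looks more like ordinary domains, here is a minimal sketch of the idea. The word list, the SHA-256 seeding scheme and the function name are all hypothetical choices made for this example; they do not reproduce any particular malware family.

```python
import hashlib
from datetime import date

# Hypothetical word list for illustration only; not taken from any real malware.
WORDS = ["cloud", "garden", "silver", "market", "river", "photo", "travel", "music"]


def dictionary_domain(day: date, index: int, words=WORDS, tld: str = ".com") -> str:
    """Derive one word-based domain from a date and a counter (illustrative scheme)."""
    # Hash the date and counter so the same inputs always give the same domain.
    seed = f"{day.isoformat()}:{index}".encode()
    digest = hashlib.sha256(seed).digest()
    # Use the first two digest bytes to pick two words from the list.
    first = words[digest[0] % len(words)]
    second = words[digest[1] % len(words)]
    return first + second + tld


# Print a few candidates for one day.
for i in range(3):
    print(dictionary_domain(date(2014, 1, 7), i))
```

Each result is a concatenation of two common words, so it lacks the random-looking character patterns that make the output of character-based generators stand out.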

Example

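def generate_domain(year: int, month: int, day: int) -> str:
    """Generate a domain name for the given date."""
    domain = ""

    # Produce 16 characters, one per round.
    for _ in range(16):
        # Scramble each date component with shifts, masks and XORs;
        # the scrambled values carry over into the next round.
        year = ((year ^ 8 * year) >> 11) ^ ((year & 0xFFFFFFF0) << 17)
        month = ((month ^ 4 * month) >> 25) ^ 16 * (month & 0xFFFFFFF8)
        day = ((day ^ (day << 13)) >> 19) ^ ((day & 0xFFFFFFFE) << 12)
        # Map the combined state to a lowercase letter in the range 'a' to 'y'.
        domain += chr(((year ^ month ^ day) % 25) + 97)

    return domain + ".com"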

For example, on January 7, 2014, this method would generate the domain name intgmxdeadnxuyla.com, while the following day, it would return axwscwsslmiagfah.com. This simple example was in fact used by malware like CryptoLocker, before it switched to a more sophisticated variant.

Detection

DGA domain names[3] can be blocked using blacklists, but the coverage of these blacklists is either poor (public blacklists) or wildly inconsistent (commercial vendor blacklists).[4] Detection techniques fall into two main classes: reactionary and real-time. Reactionary detection relies on unsupervised clustering techniques and contextual information such as network NXDOMAIN responses,[5] WHOIS information,[6] and passive DNS[7] to assess domain name legitimacy. Recent attempts at detecting DGA domain names with deep learning techniques have reported F1 scores of over 99%.[8] These deep learning methods typically use LSTM and CNN architectures,[9] though deep word embeddings have shown great promise for detecting dictionary DGAs.[10] However, these deep learning approaches can be vulnerable to adversarial techniques.[11][12]
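The role of NXDOMAIN responses can be illustrated with a much simpler heuristic than the clustering methods cited above: a host running a DGA queries many unregistered names, so it tends to accumulate distinct failed lookups. The sketch below assumes a hypothetical log layout of (client, queried name, response code) tuples and an arbitrary threshold; it is not the method of any cited paper.

```python
from collections import defaultdict
from typing import Iterable


def hosts_with_many_nxdomains(
    records: Iterable[tuple[str, str, str]], threshold: int = 3
) -> dict[str, int]:
    """Return clients whose count of distinct NXDOMAIN names reaches the threshold.

    Each record is (client_ip, queried_name, response_code); this layout is
    a hypothetical one chosen for the example.
    """
    failed: dict[str, set[str]] = defaultdict(set)
    for client, name, rcode in records:
        if rcode == "NXDOMAIN":
            # Normalise case and any trailing dot so repeats are counted once.
            failed[client].add(name.lower().rstrip("."))
    return {c: len(names) for c, names in failed.items() if len(names) >= threshold}


# Made-up sample log: one host with three failed lookups, one with a single typo.
sample = [
    ("10.0.0.5", "intgmxdeadnxuyla.com", "NXDOMAIN"),
    ("10.0.0.5", "axwscwsslmiagfah.com", "NXDOMAIN"),
    ("10.0.0.5", "qkfjzpwoxhrtbnma.com", "NXDOMAIN"),
    ("10.0.0.9", "example.com", "NOERROR"),
    ("10.0.0.9", "exmaple.com", "NXDOMAIN"),
]
print(hosts_with_many_nxdomains(sample))  # {'10.0.0.5': 3}
```

A count like this only surfaces candidates for inspection; occasional typos and misconfigured software also produce NXDOMAIN responses, which is one reason the published approaches combine it with other context.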

References

  1. "Top-5 Most Prevalent DGA-based Crimeware Families" (PDF). Damballa. p. 4. Archived from the original (PDF) on 2016-04-03.
  2. Plohmann, Daniel; Yakdan, Khaled; Klatt, Michael; Bader, Johannes; Gerhards-Padilla, Elmar (2016). "A Comprehensive Measurement Study of Domain Generating Malware" (PDF). 25th USENIX Security Symposium: 263–278.
  3. Chowdhury, Shateel A. (30 August 2019). "Domain Generation Algorithm – DGA in Malware".
  4. Kührer, Marc; Rossow, Christian; Holz, Thorsten (2014), Stavrou, Angelos; Bos, Herbert; Portokalidis, Georgios (eds.), "Paint It Black: Evaluating the Effectiveness of Malware Blacklists" (PDF), Research in Attacks, Intrusions and Defenses, vol. 8688, Springer International Publishing, pp. 1–21, doi:10.1007/978-3-319-11379-1_1, ISBN 9783319113784, retrieved 2019-03-15
  5. Antonakakis, Manos; et al. (2012). "From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware". 21st USENIX Security Symposium: 491–506.
  6. Curtin, Ryan; Gardner, Andrew; Grzonkowski, Slawomir; Kleymenov, Alexey; Mosquera, Alejandro (2018). "Detecting DGA domains with recurrent neural networks and side information". arXiv:1810.02023 [cs.CR].
  7. Pereira, Mayana; Coleman, Shaun; Yu, Bin; De Cock, Martine; Nascimento, Anderson (2018), "Dictionary Extraction and Detection of Algorithmically Generated Domain Names in Passive DNS Traffic" (PDF), Research in Attacks, Intrusions, and Defenses, Lecture Notes in Computer Science, vol. 11050, Springer International Publishing, pp. 295–314, doi:10.1007/978-3-030-00470-5_14, ISBN 978-3-030-00469-9, retrieved 2019-03-15
  8. Woodbridge, Jonathan; Anderson, Hyrum; Ahuja, Anjum; Grant, Daniel (2016). "Predicting Domain Generation Algorithms with Long Short-Term Memory Networks". arXiv:1611.00791 [cs.CR].
  9. Yu, Bin; Pan, Jie; Hu, Jiaming; Nascimento, Anderson; De Cock, Martine (2018). "Character Level based Detection of DGA Domain Names" (PDF). 2018 International Joint Conference on Neural Networks (IJCNN). Rio de Janeiro: IEEE. pp. 1–8. doi:10.1109/IJCNN.2018.8489147. ISBN 978-1-5090-6014-6. S2CID 52398612.
  10. Koh, Joewie J.; Rhodes, Barton (2018). "Inline Detection of Domain Generation Algorithms with Context-Sensitive Word Embeddings". 2018 IEEE International Conference on Big Data (Big Data). Seattle, WA, USA: IEEE. pp. 2966–2971. arXiv:1811.08705. doi:10.1109/BigData.2018.8622066. ISBN 978-1-5386-5035-6. S2CID 53793204.
  11. Anderson, Hyrum; Woodbridge, Jonathan; Filar, Bobby (2016). "DeepDGA: Adversarially-Tuned Domain Generation and Detection". arXiv:1610.01969 [cs.CR].
  12. Sidi, Lior; Nadler, Asaf; Shabtai, Asaf (2019). "MaskDGA: A Black-box Evasion Technique Against DGA Classifiers and Adversarial Defenses". arXiv:1902.08909 [cs.CR].