Windows Malware Binaries in C/C++ Github Repositories: Prevalence and Lessons Learned
Abstract: Does malware lurking in GitHub pose a threat? GitHub is the most popular open source software website,
having 188 million repositories. GitHub hosts malware-related projects for research and educational purposes
and has also been used by malware to attack users. In this paper, we explore the prevalence of unencrypted,
uncompressed binary code malware in Microsoft Windows compatible C and C++ GitHub repositories and
characterize the threat. We mined 1,835 repositories for already-compiled malicious files and data suggesting
whether the repository is malware-related. We focused on these repositories because Windows is frequently
targeted by malware written in C or C++. These repositories are good resources for attackers and could target
Windows users. We extracted all Portable Executable (PE) files from all commits and queried the malware
resource VirusTotal for analysis from its 76 anti-virus engines. Of the 24,395 files, 4,335 are suspicious, with
at least one detection; 440 could be considered malicious, with at least seven detections. We identify topic tags
suggesting malware or offensive security content, to differentiate from seemingly benign repositories. 197 of
440 malicious executables were in 27 ostensibly benign repositories. This work illustrates risks in source code
repositories and lessons learned in relating GitHub and VirusTotal data.
Cholter, W., Elder, M. and Stalick, A.
Windows Malware Binaries in C/C++ GitHub Repositories: Prevalence and Lessons Learned.
DOI: 10.5220/0010237904750484
In Proceedings of the 7th International Conference on Information Systems Security and Privacy (ICISSP 2021), pages 475-484
ISBN: 978-989-758-491-6
Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
ICISSP 2021 - 7th International Conference on Information Systems Security and Privacy
To determine whether a file might be malicious, we searched the VirusTotal malware information service that aggregates the detection results of 76 anti-virus (AV) products (VirusTotal, 2020b). Any registered user can submit a sample to VirusTotal for analysis. The detection results and other file information are available to anyone for subsequent query, by submitting a cryptographic hash of the file. VirusTotal’s Application Programming Interface (API) includes rescan requests for results from the most up-to-date AV products and much threat intelligence data related to malware (VirusTotal, 2020a).

The contribution of this paper is a methodology for investigating the presence of malware over all the commits in the lifetime of a GitHub repository. While it is straightforward to clone a repository to a specific point in time - e.g., the current head state or some arbitrary branch in the past - our approach investigates all of the commits throughout the history of the repository to identify files for analysis. We use the well-established method of VirusTotal anti-virus engine results to assess maliciousness of a particular file type (Windows portable executable binaries), and we apply our methodology to a subset of GitHub repositories (Windows C and C++ repositories) in this preliminary investigation. However, this methodology could be applied to additional populations of GitHub repositories, identifying other file types of interest through the repository lifetimes, and using other malware analysis methods.

In this paper, we present our preliminary investigation into the presence of malware files in Windows C/C++ GitHub repositories. Section 2 provides background on GitHub and related work in VirusTotal malware research. We describe our approach to mine Windows binary files from GitHub and then query VirusTotal for malware detection results in Section 3. Section 4 presents our initial VirusTotal analysis results for the Windows files that we mined from our GitHub repositories of interest. Section 5 provides a discussion and more detailed analysis of our results. We present our conclusions and directions for future research in Section 6.

2 BACKGROUND AND RELATED WORK

GitHub is known to host malware, both legitimately (i.e., in compliance with GitHub’s terms of use) and illegitimately. GitHub prohibits content that “contains or installs any active malware or exploits, or uses our platform for exploit delivery” (GitHub.com, 2020b). An example of GitHub hosting malware in violation of this policy occurred in March 2018, when cybercriminals uploaded cryptocurrency mining malware to forked GitHub projects and used phishing ads to download and execute the malware (Avast Threat Intelligence Team, 2018). More recently, 26 open source projects were discovered to have backdoors inserted by the Octopus malware, which used the build process to spread to other NetBeans projects (Munoz, 2020). GitHub appears to allow executable malware in curated malware collections. A search for “malware samples” returns over 250 repositories. Although many repository descriptions suggest analysis tools or malware-related resources, some explicitly indicate that they include malware samples.

In terms of detecting malware or malicious repositories in GitHub, only recently have two efforts systematically studied this problem. Recent work by Rokon et al. developed a methodology for finding malware source code within GitHub projects and identified 7,504 malware source repositories (Rokon et al., 2020). While the findings from this work can be used to search for malware binaries in GitHub as well, our work seeks to find malicious binaries in GitHub repositories that are not necessarily purporting to contain malware. Zhang et al. developed a deep neural network approach to detect malicious GitHub repositories using content-based features from source code files, investigating a population of blockchain and cryptocurrency repositories (Zhang et al., 2020). They used VirusTotal as part of their evaluation process for comparison purposes, ultimately labeling 1,492 repositories as malicious out of their population of 3,729 repositories, but again this work was more focused on malicious source code in GitHub.

Many previous research efforts have used VirusTotal to support malware detection and analysis in the domains of malware binaries run in dynamic analysis sandboxes (Graziano et al., 2015), signed malware binaries (Kim et al., 2018), and mobile applications (Hurier et al., 2017), (Pendlebury et al., 2019), (Salem et al., 2019), (Suciu et al., 2018), (Wang et al., 2019). VirusTotal can also be used for analysis of malicious web addresses, i.e., Uniform Resource Locators (URLs), such as those used in phishing campaigns (Peng et al., 2019). These research efforts and others each utilize VirusTotal in different ways, either using various thresholds for the number of VirusTotal engines needed to consider a sample as malicious (e.g., 1, 5, or 10), thresholds based on percentage of engines (e.g., 50%), or results from a subset of engines based on high reputation or market share. In short, there is little consensus on how to definitively interpret VirusTotal results to determine whether a sample is malicious.
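To make the lack of consensus concrete, the three families of interpretation named above (an absolute engine count, a percentage of engines, and a trusted subset of engines) can be sketched as follows. This is a minimal illustration, not a method taken from any of the cited works; the function names, the engine names, and the per-engine verdict dictionary are all hypothetical.

```python
from typing import Iterable, Mapping


def count_detections(verdicts: Mapping[str, bool]) -> int:
    """Number of engines that flagged the sample as malicious."""
    return sum(1 for flagged in verdicts.values() if flagged)


def malicious_by_count(verdicts: Mapping[str, bool], min_engines: int) -> bool:
    """Absolute threshold, e.g. 1, 5, or 10 engines."""
    return count_detections(verdicts) >= min_engines


def malicious_by_fraction(verdicts: Mapping[str, bool], min_fraction: float) -> bool:
    """Relative threshold, e.g. 0.5 for 50% of the responding engines."""
    if not verdicts:
        return False
    return count_detections(verdicts) / len(verdicts) >= min_fraction


def malicious_by_trusted(verdicts: Mapping[str, bool], trusted: Iterable[str]) -> bool:
    """Subset rule: only the verdicts of selected engines are considered."""
    return any(verdicts.get(engine, False) for engine in trusted)


# Hypothetical verdicts for one sample; the engine names are placeholders.
sample = {"EngineA": True, "EngineB": False, "EngineC": True, "EngineD": False}

print(malicious_by_count(sample, 1))             # True
print(malicious_by_count(sample, 5))             # False
print(malicious_by_fraction(sample, 0.5))        # True
print(malicious_by_trusted(sample, ["EngineB"])) # False
```

The same sample is labeled differently by each rule, which is why a study has to state its rule explicitly.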
Table 1: VirusTotal Detection Results - Suspicious Files, Previously Scanned and Unseen.
Binary Code Files         | # Samples | # Prior Hits | # Latest Hits
Previously scanned by VT  | 10,413    | 1,353        | 1,090
Previously unseen by VT   | 13,982    | N/A          | 3,245
Total                     | 24,395    | 1,353        | 4,335

Table 2: VirusTotal Detection Results - Malicious Files, Previously Scanned and Unseen.
Binary Code Files         | # Samples | # Prior Hits | # Latest Hits
Previously scanned by VT  | 10,413    | 226          | 240
Previously unseen by VT   | 13,982    | N/A          | 200
Total                     | 24,395    | 226          | 440
[...] detection results. We downloaded “latest” results from 24-December-2019 to 7-January-2020.

VirusTotal provides four core file-related AV request APIs for non-premium users: the most recent scan results of a file, the request to rescan a file, the results of the request to scan a file, and the results from a specific non-public request identifier. The commercial/premium API service also offers users the ability to query the list of non-public request identifiers, necessary to obtain results from arbitrary past requests.

3.3 Threats to Validity

VirusTotal introduces inherent variability of results that challenge reproducibility: the accuracy of any given AV engine scan; the variability of available engines in VirusTotal at any given time; the success of individual engines processing the sample in a VirusTotal-managed processing window; the results from specific engines over time; the opacity, consistency, and provenance of details in reports; and the ability to obtain the most recent results without obtaining a paid premium account. It is not controversial to say that a given AV engine scanning a given file at a given time may report false positive or false negative results. We do not consider that a threat to our experiment’s validity because of the well-understood caveats one may apply to an interpretation of AV results. In this research, the main threat is that data capture is not instantaneous and that the same file could garner different results at the beginning and end of a capture window.

We captured data in a two-week period, December 2019 – January 2020, to minimize the period of time that a change could have occurred. We provide results for any Windows binary that has at least one AV engine detection of “malicious”, which indicates that the sample is “suspicious.” We also report results using a threshold of seven AV engine detections of “malicious”, based on recommendations and interpretation of the recent Zhu paper. Given the finding that certain engines’ detection results are highly correlated, and the largest cluster consisted of six engines, a threshold of seven ensures that at least two independent engines are indicating “malicious.”

4 RESULTS

In this section we present the VirusTotal detection results for the Windows binaries extracted from our 1,835 GitHub repositories of interest. We built a data set of 24,395 unique binary code files, mining all commits from all 1,835 GitHub repositories of interest. A file was included if its MIME type was “executable.” (One 171 MB file was excluded because we were unable to upload it to VirusTotal.) The first subsection presents the results for the data set as a whole, and the second subsection provides results based on repository characteristics.

4.1 VirusTotal Results

Table 1 shows the results of VirusTotal scans for new and previously uploaded binary files when setting the threshold to at least one malicious detection, indicating that a file is “suspicious.” Of the 24,395 files, 10,413 had been submitted previously, indicated by “Previously scanned by VT”; 1,353 of those had at least one malicious detection at the time of prior analysis in VirusTotal, labeled “# prior hits.” When we requested reanalysis for these files, 1,090 files had at least one malicious engine detection, showing that detections decreased overall on rescan. Of the 13,982 files “Previously unseen by VT” that we uploaded for analysis, 3,245 had a malicious detection.

Table 2 shows the results of VirusTotal scans for new and previously uploaded binary files when at least seven engines provide a malicious detection, our threshold to determine that a file is “malicious.” Setting the detection threshold higher results in far fewer hits, of course: only 440 out of the 24,395 have at least seven AV engines indicating malicious detections. Of the 10,413 files previously scanned by VirusTotal, 226 previously exceeded our malicious detection threshold and 240 are currently deemed malicious in the latest results. Of the 13,982 files previously unseen by VirusTotal, 200 are deemed malicious in the latest results.

Both tables of VirusTotal detection results demonstrate the change in engine detections over time. To highlight these changes in more detail for the suspicious file results (i.e., those with at least one malicious detection), Table 3 shows that some previously benign-seeming files were considered suspicious, and vice versa, in the reports that we requested in the December 2019 – January 2020 timeframe. The overall decrease of 263 files (from 1,353 to 1,090) having at least one malicious detection is the net result of 289 files being detected as malicious that were not previously and 552 files previously being detected as malicious no longer having any AV engine detections.

Table 3: VirusTotal Detection Results - Suspicious Files, Previous Scan and Rescan Results.
Binary Code Files           | # Samples | Detected | Not Detected
Previously submitted to VT  | 10,413    | 1,353    | 9,060
Resubmitted to VT           | 10,413    | 1,090    | 9,323

Table 4 shows the relative change in results for the suspicious files. The substantial re-characterization of files as having detections vs. not having detections coincides with a relatively small number of initial positive results, with 1 to 3 AV engines previously indicating malicious. On the other hand, files only later getting malicious detections have a much larger range of 1 to 69 detecting engines.

Table 5 shows the breakdown of files within different categories of Windows executable binaries. The vast majority of binary code files are targeted to run on modern 32- or 64-bit Windows versions. There are also files targeting DOS and 16-bit Windows in the “Pre-Win32” category, which are ostensibly compatible with Windows. Finally, there are incompatible ELF and boot image files in the “Other” category (presumably misclassified by libmagic). As seen in the second column of Table 5, 4,280 Windows-compatible files were suspicious and 418 were malicious. Except for “Other” files, any standalone executable file poses an immediate risk to a repository user who runs it, while a dynamically linked library (DLL) on modern Windows poses a risk of incorporation into the repository’s build outputs or execution as a system service or code injected into a process on a build host. Table 5 shows that of the 4,280 suspicious files, 1,074 are DLLs and 3,206 are standalone executable files. For the 418 malicious files, 28 are DLLs and 390 are standalone executable files.

Table 6 presents the number of files in weighted bins by the number of engines indicating “malicious.” This shows the range of hits and the large proportion of samples with low hit counts.

The results above for all files represent the aggregate across all commits over the lifetime of the repository. For results at a single point in time, we also analyzed the files that were accessible from the head of the repository. A repository’s head commit (the files accessible after cloning and updates) represents a public view of the repository at the time of cloning and analysis. Across all 1,835 of our repositories of interest, there are 7,772 unique binary files in the head commits on 9-July-2019, of which 939 were suspicious with at least one AV detection in VirusTotal, and 204 were malicious with at least seven AV detections. 5,512 files were already analyzed by VirusTotal, while 2,260 had to be uploaded for analysis.

4.2 Repository-based Results

Of the 1,835 repositories queried, 593 repositories contain binary files. 314 have at least one suspicious binary file, which is a significant subset. 52 repositories have at least one malicious binary with seven or more VirusTotal AV engine detections.

We examined the concentration of suspicious binaries across repositories, presented in Table 7. Of the 314 repositories having suspicious files, a majority, 182 repositories, have one (1) or two (2) suspicious files. Across the population, the mean file count is 7.03 and standard deviation is 20.67. Similarly, Table 8 presents the distribution of malicious file counts across the 52 repositories with malicious binaries and shows that most only have one or two.
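As a rough illustration of the mining step behind these results, the sketch below lists every blob stored in a local clone, regardless of which commit introduced it, and separates DLLs from standalone executables. It rests on several assumptions: the study identified files by MIME type (libmagic), whereas this sketch swaps in direct parsing of the PE header; the function names are hypothetical; and `git cat-file --batch-all-objects` also returns objects that are no longer reachable from any commit.

```python
import struct
import subprocess
from typing import Iterator, Optional

IMAGE_FILE_DLL = 0x2000  # COFF Characteristics flag that marks a DLL


def classify_pe(data: bytes) -> Optional[str]:
    """Return 'dll' or 'exe' for a PE image, or None if data is not PE.

    DOS-only and 16-bit executables start with 'MZ' but carry no PE
    signature, so they are reported as None by this simplified check.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if pe_offset + 24 > len(data) or data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return None
    # Characteristics is the last 2-byte field of the 20-byte COFF header.
    (characteristics,) = struct.unpack_from("<H", data, pe_offset + 22)
    return "dll" if characteristics & IMAGE_FILE_DLL else "exe"


def iter_blob_ids(repo_dir: str) -> Iterator[str]:
    """Yield the object id of every blob stored in the repository."""
    listing = subprocess.run(
        ["git", "-C", repo_dir, "cat-file", "--batch-all-objects", "--batch-check"],
        check=True, capture_output=True, text=True,
    ).stdout
    for line in listing.splitlines():
        object_id, object_type, _size = line.split()
        if object_type == "blob":
            yield object_id


def read_blob(repo_dir: str, object_id: str) -> bytes:
    """Read one blob straight from Git's object store (no working copy needed)."""
    return subprocess.run(
        ["git", "-C", repo_dir, "cat-file", "blob", object_id],
        check=True, capture_output=True,
    ).stdout


def count_pe_kinds(repo_dir: str) -> dict:
    """Count DLLs and standalone executables across the whole history."""
    counts = {"dll": 0, "exe": 0}
    for object_id in iter_blob_ids(repo_dir):
        kind = classify_pe(read_blob(repo_dir, object_id))
        if kind is not None:
            counts[kind] += 1
    return counts
```

Because blobs are content-addressed, a file that appears unchanged in many commits is counted once, which matches the notion of unique files. Spawning one `git` process per blob is slow; a long-running `git cat-file --batch` process would be the usual optimization.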
Table 9 shows the top ten repositories by number of suspicious files and the mean score of those detections. The second column of Table 9 provides the number of overall binary files in these repositories for additional context, indicating how prevalent binary files are in each of these repositories and the ratio of suspicious binaries.

To assess the stated purpose of each GitHub repository, we extracted the user-provided repository tags and found 1,802 unique tags across the 1,835 repositories. We classified 70 tags as potentially related to malware or other offensive security topics. Each author identified candidate tags, and those receiving a majority of votes were selected. Our malware-related tags have overlap with the Malware Attribution Enumeration and Characterization (MAEC) structured language for malware information sharing (The Mitre Corporation, 2017), allowing for fuzzy matching and semantic equivalence. It is important to note that MAEC is a prescriptive taxonomy for documenting security incidents and not a computer security natural language topic model. There are other efforts towards defining cyber security ontologies (Syed et al., 2016), which could contribute to a characterization of malware-related purposes. This is an area of future exploration.

Of the 314 repositories that contain at least one suspicious binary, only 50 have at least one malware or offensive security-related tag. This leaves 259 repositories with suspicious/malicious binaries where users might not expect that risk. Of the 52 repositories that contain at least one malicious binary, 25 have at least one malware or offensive security-related tag. The 27 repositories not tagged as being related to malware or offensive security contain 197 malicious binaries, representing risk to unsuspecting repository users.
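The tag-based partition described above amounts to a set intersection per repository. The following sketch shows one way to compute it, assuming the per-repository tag sets and malicious file counts have already been collected; the repository names, tags, and counts in the example are invented for illustration and are not data from this study.

```python
from typing import Dict, List, Set, Tuple


def split_by_tags(
    repo_tags: Dict[str, Set[str]],
    malicious_files: Dict[str, int],
    malware_tags: Set[str],
) -> Tuple[List[str], List[str], int]:
    """Partition repositories holding malicious binaries by their tags.

    Returns (tagged, untagged, exposed_files), where exposed_files is the
    number of malicious binaries sitting in repositories whose tags give
    no hint of malware or offensive security content.
    """
    tagged: List[str] = []
    untagged: List[str] = []
    for repo, file_count in malicious_files.items():
        if file_count <= 0:
            continue  # repository has no malicious binaries
        if repo_tags.get(repo, set()) & malware_tags:
            tagged.append(repo)
        else:
            untagged.append(repo)
    exposed_files = sum(malicious_files[repo] for repo in untagged)
    return sorted(tagged), sorted(untagged), exposed_files


# Hypothetical inputs: names, tags, and counts are placeholders.
malware_tags = {"malware", "rootkit", "exploit"}
repo_tags = {
    "alice/injector": {"windows", "rootkit"},
    "bob/image-tools": {"cpp", "imaging"},
    "carol/game-engine": set(),
}
malicious_files = {"alice/injector": 4, "bob/image-tools": 2, "carol/game-engine": 1}

tagged, untagged, exposed = split_by_tags(repo_tags, malicious_files, malware_tags)
print(tagged)    # ['alice/injector']
print(untagged)  # ['bob/image-tools', 'carol/game-engine']
print(exposed)   # 3
```

An exact string intersection is the simplest possible matching rule; the fuzzy matching and semantic equivalence mentioned above would require normalizing or mapping tags before this step.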
5 DISCUSSION

5.1 Risks Posed by Unhygienic Repositories

Without even considering the risk of malicious content, binary files in repositories should raise concerns. It is almost always a bad practice to store build outputs in any repository because they increase the repository size, are not amenable to editing or comparisons across versions, and may be accidentally updated when the repository is built, especially Windows PE files, which contain the build timestamp.

Including binaries, such as libraries, as build inputs or runtime dependencies violates the spirit of open source development. It may be unavoidable for a repository owner seeking to baseline specific build inputs while holding a software license that allows redistribution of binaries. In most cases, however, GitHub repository maintainers should provide pre-built software in GitHub release bundles, outside the Git repositories.

The virus research community has adopted safe handling procedures, including packaging malware in encrypted archives (Zeltser, 2020), and sharing samples only after vetting interested researchers. Repositories that violate these rules expose non-malware research environments. Indeed, when we cloned repositories from our Linux environment onto a Windows server, we set off over 100 alerts in our enterprise AV sensors, and that was only in the file system copies from the head branches. Many malicious binaries lie dormant and unscanned while they rest in Git’s custom storage formats, likely unsupported for scanning by AV engines.

Finally, build files such as Makefiles, .vcxproj files, and continuous integration orchestration files are essentially executable scripts, which pose the risk that building a project can compromise a system. Non-malware repository researchers would also benefit from safe handling, such as processing as much as possible on less-targeted OSs and with repositories that are bare or mirrors without local file copies.

5.2 Not All Windows Malware Is in PE Files

Malware comes in many forms. We looked for binary files, but these repositories may have malware in other formats, such as documents and scripts. It is worth noting that in scanning repository head commits, we identified 761 archive files (WinZip, 7-Zip, and RAR), 33 of which are or could be encrypted. Perhaps the 33 represent responsibly encrypted malware samples. There are other forms of malware that we could mine from GitHub repositories beyond Windows binaries, such as Linux malware, mobile malware, malicious scripts, and malicious PDF documents.

5.3 Git-related Observations

In the course of this research we used many interfaces to Git-related data. While not necessarily critical to this immediate work, our experience provides some insights for future researchers. Online APIs such as GitHub REST v3, GitHub GraphQL, GH Archive (gharchive.com, 2020), and Google BigQuery (Google Cloud, 2020a) are powerful for high-level data, but for compute-intensive file analysis, local execution may be the only option. While Git is the primary source for commit history and files, its data model is optimized for efficiency and extensibility of end-user file-based operations. The researcher is left to develop a new data model to manage the federation of Git and online APIs.

GitHub provides a rich online community and source of data, but does not provide direct temporal control over results comparable to the cryptographically stable Git commit log, which admittedly is coming under attack because of SHA-1’s emerging weaknesses to hash collisions. So, while it may be straightforward to time-box commits up to a certain date, finding the GitHub topic associations at that date requires forethought to query all GitHub information, sifting through events from the beginning of the repository to that point in time (or in reverse from the present time), or queries using third-party services such as GH Archive and Google BigQuery. GitHub’s quotas of 5,000 REST requests or 5,000 GraphQL points per hour (GitHub.com, 2020d) and BigQuery’s free API quota of 1 TB per month (Google Cloud, 2020b) require considerable planning and data acquisition design, and therefore we attempted to maximize local analysis with Git. Moreover, a local checkout of Git provides ground truth for what a developer would see from cloning the repository.

5.4 VirusTotal Observations

As previous research has shown (Zhu et al., 2020), (Pendlebury et al., 2019), (Peng et al., 2019), (Salem et al., 2019), VirusTotal engine data is subtle: results change based on when a query is run, and the non-premium API provides only the most recent results based on the time of the last requested scan, which could have been any arbitrary point in time in the past. It is possible that one or more engines within VirusTotal could provide a false positive detection for a file. VirusTotal’s AV engines change over time and the results from the engines can change based on AV engine implementation and signature updates. While it may be tempting to use VirusTotal as a form of oracle for malware detection, there is no universally accepted threshold for the number of AV engines in VirusTotal that “guarantees” a file is malicious.

There are at least three interesting points in the lifetime of a file analyzed by VirusTotal: (1) initial analysis at the time of first submission to VirusTotal; (2) “prior” analysis relative to the current experimentation time, which will be whenever the file last had an analysis requested; and (3) “latest” analysis, requested at the current experimentation time. Tables 1 and 2 previously presented the change in detection results between points (2) and (3). Table 10 illustrates with four example files that metrics based on these points in time can be inconsistent within a small time window, inconsistent across larger windows, or fabricated.

For example, before our rescan request (“Our Scan Results” in Table 10), the results for one “benign” binary named “curl.exe” (Example 1) were originally created when scanned on 20-September-2013, updated with scan results from 52 engines on 10-December-2015, and modified on 8-January-2019. Other dates in a report, such as first seen in the wild (the year 2097 in Example 4 in Table 10), and of course, the PE header timestamp have no assurance because they are subject to spoofing by the submitter or binary author (sometimes the same individual! (Zetter, 2014)).

Across all of our rescan requests started on 24-December-2019, we received results from 46 to 76 engines, with a mean of 73.4 engines and standard deviation of 1.21.

The VirusTotal terms of service do not allow sharing full reports that would reveal AV vendor capabilities. Therefore, experiments relying on precise scan details are not reproducible and the data cannot be broadly shared. One researcher could affect an unrelated researcher’s work by requesting a rescan at a non-deterministic time during overall data capture, a significant risk with a public API rate limited to four requests per minute and with a potential for three requests for a single sample. Indeed, the footprints of our queries are all over the data. It is also possible that the footprints from the authors’ IT department can be observed in the data, as the authors were contacted by them in the course of cloning repositories to explore build experimentation.

It is possible to get all scan history for a sample by purchasing the premium service, but those results indicate which scans were requested, not whether a given file might have been considered malicious at a particular point in time, if only someone had requested a scan at that time. For example, it is infeasible to perform a post-mortem of an attack by asking, “Could all of the files in an intrusion have been identified as malware on 1-June-2015?” Although VirusTotal adds a very different dimension of data to software repository research, it does not offer the temporal control required in many studies and experiments.
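The rate limit mentioned above explains why data capture cannot be instantaneous. The back-of-the-envelope sketch below estimates how long a capture takes at four requests per minute, under the assumptions of a single API key, continuous operation, and between one and three requests per sample. It is an illustrative bound, not the accounting used in this study, and the function name is hypothetical.

```python
def capture_days(samples: int, requests_per_sample: int,
                 requests_per_minute: int = 4) -> float:
    """Days needed to issue every request at a fixed per-minute rate limit."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    total_requests = samples * requests_per_sample
    minutes = total_requests / requests_per_minute
    return minutes / (60 * 24)


FILES = 24_395  # unique binary code files reported in Section 4

best_case = capture_days(FILES, requests_per_sample=1)
worst_case = capture_days(FILES, requests_per_sample=3)
print(f"{best_case:.1f} to {worst_case:.1f} days")  # 4.2 to 12.7 days
```

Under these assumptions the capture window spans days rather than hours, so a rescan requested by anyone else during that window can change what a later query returns for the same file.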
Munoz, A. (2020). The octopus scanner malware: Attacking the open source supply chain. https://securitylab.github.com/research/octopus-scanner-malware-open-source-supply-chain.

Pendlebury, F., Pierazzi, F., Jordaney, R., Kinder, J., and Cavallaro, L. (2019). Tesseract: Eliminating experimental bias in malware classification across space and time. In Proceedings of the 28th USENIX Conference on Security Symposium, SEC’19, page 729–746, USA. USENIX Association.

Peng, P., Yang, L., Song, L., and Wang, G. (2019). Opening the blackbox of virustotal: Analyzing online phishing scan engines. In Proceedings of the Internet Measurement Conference, IMC ’19, page 478–485, New York, NY, USA. Association for Computing Machinery.

Rokon, M. O. F., Islam, R., Darki, A., Papalexakis, V. E., and Faloutsos, M. (2020). Sourcefinder: Finding malware source-code from publicly available repositories.

Salem, A., Banescu, S., and Pretschner, A. (2019). Don’t pick the cherry: An evaluation methodology for android malware detection methods. CoRR, abs/1903.10560.

Suciu, O., Mărginean, R., Kaya, Y., Daumé, H., and Dumitraş, T. (2018). When does machine learning fail? generalized transferability for evasion and poisoning attacks. In Proceedings of the 27th USENIX Conference on Security Symposium, SEC’18, page 1299–1316, USA. USENIX Association.

Syed, Z., Padia, A., Finin, T., Mathews, L., and Joshi, A. (2016). Uco: A unified cybersecurity ontology. https://www.aaai.org/ocs/index.php/WS/AAAIW16/paper/view/12574.

The Mitre Corporation (2017). Maec 5.0 specification – vocabularies. http://maecproject.github.io/releases/5.0/MAEC Vocabularies Specification.pdf.

VirusTotal (2020a). Getting started. https://developers.virustotal.com/reference.

VirusTotal (2020b). How it works – virustotal. https://support.virustotal.com/hc/en-us/articles/115002126889-How-it-works.

Wang, H., Si, J., Li, H., and Guo, Y. (2019). Rmvdroid: Towards a reliable android malware dataset with app metadata. In Proceedings of the 16th International Conference on Mining Software Repositories, MSR ’19, page 404–408. IEEE Press.

ytisf (2020). Github - ytisf/thezoo: A repository of live malwares for your own joy and pleasure. thezoo is a project created to make the possibility of malware analysis open and available to the public. https://github.com/ytisf/theZoo.

Zeltser, L. (2020). How to share malware samples with other researchers. https://zeltser.com/share-malware-with-researchers/.

Zetter, K. (2014). A google site meant to protect you is helping hackers attack you. https://www.wired.com/2014/09/how-hackers-use-virustotal/.

Zhang, Y., Fan, Y., Hou, S., Ye, Y., Xiao, X., Li, P., Shi, C., Zhao, L., and Xu, S. (2020). Cyber-guided deep neural network for malicious repository detection in github. In 2020 IEEE International Conference on Knowledge Graph (ICKG), pages 458–465.

Zhu, S., Shi, J., Yang, L., Qin, B., Zhang, Z., Song, L., and Wang, G. (2020). Measuring and modeling the label dynamics of online anti-malware engines. In 29th USENIX Security Symposium (USENIX Security 20), pages 2361–2378. USENIX Association.