Server Farmin': Technology: Forensic Analysis of a Ransomware Attack

On 5/11/2019, I received a fairly routine alert from a developer: "Hey Louis, the deployment server isn't working". At the time, our deployment server hosted Jenkins (a tool used to build code) and Octopus Deploy (a tool used to send the code out into the real world). Both tools are fairly stable but have occasional maintenance needs, so I opened the server through a Remote Desktop Connection to see why they were broken. What I found was a sophisticated ransomware attack: all of the non-operating-system files were encrypted, and every directory had a copy of these instructions:

---= GANDCRAB V5.2 =---

***********************UNDER NO CIRCUMSTANCES DO NOT DELETE THIS FILE, UNTIL ALL YOUR DATA IS RECOVERED***********************
*****FAILING TO DO SO, WILL RESULT IN YOUR SYSTEM CORRUPTION, IF THERE ARE DECRYPTION ERRORS*****
Attention!

All your files, documents, photos, databases and other important files are encrypted and have the extension: .DEVPN

The only method of recovering files is to purchase an unique private key. Only we can give you this key and only we can recover your files.
The server with your key is in a closed network TOR. You can get there by the following ways:
----------------------------------------------------------------------------------------
| 0. Download Tor browser - https://www.torproject.org/

| 1. Install Tor browser

| 2. Open Tor Browser

| 3. Open link in TOR browser: http://gandcrabmfe6mnef.onion/8ff5caefabf9673

| 4. Follow the instructions on this page

----------------------------------------------------------------------------------------

Over the next 12 hours, I traced the entry points, pinpointed the security vulnerability that allowed the attack in the first place, and worked with our sysadmin to permanently fix the security issue; since May 2019, this attack has not been repeated. I'm pretty proud of this work: forensic analysis isn't my area of expertise, so this was outside my normal scope of responsibilities. The following is the writeup I put together at the time of the attack.

Attack Timeline:

DEVPN-MANUAL files (the ransom instructions) were added to every directory. The oldest such file is in the top-level C: directory and was created on 5/11/2019 at 4:19 AM UTC (5:19 AM server time). This appears to be the first file created by the GandCrab ransomware. Circumstantial evidence in the logs points to this time as the likely start of encryption:

5:19:04 (server time) - SQL services killed unexpectedly

5:19:04 (server time) - Logoff from a service (logon type 5 in the event viewer) for the OAFDEPLOY\OneAcreFundAdmin account. Most likely the Jenkins server restarting.
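The "oldest ransom note" check above can be repeated on any affected volume with a short script. This is a minimal sketch rather than the tool used during the investigation: the function name `oldest_ransom_note` is made up for illustration, and matching on the `DEVPN-MANUAL` prefix is an assumption, since the writeup does not give the full file name. Creation time is only meaningful on platforms that record it (Windows, macOS); elsewhere the script falls back to `st_ctime`, which is the last metadata change.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def oldest_ransom_note(root, prefix="DEVPN-MANUAL"):
    """Return (path, creation time in UTC) for the oldest matching file, or None."""
    oldest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Assumption: every ransom note starts with this prefix.
            if not name.upper().startswith(prefix.upper()):
                continue
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # unreadable file, skip it
            # Prefer the real creation time when the platform exposes it.
            stamp = getattr(info, "st_birthtime", info.st_ctime)
            if oldest is None or stamp < oldest[1]:
                oldest = (path, stamp)
    if oldest is None:
        return None
    return oldest[0], datetime.fromtimestamp(oldest[1], tz=timezone.utc)
```

Reporting the result in UTC avoids the one-hour confusion between UTC and server time noted above; convert to server time only when lining the result up against the event viewer.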

Possible Causes:

Dictionary attack brute forced the passwords to the deployment server

All servers in the OAF network are under constant dictionary attacks, but the passwords are long enough that these probably won’t work.
It is unlikely that a dictionary attack succeeded: the password used for the “OneAcreFundAdmin” account that initiated the attack was long and non-standard.

Keylogger captured passwords for Deployment Server

Possible, but strange that only the deployment server was affected, since everyone who has access to the deployment server also regularly uses other servers

Vulnerability in Jenkins software allowed server access

Most likely attack vector, see detail below

Vulnerability in Octopus Deploy allowed server access

Possible Attack Vector, but requires known/guessed credentials

Man-in-the-middle attack captured part of an RDP session.

Possible, but again strange that only the deployment server was affected

Login History:

Windows Event Viewer logs tend to be cluttered with many login/logoff records from system events, not all of which correspond to real logins. Real remote desktop logins are logon type 10 (interactive remote session). No logins of type 10 can be found in the deployment server logs from the start of the log (4/28/2019) up until my logins (5/13/2019). No remote desktop sessions were initiated on the date of the encryption attack. This would indicate that the attacker did not get in through a successful dictionary attack and that a different attack surface was used.

Note: It may be possible to modify the login history; it’s not out of the question that an advanced attack could have covered its own tracks.
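The login filter described above is easy to express in code once the security log has been exported. The sketch below assumes the export has already been parsed into simple records; the `LogonEvent` shape and the function name are hypothetical, and only the logon type codes (5 for a service, 10 for a remote interactive session) come from Windows itself.

```python
from dataclasses import dataclass
from datetime import datetime

# Windows logon type codes recorded with each logon event.
LOGON_TYPE_SERVICE = 5              # a service started by the system
LOGON_TYPE_REMOTE_INTERACTIVE = 10  # Remote Desktop session


@dataclass(frozen=True)
class LogonEvent:
    """One row of an exported security log (hypothetical shape)."""
    when: datetime
    account: str
    logon_type: int


def remote_desktop_logons(events, start, end):
    """Return type 10 logons with start <= when <= end, oldest first."""
    matches = [
        e for e in events
        if e.logon_type == LOGON_TYPE_REMOTE_INTERACTIVE and start <= e.when <= end
    ]
    return sorted(matches, key=lambda e: e.when)


# Illustrative rows only: the second row (time and account name) is made up.
sample = [
    LogonEvent(datetime(2019, 5, 11, 5, 19, 4), r"OAFDEPLOY\OneAcreFundAdmin", LOGON_TYPE_SERVICE),
    LogonEvent(datetime(2019, 5, 13, 9, 0, 0), r"OAFDEPLOY\example-user", LOGON_TYPE_REMOTE_INTERACTIVE),
]
on_attack_day = remote_desktop_logons(
    sample, datetime(2019, 5, 11), datetime(2019, 5, 11, 23, 59, 59)
)
print(len(on_attack_day))  # prints 0: no remote desktop logons on the day of the attack
```

An empty result only shows that no remote desktop session was recorded; as noted above, it cannot rule out a tampered log.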

Probable Cause:

On May 10, 2019, Trend Micro wrote about a Jenkins vulnerability that allowed arbitrary code execution. This is not a new phenomenon; other Jenkins exploits are known. Based on the execution logs, several requests came in to the following endpoint just before the attack:

https://78.46.206.200:8443/securityRealm/user/admin/descriptorByName/org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SecureGroovyScript/checkScript

Based on the source on GitHub, this endpoint appears to allow arbitrary code compilation within a sandbox. Searching for that endpoint brought up a sketchy vulnerability marketplace for just this kind of attack.

Note: at the time of the attack, our Jenkins server was on v2.118 and the security module was on v1.40. In January, v1.50 of the security module was released with a warning that the version we were running allowed code to exit the sandbox and perform arbitrary code execution on the Jenkins server.
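A version comparison like the one in the note above is worth automating so that an out-of-date plugin is flagged before an attacker finds it. The following is a small sketch under two assumptions: version strings are purely numeric and dot-separated (no suffixes such as release-candidate tags), and v1.50 is treated as the minimum acceptable version only because that is the release named above, not because it is known to be the final fix. The function names are made up for illustration.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.40' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_older_than(installed, minimum):
    """True when the installed version sorts before the minimum version."""
    return parse_version(installed) < parse_version(minimum)


print(is_older_than("1.40", "1.50"))  # prints True
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.9" sorts after "1.50", while as versions it is the older release.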

Evidence Supporting the Jenkins Attack Theory:

No remote desktop sessions logged in on the 11th. This suggests another attack vector was used.

No other servers were encrypted, just the deployment server. This suggests something about the deployment server made it especially vulnerable.

Jenkins error logs were not encrypted by the attack (the ransomware is smart enough not to encrypt system files; the attackers want your money, not a fully bricked system). Error logs from the server show hits on the SecureGroovyScript endpoint just before the attack as well as afterward (see Log notes at end).

Backup files taken in October do not show any SecureGroovyScript endpoint messages in the error logs.

The Secure-Script plugin in question was last updated in January 2018 (v1.40); the backup files and current server setup show that this plugin has not changed. It’s possible that another plugin has changed and started to call the secure-script endpoints for an unknown reason.

The affected endpoint uses a dummy account (admin) that exists on our server but does not have admin privileges. However, the endpoint can be called without logging into the account (allowing anyone to POST arbitrary code without needing a password).

The vulnerability marketplace evidence shows exactly the endpoint that was hit and the version of code running at the time; our server would have had the vulnerability listed as “for sale”.

Lateral Attacks:

After gaining access to the Deployment server, malicious code had access to:

Accounts:

One Acre Fund QA Accounts

SSL Certificates:

Production server certificate (password protected, but with a weak password)

Possible:

Octopus deploy accounts

Jenkins Accounts

MSSQL accounts

Immediate Steps Taken:

Immediately on finding the server intrusion, I locked down all existing test/stage/CI/QA servers to only 1 new RDP account (with the goal of preventing lateral attacks). No changes have been made to the affected deployment server.

I found my most recent server backup from October 2018. It will be a little out of date but not terribly so (possibly 50-100 lines of code were added between October and now).

Next Steps:

Attempt to confirm the Jenkins theory. Verify that no other servers were affected by lateral attacks. Verify that other servers do not show a history of suspicious logins and/or accurate dictionary attacks using real accounts (1-2 days of forensic analysis). If I’m wrong about the Jenkins theory, then some other attack vector was used and is probably still open.

Leave the deployment server as-is for now; system changes may fire off additional actions on it. Leave QA/Test servers locked with only 1 RDP account for now.

Set up a new deployment server with Jenkins/Octopus in Docker containers and VPN-only access to prevent Jenkins vulnerabilities from spreading outside Jenkins. This will probably take 2-4 days and may require some duplicate work recreating the Angular deployment configurations that were not present in October.

Move Test/Stage/QA servers inside VPNs and unlock them, changing their passwords/accounts. This will take several days (3-4 days).

Retire the existing deployment server and migrate the DNS records for it.

Change the production server certificates ASAP.

Error Logs/Relevant Data:

Error Log #1: Jenkins Error Log from Encrypted Server (showing curl attempt on 5/12). Appears to show attempted arbitrary code execution from a POST request. Note that this would have been a day after the ransomware attack (potentially multiple hijack attempts).

May 12, 2019 4:25:19 AM org.eclipse.jetty.server.handler.ContextHandler$Context log

WARNING: Error while serving https://78.46.206.200:8443/securityRealm/user/admin/descriptorByName/org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SecureGroovyScript/checkScript
java.lang.reflect.InvocationTargetException
at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:347)
at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:184)
...
Caused by: groovy.lang.GroovyRuntimeException: Failed to create Script instance for class: class x. Reason: java.io.IOException: Cannot run program "curl": CreateProcess error=2, The system cannot find the file specified
at org.codehaus.groovy.runtime.InvokerHelper.createScript(InvokerHelper.java:466)

Error Log #2: Jenkins error log data from just before the attack (on the encrypted server). Seems to show an attempt to run a script without defining a script body. Possibly an automated probe to determine whether our server had the affected version of Jenkins.

May 11, 2019 5:17:38 AM org.eclipse.jetty.server.handler.ContextHandler$Context log
WARNING: Error while serving http://78.46.206.200:8080/securityRealm/user/admin/descriptorByName/org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SecureGroovyScript/checkScript
java.lang.reflect.InvocationTargetException
at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:347)
...
Caused by: java.lang.IllegalArgumentException: Script text to compile cannot be null!
at groovy.lang.GroovyClassLoader.validate(GroovyClassLoader.java:315)
at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:275)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:268)
at groovy.lang.GroovyShell.parseClass(GroovyShell.java:688)

Error Log #3: Jenkins error log from October 2018 (last backup before the attack). No instances of the phrase “Script text to compile cannot be null!” or “org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SecureGroovyScript” could be found in the old logs.
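The searches behind these three log notes can be reproduced with a short script that pairs each hit on the endpoint with the timestamp of its log record. This is a sketch, not the exact procedure used during the investigation: it assumes every record starts with a timestamp line in the format shown in the excerpts above (for example "May 11, 2019 5:17:38 AM"), that month names are in English, and the function name `find_endpoint_hits` is made up for illustration.

```python
import re
from datetime import datetime

# A record header starts with a timestamp such as "May 11, 2019 5:17:38 AM".
HEADER = re.compile(r"^([A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) ")
ENDPOINT = "SecureGroovyScript/checkScript"


def find_endpoint_hits(lines, needle=ENDPOINT):
    """Return (timestamp, line) for the first needle match in each log record."""
    hits = []
    current = None    # timestamp of the record being read
    counted = False   # whether this record already produced a hit
    for line in lines:
        match = HEADER.match(line)
        if match:
            current = datetime.strptime(match.group(1), "%b %d, %Y %I:%M:%S %p")
            counted = False
            continue
        if needle in line and current is not None and not counted:
            hits.append((current, line.strip()))
            counted = True
    return hits


sample = [
    "May 11, 2019 5:17:38 AM org.eclipse.jetty.server.handler.ContextHandler$Context log",
    "WARNING: Error while serving http://78.46.206.200:8080/securityRealm/user/admin/"
    "descriptorByName/org.jenkinsci.plugins.scriptsecurity.sandbox.groovy."
    "SecureGroovyScript/checkScript",
    "java.lang.reflect.InvocationTargetException",
]
for when, _line in find_endpoint_hits(sample):
    print(when.isoformat())  # prints 2019-05-11T05:17:38
```

Running the same function over the October 2018 backup logs and getting an empty list would match the observation in Error Log #3.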

Retrospective:

As of August 2020, we have yet to see this attack repeated so the Jenkins theory seems most plausible. Thankfully there was no evidence of lateral attacks and the server in question held no customer data. This is a valuable lesson about updates and open-source software - missing a security update can leave you open to a wide variety of vulnerabilities. In general I like open-source software, but this was a harsh lesson on the dangers of trusting open-source code; we now use VPNs for almost all server security except cases where code is explicitly meant to be publicly accessible.

Monday, August 17, 2020