The Namecoin blockchain was created as an alternative to traditional DNS registrars, protected from censorship and forced domain seizure. In the past few years, botnet operators such as Dimnie, Shifu, RTM, and Gandcrab have begun to use it to manage the addresses of C&C servers.
On the one hand, the decentralization and stability of the blockchain prevent researchers and providers from removing such domains or taking control of them. On the other hand, the infrastructure based on the blockchain has an architectural feature: all changes in the network are publicly available and can be used to study and track the actions of attackers.
This paper presents the approach used to map botnets in Namecoin and track them later to extract new IOCs. Using the described approach, lists of assets (see the appendix) used by the botnets mentioned above were compiled.
Lyrical digression
Inventions that change the Internet often solve not only and not so much a technical problem as a social one. It is such technologies and services that allow the community to take a look at some axioms that seemed unshakable, rethink them, recreate from scratch, leaving only an idea and dropping the load of legacy conventions and limitations accumulated over the years. Blockchain and Bitcoin, Tor, Wikipedia - behind the success of each of them is a small group of enthusiasts with burning eyes, sincerely believing that they are making a better society.
Alas, others often follow them: people alien to the strange ideals of the Internet pioneers, but far more practical. They find an alternative application for the technology that its creators did not think about (or did not want to think about). Sitting on the border of what is permissible (and more often, frankly, well beyond it), this alternative application, often not without the help of the media, turns for the majority into the implied default, or even the only one.
Equating a technology as an idea with its most discussed use can lead society to reject the technology itself. Once its use is criminalized, an immature service can be pushed into a marginal subculture or destroyed completely. This happened long ago with Napster, not so long ago with BitTorrent and Tor, and it is happening right now with Bitcoin.
The hero of this work, Namecoin, has not escaped this fate. Namecoin is a blockchain designed to store arbitrary key-value pairs; its best-known application is Dot-Bit, a decentralized, censorship-resistant DNS name registration system.
Our interest in Namecoin grew after the group operating the RTM botnet started using Dot-Bit to manage their C&C servers. At some point we wondered: is it possible to detect new C&C servers immediately after they are registered in Dot-Bit? Tracking updates to already known domains posed no problem, but developing an approach that yields strong evidence linking new domains to a person of interest turned out to be an exciting research task, and this work is its result.
Namecoin has been studied before, and indicators of compromise have already been collected from Dot-Bit. The most detailed work is arguably an article by Kevin Perlow. He was the first to draw attention to the fundamental possibility of extracting data from Namecoin and described several heuristic techniques that allow an expert to find domains similar in their characteristics to the known C&C servers of a particular group.
The approach presented in this study differs significantly from the expert indexing and pivoting technique described by Kevin. The heuristic rules we developed for identifying domain owners are derived from the principles of the blockchain and of transaction formation in it and, in addition to a general description, are given as strict logical formulations. Together with a formal description of the traversal algorithm, this makes it possible to automate the search for IOCs, which significantly increases the effectiveness of an investigation. In addition, the algorithm not only helps to find other names once used by the group under study, but also makes it possible to track the creation of new domains controlled by the same person.
All work is divided into three chapters. The first chapter describes the basics of Bitcoin, the code of which was used as a platform for creating Namecoin. Many entities, relationships, and their implementations defined in Bitcoin have been inherited by Namecoin. Their understanding is critical for further discussion.
The second chapter is devoted directly to Namecoin and its main application - Dot-Bit.
The third chapter describes the proposed approach for extracting data from Namecoin and gives a formal description of the blockchain traversal algorithm and the heuristic rules used to establish relationships between domains.
The appendices contain IOCs collected using the described method for some botnets, as well as a list of references and repositories that will help researchers who want to continue working on this topic.
Bitcoin 201
Most of the information in this section is drawn from the series of articles “Bitcoin in a nutshell” by Sergey Pavlov_dog Potekhin. For Russian-speaking readers this is, in our opinion, the most comprehensive and in-depth publicly available source, yet it is surprisingly easy to read. We urge researchers interested in the internals of Bitcoin not to limit themselves to the excerpts given in this section, but to read the full articles, linked in the appendix. For everyone else, the information below is enough to understand the algorithm and the heuristic rules for finding relationships between addresses in Namecoin, given in the last chapter.
Although it is customary to start the story about the blockchain with blocks and the cryptography that connects them, we will start with transactions.
Transaction
As you know, the closest analogue of Bitcoin is an account book in which all coin transactions are recorded. Strangely enough, though, Bitcoin has no global table of the form <address, balance>, just as there is no chief accountant who would edit such a table.
Instead, the notorious blockchain is used: it simply stores all transactions. For simplicity, we can assume that these are messages of the form:
<address 1> sent <amount> BTC to <address 2>
So, by traversing the entire blockchain, you can calculate how many coins “belong” to a particular address.
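The traversal described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the ledger, its addresses and the "coinbase" marker are invented for the example, and real Bitcoin transactions are structured differently, as the next section shows.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical ledger of "<address 1> sent <amount> BTC to <address 2>" messages.
# "coinbase" is a made-up marker for newly created coins, not a real address.
LEDGER = [
    ("coinbase", Decimal("50"), "addr1"),
    ("addr1", Decimal("20"), "addr2"),
    ("addr2", Decimal("5"), "addr3"),
]


def balances(ledger):
    """Walk every message once and accumulate per-address totals."""
    totals = defaultdict(Decimal)
    for sender, amount, recipient in ledger:
        if sender != "coinbase":
            totals[sender] -= amount
        totals[recipient] += amount
    return dict(totals)


result = balances(LEDGER)
for address, amount in sorted(result.items()):
    print(address, amount)
```

Note that no balance is stored anywhere: it is always derived from the full history of messages.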
Inputs and outputs
A real transaction on a Bitcoin network is a bit more complicated than the one described above. This is a structure whose main components are inputs and outputs.
Inputs are the transactions you "refer" to. Imagine that three transactions were once sent to your address X:
TXN_ID: 123456, VALUE: 40 BTC
TXN_ID: 645379, VALUE: 10 BTC
TXN_ID: 888888, VALUE: 100 BTC
If you need to spend, for example, 45 BTC, you can refer to transaction 888888 or to two transactions at once: 123456 and 645379.
Outputs are, literally, the exits of a transaction. We can think of them as the “addresses” to which coins will be “sent” as a result of the transaction. There can be several outputs, each with its own amount.
In the picture below, a new transaction C refers to two outputs, A and B. As a result, the transaction receives 0.008 BTC at its input, which is then split into two outputs: 0.001 BTC is sent to the first address and 0.006 BTC to the second.
The ability to specify several outputs at once is a very important feature, because a transaction output can be used as an input only once and only in its entirety. If you have an incoming transaction for 10 BTC and you need to spend 8 of them, you simply create a transaction with one input and two outputs: 8 BTC to the seller and 2 BTC back to your own address. If you create a transaction in which the sum of the outputs is less than the sum of the inputs (as in the picture), the difference goes to the address of the miner who included your transaction in a block.
Fee
It is this difference between the sum of the inputs and the sum of the outputs that is called the transaction fee. It is the second most important source of income for miners, and it determines how long the transaction waits before being included in the blockchain. Each miner has a pool of unconfirmed transactions competing for a place in the block and, as a rule, simply sorts them by fee in descending order, thereby maximizing profit. Therefore, the higher the fee, the higher your transaction is in the queue and the faster your payment goes through.
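As a rough illustration of this arithmetic, the fee and the change output can be computed as follows. The function names are ours and do not belong to any Bitcoin library; the amounts are taken from the examples above, and the two inputs A and B are represented only by their stated total of 0.008 BTC.

```python
from decimal import Decimal


def fee(input_values, output_values):
    """Fee = sum of inputs minus sum of outputs (it goes to the miner)."""
    total_in = sum(input_values, Decimal(0))
    total_out = sum(output_values, Decimal(0))
    if total_out > total_in:
        raise ValueError("outputs exceed inputs")
    return total_in - total_out


def outputs_with_change(input_value, payment, wanted_fee=Decimal(0)):
    """An output is spent whole, so the remainder returns as a change output."""
    change = input_value - payment - wanted_fee
    if change < 0:
        raise ValueError("input does not cover payment and fee")
    return [payment, change]


# Transaction C from the picture: 0.008 BTC in, 0.001 + 0.006 BTC out.
picture_fee = fee([Decimal("0.008")], [Decimal("0.001"), Decimal("0.006")])

# The 10 BTC example: 8 BTC to the seller, 2 BTC back to the sender.
split = outputs_with_change(Decimal("10"), Decimal("8"))

print(picture_fee, split)
```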
The general format of a transaction is described in the official specification of the protocol; here we consider one of the most common special cases.
previous output hash - the identifier (hash) of the transaction we are referring to.
previous output index - since we refer not to the transaction itself but to one of its outputs, this parameter indicates which output we are interested in. Numbering starts from zero.
value - the amount in satoshi (1/100000000 BTC) sent to the output. It is written in little-endian form, i.e. 62 64 01 00 00 00 00 00 is 0x016462, or 0.00091234 BTC.
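The little-endian example above is easy to check with the Python standard library. This is a minimal sketch; the variable names are ours.

```python
from decimal import Decimal

SATOSHI_PER_BTC = 100_000_000

# The eight bytes of the value field exactly as they appear in the transaction.
raw_value = bytes.fromhex("62 64 01 00 00 00 00 00")

# Little-endian: the least significant byte comes first.
satoshi = int.from_bytes(raw_value, "little")
btc = Decimal(satoshi) / Decimal(SATOSHI_PER_BTC)

print(hex(satoshi), satoshi, btc)
```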
The block lock time and sequence parameters are rarely used in practice. We are not interested in them, so we omit their description.
The parameters with the word script in their names, however, deserve a closer look.
Script
The Bitcoin network has a mechanism based on public-key cryptography that ensures only the owner of a key can spend the coins associated with the address derived from that key. Let's see how this is implemented under the hood.
To begin with, Bitcoin contains a simple stack-based programming language called Script. Here is the simplest Script program:
2 3 OP_ADD 5 OP_EQUAL
Each instruction is called an opcode; there are about 80 of them in total. The picture below shows how the program above is executed.
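The same execution can be traced with a toy interpreter. It is a sketch that supports only the two opcodes used in the example and treats every other token as a number to push; real Script has many more opcodes and operates on byte strings.

```python
def run_script(program):
    """Execute a whitespace-separated toy Script program and return the stack."""
    stack = []
    for token in program.split():
        if token == "OP_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "OP_EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(a == b)
        else:
            # Anything that is not a known opcode is pushed as an integer.
            stack.append(int(token))
        print(f"{token:>8} -> {stack}")
    return stack


final_stack = run_script("2 3 OP_ADD 5 OP_EQUAL")
```

Each printed line shows the stack after one instruction, which is roughly what the picture illustrates.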
Bitcoin Script is used to set the condition under which an output can be spent and to prove that the condition is met. The condition (the locking script) is stored in the transaction in the scriptPubKey field of each output. The proof that the condition is met (the unlocking script) is written in the scriptSig field of each input.
To check the right to spend an output, you concatenate the unlocking script and the locking script and run the resulting program as a whole. If TRUE remains on top of the stack after execution, the transaction is valid.
Pay to Public Key Hash (P2PKH)
The P2PKH script is used in most transactions, so it is worth understanding how it works. Here is its general form:
This script has existed since the advent of Bitcoin, and it is what performs the task mentioned at the beginning of the chapter: making sure that only the owner of a key can spend the coins associated with the address derived from that key.
The idea is this: let your friend B own a pair of keys, P (private) and K (public). Using a hash function, he derives the address A from the public key and tells you the address. Then you send, for example, 1 BTC to address A and write the following in the locking script field:
Only someone who owns the private key for address A can spend this transaction. As proof, write in the unlocking script, firstly, the public key K and, secondly, the signature of the transaction made with the private key P.
When B decides to use your transaction as an input, he will create his own transaction, for example for 0.5 BTC, and put in the unlocking script field the signature of his transaction made with the private key P (sig) and the public key K (PubK).
Here is the execution process of the combined program:
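A toy model of this execution is sketched below. It assumes the standard P2PKH layout, in which the unlocking script is sig PubK and the locking script is OP_DUP OP_HASH160 <hash of PubK> OP_EQUALVERIFY OP_CHECKSIG. The hash and the signature check are deliberately replaced with stand-ins (toy_hash160 and toy_checksig are our names): real Bitcoin uses RIPEMD160(SHA256(x)) and ECDSA verification, neither of which is reproduced here.

```python
import hashlib


def toy_hash160(data: bytes) -> bytes:
    # Stand-in for HASH160; truncated SHA-256 is used only for the demo.
    return hashlib.sha256(data).digest()[:20]


def toy_checksig(sig: bytes, pubkey: bytes) -> bool:
    # Stand-in for ECDSA verification. This is NOT cryptography.
    return sig == b"signed-by:" + pubkey


def run_p2pkh(unlocking, locking):
    """Run unlocking + locking as one program; bytes items are data pushes."""
    stack = []
    for item in unlocking + locking:
        if isinstance(item, bytes):
            stack.append(item)
        elif item == "OP_DUP":
            stack.append(stack[-1])
        elif item == "OP_HASH160":
            stack.append(toy_hash160(stack.pop()))
        elif item == "OP_EQUALVERIFY":
            if stack.pop() != stack.pop():
                return False
        elif item == "OP_CHECKSIG":
            pubkey, sig = stack.pop(), stack.pop()
            stack.append(toy_checksig(sig, pubkey))
        else:
            raise ValueError(f"unsupported opcode: {item}")
    return bool(stack) and stack[-1] is True


pubk = b"public-key-K"
locking = ["OP_DUP", "OP_HASH160", toy_hash160(pubk),
           "OP_EQUALVERIFY", "OP_CHECKSIG"]

owner_ok = run_p2pkh([b"signed-by:" + pubk, pubk], locking)

stranger = b"some-other-key"
stranger_ok = run_p2pkh([b"signed-by:" + stranger, stranger], locking)

print(owner_ok, stranger_ok)
```

The stranger fails at OP_EQUALVERIFY: the hash of his key does not match the one fixed in the locking script, so the signature is never even checked.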
Blocks and blockchain
If the entire blockchain is a book, then individual blocks can be thought of as pages on which transactions are recorded. Each block "refers" to the previous one, and so on down to the very first block (the genesis block). This is what gives the blockchain its immutability. You cannot change block #123 without anyone noticing: the blockchain is designed so that this would entail a change in block #124, then #125, and so on, up to the tip of the chain.
The block structure looks like this:
The first six parameters (all except txn_count and txns) form the block header. The hash of the header is called the block hash; the transactions themselves do not take part in hashing directly. Their immutability is ensured by merkle_root, which, put simply, is a hash of all transactions in the block. You can read more about the algorithm for constructing the Merkle tree at this link.
The nonce and bits fields are directly related to the process by which blocks appear: mining.
Mining
Mining is a critical process for Bitcoin, consisting in creating new blocks and pursuing two goals at once. The first is money supply production. Each time a miner creates a new block, he is rewarded for this with the Nth number of coins that he then spends somewhere, thereby launching new funds into the network.
The second, and much more important, goal is to control compliance with the rules on the network. It is the miners who check the scripts and transaction inputs before including them in the block.
To those who wish to learn more about the financial foundations of Bitcoin, we can recommend this article. We will not dwell on the first aspect of mining and will concentrate on the second: verifying transactions and admitting them to the network.
Proof-of-work
Suppose you are a miner. You have 10 transactions that you want to include in a block. You check these transactions for validity, form a block from them, set the nonce field to 0 and compute the block hash. Then you change nonce to 1 and compute the hash again.
Your task is to find a nonce such that the block hash (a 256-bit number) is less than a predetermined number N. Such a hash can be found only by brute-forcing nonce. Therefore, the faster you want to find a nonce, the more computing power you need.
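The brute-force search can be sketched as follows. This is a toy model under stated assumptions: the header is an arbitrary byte string with the nonce appended rather than a real 80-byte block header, and the target is chosen to be very easy so that the loop finishes quickly. Double SHA-256 and reading the digest as a little-endian number follow Bitcoin's conventions.

```python
import hashlib

# Deliberately easy toy target: about one hash in 4096 satisfies it.
TARGET = 2 ** 244


def mine(header: bytes, target: int, max_nonce: int = 1_000_000):
    """Try nonces 0, 1, 2, ... until the double SHA-256 falls below target."""
    for nonce in range(max_nonce):
        candidate = header + nonce.to_bytes(4, "little")
        digest = hashlib.sha256(hashlib.sha256(candidate).digest()).digest()
        if int.from_bytes(digest, "little") < target:
            # Reverse the bytes to get the usual big-endian hex notation.
            return nonce, digest[::-1].hex()
    return None


found = mine(b"toy block header", TARGET)
print(found)
```

Checking a candidate takes one hash computation, while finding it takes thousands of attempts even at this trivial difficulty.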
The number N (also called the target) is the parameter that the network adjusts depending on the total power of the miners. If tomorrow blocks start appearing, say, every three minutes, N will be reduced, more time will be needed to find a nonce, and the block time will rise back to 10 minutes. And vice versa.
This is what the Proof-of-Work algorithm underlying Bitcoin and many other blockchains looks like. For all its apparent simplicity, it has a number of important properties:
- Creating a new block is a computationally difficult task. At the same time, checking the block for correctness is a simple and almost instantaneous operation.
- The entire network takes 10 minutes on average to compute a new block. The specific time differs from blockchain to blockchain, but the point is that the average is set in advance. Moreover, this time does not depend on the number of network participants. Even if one day there are a hundred times more miners, the algorithm will adjust its parameters so that finding a block becomes more difficult and the block time returns to the vicinity of the specified value.
As described above, mining comes down to finding a block hash that is less than a number called the target. In the block structure, this number is written in the bits field. For example, for block #277316 the target was written as 1903a30c.
$ bitcoin-cli getblock 0000000000000001b6b9a13b095e96db41c4a928b97ef2d944a9b31b2cc7bdc4
{
    "hash" : "0000000000000001b6b9a13b095e96db41c4a928b97ef2d944a9b31b2cc7bdc4",
    "confirmations" : 35561,
    "size" : 218629,
    "height" : 277316,
    "version" : 2,
    "merkleroot" : "c91c008c26e50763e9f548bb8b2fc323735f73577effbc55502c51eb4cc7cf2e",
    "tx" : ["d5ada064c6417ca25c4308bd158c34b77e1c0eca2a73cda16c737e7424afba2f", ...],
    "time" : 1388185914,
    "nonce" : 924591752,
    "bits" : "1903a30c", // <--
    "difficulty" : 1180923195.25802612,
    "chainwork" : "000000000000000000000000000000000000000000000934695e92aaf53afa1a",
    "previousblockhash" : "0000000000000002a7bbd25a417c0374cc55261021e8a9ca74442b01284f0569",
    "nextblockhash" : "000000000000000010236c269dd6ed714dd5db39d36b33959079d78dfd431ba7"
}
The bits field actually stores two numbers at once: the first byte, 0x19, is the exponent, and the remaining three bytes, 0x03a30c, are the mantissa. To get the target from bits, use the following formula:
target = mantissa * 2^(8 * (exponent - 3))
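Applied to block #277316, the formula can be checked directly. The sketch below uses only the values shown in the getblock output above; the function name is ours, and the code assumes an exponent of at least 3, which holds for this block.

```python
def bits_to_target(bits_hex: str) -> int:
    """Expand the compact bits field into the full 256-bit target."""
    bits = int(bits_hex, 16)
    exponent = bits >> 24          # first byte: 0x19
    mantissa = bits & 0xFFFFFF     # remaining three bytes: 0x03a30c
    return mantissa * 2 ** (8 * (exponent - 3))


target = bits_to_target("1903a30c")
block_hash = int(
    "0000000000000001b6b9a13b095e96db41c4a928b97ef2d944a9b31b2cc7bdc4", 16
)

print(format(target, "064x"))
print(block_hash < target)
```

The comparison confirms that the hash of block #277316 is indeed below the target encoded in its own bits field.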
It is bits, however, that is usually shown in online block explorers, such as https://namecha.in/, the Namecoin block explorer.
That is enough theory. Everything said above about Bitcoin applies equally to Namecoin, apart from a few small differences, which we discuss in the next section.
Namecoin
Namecoin is a blockchain based on the algorithms and source code of Bitcoin, the main idea of which is to use a distributed transaction registry scheme to manage a domain name system, an analog of traditional DNS.
Namecoin copies the basic Bitcoin approaches (Proof-of-Work, 10-minute block generation interval) and data formats, with the exception of small additions, which we will talk about later.
Namecoin domains have the suffix .bit. This zone was not allocated by IANA, nor was it added to the list of special-purpose domains. Regular DNS servers typically respond to such requests with NXDOMAIN. However, there are gateways from DNS to Namecoin (for example, OpenNIC), public proxies with Namecoin support, browser plugins, and an open source project that lets you run your own DNS server with Namecoin support.
To manage a domain named, say, facebook.bit, it is enough to register the key d/facebook (the d/ prefix is used in Namecoin for domains) and set its value. Values are written in JSON. A record that resolves the domain to the IP address 1.2.3.4 looks like this:
{"ip": ["1.2.3.4"]}
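The mapping between a .bit domain, its Namecoin key and its JSON value can be sketched as follows. The helper names are ours, and the sketch assumes a plain second-level name such as facebook.bit; subdomains and other record types are not handled.

```python
import json


def namecoin_key(domain: str) -> str:
    """Map a second-level .bit domain to its Namecoin key (d/ prefix)."""
    suffix = ".bit"
    if not domain.endswith(suffix):
        raise ValueError("not a .bit domain")
    label = domain[: -len(suffix)]
    if not label or "." in label:
        raise ValueError("expected a second-level name such as example.bit")
    return "d/" + label


def ip_record(*addresses: str) -> str:
    """Build the JSON value that resolves the name to the given IPs."""
    return json.dumps({"ip": list(addresses)}, separators=(",", ":"))


key = namecoin_key("facebook.bit")
value = ip_record("1.2.3.4")
resolved = json.loads('{"ip": ["1.2.3.4"]}')["ip"]

print(key, value, resolved)
```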
Namecoin allocates names on a first-come-first-served basis. Even for Mark Zuckerberg himself, it would be cryptographically impossible to take the facebook.bit domain away from its owner.
In fact, nothing restricts Namecoin to managing mappings from DNS names to IP addresses. Namecoin can be used (and is used) as a distributed table that maps arbitrary keys to values. But we will concentrate on the scenario in which it serves as an alternative DNS on top of a blockchain.
Domain management
Namecoin stores a domain record in a transaction, namely in the scriptPubKey field, which holds the program that sets the condition for spending the transaction output and to which we devoted so much time in the previous chapter. To manage records, Namecoin introduced three new operators (more precisely, redefined existing ones):
- NAME_NEW
- NAME_FIRSTUPDATE
- NAME_UPDATE
Their meaning is clear from the names, but nevertheless we will analyze the purpose and format of use of each of them.
You may notice that the domain deletion or invalidation operator is missing. To clean the registry of unused names, a mechanism is built into the network that automatically releases a name that has not been updated for 36,000 blocks (~ 250 days).
NAME_NEW
The first step is to announce on the network the intention to register a new name. To do this, it is enough to create a special coin (output) worth at least 0.01 NMC, whose output script looks something like this:
OP_NAME_NEW <20 byte hash> OP_2DROP <lock script>
To demonstrate, I will use the transactions that Stephen Morse made to illustrate his article.
So, if we want to announce the registration of the name d/stephenmorse, then we need to do the following:
Looking at the resulting transaction, you can notice two interesting facts. Firstly, although the output script is written in Namecoin notation, it is still valid from the point of view of the original Bitcoin. The creators of Namecoin chose the codes for their operations so well that in Bitcoin they correspond to operations that simply push constants onto the stack. The NAME_NEW code (0x51) corresponds to OP_1, which pushes 1 onto the stack. The same goes for NAME_FIRSTUPDATE (0x52, or OP_2, which pushes 2) and NAME_UPDATE (0x53, or OP_3, which pushes 3). So the first two steps of the script merely push two values onto the stack, and the next operation, OP_2DROP, removes them, so that the P2PKH part that follows starts with a clean stack. Therefore, all the Script techniques we covered in the chapter on Bitcoin also apply to Namecoin, despite the redefinition of some operations.
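To see why the name prefix is harmless, we can evaluate it under plain Bitcoin semantics. The sketch below is a toy model: it knows only the opcodes mentioned in this chapter and treats every other token as an opaque data push. The three prefixes follow the output script formats given in this chapter for NAME_NEW, NAME_FIRSTUPDATE and NAME_UPDATE.

```python
# Namecoin operation -> byte code, as listed in the text.
NAME_OPCODES = {"NAME_NEW": 0x51, "NAME_FIRSTUPDATE": 0x52, "NAME_UPDATE": 0x53}
# The same byte codes as Bitcoin reads them: OP_1, OP_2, OP_3 push 1, 2, 3.
BITCOIN_CONSTANTS = {0x51: 1, 0x52: 2, 0x53: 3}


def run_prefix(tokens):
    """Evaluate a name prefix the way an unmodified Bitcoin interpreter would."""
    stack = []
    for token in tokens:
        if token in NAME_OPCODES:
            stack.append(BITCOIN_CONSTANTS[NAME_OPCODES[token]])
        elif token == "OP_2DROP":
            stack.pop()
            stack.pop()
        elif token == "OP_DROP":
            stack.pop()
        else:
            stack.append(token)  # opaque data push
    return stack


after_new = run_prefix(["NAME_NEW", "<hash>", "OP_2DROP"])
after_first = run_prefix(
    ["NAME_FIRSTUPDATE", "<name>", "<salt>", "<value>", "OP_2DROP", "OP_2DROP"]
)
after_update = run_prefix(["NAME_UPDATE", "<name>", "<value>", "OP_2DROP", "OP_DROP"])

print(after_new, after_first, after_update)
```

In all three cases the stack is empty again by the time the lock script starts, so the lock script behaves exactly as it would in Bitcoin.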
Secondly, the keys that lock the special coin and the change are different. Although technically nothing prevents you from reusing the same key, it is common practice to generate a new key for each receipt. This makes it harder to correlate transactions and increases the level of anonymity in the network.
At first glance it seems strange that, contrary to common sense, it is impossible to immediately take and register a name plus an IP address for it. This is done so that no one can intercept the name as soon as they see that you want to register it (and then resell it to you).
For example, miners analyzing unconfirmed transactions (those not yet included in any block) could create their own transaction registering the same domain and include it, rather than yours, in their block. An attacker does not even need to mine a block: it is enough to broadcast a competing transaction with a larger fee. This is why there are two separate operations, NAME_NEW and NAME_FIRSTUPDATE: the second can be carried out only by whoever carried out the first, and only after NAME_NEW has been included in a block.
In fact, the restriction is even a little stricter: NAME_FIRSTUPDATE is possible no earlier than 12 blocks after NAME_NEW (about 2 hours). To understand why this limit is exactly 12 blocks and not 1, 2 or 3, we have to step back a little from the main story and look at what a fork and a 51% attack are.
Fork
Imagine that miners are looking for block #123456, and at about the same time it is found independently by two miners, one in Australia and the other in the United States. Each of them begins to propagate their version of the block across the network, and as a result one half of the world has one blockchain and the other half has another.
Is this possible? Yes, and it happens quite often. In this case, each node keeps to its own version of the blockchain until someone finds the next block. Suppose that the new block continues the green branch, as in the picture below. The nodes that held the red version then automatically switch to the green one, because Bitcoin (and therefore Namecoin) follows the rule that the longest version of the blockchain is the valid one. The red version will simply be forgotten, along with the rewards of those who found its blocks.
Of course, in theory the situation could repeat at the second step: at the same time as the purple block, another block could be found that continues the red version of the blockchain. And again at the third step, and so on. But the probability of even the first fork is rather small, the probability of a second is smaller still, and so on. The longest fork in the history of Bitcoin was only four blocks. So at some point one of the branches will pull ahead, and the entire network will switch to it.
51% attack
The rule that the longest chain is the dominant one is the basis of the attack known as the 51% attack.
Imagine that you are a scammer buying goods worth 1000 BTC in a store. You come to an agreement with the seller and send him the money. The seller checks the blockchain and sees that the transaction really exists, has passed all the checks and has even been included in a block, for example #123. After that, the seller goes to the post office and ships you the goods.
Meanwhile, you turn on your mining farm and start mining from block #122. If you have enough power, you can outrun the rest of the network and be the first to reach block #124, after which the whole world will switch to your version of the blockchain.
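As a result, the transaction that paid the seller 1000 BTC drops out of the dominant chain: the coins are yours again, and so are the goods.
[The rest of this passage did not survive extraction. The remaining fragments refer to 6 confirmations, the figures 11, 0.1% and 10%, and a value of 20% for Namecoin.]
This is why, after NAME_NEW, you have to wait 12 blocks before NAME_FIRSTUPDATE becomes possible.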
NAME_FIRSTUPDATE
The purpose of the NAME_FIRSTUPDATE operation is to publish the name that I announced in NAME_NEW and to specify a value for it. To do this, I need to send to the network a transaction whose input is the special coin I created at the output of NAME_NEW. To confirm the right to spend it, I provide in the input script my public key and a signature made with the paired private key, exactly according to the scheme we examined in the section on P2PKH.
One of the outputs of the transaction will be a new special coin worth, like the previous one, at least 0.01 NMC. Its output script should look like this:
OP_NAME_FIRSTUPDATE <Name> <Salt> <Value> OP_2DROP OP_2DROP <lock script>
Salt is the random number 0xd5eeb22ee8117f57 that we created in the first stage, when preparing the script for NAME_NEW. Name is d/stephenmorse in hexadecimal: 0x642f7374657068656e6d6f727365.
The Value field should contain an associative array representing the rules by which the name will be resolved. A complete list of possible keys and the rules for filling them in is available here. To a first approximation, this is an analogue of a zone file; the link above shows how Namecoin entities map to familiar DNS entities. The most popular keys are ip, an example of which was given above, and ns, which we use now.
To specify that 1.2.3.4 will be the NS server for the domain, we put {"ns":["1.2.3.4"]} in Value, in hexadecimal of course: 0x7b226e73223a5b22312e322e332e34225d7d.
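Both hexadecimal strings are simply the ASCII bytes of the name and of the JSON value, which is easy to verify. Note that the JSON uses plain ASCII double quotes (byte 0x22), not typographic ones.

```python
name = "d/stephenmorse"
value = '{"ns":["1.2.3.4"]}'

# ASCII-encode each string and print its bytes in hexadecimal.
name_hex = name.encode("ascii").hex()
value_hex = value.encode("ascii").hex()

print("0x" + name_hex)
print("0x" + value_hex)

# The reverse direction: what an analyst does when reading a raw script.
decoded_value = bytes.fromhex(value_hex).decode("ascii")
```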
As before, we lock the coin with P2PKH. In his example, Stephen deliberately created the coin at the NAME_NEW step with a value not of exactly 0.01 NMC but with a margin, so that at the next step this margin would cover the miner's fee. In the general case, the transaction has one more input to cover the fee and one more output for the change.
We collect everything into a transaction and throw it into the network .
When the transaction gets into a block, nodes will update the value for the key d/stephenmorse in their tables to {"ns":["1.2.3.4"]}. All browsers with Namecoin support will now resolve the domain stephenmorse.bit and its subdomains to IP addresses through the DNS server located at 1.2.3.4.
NAME_UPDATE
The “table” of keys and their values that I mentioned at the end of the last section is actually called the UTXO set (unspent transaction outputs). Since it is critical for the network to prevent double spending, before adding a transaction to a block the miner checks whether the inputs specified in the transaction have already been spent. To speed up this check, all unspent outputs are stored in a separate data structure. This structure does not exist at the network level; each node computes and stores it locally.
After I completed the NAME_FIRSTUPDATE transaction, the output of my coin worth 0.01 NMC, to which the value for the key d/stephenmorse is attached, entered the UTXO set. If this output is not spent within 36,000 blocks (more than 8 months at an average of 10 minutes per block), it is considered invalid and the corresponding name is considered free.
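The expiration rule can be expressed as a simple check on block heights. This is a sketch: the constants come from the text, while the function names and the exact boundary condition (whether a name expires at exactly 36,000 blocks or one block later) are our assumptions.

```python
EXPIRY_BLOCKS = 36_000       # from the text
AVG_BLOCK_MINUTES = 10       # average block interval, from the text


def expiry_days() -> float:
    """Approximate lifetime of a name that is never updated, in days."""
    return EXPIRY_BLOCKS * AVG_BLOCK_MINUTES / 60 / 24


def is_expired(last_update_height: int, current_height: int) -> bool:
    """True if the name output has gone unspent for the whole expiry window.

    The >= comparison is an assumption made for this illustration.
    """
    return current_height - last_update_height >= EXPIRY_BLOCKS


days = expiry_days()
print(days, is_expired(100_000, 135_999), is_expired(100_000, 136_000))
```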
This period of 36,000 blocks (like the minimum value of the special coin, 0.01 NMC) was fixed at the launch of the network and does not change. To extend the registration of a name, to change the record in any way, or to transfer it to another owner, a NAME_UPDATE transaction is used.
The rules for forming such a transaction hardly differ from those described above. The input of the transaction must be the output of the coin obtained in the NAME_FIRSTUPDATE transaction (or in the previous NAME_UPDATE). An additional input is needed to cover the fee. Of the two outputs of the transaction, one is a new coin with the updated value for the name, and the other returns the change left after the fee. The output script format for the coin is:
OP_NAME_UPDATE <Name> <Value> OP_2DROP OP_DROP <lock script>
As in the previous case, Name is d/stephenmorse and Value is the JSON with the value, both in hexadecimal. We lock the output with P2PKH and send the transaction to the network.
In general, this is practically everything that I wanted to tell about name management in Namecoin. All that remains is to say a few words about the costs of owning a domain.
Expenses
Let's calculate how much it costs in cryptocurrency to maintain a name in Dot-Bit (the name of the .bit DNS zone, which runs on Namecoin) and, converting the figures into fiat currency, compare it with the cost of a “normal” DNS domain.
As can be seen from the previous section, for the NAME_NEW transaction the domain owner spends 0.01 NMC to create the coin to which the zone will be attached, plus the miner's fee. For the NAME_FIRSTUPDATE transaction, the new coin is created at the expense of the old one, so the owner additionally pays only the fee. After about 8 months, the owner has to make a NAME_UPDATE transaction to retain the registered name. This is where the required costs for the first year end.
Most articles about Namecoin (including the previously cited article by Stephen Morse) rely on data from the first years of the network's existence and state that the miner's fee is 0.005 NMC. Since then, however, the median fee has gradually decreased, and at the beginning of 2019 it is about 0.0003 NMC. The exchange rate of NMC to the US dollar, by contrast, after several surges has returned to its 2015 level and is about $0.7 per NMC. It is easy to calculate that a domain in the .bit zone will cost its owner from 0.0109 NMC, or $0.00763, in the first year. Some may find it easier to remember the approximate equivalent in Russian currency: 50 kopecks.
So much for the lower limit, which corresponds to buying a name for future use or to cybersquatting. What about the upper limit? Since the input of each transaction that updates the zone must be a coin from one of the previous blocks, the theoretical maximum update frequency of a name equals the frequency with which new blocks appear. Recalling that the average block interval was set at the launch of the network and is about 10 minutes, we can estimate the upper limit of the cost of maintaining a domain at 15.7744 NMC, or a little more than $11.
As you can see, even such a fantastic scenario for using a name in Namecoin costs approximately the same as the first year of owning a regular domain in the most popular .com zone. If we compare the more realistic scenario with the update on average once a day, then the name in the .bit zone will cost about 8 cents a year, which is an order of magnitude cheaper than the most advantageous offers in the traditional DNS, which do not fall below $ 1. In the scenario of short-term use of the domain (from several hours to a month), the difference in favor of Namecoin will already be two orders of magnitude.
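The three estimates can be reproduced with a short calculation. The fee and exchange rate are the early-2019 figures quoted above. The transaction counts are our reading of the scenarios: three transactions for the minimum (NAME_NEW, NAME_FIRSTUPDATE, one NAME_UPDATE), one transaction per block for 365 days minus the 12-block wait for the maximum (this count is an assumption that happens to reproduce the 15.7744 NMC figure), and two registration transactions plus 365 daily updates for the realistic case.

```python
from decimal import Decimal

COIN = Decimal("0.01")        # value locked in the special coin, NMC
FEE = Decimal("0.0003")       # median miner fee in early 2019, NMC
USD_PER_NMC = Decimal("0.7")  # exchange rate quoted in the text


def yearly_cost(transactions: int):
    """Return (cost in NMC, cost in USD) for a given number of transactions."""
    nmc = COIN + FEE * transactions
    return nmc, nmc * USD_PER_NMC


BLOCKS_PER_YEAR = 365 * 24 * 6  # one block every 10 minutes

minimum = yearly_cost(3)
maximum = yearly_cost(BLOCKS_PER_YEAR - 12)
daily = yearly_cost(2 + 365)

for label, (nmc, usd) in [("min", minimum), ("max", maximum), ("daily", daily)]:
    print(label, nmc, "NMC", usd, "USD")
```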
Taking into account the financial attractiveness of the service, as well as the anonymity of the domain owner, including the lack of a traditional “money trail” for ordinary DNS, it becomes clear why Namecoin has become a popular network for service owners with an increased risk of disconnection or blocking, in particular botnets.
Botnets in Namecoin
Indeed, the fact that botnet operators have begun to use the anonymous Dot-Bit zone to protect their C&C servers is not surprising. Something else is more interesting: how long botnets in .bit remain active.
C&C domains registered in “normal” DNS zones are sooner or later taken away from the owner, which forces the operator to pay for registering a new name and to launch a new bot build with a new management server. The fundamental impossibility of removing a domain in the .bit zone has increased the botnet's lifetime by orders of magnitude.
Take, for example, pationare.bit, a domain registered in December 2016. It was used to control the Chthonic botnet (which distributed a banking trojan built on the basis of the famous ZeuS). The Chthonic distribution campaign relied on the RIG exploit pack and was described in detail by various researchers (for example, malware-traffic-analysis.net) at the end of 2016 and in the first half of 2017.
One could assume that this botnet was destroyed long ago. But no: more than two years after the launch, the botnet's C&C domain and its network are still active. As can be seen from the screenshot below, the last update was made in December 2018.
It looks tempting, right? Since the management server’s DNS name remains intact, there is no need to frequently update the bot code and restart the distribution campaign. There remains only the cost of changing the hosting after blocking the IP address, but these costs can also be reduced by using hacked web servers as proxies, the shells of which cost less than a dollar.
On the other hand, all transactions in the blockchain are publicly available to any participant. As we discussed in detail in previous chapters, coins in Namecoin do not disappear without a trace, which means we can track their redistribution between addresses. And knowing the rules and restrictions under which transactions are formed in Namecoin, we can find meaningful patterns that make the common management of some of the addresses participating in a transaction obvious. In that case, the domains paid for with coins from these addresses have a common owner: the group we are interested in, which controls the botnet.
We will develop this idea further.
General IOC Collection Scheme
Let us describe the general search scheme using the example of a real botnet of the RTM group. We will start from this sample, which was identified as Win32/Spy.RTM.N.
As you can see from the screenshot above, after starting, it tries to get the IP address for the name stat-counter-4.bit. We get the transaction history for this name in Namecoin.
We get the identifier of the transaction that created this domain by following the link to the NAME_NEW operation. The input address of this transaction, with which the domain was created, is obviously managed by the group of interest to us. It will be our initial data set: N3KPt8py24EAsAiKquyFgoKGyTYeR5Tmry.
Starting from the initial data set, we iteratively traverse the blockchain, moving in the direction of its growth (upward movement, or upstream movement). At the beginning of each step, we have a transaction in which some coin at the input belongs to the person we are interested in. In the first step, we check the transaction from the initial data set, for which the owner of the input coins is known a priori.
The transaction is checked for compliance with heuristic rules (we will formulate them below), which guarantee that a certain coin (or coins) at the output of the transaction belongs to the same person as the input coin known to us. If the transaction in question satisfies one or more heuristics, then such guiding coins indicate the direction of further movement. The transaction that spends a guiding coin will be the next step of the iteration.
At each step of the iteration, we extend the list of domains that participated in the transactions and the list of IP addresses to which these domains resolved. These are historical indicators of compromise (IOCs), which are useful for forensics as well as for identifying the tactics and methods of the group.
The movement stops if the transaction in question does not satisfy any of the heuristics. This means that we cannot say with certainty that any of the outputs of the transaction in question is controlled by the person we are interested in.
Another situation that stops the movement is the absence of transactions spending from the output address. We will save such addresses in a separate list of unspent coins (UTXO). They are the most valuable result of the entire study. Since we are confident that these addresses are managed by the person we are interested in, any future transaction using these addresses will generate a new, previously unknown IOC (a domain name or an IP address) that has not yet been used by the group but, with high probability, soon will be.
To traverse the blockchain, it is convenient to export it to a database. For this, you can use, for example, a modified rusty-blockparser utility, in which we improved Namecoin support by adding recognition of NAME_* operations and of Auxiliary Proof-of-Work data structures and by extending the export format.
The Python pseudo code for the upward movement is presented below. Hereinafter, it is assumed that the blockchain transaction data is stored in MongoDB.
```python
# names, IPs, utxo and known_addresses are module-level sets shared by both movements.
start = "37d40bc2f3ca7415908dc9e276593b50d3120158cd540cb088246f2e2cf88b16"
tx = namecoin.transactions.find_one({"id": start})


def upstream_movement(tx):
    heuristic_result = upstream_heuristic_test(tx)
    if heuristic_result and heuristic_result.guiding_outs:
        # Record the IOCs carried by this transaction.
        if tx.has_name_op():
            names.add(tx.name_op.name)
            for ip_address in tx.name_op.get_ip():
                IPs.add(ip_address)
        for guiding_out in heuristic_result.guiding_outs:
            known_addresses.add(guiding_out.address)
            # Find the transaction that spends the guiding coin.
            next_tx = namecoin.transactions.find_one({"in.id": guiding_out.id})
            if next_tx:
                upstream_movement(next_tx)
            else:
                # The coin has not been spent yet: keep it for monitoring.
                utxo.add(guiding_out)
```
The second part of the blockchain traversal is the movement against the growth of the blockchain (downward movement, or downstream movement). In general, the downward movement algorithm is no different from the upward one. The movement begins with a transaction from the initial data set. At each step, the transaction is checked for compliance with heuristic rules (generally different from the rules for the upward movement). The only difference is that the coin whose ownership is known a priori is at the output of the transaction, and the heuristics guarantee that one or more coins at the input belong to the same person.
The downward movement also stops if the current transaction does not satisfy any of the heuristics. Unlike the upward movement, we cannot encounter unspent coins among the guiding coins, so this way of exiting the recursion does not apply here. But, as with the upward movement, we extend both the list of names and the list of IP addresses.
Python pseudo code for the downward movement would look like this:
```python
# names, IPs and known_addresses are module-level sets shared by both movements.
start = "37d40bc2f3ca7415908dc9e276593b50d3120158cd540cb088246f2e2cf88b16"
tx = namecoin.transactions.find_one({"id": start})


def downstream_movement(tx):
    heuristic_result = downstream_heuristic_test(tx)
    if heuristic_result and heuristic_result.guiding_ins:
        # Record the IOCs carried by this transaction.
        if tx.has_name_op():
            names.add(tx.name_op.name)
            for ip_address in tx.name_op.get_ip():
                IPs.add(ip_address)
        for guiding_in in heuristic_result.guiding_ins:
            known_addresses.add(guiding_in.address)
            # Find the transaction that created the guiding coin.
            prev_tx = namecoin.transactions.find_one({"out.id": guiding_in.id})
            if prev_tx:
                downstream_movement(prev_tx)
```
Now consider the heuristic rules that we will use when moving along the blockchain.
Heuristic rules
Common change
Let's look again at the transaction whose screenshot is given above. Its input is fed from the address N3KPt8py24EAsAiKquyFgoKGyTYeR5Tmry, which holds the money for creating a new name. NAME_FIRSTUPDATE and NAME_UPDATE transactions will have two inputs: the special coin carrying the zone from the previous transaction on the domain, and additional funds to cover the fee.
Note right away that in the context of transactions we will talk about both coins and addresses. Although some works treat these concepts as almost equivalent, it is important for us to distinguish clearly between the two terms, since in the course of the study we will draw conclusions about both coins and addresses.
By "coin" we will mean a positive balance formed as the output of a transaction. A coin is identified by the identifier of the transaction that generated it and the output index. For example, the coin at the input of the transaction considered above has the identifier 5778be8e1901e9931e9b41a128a0b7f963e6e1ae72e461df2cba26e6279d433a:1, since it was formed as the output (with index 1) of the transaction 5778be8e1901e9931e9b41a128a0b7f963e6e1ae72e461df2cba26e6279d433a.
As before, we will call a special coin a coin with a face value of 0.01 NMC whose locking script contains an operation with a domain name. We examined the mechanism by which such coins are formed in detail in the Domain Management section. A coin of arbitrary denomination to which no domain operation is tied we will call an ordinary coin.
The main property of coins is their immutability. Any coin can be spent only once and only in its entirety. Thus, any coin is mentioned in the Namecoin network at most twice: once when it is created and once when it is spent.
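The coin model described here can be sketched as a small data structure. This is a minimal illustration rather than part of the study's tooling: the `Coin` and `Ledger` classes are hypothetical names introduced for the example, and the ledger only demonstrates the spend-once rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coin:
    txid: str    # transaction that created the coin
    index: int   # position of the coin among that transaction's outputs

    @property
    def id(self):
        return f"{self.txid}:{self.index}"

    @classmethod
    def parse(cls, coin_id):
        txid, index = coin_id.rsplit(":", 1)
        return cls(txid, int(index))


class Ledger:
    """Toy ledger that enforces the spend-once rule."""

    def __init__(self):
        self.unspent = set()

    def create(self, coin):
        self.unspent.add(coin)

    def spend(self, coin):
        if coin not in self.unspent:
            raise ValueError(f"coin {coin.id} is unknown or already spent")
        self.unspent.remove(coin)


coin = Coin.parse("5778be8e1901e9931e9b41a128a0b7f963e6e1ae72e461df2cba26e6279d433a:1")
ledger = Ledger()
ledger.create(coin)
ledger.spend(coin)       # the first spend succeeds
try:
    ledger.spend(coin)   # the second spend is rejected
    double_spend_rejected = False
except ValueError:
    double_spend_rejected = True
```

Making `Coin` frozen keeps it hashable, so coins can be stored in sets such as the UTXO list used during the traversal.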
By "address" we will mean an identifier that uniquely identifies the key pair able to open the P2PKH locking script that closes a coin at the input or output of a transaction. Since only the key corresponding to the address can spend a coin, the closest physical-world analogy to an address is a wallet in which coins are stored (and from which they are spent).
Although in Namecoin an address is often also used only twice, it is not limited to receiving and spending a single coin. Cases of address reuse will help us a little later on.
We talked more about inputs, outputs and addresses in the Bitcoin 201 chapter.
So, two coins are formed at the output of the transaction. The same special coin, worth 0.01 NMC, to which the domain is bound, went to the address N2hgZoWaTKoJ7FPmLuytTow3XrCCfEj2ca. An ordinary coin with the change was sent to the second address, NKMMLwyMw4nwGuke6vd3AuDBMP18FWRaF1.
This is the most common transaction scheme. There are also variants with more than one coin at the input, but their common property is that there is always exactly one coin with change.
You can guess that such a transaction corresponds to a simple update of domain information. Payment for updating is carried out using one (less often several) coins belonging to one person. Indeed, since a transaction always has only one author, it must manage all the input addresses. Without this, he will not be able to create an unlocking script, which is needed in order to use the coins from this wallet.
Well, since all the change from this operation is collected in one coin, it is clear that this coin belongs to the same person as the coins at the entrance.
A similar scheme for Bitcoin is described in this work, where it is called one-time change. It reflects the way the native Bitcoin applications, bitcoind and bitcoin-qt, conduct transactions. It is called one-time because of another feature of these applications: by default, they generate new addresses for the coins at the output of the created transaction.
Namecoin, along with the Bitcoin code base, has inherited the bulk of the code of these applications, which are called namecoind and namecoin-qt. For ordinary coins, we can safely use this heuristic without any changes.
The statistics of address reuse for storing special coins show that in most cases this rule also holds for them. Reuse of such addresses is quite rare: addresses used more than once make up about 6% of the total, and addresses used more than twice about 1%. Given the purpose of Namecoin, it seems reasonable to assume that most transactions with special coins on the network are simple creation and update operations, during which the domain owner does not change. Therefore, we can argue that such an operation corresponds to sending the special coin to a new, previously unused address.
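Reuse shares of this kind can be computed with a simple counter. The sketch below is only an illustration on made-up data: it assumes a flat list with one entry per special coin holding the address the coin was sent to, and the function name `reuse_shares` is hypothetical.

```python
from collections import Counter


def reuse_shares(special_output_addresses):
    """Return the shares of addresses used more than once and more than twice."""
    uses = Counter(special_output_addresses)
    total = len(uses)
    if total == 0:
        return 0.0, 0.0
    more_than_once = sum(1 for n in uses.values() if n > 1)
    more_than_twice = sum(1 for n in uses.values() if n > 2)
    return more_than_once / total, more_than_twice / total


# Hypothetical data: one entry per special coin, holding the address it was sent to.
sample = ["A", "B", "C", "C", "D", "E", "E", "E", "F", "G", "H", "I", "J"]
once, twice = reuse_shares(sample)
print(f"used more than once: {once:.0%}, more than twice: {twice:.0%}")
```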
Now let's look at an example of a transaction that sends the special coin to a reused address. To do this, take another transaction of the RTM group: b3c7ce9ca3a689c6236b9d6df3c257c5fab6c3985187669ccf731ac42a127a11.
The address NDpWDEx1mBkUYywqxDTAZZeGCfUV4GkVE8, to which the special coin went, had already been used in previous transactions.
As mentioned earlier, the default behavior of the native Namecoin client applications does not lead to address reuse. To send a special coin to an existing address, the owner has to make a deliberate extra effort: find out the output address and specify it explicitly when forming the transaction.
Why might this be required? The only mention I have found of a situation in which the output address is specified manually is in the instructions for transferring a domain to another owner.
The conjecture is confirmed if we consider the further fate of the addresses at the output of the transaction in question. In the diagram below, this transaction is marked by a bright green milestone. It can be seen that the next transaction on the stat-counter-4 domain, 9e16f6be, was paid from a money address NJ8xUePv that has no explicit connection with the address used in the "parent" transaction. Obviously, the domain was transferred to the management of another person.
In the general case, this can be either the sale of the domain to another owner who is unrelated to the activities of the person in question, or the transfer of the domain between accounts of the same person. The second option is supported by the simplicity and low cost of registering a new domain, as well as by the lack of visible interest from organizations and trademark owners in registering domains in the .bit zone. We were unable to come up with any reasonably justified motivation for buying a domain that has been seen in malicious activity. Therefore, we believe that, despite the possibility of transferring the domain to another person, transactions that send the special coin to a reused address represent a regrouping of assets between several accounts controlled by one group.
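We formulate the above arguments in the form of a heuristic rule, which we will call common change: if a transaction has exactly one ordinary coin at the output (the change), and that coin is sent to an address that has not been used before, then all coins at the inputs and all coins at the outputs of the transaction, including the special coin if there is one, belong to the same person.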
The scheme for using this rule is shown in the figure. Gray streams are ordinary coins, and the green stream is the special coin. The guiding coins are all the coins on the side of the transaction opposite to the coin through which we came to it: all outputs for the upward movement, and all inputs for the downward movement.
We note several features of this heuristic. First, it is bidirectional: it works both for the upward movement, when we know the owner of an input, and for the downward movement, when we know the owner of one of the coins at the output.
Second, the special coin is optional: although in its absence the transaction is not related to updating a domain, the above reasoning about the owner of the ordinary coin at the output remains valid.
The pseudo code for testing a transaction for compliance with the common change rule would look like this:
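```python
def common_change(tx):
    # The rule applies only when there is exactly one ordinary (change) coin at the output.
    if len(tx.outs.money) != 1:
        return {}
    addr = tx.outs.money[0].address
    # Earliest transaction that paid to this address
    # (the "out.address" field name is an assumption about the export schema).
    first_tx = namecoin.transactions.find_one({"out.address": addr}, sort=[("block", 1)])
    if first_tx.id != tx.id:
        # The change address was used before, so the rule does not apply.
        return {}
    return {"guiding_outs": tx.outs.all, "guiding_ins": tx.ins.all}
```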
Common spending
The heuristic considered above has another important property in addition to bidirectionality. Common change is a heuristic "without memory": the verification result is determined only by the characteristics of the transaction in question and does not depend on the results of other heuristics or on accumulated data. Such a heuristic is indispensable in the first iterations of a traversal, for the initial filling of the data set. On the other hand, its limitations are easy to see. For example, it will stall on a transaction containing two or more ordinary-coin outputs.
As an example of such a transaction, consider db4ff4082f39d0a501508706e627f26aa92712d27b4f633ded59917d201cfae5. This transaction relates to the activities of the group managing the Dimnie botnet.
We came to this transaction in the downward movement through the address My7Ap3nH5f4X6Us2KiUWisd77wRpMG1MDY, which was used as the input address in the previous common change (CC) transaction. Although its relation to the person under study is beyond doubt, we cannot say the same (or the opposite) about any of the other outputs and inputs. This may be a redistribution of coins between the group's addresses, in which case all addresses are controlled by the person we are interested in. Or it may be a top-up from the addresses of one of the exchanges selling Namecoin tokens. Or a transfer from another network member who is unrelated to the activity of the person under study. It is impossible to draw an unambiguous conclusion from the attributes of this transaction alone.
Consider the address N4XtLb7xpC4Zk72T8QcshKhTW17ZCyQ1j1 at the input of this transaction. This address was already seen earlier (“earlier” for a downward movement means “in the future”, “in the direction of blockchain growth”) at the input of the CC transaction 6bffc741eb66de074c09a380fb5e6bd13d4bd5205c36a76e3682674dba08461e, which allows us to consider it to be managed by the person of interest to us. And since, as has already been shown, the keys to all the coins at the input of a transaction are controlled by one person (which cannot be said about the outputs), we have reason to believe that all the other inputs also belong to the group of interest to us.
The strict condition of the heuristic common spending looks very simple:
If it is known that at least one of the addresses at the input of the transaction is controlled by a certain person, then all other addresses at the inputs of this transaction are controlled by the same person. The coins at these inputs belong to the same person.
As you can see, this heuristic makes sense only for the downward movement. When moving in the direction of blockchain growth, we come to the transaction under study through one of the inputs. In this case, the rule condition is satisfied automatically, but does not say anything about the outputs of the transaction and does not allow you to continue moving in the upstream direction. In other words, this is a unidirectional heuristic.
The second feature of this heuristic, which is worth noting, is that here we first used the data accumulated as a result of checking previous transactions - a list of addresses managed by the person under investigation. For this reason, this secondary heuristic cannot be used for independent movement, without any primary heuristic that does not depend on the accumulated results (such as common change).
The pseudo code for testing a transaction for compliance with the common spending rule would look like this:
```python
def common_spending(tx):
    # One known input address is enough: the author of a transaction
    # must hold the keys to all of its inputs.
    for tx_input in tx.get_ins():
        if tx_input.address in known_addresses:
            return {"guiding_ins": tx.ins.all}
    return {}
```
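To show how this rule lets a list of known addresses grow, here is a small self-contained sketch on made-up data. It is a simplification: instead of applying the rule to transactions as the downward movement reaches them, it applies it repeatedly to a whole set of transactions until nothing new is learned. The function name `expand_known_addresses` and the toy addresses are hypothetical.

```python
def expand_known_addresses(transactions, seed_addresses):
    """Apply common spending repeatedly until no new address is learned."""
    known = set(seed_addresses)
    changed = True
    while changed:
        changed = False
        for input_addresses in transactions.values():
            inputs = set(input_addresses)
            # At least one known input means all inputs share the owner.
            if inputs & known and not inputs <= known:
                known |= inputs
                changed = True
    return known


# Hypothetical transactions: id -> addresses at the inputs.
toy = {
    "tx1": ["addr_a", "addr_b"],
    "tx2": ["addr_b", "addr_c"],
    "tx3": ["addr_x", "addr_y"],
}
cluster = expand_known_addresses(toy, {"addr_a"})
print(sorted(cluster))
```

Here `addr_c` is attributed to the same owner only through `addr_b`, which illustrates why the heuristic depends on accumulated data, while `tx3` shares no input with the cluster and is left out.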
Known address
The last heuristic that we will consider in this section is the simplest of all. It is a secondary bidirectional heuristic, so it can be used for both the upward and the downward movement. The strict wording of the known address heuristic for the upward (downward) movement looks like this:
If it is known that the address at the input (output) of the transaction is controlled by a certain person, then the coins received at this address (spent from this address) belong to the same person.
Although this heuristic looks like an outright truism, the rule helps to find branches and intersections in coin flows and adds connectivity to the transaction tree. In addition, it allows the movement to continue through transactions that do not fall under the other heuristics. An example is the transaction 7a35b9cb0a16b3eba92781be014555eaa4255bd17655bb00f2b3f42c3950ac69 of the already mentioned Dimnie botnet.
Having reached it in the upward movement, we cannot advance further with the help of common change, since there is more than one ordinary coin at the output. Looking at the transaction alone, we cannot say how many of the coins at the output belong to the same person as the coin at the input: both, just one, or none at all. The known address heuristic allows us to move forward because the address MwMdTb8WQvoRW9jEW5dHn9SkkCJTRn31wQ was involved in the CC transaction cf7ac8986f9855246c6cf26df9a24aa5645cb9258bf787e034a33e75101ae1fc, which created the domain d/sectools that was seen earlier in the upward movement.
For the sake of completeness, we give the pseudocode of the heuristic known address:
```python
def known_address(tx):
    result = {"guiding_outs": [], "guiding_ins": []}
    # Outputs paid to an address already attributed to the group.
    for output in tx.get_outs():
        if output.address in known_addresses:
            result["guiding_outs"].append(output)
    # Inputs spent from such an address.
    for tx_input in tx.get_ins():
        if tx_input.address in known_addresses:
            result["guiding_ins"].append(tx_input)
    return result
```
So, now we have both the general traversal algorithm and the heuristics needed for moving along the blockchain, and we can put them together to extract some IOCs from Namecoin.
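Before turning to real data, the way the pieces fit together can be tried on a toy chain held in memory. The sketch below is an illustration under stated assumptions, not the tooling used in the study: the `CHAIN` dictionary, its coin and address names, and the helper functions are all made up, only the upward movement with the common change and known address heuristics is modeled, and the traversal uses an explicit stack instead of recursion.

```python
# Toy chain in block order. Each transaction lists the coins it spends and the
# coins it creates; an output is (coin id, address, kind), kind is "money" or "name".
CHAIN = {
    "t1": {"ins": ["c0"], "name": "d/example", "ips": ["192.0.2.1"],
           "outs": [("t1:0", "N_name1", "name"), ("t1:1", "N_chg1", "money")]},
    "t2": {"ins": ["t1:0", "t1:1"], "name": "d/example", "ips": ["192.0.2.2"],
           "outs": [("t2:0", "N_name2", "name"), ("t2:1", "N_chg2", "money")]},
    "t3": {"ins": ["t2:1"], "name": None, "ips": [],
           "outs": [("t3:0", "N_x", "money"), ("t3:1", "N_y", "money")]},
}


def spender_of(coin_id):
    """Return the id of the transaction spending coin_id, or None if unspent."""
    for txid, tx in CHAIN.items():
        if coin_id in tx["ins"]:
            return txid
    return None


def first_use(address):
    """Return the first transaction (in chain order) that pays to address."""
    for txid, tx in CHAIN.items():
        if any(addr == address for _, addr, _ in tx["outs"]):
            return txid
    return None


def common_change(txid):
    """All outputs are guiding if there is one change coin sent to a fresh address."""
    money = [out for out in CHAIN[txid]["outs"] if out[2] == "money"]
    if len(money) == 1 and first_use(money[0][1]) == txid:
        return list(CHAIN[txid]["outs"])
    return []


def known_address(txid, known):
    """Outputs paid to already known addresses are guiding."""
    return [out for out in CHAIN[txid]["outs"] if out[1] in known]


def walk_upstream(start):
    names, ips, utxo, known, visited = set(), set(), set(), set(), set()
    stack = [start]
    while stack:
        txid = stack.pop()
        if txid in visited:
            continue
        visited.add(txid)
        guiding = common_change(txid) or known_address(txid, known)
        if not guiding:
            continue  # no heuristic applies: the movement stops here
        tx = CHAIN[txid]
        if tx["name"]:
            names.add(tx["name"])
            ips.update(tx["ips"])
        for coin_id, address, _ in guiding:
            known.add(address)
            next_txid = spender_of(coin_id)
            if next_txid is None:
                utxo.add(coin_id)  # unspent guiding coin: worth monitoring
            else:
                stack.append(next_txid)
    return names, ips, utxo


names, ips, utxo = walk_upstream("t1")
print(names, ips, utxo)
```

In this toy chain the walk passes through `t1` and `t2`, stops at `t3` (two ordinary outputs, none at a known address), and reports the special coin `t2:0` as unspent. The explicit stack and the `visited` set are a design choice of the sketch: they avoid deep recursion on long chains and repeated visits when several guiding coins lead to the same transaction.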
Go!
Let's go through the RTM transactions with the upward and downward movement, starting with 37d40bc2f3ca7415908dc9e276593b50d3120158cd540cb088246f2e2cf88b16. While advancing along the blockchain, we will collect not only the IOCs but also the transactions themselves that satisfy the heuristics. We visualize coin flows between transactions using a Sankey chart.
The complete diagram is too large to be displayed in the format of this document, so I will give here only the part of it that is necessary for the further story.
A stream of ordinary coins is highlighted in gray. The remaining colors correspond to the flows of special coins. A separate color is selected for each name. White milestones correspond to transactions that satisfy the heuristic conditions. The bright red milestones on the right are UTXO.
The chart element that I would like to draw attention to is highlighted by a blue milestone. This is a dangling input: a coin that appears at the input of a transaction the algorithm passed during the upward movement, although the transaction that created this coin was never encountered.
Dangling inputs are signs that the structure under study has side branches that are not connected to the main trunk along which the algorithm moves. In the case shown, this is another independent account. As can be seen in the diagram, it starts being used to pay for changes to domains we already know. From this fact, we can conclude that this account is also controlled by the person under investigation. To get the IOCs associated with the operations of this account before it appears on the chart, we will run a separate downward movement, starting from the transaction with the dangling input.
Similarly, in a downward movement, dangling exits may occur. For each of them, we will launch a separate upward movement starting from the corresponding transaction.
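Finding dangling inputs after an upward walk comes down to a set comparison. The sketch below is a hypothetical illustration: it assumes the walk has produced a mapping from visited transaction ids to the coins at their inputs, and that a mapping from each coin to the transaction that created it is available; the names `dangling_inputs`, `visited`, `created_by` and `seed_coins` are introduced for the example only.

```python
def dangling_inputs(visited, created_by, seed_coins=()):
    """Return coins spent by visited transactions whose creator was never visited.

    visited    -- mapping: txid -> list of coin ids at the inputs
    created_by -- mapping: coin id -> txid that created the coin
    seed_coins -- coins known a priori (inputs of the starting transaction)
    """
    seeds = set(seed_coins)
    return {
        coin
        for inputs in visited.values()
        for coin in inputs
        if coin not in seeds and created_by.get(coin) not in visited
    }


# Hypothetical walk result: t2 spends a coin created by t9, which the walk never saw.
visited = {"t1": ["t0:0"], "t2": ["t1:0", "t9:1"]}
created_by = {"t0:0": "t0", "t1:0": "t1", "t9:1": "t9"}
found = dangling_inputs(visited, created_by, seed_coins=["t0:0"])
print(sorted(found))
```

Each coin returned this way marks a transaction from which a separate downward movement can be started; dangling outputs of a downward walk can be found symmetrically.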
In addition to the transactions of the group that controls the RTM botnet, we also investigated the transactions of the groups that control the Shifu, Dimnie, and GandCrab botnets. As a result, we found 164 domains registered in the interests of these groups and 277 IP addresses associated with these names. At the time of writing, 39 of the collected coins belonging to these groups remained unspent.
The IOC lists, as well as the Namecoin addresses on which the unused coins of the groups remained, are given in Appendix A.
Conclusion
Testing by real life is a challenge for almost any technology. By the mid-2000s, Wikipedia had become such a popular and trusted source of information that by changing the text of articles it became possible to steer public opinion, promote oneself, and make money. This period in the history of the service is famous for its massive edit wars: the aggressive use of the editing mechanism and the rollback of edits by several warring parties in order to win a dispute over the content of an article. Wikipedia pages turned into an international vanity fair, where everyone literally wanted to have the last word.
On the one hand, edit wars were fought by introducing special rules that, in case of a dispute, make it possible to temporarily disable editing of the article until the disputants find a compromise wording in the "Discussion" section. On the other hand, edit wars made Wikipedia launch a dynamic mechanism for managing administrators' resources, which allowed them to be brought in quickly to resolve conflicts in the hottest areas. Moreover, the encyclopedia took advantage of the public attention that clashes around individual articles drew to it in order to attract more contributors to editing those articles and to achieve the most accurate and complete coverage of a topic.
Can Namecoin, like Wikipedia, grow up and cope with its challenge? Wait and see.
P.S. Tables with indicators of compromise are available on GitHub.
Posted by Alexey Goncharov, PT Expert Security Center