- [x] intro, changes gat, gcn, graphsage to effectiveness and efficiency
- [x] background changes to mention criminal networks
- [x] background too long. Formalize the background.
- [x] abstract
- [x] related work
- [x] Weighted DANI to Weight diffusion?
- [x] how many more masterminds are detected
- [x] JSON changes to powr
- [x] empirical results, talk about efficiency feature
- [x] system design, sub components.
- [x] graph change to inductive and transductive setting
- [ ] change Figure 8 to separate
- [ ] evaluation more
- [ ] Figure 3
- [ ] Case study
- [ ] Figure 7
- [ ] Case study
- [ ] separate the figure 8
- [x] It was not obvious to me what open research question or problem this paper is trying to solve or answer. For example, the related work section mentions prior papers on this topic, but does not explain what is the improvement or difference in this work.
- [x] The paper is also missing an explicit problem statement section that might help the reader to understand this.
- [x] Important used terminology like "crowd pumps" vs. "time pumps" are introduced such that it is not clear to the reader what these terms mean. Here, concrete examples could help.
- [x] The paper attempts to identify masterminds behind pump and dump schemes, but does not define precisely what constitutes a mastermind in such event. Only brief and informal statements like "masterminds are equipped with prior knowledge" are provided without explaining what such prior knowledge might be or how it is used in manipulating the market.
- [x] The paper also talks about "signals" without explaining what signal means in this context.
- [x] Also the involved technical design decision are not clearly explained or justified. This paper models pump advertisement propagation as directed graph. Then, a particular algorithm called Diffusion Aware Network Inference (DANI) [25] is used to analyze the graph. The paper does not make it clear to the reader why such algorithm is used.
- [x] To provide one more example of writing that needs improvement, on page 7 it is said that "Together with human expertise, we have validated the truthfulness of our label. For a more comprehensive list of criteria to identify mastermind, we provide an appendix in ??" Such a sentence leaves many questions open regarding what kind of human expertise is needed and the mentioned appendix is missing from the submission.
- [x] The paper argues that by using the above-mentioned DANI algorithm, the authors are able to infer "criminal communication network" that represent "community dynamics" and "dominant influence within the network". This seems to be one of the main technical results of the paper, but I struggled to understand what exactly such terms mean.
- [x] More simply put, it seems to me that the paper considers entities that broadcast many messages and receive few messages as "masterminds". However, I would assume that such Telegram channels are not based on real (verified) identities, so I don't quite understand how the authors map channels to such identities.
Research contribution, practical usefulness:
- [x] Given writing issues, like the ones listed above, I cannot really comment how reliable the evaluation results achieved in this paper are. For now, let us assume that the results are correct. How useful in practice is the built mastermind detection system?
- [x] - Despite claiming the potential to improve the crypto market regulation, the paper ignored immediate real-world regulatory challenges (e.g., how do we decide what jurisdiction a Telegram channel belongs to?)
- [x] - The methodology seems standard. There seems limited contribution to security literature.
- [ ] - A lot of clarification is needed before readers can understand the paper.
- [x] I'm a bit confused by the goal, especially what information you want to extract and what regulatory challenges you plan to overcome. You started with a grand goal: "accurately identifying the masterminds behind crowd pumps becomes crucial for regulators to terminate the manipulation and ensure market stability." At this point, it sounded like you want to identify the entity behind pump-and-dump schemes so regulators can take legal action, which sounds challenging (since many of them are anonymous and perhaps operate across different jurisdictions) but cool.
- [x] However, later, you said, "This paper presents a novel approach for disrupting crowd pump schemes by identifying the Telegram channels used by masterminds to disseminate manipulative trading instructions." Now the goal becomes identifying Telegram channels spreading manipulative trading instructions. Do you still want to identify the entity behind these channels? Sound like no.
- [x] "the identification and termination of these mastermind channels effectively disrupt the criminal networks’ ability to propagate pump-and-dump schemes." This claim gives rise to a number of regulatory questions. Who would you envision to "terminate" these channels? I don't know if it is feasible to determine whose justification it is for a given Telegram channel.
- [x] By this point, it seems pretty clear that the goal is rather narrow --- it's a plain application of social network analysis, and it does not take real-world regulatory challenges into consideration. It's not clear what the challenges are. It's also unclear what new security insights are obtained.
- [ ] I still really wanted to understand the evaluation result, but I'm left very confused.
- [ ] How do you envision this will be used in practice, and what would be the accuracy and performance requirements?
- [x] How do you ensure your ground truth is accurate?
- [x] How did your method generalize to other social networks?
- [x] How does it compare to earlier works?
- [x] Overall, the paper can benefit from a clear statement of the goal, the approach, the real-world accuracy/performance requirement, and the evaluation results. There seems to be a big gap between detecting spamming telegram channels and solving a regulatory challenge.
- [x] Is it reliable evidence to justify closing of certain Telegram channel? The paper provides no discussion on such practical issues related to the practical usefulness of their system.
- [x] More generally, the main difference to previous papers and pump detection systems is that this paper tries to detect which Telegram channels are operated masterminds (without carefully defining what is a mastermind). The paper briefly hints that this could help regulators to protect the markets but does not explain how in practice the regulators could identify and penalize individuals behind such Telegram channels.
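1. Write up data collection as a system, from a system-design perspective. It is already deployed in practice; state how many masterminds we have found.
2. Spell out the impact: how many masterminds we have identified.
3. State the period analyzed (from when to when).
~~4. Regulation: we did not use private data; we used public data. We have already~~
5. Human expertise: strongly leading language, i.e., language of a leading nature.
6. The research contribution or novelty over prior work is not clear: unlike prior OSN work, ours is temporal. Security insights.
7. We built a system and also ran experiments. Detection results, detection workflow. What we built is not a toy, and it is actually usable.
8. Figure 3: enlarge the font, replace the image.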
9. abstract?
10. java's comments, map
~~11. Table 1: shorten the caption~~
2. Need to make the structure of paragraphs clearer
3. powr for crowd pump signals figure
4. ~~Figure 2: change the icon to Telegram. PPT design color and then change the color~~
~~4. Table 1: d1 and d2 need to be referenced. Make the mastermind red in the . How we get the returns~~
~~5. Figure 3, think? The mastermind detection figure needs more depth~~
6. ~~figure 4, the weights on edges need to be verified if they are the same~~
7. ~~Figure 9, remove number and mark the one we choose~~
8. Figure 10, remove training times. We need to mention the plot is for AUC performance. Group the 4 plots, with 2 above as temporal and 2 down as non-temporal. Fix the x-axis scale for the training time. We also need to compare.
9. discussion needs to be longer.
10. dataset needs to have one table to summarize. Periods of time, number of signals, number of crowd pump events.
1. ~~figure 1 and figure 2 put together ~~
2. ~~Redraw Figure 2: text, size~~
3. ~~Paragraphs need to be put together~~
~~4. Table 1 needs to be put in Appendix 1; the mastermind section needs to be highlighted~~
5. ~~dataset put after the methodology,~~
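6. Figure 4: refer to Yebo's and Cong's figures, imitate them, and draw a schematic diagram of the graph
7. Remove the Figure 3 title, enlarge the font, and revise the colors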
8. ~~figure 6 two columns~~
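9. Content needs further trimming: keep reading through, then tighten the language; it cannot stay as it is
10. ~~Move Figure 8 to the appendix~~
11. Save each figure separately, then combine them with subfigure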
1. change the plots, to add more description under the graph
2. highlight cosine similarity is not good for the cases in which direct links show
3. the overall crowd pump network
4. out degree used for classification
7. Price charts associated with crowd pump
8. graph topological features
1. add criminal network to forensics
2. In-ratio and out-ratio show a non-linear pattern. However (important!), applying only this feature to classify masterminds and peers did not yield satisfactory performance.
3. Topological features
5. see efficiency
6. graph summary
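
The in-ratio and out-ratio item above can be illustrated with a minimal sketch. The notes do not define the two ratios, so this assumes in-ratio = in-degree / (in-degree + out-degree) and out-ratio = out-degree / (in-degree + out-degree); the function name `degree_ratios` and the toy channel names are hypothetical.

```python
from collections import defaultdict


def degree_ratios(edges):
    """Return {node: (in_ratio, out_ratio)} for a directed edge list.

    Assumed definition (not taken from the paper): each ratio is the node's
    in- or out-degree divided by its total degree.
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1

    ratios = {}
    for node in set(in_deg) | set(out_deg):
        total = in_deg[node] + out_deg[node]  # always >= 1 for nodes seen in an edge
        ratios[node] = (in_deg[node] / total, out_deg[node] / total)
    return ratios


# Toy graph with hypothetical channels: "A" broadcasts to three peers, "B" forwards to "C".
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
ratios = degree_ratios(edges)
```

Under this definition a pure broadcaster such as "A" has an out-ratio of 1.0, which is consistent with the note that the feature alone may not separate masterminds from peers: any channel that only forwards messages would score the same.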
1. abstract 2 paragraphs, methodology more, accuracy AUC, overall accuracy
2. time pump solves what problem using what methods, research gap: what existing work has done, what problems it has; they did not trace masterminds, so we explore this
~~3. Title in all caps~~
~~4. Move related work later~~
5. Paragraphs lack depth; avoid overly shallow information
6. the overall crowd pump network
# TODO list
1. simple timestamp ranking and closeness ranking to see performance
2. time and crowd pump proportion
3. balanced accuracy
4. overview of the criminal network: the total graphs
5. detailed architecture
6. graph features: directed DANI vs weighted DANI
7. data statistics: mastermind vs non-mastermind (returns? edges?). The size of the criminal networks vs returns
8. trading volumes
9. summary from levels of signals, crowd pump event, crowd pump events.
10. The number of references should ideally be 50+
!!!!!!!!The larger the OSN networks, the bigger the impact. background
% !!!!!!features, x
Formalization
% Define the edges and nodes, we need to demonstrate. How the signals transform into networks. Real world meaning
% The crowd pump signals in cascades in real world
1. Threat model, mastermind, no formalization, however, we need to explain why we can trace pump and dump
0418
1. balanced auc, 10% * 9 0s 90% 1s
2. treat the labels from 2 dimensions to 1 dimension
3. simple sorting and closeness centrality to detect
5. bootstrap the training
1. sequences of node activations (message propagations).
2. The accuracy plot with closeness centrality
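
The "balanced" metric item in the 0418 notes can be motivated with a small sketch. It assumes a hypothetical 10% / 90% class split and a trivial classifier that always predicts the majority class; the numbers are illustrative only and are not results from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and true negative rate for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2


# Hypothetical imbalanced labels: 10 positives (masterminds), 90 negatives.
y_true = [1] * 10 + [0] * 90
# A trivial model that never flags a mastermind.
y_pred = [0] * 100

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
```

Here plain accuracy is 0.9 even though no mastermind is detected, while balanced accuracy is 0.5, which is why a class-balanced metric is the safer headline number for imbalanced labels.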
0414:
1. Mastermind definition and formalization: use math to represent and plots to demonstrate
2. Introduction, existing research limitations
3. Pump and dump in traditional and decentralized market
1. Introduction
a, b show important problems
a. Introduce what pump-and-dump is (damage and how the impact is demonstrated)
Pump-and-dump schemes have emerged as a significant issue in the digital economy, particularly within the cryptocurrency markets. Unlike traditional financial markets, where pump-and-dump schemes often involve the dissemination of false information to manipulate stock prices, cryptocurrency pump-and-dump is predominantly a trade-based tactic. In these schemes, administrators typically send messages across social media platforms, instructing followers to buy specific tokens. This strategy leverages the rapid dissemination capabilities of social media to influence market behavior quickly and profitably.
b. pump-and-dump mechanism and how social networks contribute
what, who, where, why, and how
The masterminds of pump-and-dump schemes frequently use online social networks (OSNs) to organize their activities. Initially, they establish a group on an OSN and then actively recruit members by extending invitations across various OSNs. Once the group reaches a predetermined membership threshold, the organizers begin disseminating messages that encourage members to buy specific tokens. Having prior knowledge of the targeted tokens, the masterminds typically acquire them beforehand, positioning themselves to profit from the subsequent price increases driven by the collective purchasing of the group members.
To maximize profit, masterminds organize multiple channels to manipulate the same cryptocurrency. Based on the coordination mechanism, cryptocurrency pump-and-dump schemes can be categorized into time-based and crowd-based strategies. Time-based pumps depend on synchronizing the pump of a specific token across various channels simultaneously. Conversely, crowd-based pumps rely on the distribution of instructional text messages across online social networks to manipulate the market.
c. what others did (existing research limitations)
c. what others want to do to mitigate the pump and dump
The research community has extensively explored time pump schemes, yet the crowd pump phenomenon remains under-studied~\cite{Morgia2022,Hamrick2021}. To counter time pumps, researchers have developed detection and prediction algorithms that forecast which token will be pumped on which exchange and when~\cite{Hamrick2021a,Li2022,Hamrick2019TheSchemes}. Because a time pump is synchronized and typically lasts from a few minutes to a few hours, exchanges can halt trading of the affected commodity for a short period to protect market integrity. Crowd pumps, however, use a more complex coordination mechanism and last longer, so exchanges cannot block trading for days. It is therefore important to identify the masterminds behind crowd pump schemes so that they can be accurately terminated and market integrity preserved.
d. what we do, our way is to trace masterminds (actionable)
This paper aims to tackle the challenge by proposing a methodology for constructing network graphs and identifying masterminds within crowd pump schemes in the cryptocurrency market. We employ a dual approach, integrating market trading data with analysis of Online Social Networks (OSN) to construct social network graphs that elucidate the complex network of involved channels. Subsequently, we implement a Graph Neural Network (GNN)-based anomaly detection framework to pinpoint the principal actors in these schemes.
e. methods (pipeline and system, summarize our methods)
To test and evaluate our methods, we collect and process data in real time from both OSNs and centralized crypto exchanges by scraping raw message text and raw crypto pricing data. For the OSN data, we parse the token name from the raw text, record the message timestamps as crowd pump signals, and group the signals into information cascades. We then apply the Fast Diffusion Aware Network Inference (DANI) algorithm to the information cascades to construct three different sets of edges, using different node embeddings, for the underlying crypto crowd pump criminal networks, and we engineer features through topological analysis. For the crypto market data, we collect trade-by-trade and minute-by-minute data and engineer features that represent the impact of each signal on the crypto market. Finally, we test and evaluate our methods as a classification task, using graphs built from the various node-embedding edges together with the topological and market-impact features, with customized GAT, GCN, and GraphSAGE models.
f. contribution (list)
!!!!!!tables to show advantages:
g. evaluation (can write into contribution)
h. overview (optional)
commodity: C, (c1, c2, c3, …., cn)
Signals: S, (s1, s2, s3, …., sn)
Event Establishment (cascade), D, d1,d2
Time: T, (t1 ,t2)
Entity: i
For each commodity c, we select s from t1,t2 and do the event establishment. (Sort each cascade by infection time and keep only the order. DANI then constructs the graph by maximising the community and diffusion process.)
For each T1 (t11, t12), T2, T3. For each commodity, we construct the corresponding graphs GT1c1 (V,E), GT2c1, GT3c1.
Thus, we train on GT1C, and test on GT2C, and use the GT3C to do the validation.
The task is to classify V based on GTC
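
The event-establishment step above can be sketched as follows. This is a minimal illustration, not the actual pipeline: the tuple layout `(channel, commodity, timestamp)`, the half-open time windows, the function name `build_cascades`, and the toy data are all assumptions. The DANI graph inference itself is not implemented here; the sketch stops at the ordered cascades that such an algorithm would take as input.

```python
from collections import defaultdict


def build_cascades(signals, t_start, t_end):
    """Group signals by commodity within [t_start, t_end) and keep only the
    order in which channels first mentioned each commodity."""
    first_seen = defaultdict(dict)  # commodity -> {channel: earliest timestamp}
    for channel, commodity, ts in signals:
        if t_start <= ts < t_end:
            seen = first_seen[commodity]
            if channel not in seen or ts < seen[channel]:
                seen[channel] = ts
    # Sort channels by infection time, then drop the timestamps (order only).
    return {
        commodity: sorted(seen, key=seen.get)
        for commodity, seen in first_seen.items()
    }


# Hypothetical signals: (channel, commodity, timestamp).
signals = [
    ("ch_b", "c1", 12), ("ch_a", "c1", 10), ("ch_c", "c1", 15),
    ("ch_a", "c1", 20), ("ch_c", "c2", 11), ("ch_a", "c2", 14),
    ("ch_b", "c1", 105), ("ch_c", "c1", 101),
]

# Three consecutive windows, mirroring T1 (train), T2 (test), T3 (validation).
windows = {"T1": (0, 100), "T2": (100, 200), "T3": (200, 300)}
cascades = {name: build_cascades(signals, *span) for name, span in windows.items()}
```

Keeping the windows disjoint means the graphs built for T2 and T3 contain only signals the model never saw during training on T1.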
0412:
1. participants defined -- all parties, including us
1. Task on node classification: formalization on graph and nodes, mastermind and non-mastermind nodes
2. threat model: node controlled by mastermind, and other nodes influenced by mastermind
3. definition on mastermind: what masterminds can do and what we can observe from the market
4. signals not in the data training or unseen, or other ways to avoid detection (from perspective of regulators)
0409:
1. change the label from two to one (mastermind and non-mastermind)
0408:
1. make the mastermind definition clearer (put the mastermind in a story to show what we want to achieve)
- [x] bracket in a tuple format for the heatmap's x axis
3.
- [ ] Time scale
0329:
1. area decrease and increase in parameters, plot it as a point to see the trend
0325:
1. the id and size of the plot
2. use closeness as the baseline
3. market value (total volume)
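
The closeness baseline mentioned above could look like the sketch below. It is only an illustration under stated assumptions: distances are measured along outgoing edges (how quickly a channel's message reaches others), the score is scaled by the fraction of nodes reached so that poorly connected nodes are not favoured, and the graph and the name `out_closeness` are hypothetical.

```python
from collections import deque


def out_closeness(adj):
    """Closeness over outgoing paths, scaled by the fraction of nodes reached."""
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    n = len(nodes)
    scores = {}
    for src in nodes:
        # Breadth-first search gives shortest hop counts from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        reached = len(dist) - 1
        total = sum(dist.values())
        scores[src] = (reached / (n - 1)) * (reached / total) if total > 0 else 0.0
    return scores


# Hypothetical diffusion graph: "A" reaches everyone, "D" only receives.
adj = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
scores = out_closeness(adj)
# Rank channels from most to least central; ties are broken by name.
ranking = sorted(scores, key=lambda node: (-scores[node], node))
```

Taking the top-k nodes of `ranking` as predicted masterminds gives a parameter-free baseline to compare against the GNN models.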
0322:
1. GAT, MORE PARAMETERS TO TUNE, embedding dimension
2. RANDOM SAMPLING IN GRAPHSAGE
3. False positive rate, false negative rate, F1, balanced precision, accuracy (acc)
4. two dimensions on experiment (time and not time)
1. in article layers, embedding dimension, sampling
2. damage in terms of trading volumes (make sure that the number is increased and highlight the impact)
# TODO: 20 for one model and random number (for training speed and accuracy, requires 3%)
# Table for comparison
# Parameter tuning (model), once we got the most performant one, we tune random
# Save the best model in pytorch
# Results for saving
# plot
only the best model
Assessment of Chat Messages
Frequent use of emojis and slang (e.g., "haha," ":sweat_smile:"): Suggests a casual and informal communication style.
Inconsistent and fragmented thoughts: Messages often jump between different topics and lack a clear flow.
Unclear purpose: The messages seem conversational and aimless, without a specific goal or objective.
Vague and ambiguous statements: Many statements are open to interpretation and lack specificity.
Some potential inaccuracies: Some statements may be factually incorrect or based on speculation.
Repeated requests for financial assistance: The user frequently asks for money and suggests a lack of financial stability.
Sporadic use of technical terms: The occasional use of cryptocurrency-related terms suggests some knowledge of the subject, but the overall understanding may be limited.
Humorous and playful tone: The user often jokes and uses humor in their messages, creating a lighthearted atmosphere.
Random and unrelated references: The user occasionally mentions topics unrelated to the main conversation, such as Michael Jackson or Game of Thrones.
Lack of grammar and punctuation: Messages exhibit poor grammar and punctuation, making them difficult to read and understand at times.
If we have label algo, why machine learning.
Labeling needs human judgement
Use previously unseen data
Label -> statistics analysis and complex
Label -> human (eyes, knowledge, analysis) identify ()
label -> all the features, however, sometime when we do detection, we can access all features. Long time?
Long time,
1. Evaluation
- [ ] hyperparameter setting for 3d
Learning rate small -> big
number of layers 2 -> 5
keep records of the results of the experiments
Layers, lambda, balanced accuracy, false positive rate
- [x] Detection efficacy: accuracy, F1, FPR, FNR, ROC, AUC......
- [x] Comparison: GNN anomaly detection
- [ ] Volatility may not be a good feature. Volume and price should be more directed
- [ ] Why three days, from channels who have the reports, they report the duration of the crowd pump one or two day-ish
- [ ] The definition of mastermind: the mastermind are more likely to be the source of information and they are effective in their ego network.
- [ ] sentiment
- [x] Undirect to direct, showing directionality
- [x] Training speed:
time: every unit batch, cdf distribution of time
- [ ] Inference speed
time: every time slot of inference
- [x] talk about the number of nodes in one graph and how different embedding affect the speed.
- [ ] False positive fpr
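
The detection-efficacy items in the checklist above (F1, FPR, FNR) all derive from the same confusion counts. The sketch below shows one way to compute them for binary mastermind labels; the function name `detection_metrics` and the toy labels are hypothetical and do not reflect the paper's results.

```python
def detection_metrics(y_true, y_pred):
    """Compute precision, recall, F1, FPR and FNR for binary labels (1 = mastermind)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # false alarms among true non-masterminds
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # missed masterminds
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr, "fnr": fnr}


# Hypothetical labels for ten channels: three masterminds, seven peers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = detection_metrics(y_true, y_pred)
```

Reporting FPR and FNR side by side makes the trade-off explicit: a false positive flags an innocent channel, while a false negative lets a mastermind go undetected.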
2. Paper structure
- Introduction
- Related work
1. pump dump
2. financial anomaly detection
3. financial gnn
4. criminal
- Background & preliminary
- Threat model - (define the attack, what damage they have done. Robustness, if someone finds a way to avoid detection)
- [ ] attackers publish all the message at the same time.
- [ ] mostly of the time, we can detect, when they change the message. Less influence
- [ ] adversarial machine learning
- Methodology (pipeline, including data collection module -> data preprocess ->modules for detection, how)
- Evaluation setup (data description to show confidence, check how many channels other paper collected, what, when, where)
- Evaluation results
- ......
- Security analysis
- discussion
- Conclusion
- Appendix: Case study/mastermind feature. What masterminds look like (efficiency and effective size)
3. pipeline needs to be addressed.
4. Threshold, what we are using and put into the paper
5. case study on how mastermind is identified
# WWW
Official Review of Submission1843 by Reviewer vDcW
Official Review by Reviewer vDcW, 23 Nov 2023 at 15:27 (modified: 02 Dec 2023 at 06:28)
Review:
Review summary
The paper applies techniques advanced by previous works to a dataset obtained from a startup, in a bid to detect the “masterminds” behind pump and dump schemes. The paper’s focus is specifically on so-called “crowd pump” schemes.
Pros
Attempts to identify bad people!
Tackles cryptocurrency pump-and-dump schemes, a very relevant and interesting problem
The data that is used comes from public online sources, and thus is relevant to WWW
Cons
I like this work, and I find it quite useful. But, I find that it should be perhaps more upfront about relying on highly processed data, existing techniques that are relatively widespread, and about what previous works did and didn’t do.
Considering the work’s positioning relative to previous works, I find it hard to recommend that the paper should be accepted.
In-depth review
- [ ] Minor typo on line 15 (missing a comma after “Telegram”): “specifically platforms like Telegram by exploiting data from both the cryptocurrency market and OSN”
- [ ] Very minor comment, but the market cap data in line 37 is relatively outdated (Nov ‘21 is ancient history when it comes to cryptocurrencies). Mentioning this because it’s really easy to get a more up to date number. Also, it is later written “This surge in popularity…”, but a surge has to be relative to some baseline, which is not established in the text.
- [ ] Line 39, typo. “market manipulative techniques”
- [ ] Citations [9] and [16] cover crowd pumps, and [9] in particular mentions that it covers: “3683 so-called target-based pump signals“. Although both are mentioned multiple times in the current work, they are cited for the first time only in line 164, quite a bit after line 77, which says “there is a lack of scholarly focus on the ’crowd pump’“. So, the picture that is painted is very misleading. Readers would benefit greatly if the paper would cite [9,16] immediately after the sentence “there is a lack of scholarly focus on the ’crowd pump’”, and furthermore include in the same position an overview of these works.
- [ ] Citation [1] is for a presentation. Although the presentation is by the CFTC, it is still a presentation. It is also relatively outdated (2017), and it is well-known that the regulatory approach to cryptocurrencies is changing rapidly, including the CFTC’s approach. It would be better to cite other sources, such as more up-to-date documents by the CFTC (e.g., dockets, rather than presentations), or academic papers on the topic (for example: Moffett, Taylor Anne. "CFTC & SEC: The Wild West of Cryptocurrency Regulation." U. Rich. L. Rev. 57 (2022): 713.)
- [ ] Line 79 refers to Figure 1, which includes in its caption the word “signals”. This word is again mentioned in lines 81, 94, 95, but it is not explained in the context of the paper.
- [ ] Contributions (1), (2), given in line 102, mention the novelty of constructing a network graph and identifying suspects. Specifically, line 105 says: “considering both the structural and attributional characteristic…”. This wording, and the reliance on GCNs, sounds very similar to the following paper, which is not cited: J. Jiang et al., "Anomaly Detection with Graph Convolutional Networks for Insider Threat and Fraud Detection," MILCOM 2019, 2019 IEEE Military Communications Conference (MILCOM), Norfolk, VA, USA, 2019, pp. 109-114, doi: 10.1109/MILCOM47813.2019.9020760. A very brief glance at the papers that cite Jiang et al shows that the methods are relatively well-known and popular. Indeed, the paper cites citation [6] in line 554: “Our choice of algorithm for mastermind detection is the Deep Anomaly Detection on Attributed Networks (DOMINANT) [ 6 ].” So, it seems that the techniques are common. Furthermore, citation [3] is mentioned in this paper as a work that identifies suspects using a different approach.
- [ ] Contribution (3), given in line 108, says: “Employing a refined processing pipeline”. This pipeline is described in Section 4 (see line 334: “The data analytics pipeline for identifying…”). It begins with “4.1.1 Signal detection. In the first step, Cloudburst collects message data from various online social networks.” In Section 3, it is explained that Cloudburst is a startup that provided all the data. According to Section 3, the data is quite extensive, pre-processed, and cross-references different sources. The bullet indeed specifies that the paper employs a refined pipeline, but perhaps the wording can be changed to more accurately reflect that the paper relies on a data source that has performed a lot of the heavy lifting. This may also extend to Figure 4, which includes “Signal Detection” as part of the pipeline. Do note that other cited papers have collected and processed such data themselves, and have published their code, lending more credibility to their results.
- [ ] The background section briefly goes over the difference between the “time pump” and the “crowd pump”. It seems to me that the difference is mainly due to time pumps being partially announced well in advance (although the identity of the tokens is announced in the last minute, exactly as with the crowd pump), and the duration (“time pumps” are usually brief, “crowd pumps” are usually longer). Citation [9] emphasizes that setting a target is a crucial difference. This is very clear: the lack of timing coordination requires some other coordination mechanism, in this case: setting a target. Such an explanation is very intuitive, and could be useful in this paper.
- [ ] Line 179 says: “However, the intricate dynamics of collaboration among OSN channels remains largely unexplored, as evidenced by the limited scholarly contributions in this specific realm [17, 21, 23].”, but, why are the given citations limited, rather than simply having a different focus? For example, [17], published in 2021, uses data & techniques that are very similar to the current paper (Telegram messages, neural networks, etc’), and even explicitly mentions multiple OSN channels collaborating.
- [ ] Line 186 says: “While Hamrick [ 9 ] makes a cursory mention of group coordination, the study lacks an exhaustive analytical discourse or empirical corroboration of these assertions”, but [9] has a whole section on groups, and mentions that it covers: “3683 so-called target-based pump signals“. Furthermore, Section 5.2.1 (“Clustering and case study”) makes it seem as if the current work, although encompassing more cases, still focuses on case studies, similarly to [9], albeit more rigorously.
- [ ] Line 197 has a broken citation.
- [ ] Line 202, typo: “information on the which token”
- [ ] Line 329, typo: “do not exhibit low market capitalization Figure 3.”
- [ ] Figure 4 is jagged.
- [ ] The paragraph starting with line 374 is very similar to Section 3 (and 3.2) of citation [9].
- [ ] Regarding the related work section, I would like to mention another related paper, which is relatively different with its approach, but still relevant for the task at hand, and thus should be of interest to the authors: Krishnan, Sundar, et al. "A Novel Text Mining Approach to Securities and Financial Fraud Detection of Case Suspects." International Journal of Artificial Intelligence and Expert Systems 10.3 (2022).
Questions:
Contributions (1) and (2) mention that the techniques used in this paper are novel and pioneering. Yet, a quick search found papers published a few years ago that employ similar techniques. Furthermore, this work cites [6] as the first work to propose the algorithm involved in mastermind detection. Given this, how are contributions (1) and (2) novel?
Ethics Review Flag: No
Ethics Review Description: -
Scope: 3: The work is somewhat relevant to the Web and to the track, and is of narrow interest to a sub-community
Novelty: 2
Technical Quality: 2
Reviewer Confidence: 4: The reviewer is certain that the evaluation is correct and very familiar with the relevant literature
We sincerely appreciate the reviewer for investing time and effort in evaluating our paper, and for offering valuable insights. In light of your constructive feedback, we plan to perform a significant revision. Regrettably, KDD is presently unable to accept our updated submission. However, we believe that your input has greatly improved the overall quality of our work. We trust that our responses will be comprehensive enough for you to contemplate assigning a higher rating to our submission.
------REVIEW 2-------
Official Review of Submission1843 by Reviewer KuzS
Official Review by Reviewer KuzS, 22 Nov 2023 at 14:28 (modified: 02 Dec 2023 at 06:28)
Review:
In order to address the issue of massive pump-and-dump in the cryptocurrency market, the paper introduces a method to identify the major entities spearheading pump operations on online social networks, particularly platforms such as telegram. The proceedings became a variety of techniques to scrutinize potential pump deals and calculate impact scores from online social network channels. The paper conducts topological analysis of online social networks to explore the interconnections between different channels, which can accurately identify the main players involved in the pump-and-dump schemes of the crypto market.
Questions:
The following are areas for further improvement in the paper:
- [ ] 1. The paper does not explain how the statistical data in Figure 1 is obtained, and whether it is credible? Please elaborate
2. Part 4.1.4 of the paper mentions: "By integrating this data with expert domain knowledge, we have identified three primary masterminds for each commodity." However, this step is not specified after the paper, please explain it in detail
3. The paper does not explain the horizontal and vertical coordinates in Table 1, so it is suggested to add corresponding explanations
4. The experimental results of the paper do not provide specific experimental data such as the accuracy of the model to support the experimental results, resulting in the lack of certain credibility of the experimental results of the paper
5. In the background section, please explain the concept of "time pump" in more detail, especially in section 2.2.
6. In the methodology section, please clearly explain Figure 4 to help readers better understand the research ideas.
7. Please refer to the proof of DANI algorithm in real network and synthetic network in Section 4.2 to support your research.
8. Please explain the relationship between "node attributes" and "market influence" in 4.3.
Ethics Review Flag: No
Ethics Review Description: NA
Scope: 4: The work is relevant to the Web and to the track, and is of broad interest to the community
Novelty: 5
Technical Quality: 4
Reviewer Confidence: 2: The reviewer is willing to defend the evaluation, but it is likely that the reviewer did not understand parts of the paper
Q1. The paper does not explain how the statistical data in Figure 1 is obtained, and whether it is credible? Please elaborate
Answer1: Figure 1 is obtained by comparing the total number of crowd pump signals and time pump signals.
Q2. Part 4.1.4 of the paper mentions: “By integrating this data with expert domain knowledge, we have identified three primary masterminds for each commodity.” However, this step is not specified after the paper, please explain it in detail
Answer2: We do this by paying attention to the following aspects:
a. whether the channel is the initiator of the crowd pump message;
b. whether the channel consistently meets all the targets it sets;
c. whether the market changes dramatically after the channel sends out a pump message;
d. how many pump messages the channel has sent;
e. whether the channel sends a report after the pump message;
f. whether the channel has many links referring to other channels;
g. whether the channel covers a large range of coins.
Q3. The paper does not explain the horizontal and vertical coordinates in Table 1, so it is suggested to add corresponding explanations
Answer3: We have added more explanations to Table 1.
Q4. The experimental results of the paper do not provide specific experimental data such as the accuracy of the model to support the experimental results, resulting in the lack of certain credibility of the experimental results of the paper
Answer4: We have added a comparison of the models in terms of AUC.
Q5. In the background section, please explain the concept of “time pump” in more detail, especially in section 2.2.
Answer5: We have explained the time pump in more detail and emphasized the difference between a time pump and a crowd pump.
Q6. In the methodology section, please clearly explain Figure 4 to help readers better understand the research ideas.
Answer6: We have revised our pipeline figures and added more explanation.
Q7. Please refer to the proof of DANI algorithm in real network and synthetic network in Section 4.2 to support your research.
Answer7: We have added figures to explain DANI and cited the proofs of DANI for both real and synthetic networks.
Q8. Please explain the relationship between “node attributes” and “market influence” in 4.3.
Answer8: Node attributes and market influence are similar concepts in different settings. For example, we computed the return for each signal; these returns are later used as node attributes to represent the market influence of the signal.
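To make Answer8 concrete, the following is a minimal sketch of how per-signal returns could be aggregated into per-channel node attributes. The field names (`channel`, `ret`), the attribute names, and the toy numbers are assumptions made for illustration; they are not taken from the paper or from the Cloudburst data.

```python
from collections import defaultdict
from statistics import mean


def channel_attributes(signals):
    """Aggregate per-signal returns into per-channel node attributes.

    `signals` is a list of dicts with the hypothetical keys
    'channel' (node id) and 'ret' (return observed after the signal).
    """
    by_channel = defaultdict(list)
    for s in signals:
        by_channel[s["channel"]].append(s["ret"])
    return {
        ch: {
            "n_signals": len(rets),      # how active the channel is
            "mean_return": mean(rets),   # average market influence
            "max_return": max(rets),     # strongest single signal
        }
        for ch, rets in by_channel.items()
    }


# Toy data, invented for illustration only.
signals = [
    {"channel": "channel_a", "ret": 0.10},
    {"channel": "channel_a", "ret": 0.30},
    {"channel": "channel_b", "ret": -0.05},
]
attrs = channel_attributes(signals)
```

The resulting dictionary can be attached to the nodes of the channel graph, so that each node carries a summary of the market moves that followed its signals.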
------REVIEW 3-------
Official Review of Submission1843 by Reviewer jq8Q
Official Review by Reviewer jq8Q, 21 Nov 2023 at 14:58 (modified: 02 Dec 2023 at 06:28). Readers: Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Reviewer jq8Q, Authors
Review:
This paper aims to identify entities behind pump and dump schemes. The paper analyses a relevant topic in cryptocurrency ecosystem, however, the paper lacks clarity about the data collection and methodology used. Some of the figures, for example Figure 6 and 7, are unclear and the results could also benefit from further explanation. The authors state that their methods and analyses accurately identify the main players involved in these types of schemes but do not provide an accuracy measure or a baseline for comparison.
Questions:
S. 3.1. How did you ensure that the dataset that Cloudburst assembles was complete? Since they assemble data from Telegram, did you do an independent check of one of the channels to ensure completeness?
S. 3.3. I find figure 3 confusing. Can you explain that please?
S. 4.1.2. Why do you focus on a 3-day span post a crowd pump announcement? Why not 1-day or a week? What is the reasoning behind using Bayesian hierarchical logistic regression?
S. 4.1.4. How many channels are part of the subset selected?
Ethics Review Flag: No
Ethics Review Description: No issues found
Scope: 3: The work is somewhat relevant to the Web and to the track, and is of narrow interest to a sub-community
Novelty: 3
Technical Quality: 3
Reviewer Confidence: 3: The reviewer is confident but not certain that the evaluation is correct
This paper aims to identify entities behind pump and dump schemes. The paper analyses a relevant topic in cryptocurrency ecosystem, however, the paper lacks clarity about the data collection and methodology used. Some of the figures, for example Figure 6 and 7, are unclear and the results could also benefit from further explanation. The authors state that their methods and analyses accurately identify the main players involved in these types of schemes but do not provide an accuracy measure or a baseline for comparison.
We have revised the figures that were not illustrative and rewritten the data collection and methodology sections, adding more details and figures. We have also compared our method with other methods in terms of prediction accuracy.
S. 3.1. How did you ensure that the dataset that Cloudburst assembles was complete? Since they assemble data from Telegram, did you do an independent check of one of the channels to ensure completeness?
Answer 1: We consider the dataset complete because, compared with other public datasets, Cloudburst has the highest number of messages and signals. Although the data come from Cloudburst, we work together with them to build the data collection and processing pipeline, which means we actively check the channels to ensure completeness.
S. 3.3. I find figure 3 confusing. Can you explain that please?
Figure 3 plots the distribution of coins' market cap. It emphasizes that crowd pumps target coins with a higher market cap than the coins in the market overall.
S. 4.1.2. Why do you focus on a 3-day span post a crowd pump announcement? Why not 1-day or a week? What is the reasoning behind using Bayesian hierarchical logistic regression?
Answer: We use 3 days because most pump messages from these channels report their duration as a few hours, or as 1 day plus a few hours, so a 3-day window covers most of the durations. We do not use a longer window because we want to ensure that the market movement can be attributed to the pump-and-dump. We use Bayesian hierarchical logistic regression because we want to ensure that our preliminary analysis of the channels' impact is consistent with other mixed-effects models.
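The windowing argument above can be illustrated with a short sketch: measure the peak return inside a fixed window after the announcement and ignore anything later. The function name, the choice of the last pre-announcement price as baseline, and the toy prices are assumptions for illustration, not the paper's exact procedure.

```python
from datetime import datetime, timedelta


def window_return(prices, announced_at, window=timedelta(days=3)):
    """Peak return within `window` after a pump announcement.

    `prices` is a list of (timestamp, price) pairs sorted by time.
    The baseline is the last price at or before the announcement.
    Returns None if there is no baseline or no price inside the window.
    """
    baseline = None
    peak = None
    end = announced_at + window
    for ts, price in prices:
        if ts <= announced_at:
            baseline = price
        elif ts <= end:
            peak = price if peak is None else max(peak, price)
    if baseline is None or peak is None:
        return None
    return peak / baseline - 1.0


# Toy data, invented for illustration only.
t0 = datetime(2023, 1, 1, 12, 0)
prices = [
    (t0 - timedelta(hours=1), 100.0),
    (t0 + timedelta(hours=2), 130.0),
    (t0 + timedelta(days=2), 110.0),
    (t0 + timedelta(days=5), 400.0),  # outside the 3-day window, ignored
]
r_3d = window_return(prices, t0)
r_7d = window_return(prices, t0, window=timedelta(days=7))
```

In this toy series the 7-day window picks up a later, unrelated move, which is the attribution risk the answer refers to when it argues against a longer window.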
S. 4.1.4. How many channels are part of the subset selected?
Answer: We have mainly focused on the English-speaking channels because we found that the pump-and-dump messages target North American and European traders. This accounts for 84 channels. For each commodity, however, a different subset of channels is selected.
------REVIEW 4-------
Official Review of Submission1843 by Reviewer j3rB
Official Review by Reviewer j3rB, 16 Nov 2023 at 14:50 (modified: 02 Dec 2023 at 06:28). Readers: Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Reviewer j3rB, Authors
Review:
This paper attempts to identify "masterminds" behind cryptocurrency pump-and-dump schemes (this is a form of fraudulent market manipulation in which someone persuades lots of people to buy an asset in order to inflate its price temporarily, allowing them to sell their own holdings for profit). They do this by statistical analysis of messages posted to Telegram channels frequented by cryptocurrency day traders; some of these channels are specifically for drumming up support for buy runs, others seem to be more general-purpose.
The methodology is solid, although poorly explained and not particularly novel. The authors seem to have limited their literature review to cryptocurrency pump-and-dump schemes, and so missed all the more general research on information propagation in social networks and other forms of cryptocurrency market manipulation. I would like to draw their attention specifically to the work of Jonathan Albright (start with "Influencers, Amplifiers, and Icons: A Systematic Approach to Understanding the Roles of Islamophobic Actors on Twitter") and Kyle Soska (start with "User Participation in Cryptocurrency Derivative Markets").
The data set is limited to a single feed provided by a commercial company, and this company seems only to look at Telegram as a source of information. As there are many other places online where cryptocurrency traders communicate with each other, the data set is grossly incomplete at best, and the authors seem to be oblivious to this.
The conclusions are weak: they identified only a handful of Telegram channels that originate some pump-and-dump schemes. There's no discussion of how many schemes there actually are and what proportion of these their "mastermind" channels are actually responsible for. Nor do they identify specific people as the originators of the schemes; it's possible that all they have found is channels devoted to pumping and dumping. The paper's overall narrative is also weak, leaving the reader with little or no sense of how important any of this is in the grand scheme of cryptocurrency fraud.
I think this paper needs a lot more work before publication, and I also think The Web Conference is not an appropriate venue. It'd be more appropriate for Financial Crypto, Advances in Financial Technologies, or a security-focused conference. What the authors have now might be enough for a workshop paper, but I'm not sure what workshops exist for this subject.
Questions:
Concrete suggestions for improvement: Most importantly, you need an overall narrative and more interesting findings. Your introduction led me to believe that you were going to attribute pump-and-dump events to specific people and not just to Telegram channels. As I said before, it's possible that you have only identified a couple of channels devoted to crypto market manipulation. That the schemes originate there and then spread to more general crypto trading groups is a useful result, but only confirms what we already knew from general research on information flow in social networks (again, read Jonathan Albright's papers). Talk of "comprehensive", "broad spectrum" data sets led me to expect your analysis to cover a much larger set of channels, with statistics on how many pump-and-dump events originate from each (figure 5 seems like a step in this direction but you could have taken this analysis much farther).
Second most importantly, it's fine to limit the study to a single data source, but you need to assess how much data you have missed by only using that source. Ideally, you would analyze the transaction history of each cryptocurrency to get a solid estimate of how many pump-and-dump events actually occurred (I realize that this might be a paper in itself, but I bet someone has already written that paper) and then compare that to the number of pump-and-dump events you can attribute to a source in a Telegram channel. Failing that, at least make some kind of argument that Telegram channels are an important source of pump-and-dump events, and some kind of argument (not relying on what Cloudburst told you) that Cloudburst's data set is actually a significant sample of crypto trading discussion on Telegram.
Third most importantly, much of sections 3 and 4 is what we call "purple prose" in English literary criticism -- using long, fancy words when short, common words will do. This makes it hard to understand what you did. Academic writing is supposed to be precise but it is also supposed to be clear. You should revise the entire paper with the goal of using short, common words as much as possible. In case you don't already know how to do that, I recommend the book "The Art of Readable Writing" by Rudolf Flesch.
On a related note, make sure every term you use is defined the first time you use it. In particular, "crowd pump" and "time pump" should be defined in the introduction, not in the background, since they are first used in the introduction. Also make sure your definitions are clear to readers who are familiar with cryptocurrency but not the history of financial fraud, or vice versa. For example, your explanation of pump-and-dump leaves out any explanation of the "dump" part, i.e. that the "mastermind" will sell into the rise, profiting at the expense of the crowd, who will not all be able to sell at the elevated price. This means a reader who hasn't heard of pump-and-dump before will be left wondering why it is considered a form of fraud, rather than "just" a more efficient social-media version of what financial advice columnists are doing when they recommend people buy stock in some company.
Ethics Review Flag: No
Ethics Review Description: n/a
Scope: 2: The connection to the Web is incidental, e.g., use of Web data or API
Novelty: 3
Technical Quality: 2
Reviewer Confidence: 2: The reviewer is willing to defend the evaluation, but it is likely that the reviewer did not understand parts of the paper
First, thank you so much for pointing out that the data description is not comprehensive. In our study, all the messages are related to pump-and-dump. To be more specific, every message includes the target price for selling, which is the dump part. We also sincerely appreciate the literature you shared. We have taken a close look at Jonathan Albright's work; it gives us more intuition about the broad research area of hate speech and information passing. We believe our work can also be applied to find other main criminals in the setting of information cascades. We want to mention that, although the data come from a startup, we have worked closely with them to collect and process the data. There are many channels out there, and we will try to collect as many of them as possible. Thank you for the suggestion to check how many crowd pumps are actually initiated by the masterminds. We have found that the masterminds account for a small proportion of the whole sample of channels, but they initiate nearly half of the information cascades. Although we find only a handful of mastermind channels, it would be easy to attribute the schemes to specific people by selecting the admins and owners of those channels.
The paper’s overall narrative is also weak, leaving the reader with little or no sense of how important any of this is in the grand scheme of cryptocurrency fraud.
Yes, we should work on the narrative a lot more. We plan to emphasize that catching the masterminds will greatly decrease the number of pump signals, and that it is the most effective way to counter this problem.
Most importantly, you need an overall narrative and more interesting findings. Your introduction led me to believe that you were going to attribute pump-and-dump events to specific people and not just to Telegram channels. As I said before, it’s possible that you have only identified a couple of channels devoted to crypto market manipulation. That the schemes originate there and then spread to more general crypto trading groups is a useful result, but only confirms what we already knew from general research on information flow in social networks (again, read Jonathan Albright’s papers). Talk of “comprehensive”, “broad spectrum” data sets led me to expect your analysis to cover a much larger set of channels, with statistics on how many pump-and-dump events originate from each (figure 5 seems like a step in this direction but you could have taken this analysis much farther).
Answer 1: Thank you so much for the suggestion. We have made progress and found that: 1. The masterminds account for a small proportion of the whole sample of channels, but they initiate nearly half of the information cascades. 2. The masterminds are effective and efficient in their first-degree out-ego network, which means that they are more likely to dominate the pump message passing, because a crowd pump message is more likely to come from the masterminds directly than through the peer network.
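As a rough illustration of the first finding, the share of cascades initiated by masterminds could be computed as below. The sketch assumes each cascade is a list of (channel, timestamp) posts and that the initiator is simply the earliest poster; the channel names and numbers are invented and do not reproduce the paper's results.

```python
def initiator_share(cascades, masterminds):
    """Fraction of cascades whose earliest post comes from a mastermind.

    `cascades` is a list of cascades; each cascade is a list of
    (channel, timestamp) pairs. Empty cascades are skipped.
    """
    non_empty = [c for c in cascades if c]
    if not non_empty:
        return 0.0
    hits = 0
    for cascade in non_empty:
        # The initiator is taken to be the channel with the earliest post.
        initiator = min(cascade, key=lambda post: post[1])[0]
        if initiator in masterminds:
            hits += 1
    return hits / len(non_empty)


# Toy data, invented for illustration only.
cascades = [
    [("m1", 1), ("c2", 5), ("c3", 9)],
    [("c2", 3), ("m1", 4)],
    [("c4", 2), ("m2", 1)],
    [("c3", 7)],
]
masterminds = {"m1", "m2"}
share = initiator_share(cascades, masterminds)
```

Reporting this share next to the fraction of channels labelled as masterminds makes the concentration claim directly checkable.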
Second most importantly, it’s fine to limit the study to a single data source, but you need to assess how much data you have missed by only using that source. Ideally, you would analyze the transaction history of each cryptocurrency to get a solid estimate of how many pump-and-dump events actually occurred (I realize that this might be a paper in itself, but I bet someone has already written that paper) and then compare that to the number of pump-and-dump events you can attribute to a source in a Telegram channel. Failing that, at least make some kind of argument that Telegram channels are an important source of pump-and-dump events, and some kind of argument (not relying on what Cloudburst told you) that Cloudburst’s data set is actually a significant sample of crypto trading discussion on Telegram.
Answer 2: Thank you so much for the suggestion. We believe Telegram channels are the main source of pump-and-dump information because previous studies have all focused on them. This could be attributed to the encrypted nature of Telegram, which makes it harder to trace the masterminds. Through our close work with Cloudburst, we have also tried to collect as many channels as possible.
Third most importantly, much of sections 3 and 4 is what we call “purple prose” in English literary criticism – using long, fancy words when short, common words will do. This makes it hard to understand what you did. Academic writing is supposed to be precise but it is also supposed to be clear. You should revise the entire paper with the goal of using short, common words as much as possible. In case you don’t already know how to do that, I recommend the book “The Art of Readable Writing” by Rudolf Flesch.
Answer 3: Thank you so much for pointing out the problem. This is really valuable feedback, and we have rewritten the paper in simpler sentences.
On a related note, make sure every term you use is defined the first time you use it. In particular, “crowd pump” and “time pump” should be defined in the introduction, not in the background, since they are first used in the introduction. Also make sure your definitions are clear to readers who are familiar with cryptocurrency but not the history of financial fraud, or vice versa. For example, your explanation of pump-and-dump leaves out any explanation of the “dump” part, i.e. that the “mastermind” will sell into the rise, profiting at the expense of the crowd, who will not all be able to sell at the elevated price. This means a reader who hasn’t heard of pump-and-dump before will be left wondering why it is considered a form of fraud, rather than “just” a more efficient social-media version of what financial advice columnists are doing when they recommend people buy stock in some company.
Answer 4: Thank you so much for the concrete suggestions. We have revised the parts that were not fully explained. Your emphasis on the dump will greatly improve the overall quality of the study and strengthen its contribution.
------REVIEW 5-------
Official Review of Submission1843 by Reviewer RKmg
Official Review by Reviewer RKmg, 29 Oct 2023 at 18:38 (modified: 02 Dec 2023 at 06:28). Readers: Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Reviewer RKmg, Authors
Review:
This paper aims to trace the masterminds that organize crowd pump activities on online social networks.
Pros: The paper discusses a very interesting topic, i.e., the pump-and-dump activities from a crowd activity perspective. The idea of leveraging GNN for extracting historical collaboration information is reasonable.
However, some issues exist and should be addressed during rebuttal, please refer to the questions.
Questions:
Question 1: The real-world implication and significance of identifying mastermind behind crowd pump activities are not clear, because detecting of masterminds happens after pump activities. The authors should claim the significance of the paper to convince its readers.
Question 2: The authors should provide more interesting findings related to crowd pump activities. Current analyses are far from comprehensive. For example, the collaboration of channels in Figure 6 can be plotted in a chronological order to investigate how did these channels collaborate and how did the collaboration strategies evolve across the time. Why these channels adopt these collaboration strategies? (For example, the strategy might be related to the market capitalization of pumped coins.) Without more comprehensive analyses and insights provided, the quality and influence of this study will be limited.
Question 3: The content should be self-contained. For example, in Section 4.2, the paper claims "our modified DANI algorithm", which is confusing for readers who do not have background for DANI algorithm. To make it clear, the authors should provide more formal illustration (figures) and formulations for DANI and the adaptation proposed in this paper.
One suggestion: Improve the writing and stop using rare words like "To counter this conundrum, we offer a strategy to single out key entities orchestrating crowd pump maneuvers on Online Social Networks (OSNs)"=> "To address this problem, we propose an approach to identify the masterminds hidden behind multiple participant groups that primarily organize crowd pump activities".
Ps: If the above questions can be addressed during the rebuttal, I will consider raising my scores.
Ethics Review Flag: No
Ethics Review Description: I select no.
Scope: 4: The work is relevant to the Web and to the track, and is of broad interest to the community
Novelty: 3
Technical Quality: 3
Reviewer Confidence: 4: The reviewer is certain that the evaluation is correct and very familiar with the relevant literature
Answer 1: The real-world impact is that, if we can take down the masterminds, a large share of the information cascades will be destroyed and the number of pump messages will decrease, which will greatly increase market stability. Through the collaboration with Cloudburst, our research has been used to help regulators better understand the market and potentially take down the masterminds. The method can also be applied to traditional financial markets, and other sources of information such as Twitter, Reddit, and Discord can be integrated.
Question 2: The authors should provide more interesting findings related to crowd pump activities. Current analyses are far from comprehensive. For example, the collaboration of channels in Figure 6 can be plotted in a chronological order to investigate how did these channels collaborate and how did the collaboration strategies evolve across the time. Why these channels adopt these collaboration strategies? (For example, the strategy might be related to the market capitalization of pumped coins.) Without more comprehensive analyses and insights provided, the quality and influence of this study will be limited.
Answer 2: Thank you so much for the suggestion. We have made progress and found that: 1. The masterminds account for a small proportion of the whole sample of channels, but they initiate nearly half of the information cascades. 2. The masterminds are effective and efficient in their first-degree out-ego network, which means that they are more likely to dominate the pump message passing, because a crowd pump message is more likely to come from the masterminds directly than through the peer network. 3. We investigate one crowd pump in detail and test whether the size of the network correlates with market cap.
Question 3: The content should be self-contained. For example, in Section 4.2, the paper claims “our modified DANI algorithm”, which is confusing for readers who do not have background for DANI algorithm. To make it clear, the authors should provide more formal illustration (figures) and formulations for DANI and the adaptation proposed in this paper.
Answer 3: We have used pseudocode to illustrate the modified DANI algorithm. Moreover, we have used graphs to better illustrate our methodology.
Answer 4: We have revised the writing and now use direct, simple sentences.
# Tokenomics
----------------------- REVIEW 1 ---------------------
SUBMISSION: 106
TITLE: Tracing the Masterminds Behind Cryptocurrency Pump-and-Dump Schemes
AUTHORS: Honglin Fu, Yebo Feng and Jiahua Xu
----------- Overall evaluation -----------
SCORE: 3 (strong accept)
----- TEXT:
This very interesting study examines cryptocurrency market manipulation through the identification of key entities involved in pump-and-dump schemes. What is special about this paper is that the authors are able to shed light on the mechanisms and players involved in these illegal schemes by examining social network data.
Typically pump and dump schemes work on channels on social network platforms, typically Telegram and Discord. Previous authors have examined pump and dump schemes, but they were not able to identify key players in this illegal activity, nor were they able to explore the interconnections between different channels.
Hence, this paper goes well beyond what has been done. They collaborated with Cloudburst, a startup that specializes in providing security solutions for the cryptocurrency market and analyzed the data from Cloudburst that has software that attempts to detect and follow market manipulation events, like pump-and-dump schemes. The API provides information on which cryptocurrency is being pumped, the exchange being targeted, the corresponding text messages, and the number of participants in a channel. This is an impressive data collection effort!
With the data and their model, they are able to do a topological analysis to investigate the interconnectedness among channels. This enables them to determine the entities in these market manipulation schemes. They find that there is high level of concentration, that is key entities are involved in many schemes.
If regulators were able to identify these identities, they might be able to greatly reduce the number of pump and dumps.
This is a very impressive paper!
----------------------- REVIEW 2 ---------------------
SUBMISSION: 106
TITLE: Tracing the Masterminds Behind Cryptocurrency Pump-and-Dump Schemes
AUTHORS: Honglin Fu, Yebo Feng and Jiahua Xu
----------- Overall evaluation -----------
SCORE: 0 (borderline paper)
----- TEXT:
This paper looks at the connection between Telegram channel "crowd pump" activity and resultant market movements.
Strengths:
- very topical
- some interesting results, although these stop short of what is suggested in the title (no specific "masterminds" are traced!) and do not appear to have been analysed in much depth
Weaknesses:
- [ ] has a preliminary feel about it
- [ ] does not appear to be reproducible
- [ ] appears not to feature so much novelty from a methodological perspective
- [ ] paper is not so well written with many typos (Singal in Fig 5, USTD instead of USDT etc.)
# VLDB
Reviewer #2
Questions
1. Overall Rating
Reject
2. Relevant for PVLDB
No
3. Are there specific revisions that could raise your overall rating?
No
5. Paper Summary. In one solid paragraph, describe what is being proposed and in what context, and briefly justify your overall recommendation.
Paper 590, with the title "Tracing the Masterminds Behind Cryptocurrency Pump-and-Dump Schemes" analyzes the pump-and-dump schemes on cryptocurrency markets using data from social media, in particular, from Telegram. The authors classify pump-and-dump into crowd and time. The analytics pipeline consists of pump signal detection from social media and market trading data (exchanges) followed by clustering. The data and signals are taken from Cloudburst which does the heavy lifting in this analytics pipeline. While the topic is interesting, I found the analysis part disappointing, with no clear observation or take away message. Another issue is that the paper seems to be a data science paper (but not really a scalable one), but it is submitted to the regular research track. These, together with writing and presentation issues, make this paper not ready for publication at this time.
6. Three (or more) strong points about the paper. Please be precise and explicit; clearly explain the value and nature of the contribution.
I can only find one strong point, that is, the topic is interesting and a pipeline that can detect pump-and-dump in advance would be very useful. At the same time, the topic is quite niche for the database community, hence, this paper is not 100% suitable for VLDB.
7. Three (or more) weak points about the paper. Please clearly indicate whether the paper has any mistakes, missing related work, or results that cannot be considered a contribution; write it so that the authors can understand what is seen as negative.
W1. This is a data science paper but it is submitted to the regular research track. I don't find this paper to be relevant to any of the research track sections (e.g., algorithms, systems, information system architectures).
W2. It is unclear what is the significant contribution of this paper. It seems to me that existing systems, datasets and techniques are combined to get the results. The analysis results could have been the key contribution, but there is no novel observation or take away message.
W3. The writing and presentation of the paper need to be significantly improved: there are missing references, the figures are not explained, there are some typos, etc. Please see my detailed comments below.
8. Novelty. Please give a high novelty rating to papers on new topics, opening new fields, or proposing truly new ideas; give medium ratings to "delta" papers and those on well-known topics but still with some valuable contribution. (Note: For SDS and EA&B papers, novelty does not need to be in the form of new algorithms or models. Instead, novelty for SDS can be new understanding of issues related to data science technologies in the real world. Novelty for EA&B can be new insights into the strengths and weaknesses of existing methods or new ways to evaluate existing methods.)
Novelty unclear
9. Significance
Improvement over existing work
10. Technical Depth and Quality of Content
Insignificant contribution
11. Experiments. (Reminder: EA&B papers should have a higher bar for experiments.)
Obscure, not really sure what is going on and what the experiments show
12. Presentation
Reasonable: improvements needed
13. Detailed Evaluation (Contribution, Pros/Cons, Errors). Please number each point and provide as constructive feedback as possible.
D1. Missing references - in general, the paper needs more references. Some examples:
- "While the operational mechanisms of the time pump have been well-explored through comprehensive academic research" - references needed.
- "Cloudburst" - reference needed.
D2. It is recommended to float the figures and table on top or bottom of a page.
D3. What do the circles and their colors represent in Figures 6 and 8?
D4. Writing issues (some examples):
- Online Social Networks (OSN) - it is sufficient to define it once then you can use the acronym OSN
- "pump and dump" vs. "pump-and-dump" - please use only one form for consistency
- "procedure.In" - space missing
- "pump-and-dump scheme.The" - space missing
14. Revision. If revision is required, list specific required revisions you seek from the authors. Please number each point.
N/A
Reviewer #3
Questions
1. Overall Rating
Weak Reject
2. Relevant for PVLDB
Yes
3. Are there specific revisions that could raise your overall rating?
No
5. Paper Summary. In one solid paragraph, describe what is being proposed and in what context, and briefly justify your overall recommendation.
The paper studies pump and dump schemes in cryptocurrency markets. The authors devise a data analysis pipeline and they analyze the results that they obtain. They use data from the cryptocurrency market and online social networks. The paper also describes the different concepts required to understand the topic.
6. Three (or more) strong points about the paper. Please be precise and explicit; clearly explain the value and nature of the contribution.
S1. Interesting problem with real-world application.
S2. Description of concepts that are required for the paper but are not necessarily known by PVLDB audience.
S3. Well-written paper.
7. Three (or more) weak points about the paper. Please clearly indicate whether the paper has any mistakes, missing related work, or results that cannot be considered a contribution; write it so that the authors can understand what is seen as negative.
W1. The challenges are not clear and convincing.
W2. The novelty of the paper is unclear.
W3. The scope of the paper is limited to the specific use case and dataset.
8. Novelty. Please give a high novelty rating to papers on new topics, opening new fields, or proposing truly new ideas; give medium ratings to "delta" papers and those on well-known topics but still with some valuable contribution. (Note: For SDS and EA&B papers, novelty does not need to be in the form of new algorithms or models. Instead, novelty for SDS can be new understanding of issues related to data science technologies in the real world. Novelty for EA&B can be new insights into the strengths and weaknesses of existing methods or new ways to evaluate existing methods.)
Novelty unclear
9. Significance
Improvement over existing work
10. Technical Depth and Quality of Content
Syntactically complete but with limited contribution
11. Experiments. (Reminder: EA&B papers should have a higher bar for experiments.)
Very nicely support the claims made in the paper
12. Presentation
Excellent: careful, logical, elegant, easy to understand
13. Detailed Evaluation (Contribution, Pros/Cons, Errors). Please number each point and provide as constructive feedback as possible.
D1. The fifth paragraph in the introduction where the challenges are listed is not clear. Neither the missing academic inquiry nor the role of social networks mentioned is clearly articulated.
D2. The last paragraph of the introduction mentions the contributions wrt existing solutions, however, they have not been discussed up to that point.
D3. The data analysis pipeline and the results are tightly coupled to the use case. This makes it hard to generalize on algorithms/systems/approaches/best practices in data management.
D4. The approach followed in the current form of the paper might fit better some other track with more application-specific focus.
Reviewer #4
Questions
1. Overall Rating
Reject
2. Relevant for PVLDB
No
3. Are there specific revisions that could raise your overall rating?
No
5. Paper Summary. In one solid paragraph, describe what is being proposed and in what context, and briefly justify your overall recommendation.
The paper analyzes the output of a proposed analytics pipeline to discover entities pursuing illegal manipulation of the cryptocurrency market.
Although I agree that the problem tackled is important, I can hardly see the relevance of this paper to VLDB, since its focus is not on data analysis/management, rather on analysis of the results. To further emphasize this, consider that only one paper included in the bibliography (reference [7]) has been published in a venue sharing similarities with VLDB.
For this, it is also hard for me to assess technical contribution and to identify strong/weak points of the manuscript.
6. Three (or more) strong points about the paper. Please be precise and explicit; clearly explain the value and nature of the contribution.
Problem is clearly interesting (although for an audience different than the one attending VLDB)
7. Three (or more) weak points about the paper. Please clearly indicate whether the paper has any mistakes, missing related work, or results that cannot be considered a contribution; write it so that the authors can understand what is seen as negative.
Typical VLDB attendees will hardly learn something from the paper.
8. Novelty. Please give a high novelty rating to papers on new topics, opening new fields, or proposing truly new ideas; give medium ratings to "delta" papers and those on well-known topics but still with some valuable contribution. (Note: For SDS and EA&B papers, novelty does not need to be in the form of new algorithms or models. Instead, novelty for SDS can be new understanding of issues related to data science technologies in the real world. Novelty for EA&B can be new insights into the strengths and weaknesses of existing methods or new ways to evaluate existing methods.)
Novelty unclear
9. Significance
No impact
10. Technical Depth and Quality of Content
Questionable work
11. Experiments. (Reminder: EA&B papers should have a higher bar for experiments.)
Obscure, not really sure what is going on and what the experiments show
12. Presentation
Reasonable: improvements needed
13. Detailed Evaluation (Contribution, Pros/Cons, Errors). Please number each point and provide as constructive feedback as possible.
Overall, I believe that the authors should submit the paper to a completely different conference/journal.
# USENIX
Review 363A
===========================================================================
Paper summary
-------------
The paper focuses on the problem of cryptocurrency market manipulation, particularly on detecting and characterizing wash trading and pump-and-dump schemes. The paper leverages data collected by Cloudburst, a company that focuses on cryptocurrency-related crime, obtained from multiple sources such as Telegram channels, cryptocurrency prices over time, and information about cryptocurrency trades made on the Binance exchange. The paper then develops a methodology for identifying wash trades and studies how pump-and-dump schemes are conducted.
Detailed comments for authors
-----------------------------
The paper focuses on an important and timely problem that has the potential to have a significant impact (e.g., monetary loss) on people involved in the cryptocurrency market. I liked that the paper leverages datasets from multiple diverse sources, including data from Telegram, trades on Binance, as well as information about cryptocurrency prices. The paper is also well-written, structured, and easy to follow. At the same time, I have some concerns with this paper, which are mainly related to the difference between this work and previous work, the data collection methodology and its representativeness, as well as the focus on one exchange (with specific trading pairs only). Below, I provide more details on my concerns with some suggestions for improvements.
First, I have some concerns about the novelty of this work. The problems of wash trading and cryptocurrency pump-and-dump schemes are widely studied by the research community, and it is unclear what new elements or new findings emerge from this work. The paper claims that their work provides the most comprehensive study/view of these schemes and that they reason about the masterminds of these schemes via online social media data. For the comprehensiveness claim, the paper does not allow the readers to assess how comprehensive this study is, mainly because it lacks adequate discussion and description of the data collection methodology and how it is conducted by the company. With regard to the masterminds behind this, the paper devotes only a tiny part to them, and they are not its main focus. How these schemes are orchestrated via OSNs is also studied in previous work (see work by Xu and Livshits). Overall, after reading this work, I am wondering what new empirical insights we extract from this work that were not covered by previous work. I strongly encourage the authors to highlight the novel parts of their work and expand their discussion on the related work to clearly articulate what is the difference between their work and other previous efforts focusing on the same problems.
Second, the paper lacks a discussion of how the data is collected by the Cloudburst company, whether the data collection is done in a systematic manner, and how comprehensive the collected dataset is. This is a crucial part of this paper, since one of the main claims of this work is that it provides the most comprehensive view of this problem; however, without properly discussing how the data is collected, this claim is unsubstantiated. At the same time, I believe that the various datasets are also not properly described in the paper. I suggest to the authors to explain how the data is collected by the Cloudburst company and discuss its comprehensiveness and limitations. For example, is the OSN data based on a list of Telegram channels? How are the channels selected? How many channels are included in the dataset? For the trade data, the paper mentions that the data is between 1st of April and 1st of May, but fails to mention for what years.
Another concern is that the paper focuses on trades from a single exchange and for specific trading pairs (i.e., the ones involving the USDT coin), which provides only a limited view of the trading activity. Overall, it’s unclear how this limitation is affecting the comprehensiveness and how the results might change when we consider trading data from other exchanges or with other trading pairs. I suggest to the authors to explain what is the main underlying reason for this methodological decision and include some discussion about how this limitation might affect the presented results.
Also, the paper lacks an evaluation of the accuracy and performance of the methodology for identifying wash trades and crypto pump-and-dump schemes, and perhaps more importantly, how this performance compares with previous efforts in detecting these schemes. I suggest to the authors to include results on the efficacy of the presented approach in identifying these schemes and how it compares with previous efforts/classifiers that aim to detect these cryptocurrency schemes.
Ethics consideration
--------------------
1. No
Reasons to accept the paper
---------------------------
- Important and timely problem that has a real impact on people involved in cryptocurrency trades
- The paper leverages data from diverse sources to study the problem from different points of view
- The paper is well-written and easy to follow
Reasons to not accept the paper
-------------------------------
- Unclear what is the delta of this work with previous work that studies/characterizes pump-and-dump schemes
- The paper is based on a dataset collected by a company and the data collection methodology/representativeness of this dataset is unknown
- The paper focuses only on a specific exchange and only on cryptocurrencies that are traded with USDT
- The paper lacks an evaluation of the performance of the proposed approach and how it compares with previous efforts/classifiers to identify these schemes.
Recommended decision
--------------------
4. Reject
Questions for authors' response
-------------------------------
1. How is the dataset collected, and how comprehensive is it?
2. What is the performance of the proposed approach and how does it compare with previous efforts?
3. What is the delta between this work and previous work on identifying these schemes and characterizing them (e.g., Xu and Livshits, Morgia et al., Hu et al., and Mirtaheri et al.)?
Writing quality
---------------
2. Well-written
Confidence in recommended decision
----------------------------------
3. Highly confident (would try to convince others)
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Review 363B
===========================================================================
Paper summary
-------------
This paper proposes several detection algorithms for discovering bad behavior in cryptocurrency trading. It focuses on three different aspects: wash trading (selling/buying coins to oneself in order to fake volume), pump and dumps (artificially inflating the price of a coin to make a profit when selling it), and online community detection of fraudulent accounts.
Detailed comments for authors
-----------------------------
This is an interesting paper on a very timely topic. Regulators, such as the SEC in the US, are getting increasingly concerned about misbehavior in crypto exchanges (centralized or DeFi) and are desperately trying to better understand the ins and outs of market manipulation in crypto exchanges. This paper inscribes itself in a long line of work on the subject, and attempts to look at two major types of villainy: wash trading and pump and dumps.
Pump and dumps have been extensively studied in the literature -- if memory serves, they're at the core of JT Hamrick's PhD thesis. I understand from the paper that you are indirectly claiming that you are the first to look at coordination on OSN of pumps and dumps. Maybe I misunderstood, because this seems a bit too strong a claim. Hamrick in particular looked at coordination signals on various channels, I believe. That's not so much of an issue, but I think you really need to show clear evidence of how your work differs from, and improves on, the related efforts. In that respect, the related work section should probably be moved up front, so that the reader can immediately grasp the novelty of the paper. I failed to do that, frankly.
Along the same lines, the paper is missing, in my opinion, some crisp takeaways. There is a figure toward the end of the paper about the potential amount of wash trading taking place, but that's a very rough back-of-the-envelope calculation, taking an average fraction of suspicious trades, and multiplying them by the volume on Binance. I would have much preferred to see a discussion of how prevalent wash trading is on certain exchanges... but then again, this might have been done already. See:
1) https://www.nber.org/papers/w30783
2) https://www.sec.gov/comments/sr-nysearca-2019-01/srnysearca201901-5164833-183434.pdf
among others.
I fear that, without properly situating the paper in this large context, it is very difficult to assess the novelty claims as well as the scientific advances it proposes. I thus remain negative at the moment.
Question:
How does the paper manage to successfully distinguish between wash trading and simple arbitrage?
Nits:
Don't use reference as nouns
"1st of April to 1st of May" - of which year
4.1.1 uses a bunch of terms defined in 4.1.2 and is impenetrable
Ethics consideration
--------------------
1. No
Required changes
----------------
* Much better comparative evaluation with related work (JT Hamrick et al., Tsuchiya, ...)
* Clear take-aways
Reasons to accept the paper
---------------------------
* Important topic, relevant to regulators
* The beginning of the paper is really engaging and well-written
* Interesting numbers about wash trading
Reasons to not accept the paper
-------------------------------
* Very unclear how it improves state-of-the-art detection of pump and dumps. Hamrick looked at coordination on social media channels, for instance, so the claims of novelty there ring a bit strange.
* The different detection algorithms seem somewhat disconnected.
* I'm not completely sure what the main take-aways of the paper are. What could exchanges do? Who can intervene?
Recommended decision
--------------------
4. Reject
Questions for authors' response
-------------------------------
- How does your paper advance science compared to related work (see above)?
- How do you distinguish between wash trading and other schemes, e.g., arbitrage? (I'm not fully convinced by your parametric explanation.)
- What are the main takeaways?
Writing quality
---------------
3. Adequate
Confidence in recommended decision
----------------------------------
3. Highly confident (would try to convince others)
# KDD
Official Review of Paper763 by Reviewer V6U4
KDD 2023 Conference Applied Data Science Track Paper763 Reviewer V6U4
27 Mar 2023. KDD 2023 Conference Applied Data Science Track Paper763 Official Review. Readers: Program Chairs, Paper763 Area Chairs, Paper763 Reviewers Submitted, Paper763 Authors.
Note: I have read and agree with the ADS track's policy on behalf of myself.
Paper Category: Scientific Discovery
Influence: Yes, the authors showed statistics/results of how users have benefited from the solution.
Novelty: 3
Technical Soundness: No (some main elements are problematic)
Impact: Fair solution to an important problem, but there are alternatives around
Presentation: Hard to read, due to poor presentation
Reproducibility: No - Not included. No reproducibility information included in the paper.
Summary:
The paper aims at analyzing the entities involved in pump-and-dump schemes in online social networks (OSN) by exploiting both cryptocurrency market data and OSN (notably Telegram) data. The authors rely on data gathered from Cloudburst, which, as described, is "a startup that specializes in providing security solutions for the cryptocurrency markets". Given the data, the authors employ different strategies (simple time-weighted averages, linear mixed regression models and Bayesian linear models) to score and (based on the scores) quantify the importance of different (Telegram) channels to the success of a pump-and-dump event.
Paper Strength:
Very relevant problem
Nice joint exploration of market + social network data
Paper Weakness:
Very hard to understand what exactly was done by the authors and what was already provided in the data by Cloudburst
Design decisions, notably with respect to the models adopted to analyze channel importance, are not clearly justified and seem too ad-hoc
Reproducibility is quite hard (if not impossible): data is not available and it is not clear how one could get such data and how to extract the considered signals from it.
Detailed Evaluation And Suggestions For Authors:
I started reading this paper with great interest. I found the introduction motivating and clearly presented. However, starting from Section 3 (Dataset), the text becomes quite blurry, and the descriptions of how the data was obtained, processed as well as the steps proposed in Section 4 are quite vague. As such, I would find it quite hard (if not impossible) to reproduce this work, even if I chose to gather data myself (in the absence of the dataset used).
For instance, it is said that the OSN data was gathered from Telegram channels. How many channels? How were these channels found? For how long were they monitored? Also, Section 3.2 discusses market event signals identified in the gathered messages. How were these signals identified? What kind of pre-processing, cleaning, data representation was used to process the messages and extract those signals?
Section 4.1, which is part of the "Methodology section" presents the signal detection. However, it is not clear whether this was done by the authors on raw data gathered by Cloudburst, or whether the data already makes these signals explicit. Listing 1 suggests that a lot is already provided in the original data. Thus it is unclear what exactly the authors had to do in this step. The description is confusing and vague, as to specific steps executed.
Section 4.2 then describes how the authors propose to quantify the importance of different channels to pump-and-dump events. This section describes different modeling approaches to quantify such importance (varying from very simplistic approaches to a somewhat more sophisticated yet still pretty standard method). However, it is not clear why such approaches were selected and whether they are indeed appropriate to the task. Are there any assumptions that justify these choices? They just seem too ad-hoc.
Section 4.3 discusses a network graph built from the data, but it is just too briefly described. What exactly are the entities/vertices? What are the edges?
I also think the paper has some very strong claims that need some calibration. One such example is "Our results demonstrate that our approach can accurately identify all the entities and masterminds behind pump-and-dump schemes." Even if very effective, it is hard to believe that it can identify ALL masterminds behind such schemes.
Overall Evaluation: 1: A below-the-bar paper (reject). I believe the paper is clearly below the standards for KDD.
Confidence Level: 2: I have passing familiarity with this area.
Official Review of Paper763 by Reviewer SzFo
KDD 2023 Conference Applied Data Science Track Paper763 Reviewer SzFo
27 Mar 2023. KDD 2023 Conference Applied Data Science Track Paper763 Official Review. Readers: Program Chairs, Paper763 Area Chairs, Paper763 Reviewers Submitted, Paper763 Authors.
Note: I have read and agree with the ADS track's policy on behalf of myself.
Paper Category: Scientific Discovery
Influence: Yes, the authors have designed their solution for a particular audience (but no statistics/results of how the audience benefit yet).
Novelty: 2
Technical Soundness: Mostly, but with major flaws (e.g., missing baselines)
Impact: Fair solution to an important problem, but there are alternatives around
Presentation: Mostly clear, with minor readability issues
Reproducibility: Yes - Poor. Some description provided, but it is clearly insufficient information for reproducibility.
Summary:
The paper deals with the characterization of pump-and-dump schemes, i.e. manipulation of the coin market to inflate the coin price and sell at a profit, in the cryptocurrency ecosystem. The approach integrates data from the cryptocurrency market and from Telegram groups driving pump-and-dump schemes. Specifically, the paper focuses on time and crowd pump-and-dump schemes. The paper also defines a few ranking scores to evaluate the efficacy of the Telegram channels in engaging their members and successfully implementing the scheme. Finally, the paper evaluates the relationships among the channels.
Paper Strength:
The topic is up-to-date and refers to an issue that may afflict cryptocurrency adopters. The methodology may be applied by computer forensics to identify and close Telegram channels manipulating the market.
The paper integrates data from different domains into a single workflow for the identification, scoring and characterization of pump-and-dump schemes supported by Telegram channels.
Paper Weakness:
Despite the emphasis on the characterization and identification of masterminds in the introduction and in the related work, the discussion of the nature of masterminds and their role, as well as an in-depth analysis of the relationships among Telegram channels, is only marginal in the structure of the paper
The last step of the pipeline, i.e. the construction of the network among the Telegram channels and its analysis, is scarcely detailed
The assessment w.r.t. the state of the art is not clear
Detailed Evaluation And Suggestions For Authors:
The main goal of the paper, i.e. unveiling the masterminds behind pump-and-dump schemes in the cryptocurrency market, is interesting and up-to-date, although not completely new, as also stated in the related work section. In terms of methodology and pipeline of the analysis, the main strengths are the integration of different data sources for assessing impact scores for the Telegram channels and the representation of the channels by a network. The combination of these aspects has led to a correlation analysis supporting the presence of a few important channels which seem to play a role in pump-and-dump schemes. However, a detailed characterization and discussion of such "mastermind" channels is missing. For instance, for crowd pump-and-dump schemes the authors highlighted a cooperative organization but did not investigate what the cooperation patterns are, the temporal actions leading to cooperation, or the role of channels in cooperative patterns (are there leader-follower schemes?). In general, a more in-depth analysis of the graph structure as well as of the static and temporal properties of channels is expected. A further concern is related to the definition of links and their weights. First, it is not clear if the frequency of simultaneous transmission of pump messages is a weight or if it is used in a threshold mechanism (links with frequency less than delta are discarded); second, the effect of the time window for considering two events simultaneous (now one minute) should be discussed, since it seems that crowd pump-and-dump schemes are characterized by a temporal scale larger than time pump-and-dump schemes. Moreover, since links are weighted, I would ask if these patterns hold when computing the centrality of channels by node strength. Finally, in the related work section it is not clear how the paper is placed w.r.t. the other references that have adopted social media data: from the sentence "more comprehensive view of the pump-and-dump phenomenon by considering both market and social network data" it emerges that the combination of market and social media data is the main novelty element, but in Table 4 there are different methods which use both market and social data.
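Note on the link-definition questions above: the three knobs the reviewer asks about (co-occurrence frequency as a weight vs. as a threshold, the size of the simultaneity window, and degree vs. node strength) can be made concrete with a minimal sketch. Everything in it is an assumption for illustration and not the paper's actual pipeline: the `(channel, coin, unix_ts)` message format, the function names, and the parameters `window_s` and `delta` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(messages, window_s=60, delta=1):
    """Build an undirected weighted channel graph (hypothetical sketch).

    messages: iterable of (channel, coin, unix_ts) pump messages.
    An edge weight counts how often two channels announced the same coin
    within `window_s` seconds; edges with weight < `delta` are discarded.
    """
    by_coin = defaultdict(list)
    for channel, coin, ts in messages:
        by_coin[coin].append((ts, channel))

    weights = defaultdict(int)
    for posts in by_coin.values():
        for (t1, c1), (t2, c2) in combinations(sorted(posts), 2):
            if c1 != c2 and abs(t2 - t1) <= window_s:
                weights[tuple(sorted((c1, c2)))] += 1

    # Threshold mechanism: keep only links seen at least `delta` times.
    return {edge: w for edge, w in weights.items() if w >= delta}


def degree_and_strength(edges):
    """Return (degree, strength) per node; strength sums incident edge weights."""
    degree, strength = defaultdict(int), defaultdict(int)
    for (a, b), w in edges.items():
        for node in (a, b):
            degree[node] += 1
            strength[node] += w
    return dict(degree), dict(strength)
```

Running this with several values of `window_s` and `delta` and comparing the degree ranking against the strength ranking would be one way to address the reviewer's sensitivity questions directly in the evaluation.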
Overall Evaluation: 2: An OK paper, but likely not good enough for KDD (weak reject). I vote for rejecting it, although I would not be upset if it were accepted.
Confidence Level: 2: I have passing familiarity with this area.
Official Review of Paper763 by Reviewer ccoY
KDD 2023 Conference Applied Data Science Track Paper763 Reviewer ccoY
26 Mar 2023 (modified: 26 Mar 2023). KDD 2023 Conference Applied Data Science Track Paper763 Official Review. Readers: Program Chairs, Paper763 Area Chairs, Paper763 Reviewers Submitted, Paper763 Authors.
Note: I have read and agree with the ADS track's policy on behalf of myself.
Paper Category: None of the Above but in scope for the ADS Track
Influence: Yes, the authors showed statistics/results of how users have benefited from the solution.
Novelty: 4
Technical Soundness: Yes
Impact: A good solution to a rather narrowly defined problem for a small target group of users
Presentation: Clear
Reproducibility: Yes - Excellent. It provides excellent and complete information that will make it easy to reproduce the insights/results.
Summary:
The paper proposes a novel approach to identify the masterminds or key players behind pump-and-dump schemes in the cryptocurrency market by analyzing both cryptocurrency market data and online social networks (OSNs) data like Telegram. The proposed approach involves scanning OSN sites and using regular expressions (regexs) to extract relevant signals. The paper presents a comprehensive analysis of pump-and-dump schemes from a market trading perspective, combined with the extracted pump signals to calculate the impact scores of OSN channels. The paper employs intuitive scoring, mixed effect model, and Bayesian hierarchical model for the process, and OSN profiling and topology analysis to examine the connections between channels and users to pinpoint the masterminds.
Paper Strength:
The proposed approach is novel and offers a comprehensive solution to the challenge of identifying key players behind pump-and-dump schemes in the cryptocurrency market. By analyzing both market and social network data, the approach sheds light on the mechanisms and players involved in these schemes.
Code is available to reproduce the results. The dataset used in this paper may be helpful for other researchers.
The paper employs multiple models and techniques in channel analysis, providing greater flexibility in overcoming the constraints imposed by the underlying data structure. The evaluation of the entities involved in pump-and-dump schemes is comprehensive and accurate.
Paper Weakness:
The dataset is collected from Cloudburst, and the proposed method is evaluated on the dataset. According to line 227, Cloudburst already provides pump-and-dump detection API. What is the contribution of the proposed method compared with Cloudburst? It seems Cloudburst already applies OSN messages for pump-and-dump detection.
Lack of performance comparison with the methods mentioned in the related works.
Detailed Evaluation And Suggestions For Authors:
Possible weaknesses refer to the above section.
Overall Evaluation: 2: An OK paper, but likely not good enough for KDD (weak reject). I vote for rejecting it, although I would not be upset if it were accepted.
Confidence Level: 1: I learnt the area as I reviewed the paper.
Official Review of Paper763 by Reviewer RFri
KDD 2023 Conference Applied Data Science Track Paper763 Reviewer RFri
13 Mar 2023. KDD 2023 Conference Applied Data Science Track Paper763 Official Review. Readers: Program Chairs, Paper763 Area Chairs, Paper763 Reviewers Submitted, Paper763 Authors.
Note: I have read and agree with the ADS track's policy on behalf of myself.
Paper Category: Deployed/Almost Deployed
Influence: Yes, the authors have designed their solution for a particular audience (but no statistics/results of how the audience benefit yet).
Novelty: 3
Technical Soundness: Mostly, but with major flaws (e.g., missing baselines)
Impact: Fair solution to an important problem, but there are alternatives around
Presentation: Clear
Reproducibility: Yes - Fair. The information provided represents a fair effort to make it possible for readers to reproduce the results.
Summary:
This article proposes a framework to identify the most important actors of the pump-and-dump schemes on cryptocurrency markets. Those schemes consist in having many people buy a certain cryptocurrency, so that the organizers can sell theirs after its price has risen. This practice is illegal and can impact negatively crypto markets, destroying stability, harming investors, and undermining the trust people have in them. But unlike classical markets, these are usually decentralized and anonymous due to the nature of the blockchain. This makes it hard to understand who is behind those schemes. Using data from pumps observed on crypto markets and data from channels of social networks on which those schemes are organized, the authors claim they can identify who is behind those pump-and-dump schemes.
Paper Strength:
The article is clear and pleasant to read. The subject and the problem are presented with details, even for someone not familiar with this market scheme. In the same manner, the short synthesis regarding related work or the description of the protocol are easy to understand.
The proposed method is interesting and promising, as a lot of data is available and combining social networks talks and effects on markets feels like having all the information needed to tackle the problem.
Paper Weakness:
While being very pleasant, the article spends too much time on context, presenting all stages of a pump for example, and too little on actual contributions and explaining results.
The presented goals are not achieved. What "masterminds" are is not clearly defined. The article only gives scores to chat groups and claims it has precisely identified all the entities behind the scheme, while having no ground truth on who is behind those groups and how they succeeded. We don't really see how the analysis might allow a "deeper understanding of the pump-and-dump scheme" as claimed.
The presented protocol aims, in the end, at creating a network of groups on the social network. It is therefore unfortunate to see that only very few tools from complex network analysis have been used on the obtained network, leading to unclear explanations for the value of group scores.
Detailed Evaluation And Suggestions For Authors:
See above.
Overall Evaluation: 2: An OK paper, but likely not good enough for KDD (weak reject). I vote for rejecting it, although I would not be upset if it were accepted.
Confidence Level: 2: I have passing familiarity with this area.