
How to start doing research: general methodology plus tips and tricks from a CS graduate student

This post is a collection of ideas, tips and tricks, along with supporting software and websites, to help new graduate students (and anyone interested) start doing research. It's a compilation of many scattered posts I've read over the years.

Disclaimer: many of these are from the perspective of a CS student working on deep learning for computer vision, so your mileage may vary.

Methodology summary

The general methodology I would recommend to someone just starting on research is to:

  1. Find a rough topic or field you're interested in, and read an overview/review/survey article on it. This should give you a general idea of the past and current state of the field, along with possible future directions.
  2. Obtain a more in-depth understanding of the field by going through the articles cited in the review. Summarize their contributions and identify trends in the literature, along with possible gaps in the work. Also, familiarize yourself with high-impact authors and where they publish, so you can stay up-to-date with their latest work.
  3. Find and solve a research problem. This may be the most difficult part, but the general idea is that you either create a new problem (and a solution), or propose an improvement or solution to an existing one.
  4. Publish. This is about how to share your idea with the scientific community and the world. It's about finding a suitable journal or conference, and writing an appropriate, quality paper for the venue.
  5. Profit?

How to find a rough topic or field?

This only applies to Computer Science (CS), Machine Learning (ML), Deep Learning (DL), Computer Vision (CV), Natural Language Processing (NLP) and related (sub)fields, but a great starting point is Papers with Code. The website lets you browse by application, method, task and dataset, along with trending and new papers, usually with source code. Starting from the trending papers can be a great way to find topics that catch your attention, along with code so you don't have to start programming from scratch, especially since reproducibility can be a huge problem in these fields.

How to find (and solve) a research problem?

I recommend that readers check out this article [1], published in SIGKDD. It covers the research and publication process in detail, with a focus on CS, but many of its ideas carry over to other fields.

Elaborating on the third point, the author summarizes possible research contributions in this slide:

[Image: slide summarizing possible research contributions]

He also mentions the possibility, especially for young scientists like us, of making incremental improvements. This is controversial, but the fact is that sometimes you're not in a position to go for a high-risk, high-reward problem and solution, so settling for low-hanging fruit is a must in some fields, at least when starting out.

Additionally, he highlights several important ideas in the research process: the importance of the problem, the complexity (or simplicity) of the solution, problem relaxation (making simplifying assumptions), and looking to domain experts and other fields for solutions.

How to write a (publishable) research paper?

With regard to the publication process, he stresses that clear, well-organized, high-quality writing is a key factor for success. Open-sourcing the code so others can reproduce the results also helps.

Among the other ideas he mentions are 1) the importance of the first page as an anchor for (re)viewers, since by that point most of them have decided whether the paper is worth reading, and therefore whether to accept it, and 2) clear, high-quality, easy-to-interpret figures (and tables).

Tips and tricks

Tip No. 1: Read. Read a lot.

This may seem obvious, but I would say that, depending on the field, the first step to staying on top is knowing what others are doing, and the only way to do that is by reading their work. I wrote a post on how to read papers, with examples from common papers and literature surveys on deep learning and computer vision for remote monitoring of physiological signals, in case you're not familiar with the process of quickly going through papers.

Finding papers

However, finding papers to read is another critical problem. In my case, I use three main sources when doing literature surveys: Google, Google Scholar, and Microsoft Academic. I especially fancy the former since there are so many neat statistics, though I usually go through all three. In general, you can search by keywords, date, authors, institutions, publication venues and fields, among others.

Aside from this, another cool tool I use when looking for related, derivative and prior works of a particular paper is Connected Papers. It lets you quickly visualize the connections between papers based on a similarity metric inspired by co-citation and bibliographic coupling. It also lets you visualize the relative influence of each paper, since the size of each node is related to its citation count.

Figure: An example of Connected Papers' paper similarity space.
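
To make the two measures named above concrete, here is a minimal Python sketch of co-citation and bibliographic coupling on a toy citation graph. The paper names and the `references` dictionary are made up for illustration, and this is not Connected Papers' actual similarity metric, only the two classic measures it is described as being inspired by.

```python
# Hypothetical toy data: each paper maps to the set of papers it cites.
references = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
    "C": {"Z"},
    "D": {"A", "B"},
    "E": {"A", "B", "C"},
}


def bibliographic_coupling(p, q, references):
    """Number of references that papers p and q have in common."""
    return len(references.get(p, set()) & references.get(q, set()))


def co_citation(p, q, references):
    """Number of papers that cite both p and q."""
    return sum(1 for cited in references.values() if p in cited and q in cited)


print(bibliographic_coupling("A", "B", references))  # 2 (both cite X and Y)
print(co_citation("A", "B", references))             # 2 (cited together by D and E)
print(co_citation("A", "C", references))             # 1 (cited together by E only)
```

Bibliographic coupling looks backwards (two papers are similar if they cite the same earlier work), while co-citation looks forwards (two papers are similar if later work cites them together), which is why a tool combining both ideas can surface related papers that never cite each other directly.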

Bibliography management and tracking papers

For this, Zotero comes to the rescue! There are other alternatives, but I like this one for its simplicity. You download the desktop app and a browser add-on, and with a single click you can add papers, articles or anything else to the bibliography manager. Once you create an account, it automatically syncs all the items in your bibliography, but without the attachments (PDF files).

Aside from this, there are add-ons/plugins that further extend its functionality. A popular one is Zotfile, which lets Zotero automatically extract annotations from PDFs and manage and sync the PDFs attached to your items. With it, it's possible to set up cloud storage for your whole bibliography.

Tip No. 2: Write. Write a lot.

You may be seeing a pattern here. Most of us learn by practice, and writing is no exception. Reading without writing is like watching cooking videos without ever cooking; you know a lot of theory but may not know how to put it into practice.

Note taking

For this, I suggest taking notes on the papers you read, either in Zotero, as slides in PowerPoint/Google Slides, and/or in what I personally prefer the most, Evernote. Evernote lets you create and organize notes with all sorts of attachments and carry them with you on a variety of devices. Depending on the field, your group's policies, and your current situation, writing a blog may also be a good alternative.

Figure: Evernote's GUI.

Publications

With regard to publications, the first step is to find a suitable conference or journal. In most fields, journals are more prestigious since the peer-review process is more thorough. In CS-related fields, however, many conferences hold more prestige than journals. Examples include CVPR, [I/E]CCV, NIPS, ICML, ICLR, IJCAI, AAAI and SIGKDD.

Therefore, identifying the target venue is a crucial step. In academia, there are quite a few metrics used to measure publication impact, such as the h5-index, Impact Factor (IF), SJR, JCR (Journal Citation Reports) and the h-index. These are usually, though not always, correlated. Good starting points to look for conferences and journals include:

  1. Google Scholar's Top conferences and journals by field and sub-field
  2. Microsoft Academic
  3. Guide2Research
  4. CORE Conference Rankings
  5. CORE Journal Ranking

My personal favorite is Guide2Research due to its ease of use. There are options to look for both conferences and journals across a variety of CS fields, and it includes an option to filter by due date only.
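
As an illustration of one of the metrics mentioned above, the h-index is the largest number h such that h papers have at least h citations each. A minimal Python sketch, using made-up citation counts, could look like this:

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for the papers of one author or venue.
citation_counts = [25, 8, 5, 3, 3, 1, 0]
print(h_index(citation_counts))  # 3
```

As far as I understand it, the h5-index shown by Google Scholar applies the same idea, restricted to articles published in the last five complete years, which is why it is commonly used to compare venues rather than individual authors.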

For writing the paper itself, I recommend Overleaf, using LaTeX. It not only allows for more control than traditional word processors such as Microsoft Word and Google Docs, but it's also entirely online and allows for easy collaboration on projects. It takes some time to learn since it follows a programmatic style (pretty similar to HTML, in my opinion), but the results are worth it.

Conclusion

This post provides a general methodology for how to approach research, along with tips and tricks to become more proficient in the process.

If you liked this post, or have any questions, feel free to leave a comment or contact me on any of my socials, found at the bottom of my GitHub Pages.

References

[1] E. Keogh, "How to do good research, get it published in SIGKDD and get it cited!," p. 173.