Online Research Tools for Data Practitioners

How Data Management professionals can benefit from technical tools for productivity and speed to insight.

Author: Data Strategy Professionals team | Post Date: Feb 20, 2024 | Last Update: Apr 4, 2024

Maintaining a competitive edge in Data Management isn't just about technical acumen or leadership ability. Delivering value requires continuously learning how to take advantage of new tools. Staying on top of the latest developments is particularly advantageous given recent advancements in AI.

tools on a table
Photo by Pixabay on Pexels

Deep learning techniques are transforming the field of data analytics, and AI coding assistants such as GitHub Copilot are augmenting the work of software engineers. Some analysts and programmers might worry that innovations like these could make some of their professional skills obsolete, but the wisest professionals keep up with the changing landscape by using these tools to strengthen their skills.

How can Data Management professionals do the same? While the work of a Data Strategist might be less technical than that of a software engineer or a data analyst, that doesn't mean you can't benefit from technical tools too.

This article presents a curated list of tools that can make you more productive when you need to find information online. Whether that involves general web searches or diving deep into academic research, having the right tools at your disposal can help you succeed.



You may have noticed that Google's search results have become less useful over time. One likely reason is that Search Engine Optimization (SEO) engineers, specialists who tune content to rank highly, seem to have flooded the web with mediocre content. This content serves little useful purpose, but because it is tuned to game the search algorithm, it often shows up at the top of your results.

The proliferation of SEO-engineered content has accelerated with the development of generative AI, which lets SEO engineers produce content at a far larger scale than before. Fortunately, it's still possible to cut through the noise if you have the right tools at your disposal.

Google tricks

For example, did you know you can filter for specific websites or keywords on Google? Typing site: directly before a domain, with no space (e.g., site:example.com), returns only results from that domain. Putting a phrase in quotes ("cybersecurity") returns only results containing that exact phrase, and prefixing a keyword with a hyphen (-cybersecurity) excludes results containing it. To learn many more small tricks like these, we recommend this in-depth guide by Gwern Branwen.
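Because these operators are plain text, they can be combined in a single query. As a minimal sketch (the helper names are hypothetical, and example.com stands in for whatever domain you want), the snippet below assembles keywords, an exact phrase, a site: filter, and excluded terms into one query string, then URL-encodes it into the q parameter of a standard Google search URL:

```python
from urllib.parse import urlencode


def build_google_query(keywords=(), exact_phrase=None, site=None, exclude=()):
    """Combine plain keywords with Google's quote, site:, and minus operators (hypothetical helper)."""
    parts = list(keywords)
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')  # quotes: results must contain this exact phrase
    if site:
        parts.append(f"site:{site}")  # site: with no space: restrict results to one domain
    parts.extend(f"-{term}" for term in exclude)  # leading hyphen: drop results containing the term
    return " ".join(parts)


def google_search_url(query):
    """URL-encode the query into the q parameter of a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_google_query(
        keywords=["data", "governance"],
        exact_phrase="data catalog",
        site="example.com",
        exclude=["cybersecurity"],
    )
    print(query)  # data governance "data catalog" site:example.com -cybersecurity
    print(google_search_url(query))
```

Building the query in one place like this makes it easy to reuse the same filters across many searches instead of retyping them.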

Kagi

You can also ditch Google altogether in favor of alternatives such as Kagi, a subscription-based search engine that gives you more control over your search experience. It has no ads, groups listicles separately from the other results, and lets you block or boost specific domains across all of your searches.

Screenshot of Kagi search engine

Perplexity

Perplexity is another "smarter" search engine: it's like Google's quick answer feature on steroids. It uses AI to automatically read the websites most relevant to your query and write a report for you with its findings, complete with citations.

Screenshot of Perplexity search engine

Exa

If you have a trickier query in mind, Exa is a handy search engine for hunting down specific websites or answering niche questions. Instead of a short keyword bar, it gives you a paragraph-sized box to describe what you want in plain language, and it uses natural language processing to return pages that match that description rather than just its keywords.

Lastly, as you explore the online jungle, it's common to run into links that don't work anymore. The next time you click on a promising resource and find yourself staring at a page that reads "404: Page Not Found", you can use the Internet Archive's Wayback Machine to retrieve archived copies of the dead page. The Internet Archive is a nonprofit organization that saves copies of public websites to prevent them from becoming lost to the sands of time, providing an invaluable service to the digital explorer.

For even more convenience, you can install this open-source browser extension, which gives you quick access to the Wayback Machine's copy of the page you're on, as well as any cached versions of the page created by search engines like Google, if they exist.
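The Wayback Machine can also be checked programmatically, which helps when you need to audit a whole list of links at once. The sketch below is a minimal example, assuming the Internet Archive's public Availability API at archive.org/wayback/available and the JSON shape it documents (an archived_snapshots object whose closest entry holds the snapshot URL); the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build a Wayback Machine Availability API request URL."""
    params = {"url": page_url}
    if timestamp:
        # Optional 1-14 digit YYYYMMDDhhmmss value; the API returns the snapshot closest to it.
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urlencode(params)}"


def closest_snapshot(response):
    """Return the URL of the closest archived copy in an API response, or None if there is none."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(page_url, timestamp=None):
    """Query the live API (requires network access) and return a snapshot URL or None."""
    with urlopen(availability_url(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

For example, find_archived_copy("example.com") should return a web.archive.org URL if the page has been captured, or None otherwise. Keeping the parsing in closest_snapshot separate from the network call makes the logic easy to test against a saved response.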

Research

While the previous tools are useful for general research online, finding useful information in academic literature calls for a different set of tools.

Reviewing papers lets data managers draw on the rigorous, cutting-edge knowledge produced by the academic community to improve their data management practices. For example, they can find potential sources of data, identify emerging trends in data management, or learn evidence-based business techniques that help them stand out from competitors.

arXiv

arXiv provides open access to papers, mostly from quantitative fields. You can use arXiv Xplorer to find papers that are semantically similar to a search query. This search engine uses natural language processing to interpret the meaning of your query, then searches the text of the papers themselves, not just their titles and abstracts, to unearth relevant results.

Another way to explore arXiv's database is to use Paperscape, a more visual tool. It uses a graph layout to show you connections between papers and their citations and references.
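arXiv Xplorer and Paperscape are separate services built on top of arXiv, but arXiv itself also offers a free query API that returns results as an Atom XML feed. Unlike arXiv Xplorer, this API matches keywords rather than meaning. The sketch below is a minimal example, assuming the export.arxiv.org/api/query endpoint with its search_query, start, and max_results parameters; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

ARXIV_API = "https://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def arxiv_query_url(terms, max_results=5):
    """Build a keyword query requiring every term to appear in some field (all: prefix, joined by AND)."""
    search_query = " AND ".join(f"all:{term}" for term in terms)
    params = {"search_query": search_query, "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_entries(atom_xml):
    """Extract (title, abstract-page URL) pairs from an arXiv Atom feed."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        raw_title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        title = " ".join(raw_title.split())  # titles in the feed can contain line breaks
        link = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        papers.append((title, link))
    return papers


def search_arxiv(terms, max_results=5):
    """Fetch and parse results from the live API (requires network access)."""
    with urlopen(arxiv_query_url(terms, max_results), timeout=30) as resp:
        return parse_entries(resp.read())
```

A call such as search_arxiv(["data", "quality"]) returns up to five (title, link) pairs. arXiv asks API users to pace repeated requests, so add a delay of a few seconds if you loop over many queries.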

Semantic Scholar

For papers that are not in arXiv, you can use Semantic Scholar to search through a larger swath of academia. It gives you more control over your search than Google Scholar and provides you with an integrated paper viewer that keeps track of acronym definitions and references.
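Semantic Scholar's data is also available through its free Academic Graph API, which is handy for pulling search results into a spreadsheet or script. The sketch below is a minimal example, assuming the graph/v1/paper/search endpoint with its query, fields, and limit parameters; the helper names are hypothetical, and unauthenticated requests are subject to rate limits.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"


def s2_search_url(query, fields=("title", "year", "url"), limit=5):
    """Build a keyword search request; fields selects which paper attributes come back."""
    params = {"query": query, "fields": ",".join(fields), "limit": limit}
    return f"{S2_SEARCH}?{urlencode(params)}"


def summarize_results(response):
    """Return (title, year) pairs from the 'data' list of a search response."""
    return [(paper.get("title"), paper.get("year")) for paper in response.get("data", [])]


def search_semantic_scholar(query, limit=5):
    """Query the live API (requires network access)."""
    with urlopen(s2_search_url(query, limit=limit), timeout=30) as resp:
        return summarize_results(json.load(resp))
```

Requesting only the fields you need keeps responses small, which matters when you are working within the unauthenticated rate limit.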

Connected Papers is another tool that's similar to Paperscape in that it visualizes connections between related papers, but it's based on Semantic Scholar's database instead of arXiv's.

Screenshot of Connected Papers

The author's personal favorite tool for finding research papers is Elicit. This is an AI-powered paper search engine that makes literature reviews much easier. It extracts key elements of relevant papers, such as the number of participants in an experiment or the main conclusion, into a table, and writes a summary of the findings from across the papers for you, much like Perplexity.

Screenshot of Elicit columns
Screenshot of Elicit query

Zotero

Zotero is a free, cross-platform program that helps you collect, organize, and annotate academic papers. It's one of the most widely used reference managers among academics, so it integrates with many other programs, such as web browsers and note-taking applications.

Unpaywall

In the previous section, we mentioned a browser extension that helps you quickly find backups of webpages that have gone offline. The academic equivalent of a dead link is a paywall that blocks access to a paper that would aid your research. The Unpaywall browser extension can't unlock papers that are only available to paying readers, but when a legal copy of the paper exists in an open-access repository, it takes you straight to it.
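The extension draws on the Unpaywall database, which also offers a free REST API keyed by DOI. The sketch below is a minimal example, assuming the v2 endpoint (api.unpaywall.org/v2/ followed by the DOI, with an email query parameter) and its is_oa and best_oa_location response fields; the function names are hypothetical, and you should pass your own email address, as the API requests.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

UNPAYWALL_API = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi, email):
    """Build an Unpaywall v2 lookup URL; the API asks callers to identify themselves by email."""
    return f"{UNPAYWALL_API}{quote(doi)}?{urlencode({'email': email})}"


def best_open_access_link(record):
    """Return the best legal open-access link from an Unpaywall record, or None."""
    if not record.get("is_oa"):
        return None
    location = record.get("best_oa_location") or {}
    # Prefer a direct PDF link, falling back to the landing page.
    return location.get("url_for_pdf") or location.get("url")


def lookup_doi(doi, email):
    """Query the live API (requires network access)."""
    with urlopen(unpaywall_url(doi, email), timeout=30) as resp:
        return best_open_access_link(json.load(resp))
```

Running lookup_doi over the DOIs in a reading list is a quick way to see which papers have a free, legal copy before you hit a paywall.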

Consensus

Consensus is an AI search engine that uses OpenAI's GPT-4 to search over 200 million academic papers. It helps users find papers that address yes/no questions, the relationship between concepts, the effects of a concept, and more.

woman with a laptop
Photo by Tatiana Syrikova on Pexels

Datasets

Many data practitioner roles could benefit from access to the wide range of high-quality, publicly available datasets out there. The following tools offer access to datasets on various subjects and from different sources.

Google Dataset Search

Google Dataset Search is a search engine for datasets published across the web. Because it draws on metadata from dataset repositories that use standard schema.org markup, it offers a broad view of the data available across many subjects and disciplines.
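Because Dataset Search relies on that markup, anyone publishing data can make it easier to discover by embedding a schema.org Dataset description in the dataset's web page, usually as JSON-LD. The sketch below is a minimal example with placeholder values; the property names come from the schema.org Dataset type, and Google's dataset guidelines treat name and description as required. The helper names are hypothetical.

```python
import json


def dataset_jsonld(name, description, url, license_url=None, creator=None, keywords=()):
    """Build a schema.org Dataset description as a plain dict (hypothetical helper)."""
    data = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    if license_url:
        data["license"] = license_url
    if creator:
        data["creator"] = {"@type": "Organization", "name": creator}
    if keywords:
        data["keywords"] = list(keywords)
    return data


def as_script_tag(data):
    """Wrap the description in the <script> element that goes in the page's HTML."""
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


if __name__ == "__main__":
    meta = dataset_jsonld(
        name="Example customer churn dataset",
        description="Hypothetical monthly churn records used to illustrate schema.org markup.",
        url="https://example.com/datasets/churn",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        creator="Example Org",
        keywords=["churn", "customers"],
    )
    print(as_script_tag(meta))
```

Optional properties are only added when supplied, so the output stays valid JSON-LD without empty placeholder fields.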

Kaggle

Kaggle is a Data Science competition platform where users can upload their own datasets for others to download and use. Datasets span categories ranging from movie ratings to credit card fraud, and often come with descriptions of their key features.

Data.gov

Under the OPEN Government Data Act, US federal agencies are required to publish their data as open, machine-readable data by default. At the time of writing, Data.gov gives access to just under 300,000 datasets from federal, state, county, and city government entities.
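Data.gov's catalog is built on the open-source CKAN platform, so its metadata can also be searched through CKAN's standard Action API. The sketch below is a minimal example, assuming the catalog.data.gov/api/3/action/package_search endpoint with CKAN's q and rows parameters; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PACKAGE_SEARCH = "https://catalog.data.gov/api/3/action/package_search"


def package_search_url(query, rows=5):
    """Build a CKAN package_search request: q is a free-text query, rows caps the result count."""
    return f"{PACKAGE_SEARCH}?{urlencode({'q': query, 'rows': rows})}"


def summarize_packages(response):
    """Return (dataset title, publishing organization) pairs from a package_search response."""
    if not response.get("success"):
        return []
    results = response.get("result", {}).get("results", [])
    return [
        (pkg.get("title"), (pkg.get("organization") or {}).get("title"))
        for pkg in results
    ]


def search_datagov(query, rows=5):
    """Query the live catalog (requires network access)."""
    with urlopen(package_search_url(query, rows), timeout=30) as resp:
        return summarize_packages(json.load(resp))
```

For example, search_datagov("air quality") would return the titles and publishers of the first five matching datasets. The endpoint is an assumption based on CKAN's standard API layout, so check Data.gov's developer documentation before building on it.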

Conclusion

Success in Data Management requires staying on top of advancements in technology, including new tools that can enhance your workflow. The evolution of these tools, driven by technologies such as generative AI, presents both challenges and opportunities for ambitious data practitioners.

The challenges lie in adapting to an ever-changing technological environment: traditional skills risk becoming obsolete, new skills can be difficult to learn, and staying up to date with the latest developments can be time-consuming.

The opportunities come from leveraging powerful research tools to enhance productivity. Faster, more reliable tools for finding information improve the efficiency and quality of a data practitioner's work, giving them a competitive edge over those who don't use such tools.

Embracing cutting-edge research tools is not just about staying relevant; it's about enhancing one's ability to manage, analyze, and leverage data in innovative ways. By integrating these resources into their workflow, data practitioners can ensure they remain at the forefront of their field.

Note: a version of this post originally appeared on Roman's personal blog.

Roman Hauksson

Guest blogger

Roman is interested in using his career to help solve the world's most important and neglected problems and ensure humanity has a long and flourishing future. In particular, he is interested in pursuing technical research to align powerful artificial intelligence systems with human values.

Mac Jordan

Data Strategy Professionals Research Specialist

Mac supports Data Strategy Professionals with newsletter writing, course development, and research into Data Management trends.