Tabula

From PeaceTech Wiki

Top contributors to this page: DerekPeaceTech

Name
Tabula
Tool Class
Data Scraper
Developer
Manuel Aristarán, Mike Tigas and Jeremy B. Merrill
Date Created
2014/05/25
Still Maintained?
Yes
Open Source?
Yes
Platforms
PC, Mac
Website
http://tabula.technology
Payment Structure
Free
Languages Supported
English
Skill Level Needed
  • Beginner - The tool has a simple interface that assists the user, either through automatic processes or simple guides, in working with the tool. The user is able to do most things that they need to do without knowledge of advanced concepts, like code.
  • Intermediate - The tool allows the user to perform many tasks without knowledge of advanced concepts, but about an equal amount of functionality requires advanced knowledge. Some training may be required to use the tool.
  • Advanced - The tool requires advanced knowledge or training in order to use most of its functionality.
Intermediate

Tool Description

Tabula is an open-source data scraper that lets users extract tables of data from PDF files.

From the developer: "If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is — there's no easy way to copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface. "

Tabula was developed with the support of ProPublica, La Nación DATA, Knight-Mozilla OpenNews, and The New York Times. It was designed by Jason Das.

What Makes This Tool Different from Others in its Class?

Tabula is comparable to Excel Online and Tesseract OCR, both of which are free tools that can be used to pull data out of documents. Excel Online, however, is not geared toward tables inside PDFs, and Tesseract OCR assumes some coding know-how.

Links to Tutorial Content

From the Developer

Download and Install

1. Download the version of Tabula for your operating system:

  • Windows: tabula-win.zip (PGP sig)
  • Mac OS X: tabula-mac.zip (PGP sig)
  • Linux/Other: tabula-jar.zip (PGP sig), view README.txt inside for instructions

2. Extract the zip file. (Instructions: Windows, Mac)

3. Go into the folder you just extracted. Run the "Tabula" program inside.

4. A web browser will open. If it doesn't, open your web browser, and go to http://localhost:8080. There's Tabula!
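
If the browser does not open on its own, a short script can confirm whether anything is answering at the local address before you troubleshoot further. The following is a minimal Python sketch, not part of Tabula itself: the function name `local_server_responds` is hypothetical, and the only detail taken from the steps above is the address http://localhost:8080. It can tell you that some web server is responding there, but not that the server is Tabula.

```python
import urllib.error
import urllib.request


def local_server_responds(url="http://localhost:8080", timeout=2.0):
    """Return True if an HTTP server answers successfully at the given URL.

    Hypothetical helper for illustration. It cannot distinguish Tabula
    from any other program that happens to be listening on the same port.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Any 2xx or 3xx status means a server is up and answering.
            return 200 <= response.status < 400
    except urllib.error.HTTPError:
        # A server answered, but with an error status (4xx or 5xx).
        return False
    except (urllib.error.URLError, OSError):
        # Nothing is listening, or the connection timed out.
        return False
```

Calling `local_server_responds()` with no arguments checks the default address from step 4. If it returns False, the Tabula program is probably not running yet, or it is listening on a different port.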

Use

1. Upload a PDF file containing a data table.

2. Browse to the page you want, then select the table by clicking and dragging to draw a box around the table.

3. Click "Preview & Export Extracted Data". Tabula will try to extract the data and display a preview. Inspect the data to make sure it looks correct. If data is missing, you can go back to adjust your selection.

4. Click the "Export" button.

5. Now you can work with your data as a text file or a spreadsheet rather than a PDF! (You can open the downloaded file in Microsoft Excel or the free LibreOffice Calc.)
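
Once the data is exported as CSV, it can also be loaded in a script instead of a spreadsheet program. The sketch below uses only Python's standard `csv` module. The function names are hypothetical, and it assumes the exported file is UTF-8 encoded. The second function is one possible way to automate the "inspect the data" advice from step 3: it flags rows whose column count differs from the most common one, which often points to a selection box that clipped part of the table.

```python
import csv
from collections import Counter


def load_exported_table(path):
    """Read a CSV file (for example, one exported from Tabula) into a list of rows.

    Assumes UTF-8 encoding; adjust the encoding if your file differs.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.reader(handle)]


def rows_with_unexpected_width(rows):
    """Return (index, row) pairs whose column count differs from the most common count."""
    if not rows:
        return []
    # Treat the most frequent column count as the expected table width.
    expected = Counter(len(row) for row in rows).most_common(1)[0][0]
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]
```

For example, `rows_with_unexpected_width(load_exported_table("my-export.csv"))` returns an empty list when every row has the same number of columns. The file name here is a placeholder for whatever you saved in step 4.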

Other Tutorials

Projects that use this tool