Data Extraction Full Guideline
HitApp Description
When users search for something in a search engine (e.g., Bing), they can get results in
different forms (e.g., links, text passages, maps, etc.).
One of these forms is the Micro-Answer: a block of structured information about the requested
topic that contains the key facts the user is looking for.
For example, consider a person as an entity. A person has Attributes such as “Name”, “Date of
birth”, “Height”, “Age”, etc. These Attributes have Values that differ from entity to entity. The
Attribute “Age” could have a Value of “32” for one person and “12” for another.
Your task is to find Values for the provided Attributes on a webpage to create Micro-Answers.
Task Process
1. Familiarize yourself with the webpage on the right: understand the main topic of the page
and review the contents to get a sense of the layout.
2. Go through the given list of attributes. Using your understanding of the topic and of how the
contents are laid out on the webpage, find the values that match the provided attributes.
Note: sometimes an attribute is not worded on the webpage exactly as it is in the list. Use
your judgment to find values that correspond to such attributes.
3. Capture a value for an attribute by first selecting its cell in the table and then clicking the
value on the page. The value is automatically added to the table and highlighted on the
webpage.
Note: if you don’t see the value on the page, click the ‘NP’ (Not Present) button.
4. Once the selected attribute is annotated correctly, press ‘Enter’ on your keyboard to move on
to the next attribute in the list.
Note: if you are working with a table of several similar entities, ‘Enter’ moves you to the
next attribute in the same row.
5. Annotate all the given attributes. Double-check that all the needed values are captured, then
click ‘Submit’.
Note: if a webpage contains several similar entities with the same attributes, you must
annotate all of them.
Interface Explanation and Examples
Your screen is divided into two sides: a webpage on the right side and a list of attributes on
the left side.
The left side contains a list of general attributes, such as Name (the specific topic of the page),
Description (a summary, abstract, overview, etc. of the topic), and Image (a picture of the entity
related to the topic, e.g., a photo of an actor, product, or animal).
Note: the tables on the left side may not fit on the screen. Use the scrollbar to scroll
through them.
Sometimes a webpage contains several similar entities with the same attributes but different
values. In that case, the left side has a special table of entities to fill in. Attributes are
arranged horizontally inside the table: each column is an attribute, and each row is an entity
with its own values.
Such entities can be presented in different ways on the right side. Here are some examples:
Example 1: A table
A Registration History table contains several rows (entities). Each row is a record in the
registration history, and each record has Date, Owner, and Location attributes.
In this case, the filled table of attributes would look like this:
Note: if you run out of empty rows, press the “Add Row” button.
In this case, the filled table of attributes would look like this:
Note: if there is a table of similar entities on the left side but no related objects
on the right side, press the “Entire Table Is Missing” button to remove the entire
table.
In short, the left side of the screen may contain not only the general attributes section but also
several tables of similar entities with the same attributes.
Find all such entities on the webpage and label them.
Important Notes
3. Use “Ctrl” + left-click to select several values at once.
Select the value's cell in the table, then hold “Ctrl” and click each value with the left mouse
button to capture them all at once.
4. Use the “Auto” button to select a list of values automatically.
Select the value's cell in the table, capture the first of the values with the left mouse button,
then press the “Auto” button.
Example:
Be careful when using this tool! It can sometimes capture the wrong data, so always double-check
the table!
Example:
6. Fill in a table of similar entities automatically.
Sometimes a webpage contains several similar entities with the same attributes but different
values. You can save a lot of labeling time by using the “Fill Automatically” button.
Note: the first row does not have to be filled in with values from the first object; it
can be filled in with the corresponding values from any of the objects.
Note: you can also skip one or more columns in the first row. In that case, only the
columns you filled in in the first row will be filled in automatically.
A skipped column
HitApp’s Tools
1. Button Descriptions
NP: “Not Present” button. Click it when you cannot find an attribute on the webpage.
E: “Edit” button. Edit the text you selected on the webpage.
Note: you cannot add any extra characters to the selected text; you can only remove
characters.
C: “Clear” button. Removes the whole text at once.
N: “Next” button. Moves the highlight from the current cell to the adjacent cell in the answer
pane.
+: “Add new cell” button. Adds one more value for an attribute.
-: “Remove Last Cell” button. Removes the last value of an attribute.
--: “Clear all value cells” button. Removes all values of an attribute at once.
Auto: “Auto” button. Click it to fill in a list of values automatically.
Note: you must fill in the first row beforehand!
Add Row: “Add Row” button. Click it when you run out of empty rows in a table of similar
objects.
Delete Selected Row: “Delete Selected Row” button. Select a row in a table of similar objects
and click this button to delete the row.
Clear All Rows: “Clear All Rows” button. Click it to clear all the values of a table of similar
objects.
Fill automatically: “Fill automatically” button. Click it to fill in a table of similar objects
automatically. The table is highlighted green.
Note: you must fill in the first row and one more cell from the second row beforehand!
Mark missing cells as 'Not Present': fills the whole table it is applied to with “Not Present”
values. Use it if there are no related objects on the right side.
2. “Take selection” tool
Instead of clicking a text box, you can drag your mouse to select text and then click “Take
selection”. This helps when you need to extract data that cannot be selected as a single
cell.
You can also remove popups from the webpage: check the “Enable remove popup” checkbox and
press the “Remove Popup” button.