Data Extraction Full Guideline

Uploaded by

davidvelela408
Copyright
© All Rights Reserved

Data Extraction for Micro Answers

HitApp Description

When a user searches for something in a search engine (e.g., Bing), the results can take
different forms (e.g., links, text passages, maps, etc.).
One of these forms is called a Micro-Answer: a block of structured information about the
topic that contains the important highlights the user is looking for.

Example of Bing results: without Micro-Answer and with Micro-Answer

The Micro-Answer consists of Attributes and Values.


An Attribute is the name of a quality or characteristic that belongs to an entity. Other entities of
the same type will have similar Attributes, though the Values of those Attributes can be different for
each entity.

For example, consider a person as an entity. A person has Attributes such as “Name”, “Date of
birth”, “Height”, “Age”, etc. These Attributes have Values differing for each entity. The Attribute
“Age” could have a Value of “32” for one person and “12” for another.
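
The Attribute/Value relationship described above can be pictured as a simple mapping. The following is a minimal sketch only; the `Entity` class and the names "Person A" and "Person B" are hypothetical illustrations and are not part of the HitApp.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Entity:
    """One entity: an entity type plus a mapping from Attribute names to Values."""
    entity_type: str
    attributes: Dict[str, str] = field(default_factory=dict)


# Two hypothetical entities of the same type ("person").
person_a = Entity("person", {"Name": "Person A", "Age": "32"})
person_b = Entity("person", {"Name": "Person B", "Age": "12"})

# Entities of the same type share Attribute names but differ in Values.
shared_attributes = sorted(person_a.attributes.keys() & person_b.attributes.keys())
print(shared_attributes)
print(person_a.attributes["Age"], person_b.attributes["Age"])
```

Both entities expose the same Attribute names ("Age", "Name"), while the Value of "Age" differs between them.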

Your task is to find Values for the provided Attributes on a webpage to create Micro-Answers.

Task Process

1. Familiarize yourself with the webpage on the right: understand the main topic of the page
and review the contents to get a sense of the layout.
2. Go through the given list of attributes. Using your understanding of the topic and how the
contents are laid out on the web page, try to find values that are applicable to the provided
attributes.

Note: sometimes an attribute is not worded on the webpage exactly as it is in the list.
Use logical reasoning to find the values that apply to such attributes.

3. Capture a value for an attribute by first selecting its cell in the table and then clicking on the
value on the page. The value will be automatically added to the table and become highlighted on
the webpage.

Note: if you don’t see the value on the page, click the ‘NP’ (Not Present) button.

4. Once the selected label is annotated correctly, hit ‘Enter’ on your keyboard to move on to
the next label in the list.

Note: if you work with a table of several similar entities, you will move on to the
next attribute in the row.

5. Annotate all the given attributes. Double-check that all the needed values are captured, then
click ‘Submit’.

Note: if you have several similar entities with the same attributes on a web page,
you must annotate all the entities.
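
The steps above can be summarized as a small sketch: every attribute in the list ends up with either a captured value or a "Not Present" mark before submission. This is an illustration under stated assumptions, not the HitApp's implementation; `annotate`, `ready_to_submit`, and the `found_on_page` mapping are hypothetical names.

```python
NOT_PRESENT = "NP"  # stand-in for pressing the 'NP' (Not Present) button


def annotate(attributes, found_on_page):
    """Return one answer per attribute: the value found on the page, or NP.

    `found_on_page` is a hypothetical mapping of attribute name -> value that
    the annotator located on the webpage.
    """
    answers = {}
    for attribute in attributes:
        answers[attribute] = found_on_page.get(attribute, NOT_PRESENT)
    return answers


def ready_to_submit(attributes, answers):
    """Double-check step: every given attribute must have an answer."""
    return all(attribute in answers for attribute in attributes)


attributes = ["Name", "Date of birth", "Height"]
answers = annotate(attributes, {"Name": "Person A", "Height": "180 cm"})
print(answers)
print(ready_to_submit(attributes, answers))
```

An attribute that cannot be found on the page is marked NP rather than left empty.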
Interface explanation and Examples

Your screen is divided into two sides. You are given a web page on the right side and a list
of attributes on the left side.

The left side contains a list of general attributes, such as the Name of the specific topic of the
page, a Description (summary, abstract, overview, etc.) of the topic, an Image of the entity related
to the topic (e.g., a photo of an actor, product, or animal), etc.

Note: Use the scrollbar to scroll the tables on the left side; they may not fit
on the screen.

Sometimes a webpage contains several similar entities with the same attributes but different
values. In that case you have a special table of entities to fill in on the left side. Attributes are
arranged horizontally inside the table. Each column is an attribute. Each row is an entity with
its own values.
Such entities can be presented in different ways on the right side. Here are some
examples:
Example 1: A table

A table of Registration History contains several rows (entities). Each row is a
record in the registration history. Each record has Date, Owner, and Location attributes.

In this case the filled table of attributes would look this way:

Note: if you run out of empty rows, press the “Add Row” button.
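
The table layout described above (one column per attribute, one row per entity, with "Add Row" used when the empty rows run out) can be sketched as follows. This is only an illustration; the function names are hypothetical and the cell values are made-up placeholders, not data from any real page.

```python
COLUMNS = ["Date", "Owner", "Location"]  # attributes from Example 1


def new_table(empty_rows):
    """Create a table with the given number of empty rows (None = empty cell)."""
    return [dict.fromkeys(COLUMNS) for _ in range(empty_rows)]


def add_row(table):
    """Stand-in for the 'Add Row' button: append one empty row."""
    table.append(dict.fromkeys(COLUMNS))


def record_entity(table, values):
    """Write one entity into the first empty row, adding a row if none is left."""
    for row in table:
        if all(cell is None for cell in row.values()):
            row.update(values)
            return
    add_row(table)
    table[-1].update(values)


table = new_table(empty_rows=1)
# Placeholder values only; a real task takes them from the webpage.
record_entity(table, {"Date": "<date 1>", "Owner": "<owner 1>", "Location": "<location 1>"})
record_entity(table, {"Date": "<date 2>", "Owner": "<owner 2>", "Location": "<location 2>"})
print(len(table))
```

The second entity does not fit into the single empty row, so a new row is added for it.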

Example 2: A grid of objects

A booking webpage shows a grid of rentals. Every rental has a Picture, a Rating, a
Description, a Price per night, etc.
Example 3: A list of objects

A webpage presents a list of university professors. Each professor has a Name, Title, Faculty,
Count of Students, and Count of Courses.

In this case the filled table of attributes would look this way:

Note: In examples 2 and 3 the attributes are not worded on the webpage exactly
as they are in the table. Use logical reasoning to find the values that apply to
such attributes.

Note: If there is a table of similar entities on the left side, but there are no
related objects on the right side, press the “Entire Table Is Missing” button to
remove the entire table.

In short, the left side of the screen may contain not only the general attributes section
but also several tables for similar entities with the same attributes.
You should find all the entities on the webpage and label them.
Important Notes

1. Sometimes attributes do not exist on a webpage.

Sometimes the table on the left side lists an attribute that has no corresponding value on the
webpage on the right side. In this case, press the “Not Present” (NP) button to mark the
attribute as not present.
2. Select all data that is applicable to a single value.
Sometimes a bounding box captures only part of the value. Use the “take selection” tool to
capture all the needed data.
E.g., “Address” is a single-value attribute and should not be divided.

Note: You cannot divide one box to make multiple values.

3. Capture images the same way as you do with text values.

You can capture an image the same way as you capture text. You can select it as a single value,
or you can use the “take selection” tool.

4. Press “Ctrl” + left mouse button to select a list of values.
Select the cell of the value in the table, then press “Ctrl” and click each of the values with the left
mouse button to select them all at once.
5. Press the “Auto” button to automatically select a list of values.
Select the cell of the value in the table, then capture the first of the multiple values using the left mouse
button and press the “Auto” button.

Example:

1. Select the cell of the value in the table.
2. Capture the first value.
3. Press the button.

Be careful using this tool! Always double-check the table. Sometimes it can capture the wrong
data!

6. Clean all unnecessary punctuation.

Delete all extra punctuation in added values (colons, dashes, etc.) by clicking the E (Edit) button.

Example:
7. Fill in a table of similar entities automatically.
Sometimes a webpage contains several similar entities with the same attributes but different
values. You can save a lot of time labeling the table by using the “Fill Automatically” button.

In that case the table is highlighted green. Follow these 4 steps:

1. Fill in the first row by labeling a relevant entity. The row is highlighted green.
2. Fill in one more cell in the second row. It is also highlighted green.
3. Press the “Fill Automatically” button.
4. Double-check the labeling. Correct values if necessary.

1. Fill in the first row.
2. Fill in one more cell from the second row.
3. Press the button.

Note: The first row does not have to be filled in with values from the first
object; corresponding values from any one of the objects will do.

Note: You can also skip one or several columns in the first row. In that
case, only the columns you filled in in the first row will be filled in automatically.

A skipped column
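
The skipped-column behavior described above can be illustrated with a toy model. This is not the tool's actual algorithm, which the guideline does not describe; `fill_automatically`, its parameters, and the sample values are hypothetical. The sketch assumes the similar objects on the page are already available as a list of mappings.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]  # None marks an empty (skipped) cell


def fill_automatically(first_row: Row, second_row: Row,
                       page_objects: List[Dict[str, str]]) -> List[Row]:
    """Toy model: fill every row, but only in columns filled in the first row."""
    filled_columns = [name for name, value in first_row.items() if value is not None]
    if not filled_columns:
        raise ValueError("fill in the first row first")
    if all(value is None for value in second_row.values()):
        raise ValueError("fill in one more cell in the second row first")
    return [
        {name: (obj.get(name) if name in filled_columns else None) for name in first_row}
        for obj in page_objects
    ]


# Made-up objects standing in for similar entities on a page.
objects = [
    {"Name": "Professor A", "Title": "Title A"},
    {"Name": "Professor B", "Title": "Title B"},
]
first_row = {"Name": "Professor A", "Title": None}   # "Title" column skipped
second_row = {"Name": "Professor B", "Title": None}  # one more cell filled
rows = fill_automatically(first_row, second_row, objects)
print(rows)
```

Because "Title" was skipped in the first row, it stays empty in every automatically filled row, which is why the result should always be double-checked.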
HitApp’s Tools

1. Buttons description

Button | Description
NP | “Not Present” button. Click it when you cannot find an attribute on the web page.
E | “Edit” button. Edit the text you selected on the web page. Note: you cannot add any extra characters to the text you select; you can only remove characters.
C | “Clear” button. Removes the whole text at once.
N | “Next” button. Moves the highlight to the cell adjacent to the currently highlighted cell on the answer pane.
+ | “Add new cell” button. Adds one more value for an attribute.
- | “Remove Last Cell” button. Removes the last value of an attribute.
-- | “Clear all value cells” button. Removes all values of an attribute at once.
Auto | “Auto” button. Click it to fill in a list of values automatically. Note: You must fill in the first row beforehand!
Add Row | “Add Row” button. Click it when you are out of empty rows in a table of similar objects.
Delete Selected Row | “Delete Selected Row” button. Select a row in a table of similar objects and click the button when you want to delete the row.
Clear All Rows | “Clear All Rows” button. Click it when you want to clear all the values of a table of similar objects.
Fill automatically | “Fill automatically” button. Click it when you want to fill in a table of similar objects automatically. The table is highlighted green. Note: You must fill in the first row and one more cell from the second row beforehand!
Mark missing cells as 'Not Present' | “Mark missing cells as 'Not Present'” button. Fills the whole table it is applied to with “Not Present” values. Use it if there are no related objects on the right side.
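
The rule for the E (Edit) button, that characters can be removed from a selected value but never added, can be sketched as a check. This is a hypothetical illustration, not the HitApp's code; `is_removal_only`, `clean_value`, and the sample strings are assumptions made for the example.

```python
# Punctuation treated as "extra" in this sketch: colons, semicolons, commas,
# hyphens, and the en dash. Adjust the set to the task at hand.
EXTRA_CHARACTERS = ":;,-\u2013"


def is_removal_only(original: str, edited: str) -> bool:
    """True if `edited` can be obtained from `original` by deleting characters only."""
    remaining = iter(original)
    # Each edited character must appear in the original, in the same order.
    return all(char in remaining for char in edited)


def clean_value(value: str) -> str:
    """Strip extra punctuation and whitespace from both ends of a value."""
    return value.strip(EXTRA_CHARACTERS + " \t\n")


raw = ": 1200 -"  # made-up captured value with stray punctuation
cleaned = clean_value(raw)
print(cleaned)
print(is_removal_only(raw, cleaned))
print(is_removal_only(raw, cleaned + " USD"))
```

Cleaning only strips characters from the ends of the value, so the result always passes the removal-only check, while an edit that appends text does not.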
2. “Take selection” tool

Instead of clicking a text box, you can drag your mouse to select text and then click “take
selection”. This helps when you need to extract data that cannot be selected as a single
cell.

3. “Remove popup” tool

Sometimes a popup appears on a web page, like this:

In this case you can remove the popup. Check the “Enable remove popup” checkbox and press the
“Remove Popup” button.

If you want to get it back, press the “Recover” button.
