Data Acquisition

The document describes various methods for loading, cleaning, filtering, transforming, programmatically generating, and visualizing data in .NET. Some key methods include:
- Loading data from CSV, fixed length, ARFF, HTML tables, and other file formats.
- Cleaning data by removing outliers, duplicate rows, values between/not between given values, and values matching/not matching regular expressions.
- Filtering and slicing data by filtering columns by values or regex, running SQL queries, sorting, modifying column names, and more.
- Transforming data by splitting on column values, merging tables, rounding values, and more.
- Programmatically generating data by adding rows with formulas and

Uploaded by

api-286344277

Data Acquisition

LoadCSV            Loads a CSV file
LoadFixedLength    Loads a file with fixed-length fields
LoadARFF           Loads an ARFF file (a file format supported by WEKA)
LoadHTMLTable      Loads an HTML table
LoadDataTable      Loads an ADO.NET DataTable
LoadFlatFile       Loads a flat file given a list of delimiters
LoadTSV            Loads tab (\t) separated values
PrettyDump         Dumps the data beautifully to the Console
ToTable            Generates a table from a list of tuples
ToHTMLTable        Generates an HTML table from the internal table
ToDataTable        Generates an ADO.NET DataTable
ToARFF             Generates ARFF notation from the table
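
As a rough illustration of what loading a CSV file into a table involves, here is a minimal sketch. It does not use the library's API: `CsvSketch`, `Parse`, and the row-as-dictionary model are hypothetical names chosen for this example, and the parser assumes a header line, no quoted fields, and no embedded commas.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not the library's API: a table is modelled as a
// list of rows, each row mapping column name -> raw string value.
public static class CsvSketch
{
    // Parses CSV text whose first line is the header.
    // Assumption: no quoted fields and no embedded commas.
    public static List<Dictionary<string, string>> Parse(string csvText)
    {
        var lines = csvText.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        var rows = new List<Dictionary<string, string>>();
        if (lines.Length == 0) return rows;

        var headers = lines[0].Split(',').Select(h => h.Trim()).ToArray();
        foreach (var line in lines.Skip(1))
        {
            var cells = line.Split(',');
            var row = new Dictionary<string, string>();
            for (int i = 0; i < headers.Length; i++)
            {
                // Missing trailing cells become empty strings.
                row[headers[i]] = i < cells.Length ? cells[i].Trim() : "";
            }
            rows.Add(row);
        }
        return rows;
    }
}
```

Calling `CsvSketch.Parse(System.IO.File.ReadAllText(path))` would give one dictionary per data line. A production loader would also need to handle quoting and escaping, which this sketch leaves out.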

Data Cleansing
ExtractOutliers              Extracts the outliers for the given column
RemoveOutliers               Removes outliers in-place
Distinct                     Removes duplicate rows by calculating a hash key
RemoveIfBetween              Removes if values are between the given values
RemoveIfNotBetween           Removes if values are not between the given values
RemoveMatches                Removes if values match a given regular expression
RemoveNonMatches             Removes if values don't match a given regular expression
RemoveIfBefore               Removes if the dates are before the given date
RemoveIfAfter                Removes if the dates are after the given date
RemoveIfBetween              Removes if the dates are between the given dates
RemoveIfNotAnyOf             Removes if values are not any of the given values
RemoveIfAnyOf                Removes if values are any of the given values
RemoveLessThan               Removes if values are less than the given value
RemoveLessThanOrEqualTo      Removes if values are less than or equal to the given value
RemoveGreaterThan            Removes if values are greater than the given value
RemoveGreaterThanOrEqualTo   Removes if values are greater than or equal to the given value
RemoveIf<T>                  Removes if the given predicate for T matches
RemoveIfNot<T>               Removes if the given predicate for T doesn't match
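
The list above does not say how outliers are detected. The sketch below therefore assumes a common rule, Tukey's fences (values more than 1.5 times the interquartile range outside the quartiles), with linearly interpolated quartiles. `OutlierSketch` and its members are hypothetical names for illustration and are not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: outlier detection with Tukey's fences (1.5 * IQR).
// The detection rule is an assumption; the source does not specify one.
public static class OutlierSketch
{
    // Linear-interpolation percentile over an already sorted list (p in [0, 1]).
    private static double Percentile(IReadOnlyList<double> sorted, double p)
    {
        double pos = p * (sorted.Count - 1);
        int lo = (int)Math.Floor(pos);
        int hi = (int)Math.Ceiling(pos);
        return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
    }

    // Returns the values lying outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    public static List<double> ExtractOutliers(IEnumerable<double> values)
    {
        var data = values.ToList();
        if (data.Count < 4) return new List<double>(); // too few points to judge

        var sorted = data.OrderBy(v => v).ToList();
        double q1 = Percentile(sorted, 0.25);
        double q3 = Percentile(sorted, 0.75);
        double iqr = q3 - q1;
        double low = q1 - 1.5 * iqr;
        double high = q3 + 1.5 * iqr;
        return data.Where(v => v < low || v > high).ToList();
    }

    // Removes the outliers from the list in place.
    public static void RemoveOutliers(List<double> values)
    {
        var outliers = new HashSet<double>(ExtractOutliers(values));
        values.RemoveAll(v => outliers.Contains(v));
    }
}
```

The predicate-based entries (`RemoveIf<T>`, `RemoveIfNot<T>`) follow the same in-place pattern as `List<T>.RemoveAll`, with the caller supplying the condition.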

Filtering and Slicing/Dicing


Filter                       Filters out the given column for the given values
FilterByRegex                Filters out the given column by the given regex
RunSQLQuery                  Runs a standard SQL query on the table
SortBy                       Sorts the table by the given column
SortInThisOrder              Sorts the table as per a given list for a given column
ModifyColumnName             Changes a column name
ValuesOf                     Returns all values of the given column
ValuesOf<T>                  Returns all values of the given column, cast to T
AddRow                       Adds a new row given the row as a dictionary
ExtractAndAddAsColumn        Extracts elements and adds them in a different column, given the regex and column name
TranformCurrencyToNumeric    Identifies and removes currency symbols to make values numeric so that sorting works as expected
AddColumn                    Adds a new column given the values and column name
CumulativeFold               Folds a column cumulatively given a scheme
CumulativeSum                Folds a column cumulatively to generate a running sum
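
To make the relationship between the last two entries concrete, here is a small sketch of a cumulative fold and of a running sum built on top of it. It assumes the "scheme" is a two-argument function combining the running result with the next value; `CumulativeSketch` and both signatures are hypothetical and not taken from the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a cumulative fold (scan) over one numeric column.
// Assumption: the "scheme" is a function (runningResult, nextValue) -> newResult.
public static class CumulativeSketch
{
    // Emits one running result per input value; the first value seeds the fold.
    public static List<decimal> CumulativeFold(
        IEnumerable<decimal> column, Func<decimal, decimal, decimal> scheme)
    {
        var result = new List<decimal>();
        bool first = true;
        decimal acc = 0m;
        foreach (var value in column)
        {
            acc = first ? value : scheme(acc, value);
            first = false;
            result.Add(acc);
        }
        return result;
    }

    // A running sum is the cumulative fold with addition as the scheme.
    public static List<decimal> CumulativeSum(IEnumerable<decimal> column) =>
        CumulativeFold(column, (a, b) => a + b);
}
```

For the column 1, 2, 3, 4 the running sum is 1, 3, 6, 10. Passing `(a, b) => Math.Max(a, b)` as the scheme would give a running maximum instead.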

Filtering and Slicing/Dicing


Transform          Transforms values of a column by a rule
Histogram          Generates a histogram for a given column
SplitOn            Splits the table based on values of the given column
MergeByColumns     Merges two tables based on the column that has common values automagically
Merge              Merges two tables and keeps the duplicate rows by default
Exclusive          Finds rows that are exclusively available in one table but not in the second one
Common             Finds common rows from two tables
IsSubset           Checks whether a table is a subset of another table or not
MergeColumns       Merges multiple columns into a single column, given the scheme of merge and the new column name
Drop               Drops the mentioned columns
Pick               Picks only the mentioned columns in the given order
RandomSample       Generates a new table with a random sample
Top                Takes top N rows
Bottom             Takes bottom N rows
TopNPercent        Takes top N % rows
BottomNPercent     Takes bottom N % rows
Middle             Takes N rows from the middle
SplitByRows        Generates multiple tables by splitting all the rows as per the given row count per table
SplitByColumns     Generates multiple tables by taking the specified number of columns for each table
Shuffle            Randomly shuffles the table in-place
RoundOffTo         Rounds off each numeric column to the given digit count. Has an overload that allows a specific precision for each column.
Aggregate          Flattens a table by a scheme applied horizontally to all numeric columns of each row
AggregateColumns   Flattens a table by a scheme applied vertically to all the numeric columns
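
The row-slicing entries can be pictured with plain LINQ. The sketch below covers two of them under stated assumptions: a percentage that does not yield a whole number of rows is rounded down, and the last chunk of a split may be shorter than the rest. `SliceSketch` and its method signatures are hypothetical and only mirror the names in the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of row slicing; rows can be any type T.
public static class SliceSketch
{
    // Takes the first N percent of the rows.
    // Assumption: fractional row counts are rounded down.
    public static List<T> TopNPercent<T>(IReadOnlyList<T> rows, double percent)
    {
        double clamped = Math.Max(0, Math.Min(100, percent));
        int count = (int)Math.Floor(rows.Count * clamped / 100.0);
        return rows.Take(count).ToList();
    }

    // Splits the rows into consecutive chunks of at most rowsPerTable rows.
    public static List<List<T>> SplitByRows<T>(IReadOnlyList<T> rows, int rowsPerTable)
    {
        if (rowsPerTable <= 0)
            throw new ArgumentOutOfRangeException(nameof(rowsPerTable));

        var tables = new List<List<T>>();
        for (int start = 0; start < rows.Count; start += rowsPerTable)
        {
            tables.Add(rows.Skip(start).Take(rowsPerTable).ToList());
        }
        return tables;
    }
}
```

With ten rows, `TopNPercent(rows, 30)` keeps the first three, and `SplitByRows(rows, 4)` yields chunks of four, four, and two rows.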

Programmatic Data Generation

AddRows              Adds new rows given a formula as a string and a precision for decimal digits
AddRowsByShortHand   Adds new rows by shorthand. Internally calls AddRows after expanding the
                     shorthand notation. Supports programming shorthand notations like
                     +=, -=, *=, /=, ++ and --
AddColumn            Adds a new column given a formula and a precision for decimal digits
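
The expansion step mentioned for AddRowsByShortHand can be sketched as a small text rewrite. The exact formula syntax the library expects is not given here, so the expanded form (`Name = Name op (value)`), the restriction to single-word column names, and the names `ShorthandSketch` and `Expand` are all assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch: expands programming-style shorthand into a full formula.
// Assumptions: column names are single words; the target form is "X = X op (value)".
public static class ShorthandSketch
{
    // Matches compound assignments such as "Total += 5".
    private static readonly Regex Compound =
        new Regex(@"^\s*(\w+)\s*([+\-*/])=\s*(.+?)\s*$");

    // Matches increments and decrements such as "Count++" or "Count--".
    private static readonly Regex Step =
        new Regex(@"^\s*(\w+)\s*(\+\+|--)\s*$");

    public static string Expand(string shorthand)
    {
        var step = Step.Match(shorthand);
        if (step.Success)
        {
            string name = step.Groups[1].Value;
            string op = step.Groups[2].Value == "++" ? "+" : "-";
            return $"{name} = {name} {op} 1";
        }

        var compound = Compound.Match(shorthand);
        if (compound.Success)
        {
            string name = compound.Groups[1].Value;
            string op = compound.Groups[2].Value;
            // Parentheses keep the right-hand side grouped, e.g. for "X *= A + B".
            return $"{name} = {name} {op} ({compound.Groups[3].Value})";
        }

        // Anything else is passed through unchanged.
        return shorthand.Trim();
    }
}
```

Under these assumptions, `Expand("Total += 5")` returns `Total = Total + (5)` and `Expand("Count++")` returns `Count = Count + 1`.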

Data Visualization Adapters


ToBasicBootstrapHTMLTable             Generates a basic bootstrap table. The default bootstrap
                                      table class is striped.
ToBootstrapHTMLTableWithColoredRows   Generates a bootstrap table with colored rows. You can
                                      specify a predicate for each different class of rows:
                                      info, error, danger and success.
ToPieByGoogleDataVisualization        Generates a pie chart from the table using the Google data
                                      visualization API and the given column. By changing an
                                      enum, this method can be used to generate Pie, Donut and
                                      3D Pie charts.
ToBarChartByGoogleDataVisualization   Generates a bar chart using the Google visualization API
                                      and a given column.
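
As an illustration of the two bootstrap adapters, the sketch below renders rows as an HTML table with the `table table-striped` classes and lets the caller pick a row class per row. It is not the library's code: `BootstrapTableSketch`, `ToHtml`, and the single `rowClass` callback (standing in for the per-class predicates) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical sketch: renders a striped Bootstrap table with optional row classes.
public static class BootstrapTableSketch
{
    // rowClass returns a CSS class such as "danger" or "success" for a row,
    // or an empty string for no class. Cell text is HTML-encoded; the class
    // name is trusted and written as-is.
    public static string ToHtml(
        IReadOnlyList<string> headers,
        IEnumerable<IReadOnlyList<string>> rows,
        Func<IReadOnlyList<string>, string> rowClass)
    {
        var sb = new StringBuilder();
        sb.Append("<table class=\"table table-striped\">");
        sb.Append("<thead><tr>");
        foreach (var header in headers)
        {
            sb.Append("<th>").Append(WebUtility.HtmlEncode(header)).Append("</th>");
        }
        sb.Append("</tr></thead><tbody>");

        foreach (var row in rows)
        {
            string css = rowClass(row);
            sb.Append(string.IsNullOrEmpty(css) ? "<tr>" : "<tr class=\"" + css + "\">");
            foreach (var cell in row)
            {
                sb.Append("<td>").Append(WebUtility.HtmlEncode(cell)).Append("</td>");
            }
            sb.Append("</tr>");
        }

        sb.Append("</tbody></table>");
        return sb.ToString();
    }
}
```

Passing `row => ""` gives the basic striped table, while a callback such as `row => int.Parse(row[1]) < 40 ? "danger" : ""` colors only the rows that meet the condition.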
