About
Retired - Data Architect experienced in all things DATA especially DevOps, GitLab…
Articles by Thad
Contributions
-
What is the best way to identify irrelevant data for specific algorithms?
Hidden characters (non-printable characters, for example the ASCII control character range 0-31) are sometimes found in seemingly clean data. It's important to perform a cursory analysis of whether hidden control characters might have been used in a meaningful way in the source data. Control characters can serve as hidden separators that partition key-value pairs, delimit record rows, or carry hidden structure in long text. I have seen hidden char(30) record separators used to separate records in a CSV file where there was no visible separation between rows. Pay careful attention to how these control characters might have been used in the source data; doing so can help unlock and use hidden structure.
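A minimal Python sketch of that cursory analysis, assuming the data has already been decoded into a `str`. The helper names (`control_char_profile`, `suspicious_controls`, `split_records`) are hypothetical, and treating tab, LF, and CR as ordinary whitespace is an assumption rather than a rule.

```python
import collections

# Assumption: tab (9), line feed (10) and carriage return (13) are control
# characters too, but they are routinely used as ordinary whitespace, so
# they are excluded from the "suspicious" report below.
COMMON_WHITESPACE = {9, 10, 13}


def control_char_profile(text):
    """Count ASCII control characters (code points 0-31) by code point."""
    return collections.Counter(ord(ch) for ch in text if ord(ch) < 32)


def suspicious_controls(text):
    """Return counts of control characters other than tab, LF and CR."""
    return {code: n for code, n in control_char_profile(text).items()
            if code not in COMMON_WHITESPACE}


def split_records(text, separator=chr(30)):
    """Split on a hidden separator (default char(30), the ASCII record
    separator), dropping empty pieces such as a trailing separator."""
    return [record for record in text.split(separator) if record]


# Rows that look run together when viewed, but are delimited by char(30).
sample = "id,name\x1e1,alice\x1e2,bob\x1e"
print(suspicious_controls(sample))  # {30: 3}
print(split_records(sample))        # ['id,name', '1,alice', '2,bob']
```

Profiling first and splitting second keeps the step cheap: a non-empty `suspicious_controls` result is the cue to look more closely at how the source system used those characters before choosing a separator.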
Activity
-
What a great year with #OpenRefine! I am very happy to have met so many contributors in person at the CZI Open Science conference in Boston…
Liked by Thad Guidry
-
I'm currently working on improving my #OpenRefine skills to assist me in improving structured data attached to #BiodiversityHeritageLibrary images in…
Liked by Thad Guidry
-
Thanks to the DMG MORI team for getting this install done so quick after delivery so we can get up and running! #800mmPalletsOnDeck…
Liked by Thad Guidry
Experience
Volunteer experience
-
Contributor
schema.org
- Present, 13 years 9 months
Science and Technology
Contributing towards better modeling of Types and Properties for structured data on the web.
Publications
-
Review of new parseHtml() Function in Google Refine
Personal Blog
...Using jsoup's simple selector syntax, I was able to easily parse out company websites from LinkedIn's public pages. The example below says select the div called data-table that contains the term Website and return the 2nd <a href> htmlText. In Refine, the ordering starts at [0], so in this case [1] gives the 2nd href link....
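The excerpt describes an OpenRefine GREL expression built on jsoup selectors; that expression is not reproduced here. As a rough analogue only, this Python sketch swaps in the standard library's `html.parser` for jsoup and assumes that "the div called data-table" means `<div class="data-table">`. The class and function names (`DataTableParser`, `nth_link_text`) and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class DataTableParser(HTMLParser):
    """Collect the text and links of every <div class="data-table"> block."""

    def __init__(self):
        super().__init__()
        self.tables = []       # one {"text": str, "links": [...]} per data-table div
        self._depth = 0        # <div> nesting depth inside the current data-table
        self._in_link = False  # True while inside an <a> within a data-table

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._depth:
                self._depth += 1  # nested div inside a data-table
            elif "data-table" in (attrs.get("class") or "").split():
                self._depth = 1
                self._in_link = False
                self.tables.append({"text": "", "links": []})
        elif tag == "a" and self._depth:
            self.tables[-1]["links"].append({"href": attrs.get("href"), "text": ""})
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
        elif tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._depth:
            self.tables[-1]["text"] += data
            if self._in_link:
                self.tables[-1]["links"][-1]["text"] += data


def nth_link_text(html, term="Website", index=1):
    """Text of the link at a 0-based index in the first data-table whose
    text contains term; index=1 is the 2nd link, as in the excerpt."""
    parser = DataTableParser()
    parser.feed(html)
    for table in parser.tables:
        if term in table["text"]:
            links = table["links"]
            return links[index]["text"] if index < len(links) else None
    return None


page = ('<div class="data-table"><span>Industry</span><a href="/i">Software</a></div>'
        '<div class="data-table"><span>Website</span>'
        '<a href="/jobs">Jobs</a><a href="https://acme.example">acme.example</a></div>')
print(nth_link_text(page))  # acme.example
```

Returning the link's text mirrors the excerpt's use of `htmlText`; the `href` is also kept in each link record if the attribute is what you need. A library with CSS-selector support would make the selection shorter, at the cost of a third-party dependency.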
Projects
Honors & awards
-
Google Summer of Code Certificate of Appreciation
Google
https://drive.google.com/file/d/1a14a_crsX7X4iT_yx3lu6o7ixs_bRr3C/view?usp=sharing
-
2019 Gitlab Top Contributor
Gitlab
https://about.gitlab.com/community/top-annual-contributors/
Languages
-
English
Native or bilingual proficiency
Organizations
-
HomiHQ
CMO
- Present -
W3C Entity Reconciliation Community Group
member
- Present
https://www.w3.org/community/reconciliation/
-
Schema.org
Experts Panel
- Present -
OpenRefine Steering Committee
Director
- http://openrefine.org/
-
Freebase
Community Expert
-Technical recruitment and support of community initiatives involving Freebase data, Type systems, Metadata modeling, and Linked Data.
Recommendations received
5 people have recommended Thad