Babies Learning Language - Methods (07-08)
Meta-analyses can be immensely informative – yet they are rarely used by researchers. One
reason may be that it takes a bit of training to carry them out or even to understand
them. Additionally, meta-analyses go out of date as new studies are published.
There's a tension in discussions of open science, one that is also mirrored in my own
research. What I really care about are the big questions of cognitive science: what makes
people smart? how does language emerge? how do children develop? But in practice I spend
quite a bit of my time doing meta-research on reproducibility and replicability. I often hear
critics of open science – focusing on replication, but also other practices – objecting that
open science advocates are making science more boring and decreasing the focus on
theoretical progress (e.g., Locke, Stroebe & Strack). The thing is, I don't completely
disagree. Open science is not inherently interesting.
Sometimes someone will tell me about a study and start the description by saying that it's
preregistered, with open materials and data. My initial response is "ho hum." I don't really
care whether a study is preregistered – unless I care about the study itself and suspect p-hacking.
Then the only thing that can rescue the study is preregistration. And if it isn't preregistered, I
don't care about the study any more; I'm just frustrated by the wasted opportunity.
So here's the thing: Although being open can't make your study interesting, the failure to
pursue open science practices can undermine the value of a study. This post is an attempt
to justify this idea by giving an informal Bayesian analysis of what makes a study
interesting and why transparency and openness are then the key to maximizing study value.
Read more »
Friday, November 10, 2017
Reproducible research. Here's a blog post on why I advocate for using RMarkdown to write
papers. The best package for doing this is papaja (pronounced "papaya"). If you don't use
RMarkdown but do know R, here's a tutorial.
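For a sense of what getting started looks like, here's a rough sketch of creating and knitting a papaja manuscript from the R console. The template name and function arguments are taken from the rmarkdown and papaja documentation as I understand them, and the file name is just a placeholder, so check the current docs before relying on it.

```r
# A minimal sketch: copy papaja's APA manuscript template and knit it.
# "my-paper.Rmd" is a placeholder file name; "apa6" is papaja's template name
# as documented in the package README (verify against your installed version).
# install.packages(c("rmarkdown", "papaja"))
library(rmarkdown)

# Copy the APA-style template into a new R Markdown file.
rmd_file <- rmarkdown::draft("my-paper.Rmd", template = "apa6",
                             package = "papaja", edit = FALSE)

# Knit the manuscript; with papaja's default output format this produces
# an APA-formatted PDF (requires a LaTeX installation, e.g. via tinytex).
rmarkdown::render(rmd_file)
```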
Data sharing. Just post it. The Open Science Framework is an obvious choice for file
sharing, and some nice video tutorials make it easy to get started.
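If you'd rather script the posting, here's a rough sketch using the osfr package (my addition, not something the post mentions). The project title, file path, and token variable are placeholders, and the function names are as I recall them from the osfr documentation.

```r
# A sketch of sharing data on the Open Science Framework from R with osfr.
# Function names follow the osfr documentation as I recall them; the token,
# project title, and file path below are placeholders for your own values.
library(osfr)

# Authenticate with a personal access token created in your OSF settings.
osf_auth(token = Sys.getenv("OSF_PAT"))

# Create a public project and upload the raw data file to it.
project <- osf_create_project(title = "Word learning study: data and code",
                              public = TRUE)
osf_upload(project, path = "data/trial_data.csv")
```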
Increasingly, my solution is co-work. The idea is that collaborators schedule time to sit
together and do the work – typically writing code or prose, occasionally making stimuli or
other materials – either in person or online. This model means that when conceptual or
presentational issues come up we can chat about them as they arise, rather than waiting to
resolve them by email or in a subsequent meeting.** As a supervisor, I love this model
because I get to see how the folks I work with are approaching a problem and what their
typical workflow is. This observation can help me give process-level feedback as I learn
how people organize their projects. I also often learn new coding tricks this way.***
Read more »
Labels: Methods
Friday, October 6, 2017
For those of us who study child development – and especially language development – the
Child Language Data Exchange System (CHILDES) is probably the single most important
resource in the field. CHILDES is a corpus of transcripts of children, often talking with a
parent or an experimenter, and it includes data from dozens of languages and hundreds of
children. It’s a goldmine. CHILDES has also been around since way before the age of “big
data”: it started with Brian MacWhinney and Catherine Snow photocopying transcripts (and
then later running OCR to digitize them!). The field of language acquisition has been a
leader in open data sharing largely thanks to Brian’s continued work on CHILDES.
Despite these strengths, using CHILDES can sometimes be challenging, especially at the
most casual and the most in-depth ends of the use spectrum. Simple analyses like estimating word
frequencies can be done using CLAN – the major interface to the corpora – but these
require more comfort with command-line interfaces and programming than can be
expected in many classroom settings. On the other end of the spectrum, many of us who
use CHILDES for in-depth computational studies like to read in the entire database, parse
out many of the rich annotations, and get a set of flat text files. But doing this parsing
correctly is complicated, and often small decisions in the data-processing pipeline can lead
to different downstream results. Further, it can be very difficult to reconstruct a particular
data prep in order to do a replication study. We've been frustrated several times when
trying to reproduce others' modeling results on CHILDES, not knowing whether our
implementation of their model was wrong or whether we were simply parsing the data
differently.
To address these issues and generally promote the use of CHILDES in a broader set of
research and education contexts, we’re introducing a project called childes-db. childes-db
aims to provide both a visualization interface for common analyses and an application
programming interface (API) for more in-depth investigation. Casual users can
explore the data with Shiny apps, browser-based interactive graphs that supplement
CHILDES’s online transcript browser. More intensive users can get direct access to
pre-parsed text data through our API: an R package called childesr, which allows users to
subset the corpora and get processed text. The backend of all of this is a MySQL database
that’s populated using a publicly available – and hopefully definitive – CHILDES parser, to
avoid some of the issues caused by different processing pipelines.
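To give a flavor of the childesr interface, here's a rough sketch of the word-frequency analysis mentioned above. The function, argument, and column names follow the childes-db documentation as best I recall them, and the corpus, child, and speaker-role values are just illustrative, so check the package help pages for the current interface.

```r
# A sketch of estimating word frequencies for one child with childesr.
# Function, argument, and column names may differ slightly across package
# versions; "Brown", "Eve", and "Target_Child" are illustrative values.
library(childesr)
library(dplyr)

# Pull per-transcript word-type counts for Eve in the Brown corpus.
eve_types <- get_types(corpus = "Brown", target_child = "Eve")

# Keep the child's own speech and sum counts across transcripts.
eve_freqs <- eve_types %>%
  filter(speaker_role == "Target_Child") %>%
  group_by(gloss) %>%
  summarise(count = sum(count), .groups = "drop") %>%
  arrange(desc(count))

# The most frequent words in Eve's speech.
head(eve_freqs, 10)
```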
Read more »
Thursday, June 1, 2017