Part 3: Analyzing Conversations in Zoom

Introduction

Virtual meetings afford granular data on how people communicate with one another. With this granularity come both opportunities and challenges. With respect to opportunities, researchers can use virtual meetings to track the flow and content of communications on a second-by-second basis. This enables deriving insights into who talks to whom and how different people may communicate with one another in distinct ways.

With respect to challenges, granular data often requires researchers, who may be accustomed to smaller datasets, to contend with very large volumes of information. For example, if a researcher were to collect data from 100 60-minute group meetings, this would yield 6,000 minutes of recorded speech. Depending on the rate of speech, this could amount to 50,000 to 60,000 spoken utterances.

Because virtual meetings present data on events in time (e.g., a spoken sentence or chat message) made by an individual within a virtual meeting, they require close attention to levels of analysis. For many questions, researchers are not interested in the fine-grained events; rather, researchers are interested in using those fine-grained events to measure attributes of individuals or groups within certain segments of time. zoomGroupStats provides basic functionality to derive these kinds of aggregated metrics. Whether a given aggregation is appropriate for assessing some construct, however, will fundamentally depend on the phenomenon under investigation.

Overview of text-based data in Zoom

zoomGroupStats provides functions for analyzing conversations that occur through two channels in a virtual meeting:

transcribed spoken language: Zoom’s cloud recording feature includes an option for transcription of language transmitted through meeting participants’ microphones. This transcription is performed by otter.ai.
text-based chat messages: During a Zoom meeting, users can send chat messages. In the chat file included in a Zoom Cloud recording, only publicly facing messages are captured. These are the messages that users send to the group as a whole. If users send one another direct/private messages, these are not captured in the downloadable file.

Cleaning and modifying text-based data

Text analysis is a dynamic area brimming with innovation. zoomGroupStats does not currently include direct functions for cleaning or modifying the text that is captured in either a transcript or a chat file. Because, however, the text for each of these files is stored in a variable, it is straightforward to use functions from other packages to clean or otherwise modify text before conducting conversation analysis.
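
As an illustration of this point, here is a minimal sketch that cleans the text column with base R functions. The data frame is mock data, the new column name utteranceMessageClean is hypothetical, and the specific cleaning steps are only examples; which steps are appropriate depends on your analysis.

```r
# Mock transcript with the same text column name used in the parsed data
transcript <- data.frame(
  utteranceId = 1:3,
  utteranceMessage = c("  Okay, so we're   recording. ",
                       "Um, we have that setup. Okay.",
                       "It's like Mary Kate's here."),
  stringsAsFactors = FALSE
)

# Illustrative cleaning steps: collapse repeated whitespace, trim the ends,
# and drop a leading filler word
cleaned <- gsub("[[:space:]]+", " ", transcript$utteranceMessage)
cleaned <- trimws(cleaned)
cleaned <- gsub("^um,? ", "", cleaned, ignore.case = TRUE)

# Store the result in a new (hypothetical) column so the original text is preserved
transcript$utteranceMessageClean <- cleaned
transcript$utteranceMessageClean
```

Keeping the original column intact makes it easy to compare results with and without cleaning.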

Depending on your research questions and the scale of your dataset, you may wish to manually review and correct the transcribed audio content. Just like transcriptions done by humans, transcriptions produced by otter.ai will have errors. A manual review of a transcription can correct for these errors and provide a sharper analysis of text.

About the Zoom `transcript` file

If you have followed the steps outlined in Part 2, you will have a single list object from your batch analysis. Within this list you will find a transcript dataset. transcript is a data.frame that is the parsed audio transcript file. Each row represents a single marked “utterance” using Zoom’s cloud-based transcription algorithm. Utterances are marked as a function of pauses in speech and/or speaker changes.

# Three records from the sample transcript dataset
 head(batchOut$transcript, 3)
#>   utteranceId utteranceStartSeconds  utteranceStartTime utteranceEndSeconds
#> 1           1                 4.859 2020-09-04 15:00:04                7.41
#> 2           2                 9.540 2020-09-04 15:00:09               12.99
#> 3           3                15.630 2020-09-04 15:00:15               17.76
#>      utteranceEndTime utteranceTimeWindow      userName
#> 1 2020-09-04 15:00:07               2.551 Andrew Knight
#> 2 2020-09-04 15:00:12               3.450 Andrew Knight
#> 3 2020-09-04 15:00:17               2.130 Andrew Knight
#>                               utteranceMessage utteranceLanguage batchMeetingId
#> 1    Okay, so we're recording. We're streaming                en    00000000001
#> 2                    We have that setup. Okay.                en    00000000001
#> 3 It's like Mary Kate's here. I'll let her in.                en    00000000001

Each row contains identifying information for the utterance, including what meeting it was in (batchMeetingId), who said it (userName), when it was said (utteranceStartTime, utteranceStartSeconds), and how long it lasted (utteranceTimeWindow). There is also an indicator of the language for the utterance (utteranceLanguage). This indicator is used with some text analysis packages.
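
To make the relationship among the timing columns concrete, the sketch below recomputes the duration of each sample utterance as end minus start and compares it with the printed utteranceTimeWindow values. The numbers are copied from the three sample records; the object names utt and printed are only illustrative.

```r
# Start and end offsets (in seconds) copied from the three sample records
utt <- data.frame(
  utteranceId = 1:3,
  utteranceStartSeconds = c(4.859, 9.540, 15.630),
  utteranceEndSeconds = c(7.41, 12.99, 17.76)
)

# Duration of each utterance as end minus start
utt$duration <- utt$utteranceEndSeconds - utt$utteranceStartSeconds

# Compare with the utteranceTimeWindow values printed in the sample output,
# allowing for floating-point rounding
printed <- c(2.551, 3.450, 2.130)
all.equal(utt$duration, printed)
```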

About the Zoom `chat` file

Also included in your batch output file will be a chat dataset. chat is a data.frame that is the parsed text-based chat file. Each row represents a single chat message submitted by a user. Note that non-ASCII characters will not be correctly rendered in the message text.
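
If non-ASCII characters matter for your analysis, one option is to flag the affected messages for manual review. This is a minimal sketch on mock data using base R only; it is not a zoomGroupStats function, and the object names are illustrative.

```r
# Mock chat messages; "\u2019" is a typographic apostrophe and "\u2026" an ellipsis
chat <- data.frame(
  messageId = 1:3,
  message = c("Hello everyone", "It\u2019s great to see you", "And Turbo Pascal\u2026"),
  stringsAsFactors = FALSE
)

# TRUE when any code point in the string falls outside the ASCII range (0-127)
has_non_ascii <- vapply(
  chat$message,
  function(s) any(utf8ToInt(enc2utf8(s)) > 127L),
  logical(1),
  USE.NAMES = FALSE
)

# Ids of the messages that would need a closer look
chat$messageId[has_non_ascii]
```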

# Three records from the sample chat dataset
 head(batchOut$chat, 3)
#>   messageId messageSeconds         messageTime      userName
#> 1         1           1274 2020-09-04 15:21:14 Andrew Knight
#> 2         2           1295 2020-09-04 15:21:35 Andrew Knight
#> 3         3           1321 2020-09-04 15:22:01   Ringo Starr
#>                                                                                                               message
#> 1 Hello Everyone - It’s great to see folks dropping into the session. We’ll get started here in just about 5 minutes.
#> 2                                        Please don’t hesitate to use the chat to say hello to friends and colleagues
#> 3        Also drop in questions and comments throughout the discussion in that chat and I’ll respond as we go thanks!
#>   messageLanguage batchMeetingId
#> 1              en    00000000001
#> 2              en    00000000001
#> 3              en    00000000001

Like transcript, each row contains identifying information for the chat message, including what meeting it was in (batchMeetingId), who posted it (userName), and when it was posted (messageTime, messageSeconds). There is also an indicator of the language for the message (messageLanguage).

Analyzing conversations in `transcript` and `chat`

zoomGroupStats includes functions that aid in deriving common conversation metrics at several levels of analysis. Each of the functions can be applied to either the chat or the transcript file.

Performing sentiment analysis

Sentiment analysis–the assessment and/or classification of language according to its emotional tone–is among the most ubiquitous kinds of text analysis. zoomGroupStats provides the ability to perform sentiment analysis on the utterances or messages in transcript and chat files. Because this type of analysis scores pieces of text, I recommend conducting this analysis first. The sentiment metrics can then be included in downstream conversation analyses that aggregate aspects of the conversation to the individual or meeting levels.

Using the textSentiment() function, there are two different types of sentiment analysis that you can request:

syuzhet - This is a lexicon-based analysis using the syuzhet package. A lexicon-based analysis uses pre-existing dictionaries to measure the sentiment of individual words in a piece of text. In essence, this approach is a word-counting method.
aws - This is an approach that relies on machine learning through Amazon Web Services. Rather than focusing on individual words, this method draws upon a trained model that assesses attributes of the text as a whole. To request this type of analysis, you must have appropriately configured your AWS credentials.
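
The word-counting idea behind a lexicon-based analysis can be sketched in a few lines of base R. This toy example is not how the syuzhet package is implemented: the five-word lexicon and the count_sentiment() helper are hypothetical and exist only to show the mechanism.

```r
# Hypothetical mini-lexicon mapping words to categories;
# real lexicons hold thousands of entries
lexicon <- c(great = "positive", nice = "positive", thanks = "positive",
             bad = "negative", problem = "negative")

count_sentiment <- function(text, lexicon) {
  # Lowercase, strip punctuation, and split into words
  words <- strsplit(gsub("[[:punct:]]", "", tolower(text)), "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  # Look up each word; words missing from the lexicon yield NA and are not counted
  hits <- lexicon[words]
  c(wordCount = length(words),
    positive = sum(hits == "positive", na.rm = TRUE),
    negative = sum(hits == "negative", na.rm = TRUE))
}

count_sentiment("Wow, that looks like a nice spot. Thanks!", lexicon)
```

Because each word is scored in isolation, a phrase such as "not nice" would still count as positive here. That limitation is one reason a model that assesses the text as a whole can behave differently.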

In deciding which method to use, you should consider your research objectives. In general, the aws method will provide greater validity for assessing sentiment. However, it also will take longer to run and, for larger datasets, will incur financial costs.

# You can request both sentiment analysis methods by including them in sentMethods
 transcriptSent = textSentiment(inputData=batchOut$transcript, idVars=c('batchMeetingId','utteranceId'), textVar='utteranceMessage', sentMethods=c('aws', 'syuzhet'), appendOut=FALSE, languageCodeVar='utteranceLanguage')

# This does only the aws sentiment analysis on a chat file
 chatSent = textSentiment(inputData=batchOut$chat, idVars=c('batchMeetingId', 'messageId'), textVar='message', sentMethods=c('aws'), appendOut=FALSE, languageCodeVar='messageLanguage')

The results of textSentiment come as a named list, with items for aws and/or syuzhet:

# This does only the syuzhet analysis on the transcript and does not append it to the input dataset
 transcriptSent = textSentiment(inputData=batchOut$transcript, idVars=c('batchMeetingId','utteranceId'), textVar='utteranceMessage', sentMethods=c('syuzhet'), appendOut=FALSE, languageCodeVar='utteranceLanguage')
#> Running syuzhet lexicon-based sentiment analysis

head(transcriptSent$syuzhet)
#>   batchMeetingId utteranceId wordCount syu_anger syu_anticipation syu_disgust
#> 1    00000000001           1         8         0                0           0
#> 2    00000000001           2         5         0                0           0
#> 3    00000000001           3        12         0                0           0
#> 4    00000000001           4         6         0                0           0
#> 5    00000000001           5         9         0                1           0
#> 6    00000000001           6         2         0                0           0
#>   syu_fear syu_joy syu_sadness syu_surprise syu_trust syu_negative syu_positive
#> 1        0       0           0            0         0            0            0
#> 2        0       0           0            0         0            0            0
#> 3        0       0           0            0         0            0            0
#> 4        0       0           0            0         0            0            0
#> 5        0       1           0            0         0            0            1
#> 6        0       0           0            0         0            0            0

The appendOut option in textSentiment gives you the ability to merge the sentiment metrics back to the original input data. I usually do this so that I can incorporate these metrics into downstream conversation analyses.

# This does only the syuzhet sentiment analysis on a chat file and appends it to the input dataset
 chatSent = textSentiment(inputData=batchOut$chat, idVars=c('batchMeetingId', 'messageId'), textVar='message', sentMethods=c('syuzhet'), appendOut=TRUE, languageCodeVar='messageLanguage')
#> Running syuzhet lexicon-based sentiment analysis
  head(chatSent$syuzhet)
#>   batchMeetingId messageId messageSeconds         messageTime      userName
#> 1    00000000001         1           1274 2020-09-04 15:21:14 Andrew Knight
#> 2    00000000001         2           1295 2020-09-04 15:21:35 Andrew Knight
#> 3    00000000001         3           1321 2020-09-04 15:22:01   Ringo Starr
#> 4    00000000001         4           1476 2020-09-04 15:24:36 Andrew Knight
#> 5    00000000001         5           1802 2020-09-04 15:30:02 Andrew Knight
#> 6    00000000001         6           1851 2020-09-04 15:30:51 Andrew Knight
#>                                                                                                               message
#> 1 Hello Everyone - It’s great to see folks dropping into the session. We’ll get started here in just about 5 minutes.
#> 2                                        Please don’t hesitate to use the chat to say hello to friends and colleagues
#> 3        Also drop in questions and comments throughout the discussion in that chat and I’ll respond as we go thanks!
#> 4                                                        Wow, Joe - that looks like a nice spot for joining a webinar
#> 5                                  Please feel free to add questions, comments, ideas, and resources here in the chat
#> 6                                                                              And Turbo Pascal…programming language!
#>   messageLanguage wordCount syu_anger syu_anticipation syu_disgust syu_fear
#> 1              en        20         0                0           0        0
#> 2              en        14         0                0           0        0
#> 3              en        19         1                0           1        1
#> 4              en        12         0                0           0        0
#> 5              en        14         0                0           0        0
#> 6              en         4         0                0           0        0
#>   syu_joy syu_sadness syu_surprise syu_trust syu_negative syu_positive
#> 1       0           0            0         0            0            0
#> 2       0           0            0         0            0            0
#> 3       0           1            0         0            1            1
#> 4       0           0            0         0            0            0
#> 5       1           0            0         1            0            1
#> 6       0           0            0         0            0            0

Note that I have not included the aws output in this vignette because it requires a call to a third-party service.
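
If you run textSentiment() with appendOut=FALSE, the sentiment output can still be joined to the input later using the same id variables. The following is a minimal sketch with mock data frames and base R merge(); the object names chat, sent, and chatWithSent are illustrative, and the sentiment values are made up.

```r
# Mock chat input and a mock sentiment result keyed by the same id variables
chat <- data.frame(
  batchMeetingId = c("00000000001", "00000000001", "00000000002"),
  messageId = c(1, 2, 1),
  userName = c("Andrew Knight", "Ringo Starr", "Andrew Knight"),
  stringsAsFactors = FALSE
)
sent <- data.frame(
  batchMeetingId = c("00000000001", "00000000001", "00000000002"),
  messageId = c(1, 2, 1),
  syu_positive = c(0, 1, 0),
  syu_negative = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# Join on the id variables; all.x = TRUE keeps every input row even if unscored
idVars <- c("batchMeetingId", "messageId")
chatWithSent <- merge(chat, sent, by = idVars, all.x = TRUE)
chatWithSent
```

Including the meeting identifier in the join matters because message identifiers restart within each meeting.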

Performing conversation analysis

Conversation analysis entails using the exchanges of communications among meeting members to assess attributes of individuals, dyads, and groups. zoomGroupStats currently includes two basic kinds of conversation analysis.

The textConversationAnalysis() function will provide a descriptive assessment of either the transcript or the chat file.

# Analyze the transcript, without the sentiment metrics
convoTrans = textConversationAnalysis(inputData=batchOut$transcript, inputType='transcript', meetingId='batchMeetingId', speakerId='userName')

textConversationAnalysis() provides a list with output at two levels of analysis–the meeting level (first item) and the speaker level (second item). These items are named according to the type of input that you have provided.

# This is output at the meeting level. (Note that the values across meetings are equivalent because the sample dataset is a replication of the same meeting multiple times.)
head(convoTrans$transcriptlevel)
#>   batchMeetingId transcriptStartTime   transcriptEndTime
#> 1    00000000001 2020-09-04 15:00:04 2020-09-04 15:55:41
#> 2    00000000002 2020-09-05 15:03:19 2020-09-05 15:58:56
#> 3    00000000003 2020-09-06 15:20:04 2020-09-06 16:15:41
#>   utteranceTimeWindow_sum utteranceTimeWindow_x utteranceTimeWindow_sd
#> 1                2658.961              8.863203               8.863203
#> 2                2658.961              8.863203               8.863203
#> 3                2658.961              8.863203               8.863203
#>   utteranceGap_x utteranceGap_sd numUtterances numUniqueSpeakers
#> 1        2.26806        25.58092           300                 6
#> 2        2.26806        25.58092           300                 6
#> 3        2.26806        25.58092           300                 6
#>   totalTranscriptTime silentTime_sum burstinessRaw
#> 1            3337.111         678.15     0.8371172
#> 2            3337.111         678.15     0.8371172
#> 3            3337.111         678.15     0.8371172

Variable	Description
batchMeetingId	The meeting identifier that you specified
transcriptStartTime	When the first utterance was recorded
transcriptEndTime	When the last utterance ended
utteranceTimeWindow_sum	Total number of seconds of speaking time
utteranceTimeWindow_x	Mean duration, in seconds, of utterances
utteranceTimeWindow_sd	Standard deviation of the duration, in seconds, of utterances
utteranceGap_x	Mean duration, in seconds, of silent time between consecutive utterances
utteranceGap_sd	Standard deviation of the duration, in seconds, of silent time between consecutive utterances
numUtterances	Count of the number of utterances in the meeting
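numUniqueSpeakers	Count of the number of unique speakers in the meeting. Note that UNIDENTIFIED counts as a speaker when any utterance could not be attributed.
totalTranscriptTime	Total number of seconds from the start of the first utterance to the end of the last utterance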
silentTime_sum	Total number of seconds of silent time
burstinessRaw	A measure of the concentration of utterances in time
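
To show how these meeting-level quantities relate to one another, here is a minimal sketch on mock utterance data. Two points are assumptions inferred from the printed sample output rather than taken from the package source: that silent time equals total transcript time minus speaking time, and that burstinessRaw equals (sd - mean) / (sd + mean) of the gaps between utterances. Both reproduce the printed values, as the last two lines check.

```r
# Mock utterances for one meeting: start and end offsets in seconds
utt <- data.frame(
  start = c(0.0, 3.0, 7.5, 20.0),
  end   = c(2.5, 7.0, 9.5, 24.0)
)
n <- nrow(utt)

duration <- utt$end - utt$start         # per-utterance speaking time
gap <- utt$start[-1] - utt$end[-n]      # silence before each later utterance

speakingTime <- sum(duration)           # analogous to utteranceTimeWindow_sum
totalTime <- utt$end[n] - utt$start[1]  # analogous to totalTranscriptTime
silentTime <- totalTime - speakingTime  # analogous to silentTime_sum

# ASSUMPTION: burstiness of the gaps as (sd - mean) / (sd + mean); this matches
# the printed sample values but is not confirmed from the package source
burstiness <- function(m, s) (s - m) / (s + m)
burstiness(mean(gap), sd(gap))

# Checks against the printed meeting-level sample output
3337.111 - 2658.961             # compare with silentTime_sum = 678.15
burstiness(2.26806, 25.58092)   # compare with burstinessRaw = 0.8371172
```

Under this formula, values near 1 indicate that utterances cluster in bursts separated by long silences, values near 0 indicate irregular timing with no clustering, and values near -1 indicate evenly spaced utterances.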

# This is output at the speaker level
head(convoTrans$speakerlevel)
#>   batchMeetingId        userName  firstUtteranceTime   lastUtteranceTime
#> 1    00000000001   Andrew Knight 2020-09-04 15:00:04 2020-09-04 15:55:26
#> 2    00000000001 George Harrison 2020-09-04 15:00:27 2020-09-04 15:00:44
#> 3    00000000001     John Lennon 2020-09-04 15:00:39 2020-09-04 15:01:54
#> 4    00000000001  Paul McCartney 2020-09-04 15:00:24 2020-09-04 15:54:48
#> 5    00000000001     Ringo Starr 2020-09-04 15:00:31 2020-09-04 15:00:31
#> 6    00000000001    UNIDENTIFIED 2020-09-04 15:02:30 2020-09-04 15:02:48
#>   utteranceTimeWindow_sum utteranceTimeWindow_x utteranceTimeWindow_sd
#> 1                 356.071              7.912689               7.912689
#> 2                   4.290              1.430000               1.430000
#> 3                  19.890              2.841429               2.841429
#> 4                2275.050              9.401033               9.401033
#> 5                   1.980              1.980000               1.980000
#> 6                   1.680              0.840000               0.840000
#>   utteranceGap_x utteranceGap_sd numUtterances
#> 1     11.3093182      66.5995130            45
#> 2      0.9300000       0.2954657             3
#> 3      1.0714286       0.5614394             7
#> 4      0.6949587       0.4954115           242
#> 5      0.5400002              NA             1
#> 6      0.7649999       0.5727565             2

Variable	Description
batchMeetingId	The meeting identifier that you specified
userName	The speaker identifier that you specified
firstUtteranceTime	Timestamp for this person’s first utterance
lastUtteranceTime	Timestamp for this person’s last utterance
utteranceTimeWindow_sum	Total number of seconds of this person’s speaking time
utteranceTimeWindow_x	Mean duration, in seconds, of this person’s utterances
utteranceTimeWindow_sd	Standard deviation of the duration, in seconds, of this person’s utterances
utteranceGap_x	Mean duration, in seconds, of silent time before this person speaks after a prior utterance
utteranceGap_sd	Standard deviation of the duration, in seconds, of silent time before this person speaks after a prior utterance
numUtterances	Count of the number of utterances this person made in this meeting
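
The speaker-level output is, in essence, a grouped summary of the utterance-level records. The sketch below reproduces that kind of aggregation with base R on mock data; it is not the package's implementation, and the resulting column names (for example utteranceTimeWindow.sum) come from base R rather than from zoomGroupStats.

```r
# Mock utterance-level data for one meeting
utt <- data.frame(
  batchMeetingId = "00000000001",
  userName = c("Andrew Knight", "Paul McCartney", "Andrew Knight",
               "Paul McCartney", "Ringo Starr"),
  utteranceTimeWindow = c(2.5, 9.0, 3.5, 11.0, 2.0),
  stringsAsFactors = FALSE
)

# Summarize durations within each meeting-by-speaker group
speakerLevel <- aggregate(
  utteranceTimeWindow ~ batchMeetingId + userName,
  data = utt,
  FUN = function(x) c(sum = sum(x), x = mean(x), sd = sd(x), n = length(x))
)

# aggregate() returns the summaries as a matrix column; flatten it into plain columns
speakerLevel <- do.call(data.frame, speakerLevel)
speakerLevel
```

A speaker with a single utterance has an undefined standard deviation, which is why NA values can appear for people who spoke only once.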

If you have already conducted a sentiment analysis using the textSentiment() function, you can further include those attributes. Note that currently you can only analyze one sentiment analysis method at a time. For example, here is a request for an analysis of the chat file:

# Analyze the conversation within the chat file, including the sentiment metrics
convoChat = textConversationAnalysis(inputData=chatSent$syuzhet, inputType='chat', meetingId='batchMeetingId', speakerId='userName', sentMethod="syuzhet")

The names of the items in the list output for chat are chatlevel and userlevel:

# This is output at the meeting level
head(convoChat$chatlevel)
#>   batchMeetingId       chatStartTime         chatEndTime messageNumChars_sum
#> 1    00000000001 2020-09-04 15:21:14 2020-09-04 15:36:49                 527
#> 2    00000000002 2020-09-05 15:21:14 2020-09-05 15:36:49                 527
#> 3    00000000003 2020-09-06 15:21:14 2020-09-06 15:36:49                 527
#>   messageNumChars_x messageNumChars_sd messageGap_x messageGap_sd
#> 1              52.7           41.30927     103.8889      141.3793
#> 2              52.7           41.30927     103.8889      141.3793
#> 3              52.7           41.30927     103.8889      141.3793
#>   numUniqueMessagers numMessages totalChatTime burstinessRaw syu_anger.sum
#> 1                  5          10           935     0.1528548             1
#> 2                  5          10           935     0.1528548             1
#> 3                  5          10           935     0.1528548             1
#>   syu_anticipation.sum syu_disgust.sum syu_fear.sum syu_joy.sum syu_sadness.sum
#> 1                    0               1            1           1               1
#> 2                    0               1            1           1               1
#> 3                    0               1            1           1               1
#>   syu_surprise.sum syu_trust.sum syu_negative.sum syu_positive.sum
#> 1                0             1                1                2
#> 2                0             1                1                2
#> 3                0             1                1                2
#>   syu_anger.pct syu_anticipation.pct syu_disgust.pct syu_fear.pct syu_joy.pct
#> 1           0.1                    0             0.1          0.1         0.1
#> 2           0.1                    0             0.1          0.1         0.1
#> 3           0.1                    0             0.1          0.1         0.1
#>   syu_sadness.pct syu_surprise.pct syu_trust.pct syu_negative.pct
#> 1             0.1                0           0.1              0.1
#> 2             0.1                0           0.1              0.1
#> 3             0.1                0           0.1              0.1
#>   syu_positive.pct
#> 1              0.2
#> 2              0.2
#> 3              0.2

Variable	Description
batchMeetingId	The meeting identifier that you specified
chatStartTime	The time of the first chat message in this meeting
chatEndTime	The time of the last chat message in this meeting
messageNumChars_sum	Total number of characters chatted in meeting
messageNumChars_x	Mean number of characters per message chatted in meeting
messageNumChars_sd	Standard deviation of the number of characters per message chatted in meeting
messageGap_x	Mean duration, in seconds, of time between chat messages in this meeting
messageGap_sd	Standard deviation of the duration, in seconds, of time between chat messages in this meeting
numUniqueMessagers	Number of individuals who sent chat messages in this meeting.
numMessages	Total number of messages sent in this meeting
totalChatTime	Amount of time between first and last messages
burstinessRaw	Measure of the concentration of chat messages in time
…	Additional variables depend on the type of sentiment analysis you may have requested.
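
The sentiment columns ending in .sum and .pct can be read as follows, with a caveat. In the sample output, each .pct value equals the .sum value divided by the number of messages (for example, 2 / 10 = 0.2 for syu_positive). The sketch below assumes that .pct is the share of messages with a nonzero score. That reading is consistent with the sample output but is not confirmed from the package source, and the sample data cannot distinguish it from a simple mean of the scores.

```r
# Mock message-level positive scores for a meeting with ten chat messages,
# arranged so that two messages each carry one positive word
syu_positive <- c(0, 0, 1, 0, 1, 0, 0, 0, 0, 0)

# Total score across all messages in the meeting
positive_sum <- sum(syu_positive)

# ASSUMPTION: .pct read as the share of messages with a nonzero score
positive_pct <- mean(syu_positive > 0)

c(sum = positive_sum, pct = positive_pct)
```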

# This is output at the user level
head(convoChat$userlevel)
#>   batchMeetingId        userName numMessages    firstMessageTime
#> 1    00000000001   Andrew Knight           6 2020-09-04 15:21:14
#> 2    00000000001 George Harrison           1 2020-09-04 15:36:43
#> 3    00000000001     John Lennon           1 2020-09-04 15:36:47
#> 4    00000000001  Paul McCartney           1 2020-09-04 15:36:49
#> 5    00000000001     Ringo Starr           1 2020-09-04 15:22:01
#> 6    00000000002   Andrew Knight           6 2020-09-05 15:21:14
#>       lastMessageTime messageNumChars_sum messageNumChars_x messageNumChars_sd
#> 1 2020-09-04 15:36:42                 392          65.33333           33.46441
#> 2 2020-09-04 15:36:43                   7           7.00000                 NA
#> 3 2020-09-04 15:36:47                  12          12.00000                 NA
#> 4 2020-09-04 15:36:49                   8           8.00000                 NA
#> 5 2020-09-04 15:22:01                 108         108.00000                 NA
#> 6 2020-09-05 15:36:42                 392          65.33333           33.46441
#>   messageGap_x messageGap_sd syu_anger.sum syu_anticipation.sum syu_disgust.sum
#> 1        180.4      152.9895             0                    0               0
#> 2          1.0            NA             0                    0               0
#> 3          4.0            NA             0                    0               0
#> 4          2.0            NA             0                    0               0
#> 5         26.0            NA             1                    0               1
#> 6        180.4      152.9895             0                    0               0
#>   syu_fear.sum syu_joy.sum syu_sadness.sum syu_surprise.sum syu_trust.sum
#> 1            0           1               0                0             1
#> 2            0           0               0                0             0
#> 3            0           0               0                0             0
#> 4            0           0               0                0             0
#> 5            1           0               1                0             0
#> 6            0           1               0                0             1
#>   syu_negative.sum syu_positive.sum syu_anger.pct syu_anticipation.pct
#> 1                0                1             0                    0
#> 2                0                0             0                    0
#> 3                0                0             0                    0
#> 4                0                0             0                    0
#> 5                1                1             1                    0
#> 6                0                1             0                    0
#>   syu_disgust.pct syu_fear.pct syu_joy.pct syu_sadness.pct syu_surprise.pct
#> 1               0            0   0.1666667               0                0
#> 2               0            0   0.0000000               0                0
#> 3               0            0   0.0000000               0                0
#> 4               0            0   0.0000000               0                0
#> 5               1            1   0.0000000               1                0
#> 6               0            0   0.1666667               0                0
#>   syu_trust.pct syu_negative.pct syu_positive.pct
#> 1     0.1666667                0        0.1666667
#> 2     0.0000000                0        0.0000000
#> 3     0.0000000                0        0.0000000
#> 4     0.0000000                0        0.0000000
#> 5     0.0000000                1        1.0000000
#> 6     0.1666667                0        0.1666667

Variable	Description
batchMeetingId	The meeting identifier that you specified
userName	The individual identifier you specified
numMessages	Total number of messages this person sent in this meeting
firstMessageTime	The time of this person’s first chat message in this meeting
lastMessageTime	The time of this person’s last chat message in this meeting
messageNumChars_sum	Total number of characters this person chatted in meeting
messageNumChars_x	Mean number of characters per message this person chatted in meeting
messageNumChars_sd	Standard deviation of the number of characters per message this person chatted in meeting
messageGap_x	Mean duration, in seconds, of time before this person sends a chat message after a prior message
messageGap_sd	Standard deviation of the duration, in seconds, of time before this person sends a chat message after a prior message
…	Additional variables depend on the type of sentiment analysis you may have requested.

Windowed conversation analysis

One of the unique strengths of collecting data using virtual meetings is the ability to assess dynamics, that is, how meeting characteristics and participants’ behavior change over time. Beyond analyzing the raw events over time, zoomGroupStats enables you to run the textConversationAnalysis above within temporal windows in a given meeting. By windowing, and aggregating data within each window, you can derive more reliable indicators of attributes than by relying solely on the raw events.

For example, using the following function call, you could analyze how conversation attributes (who is speaking a lot, what the sentiment of speech is) change throughout a meeting, in 5-minute increments (windowSize=300 seconds).

 win.convo.out = windowedTextConversationAnalysis(inputData=batchOut$transcript, inputType='transcript', meetingId='batchMeetingId', speakerId='userName', sentMethod="none", timeVar="utteranceStartSeconds", windowSize=300)

The output of windowedTextConversationAnalysis is a list with two data.frames as items:

# View the window-level output
head(win.convo.out$windowlevel)
#>   windowId windowStart windowEnd batchMeetingId transcriptStartTime
#> 1        1           0       299    00000000001 2020-09-04 15:00:04
#> 2        2         300       599    00000000001                <NA>
#> 3        3         600       899    00000000001 2020-09-04 15:10:15
#> 4        4         900      1199    00000000001 2020-09-04 15:15:02
#> 5        5        1200      1499    00000000001 2020-09-04 15:20:06
#> 6        6        1500      1799    00000000001 2020-09-04 15:25:03
#>     transcriptEndTime utteranceTimeWindow_sum utteranceTimeWindow_x
#> 1 2020-09-04 15:02:52                 112.411              2.882333
#> 2                <NA>                   0.000                    NA
#> 3 2020-09-04 15:15:01                 272.490              9.731786
#> 4 2020-09-04 15:20:05                 287.670             11.506800
#> 5 2020-09-04 15:25:03                 271.740              8.234545
#> 6 2020-09-04 15:30:08                 289.140              9.638000
#>   utteranceTimeWindow_sd utteranceGap_x utteranceGap_sd numUtterances
#> 1               2.882333      1.4471053       1.4377420            39
#> 2                     NA             NA              NA             0
#> 3               9.731786      0.5222222       0.3300272            28
#> 4              11.506800      0.6550000       0.3376388            25
#> 5               8.234545      0.7912500       0.5997459            33
#> 6               9.638000      0.5617241       0.3392120            30
#>   numUniqueSpeakers totalTranscriptTime silentTime_sum burstinessRaw
#> 1                 6             167.401          54.99  -0.003245657
#> 2                 0               0.000           0.00            NA
#> 3                 2             286.590          14.10  -0.225514982
#> 4                 1             303.390          15.72  -0.319714619
#> 5                 1             297.060          25.32  -0.137674098
#> 6                 1             305.430          16.29  -0.246978824

Variable	Description
windowId	Incrementing numeric identifier for the temporal window
windowStart	Number of seconds from start of transcript when this window begins
windowEnd	Number of seconds from start of transcript when this window ends
…	All other variables correspond to the textConversationAnalysis output; but, they are calculated within a given temporal window
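
The window boundaries in the sample output (0 to 299, 300 to 599, and so on) are consistent with assigning each utterance to a window by integer division of its start offset by windowSize. The sketch below illustrates that assignment on mock data. It is an assumption about the logic, not the package's code, and it does not address how the package treats an utterance that falls exactly on or spans a boundary.

```r
windowSize <- 300

# Mock utterance start offsets, in seconds from the start of the transcript
utt <- data.frame(utteranceStartSeconds = c(4.9, 172.0, 615.3, 902.1, 1150.0))

# Window 1 covers offsets from 0 up to (but not including) 300, window 2 from 300, and so on
utt$windowId <- floor(utt$utteranceStartSeconds / windowSize) + 1
utt$windowStart <- (utt$windowId - 1) * windowSize
utt$windowEnd <- utt$windowId * windowSize - 1

utt
```

Note that no mock utterance lands in window 2, which mirrors the empty second window in the sample output.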

# View the output for speakers within windows
head(win.convo.out$speakerlevel)
#>   batchMeetingId      userName windowId windowStart windowEnd
#> 1    00000000001 Andrew Knight        1           0       299
#> 2    00000000001 Andrew Knight        2         300       599
#> 3    00000000001 Andrew Knight        3         600       899
#> 4    00000000001 Andrew Knight        4         900      1199
#> 5    00000000001 Andrew Knight        5        1200      1499
#> 6    00000000001 Andrew Knight        6        1500      1799
#>    firstUtteranceTime   lastUtteranceTime utteranceTimeWindow_sum
#> 1 2020-09-04 15:00:04 2020-09-04 15:02:50                  56.371
#> 2                <NA>                <NA>                   0.000
#> 3 2020-09-04 15:10:15 2020-09-04 15:13:36                 210.570
#> 4                <NA>                <NA>                   0.000
#> 5                <NA>                <NA>                   0.000
#> 6                <NA>                <NA>                   0.000
#>   utteranceTimeWindow_x utteranceTimeWindow_sd utteranceGap_x utteranceGap_sd
#> 1              3.523187               3.523187          2.320       1.8074370
#> 2                    NA                     NA             NA              NA
#> 3             10.027143              10.027143          0.498       0.3121336
#> 4                    NA                     NA             NA              NA
#> 5                    NA                     NA             NA              NA
#> 6                    NA                     NA             NA              NA
#>   numUtterances
#> 1            16
#> 2             0
#> 3            21
#> 4             0
#> 5             0
#> 6             0

This output will provide a record for each possible speaker within each possible window. This is done so that valid zeros (e.g., no speaking) are represented in the dataset.

Variable	Description
batchMeetingId	Meeting identifier requested
userName	Speaker identifier requested
windowId	Incrementing numeric identifier for the temporal window
windowStart	Number of seconds from start of transcript when this window begins
windowEnd	Number of seconds from start of transcript when this window ends
…	All other variables correspond to the textConversationAnalysis output; but, they are calculated within a given temporal window
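
The idea of one record per speaker per window, with explicit zeros, can be sketched in base R by crossing speakers with windows and then joining the observed counts. This uses mock data and illustrative object names (observed, grid, full); it is not the package's implementation.

```r
# Mock per-window counts that only contain windows where someone spoke
observed <- data.frame(
  userName = c("Andrew Knight", "Andrew Knight", "Paul McCartney"),
  windowId = c(1L, 3L, 2L),
  numUtterances = c(16, 21, 25),
  stringsAsFactors = FALSE
)

# Every speaker crossed with every window
grid <- expand.grid(
  userName = unique(observed$userName),
  windowId = 1:3,
  stringsAsFactors = FALSE
)

# Left join, then turn missing counts into explicit zeros
full <- merge(grid, observed, by = c("userName", "windowId"), all.x = TRUE)
full$numUtterances[is.na(full$numUtterances)] <- 0
full[order(full$userName, full$windowId), ]
```

Without the explicit zeros, a speaker who was silent in a window would simply be missing from that window, and averages over windows would overstate how much that person spoke.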

Next Steps

In the final part of this guide, you will learn how to process and analyze video files downloaded from Zoom sessions.

Andrew P. Knight

2021-05-11
