Each video or audio file in the corpus is assigned a unique code that identifies it by several criteria: the type of humorous text (mon = monologue; chis = joke; ske = sketch; ven = ventriloquist), the speaker’s sex (M = female, from Spanish mujer; H = male, from hombre), generation or age group (1 = young, 2 = adult, 3 = elderly), the speaker’s professional category (0 = non-professional, 1 = professional), the province code, and the file’s number within the corpus. For example, the code 0171-CHI-BARH21 refers to a joke (number 0171 in Humcor) told by a male, adult, professional speaker from the province of Barcelona.
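To illustrate how such a code can be read mechanically, the following Python sketch decodes it into its components. It is not part of the Humcor tooling. The field order (number, text type, then province, sex, generation, and professional category) is an assumption inferred from the single example 0171-CHI-BARH21, as is matching the text type on its first three letters (so that both CHI and chis resolve to a joke). The names `HumcorCode` and `parse_code` are hypothetical.

```python
import re
from dataclasses import dataclass

# Lookup tables taken from the coding scheme described in the text.
# Text types are keyed on three uppercase letters (an assumption, see lead-in).
TEXT_TYPES = {"MON": "monologue", "CHI": "joke", "SKE": "sketch", "VEN": "ventriloquist"}
SEX = {"M": "female", "H": "male"}
GENERATION = {"1": "young", "2": "adult", "3": "elderly"}
PROFESSION = {"0": "non-professional", "1": "professional"}

# Assumed layout: <number>-<type>-<province><sex><generation><profession>
CODE_PATTERN = re.compile(r"(\d+)-([A-Z]+)-([A-Z]+)([MH])([123])([01])")


@dataclass(frozen=True)
class HumcorCode:
    number: str
    text_type: str
    province: str
    sex: str
    generation: str
    profession: str


def parse_code(code: str) -> HumcorCode:
    """Decode a file code such as '0171-CHI-BARH21' into labelled fields."""
    match = CODE_PATTERN.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised code: {code!r}")
    number, type_tag, province, sex, generation, profession = match.groups()
    type_key = type_tag[:3]
    if type_key not in TEXT_TYPES:
        raise ValueError(f"unknown text type in code: {code!r}")
    return HumcorCode(
        number=number,
        text_type=TEXT_TYPES[type_key],
        province=province,
        sex=SEX[sex],
        generation=GENERATION[generation],
        profession=PROFESSION[profession],
    )
```

Applied to the example above, `parse_code("0171-CHI-BARH21")` yields number 0171, text type joke, province BAR, and a male, adult, professional speaker. The province code is kept as a raw string because the list of province abbreviations is not given here.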
In the transcription of the files, the general rules of Spanish spelling are followed, with the exception of capital letters, which are reserved exclusively for proper names. For encoding, a minimal markup system based on Standard Generalized Markup Language (SGML) is used, following the specifications of the Text Encoding Initiative (TEI). Conventional punctuation marks are not used; instead, specific tags are employed to indicate pauses of varying durations.