General emotional labor scale: Development and establishing psychometric properties

This study outlines the process followed in the development of the General Emotional Labor Scale-English (GELS-E). The scale construction process was divided into four phases. The first phase entailed item generation and the second phase was the pilot study. Phases three and four dealt with developing the norms and establishing the psychometric properties (reliability and validity) of the final 30 items. The GELS-E was developed as a measure that researchers can use to assess emotional labor in professionals. Such a measure can be beneficial for researchers and organizations alike: it can be used to understand the performance and effects of emotional labor and to arrange the training employees need to perform emotional labor.


Introduction
Emotional labor is a concept introduced by Hochschild (1983), who described it as the work of modifying one's emotions to display appropriate work-related emotions (Hochschild 2003). Emotional labor is the conscious control of emotions in order to exhibit suitable verbal and body language. Ashforth and Humphrey (1993) described it further by focusing on the outward expression of emotions, i.e. 'impression management'. According to them, the performance of emotional labor involves surface and deep acting, as well as genuine acting. This was considered labor because, even though an employee might not genuinely feel an emotion, expressing it is part of their job requirement. Morris and Feldman (1996) related emotional labor to organizational goals. They defined it as the 'effort, planning, and control' necessary to exhibit the emotions desired by the organization. This conceptualization followed the interactionist viewpoint, i.e. the interaction between the employee as an individual and organizational factors. Grandey (2000), in an attempt to consolidate these viewpoints, presented a comprehensive definition integrating the feelings, the process and the expression of emotions at work.
Theories of emotional labor have presented three different dynamics: internal states, internal processes and external displays of emotions. The definition and concept of emotional labor followed in the current study encompass these three dynamics. Internal states include the concept of emotional dissonance, the state in which individuals experience an incongruity between their felt and displayed emotions (Hochschild 2003). Internal processes involve the regulation of emotions, which includes surface, deep and genuine acting, and act as a bridge between the felt state and the displayed emotions (TM 2004).
When surface acting is the mode of performance, the expression of emotion occurs only at the surface level. An example is a flight attendant who is having a bad day but 'puts on a mask' and displays a positive attitude towards the passengers (Miller et al 2007). The second form is deep acting, which involves working to change one's felt emotions. Hochschild (2003) described it as 'getting into character'; this can be done by simulating the required feelings until they are actually felt (Ashkanasy and Daus 2002). Ashforth and Humphrey (1993) presented genuine acting: they argue that employees may in fact feel the emotions they are required to display. For instance, a nurse who is supposed to display sympathy towards a patient may actually feel it as well.
Pakistani society differs greatly from the developed western world. Many western countries are individualistic societies, whereas Pakistani society is predominantly collectivistic. Pakistanis are typically close to their families and friends; many live in joint families, or at least elderly parents continue to reside with their children after the children marry (Taqui et al 2007). Additionally, Pakistan is predominantly an Islamic nation, and the masses try to abide by Islamic laws and guidelines. Religion provides Muslims with guidelines about outward behavior in various situations. For instance, females are to display modesty when interacting with the opposite sex. Hence, the performance and experience of emotional labor by the Pakistani workforce is likely to differ from that of the labor force of other countries.
Emotional labor is a predominant requirement in the service sector. The economy of Pakistan has a relatively large service sector; research has shown that it contributes more than fifty percent of the country's GDP (Ahmed and Ahsan 2011). The objective of the current study was to tap this niche by developing a scale to measure the phenomenon. The idea behind the scale is to construct a measure that is culturally relevant to Pakistan and can provide greater understanding of emotional labor as experienced by the Pakistani workforce.
A review of existing scales showed gaps in the measurement of emotional labor; for instance, the Teacher Emotional Labor Scale (Cukur 2009) and the Emotional Labor Scale (Brotheridge and Lee 2003) do not incorporate genuine acting as a means of emotional labor. Additionally, the aim was to use simple English that the Pakistani masses can understand and to capture the effect of culture on the performance of emotional labor.

Method
Following Grandey's comprehensive approach, emotional labor was operationalized as the process of managing internal emotions and the outward expression of emotions in order to adhere to organizational rules. Deep, surface and genuine acting were determined as the subscales. Surface acting was operationalized as the display of emotions even when they are not actually being felt. Deep acting was defined as the effort put in to modify felt emotions in order to actually feel the required emotions. Genuine acting was operationalized as the expression of spontaneous and genuinely felt emotions. Moreover, three facets were determined for each subscale: variety refers to the diversity of emotions; intensity denotes the strength of the emotions; and the 'general' items measure aspects of emotional display through different means such as tone of voice, body language and so on.

Phase I - Part 1
Procedure: A thorough literature review was conducted to understand the phenomenon and to generate items for the scale. Initially, a total of 100 items were generated that measured deep, surface and genuine acting; the duration, frequency, variety and intensity of emotional labor; and the experience of emotions at work. All the items were theoretically based. Forty-nine items were then selected through thorough subjective analysis and with the input of two organizational psychologists.
The 49-item scale was sent to an English language expert and to four psychologists for language assessment and for assessment of the face and subjective validity of the scale. The corrections and modifications suggested by these experts were analyzed and incorporated.

Phase I - Part 2: Pilot Study
Sample: The sample consisted of 510 working individuals living in Karachi. Snowball sampling was used. Of the participants who answered, 48.2% were female and 51.8% were male. The largest age group (48%) was between 20 and 30 years old; 49.4% of the respondents were married and 50.6% were postgraduates.
Procedure: The 49 items developed for the GELS were distributed to the target population. Consent was obtained from the respondents. In addition, the participants were made aware that their participation in the study was voluntary and that they had the right to withdraw. Confidentiality was also assured. The sample was asked to fill out a demographic information form and the entire GELS via Google Docs. The data collected were used to conduct item analysis, establish internal consistency and define the cut-off points. For this purpose, statistical analyses such as descriptive statistics, item-total correlations, factor analysis and factor loadings of each subscale were conducted using the Statistical Package for the Social Sciences version 17.
Business Review: (2020) 15(1):83-96, F. Mohsin, N. Ayub
Procedure: The respondents answered a set of questionnaires which included a consent form, a demographic information form and the GELS via Google Docs. For test-retest reliability, the participants were asked to answer the GELS again after a two-week (15-day) interval.
Cronbach's alpha of the GELS-E, item-subscale correlations, Cronbach's alpha values of each subscale, and correlation between the two administrations of the scale were analyzed. Statistical analysis was conducted using the Statistical Package for Social Sciences version 17.
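The internal consistency analysis described above was run in SPSS; as an illustration only, the standard Cronbach's alpha formula can be sketched in Python. The function name and the demonstration responses below are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items table of scores.

    responses: list of respondents, each a list of item scores.
    Uses sample variances (n - 1 in the denominator).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    items = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical 5-point Likert answers: 4 respondents x 3 items.
demo = [
    [1, 2, 1],
    [2, 2, 3],
    [4, 3, 4],
    [5, 5, 4],
]
alpha = cronbach_alpha(demo)
print(f"alpha = {alpha:.3f}")
```

The same function can be applied to the full scale or to the items of a single subscale, which is how subscale alphas are usually obtained.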

Phase II - Part 2: Establishing Validity
Sample: The scale was administered to a sample of 303 individuals. In this sample, 55.4% of the participants were female and 44.6% were male; 46.2% were between the ages of 20 and 30, 26.1% were between 31 and 40 years, 11.9% were between 41 and 50 years, 13.2% were between 51 and 60 years, and 2.6% were above 60 years of age. The majority of the respondents (59.1%) held a postgraduate degree. The participants came from a variety of professions and occupations. Most of them (57.8%) were married. Non-random snowball sampling was used for this study.
Research Instruments: The demographic information form comprised items related to the respondents' age, gender, education, marital status, number of dependents, socio-economic status, profession/occupation, and nature of organization (public or private).
The Emotional Labor Scale was developed by Brotheridge and Lee (2003). This scale measures Emotional Labor according to international norms and therefore could help establish concurrent and convergent validity. The items were to be answered on a 5-point Likert scale ranging from never to always. The Emotional Labor scale includes five subscales: frequency, intensity, variety, surface acting and deep acting. In the current sample, the internal reliability of the scale was 0.788.
Emotion Regulation Questionnaire (ERQ) was developed by Gross and John (2003). The ERQ was used as a measure of emotion regulation, which is related to and influences how individuals perform emotional labor. The ERQ consists of two subscales: reappraisal and suppression. It is a 10-item instrument with a 7-point response scale (strongly disagree to strongly agree). In the current sample the internal consistency of the scale was 0.814.
Discrete Emotions Emotional Labor Scale (DEELS) was developed by TM (2004). It consists of three subscales (Genuine Expression, Faked Expression and Suppression) and measures them on 14 distinct emotions. The genuine expression subscale of the DEELS was used to establish convergent validity of the genuine acting subscale of the GELS-E, and the faked expression and suppression subscales were used for validity analysis of the surface acting subscale of the GELS-E. The response scale ranges from 1 (never) to 5 (many times a day) for genuine expression and faked expression of emotions. For suppression, the response scale additionally includes 0 (I never feel this), followed by 1 (never) to 5 (many times a day). The alpha value for the scale in the current sample was 0.788.
Schutte Self-report Emotional Intelligence Test (SSEIT) (1998) was developed by Schutte, Malouff, Hall, Haggerty, Cooper, Golden, and Dornheim. The SSEIT assesses emotional intelligence and comprises 33 self-report items. The items are answered on a 5-point scale (1 is strongly disagree and 5 is strongly agree) and three items (5, 28 and 33) are reverse scored. In the current sample the scale yielded an alpha value of 0.852.
Positive Affect and Negative Affect Schedule (PANAS) was developed by Watson, Clark and Tellegen in 1988. The scale measures positive and negative affectivity, which have been described as moderators and consequences of emotional labor; it was therefore imperative to establish discriminant validity against it. The PANAS is a 10-item inventory, answered on a 5-point scale where 1 = very slightly or not at all and 5 = extremely. The internal reliability of the scale in the current sample was 0.779.
Satisfaction with Life Scale (SWLS) was developed by Diener et al (1985). It is a 5-item scale scored on a 7-point scale ranging from Strongly Disagree (1) to Strongly Agree (7). The scale shows high internal reliability of 0.886 in the current sample.
Copenhagen Burnout Inventory (CBI): Burnout is said to be a common outcome of emotional labor; therefore, distinguishing the two was essential. The CBI is a public domain instrument developed as part of a PUMA study conducted in 2005. The authors of the scale are Kristensen, Borritz, Villadsen and Christensen. It contains 19 items answered on a 5-point scale (from always or to a very high degree to never/almost never or to a very low degree). In the current sample, the internal reliability of the scale was 0.755.
Ryff's Psychological Well-Being Scale (1989): The original scale consisted of 120 items and was developed by Ryff. For the current research, the shorter 18-item version of the scale was used. Responses are given on a 7-point scale ranging from strongly agree to strongly disagree. In the current sample, the scale yielded an alpha value of 0.863.
Procedure: Face and subjective validity of the scale were established in two stages. In the first stage, the initial item pool was sent to a language expert and four psychologists. In the second stage, 14 psychology students (7 undergraduate and 7 graduate students) were approached and asked to review the items.
https://ir.iba.edu.pk/businessreview/vol15/iss1/12 DOI: https://doi.org/10.54784/1990-6587.1011
To establish convergent and discriminant validity, permission to use the above-mentioned scales was obtained from the relevant authors. The questionnaires, along with an informed consent form and the demographic information form, were sent to the participants via Google Docs. Statistical analysis of the data was conducted using the Statistical Package for the Social Sciences version 17.
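The reverse scoring noted above for the SSEIT (items 5, 28 and 33 on a 5-point scale) can be illustrated with a short Python sketch. The function name, the dictionary input format and the use of a simple sum as the total score are assumptions made for illustration; the study itself scored the instruments in SPSS.

```python
REVERSED_ITEMS = {5, 28, 33}    # item numbers stated in the text
SCALE_MIN, SCALE_MAX = 1, 5     # 5-point response scale


def score_sseit(responses):
    """Sum SSEIT answers after reverse scoring the reversed items.

    responses: dict mapping item number (1-33) to the raw answer (1-5).
    A reversed answer r becomes SCALE_MIN + SCALE_MAX - r, so 1 <-> 5, 2 <-> 4.
    """
    total = 0
    for item, raw in responses.items():
        if not SCALE_MIN <= raw <= SCALE_MAX:
            raise ValueError(f"item {item}: answer {raw} is outside the scale")
        if item in REVERSED_ITEMS:
            total += SCALE_MIN + SCALE_MAX - raw
        else:
            total += raw
    return total


# Hypothetical respondent who answers 5 to every one of the 33 items.
all_fives = {item: 5 for item in range(1, 34)}
print(score_sseit(all_fives))
```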

Results
The initial pool of 49 items was reduced to 30 items through factor analysis (table 2), descriptive statistics and item-total correlations (table 1). Corresponding items that had high loadings and item-total correlations greater than 0.50 were retained as part of the scale. Research posits that in exploratory factor analysis, factor loadings are deemed to be significant only when they are higher than 0.30 or 0.40 (Floyd and Widaman 1995). The Kaiser-Meyer-Olkin Measure (KMO) value was found to be 0.889 and the Bartlett test result in the current sample was found to be highly significant (p < 0.001), and therefore factor analysis is deemed appropriate.
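The item-total criterion described above can be sketched as follows. This is a minimal Python illustration, not the SPSS procedure used in the study: it assumes the corrected item-total correlation (each item against the sum of the remaining items), which the text does not specify, and the function names and demonstration data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def corrected_item_total(responses):
    """Correlate each item with the sum of all other items."""
    n_items = len(responses[0])
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        correlations.append(pearson_r(item, rest))
    return correlations


def retained_items(responses, threshold=0.50):
    """Indices (0-based) of items whose correlation exceeds the threshold."""
    return [j for j, r in enumerate(corrected_item_total(responses)) if r > threshold]


# Hypothetical answers: 5 respondents x 3 items; the third item runs against the others.
demo = [
    [1, 1, 3],
    [2, 2, 5],
    [3, 3, 1],
    [4, 4, 4],
    [5, 5, 2],
]
print(retained_items(demo))
```

In the study, this criterion was applied together with the factor loadings, so an item would need to satisfy both conditions to be retained.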
Exploratory factor analysis was used to determine the construct validity of the scale and to examine its underlying structure. Table 2 displays the factor loadings for the 49-item GELS-E. The table shows that the three hypothesized subscales did emerge as the three factors (Surface, Deep and Genuine Acting), each comprising 10 items. Item-subscale correlations were computed to assess whether each item of a factor was significantly related to its particular subscale (tables 3, 4, & 5).
The Cronbach's alpha obtained for the GELS-E indicates high internal consistency (table 6). Table 7 presents the internal consistency of the three subscales. Additionally, test-retest reliability was calculated, and the correlation between the two administrations of the GELS-E was highly significant (r = 0.912; p < 0.001; n = 110).
Convergent validity of the GELS-E was assessed by correlating it with the Emotional Labor Scale (Brotheridge and Lee 2003), the two facets of the Emotion Regulation Questionnaire, and the Discrete Emotions Emotional Labor Scale (TM 2004). Statistical analysis revealed a positive correlation between the GELS-E and the Emotional Labor Scale (r = 0.671; p < 0.001; n = 303). Positive relationships were also observed between the GELS-E and ERQ Reappraisal (r = 0.686; p < 0.001; n = 303) and between the GELS-E and ERQ Suppression (r = 0.544; p < 0.001; n = 303). A positive correlation was also observed between the GELS-E and the DEELS (r = 0.713; p < 0.001; n = 303).
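The significance levels reported above can be checked from r and n alone using the usual t transformation of a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The Python sketch below applies it to the coefficients reported in this section; the function name and dictionary labels are illustrative, and the assumption that these are Pearson correlations with two-tailed tests is ours.

```python
from math import sqrt


def t_statistic(r, n):
    """t value for testing that a Pearson correlation is zero (df = n - 2)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("at least three observations are required")
    return r * sqrt((n - 2) / (1 - r * r))


# (r, n) pairs as reported in the Results section.
reported = {
    "test-retest": (0.912, 110),
    "Emotional Labor Scale": (0.671, 303),
    "ERQ Reappraisal": (0.686, 303),
    "ERQ Suppression": (0.544, 303),
    "DEELS": (0.713, 303),
}

t_values = {label: t_statistic(r, n) for label, (r, n) in reported.items()}
for label, t in t_values.items():
    n = reported[label][1]
    print(f"{label}: t({n - 2}) = {t:.2f}")
```

For samples of this size the two-tailed critical value at p = 0.001 is roughly 3.3 to 3.4, so every coefficient above clears it by a wide margin, which is consistent with the reported p < 0.001.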

Discussion
The aim of the research study was to develop a consistent, reliable and valid measure of emotional labor. The scale developed consists of 30 items that measure emotional labor through the three subscales of Surface, Deep, and Genuine Acting. The scale was developed as a self-report questionnaire to be answered on a 5-point Likert-type response scale. While developing the items, certain basic rules and principles were kept in mind (Clark and Watson 2016). The language of the items was kept simple and direct; colloquialisms and slang were avoided, and language that could be understood by a range of different populations was used. Difficult and complex items were avoided, as were double-barreled items. Negatively coded items were not included in the scale, because studies show that scales that include both positively and negatively worded items contain measurement errors (DiStefano and Motl 2006; Quilty et al 2006).
Table note (one surviving entry: .641**, p = 0.000): ** Correlation is significant at the 0.01 level (2-tailed). * Correlation is significant at the 0.05 level (2-tailed). n = 110.
Published by iRepository, December 2020
The next consideration was to determine the response format. In the case of the GELS, a 5-point Likert response format was deemed appropriate, because five options allow measurement not only of the occurrence of emotional labor but also of the frequency with which it occurs. Comrey and Lee (1992) favor a multiple-choice response format, saying that "multiple-choice item formats are more reliable, give more stable results, and produce better scales". A Likert-type response scale with five to eight options is appropriate and desirable for questionnaires (Lietz 2010).
Moreover, reverse-worded or reverse-scored items were not developed for the scale. There has been a certain amount of controversy over whether such items should be included in a questionnaire. Negatively worded items tend to confuse respondents and hence decrease the reliability of an instrument. Research shows that when reverse-worded items are randomly included in a measure, they have adverse effects on the psychometric properties of the instrument (Harrison and McLaughlin 1991). Empirical findings indicate that scales that include both positively and negatively worded items contain measurement errors, which can lead to problems in the analysis and interpretation of the results (DiStefano and Motl 2006; Quilty et al 2006).
Determining the reliability of a measure is an integral part of scale construction. Reliability refers to how consistent a measure or measurement procedure is; the reliability coefficient or indices indicate the reproducibility of the scores of the measure (John and Benet-Martínez 2000). Two forms of reliability, internal consistency and test-retest reliability, were established. Reliability is an essential aspect of a scale because, without reliability, validity cannot exist.
Determining the validity of a scale is also a significant aspect of developing a new scale. In fact, developing a valid measure of the chosen construct is probably the first and foremost goal of scale development. Validity refers to the extent to which an instrument actually measures the construct it was developed to measure (Kimberlin and Winterstein 2008). Content validity of the scale was established via subjective validity and face validity by approaching psychologists, language experts and subject matter experts to review and critique the items developed for the General Emotional Labor Scale. Additionally, 14 students further evaluated the scale. According to Cook and Beckman (2006), the content measures the entirety of a construct and nothing else and this can be obtained by wording the items appropriately and by seeking critique by subject matter experts.
Establishing construct validity is essential for developing a sound measure. Construct validity is said to encompass other forms of validity, such as concurrent, predictive, convergent and discriminant validities (Haynes et al 1995). Concurrent validity has been defined as the extent to which the scores obtained from a test agree with the scores obtained from another test that measures the same or similar construct (Zheng and De Jong 2011). Similar to concurrent validity, convergent validity also observes the relationship between similar measures evaluating the same construct. Convergent validity has been defined as the extent to which different measures that are designed to tap the same construct correlate with each other (Cunningham et al 2001). Discriminant validity on the other hand, refers to how distinct and dissimilar different measures and scales are (Guo et al 2008).
Content validity of both versions of the scale was established by asking 14 students, 7 undergraduate level psychology students and 7 graduate level psychology students to read each of the 30 items and then match it with the subscale they understood it to be measuring. The students matched each item to its predetermined subscale, thus showing that the items effectively measured the subscales.
The convergent validity of the General Emotional Labor Scale-English (GELS-E) was assessed against the Emotional Labor Scale by Brotheridge and Lee (2003), the Emotion Regulation Questionnaire (ERQ) by Gross and John (2003) and the Discrete Emotions Emotional Labor Scale (DEELS) by TM (2004). The correlations between the three pre-established scales and the GELS-E were examined in order to determine whether the GELS-E actually measures emotional labor. Strong positive correlations were observed between the GELS-E and the three pre-existing scales, thus establishing the convergent and concurrent validity of the GELS-E.
Discriminant validity of the GELS-E was measured by correlating its scores with five pre-existing scales, namely the Schutte Self-report Emotional Intelligence Test (SSEIT) (Schutte et al 1998), the Positive Affect and Negative Affect Schedule (PANAS) by Watson et al (1988), the Satisfaction with Life Scale (SWLS) by Diener et al (1985), the Copenhagen Burnout Inventory (CBI) developed as part of a PUMA study (Kristensen et al 2005), and Ryff's Psychological Well-Being Scale (Ryff and Keyes 1995). The correlation coefficients between the GELS-E and these five scales showed negligible positive correlations for the SSEIT and SWLS and negative correlations (inverse relationships) for the PANAS, CBI and Ryff's Psychological Well-Being Scale. These observations indicate that the GELS-E is distinct and measures a construct separate from the constructs measured by these five instruments.

Implications
Developing a scale measuring emotional labor can prove quite beneficial for those interested in research as well as for professionals. Researchers studying emotions and the role of emotions in work life can make use of such a scale. The unique quality of the GELS is that it was developed in a third-world Asian country, Pakistan. Considering the dynamics of Pakistan and its demographics, the practice of emotional labor was likely to differ to some extent from that in western, developed countries. Therefore, the norms established for the scale are pertinent to Pakistan. However, the scale can be utilized in other countries that have similar demographics, such as India and Bangladesh. Moreover, the scale was developed so as to avoid cultural boundaries, so that it can be used the world over.
The psychometric properties of the GELS were determined in the urban city of Karachi. Karachi has a diverse range of demographics. Numerous individuals come to the city in search of work, so its residents are multi-ethnic. They belong to different religions, come from different areas of Pakistan, work in a variety of occupations and span a wide age range. Thus, testing the scale in such a city allowed its norms to be developed over a broad range of the population.
A recent study by Mastracci et al (2010) brought forth the significance of training in what they call 'emotive skills'. According to them, these emotive skills are imperative for employees working in highly emotional jobs such as law enforcement and disaster services. They suggested that training in such skills should be part of the teaching curriculum. The GELS-E can prove beneficial for conducting needs analysis for such training and for assessing the current performance of emotional labor. This can eventually contribute towards reducing the negative impact of emotional labor on employees.

Limitations and directions for future research
The current study was not free of limitations. Although the norms of the GELS-E were established in Pakistan, the respondents were all residents of the metropolitan, urban city of Karachi. Future research ought to focus on assessing the psychometric properties of the scale in different cities and rural areas of Pakistan (and in other countries).

Conclusion
The current study set out to create a reliable, valid and sound measure of emotional labor. The idea was to develop an instrument that has its norms established in the developing Asian country of Pakistan, where religion, traditions and customs play a major role in an individual's work life and expression of emotions. The psychometric properties of the scale were rigorously checked so that the scale assesses emotional labor in a thorough manner. Three subscales were determined as part of the GELS-E: Surface, Deep and Genuine Acting. These three subscales relate to the method and expression of emotions at work to meet organizational demands. Furthermore, the three facets of each subscale enable in-depth analysis of the mode and expression of surface, deep and genuine acting and of the intensity and variety of emotions expressed.